In the world of NoSQL


If connectivity between your nodes is uneven, e.g. half your DataNodes are connected to one switch and the other half to another, it’s wise to make the NameNode aware of that layout so it can spread block replicas across racks.

To do this, set the property topology.script.file.name (named net.topology.script.file.name on Hadoop 2.x, where the old name is deprecated) to point to a script that takes node names as arguments and prints the rack location of each one to standard output, separated by spaces. The script must handle both hostnames and IP addresses.
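As a rough illustration of that calling contract (this is not the script I actually use), the sketch below hard-codes two nodes in a case statement. The function name print_racks, the host names, the addresses and the rack paths are all made up for the example; the only point is the shape of the input and output: one argument per node, one rack path per argument, separated by spaces.

```shell
#!/bin/sh
# Minimal stand-in for a topology script: one rack path per argument.
# Host names, IP addresses and rack paths below are hypothetical.
print_racks() {
  out=
  for node in "$@"; do
    # Accept either the host name or the IP address of a node.
    case "$node" in
      node1.example.com|10.0.0.1) rack=/dc/rack-1 ;;
      node2.example.com|10.0.0.2) rack=/dc/rack-2 ;;
      *)                          rack=/default/rack ;;
    esac
    # Join the answers with single spaces, in argument order.
    out="${out:+$out }$rack"
  done
  printf '%s\n' "$out"
}

print_racks node1.example.com 10.0.0.2 unknown-host
```

Note that the fallback path has the same depth as the real rack paths, which keeps every node at the same level of the topology tree.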

To configure the NameNode, add the following to /etc/hadoop/conf/hdfs-site.xml:

  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.sh</value>
  </property>

My topology script looks like this:

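#!/bin/bash
# Prints the rack of every host name or IP address given as an argument.
# Requires bash 4+ for associative arrays.

HADOOP_CONF=/etc/hadoop/conf

# Map both the host name and the IP address of each node to its rack.
declare -A topology
while read -r host ip rack; do
  [ -n "$rack" ] || continue   # skip blank or incomplete lines
  topology[$host]=$rack
  topology[$ip]=$rack
done < "${HADOOP_CONF}/topology.data"

# Unknown nodes fall back to /default/rack.
while [ -n "$1" ]; do
  echo -n "${topology[$1]:-/default/rack} "
  shift
done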

And my topology.data contains:

node1.domain.com 1.2.3.1 /dc-se/rack-1
node2.domain.com 1.2.3.2 /dc-se/rack-2
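Before pointing the NameNode at a data file, it can be worth running the lookup by hand against a scratch copy. The sketch below is one way to do that, under a few assumptions: it is a plain sh variant that uses awk instead of the bash associative array in my script, lookup_racks is a hypothetical helper name, and the address 10.9.9.9 simply stands for a node that is not in the file.

```shell
#!/bin/sh
# Sketch: resolve nodes against a scratch copy of topology.data.
# lookup_racks is a hypothetical helper, not part of Hadoop.
lookup_racks() {
  data_file=$1
  shift
  out=
  for node in "$@"; do
    # Match the argument against the host (column 1) or the IP (column 2).
    rack=$(awk -v n="$node" '$1 == n || $2 == n { print $3; exit }' "$data_file")
    # Nodes missing from the file fall back to /default/rack.
    out="${out:+$out }${rack:-/default/rack}"
  done
  printf '%s\n' "$out"
}

# Scratch file holding the two sample entries from this post.
data=$(mktemp)
trap 'rm -f "$data"' EXIT
cat > "$data" <<'EOF'
node1.domain.com 1.2.3.1 /dc-se/rack-1
node2.domain.com 1.2.3.2 /dc-se/rack-2
EOF

lookup_racks "$data" node1.domain.com 1.2.3.2 10.9.9.9
```

With the sample data this should print /dc-se/rack-1 /dc-se/rack-2 /default/rack, i.e. one answer per argument whether the node was given by name or by address.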

§590 · August 16, 2014 · Hadoop