In the world of NoSQL and Hadoop, network topology matters.

If you have limited connectivity between nodes (say, half your DataNodes are connected to one switch and the other half to another), it is wise to make the NameNode aware of this.

To configure this, set the property topology.script.file.name to point to a script that takes node names as arguments and prints each node's location to standard output, space separated. The script must handle both hostnames and IP addresses, since the NameNode may pass either.

To configure the NameNode, add the following to /etc/hadoop/conf/hdfs-site.xml:

  <property>
    <name>topology.script.file.name</name>
    <value>/etc/hadoop/conf/topology.sh</value>
  </property>
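Note that the NameNode only reads this property at startup, so it needs a restart before the script takes effect. Once it is back up, the datanode report should include a Rack: line for each node (output sketched here from the topology.data shown below):

hadoop dfsadmin -report | grep Rack
Rack: /dc-se/rack-1
Rack: /dc-se/rack-2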

My topology script looks like this:

#!/bin/bash
# Resolves hostnames or IP addresses to rack locations using a
# simple lookup table; unknown nodes map to /default/rack.

HADOOP_CONF=/etc/hadoop/conf

# Build an associative array (requires bash 4) keyed on both
# hostname and IP, so lookups work for either form.
declare -A topology
while read host ip rack; do
  topology[$host]=$rack
  topology[$ip]=$rack
done < ${HADOOP_CONF}/topology.data

# Print the rack for each argument, space separated, falling
# back to /default/rack for nodes missing from the table.
while [ -n "$1" ]; do
  echo -n "${topology[$1]:=/default/rack} "
  shift
done

And my topology.data contains:

node1.domain.com 1.2.3.1 /dc-se/rack-1
node2.domain.com 1.2.3.2 /dc-se/rack-2
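
With the script and data file in place, a quick sanity check from the shell (node3.domain.com is made up, to exercise the fallback) looks like this:

$ /etc/hadoop/conf/topology.sh node1.domain.com 1.2.3.2 node3.domain.com
/dc-se/rack-1 /dc-se/rack-2 /default/rack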


§590 · August 16, 2014 · Hadoop


When upgrading from CDH3 to CDH4, I ran into the following problem when attempting to start the NameNode again:

2013-09-23 22:53:42,859 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /dfs/nn/in_use.lock acquired by nodename 32133@hbase1.network.com
2013-09-23 22:53:42,903 INFO org.apache.hadoop.hdfs.server.namenode.NNStorage: Using clusterid: CID-9ab09a80-a367-42d4-8693-6905b9c5a605
2013-09-23 22:53:42,913 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /dfs/nn/current
2013-09-23 22:53:42,928 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /dfs/nn/current/fsimage using no compression
2013-09-23 22:53:42,928 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 183625
2013-09-23 22:53:44,280 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2013-09-23 22:53:44,282 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should have reached the end of image file /dfs/nn/current/fsimage
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(FSImageFormat.java:235)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:786)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(FSImage.java:692)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:647)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(FSImage.java:349)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:261)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:639)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:476)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:437)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:613)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:598)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1169)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1233)
2013-09-23 22:53:44,286 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-09-23 22:53:44,288 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hbase1.network.com/1.2.3.4
************************************************************/

Luckily I came across a post on the cdh-user mailing list by Bob Copeland:

I instrumented the code around the exception and found that the loader had read all but 16 bytes of the file, and the remaining 16 bytes are all zeroes. So chopping off the last 16 bytes of padding was a suitable workaround, i.e.:

fsimage=/var/lib/hadoop/dfs/name/current/fsimage
cp $fsimage{,~}
size=$(stat -c %s $fsimage)
dd if=$fsimage~ of=$fsimage bs=$[size-16] count=1

Is this a known issue? I did all these tests in a scratch cdh3u5 VM and can replicate at will if needed.

-Bob

This solved my problem: the dd invocation copies a single block of size minus 16 bytes, so everything except the 16 zero bytes of padding ends up in the new fsimage.
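
If you want to be extra careful, you can verify first that the trailing bytes really are zero padding. A minimal sketch, using the fsimage path from my NameNode log above:

# Dump the last 16 bytes of the image; they should all print as 00.
fsimage=/dfs/nn/current/fsimage
size=$(stat -c %s $fsimage)
od -A d -t x1 -j $[size-16] $fsimage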

Ref: http://grokbase.com/p/cloudera/cdh-user/12ckdj9m47/cdh4-fsimage-upgrade-failure-workaround


§558 · October 1, 2013 · Hadoop