
If you have limited connectivity between nodes, e.g. half your datanodes connected to one switch and the other half to another, it's wise to make the NameNode aware of this.

To configure this, you point the topology script property at a script that takes node names as arguments and prints each node's rack location to standard output (space separated). It needs to handle both hostnames and IP addresses.

To configure the NameNode, add the following to /etc/hadoop/conf/hdfs-site.xml:
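A sketch of the property block (the property name and script path here are my assumptions, not from the original snippet; CDH3/Hadoop 1 era releases use topology.script.file.name, Hadoop 2 renamed it to net.topology.script.file.name):

```xml
<!-- property name and script path are assumptions -->
<property>
  <name>topology.script.file.name</name>
  <value>/etc/hadoop/conf/topology.sh</value>
</property>
```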


My topology script looks like this:

declare -A topology
while read -r host ip rack; do
  # map both the hostname and the IP address to the rack
  topology[$host]=$rack
  topology[$ip]=$rack
done < ${HADOOP_CONF}/
while [ -n "$1" ]; do
  echo -n "${topology[$1]:=/default/rack} "
  shift
done

And my mapping file contains lines of the form host ip rack, with racks like /dc-se/rack-1 and /dc-se/rack-2.
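As a self-contained sketch of the same lookup (the hostnames, IPs, and racks below are made up; the real script reads them from the mapping file):

```shell
# Same lookup logic as the script above, with the mapping inlined for
# demonstration: both hostname and IP resolve to the rack, and anything
# unknown falls back to /default/rack. Requires bash 4 (associative arrays).
lookup_racks() {
  declare -A topology
  while read -r host ip rack; do
    topology[$host]=$rack
    topology[$ip]=$rack
  done <<'EOF'
hadoop01 10.0.0.11 /dc-se/rack-1
hadoop02 10.0.0.12 /dc-se/rack-2
EOF
  local node
  for node in "$@"; do
    printf '%s ' "${topology[$node]:-/default/rack}"
  done
  echo
}

lookup_racks hadoop01 10.0.0.12 unknown-host
```

Called like that, it prints the three racks in argument order, with the unknown host mapped to /default/rack.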


§590 · August 16, 2014 · Hadoop · (No comments)

Got the following exception when starting the datanode after it had terminated due to a disk failure (without rebooting the server):

2013-10-11 11:24:02,122 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain Problem binding to [] Address already in use; For more details see:
	at org.apache.hadoop.ipc.Server.bind(...)
	at org.apache.hadoop.ipc.Server.bind(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(...)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(...)
2013-10-11 11:24:02,126 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-11 11:24:02,128 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
SHUTDOWN_MSG: Shutting down DataNode at

After an application crashes it can leave a lingering socket behind, and to bind to that address again right away the new process needs to set the SO_REUSEADDR socket option. The HDFS datanode doesn't set it, and I didn't want to restart the HBase regionserver (which was holding the port through a connection it hadn't realized was dead).
The solution was to bind to the port with an application that sets SO_REUSEADDR and then stop that application; I used netcat for that:

[root@hbase10 ~]#  nc -l 50010


§577 · October 11, 2013 · Hadoop · (No comments)

When upgrading from CDH3 to CDH4 I ran into the following problem when attempting to start the NameNode again:

2013-09-23 22:53:42,859 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /dfs/nn/in_use.lock acquired by nodename
2013-09-23 22:53:42,903 INFO org.apache.hadoop.hdfs.server.namenode.NNStorage: Using clusterid: CID-9ab09a80-a367-42d4-8693-6905b9c5a605
2013-09-23 22:53:42,913 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Recovering unfinalized segments in /dfs/nn/current
2013-09-23 22:53:42,928 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Loading image file /dfs/nn/current/fsimage using no compression
2013-09-23 22:53:42,928 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files = 183625
2013-09-23 22:53:44,280 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: Number of files under construction = 0
2013-09-23 22:53:44,282 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
java.lang.AssertionError: Should have reached the end of image file /dfs/nn/current/fsimage
        at org.apache.hadoop.hdfs.server.namenode.FSImageFormat$Loader.load(...)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(...)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImageFile(...)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(...)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.doUpgrade(...)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(...)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(...)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(...)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(...)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(...)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(...)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(...)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(...)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(...)
2013-09-23 22:53:44,286 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-09-23 22:53:44,288 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down NameNode at

Luckily I came across a post to the cdh-user mailing list by Bob Copeland containing:

I instrumented the code around the exception and found that the loader had read all but 16 bytes of the file, and the remaining 16 bytes are all zeroes. So
chopping off the last 16 bytes of padding was a suitable workaround, i.e.:

cp $fsimage{,~}
size=$(stat -c %s $fsimage)
dd if=$fsimage~ of=$fsimage bs=$[size-16] count=1

Is this a known issue? I did all these tests in a scratch cdh3u5 VM and can
replicate at will if needed.


Which solved my problems.
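The dd invocation above copies a single block of (size - 16) bytes, i.e. everything except the trailing zero padding. A scratch-file demonstration of the same trick (file name and sizes made up for illustration):

```shell
# Make a 100-byte stand-in for the fsimage, back it up, then keep only
# the first (size - 16) bytes, dropping the 16 trailing padding bytes.
f=$(mktemp)
head -c 100 /dev/zero > "$f"
cp "$f" "$f~"                          # always keep a backup first
size=$(stat -c %s "$f~")
dd if="$f~" of="$f" bs=$((size-16)) count=1 2>/dev/null
stat -c %s "$f"    # 84
```

dd truncates the output file by default, so the result is exactly one block of 84 bytes.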



§558 · October 1, 2013 · Hadoop · (No comments)

I ran out of space on the server running the namenode, hbase master, hbase regionserver and a datanode, and during the subsequent restarts the hbase master wouldn't start.
During log splitting it died with the following error:

2013-07-02 19:52:12,269 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader$WALReaderFSDataInputStream.getPos(...)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(...)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(...)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(...)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(...)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(...)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(...)
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(...)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(...)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(...)
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(...)
2013-07-02 19:52:12,271 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

I found two ways to get it to start up again. The first one I tried was to move the log-splitting directory out of the way in HDFS with the following command (strongly discouraged, since the edits in those WALs are then never replayed):

$ hadoop fs -mv /hbase/.logs/,60020,1367325077343-splitting /user/hdfs

After some help from the #hbase IRC channel I moved it back and tried starting the hbase master with Java assertions disabled, and that solved the issue.

To disable assertions in the JVM, make sure the -da (or -disableassertions) flag is passed to java when it is invoked.

I did this by editing /etc/hbase/conf/ and adding -da to the HBASE_MASTER_OPTS environment variable.
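In file form (the filename is cut off above; hbase-env.sh is my assumption) the change amounts to:

```shell
# /etc/hbase/conf/hbase-env.sh (filename assumed)
# -da disables Java assertions in the HMaster JVM
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -da"
```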


HBase crashed for me during the night, due to the extra leap second that was inserted (2012-06-30 23:59:60).


When attempting to restart HBase, it just didn't start. I found this resource with a tip that might work to get it back up (although I found it after rebooting my servers, so I didn't try it):
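For reference, the workaround that circulated widely at the time (my assumption, not necessarily the linked tip) was to stop ntpd and re-set the system clock, which clears the kernel's pending leap-second state:

```shell
# Clear the stuck leap-second state by re-setting the clock from itself.
# Root only; as a regular user this block is a no-op.
if [ "$(id -u)" -eq 0 ]; then
    service ntpd stop
    date -s "$(date)"
fi
```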


All Java processes (including all HDFS-related ones) were using 100% CPU, together with ksoftirqd. I turned off ntpd autostart (chkconfig ntpd off), rebooted the servers, and then started my HBase cluster back up. This solved the issue.


§163 · July 1, 2012 · Hadoop, HBase · 1 comment

I've previously found a great addon to hadoop streaming called "hadoop hbase streaming", which lets you use an HBase table as input or output format for your hadoop streaming map reduce jobs, but it hasn't been working since a recent API change.

The error was:

Error: java.lang.ClassNotFoundException:

I just found a fork of it on github by David Maust that has been updated for newer versions of HBase.

You can find the fork here:
And the original branch here:


§98 · May 2, 2012 · Hadoop, HBase · Comments disabled for Fork of hadoop-hbase-streaming with support for CDH3u3

I increased the number of map tasks in hadoop to 64 per TaskTracker, and the TaskTracker started to crash every time I launched a map reduce job.


Errors were:

java.lang.OutOfMemoryError: unable to create new native thread


org.apache.hadoop.mapred.DefaultTaskController: Unexpected error launching task JVM Cannot run program "bash" (in directory "/data/1/mapred/local/taskTracker/hdfs/jobcache/job_201110201642_0001/attempt_201110201642_0001_m_000031_0/work"): error=11, Resource temporarily unavailable.


Googling for this problem turned up the following suggestions:

  1. Increase the heap size for the TaskTracker. I did this by changing HADOOP_HEAPSIZE to 4096 in /etc/hadoop/conf/ to test. This did not solve it.
  2. Increase the heap size for the spawned child JVMs by adding -Xmx1024m to the relevant property in mapred-site.xml. This did not solve it.
  3. Make sure the limit on open files is not reached. I had already done this by adding "mapred - nofile 65536" in /etc/security/limits.conf. This did not solve it.

I decided to sudo to the mapred user and check the ulimits again, and what stood out was:

max user processes              (-u) 1024
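To check these limits as the daemon user (a sketch; the exact sudo/su invocation depends on your setup, e.g. sudo -u mapred bash):

```shell
# Print the two limits that matter for a busy TaskTracker
ulimit -u    # max user processes (nproc)
ulimit -n    # max open files (nofile)
```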


Adding the following to /etc/security/limits.conf and restarting the TaskTracker solved it:

mapred - nproc 8192


Apparently CentOS limits the number of processes for regular users to 1024 by default.
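To my knowledge, on CentOS 6 that 1024 default ships in a drop-in limits file, /etc/security/limits.d/90-nproc.conf, so the override can live there as well; a sketch:

```
# /etc/security/limits.d/90-nproc.conf (CentOS 6 default location; assumption)
*          soft    nproc     1024
mapred     -       nproc     8192
```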


§50 · October 20, 2011 · Hadoop · Comments disabled for Hadoop TaskTracker java.lang.OutOfMemoryError