In the world of NoSQL

Got the following exception when starting the datanode after it had terminated due to a disk failure (without rebooting the server):

2013-10-11 11:24:02,122 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see:  http://wiki.apache.org/hadoop/BindException
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
	at org.apache.hadoop.ipc.Server.bind(Server.java:403)
	at org.apache.hadoop.ipc.Server.bind(Server.java:375)
	at org.apache.hadoop.hdfs.net.TcpPeerServer.<init>(TcpPeerServer.java:106)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:555)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:741)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:344)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1795)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1728)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1751)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1904)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1925)
2013-10-11 11:24:02,126 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-11 11:24:02,128 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hbase10.network.com/1.2.3.4
************************************************************/

After an application crashes it can leave a lingering socket behind. To reuse that address early, the next process has to set the socket option SO_REUSEADDR before it binds. The HDFS datanode doesn’t do that, and I didn’t want to restart the HBase regionserver (which was holding the socket with a connection it hadn’t realized was dead).
The solution was to bind to the port with an application that does set SO_REUSEADDR and then stop that application. I used netcat for that:

[root@hbase10 ~]#  nc -l 50010
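
For anyone without netcat at hand, the same trick can be sketched in a few lines of Python. This is only an illustration of what I assume netcat does when it listens: create a TCP socket, set SO_REUSEADDR, then bind. The helper name bind_with_reuseaddr is made up for this example, and I have not checked that this clears the stuck port in every situation.

```python
import socket


def bind_with_reuseaddr(port, host="0.0.0.0"):
    """Bind a listening TCP socket with SO_REUSEADDR set.

    Hypothetical helper for illustration; not part of Hadoop or HBase.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() for it to have any effect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(1)
    return sock


if __name__ == "__main__":
    # 50010 is the datanode port from the log above.
    s = bind_with_reuseaddr(50010)
    print("bound to", s.getsockname())
    # Closing right away mirrors stopping netcat after it has bound.
    s.close()
```

Run it as root (or as the user that runs the datanode), and start the datanode again once it has exited.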


§577 · October 11, 2013 · Hadoop · (No comments)