Got the following exception when starting the datanode after it had terminated due to a disk failure (without rebooting the server):
2013-10-11 11:24:02,122 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:718)
    at org.apache.hadoop.ipc.Server.bind(Server.java:403)
    at org.apache.hadoop.ipc.Server.bind(Server.java:375)
    at org.apache.hadoop.hdfs.net.TcpPeerServer.<init>(TcpPeerServer.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.initDataXceiver(DataNode.java:555)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:741)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:344)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1795)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1728)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1751)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1904)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1925)
2013-10-11 11:24:02,126 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2013-10-11 11:24:02,128 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hbase10.network.com/1.2.3.4
************************************************************/
After an application crashes it can leave a lingering socket behind. To bind to the same address again before that socket has gone away, the new socket needs the SO_REUSEADDR option set before it calls bind. The HDFS datanode doesn’t do that, and I didn’t want to restart the HBase regionserver (which was holding the socket open with a connection it hadn’t realized was dead).
The solution was to bind to the port with an application that sets SO_REUSEADDR and then stop that application. I used netcat for that:
[root@hbase10 ~]# nc -l 50010
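For anyone curious what that step amounts to at the socket level, here is a minimal Python sketch of the same idea: set SO_REUSEADDR, bind, listen, then close again. The function name bind_and_release and its defaults are made up for illustration. It is not what netcat or the datanode actually runs, and I have not verified that it clears a stuck port the way netcat did here.

```python
import socket


def bind_and_release(port, host="0.0.0.0"):
    """Bind to host:port with SO_REUSEADDR set, listen, then close.

    Hypothetical helper for illustration only.
    Returns the (host, port) pair the socket was actually bound to.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option must be set before bind(); setting it afterwards
        # does not change whether the bind is allowed.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(1)
        return sock.getsockname()
    finally:
        # Closing the listener plays the role of stopping netcat.
        sock.close()
```

Calling bind_and_release(50010) on the affected host would target the datanode port from the log above. Passing port 0 lets the OS pick a free port, which is handy for trying the function out without touching a real service.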