In the world of NoSQL

My HBase cluster refused to start after upgrading from CDH3 to CDH4. This is a known issue according to the Cloudera documentation, and the workaround is to delete the /hbase ZNode.

— During an upgrade from CDH3 to CDH4, regions in transition may cause HBase startup failures.

Bug: None
Severity: Medium
Anticipated Resolution: To be fixed in a future release.
Workaround: Delete the /hbase ZNode in ZooKeeper before starting up CDH4.

So to delete the ZNode I did the following:

[root@hbase1 ~]# /usr/lib/zookeeper/bin/
Connecting to localhost:2181
... log entries
[zk: localhost:2181(CONNECTED) 0] rmr /hbase

After doing this the cluster started as it should.


I ran out of disk space on the server running the NameNode, the HBase Master, an HBase RegionServer and a DataNode, and during the subsequent restarts the HBase Master wouldn’t start.
During log splitting it died with the following error:

2013-07-02 19:52:12,269 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader$WALReaderFSDataInputStream.getPos(
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.getReader(
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(
        at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(
        at org.apache.hadoop.hbase.master.HMaster.finishInitialization(
2013-07-02 19:52:12,271 INFO org.apache.hadoop.hbase.master.HMaster: Aborting

I found two ways to get it to start up again. The first one I tried was to move the log-splitting directory out of the way in HDFS with the following command (this is strongly discouraged, since the edits in those logs will then not be replayed):

$ hadoop fs -mv /hbase/.logs/,60020,1367325077343-splitting /user/hdfs

After getting some help in #hbase, I moved it back and tried starting the HBase Master with Java assertions disabled, and that solved the issue.

To disable assertions in the JVM, make sure that the -da (or -disableassertions) parameter is passed to java when it is invoked.

I did this by editing /etc/hbase/conf/ and adding -da to the HBASE_MASTER_OPTS environment variable.
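For reference, the change amounts to a single line in that file. This is a minimal sketch, assuming the file is sourced by the HBase start scripts as a shell script and that HBASE_MASTER_OPTS may already hold other options, so the flag is appended rather than overwriting them:

```shell
# Assumption: this line goes at the end of the HBase environment file edited above,
# which the start scripts source as a shell script.
# Append -da (short for -disableassertions) so the HBase Master JVM runs with
# Java assertions disabled, while keeping any options already in the variable.
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -da"
```

The HBase Master has to be restarted for the new option to take effect.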


HBase crashed for me last night because of the leap second inserted at 2012-06-30 23:59:60 UTC.


When I attempted to restart HBase, it just didn’t start. I found this resource with a tip that might get it running again (although I found it after rebooting my servers, so I didn’t try it):


All Java processes (including all the HDFS ones) were using 100% CPU, together with ksoftirqd. I turned off ntpd autostart (chkconfig ntpd off), rebooted the servers, and then started my HBase cluster back up. This solved the issue.


I found a neat trick to enable a history file for the HBase shell: put the following into ~/.irbrc:

require 'irb/ext/save-history'
IRB.conf[:SAVE_HISTORY] = 100
IRB.conf[:HISTORY_FILE] = "#{ENV['HOME']}/.irb-save-history"

This enabled history saving when running irb directly, but it didn’t work in the HBase shell, so I also added the following to the end of ~/.irbrc:

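Kernel.at_exit do
  # The HBase shell apparently does not trigger irb's exit hooks on its own, so
  # call them here; save-history registers one of them to write the history file.
  IRB.conf[:AT_EXIT].each do |i|
    i.call
  end
end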

On CentOS you also need to make sure that the ruby-irb package is installed; on Debian the package is named irb1.8.


This is an example of how to import data into HBase with importtsv and completebulkload:

Step 1, run the TSV file through importtsv to create the HFiles:

[root@hbase1 bulktest]# HADOOP_CLASSPATH=$(hbase classpath) sudo -u hdfs -E hadoop jar \
/usr/lib/hbase/hbase-0.90.4-cdh3u3.jar importtsv \
-Dimporttsv.bulk.output=/bulktest-hfiles \
-Dimporttsv.columns=HBASE_ROW_KEY,a:b,a:c bulktest /bulktest-tsv

This will generate HFiles from /bulktest-tsv and store them in /bulktest-hfiles.

I have three columns in the TSV files: the first is the row key, the second is what I want stored in column family a with qualifier b, and the third goes into qualifier c (this mapping is controlled by importtsv.columns).
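To make the column mapping concrete, here is a small sketch that writes a two-line sample file in the layout importtsv expects with the importtsv.columns setting above. The row keys and values are made-up placeholders, and the file name bulktest.tsv is hypothetical:

```shell
# Hypothetical sample data: one row per line, fields separated by a single tab.
# Field 1 -> HBASE_ROW_KEY, field 2 -> a:b, field 3 -> a:c
printf 'row1\tvalue-b1\tvalue-c1\n' >  bulktest.tsv
printf 'row2\tvalue-b2\tvalue-c2\n' >> bulktest.tsv

# Sanity check: every line should have exactly the three fields the mapping names.
if awk -F'\t' 'NF != 3 { bad = 1 } END { exit bad }' bulktest.tsv; then
  echo "bulktest.tsv matches HBASE_ROW_KEY,a:b,a:c"
else
  echo "bulktest.tsv has lines with the wrong number of fields" >&2
fi
```

A file like this would then be copied into the /bulktest-tsv input path (for example with hadoop fs -put) before running the importtsv command above.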

After that job is done, you need to change the ownership of /bulktest-hfiles so that the hbase user owns the HFiles, and then run completebulkload so HBase finds the HFiles:

[root@hbase1 bulktest]# sudo -u hdfs hadoop dfs -chown -R hbase /bulktest-hfiles
[root@hbase1 bulktest]# HADOOP_CLASSPATH=$(hbase classpath) sudo -u hdfs -E hadoop jar \
/usr/lib/hbase/hbase-0.90.4-cdh3u3.jar completebulkload /bulktest-hfiles bulktest

HBase should now see the new data. For usage help, run importtsv or completebulkload without any parameters.



I’ve previously found a great add-on to Hadoop streaming called “hadoop hbase streaming”, which enables you to use an HBase table as the input or output format for your Hadoop streaming MapReduce jobs, but it has not been working since a recent API change.

The error it gave was:

Error: java.lang.ClassNotFoundException:

I just found a fork of it on GitHub by David Maust that has been updated for newer versions of HBase.

You can find the fork here:
And the original branch here:


I got the following exceptions whenever I ran heavy MapReduce jobs against my HBase tables:

INFO mapred.JobClient: Task Id : attempt_201204240028_0048_m_000015_2, Status : FAILED
      lease '-8170712910260484725' does not exist
 at org.apache.hadoop.hbase.regionserver.Leases.removeLease(
 at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(
 at java.lang.reflect.Method.invoke(

Most often they were severe enough to make the entire job fail. This indicated that I needed to raise the scanner lease period, which controls how long a scanner may live between calls. That alone didn’t help, though: apparently you also need to raise hbase.rpc.timeout (the exception that indicated this was only visible at log level DEBUG, so it took a while to realise).

So, adding the following to hbase-site.xml solved it:

    <value>900000</value> <!-- 900 000, 15 minutes -->
    <value>900000</value> <!-- 15 minutes -->
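The 15-minute comments imply that both values are given in milliseconds. A quick shell check of that conversion (the variable names are just for illustration):

```shell
# 900000 ms -> seconds -> minutes, using integer shell arithmetic
timeout_ms=900000
timeout_min=$(( timeout_ms / 1000 / 60 ))
echo "${timeout_ms} ms = ${timeout_min} minutes"   # prints "900000 ms = 15 minutes"
```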


tcollector is a nice set of scripts that gathers system performance data and stores it in OpenTSDB (which in turn stores the data in HBase).


Most people seem to install it via Puppet, but this is how you do it otherwise. First, make sure that you have started the TSD with --auto-metric so that you don’t have to run mkmetric for every metric a collector tracks. Then run the following commands:

# cd /usr/local
# git clone git://
# cd tcollector
# sed -i startstop
# ./startstop start

It logs to /var/log/tcollector.log. Dependencies (on Debian): git, python-json, python-mysqldb.


Recently I have been playing around with HBase for a project that will need to store billions of rows (long scale), with a column count varying from 1 to 1 million. The test data (13.3 million rows, 130.8 million columns) took 27 GB of storage without compression. After activating compression it took only 6.6 GB.
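To put those numbers in perspective, the arithmetic below works out the compression ratio and the space saved from the two storage figures quoted above (nothing else is assumed):

```shell
# Storage used by the test data set, in GB, as measured above.
raw_gb=27     # without compression
lzo_gb=6.6    # with compression enabled
# The shell only does integer arithmetic, so let awk handle the division.
ratio=$(awk -v r="$raw_gb" -v c="$lzo_gb" 'BEGIN { printf "%.1f", r / c }')
saved=$(awk -v r="$raw_gb" -v c="$lzo_gb" 'BEGIN { printf "%.0f", (1 - c / r) * 100 }')
echo "compression ratio: ${ratio}x, space saved: ${saved}%"
```

In other words, compression cut the footprint to roughly a quarter of its original size.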

I followed some guides on the net on how to activate LZO (which can’t be shipped with HBase by default because of its GPL license terms), but all of the ones I tried had some minor faults in them (probably due to version differences).

Anyhow, this is how I did it (assuming Debian or Ubuntu):

apt-get install liblzo2-dev sun-java6-jdk ant
svn checkout hadoop-gpl-compression
cd hadoop-gpl-compression
export CFLAGS="-m64"
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
export HBASE_HOME=/path/to/hbase/
ant compile-native
ant jar
cp build/hadoop-gpl-compression-*.jar $HBASE_HOME/lib/
cp build/native/Linux-amd64-64/lib/* /usr/local/lib/
echo "export HBASE_LIBRARY_PATH=/usr/local/lib/" >> $HBASE_HOME/conf/
mkdir -p $HBASE_HOME/build
cp -r build/native $HBASE_HOME/build/native

Then verify that it works with:

./bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo


