In the world of NoSQL


Recently I have been playing around with HBase for a project that will need to store billions of rows (long scale), with a column count varying from 1 to 1 million. The test data (13.3 million rows, 130.8 million columns) took up 27 GB of storage without compression. After activating compression it took only 6.6 GB.
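To put those two sizes in perspective, here is a minimal sketch that works out the compression ratio and the space saved. The helper name compression_stats is made up for this example; the only inputs are the 27 GB and 6.6 GB figures from my test data.

```shell
#!/bin/sh
# Sketch: derive the compression ratio and space saving from two sizes.
# compression_stats is a hypothetical helper name, not an HBase tool.
compression_stats() {
    # $1 = uncompressed size, $2 = compressed size (same unit for both)
    awk -v raw="$1" -v packed="$2" 'BEGIN {
        printf "ratio %.1fx, saved %.1f%%\n", raw / packed, (1 - packed / raw) * 100
    }'
}

# The figures from the test data: 27 GB raw, 6.6 GB with LZO
compression_stats 27 6.6
```

For this data set that comes out at roughly a 4x reduction. Other data will compress differently, so treat it as one data point rather than a rule.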

I followed some guides on the net on how to activate LZO (which can't be enabled by default due to license terms), but every one I tried had minor faults (probably due to version differences).

Anyhow, this is how I did it (assuming Debian or Ubuntu):

apt-get install liblzo2-dev sun-java6-jdk ant
svn checkout http://svn.codespot.com/a/apache-extras.org/hadoop-gpl-compression/trunk/ hadoop-gpl-compression
cd hadoop-gpl-compression
export CFLAGS="-m64"
export JAVA_HOME=/usr/lib/jvm/java-6-sun/
export HBASE_HOME=/path/to/hbase/
ant compile-native
ant jar
cp build/hadoop-gpl-compression-*.jar $HBASE_HOME/lib/
cp build/native/Linux-amd64-64/lib/* /usr/local/lib/
echo "export HBASE_LIBRARY_PATH=/usr/local/lib/" >> $HBASE_HOME/conf/hbase-env.sh
mkdir -p $HBASE_HOME/build
cp -r build/native $HBASE_HOME/build/native
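Since most of the faults I ran into came down to files ending up in the wrong place, a quick check that the copies above landed can save some head-scratching. This is only a sketch: check_lzo_install is a made-up helper, and the libgplcompression* file name pattern is my assumption about what the native build produces, so adjust it if your build names the libraries differently.

```shell
#!/bin/sh
# Sketch: confirm the files copied by the steps above are where HBase will look.
# check_lzo_install is a hypothetical helper, not part of HBase or
# hadoop-gpl-compression. The libgplcompression* pattern is an assumption.
check_lzo_install() {
    hbase_home=$1
    native_dir=${2:-/usr/local/lib}
    status=0

    # The jar that was copied into $HBASE_HOME/lib
    if ! ls "$hbase_home"/lib/hadoop-gpl-compression-*.jar >/dev/null 2>&1; then
        echo "missing: hadoop-gpl-compression jar in $hbase_home/lib"
        status=1
    fi

    # The native libraries that were copied into the library path
    if ! ls "$native_dir"/libgplcompression* >/dev/null 2>&1; then
        echo "missing: libgplcompression in $native_dir"
        status=1
    fi

    [ "$status" -eq 0 ] && echo "ok"
    return "$status"
}
```

Call it as check_lzo_install "$HBASE_HOME" /usr/local/lib. It prints ok when both the jar and the native libraries are found, and otherwise names what is missing.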

Then verify that it works with:

cd $HBASE_HOME
./bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/testfile lzo
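Once the test passes, compression still has to be switched on per column family. A minimal sketch of how that could look from the command line follows. The helper lzo_create_stmt, the table name lzo_test and the family name cf are all hypothetical; the statement it prints uses the HBase shell's COMPRESSION attribute.

```shell
#!/bin/sh
# Sketch: build the HBase shell statement that creates an LZO-compressed table.
# lzo_create_stmt, 'lzo_test' and 'cf' are hypothetical names for illustration.
lzo_create_stmt() {
    table=$1
    family=$2
    printf "create '%s', {NAME => '%s', COMPRESSION => 'LZO'}\n" "$table" "$family"
}

# Print the statement. To actually run it, pipe it into the HBase shell:
#   lzo_create_stmt lzo_test cf | "$HBASE_HOME"/bin/hbase shell
lzo_create_stmt lzo_test cf
```

Printing the statement first makes it easy to eyeball before anything touches the cluster.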

September 12, 2011 · HBase
