In the world of NoSQL


This is an example of how to import data into HBase with importtsv and completebulkload:

Step 1, run the TSV file through importtsv to create the HFiles:

[root@hbase1 bulktest]# HADOOP_CLASSPATH=$(hbase classpath) sudo -u hdfs -E hadoop jar \
/usr/lib/hbase/hbase-0.90.4-cdh3u3.jar importtsv \
-Dimporttsv.bulk.output=/bulktest-hfiles \
-Dimporttsv.columns=HBASE_ROW_KEY,a:b,a:c bulktest /bulktest-tsv

This will generate HFiles from the data in /bulktest-tsv and store them in /bulktest-hfiles.

My TSV files have three columns: the first is the row key, the second is what I want stored in column family a with qualifier b, and the third is stored in column family a with qualifier c. This mapping is controlled by importtsv.columns.
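To make the column mapping concrete, here is a minimal sketch that writes a small sample TSV file and checks that every line has exactly three tab-separated fields. The file name bulktest-sample.tsv and the row contents are made up for illustration; they are not the data used in this post. The upload commands are left as comments because they need a running cluster, and they assume the table bulktest with column family a already exists (if it does not, it can be created in the HBase shell with create 'bulktest', 'a').

```shell
#!/bin/sh
# Hypothetical sample input: three tab-separated columns per line,
# matching importtsv.columns=HBASE_ROW_KEY,a:b,a:c.
tsv_file="bulktest-sample.tsv"   # assumed local file name

# Column 1 -> row key, column 2 -> a:b, column 3 -> a:c
printf 'row1\tvalue-b1\tvalue-c1\n'  > "$tsv_file"
printf 'row2\tvalue-b2\tvalue-c2\n' >> "$tsv_file"
printf 'row3\tvalue-b3\tvalue-c3\n' >> "$tsv_file"

# Count lines that do not have exactly three tab-separated fields.
bad_lines=$(awk -F '\t' 'NF != 3 { n++ } END { print n + 0 }' "$tsv_file")
echo "lines with wrong column count: $bad_lines"

# To place the file where the importtsv job reads it (requires a cluster):
#   hadoop fs -mkdir /bulktest-tsv
#   hadoop fs -put bulktest-sample.tsv /bulktest-tsv/
```

Checking the column count up front is cheap and helps spot malformed lines before the MapReduce job runs.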

Step 2, after that job is done, change the ownership of /bulktest-hfiles so that the HBase user owns the HFiles, and then run completebulkload so that HBase picks up the HFiles:

[root@hbase1 bulktest]# sudo -u hdfs hadoop dfs -chown -R hbase /bulktest-hfiles
[root@hbase1 bulktest]# HADOOP_CLASSPATH=$(hbase classpath) sudo -u hdfs -E hadoop jar \
/usr/lib/hbase/hbase-0.90.4-cdh3u3.jar completebulkload /bulktest-hfiles bulktest

HBase should now see the new data. For usage help, run importtsv or completebulkload without any parameters.
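One way to confirm that the data arrived is to spot-check the table from the HBase shell. The sketch below is an assumption about how you might do that, not part of the original procedure: it pipes a scan (limited to five rows) and a count into hbase shell, and only prints the commands if hbase is not on the PATH.

```shell
#!/bin/sh
# Sketch: spot-check the bulk-loaded table from the command line.
table="bulktest"

# HBase shell commands to run: show a few rows, then count all rows.
check_cmds="scan '$table', {LIMIT => 5}
count '$table'"

if command -v hbase >/dev/null 2>&1; then
    # The HBase shell reads commands from standard input.
    printf '%s\n' "$check_cmds" | hbase shell
else
    echo "hbase not found on PATH; would have run:"
    printf '%s\n' "$check_cmds"
fi
```

Note that count scans the whole table, so on a large table the limited scan alone may be enough.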

References:
http://hbase.apache.org/bulk-loads.html

