In the world of NoSQL

I previously found a great addon to Hadoop Streaming called "hadoop-hbase-streaming", which lets you use an HBase table as the input or output format for your Hadoop Streaming MapReduce jobs, but it stopped working after a recent API change.
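For orientation, a job using it is just an ordinary Hadoop Streaming invocation where the HBase table name is the -input and the input format class comes from the hadoop-hbase-streaming jar. A minimal sketch; the jar path, the input format class name and the column-list key are assumptions from the original project's era, so check the fork's README for the exact ones:

# table name goes in as -input, rows are delivered to the mapper on stdin
hadoop jar /usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar \
  -libjars hadoop-hbase-streaming.jar \
  -D hbase.mapred.tablecolumns="data:" \
  -input my_table \
  -inputformat org.childtv.hadoop.hbase.mapred.JSONTableInputFormat \
  -output /tmp/streaming-out \
  -mapper /bin/cat \
  -reducer /usr/bin/wc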

The error was:

Error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.io.RowResult

I just found a fork of it on GitHub by David Maust that has been updated for newer versions of HBase.

You can find the fork here:
https://github.com/dmaust/hadoop-hbase-streaming
And the original repository here:
https://github.com/wanpark/hadoop-hbase-streaming


May 2, 2012 · Hadoop, HBase · Comments closed on Fork of hadoop-hbase-streaming with support for CDH3u3


I increased the number of map tasks in Hadoop to 64 per TaskTracker, and the TaskTracker started to crash every time I launched a MapReduce job.
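For reference, the per-TaskTracker slot count is controlled by mapred.tasktracker.map.tasks.maximum in mapred-site.xml, so the change would look like:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>64</value>
</property>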

Errors were:

java.lang.OutOfMemoryError: unable to create new native thread

And:

org.apache.hadoop.mapred.DefaultTaskController: Unexpected error launching task JVM java.io.IOException: Cannot run program "bash" (in directory "/data/1/mapred/local/taskTracker/hdfs/jobcache/job_201110201642_0001/attempt_201110201642_0001_m_000031_0/work"): error=11, Resource temporarily unavailable

Googling for this problem turned up the following suggestions (all three changes are spelled out in config form after the list):

  1. Increase the heap size for the TaskTracker. I did this by changing HADOOP_HEAPSIZE to 4096 in /etc/hadoop/conf/hadoop-env.sh to test. This did not solve it.
  2. Increase the heap size for the spawned child JVMs by adding -Xmx1024m to mapred.map.child.java.opts in mapred-site.xml. This did not solve it.
  3. Make sure the limit on open files is not being reached. I had already done this by adding "mapred - nofile 65536" to /etc/security/limits.conf. This did not solve it.
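Spelled out, the three attempted fixes were (values as above):

In /etc/hadoop/conf/hadoop-env.sh (TaskTracker daemon heap, in MB):

export HADOOP_HEAPSIZE=4096

In mapred-site.xml (heap for each spawned task JVM):

<property>
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

In /etc/security/limits.conf (open-file limit for the mapred user):

mapred - nofile 65536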

I decided to sudo to the mapred user and check the ulimits again. The value that stood out as off was:

max user processes              (-u) 1024
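For the record, checking this looks roughly like the following; the mapred account normally has no login shell, so one is forced with -s:

sudo su -s /bin/bash -c 'ulimit -a' - mapred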

Adding the following to /etc/security/limits.conf and restarting the TaskTracker solved it:

mapred - nproc 8192
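Applying and verifying it would look something like this; the service name here is the CDH3-style one and is an assumption, so adjust it for your installation:

sudo service hadoop-0.20-tasktracker restart
sudo su -s /bin/bash -c 'ulimit -u' - mapred   # should now print 8192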

Apparently CentOS limits the number of processes for regular users to 1024 by default. On Linux every thread counts against this per-user limit, so 64 concurrent task JVMs blow straight through it. That is why the problem surfaces both as "unable to create new native thread" and as fork failures (error=11 is EAGAIN, the kernel refusing to create another process) rather than as heap exhaustion, and why the heap tweaks never helped.


October 20, 2011 · Hadoop · Comments closed on Hadoop TaskTracker java.lang.OutOfMemoryError