Native Ruby bindings to Hadoop's libhdfs, for interacting with Hadoop DFS.


Hadoop DFS (HDFS) bindings for Ruby

This library provides native C bindings to Hadoop's libhdfs, for interacting with Hadoop DFS.


You will need:

  • Java JDK and JRE (yes, both). The build file will attempt to find them for you.

  • Hadoop's libhdfs. On Ubuntu/Debian you will need libhdfs0 and libhdfs0-dev.

  • Hadoop Core and DFS libraries.


Install from gems. Note that you will need to provide JAVA_HOME so the compiler can find the required libraries.

The installation will attempt to discover the location of the libraries. If that fails, try setting the environment variable JAVA_LIB to the library path of the JDK/JRE.

Installing with a specific Java JDK:

sudo env JAVA_HOME=/usr/lib/jvm/java-6-openjdk gem install ruby-hdfs
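Before installing, it can help to confirm that JAVA_HOME points at a full JDK rather than a bare JRE. The following is a minimal sketch, assuming a standard JDK layout in which the JNI headers live under include/. check_java_home is a hypothetical helper and not part of the gem, and whether the gem's build looks for exactly this file is an assumption.

```shell
# Hypothetical helper: succeeds if the given directory looks like a JDK,
# judged only by the presence of the JNI header include/jni.h.
check_java_home() {
  [ -n "$1" ] && [ -f "$1/include/jni.h" ]
}

if check_java_home "${JAVA_HOME:-}"; then
  echo "JAVA_HOME looks like a JDK: $JAVA_HOME"
else
  echo "JAVA_HOME is unset or has no include/jni.h; point it at a JDK" >&2
fi
```

If the check fails, pass a different JDK path via JAVA_HOME on the gem install command line, as in the example above.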


The library also depends on an installation of Hadoop DFS. The Cloudera distribution of Hadoop is pretty good:

Sample classpath setup (yes, welcome to JAR hell):

export CLASSPATH=$CLASSPATH:/usr/lib/hadoop/hadoop-0.18.3-6cloudera0.3.0-core.jar
for jarfile in /usr/lib/hadoop/lib/*.jar; do
  export CLASSPATH=$CLASSPATH:$jarfile
done
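A typo in a JAR path is easy to miss, so it may be worth checking that every CLASSPATH entry actually exists. This is a rough sketch only: missing_classpath_entries is a hypothetical helper, not something the gem or Hadoop provides, and it treats each entry as a literal path (Java wildcard entries such as dir/* would be reported as missing).

```shell
# Hypothetical helper: print each colon-separated entry of the given
# classpath string that does not exist on disk, one per line.
missing_classpath_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    # Skip empty entries (for example from a leading or doubled colon).
    if [ -n "$entry" ] && [ ! -e "$entry" ]; then
      printf '%s\n' "$entry"
    fi
  done
}

missing=$(missing_classpath_entries "${CLASSPATH:-}")
if [ -n "$missing" ]; then
  echo "CLASSPATH entries not found on disk:" >&2
  echo "$missing" >&2
fi
```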

Wait, there's more. You will also need libjvm.so in your library path, which comes with the JRE. This might work:

export LD_LIBRARY_PATH=/usr/lib/jvm/java-6-openjdk/jre/lib/i386/server
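The directory above is specific to one JDK version and architecture. Assuming the library in question is libjvm.so (which is what jre/lib/i386/server contains), a sketch like the following searches for it under JAVA_HOME instead of hard-coding the path. find_libjvm_dir is a hypothetical helper, and when several copies exist (for example client and server VMs) it simply takes the first one found.

```shell
# Hypothetical helper: print the directory holding libjvm.so under the
# given JDK/JRE root, or return non-zero if none is found.
find_libjvm_dir() {
  # The trailing slash makes find descend even if the root is a symlink.
  libjvm=$(find "$1/" -name 'libjvm.so' 2>/dev/null | head -n 1)
  [ -n "$libjvm" ] || return 1
  dirname "$libjvm"
}

if jvm_dir=$(find_libjvm_dir "${JAVA_HOME:-/usr/lib/jvm/java-6-openjdk}"); then
  # Append to any existing LD_LIBRARY_PATH rather than overwriting it.
  export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$jvm_dir"
else
  echo "libjvm.so not found; set JAVA_HOME to your JDK/JRE" >&2
fi
```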

Known issues

Not all APIs have been implemented yet.

libhdfs will sometimes throw Java exceptions, which are printed to the output instead of being raised as Ruby exceptions. This is annoying but harmless.

Building from source

To build from source:

rake compile

On completion, the compiled extension will be available in ext/hdfs.


Copyright © 2010 Alexander Staubo. See LICENSE for details.