Source/Sink configuration for using MongoDB with Cascading


NOTE: This project hasn't seen much activity lately; I've been swamped with other things.
Nonetheless, there are some major updates to push, and given a little free time, I intend to
complete this work.  Message me if you have any questions.

This is the Cascading.MongoDB module.

 It provides support for writing data to MongoDB 
 when bound to a Cascading data processing flow.

 Cascading is a feature-rich API for defining and executing complex,
 scale-free, and fault-tolerant data processing workflows on a Hadoop
 cluster. It can be found at the following location:

   http://www.cascading.org/

 This release requires at least Cascading 1.1.1, Hadoop 0.19.x,
 and the related mongo-java-driver release.

 To build a jar,

 > ant -Dcascading.home=... -Dhadoop.home=... -Dmongo.driver.home=... jar

 To test,

 > ant -Dcascading.home=... -Dhadoop.home=... -Dmongo.driver.home=... test

where "..." is the install path of each of the dependencies.


  The cascading-mongodb.jar file should be added to the "lib"
  directory of your Hadoop application jar file along with all
  Cascading dependencies.

  You must also include the mongo-java-driver library compatible with your database.
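
  As a rough illustration of the packaging step described above, the
  following shell sketch copies dependency jars into a "lib" directory
  under a staging directory. The helper name stage_job_libs, the staging
  directory, and the jar paths passed to it are hypothetical and are not
  part of this project; adjust them to match your own build.

```shell
#!/bin/sh
# Sketch only: stage dependency jars under <stage_dir>/lib so they can be
# bundled into a Hadoop application jar. stage_job_libs is a hypothetical
# helper name, not something this project provides.
stage_job_libs() {
    stage_dir="$1"
    shift
    mkdir -p "$stage_dir/lib" || return 1
    for jar in "$@"; do
        # Fail early if a listed dependency is missing.
        if [ ! -f "$jar" ]; then
            echo "missing jar: $jar" >&2
            return 1
        fi
        cp "$jar" "$stage_dir/lib/" || return 1
    done
}
```

  For example, after copying your compiled classes into a staging
  directory such as build/job, you might call stage_job_libs with
  build/job followed by the paths to cascading-mongodb.jar, the
  mongo-java-driver jar, and the Cascading dependency jars, and then
  create the application jar with the JDK tool, e.g.
  "jar cf myjob.jar -C build/job ." (myjob.jar is a placeholder name).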

  The current master branch can only be used for sinking (writing) data to MongoDB.  The API for that is still a little rough and is subject to change once I can simplify the parameters.