
To run this, copy a `mongod` executable into this directory (binaries are available from MongoDB's download page). Then run it with `./sbt run <args>`, where the arguments are:

  • databaseName - the name of the database to dump from
  • shardName - the name of the shard being dumped
  • inputDir - the mongod data directory to dump from
  • hdfsPath - the HDFS path to write the dump to
  • dbPort - any free port for mongod to listen on
  • localTmpDir - a local path for temporary data
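For reference, the six arguments can be modeled as a small typed record. The sketch below is hypothetical: `DumpArgs` and `parse` are not part of this project, and it assumes the arguments are positional in the order listed above. It only shows what each value is and that `dbPort` must be a valid port number.

```scala
import scala.util.Try

// Hypothetical record mirroring the README's argument list (not the tool's real parser).
final case class DumpArgs(
  databaseName: String, // database to dump from
  shardName: String,    // shard being dumped
  inputDir: String,     // mongod data directory to read
  hdfsPath: String,     // HDFS destination for the dump
  dbPort: Int,          // free port for mongod to listen on
  localTmpDir: String   // local scratch directory
)

object DumpArgs {
  // Assumes the six arguments are positional, in the order the README lists them.
  def parse(args: Array[String]): Either[String, DumpArgs] = args match {
    case Array(db, shard, in, hdfs, port, tmp) =>
      Try(port.toInt).toOption match {
        case Some(p) if p > 0 && p < 65536 => Right(DumpArgs(db, shard, in, hdfs, p, tmp))
        case _                              => Left(s"dbPort must be a port number, got '$port'")
      }
    case _ => Left(s"expected 6 arguments, got ${args.length}")
  }
}
```

Returning an `Either` keeps the error message available to the caller instead of throwing. It mirrors the fact that a bad port or a missing argument is a usage error rather than a crash.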

ThriftBsonInputFormat can be used to read the BSON files generated this way in MapReduce jobs. Configure it with:

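```scala
// conf is the job's org.apache.hadoop.mapred.JobConf; MyThriftClass is your Thrift-generated record type
conf.setInputFormat(classOf[ThriftBsonInputFormat])
conf.set(ThriftBsonInputFormat.thriftClass, classOf[MyThriftClass].getName)
```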
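The second line stores the Thrift class as a name (a `String`) rather than a `Class` object, because a Hadoop job configuration is a set of string key/value pairs that gets shipped to the tasks. The sketch below illustrates that general pattern under stated assumptions: a mutable `Map` stands in for the Hadoop configuration, and the key `example.thrift.class` is made up. It is not ThriftBsonInputFormat's actual code. On the task side, the name is resolved with `Class.forName` and an empty record is created through its no-argument constructor, which Thrift-generated Java classes provide.

```scala
import scala.collection.mutable

// Hypothetical illustration of the "class name in the job config" pattern.
object ThriftClassConfigSketch {
  val ThriftClassKey = "example.thrift.class" // made-up key, not the real constant

  // Job-setup side: record only the fully qualified class name.
  def configure(conf: mutable.Map[String, String], cls: Class[_]): Unit =
    conf(ThriftClassKey) = cls.getName

  // Task side: turn the stored name back into a class and build an empty instance to fill.
  def newRecord(conf: mutable.Map[String, String]): AnyRef = {
    val name = conf.getOrElse(ThriftClassKey, sys.error(s"$ThriftClassKey is not set"))
    Class.forName(name).getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
  }
}
```

Any class with a public no-argument constructor works here, for example `classOf[java.lang.StringBuilder]`. That makes the round trip easy to try without Thrift on the classpath.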