tdunning edited this page Nov 11, 2010 · 7 revisions

Plume is a (so far) serial, eager approximate clone of FlumeJava. The Hadoop-based version is coming along very nicely. The intent is to experiment with the design of the API, both to understand the design decisions the Google team made and to see whether there are good alternatives.

The ultimate goal is to provide something comparable to FlumeJava on top of Hadoop, but with a much more flexible execution model, so that it is easy and efficient to code small problems using Plume as well as large ones. My theory is that small problems often grow into large ones, and it is really nice not to have to re-implement everything as they scale.

So far, we have a little acorn that is showing some sprouts. Hopefully, we will see an oak tree shortly.

- The local lazy reference implementation is pretty much there. It lets you write Plume programs and find where the API is rough.

- Pere is trucking along on the optimizer for the local version. He has flatten map working, more or less. This is looking like it won’t take nearly as long as one would expect.

- We have a local emulation of map-reduce to help the optimizer work along.

- Doug Cutting wrote some sample Avro file reading code. There are still some questions about strategy, but this is looking very good so far. I have been working just a bit lately on how to integrate normal Hadoop Writables into our framework.

Take a look at the page on basic structure for hints about how Plume works.

Check out the Word Count program for the map-reduce equivalent of hello world.
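
For readers who have not seen it before, here is a minimal sketch of word count run through a single-threaded, in-memory emulation of map-reduce. It is written in plain Java and does not use Plume or Hadoop. Every name in it (`LocalWordCount`, `mapReduce`, `wordCount`) is hypothetical and chosen for illustration only; it is not how Plume's own local emulation or its Word Count example is written.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;
import java.util.function.Function;

// Illustration only: these names are assumptions, not part of Plume or Hadoop.
class LocalWordCount {

    // Runs the map, shuffle (group by key), and reduce phases in memory on one thread.
    static <I, K extends Comparable<K>, V, R> Map<K, R> mapReduce(
            List<I> inputs,
            Function<I, List<Map.Entry<K, V>>> mapper,
            BiFunction<K, List<V>, R> reducer) {
        // Map + shuffle: collect every emitted value under its key.
        Map<K, List<V>> groups = new TreeMap<>();
        for (I input : inputs) {
            for (Map.Entry<K, V> pair : mapper.apply(input)) {
                groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                      .add(pair.getValue());
            }
        }
        // Reduce: collapse each group of values to a single result per key.
        Map<K, R> results = new TreeMap<>();
        for (Map.Entry<K, List<V>> group : groups.entrySet()) {
            results.put(group.getKey(), reducer.apply(group.getKey(), group.getValue()));
        }
        return results;
    }

    // Word count: the mapper emits (word, 1), the reducer sums the ones.
    static Map<String, Integer> wordCount(List<String> lines) {
        return LocalWordCount.<String, String, Integer, Integer>mapReduce(
            lines,
            line -> {
                List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
                for (String word : line.toLowerCase().split("\\W+")) {
                    if (!word.isEmpty()) {
                        emitted.add(Map.entry(word, 1));
                    }
                }
                return emitted;
            },
            (word, ones) -> ones.stream().mapToInt(Integer::intValue).sum());
    }

    public static void main(String[] args) {
        // Prints the per-word counts, sorted by word.
        System.out.println(wordCount(List.of("to be or not to be", "that is the question")));
    }
}
```

The sketch keeps the three phases explicit so the shape of the computation is visible. A real framework would hide the grouping step and distribute the work across machines.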