Commits on Jan 11, 2010
  1. Borrowing from Todd Lipcon's great work, added a distributed lzo indexer. It indexes lzo files in a map reduce job, one mapper per file, using custom input and output formats. Moving the indexing computation away from a local process is a big step forward. Thanks Todd.

    kevinweil committed Jan 11, 2010
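The work each mapper does for its file can be pictured as a scan over the lzop block headers that records where every compressed block begins. This is a minimal, self-contained Java sketch, not the hadoop-lzo implementation. It works on an in-memory byte array instead of a Hadoop stream, and it assumes the caller already knows where the lzop file header ends and how many 4-byte checksums the header enabled for uncompressed (`dChecksums`) and compressed (`cChecksums`) data. The class and parameter names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: walk lzop block headers and collect the byte offset
 * at which each block starts. Assumes headerEnd points just past the lzop
 * file header and that the checksum counts match the header flags.
 */
public final class LzoBlockScanner {

    public static List<Long> blockOffsets(byte[] file, int headerEnd,
                                          int dChecksums, int cChecksums) {
        ByteBuffer buf = ByteBuffer.wrap(file); // big-endian, as in lzop
        buf.position(headerEnd);
        List<Long> offsets = new ArrayList<>();
        while (true) {
            int blockStart = buf.position();
            int uncompressedLen = buf.getInt();
            if (uncompressedLen == 0) {
                break; // a zero uncompressed length terminates the stream
            }
            int compressedLen = buf.getInt();
            // A block that did not shrink is stored raw (both lengths equal)
            // and carries no checksum of compressed data.
            boolean storedRaw = compressedLen == uncompressedLen;
            int checksums = storedRaw ? dChecksums : dChecksums + cChecksums;
            offsets.add((long) blockStart);
            // Skip the checksum words and the block payload.
            buf.position(buf.position() + 4 * checksums + compressedLen);
        }
        return offsets;
    }
}
```

Each returned offset could then be written to the index file as a long. A truncated input makes this sketch throw rather than return a partial list.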
Commits on Nov 16, 2009
  1. Expose the tmp filename the index writes to so that it can be cleaned up upon failure, and use the proper Path API for appending to filenames.

    kevinweil committed Nov 16, 2009
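Exposing the temporary filename supports a write-to-temp, rename-on-success, delete-on-failure pattern. The sketch below shows that pattern with `java.nio.file` rather than Hadoop's `FileSystem`/`Path` API. The `.tmp` suffix, the `TmpIndexWriter` class, and the `IndexBody` callback are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch: write an index under a temporary name, rename it into
 * place on success, and delete the temporary file on failure.
 */
public final class TmpIndexWriter {

    /** Hypothetical callback that produces the index contents. */
    public interface IndexBody {
        void writeTo(OutputStream out) throws IOException;
    }

    /** Derive the temporary name by appending a (hypothetical) suffix. */
    public static Path tmpPathFor(Path finalPath) {
        return finalPath.resolveSibling(finalPath.getFileName() + ".tmp");
    }

    public static void writeIndex(Path finalPath, IndexBody body) throws IOException {
        Path tmp = tmpPathFor(finalPath);
        boolean ok = false;
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                body.writeTo(out);
            }
            Files.move(tmp, finalPath, StandardCopyOption.REPLACE_EXISTING);
            ok = true;
        } finally {
            if (!ok) {
                // Don't leave a half-written index behind, e.g. when a corrupt
                // input makes body.writeTo throw.
                Files.deleteIfExists(tmp);
            }
        }
    }
}
```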
Commits on Nov 6, 2009
Commits on Nov 4, 2009
  1. Make LzoIndex.createIndex properly clean up after itself on failure, and fix some formatting in the javadoc.

    kevinweil committed Nov 4, 2009
Commits on Nov 1, 2009
  1. Add IDEA project files to .gitignore, add build properties to the jar, and log the build's git revision once upon static load.

    kevinweil committed Nov 1, 2009
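Logging the git revision once upon static load suggests a static initializer that reads a properties file bundled in the jar. This is a hedged sketch: the resource name `/build.properties`, the key `build.git.revision`, the `BuildInfo` class, and the use of `java.util.logging` are all assumptions, not details from the commit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: read build metadata from a properties resource in the
 * jar and log it exactly once, when the class is first initialized.
 */
public final class BuildInfo {
    private static final Logger LOG = Logger.getLogger(BuildInfo.class.getName());

    /** Git revision from the (hypothetical) resource, or "unknown". */
    public static final String GIT_REVISION = loadRevision();

    static {
        // Static initializers run once per class loader, so this logs once.
        LOG.info("Build git revision: " + GIT_REVISION);
    }

    private static String loadRevision() {
        Properties props = new Properties();
        // try-with-resources skips close() when the resource is null.
        try (InputStream in = BuildInfo.class.getResourceAsStream("/build.properties")) {
            if (in == null) {
                return "unknown";
            }
            props.load(in);
        } catch (IOException e) {
            return "unknown";
        }
        return props.getProperty("build.git.revision", "unknown");
    }
}
```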
Commits on Oct 30, 2009
Commits on Oct 25, 2009
  1. Don't leave behind old attempts at index files when exceptions are thrown during LzoIndex.createIndex, e.g. when the lzo file is corrupt.

    kevinweil committed Oct 25, 2009
Commits on Oct 21, 2009
Commits on Oct 19, 2009
  1. Forgot a return statement.

    kevinweil committed Oct 19, 2009
  2. Catch an EOF exception that wasn't being caught before (happened when file writers were killed unexpectedly).

    kevinweil committed Oct 19, 2009
  3. Allow the LzopInputStream to gracefully handle a file that does not finish with four trailing zeroes, as can happen when a writer gets killed.

    kevinweil committed Oct 19, 2009
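The two Oct 19 fixes both concern streams cut short by a killed writer. In the lzop format each block begins with its uncompressed length, and the stream ends with a zero length (the four trailing zero bytes). The sketch below, with hypothetical class and method names, treats end-of-file where that length should be as the end of the stream instead of letting an `EOFException` escape.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical sketch: read lzop block-length fields, tolerating a stream
 * that stops without the terminating zero word.
 */
public final class TolerantBlockReader {
    private final DataInputStream in;

    public TolerantBlockReader(InputStream in) {
        this.in = new DataInputStream(in);
    }

    /** Returns the next uncompressed block length, or 0 at end of stream. */
    public int nextUncompressedLength() throws IOException {
        try {
            return in.readInt();
        } catch (EOFException e) {
            // readInt throws EOFException whether 0 or 1-3 bytes remain; this
            // sketch treats both as a missing terminator, i.e. end of stream.
            return 0;
        }
    }
}
```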
Commits on Oct 16, 2009
  1. LZOP mandates that blocks which compress to a larger size than their uncompressed size should be stored uncompressed with a slightly modified header. hadoop-gpl-compression does not honor this in reading or writing, which left it unable to read files with such blocks. I fixed the reading side a few commits ago; this should be the corresponding fix to be able to properly write such blocks. Added a test for the LzopOutputStream too. The smallest test file triggered this condition, so one of the three tests initially failed; now all pass.

    kevinweil committed Oct 16, 2009
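On the write side, the rule can be sketched as: when compression does not shrink a block, write it raw, repeat the uncompressed length in the compressed-length field, and omit the compressed-data checksum. Assumptions: the file header enabled Adler-32 checksums for both uncompressed and compressed data, and the caller has already run the LZO compressor (none is included here). `LzopBlockWriter` is a hypothetical name.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.zip.Adler32;

/**
 * Hypothetical sketch: emit one lzop block, storing it uncompressed when the
 * compressed form is not smaller. Assumes Adler-32 for both checksum kinds.
 */
public final class LzopBlockWriter {

    private static int adler32(byte[] data) {
        Adler32 a = new Adler32();
        a.update(data);
        return (int) a.getValue();
    }

    /** raw must be non-empty: a zero length would mark end of stream. */
    public static void writeBlock(DataOutputStream out, byte[] raw,
                                  byte[] compressed) throws IOException {
        boolean storeRaw = compressed.length >= raw.length;
        out.writeInt(raw.length); // uncompressed length (big-endian)
        // For a raw block this field repeats the uncompressed length, which
        // is how a reader recognizes it.
        out.writeInt(storeRaw ? raw.length : compressed.length);
        out.writeInt(adler32(raw)); // checksum of uncompressed data
        if (storeRaw) {
            out.write(raw); // no compressed-data checksum for a raw block
        } else {
            out.writeInt(adler32(compressed));
            out.write(compressed);
        }
    }
}
```

A reader can then distinguish the two cases by comparing the two length fields.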
Commits on Sep 23, 2009
  1. Hammerbacher inadvertently reminded me that there were no GPL headers at the top of the new files. Add them for consistency with the rest.

    kevinweil committed Sep 23, 2009
Commits on Sep 21, 2009
  1. The main changes. Many are (unfortunately) whitespace changes caused by Eclipse's code formatter; diff with whitespace ignored to remove that part. Much of the rest is a reorganization of the files, because I got tired of searching through long files for nested inner classes, so I broke them out into their own classes and files. I also added a com.hadoop.mapred.DeprecatedLzoTextInputFormat to go along with the com.hadoop.mapreduce.LzoTextInputFormat -- applications like streaming still need a class derived from org.apache.hadoop.mapred.InputFormat, so this is one way to make streaming work until it's fixed. In the process, I ran into some code, like createIndex and readIndex, that was (a) repeated and (b) looked like it belonged in the LzoIndex class anyway. I moved it there, along with a couple of other simple functions.

    kevinweil committed Sep 21, 2009
  2. First commit.

    kevinweil committed Sep 21, 2009
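Centralizing `createIndex` and `readIndex` in `LzoIndex` keeps the index format in one place. As an illustration only, here is a reader that assumes the index file is a flat sequence of 8-byte big-endian block offsets in ascending order, plus a lookup that finds the first block at or after a position, which is one way an input format could move a split's start to a block boundary. `SimpleLzoIndex`, `read`, and `alignToBlock` are hypothetical and are not the real `LzoIndex` API.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * Hypothetical sketch of an lzo index: ascending block start offsets read
 * from a stream of big-endian longs.
 */
public final class SimpleLzoIndex {
    private final long[] blockOffsets;

    private SimpleLzoIndex(long[] blockOffsets) {
        this.blockOffsets = blockOffsets;
    }

    /** Reads longs until EOF; a trailing partial long is ignored. */
    public static SimpleLzoIndex read(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        long[] offsets = new long[16];
        int n = 0;
        while (true) {
            long off;
            try {
                off = in.readLong();
            } catch (EOFException e) {
                break;
            }
            if (n == offsets.length) {
                offsets = Arrays.copyOf(offsets, n * 2);
            }
            offsets[n++] = off;
        }
        return new SimpleLzoIndex(Arrays.copyOf(offsets, n));
    }

    /** First block offset >= pos, or -1 if no block starts at or after pos. */
    public long alignToBlock(long pos) {
        int i = Arrays.binarySearch(blockOffsets, pos);
        if (i < 0) {
            i = -i - 1; // insertion point: first offset greater than pos
        }
        return i < blockOffsets.length ? blockOffsets[i] : -1;
    }
}
```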