Support for both Hadoop versions #308

Merged
merged 33 commits into twitter:master from rangadi:hadoop_multi_version_support on May 13, 2013

@rangadi
Contributor
rangadi commented May 5, 2013

Supports both Hadoop 1 and Hadoop 2. The goal is to make the compiled jars work with either version of Hadoop.

Need to make travis-ci run tests in one profile with classes compiled with another profile. Not sure if mvn lets us run tests without building.

TODO:

  • should we update the version to 4.0?
  • update documentation
  • more mixed testing
Raghu Angadi added some commits Apr 30, 2013
Raghu Angadi remove explicit antlr dependency. property for protobuf.version. d54e3de
Raghu Angadi move a bunch of hadoop, pig, hive related dependencies to 'provided' scope b9ca626
Raghu Angadi put antlr back.
we only need hadoop-core. remove other hadoop dependencies
5b465c6
Raghu Angadi switch to hadoop-client-1.0.1, in place of hadoop-core. 330159c
Raghu Angadi hadoop-client in 'provided' scope.
need to check why same change didn't seem to work for pig, cascading, etc.
ddad252
Raghu Angadi 'provided' scope for pig, hive, and cascading. b446a91
Raghu Angadi first draft of multi hadoop version support.
hadoop1 tests pass. for hadoop2 tests to pass, need to update
pig version etc.
deprecated format wrappers need a bit more work.
460cf9d
Raghu Angadi Merge branch 'master' into hadoop_multi_version_support
Conflicts:
	core/src/test/java/com/twitter/elephantbird/mapreduce/input/TestLzoTextInputFormat.java
	pig/src/test/java/com/twitter/elephantbird/pig/util/AbstractTestWritableConverter.java
	rcfile/src/test/java/com/twitter/elephantbird/pig/load/TestRCFileProtobufStorage.java
	rcfile/src/test/java/com/twitter/elephantbird/pig/load/TestRCFileThriftStorage.java
5942b98
Raghu Angadi finally TestLzoTextInputFormat works. 30dbc53
Raghu Angadi update Pig dependency to 0.11.1 (for hadoop 2). b30462a
Raghu Angadi add test-log4j.properties (from pig repo) 0c72871
Raghu Angadi set log4j config for pig test. also needed increase in memory. 0df1b4e
Raghu Angadi remove 'surefire' from test data directory (looks like the plugin removes it after the tests).
Some more dependency fixes.
3c86df6
Raghu Angadi TestLzoTextInputFormat : don't extend TestCase so that assume() works. a1a551c
Raghu Angadi disable compression in RCFile tests. need to check why it does not work on OSX 6445654
Raghu Angadi add getCounter() method to ContextUtil af921a8
Raghu Angadi getCounter is declared in different classes in hadoop1 & 2. b94c126
Raghu Angadi add jackson-mapper-asl dependency for pig. c4bc184
Raghu Angadi does not look like argLine in pig/pom.xml is appended. set java.library.path again. 17ddebe
Raghu Angadi commons-cli for lucene (may be required for others) dbcbd3c
Raghu Angadi Merge branch 'hadoop_multi_version_support' of github.com:rangadi/elephant-bird into hadoop_multi_version_support db4959a
Raghu Angadi travis-ci run tests with both profiles. 94ab018
@rangadi
Contributor
rangadi commented May 5, 2013
  • all the tests pass (both profiles)
  • most of the dependencies are changed to "provided" scope (hadoop, pig, hive etc).
  • Pig dependency is updated to 0.11.1

ContextUtil.java provides all the utility methods for handling incompatible changes between hadoop versions.
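
As a rough illustration of the kind of shim ContextUtil provides (a hedged sketch, not the actual elephant-bird code; the class name HadoopCompatSketch and the method bodies are assumptions), the incompatibilities can be bridged with reflection: Hadoop 1's TaskAttemptContext is a concrete class while Hadoop 2 moved the implementation to TaskAttemptContextImpl, and getCounter() is declared on different classes in the two versions.

```java
// Hedged sketch only: illustrates the reflection approach for keeping one
// compiled jar binary-compatible with both Hadoop 1 and Hadoop 2.
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.TaskAttemptID;
import org.apache.hadoop.mapreduce.TaskInputOutputContext;

public final class HadoopCompatSketch {

  private static final Constructor<?> TASK_CONTEXT_CONSTRUCTOR;

  static {
    Class<?> taskContextClass;
    try {
      // Hadoop 2: TaskAttemptContext is an interface; instantiate the Impl class.
      taskContextClass =
          Class.forName("org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl");
    } catch (ClassNotFoundException e) {
      try {
        // Hadoop 1: TaskAttemptContext itself is a concrete class.
        taskContextClass =
            Class.forName("org.apache.hadoop.mapreduce.TaskAttemptContext");
      } catch (ClassNotFoundException e2) {
        throw new IllegalStateException("No usable TaskAttemptContext class found", e2);
      }
    }
    try {
      TASK_CONTEXT_CONSTRUCTOR =
          taskContextClass.getConstructor(Configuration.class, TaskAttemptID.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalStateException("Unexpected TaskAttemptContext constructor", e);
    }
  }

  /** Creates a TaskAttemptContext against whichever Hadoop version is on the classpath. */
  public static TaskAttemptContext newTaskAttemptContext(Configuration conf,
                                                         TaskAttemptID id) {
    try {
      return (TaskAttemptContext) TASK_CONTEXT_CONSTRUCTOR.newInstance(conf, id);
    } catch (Exception e) {
      throw new RuntimeException("Failed to instantiate TaskAttemptContext", e);
    }
  }

  /**
   * getCounter(group, name) is declared on different classes in Hadoop 1 and 2,
   * so a direct call compiled against one version can fail against the other.
   * Looking the method up on the runtime class sidesteps the incompatibility.
   */
  public static Counter getCounter(TaskInputOutputContext<?, ?, ?, ?> context,
                                   String group, String name) {
    try {
      Method m = context.getClass().getMethod("getCounter", String.class, String.class);
      m.setAccessible(true);
      return (Counter) m.invoke(context, group, name);
    } catch (Exception e) {
      throw new RuntimeException("getCounter failed", e);
    }
  }

  private HadoopCompatSketch() {}
}
```

Resolving the constructor and method reflectively keeps the compiled bytecode free of direct references that would otherwise fail with NoSuchMethodError or IncompatibleClassChangeError when run against the other Hadoop version.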

@rangadi
Contributor
rangadi commented May 5, 2013

btw, deprecated Input/Output format wrappers mostly don't work. Need to fix them.

@rangadi rangadi referenced this pull request in Parquet/parquet-mr May 5, 2013
Merged

Add a build profile for Hadoop 2. #32

@traviscrawford traviscrawford and 1 other commented on an outdated diff May 6, 2013
...n/java/com/twitter/elephantbird/util/ContextUtil.java
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.JobID;
+import org.apache.hadoop.mapreduce.MapContext;
+import org.apache.hadoop.mapreduce.Mapper;
+import org.apache.hadoop.mapreduce.OutputCommitter;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.RecordWriter;
+import org.apache.hadoop.mapreduce.ReduceContext;
+import org.apache.hadoop.mapreduce.StatusReporter;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.TaskAttemptID;
+import org.apache.hadoop.mapreduce.TaskInputOutputContext;
+
+/**
+ * This is based on ContextFactory.java from hadoop-2.0.x sources.
@traviscrawford
traviscrawford May 6, 2013 Contributor

How does javadoc handle this second block comment? Should it be merged with the class javadoc?

@rangadi
rangadi May 7, 2013 Contributor

meant to make it a simple comment, will fix it.
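
Presumably the fix is just to demote the attribution to a plain (non-javadoc) block comment so the javadoc tool ignores it; a minimal illustration (the javadoc text shown is a placeholder, not the actual ContextUtil javadoc):

```java
/*
 * Based on ContextFactory.java from hadoop-2.0.x sources.
 * A plain block comment like this is ignored by the javadoc tool.
 */
/** Class-level javadoc (the only comment that javadoc publishes). */
public class ContextUtil {
  // ...
}
```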

@gmalouf
gmalouf commented May 8, 2013

Hi Raghu, I am one of the possibly many users of the library bitten by this issue. Are there any workarounds while you continue to work on this in your branch?

Raghu Angadi added some commits May 8, 2013
@rangadi
Contributor
rangadi commented May 8, 2013

@gmalouf you could use this branch if you like. We plan to cut a new EB release as soon as early next week.

@rangadi
Contributor
rangadi commented May 12, 2013

I think this is ready for merge. Added two wiki pages: 'Build and Runtime Dependencies' and 'Hadoop 2.x Support'. Updated Readme. All the tests pass (in all 4 combinations of 'build Hadoop version' and 'runtime Hadoop version').

@rangadi rangadi was assigned May 13, 2013
@traviscrawford traviscrawford merged commit e056d00 into twitter:master May 13, 2013

1 check failed: the Travis CI build could not complete due to an error.
@rangadi rangadi removed their assignment Oct 19, 2014