-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Split elephant-bird into modules #201
Conversation
…his is the one used in the wild
…. Move to Maven to make this easier.
Raghu, the maven-release-plugin can take care of version ids in the parent and all sub modules if you go that route: http://maven.apache.org/plugins/maven-release-plugin/prepare-mojo.html#autoVersionSubmodules |
huge thanks Johan. will go through more thoroughly. |
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" | ||
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> | ||
<modelVersion>4.0.0</modelVersion> | ||
<groupId>com.twitter</groupId> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add parent info? We can configure dependency management in parent pom.xml to simplify dependency version management and other build plugins across all sub modules that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very well versed in the Maven world. What exactly is the parent info you are referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I found it. Adding now.
Can you update the readme with a section about building, testing and deploying to the in-git-repo repository? This is a pretty major build change and we should point users in the right direction. |
<dependency> | ||
<groupId>com.twitter</groupId> | ||
<artifactId>elephant-bird-core</artifactId> | ||
<version>2.2.3-SNAPSHOT</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to set the project version in one place and have all sub projects use it? I could see these version strings getting out of date, not just for the EB release, but for stuff like protobuf. Can we reference properties or something instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can specify all dependency version info in parent pom.xml within dependencyManagement section. Also, if each sub module pom.xml contains parent section, we can leave off the sub module's groupId and version elements.
I've updated the readme and fixed the broken test. I also added the two plugins to our maven repo so they won't disappear. Using the hadoop-lzo from the local dir until we can get that in a repo. I believe this covers all of the issues pointed out above. |
<groupId>com.twitter</groupId> | ||
<artifactId>elephant-bird</artifactId> | ||
<version>2.2.3-SNAPSHOT</version> | ||
</parent> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May also want element in here and in other submodule pom.xmls:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, not following unfortunately. What element? What does it do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like github filtered out the html element from my earlier comment :(
I was referring to parent/relativePath element, which just points submodule to directory containing parent pom.xml if it's likely to be close by. Generally, this is just "..".
When I push the hcatalog dep config down into eb hive module, I get this error:
Not sure if we should worry about this one, but we could alleviate it by publishing this jar into some publicly accessibly maven repo and updating repository config for EB hive module. |
https://issues.apache.org/jira/browse/HCATALOG-132 -- looks like Travis got the pom generated, but auto-publishing isn't happening yet. |
Does the hcatalog jar contain modified code or can we simply change that to one of their released artifacts? |
If we just need one of their artifacts but it's not in a maven repo yet we can simply put it in maven.twittr.com till they do. |
We need the 0.5 snapshot (which is a moving target, naturally). But afaik there are no custom patches. Travis is the authority on this one. |
- Removes yamlbeans from dependency management as it's not used by any modules. - Moves relative system path of hcatalog dep from parent pom to hive pom. - Adds slf4j impl jar dep to submodules. This dep is optional by default so each submodule has to specify what slf4j impl jar they want to use. - Forces junit47 provider within default surefire configuration. I tested parallel test execution, but some tests fail in that mode.
I've made a few small updates to the poms. Please see https://github.com/johanoskarsson/elephant-bird/pull/2 Question: The pig and mahout module tests produce a lot of logging output. We can add src/test/resources/log4j.properties configurations to reduce the console output to WARN / ERROR messages. Sound good, or should we just leave all output there? |
Seems like there are a few places in core module which still use log4j api's directly, instead of slf4j. Anyone mind if I submit a pull req to migrate to slf4j completely? |
Tweaks to poms
Looks like there's also a path for deployment of 3rd-party artifacts to Sonatype-- another option for the hcatalog jar: https://docs.sonatype.org/display/Repository/Uploading+3rd-party+Artifacts+to+The+Central+Repository I've been working more on release process and configuration for deployment to Sonatype. Hoping to have another pull request ready later tonight. |
Ok the latest seems to have worked for me ("seems" because I am now working through other issues. But at least EB was fetched and the compile is no longer complaining about transitive dependencies like Hadoop). |
- Adds Johan to contributors section of Readme - Adds developers section - Adds parent section to parent pom which references Sonatype's root pom - Removes conflicting configuration from pluginManagement section
Some small updates to support deployment to Sonatype OSS repos: |
Updates parent pom.xml to include required Sonatype OSS config
Here's my +1 on this branch. The one remaining thing I'd wanted to test is full staging release to Sonatype repo, but this is awkward till after this branch has been merged. I'd also like to revisit name of submodule directories, in case the maven release plugin is more forgiving of those now.. I prefer simpler names i.e. |
the output from tests does not go console using ant. |
@rangadi I had tested adding a test-specific log4j.properties configuration which pushes test logging output to a file. This is a viable solution, but we'll have to add it to path src/test/resources/ within every submodule. In terms of release process, I'd like to try and stick with the release plugin if at all possible. The only issue with using submodule directory names which don't match the module names is that SCM paths must be explicitly defined within submodule poms. Otherwise, they default to SCM paths based on module names, and not the actual directory names. |
explicitly defining paths does not seem that bad. it is done only once. The directory names are the ones we come across all the time. adding log4j.properties to each directory is not bad either.. |
I'll make those two things happen and request another pull from Johan into this branch. |
Thanks guys. I will make a branch-0.3 later today and you can even publish the jars. will merge into master in a few days.. |
branch-0.3 or branch-3.0? |
3.0 of course, or we could merge directly the master. What do you guys think? |
Also updates .gitignore to reflect new module paths.
Renames all submodule directories, removing "elephant-bird-" prefix
thanks guys. |
I split the project into core, pig, hive, cascading2 and rcfile modules. This will make it easier to depend on only the parts you need without getting a metric ton of extra jar files. The rcfile module was only since it needs both hive and pig.
I switched the project to Maven in order to easier modularize it. I don't really care what is used though, this just seemed easier. Added two plugins for thrift and protobuf code gen from Maven.
This branch is almost fully baked, but there are a couple of nits. I probably won't have time to get this all the way to the finish line, but it's almost there. Hopefully someone else can help out with these.