-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
upgrade common-codec from 1.3 to 1.4 to explictly eliminate misuse. #37
Conversation
Hi Angus, Here's the thing, Hadoop 0.20 actually requires codec 1.3. Now, granted, this may not, in fact, be possible, as it looks like Hadoop recently bumped the requirement up to 1.4. I am not even sure where they did that, but it's 1.3 in https://github.com/yahoo/hadoop-common/blob/yahoo-hadoop-0.20/ivy/hadoop-core.pom and 1.4 in the yahoo-hadoop-0.20.104 branch. Haven't checked what Cloudera ships. I don't really know what a good solution here is unless it's to add some compatibility mode flag to the build process and do gnarly file-substitution tricks. Thoughts? |
Probably we could roll our own Base64 implementation just like HBase? |
Or upgrade to common-codec-1.4, for new Base64(0) cannot compile in common-codec-1.3? |
You know, we can probably just add a little helper that checks if the Base64(0) constructor is available via reflection, and calls it if it is. Then it'll "just work" with whatever is on the classpath. |
Hi dvryaboy, Sorry for the delay. |
Changes look good, I'll have another pair of eyes take a look and if Raghu is cool with it, we'll pull it in. Thanks for doing this! |
*/ | ||
public static Base64 createStandardBase64() { | ||
try { | ||
return Base64.class.getConstructor(int.class).newInstance(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just call Base64(0), right? since EB will have 1.4 in its libs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Raghu,
It is intentional, for some users's Hadoop distribution comes up with the commons-codec-1.3.jar which may precede the 1.4 version in classpath. And in that case, Base64(0) fails to compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was able to compile even though my hadoop has 1.3. I don't think hadoop libs are included in classpath while compiling. Not sure if t invoking through reflection is much costlier than invoking directly. Either way, it does not affect EB since it creates only one per file.
It would be nice to have a comment inside createStandardBase64() as to why we have this work around.
thanks for the patch.
Pulled this into 1.2.4, should be out on master later today. Thanks Angus! |
Hi Guys, First sorry for the poor design above, for createStandardBase64 might output two different concrete classes(though nearly the same but have no common interface). And it might cause some runtime errors. If user uses some specific methods only existing in one version such as encodeToString in common-codecs-1.4.jar(not in common-codecs-1.3.jar) and common-codecs-1.3.jar precedes common-codecs-1.4 in java classpath at runtime, error happens. So another commit comes. Any comment? And Do I need to issue another pull request? |
Hm yeah you are right we might have runtime errors. Please for from eb-dev branch, and issue a new pull request to it. |
what is the fix? may be it is better to use static method encodeBase64(byes, isChunked) that is part of both versions : http://commons.apache.org/codec/api-1.3/org/apache/commons/codec/binary/Base64.html#encodeBase64(byte[], boolean) |
Angus, could you describe the bug in detail? How does the failure manifest.? We tried running this code in an environment where there is just 1.3, 1.3 & 1.4. It seemed to work alright. |
Hi Raghu, Try the following steps to reproduce the error:
And the error happens: The cause is that Base64 default constructor in 1.4 enables the chunking silently, which breaks the output line with more than 76 into several lines, thus TextInputFormat in hadoop can only read part of the message in one call. |
I am still not bale to reproduce. "[..] The cause is that Base64 default constructor in 1.4 enables the chunking silently, which breaks the output line with more than 76 into several lines, thus TextInputFormat in hadoop can only read part of the message in one call.[...]" This is exactly the bug this issue is supposed to fix, right? Why do you think it didn't work? I see that you don't have '-libjars' in your command to reproduce this. May be elephant-bird jar is part of TaskTracker's classpatch. Can you make sure that jar has this fix? One way to make sure is that you can log something in Codecs.createStandardBase64(). |
Angus, Raghu's asking about the "might cause some runtime errors" problem with the patch I merged into 1.2.4, not the original bug you reported. |
Hi Raghu, Sorry for the misunderstanding. (Thanks, dvryaboy.) The first pull request just works in current implementation of eb, for the two method Base64.encode() and Base64.decode() exist in both 1.3 and 1.4. But the design of the first is not elegant, for Codecs.createStandardBase64() might return two different concrete class which share no common interface between compile time and runtime(or among different runtimes), which may result in some errors. For instance, if one user invokes Base64.encodeToString() with the 1.4 at compile time, but there is also a 1.3 at runtime and 1.3 precedes 1.4 in classpath, an instance of NoSuchMethodException is thrown, for 1.3 has no such method. So in order to get a deterministic behavior, the second pull request about this comes. |
That is true for for any jar mismatch, right? |
The mismatch is quite likely to happen when used with hadoop, so we might need to include a base64 implementation with deterministic behavior just like HBase does. |
Hi Guys,
Just as the title, the change tries to prevent some silent errors if misuse happens.
The Base64 default constructor has different behavior between common-codec-1.3 and common-codec-1.4. In version 1.4. the default constructor enables the "chunking" ability which is introduced in this version for the first time, but not in common-codec-1.3.
When the common-codec-1.4 precedes common-codec-1.3 in Java classpath, which is often the case because of 1.4 in Hadoop's classpath, and the if some line length exceeds the CHUNK_SIZE(default to 76), this line gives birth to an unexpected multiple-line output.