ThriftBinaryDeserializer incompatible with thrift 0.9.1+ #452
Comments
I believe for Spark you should use their classloader options to isolate the dependency. Unfortunately, we consume our Thrift via github.com/twitter/scrooge, which is based on Apache Thrift 0.5 or so. This method is super useful for us (probably always; not sure why it got removed), since it avoids OOMs when corrupt data reports that arrays should be huge. |
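To illustrate the concern in the comment above, here is a minimal sketch of a read-length guard: before allocating a buffer for a length-prefixed field, compare the declared length with the number of bytes actually left in the record. This is not Thrift's or elephant-bird's code; the class `BoundedReadSketch`, the method `readLengthPrefixed`, and the 4-byte big-endian length prefix are assumptions made purely for illustration.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration only; not part of Thrift or elephant-bird. */
final class BoundedReadSketch {

    /**
     * Reads a 4-byte big-endian length prefix followed by that many payload bytes.
     * Refuses to allocate when the declared length is negative or larger than
     * the bytes remaining in the record, which is what corrupt input looks like.
     */
    static byte[] readLengthPrefixed(byte[] record) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int declared = buf.getInt();      // length claimed by the (possibly corrupt) data
        int remaining = buf.remaining();  // bytes actually available after the prefix
        if (declared < 0 || declared > remaining) {
            throw new IllegalStateException(
                "Declared length " + declared + " exceeds remaining " + remaining + " bytes");
        }
        byte[] payload = new byte[declared]; // safe: bounded by the record size
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] ok = {0, 0, 0, 2, 7, 9};
        System.out.println(readLengthPrefixed(ok).length);

        // Corrupt prefix claiming roughly 2 GB of payload in a 5-byte record.
        byte[] corrupt = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff, 1};
        try {
            readLengthPrefixed(corrupt);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Without such a check, the corrupt record would trigger an allocation of about 2 GB for a 5-byte input, which is the kind of OOM described above.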
The option in Spark is experimental and works only in cluster mode, which is not possible in my case. I understand that this may not match the versions used at Twitter, but at the same time, freezing the version of a dependency prevents other users from upgrading too. Some ideas:
|
Well, it's freezing the versions used by our other open source projects: util, finagle, scrooge. So unless scrooge upgrades or removes its dependency on libthrift, we cannot really upgrade. You could investigate the Maven profile approach, which would need to include different versions of the sources depending on the profile; that would be fine with me. It sounds like a decent approach, and I would love a PR with it. |
OK, I am looking at it. If we go for the Maven profile, I see two alternatives:
WDYT, which one should I go for? |
Sorry for the noise, I had a bit of trouble getting the build to pass on Travis. |
Thanks for the contribution in the PR, @EugenCepoi. The multiple Maven profiles, while having a small amount of duplication, look like a good way to go here as you've done it. |
Fixes #452 - supporting thrift 0.7 and 0.9
@ianoc do you think you could make a release including the merged changes? Thanks! |
We are looking into it now, post holidays. The release script assumes a Debian/Ubuntu environment, so it takes a little effort to get going. Will let you know. |
I never managed to build Thrift 0.7 on a Mac, so I guessed that you were releasing from something else, most likely Debian/Ubuntu. Looks like I was wrong... On the other hand, you can easily use a Vagrant box or Docker to do the release, as in theory you only need the Maven credentials for Sonatype. I used Vagrant with Debian to test it. |
Yeah, Vagrant or VirtualBox is the plan. I just need to find time this week to set it up, get the creds in place, and do it. Should be done before the end of the week. |
Cool thanks! Let me know if I can be of some help :) |
FYI You can get working thrift binaries for Mac from the pantsbuild binaries project -> https://github.com/pantsbuild/binaries |
Since Thrift 0.9.1, the method setReadLength has been removed from TBinaryProtocol, but ThriftBinaryDeserializer is still using it.
I couldn't find why it was removed or any replacement for it. This issue happens when using EMR 4 and Spark 1.5, as they use Thrift 0.9.2. If "the fix" is to just remove the calls to that method and make sure it works with the latest Thrift releases, I can open a PR.