add new codec option for compression in Spark-Tensorflow connector #131
Conversation
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). 📝 Please visit https://cla.developers.google.com/ to sign. Once you've signed (or fixed any issues), please reply here (e.g. "I signed it!").

CLAs look good, thanks!
@skavulya Mind doing a code review?

@jhseu Sure, I'll review it. Thanks!

@vgod-dbx Thank you so much for the contribution. It looks good. Please add a description and example usage of the codec option to the README under the features section before merging.
@skavulya README updated! Thanks for the review. |
…ensorflow#131)
* support option 'codec' for compression
* add `codec` option to README
@vgod-dbx what version did this make it into? I'm on 1.13.1 and it seems to ignore the codec option.
Hi, it seems that the codec option is actually ignored.
With #125, it became possible to output gzipped TFRecords by setting `spark.hadoop.mapreduce.output.fileoutputformat.compress` in the global SparkConf. However, there is no way to enable compression for individual DataFrame outputs only.
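For reference, the pre-existing global approach looks roughly like this. This is a sketch, not code from the PR; the app name is invented, and the Gzip codec class is the standard Hadoop one, assumed here for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Enable output compression globally via the Hadoop output-format settings.
// Every file written through a Hadoop output format in this session is
// affected, which is exactly the drawback the PR description calls out.
val spark = SparkSession.builder()
  .appName("global-compression-example") // hypothetical app name
  .config("spark.hadoop.mapreduce.output.fileoutputformat.compress", "true")
  .config("spark.hadoop.mapreduce.output.fileoutputformat.compress.codec",
          "org.apache.hadoop.io.compress.GzipCodec")
  .getOrCreate()
```

Because these are session-wide Hadoop settings, they cannot be scoped to a single `DataFrameWriter`, which motivates the per-writer `codec` option below.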
This PR adds a new option `codec` to the Spark-Tensorflow connector for enabling compression on an individual DataFrameWriter. With this, we don't need to set `spark.hadoop.mapreduce.output.fileoutputformat.compress` globally anymore. Sample usage:
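The original sample usage was not captured in this page. A sketch of how the per-writer option would be used, assuming the connector's usual `tfrecords` format name and `recordType` option, with a toy DataFrame and an invented output path:

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Toy DataFrame for illustration.
val df = Seq((1, "a"), (2, "b")).toDF("id", "label")

// Per-writer compression: only this output is gzipped; the global
// Hadoop compression settings are left untouched.
df.write
  .format("tfrecords")
  .option("recordType", "Example")
  .option("codec", "org.apache.hadoop.io.compress.GzipCodec") // the new option from this PR
  .mode(SaveMode.Overwrite)
  .save("/tmp/output.tfrecord") // hypothetical path
```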