Compressing Output #533

calinburloiu opened this Issue Aug 1, 2013 · 4 comments


calinburloiu commented Aug 1, 2013

Currently, output cannot be compressed in Scalding. The Hadoop configuration property mapred.output.compress is ignored, so setting it to true has no effect, whether on the command line with -D mapred.output.compress=true or by overriding the config method of the Job class.

As noted in a post on the cascading-user group, Cascading's TextLine and TextDelimited override this property to false in the JobConf.
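To illustrate why the user's setting gets lost, here is a minimal sketch in plain Scala. It is only a model of the behavior described above, not Cascading's or Scalding's actual code: the names `userSettings`, `textSchemeSinkInit`, and `effectiveConf` are hypothetical, and it assumes the scheme's sink setup runs after the user's settings have been applied, so the last write wins.

```scala
// Simplified model of the override described in this issue.
// Not Cascading's real implementation; all names below are hypothetical.
object CompressOverrideSketch {
  type Conf = Map[String, String]

  val CompressKey = "mapred.output.compress"

  // Stand-in for the user's configuration (the -D flag or a Job.config override).
  def userSettings(conf: Conf): Conf = conf + (CompressKey -> "true")

  // Stand-in for the text scheme's sink setup, which (per the issue) forces
  // the property to false in the JobConf.
  def textSchemeSinkInit(conf: Conf): Conf = conf + (CompressKey -> "false")

  // Assumption: the scheme's setup is applied after the user's settings,
  // so the value written last is the one the job sees.
  def effectiveConf(base: Conf): Conf = textSchemeSinkInit(userSettings(base))

  def main(args: Array[String]): Unit = {
    val conf = effectiveConf(Map.empty)
    println(s"$CompressKey = ${conf(CompressKey)}") // prints "mapred.output.compress = false"
  }
}
```

Under that assumption, no value supplied by the user can survive. A fix would have to stop the scheme from writing the property unconditionally.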



locked-fg commented Sep 22, 2014

Sorry, I don't understand the merged PR or what it means:
should this work if I use 0.12.0rc3? I can't get it to work.
Currently I just use the classes from here:



Trevoke commented Apr 28, 2017

Is this still true?



gerashegalov commented May 9, 2017

FYI: it's still true, because Cascading's default is applied in a hard-coded way, which #903 aimed to address.
Simple repro in REPL:

scalding> useHdfsLocalMode
scalding> customConfig.get("mapreduce.output.fileoutputformat.compress")
scalding> import com.twitter.scalding.source._
scalding> TypedPipe.from(List(1 -> 2, 3 -> 4)).save(TypedText.tsv("test"))
scalding> fsShell("-ls test")
Found 2 items
-rw-r--r--   3 user group          0 2017-05-09 21:46 test/_SUCCESS
-rw-r--r--   3 user group          8 2017-05-09 21:46 test/part-00000

scalding> fsShell("-cat test/*")
1       2
3       4