RSS shuffle data size is much larger than external shuffle service #90

Closed
YutingWang98 opened this issue Dec 7, 2022 · 6 comments

Comments

@YutingWang98

Hi team,

I noticed that RSS generates almost two times more shuffle data than the external shuffle service. For example, a job stage with RSS has a shuffle write size of 720 TB, while the external shuffle service shows only 370 TB, using the same inputs and code. We also tried some other jobs and got similar results.

We first suspected this might be caused by missing compression, but I took a look at the RssShuffleManager, and it seems there is compression/decompression on the client side using LZ4 (https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/RssShuffleWriter.scala#L205, https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/rss/BlockDownloaderPartitionRecordIterator.scala#L167).
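For anyone skimming, the pattern at those two call sites is roughly the stream wrapping below. This is a minimal sketch using lz4-java (the library Spark itself bundles), not the actual RSS code; the payload and object name are illustrative.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import net.jpountz.lz4.{LZ4BlockInputStream, LZ4BlockOutputStream}

object Lz4RoundTrip {
  def main(args: Array[String]): Unit = {
    val payload = Array.fill[Byte](1 << 16)(42.toByte) // stand-in for serialized shuffle records

    // Write path: wrap the outgoing stream in an LZ4 block stream, the way the
    // writer compresses records before shipping them to the shuffle server.
    val sink = new ByteArrayOutputStream()
    val lz4Out = new LZ4BlockOutputStream(sink)
    lz4Out.write(payload)
    lz4Out.close()

    // Read path: the downloader-side iterator decompresses fetched blocks.
    val lz4In = new LZ4BlockInputStream(new ByteArrayInputStream(sink.toByteArray))
    val restored = lz4In.readAllBytes()
    lz4In.close()

    assert(restored.sameElements(payload))
  }
}
```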

Any ideas about possible causes of such a large gap in shuffle data size? Thanks.

@hiboyang
Contributor

hiboyang commented Dec 7, 2022

Hi @YutingWang98, this is an interesting finding! For the 720 TB/370 TB sizes, is this the on-disk file size, or the stats shown in the Spark UI?

@YutingWang98
Author

YutingWang98 commented Dec 7, 2022

Hi @hiboyang, thank you for the reply! It is the stats from the Spark UI stage tab (the 'Shuffle Write Size / Records' value). Also, I just ran more tests on the same job; here are my findings:

Remote shuffle service

  • RSS without compression: 5.7 GB
  • RSS with compression (LZ4): 3.1 GB
  • modified RSS to allow zstd:
    • compression level 1: 2.4 GB
    • compression level 7: 2031 MB

External shuffle service

  • default setting (zstd compression, compression level 1): 1927 MB
  • without compression (spark.shuffle.service.enabled=false): 5.6 GB
  • other compression codecs:
    • spark.io.compression.codec=lz4: 2.7 GB
    • spark.io.compression.codec=lzf: 2.7 GB
    • spark.io.compression.codec=snappy: 2.7 GB

So, I think switching to zstd might be helpful.
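For reference, the external-shuffle-service variants above map onto standard Spark settings. The property names below are real Spark configs (note that the flag that actually turns shuffle compression off is spark.shuffle.compress=false); the app itself is an illustrative sketch, not the job we benchmarked.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative reproduction of the codec comparison above: only the
// compression-related settings change between runs.
val spark = SparkSession.builder()
  .appName("shuffle-codec-comparison")            // hypothetical app name
  .master("local[*]")                             // for a quick local run
  .config("spark.shuffle.compress", "true")       // set to "false" for the uncompressed run
  .config("spark.io.compression.codec", "zstd")   // or "lz4" / "lzf" / "snappy"
  .config("spark.io.compression.zstd.level", "1") // Spark's default zstd level
  .getOrCreate()

// Any wide transformation produces shuffle data whose size shows up in the
// Spark UI's "Shuffle Write Size / Records" column for the stage.
spark.range(0L, 10000000L).repartition(200).count()
spark.stop()
```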

@hiboyang
Contributor

hiboyang commented Dec 8, 2022

Nice to know zstd has a better compression ratio. I created a PR to support zstd in RSS: https://github.com/uber/RemoteShuffleService/pull/91/files
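(Not the PR's actual code, just a sketch of the general shape: a codec chosen from config on the write path, assuming zstd-jni for the zstd stream and the existing lz4-java block stream otherwise. The function name and codec strings are illustrative.)

```scala
import java.io.OutputStream
import com.github.luben.zstd.ZstdOutputStream
import net.jpountz.lz4.LZ4BlockOutputStream

// Hypothetical helper: pick the compression wrapper for the shuffle write
// stream based on a codec name read from configuration.
def wrapCompression(out: OutputStream, codec: String, zstdLevel: Int): OutputStream =
  codec match {
    case "zstd" => new ZstdOutputStream(out, zstdLevel) // zstd-jni stream with explicit level
    case "lz4"  => new LZ4BlockOutputStream(out)        // lz4-java block stream (current behavior)
    case other  => throw new IllegalArgumentException(s"Unsupported shuffle codec: $other")
  }
```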

@YutingWang98
Author

Thank you! Will test our job with this new change. Also, I think Spark's zstd compression level ('spark.io.compression.zstd.level') defaults to 1, but I saw you are using level 3 as the default. Is there any specific reason for that?

@hiboyang
Contributor

No specific reason :) I changed it to level 1 in the PR, but forgot to reply to the comment here.

@YutingWang98
Author

I saw your changes in the pull request! Thanks so much.
