RSS shuffle data size is much larger than external shuffle service #90
Hi @YutingWang98, this is an interesting finding! For the 720 TB / 370 TB sizes, is that the on-disk file size, or the stats shown in the Spark UI?
Hi @hiboyang, thank you for the reply! It is the stat from the Spark UI stage tab (the 'Shuffle Write Size / Records' value). I also ran more tests on the same job, and here are my findings:
Remote shuffle service
External shuffle service
So I think switching to zstd might be helpful.
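As a side note, for the external shuffle service path the codec is purely a configuration choice. Here is a minimal sketch, assuming the standard Spark property names (spark.io.compression.codec and spark.io.compression.zstd.level), of what switching shuffle compression to zstd looks like:

```scala
// Minimal sketch, assuming standard Spark configuration keys.
// spark.io.compression.codec selects the codec Spark uses for shuffle data
// (lz4 by default); spark.io.compression.zstd.level tunes the zstd level.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.io.compression.codec", "zstd")   // default is lz4
  .set("spark.io.compression.zstd.level", "1") // Spark's default zstd level
```

The same keys can also be passed as --conf options to spark-submit. They only control Spark's own I/O compression, though, which is presumably why a separate change on the RSS side (the PR mentioned in the next comment) was needed.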
Nice to know zstd has a better compression ratio. I created a PR to support zstd in RSS: https://github.com/uber/RemoteShuffleService/pull/91/files
Thank you! Will test our job with this new change. Also, in Spark's compression settings the zstd compression level ('spark.io.compression.zstd.level') defaults to 1, but I saw you are using level 3 as the default. Is there any specific reason for that?
No specific reason :) I changed it to level 1 in the PR, but forgot to reply to the comment here.
I saw your changes in the pull request! Thanks so much.
Hi team,
I noticed that RSS generates almost twice as much shuffle data as the external shuffle service. For example, a job stage with RSS has a shuffle write size of 720 TB, while the external shuffle service shows only 370 TB, even though both use the same inputs and code. We also tried some other jobs and got similar results.
We first suspected this might be caused by missing compression, but I took a look at the RssShuffleManager, and it seems there is compression/decompression on the client side using LZ4 (https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/RssShuffleWriter.scala#L205, https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/rss/BlockDownloaderPartitionRecordIterator.scala#L167).
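For context, the suspicion above largely comes down to block-level compression ratio. Here is an illustrative sketch only (the object, sample payload, and comparison are hypothetical, not RSS code) that compares how the same serialized bytes compress under LZ4 and zstd, using the lz4-java and zstd-jni libraries Spark already bundles:

```scala
// Illustrative sketch only; not the RssShuffleWriter implementation.
// Compares the compressed size of the same payload under LZ4 and zstd level 1.
import java.nio.charset.StandardCharsets

import com.github.luben.zstd.Zstd
import net.jpountz.lz4.LZ4Factory

object CompressionRatioDemo {
  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for a block of serialized shuffle records.
    val payload = ("some,repetitive,shuffle,record\n" * 10000)
      .getBytes(StandardCharsets.UTF_8)

    val lz4Size  = LZ4Factory.fastestInstance().fastCompressor().compress(payload).length
    val zstdSize = Zstd.compress(payload, 1).length // level 1, Spark's default

    println(s"raw=${payload.length} lz4=$lz4Size zstd=$zstdSize")
  }
}
```

Whether a better compression ratio fully explains a 2x gap depends on the data, but a quick comparison like this makes it easy to check before changing the job configuration.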
Any ideas about possible causes of such a large gap in shuffle data size? Thanks.