RSS shuffle data size is much larger than external shuffle service #90

Closed
YutingWang98 opened this issue Dec 7, 2022 · 6 comments

Comments

@YutingWang98

Hi team,

I noticed that RSS generates almost two times more shuffle data than the external shuffle service. For example, a job stage with RSS has a shuffle write size of 720 TB, while the external shuffle service shows only 370 TB, using the same inputs and code. We also tried some other jobs and got similar results.

We first suspected this might be caused by missing compression, but I took a look at the RssShuffleManager, and it seems there is compression/decompression on the client side using LZ4 (https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/RssShuffleWriter.scala#L205, https://github.com/uber/RemoteShuffleService/blob/master/src/main/scala/org/apache/spark/shuffle/rss/BlockDownloaderPartitionRecordIterator.scala#L167).
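For anyone skimming, the pattern at those two call sites is roughly the stream wrapping below. This is a minimal sketch using lz4-java (the library Spark itself bundles), not the actual RSS code; the payload and object name are illustrative.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import net.jpountz.lz4.{LZ4BlockInputStream, LZ4BlockOutputStream}

object Lz4RoundTrip {
  def main(args: Array[String]): Unit = {
    val payload = Array.fill[Byte](1 << 16)(42.toByte) // stand-in for serialized shuffle records

    // Write path: wrap the outgoing stream in an LZ4 block stream, the way the
    // writer compresses records before shipping them to the shuffle server.
    val sink = new ByteArrayOutputStream()
    val lz4Out = new LZ4BlockOutputStream(sink)
    lz4Out.write(payload)
    lz4Out.close()

    // Read path: the downloader-side iterator decompresses fetched blocks.
    val lz4In = new LZ4BlockInputStream(new ByteArrayInputStream(sink.toByteArray))
    val restored = lz4In.readAllBytes()
    lz4In.close()

    assert(restored.sameElements(payload))
  }
}
```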

Any ideas about possible causes of such a large gap in shuffle data size? Thanks.

@hiboyang
Contributor

hiboyang commented Dec 7, 2022

Hi @YutingWang98, this is an interesting finding! For the 720 TB/370 TB sizes, is this the on-disk file size, or the stats shown in the Spark UI?

@YutingWang98
Author

YutingWang98 commented Dec 7, 2022

Hi @hiboyang, thank you for the reply! It is the stats from the Spark UI stage tab (the 'Shuffle Write Size / Records' value). Also, I just ran more tests on the same job; here are my findings:

Remote shuffle service

  • RSS without compression: 5.7 GB
  • RSS with compression (LZ4): 3.1 GB
  • modified RSS to allow zstd:
    • compression level 1: 2.4 GB
    • compression level 7: 2031 MB

External shuffle service

  • default setting (zstd compression, compression level 1): 1927 MB
  • without compression (spark.shuffle.service.enabled=false): 5.6 GB
  • other compression codecs:
    • spark.io.compression.codec=lz4: 2.7 GB
    • spark.io.compression.codec=lzf: 2.7 GB
    • spark.io.compression.codec=snappy: 2.7 GB

So, I think switching to zstd might be helpful.
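For reference, the external-shuffle-service variants above map onto standard Spark settings. The property names below are real Spark configs (note that the flag that actually turns shuffle compression off is spark.shuffle.compress=false); the app itself is an illustrative sketch, not the job we benchmarked.

```scala
import org.apache.spark.sql.SparkSession

// Illustrative reproduction of the codec comparison above: only the
// compression-related settings change between runs.
val spark = SparkSession.builder()
  .appName("shuffle-codec-comparison")            // hypothetical app name
  .master("local[*]")                             // for a quick local run
  .config("spark.shuffle.compress", "true")       // set to "false" for the uncompressed run
  .config("spark.io.compression.codec", "zstd")   // or "lz4" / "lzf" / "snappy"
  .config("spark.io.compression.zstd.level", "1") // Spark's default zstd level
  .getOrCreate()

// Any wide transformation produces shuffle data whose size shows up in the
// Spark UI's "Shuffle Write Size / Records" column for the stage.
spark.range(0L, 10000000L).repartition(200).count()
spark.stop()
```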

@hiboyang
Contributor

hiboyang commented Dec 8, 2022

Nice to know zstd has a better compression ratio. I created a PR to support zstd in RSS: https://github.com/uber/RemoteShuffleService/pull/91/files
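(Not the PR's actual code, just a sketch of the general shape: a codec chosen from config on the write path, assuming zstd-jni for the zstd stream and the existing lz4-java block stream otherwise. The function name and codec strings are illustrative.)

```scala
import java.io.OutputStream
import com.github.luben.zstd.ZstdOutputStream
import net.jpountz.lz4.LZ4BlockOutputStream

// Hypothetical helper: pick the compression wrapper for the shuffle write
// stream based on a codec name read from configuration.
def wrapCompression(out: OutputStream, codec: String, zstdLevel: Int): OutputStream =
  codec match {
    case "zstd" => new ZstdOutputStream(out, zstdLevel) // zstd-jni stream with explicit level
    case "lz4"  => new LZ4BlockOutputStream(out)        // lz4-java block stream (current behavior)
    case other  => throw new IllegalArgumentException(s"Unsupported shuffle codec: $other")
  }
```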

@YutingWang98
Author

Thank you! Will test our job with this new change. Also, I think Spark's zstd compression level ('spark.io.compression.zstd.level') defaults to 1, but I saw you are using level 3 as the default. Is there any specific reason for that?

@hiboyang
Contributor

No specific reason :) I changed it to level 1 in the PR, but forgot to reply to the comment here.

@YutingWang98
Author

I saw your changes in the pull request! Thanks so much.
