
The datasize of Shuffle read and write is much larger #100

Closed
zuoliwei opened this issue Mar 21, 2022 · 2 comments

Comments

@zuoliwei

Hello. When I run the WordCount task in HiBench with a 5 GB data size, I found that the shuffle write and shuffle read sizes using Firestorm (MEMORY_HDFS mode) are much larger than with the external shuffle service, under the same Spark parameters.
(screenshots of the shuffle metrics were attached to the original issue)

Could you please explain why this happens?

@colinmjj
Collaborator

@zuoliwei Currently, Firestorm does not implement map-side combine, which is most likely the cause of the larger shuffle sizes.
Firestorm gives you a benefit in the following situations:

  1. Better support for Spark on K8S
  2. Performance improvement when random IO is heavy with ESS

In your test case, some performance degradation with Firestorm is expected.
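To make the point above concrete, here is a minimal, illustrative sketch (plain Python, not Firestorm or Spark code) of why map-side combine shrinks shuffle volume for WordCount: with combine, each map task pre-aggregates its own partition and ships at most one record per distinct word, while without it every `(word, 1)` pair crosses the shuffle. The partition data below is invented for illustration.

```python
from collections import Counter

# Two "map task" partitions of a toy WordCount input (hypothetical data).
partitions = [
    ["spark", "shuffle", "spark", "spark"],
    ["shuffle", "spark", "shuffle", "shuffle"],
]

# Without map-side combine: every (word, 1) record is shuffled.
no_combine = [(w, 1) for part in partitions for w in part]

# With map-side combine: each map task aggregates locally first, so it
# shuffles at most one record per distinct word per partition.
with_combine = [kv for part in partitions for kv in Counter(part).items()]

print(len(no_combine))    # records shuffled without combine
print(len(with_combine))  # records shuffled with combine
```

Both variants reduce to identical final word counts; only the number of records crossing the shuffle differs, which is what the Spark UI reports as shuffle read/write size.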

@zuoliwei
Author

OK, I got it. Thank you.
