
The datasize of Shuffle read and write is much larger #100

Closed
zuoliwei opened this issue Mar 21, 2022 · 2 comments

Comments

@zuoliwei

Hello. When I run the WordCount task in HiBench with a 5 GB data size, I found that the shuffle write and shuffle read sizes using Firestorm (MEMORY_HDFS mode) are much larger than with the external shuffle service, under the same Spark parameters.
(screenshots of the shuffle metrics were attached to the original issue)

Could you please explain why this happens?

@colinmjj
Collaborator

@zuoliwei Currently, Firestorm does not implement map-side combine, which is most likely the cause of the larger shuffle sizes.
Firestorm gives you a benefit in the following situations:

  1. Better support for Spark on K8S
  2. Performance improvement when random IO is heavy with ESS

In your test case, some performance degradation with Firestorm is expected.
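To make the point above concrete, here is a minimal, illustrative sketch (plain Python, not Firestorm or Spark code) of why map-side combine shrinks shuffle volume for WordCount: with combine, each map task pre-aggregates its own partition and ships at most one record per distinct word, while without it every `(word, 1)` pair crosses the shuffle. The partition data below is invented for illustration.

```python
from collections import Counter

# Two "map task" partitions of a toy WordCount input (hypothetical data).
partitions = [
    ["spark", "shuffle", "spark", "spark"],
    ["shuffle", "spark", "shuffle", "shuffle"],
]

# Without map-side combine: every (word, 1) record is shuffled.
no_combine = [(w, 1) for part in partitions for w in part]

# With map-side combine: each map task aggregates locally first, so it
# shuffles at most one record per distinct word per partition.
with_combine = [kv for part in partitions for kv in Counter(part).items()]

print(len(no_combine))    # records shuffled without combine
print(len(with_combine))  # records shuffled with combine
```

Both variants reduce to identical final word counts; only the number of records crossing the shuffle differs, which is what the Spark UI reports as shuffle read/write size.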

@zuoliwei
Author

OK, I got it. Thank you.
