shuffle server OOM #49
Comments
Can you share the client's config?
spark.properties
I can't tell the root cause for now. The shuffle server's memory consists of the write buffer, the read buffer, and metadata; with your configuration there shouldn't be an OOM.
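The memory model described above (write buffer + read buffer + metadata vs. the configured heap) can be sketched as a back-of-the-envelope check. All buffer sizes and config names below are hypothetical placeholders, not values taken from this issue's attachments:

```python
# Rough memory-budget check for a shuffle server, per the comment above:
# total ~= write buffer + read buffer + metadata. The figures here are
# illustrative assumptions only; real values come from the server config.
GB = 1024 ** 3

write_buffer = 20 * GB   # hypothetical write-buffer capacity
read_buffer = 10 * GB    # hypothetical read-buffer capacity
metadata = 2 * GB        # partition/app metadata, rough estimate

heap = 55 * GB           # XMX_SIZE="55g", as reported in this issue

total = write_buffer + read_buffer + metadata
print(f"estimated usage: {total / GB:.0f}g of {heap / GB:.0f}g heap")
assert total < heap, "buffers alone would overflow the heap"
```

With numbers like these the heap has plenty of headroom, which is why buffer metrics reading near zero (as reported below) points away from the buffers as the culprit.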
The OOM happened again. I called the metrics API (/metrics/jvm, /metrics/server) to check the buffer usage and JVM metrics: the buffer-related metrics are all 0 or very small. Shuffle server metrics:
jvm metrics
It seems the process was killed by the kernel's OOM killer. You should use a machine that has more memory.
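One way to confirm a kernel OOM kill, assuming you can read the kernel log (e.g. via `dmesg`), is to scan it for the OOM-killer markers. A minimal sketch; the sample log line embedded here is illustrative, not taken from this issue:

```python
# Scan kernel-log text for OOM-killer activity. In practice the input
# would come from `dmesg` or /var/log/kern.log; a sample string is
# embedded here purely so the example is self-contained.
sample_log = """\
[12345.678] systemd-journald[300]: rotating logs
[12346.001] Out of memory: Killed process 4242 (java) total-vm:62914560kB
"""

def find_oom_kills(log_text: str) -> list[str]:
    """Return log lines indicating the kernel killed a process for memory."""
    markers = ("Out of memory", "oom-killer", "oom_kill")
    return [line for line in log_text.splitlines()
            if any(m in line for m in markers)]

for line in find_oom_kills(sample_log):
    print(line)
```

If such a line names the shuffle server's java process, the exit was a kernel kill rather than a Java-level OutOfMemoryError, which matches the symptom of the process disappearing without a heap dump.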
I deployed shuffle server release 0.1.0 on a 16c64g machine with the
XMX_SIZE="55g"
configuration. When running a Spark application, the shuffle server's memory grows continuously and eventually reaches about 60g, which triggers the OOM and kills the process.
server.conf
rss-env.sh
Is my configuration incorrect, or is there a memory leak in the program?
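A reasoning step worth making explicit for the numbers above: a JVM's resident set is always larger than `-Xmx`, because metaspace, thread stacks, direct/NIO buffers, and GC bookkeeping live off-heap. With a 55g heap on a 64g machine, only ~9g of headroom remains. The overhead figures in this sketch are assumptions, not measurements from this issue:

```python
GB = 1024 ** 3

heap_xmx = 55 * GB          # XMX_SIZE="55g" from rss-env.sh
machine_ram = 64 * GB       # 16c64g machine from the report

# Hypothetical off-heap overheads -- real numbers should be measured,
# e.g. with JVM Native Memory Tracking (-XX:NativeMemoryTracking=summary).
metaspace = 0.5 * GB
thread_stacks = 1 * GB      # e.g. ~1000 threads at the default 1 MB stack
direct_buffers = 6 * GB     # Netty/NIO direct memory can grow large

resident = heap_xmx + metaspace + thread_stacks + direct_buffers
print(f"estimated RSS: {resident / GB:.1f}g on a {machine_ram / GB:.0f}g machine")
# If resident memory approaches machine RAM, the kernel OOM killer
# terminates the process -- consistent with growth to ~60g before the exit.
```

Under assumptions like these, a heap that large leaves the off-heap portions almost no room, so "memory leak" and "configuration too aggressive" can look identical from outside; lowering `-Xmx` or measuring native memory would distinguish them.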