Chase down haphazard core dump when running mnist example main #52
Labels: bug (Something isn't working)
@selitvin pointed out: "The segfault happens in memory release. We had these kinds of issues with TF, related to memory allocators. Just as another data point, you can try using a different memory allocator (e.g. as mentioned in pytorch/pytorch#2314)."
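A minimal sketch of the allocator swap suggested above, assuming gperftools' tcmalloc is installed (the library path and the mnist script path are illustrative and will vary by system and checkout):

```shell
# Sketch: preload tcmalloc so it replaces glibc malloc for this run.
# /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 is a typical Debian/Ubuntu
# location; examples/mnist/main.py stands in for the mnist entry point.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4 \
    python examples/mnist/main.py
```

If the crash disappears under a different allocator, that points at an allocator interaction (e.g. between pyarrow's C++ layer and glibc malloc) rather than a bug in the example itself.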
When running the mnist example main, about 2 out of every 3 runs fail with a core dump, typically during the Reader open phase (before training begins). In my latest run, the segfault occurred once at the end of the first train epoch but before the first test batch, so it is very likely still during Reader construction. The core dump occurs in data page release within pyarrow's libparquet.