Memory tracking issue: worker OOM in DictionaryValuesWriter #21745

Closed
davseitsev opened this issue Apr 29, 2024 · 6 comments · Fixed by #21801

davseitsev commented Apr 29, 2024:

Under load, some Trino workers experience memory starvation; on the heap usage graph it looks like a horizontal line at the top.
An OOM doesn't actually occur, but the worker disappears from the cluster and running queries fail. It happens due to long major GC pauses.
[screenshot: worker heap usage graph showing a flat line at the top]
This behaviour persists after upgrading Trino from 409 to 444.

The heap dump looks fine: there is no obvious memory leak, so the problem is probably in memory accounting in the memory context.
I looked over the biggest objects and found something interesting.
[screenshot: largest objects in the heap dump]

DictionaryValuesWriter collects heap byte buffers whose backing byte arrays are much bigger than their logical size. As you can see in the screenshot, the logical size of one buffer is 199 bytes while its backing byte array is almost 200 KB. We collect such buffers in the dictionary and do not account for their "extra" bytes. As you can also see, DictionaryFallbackValuesWriter accepted 620 KB of raw data but actually takes 122 MB of heap space.
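To make the effect concrete, here is a minimal standalone Java sketch (not parquet-mr code) showing how a heap ByteBuffer slice can have a tiny logical size while still pinning a large backing array; the sizes are chosen to mirror the 199-byte / ~200 KB example in the screenshot, they are not taken from the actual writer.

import java.nio.ByteBuffer;

public class SlicedBufferDemo {
    public static void main(String[] args) {
        // Allocate one large heap slab, then expose only a small window of it as a slice.
        ByteBuffer slab = ByteBuffer.allocate(200_000);
        slab.position(1_000);
        slab.limit(1_199);
        ByteBuffer view = slab.slice();

        // The slice logically holds 199 bytes...
        System.out.println("logical size:       " + view.remaining());    // 199
        // ...but it still keeps the whole 200,000-byte backing array reachable.
        System.out.println("backing array size: " + view.array().length); // 200000
    }
}

Any accounting based only on the logical size will undercount such a buffer by roughly the full slab size.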

I went over all instances of PlainBinaryDictionaryValuesWriter and calculated how many extra bytes they take:

select 
  sum(cast(o.this['encodedValues.currentSlabPos'] as int) + retainedSize(o.this['encodedValues.slabs'])) as total_allocated_size,    
  sum(retainedSize(o.this)) total_retained_size
from "org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter" o

Results:

 total_allocated_size | total_retained_size
--------------------------------------------
          149 845 365 |       7 615 653 296
--------------------------------------------

That is about 7.5 GB of "extra" bytes (7,615,653,296 retained − 149,845,365 allocated ≈ 7.47 GB). I went through the source code and didn't find any other place where these bytes are taken into account in the memory context, so they can cause workers to OOM.
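For illustration only, a rough Java sketch of where that accounting happens, assuming a setBytes-style memory context API; the names MemoryContext, DictionaryWriterLike and WriterAccounting are hypothetical stand-ins, not Trino source.

// Hypothetical sketch: if getAllocatedSize() misses the backing arrays of retained
// buffers, everything charged to the memory pool is understated by the same amount.
interface MemoryContext {
    void setBytes(long bytes); // assumed setBytes-style accounting API
}

interface DictionaryWriterLike {
    long getAllocatedSize(); // reports logical buffer sizes only
}

class WriterAccounting {
    private final MemoryContext memoryContext;
    private final DictionaryWriterLike valuesWriter;

    WriterAccounting(MemoryContext memoryContext, DictionaryWriterLike valuesWriter) {
        this.memoryContext = memoryContext;
        this.valuesWriter = valuesWriter;
    }

    void updateMemoryUsage() {
        // Only the reported size reaches the memory context; the ~7.5 GB of
        // unreported backing bytes observed above never shows up here.
        memoryContext.setBytes(valuesWriter.getAllocatedSize());
    }
}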

Also, as far as I can see, PlainBinaryDictionaryValuesWriter does not take the size of the binaryDictionaryContent map into account in getAllocatedSize() at all.

raunaqmorarka (Member) commented:

@davseitsev the image attachment links above are not working.

davseitsev (Author) commented:

Uploaded the images again, should be OK now.

raunaqmorarka (Member) commented:

@davseitsev thanks for reporting this.
I think #21801 should solve it; can you verify that it fixes the problem for you?

davseitsev (Author) commented:

I will port the fix and test it on the same query.

davseitsev (Author) commented:

Thank you @raunaqmorarka, it looks good: there are no more heap buffers, and the difference between the allocated size and the actual retained heap is much smaller.

The query

select 
  sum(cast(o.this['encodedValues.currentSlabPos'] as int) + retainedSize(o.this['encodedValues.slabs'])) as total_allocated_size,    
  sum(retainedSize(o.this)) total_retained_size
from "org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter" o

Returns

 total_allocated_size | total_retained_size
--------------------------------------------
           51 573 871 |       1 341 402 000
--------------------------------------------

As far as I can see, binaryDictionaryContent is still not accounted for in the memory context, but the size of the dictionary is much smaller now, so I think it's OK.

raunaqmorarka (Member) commented:

Thanks for verifying. DictionaryValuesWriter does in fact account for binaryDictionaryContent through dictionaryByteSize, which is updated by PlainBinaryDictionaryValuesWriter#writeBytes. What's missing is that we only account for the size of the values stored in the Object2IntMap<Binary> binaryDictionaryContent and ignore the size of the keys and the overall map structure. This will be possible to improve when we eventually stop using parquet-mr code and write our own implementation.
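As a rough illustration of that remaining gap, a hedged Java sketch of what fuller accounting for the dictionary map could look like; the helper name and the per-entry overhead constant are assumptions for illustration, not parquet-mr or Trino code.

import java.util.Map;
import org.apache.parquet.io.api.Binary;

class DictionarySizeEstimate {
    // Assumed, JVM-dependent guess at per-entry overhead: the hash map slot, the key
    // object header/fields and the int dictionary id. Not a measured figure.
    private static final long ASSUMED_PER_ENTRY_OVERHEAD = 48;

    // Hypothetical helper. The first term is roughly what dictionaryByteSize already
    // tracks (the bytes of the Binary entries); the second term approximates the key
    // objects and map structure that are currently ignored.
    static long estimateRetainedBytes(Map<Binary, Integer> binaryDictionaryContent) {
        long contentBytes = 0;
        for (Binary entry : binaryDictionaryContent.keySet()) {
            contentBytes += entry.length();
        }
        long structureBytes = (long) binaryDictionaryContent.size() * ASSUMED_PER_ENTRY_OVERHEAD;
        return contentBytes + structureBytes;
    }
}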
