Bug: A branch with deleted objects is very ineffective with small 'amount' param #7864
Yikes! This is obviously bad. But please do NOT fix without a careful analysis. Obviously fetching larger batches will give better latency. But it may still consume significantly more RCUs. Ref:
What makes it tricky is when we don't have many tombstones (uncommitted deletes). In a Spark workload there are a huge number of listObjects operations with amount=1. Now if we Scan to fetch, say, 50 elements, we will experience 50x amplification -- and run out of RCUs. An interim solution might increase the batch size when the iterator encounters tombstones. Another possible patch might be to add a flag to listObjects that lets the caller ask not to receive hasMore. That halves the number of items we need to find. But actually for Spark / DBIO / lakeFSFS, this will be much better than 2x!
Relevant: #2092, of course (but this is the quicker win).
I think the underlying issue here is pushing application-level logic down to the kv store implementation. The application-level logic (graveler) is "I need 2 non-tombstone entries"; what the kv store gets is "give me 2 entries". I don't think graveler knows much about the kv-specific implementation, and it shouldn't control details such as how many entries to fetch with every call to the iterator.
Reopening since it wasn't fixed for CosmosDB
Consider the following branch staging area:
A client, most likely our HadoopFS implementation, wants to check whether the `foo/` prefix is empty. It will call `listObjects` with `amount=1`, just to find anything under that prefix. The amount propagates all the way down to the `kvstore` iterator, which will use `batchSize=amount+1`, in this case `2`. The iterator keeps using this batch size during subsequent fetches, as the wrapping `StagingIterator` calls it to bring more values. The reason the wrapping iterator keeps fetching, although it is only looking for a single object, is that tombstones (Graveler's delete markers) are filtered out (i.e. they mark that an object does not exist). In the example above, the iterator will perform 51 list calls to the kvstore. Very inefficient, and potentially not finishing in time for an HTTP request.