De-duplication in BlobDataProvider #2062
No code size change for the first commit. Waiting for the second one to come in.
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot
Need to resolve the testdata conflict again, but this is good enough for review.
.map_err(|kind| kind.with_req(key, req))?;
blob.buffers
.get(idx)
.ok_or(DataErrorKind::InvalidState.with_req(key, req))
Checking the invariant during construction would remove the invalid state possibility
@@ -68,17 +69,34 @@ impl BlobDataProvider {
BlobSchema::deserialize(&mut postcard::Deserializer::from_bytes(bytes)).map(
|blob| {
let BlobSchema::V001(blob) = blob;
blob.resources
blob
You're adding new invariants to `BlobSchemaV1`, which I think we should verify here. Something like `blob.keys.iter_values().max() < blob.buffers.len()`
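A minimal sketch of what such a check could look like, using plain standard-library types as stand-ins. `DedupBlob`, `key_to_index`, and `indices_in_bounds` are hypothetical names for illustration, not the actual `BlobSchemaV1` fields. Using `all` rather than `max()` sidesteps comparing an `Option` to a length and also covers the empty case.

```rust
/// Hypothetical stand-in for a deduplicated blob: every key stores an index
/// into a shared list of buffers instead of owning its own bytes.
struct DedupBlob {
    /// One entry per key; the value is an index into `buffers`.
    key_to_index: Vec<usize>,
    /// Deduplicated payloads.
    buffers: Vec<Vec<u8>>,
}

impl DedupBlob {
    /// Returns true if every stored index points at an existing buffer.
    /// Note that this is a linear scan over all keys.
    fn indices_in_bounds(&self) -> bool {
        let len = self.buffers.len();
        self.key_to_index.iter().all(|&idx| idx < len)
    }
}
```

Under these assumptions, a blob with two buffers and an index of `2` would be rejected, while an empty blob passes trivially.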
We could check the invariant upon construction, but it means we need to scan over the buffer. I'd rather lean towards GIGO behavior than having a somewhat expensive validation operation for a case that should never happen.
I will make it a debug assertion instead.
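Roughly, the debug-assertion approach could look like the sketch below. `SketchBlob`, `new`, and `load` are hypothetical names and the types are simplified stand-ins, not the provider's real schema. The scan runs only when debug assertions are enabled, while the lookup itself stays bounds-checked.

```rust
/// Hypothetical stand-in for the provider's deserialized data.
struct SketchBlob {
    /// One entry per key; the value is an index into `buffers`.
    key_to_index: Vec<usize>,
    /// Deduplicated payloads.
    buffers: Vec<Vec<u8>>,
}

impl SketchBlob {
    /// Builds the blob. Release builds skip validation (garbage in, garbage out).
    fn new(key_to_index: Vec<usize>, buffers: Vec<Vec<u8>>) -> Self {
        // `debug_assert!` is compiled out when debug assertions are disabled,
        // so the linear scan costs nothing in a typical release build.
        debug_assert!(
            key_to_index.iter().all(|&idx| idx < buffers.len()),
            "blob key index out of range"
        );
        Self { key_to_index, buffers }
    }

    /// Looks up the buffer for the key at `key_pos`. Both accesses use `get`,
    /// so a corrupt index yields `None` rather than a panic.
    fn load(&self, key_pos: usize) -> Option<&[u8]> {
        let idx = *self.key_to_index.get(key_pos)?;
        self.buffers.get(idx).map(Vec::as_slice)
    }
}
```

The trade-off is the one described above: invalid input is caught early in debug builds, and in release builds it surfaces only at lookup time.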
#[doc(hidden)] // See #1771, we don't want this to be a publicly visible API
pub fn get_map(&self) -> &ZeroMap2dBorrowed<ResourceKeyHash, [u8], [u8]> {
self.data.get()
#[doc(hidden)]
Thought: I don't like having this just for the fingerprinting script. I'll try to add fingerprinting to the exporter.
.data
.buffers
.get(idx)
.ok_or_else(|| DataErrorKind::InvalidState.with_req(key, req))?;
I don't like this `InvalidState` error, because it doesn't tell the client anything. Can we make it a custom error with a message or camouflage it as a postcard error (with a message)?
I totally agree with you that `InvalidState` errors are not great because they don't tell the client anything, but I think your other suggestions are even worse (especially camouflaging it as a postcard error). I can attach a message string to the `InvalidState` error.
I don't see how `Custom` with a message is even worse than `InvalidState` with a message. This is arguably not even invalid "state"; it's an invalid input (the bytes buffer), which is why I suggested pretending that it's a postcard error. If we checked the invariant during deserialization it would become a postcard error, and for most invalid blobs it will be a postcard error (because the ZeroMap or its contents don't deserialize), but for a very small subset of invalid inputs we return `InvalidState` instead.
It looks like the other place where we use `InvalidState` is in the `try_unwrap_owned` function.
> for a very small subset of invalid inputs we return `InvalidState` instead.
I'm convinced by this argument. With postcard, we do `crate::DataError::custom("Postcard deserialize").with_display_context(&e)`, so we can use `custom` here as well.
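To illustrate the idea of returning a custom error with a message for an out-of-range buffer index, here is a self-contained sketch. `SketchError`, `get_buffer`, and the message text are hypothetical stand-ins and do not reproduce the real `DataError` type or its methods.

```rust
use std::fmt;

/// Hypothetical error type standing in for the provider's error type.
#[derive(Debug, PartialEq)]
enum SketchError {
    /// A custom error carrying a human-readable message.
    Custom(&'static str),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Custom(msg) => write!(f, "custom error: {}", msg),
        }
    }
}

/// Returns the buffer at `idx`, or a custom error that says what was wrong
/// with the input rather than reporting a generic invalid state.
fn get_buffer(buffers: &[Vec<u8>], idx: usize) -> Result<&[u8], SketchError> {
    buffers
        .get(idx)
        .map(Vec::as_slice)
        // The message text is an assumption made for this sketch.
        .ok_or_else(|| SketchError::Custom("Invalid blob bytes: buffer index out of range"))
}
```

With this shape, a client that hits the rare malformed blob gets a message pointing at the input bytes, which matches how the more common deserialization failures are reported.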
Re `try_unwrap_owned`, I'd be happy to use `custom` there and remove the `InvalidState` error kind.
I was confused that `decimal-bn-en.postcard` reduced in size, but it's because it also includes `und`. Is that intended?
This would be something potentially fixed by #834
Fixes #838