
Implement BCS deserialization from_reader #6

Merged · 5 commits · Oct 13, 2023

Conversation

preston-evans98
Contributor

This Pull Request implements BCS deserialization from implementers of std::io::Read.

Strategy

The PR aims to maintain a minimal diff over the previous implementation. To that end, the following changes have been made.

  • Make Deserializer generic over an inner type R
  • Factor as much of the implementation as possible into a new trait BcsReader, which expresses BCS in terms of a few primitive operations. Implement serde::Deserializer<'de> for Deserializer<R> where Deserializer<R>: BcsReader<'de>
  • Implement BcsReader<'de> for Deserializer<&'de [u8]> using existing code
  • Implement BcsReader generically for (a version of) Deserializer<R: Read>.
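
To make the strategy concrete, here is a minimal sketch of the shapes involved. The method names and signatures are assumptions for illustration only; in particular, the real trait returns an associated MaybeBorrowedBytes type (visible in a diff further down) where this sketch simply uses Vec<u8>.

use serde::de::DeserializeSeed;

/// Stand-in for the crate's error type in this sketch.
type Error = std::io::Error;

/// The primitive operations that the shared BCS logic is written against.
trait BcsReader<'de> {
    /// Read exactly `buf.len()` bytes from the input.
    fn fill(&mut self, buf: &mut [u8]) -> Result<(), Error>;

    /// Deserialize a map key and also return its serialized bytes,
    /// so that canonical (strictly increasing) key order can be enforced.
    fn read_key_bytes<K: DeserializeSeed<'de>>(
        &mut self,
        seed: K,
    ) -> Result<(K::Value, Vec<u8>), Error>;
}

/// The deserializer becomes generic over its input source.
struct Deserializer<R> {
    input: R,
}

// serde::Deserializer<'de> is then implemented once, generically:
//
//     impl<'de, 'a, R> serde::Deserializer<'de> for &'a mut Deserializer<R>
//     where
//         Deserializer<R>: BcsReader<'de>,
//     { /* written purely in terms of the BcsReader primitives */ }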

Implementing BcsReader generically

The existing implementation is very close to supporting deserialization from readers. There are only two places in which its behavior relied on direct access to a byte slice: map deserialization (de::MapAccess) and the end method.

The first case is map deserialization. bcs needs access to the serialized representation of map keys in order to enforce canonicity. When deserializing from a slice, this is straightforward - the implementation can simply deserialize a map key, and then "look back" at the original input slice to determine its serialized representation. When deserializing from a reader, however, this kind of rewinding is not generally possible. To solve this problem, I introduce the TeeReader struct, which wraps a reader and optionally copies the bytes it reads into a capture_buffer for later retrieval. Then, I simply implement BcsReader for Deserializer<TeeReader<...>>.
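
To illustrate the tee mechanism, here is a minimal sketch assuming std::io. The actual TeeReader in this PR also carries a lifetime parameter and different internals, so treat the field and method layout here as assumptions.

use std::io::{Read, Result};

/// Wraps a reader and, while `capture_buffer` is `Some`, also records
/// every byte that passes through it.
struct TeeReader<R> {
    reader: R,
    capture_buffer: Option<Vec<u8>>,
}

impl<R: Read> Read for TeeReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> Result<usize> {
        let n = self.reader.read(buf)?;
        if let Some(capture) = self.capture_buffer.as_mut() {
            capture.extend_from_slice(&buf[..n]);
        }
        Ok(n)
    }
}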

The second case is the end method, which checks that all input bytes have been consumed. To handle this case, I simply attempt to read one extra byte from the Reader after deserialization and assert that an EOF error is returned from the underlying Reader.
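
A rough sketch of that end-of-input check, assuming std::io::Read and a simplified error type (the real method returns the crate's own Error):

use std::io::{ErrorKind, Read};

/// Succeeds only if the reader has no bytes left to produce.
fn expect_end<R: Read>(reader: &mut R) -> Result<(), String> {
    let mut byte = [0u8; 1];
    match reader.read_exact(&mut byte) {
        // An EOF error from the underlying reader is exactly what we want.
        Err(e) if e.kind() == ErrorKind::UnexpectedEof => Ok(()),
        Ok(()) => Err("remaining input after deserialization".to_string()),
        Err(e) => Err(e.to_string()),
    }
}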

Testing

All existing unit tests except for zero_copy_parse have been modified to check that deserialization via from_bytes and from_reader yields the same output. That test is not applicable because readers are not capable of zero-copy deserialization.
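
The comparison pattern used by the tests looks roughly like this, assuming it sits inside the crate's test module so from_bytes, to_bytes, from_reader, and Error are in scope; the from_bytes_via_reader helper body and the exact from_reader signature are assumptions here.

use serde::de::DeserializeOwned;

/// Hypothetical helper: run the reader-based path over an in-memory slice.
fn from_bytes_via_reader<T: DeserializeOwned>(bytes: &[u8]) -> Result<T, Error> {
    let mut reader = std::io::Cursor::new(bytes);
    from_reader(&mut reader)
}

#[test]
fn bytes_and_reader_agree() {
    let value = vec![1u32, 2, 3];
    let bytes = to_bytes(&value).unwrap();
    assert_eq!(from_bytes::<Vec<u32>>(&bytes).unwrap(), value);
    assert_eq!(from_bytes_via_reader::<Vec<u32>>(&bytes).unwrap(), value);
}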

@ma2bd ma2bd requested review from ma2bd and bmwill September 11, 2023 04:07

ma2bd commented Sep 11, 2023

Thanks @preston-evans98 for the changes. Hopefully, @bmwill and I will find time for a proper code review soon.

@preston-evans98
Contributor Author

It looks like CI failed initially due to an outdated clippy allow directive, and because cargo selected a version of thiserror that is only compatible with the 2021 Rust edition. I removed the lint and pinned thiserror to exactly the version specified in the Cargo.toml. This should be safe to do because thiserror does not appear in the crate's public API.

@preston-evans98
Contributor Author

Hmm, looks like the MSRV check is failing due to unrelated changes. Are you opposed to checking in a Cargo.lock?


ma2bd commented Oct 7, 2023

@preston-evans98 Thanks for investigating. I fixed the CI on the main branch. Can you rebase your PR?


@ma2bd ma2bd left a comment


Thanks for the new feature. This is looking pretty good. Mostly style comments. I found an issue with a corner case and proposed a tentative fix (see other comment).

src/lib.rs Outdated
- pub use de::{from_bytes, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit};
+ pub use de::{
+     from_bytes, from_bytes_seed, from_bytes_seed_with_limit, from_bytes_with_limit, from_reader,
+     from_reader_seed,
Contributor


I don't suppose anything is preventing from_reader_seed_with_limit, right?

Contributor Author


Not that I can see, no. Will add.

Contributor Author


Also added from_reader_with_limit for completeness.

src/de.rs Outdated
}

impl<'de, R> TeeReader<'de, R> {
    /// Wrapse the provided reader in a new [`TeeReader`].
Contributor


typo: Wraps

Contributor Author


Fixing.

src/de.rs Outdated
    &mut self,
    seed: K,
) -> Result<(K::Value, Self::MaybeBorrowedBytes), Error> {
    self.input.capture_buffer = Some(Vec::new());

@ma2bd ma2bd Oct 8, 2023


Interestingly, this code won't work if a map is itself keyed by a map like BTreeMap<BTreeMap<u8, u8>, u32>
(which does compile, see https://gist.github.com/rust-play/4f4819775a567501e2dfd8e41699bbce )

We could easily return an error, say UnsupportedMapInKeyPosition, in the case where self.input.capture_buffer.is_some() before line 298, but I believe that there is a simple fix (sketched below):

  • Make capture_buffer a stack of byte buffers. Let's rename it in the process, say captured_keys: Vec<Vec<u8>>
  • Add an empty vector at the end to start recording a new key.
  • When the current key is finished, pop the vector to take the bytes, but do not forget to also append those bytes at the end of the previous (still ongoing) recorded key, if any.
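
In code, the idea sketched by these bullets looks roughly like the following; the captured_keys name comes from the bullets, while the surrounding struct and method names are assumptions (the real change is in the branch mentioned below).

struct TeeReader<R> {
    reader: R,
    /// One buffer per map key currently being recorded;
    /// the innermost key sits on top of the stack.
    captured_keys: Vec<Vec<u8>>,
}

impl<R> TeeReader<R> {
    /// Begin recording a new (possibly nested) map key.
    fn start_key(&mut self) {
        self.captured_keys.push(Vec::new());
    }

    /// Finish the innermost key: take its bytes, and also append them to the
    /// enclosing key's still-ongoing recording, if there is one.
    fn finish_key(&mut self) -> Vec<u8> {
        let bytes = self.captured_keys.pop().expect("start_key was called first");
        if let Some(outer) = self.captured_keys.last_mut() {
            outer.extend_from_slice(&bytes);
        }
        bytes
    }
}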

I made a branch pursuing this idea here: #7

@preston-evans98 I didn't write a test so I could be missing something. Your help with finalizing this PR is still much appreciated! Can you review and pick the suggested changes, add a unit test for BTreeMap<BTreeMap<u8, u8>, u32>, then update this PR?

Thank you!

Contributor Author


Very good catch! Here's a quick test case to detect the issue:

#[test]
fn test_nested_map() {
    use std::collections::BTreeMap as Map;

    let mut m = Map::new();
    m.insert(Map::from_iter([(1u8, 0u8); 5]), 3);
    let bytes = to_bytes(&m).unwrap();

    assert_eq!(from_bytes(&bytes).as_ref(), Ok(&m));
    assert_eq!(from_bytes_via_reader(&bytes), Ok(m));
}

Working on a fix now.

Contributor Author


Your changes look good to me! I also tested that they worked with an additional layer of nesting. Pushing the changes for review 👍

Contributor


great


@ma2bd ma2bd left a comment


LGTM


@bmwill bmwill left a comment


Thanks for adding this! There have been times in the past where I've wanted this but haven't ever had the time to implement it myself, so thank you :)

@ma2bd ma2bd merged commit 56cfe89 into zefchain:main Oct 13, 2023
2 checks passed

ma2bd commented Oct 13, 2023

@preston-evans98 @bmwill This was just released in bcs v0.1.6

@preston-evans98
Contributor Author

Fantastic, thanks for the review @ma2bd @bmwill!
