Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add read_until_slice #6531

Closed
wants to merge 2 commits into from
Closed

Conversation

Sytten
Copy link

@Sytten Sytten commented May 3, 2024

Motivation

Right now there is no built-in way to read until a sequence of bytes is found from a BufReader.
I first improved the algorithm in https://github.com/programingjd/read_until_slice but I think this should be supported natively.

Solution

The code is very similar to read_until, but it keeps track of a match_len in the needle. That allows us to match more than one character across multiple reads, but it does make the function not cancel safe. I find this solution elegant since it doesn't require us to keep another buffer.

@Darksonn Darksonn added A-tokio Area: The main tokio crate M-io Module: tokio/io labels May 3, 2024
@Darksonn
Copy link
Contributor

Darksonn commented May 3, 2024

I am reluctant to add this. It's not cancellation safe, so there is likely to be people who use it incorrectly. There's also no precedent for it in std.

@Sytten
Copy link
Author

Sytten commented May 3, 2024

Totally fair, thats what I thought anyway. Maybe it could go in tokio-utils? Its pretty useful to lazy load files.

@Sytten
Copy link
Author

Sytten commented May 3, 2024

Also it would be possible to make it cancel safe by checking for a partial needle at the end of the existing buffer. It adds some complexity and some edge cases, but doable. If that changes you decision.

@Darksonn
Copy link
Contributor

Darksonn commented May 4, 2024

It can't be cancel safe. Back before the 1.0 release, we considered changing AsyncBufRead to allow exactly that: #2428. But we decided against it.

@Sytten
Copy link
Author

Sytten commented May 5, 2024

Right make sense, is there a trait in tokio-utils where this code could live?

Copy link
Contributor

@Darksonn Darksonn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but you can add a function similar to tokio_util::io::read_buf.

Comment on lines +73 to +92
/// Returns the first index matching the `needle` in the `haystack`. `match_len` specifies how
/// many bytes from the needle were already matched during the previous lookup.
/// If we reach the end of the `haystack` with a partial match, then this is a partial match,
/// and we update the `match_len` value accordingly, even though we still return `None`.
fn search(needle: &[u8], haystack: &[u8], match_len: &mut usize) -> Option<usize> {
let haystack_len = haystack.len();
let needle_len = needle.len();
#[allow(clippy::needless_range_loop)]
for i in 0..haystack_len {
if haystack[i] == needle[*match_len] {
*match_len += 1;
if *match_len == needle_len {
return Some(i + 1);
}
} else if *match_len > 0 {
*match_len = 0;
}
}
None
}
Copy link
Contributor

@Darksonn Darksonn May 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, this logic is really tricky. It's going to need really good tests. Probably something that uses some loops to try a lot of different situations. For example, for a few different lengths of search strings, try to split some input in all possible places and ensure that you get the same result no matter how the input is split.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed but I am testing it in our app already and I didnt catch a bug yet so I am confident it is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Famous last words. :)

@Darksonn Darksonn marked this pull request as draft May 25, 2024 20:11
@Darksonn
Copy link
Contributor

I'm closing this due to inactivity, but you are welcome to reopen if you would like to continue working on this.

@Darksonn Darksonn closed this Aug 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-io Module: tokio/io
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants