Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session: add expect_eager method #35

Closed

Conversation

tzemanovic
Copy link
Contributor

hi, we're trying to migrate from rexpect to this crate in https://github.com/anoma/anoma/pull/1095. It was pretty smooth, thank you for this crate! We just hit couple road-blocks - one of them is that the lazy approach in Session::expect is too slow for processes with many lines of output. We get around it by adding this expect_eager variation.

Copy link
Owner

@zhiburt zhiburt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tzemanovic

Thank you for opening the PR, and reporting the issue.

I consider that rexpect version may be faster because it uses an additional thread.
expectrl doesn't do it, we try to read from a output when necessary.

I guess the approach with a thread may make some sense.
We could create some util function for this I guess. What do you think?

Maybe Session::check help you?

Comment on lines +371 to +374
} else if !data.is_empty() {
let data_len = data.len();
self.stream.consume_from_buffer(data_len);
continue;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you wan't?

I mean potentially a mach will never be found because we drop data here.

Here's an example.

The output Hello World.
We do expect_eager("Hello World")
First read_available reads only Hello,
Second read_available reads World.
Because you drop the buffer the match will never found even though it was there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, good point! I guess we should never drain the buffer for this approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking, maybe we could still drain the unmatched buffer parts up to line breaks that would keep the buffer from growing until the match is found, but only if the Needle itself doesn't contain a line break

@tzemanovic
Copy link
Contributor Author

hi @zhiburt, thanks for checking and sorry about the delay! I might have misunderstood, but from what I saw in expectrl, the slowdown was caused by reading 1 byte at time and trying to match every time, whereas rexpect reads whatever is available at once

@zhiburt
Copy link
Owner

zhiburt commented Jun 1, 2022

I saw in expectrl, the slowdown was caused by reading 1 byte at time and trying to match every time, whereas rexpect reads whatever is available at once

I think it is true.

As far as I remember the main reason is.

// We read by byte to make things as lazy as possible.
//
// It's chose is important in using Regex as a Needle.
// Imagine we have a `\d+` regex.
// Using such buffer will match string `2` imidiately eventhough right after might be other digit.

Could you try to call check instead of expect?
It's semantics a bit different but it reads not by byte.

PS: Maybe we could configure a read logic, whether do ready by byte or not. what do you think?

@tzemanovic
Copy link
Contributor Author

I saw in expectrl, the slowdown was caused by reading 1 byte at time and trying to match every time, whereas rexpect reads whatever is available at once

I think it is true.

As far as I remember the main reason is.

// We read by byte to make things as lazy as possible.
//
// It's chose is important in using Regex as a Needle.
// Imagine we have a `\d+` regex.
// Using such buffer will match string `2` imidiately eventhough right after might be other digit.

Could you try to call check instead of expect? It's semantics a bit different but it reads not by byte.

PS: Maybe we could configure a read logic, whether do ready by byte or not. what do you think?

Unfortunately, check doesn't work, because the output that's being matched is not available immediately. Having a way to configure read logic would be nice and it would probably solve the issue for us.

I'm not sure if I follow the reasoning for reading byte-by-byte:

  • with the Regex needle, it doesn't seem like byte-by-byte reading solves that as 2 would still be matched immediately before the other digit
  • what is the scenario in which the EoF indication get lost? We didn't run into this issue with expect_eager

@zhiburt
Copy link
Owner

zhiburt commented Jun 3, 2022

Unfortunately, check doesn't work, because the output that's being matched is not available immediately. Having a way to configure read logic would be nice and it would probably solve the issue for us.

I'll take a look at it.

I'm not sure if I follow the reasoning for reading byte-by-byte:

  • with the Regex needle, it doesn't seem like byte-by-byte reading solves that as 2 would still be matched immediately before the other digit

Imagine string "123" is ready to be read,
If we read 1 byte and match \d+ we will succeed and get "1".
If we read all what's available (with buffer limit exactly) we will get "123".

\d+ matches more then 1 digit prior more. (As far as I remember)

  • what is the scenario in which the EoF indication get lost? We didn't run into this issue with expect_eager

I don't remember clearly the case.

But as I do it's more not about EOF (comment is outdated probably).
But because the behavior of some platforms are different in case when the process dies.

On linux we can read something even when the process is gone.
On mac/bsd we don't.

From what I remember we actually should not be allowed to do it on Linux either but we do.

So imagine you do echo "Hello World".
You process is immediately is gone.

So it's like a mechanism to read everything possible whenever it's possible.

Maybe wrong in my comments here.

The point is this comment about EOF is probably outdated.
We shouldn't lose it when read everything but we may lose content?

ref: #19

@zhiburt
Copy link
Owner

zhiburt commented Jun 3, 2022

You can check out the new approach to Session::expect (the one we discussed)

Change your Cargo.toml to

expectrl = { git = "https://github.com/zhiburt/expectrl/", branch = "configuration-of-expect-logic" } 

@tzemanovic
Copy link
Contributor Author

You can check out the new approach to Session::expect (the one we discussed)

Change your Cargo.toml to

expectrl = { git = "https://github.com/zhiburt/expectrl/", branch = "configuration-of-expect-logic" } 

Thank you, this branch works fine for us too! (Just noticed couple things in 8040332 - s/gready/greedy and sync_session::Session::expect_gready has the rustdocs swapped with expect_lazy.

Imagine string "123" is ready to be read,
If we read 1 byte and match \d+ we will succeed and get "1".
If we read all what's available (with buffer limit exactly) we will get "123".

I think when the regex is \d+ that's expected behavior - in a test where you care about the whole number being matched, you should match till the end of line (e.g. \d+$) or up the next expected character, but I don't think reading byte-by-byte helps here as you'll still only get the partial match

@zhiburt
Copy link
Owner

zhiburt commented Jun 7, 2022

You're right 👍

fixed 4c43fe1

I think when the regex is \d+ that's expected behavior - in a test where you care about the whole number being matched, you should match till the end of line (e.g. \d+$) or up the next expected character, but I don't think reading byte-by-byte helps here as you'll still only get the partial match

It is; and you're right.
I just think that it may not be apparent for some people that's why the comment is there.

@zhiburt
Copy link
Owner

zhiburt commented Jun 7, 2022

If you don't mind I'll close this PR?

Thank you for your contribution.
And take care.

@zhiburt zhiburt closed this Jun 7, 2022
@tzemanovic
Copy link
Contributor Author

Thank you for your contribution. And take care.

Thank you, you too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants