session: add `expect_eager` method #35

tzemanovic · 2022-05-13T17:32:16Z

hi, we're trying to migrate from rexpect to this crate in https://github.com/anoma/anoma/pull/1095. It was pretty smooth, thank you for this crate! We just hit couple road-blocks - one of them is that the lazy approach in Session::expect is too slow for processes with many lines of output. We get around it by adding this expect_eager variation.

zhiburt

Hi @tzemanovic

Thank you for opening the PR, and reporting the issue.

I consider that rexpect version may be faster because it uses an additional thread.
expectrl doesn't do it, we try to read from a output when necessary.

I guess the approach with a thread may make some sense.
We could create some util function for this I guess. What do you think?

Maybe Session::check help you?

zhiburt · 2022-05-14T09:09:36Z

src/session/async_session.rs

+            } else if !data.is_empty() {
+                let data_len = data.len();
+                self.stream.consume_from_buffer(data_len);
+                continue;


Is this what you wan't?

I mean potentially a mach will never be found because we drop data here.

Here's an example.

The output Hello World.
We do expect_eager("Hello World")
First read_available reads only Hello,
Second read_available reads World.
Because you drop the buffer the match will never found even though it was there.

ah, good point! I guess we should never drain the buffer for this approach

I was thinking, maybe we could still drain the unmatched buffer parts up to line breaks that would keep the buffer from growing until the match is found, but only if the Needle itself doesn't contain a line break

tzemanovic · 2022-06-01T10:14:58Z

hi @zhiburt, thanks for checking and sorry about the delay! I might have misunderstood, but from what I saw in expectrl, the slowdown was caused by reading 1 byte at time and trying to match every time, whereas rexpect reads whatever is available at once

zhiburt · 2022-06-01T12:39:32Z

I saw in expectrl, the slowdown was caused by reading 1 byte at time and trying to match every time, whereas rexpect reads whatever is available at once

I think it is true.

As far as I remember the main reason is.

expectrl/src/session/sync_session.rs

Lines 89 to 93 in 4025cce

    
           // We read by byte to make things as lazy as possible. 
        
           // 
        
           // It's chose is important in using Regex as a Needle. 
        
           // Imagine we have a `\d+` regex. 
        
           // Using such buffer will match string `2` imidiately eventhough right after might be other digit.

Could you try to call check instead of expect?
It's semantics a bit different but it reads not by byte.

PS: Maybe we could configure a read logic, whether do ready by byte or not. what do you think?

tzemanovic · 2022-06-02T13:53:00Z

I saw in expectrl, the slowdown was caused by reading 1 byte at time and trying to match every time, whereas rexpect reads whatever is available at once

I think it is true.

As far as I remember the main reason is.

expectrl/src/session/sync_session.rs

Lines 89 to 93 in 4025cce

// We read by byte to make things as lazy as possible.

//

// It's chose is important in using Regex as a Needle.

// Imagine we have a `\d+` regex.

// Using such buffer will match string `2` imidiately eventhough right after might be other digit.

Could you try to call check instead of expect? It's semantics a bit different but it reads not by byte.

PS: Maybe we could configure a read logic, whether do ready by byte or not. what do you think?

Unfortunately, check doesn't work, because the output that's being matched is not available immediately. Having a way to configure read logic would be nice and it would probably solve the issue for us.

I'm not sure if I follow the reasoning for reading byte-by-byte:

with the Regex needle, it doesn't seem like byte-by-byte reading solves that as 2 would still be matched immediately before the other digit
what is the scenario in which the EoF indication get lost? We didn't run into this issue with expect_eager

zhiburt · 2022-06-03T12:32:04Z

Unfortunately, check doesn't work, because the output that's being matched is not available immediately. Having a way to configure read logic would be nice and it would probably solve the issue for us.

I'll take a look at it.

I'm not sure if I follow the reasoning for reading byte-by-byte:

with the Regex needle, it doesn't seem like byte-by-byte reading solves that as 2 would still be matched immediately before the other digit

Imagine string "123" is ready to be read,
If we read 1 byte and match \d+ we will succeed and get "1".
If we read all what's available (with buffer limit exactly) we will get "123".

\d+ matches more then 1 digit prior more. (As far as I remember)

what is the scenario in which the EoF indication get lost? We didn't run into this issue with expect_eager

I don't remember clearly the case.

But as I do it's more not about EOF (comment is outdated probably).
But because the behavior of some platforms are different in case when the process dies.

On linux we can read something even when the process is gone.
On mac/bsd we don't.

From what I remember we actually should not be allowed to do it on Linux either but we do.

So imagine you do echo "Hello World".
You process is immediately is gone.

So it's like a mechanism to read everything possible whenever it's possible.

Maybe wrong in my comments here.

The point is this comment about EOF is probably outdated.
We shouldn't lose it when read everything but we may lose content?

ref: #19

zhiburt · 2022-06-03T22:19:52Z

You can check out the new approach to Session::expect (the one we discussed)

Change your Cargo.toml to

expectrl = { git = "https://github.com/zhiburt/expectrl/", branch = "configuration-of-expect-logic" }

tzemanovic · 2022-06-07T10:38:49Z

You can check out the new approach to Session::expect (the one we discussed)

Change your Cargo.toml to
expectrl = { git = "https://github.com/zhiburt/expectrl/", branch = "configuration-of-expect-logic" } 

Thank you, this branch works fine for us too! (Just noticed couple things in 8040332 - s/gready/greedy and sync_session::Session::expect_gready has the rustdocs swapped with expect_lazy.

Imagine string "123" is ready to be read,
If we read 1 byte and match \d+ we will succeed and get "1".
If we read all what's available (with buffer limit exactly) we will get "123".

I think when the regex is \d+ that's expected behavior - in a test where you care about the whole number being matched, you should match till the end of line (e.g. \d+$) or up the next expected character, but I don't think reading byte-by-byte helps here as you'll still only get the partial match

zhiburt · 2022-06-07T21:15:51Z

You're right 👍

fixed 4c43fe1

I think when the regex is \d+ that's expected behavior - in a test where you care about the whole number being matched, you should match till the end of line (e.g. \d+$) or up the next expected character, but I don't think reading byte-by-byte helps here as you'll still only get the partial match

It is; and you're right.
I just think that it may not be apparent for some people that's why the comment is there.

zhiburt · 2022-06-07T21:17:04Z

If you don't mind I'll close this PR?

Thank you for your contribution.
And take care.

tzemanovic · 2022-06-08T12:04:22Z

Thank you for your contribution. And take care.

Thank you, you too!

session: add expect_eager method

91c135a

tzemanovic mentioned this pull request May 13, 2022

session: allow to borrow the stream mutably #36

Closed

zhiburt reviewed May 14, 2022

View reviewed changes

zhiburt closed this Jun 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

session: add `expect_eager` method #35

session: add `expect_eager` method #35

tzemanovic commented May 13, 2022

zhiburt left a comment

zhiburt May 14, 2022

tzemanovic Jun 1, 2022

tzemanovic Jun 2, 2022

tzemanovic commented Jun 1, 2022

zhiburt commented Jun 1, 2022 •

edited

tzemanovic commented Jun 2, 2022

zhiburt commented Jun 3, 2022 •

edited

zhiburt commented Jun 3, 2022 •

edited

tzemanovic commented Jun 7, 2022

zhiburt commented Jun 7, 2022 •

edited

zhiburt commented Jun 7, 2022 •

edited

tzemanovic commented Jun 8, 2022

session: add expect_eager method #35

session: add expect_eager method #35

Conversation

tzemanovic commented May 13, 2022

zhiburt left a comment

Choose a reason for hiding this comment

zhiburt May 14, 2022

Choose a reason for hiding this comment

tzemanovic Jun 1, 2022

Choose a reason for hiding this comment

tzemanovic Jun 2, 2022

Choose a reason for hiding this comment

tzemanovic commented Jun 1, 2022

zhiburt commented Jun 1, 2022 • edited

tzemanovic commented Jun 2, 2022

zhiburt commented Jun 3, 2022 • edited

zhiburt commented Jun 3, 2022 • edited

tzemanovic commented Jun 7, 2022

zhiburt commented Jun 7, 2022 • edited

zhiburt commented Jun 7, 2022 • edited

tzemanovic commented Jun 8, 2022

session: add `expect_eager` method #35

session: add `expect_eager` method #35

zhiburt commented Jun 1, 2022 •

edited

zhiburt commented Jun 3, 2022 •

edited

zhiburt commented Jun 3, 2022 •

edited

zhiburt commented Jun 7, 2022 •

edited

zhiburt commented Jun 7, 2022 •

edited