Wait in click() #413

jonthegeek · 2024-07-11T13:32:02Z

Closes #405.

I tried to make this automatic, and got it working some of the time, but I couldn't find a way to detect that something might change, and thus wait in that situation. I tried to make it as clear as possible for users to be able to fix this. Once a strategy is agreed on, the same strategy should probably be applied to other methods.

An approach I worked on but failed to finalize would be to either wait for a loadEventFired() event or time out (probably after a relatively brief time, and only if no events occur in that window, or something along those lines). I couldn't find a clean way to check if anything is in progress and thus wait if so; if we're waiting for a loadEventFired that never fires, we'll introduced a new error for many click events.

I also tried adding private$refresh_root() as a callback on loadEventFired(), which might be a useful approach but it still doesn't solve the issue of users not waiting for that to occur before executing their next step.

I like this better than status quo (since it makes it possible to deal with these situations), but I'm not sure that it's quite there yet.

You work at posit now, not rstudio.

Closes tidyverse#405. I tried to make this automatic, and got it working some of the time, but I couldn't find a way to detect that something might change, and thus wait in that situation. I tried to make it as clear as possible for users to be able to fix this. Once a strategy is agreed on, the same strategy should probably be applied to other methods.

hadley · 2024-07-15T14:55:06Z

R/live.R

+            if (grepl("-32000", cnd_message(cnd))) {
+              cli::cli_abort(
+                c(
+                  "Can't find root node.",


I think I'd rather make refresh_root() public and then suggest that the user call it. But I'd also want to be 100% sure we shouldn't just call refresh_root() on their behalf.

The problem is it might not be ready yet. I feel like there has to be a way to essentially wait for the loadEventFired only if the page is navigating or reloading, but I couldn't get there.

Maybe we should always be waiting on the loadEventFired so we can update the root node.

I did put refresh_root() into loadEventFired as a callback at one point, and that's probably worth re-implementing. The problem occurs when the user tries to do something that results in a find_nodes() after a click, but before everything has processed. If the click results in a loadEventFired() and we don't wait before find_nodes(), the root will be wrong (which triggered a thought, see below). But if we do wait when they click somewhere that doesn't navigate/reload, the loadEventFired() never occurs and times out, so we make them wait for nothing (and we throw a confusing warning or error).

But... something knows that the page is in an invalid state, because there's an error when you try to find_nodes() too soon. So in theory at THAT point waiting on loadEventFired() should work. I feel like that's getting into the issues that are discussed in https://rstudio.github.io/chromote/#loading-a-page-reliably though, so that might not solve it.

Perhaps create the promise right away on click (and anything else that might change the page), but only wait for it to resolve when there's an indication that it hasn't resolved? Ie, inside click we'd do something like this:

private$load_promise <- self$session$Page$loadEventFired(wait_ = FALSE)

And then in find_nodes() we could wait for that promise to resolve and try again (inside this try_fetch(), probably). Async programming still always feels weird to me, so I can't decide if this is hacky, correct, or somewhere in-between.

We might also have to capture the timeout message if the promise never resolves, to avoid random confusing warnings showing up for the user.

Thanks for the detailed explanation! I think that approach sounds like a way forward, and then I can double check it with Joe/Winston.

It's closer now, I think, but it fails ungracefully for "normal" clicks, and I haven't yet found a way to clean this up. To see what I mean, run the "can click a button" test in test-live.R a few times. Or, more specifically:

sess <- read_html_live(html_test_path("click")) sess$click("button") Sys.sleep(10) rm(sess)

I can see where that happens in {chromote}, and it makes sense that it does so, but the only way I can think of to avoid the error message would be to set the timeout to Inf, and that just means we could stack up promises that never resolve. Or maybe line 149 of live.R should do something fancier when it sets up private$loadEvent_promise, to capture (and silently discard) the promise if it times out.

jonthegeek · 2025-07-04T16:09:25Z

Another summer trip = another look at scraping fancier websites! I think I understand promises much better now than I did a year ago, so I'm seeing if I can do this a bit more "right". The thing that led me back was a page that never fires its load event, so there might end up being a second wave of updates after the first wave (mostly to export more to make it more possible for users to deal with advanced topics).

If you already have a separate effort to make LiveHTML behave better, let me know, and I'll hold off.

Update a snapshot that I shouldn't have changed. I'm assuming this is from an underlying change on the page, but we can revert this if I broke GHA by accepting it.

I used httr2 as an example of current best practice.

jonthegeek · 2025-07-06T01:18:52Z

This is getting closer to a "best of both worlds" solution. Users can set up their own promise to wait for, or use the loadEventFired when that's relevant. It should work by default in most cases, but is fixable (and gives tips) if the page is slow or anything like that.

I can definitely see ways to make this more robust (eg, waiting for an actual timeout in the try_fetch() calls and implementing an async option throughout), but this is much more solid than the status quo.

I implemented the same fix in all the "event" methods, so users can set up promises to wait for if the thing they're doing results in a navigation. If a user hits the error message that led to this PR (which should be rarer now to begin with), it now tells them how to fix the issue.

Once this is sorted out, I could write a vignette showing some of the trickier options. I have a TidyTuesday in mind that will build off of this (and use a custom promise). I'll see if I can get it working with this and sort out a vignette if so.

jonthegeek · 2025-07-06T16:06:14Z

I think I just figured out a simpler API (no need for wait_for args, but options for more explicit async for advanced users) that will also be more robust. I'm on the road, but will update the PR when I stop.

jonthegeek · 2025-07-07T19:16:54Z

@hadley is like to add a wait argument to this method (and the other "action" methods) as part of this update, to allow advanced users to return a promise rather than always waiting for a result. If I do that, is wait OK (rather than the underlying wait_ from chromote)? The underscore makes it scary and weird 🙃

My current plan is to add a done arg (in place of the wait_for arg in the most recent commit), to set up a promise that signifies that this action has resolved. That would default to "auto", which will briefly wait for a documentUpdated event and assume navigation on the current page if none occurs. That should work in most cases, but I plan to also accept "load" to explicitly wait for loadEventFired, and then use an updated wait_for_selector for any other strings. We could also accept numbers for a simple delay. We'd also accept a promise object in case the user wants to get more fancy.

That would fix #405 + work for my next case (a page with a sort of "infinite scroll" that causes it to never fire the load event) + I think work for any other arbitrarily weir page... And those are precisely the sorts of pages that require this LiveHTML stuff in the first place.

I have most of the code theoretically worked out in conversations with Google Gemini during a road trip, I just need to sit at my computer and stitch snatches of ideas together and clean out the hallucinations I've presumably missed, etc. I'll probably get some of it done tonight but I'm not sure whether I'll finish).

Let me know what you think!

hadley · 2025-07-07T19:27:02Z

Given that this would change the return type, I think I'd prefer (e.g.) $click_async() like ellmer does.

hadley · 2025-07-07T19:30:30Z

And I've asked @shikokuchuo to take a look general; he is way better at async than me 😄.

jonthegeek · 2025-07-07T19:39:01Z

Don't review the code as it is now in any case. I'm a lot better at async than I was 2 days ago, at least it sure seems to make more sense now!

Alternate methods for async makes sense to me! I'll take a closer look at ellmer before I implement these changes, too!

jonthegeek · 2025-07-07T21:31:00Z

OK, I don't think my implementation will be quite like ellmer under the hood, but I can definitely split the methods that way. I'll almost definitely call the async versions from the normal versions, then self$session$wait_for() the resulting p (then return invisible(self) in most cases, I think; whatever it's doing now). That will make sure they don't drift apart in functionality. The biggest change will be around smarter ways to figure out when the action has completed, which I'm PRETTY sure I have all ready to go!

shikokuchuo · 2025-07-08T09:46:07Z

I agree, splitting the methods for the different return types is the way to go.

I much prefer being explicit (even if it requires the user to know a bit more) rather than implement heuristics, as timing in this context is just so uncontrollable. Something that generally 'just works' and then breaks on the 'n'th try is to be avoided in my view. But sounds good that you have something in mind for getting the event completion.

Don't review the code as it is now in any case.

Just ping me once you're ready (no time pressure!)

jonthegeek · 2025-07-08T11:58:57Z

I much prefer being explicit (even if it requires the user to know a bit more) rather than implement heuristics, as timing in this context is just so uncontrollable. Something that generally 'just works' and then breaks on the 'n'th try is to be avoided in my view. But sounds good that you have something in mind for getting the event completion.

I generally agree, but I'm trying to make it less likely to throw an error, and to make it throw something informative when it has to throw an error. The status quo throws an ugly, confusing, unfixable error any time an action causes navigation to a new page, and my initial fix throws either a warning or an error (I'd think an error but it doesn't halt anything so I can't remember for sure) long after the event (when the chromote session ends), about a timeout that didn't matter.

If the user is working interactively, these errors and warnings often don't matter anyway (the user can view() and continue when they see the page load, and just waiting a second to type the next bit of code will likely give the browser time to get things ready even without a view()), and I want the warnings and errors to be informative in cases where they do matter.

So the thing I plan to submit later does include a bit of a heuristic that I expect to "just work" in almost all cases (I'll do some testing to validate that, but obviously I can't test every scenario / hardware config / internet connection). To me it feels like rvest's LiveHTML is aiming to be the "usually just works" solution for people who haven't dug deep into chromote and aren't comfortable with promises. But then when it fails it will explain why, and give the user options to fix their code without having to learn all of the intricacies of chromote and chrome dev tools in most cases. I poured over that documentation and poked and prodded at things, to figure out what users probably want. And then, of course, if their situation is stranger, we'll give them the ability to implement a custom solution while keeping rvest's friendly wrapper.

That is, assuming it actually works how I think it will! I still haven't had a chance yet to sit down and actually type and test the latest iteration of what I think should work, but I'll get back to Austin sometime today and take some time to work things out!

… way things work (other than the fact that I added some pipeable wrappers for the methods), but it made it much easier for me to reason about. The next update will pretty much completely change the way waiting works, but it should be much easier to see it at that point!

jonthegeek added 4 commits July 9, 2024 08:46

Update snapshot for session.

e223d9d

You work at posit now, not rstudio.

Document.

d945007

Add NEWS bullet.

6602675

hadley reviewed Jul 15, 2024

View reviewed changes

More-automatic handling of clicks that navigate.

bcff212

sebastiansteffen mentioned this pull request Oct 9, 2024

Failed to deserialize params.nodeId when attempting to click a button #425

Closed

ucb mentioned this pull request Mar 14, 2025

LiveHTML object corrupted after $click() #405

Open

Merge commit '78b3e1ff38cd55e217ca61ac1fe0cbf3b33b5b75'

c13d0b5

jonthegeek added 3 commits July 4, 2025 12:22

Update docs.

730e73c

Update a snapshot that I shouldn't have changed. I'm assuming this is from an underlying change on the page, but we can revert this if I broke GHA by accepting it.

Update GHA.

0729f56

I used httr2 as an example of current best practice.

Better promise handling.

0e61020

jonthegeek marked this pull request as draft July 6, 2025 16:08

jonthegeek added 2 commits July 9, 2025 22:00

Use local maps.

17ca01a

Wait in click() #413

Are you sure you want to change the base?

Wait in click() #413

Uh oh!

Conversation

jonthegeek commented Jul 11, 2024

Uh oh!

hadley Jul 15, 2024

Choose a reason for hiding this comment

Uh oh!

jonthegeek Jul 15, 2024

Choose a reason for hiding this comment

Uh oh!

hadley Jul 15, 2024

Choose a reason for hiding this comment

Uh oh!

jonthegeek Jul 15, 2024

Choose a reason for hiding this comment

Uh oh!

hadley Jul 15, 2024

Choose a reason for hiding this comment

Uh oh!

jonthegeek Jul 16, 2024

Choose a reason for hiding this comment

Uh oh!

jonthegeek commented Jul 4, 2025

Uh oh!

jonthegeek commented Jul 6, 2025

Uh oh!

jonthegeek commented Jul 6, 2025

Uh oh!

jonthegeek commented Jul 7, 2025

Uh oh!

hadley commented Jul 7, 2025

Uh oh!

hadley commented Jul 7, 2025

Uh oh!

jonthegeek commented Jul 7, 2025

Uh oh!

jonthegeek commented Jul 7, 2025

Uh oh!

shikokuchuo commented Jul 8, 2025

Uh oh!

jonthegeek commented Jul 8, 2025

Uh oh!

Uh oh!