Description:
Problem:
Currently, automating LLM interactions with the browser requires a DOM snapshot before each interaction with a page element.
For example, to fill two text fields, the process is as follows:
- Take a snapshot to retrieve the `ref` of the first text field.
- Fill the first text field using the retrieved `ref`.
- Take a new snapshot to retrieve the `ref` of the second text field.
- Fill the second text field using the retrieved `ref`.
Because this approach requires one snapshot per interaction, it adds significant overhead and considerably slows down automation.
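The per-field flow described above can be sketched as follows. This is only an illustrative model: `FakePage`, its `snapshot()` and `fill()` methods, and the `ref-<name>` format are hypothetical stand-ins, not the tool's actual API. The sketch simply counts how many snapshots the flow takes.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; not a real automation API."""

    fields: dict[str, str] = field(
        default_factory=lambda: {"first": "", "second": ""}
    )
    snapshot_count: int = 0

    def snapshot(self) -> dict[str, str]:
        # Count the call and return a made-up ref for every field on the page.
        self.snapshot_count += 1
        return {name: f"ref-{name}" for name in self.fields}

    def fill(self, ref: str, value: str) -> None:
        # Resolve the made-up ref back to its field and store the value.
        self.fields[ref.removeprefix("ref-")] = value


def fill_with_snapshot_per_field(page: FakePage, values: dict[str, str]) -> None:
    # Current flow: take a fresh snapshot before every single interaction.
    for name, value in values.items():
        refs = page.snapshot()
        page.fill(refs[name], value)


page = FakePage()
fill_with_snapshot_per_field(page, {"first": "Ada", "second": "Lovelace"})
print(page.snapshot_count)  # 2: one snapshot per field
```

In this model the number of snapshots grows linearly with the number of fields to fill.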
Proposed Optimization:
To optimize this process, it would be preferable to allow the use of a single snapshot for multiple interactions. Instead of retrieving a ref each time, the system should be able to:
- Take a single snapshot of the DOM.
- Retrieve all the necessary `ref`s from this single snapshot.
- Perform all the required interactions (e.g., filling multiple text fields) using the `ref`s from the single snapshot.
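A minimal sketch of the proposed flow, under the assumption that `ref`s from one snapshot stay usable across several interactions. `SnapshotPage`, its methods, and the `ref-<name>` format are hypothetical names invented for illustration, not part of the existing tool.

```python
class SnapshotPage:
    """Hypothetical page model; assumes refs remain valid after a fill."""

    def __init__(self, field_names: list[str]) -> None:
        self.fields = {name: "" for name in field_names}
        self.snapshot_count = 0

    def snapshot(self) -> dict[str, str]:
        # Count the call and return a made-up ref for every field on the page.
        self.snapshot_count += 1
        return {name: f"ref-{name}" for name in self.fields}

    def fill(self, ref: str, value: str) -> None:
        # Resolve the made-up ref back to its field and store the value.
        self.fields[ref.removeprefix("ref-")] = value


def fill_from_single_snapshot(page: SnapshotPage, values: dict[str, str]) -> None:
    # Proposed flow: one snapshot up front, then reuse its refs for every fill.
    refs = page.snapshot()
    for name, value in values.items():
        page.fill(refs[name], value)


form = SnapshotPage(["first", "second"])
fill_from_single_snapshot(form, {"first": "Ada", "second": "Lovelace"})
print(form.snapshot_count)  # 1: a single snapshot serves both fields
```

Here the snapshot count stays at one regardless of how many fields are filled, which is the saving the proposal is after.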
Expected Benefits:
- Significant reduction in workload: By taking a single snapshot instead of multiple ones, the number of necessary operations is considerably reduced.
- Improved execution speed: Fewer snapshots translate to faster execution of automation tasks.
- Simplified process: A single snapshot simplifies the logic and reduces code complexity.