Optimizing Snapshots for LLMs #343

@elieworkspace

Description:

Problem:

Currently, automating LLM interactions with the browser requires a DOM snapshot before each interaction with a page element.

For example, to fill two text fields, the process is as follows:

  1. Take a snapshot to retrieve the ref of the first text field.

  2. Fill the first text field using the retrieved ref.

  3. Take a new snapshot to retrieve the ref of the second text field.

  4. Fill the second text field using the retrieved ref.
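The four steps above can be sketched as follows. This is only an illustration: `FakePage`, its `snapshot` and `fill` methods, and the `ref-N` format are hypothetical stand-ins invented for this example, not the project's actual API. The sketch counts how many snapshots the current flow takes.

```python
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical in-memory stand-in for a browser page (not a real API)."""

    fields: dict[str, str] = field(default_factory=dict)  # label -> current value
    snapshot_count: int = 0

    def snapshot(self) -> dict[str, str]:
        """Return a label -> ref mapping; every call counts as one snapshot."""
        self.snapshot_count += 1
        return {label: f"ref-{i}" for i, label in enumerate(self.fields)}

    def fill(self, ref: str, value: str) -> None:
        """Fill the field that the given ref points to."""
        label = list(self.fields)[int(ref.split("-")[1])]
        self.fields[label] = value


def fill_with_snapshot_per_field(page: FakePage, values: dict[str, str]) -> None:
    """Current flow: take a fresh snapshot before every single fill."""
    for label, value in values.items():
        refs = page.snapshot()
        page.fill(refs[label], value)


page = FakePage(fields={"First name": "", "Last name": ""})
fill_with_snapshot_per_field(page, {"First name": "Ada", "Last name": "Lovelace"})
print(page.snapshot_count)  # 2: one snapshot per field
```

With two fields this takes two snapshots; in this model the count grows linearly with the number of interactions.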

Requiring a snapshot for each interaction adds significant overhead and considerably slows down automation.

Proposed Optimization:

To optimize this process, a single snapshot should be usable for multiple interactions. Instead of taking a new snapshot before each interaction, the system should be able to:

  1. Take a single snapshot of the DOM.

  2. Retrieve all the necessary refs from this single snapshot.

  3. Perform all the required interactions (e.g., filling multiple text fields) using the refs from the single snapshot.
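A minimal sketch of the proposed flow, under the same caveat: `FakePage`, `snapshot`, `fill`, and the `ref-N` format are hypothetical names made up for illustration and do not describe the project's real interface. It takes one snapshot, looks up every ref from it, then performs all fills.

```python
class FakePage:
    """Hypothetical in-memory stand-in for a browser page (not a real API)."""

    def __init__(self, labels: list[str]) -> None:
        self.labels = labels
        self.values = {label: "" for label in labels}
        self.snapshot_count = 0

    def snapshot(self) -> dict[str, str]:
        """Return a label -> ref mapping; every call counts as one snapshot."""
        self.snapshot_count += 1
        return {label: f"ref-{i}" for i, label in enumerate(self.labels)}

    def fill(self, ref: str, value: str) -> None:
        """Fill the field that the given ref points to."""
        index = int(ref.removeprefix("ref-"))
        self.values[self.labels[index]] = value


def fill_from_single_snapshot(page: FakePage, values: dict[str, str]) -> None:
    """Proposed flow: one snapshot, then every fill reuses its refs."""
    refs = page.snapshot()                               # step 1: single snapshot
    targets = {refs[label]: v for label, v in values.items()}  # step 2: all refs
    for ref, value in targets.items():                   # step 3: all interactions
        page.fill(ref, value)


page = FakePage(["First name", "Last name"])
fill_from_single_snapshot(page, {"First name": "Ada", "Last name": "Lovelace"})
print(page.snapshot_count)  # 1: a single snapshot served both fills
```

This sketch assumes refs stay valid across interactions. If an interaction re-renders the page and invalidates earlier refs, a fresh snapshot would presumably still be needed at that point.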

Expected Benefits:

  • Significant reduction in workload: Filling N fields takes one snapshot instead of N, so the total number of operations drops from 2N to N + 1.

  • Improved execution speed: Fewer snapshots translate to faster execution of automation tasks.

  • Simplified process: A single snapshot simplifies the logic and reduces code complexity.
