Resolve Time Limit Related Issues (Total vs Page vs Behavior Time Limits) #321

@ikreymer

The time limit option in the UI may not be working as intended. There are also several related open questions about time limits.

Total Crawl Time Limit

  • Ensure crawl time limit works as expected

Per-Page Behavior Time Limit

There is also the per-page behavior time limit, which controls how long a behavior should run on each page. This is most useful for sites that have custom behaviors, e.g. social media sites with potentially infinite scroll.

Should this be a separate 'behavior time limit' setting? For single-page crawls, though, users might be surprised to see multiple time limit settings.

Overall Per-Page Time Limit

There is also a timeout for how long to wait on any page overall, including network loading. The behavior time limit should be <= the total page time limit, but does it make sense to expose this setting, or should it be set automatically?
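
One way to read the "set automatically" option is to derive the page time limit from the behavior time limit plus an allowance for page loading, and to reject an explicit page limit that is lower than the behavior limit. A minimal sketch in Python, where `resolve_page_limit`, its parameters, and the 90-second load allowance are all hypothetical names and values chosen for illustration, not existing settings:

```python
from typing import Optional


def resolve_page_limit(
    behavior_limit: int,
    page_limit: Optional[int] = None,
    load_allowance: int = 90,
) -> int:
    """Return a per-page time limit (seconds) that is never below the behavior limit.

    All names and the default allowance are assumptions made for this sketch.
    """
    if behavior_limit < 0 or load_allowance < 0:
        raise ValueError("time limits must be non-negative")

    if page_limit is None:
        # Page limit not exposed to the user: derive it automatically.
        return behavior_limit + load_allowance

    if page_limit < behavior_limit:
        # Page limit exposed but inconsistent: behavior limit must be <= page limit.
        raise ValueError(
            f"behavior limit ({behavior_limit}s) exceeds page limit ({page_limit}s)"
        )
    return page_limit
```

With this approach a user would only ever set the behavior time limit, and the constraint would hold by construction unless they explicitly override the page limit.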

Use Cases

Much of this depends on the type of crawl. For example, here are two distinct use cases where the behavior time limit might be configured lower or higher:

  • For crawling a full site, the behavior time limit can be fairly low (probably just enough for auto-scrolling): there may be a lot of pages, and the crawl should not spend too much time on each one.
  • For crawling one or more social media feeds, there are probably only a few pages, or just one, but the crawl should take its time to capture as much of each feed as possible.
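
To make the trade-off between these two cases concrete, here is a rough sketch of how the behavior time limit bounds the total time spent running behaviors. The page counts and limits are made-up illustrative numbers, and `max_behavior_seconds` is a hypothetical helper, not part of any existing API:

```python
def max_behavior_seconds(num_pages: int, behavior_limit: int) -> int:
    """Upper bound on total time spent running behaviors, assuming pages are
    crawled one at a time and every page uses its full behavior limit."""
    return num_pages * behavior_limit


# Full-site crawl: many pages, short behavior limit (illustrative values).
full_site = max_behavior_seconds(num_pages=2000, behavior_limit=30)

# Social media feed: one page, long behavior limit (illustrative values).
social_feed = max_behavior_seconds(num_pages=1, behavior_limit=1800)

print(full_site)    # 60000 seconds, roughly 16.7 hours
print(social_feed)  # 1800 seconds, 30 minutes
```

Even a short per-page limit adds up across a large site, while a single feed can afford a much longer one.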

Labels

  • back end: Requires back end dev work
  • feature design: This issue tracks smaller sub issues that compose a feature

Status

Done!
