Description
The time limit option in the UI may not be working as intended. There are also several related open questions about time limits, described below.
Total Crawl Time Limit
- Ensure crawl time limit works as expected
Per-Page Behavior Time Limit
There is also the per-page behavior time limit, which controls how long a behavior should run on a page. This is most useful for sites that have custom behaviors, e.g. social media sites with potentially infinite scroll.
Should this be a separate 'behavior time limit' setting? For single-page crawls, users might be surprised to see multiple time limit settings.
Overall Per-Page Time Limit
There are also timeouts for how long to wait on any page, including network loading. The behavior time limit should be <= the total page time limit, but does it make sense to expose the page limit as a setting, or should it be set automatically?
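If the page limit were set automatically, one option is to derive it from the behavior time limit plus a fixed allowance for loading. The Python sketch below illustrates that idea only; `resolve_page_limits`, `PageLimits`, the parameter names, and the 90-second allowance are all assumptions for illustration, not existing Browsertrix settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PageLimits:
    behavior_timeout_s: int  # how long behaviors may run on a page
    page_timeout_s: int      # total time allowed on a page, including loading


def resolve_page_limits(
    behavior_timeout_s: int,
    page_timeout_s: Optional[int] = None,
    load_allowance_s: int = 90,  # assumed placeholder, not a real default
) -> PageLimits:
    """Return limits that satisfy behavior_timeout_s <= page_timeout_s."""
    if behavior_timeout_s < 0 or load_allowance_s < 0:
        raise ValueError("time limits must be non-negative")

    if page_timeout_s is None:
        # Page limit not exposed to the user: derive it automatically.
        return PageLimits(behavior_timeout_s, behavior_timeout_s + load_allowance_s)

    if page_timeout_s <= 0:
        raise ValueError("page time limit must be positive")

    # Page limit set explicitly: clamp the behavior limit so it never exceeds it.
    return PageLimits(min(behavior_timeout_s, page_timeout_s), page_timeout_s)
```

With this approach a single-page crawl would only need the behavior setting, since the page limit follows from it. An explicit page limit still wins when both are given.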
Use Cases
A lot of this depends on the type of crawl. For example, here are two distinct use cases where the behavior timeout might be configured lower or higher:
- For crawling a full site, the behavior time can be fairly low (probably just auto-scrolling), as there may be a lot of pages and the crawl should not spend too much time on each one.
- For crawling one or more social media feeds, there are probably only a few pages, or just one, but the crawl should take time to capture as much of each feed as possible.
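The two use cases above could be expressed as presets, as in this hypothetical Python sketch. The preset names and all of the second values are placeholders chosen for illustration; this issue does not specify any numbers.

```python
# Hypothetical presets; the values are illustrative, not Browsertrix defaults.
PRESETS = {
    # Many pages, short behaviors (e.g. just auto-scrolling).
    "full_site": {"behavior_timeout_s": 30, "page_timeout_s": 120},
    # Few pages, long-running behaviors on potentially infinite feeds.
    "social_feed": {"behavior_timeout_s": 1800, "page_timeout_s": 1900},
}


def limits_for(crawl_kind: str) -> dict:
    """Look up the per-page limits for a given kind of crawl."""
    try:
        preset = PRESETS[crawl_kind]
    except KeyError:
        raise ValueError(f"unknown crawl kind: {crawl_kind!r}") from None

    # The behavior limit should never exceed the overall page limit.
    if preset["behavior_timeout_s"] > preset["page_timeout_s"]:
        raise ValueError("behavior time limit exceeds page time limit")

    return dict(preset)  # return a copy so callers cannot mutate the preset
```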
Tasks
- Ensure total time limit set works #664
- Ensure that this time limit works as expected in Browsertrix Crawler, setting both the page and behavior limits
- Add default Page Time Limit value to frontend #497
- Return the default from /api/settings
- Move Page Time Limit to the first page #485