Critical Errors Viewer #738

Closed
ikreymer opened this issue Apr 3, 2023 · 4 comments · Fixed by #757
Labels: ui/ux This issue requires UI/UX work

Comments

@ikreymer (Member) commented Apr 3, 2023

Separate from the full logging system (#631 and #330), we want to be able to quickly show carefully curated critical errors that the user should be made aware of, both while a crawl is running and after. This is a subset of the full, filterable list of logs that will come from every crawler.
The idea would be to:

  • Use Redis to store critical errors across all crawler instances in one place
  • Store only the first N and last N critical errors, to keep the list manageable no matter how long the crawl runs (see the sketch after this list)
  • Store the list in a separate errors.log file, which can then be streamed from the WACZ
  • Upon crawl completion/failure, store the list in mongodb so this error log is accessible even if no WACZ was produced
  • In browsertrix-crawler, either reuse logger.error and be more selective about what it's used for, or add a new logger.critical category that would add the error to the critical errors list
  • On the frontend, show this as the first step in a separate 'Logs' or 'Errors' tab, which could later become the more generic 'Logs' tab, with Critical Errors as one option, once the rest of the logging system is implemented
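
A minimal sketch of the first-N/last-N idea, written in Python against Redis; the key names, the head/tail split, and the limits are assumptions for illustration, not the actual crawler or backend code:

```python
import json
import redis

MAX_HEAD = 25  # first N errors kept
MAX_TAIL = 25  # last N errors kept

r = redis.Redis(decode_responses=True)

def add_critical_error(crawl_id: str, error: dict) -> None:
    """Record a critical error, keeping only the first/last N entries."""
    head_key = f"crawl:{crawl_id}:errors:head"
    tail_key = f"crawl:{crawl_id}:errors:tail"
    line = json.dumps(error)

    if r.llen(head_key) < MAX_HEAD:
        # Fill the "head" list until it reaches MAX_HEAD entries.
        r.rpush(head_key, line)
    else:
        # After that, append to the "tail" list and trim it so only the most
        # recent MAX_TAIL entries survive, however long the crawl runs.
        r.rpush(tail_key, line)
        r.ltrim(tail_key, -MAX_TAIL, -1)

def get_critical_errors(crawl_id: str) -> list[dict]:
    """Return the capped error list (first N followed by last N)."""
    head = r.lrange(f"crawl:{crawl_id}:errors:head", 0, -1)
    tail = r.lrange(f"crawl:{crawl_id}:errors:tail", 0, -1)
    return [json.loads(e) for e in head + tail]
```

Because both lists live in Redis, every crawler instance can push into the same capped structure without coordinating with the others.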
@ikreymer ikreymer self-assigned this Apr 3, 2023
@Shrinks99 Shrinks99 self-assigned this Apr 3, 2023
@Shrinks99 Shrinks99 added the ui/ux This issue requires UI/UX work label Apr 3, 2023
@ikreymer (Member, Author) commented Apr 3, 2023

> Store the list in a separate errors.log file, which can then be streamed from the WACZ.

Instead, we should store this list in mongodb, since it'll be of limited size. This way, even if the crawl failed or was canceled, users can still see the list of errors.

I'm thinking this error list should be at most 50 or 100 lines.
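
A rough sketch of saving that capped list into mongodb when the crawl finishes, assuming pymongo and a `crawls` collection; the database, collection, and field names and the 100-line cap are illustrative only:

```python
from pymongo import MongoClient

MAX_ERRORS = 100  # upper bound discussed above (50–100 lines)

crawls = MongoClient("mongodb://localhost:27017")["browsertrix"]["crawls"]

def persist_errors_on_finish(crawl_id: str, error_lines: list[str]) -> None:
    # Copy the capped error list onto the crawl document so it remains
    # visible even if the crawl failed or was canceled before producing a WACZ.
    crawls.update_one(
        {"_id": crawl_id},
        {"$set": {"errors": error_lines[:MAX_ERRORS]}},
    )
```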

@tw4l (Contributor) commented Apr 3, 2023

> Upon crawl completion/failure, store the list in mongodb so this error log is accessible even if no WACZ was produced

I wasn't sure if this was necessary, or if we could just be more careful about what is logged as an error in the crawler and highlight those in the UI (or add a new critical level to the crawler logs, as mentioned), but this is an important point. Until we have live-streaming logs, crawls that fail to produce a WACZ will be pretty opaque.

It seems reasonable to me that the initial Logs tab implementation could include both these critical errors and then the full logs from the WACZ if available, and I agree that up to a few dozen JSON lines per crawl seems reasonable to store in redis (short-term) and mongodb (long-term). I'm curious if this will impact how we want to think about live-streaming logs from running crawls, or if that solution may eliminate the need for some of what's scoped in this issue.

@Shrinks99 (Member) commented Apr 3, 2023

Would it perhaps be more helpful to store a list of the most common errors a workflow is encountering instead of a list of the most recent errors? If a workflow has 50 of error X and 45 of error Y, it is presumably a lot more useful to know both of these things than to see 50 error X messages (see the sketch below).
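
To illustrate that suggestion (this is not part of the proposal above), a grouped view could be derived from the same stored JSON log lines; the `message` field name is an assumption:

```python
import json
from collections import Counter

def most_common_errors(error_lines: list[str], top_n: int = 5) -> list[tuple[str, int]]:
    # Group stored JSON log lines by their message and return the top few,
    # e.g. [("Load timeout", 50), ("Page crashed", 45)].
    counts = Counter(json.loads(line).get("message", "unknown") for line in error_lines)
    return counts.most_common(top_n)
```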

As for the rest of what is proposed here, I don't see this feature as very useful once a proper full-featured log viewer exists alongside it. Giving users helpful information at a glance and directing them towards the log viewer if they require more details would be great, as would storing/caching the most common types of errors to display in places like the workflows page (in a tooltip) or the workflow details overview page. A less feature-rich log viewer is a reasonable first iteration if need be, but obviously we don't want to spend more time than necessary on things that will be reworked later as more features come online.

@ikreymer (Member, Author) commented Apr 5, 2023

From latest discussion:

  • We'll attempt to log all logger.error output into redis
  • When the crawl completes, we'll save the errors into mongodb
  • The /api/.../crawl/<id>/errors endpoint can be made available at all times, both while the crawl is running and after it is done or canceled (see the sketch after this list)
  • The logger UI can be built incrementally: first getting data only from the errors endpoint, and later the full logs with additional filtering
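
A hedged sketch of how such an endpoint might behave, reusing the `crawls` collection and `get_critical_errors` helper from the sketches above; the FastAPI route path, the state check, and the field names are assumptions, not the actual backend code:

```python
from fastapi import APIRouter

router = APIRouter()

@router.get("/api/crawls/{crawl_id}/errors")  # path prefix elided in the issue; illustrative only
def get_crawl_errors(crawl_id: str):
    # `crawls` (MongoDB collection) and `get_critical_errors` (Redis helper)
    # are the hypothetical objects defined in the earlier sketches.
    crawl = crawls.find_one({"_id": crawl_id})
    if crawl and crawl.get("state") == "running":
        # While the crawl is running, serve the capped list straight from Redis.
        return {"errors": get_critical_errors(crawl_id)}
    # Once the crawl is done, canceled, or failed, serve the copy saved in MongoDB.
    return {"errors": (crawl or {}).get("errors", [])}
```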
