
Backend crawl log streaming API: Stream logs from WACZ files after crawl concludes #669

Closed · Tracked by #796

tw4l opened this issue Mar 3, 2023 · 6 comments · Fixed by #1225
Labels: back end (Requires back end dev work)

Comments

tw4l (Contributor) commented Mar 3, 2023

Sub-task for #796

Once a crawl is finished, the API endpoint should stream logs from all of the WACZ files created by the crawl.
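A minimal sketch of what that could look like, assuming each WACZ is available as a local ZIP and the crawler writes newline-delimited JSON logs under its `logs/` directory. The function name and layout here are illustrative, not the Browsertrix implementation:

```python
# A rough sketch, not the Browsertrix code: a WACZ is a ZIP archive, and
# crawler logs are assumed to live as JSON-lines files under logs/.
import json
import zipfile
from typing import Iterator


def stream_wacz_logs(wacz_paths: list[str]) -> Iterator[dict]:
    """Lazily yield parsed log lines from every WACZ, one line at a time."""
    for path in wacz_paths:
        with zipfile.ZipFile(path) as wacz:
            for name in wacz.namelist():
                if name.startswith("logs/") and name.endswith(".log"):
                    with wacz.open(name) as log_file:
                        for raw_line in log_file:  # ZipExtFile iterates lazily
                            line = raw_line.strip()
                            if line:
                                yield json.loads(line)
```

The API endpoint could then wrap such a generator in a streaming response so that no log file is ever fully buffered in memory.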

tw4l changed the title from "Stream logs from WACZ files after crawl concludes" to "Logging API: Stream logs from WACZ files after crawl concludes" Mar 3, 2023
tw4l changed the title from "Logging API: Stream logs from WACZ files after crawl concludes" to "Backend crawl log streaming API: Stream logs from WACZ files after crawl concludes" Mar 3, 2023
tw4l added the back end label Mar 3, 2023
tw4l self-assigned this Mar 3, 2023
tw4l (Contributor, Author) commented Apr 19, 2023

A first pass is implemented in #682.

We'll want to move to properly streaming the logs; that is currently blocked by aio-libs/aiobotocore#991.

ikreymer (Member) commented:

Until the aiobotocore issue is resolved, we may be able to use the sync download option, since we've already implemented that to support collection downloads via https://github.com/webrecorder/browsertrix-cloud/blob/main/backend/btrixcloud/storages.py#L358
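One possible shape of that workaround, as a sketch only (not the actual storages.py code): run the blocking boto3 download in a worker thread so the async event loop stays responsive. The bucket, key, and helper names are placeholders.

```python
# Sketch of the sync-download workaround: perform the blocking boto3 read in
# a worker thread so the event loop is not stalled while aiobotocore
# streaming remains unavailable.
import asyncio
from concurrent.futures import ThreadPoolExecutor

import boto3

executor = ThreadPoolExecutor(max_workers=4)
s3 = boto3.client("s3")


def _download_sync(bucket: str, key: str) -> bytes:
    """Blocking download of one object with the synchronous boto3 client."""
    return s3.get_object(Bucket=bucket, Key=key)["Body"].read()


async def download_wacz(bucket: str, key: str) -> bytes:
    """Offload the sync download so other requests keep being served."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, _download_sync, bucket, key)
```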

tw4l (Contributor, Author) commented Sep 15, 2023

Implemented as a sync stream in #1168. Closing for now, though we may eventually want to make this async.

tw4l closed this as completed Sep 15, 2023
tw4l reopened this Sep 20, 2023
tw4l (Contributor, Author) commented Sep 20, 2023

There still seems to be a memory issue; looking into it.

We could just fetch the files from presigned URLs instead.
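For reference, the presigned-URL route might look something like the sketch below. Everything here (bucket, key, helper name) is illustrative; the point is streaming the object in chunks rather than buffering it.

```python
# Illustrative sketch of the presigned-URL idea: have boto3 sign a GET URL,
# then stream the object over HTTPS in chunks so the backend never holds
# the whole file in memory at once.
import boto3
import requests


def stream_object_via_presigned_url(bucket: str, key: str, expires: int = 3600):
    s3 = boto3.client("s3")
    url = s3.generate_presigned_url(
        "get_object", Params={"Bucket": bucket, "Key": key}, ExpiresIn=expires
    )
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        # iter_content yields fixed-size chunks without buffering the body
        for chunk in resp.iter_content(chunk_size=64 * 1024):
            yield chunk
```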

tw4l (Contributor, Author) commented Sep 26, 2023

@Chickensoupwithrice In your court now if you want to try to figure this out :)

Chickensoupwithrice (Contributor) commented:

Alright, after much experimenting I've managed to nail down exactly where we stop working with generators and instead load up all of the logs. It's in the way we call stream_log_bytes_as_line_dicts: the function itself does return a generator, but we extend an array with the generator's output, which loads the entire log file into memory and makes us run out of memory.

Switching extend to append does keep generators all the way down, but then when I try to consume the resulting generator, I get read timeouts on the DO space?

[screenshot of the read timeout errors]

Still investigating.
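A toy illustration of the extend/append distinction described above; the generator body and file paths are stand-ins for the real stream_log_bytes_as_line_dicts, not the project's code.

```python
# Toy illustration of the bug: extend() eagerly drains the generator, while
# chaining the per-file generators keeps everything lazy.
import itertools
from typing import Iterator


def stream_log_bytes_as_line_dicts(path: str) -> Iterator[dict]:
    """Stand-in for the real generator: yield one parsed log line at a time."""
    with open(path, "rb") as f:
        for line in f:
            yield {"line": line}


def eager(paths: list[str]) -> list[dict]:
    lines: list[dict] = []
    for path in paths:
        # BUG: extend() consumes the whole generator, so every log line
        # from the file ends up in memory at once.
        lines.extend(stream_log_bytes_as_line_dicts(path))
    return lines


def lazy(paths: list[str]) -> Iterator[dict]:
    # Chain the per-file generators instead: only one line is materialized
    # at a time, no matter how large the logs are.
    return itertools.chain.from_iterable(
        stream_log_bytes_as_line_dicts(p) for p in paths
    )
```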
