Backend crawl log streaming API: Stream logs from WACZ files after crawl concludes #669
First pass is implemented in #682. We'll want to move to properly streaming logs, currently blocked by aio-libs/aiobotocore#991
Until the aiobotocore issue is resolved, we may be able to use the sync download option, since we've already implemented this to support collection downloads via https://github.com/webrecorder/browsertrix-cloud/blob/main/backend/btrixcloud/storages.py#L358
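As a rough sketch of the workaround being proposed: the blocking (sync) download can be pushed onto a worker thread so the async event loop stays responsive, with the body then re-exposed as an async generator. The `sync_download` function below is a stand-in for the real synchronous S3 download in `storages.py`, not the actual implementation:

```python
import asyncio
import io

def sync_download(key: str) -> bytes:
    # Stand-in for the synchronous S3 download already used for
    # collection downloads; the real code streams from object storage.
    return b"log line 1\nlog line 2\n"

async def stream_sync_download(key: str, chunk_size: int = 8192):
    # Run the blocking download in a worker thread so the event loop
    # is not blocked, then yield the body in chunks.
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(None, sync_download, key)
    buf = io.BytesIO(data)
    while chunk := buf.read(chunk_size):
        yield chunk

async def collect(key: str) -> bytes:
    return b"".join([c async for c in stream_sync_download(key)])

print(asyncio.run(collect("crawl.wacz")))
```

Note this stand-in still buffers the whole body before chunking, which is exactly the trade-off of the sync path versus true async streaming.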
Implemented as a sync stream in #1168. Closing for now, though we may eventually want to make this async.
There still seems to be a memory issue; looking into it. We could just fetch from presigned URLs.
@Chickensoupwithrice In your court now if you want to try to figure this out :)
Alright, after much experimenting I've managed to nail down exactly where we're no longer using generators and are instead loading up all the logs. It's in the way we're calling … Switching … Still investigating.
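To illustrate the distinction the comment above is chasing: WACZ files are zip archives, and log members can be read lazily, one line at a time, instead of materializing every log in memory. A minimal sketch with an in-memory stand-in archive (the `logs/` layout and file names here are illustrative, not the exact WACZ structure):

```python
import io
import zipfile

# Build a small in-memory WACZ-like zip with two log members
# (stand-in data; real crawl logs are JSON lines).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("logs/crawl-1.log", '{"msg": "a"}\n{"msg": "b"}\n')
    zf.writestr("logs/crawl-2.log", '{"msg": "c"}\n')

def iter_log_lines(wacz_file):
    # Generator: zf.open() returns a lazily-read file object, so only
    # one line is held in memory at a time. Calling zf.read(name)
    # instead would load each whole log file, which is the failure
    # mode described above.
    with zipfile.ZipFile(wacz_file) as zf:
        for name in zf.namelist():
            if name.startswith("logs/") and name.endswith(".log"):
                with zf.open(name) as fh:
                    for line in io.TextIOWrapper(fh, encoding="utf-8"):
                        yield line.rstrip("\n")

lines = list(iter_log_lines(buf))
```

The key property is that `iter_log_lines` stays a generator end to end; wrapping it in anything that eagerly consumes it (e.g. a list or a full read) reintroduces the memory spike.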
Sub-task for #796
Once a crawl is finished, the API endpoint should stream logs from all of the WACZ files created by the crawl.
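One way the multi-file requirement could be met lazily is to merge per-WACZ log streams into a single chronological stream and hand that generator to a streaming HTTP response (e.g. FastAPI's `StreamingResponse`). This sketch assumes each file's log lines are already time-ordered and carry a `timestamp` field; the data and names are illustrative:

```python
import heapq
import json

# Stand-in log lines from two WACZ files of the same crawl.
wacz1 = ['{"timestamp": "2023-01-01T00:00:01Z", "message": "a"}',
         '{"timestamp": "2023-01-01T00:00:03Z", "message": "c"}']
wacz2 = ['{"timestamp": "2023-01-01T00:00:02Z", "message": "b"}']

def merged_log_stream(*line_iters):
    # Lazily merge already-sorted per-file streams into one
    # chronological stream; heapq.merge never materializes the
    # inputs, so this stays a generator suitable for a chunked
    # response body.
    def ts(line):
        return json.loads(line)["timestamp"]
    for line in heapq.merge(*line_iters, key=ts):
        yield line + "\n"

body = "".join(merged_log_stream(iter(wacz1), iter(wacz2)))
```

If the per-file streams are not guaranteed to be sorted, `heapq.merge` no longer applies and a full sort (and therefore full buffering) would be needed instead.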