Regression fix: Update failed crawl database object after deleting files #2993

tw4l · 2025-11-17T21:50:13Z

After deleting files (e.g. WACZs uploaded while a crawl was paused) for canceled or otherwise failed crawls, ensure we also update the crawl database object.

This fixes a regression introduced by crawl pausing: the crawl's files were deleted from storage but not from the database at the same time, so org storage numbers became incorrect when the canceled crawl was later deleted.

It also renames the basecrawls delete_crawl_files method to delete_failed_crawl_files to make its purpose clearer: it is only used by the operator and should only be used for failed crawls. Deleting successful crawls involves other workflow- and org-related updates that are handled by other codepaths.
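The accounting problem can be illustrated with a small in-memory model. This is only a sketch, not the Browsertrix code: the `Crawl` and `Org` classes, the `add_file`, `delete_crawl`, and `simulate` helpers, and the `update_crawl` flag are hypothetical, and it assumes that org counters are decremented both when files are removed from storage and again, by the crawl's recorded `fileSize`, when the crawl itself is deleted.

```python
from dataclasses import dataclass, field


@dataclass
class Crawl:
    """Hypothetical stand-in for the crawl database object."""
    files: list = field(default_factory=list)  # (filename, size) pairs
    fileSize: int = 0
    fileCount: int = 0


@dataclass
class Org:
    """Hypothetical stand-in for the org's storage counters."""
    bytesStored: int = 0
    bytesStoredCrawls: int = 0


def add_file(org, crawl, name, size):
    """Record an uploaded file on both the crawl and the org."""
    crawl.files.append((name, size))
    crawl.fileSize += size
    crawl.fileCount += 1
    org.bytesStored += size
    org.bytesStoredCrawls += size


def delete_failed_crawl_files(org, crawl, update_crawl=True):
    """Remove a failed crawl's files; update_crawl=False mimics the regression."""
    removed = sum(size for _, size in crawl.files)
    org.bytesStored -= removed
    org.bytesStoredCrawls -= removed
    if update_crawl:
        # The fix: reset the crawl object together with the storage deletion.
        crawl.files = []
        crawl.fileSize = 0
        crawl.fileCount = 0


def delete_crawl(org, crawl):
    """Deleting the crawl subtracts whatever size the crawl object still records."""
    org.bytesStored -= crawl.fileSize
    org.bytesStoredCrawls -= crawl.fileSize


def simulate(update_crawl):
    """Upload one WACZ, cancel the crawl, then delete it; return org.bytesStored."""
    org, crawl = Org(), Crawl()
    add_file(org, crawl, "paused.wacz", 1000)
    delete_failed_crawl_files(org, crawl, update_crawl=update_crawl)
    delete_crawl(org, crawl)
    return org.bytesStored
```

Under these assumptions, `simulate(True)` leaves the org at 0 bytes, while `simulate(False)` subtracts the same 1000 bytes twice and ends up negative.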

Testing

1. Spin up a local instance
2. Run a crawl
3. Pause the crawl
4. Cancel the crawl while it's paused
5. Verify the crawl's files, fileSize, and fileCount are reset in the database, in addition to the crawl files having been deleted from the configured S3 storage
6. Delete the canceled crawl from the workflow crawl list
7. Verify the org's bytesStored and bytesStoredCrawls are now 0 and not negative as before
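The two verification steps could be expressed as checks over the documents once they have been fetched from the database. This is a hedged sketch: the function names are hypothetical, the documents are assumed to be plain dicts, and the expectation that a reset `files` field is an empty list is an assumption. Only the field names come from the steps above.

```python
def canceled_crawl_is_reset(crawl_doc):
    """True if the canceled crawl no longer records any files (assumes files == [])."""
    return (
        crawl_doc.get("files") == []
        and crawl_doc.get("fileSize") == 0
        and crawl_doc.get("fileCount") == 0
    )


def org_storage_is_zero(org_doc):
    """True if the org counters are exactly 0, not negative as in the regression."""
    return org_doc.get("bytesStored") == 0 and org_doc.get("bytesStoredCrawls") == 0
```

For example, `org_storage_is_zero({"bytesStored": -1000, "bytesStoredCrawls": -1000})` is false, which is the state the regression produced.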

After deleting files (e.g. WACZs uploaded while a crawl was paused) for canceled or otherwise failed crawls, ensure we also update the crawl database object. This fixes a regression introduced by crawl pausing, which resulted in org storage numbers being incorrect when later deleting the canceled crawl, because its files were not removed from the database at the same time as they were deleted from storage. It also renames the basecrawls method to make its purpose clearer, as it is only used by the operator and should only be used for failed crawls.

ikreymer · 2025-11-18T01:55:51Z

Good catch! I think it's that for successful crawls, these values should never be reset once incremented, but for failed crawls, they need to be reset to 0.

ikreymer · 2025-11-18T03:01:20Z

It may be useful to store the size of the crawl before it was cancelled, but perhaps we can put that elsewhere to avoid more confusion with the actual crawl size.

tw4l requested a review from ikreymer November 17, 2025 21:50

tw4l changed the title ~~Update file crawl db object after deleting files~~ Regression fix: Update failed crawl database object after deleting files Nov 17, 2025

ikreymer approved these changes Nov 18, 2025

ikreymer merged commit 2725686 into main Nov 18, 2025
24 checks passed

ikreymer deleted the issue-2991-cancel-paused-crawl-storage-bug branch November 18, 2025 03:02
