[Bug]: Deleting a canceled crawl that was paused makes org storage stats inaccurate #2991

@tw4l

Description

Browsertrix Version

v1.19.1 (and previous versions since pausing was introduced)

What did you expect to happen? What happened instead?

I expect that if I cancel a paused crawl (or a previously paused crawl), the WACZs uploaded during pausing will be deleted from s3 storage (which is happening correctly), and that the crawl object's files will be deleted and its fileSize and fileCount both reset to 0 (which is not currently happening).

When that canceled crawl is then deleted, there should be no effect on the org's bytesStored and bytesStoredCrawls, because there are no files left to delete. Currently, however, the canceled crawl's stale fileSize is subtracted from the org, which makes the org's bytesStored and bytesStoredCrawls inaccurate.

Reproduction instructions

  1. Spin up a fresh local deployment
  2. Run a crawl and pause it after a few pages have been successfully crawled
  3. Cancel the crawl while it's paused
  4. Verify that the WACZ uploaded during crawling is deleted from s3 storage, but that the crawl's files, fileSize and fileCount are not updated in the database, meaning they're now inaccurate
  5. Delete the canceled crawl from the workflow's crawls list
  6. Verify the org's bytesStored and bytesStoredCrawls are now negative
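
The accounting behind these steps can be sketched with a small in-memory model. This is not Browsertrix code: the `Org` and `Crawl` classes, the helper functions, and the `reset_crawl_stats` flag are all hypothetical, and only the field names come from this report. It also assumes that canceling already subtracts the removed WACZ bytes from the org, which is inferred from step 6 (the stats go negative on a fresh deployment) rather than confirmed from the code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Org:
    # Field names mirror the org stats named in this issue.
    bytesStored: int = 0
    bytesStoredCrawls: int = 0


@dataclass
class Crawl:
    files: List[int] = field(default_factory=list)  # WACZ sizes in bytes
    fileSize: int = 0
    fileCount: int = 0


def upload_wacz(org: Org, crawl: Crawl, size: int) -> None:
    """Pausing uploads a WACZ; crawl and org counters both grow."""
    crawl.files.append(size)
    crawl.fileSize += size
    crawl.fileCount += 1
    org.bytesStored += size
    org.bytesStoredCrawls += size


def cancel_crawl(org: Org, crawl: Crawl, reset_crawl_stats: bool) -> None:
    """Canceling removes the WACZs from storage.

    Assumption: the org counters are reduced here. With
    reset_crawl_stats=False the crawl keeps its stale files,
    fileSize and fileCount, which is the behavior reported above.
    """
    removed = sum(crawl.files)
    org.bytesStored -= removed
    org.bytesStoredCrawls -= removed
    if reset_crawl_stats:
        crawl.files = []
        crawl.fileSize = 0
        crawl.fileCount = 0


def delete_crawl(org: Org, crawl: Crawl) -> None:
    """Deleting subtracts whatever fileSize the crawl still reports."""
    org.bytesStored -= crawl.fileSize
    org.bytesStoredCrawls -= crawl.fileSize


def run_scenario(reset_crawl_stats: bool, wacz_size: int = 1000) -> Org:
    """Pause (upload), cancel, then delete, starting from a fresh org."""
    org, crawl = Org(), Crawl()
    upload_wacz(org, crawl, wacz_size)
    cancel_crawl(org, crawl, reset_crawl_stats)
    delete_crawl(org, crawl)
    return org
```

Under these assumptions, `run_scenario(False)` ends with both org counters at -1000 for a 1000-byte WACZ, matching step 6, while `run_scenario(True)` leaves them at 0, which is the expected behavior.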

Screenshots / Video

Image

Environment

No response

Additional details

No response

Metadata

Labels

back end (Requires back end dev work), bug (Something isn't working)

Projects

Status

Done!

Milestone

No milestone
