Track bytes stored per file type and include in org metrics #1207

tw4l · 2023-09-20T22:04:01Z

The org now tracks bytesStored separately for crawls, uploads, and browser profiles, in addition to the total, and returns these values from the org metrics endpoint. A migration is added to precompute these values in existing deployments. In addition, all /metrics storage values are now returned solely as bytes, since the GB form wasn't being used in the frontend and is unnecessary.
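A minimal sketch of what precomputing per-type totals could look like, for readers who want a concrete picture. It is not the migration's actual code: it works on plain in-memory dicts rather than the database, and the field names `oid`, `fileSize`, `size`, `bytesStoredUploads`, and `bytesStoredProfiles` are assumptions made for illustration (only `bytesStored` and `bytesStoredCrawls` appear in this PR's text).

```python
from collections import defaultdict


def precompute_bytes_by_type(items, profiles):
    """Sum stored bytes per org, split by type (illustrative only).

    `items` are dicts with hypothetical keys "oid", "type", "fileSize";
    `profiles` are dicts with hypothetical keys "oid", "size".
    """
    totals = defaultdict(
        lambda: {
            "bytesStoredCrawls": 0,
            "bytesStoredUploads": 0,
            "bytesStoredProfiles": 0,
        }
    )

    for item in items:
        # Anything that is not a crawl is counted as an upload in this sketch.
        key = "bytesStoredCrawls" if item["type"] == "crawl" else "bytesStoredUploads"
        totals[item["oid"]][key] += item.get("fileSize", 0)

    for profile in profiles:
        totals[profile["oid"]]["bytesStoredProfiles"] += profile.get("size", 0)

    return dict(totals)
```

All values stay in bytes, matching the decision to drop the GB form from /metrics.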

backend/btrixcloud/orgs.py

backend/btrixcloud/basecrawls.py

- Remove redundant add_crawl_files_to_org_bytes_stored method
- Handle bulk deletes by type
- Fix bug where it was always assumed only one crawl was deleted per cid and size was not tracked per cid
- Combine inc_org_bytes_stored into single query
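To illustrate the "single query" point above: MongoDB's `$inc` operator accepts several fields in one update document, so the total and the per-type counter can be adjusted together. The helper below is a hypothetical sketch, not the PR's inc_org_bytes_stored; the name `build_bytes_stored_inc` and the field `bytesStoredUploads` are assumptions.

```python
def build_bytes_stored_inc(size: int, type_: str) -> dict:
    """Build one $inc update covering the total and the per-type counter.

    Hypothetical helper; a negative `size` decrements (e.g. after a delete).
    """
    field_by_type = {
        "crawl": "bytesStoredCrawls",
        "upload": "bytesStoredUploads",  # assumed field name
    }
    return {"$inc": {"bytesStored": size, field_by_type[type_]: size}}
```

The resulting document would be passed as the update argument of a single update call on the orgs collection, instead of issuing one update per counter.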

backend/btrixcloud/migrations/migration_0017_storage_by_type.py

Chickensoupwithrice

LGTM!

ikreymer · 2023-09-22T04:11:29Z

backend/btrixcloud/basecrawls.py

+
+        deleted_count = 0
+
+        crawls_to_delete = await self._filter_delete_list_by_type(


This all works, but there's actually a more efficient way that avoids filtering by type here.
Since the delete function already calls get_crawl_raw, we have the type available there, and it can just bin the crawls by type there.

See other comment - now splitting in one pass, but for now I think there are advantages to keeping delete_crawls to one type at a time.

ikreymer · 2023-09-22T04:14:15Z

backend/btrixcloud/basecrawls.py


        for crawl_id in delete_list.crawl_ids:
            crawl = await self.get_crawl_raw(crawl_id, org)
-            size += await self._delete_crawl_files(crawl, org)


Can filter by crawl type here, since we have the type available. Files are deleted for all crawls, but then can do:

    deleted_size = await self._delete_crawl_files(crawl, org)
    if crawl.type == "crawl":
        crawl_size += deleted_size
    else:
        upload_size += deleted_size

Actually, it looks like we're not even checking the type before deleting crawl files here!
Should probably do:

    if type_ and crawl.type != type_:
        continue

This is actually a bug in the current version as well: it'll delete the files regardless of type, but not the actual archived item object!
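The suggested type check can be sketched in isolation as follows. This is a simplified stand-in, not the code in basecrawls.py: items are plain dicts, and `delete_files_by_type`, `fake_delete_files`, and `demo` are hypothetical names. The point is only that the check has to run before any files are deleted.

```python
import asyncio


async def delete_files_by_type(items, type_, delete_files):
    """Delete files only for items matching type_; return (bytes freed, count)."""
    size = 0
    count = 0
    for item in items:
        # Skip mismatched items *before* touching their files.
        if type_ and item.get("type") != type_:
            continue
        size += await delete_files(item)
        count += 1
    return size, count


async def fake_delete_files(item):
    """Stand-in for the real file deletion; just reports the item's size."""
    return item["size"]


def demo():
    items = [
        {"id": "a", "type": "crawl", "size": 100},
        {"id": "b", "type": "upload", "size": 50},
        {"id": "c", "type": "crawl", "size": 30},
    ]
    # Only the two crawls are deleted; the upload's files are left alone.
    return asyncio.run(delete_files_by_type(items, "crawl", fake_delete_files))
```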

Good catch on the missing type check! That's now been added.

I think there are advantages to handling the complexity of deleting multiple types in delete_crawls_all_types rather than delete_crawls. Everywhere else in our app that we are deleting content other than through the /all-crawls delete endpoint, we're handling one type at a time. And having delete_crawls handle only one type at a time keeps things a bit simpler, e.g. letting inc_org_bytes_stored only need to worry about one archived item type at a time.

I pushed a change to delete_crawls_all_types to split the delete list into crawls and uploads arrays in one pass rather than two, which should help.
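For illustration, splitting a mixed list into crawls and uploads in one pass might look like the sketch below. It assumes each item is a dict with hypothetical `id` and `type` keys; the real delete_crawls_all_types works against the database and may differ.

```python
def split_by_type(items):
    """Partition item ids into (crawls, uploads) in a single pass."""
    crawls, uploads = [], []
    for item in items:
        item_type = item.get("type")
        if item_type == "crawl":
            crawls.append(item["id"])
        elif item_type == "upload":
            uploads.append(item["id"])
        # Items of any other type are ignored in this sketch.
    return crawls, uploads
```

Each resulting list can then be handed to a single-type delete call, which keeps the per-type size accounting simple.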

There are other improvements to be made to the delete endpoints that I'm planning on doing in a separate pass in this sprint (see #1208), so I'd expect some of this to change and get optimized there, but I don't want to hold up this PR for something that's creeping a bit out of scope just because I discovered some bugs 😅 If that sounds reasonable to you!

ikreymer

On closer look, maybe we should fix at least the type checking bug before merging, so it doesn't bite us in the future..

tw4l · 2023-09-22T16:01:00Z

> On closer look, maybe we should fix at least the type checking bug before merging, so it doesn't bite us in the future..

Type checking bug is fixed!

tw4l added 4 commits September 20, 2023 17:54

Ignore duplicate code in migrations

b51af26

Update tests and drop Bytes from type values in metrics

934d7e7

Inc bytesStoredCrawls after crawl files stored

5783d26

tw4l requested review from Chickensoupwithrice, SuaYoo and ikreymer September 20, 2023 22:08

Update OrgMetrics model for new field names

f30b4e2

ikreymer reviewed Sep 20, 2023

backend/btrixcloud/orgs.py (outdated, resolved)

ikreymer reviewed Sep 20, 2023

backend/btrixcloud/orgs.py (resolved)

ikreymer reviewed Sep 20, 2023

backend/btrixcloud/basecrawls.py (outdated, resolved)

tw4l added 2 commits September 20, 2023 18:58

Fix test

3c62da3

Make code review revisions

9a507bd

- Remove redundant add_crawl_files_to_org_bytes_stored method
- Handle bulk deletes by type
- Fix bug where it was always assumed only one crawl was deleted per cid and size was not tracked per cid
- Combine inc_org_bytes_stored into single query
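The per-cid fix in this commit can be pictured with a small tally: rather than assuming one deleted crawl per cid, keep a count and a byte total for each cid. This is a hypothetical sketch; `tally_deleted_by_cid` and the `(cid, size)` input shape are assumptions, not the PR's code.

```python
from collections import defaultdict


def tally_deleted_by_cid(deleted):
    """Aggregate deleted crawls per cid.

    `deleted` is an iterable of (cid, size_in_bytes) pairs, one per deleted crawl.
    """
    stats = defaultdict(lambda: {"count": 0, "size": 0})
    for cid, size in deleted:
        stats[cid]["count"] += 1
        stats[cid]["size"] += size
    return dict(stats)
```

With these tallies, the stats for each cid can be decremented by the right count and size even when a bulk delete removes several crawls with the same cid.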

tw4l force-pushed the issue-1206-crawl-upload-size-metrics branch from 6e86811 to 9a507bd Compare September 21, 2023 15:35

tw4l added 5 commits September 21, 2023 11:55

Fixup

fa4d78a

Make fixes to ensure workflow size is correct after deletion

121a312

Extend tests to test mixed type delete with stats

b5f073f

Add pylint disable comment

b305b72

Fix test fixture

fa55ebd

Chickensoupwithrice reviewed Sep 21, 2023

backend/btrixcloud/migrations/migration_0017_storage_by_type.py (outdated, resolved)

Fix migration number in docstring

08fdbda

tw4l requested a review from ikreymer September 21, 2023 19:33

Chickensoupwithrice approved these changes Sep 21, 2023

ikreymer approved these changes Sep 22, 2023

ikreymer reviewed Sep 22, 2023

ikreymer requested changes Sep 22, 2023

tw4l added 3 commits September 22, 2023 11:46

Add type check within delete_crawls

fb0616d

Split crawls and uploads in one filter pass

c40ed02

Linting

e006697

tw4l added 2 commits September 22, 2023 12:05

Add missing await and type hint

2932284

Fix getter for crawl type in type check

210843f

tw4l merged commit 094f27b into main Sep 22, 2023

tw4l deleted the issue-1206-crawl-upload-size-metrics branch September 22, 2023 16:55
