Track bytes stored per file type and include in org metrics #1207
Conversation
The org now tracks bytesStored by type (crawls, uploads, and browser profiles) in addition to the total, and returns these values in the org metrics endpoint. A migration is added to precompute these values for existing deployments. In addition, all /metrics storage values are now returned solely as bytes, since the GB form wasn't being used in the frontend and is unnecessary.
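For illustration only, a minimal sketch of what the precompute migration might do, assuming archived items live in a crawls collection with oid, type, and per-file files.size fields (the PR's actual schema and per-type counter names may differ):

from motor.motor_asyncio import AsyncIOMotorDatabase

# Sketch only: recompute per-type storage totals for each org.
# Field and collection names here are assumptions, not the PR's code.
async def recompute_storage_by_type(mdb: AsyncIOMotorDatabase) -> None:
    cursor = mdb["crawls"].aggregate(
        [
            {"$unwind": "$files"},
            {
                "$group": {
                    "_id": {"oid": "$oid", "type": "$type"},
                    "bytes": {"$sum": "$files.size"},
                }
            },
        ]
    )
    async for res in cursor:
        type_ = res["_id"]["type"] or "crawl"
        field = f"bytesStored{type_.capitalize()}s"  # e.g. bytesStoredCrawls
        await mdb["organizations"].find_one_and_update(
            {"_id": res["_id"]["oid"]}, {"$set": {field: res["bytes"]}}
        )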
- Remove redundant add_crawl_files_to_org_bytes_stored method
- Handle bulk deletes by type
- Fix bug where it was always assumed only one crawl was deleted per cid, and size was not tracked per cid
- Combine inc_org_bytes_stored into a single query (see the sketch below)
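For the last item, a rough sketch of the combined update, assuming a per-type counter field alongside the total (the field naming here is a guess):

# Sketch only: bump the org total and the per-type counter in one
# MongoDB update instead of issuing two separate queries.
async def inc_org_bytes_stored(orgs, oid, size: int, type_: str = "crawl"):
    field = f"bytesStored{type_.capitalize()}s"  # assumed field name
    await orgs.find_one_and_update(
        {"_id": oid},
        {"$inc": {"bytesStored": size, field: size}},
    )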
Force-pushed 6e86811 to 9a507bd
            
          
backend/btrixcloud/migrations/migration_0017_storage_by_type.py (outdated)
LGTM!
        
          
backend/btrixcloud/basecrawls.py (outdated)
deleted_count = 0
...
crawls_to_delete = await self._filter_delete_list_by_type(
This all works, but there's actually a more optimal way that avoids the filtering by type here.
Since the delete function already calls get_crawl_raw, the type is available there, and it can just bin the crawls by type at that point.
See other comment - now splitting in one pass, but for now I think there are advantages to keeping delete_crawls to one type at a time.
for crawl_id in delete_list.crawl_ids:
    crawl = await self.get_crawl_raw(crawl_id, org)
    size += await self._delete_crawl_files(crawl, org)
Can filter by crawl type here, since we have the type available. Files are deleted for all crawls, but then can do:

deleted_size = await self._delete_crawl_files(crawl, org)
if crawl.type == "crawl":
    crawl_size += deleted_size
else:
    upload_size += deleted_size
Actually, it looks like we're not even checking the type before deleting crawl files here!
Should probably do:

if type_ and crawl.type != type_:
    continue

This is actually a bug in the current version as well, where it'll delete the files regardless of type, but not the actual archived item object!
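Putting the two suggestions together, the delete loop might look roughly like this (a method-body sketch assembled from the snippets above, not the PR's final code):

crawl_size = upload_size = 0
for crawl_id in delete_list.crawl_ids:
    crawl = await self.get_crawl_raw(crawl_id, org)
    # Skip items that don't match the requested type, so files are
    # never deleted for an item whose object won't be deleted.
    if type_ and crawl.type != type_:
        continue
    deleted_size = await self._delete_crawl_files(crawl, org)
    if crawl.type == "crawl":
        crawl_size += deleted_size
    else:
        upload_size += deleted_size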
Good catch on the missing type check! That's now been added.
I think there are advantages to handling the complexity of deleting multiple types in delete_crawls_all_types rather than delete_crawls. Everywhere else in the app where we delete content, other than through the /all-crawls delete endpoint, we handle one type at a time. And having delete_crawls handle only one type at a time keeps things a bit simpler, e.g. letting inc_org_bytes_stored only need to worry about one archived item type at a time.
I pushed a change to delete_crawls_all_types to split the delete list into crawls and uploads arrays in one pass rather than two, which should help.
There are other improvements to be made to the delete endpoints that I'm planning on doing in a separate pass in this sprint (see #1208), so I'd expect some of this to change and get optimized there, but I don't want to hold up this PR for something that's creeping a bit out of scope just because I discovered some bugs 😅 If that sounds reasonable to you!
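As a standalone illustration of the one-pass split (the item shape and field names here are assumptions):

from typing import Dict, Iterable, List, Tuple

# Sketch only: split a mixed delete list into crawl and upload IDs in a
# single pass, rather than filtering the whole list once per type.
def split_delete_list(items: Iterable[Dict]) -> Tuple[List[str], List[str]]:
    crawl_ids: List[str] = []
    upload_ids: List[str] = []
    for item in items:
        if item.get("type") == "upload":
            upload_ids.append(item["_id"])
        else:
            crawl_ids.append(item["_id"])
    return crawl_ids, upload_ids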
On closer look, maybe we should fix at least the type checking bug before merging, so it doesn't bite us in the future.
          
Type checking bug is fixed!
    
Fixes #1206