[WIP] Webapp feature for public data export #11996
Conversation
@whoodes I went through the code quickly. I have commented on how to use the queue system. Let me know in CZO if there is something that you didn't understand. Also posted a bunch of minor comments. We need to add tests for this as well once you are done, so that we know this is working as expected.
(Force-pushed 02d9cdc to 578a625.)
@whoodes thanks for working on this! I have a few proposed changes:
This should be tested in
(Force-pushed 578a625 to bd1ae6e.)
Thanks for reviewing @timabbott, I addressed the current batch of comments. I made some additional updates based upon working on the frontend piece of this as well. A couple questions:
Looking forward to it.
(Force-pushed bd1ae6e to 17064cf.)
(Just rebased from master.)
(Force-pushed 17064cf to b923294.)
(Force-pushed e04c01c to 430bfcc.)
I merged the first couple commits, since they looked complete. Can you post the precise threading error traceback you got? I'll need that to help figure out what's going on.
(Force-pushed f3eaa25 to b597b33.)
zerver/lib/export.py
Outdated
@@ -1709,7 +1709,7 @@ def get_realm_exports_serialized(user: UserProfile) -> List[Dict[str, Any]]:
     for export in all_exports:
         exports_dict[export.id] = dict(
             id=export.id,
-            event_time=export.event_time.ctime(),
+            export_time=ujson.dumps(export.event_time),
I don't understand -- why do we need JSON to send a timestamp?
I was looking at this incorrectly. I realize now that using `dumps` was just a weird way of converting the `datetime` object to a timestamp. Fixed.
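For reference, a minimal sketch of converting a `datetime` directly to a Unix timestamp; the helper name here is illustrative, not necessarily what the Zulip codebase uses:

```python
from datetime import datetime, timezone

def datetime_to_timestamp(dt: datetime) -> int:
    # Convert a timezone-aware datetime to an integer Unix timestamp,
    # avoiding the round trip through JSON serialization.
    return int(dt.timestamp())

event_time = datetime(2019, 8, 1, 12, 0, 0, tzinfo=timezone.utc)
print(datetime_to_timestamp(event_time))  # 1564660800
```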
zerver/lib/actions.py
Outdated
# However, mypy throws: '"AnyStr" of "loads" cannot be "Optional[str]"'
# if we don't check the None case.
export_extra_data = export.extra_data
if export_extra_data is not None:
I would do `assert export_extra_data is not None` then, not add a useless and confusing if statement.
Ahh of course, that should have been obvious. Fixed.
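A minimal sketch of the suggested pattern, using `assert` to narrow `Optional[str]` for mypy instead of an `if` branch (the function and field names here are illustrative):

```python
import json
from typing import Any, Dict, Optional

def parse_extra_data(extra_data: Optional[str]) -> Dict[str, Any]:
    # The model field is Optional[str], but by this point it is known
    # to be set; the assert narrows the type for mypy and fails loudly
    # if that invariant is ever violated.
    assert extra_data is not None
    return json.loads(extra_data)

print(parse_extra_data('{"export_path": "/exports/realm.tar.gz"}'))
```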
zerver/tests/test_openapi.py
Outdated
@@ -223,6 +223,7 @@ class OpenAPIArgumentsTest(ZulipTestCase):
     # test because of the variable name mismatch. So really, it's more of a buggy endpoint.
     '/user_groups/{group_id}',  # Equivalent of what's in urls.py
     '/user_groups/{user_group_id}',  # What's in the OpenAPI docs
+    '/export/realm/{export_id}',
Move this earlier; it's not related to the `user_groups` block that has comments.
In particular, it should be next to `/export/realm`.
Whoops, silly mistake. Fixed.
zerver/views/realm_export.py
Outdated
        realm=user.realm,
        event_type="realm_exported")
except RealmAuditLog.DoesNotExist:
    return json_error(_("The export doesn't exist"))
"Invalid data export ID" would be a more standard way to write this.
    return json_error(_("The export doesn't exist"))

if ujson.loads(export.extra_data).get('deleted_timestamp') is not None:
    return json_error(_("Export already deleted"))
Split the JSON load onto a separate line, and do an `if 'deleted_timestamp' in export_data` check.
Done.
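A sketch of the suggested structure, with the JSON load split onto its own line and a membership test replacing the `.get(...) is not None` check (the helper name is hypothetical):

```python
import json

def export_is_deleted(extra_data: str) -> bool:
    # Load the JSON once into a local, then use a membership test,
    # as suggested in review. The key name matches the one in the PR.
    export_data = json.loads(extra_data)
    return 'deleted_timestamp' in export_data

print(export_is_deleted('{"deleted_timestamp": 1564660800}'))  # True
print(export_is_deleted('{"export_path": "/foo.tar.gz"}'))     # False
```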
zerver/views/realm_export.py
Outdated
@require_realm_admin
def delete_realm_export(request: HttpRequest, user: UserProfile, export_id: int) -> HttpResponse:
    try:
        export = RealmAuditLog.objects.get(id=export_id,
Call this local variable `audit_log_entry`.
Done, also updated the tests to use this naming.
(Force-pushed b002504 to eb1f309.)
The time of the event was incorrectly being sent as a datetime object.
We don't want to add a `deleted_timestamp` key until the export is actually deleted.
Leftover from prior debugging.
* Added comments.
* Refactored `export_object` to `audit_log_entry`.
* Cleaned up white space.
* Removed uses of `getattr`.
This commit serves as the frontend piece for the "public export" webapp feature. Fixes: zulip#11930
(Force-pushed eb1f309 to 9ae1b41.)
@timabbott I addressed the batch of comments. I also split up some clean up commits that should be individually mergeable. Namely:
I also removed
I merged all the backend commits. I then added 8ac8247 and the commit before it to make sure we don't accidentally trigger a DoS (limits are probably conservative) before this has gotten more heavy use. @rishig you should look at the error message I set for "organization too large"; I think it should be good. Thanks for making this happen @whoodes!

@rishig we should now be able to update the documentation for data exports to say one can do "public export" self-service. Not sure if we also need something on /features or what. I'll open a few technical follow-up issues.
#12992 is the main follow-up issue for this. We also have some technical infrastructure work to do.
Also, your limit commit failed CI.
Added tests for that in #12993 |
What is this PR for?
Fixes: #11930, which adds a public only realm export feature for the webapp.
Demo of tentative UI:
Commits
Previous commits
(merged): This commit serves as the first step in supporting "public export" as a
webapp feature. The refactoring was done as a means to allow calling
the export logic from elsewhere in the codebase.
(merged): This allows for removal of the local tarball upon a successful s3 export.
(merged): An endpoint was created in zerver/views. Basic rate-limiting was implemented using `RealmAuditLog`. The event is then published to the `deferred_work` queue processor to prevent the export process from being killed after 60s. Finally, upon completion of the export, the realm admin(s) are notified via `send_event`.

(Plus a handful of clean-up commits.)
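The endpoint-plus-queue flow described above can be sketched roughly as follows. This is an illustrative stand-in, not Zulip's actual code: the in-memory list replaces the real RabbitMQ-backed publisher, and all names other than the `deferred_work` queue are assumptions:

```python
from typing import Any, Dict, List

# In-memory stand-in for a real queue backend; this sketch only
# records published events so the flow can be demonstrated.
QUEUE: List[Dict[str, Any]] = []

def queue_json_publish(queue_name: str, event: Dict[str, Any]) -> None:
    QUEUE.append({"queue": queue_name, "event": event})

def request_public_export(realm_id: int, user_id: int) -> None:
    # Hand the slow export off to a background worker so the HTTP
    # request returns quickly instead of hitting the 60s timeout.
    event = {"type": "realm_export", "realm_id": realm_id,
             "user_profile_id": user_id}
    queue_json_publish("deferred_work", event)

request_public_export(1, 42)
print(QUEUE[0]["queue"])  # deferred_work
```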
To complete (re-posting of Tim's comment below):
We need to add `LOCAL_UPLOADS_DIR` support to the `--upload-to-s3` option. First, a commit should rename it to `--upload`, since it's no longer going to be just S3. Then a commit that changes `export_realm_wrapper` to support upload via the `LOCAL_UPLOADS_DIR` backend (likely, this will be just a `shutil.copyfile` call with the right path construction), and then we can remove the `AssertionError` and fill out the test for the `LOCAL_UPLOADS_DIR` backend.
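Under those assumptions, the local-backend "upload" step might look like this minimal sketch; the function name and paths are illustrative, not Zulip's actual implementation:

```python
import os
import shutil
import tempfile

def upload_export_tarball_local(tarball_path: str, uploads_dir: str) -> str:
    # For the local-disk backend, uploading is essentially a file copy
    # into the uploads directory, as suggested in the comment above.
    os.makedirs(uploads_dir, exist_ok=True)
    dest = os.path.join(uploads_dir, os.path.basename(tarball_path))
    shutil.copyfile(tarball_path, dest)
    return dest

# Demo with temporary paths:
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "export.tar.gz")
    with open(src, "wb") as f:
        f.write(b"tarball-bytes")
    dest = upload_export_tarball_local(src, os.path.join(tmp, "uploads"))
    print(os.path.exists(dest))  # True
```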
There are several TODOs in `test_realm_export.py` that need to be completed.
We probably should arrange to store the ID of the `RealmAuditLog` row created in `public_only_realm_export` in the event sent to the `deferred_work` queue, and then have the queue processor add the `public_url` to the `RealmAuditLog` event in the `extra_data` field. This will enable support for deleting the uploaded exports later once they've been accessed.
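A toy sketch of that bookkeeping, with an in-memory dict standing in for the `RealmAuditLog` table; all names other than `extra_data` and `public_url` are illustrative assumptions:

```python
import json
from typing import Any, Dict

# Stand-in for the RealmAuditLog table, keyed by row ID.
AUDIT_LOG: Dict[int, Dict[str, Any]] = {}

def create_audit_log_entry(entry_id: int) -> Dict[str, Any]:
    # The view creates the audit-log row and embeds its ID in the
    # event sent to the deferred_work queue.
    AUDIT_LOG[entry_id] = {"id": entry_id, "extra_data": None}
    return {"type": "realm_export", "id": entry_id}

def process_deferred_work(event: Dict[str, Any], public_url: str) -> None:
    # The queue processor looks the row back up by ID and records the
    # public_url in extra_data, so the export can be deleted later.
    row = AUDIT_LOG[event["id"]]
    row["extra_data"] = json.dumps({"public_url": public_url})

event = create_audit_log_entry(7)
process_deferred_work(event, "https://example.com/exports/realm.tar.gz")
print(json.loads(AUDIT_LOG[7]["extra_data"])["public_url"])
```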