
[WIP] Webapp feature for public data export #11996

Closed
whoodes wants to merge 7 commits from the webapp-public-data-export branch

Conversation

@whoodes (Collaborator) commented Mar 27, 2019

What is this PR for?

Fixes #11930 by adding a public-only realm export feature to the webapp.

Demo of tentative UI: [demo GIF]

Commits

Previous commits

  • (merged): This commit serves as the first step in supporting "public export" as a
    webapp feature. The refactoring allows the export logic to be called from
    elsewhere in the codebase.

  • (merged): This allows for removal of the local tarball upon a successful s3 export.

  • (merged): An endpoint was created in zerver/views. Basic rate limiting was
    implemented using RealmAuditLog. The event is then published to the deferred_work
    queue processor to prevent the export process from being killed after 60s.
    Finally, upon completion of the export, the realm admin(s) are notified via
    send_event. (A sketch of this flow follows the list.)

  • (Plus a handful of clean-up commits.)
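
For orientation, the view-plus-queue flow described in the third commit looks roughly like the sketch below. This is a minimal illustration, not the merged code: the rate-limit constants and the exact shape of the queue event are assumptions.

from datetime import timedelta

from django.http import HttpRequest, HttpResponse
from django.utils.timezone import now as timezone_now
from django.utils.translation import ugettext as _

from zerver.decorator import require_realm_admin
from zerver.lib.queue import queue_json_publish
from zerver.lib.response import json_error, json_success
from zerver.models import RealmAuditLog, UserProfile

@require_realm_admin
def public_only_realm_export(request: HttpRequest, user: UserProfile) -> HttpResponse:
    # Basic rate limiting: refuse if too many exports were started recently.
    # The window and limit here are illustrative assumptions.
    event_time = timezone_now()
    recent_exports = RealmAuditLog.objects.filter(
        realm=user.realm,
        event_type="realm_exported",
        event_time__gte=event_time - timedelta(days=7)).count()
    if recent_exports >= 5:
        return json_error(_("Exceeded rate limit."))

    RealmAuditLog.objects.create(realm=user.realm, acting_user=user,
                                 event_type="realm_exported",
                                 event_time=event_time)

    # Publish to the deferred_work queue so the long-running export
    # isn't killed by the 60s request timeout.
    event = {'type': "realm_exported",
             'user_profile_id': user.id,
             'realm_id': user.realm.id}
    queue_json_publish('deferred_work', event)
    return json_success()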

To complete (re-posting of Tim's comment below):

  • We need to add LOCAL_UPLOADS_DIR support to the --upload-to-s3 option. First, a commit should rename it to --upload, since it's no longer going to be just S3. Then a commit that changes export_realm_wrapper to support upload via the LOCAL_UPLOADS_DIR backend (likely just a shutil.copyfile call with the right path construction), after which we can remove the AssertionError and fill out the test for the LOCAL_UPLOADS_DIR backend. (A sketch follows the list.)

  • There are several TODOs in test_realm_export.py that need to be completed.

  • We probably should arrange to store the ID of the RealmAuditLog row created in public_only_realm_export in the event sent to the deferred_work queue, and then have the queue processor add the public_url to the RealmAuditLog event in the extra_data field. This will enable support for deleting the uploaded exports later once they've been accessed.
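
On the first bullet, the LOCAL_UPLOADS_DIR "upload" is plausibly just a copy into a web-served directory. A rough sketch, where the helper name and path layout are assumptions rather than the real backend API:

import os
import shutil

from django.conf import settings

def upload_export_tarball_local(realm_id: int, tarball_path: str) -> str:
    # With the local-uploads backend, "uploading" the export is just
    # copying the tarball to a path served under LOCAL_UPLOADS_DIR.
    assert settings.LOCAL_UPLOADS_DIR is not None
    export_dir = os.path.join(settings.LOCAL_UPLOADS_DIR, 'exports', str(realm_id))
    os.makedirs(export_dir, exist_ok=True)
    public_path = os.path.join(export_dir, os.path.basename(tarball_path))
    shutil.copyfile(tarball_path, public_path)
    return public_path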

zerver/views/public_export.py: six review threads (outdated, resolved)
@hackerkid (Member) commented Mar 27, 2019

@whoodes I went through the code quickly and have commented on how to use the queue system. Let me know in CZO if there is something you didn't understand. I also posted a bunch of minor comments. Once you are done, we need to add tests for this as well, so that we know it is working as expected.

@whoodes force-pushed the webapp-public-data-export branch 5 times, most recently from 02d9cdc to 578a625 on April 2, 2019
@timabbott (Sponsor Member) commented
@whoodes thanks for working on this! I have a few proposed changes:

  • Let's move the function you put in zerver/lib/export_realm.py into the existing zerver/lib/export.py. And I think the name isn't ideal; how about export_realm_wrapper? It's a bit tricky to name this given the existing do_export_realm, but I think that's clear enough.
  • With that change, I think the first two commits are safe to merge.
  • For the third commit, add @require_realm_admin on the view function, and test that it's only usable by organization administrators.
  • I'll have more suggestions, but this is a good start.

zerver/models.py: review thread (outdated, resolved)
@timabbott (Sponsor Member) commented
> Testing for the admin notification event.

This should be tested in test_events.py, similar to our other tests, though it'll be a simple one (since the event is a no-op).

@whoodes (Collaborator, Author) commented Apr 7, 2019

Thanks for reviewing, @timabbott; I addressed the current batch of comments. I also made some additional updates based on working on the frontend piece of this. A couple of questions:

  • Do we want to send a json_error if the s3 backend isn't configured?
  • I changed the admins to be notified via a PM from the notification bot; curious about your thoughts on this approach (or how you would prefer to notify admins).

Looking forward to it.

@whoodes (Collaborator, Author) commented Apr 7, 2019

(Just rebased from master.)


@whoodes force-pushed the webapp-public-data-export branch 2 times, most recently from e04c01c to 430bfcc on April 12, 2019
@timabbott (Sponsor Member) commented
I merged the first couple of commits, since they looked complete. Can you post the precise threading-error traceback you got? I'll need that to help figure out what's going on.

> Do we want to send a json_error if the s3 backend isn't configured?

json_error is for situations that are user errors (HTTP 4xx), not for system configuration problems (5xx). I think what we need to do instead is extend the upload logic to also support the "local file uploads" backend (where we effectively just copy the file to an appropriate path under LOCAL_UPLOADS_DIR).

zproject/urls.py: review thread (outdated, resolved)
@whoodes force-pushed the webapp-public-data-export branch 3 times, most recently from f3eaa25 to b597b33 on August 9, 2019
@@ -1709,7 +1709,7 @@ def get_realm_exports_serialized(user: UserProfile) -> List[Dict[str, Any]]:
     for export in all_exports:
         exports_dict[export.id] = dict(
             id=export.id,
-            event_time=export.event_time.ctime(),
+            export_time=ujson.dumps(export.event_time),
@timabbott (Sponsor Member):

I don't understand -- why do we need JSON to send a timestamp?

@whoodes (Collaborator, Author):

I was looking at this incorrectly. I realize now that using dumps was just a weird way of converting the datetime object to a timestamp. Fixed.
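
In other words, the fix is a direct datetime-to-timestamp conversion; roughly (a sketch, not the exact merged diff):

# Convert the datetime to a Unix timestamp directly rather than
# round-tripping it through JSON.
exports_dict[export.id] = dict(
    id=export.id,
    export_time=export.event_time.timestamp(),
)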

# However, mypy throws: "AnyStr" of "loads" cannot be "Optional[str]"
# if we don't check the None case.
export_extra_data = export.extra_data
if export_extra_data is not None:
@timabbott (Sponsor Member):

I would do assert export_extra_data is not None then, not add a useless and confusing if statement.

@whoodes (Collaborator, Author):

Ahh of course, that should have been obvious. Fixed.
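
The assert form looks like the following sketch; mypy then narrows Optional[str] to str on the lines after it:

import ujson

export_extra_data = export.extra_data
# mypy narrows Optional[str] to str here, so loads() type-checks.
assert export_extra_data is not None
export_data = ujson.loads(export_extra_data)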

@@ -223,6 +223,7 @@ class OpenAPIArgumentsTest(ZulipTestCase):
     # test because of the variable name mismatch. So really, it's more of a buggy endpoint.
     '/user_groups/{group_id}', # Equivalent of what's in urls.py
     '/user_groups/{user_group_id}', # What's in the OpenAPI docs
+    '/export/realm/{export_id}',
@timabbott (Sponsor Member):

Move this earlier; it's not related to the user_groups block that has comments.

@timabbott (Sponsor Member):

In particular, it should be next to /export/realm.

@whoodes (Collaborator, Author):

Whoops, silly mistake. Fixed.

realm=user.realm,
event_type="realm_exported")
except RealmAuditLog.DoesNotExist:
return json_error(_("The export doesn't exist"))
@timabbott (Sponsor Member):

"Invalid data export ID" would be a more standard way to write this.

return json_error(_("The export doesn't exist"))

if ujson.loads(export.extra_data).get('deleted_timestamp') is not None:
return json_error(_("Export already deleted"))
@timabbott (Sponsor Member):

Split the JSON load on a separate line, and do an if 'deleted_timestamp' in export_data.

@whoodes (Collaborator, Author):

Done.
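
That is, roughly (a sketch of the suggested shape):

# Load the JSON once, then use a plain membership test.
export_data = ujson.loads(export.extra_data)
if 'deleted_timestamp' in export_data:
    return json_error(_("Export already deleted"))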

@require_realm_admin
def delete_realm_export(request: HttpRequest, user: UserProfile, export_id: int) -> HttpResponse:
try:
export = RealmAuditLog.objects.get(id=export_id,
@timabbott (Sponsor Member):

Call this local variable audit_log_entry.

@whoodes (Collaborator, Author):

Done, also updated the tests to use this naming.

@whoodes force-pushed the webapp-public-data-export branch 5 times, most recently from b002504 to eb1f309 on August 11, 2019
Wyatt Hoodes added 7 commits August 12, 2019 07:58

  • The time of the event was incorrectly being sent as a datetime object.
  • We don't want to add a `deleted_timestamp` key until the export is actually deleted.
  • Added comments; refactored `export_object` to `audit_log_entry`; cleaned up whitespace; removed uses of `getattr`.
  • This commit serves as the frontend piece for the "public export" webapp feature. Fixes: zulip#11930
@whoodes (Collaborator, Author) commented Aug 12, 2019

@timabbott I addressed the batch of comments. I also split out some clean-up commits that should be individually mergeable. Namely:

  • Fixing the timestamp thing.
  • Changing the extra_data key in export_dict to the more consistent and less ambiguous name export_data.
  • A general clean-up of test_realm_export.

I also removed deleted_timestamp from being added in queue_processors; instead, the key/value pair is only added upon deletion. (A sketch follows.)
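
A sketch of that deletion-time write; the exact field handling is an assumption based on the snippets reviewed above:

import ujson
from django.utils.timezone import now as timezone_now

# Record the deletion time only when the export is actually deleted.
export_data = ujson.loads(audit_log_entry.extra_data)
export_data['deleted_timestamp'] = timezone_now().timestamp()
audit_log_entry.extra_data = ujson.dumps(export_data)
audit_log_entry.save(update_fields=['extra_data'])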

@timabbott (Sponsor Member) commented
I merged all the backend commits. I then added 8ac8247 and the commit before it to make sure we don't accidentally trigger a DoS (the limits are probably conservative) before this has gotten heavier use. @rishig you should look at the error message I set for "organization too large"; I think it should be good.

Thanks for making this happen @whoodes! @rishig we should now be able to update the documentation for data exports to say one can do "public export" self-service. Not sure if we also need something on /features or what.

I'll open a few technical follow-up issues.

@timabbott (Sponsor Member) commented
#12992 is the main follow-up issue for this. We also have some technical infrastructure work to do.

@andersk (Member) commented Aug 13, 2019

Also, your limit commit failed CI.

@whoodes deleted the webapp-public-data-export branch on August 13, 2019
@whoodes (Collaborator, Author) commented Aug 13, 2019

Added tests for that in #12993.

Merging this pull request may close: Add webapp feature to trigger a public-data-only export (#11930)