New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Allow users to export their messages from the UI #8834
base: main
Are you sure you want to change the base?
Conversation
6b60e18
to
d7819d9
Compare
static/js/settings_account.js
Outdated
e.preventDefault(); | ||
e.stopPropagation(); | ||
channel.get({ | ||
url: "/json/users/me/export-my-data", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think /export
would be a more RESTful UI.
static/js/settings_account.js
Outdated
$('#user_export_data_message').text(response.msg).addClass('show'); | ||
}, | ||
error: function(response){ | ||
// FIXME: Does error have response.msg??! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, it does. But see the ui_report
library.
</button> | ||
</div> | ||
</form> | ||
</div> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to move this into the below section and title it was "Account options."
<form class="grid"> | ||
<div class="input-group"> | ||
<button type="submit" class="button rounded w-200 input-size" id="user_export_data"> | ||
{{t 'Export my messages' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Export my Zulip message history"?
zerver/views/user_settings.py
Outdated
@@ -229,3 +230,13 @@ def change_enter_sends(request: HttpRequest, user_profile: UserProfile, | |||
enter_sends: bool=REQ(validator=check_bool)) -> HttpResponse: | |||
do_change_enter_sends(user_profile, enter_sends) | |||
return json_success() | |||
|
|||
# FIXME: Do we plan to allow exports of all messages sent by bots? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think human_users_only is reasonable for this.
zerver/worker/queue_processors.py
Outdated
user_profile = get_user_profile_by_email(email) | ||
|
||
# FIXME: Can we check if an export is currently running, or has been | ||
# started in the past hour, and deny exports? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll want to do something here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should just manage this with a new UserDataExport table.
zerver/worker/queue_processors.py
Outdated
@@ -527,3 +535,37 @@ def consume(self, event: Mapping[str, Any]) -> None: | |||
(stream, recipient, sub) = access_stream_by_id(user_profile, stream_id, | |||
require_active=False) | |||
do_mark_stream_messages_as_read(user_profile, stream) | |||
|
|||
|
|||
@assign_queue('export_data') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use the deferred_work
queue; this seems like a rare enough thing that we don't need several.
zproject/urls.py
Outdated
@@ -258,6 +258,10 @@ | |||
url(r'^users/me/avatar$', rest_dispatch, | |||
{'POST': 'zerver.views.user_settings.set_avatar_backend', | |||
'DELETE': 'zerver.views.user_settings.delete_avatar_backend'}), | |||
url(r'^users/me/export-my-data$', rest_dispatch, | |||
# FIXME: Use a POST? | |||
{'GET': 'zerver.views.user_settings.export_my_data'}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be a POST; we don't want browsers to auto-retry this sort of request.
zerver/worker/queue_processors.py
Outdated
size = getsize(tarball_path) | ||
name = basename(tarball_path) | ||
print(content_type, name, size) | ||
upload_path = upload_message_image(name, size, content_type, f.read(), user_profile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is that function really called upload_message_image
? Sigh, that name is super confusing. Can you do a separate PR (or commit at the base of this PR) to rename that to upload_message_file
? That seems like a clear thing to merge first. Be sure to use good commit discipline for that refactor; check out the Zulip commit message guidelines for more details: https://zulip.readthedocs.io/en/latest/contributing/version-control.html
zerver/worker/queue_processors.py
Outdated
def consume(self, email: str) -> None: | ||
output_dir = tempfile.mkdtemp(prefix="/tmp/zulip-export-") | ||
if os.path.exists(output_dir): | ||
shutil.rmtree(output_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can probably just assert that it doesn't exist; that should never happen given that mkdtemp
is supposed to make us a secure, randomly generated path. Also, does mkdtemp
create the directory, in which case we should be asserting it does exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this code is something I just copied over from the export command, and your suggestions make sense. 👍
zerver/worker/queue_processors.py
Outdated
print("Finished exporting to %s; tarring" % (output_dir,)) | ||
tarball_path = output_dir.rstrip('/') + '.tar.gz' | ||
subprocess.check_call(["tar", "--strip-components=1", "-czf", tarball_path, output_dir]) | ||
print("Tarball written to %s" % (tarball_path,)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be sure to delete the file after uploading.
5db4cf6
to
ecb8fbc
Compare
e796050
to
9d8f28a
Compare
@timabbott I pushed a few changes based on your initial feedback. I also implemented some tracking for a running export, but I'm not sure what the best way to handle failures during an export would be. Also, like you said, the export part could definitely use some more tests and error handling. Any advice on that would be helpful too. |
49a5294
to
15985bb
Compare
zerver/worker/queue_processors.py
Outdated
super(DeferredWorker, self).consume_wrapper(data) | ||
if data['type'] == 'export_user_messages': | ||
user_id = data['user_profile_id'] | ||
UserDataExport.export_finished(user_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be easier done as a try/except
block in the handler after extracting the function.
2fe5990
to
90cf68f
Compare
04ffd11
to
1ebb0a7
Compare
dc923a9
to
227f310
Compare
227f310
to
f7cdac2
Compare
@timabbott I've finally been able to address the suggestion to update the status of the export for the user using an event. This PR is ready for another review. |
Heads up @punchagan, we just merged some commits that conflict with the changes your made in this pull request! You can review this repository's recent commits to see where the conflicts occur. Please rebase your feature branch against the |
@punchagan It looks like Tim made several other suggestions/requests in his review - did you address those already? It would be great to get this feature! :-) |
Hey @brainwane thanks for the nudge/ping on this. This PR has become quite stale, and I will try to take a look at this again. (I seem to have some time on my hands owing to the pandemic situation). Meanwhile, this PR also doesn't really implement the exact feature you requested - dump of all the messages you sent. This PR tries to create an export of all sent and received messages. I remember @timabbott mentioned on CZO that just getting the messages you sent would be easy to do using the API. Would it be useful to look into that and write a script that fetches this data, before I jump back into this PR? 🤔 |
4ec3636
to
88b200c
Compare
Testing Plan:
GIFs or Screenshots: