import: Import from gitter. #9569
zerver/lib/gitter_import.py
zerver_subscription: List[ZerverFieldsT],
user_map: Dict[str, int]) -> ZerverFieldsT:
user_map: Dict[str, int], chunk_size: int=800) -> None:
The `chunk_size` used in export.py is 1000. But when I used a `chunk_size` of 1000 with the dataset of 15493 messages and 273 users, the Gitter-to-Zulip data conversion succeeded, but the import failed with a memory error. Working with 800 as the `chunk_size` didn't cause any issue. I think it would be good to decide on a proper `chunk_size`.
The build seems to be failing because of a test flake.
This is the current sample dataset: https://s3.amazonaws.com/custodian-gitter/capitalone-cloud-custodian.json
Messages can be bulky, and storing them all in a single data structure can cause a memory error. In this commit, the messages are written to files batch-wise, which avoids the memory error. This is similar to commit 6b7b6b3.
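The batch-wise writing described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the function name `write_message_batches`, the `messages-NNNNNN.json` file naming, and the `zerver_message` key are assumptions made for the example. The default of 800 mirrors the `chunk_size` discussed in the review.

```python
import json
import os
from itertools import islice
from typing import Any, Dict, Iterable, List


def write_message_batches(messages: Iterable[Dict[str, Any]],
                          output_dir: str,
                          chunk_size: int = 800) -> List[str]:
    """Write messages to numbered JSON files, chunk_size messages per file.

    Hypothetical helper: the file naming and JSON layout are assumptions.
    Only one batch is held in memory at a time.
    """
    os.makedirs(output_dir, exist_ok=True)
    iterator = iter(messages)
    paths = []  # type: List[str]
    index = 1
    while True:
        # Pull at most chunk_size messages from the (possibly lazy) source.
        batch = list(islice(iterator, chunk_size))
        if not batch:
            break
        path = os.path.join(output_dir, "messages-%06d.json" % (index,))
        with open(path, "w") as f:
            json.dump({"zerver_message": batch}, f)
        paths.append(path)
        index += 1
    return paths
```

Passing a generator rather than a fully built list keeps peak memory proportional to `chunk_size` instead of the total number of messages.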
Gitter mentions have the format '@usermention' and are included in the export data as: "mentions": [ { "screenName": "usermention", "userId": "54d7876c15522ed4b3dbbefb", "userIds": [] }]. We extract this data and map each mention to @**usermention** for Zulip.
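One way the mention mapping described above might look is sketched below. The function name `convert_mentions` and the regular expression are assumptions for illustration, not the importer's actual code. It relies only on the `screenName` field shown in the export data.

```python
import re
from typing import Any, Dict, List


def convert_mentions(text: str, mentions: List[Dict[str, Any]]) -> str:
    """Rewrite Gitter '@name' mentions as Zulip '@**name**' mentions.

    Hypothetical helper: only names listed in the message's "mentions"
    array are rewritten, so other '@' text is left untouched.
    """
    for mention in mentions:
        screen_name = mention.get("screenName")
        if not screen_name:
            continue
        # The lookahead keeps '@bob' from matching inside '@bobby' or '@bob-2'.
        pattern = re.compile("@" + re.escape(screen_name) + r"(?![\w-])")
        replacement = "@**" + screen_name + "**"
        # A function replacement avoids backslash processing of the name.
        text = pattern.sub(lambda match: replacement, text)
    return text


example = convert_mentions(
    "Thanks @usermention, merging now",
    [{"screenName": "usermention",
      "userId": "54d7876c15522ed4b3dbbefb",
      "userIds": []}],
)
```

Here `example` holds the message text with the mention rewritten in Zulip's syntax.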
I will see if I can do this by the end of the week(end).
@rheaparekh The documentation should carry additional caveats that (1) Gitter markdown and (2) issue mentions haven't been mapped yet. Other than those points, LGTM for the rest of the commits.
I think I will open another PR to add common import functions, so that it helps both the Slack and Gitter importers (and any future importers).
@timabbott This should be ready for a final review.
This is good enough for a preliminary merge (since it does work and is a lot better than nothing), but I don't want to advertise this until we've cleaned it up a bit more. @rheaparekh here are the main things we'll need to adjust before we advertise this feature:
(without newlines at start/end), we should treat that as a code block in the import.
I think these could probably be burned through pretty quickly, so let's try to focus on them.
docs/production/maintain-secure-upgrade.md has docs on
@timabbott @rht thank you for the reviews! I'll get started on the followups.
This successfully imports Gitter data to Zulip.

Things done:
Mapping users, stream, recipients, subscriptions, messages, and avatars; added the management command `convert_gitter_data`; added basic tests and basic documentation; support for user mentions.

Things to do:
Improve documentation; see how markdown conversion can be improved after feedback.

To test:
- `gitter.json`
- `./manage.py convert_gitter_data gitter.json --output gitter_data` to get the converted file.
- `./manage.py import 'test-gitter-import' gitter_data` to import. This will create a realm with the name `test-gitter-import`.