Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slack importer threading #9097

Merged
merged 4 commits into from Apr 16, 2018
Merged

Conversation

rheaparekh
Copy link
Collaborator

Using Zulip's run_parallel method to thread downloads.

Test:
./manage.py convert_slack_data <slack_zip_data> --token <token> --output <output_dir> --threads <threads>

@rheaparekh
Copy link
Collaborator Author

@timabbott Can you test this out?


avatar['path'] = image_path
avatar['s3_path'] = image_path
avatar['size'] = image_size
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the size piece her got lost; is that intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timabbott I removed it as I saw that it is not really required in the import process. See the function import_uploads_local.

@timabbott timabbott merged commit f6b6aa1 into zulip:master Apr 16, 2018
@timabbott
Copy link
Sponsor Member

This works great (in testing, I had a 1.5-6x improvement in total import speed with this). Not sure why the results varied so much when I tested a few times. I merged this since it'll make basically every other aspect of Slack import development a lot nicer, but posted one comment above on the size attribute. I didn't see any evidence that we actually read the size field on the import size, so I don't think it's a big issue, but we should make sure we're making a sensible decision here. At the very least, I think it deserves a comment.

@rheaparekh
Copy link
Collaborator Author

@timabbott As size is not being used anywhere in the import, I removed it. I'll add a comment for it.

@rheaparekh rheaparekh deleted the slack_importer_threading branch April 16, 2018 20:38
@rht
Copy link
Collaborator

rht commented Apr 17, 2018

As with the variation in the total import speed, could it be that the download rate after several repeated downloads is throttled by AWS?

@timabbott
Copy link
Sponsor Member

Or my local network. I don't think that's interesting; what's interesting is just that the parallelism does help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants