Slack importer threading #9097
Use Zulip's run_parallel method to run thread downloads.
@timabbott Can you test this out?
avatar['path'] = image_path
avatar['s3_path'] = image_path
avatar['size'] = image_size
It looks like the size piece here got lost; is that intentional?
@timabbott I removed it because I saw that it is not actually required in the import process. See the function import_uploads_local.
This works great (in testing, I had a 1.5-6x improvement in total import speed with this). Not sure why the results varied so much when I tested a few times. I merged this since it'll make basically every other aspect of Slack import development a lot nicer, but posted one comment above on the
@timabbott As
As for the variation in the total import speed, could it be that AWS throttles the download rate after several repeated downloads?
Or my local network. I don't think that's interesting; what's interesting is just that the parallelism does help.
Using Zulip's run_parallel method to thread downloads.
Test:
./manage.py convert_slack_data <slack_zip_data> --token <token> --output <output_dir> --threads <threads>
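
For readers unfamiliar with the idea behind this change, here is a minimal sketch of running downloads in parallel with a configurable number of workers, similar in spirit to the `--threads` option above. It is not Zulip's `run_parallel` and does not reproduce its signature or behavior; it uses Python's standard `concurrent.futures` as a stand-in. The names `download_all` and `fetch` are hypothetical and exist only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple


def download_all(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    threads: int = 6,
) -> Tuple[Dict[str, bytes], List[str]]:
    """Call fetch(url) for every URL using a pool of worker threads.

    Hypothetical helper for illustration only; `fetch` is whatever
    function performs a single download. Returns the successful
    payloads keyed by URL, plus the list of URLs that failed.
    """
    results: Dict[str, bytes] = {}
    failed: List[str] = []

    def safe_fetch(url: str):
        # Catch per-item errors so one bad download does not abort the batch.
        try:
            return url, fetch(url), None
        except Exception as exc:
            return url, None, exc

    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, so output is deterministic.
        for url, payload, error in pool.map(safe_fetch, urls):
            if error is None:
                results[url] = payload
            else:
                failed.append(url)

    return results, failed
```

Because downloads are I/O-bound, threads can overlap network waits even under Python's GIL; how much that helps depends on the network and the remote server, which is consistent with the varying speedups reported in the discussion above.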