Slack importer threading #9097

rheaparekh · 2018-04-15T14:21:47Z

Using Zulip's run_parallel method to thread downloads.

Test:
./manage.py convert_slack_data <slack_zip_data> --token <token> --output <output_dir> --threads <threads>
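As a rough illustration of why threading the downloads helps, here is a minimal sketch. It uses Python's standard `concurrent.futures.ThreadPoolExecutor` in place of Zulip's `run_parallel`, whose signature isn't shown in this PR. The `download_one` and `download_all` functions and the record fields are hypothetical stand-ins. A real importer would fetch each URL over HTTP and write the result to disk.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def download_one(record: Dict[str, str]) -> str:
    # Hypothetical stand-in for fetching record["url"] and writing it to
    # record["path"]; a real importer would perform an HTTP request here.
    return record["path"]


def download_all(records: List[Dict[str, str]], threads: int) -> List[str]:
    """Download every record using up to `threads` worker threads.

    Downloads are I/O-bound, so threads can overlap network waits even
    though the GIL serializes pure-Python computation.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    with ThreadPoolExecutor(max_workers=threads) as executor:
        # executor.map yields results in the same order as the inputs.
        return list(executor.map(download_one, records))


if __name__ == "__main__":
    uploads = [
        {"url": "https://example.com/a.png", "path": "uploads/a.png"},
        {"url": "https://example.com/b.png", "path": "uploads/b.png"},
    ]
    print(download_all(uploads, threads=6))
```

Because results come back in input order, the caller can still pair each downloaded file with its original record.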


rheaparekh · 2018-04-15T14:30:50Z

@timabbott Can you test this out?

timabbott · 2018-04-16T02:41:40Z

zerver/lib/slack_data_to_zulip_data.py


        avatar['path'] = image_path
        avatar['s3_path'] = image_path
-        avatar['size'] = image_size


It looks like the size piece here got lost; is that intentional?

@timabbott I removed it because it isn't actually needed by the import process; see the function import_uploads_local.

timabbott · 2018-04-16T02:43:34Z

This works great (in testing, I saw a 1.5-6x improvement in total import speed). I'm not sure why the results varied so much across the few runs I did. I merged this since it'll make basically every other aspect of Slack import development a lot nicer, but I posted one comment above on the size attribute. I didn't see any evidence that we actually read the size field on the import side, so I don't think it's a big issue, but we should make sure we're making a sensible decision here. At the very least, I think it deserves a comment.

rheaparekh · 2018-04-16T14:46:43Z

@timabbott As size is not being used anywhere in the import, I removed it. I'll add a comment for it.
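A minimal sketch of what the promised comment might look like at the point where the avatar record is built. The `build_avatar_record` helper is hypothetical; only the `path` and `s3_path` keys come from the diff above.

```python
from typing import Dict


def build_avatar_record(image_path: str) -> Dict[str, str]:
    """Build one avatar entry for the converted data (hypothetical helper)."""
    avatar = {
        "path": image_path,
        "s3_path": image_path,
    }
    # Intentionally no 'size' key: the import side (see import_uploads_local)
    # never reads it, so computing it would only add work to the conversion.
    return avatar
```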

rht · 2018-04-17T13:23:05Z

As for the variation in total import speed, could it be that AWS throttles the download rate after several repeated downloads?

timabbott · 2018-04-17T18:15:12Z

Or my local network. I don't think that's interesting; what's interesting is just that the parallelism does help.

zulipbot added the size: XL label Apr 15, 2018

rheaparekh added 4 commits April 15, 2018 19:51

slack importer: Thread attachment downloads.

8a291d0

Use Zulip's run_parallel method to run thread downloads.

slack importer: Thread emoji downloads.

ebc2ee2

slack importer: Thread avatar downloads.

7c0c393

slack import: Implement threading as a management command.

f6b6aa1
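Django management commands receive an `argparse` parser in their `add_arguments` hook. As a hypothetical reconstruction of the options in the test command at the top of this PR (not the actual command's code), they could be declared roughly like this. The default of 6 threads and the `output_dir` destination name are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Hypothetical reconstruction of the convert_slack_data options;
    # the default thread count of 6 is an assumption.
    parser = argparse.ArgumentParser(prog="convert_slack_data")
    parser.add_argument("slack_data_zip", help="path to the Slack export zip")
    parser.add_argument("--token", required=True, help="Slack API token")
    parser.add_argument("--output", dest="output_dir", required=True,
                        help="directory for the converted data")
    parser.add_argument("--threads", type=int, default=6,
                        help="number of parallel download workers")
    return parser.parse_args(argv)
```

Exposing `--threads` as a command-line option lets whoever runs the import tune parallelism to their network rather than hard-coding it.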

timabbott reviewed Apr 16, 2018


timabbott merged commit f6b6aa1 into zulip:master Apr 16, 2018

rheaparekh deleted the slack_importer_threading branch April 16, 2018 20:38

rheaparekh mentioned this pull request Apr 16, 2018

Slack import: update docs and add a comment. #9110

Closed

rht mentioned this pull request Dec 18, 2018

import: Improve data import performance with a very large number of users #11009

Closed
