Allow worker strategies to receive import options #321

Merged
merged 1 commit into from Jan 28, 2016

dnd
Contributor

@dnd dnd commented Jan 27, 2016

This pull allows the Sidekiq, Resque, and ActiveJob strategy workers to be called with an options hash to pass to #import!. It makes it easy to completely rebuild an index with a different suffix in the background.
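A minimal sketch of the shape described above, not Chewy's actual worker code: `SketchWorker` and `FakeType` are hypothetical stand-ins, and `Object.const_get` is used in place of whatever lookup the real workers perform. Sidekiq serializes job arguments as JSON, so a symbol-keyed hash arrives with string keys; the sketch symbolizes them before forwarding. Whether Chewy's own workers normalize keys is not stated in this thread.

```ruby
require 'json'

# Stand-in for a Chewy type such as UsersIndex::User.
# It only records what import! receives.
class FakeType
  class << self
    attr_reader :calls

    def import!(ids, options = {})
      (@calls ||= []) << [ids, options]
      true
    end
  end
end

# Hypothetical worker: forwards an optional options hash to import!.
class SketchWorker
  def perform(type_name, ids, options = {})
    # Arguments have been through JSON, so keys are strings at this point.
    opts = options.transform_keys(&:to_sym)
    Object.const_get(type_name).import!(ids, opts)
  end
end

# Simulate the JSON round trip a background queue applies to job arguments.
args = JSON.parse(JSON.generate(['FakeType', [1, 2, 3], { suffix: '20160125' }]))
SketchWorker.new.perform(*args)
```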

@pyromaniac
Contributor

Could you provide usage examples please?

@dnd
Contributor Author

dnd commented Jan 28, 2016

Here in the issue for justification? Or do you want examples in the worker comments, or README?

@pyromaniac
Contributor

Well, I'm just curious :) Simply leave it here

@dnd
Contributor Author

dnd commented Jan 28, 2016

I use it like this to let me do my large imports in the background instead of in a console session.

User.where(status: :active).find_in_batches do |users|   
  Chewy::Strategy::Sidekiq::Worker.perform_async("UsersIndex::User", users.map(&:id), suffix: '20160125')
end

Once all my workers are done, I just switch the alias to point to the new suffix. It's exactly what the built-in import in rake does, except in the background.
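The alias switch mentioned here could look roughly like the following sketch. It assumes the suffixed index is named `<alias>_<suffix>`; the helper `alias_swap_actions` and the index names are hypothetical and only illustrate the request body for a single atomic alias update.

```ruby
# Build the body for an atomic alias swap: remove the alias from the old
# index (if there is one) and add it to the new index in one request.
def alias_swap_actions(alias_name, old_index, new_index)
  actions = []
  actions << { remove: { index: old_index, alias: alias_name } } if old_index
  actions << { add: { index: new_index, alias: alias_name } }
  { actions: actions }
end

# Hypothetical index names, following the suffix used in the example above.
body = alias_swap_actions('users', 'users_20160101', 'users_20160125')

# With a live cluster, the body could be sent through the Elasticsearch client:
#   Chewy.client.indices.update_aliases(body: body)
```

Sending both actions in one request means searches against the alias never see a moment where it points at no index.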

@pyromaniac
Contributor

Hm, interesting, got it. Is it much faster?

pyromaniac added a commit that referenced this pull request Jan 28, 2016
Allow worker strategies to receive import options
@pyromaniac pyromaniac merged commit 611348e into toptal:master Jan 28, 2016
@dnd
Contributor Author

dnd commented Jan 28, 2016

Pretty much as fast as I want to make it, since I can control how many workers are available to process the jobs. I have an index that previously took around 17 hours to reindex, and I have now reindexed it in as little as 15-20 minutes. Also, I don't have to worry about running in screen in case the SSH session gets interrupted, or about figuring out where to restart the import.

@pyromaniac
Contributor

Awesome, thanks!

@pyromaniac
Contributor

Btw, 15-20 minutes of queueing? Or complete indexing?

@dnd
Contributor Author

dnd commented Jan 28, 2016

Complete indexing. I just throw a few dozen workers at it.

@pyromaniac
Contributor

Perfect, thanks again

@dang-hoang-hieu

@dnd @pyromaniac
Hi, I tried using Sidekiq with many workers (threads), but it didn't speed up my import.
I had to work around it by spawning processes (say, 6 processes on a 4-core CPU),
and even then it only gives a ~5x speed-up.
While the processes are running, MySQL takes a few seconds to fetch records, but there seems to be a bottleneck when chewy builds the documents and pushes them to ES :(
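The process-spawning workaround described here might be sketched as below. This is an assumption about its general shape, not the commenter's code: `partition_ids` and `import_in_processes` are hypothetical helpers, and the block stands in for the real import call. Database and Elasticsearch connections opened before the fork generally need to be re-established in each child, which the sketch does not show.

```ruby
# Split ids into at most process_count contiguous slices.
def partition_ids(ids, process_count)
  return [] if ids.empty?

  slice_size = (ids.size.to_f / process_count).ceil
  ids.each_slice(slice_size).to_a
end

# Fork one child per slice, run the given block in the child, then wait.
# Returns the number of children that exited successfully.
def import_in_processes(ids, process_count)
  pids = partition_ids(ids, process_count).map do |slice|
    fork do
      yield slice
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
  end

  pids.count do |pid|
    _, status = Process.wait2(pid)
    status.success?
  end
end

# Hypothetical usage; the import call is an assumption modelled on the thread:
#   import_in_processes(User.pluck(:id), 6) do |slice|
#     UsersIndex::User.import!(slice, suffix: '20160125')
#   end
```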

@pyromaniac
Contributor

Well, there are two possible causes: AR object instantiation and document composing.

For the first case, it is possible to use a plain SQL query or similar to get the data as hashes. This should work since chewy supports plain objects. Also, AR 4 has a bug which makes preloading of has_many :through (and probably other association types) painfully slow. Here Crutches™ will definitely help.

As for document composing, Witchcraft™ will help to speed it up. In synthetic tests it gave me up to an 80% speed-up. With heavy association preloading in a real app it gives a 20% speed-up, so it depends and you have to try.

@dang-hoang-hieu

@pyromaniac
oki I am reading about Witchcraft :)
