
Make files as the unit of task distribution for HoodieWriteClient.clean() #168

Closed
kaushikd49 opened this issue May 20, 2017 · 5 comments

@kaushikd49
Contributor

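When partitions are skewed (some partitions contain many more files than others), clean takes longer than necessary, because the unit of task distribution is the partition: a single task has to process every file in a large partition while other tasks sit idle. Change the unit of task distribution to individual files instead, to avoid delays caused by such skew.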
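To make the effect of skew concrete, here is a minimal Python sketch. It is not Hoodie code and does not use any Hoodie or Spark API. It assumes that cleaning each file costs one unit of work, uses greedy least-loaded scheduling as a stand-in for the cluster scheduler, and uses hypothetical partition paths and file names. It compares the finishing time of the busiest worker when tasks are whole partitions versus individual files.

```python
import heapq
from typing import Dict, List


def makespan(task_costs: List[int], workers: int) -> int:
    """Greedy list scheduling: assign each task to the currently least-loaded
    worker and return the total load of the busiest worker."""
    loads = [0] * workers
    heapq.heapify(loads)
    for cost in task_costs:
        least = heapq.heappop(loads)
        heapq.heappush(loads, least + cost)
    return max(loads)


def per_partition_tasks(files_by_partition: Dict[str, List[str]]) -> List[int]:
    # One task per partition; its cost is the number of files it must clean.
    return [len(files) for files in files_by_partition.values()]


def per_file_tasks(files_by_partition: Dict[str, List[str]]) -> List[int]:
    # Flatten to one task per file, each with unit cost (assumption).
    return [1 for files in files_by_partition.values() for _ in files]


# Hypothetical skewed table: one partition holds 90% of the files.
skewed = {
    "2017/05/18": [f"a{i}.parquet" for i in range(90)],
    "2017/05/19": [f"b{i}.parquet" for i in range(5)],
    "2017/05/20": [f"c{i}.parquet" for i in range(5)],
}

print(makespan(per_partition_tasks(skewed), workers=4))  # 90
print(makespan(per_file_tasks(skewed), workers=4))       # 25
```

With partitions as the unit, one worker does 90 units of work while another does nothing; with files as the unit, the same 100 units spread evenly across the four workers.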

@vinothchandar
Member

This needs to be done on top of #171. @kaushikd49, when are you planning to work on this?

@prazanna prazanna added this to 05/22/2017 in Sprint May 22, 2017
@kaushikd49
Contributor Author

@vinothchandar I am taking this up this week; I had actually already started making some changes. Sure, I can build it on top of that. Any idea when that will be complete?

@vinothchandar
Member

Conservatively by the end of the week, most likely by end of day tomorrow. I strongly encourage you to wait for file groups to land before you start on this, because what you ultimately need is to parallelize by file group.

@kaushikd49
Contributor Author

OK, that would be great. I'll wait for it to go in first.

@vinothchandar
Member

@kaushikd49 and I chatted offline and decided to decouple the two efforts.

@prazanna prazanna moved this from 05/22/2017 to 05/29/2017 in Sprint May 30, 2017
@prazanna prazanna moved this from 05/29/2017 to 06/12/2017 in Sprint Jun 20, 2017
@prazanna prazanna moved this from 06/12/2017 to 06/05/2017 in Sprint Jun 20, 2017
@vinothchandar vinothchandar moved this from 06/05/2017 to Done in Sprint Jul 24, 2017

3 participants