Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamlines task in parallel using ray #1136

Merged
merged 16 commits into from
May 24, 2024

Conversation

asagilmore
Copy link
Contributor

This adds a num_chunks argument to tracking_params which causes the streamlines task to generate multiple trx files in parallel using ray, and then concatenate them at the end.

…tiple trx files in parallel and concats upon finish
@pep8speaks
Copy link

pep8speaks commented May 14, 2024

Hello @asagilmore! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 467:53: W292 no newline at end of file

Comment last updated at 2024-05-24 20:00:56 UTC

@asagilmore asagilmore marked this pull request as draft May 14, 2024 19:38
Copy link
Collaborator

@arokem arokem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we incorporate this into the example?

https://github.com/yeatmanlab/pyAFQ/blob/master/examples/tutorial_examples/plot_001_afq_api.py#L62

@36000 : any objections? We'd need to install ray on the CI.

AFQ/tasks/tractography.py Outdated Show resolved Hide resolved
@asagilmore
Copy link
Contributor Author

asagilmore commented May 16, 2024

To handle non-random seeding, where there may be less that 1 seed per chunk after splitting up the seeds for ray workers
I am running gen_seeds outside of track function and then passing the seeds as an array here:

def _gen_seeds(n_seeds, params_file, seed_mask=None, seed_threshold=0,
               thresholds_as_percentages=False,
               random_seeds=False, rng_seed=None):
    if isinstance(params_file, str):
        params_img = nib.load(params_file)
    else:
        params_img = params_file

    affine = params_img.affine

    return gen_seeds(seed_mask, seed_threshold, n_seeds,
                     thresholds_as_percentages,
                     random_seeds, rng_seed, affine)

And then using that function here:

seeds = _gen_seeds(this_tracking_params['n_seeds'],
                   params_file, **this_tracking_params)
seed_chunks = np.array_split(seeds, num_chunks)
tracking_params_list = [this_tracking_params.copy() for _
                        in range(num_chunks)]
for i in range(num_chunks):
    tracking_params_list[i]['n_seeds'] = seed_chunks[i]

This works well, but this bypasses the default arguments from track(), which I have copied to the helper function. If those are ever changed the default arugments here would not be changed and it would exhibit unexpected behavior. Is there any way to make this always match the track() default arugments?

AFQ/tasks/tractography.py Outdated Show resolved Hide resolved
@36000
Copy link
Collaborator

36000 commented May 23, 2024

From my perspective this looks good now. You can add ray in setup.cfg (

) by adding this to the end:

ray =
    ray

all =
    %(dev)s
    %(fury)s
    %(fsl)s
    %(afqbrowser)s
    %(plot)s
    %(trx)s
    %(ray)s

Then you can also try making it so the plot_001_afq_api.py example (which already uses trx) does num_chunks, so that this implementation will also be automatically tested. then, I think we can merge.

@arokem
Copy link
Collaborator

arokem commented May 23, 2024

+1

Have you had a chance to run some more experiments with this? Are you still consistently seeing pretty substantial speedup with this?

@asagilmore
Copy link
Contributor Author

+1

Have you had a chance to run some more experiments with this? Are you still consistently seeing pretty substantial speedup with this?

yes, it shows over a 25 times speedup on a 72 core machine.

image

image

@arokem
Copy link
Collaborator

arokem commented May 23, 2024

Nice! ⚡

I think that we can mark this ready to review and after you add it to the dependencies and documentation, we can go ahead and merge it. For now, we can keep this as an optional feature, but we might consider setting this as default on some future release (probably after we let users kick the tires on this some more).

@asagilmore
Copy link
Contributor Author

asagilmore commented May 24, 2024

Nice! ⚡

I think that we can mark this ready to review and after you add it to the dependencies and documentation, we can go ahead and merge it. For now, we can keep this as an optional feature, but we might consider setting this as default on some future release (probably after we let users kick the tires on this some more).

I think I have correctly added it to the documentation and dependencies, but please let me know if I have done it incorrectly I am not sure exactly where everything is supposed to be.

@arokem arokem marked this pull request as ready for review May 24, 2024 00:11
Copy link
Collaborator

@arokem arokem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some small comments. Your last commit is now running on the CI, so we'll see how that works on the documentation build action.

examples/tutorial_examples/plot_001_afq_api.py Outdated Show resolved Hide resolved
examples/tutorial_examples/plot_001_afq_api.py Outdated Show resolved Hide resolved
setup.cfg Outdated Show resolved Hide resolved
@arokem arokem changed the title WIP: Streamlines task in parallel using ray Streamlines task in parallel using ray May 24, 2024
@arokem
Copy link
Collaborator

arokem commented May 24, 2024

I've set this to "Ready for review" and removed the "WIP" from the title of the PR, because I think we're close.

asagilmore and others added 2 commits May 23, 2024 17:23
@36000
Copy link
Collaborator

36000 commented May 24, 2024

this looks good, if the docs pass, we can merge!

@arokem arokem merged commit 7473e6e into yeatmanlab:master May 24, 2024
2 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants