Parallelizes `MDAnalysis.analysis.msd` #4896

tanishy7777 · 2025-01-20T20:54:01Z

Fixes #4676

Changes made in this Pull Request:

Applied the split-apply-combine technique to parallelize MDAnalysis.analysis.msd.EinsteinMSD
Added boilerplate fixture(s) to testsuite/analysis/conftest.py, analogous to the existing ones
Added the client_EinsteinMSD fixture to the tests in testsuite/MDAnalysisTests/analysis/test_msd.py, and changed the run() calls to run(**client_EinsteinMSD)
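
A minimal, pure-Python sketch of the split-apply-combine idea listed above, not the code in this PR. It assumes that only the collection of positions is split across workers and that the MSD itself is computed once on the recombined array. The helper names (`split`, `collect_positions`, `combine`, `msd`) and the toy trajectory are hypothetical, and the "workers" run one after another here.

```python
# Positions of all particles in one frame: [particle][xyz]
Frame = list[list[float]]


def split(n_frames: int, n_workers: int) -> list[tuple[int, int]]:
    """Split step: contiguous, ordered frame ranges, one per worker."""
    bounds = [k * n_frames // n_workers for k in range(n_workers + 1)]
    return [(bounds[k], bounds[k + 1]) for k in range(n_workers)]


def collect_positions(trajectory: list[Frame], start: int, stop: int) -> list[Frame]:
    """Apply step: a worker copies the positions for its own frame range."""
    return [[list(p) for p in trajectory[i]] for i in range(start, stop)]


def combine(chunks: list[list[Frame]]) -> list[Frame]:
    """Combine step: stack the per-worker blocks back together in frame order."""
    return [frame for chunk in chunks for frame in chunk]


def msd(positions: list[Frame]) -> list[float]:
    """Windowed MSD per lag, averaged over particles and time origins.

    Each lag pairs frames from across the whole trajectory, so this runs
    once on the combined positions rather than inside each worker.
    """
    n_frames = len(positions)
    n_particles = len(positions[0])
    out = [0.0]
    for lag in range(1, n_frames):
        total = 0.0
        for t0 in range(n_frames - lag):
            for p in range(n_particles):
                a, b = positions[t0][p], positions[t0 + lag][p]
                total += sum((y - x) ** 2 for x, y in zip(a, b))
        out.append(total / (n_particles * (n_frames - lag)))
    return out


# Toy trajectory (hypothetical data): 6 frames, 2 particles moving along x.
trajectory = [[[float(t), 0.0, 0.0], [2.0 * t, 0.0, 0.0]] for t in range(6)]

serial = msd(collect_positions(trajectory, 0, len(trajectory)))
chunks = [collect_positions(trajectory, a, b) for a, b in split(len(trajectory), 3)]
parallel = msd(combine(chunks))
assert serial == parallel
```

Because the combine step restores the original frame order before any lag is evaluated, the chunked run sees exactly the same input as the serial one.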

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

Developers certificate of origin

I certify that this contribution is covered by the LGPLv2.1+ license as defined in our LICENSE and adheres to the Developer Certificate of Origin.

📚 Documentation preview 📚: https://mdanalysis--4896.org.readthedocs.build/en/4896/

pep8speaks · 2025-01-20T20:54:08Z

Hello @tanishy7777! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2025-01-20 21:03:12 UTC

codecov · 2025-01-20T21:15:18Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.41%. Comparing base (7fb3534) to head (ba68a93).

Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #4896      +/-   ##
===========================================
- Coverage    93.42%   93.41%   -0.01%     
===========================================
  Files          177      189      +12     
  Lines        21865    22945    +1080     
  Branches      3079     3079              
===========================================
+ Hits         20427    21435    +1008     
- Misses         986     1059      +73     
+ Partials       452      451       -1


tanishy7777 · 2025-01-23T14:25:49Z

Just a reminder that I think this is ready to be merged. Please do so at your convenience.
@RMeli @orbeckst

orbeckst · 2025-01-23T17:29:50Z

Thanks for your work. I'm currently quite busy, so might not be able to review in the next few days. Please be patient.

orbeckst · 2025-01-23T17:32:27Z

@talagayev / @marinegor can you have a look at this PR, please?

talagayev · 2025-01-25T19:02:30Z

Checked the code and ran locally, looks all good.

https://github.com/tanishy7777/mdanalysis/blob/18a2e516d914f6dc438b409b403be2a1a3429e77/testsuite/MDAnalysisTests/analysis/test_msd.py#L155

@tanishy7777, here you could also add **client_EinsteinMSD to cover the parallelization in test_simple_start_stop_step_all_dims and test_fft_start_stop_step_all_dims, but I would defer to @orbeckst on whether these tests need **client_EinsteinMSD or not.

From my side it looks good, good job @tanishy7777 :)

tanishy7777 · 2025-01-25T19:34:10Z

Thanks a lot for reviewing my PR, will wait for the suggestions as you mentioned.

tanishy7777 · 2025-01-25T20:28:07Z

Also, could you please review PR #4884? It is quite similar. Please let me know if it needs any more work. Thanks again.

talagayev · 2025-01-25T21:28:59Z

Hey @tanishy7777, yes the PR is similar, I can take a look at it as well.

hmacdope

Blocking here: I need to check the implementation. IIRC there is a reason the MSD algorithm itself is not parallelisable, but that may not apply if only the collection of particle positions is parallelised.

tanishy7777 · 2025-02-01T06:56:56Z

The tests were passing, so I assumed the parallelization was working.

marinegor

hi @tanishy7777, sorry for the slow review -- many life things got in the way.

I think you're on a good path; I mentioned a few minor things in the comments.

Main action items:

move the hacky @staticmethod def f(arrays): pass out of the class
using datafiles in MDAnalysisTests (imported at the top of test_msd.py), check that a parallelized run produces exactly the same results as a non-parallelized one (add a code snippet to the comments that anyone can run to check, together with its results)
add this check as a test (if it's too slow, we can always mark it as such and not run it by default)

Please ask if you have questions, and ping me here if I don't reply for more than 48 hours.

marinegor · 2025-02-20T00:08:34Z

package/MDAnalysis/analysis/msd.py

+    @staticmethod
+    def f(arrs):
+        pass


move this outside of the class and explicitly name it something like __noop to make sure no one uses that:

def __noop(arrs) -> None: pass

I tried renaming it to __noop and moving it outside the class, but this leads to a NameError because of how name mangling works in Python. I renamed it to _noop instead; I hope that isn't a problem?

We could also do something like this if we want to keep the name __noop:

    from MDAnalysis.analysis import msd

    def _get_aggregator(self):
        return ResultsGroup(
            lookup={
                "_position_array": ResultsGroup.ndarray_vstack,
                "msds_by_particle": msd.__noop,
                "timeseries": msd.__noop,
            }
        )

I'm not sure this is as clean as the approach above, though.

marinegor · 2025-02-20T00:12:49Z

testsuite/MDAnalysisTests/analysis/test_msd.py

Since there are concerns about whether the algorithm is indeed parallelizable, I would suggest you explicitly test for that: namely, ensure that all results.<something> attributes are exactly the same, regardless of how you run your analysis. You can do this yourself first, and then later add it as one of the tests here.

I should also say that if you make a convenient way to test that, it'd be nice to have it added to other parallelization tests, to ensure additional correctness.
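
One possible shape for such a reusable check, sketched under assumptions: the helper name `assert_same_results` is hypothetical, and the two `SimpleNamespace` objects are stand-ins for the `results` of a serial and a parallel run, holding plain lists with made-up values. With real NumPy arrays the `==` comparison would need to be replaced by something like `numpy.testing.assert_array_equal`.

```python
from types import SimpleNamespace


def assert_same_results(reference, other, attrs):
    """Fail with the name of the first results attribute that differs."""
    for name in attrs:
        a, b = getattr(reference, name), getattr(other, name)
        assert a == b, f"results.{name} differs between runs: {a!r} != {b!r}"


# Hypothetical stand-ins for `analysis.results` obtained from two backends.
serial = SimpleNamespace(
    timeseries=[0.0, 2.5, 10.0],
    msds_by_particle=[[0.0, 0.0], [1.0, 4.0], [4.0, 16.0]],
)
parallel = SimpleNamespace(
    timeseries=[0.0, 2.5, 10.0],
    msds_by_particle=[[0.0, 0.0], [1.0, 4.0], [4.0, 16.0]],
)

assert_same_results(serial, parallel, ["timeseries", "msds_by_particle"])
```

Taking the attribute names as an argument keeps the helper independent of any one analysis class, so the same check could be reused in other parallelization tests.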

tanishy7777 · 2025-03-14T20:49:56Z

Hey, sorry for the late response. I had semester exams so I was quite busy the last 2 weeks. Will start working on this soon!

tanishy7777 · 2025-04-01T20:44:00Z

@marinegor

As suggested in #4896 (comment)
I have added the tests for checking the equivalence of the msd algorithm with both the serial and multiprocessing backends.

After running the tests, the results seem to be equal across both backends. Could you please review the implementation and confirm if this is correct?

Also, is this a good way of adding this test? Should I add it for all other methods that were parallelized?

I have added the following test for comparing the result.attributes
This test is passing.

Thank you!

tanishy7777 · 2025-04-11T18:32:56Z

@marinegor just a gentle reminder

orbeckst · 2025-04-12T20:03:26Z

@marinegor tests pass, codecov is good, linter is happy – could you please have another look?

tanishy7777 and others added 3 commits January 8, 2025 16:08

mark analysis.msd.EinsteinMSD as not parallelizable

b4bc30a

Merge branch 'MDAnalysis:develop' into parallize_msd

be100f0

Parallelizes EinsteinMSD

c45bef4

tanishy7777 mentioned this pull request Jan 20, 2025

MDAnalysis.analysis.msd: Implement parallelization or mark as unparallelizable #4676

Open

tanishy7777 added 2 commits January 21, 2025 02:30

Minor changes

3f24903

Minor changes

18a2e51

hmacdope requested changes Jan 30, 2025

marinegor requested changes Feb 20, 2025

orbeckst added Component-Analysis parallelization labels Mar 14, 2025

marinegor self-assigned this Mar 16, 2025

tanishy7777 and others added 5 commits March 24, 2025 02:18

Merge branch 'develop' into parallize_msd

c0d855a

Merge branch 'MDAnalysis:develop' into parallize_msd

bb15ea4

Moves the static method f outside the class and renamed to _noop

d9b216b

adds tests for checking that msd algorithm is parallelizable

22e7e18

Fixes formatting

644a69a

tanishy7777 added 2 commits April 2, 2025 15:11

Fixes tidynamics import error by passing fft=False

ebbcc90

Fixes black linter

ba68a93

orbeckst mentioned this pull request Apr 12, 2025

Parallelizes MDAnalysis.analysis.InterRDF and MDAnalysis.analysis.InterRDF_s #4884

Open

5 tasks
