
Feature non linear time averaged msd #5066


Open: wants to merge 9 commits into base: develop

Conversation

gitsirsha

@gitsirsha gitsirsha commented Jun 15, 2025

Fixes #5028

Changes made in this Pull Request:

  • msd.py script was modified to support time-averaged mean square displacement calculation for dump files with non-linear time spacings.

PR Checklist

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5066.org.readthedocs.build/en/5066/


@github-actions github-actions bot left a comment


Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.


codecov bot commented Jun 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.86%. Comparing base (c4af00b) to head (7ad3e09).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5066   +/-   ##
========================================
  Coverage    93.85%   93.86%           
========================================
  Files          178      178           
  Lines        22122    22150   +28     
  Branches      3129     3133    +4     
========================================
+ Hits         20763    20791   +28     
  Misses         902      902           
  Partials       457      457           



@mrshirts mrshirts left a comment


Added some comments on the code.

# Modified by Sirsha Ganguly
#
# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps


Suggest something more descriptive here - _conclude_nonlinear() instead of _conclude_modified()

if len(sequence) < 2:
    return True
step = sequence[1] - sequence[0]
return all(sequence[i + 1] - sequence[i] == step for i in range(len(sequence) - 1))


So right now, it's failing the test. My guess is that it's going down the wrong test path, because it's interpreting whatever data is being passed in by the test as not actually being linear (maybe something like 1.0000 gets interpreted as 0.999999). A couple of options here:

  1. Add a keyword like nonlinear that controls the logic, rather than determining the logic from the steps.
  2. Using something like assert_almost_equal so floating point errors don't cause the logic to give the wrong answer (i.e. that they are not equally spaced when they actually are).
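A tolerance-based check along the lines of option 2 could look like this (a minimal sketch with a hypothetical `is_linear` helper, not the PR's actual code):

```python
import numpy as np

def is_linear(times, rtol=1e-5, atol=1e-8):
    """Return True if the frame times are (approximately) evenly spaced.

    Uses a floating-point tolerant comparison so times like
    0.9999999 vs 1.0 do not send the analysis down the wrong path.
    """
    times = np.asarray(times, dtype=float)
    if times.size < 3:
        return True
    diffs = np.diff(times)
    # Compare every step against the first step with a tolerance,
    # instead of exact `==` equality.
    return bool(np.allclose(diffs, diffs[0], rtol=rtol, atol=atol))
```

With this, `is_linear([0.0, 1.0000001, 2.0])` is `True` while a logarithmic schedule like `[1, 10, 100, 1000]` is detected as non-linear.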

Member

@IAlibay IAlibay left a comment


Here are some initial questions before we get into the weeds. I'm keen for us to end up with a single method that works for both use cases if possible.

There are probably things we can do to make that happen, but I first went to get a good idea of where things are at in this current state.

@@ -1,3 +1,25 @@
######## Documentation
Member


It's fine to do this later once the code has been settled on, but documentation lives in the module docstring or the API docstrings (see the sections enclosed by """ below).

Member


As Irfan said: The comments will not be in the final PR. The first line of the file is always

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-

and then comes the standard comment block. The only history information (eg who modified the file or added something) that we keep inside the code is versionchanged/versionadded markup in the doc strings. Everything else is known to be available through the git history and the information in the PR/issue.

Could you please

  • remove the comments
  • copy them to the PR information

This will make it a lot easier to discuss the intention of the PR separate from the implementation.

Information on the algorithm (basically what's important for a user to know) will go into the doc string in a Notes section (we follow the numpy doc string standard and see https://userguide.mdanalysis.org/stable/contributing_code.html#guidelines-for-docstrings for MDAnalysis specific guidelines on docs).

@@ -1,3 +1,25 @@
######## Documentation
#
# Issue: The default '_conclude_simple()' function previously calculated time-averaged mean squared displacement considering linear timespacing between the frames even when a dump file with non-linear timestep gap is provided
Member


It would be better if these details were in the PR info instead - this is where a reviewer usually looks for them (and ultimately we will likely merge without this).

# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps
# Note: This new function is generalized for non-linear dumps.
# Moreover, if you use this function to calculate for linear timedump this gives same result as '_conclude_simple()' which is implemented in the default version.
Member


Thanks for checking this, it's really what we need to discuss the most here. Ultimately I think the end target of this PR will be to end up with just one method. Having alternate code paths adds to user difficulty and maintenance costs.

Member


Just taking things as they are now (before we get into the weeds): could you comment on performance? Do the two methods run at similar speeds with the same inputs?

Member


What happens if someone uses the FFT-based method instead with non-linear times? Does that fail?

Member

@orbeckst orbeckst left a comment


Thank you for the PR. That's a good start for a discussion.

Initially it would really help to have the big picture up front, namely a description at a high level what you want to accomplish and how.

Additionally

  • Describe the algorithm that you're implementing.
  • How does it differ from the existing one (without FFT)?
  • Can you provide a rough performance comparison between "linear" and "nonlinear"? (I understand that applying the algorithm for equally spaced frames to a trajectory with variably spaced frames is incorrect but I want to get a sense of the necessary added computational complexity.)

I added other comments throughout but initially the big picture is more important because we are trying to figure out if there are ways to re-use code: the less code is duplicated, the lower the maintenance burden we incur, and the easier it is to test and maintain correctness.


class EinsteinMSD(AnalysisBase):

@staticmethod
def is_linear(sequence): # This function is to check if the timesteps in the array is linear or not! If it's linear default MDA code will carry-out other wise the modified code carries!
Member


Instead of comments, write doc strings.



def _conclude_modified(self):
from collections import defaultdict
Member


imports at top of file,

import collections

then use collections.defaultdict


def _conclude_modified(self):
from collections import defaultdict
print("The Dump file has non-linear time spacing")
Member


Eventually, remove all print – if you want to show status messages, use the logger.

from collections import defaultdict
print("The Dump file has non-linear time spacing")

dump_times = [ts.time for ts in self._trajectory]
Member


This iterates through the whole trajectory and may already be expensive!

Maybe unavoidable but keep it in mind.

Member


You would need to use self._sliced_trajectory instead of self._trajectory. If someone is running it with run(start=10, stop=1000, step=5) then self._sliced_trajectory will contain the proper frames that were analyzed in _single_frame. However, the time information should already be available in self.times:

Suggested change
dump_times = [ts.time for ts in self._trajectory]
dump_times = self.times

This should improve performance as you don't have to iterate through the trajectory again.

Member


And it's already a numpy array.

@gitsirsha
Author

Thank you @IAlibay @orbeckst and @mrshirts for your time to review the initial PR. Based on your suggestions I have modified a few things:

  1. Added a keyword argument non_linear instead of a check like the initial commit's is_linear.
    non_linear defaults to False and is only set to True by the user to call the new method _conclude_non_linear.
    This was done to check whether the GitHub Actions CI tests pass when the code executes the default route.

    The new method only executes when fft=False and non_linear=True.

  2. Removed the comments for documentation at the top. I will update that in the doc strings later once we are satisfied with the code.

  3. Removed the print statements.

  4. import collections at top of the file.

  5. deleted positions = np.zeros((n_frames, n_atoms, 3)) as it was not necessary.

Necessity of a new method:

  • The _conclude_simple method calculated the time-averaged MSD based on the number of frames in a dump file, not on the timestep at which each frame was dumped. So a better method is needed that performs "time-averaging", not "frame-averaging".

  • Moreover, if the new method _conclude_non_linear is used on a linearly dumped file, it generates the same MSD values as the default _conclude_simple method. This is because the new method solely uses the timestamp of each frame, which for a linearly dumped file is nothing but the frame index.

Algorithm of _conclude_non_linear to calculate time-averaged MSD:

  • It creates a dictionary/hash-map whose keys are the time differences between frames (Δt) and whose values are lists of the frame-to-frame MSD values for that particular Δt: {Δt: [msd1, msd2, ...]}.

  • The reference frame is shifted with each iteration, and each time a new key (i.e., Δt) is found it is added to the dictionary/hash-map.

  • If a duplicate Δt is found, the MSD is appended to the value (MSD list) of that specific key in the dictionary/hash-map.

  • Lastly, for each Δt in the dictionary/hash-map, the MSD values inside the list are averaged.
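The steps above can be sketched as follows (an illustrative, loop-based sketch with hypothetical names, not the PR's actual implementation):

```python
import collections
import numpy as np

def time_averaged_msd(times, positions):
    """Time-averaged MSD for (possibly) non-linearly spaced frames.

    times     : 1D sequence of frame times, length n_frames
    positions : array of shape (n_frames, n_atoms, 3)
    Returns (delta_t_values, msd) as matched 1D arrays.
    """
    n_frames = len(times)
    msd_dict = collections.defaultdict(list)  # {dt: [msd, msd, ...]}
    for i in range(n_frames):              # shifting reference frame
        for j in range(i + 1, n_frames):
            dt = times[j] - times[i]
            disp = positions[j] - positions[i]
            # MSD over particles for this frame pair
            msd_dict[dt].append(np.mean(np.sum(disp**2, axis=-1)))
    delta_t_values = np.array(sorted(msd_dict))
    msd = np.array([np.mean(msd_dict[dt]) for dt in delta_t_values])
    return delta_t_values, msd
```

For one particle moving as x = t with frames at t = 0, 1, 3, the pairs give Δt = 1, 2, 3 with MSDs 1, 4, 9, i.e. the averaging is over physical time differences, not frame indices.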

Example Performance comparison:

A coarse grained LAMMPS trajectory with N=60,000 and 21 frames (non-linearly spaced timesteps) is run with fft turned off:

It took around 2.9 s with the new method (toggled on using non_linear=True)
and around 1.6 s with the default method.

(This is not including the plotting time in the user interface)

(Attached plots: output_default and output_new.)

@mrshirts

The big picture here is that for polymer MSDs, the results are highly nonlinear in time (unlike with liquids, where the MSD becomes linear very quickly). In fact, they generally have a power-law dependence. Calculating MSD at long lag times ends up requiring a huge amount of data; you get the short-time behavior very accurately and the long-time behavior less accurately, but you don't really need the short-time behavior more accurately than the long-time behavior.

So LAMMPS has a standard option to output trajectory files where, every N steps, you output frames logarithmically: perhaps with spacing (roughly) 1, 10, 100, 1000. So for every N steps, you are only outputting 4 points. Then maybe your simulation is 100/N steps long, so you will get 100 data points for lags of 1, 10, 100, 1000, etc., and you get equal accuracy for all lags, at the cost of some precision at short times, but using far less storage space (polymer simulations can have a lot of particles!). Polymer scientists use this a lot.

The issue is that the standard code can't handle this, and I'm actually not sure if it can be handled with FFTs. So I'm not totally sure if it makes sense to have a single code path (although that would be desirable). What one should have is that one gets the same answer for LINEAR data (within machine precision) if it goes through either path, which can be automatically checked, so at least any code duplication is validated.
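For illustration, such a logarithmic dump schedule could be generated like this (hypothetical helper and numbers, not actual LAMMPS input):

```python
# Sketch of the logarithmic dump schedule described above: within each
# block of `block_size` steps, frames are written at logarithmic offsets,
# so every lag appears once per block and all lags get equal statistics.
def log_dump_steps(block_size, offsets=(1, 10, 100, 1000), n_blocks=3):
    steps = []
    for b in range(n_blocks):
        base = b * block_size
        steps.extend(base + off for off in offsets)
    return steps
```

With two blocks of 10,000 steps this yields frames at 1, 10, 100, 1000, 10001, 10010, 10100, 11000: eight frames instead of thousands, while still sampling every decade of lag time.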

Member

@orbeckst orbeckst left a comment


Overall I am ok with having a separate _conclude_nonlinear. It seems sufficiently different to warrant its own code path.

The next important step is to add tests that check that your code runs and which can be used as regression tests during further code optimization.

I also left some comments on how to use existing features of AnalysisBase in your code.

Additionally, start paying attention to the linter. Run your file through black for autoformatting.

(We'll also ask you to write more docs and add an entry to CHANGELOG.)


def _conclude_non_linear(self):

dump_times = [ts.time for ts in self._trajectory]
n_frames = len(dump_times)
Member


Just use self.n_frames

Suggested change
n_frames = len(dump_times)
n_frames = self.n_frames

Sorry, that's not properly documented but it is created with AnalysisBase._prepare_sliced_trajectory (I just updated the docs ... #5075 )



msd_dict = collections.defaultdict(list) # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}

# Looping over all the frames as if the referenced gets shifted frame to frame
Member


There's probably a way to write this all-vs-all distance calculation without loops and just with numpy operations and thus increase performance.

But for a start this is ok while you're making sure that your code produces correct results.

Member


We have functions in lib.distances that can do optimized all-vs-all distance calculations for coordinates.

There's also cdist in scipy.
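For example, the double loop over frame pairs could in principle be replaced by a single NumPy broadcast (a sketch with hypothetical names; note the memory caveat in the docstring):

```python
import numpy as np

def all_pair_sq_disp(positions):
    """Squared displacements for every frame pair via broadcasting.

    positions : (n_frames, n_atoms, 3)
    Returns sq of shape (n_frames, n_frames, n_atoms) with
    sq[i, j, a] = |r_a(frame j) - r_a(frame i)|^2.
    Caveat: builds an (n_frames, n_frames, n_atoms, 3) intermediate,
    so memory grows quadratically with the number of frames.
    """
    diff = positions[None, :, :, :] - positions[:, None, :, :]
    return np.sum(diff**2, axis=-1)
```

The Δt-binning step would still be needed afterwards, but the inner distance computation becomes a single vectorized operation.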

Author


We will look into optimizing it once MSD.results.msds_by_particle behaves as per the documentation. There will presumably be a slight performance hit when this is implemented for the non-linear method.

Member


Add a TODO comment as reminder to look into optimizing performance of the code

Comment on lines 465 to 466
self.results.timeseries = delta_t_values
self.results.msds_by_particle = avg_msds
Member


Turn the contents of results into numpy arrays. Your code should produce results in the same way as documented in https://docs.mdanalysis.org/stable/documentation_pages/analysis/msd.html#MDAnalysis.analysis.msd.EinsteinMSD .

Author


self.results.timeseries produces desired results the same way.
self.results.msds_by_particle implementation is still in progress, will update shortly with the performance check and results.

@gitsirsha
Author

gitsirsha commented Jul 14, 2025

Current changes:
- MSD.results.timeseries now stores the MSD values, and MSD.results.delta_t_values stores the Δt values for which time-averaged MSDs are calculated (in the previous commit these were stored in results.timeseries). This makes sure the MSD.results.timeseries array stores results consistent with the documentation.

- A simple plot can be generated with:
msd = MSD.results.timeseries
ax.plot(MSD.results.delta_t_values, msd)

or simply ax.plot(MSD.results.delta_t_values, MSD.results.timeseries)

  • No performance hit and deals with sliced trajectory

Future implementation in progress:
MSD.results.msds_by_particle (a 2D array) is documented as "The MSD of each individual particle with respect to lag-time". For uniform dumps this is very straightforward, with each row of the array corresponding to a lag-time: each lag (Δt) corresponds to a fixed number of frames, so we can neatly store MSD per particle, per lag-index.

However, in the non-linear (irregular Δt) case, msds_by_particle is defined as the time-averaged MSD of each particle computed at the actual Δt values (i.e., time differences between all frame pairs), rather than at fixed frame lags. This accounts for non-uniform sampling and ensures the MSD is evaluated based on physical time rather than frame index.

msds_by_particle[Δt_index, i] = ⟨|rᵢ(t + Δt) − rᵢ(t)|²⟩_t

In short, for the _conclude_non_linear() method, MSD.results.msds_by_particle will be a 2D array whose rows correspond to the unique Δt values (as found in MSD.results.delta_t_values) and whose columns are the MSD of each particle for that specific Δt. This is also consistent with the array structure mentioned above, ensuring easy plotting of particle-specific MSD over time lags.
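The described layout can be sketched as follows (a hypothetical helper, not the PR's code):

```python
import collections
import numpy as np

def msds_by_particle_nonlinear(times, positions):
    """Per-particle, time-averaged MSD at each unique delta-t.

    positions : (n_frames, n_atoms, 3)
    Returns (delta_t_values, msds_by_particle) where msds_by_particle
    has shape (len(delta_t_values), n_atoms): row k holds the per-particle
    MSD at lag delta_t_values[k].
    """
    n_frames = positions.shape[0]
    per_dt = collections.defaultdict(list)  # {dt: [per-particle MSD arrays]}
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            dt = times[j] - times[i]
            per_dt[dt].append(np.sum((positions[j] - positions[i])**2, axis=-1))
    delta_t_values = np.array(sorted(per_dt))
    msds = np.array([np.mean(per_dt[dt], axis=0) for dt in delta_t_values])
    return delta_t_values, msds
```

Averaging the columns of each row recovers the ensemble-averaged timeseries, so the two result arrays stay mutually consistent.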

I will push once this implementation is done shortly and update you with the performance. Thank you again for your valuable reviews!

Member

@orbeckst orbeckst left a comment


I have some comments on the code structure.

The biggest thing is to add tests that exercise all your code and check that it produces the results you expect. Have a look at the code coverage. It should be >= 94%.

You'll also need to update the docs and the CHANGELOG.

@@ -301,7 +304,7 @@ class EinsteinMSD(AnalysisBase):
.. versionadded:: 2.0.0
"""

def __init__(self, u, select="all", msd_type="xyz", fft=True, **kwargs):
def __init__(self, u, select="all", msd_type="xyz", fft=True, non_linear = False, **kwargs):
Member


Document new argument. Should look something like

non_linear : bool
    (describe what the option enables)

    .. versionadded:: 2.10.0

(Make sure that there's space under the versionadded and visually check the built docs that it's formatted correctly – under Checks there's the Read The Docs build that has the built docs)

Member


Also add an entry to CHANGELOG under Enhancements documenting the new features including the new kwarg.

Member


Run black over your code; I am pretty sure it should not leave spaces around = in argument lists.

@@ -338,6 +342,7 @@ def __init__(self, u, select="all", msd_type="xyz", fft=True, **kwargs):
# result
self.results.msds_by_particle = None
self.results.timeseries = None
self.results.delta_t_values = None
Member


add to docs (with .. versionadded:: 2.10.0)

@@ -383,10 +388,12 @@ def _single_frame(self):
]

def _conclude(self):
if self.fft:
if self.fft == True:
Member


Leave as is; that is how it's normally written in Python.

Comment on lines 386 to 393
if self.fft:
if self.fft == True:
self._conclude_fft()
else:
elif self.non_linear != True:
self._conclude_simple()
elif self.non_linear:
self._conclude_non_linear()
Member


Let's group this logically:

if LINEAR
   do either FFT or normal
else
   do non-linear

This keeps the logic MUCH clearer and is more robust – as written you don't have an else, so with weird settings _conclude will silently do nothing.

@gitsirsha
Author

gitsirsha commented Jul 15, 2025

I have some comments on the code structure.

The biggest thing is to add tests that exercise all your code and check that it produces the results you expect. Have a look at the code coverage. It should be >= 94%.

You'll also need to update the docs and the CHANGELOG.

Hi! Thanks for the feedback. I'm not very familiar with writing tests in Python yet. Could you please guide me or point me to an example from this repo so I can follow the same pattern?
(For now I am running the code locally with an example lammpsdump file every time I update the code to check for results)

I see the codecov/project check is at 93.5%. Can you please let me know how it can be increased? Is it just by adding tests?

Happy to update the CHANGELOG and docs right after!

@orbeckst
Member

Testing

Writing test takes some time to learn. We use pytest as our testing framework so you may have to keep looking up things there.

See https://userguide.mdanalysis.org/stable/testing.html#testing for more information.

For the MSD code specifically, start (and add your tests) in testsuite/MDAnalysisTests/analysis/test_msd.py.

Coverage

For coverage look in the status check box: Right now it says codecov/patch – 19.35% of diff, which means less than 20% of the lines of codes that you touched are being tested. This number should be >94%. You can click on the status check to see which lines specifically are not covered and you can also look at the Files changed tab on this PR and see the codecov annotations of missing coverage.


@orbeckst
Member

You should add your lammpsdump data file to the testsuite (see develop/testsuite/MDAnalysisTests/datafiles.py). Make it as small as possible and bz2 compress it. Ideally, it should be much less than 1 MB in size. Otherwise, produce a small file for testing. It does not need to produce scientifically solid results, just exercise all the code that you're writing in the same way that "real" data would.
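For reference, the compression step could be as simple as the following (illustrative shell commands in a throwaway directory; the filename and stand-in file contents are assumptions, and MDAnalysis readers generally open .bz2 files transparently):

```shell
# Work in a temporary directory with a tiny stand-in dump file,
# then bz2-compress it. The testsuite can point at the .bz2 directly.
cd "$(mktemp -d)"
printf 'ITEM: TIMESTEP\n0\n' > test_non_linear.dump   # stand-in data
bzip2 -9 test_non_linear.dump    # produces test_non_linear.dump.bz2
```

The `-9` flag picks maximum compression, which matters most for trajectory data shipped inside the test package.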

@orbeckst orbeckst self-assigned this Jul 18, 2025
@orbeckst
Member

I am going to assign myself as PR shepherd but if anyone else wants to take over or be involved, please do!

@gitsirsha gitsirsha force-pushed the feature-Non-linear-time-averaged-msd branch from 1453bd6 to 871faa2 Compare July 22, 2025 16:32
@gitsirsha
Author

gitsirsha commented Jul 22, 2025

With the new commit:

  • CHANGELOG has been edited
  • Docstrings have been edited
  • An initial test has been added to test_msd.py

This initial test currently checks that results.timeseries and results.delta_t_values match the expected values. (NOTE: results.msds_by_particle is yet to be implemented in the original msd.py code inside the new method.)

It seems like some of the automated tests are failing although when I did pytest testsuite/MDAnalysisTests/analysis/test_msd.py locally it passed. I am a little confused about this.

Ran black after each .py file was edited.

@gitsirsha
Author

It seems from the errors that the filename needs to be added in datafiles.py > __all__

@gitsirsha
Author

It seems like some of the automated tests are failing although when I did pytest testsuite/MDAnalysisTests/analysis/test_msd.py locally it passed. I am a little confused about this.

fixed by adding the test file name to __all__ in datafiles.py.
codecov/patch has increased to 100%.

The linter seems to be giving errors for msd.py although black was run after the edits.

@orbeckst
Member

Did you run black with version 24? (Check black --version and if necessary pin to 24)

@gitsirsha
Author

Did you run black with version 24? (Check black --version and if necessary pin to 24)

I was running black v25. Downgraded and ran black 24.10.0 with [c90f52e]

Member

@orbeckst orbeckst left a comment


Good progress! There are still some major things to work on:

  • Test that your code also works with different start/stop/step
  • Test that it works not only with xyz but also with other combinations

And some minor/easy to do:

  • update docs/comments (see inline comments)
  • revert black formatting for any files that you didn't touch — only reformat the files that you've been working on, please.

@@ -298,10 +305,12 @@ class EinsteinMSD(AnalysisBase):
Number of particles MSD was calculated over.


.. versionadded:: 2.0.0
Member


This has to stay. It records when the class was originally added.

Instead, you should add a .. versionchanged:: 2.10.0 with a short explanation. We'll also add .. versionadded to the new kwarg, see comment above.

@@ -290,6 +295,8 @@ class EinsteinMSD(AnalysisBase):
The averaged MSD over all the particles with respect to lag-time.
results.msds_by_particle : :class:`numpy.ndarray`
The MSD of each individual particle with respect to lag-time.
results.delta_t_values : :class:`numpy.ndarray`
Array of unique Δt (time differences) at which time-averaged MSD values are computed.
Member


Indicate when this new results attribute was added so that users understand which version of the code they need

Suggested change
Array of unique Δt (time differences) at which time-averaged MSD values are computed.
Array of unique Δt (time differences) at which time-averaged MSD values are
computed.
.. versionadded:: 2.10.0

non_linear : bool
If ``True``, calculates MSD for trajectory where frames are
non-linearly dumped. To use this set `fft=False`.
Defaults to ``False``.
Member


Indicate when this new kwarg was added so that users understand which version of the code they need in order to use it

Suggested change
Defaults to ``False``.
Defaults to ``False``.
.. versionadded:: 2.10.0

(needs 2 newlines below so that formatting works, manually check the rendered docs from the status checks)

@@ -298,10 +305,12 @@ class EinsteinMSD(AnalysisBase):
Number of particles MSD was calculated over.


.. versionadded:: 2.0.0
.. versionadded:: 2.10.0
Member


Suggested change
.. versionadded:: 2.10.0
.. versionadded:: 2.0.0
.. versionchanged:: 2.10.0
Added ability to calculate MSD from samples that are not linearly spaced with the
new `non_linear` keyword argument.


msd_dict = collections.defaultdict(list) # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}

# Looping over all the frames as the reference frame gets shifted one frame at a time
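The per-Δt accumulation in this snippet can be sketched end-to-end in plain NumPy (a minimal illustration of the non-linear time-averaged scheme; the function and variable names here are hypothetical, not the PR's actual implementation):

```python
import collections

import numpy as np


def time_averaged_msd(positions, times):
    """Time-averaged MSD for frames with arbitrary (non-linear) time spacing.

    positions : (n_frames, n_atoms, 3) array of unwrapped coordinates
    times     : (n_frames,) array of simulation times
    Returns (delta_t_values, msd_values), both sorted by delta_t.
    """
    n_frames = len(times)
    msd_dict = collections.defaultdict(list)  # {Δt: [msd samples]}
    # O(n_frames^2): shift the reference frame over every later frame
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            dt = times[j] - times[i]
            disp = positions[j] - positions[i]
            # squared displacement per atom, averaged over atoms
            msd_dict[dt].append(np.mean(np.sum(disp**2, axis=-1)))
    delta_t = np.array(sorted(msd_dict))
    msd = np.array([np.mean(msd_dict[dt]) for dt in delta_t])
    return delta_t, msd
```

For a particle moving ballistically (x = t) sampled at the non-uniform times 0, 1, 3, 7, this recovers MSD(Δt) = Δt² at each of the six distinct lag times, which is a quick sanity check of the pairing logic.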

Add a TODO comment as reminder to look into optimizing performance of the code


Can you bz2 compress this file? Should still work.

@@ -906,6 +907,7 @@

SURFACE_PDB = (_data_ref / "surface.pdb.bz2").as_posix()
SURFACE_TRR = (_data_ref / "surface.trr").as_posix()
LAMMPSDUMP_non_linear = (_data_ref / "custom_non_linear/test_non_linear.dump").as_posix()

.bz2


Move near other LAMMPS files and keep them together (also in the __all__ list)


after line ~785 or so

@@ -389,6 +389,7 @@
"SURFACE_PDB", # 111 FCC lattice topology for NSGrid bug #2345
"SURFACE_TRR", # full precision coordinates for NSGrid bug #2345
"DSSP", # DSSP test suite
"LAMMPSDUMP_non_linear"

list near other LAMMPS files


and add a short comment explaining what it is

Comment on lines +66 to +70
* Added new method '_conclude_non_linear` to calculate time-averaged
mean square displacement in `package/MDAnalysis/analysis/msd.py`,
new bool keyword argument `non_linear=True` under `EinsteinMSD`
enables execution of this method.
(Issue #5028, PR #5066)

Focus on the user-visible changes and keep it short

Suggested change
* Added new method '_conclude_non_linear` to calculate time-averaged
mean square displacement in `package/MDAnalysis/analysis/msd.py`,
new bool keyword argument `non_linear=True` under `EinsteinMSD`
enables execution of this method.
(Issue #5028, PR #5066)
* Added capability to calculate MSD from frames with irregular (non-linear)
time spacing in analysis.msd.EinsteinMSD with keyword argument
`non_linear=True` (Issue #5028, PR #5066)

Also, traditionally we add the newest changes under the heading, i.e., move it to directly under "Enhancements"



def test_msd_non_linear():
u = mda.Universe(LAMMPSDUMP_non_linear, format="LAMMPSDUMP")

Make the universe a fixture, e.g. u_nonlinear. See near top of the file as examples.

@gitsirsha
Author

* revert black formatting for any files that you didn't touch — _only reformat the files that you've been working on_, please.

Can you please let me know how to revert the changes from the recent commit locally? When I ran black v24.10.0 on msd.py it said `1 file unchanged`, so I ran `black .`, which changed every .py file.

Thank you for all the suggestions; I will get to those shortly!


orbeckst commented Jul 22, 2025

If you look at https://github.com/MDAnalysis/mdanalysis/pull/5066/files you see that 260(!) files were changed. I would look into reverting the commit that did that.

Alternatively, find the files where you definitely added code (msd.py, test_msd.py, CHANGELOG, AUTHORS, ...?) and revert everything else. Look at git revert, git reset (be careful), and also git restore; see "Reset, restore and revert" in git(1) for the differences between the three commands.
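One way the first option could look in practice (a sketch only; c90f52e is the blanket-reformat commit reverted below, and the file paths are the ones this PR actually edits):

```shell
# Undo the commit that reformatted the whole tree in one step,
# then re-run black only on the files this PR touches.
git revert --no-edit c90f52e
black package/MDAnalysis/analysis/msd.py \
      package/MDAnalysisTests/analysis/test_msd.py
```

`git revert` adds a new commit that inverses the reformat, so the branch history stays intact and no force-push is needed.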

This reverts commit c90f52e.
Successfully merging this pull request may close these issues.

Support for MDAnalysis packages dealing with non-linear time dump