
Feature non linear time averaged msd #5066


Open: wants to merge 9 commits into base: develop

Conversation

gitsirsha

@gitsirsha gitsirsha commented Jun 15, 2025

Fixes #5028

Changes made in this Pull Request:

  • msd.py script was modified to support time-averaged mean square displacement calculation for dump files with non-linear time spacings.

PR Checklist

Developers Certificate of Origin

I certify that I can submit this code contribution as described in the Developer Certificate of Origin, under the MDAnalysis LICENSE.


📚 Documentation preview 📚: https://mdanalysis--5066.org.readthedocs.build/en/5066/


@github-actions github-actions bot left a comment


Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on GitHub Discussions so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.


codecov bot commented Jun 15, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.86%. Comparing base (c4af00b) to head (7ad3e09).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #5066   +/-   ##
========================================
  Coverage    93.85%   93.86%           
========================================
  Files          178      178           
  Lines        22122    22150   +28     
  Branches      3129     3133    +4     
========================================
+ Hits         20763    20791   +28     
  Misses         902      902           
  Partials       457      457           



@mrshirts mrshirts left a comment


Added some comments on the code.

# Modified by Sirsha Ganguly
#
# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps


Suggest something more descriptive here - _conclude_nonlinear() instead of _conclude_modified()

if len(sequence) < 2:
    return True
step = sequence[1] - sequence[0]
return all(sequence[i + 1] - sequence[i] == step for i in range(len(sequence) - 1))


So right now, it's failing the test. My guess is that it's going down the wrong test path, because it's interpreting whatever data is being passed in by the test as not actually being linear (maybe something like 1.0000 gets interpreted as 0.999999). A couple of options here:

  1. Add a keyword like nonlinear that controls the logic, rather than determining the logic from the steps.
  2. Using something like assert_almost_equal so floating point errors don't cause the logic to give the wrong answer (i.e. that they are not equally spaced when they actually are).
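A tolerance-based check along the lines of option 2 could look like this (a minimal sketch with a hypothetical `is_linear` helper, not the PR's actual code):

```python
import numpy as np

def is_linear(times, rtol=1e-5, atol=1e-8):
    """Return True if the frame times are (approximately) evenly spaced.

    Uses a floating-point tolerant comparison so times like
    0.9999999 vs 1.0 do not send the analysis down the wrong path.
    """
    times = np.asarray(times, dtype=float)
    if times.size < 3:
        return True
    diffs = np.diff(times)
    # Compare every step against the first step with a tolerance,
    # instead of exact `==` equality.
    return bool(np.allclose(diffs, diffs[0], rtol=rtol, atol=atol))
```

With this, `is_linear([0.0, 1.0000001, 2.0])` is `True` while a logarithmic schedule like `[1, 10, 100, 1000]` is detected as non-linear.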

Member

@IAlibay IAlibay left a comment


Here are some initial questions before we get into the weeds. I'm keen for us to end up with a single method that works for both use cases if possible.

There are probably things we can do to make that happen, but I first went to get a good idea of where things are at in this current state.

@@ -1,3 +1,25 @@
######## Documentation
Member


It's fine to do this later once the code has been settled on, but documentation lives in the module docstring or the API docstrings (see the sections enclosed by """ below).

Member


As Irfan said: The comments will not be in the final PR. The first line of the file is always

# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-

and then comes the standard comment block. The only history information (eg who modified the file or added something) that we keep inside the code is versionchanged/versionadded markup in the doc strings. Everything else is known to be available through the git history and the information in the PR/issue.

Could you please

  • remove the comments
  • copy them to the PR information

This will make it a lot easier to discuss the intention of the PR separate from the implementation.

Information on the algorithm (basically what's important for a user to know) will go into the doc string in a Notes section (we follow the numpy doc string standard and see https://userguide.mdanalysis.org/stable/contributing_code.html#guidelines-for-docstrings for MDAnalysis specific guidelines on docs).

@@ -1,3 +1,25 @@
######## Documentation
#
# Issue: The default '_conclude_simple()' function previously calculated time-averaged mean squared displacement considering linear timespacing between the frames even when a dump file with non-linear timestep gap is provided
Member


It would be better if these details were in the PR info instead - this is where a reviewer usually looks for them (and ultimately we will likely merge without this).

# Modification:
# '_conclude_modified()' is the new function added to calculate time-averaged mean squared displacement of non-linear dumps
# Note: This new function is generalized for non-linear dumps.
# Moreover, if you use this function to calculate for linear timedump this gives same result as '_conclude_simple()' which is implemented in the default version.
Member


Thanks for checking this, it's really what we need to discuss the most here. Ultimately I think the end target of this PR will be to end up with just one method. Having alternate code paths adds to user difficulty and maintenance costs.

Member


Just taking things as they are now (before we get into the weeds): could you comment on performance? Do the two methods run at similar speeds with the same inputs?

Member


What happens if someone uses the FFT-based method instead with non-linear times? Does that fail?

Member

@orbeckst orbeckst left a comment


Thank you for the PR. That's a good start for a discussion.

Initially it would really help to have the big picture up front, namely a description at a high level what you want to accomplish and how.

Additionally

  • Describe the algorithm that you're implementing.
  • How does it differ from the existing one (without FFT)?
  • Can you provide a rough performance comparison between "linear" and "nonlinear"? (I understand that applying the algorithm for equally spaced frames to a trajectory with variably spaced frames is incorrect but I want to get a sense of the necessary added computational complexity.)

I added other comments throughout but initially the big picture is more important because we are trying to figure out if there are ways to re-use code: the less code is duplicated, the lower the maintenance burden we incur, and the easier it is to test and maintain correctness.


class EinsteinMSD(AnalysisBase):

@staticmethod
def is_linear(sequence): # This function is to check if the timesteps in the array is linear or not! If it's linear default MDA code will carry-out other wise the modified code carries!
Member


Instead of comments, write doc strings.



def _conclude_modified(self):
from collections import defaultdict
Member


imports at top of file,

import collections

then use collections.defaultdict


def _conclude_modified(self):
from collections import defaultdict
print("The Dump file has non-linear time spacing")
Member


Eventually, remove all print – if you want to show status messages, use the logger.

from collections import defaultdict
print("The Dump file has non-linear time spacing")

dump_times = [ts.time for ts in self._trajectory]
Member


This iterates through the whole trajectory and may already be expensive!

Maybe unavoidable but keep it in mind.

Member


You would need to use self._sliced_trajectory instead of self._trajectory. If someone is running it with run(start=10, stop=1000, step=5) then self._sliced_trajectory will contain the proper frames that were analyzed in _single_frame. However, the time information should already be available in self.times:

Suggested change
dump_times = [ts.time for ts in self._trajectory]
dump_times = self.times

This should improve performance as you don't have to iterate through the trajectory again.

Member


And it's already a numpy array.

@gitsirsha
Author

Thank you @IAlibay @orbeckst and @mrshirts for your time to review the initial PR. Based on your suggestions I have modified a few things:

  1. Added a keyword argument non_linear instead of a check like the initial commit's is_linear.
    non_linear defaults to False and is only set to True by the user to call the new method _conclude_non_linear.
    This was done to check whether the GitHub Actions CI tests pass when the code executes the default route.

    The new method only executes when fft=False and non_linear=True.

  2. Removed the comments for documentation at the top. I will update that in the doc strings later once we are satisfied with the code.

  3. Removed the print statements.

  4. import collections at top of the file.

  5. deleted positions = np.zeros((n_frames, n_atoms, 3)) as it was not necessary.

Necessity of a new method:

  • The _conclude_simple method calculated the time-averaged MSD based on the number of frames in a dump file, not on the timestep at which each frame was dumped. So a better method is needed that performs "time-averaging", not "frame-averaging".

  • Moreover, if the new method _conclude_non_linear is used on a linearly dumped file, it generates the same MSD values as the default _conclude_simple method. This is because the new method solely uses the timestamp of each frame, which for a linearly dumped file is nothing but the frame index.

Algorithm of _conclude_non_linear to calculate time-averaged MSD:

  • It creates a dictionary/hash-map whose keys are the time differences between frames (Δt) and whose values are lists of the frame-to-frame MSD values for that particular Δt: {Δt: [msd1, msd2, ...]}.

  • The reference frame is shifted with each iteration, and each time a new key (i.e., Δt) is found it is added to the dictionary/hash-map.

  • If a duplicate Δt is found, the MSD is appended to the value (MSD list) of that specific key in the dictionary/hash-map.

  • Lastly, for each Δt in the dictionary/hash-map, the MSD values inside the list are averaged.
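The steps above can be sketched as follows (an illustrative, loop-based sketch with hypothetical names, not the PR's actual implementation):

```python
import collections
import numpy as np

def time_averaged_msd(times, positions):
    """Time-averaged MSD for (possibly) non-linearly spaced frames.

    times     : 1D sequence of frame times, length n_frames
    positions : array of shape (n_frames, n_atoms, 3)
    Returns (delta_t_values, msd) as matched 1D arrays.
    """
    n_frames = len(times)
    msd_dict = collections.defaultdict(list)  # {dt: [msd, msd, ...]}
    for i in range(n_frames):              # shifting reference frame
        for j in range(i + 1, n_frames):
            dt = times[j] - times[i]
            disp = positions[j] - positions[i]
            # MSD over particles for this frame pair
            msd_dict[dt].append(np.mean(np.sum(disp**2, axis=-1)))
    delta_t_values = np.array(sorted(msd_dict))
    msd = np.array([np.mean(msd_dict[dt]) for dt in delta_t_values])
    return delta_t_values, msd
```

For one particle moving as x = t with frames at t = 0, 1, 3, the pairs give Δt = 1, 2, 3 with MSDs 1, 4, 9, i.e. the averaging is over physical time differences, not frame indices.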

Example Performance comparison:

A coarse grained LAMMPS trajectory with N=60,000 and 21 frames (non-linearly spaced timesteps) is run with fft turned off:

It took around 2.9 s with the new method (toggled on using non_linear=True)
and around 1.6 s with the default method.

(This is not including the plotting time in the user interface)

(Attached plots: output_default and output_new.)

@mrshirts

The big picture here is that for polymer MSDs, the results are highly nonlinear in time (unlike with liquids, where the MSD becomes linear very quickly). In fact, they generally have a power-law dependence. Calculating MSD at long lag times ends up requiring a huge amount of data; you get the short-time behavior very accurately and the long-time behavior less accurately, but you don't really need the short-time behavior more accurately than the long-time behavior.

So LAMMPS has a standard option to output trajectory files where, every N steps, you output frames logarithmically: perhaps with spacing (roughly) 1, 10, 100, 1000. So for every N steps, you are only outputting 4 points. Then maybe your simulation is 100/N steps long, so you will get 100 data points for lags of 1, 10, 100, 1000, etc., and you get equal accuracy for all lags, at the cost of some precision at short times, but using far less storage space (polymer simulations can have a lot of particles!). Polymer scientists use this a lot.

The issue is that the standard code can't handle this, and I'm actually not sure if it can be handled with FFTs. So I'm not totally sure if it makes sense to have a single code path (although that would be desirable). What one should have is that one gets the same answer for LINEAR data (within machine precision) if it goes through either path, which can be automatically checked, so at least any code duplication is validated.
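For illustration, such a logarithmic dump schedule could be generated like this (hypothetical helper and numbers, not actual LAMMPS input):

```python
# Sketch of the logarithmic dump schedule described above: within each
# block of `block_size` steps, frames are written at logarithmic offsets,
# so every lag appears once per block and all lags get equal statistics.
def log_dump_steps(block_size, offsets=(1, 10, 100, 1000), n_blocks=3):
    steps = []
    for b in range(n_blocks):
        base = b * block_size
        steps.extend(base + off for off in offsets)
    return steps
```

With two blocks of 10,000 steps this yields frames at 1, 10, 100, 1000, 10001, 10010, 10100, 11000: eight frames instead of thousands, while still sampling every decade of lag time.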

Member

@orbeckst orbeckst left a comment


Overall I am ok with having a separate _conclude_nonlinear. It seems sufficiently different to warrant its own code path.

The next important step is to add tests that check that your code runs and which can be used as regression tests during further code optimization.

I also left some comments on how to use existing features of AnalysisBase in your code.

Additionally, start paying attention to the linter. Run your file through black for autoformatting.

(We'll also ask you to write more docs and add an entry to CHANGELOG.)


def _conclude_non_linear(self):

dump_times = [ts.time for ts in self._trajectory]
n_frames = len(dump_times)
Member


Just use self.n_frames

Suggested change
n_frames = len(dump_times)
n_frames = self.n_frames

Sorry, that's not properly documented but it is created with AnalysisBase._prepare_sliced_trajectory (I just updated the docs ... #5075 )



msd_dict = collections.defaultdict(list) # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}

# Looping over all the frames as if the referenced gets shifted frame to frame
Member


There's probably a way to write this all-vs-all distance calculation without loops and just with numpy operations and thus increase performance.

But for a start this is ok while you're making sure that your code produces correct results.

Member


We have functions in lib.distances that can do optimized all-vs-all distance calculations for coordinates.

There's also cdist in scipy.
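For example, the double loop over frame pairs could in principle be replaced by a single NumPy broadcast (a sketch with hypothetical names; note the memory caveat in the docstring):

```python
import numpy as np

def all_pair_sq_disp(positions):
    """Squared displacements for every frame pair via broadcasting.

    positions : (n_frames, n_atoms, 3)
    Returns sq of shape (n_frames, n_frames, n_atoms) with
    sq[i, j, a] = |r_a(frame j) - r_a(frame i)|^2.
    Caveat: builds an (n_frames, n_frames, n_atoms, 3) intermediate,
    so memory grows quadratically with the number of frames.
    """
    diff = positions[None, :, :, :] - positions[:, None, :, :]
    return np.sum(diff**2, axis=-1)
```

The Δt-binning step would still be needed afterwards, but the inner distance computation becomes a single vectorized operation.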

Author


We will look into optimizing it once MSD.results.msds_by_particle behaves as per the documentation. There will presumably be a slight performance hit when this is implemented for the non-linear method.

Member


Add a TODO comment as reminder to look into optimizing performance of the code

Comment on lines 465 to 466
self.results.timeseries = delta_t_values
self.results.msds_by_particle = avg_msds
Member


Turn the contents of results into numpy arrays. Your code should produce results in the same way as documented in https://docs.mdanalysis.org/stable/documentation_pages/analysis/msd.html#MDAnalysis.analysis.msd.EinsteinMSD .

Author


self.results.timeseries produces desired results the same way.
self.results.msds_by_particle implementation is still in progress, will update shortly with the performance check and results.

@gitsirsha
Author

gitsirsha commented Jul 14, 2025

Current changes:
- MSD.results.timeseries now stores the MSD values, and MSD.results.delta_t_values stores the Δt values for which time-averaged MSDs are calculated (in the previous commit these were stored in results.timeseries). This makes sure the MSD.results.timeseries array stores results consistent with the documentation.

- A simple plot can be generated with:
msd = MSD.results.timeseries
ax.plot(MSD.results.delta_t_values, msd)

or simply ax.plot(MSD.results.delta_t_values, MSD.results.timeseries)

  • No performance hit and deals with sliced trajectory

Future implementation in progress:
MSD.results.msds_by_particle (a 2D array) is documented as "The MSD of each individual particle with respect to lag-time". For uniform dumps this is very straightforward, with each row of the array corresponding to a lag-time: each lag (Δt) corresponds to a fixed number of frames, so we can neatly store MSD per particle, per lag-index.

However, in the non-linear (irregular Δt) case, msds_by_particle is defined as the time-averaged MSD of each particle computed at the actual Δt values (i.e., time differences between all frame pairs), rather than at fixed frame lags. This accounts for non-uniform sampling and ensures the MSD is evaluated based on physical time rather than frame index.

msds_by_particle[Δt_index, i] = ⟨|rᵢ(t + Δt) − rᵢ(t)|²⟩_t

In short, for the _conclude_non_linear() method, MSD.results.msds_by_particle will be a 2D array whose rows correspond to the unique Δt values (as found in MSD.results.delta_t_values) and whose columns are the MSD of each particle for that specific Δt. This is also consistent with the array structure mentioned above, ensuring easy plotting of particle-specific MSD over time lags.
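The described layout can be sketched as follows (a hypothetical helper, not the PR's code):

```python
import collections
import numpy as np

def msds_by_particle_nonlinear(times, positions):
    """Per-particle, time-averaged MSD at each unique delta-t.

    positions : (n_frames, n_atoms, 3)
    Returns (delta_t_values, msds_by_particle) where msds_by_particle
    has shape (len(delta_t_values), n_atoms): row k holds the per-particle
    MSD at lag delta_t_values[k].
    """
    n_frames = positions.shape[0]
    per_dt = collections.defaultdict(list)  # {dt: [per-particle MSD arrays]}
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            dt = times[j] - times[i]
            per_dt[dt].append(np.sum((positions[j] - positions[i])**2, axis=-1))
    delta_t_values = np.array(sorted(per_dt))
    msds = np.array([np.mean(per_dt[dt], axis=0) for dt in delta_t_values])
    return delta_t_values, msds
```

Averaging the columns of each row recovers the ensemble-averaged timeseries, so the two result arrays stay mutually consistent.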

I will push once this implementation is done shortly and update you with the performance. Thank you again for your valuable reviews!

Member

@orbeckst orbeckst left a comment


I have some comments on the code structure.

The biggest thing is to add tests that exercise all your code and check that it produces the results you expect. Have a look at the code coverage. It should be >= 94%.

You'll also need to update the docs and the CHANGELOG.

@@ -301,7 +304,7 @@ class EinsteinMSD(AnalysisBase):
.. versionadded:: 2.0.0
"""

def __init__(self, u, select="all", msd_type="xyz", fft=True, **kwargs):
def __init__(self, u, select="all", msd_type="xyz", fft=True, non_linear = False, **kwargs):
Member


Document new argument. Should look something like

non_linear : bool
    (describe what the option enables)

    .. versionadded:: 2.10.0

(Make sure that there's space under the versionadded and visually check the built docs that it's formatted correctly – under Checks there's the Read The Docs build that has the built docs)

Member


Also add an entry to CHANGELOG under Enhancements documenting the new features including the new kwarg.

Member


Run black over your code; I am pretty sure it should not leave spaces around = in argument lists.

@@ -338,6 +342,7 @@ def __init__(self, u, select="all", msd_type="xyz", fft=True, **kwargs):
# result
self.results.msds_by_particle = None
self.results.timeseries = None
self.results.delta_t_values = None
Member


add to docs (with .. versionadded:: 2.10.0)

@@ -383,10 +388,12 @@ def _single_frame(self):
]

def _conclude(self):
if self.fft:
if self.fft == True:
Member


Leave as is; that is how it's normally written in Python.

Comment on lines 386 to 393
if self.fft:
if self.fft == True:
self._conclude_fft()
else:
elif self.non_linear != True:
self._conclude_simple()
elif self.non_linear:
self._conclude_non_linear()
Member


Let's group this logically:

if LINEAR
   do either FFT or normal
else
   do non-linear

This keeps the logic MUCH clearer and is more robust – as written you don't have an else, so with weird settings _conclude will silently do nothing.

@gitsirsha
Author

gitsirsha commented Jul 15, 2025

I have some comments on the code structure.

The biggest thing is to add tests that exercise all your code and check that it produces the results you expect. Have a look at the code coverage. It should be >= 94%.

You'll also need to update the docs and the CHANGELOG.

Hi! Thanks for the feedback. I'm not very familiar with writing tests in Python yet. Could you please guide me or point me to an example from this repo so I can follow the same pattern?
(For now I am running the code locally with an example lammpsdump file every time I update the code to check for results)

I see the codecov/project check is at 93.5%. Can you please let me know how it can be increased? Is it just by adding tests?

Happy to update the CHANGELOG and docs right after!

@orbeckst
Member

Testing

Writing test takes some time to learn. We use pytest as our testing framework so you may have to keep looking up things there.

See https://userguide.mdanalysis.org/stable/testing.html#testing for more information.

For the MSD code specifically, start (and add your tests) in testsuite/MDAnalysisTests/analysis/test_msd.py.

Coverage

For coverage look in the status check box: Right now it says codecov/patch – 19.35% of diff, which means less than 20% of the lines of codes that you touched are being tested. This number should be >94%. You can click on the status check to see which lines specifically are not covered and you can also look at the Files changed tab on this PR and see the codecov annotations of missing coverage.


@orbeckst
Member

You should add your lammpsdump data file to the testsuite (see develop/testsuite/MDAnalysisTests/datafiles.py). Make it as small as possible and bz2 compress it. Ideally, it should be much less than 1 MB in size. Otherwise, produce a small file for testing. It does not need to produce scientifically solid results, just exercise all the code that you're writing in the same way that "real" data would.
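For reference, the compression step could be as simple as the following (illustrative shell commands in a throwaway directory; the filename and stand-in file contents are assumptions, and MDAnalysis readers generally open .bz2 files transparently):

```shell
# Work in a temporary directory with a tiny stand-in dump file,
# then bz2-compress it. The testsuite can point at the .bz2 directly.
cd "$(mktemp -d)"
printf 'ITEM: TIMESTEP\n0\n' > test_non_linear.dump   # stand-in data
bzip2 -9 test_non_linear.dump    # produces test_non_linear.dump.bz2
```

The `-9` flag picks maximum compression, which matters most for trajectory data shipped inside the test package.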

@orbeckst orbeckst self-assigned this Jul 18, 2025
@orbeckst
Member

I am going to assign myself as PR shepherd but if anyone else wants to take over or be involved, please do!

@gitsirsha gitsirsha force-pushed the feature-Non-linear-time-averaged-msd branch from 1453bd6 to 871faa2 Compare July 22, 2025 16:32
@gitsirsha
Author

gitsirsha commented Jul 22, 2025

With the new commit:

  • CHANGELOG has been edited
  • Docstrings have been edited
  • An initial test has been added to test_msd.py

This initial test currently checks that results.timeseries and results.delta_t_values match the expected values. (NOTE: results.msds_by_particle is yet to be implemented in the original msd.py code inside the new method.)

It seems like some of the automated tests are failing although when I did pytest testsuite/MDAnalysisTests/analysis/test_msd.py locally it passed. I am a little confused about this.

Ran black after each .py file was edited.

@gitsirsha
Author

It seems from the errors that the filename needs to be added in datafiles.py > __all__

@gitsirsha
Author

It seems like some of the automated tests are failing although when I did pytest testsuite/MDAnalysisTests/analysis/test_msd.py locally it passed. I am a little confused about this.

fixed by adding the test file name to __all__ in datafiles.py.
codecov/patch has increased to 100%.

The linter seems to be giving errors for msd.py although black was run after the edits.

@orbeckst
Member

Did you run black with version 24? (Check black --version and if necessary pin to 24)

@gitsirsha
Author

Did you run black with version 24? (Check black --version and if necessary pin to 24)

I was running black v25. Downgraded and ran black 24.10.0 with [c90f52e]

Member

@orbeckst orbeckst left a comment


Good progress! There are still some major things to work on:

  • Test that your code also works with different start/stop/step
  • Test that it works not only with xyz but also with other combinations

And some minor/easy to do:

  • update docs/comments (see inline comments)
  • revert black formatting for any files that you didn't touch — only reformat the files that you've been working on, please.

@@ -298,10 +305,12 @@ class EinsteinMSD(AnalysisBase):
Number of particles MSD was calculated over.


.. versionadded:: 2.0.0
Member


This has to stay. It records when the class was originally added.

Instead, you should add a .. versionchanged:: 2.10.0 with a short explanation. We'll also add .. versionadded to the new kwarg, see comment above.

@@ -290,6 +295,8 @@ class EinsteinMSD(AnalysisBase):
The averaged MSD over all the particles with respect to lag-time.
results.msds_by_particle : :class:`numpy.ndarray`
The MSD of each individual particle with respect to lag-time.
results.delta_t_values : :class:`numpy.ndarray`
Array of unique Δt (time differences) at which time-averaged MSD values are computed.
Member


Indicate when this new results attribute was added so that users understand which version of the code they need

Suggested change
Array of unique Δt (time differences) at which time-averaged MSD values are computed.
Array of unique Δt (time differences) at which time-averaged MSD values are
computed.
.. versionadded:: 2.10.0

non_linear : bool
If ``True``, calculates MSD for trajectory where frames are
non-linearly dumped. To use this set `fft=False`.
Defaults to ``False``.
Member


Indicate when this new kwarg was added so that users understand which version of the code they need in order to use it

Suggested change
Defaults to ``False``.
Defaults to ``False``.
.. versionadded:: 2.10.0

(needs 2 newlines below so that formatting works, manually check the rendered docs from the status checks)

@@ -298,10 +305,12 @@ class EinsteinMSD(AnalysisBase):
Number of particles MSD was calculated over.


.. versionadded:: 2.0.0
.. versionadded:: 2.10.0
Member


Suggested change
.. versionadded:: 2.10.0
.. versionadded:: 2.0.0
.. versionchanged:: 2.10.0
Added ability to calculate MSD from samples that are not linearly spaced with the
new `non_linear` keyword argument.


msd_dict = collections.defaultdict(list) # Dictionary to collect MSDs: {Δt: [msd1, msd2, ...]}

# Looping over all the frames as the reference frame gets shifted one frame at a time
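The per-Δt accumulation in this snippet can be sketched end-to-end in plain NumPy (a minimal illustration of the non-linear time-averaged scheme; the function and variable names here are hypothetical, not the PR's actual implementation):

```python
import collections

import numpy as np


def time_averaged_msd(positions, times):
    """Time-averaged MSD for frames with arbitrary (non-linear) time spacing.

    positions : (n_frames, n_atoms, 3) array of unwrapped coordinates
    times     : (n_frames,) array of simulation times
    Returns (delta_t_values, msd_values), both sorted by delta_t.
    """
    n_frames = len(times)
    msd_dict = collections.defaultdict(list)  # {Δt: [msd samples]}
    # O(n_frames^2): shift the reference frame over every later frame
    for i in range(n_frames):
        for j in range(i + 1, n_frames):
            dt = times[j] - times[i]
            disp = positions[j] - positions[i]
            # squared displacement per atom, averaged over atoms
            msd_dict[dt].append(np.mean(np.sum(disp**2, axis=-1)))
    delta_t = np.array(sorted(msd_dict))
    msd = np.array([np.mean(msd_dict[dt]) for dt in delta_t])
    return delta_t, msd
```

For a particle moving ballistically (x = t) sampled at the non-uniform times 0, 1, 3, 7, this recovers MSD(Δt) = Δt² at each of the six distinct lag times, which is a quick sanity check of the pairing logic.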

Add a TODO comment as reminder to look into optimizing performance of the code


Can you bz2 compress this file? Should still work.

@@ -906,6 +907,7 @@

SURFACE_PDB = (_data_ref / "surface.pdb.bz2").as_posix()
SURFACE_TRR = (_data_ref / "surface.trr").as_posix()
LAMMPSDUMP_non_linear = (_data_ref / "custom_non_linear/test_non_linear.dump").as_posix()

.bz2


Move near other LAMMPS files and keep them together (also in the __all__ list)


after line ~785 or so

@@ -389,6 +389,7 @@
"SURFACE_PDB", # 111 FCC lattice topology for NSGrid bug #2345
"SURFACE_TRR", # full precision coordinates for NSGrid bug #2345
"DSSP", # DSSP test suite
"LAMMPSDUMP_non_linear"

list near other LAMMPS files


and add a short comment explaining what it is

Comment on lines +66 to +70
* Added new method '_conclude_non_linear` to calculate time-averaged
mean square displacement in `package/MDAnalysis/analysis/msd.py`,
new bool keyword argument `non_linear=True` under `EinsteinMSD`
enables execution of this method.
(Issue #5028, PR #5066)

Focus on the user-visible changes and keep it short

Suggested change
* Added new method '_conclude_non_linear` to calculate time-averaged
mean square displacement in `package/MDAnalysis/analysis/msd.py`,
new bool keyword argument `non_linear=True` under `EinsteinMSD`
enables execution of this method.
(Issue #5028, PR #5066)
* Added capability to calculate MSD from frames with irregular (non-linear)
time spacing in analysis.msd.EinsteinMSD with keyword argument
`non_linear=True` (Issue #5028, PR #5066)

Also, traditionally we add the newest changes under the heading, i.e., move it to directly under "Enhancements"



def test_msd_non_linear():
u = mda.Universe(LAMMPSDUMP_non_linear, format="LAMMPSDUMP")

Make the universe a fixture, e.g. u_nonlinear. See near top of the file as examples.

@gitsirsha
Author

* revert black formatting for any files that you didn't touch — _only reformat the files that you've been working on_, please.

Can you please let me know how to revert the changes from the recent commit locally? When I ran black v24.10.0 on msd.py it said `1 file unchanged`, so I ran `black .`, which changed every .py file.

Thank you for all the suggestions; I will get to those shortly!


orbeckst commented Jul 22, 2025

If you look at https://github.com/MDAnalysis/mdanalysis/pull/5066/files you see that 260(!) files were changed. I would look into reverting the commit that did that.

Alternatively, find the files where you definitely added code (msd.py, test_msd.py, CHANGELOG, AUTHORS, ...?) and revert everything else. Look at git revert, git reset (be careful), and also git restore; see "Reset, restore and revert" in git(1) for the differences between the three commands.
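One way the first option could look in practice (a sketch only; c90f52e is the blanket-reformat commit reverted below, and the file paths are the ones this PR actually edits):

```shell
# Undo the commit that reformatted the whole tree in one step,
# then re-run black only on the files this PR touches.
git revert --no-edit c90f52e
black package/MDAnalysis/analysis/msd.py \
      package/MDAnalysisTests/analysis/test_msd.py
```

`git revert` adds a new commit that inverses the reformat, so the branch history stays intact and no force-push is needed.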

This reverts commit c90f52e.
Successfully merging this pull request may close these issues.

Support for MDAnalysis packages dealing with non-linear time dump