Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite ovf reading routine #121

Merged
merged 42 commits into from Jan 31, 2022
Merged

Conversation

lang-m
Copy link
Member

@lang-m lang-m commented Jan 29, 2022

Execution times for 1M cells:

  • Reading
mode   old    new   speedup
==== ======= ====== =======
bin4 1730 ms  21 ms   82
bin8 1860 ms  36 ms   52
text 4920 ms 401 ms   12
  • Writing
mode    old    new   speedup
==== ======== ====== =======
bin4 63000 ms  56 ms   1125
bin8 64000 ms  84 ms    762
text 69000 ms 4510 ms    15

Filesizes are

  • 2.9M for bin4
  • 5.8M for bin8
  • 15M for txt

@lang-m lang-m changed the title Rewrite omf reading routine Rewrite ovf reading routine Jan 29, 2022
@codecov-commenter
Copy link

codecov-commenter commented Jan 29, 2022

Codecov Report

Merging #121 (d55d124) into master (201a10e) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
- Coverage   95.51%   95.46%   -0.05%     
==========================================
  Files          20       20              
  Lines        2027     1984      -43     
==========================================
- Hits         1936     1894      -42     
+ Misses         91       90       -1     
Impacted Files Coverage Δ
discretisedfield/field.py 97.59% <100.00%> (-0.01%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 201a10e...d55d124. Read the comment docs.

Copy link
Member

@marijanbeg marijanbeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful :) A few suggestions for you to consider. Please ignore if they do not make sense. Before merging, it would be cool if we could measure the execution time before and after using pandas/numpy ;)

discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Show resolved Hide resolved
@lang-m
Copy link
Member Author

lang-m commented Jan 29, 2022

Here is an example comparison for the execution time for 1e6 cells

mode   old    new   speedup
==== ======= ====== =======
bin4 1730 ms  21 ms   82
bin8 1860 ms  36 ms   52
text 4920 ms 401 ms   12

@marijanbeg
Copy link
Member

Wow! That is a serious speed-up. We should now definitely move to bin8 as a default.

@lang-m
Copy link
Member Author

lang-m commented Jan 29, 2022

[commit 9584e61 is not showing up]

@lang-m
Copy link
Member Author

lang-m commented Jan 30, 2022

@marijanbeg Speedup for the rewritten writing method (1M cells):

1 million cells 

mode    old    new   speedup
==== ======== ====== =======
bin4 63000 ms  56 ms   1125
bin8 64000 ms  84 ms    762
text 69000 ms 4510 ms    15

Copy link
Member

@marijanbeg marijanbeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing improvement and simplification.

discretisedfield/field.py Outdated Show resolved Hide resolved
discretisedfield/field.py Show resolved Hide resolved
discretisedfield/field.py Show resolved Hide resolved
discretisedfield/field.py Outdated Show resolved Hide resolved
@marijanbeg
Copy link
Member

It is embarrassing to see that the writing routine took 60+ seconds to write the file. :(

@marijanbeg
Copy link
Member

Let's add a comment above that line that ndarray.tofile is about 20% slower to know in the future.

@marijanbeg
Copy link
Member

I am impressed by all the changes, please feel free to merge when you are ready ;)

@fangohr
Copy link
Member

fangohr commented Jan 31, 2022

Excellent. I am pleased the innocent suggestion of testing pandas to read text-based data has led to such speed-ups, also for binary data, and also better code readability.

I would suggest that we record the speed-ups you have measured in some kind of changelog.txt for this package:

  • it will be useful for us to be able to look up at which version the speed improvements come in
  • we should also mention the speed up in the next release (and then it will be good to be able to look up significant changes).

@fangohr
Copy link
Member

fangohr commented Jan 31, 2022

More than a remark than review feedback:

As we are drifting into performance behaviour of the source, we could introduce performance regression checks:

  • have a set of tests that carry out reading and writing of data (as was done here)
  • measure performance and record the execution time as part of the CI

This would catch deviations where, by accident, we have reduced execution performance. However, this is only useful if the hardware on which the tests run doesn't change. It increases additional complexity as we would need to track those execution times. On balance, I would say it is not worth doing this here.

@lang-m
Copy link
Member Author

lang-m commented Jan 31, 2022

@fangohr Changes for all packages are collected in the website repository, in particular the latest changes are in this PR: ubermag/ubermag.github.io#2
The speedup is mentioned here: https://github.com/ubermag/ubermag.github.io/blob/b1e0532c4e3e48a5e15ffae2db76ebc62b8e8b67/source/changelog.rst (-> discretisedfield -> bullet point 4)
I can add more details about the actual numbers in there (i.e. show a table containing the numbers).

To reduce the number of files and keep it more clean I'd like to avoid introducing an additional file in discretisedfield (that we need to remember to look into before the next release).

@fangohr
Copy link
Member

fangohr commented Jan 31, 2022

@fangohr Changes for all packages are collected in the website repository, in particular the latest changes are in this PR: ubermag/ubermag.github.io#2 The speedup is mentioned here: https://github.com/ubermag/ubermag.github.io/blob/b1e0532c4e3e48a5e15ffae2db76ebc62b8e8b67/source/changelog.rst (-> discretisedfield -> bullet point 4) I can add more details about the actual numbers in there (i.e. show a table containing the numbers).

Yes, please.

To reduce the number of files and keep it more clean I'd like to avoid introducing an additional file in discretisedfield (that we need to remember to look into before the next release).

I agree that's a better solution.

@fangohr fangohr closed this Jan 31, 2022
@lang-m lang-m reopened this Jan 31, 2022
@lang-m
Copy link
Member Author

lang-m commented Jan 31, 2022

Changelog is now updated.

@lang-m lang-m merged commit 7da908b into master Jan 31, 2022
@lang-m lang-m deleted the rewrite-omf-reading-writing-routines branch January 31, 2022 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants