Rewrite OVF reading routine #121
Conversation
Codecov Report
@@ Coverage Diff @@
## master #121 +/- ##
==========================================
- Coverage 95.51% 95.46% -0.05%
==========================================
Files 20 20
Lines 2027 1984 -43
==========================================
- Hits 1936 1894 -42
+ Misses 91 90 -1
Continue to review full report at Codecov.
Beautiful :) A few suggestions for you to consider. Please ignore if they do not make sense. Before merging, it would be cool if we could measure the execution time before and after using pandas/numpy ;)
Additional speedup ~ 8
…rmag/discretisedfield into rewrite-omf-reading-writing-routines
Here is an example comparison for the execution time for 1e6 cells
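The measured comparison itself is not reproduced in this thread, but a minimal, self-contained sketch of how such a timing comparison could look is shown below. It uses a synthetic whitespace-separated data block, not the actual OVF files from this PR, and the parsing functions are illustrative stand-ins, not the discretisedfield implementation.

```python
import io
import timeit

import pandas as pd

# Fake OVF-like text data block: one "x y z" triple per line.
n = 100_000
lines = "\n".join("0.1 0.2 0.3" for _ in range(n))

def naive_parse(text):
    # Pure-Python line-by-line parsing, similar in spirit to the old routine.
    return [[float(v) for v in line.split()] for line in text.splitlines()]

def pandas_parse(text):
    # sep=r"\s+" splits on whitespace; header=None because the data
    # block carries no column names.
    return pd.read_csv(io.StringIO(text), sep=r"\s+", header=None).to_numpy()

t_naive = timeit.timeit(lambda: naive_parse(lines), number=3)
t_pandas = timeit.timeit(lambda: pandas_parse(lines), number=3)
print(f"naive:  {t_naive:.3f} s")
print(f"pandas: {t_pandas:.3f} s")
```

The absolute times depend on the machine; only the ratio between the two is meaningful.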
Wow! That is a serious speed-up. We should now definitely move to bin8 as a default.
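For context, here is a minimal numpy sketch of why reading a bin8-style data block is fast: the whole payload is consumed in one vectorised call instead of being parsed line by line. It assumes the OVF 2.0 convention of a leading float64 check value of 123456789012345.0; this is an illustration, not the PR's implementation.

```python
import numpy as np

# Build a fake bin8 payload: check value followed by 10 cells x 3 components.
check = np.array([123456789012345.0], dtype="<f8")
cells = np.arange(30, dtype="<f8")
payload = check.tobytes() + cells.tobytes()

# np.frombuffer reads the entire block in one vectorised call.
data = np.frombuffer(payload, dtype="<f8")
assert data[0] == 123456789012345.0  # bin8 check value
field = data[1:].reshape(-1, 3)      # one row per cell
print(field.shape)  # (10, 3)
```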
[commit 9584e61 is not showing up]
@marijanbeg Speedup for the rewritten writing method (1M cells):
Amazing improvement and simplification.
It is embarrassing to see that the writing routine took 60+ seconds to write the file. :(
Let's add a comment above that line that
I am impressed by all the changes; please feel free to merge when you are ready ;)
Excellent. I am pleased that the innocent suggestion of trying pandas to read text-based data has led to such speed-ups, also for binary data, and to better code readability. I would suggest that we record the speed-ups you have measured in some kind of changelog.
More a remark than review feedback: as we are drifting into the performance behaviour of the code, we could introduce performance regression checks.
These would catch cases where we accidentally reduce execution performance. However, this is only useful if the hardware on which the tests run does not change, and it adds complexity because we would need to track those execution times. On balance, I would say it is not worth doing here.
@fangohr Changes for all packages are collected in the website repository; in particular, the latest changes are in this PR: ubermag/ubermag.github.io#2. To reduce the number of files and keep things cleaner, I'd like to avoid introducing an additional file in
Yes, please.
I agree that's a better solution.
Changelog is now updated. |
Execution times for 1M cells:
File sizes are:
- bin4: 2.9M
- bin8: 5.8M
- txt: 15M
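The exact 2:1 ratio between bin8 and bin4, and the extra cost of text output, can be reproduced with a plain numpy sketch. This is illustrative only, not the discretisedfield writer, and the cell count is arbitrary.

```python
import io

import numpy as np

values = np.random.rand(10_000, 3)  # 10k cells, 3 vector components

bin8 = values.astype("<f8").tobytes()  # 8 bytes per value
bin4 = values.astype("<f4").tobytes()  # 4 bytes per value

buf = io.BytesIO()
np.savetxt(buf, values)                # default '%.18e' text format
txt_size = buf.getbuffer().nbytes

print(len(bin4), len(bin8), txt_size)
# bin8 is exactly twice bin4; the text version is larger still because
# every value becomes a ~25-character decimal string plus separators.
```

The measured 2.9M/5.8M/15M sizes above follow the same pattern, with headers and the chosen text precision accounting for the exact figures.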