Rewrite ovf reading routine #121

lang-m · 2022-01-29T13:42:50Z

Execution times for 1M cells:

Reading

mode   old    new   speedup
==== ======= ====== =======
bin4 1730 ms  21 ms   82
bin8 1860 ms  36 ms   52
text 4920 ms 401 ms   12

Writing

mode    old    new   speedup
==== ======== ====== =======
bin4 63000 ms  56 ms   1125
bin8 64000 ms  84 ms    762
text 69000 ms 4510 ms    15

File sizes:

2.9M for bin4
5.8M for bin8
15M for text

codecov-commenter · 2022-01-29T13:47:40Z

Codecov Report

Merging #121 (d55d124) into master (201a10e) will decrease coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #121      +/-   ##
==========================================
- Coverage   95.51%   95.46%   -0.05%     
==========================================
  Files          20       20              
  Lines        2027     1984      -43     
==========================================
- Hits         1936     1894      -42     
+ Misses         91       90       -1

Impacted Files	Coverage Δ
discretisedfield/field.py	`97.59% <100.00%> (-0.01%)`	⬇️


marijanbeg

Beautiful :) A few suggestions for you to consider. Please ignore if they do not make sense. Before merging, it would be cool if we could measure the execution time before and after using pandas/numpy ;)
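For readers following along, the kind of before/after measurement requested here can be sketched as below. This is an illustrative comparison only, not the code from this PR: it decodes the same little-endian float32 buffer once with `struct.unpack` and once with `np.frombuffer`. The helper names and the array size are made up for the example, and the timings will differ from machine to machine.

```python
import struct
import time

import numpy as np


def decode_with_struct(raw: bytes) -> list:
    """Decode little-endian float32 values into a list of Python floats."""
    count = len(raw) // 4
    return list(struct.unpack(f"<{count}f", raw))


def decode_with_numpy(raw: bytes) -> np.ndarray:
    """Interpret the same buffer as a float32 array without creating Python floats."""
    return np.frombuffer(raw, dtype="<f4")


# Hypothetical payload: 3 components for 100_000 cells.
values = np.arange(300_000, dtype="<f4")
raw = values.tobytes()

start = time.perf_counter()
from_struct = decode_with_struct(raw)
t_struct = time.perf_counter() - start

start = time.perf_counter()
from_numpy = decode_with_numpy(raw)
t_numpy = time.perf_counter() - start

print(f"struct: {t_struct * 1e3:.2f} ms, numpy: {t_numpy * 1e3:.2f} ms")
```

Both functions return the same numbers; only the time spent creating Python objects differs.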

lang-m · 2022-01-29T17:19:12Z

Here is an example comparison for the execution time for 1e6 cells

mode   old    new   speedup
==== ======= ====== =======
bin4 1730 ms  21 ms   82
bin8 1860 ms  36 ms   52
text 4920 ms 401 ms   12

marijanbeg · 2022-01-29T17:21:18Z

Wow! That is a serious speed-up. We should now definitely move to bin8 as a default.

lang-m · 2022-01-29T17:21:49Z

[commit 9584e61 is not showing up]

lang-m · 2022-01-30T10:50:28Z

@marijanbeg Speedup for the rewritten writing method (1M cells):

mode    old    new   speedup
==== ======== ====== =======
bin4 63000 ms  56 ms   1125
bin8 64000 ms  84 ms    762
text 69000 ms 4510 ms    15

marijanbeg

Amazing improvement and simplification.

marijanbeg · 2022-01-30T11:37:45Z

It is embarrassing to see that the writing routine took 60+ seconds to write the file. :(

marijanbeg · 2022-01-30T13:04:07Z

Let's add a comment above that line noting that ndarray.tofile is about 20% slower, so that we know in the future.
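A minimal sketch of how the two ways of writing an array could be compared is given below. It assumes the alternative to `ndarray.tofile` is writing the result of `ndarray.tobytes()` to an open file, which this thread does not spell out. The function names, array shape, and repeat count are hypothetical, and the 20% figure quoted above is not reproduced here because it depends on the system.

```python
import os
import tempfile
import time

import numpy as np


def write_with_tofile(array, path):
    """Write the raw array data using ndarray.tofile."""
    with open(path, "wb") as f:
        array.tofile(f)


def write_with_tobytes(array, path):
    """Write the raw array data via an intermediate bytes object."""
    with open(path, "wb") as f:
        f.write(array.tobytes())


def best_time(func, *args, repeats=5):
    """Return the shortest wall-clock time of several calls, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Hypothetical data: 100_000 cells with 3 components each.
data = np.random.default_rng(0).random((100_000, 3))

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "tofile.bin")
    path_b = os.path.join(tmp, "tobytes.bin")
    t_tofile = best_time(write_with_tofile, data, path_a)
    t_tobytes = best_time(write_with_tobytes, data, path_b)
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        identical = fa.read() == fb.read()

print(f"tofile: {t_tofile * 1e3:.2f} ms, tobytes: {t_tobytes * 1e3:.2f} ms")
print(f"files identical: {identical}")
```

Both variants produce byte-for-byte identical files for a C-contiguous array, so the choice between them is purely a matter of speed and readability.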

marijanbeg · 2022-01-30T14:51:59Z

I am impressed by all the changes, please feel free to merge when you are ready ;)

fangohr · 2022-01-31T09:33:45Z

Excellent. I am pleased the innocent suggestion of testing pandas to read text-based data has led to such speed-ups, also for binary data, and also better code readability.

I would suggest that we record the speed-ups you have measured in some kind of changelog.txt for this package:

- it will be useful for us to be able to look up at which version the speed improvements came in
- we should also mention the speed-up in the next release (and then it will be good to be able to look up significant changes).

fangohr · 2022-01-31T09:35:52Z

More a remark than review feedback:

As we are drifting into the performance behaviour of the source, we could introduce performance regression checks:

- have a set of tests that carry out reading and writing of data (as was done here)
- measure performance and record the execution time as part of the CI

This would catch cases where we have accidentally reduced execution performance. However, it is only useful if the hardware on which the tests run doesn't change, and it adds complexity because we would need to track those execution times. On balance, I would say it is not worth doing this here.
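For reference, such a check could look roughly like the sketch below. It is not part of this PR and uses only the standard library so that it stays self-contained. The `roundtrip` workload is a stand-in for reading and writing a field, and `TIME_BUDGET_S` is a hypothetical threshold that would have to be calibrated for the CI hardware.

```python
import array
import os
import tempfile
import time


def best_of(func, repeats=3):
    """Return the shortest wall-clock time of several runs, in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def roundtrip(path, n_values=100_000):
    """Stand-in workload: write doubles to disk and read them back."""
    data = array.array("d", range(n_values))
    with open(path, "wb") as f:
        data.tofile(f)
    loaded = array.array("d")
    with open(path, "rb") as f:
        loaded.fromfile(f, n_values)
    return loaded == data


# Hypothetical budget; a real value would need calibrating per CI machine.
TIME_BUDGET_S = 5.0


def test_roundtrip_stays_within_budget():
    """Fail if the round trip is incorrect or slower than the budget."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "field.bin")
        assert roundtrip(path)
        elapsed = best_of(lambda: roundtrip(path))
    assert elapsed < TIME_BUDGET_S, f"round trip took {elapsed:.3f} s"
    return elapsed
```

Taking the best of several runs reduces noise from other processes, but it does not remove the dependence on hardware noted above.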

lang-m · 2022-01-31T09:53:41Z

@fangohr Changes for all packages are collected in the website repository, in particular the latest changes are in this PR: ubermag/ubermag.github.io#2
The speedup is mentioned here: https://github.com/ubermag/ubermag.github.io/blob/b1e0532c4e3e48a5e15ffae2db76ebc62b8e8b67/source/changelog.rst (-> discretisedfield -> bullet point 4)
I can add more details about the actual numbers in there (i.e. show a table containing the numbers).

To reduce the number of files and keep it more clean I'd like to avoid introducing an additional file in discretisedfield (that we need to remember to look into before the next release).

fangohr · 2022-01-31T10:14:39Z

> @fangohr Changes for all packages are collected in the website repository, in particular the latest changes are in this PR: ubermag/ubermag.github.io#2
> The speedup is mentioned here: https://github.com/ubermag/ubermag.github.io/blob/b1e0532c4e3e48a5e15ffae2db76ebc62b8e8b67/source/changelog.rst (-> discretisedfield -> bullet point 4)
> I can add more details about the actual numbers in there (i.e. show a table containing the numbers).

Yes, please.

> To reduce the number of files and keep it more clean I'd like to avoid introducing an additional file in discretisedfield (that we need to remember to look into before the next release).

I agree that's a better solution.

lang-m · 2022-01-31T11:33:06Z

Changelog is now updated.

lang-m added 7 commits January 29, 2022 12:34

New reading method.

1964fd4

Fix wrong end.

cce8bf8

Read data split over multiple lines.

8998044

Fix: wrong reading of the binary data.

b171e4a

Remove old code.

7eac82e

More clear reading of ovf 1 or 2

35bf96a

Typecast not required.

17e3cce

lang-m changed the title ~~Rewrite omf reading routine~~ Rewrite ovf reading routine Jan 29, 2022

lang-m requested review from marijanbeg and fangohr January 29, 2022 13:47

marijanbeg reviewed Jan 29, 2022

lang-m added 2 commits January 29, 2022 15:45

Compute number of bytes.

6651249

Use numpy to read binary data.

07ceb5d

Additional speedup ~ 8

marijanbeg reviewed Jan 29, 2022

lang-m and others added 15 commits January 29, 2022 16:58

Defer type conversions.

188bc60

Use pandas to_numpy.

ac4cb2c

Remove print statement.

e5764d2

Conditional expressions

bfdff38

Simplify expression for major version number.

65e0192

Merge branch 'rewrite-omf-reading-writing-routines' of github.com:ubermag/discretisedfield into rewrite-omf-reading-writing-routines

0f84ace

Linelength.

8f2d525

Move to another if

ae9c447

Only one use of struct.unpack

00d6de9

Merge branch 'rewrite-omf-reading-writing-routines' of github.com:ubermag/discretisedfield into rewrite-omf-reading-writing-routines

040b9b3

Remove variable.

718522b

Rename variable.

27f0a74

More explicit tuple for transpose.

59697dd

Version newline independent.

b6f3b6d

Explicitly search for version number.

6e6e2bc

lang-m added 2 commits January 30, 2022 11:47

Ovf writing rewritten.

8abebcd

Reordering; remove old code.

ea5a53a

Allow arbitrary dimension.

442a151

marijanbeg reviewed Jan 30, 2022

lang-m added 3 commits January 30, 2022 12:56

Address @marijanbeg's comments

212f0da

Fix representation order.

01e978a

Line break.

d506cdc

marijanbeg and others added 6 commits January 30, 2022 13:05

An attempt to simplify conditions.

c1cb35a

Add comment.

329d3df

Simplify conditions.

054eb00

Additional test.

262de06

Fix test (missing representation)

6384080

Fix test.

5c2f06b

tuple -> list for better readability.

d55d124

fangohr closed this Jan 31, 2022

lang-m reopened this Jan 31, 2022

lang-m merged commit 7da908b into master Jan 31, 2022

lang-m deleted the rewrite-omf-reading-writing-routines branch January 31, 2022 11:34
