extend capabilities of read_raw_data #84

Merged
merged 16 commits into from
Jul 16, 2018

Conversation

raphaeldussin
Contributor

  • possibility to read part of the file, with offset and partial_read
  • choice of row/column major order

This will allow the refactor of read_mds by putting the np.fromfile and np.memmap
calls into one generic function
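
As a rough illustration of the idea (not the code in this PR), a single reader could dispatch between np.fromfile and np.memmap along the following lines. The function name `read_raw_sketch` and the `order` argument name are assumptions made for this sketch; `offset`, `partial_read` and `use_mmap` follow the names used in this thread.

```python
import os

import numpy as np


def read_raw_sketch(path, dtype, shape, use_mmap=False, offset=0,
                    order='C', partial_read=False):
    """Hypothetical stand-in for a generic raw reader (not xmitgcm code)."""
    dtype = np.dtype(dtype)
    count = int(np.prod(shape))
    nbytes = count * dtype.itemsize
    file_size = os.path.getsize(path)

    if partial_read:
        # Only the requested window has to fit inside the file.
        if offset + nbytes > file_size:
            raise ValueError('offset %d plus %d bytes exceeds file size %d'
                             % (offset, nbytes, file_size))
    elif offset != 0 or nbytes != file_size:
        # A full read must cover the file exactly, starting at byte 0.
        raise IOError('File `%s` does not have the correct size '
                      '(expected %d, found %d)' % (path, nbytes, file_size))

    if use_mmap:
        # Lazy read: the data stays on disk until it is accessed.
        return np.memmap(path, dtype=dtype, mode='r', shape=tuple(shape),
                         offset=offset, order=order)

    # Eager read: skip `offset` bytes, then load `count` items.
    with open(path, 'rb') as f:
        f.seek(offset)
        data = np.fromfile(f, dtype=dtype, count=count)
    return data.reshape(shape, order=order)
```

With a file holding a (5, 15, 10) array of 8-byte floats, reading the third level would then use `shape=(15, 10)`, `offset=2 * 15 * 10 * 8` and `partial_read=True`.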

@codecov-io

codecov-io commented Jun 22, 2018

Codecov Report

Merging #84 into master will increase coverage by 0.11%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #84      +/-   ##
==========================================
+ Coverage   91.65%   91.77%   +0.11%     
==========================================
  Files           4        4              
  Lines         635      644       +9     
  Branches      140      143       +3     
==========================================
+ Hits          582      591       +9     
  Misses         33       33              
  Partials       20       20
Impacted Files Coverage Δ
xmitgcm/utils.py 91.47% <100%> (+0.35%) ⬆️

@rabernat
Member

This looks like a nice simple extension of the read_raw_data function. I see how it will be useful in the future refactoring. Thanks!

The new options need a test, however. The current test of read_raw_data is here:
https://github.com/xgcm/xmitgcm/blob/master/xmitgcm/test/test_mds_store.py#L202-L221

You can either extend that test or add a new test function that covers the new options.

@raphaeldussin
Contributor Author

Working on it. I have found some cases where the error message is not very informative.
I will catch those exceptions, set proper error messages, and then resubmit the PR with tests.

raphaeldussin and others added 2 commits June 22, 2018 16:36
* function will warn user if trying to pass inconsistent args
* function checks byte offset < file size
'(expected %g, found %g)' %
(datafile,
expected_number_of_bytes,
actual_number_of_bytes))
Contributor

E501 line too long (80 > 79 characters)

raise IOError('File `%s` does not have the correct size '
'(expected %g, found %g)' %
(datafile,
expected_number_of_bytes,
Contributor

E501 line too long (80 > 79 characters)

assert isinstance(mdata, np.memmap)

# test it breaks when it should
with pytest.raises(IOError):
Contributor

E501 line too long (82 > 79 characters)


# a meta test

Contributor

F841 local variable '_' is assigned to but never used


# a meta test


Contributor

E501 line too long (93 > 79 characters)

offset=offset, partial_read=True, use_mmap=True)
assert isinstance(mdata, np.memmap)

# test it breaks when it should
Contributor

E501 line too long (82 > 79 characters)


# a meta test

Contributor

E501 line too long (93 > 79 characters)

@raphaeldussin
Contributor Author

yeah finally!

Member

@rabernat rabernat left a comment

This looks great. Thanks Raphael!

There are a few minor changes you could make, but I'm generally happy to merge as is.


# a meta test


Member

maybe remove these blank lines

Contributor Author

fixed

shape[0]*shape[1]*shape[2]*dtype.itemsize), partial_read=True)
_ = read_raw_data(fname, dtype, shape, offset=(
shape[0]*shape[1]*shape[2]*dtype.itemsize), partial_read=True,
use_mmap=True)
Member

This looks great! 👍

xmitgcm/utils.py Outdated
d.shape = shape
return d
pass
assert(offset < actual_number_of_bytes), 'offset greater than filesize'
Member

should we raise an error here instead of assert?

assert should be used only for internal consistency checks. Is that what this is?

Contributor Author

That's what I did originally, but I couldn't get codecov to pass.
I guess that when I add a condition that is rarely triggered, the untested branch lowers the coverage.

Member

The way to resolve this is to put it back as an exception but add a test function that checks that the error is raised.

Just do what you think is best and then merge.
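
A minimal sketch of that suggestion, assuming a hypothetical helper named `check_offset` (the real check lives inside read_raw_data in xmitgcm/utils.py, and its exact form is not shown in this thread): the assert becomes an explicit ValueError, which a test can then trigger on purpose.

```python
import os


def check_offset(path, offset):
    """Hypothetical helper: reject byte offsets that fall outside the file."""
    file_size = os.path.getsize(path)
    if offset >= file_size:
        # User-facing input error, so raise instead of assert:
        # asserts are stripped when Python runs with -O.
        raise ValueError('bytes offset %d is not smaller than file size %d'
                         % (offset, file_size))
    return file_size
```

A test that calls this with an offset equal to the file size, wrapped in `with pytest.raises(ValueError):`, would exercise the error branch so that it counts toward coverage.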

# test optional functionalities
shape = (5, 15, 10)
shape_subset = (15, 10)
for dtype in [np.dtype('f8'), np.dtype('f4'), np.dtype('i4')]:
Member

ok, one comment about pytest:
rather than doing a for loop within the test function, you could parameterize this dtype. Before the function, you could add

    @pytest.mark.parametrize("dtype", [np.dtype('f8'), np.dtype('f4'), np.dtype('i4')])
    def test_read_raw_data(tmpdir, dtype):
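
For illustration, a complete parametrized test might look like the sketch below. It is an assumption-laden stand-in: the test name `test_dtype_roundtrip` is made up, it uses pytest's `tmp_path` fixture rather than `tmpdir`, and it reads the file back with plain np.fromfile instead of read_raw_data so that it runs without xmitgcm installed.

```python
import numpy as np
import pytest


@pytest.mark.parametrize(
    "dtype", [np.dtype('f8'), np.dtype('f4'), np.dtype('i4')])
def test_dtype_roundtrip(tmp_path, dtype):
    # pytest runs this function once per dtype and reports each run
    # separately, so a failure names the dtype that broke.
    shape = (5, 15, 10)
    data = np.arange(np.prod(shape)).reshape(shape).astype(dtype)
    fname = tmp_path / 'testdata'
    data.tofile(str(fname))

    loaded = np.fromfile(str(fname), dtype=dtype).reshape(shape)
    np.testing.assert_array_equal(loaded, data)
```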

Contributor Author

cool! I didn't know you could do that.

Member

@rabernat rabernat left a comment

I just saw one small thing that you could do to improve the test function using parameterization.

* remove blank lines
* iteration on dtype cleaner

_ = read_raw_data(fname, dtype, shape, offset=(
shape[0]*shape[1]*shape[2]*dtype.itemsize), partial_read=True)
_ = read_raw_data(fname, dtype, shape, offset=(
shape[0]*shape[1]*shape[2]*dtype.itemsize), partial_read=True,
Contributor

E999 IndentationError: unindent does not match any outer indentation level

@rabernat
Member

rabernat commented Jul 16, 2018

I think this is ready to merge, despite the codecov complaints.

Please do not make any new pull requests other than those related to the release. We need to make a release to mark the current state of xmitgcm, before making major changes. That is the whole point of having versions.

pass
else:
raise ValueError('bytes offset %g is greater than file size %g' %
(offset, actual_number_of_bytes))
Member

Based on the codecov report, it looks like this is the error that is not getting tested:
https://codecov.io/gh/xgcm/xmitgcm/pull/84/src/xmitgcm/utils.py?before=xmitgcm/utils.py#L247

Member

Ah! Now I understand why it doesn't work.

You need a separate with block for each exception you want to catch. The block exits as soon as it finds the exception. So only the first of the four read_raw_data calls in your with pytest.raises(ValueError): block is actually getting run.

Sorry I didn't catch that in my review.
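
The behavior described above can be reproduced with a small self-contained sketch. The helper `must_fail` and the `calls` list are made up for this illustration and stand in for the read_raw_data calls; they are not part of the PR.

```python
import pytest

calls = []


def must_fail(tag):
    """Record the call, then raise, like a read with bad arguments."""
    calls.append(tag)
    raise ValueError(tag)


def test_single_block_runs_only_first_call():
    calls.clear()
    with pytest.raises(ValueError):
        must_fail('a')
        must_fail('b')  # never reached: the block exits on the first error
    assert calls == ['a']


def test_one_block_per_call():
    calls.clear()
    for tag in ('a', 'b'):
        # A fresh context manager per call, so every call is executed.
        with pytest.raises(ValueError):
            must_fail(tag)
    assert calls == ['a', 'b']
```

Only the second pattern runs every call, which is what lets the coverage tool see each error branch.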

Contributor Author

Thanks! OK, that makes sense now!

@rabernat
Member

Great!

@rabernat rabernat merged commit 6b24a3c into MITgcm:master Jul 16, 2018
@raphaeldussin raphaeldussin deleted the dev_read_mds branch July 17, 2018 19:35
fraserwg pushed a commit to fraserwg/xmitgcm that referenced this pull request Nov 23, 2021
* extend capabilities of read_raw_data

* possibility to read part of the file, with offset and partial_read
* choice of row/column major order

* testing + better error handling

* function will warn user if trying to pass inconsistent args
* function checks byte offset < file size

* Fixing style errors.

* get it shorter

* completing tests for codecov

* Fixing style errors.

* fix line length

* fix line length

* try to improve coverage

* finalize request

* remove blank lines
* iteration on dtype cleaner

* replace assert with raise error

* try to fool code cov

* fix indentation

* codecov didn't bite the bait

* fix error in testing
4 participants