Skip to content

Join Rewrite#730

Merged
untzag merged 34 commits into
masterfrom
join
Sep 12, 2018
Merged

Join Rewrite#730
untzag merged 34 commits into
masterfrom
join

Conversation

@ksunden
Copy link
Copy Markdown
Member

@ksunden ksunden commented Aug 28, 2018

Left to do:

@untzag @p770193 @darienmorrow:
Please feel free to add to this list as you see fit.
Focus is on tests first, then implementation.

For now tests focus primarily on shape, but do test values in many places as well.

Also note, This WILL fail CI until implementation phase, that is expected. I do not intend at this point to mark xfail/skip unless there are test cases which we deem okay to fail when we look to merge.

@ksunden
Copy link
Copy Markdown
Member Author

ksunden commented Sep 4, 2018

All tests are currently passing locally!

Still want to add tolerance kwargs, need to update docs and such. But should be fully usable.

@ksunden
Copy link
Copy Markdown
Member Author

ksunden commented Sep 4, 2018

There is a problem that arises from the following test:

data = wt.Data()
data.create_variable("x", np.linspace(-5, 5, 10)[:, None])
data.create_variable("y", np.linspace(-5, 5, 10)[None, :])
data.create_variable("z", np.exp(-data.x[:] ** 2) * np.exp(-data.y[:] ** 2))
data.create_channel("zz", np.exp(-data.x[:] ** 2) * np.exp(-data.y[:] ** 2))
data.transform('y', 'x')
data.print_tree()
j = wt.data.join([data])
Traceback (most recent call last):
  File "join.py", line 16, in <module>
    j = wt.data.join([data])
  File "/home/kyle/venvs/source/WrightTools/WrightTools/data/_join.py", line 133, in join
    vals[wt_kit.valid_index(new_idx, new.shape)] = old[:]
ValueError: shape mismatch: value array of shape (10,10) could not be broadcast to indexing result of shape (1,10)

The problem stems from:

            for variable_name in vs.keys():
            p = data[variable_name][:][np.newaxis, ...]
            arr = out[variable_name][:][..., np.newaxis]
            i = np.argmin(np.abs(arr - p), axis=np.argmax(arr.shape))
            new_idx.append(i)

And the fact that the resultant p is not being subtracted from the new axis.
It is, however, crucial that the end result new_idx be broadcast, and not simply flat lists (see https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing)

My first idea was to add points, but that ended up being tricky, need to think it through a bit more.
I also tried (and committed) using boolean masks instead of advanced indexing (which worked in this test), but that proved to have even more challenges, and thus I reverted that commit.

This test case should be added, as the implementation should not falter even when a single data is given, and I believe this to be indicative of larger problems that are not caught by the existing tests.

Comment thread WrightTools/data/_data.py Outdated

def create_channel(self, name, values=None, shape=None, units=None, **kwargs) -> Channel:
def create_channel(
self, name, values=None, shape=None, units=None, *, dtype=None, **kwargs
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the * for dtype to be kwarg only.
I chose that location for strict backwards compatibility reasons.
That said, given that shape is ignored if values are given, I think it is nonsensical to have shape (or units, for that matter) be accessible only when an ignored (or default) value is given.

Therefore, I move that we place the * between values and shape, maintaining the ability to simply pass the values, but making the others keyword only.

Same applies to create_variable below

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

test fails in cce84bf but works in c5b448f
@ksunden ksunden changed the title [WIP] Join Rewrite Join Rewrite Sep 7, 2018
@@ -1,4 +1,4 @@
"""Interpolation tools."""
"""Least-square fitting tools."""
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to add this to this PR?

It's a small enough change that now that it's done, not worth reverting, but it is unrelated.

Copy link
Copy Markdown
Member

@untzag untzag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

big step forward

consider migrating useful functions to kit

Comment thread WrightTools/data/_data.py Outdated

def create_channel(self, name, values=None, shape=None, units=None, **kwargs) -> Channel:
def create_channel(
self, name, values=None, shape=None, units=None, *, dtype=None, **kwargs
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree

Comment thread WrightTools/data/_join.py Outdated
if isinstance(datas, Collection):
datas = datas.values()
datas = list(datas)
if not isinstance(atol, collections.Iterable):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider hasattr(atol, '__iter__')

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable

This check subsumes the hasattr check, though this does raise the point that the Iterable class has moved to collections.abc rather than just collections (though it is still available there in current python, I think it may be slated for removal in 3.8)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other option is to try/except iter(atol)

Comment thread WrightTools/data/_join.py
slice_ = []
for variable_name in vs.keys():
p = data[variable_name][:][np.newaxis, ...]
# p is at most 1-D by precondition to join
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this tested for, and what exception is raised otherwise?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values = np.concatenate([d[n][:].flat for d in datas])

It is actually enforced by the structure of the from_dict It will not raise an error, but it will flatten the array (and combine with atol/rtol)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you CAN hand join multidimensional axes, but you will get 1D out

@ksunden
Copy link
Copy Markdown
Member Author

ksunden commented Sep 10, 2018

Please advise if there are particular functions that should be exported to kit, the internal functions I define are hyper specific, though there is probably some functionality that is useful elsewhere (default atol/rtol?, the check for floating point type?)

At this point, that kind of reorganizing I think should be a separate PR, or as part of the implementation of a thing that also uses the same code.

@untzag
Copy link
Copy Markdown
Member

untzag commented Sep 11, 2018

@ksunden are we ready to merge?

@ksunden
Copy link
Copy Markdown
Member Author

ksunden commented Sep 12, 2018

If you are satisfied with the changes I made since your last review

@untzag untzag merged commit 2ed25ad into master Sep 12, 2018
@untzag untzag deleted the join branch September 12, 2018 16:09
@ksunden ksunden mentioned this pull request Sep 12, 2018
@ksunden ksunden added this to the 3.1.3 milestone Sep 13, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

META join joining 3D data objects throws a shape mismatch error. Join fill value Recover join overlap methods

3 participants