Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compact behavior on duplicate input #298

Closed
ajfriend opened this issue Jan 5, 2020 · 6 comments
Closed

compact behavior on duplicate input #298

ajfriend opened this issue Jan 5, 2020 · 6 comments
Assignees

Comments

@ajfriend
Copy link
Contributor

ajfriend commented Jan 5, 2020

I'm seeing inconsistent behavior when I input a collection of duplicated hexagons to the compact() function.

Using the cython branch of h3-py (with a few extra print debugging statements):

import numpy as np
import h3.api.memview_int as h3

def foo(n):
    h = h3.geo_to_h3(37, -122, 10)
    hexes = np.array([h] * n, dtype='uint64')

    return np.array(h3.compact(hexes))

When I run foo(8), I get

len(hu):  8
flag:  0
array([622204016658513919, 622204016658513919, 622204016658513919,
       622204016658513919, 622204016658513919, 622204016658513919,
       622204016658513919, 622204016658513919], dtype=uint64)

This is mostly OK, as some of the wrappers will dedupe the output with a set container. However, should the C compact function actually raise an error on any duplicate input?

When I run foo(9), I get

len(hu):  9
flag:  -2

and an H3ValueError, due to the -2 return flag.

It seems like the C code is detecting the duplicate hexagons in this case, but not the case where n <= 8.

  • Should we try to catch this duplicate entry error consistently?
  • Can we check for duplicate hexagons in all cases, or do we expect the user to input an array with no duplicates? If the latter, should we add that to the documentation of the function?
  • Alternatively, should the compact function just handle and ignore duplicated hexagons? (We can "compact" any duplicates into a single hexagon.)
@ajfriend
Copy link
Contributor Author

ajfriend commented Jan 5, 2020

Also, what about inputs to compact that contain hexagons of different resolutions. This seems to return without error, but perhaps we should detect this and raise an error?

For example, the code

h1 = h3.geo_to_h3(37, -122, 10)
h2 = h3.geo_to_h3(37, -122, 9)

hexes = np.array([h1,h2], dtype='uint64')

np.array(h3.compact(hexes))

returns

len(hu):  2
flag:  0
array([622204016658513919, 617700417031372799], dtype=uint64)

and the code

h = h3.geo_to_h3(37, -122, 10)

hexes = [h] + list(h3.h3_to_children(h))
hexes = np.array(hexes, dtype='uint64')

np.array(h3.compact(hexes))

returns

len(hu):  8
flag:  0
array([622204016658513919, 626707616285855743, 626707616285859839,
       626707616285863935, 626707616285868031, 626707616285872127,
       626707616285876223, 626707616285880319], dtype=uint64)

@ajfriend
Copy link
Contributor Author

ajfriend commented Jan 5, 2020

Now for uncompact!

If, for example, we try to uncompact the set containing a hex and its immediate children, we get duplicated outputs:

h = h3.geo_to_h3(37, -122, 10)

hexes = [h] + list(h3.h3_to_children(h))
hexes = np.array(hexes, dtype='uint64')

out = np.array(h3.uncompact(hexes, 11))

we get that:

>>> len(out)
14
>>> len(set(out))
7

Should we dedupe these outputs, try to detect this situation and raise an error, or simply inform the user that there will be weird output in this case?

@ajfriend
Copy link
Contributor Author

ajfriend commented Jan 5, 2020

Similar issues with duplicate hexagons in the input to uncompact:

h = h3.geo_to_h3(37, -122, 1)
hexes = np.array([h]*10, dtype='uint64')

out = np.array(h3.uncompact(hexes, 2))

has result

>>> len(out)
70
>>> len(set(out))
7

@isaacbrodsky
Copy link
Collaborator

Also, what about inputs to compact that contain hexagons of different resolutions.

I'd imagine this is acceptable because it makes compact idempotent.

Can we check for duplicate hexagons in all cases, or do we expect the user to input an array with no duplicates? If the latter, should we add that to the documentation of the function?

The assumption of compact is that the input is a deduplicated set. The function uses that property to allow it to more easily count how many children of a parent it has seen.

I would have expected the function to fail when 8 duplicates are present instead of 9. I think the reason for this is inconsistency in h3Index.c lines 313 and 343 (and then 354). The number of children starts at zero in 313 which is why it doesn't return an error, but this gets adjusted on line 343.

@ajfriend
Copy link
Contributor Author

ajfriend commented Jan 6, 2020

Also, what about inputs to compact that contain hexagons of different resolutions.

I'd imagine this is acceptable because it makes compact idempotent.

Having it be idempotent would be cool, but it doesn't currently work that way; see the second example in #298 (comment).

If we want it to be idempotent, then we need a stronger restriction on input than just deduplicated. We would need to require that the parent of any cell is also absent from the set. Either that, or we modify the function to handle duplicated hexes, and hexes that have their parents in the set.

@nrabinowitz
Copy link
Collaborator

This is a general issue, applying to e.g. h3SetToLinkedGeo as well. I think our standard approach has been to deal with bad input issues only when it's cheap to do so - e.g. we don't validate inputs as valid H3 indexes in most cases, because doing so would substantially impact perf.

This is something we ought to lock down consistently for v4. I think it's okay not to validate, as long as input requirements are clearly documented, but it's worth discussion and review of all methods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants