
Seeking geofence st_contains query alternative #91

Closed
harryprince opened this issue Jul 10, 2018 · 4 comments

Comments

@harryprince

harryprince commented Jul 10, 2018

Testing whether an event falls inside a geofence polygon is too complicated and slow in Hive when calling the ESRI st_contains function directly, because the query cannot use an index.

Because of that, I am looking for a better solution, for example using the H3 maxPolyfillSize function.

However, I cannot find any example of using the maxPolyfillSize function as an alternative to st_contains, and I need help.

By default, maxPolyfillSize returns h3index values at multiple resolutions, which means the SQL needs one join per resolution level. Does a better solution exist?

@nrabinowitz
Collaborator

(As this is a question, not a feature request or bug report, please direct to StackOverflow in the future.)

  • I think there's some confusion here: you can use the polyfill function to return the hexagons within a polygon, but they are all at the same resolution (supplied by the caller). You would have to call compact on the resulting hexagons in order to get hexagons at multiple resolutions.

I am not a Hive expert, but there are a few options for using H3 to spatially index data points:

  • The simplest is to use polyfill to fill a given polygon at a resolution that fits your desired precision, and then create a table using the hexagons as a reverse index with rows like h3index, polygon_id. Data points with lat/lon can then be mapped to a H3 index using geoToH3 (ideally at index time), and you can join this field with the polygon table to find the polygon id for a given data point. This is generally very fast, but the reverse index can get very large depending on the size and precision of the polygons you need to index.

  • A slower but more space-efficient option uses compact to index the polygon at multiple resolutions. Assigning a data point to a polygon would then require performing n joins or queries, where n is the number of different resolutions in the compacted set. This might be a better solution if you needed to cover a significant geographic area with high precision (making a standard reverse index very large) but generally only had to handle a single data point at a time (e.g. in a geocoding API).

Obviously if your polygons are stable, you'd be better off doing this at index time, storing the polygon id with the data point, rather than at query time.
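A minimal sketch of the first (reverse-index) option, assuming a fixed polyfill resolution. The cell IDs and polygon names here are hypothetical stand-ins; in real code the cells would come from H3's polyfill (for polygons) and geoToH3 (for event points), and the dict lookup would be the Hive join on the h3index column:

```python
# Step 1 (index time): polyfill each polygon at one fixed resolution and
# store the resulting cells as a reverse index of rows (h3index, polygon_id).
# These cell IDs are made-up placeholders, not real H3 output.
polygon_cells = {
    "downtown": ["8928308280fffff", "8928308280bffff", "89283082807ffff"],
    "airport":  ["8928308281fffff", "89283082817ffff"],
}
reverse_index = {}
for polygon_id, cells in polygon_cells.items():
    for cell in cells:
        reverse_index[cell] = polygon_id

# Step 2 (query time): map an event's lat/lng to its H3 cell at the same
# resolution (geoToH3 in real code), then do a single keyed lookup instead
# of a point-in-polygon test.
def polygon_for(event_cell):
    # None means the event is outside every geofence.
    return reverse_index.get(event_cell)

print(polygon_for("8928308280bffff"))  # -> downtown
print(polygon_for("8928308299fffff"))  # -> None
```

The speed comes from replacing geometric containment with an exact-match join, at the cost of materializing every covering cell for every polygon.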

@dfellis
Collaborator

dfellis commented Jul 10, 2018

@nrabinowitz I don't think the second one would be that much slower. If you take your given lat, lng coordinates and compute the H3 index at all of the possible resolutions (say 6, 7, 8, 9 for your compacted set) and simply query for any of those 4 matching, it would just be four integer compares until it either succeeds or fails, and that would probably actually be faster than a single comparison across the entire uncompacted set.

If it was a normal non-Hive database, you could index the H3 integers with a hash index and it would literally be just 4 hash lookups and you'd get the answer in O(1) time instead of O(n), but that's the trade-off you make with the Hadoop ecosystem versus classic DBs (more space, but slower queries).
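A toy sketch of that multi-resolution lookup against a compacted set. H3 indexes are modeled here as "res/path" strings where the parent at a coarser resolution is a prefix truncation; this is a deliberate simplification of h3ToParent, used only so the example runs without the H3 library, but the control flow (one membership check per resolution present in the compacted set) matches what @dfellis describes:

```python
# Toy stand-in for h3ToParent: a cell "9/abcdefxyz" has the ancestor
# "6/abcdef" at resolution 6. Real H3 parents are not string prefixes.
def parent_at(cell, res):
    _, path = cell.split("/")
    return f"{res}/{path[:res]}"

# Compacted cover of a geofence: a mix of coarse and fine cells.
compacted = {"6/abcdef", "7/abcdeeg", "9/abcdefghi"}

# Only the resolutions actually present in the compacted set need checking.
resolutions = sorted({int(c.split("/")[0]) for c in compacted})

def in_fence(point_cell):
    # Compute the point's ancestor at each candidate resolution; a hit at
    # any level means some compacted cell covers the point. With a hash
    # index this is a handful of O(1) lookups rather than a scan.
    return any(parent_at(point_cell, r) in compacted for r in resolutions)

print(in_fence("9/abcdefxyz"))  # -> True  (covered by "6/abcdef")
print(in_fence("9/zzzzzzzzz"))  # -> False
```

In Hive this corresponds to the n joins (or an IN over the n precomputed parent indexes) mentioned above, one per resolution level in the compacted set.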

@nrabinowitz
Collaborator

As noted, I don't actually know much about Hive :). This tradeoff was something we were considering with Cassandra queries at one point.

@harryprince
Author

harryprince commented Jul 18, 2018

@dfellis Joining multiple times seems clumsy on the Hive SQL grammar side; however, so far it seems to be the best solution.


3 participants