New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shapes are invalid when importing them into Elasticsearch #975

Open
peterneubauer opened this Issue Oct 9, 2017 · 27 comments

Comments

Projects
None yet
4 participants
@peterneubauer

peterneubauer commented Oct 9, 2017

Hi,
we are using the WOF shapes for Mapillary top lists. Works great, however, some shapes are not valid geodata according to the Elasticsearch parser when trying to index them in our ELS 2.4. Examples:

# invalid holes in shapes countries
remove_holes_shapes = [
    '85632475',
    '85632505',
    '85632469',
    '85632761',
    '85632793',
    '85632449',
    '85632773',
    '85632475',
    '85632505',
    '85632469',
    '85632761',
    '85632793',
    '85632449',
    '85632773',
]

Duplicate oordinate in countries:

duplicate_coords = {
    '85632693': {
        'error': 'invalid_shape_exception: Provided shape has duplicate consecutive coordinates at: (-12.055823, 25.99583, NaN)'}
}


Is there any chance we can submit you feedback for the shapes and work on making them valid?

@stepps00 stepps00 self-assigned this Oct 9, 2017

@stepps00

This comment has been minimized.

Contributor

stepps00 commented Oct 9, 2017

Hi @peterneubauer - thanks for filing the issue. A list of affected wof:id values is great and any additional information you have about the specific records is helpful, too.

We've seen this issue pop up before.. I suspect this issue touches more than the list of records you created above. I will look into this issue and propose a fix through a PR.

@stepps00 stepps00 added the geometry label Oct 9, 2017

@stepps00

This comment has been minimized.

Contributor

stepps00 commented Oct 10, 2017

@peterneubauer - are you able to provide the error that you receive when indexing these records in ELS 2.4? I'd like to update these records' self-intersecting interior rings so they can be handled in Elasticsearch.

@peterneubauer

This comment has been minimized.

peterneubauer commented Oct 16, 2017

Let me update the repo, and reimport everything to see if I still get errors - will report back!

@peterneubauer

This comment has been minimized.

peterneubauer commented Oct 31, 2017

So, I'm constructing a test that tries to import all shapes that are relevant to use and logs errors withohut fixing them. One simple start is here:

https://raw.githubusercontent.com/whosonfirst-data/whosonfirst-data/master/data/856/822/11/85682211.geojson,	Points of LinearRing do not form a closed linestring
https://raw.githubusercontent.com/whosonfirst-data/whosonfirst-data/master/data/856/766/75/85676675.geojson,	Points of LinearRing do not form a closed linestring
https://raw.githubusercontent.com/whosonfirst-data/whosonfirst-data/master/data/856/819/55/85681955.geojson,	Points of LinearRing do not form a closed linestring
https://raw.githubusercontent.com/whosonfirst-data/whosonfirst-data/master/data/856/819/07/85681907.geojson,	Points of LinearRing do not form a closed linestring
@nvkelso

This comment has been minimized.

Contributor

nvkelso commented Oct 31, 2017

@thisisaaronland

This comment has been minimized.

Contributor

thisisaaronland commented Oct 31, 2017

I am pretty sure that at the end of the day it's the Java Topology Suite. ES uses:

https://github.com/locationtech/spatial4j

Which in turn uses:

https://sourceforge.net/projects/jts-topo-suite/

Unfortunately I am not sufficiently familiar with Java to just whip up a simple-dumb "validate this thing" tool which would be a nice thing to have, in general...

@nvkelso

This comment has been minimized.

Contributor

nvkelso commented Oct 31, 2017

It all comes down to JTS in the end, so that's good at least to have one point of feature improvement. We'll see if anyone contacts me after my Twitter missive ;)

@thisisaaronland

This comment has been minimized.

Contributor

thisisaaronland commented Oct 31, 2017

If web browsers can render broken markup...

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 1, 2017

Ok,
I did a little test project that recreated the problems, see https://github.com/peterneubauer/wof-testing, a sample output is this:

/usr/lib/jvm/java-8-openjdk-amd64/bin/java -ea -Didea.test.cyclic.buffer.size=1048576 -javaagent:/opt/idea-IU-171.3780.107/lib/idea_rt.jar=34685:/opt/idea-IU-171.3780.107/bin -Dfile.encoding=UTF-8 -classpath /opt/idea-IU-171.3780.107/lib/idea_rt.jar:/opt/idea-IU-171.3780.107/plugins/junit/lib/junit-rt.jar:/opt/idea-IU-171.3780.107/plugins/junit/lib/junit5-rt.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/cldrdata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/icedtea-sound.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/jaccess.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/localedata.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/nashorn.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/management-agent.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-8-openjdk-amd64/jre/lib/rt.jar:/home/peter/src/wof-testing/out/test/classes:/home/peter/src/wof-testing/out/test/resources:/home/peter/.gradle/caches/modules-2/files-2.1/org.locationtech.spatial4j/spatial4j/0.6/21b15310bddcfd8c72611c180f20cf23279809a3/spatial4j-0.6.jar:/home/peter/.gradle/caches/modules-2/files-2.1/org.noggit/noggit/0.8/ba4ad65a62d7dfcf97a8d42c82ae7d8824f9087f/noggit-0.8.jar:/home/peter/.gradle/caches/modules-2/files-2.1/com.vividsolutions/jts/1.13/3ccfb9b60f04d71add996a666ceb8902904fd805/jts-1.13.jar:/home/peter/.gradle/caches/modules-2/files-2.1/junit/junit/4.12/2973d150c0dc1fefe998f834810d68f278ea58ec/junit-4.12.jar:/home/peter/.gradle/caches/modules-2/files-2.1/com.google.code.gson/gson/2.8.2/3edcfe49d2c6053a70a2a47e4e1c2f94998a49cf/gson-2.8.2.jar:/home/peter/.gradle/caches/modules-2/files-2.1/org.hamcrest/hamcrest-core/1.3/42a25dc3219429f0e5d060061f71acb49bf010a0/hamcrest-core-1.3.jar com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 -junit4 WofJTsTest,testJtsValidity
0, 85673433, Vientiane (prefecture)
1, 85673487, Beirut
2, 85673413, Louang Namtha
3, 85673427, Saravan
4, 85673495, An Nabatiyah
5, 85673483, North Lebanon
6, 85673473, Xékong
7, 85673409, Bokeo
8, 85673451, Luangphrabang
9, 85673453, Oudômxai
10, 85673447, Houaphan
11, 85673469, Attapeu
12, 85673499, South Lebanon
13, 85673441, Xiangkhoang
14, 85673479, Beqaa
15, 85673405, Hawalli
16, 85673491, Mount Lebanon
17, 85673417, Xaignabouri
18, 85673467, Khammouan
19, 85673423, Champasak
20, 85673437, Vientiane
21, 85673463, Borikhamxai
22, 85673429, Savannakhét
23, 85673459, Phôngsali
24, 85682233, De Los Lagos
25, 85682249, Glarus
26, 85682211, DE MAGALLANES y ANTARTICA CHILENA

org.locationtech.spatial4j.exception.InvalidShapeException: Ring Self-intersection at or near point (-72.09424987140444, -54.41320129356904, NaN)

	at org.locationtech.spatial4j.shape.jts.JtsGeometry.validate(JtsGeometry.java:127)
	at org.locationtech.spatial4j.shape.jts.JtsGeometry.assertValidate(JtsGeometry.java:113)
	at org.locationtech.spatial4j.shape.jts.JtsGeometry.<init>(JtsGeometry.java:84)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory.makeShape(JtsShapeFactory.java:523)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory.makeShape(JtsShapeFactory.java:536)
	at org.locationtech.spatial4j.shape.jts.JtsShapeFactory$JtsMultiPolygonBuilder.build(JtsShapeFactory.java:407)
	at org.locationtech.spatial4j.io.GeoJSONReader.readMultiPolygon(GeoJSONReader.java:392)
	at org.locationtech.spatial4j.io.GeoJSONReader.readShapeFromCoordinates(GeoJSONReader.java:318)
	at org.locationtech.spatial4j.io.GeoJSONReader.readShape(GeoJSONReader.java:248)
	at org.locationtech.spatial4j.io.GeoJSONReader.read(GeoJSONReader.java:48)
	at org.locationtech.spatial4j.io.GeoJSONReader.read(GeoJSONReader.java:54)
	at WofJTsTest.testJtsValidity(WofJTsTest.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)


Process finished with exit code 255
@thisisaaronland

This comment has been minimized.

Contributor

thisisaaronland commented Nov 1, 2017

@peterneubauer – This is great, thanks!

If we created a proper java-whosonfirst-jts repo and made you a contributor would you be willing to move your code over there?

This would be a super useful tool and as time and circumstances permit I'd like to make it a proper library that can be invoked from the command line and/or things like a dropwizard tool.

(I know enough Java that I can probably start poking at your work; it's the getting started part that still leaves me stumbling around in the dark :-)

Longer term it would be nice to replicate the input/indexing functionality of the go-whosonfirst-index package [1] but for now just being able to validate individual records and/or meta files would be super useful.

[1] https://github.com/whosonfirst/go-whosonfirst-index/blob/master/index.go#L110-L215

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 1, 2017

yes, no problem. Just make me a contributor, and I can move over the code. I'm using a static extraction b place_type to pull the files so we don't have to clone this repo as part of the test, will see if I can port that, too, it's in python right now.

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 2, 2017

Code imported into https://github.com/whosonfirst/java-whosonfirst-jts, will now report errors in the output, feel free to reformat. Works?

@nvkelso

This comment has been minimized.

Contributor

nvkelso commented Nov 2, 2017

@peterneubauer will it also fix the errors or will a separate tool be needed for that?

@thisisaaronland

This comment has been minimized.

Contributor

thisisaaronland commented Nov 2, 2017

@nvkelso it does not fix anything, it's simply the work to sanity check and report bunk geometry now wearing a warm and comfy WOF sweater.

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 2, 2017

Yes, right now we are fixing geometries with PSQL ST_Simplify and Shapely (python) and actual removal of offending holes and duplicate coordinates, but it's of course MUCH better to test and find these offending geometries and fix them at the source - here :)

Just added the generating script in https://github.com/whosonfirst/java-whosonfirst-jts/blob/master/src/main/python/generate_wof_url_lists.py (I'm not a python expert, so forgive the ugly code) so you can generate the lists yourself.

@stepps00

This comment has been minimized.

Contributor

stepps00 commented Nov 2, 2017

@peterneubauer - thanks for the update! If possible, we'd like to hear how you're dealing with/fixing these invalid geometries on your end. If you have sample code or a repo, feel free to link :)

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 3, 2017

Added some more code to demo simplification using shapely in SimplifyUtils.py (and PostGIS, but not used right now)

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 4, 2017

There is a good blog post on postgis and performance here https://carto.com/blog/inside/postgis-performance/, in the simplify utility python class you can see that, if you have postgis installed and put in the environment variables, you can use their engine to validate and simplify things. This IA however nit working in all cases but might be a good replacement for our manual removal if duplicate coordinates for instance.

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 6, 2017

@stepps00 let me know how we can proceed here, would be great to get the shapes into shape :)

@stepps00

This comment has been minimized.

Contributor

stepps00 commented Nov 6, 2017

Hey @peterneubauer - thanks for the updates. Let me poke around today, I'll respond in this issue with ideas to proceed.

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 13, 2017

Good, let me know if you need help getting started, I would like us to make progress towards valid shapes rather sooner than later.

@peterneubauer

This comment has been minimized.

peterneubauer commented Nov 22, 2017

I updated the first comment with the manual fixes on country level that we are doing now.

@nvkelso

This comment has been minimized.

Contributor

nvkelso commented Aug 23, 2018

Related: #1071

@stepps00

This comment has been minimized.

Contributor

stepps00 commented Aug 28, 2018

Noting here that #1293 was merged recently - this took care of any invalid geometry found when crawling the whosonfirst-data repo. Invalid geometries were caused by the following errors:

  • Self-intersection
  • Ring Self-intersection
  • Hole lies outside shell
  • Too few points in geometry component
  • Nested shells
  • Duplicate Rings

From what I can tell, though, ElasticSearch can sometimes be overly-strict and throw errors at valid GeoJSON (ES issues: 1, 2). Using the .IsValid() function in osgeo.ogr, each of the records listed above have been confirmed as valid - the issue here is coming from ElasticSearch, unless I'm missing something.

We also have an open issue to wire in additional validity checks to our export tool.

@thisisaaronland

This comment has been minimized.

Contributor

thisisaaronland commented Aug 29, 2018

Have you checked the ordering of the rings? The issue is not so much Elasticsearch as it is JTS, which ES uses under the hood.

@stepps00

This comment has been minimized.

Contributor

stepps00 commented Aug 29, 2018

I haven't checked this in the recent validation work, but I'm guessing the right-hand rule for outer/inner rings is causing Elasticsearch to throw these errors.

The GeoJSON spec says that GeoJSON should follow this rule, but also suggests:

For backwards compatibility, parsers SHOULD NOT reject polygons that do not follow the right-hand rule.

I think the next step would be to do a second pass at correcting geometries that don't follow the right-hand rule.

https://tools.ietf.org/html/rfc7946

@thisisaaronland

This comment has been minimized.

Contributor

thisisaaronland commented Aug 29, 2018

JTS predates GeoJSON by a while and anyway the former doesn't know about the latter, per se, only an internal representation of a geometry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment