
"segmentation fault (core dumped)" when repairing a 57mb shapefile #25

Open
lucasmation opened this issue Aug 28, 2015 · 12 comments

@lucasmation

First, thanks for this amazing program! We were pulling our hair out trying to solve the topological problems in PostGIS.

I'm trying to correct the shapefiles of the enumeration districts of the Brazilian 2010 Census, about 318k polygons in total, which have lots of topological problems: overlaps, gaps, etc. The data is divided into state shapefiles and is available from: ftp://geoftp.ibge.gov.br/malhas_digitais/censo_2010/setores_censitarios/

pprepair works great on 19 of the 27 states. However, for the remaining 8 states, which have the largest shapefiles, pprepair issues the following error and stops:

"segmentation fault (core dumped)"

This error occurs after most of the fixes have been completed, at the last step, just when the shapefiles are about to be saved.

For instance, pprepair crashes with the Bahia (BA) data, a 57 MB shapefile with 24k polygons. The shapefile is available from:

ftp://geoftp.ibge.gov.br/malhas_digitais/censo_2010/setores_censitarios/ba/ba_setores_censitarios.zip

pprepair keeps using more and more RAM. At the time it crashes, in the Bahia case, pprepair is using 3 GB of RAM (which is still well below the available 16 GB).

Additionally, a different error occurs for the state of Pará (PA) (ftp://geoftp.ibge.gov.br/malhas_digitais/censo_2010/setores_censitarios/pa/pa_setores_censitarios.zip): pprepair seems to never finish the computation and keeps running indefinitely. This is weird, as the PA file is only 17 MB, not particularly large relative to the other states.

We are using:

Ubuntu 12.04 LTS, 64-bit
RAM: 16 GB

Library versions:
gdal: 1.9.0, released 2011/12/29
cmake: 2.8.7
cgal: 3.9-1build1

@hugoledoux
Member

Thanks for reporting these, and glad you enjoy pprepair (we developed it because we too were pulling our hair out with other software).

The 57 MB file works fine for me here under Mac. I have no idea what goes wrong for you; can you send me the console output? Where does it hang?

One thing I can imagine is that you're using a very old CGAL (3.9), while I'm using 4.6. That shouldn't make a difference (I don't think the triangulation package has been updated), but try this first. My GDAL is 1.11, but again, that shouldn't make a difference.

For the second dataset, indeed it hangs here too. I don't have time to dig into it now, but if I use the new version of pprepair (the "new" branch) it works fine. The problem seems not to be linked to the precision of the coordinates (usually the case when there are errors) but to the handling of attributes, which the new branch does better. Watch out, the CLI is slightly different:

$ ./pprepair -i /Users/hugo/temp/lucasmation/pa_setores_censitarios/15SEE250GC_SIR.shp -r fix -o ~/temp/lucasmation/

Also, this version does not automatically repair single polygons, so one needs to run prepair first if there are invalid polygons. Check out the branch 'improvements-with-ogr' and then:

$ ./prepair --ogr /Users/hugo/temp/lucasmation/ba_setores_censitarios/29SEE250GC_SIR.shp --shpOut /Users/hugo/temp/lucasmation/
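The two-step workflow (prepair on each state's shapefile, then pprepair on the repaired output) can be scripted as a batch driver; a minimal sketch in Python follows. The "_1" intermediate-filename convention and the dry_run flag are illustrative assumptions, not part of either tool.

```python
import subprocess

def build_repair_commands(state_shp, out_dir):
    """Build the prepair -> pprepair command pair for one state shapefile.

    The flags mirror the invocations quoted in this thread; the "_1"
    intermediate filename is an assumed convention.
    """
    repaired_shp = state_shp.replace(".shp", "_1.shp")
    prepair_cmd = ["prepair", "--ogr", state_shp, "--shpOut", repaired_shp]
    pprepair_cmd = ["pprepair", "-i", repaired_shp, "-r", "fix", "-o", out_dir]
    return prepair_cmd, pprepair_cmd

def repair_state(state_shp, out_dir, dry_run=True):
    """Run (or, with dry_run, just print) the repair pipeline for one state."""
    for cmd in build_repair_commands(state_shp, out_dir):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

With dry_run=True this only prints the commands, which is handy for checking the paths before starting a long batch run over all 27 states.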

@lucasmation
Author

Thanks, Hugo. I'll try updating and get back to you. Meanwhile, this is the console output:

pprepair -i 29SEE250GC_SIR.shp -o 29SEE250GC_SIR2.shp -fix
Adding a new set of polygons to the triangulation...
    Path: 29SEE250GC_SIR.shp
    Type: ESRI Shapefile
    Layers: 1
    Reading layer #1 (24139 polygons)...
>       double      ID
        string      CD_GEOCODI
        string      TIPO
        string      CD_GEOCODB
        string      NM_BAIRRO
        string      CD_GEOCODS
        string      NM_SUBDIST
        string      CD_GEOCODD
        string      NM_DISTRIT
        string      CD_GEOCODM
        string      NM_MUNICIP
        string      NM_MICRO
        string      NM_MESO
    Feature #3464 (55 vertices): self intersecting outer boundary #0. Split.
    Created 2 rings.
    Feature #19405 (113 vertices): self intersecting outer boundary #0. Split.
    Created 2 rings.
    Feature #22073: duplicate vertices in outer boundary #0. Removed duplicates.
    Feature #23285 (84 vertices): self intersecting outer boundary #0. Split.
    Created 2 rings.
Polygons added (92 s). The triangulation has now:
    Vertices: 1594058
    Edges: 4782171
    Triangles: 3188076
Tagging...
Tagging done (10 s).
Input triangulation:
    Holes:    349767 triangles (10.971100 %)
    Ok:       2516928 triangles (78.948181 %)
    Overlaps: 321381 triangles (10.080720 %)
Repairing regions by longest boundary...
Repair of all polygons not possible (2 s).
Repairing regions by random neighbour...
Repair successful (0 s). All polygons are now valid.
Repaired triangulation:
    Holes:    0 triangles (0.000000 %)
    Ok:       3188076 triangles (100.000000 %)
    Overlaps: 0 triangles (0.000000 %)
Reconstructing polygons (geometry)...
    Removed 482605 constrained edges
Segmentation fault (core dumped)

@lucasmation
Author

Dear Hugo, I updated CGAL and I'm still getting the same error as above on the Bahia file.

@hugoledoux
Member

Also with the branch "new"?

@lucasmation
Author

No, I haven't tried with "new" yet.


@lucasmation
Author

We managed to update the libraries and programs (lots of work to make the dependencies work...) to the following versions:

  • cmake: 2.8.7 (I don't think this was updated)
  • gdal: 1.9.0, released 2011/12/29
  • cgal: 4.6
  • prepair: branch improvements-with-ogr
  • pprepair: branch new (pprepair-new)

Now I'm able to run things, but I still hit some problems. First, I successfully ran prepair (the improvements-with-ogr version) on all files. For each state:

prepair --ogr state_file.shp --shpOut state_file_1.shp

Then I tried pprepair:

pprepair -i state_file_1.shp -o new -r fix

This fixed the problems in some states but caused problems in others. Some states that used to work no longer work under pprepair-new; others that did not work under pprepair-master now work with pprepair-new.
The 27 states can be grouped into 4 categories, by success or failure with the pprepair master and new versions:

  1. 15 states work with both pprepair-master and pprepair-new: RO, AC, AM, RR, AP, MA, CE, RN, PB, AL, SC, RS, MS, GO, DF
  2. 3 states work with pprepair-master and error with pprepair-new ("Segmentation fault (core dumped)"): TO, PI, PE
  3. 4 states error with pprepair-master ("Segmentation fault (core dumped)") and work with pprepair-new: ES, RJ, SP, and BA. For BA, which was the main example in the original question, the script only works if we add the --skipvalideach option to the pprepair call.
  4. 4 states error with both pprepair-master and pprepair-new: PA, MG, PR, MT. The state of PA used to run indefinitely; now it errors with "Segmentation fault (core dumped)" like the other failing states.

Any idea on what can be going on?

Below I copy the output when the program errors with "Segmentation fault (core dumped)" (the example is the state of BA without the --skipvalideach option, which errors).

pprepair -i 29SEE250GC_SIR2.shp -o new  -r fix
Reading input dataset: 29SEE250GC_SIR2.shp
    Reading layer #1 (24139 polygons)
    Validating individually every polygon...
    Done, all polygons are valid.
    Adding the polygons to the PP...
    polygon #100
    polygon #200
...
    polygon #23900
    polygon #24000
    polygon #24100
    Total input single polygons: 24166
Building the PP (tagging the triangles)... done.
*** Triangulation ***
    Vertices: 1594058
    Edges: 4782171
    Triangles: 3188076
    Ok:       2513269 triangles
    Overlaps: 339710 triangles
    Holes:    335097 triangles
*** Problematic Regions ***
    Overlaps: 73736 regions
    Holes:    67816 regions
Repairing by random neighbour...
Repair successful (1 s). All polygons are now valid.
*** Triangulation ***
    Vertices: 1594058
    Edges: 4782171
    Triangles: 3188076
    Ok:       3188076 triangles
    Overlaps: 0 triangles
    Holes:    0 triangles
*** Problematic Regions ***
    Overlaps: 0 regions
    Holes:    0 regions
Reconstructing polygons (geometry)...
    Removed 468657 constrained edges
Segmentation fault (core dumped)

@lucasmation
Author

We ran more tests here on the states that don't work with pprepair (cases 2 and 4 in the comment above). Most of them work if, instead of -r fix, we use -r LB. As indicated in the table below, this is true for all states except PA and BA (when the --skipvalideach option is not included).

pprepair output (error or works) per parameter used in the -r switch:

State   -r fix   -r LB
TO      error    works
PI      error    works
PE      error    works
PA      error    error
MG      error    works
PR      error    works
MT      error    works
BA*     error    error

@hugoledoux
Member

I would suggest you never use --skipvalideach unless you know that the polygons are valid. If it works for some datasets, that is pure luck: the problems happen to get fixed as a side effect of the processing.

Hmmm, BA does work for me here under Mac if I repair the polygons with prepair first (as above) and then run pprepair with fix. I don't know why; the other developer on the project can maybe answer this week, he wasn't around last week.

The fix vs LB difference shouldn't happen; I've opened an issue to look into it (#26).

Thanks for reporting these, very useful for us.

@lucasmation
Author

Hugo, thanks. We look forward to these fixes.

Given that I'm running prepair first, shouldn't the individual validation of each input polygon in pprepair be innocuous, since the polygons are already valid (i.e. --skipvalideach should not change anything)?

For the states in which the code worked, the output shapefile still has some problems:

  • Polygons that were composed of multiple disconnected parts (multipolygons), such as small islands on the coast that belong to the same enumeration district, become a series of single polygons.
  • Some elements are now comprised of "a polygon + a point" (which we do not understand) or "a polygon + a line" (which we sort of understand: it comes from an overlapping polygon that shares a border with the underlying polygon).

Don't get me wrong: pprepair is proving invaluable to us. These other errors are minor and we can fix them in PostGIS. But I thought you might also be interested in these other findings.

@hugoledoux
Member

Indeed, yes: if you run prepair first, then the individual validation can be skipped.

Multipolygons? Yes, these are evil. I break them up, and they are indeed not re-bundled as such at the end. But the attributes are kept, so you should be able to reconstruct them. I might get around to implementing that, but it's not my priority. I am indeed very interested in knowing these things, thanks!
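Since the attributes are kept, re-bundling the split parts can be done by grouping the output features on a unique district key. A minimal sketch, assuming plain (attributes, geometry) feature pairs and that the CD_GEOCODI field (shown in the console output earlier) uniquely identifies an enumeration district:

```python
from collections import defaultdict

def rebundle(features, key="CD_GEOCODI"):
    """Group single-polygon features that share the same district code back
    into lists of parts, one list per original (multi)polygon.

    `features` is an iterable of (attributes, geometry) pairs; the key
    field name is an assumption based on the attribute table shown above.
    """
    groups = defaultdict(list)
    for attributes, geometry in features:
        groups[attributes[key]].append(geometry)
    return dict(groups)
```

Every key whose list contains more than one part corresponds to a multipolygon that was split during repair.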

Polygon + point? OK, that's worrying: no piece of the code should write a point, it shouldn't be possible. So please, can you file a new issue and give us a link to a small example where that happens?

By the way, can I ask which organisation you are working for?


@lucasmation
Author

I work for the Instituto de Pesquisa Economica Aplicada (IPEA, Institute of Applied Economic Research, http://ipea.gov.br/portal/) in Brasília, Brazil. We will publish a technical note on how we fixed the Census enumeration district polygons, along with the corrected data. That will be based on pprepair (giving due credit, of course), some other fixes in PostGIS, and a "fringe" of manual corrections.

It is weird that, having run prepair beforehand, the --skipvalideach option makes the data for the BA state work.

I'll send the polygon+point bugs tomorrow. In general, the shapefile output by pprepair still has invalid polygons according to PostGIS. But at least now these errors can be fixed with makeValid(), which did not work on the original data (or, to be more precise, it ran but did not solve the non-noded intersections, for instance).

We are now working on code to detect "drastic changes" made by pprepair, so that they can be checked by a human.
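One simple screen for such drastic changes is to compare each polygon's area before and after repair and flag large relative differences. A sketch using the shoelace formula on plain coordinate rings; the 10% threshold is an arbitrary illustration, not something from this thread.

```python
def ring_area(ring):
    """Unsigned area of a closed ring [(x, y), ...] via the shoelace formula."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def drastic_change(before_ring, after_ring, tolerance=0.10):
    """Flag a polygon whose area changed by more than `tolerance` (relative)."""
    a_before = ring_area(before_ring)
    a_after = ring_area(after_ring)
    return abs(a_after - a_before) > tolerance * a_before
```

Flagged polygons can then be routed to a human reviewer instead of being accepted automatically.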

@kenohori kenohori self-assigned this Sep 15, 2015
@kenohori
Member

Hello!

Sorry for the delay in replying. Things are quite busy these days and I only just managed to look into the errors. Thanks for the bug reports!

Most of the datasets work fine for me, so I'm not 100% sure about the source of most of your errors, but my guess is that you have a faulty GMP installation (the multiple-precision library used by CGAL). Which version are you using and how did you get it? From apt-get?

So, the Bahia dataset works fine for me too, but the Pará dataset doesn't. That one hangs in a very strange place, where we use an iterator around a set of triangles. I've never seen that error before and we'll certainly look into it.

In the meantime, I've uploaded fixed versions of the two datasets causing you problems:
https://www.dropbox.com/sh/cfnpiawdy63qgzv/AACYiNg7Vh9E3ldMaAre8z4Aa?dl=0

The Pará one was fixed using the options -rrlb -rtnn -rtam.
