## Testing out parenx skeletonization and voronoi approaches

Resources:
* https://github.com/nptscot/networkmerge
* https://github.com/anisotropi4/parenx/tree/main

In [2]:
import glob
import os
import re
import shutil

import geopandas as gpd

from core import utils

In [3]:
fuas = list(utils.fua_city.keys())

In [None]:
# parquet is not a recognized format for parenx - convert to gpkg first
for fua in fuas:
    subfolder = f"../../data/{fua}/temp-parenx/"
    os.makedirs(subfolder, exist_ok=True)
    roads = gpd.read_parquet(f"../../data/{fua}/original/{fua}.parquet").reset_index(
        drop=True
    )
    roads.to_file(subfolder + "roads_osm.gpkg", layer="roads", engine="pyogrio")

**Now, run the bash script `parenx-run.sh` from the command line**

(this requires an activated `conda` environment in which `parenx` is installed)

`bash notebooks/methods/parenx-run.sh`

This adds two files to each `temp-parenx` subfolder: `voronoi.gpkg` and `skeletonize.gpkg`. They are gitignored for now because the outputs are too large.

```bash
Last login: Sun Dec  1 12:19:51 on ttys014
anvy@mac622265  ~ % cd /Users/anvy/Library/CloudStorage/OneDrive-ITU/projects/simplification
anvy@mac622265  simplification % conda activate simplification
(simplification) anvy@mac622265  simplification % bash notebooks/methods/parenx-run.sh
Simplification for ./data/869/temp-parenx started
start		0:00:00.000336
read geojson	0:00:00.918972
process		0:00:01.475191
write simple	0:04:02.346021
write primal	0:04:02.622587
stop		0:04:03.053205
start		0:00:00.000161
read geojson	0:00:01.046571
process		0:00:01.581595
dewhisker	0:02:16.698107
write simple	0:17:31.049953
write primal	0:17:31.646008
stop		0:17:31.725630
Simplification for ./data/8989/temp-parenx started
start		0:00:00.000625
read geojson	0:00:01.830190
process		0:00:02.686399
write simple	2:56:22.620136
write primal	2:56:24.372666
stop		2:56:25.852823
start		0:00:00.000149
read geojson	0:00:01.595519
process		0:00:02.359671
dewhisker	0:22:23.474801
write simple	2:30:53.991138
write primal	2:30:55.799845
stop		2:30:56.056108
Simplification for ./data/1656/temp-parenx started
start		0:00:00.000497
read geojson	0:00:01.910386
process		0:00:02.695223
write simple	0:12:31.371171
write primal	0:12:31.995311
stop		0:12:32.556674
start		0:00:00.000225
read geojson	0:00:01.505799
process		0:00:02.204637
dewhisker	0:04:34.519544
write simple	0:37:07.637350
write primal	0:37:08.207694
stop		0:37:08.538319
Simplification for ./data/4881/temp-parenx started
start		0:00:00.000481
read geojson	0:00:00.970832
process		0:00:01.407188
write simple	0:06:24.631430
write primal	0:06:25.051560
stop		0:06:25.581967
start		0:00:00.000190
read geojson	0:00:00.836642
process		0:00:01.283132
dewhisker	0:04:54.067353
write simple	0:40:18.274922
write primal	0:40:18.973528
stop		0:40:19.078338
Simplification for ./data/809/temp-parenx started
start		0:00:00.000628
read geojson	0:00:01.645937
/Users/anvy/anaconda3/envs/simplification/lib/python3.11/site-packages/pyogrio/geopandas.py:523: UserWarning: GeoSeries.notna() previously returned False for both missing (None) and empty geometries. Now, it only returns False for missing values. Since the calling GeoSeries contains empty geometries, the result has changed compared to previous versions of GeoPandas.
Given a GeoSeries 's', you can use '~s.is_empty & s.notna()' to get back the old behaviour.

To further ignore this warning, you can do: 
import warnings; warnings.filterwarnings('ignore', 'GeoSeries.notna', UserWarning)
  has_z_arr = geometry[geometry.notna() & (~geometry.is_empty)].has_z
/Users/anvy/anaconda3/envs/simplification/lib/python3.11/site-packages/pyogrio/raw.py:709: RuntimeWarning: Layer 'input' has been declared with non-Z geometry type LineString, but it does contain geometries with Z. Setting the Z=2 hint into gpkg_geometry_columns
  ogr_write(
process		0:00:02.383023
write simple	0:06:36.363997
write primal	0:06:36.890368
stop		0:06:38.061981
start		0:00:00.000197
read geojson	0:00:01.462391
/Users/anvy/anaconda3/envs/simplification/lib/python3.11/site-packages/pyogrio/geopandas.py:523: UserWarning: GeoSeries.notna() previously returned False for both missing (None) and empty geometries. Now, it only returns False for missing values. Since the calling GeoSeries contains empty geometries, the result has changed compared to previous versions of GeoPandas.
Given a GeoSeries 's', you can use '~s.is_empty & s.notna()' to get back the old behaviour.

To further ignore this warning, you can do: 
import warnings; warnings.filterwarnings('ignore', 'GeoSeries.notna', UserWarning)
  has_z_arr = geometry[geometry.notna() & (~geometry.is_empty)].has_z
/Users/anvy/anaconda3/envs/simplification/lib/python3.11/site-packages/pyogrio/raw.py:709: RuntimeWarning: Layer 'input' has been declared with non-Z geometry type LineString, but it does contain geometries with Z. Setting the Z=2 hint into gpkg_geometry_columns
  ogr_write(
process		0:00:02.267041
dewhisker	0:05:49.679755
write simple	0:51:16.541609
write primal	0:51:17.745536
stop		0:51:17.931114
Simplification for ./data/1133/temp-parenx started
start		0:00:00.000613
read geojson	0:00:01.560002
process		0:00:02.275621
write simple	0:16:43.989057
write primal	0:16:44.888332
stop		0:16:46.350155
start		0:00:00.000198
read geojson	0:00:01.249237
process		0:00:01.910138
dewhisker	0:10:14.528398
write simple	1:18:32.373940
write primal	1:18:34.121413
stop		1:18:34.416316
Simplification for ./data/4617/temp-parenx started
start		0:00:00.000461
read geojson	0:00:01.483120
process		0:00:02.152510
write simple	0:31:30.022070
write primal	0:31:30.992823
stop		0:31:31.649961
start		0:00:00.000303
read geojson	0:00:01.479936
process		0:00:02.135223
dewhisker	0:10:55.153675
write simple	1:50:06.980065
write primal	1:50:07.784702
stop		1:50:07.878618
Done.
```
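To compare run times across FUAs, the `stop` lines of a log like the one above can be parsed into durations. The following is a minimal sketch: the function name `parse_runtimes` is hypothetical, the `LOG` string is a short excerpt of the output above, and the sketch only assumes that each `Simplification for ... started` header is followed by one `stop` line per run, in the order the runs were executed.

```python
import re
from datetime import timedelta

# Excerpt of the log above (first two FUAs only).
LOG = """\
Simplification for ./data/869/temp-parenx started
start        0:00:00.000336
stop         0:04:03.053205
start        0:00:00.000161
stop         0:17:31.725630
Simplification for ./data/8989/temp-parenx started
start        0:00:00.000625
stop         2:56:25.852823
start        0:00:00.000149
stop         2:30:56.056108
"""

HEADER = re.compile(r"Simplification for \./data/(\d+)/temp-parenx started")
STOP = re.compile(r"stop\s+(\d+):(\d+):(\d+)\.(\d+)")


def parse_runtimes(log):
    """Return {fua_id: [timedelta, ...]} with one entry per `stop` line."""
    runtimes = {}
    current = None
    for line in log.splitlines():
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            runtimes[current] = []
            continue
        stop = STOP.match(line)
        if stop and current is not None:
            h, m, s, frac = stop.groups()
            # Pad the fractional part so it is always read as microseconds.
            micro = int(frac.ljust(6, "0")[:6])
            runtimes[current].append(
                timedelta(hours=int(h), minutes=int(m), seconds=int(s), microseconds=micro)
            )
    return runtimes


runtimes = parse_runtimes(LOG)
for fua, runs in runtimes.items():
    print(fua, [str(t) for t in runs])
```

Warning lines interleaved in the log (as for FUA 809) are ignored, since they match neither pattern.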

**Reduce output file size by removing duplicated data**, and copy to the corresponding `data/{fua_id}/parenx-skeletonize/` and `data/{fua_id}/parenx-voronoi/` folders (in parquet format)

In [4]:
for subfolder in glob.glob("../../data/*/temp-parenx/"):
    fua = int(re.findall(r"\d+", subfolder)[0])

    # SKELETONIZE
    os.makedirs(f"../../data/{fua}/parenx-skeletonize/", exist_ok=True)
    # `engine` (not `driver`) selects the I/O backend in geopandas.read_file
    ske = gpd.read_file(
        filename=subfolder + "skeletonize.gpkg", engine="pyogrio", layer="line"
    )
    ske.to_parquet(f"../../data/{fua}/parenx-skeletonize/{fua}.parquet")

    # VORONOI
    os.makedirs(f"../../data/{fua}/parenx-voronoi/", exist_ok=True)
    vor = gpd.read_file(
        filename=subfolder + "voronoi.gpkg", engine="pyogrio", layer="line"
    )
    vor.to_parquet(f"../../data/{fua}/parenx-voronoi/{fua}.parquet")

    print(f"Done for {fua}")

Done for 869
Done for 8989
Done for 1656
Done for 4881
Done for 809
Done for 1133
Done for 4617

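To check how much disk space the parquet copies save compared with the GeoPackage outputs, the file sizes can be compared directly. This is a rough sketch: `size_report` is a hypothetical helper, and the paths simply follow the folder layout used in the cell above. Pairs with a missing file are skipped, so the loop prints nothing if the files are not present.

```python
import os


def size_report(pairs):
    """Return (source, target, source MB, target MB, target/source ratio) per pair.

    Pairs in which either file is missing are skipped.
    """
    rows = []
    for src, dst in pairs:
        if not (os.path.exists(src) and os.path.exists(dst)):
            continue
        src_mb = os.path.getsize(src) / 1e6
        dst_mb = os.path.getsize(dst) / 1e6
        ratio = dst_mb / src_mb if src_mb else float("nan")
        rows.append((src, dst, src_mb, dst_mb, ratio))
    return rows


fua = 869  # one of the FUA ids processed above
pairs = [
    (
        f"../../data/{fua}/temp-parenx/skeletonize.gpkg",
        f"../../data/{fua}/parenx-skeletonize/{fua}.parquet",
    ),
    (
        f"../../data/{fua}/temp-parenx/voronoi.gpkg",
        f"../../data/{fua}/parenx-voronoi/{fua}.parquet",
    ),
]
for src, dst, src_mb, dst_mb, ratio in size_report(pairs):
    print(f"{src}: {src_mb:.1f} MB -> {dst}: {dst_mb:.1f} MB (ratio {ratio:.2f})")
```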

**Remove temporary parenx results** (commented out for now)

In [5]:
# for fua in fuas:
#     shutil.rmtree(f"../../data/{fua}/temp-parenx/")

### Initial observations & thoughts:
* Computation time: skeletonization took around 10 min for all 5 use cases; Voronoi took between 1 h and 14 h (the 14 h was Salt Lake City, maybe because it has the largest area, or maybe because my laptop went to sleep...).
* It works well for some places (especially intersections, even the more complicated ones).
* Major issue 1: network topology is sometimes not preserved (linestrings that don't connect are merged).
* Major issue 2: it creates wobbly lines.
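The two major issues above could be quantified with simple diagnostics. The sketch below is not part of parenx and uses hypothetical helper names. `n_components` counts connected components by linking lines that share a (rounded) endpoint, so a lower count after simplification than before would hint at lines being merged that were not connected originally. `sinuosity` is the ratio of a line's length to the straight-line distance between its endpoints, so values well above 1 on streets that should be straight would indicate wobbliness. Both work on plain coordinate sequences and assume single-part linestrings.

```python
from math import dist


def sinuosity(coords):
    """Ratio of path length to straight-line distance between the endpoints."""
    path = sum(dist(a, b) for a, b in zip(coords, coords[1:]))
    chord = dist(coords[0], coords[-1])
    return float("inf") if chord == 0 else path / chord


def n_components(lines, ndigits=3):
    """Count connected components, linking lines that share a rounded endpoint."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for coords in lines:
        a = tuple(round(v, ndigits) for v in coords[0])
        b = tuple(round(v, ndigits) for v in coords[-1])
        parent[find(a)] = find(b)  # union the two endpoint nodes
    return len({find(x) for x in list(parent)})


# Toy data: two touching segments plus one isolated segment.
lines = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1), (2, 0)],
    [(5, 5), (6, 5)],
]
print(n_components(lines))
print([round(sinuosity(line), 3) for line in lines])
```

With geopandas, the coordinate lists could be obtained via `list(geom.coords)` for each LineString in the geometry column. Note that the component count only catches merges between otherwise disconnected parts of the network; spurious links within an already connected network (for example at overpasses) would need a different check.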