# GeoPandas interface

Many operations on vector data cubes require converting an Xarray object to a GeoPandas GeoDataFrame. Xvec offers a set of methods that extend Xarray's pandas export to GeoPandas.

In [1]:
import geopandas as gpd
import numpy as np
import pandas as pd
import xarray as xr
import xvec

from geodatasets import get_path

Create a dummy DataArray with two dimensions indexed by geometry, mimicking transport counts between communities of Chicago.

In [2]:
chicago = gpd.read_file(get_path("geoda.chicago health"))

origin = destination = chicago.geometry.array
mode = ["car", "bike", "foot"]
date = pd.date_range("2023-01-01", periods=100)
hours = range(24)
rng = np.random.default_rng(1)
data = rng.integers(1, 100, size=(3, 100, 24, len(chicago), len(chicago)))
traffic_counts = xr.DataArray(
    data,
    coords=(mode, date, hours, origin, destination),
    dims=["mode", "date", "time", "origin", "destination"],
    name="traffic_counts",
).xvec.set_geom_indexes(["origin", "destination"], crs=chicago.crs)
traffic_counts

Xvec offers `.xvec.to_geopandas()` and `.xvec.to_geodataframe()` methods as counterparts to the default `to_pandas()` and `to_dataframe()` methods built into Xarray. While the former works only on arrays with two dimensions or fewer, the latter is generalisable and always returns all coordinates. Consider the following examples.
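The difference between the two underlying Xarray exports is easiest to see on a plain array without geometries. The snippet below is a minimal sketch: the `counts` cube and its `zone` labels are made up for illustration and are not part of the Chicago example.

```python
import numpy as np
import xarray as xr

# Hypothetical toy cube: 2 modes x 3 hours x 2 zones (plain labels, no geometry).
counts = xr.DataArray(
    np.arange(12).reshape(2, 3, 2),
    coords={"mode": ["car", "bike"], "time": [0, 1, 2], "zone": ["a", "b"]},
    dims=["mode", "time", "zone"],
    name="counts",
)

# Selecting a single hour leaves a 2-D array; `time` becomes a scalar coordinate.
one_hour = counts.sel(time=1)

# Wide table: `mode` indexes the rows, `zone` the columns, `time` is dropped.
wide = one_hour.to_pandas()

# Long table: one row per (mode, zone) pair, `time` kept as a constant column.
long = one_hour.to_dataframe()

# `to_pandas()` refuses arrays with more than two dimensions.
try:
    counts.to_pandas()
    too_many_dims = False
except ValueError:
    too_many_dims = True
```

The Xvec methods follow the same split, with extra handling for geometry coordinates.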

Select one day, one time and one mode and convert the data to a GeoDataFrame.

In [3]:
traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geodataframe(
    geometry="origin"
)

| | origin | destination | mode | date | time | traffic_counts |
|---|---|---|---|---|---|---|
| 0 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | bike | 2023-02-28 | 12 | 19 |
| 1 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.59215 41.81693, -87.59231 41.816... | bike | 2023-02-28 | 12 | 23 |
| 2 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.62880 41.80189, -87.62879 41.801... | bike | 2023-02-28 | 12 | 20 |
| 3 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.60671 41.81681, -87.60670 41.816... | bike | 2023-02-28 | 12 | 13 |
| 4 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | POLYGON ((-87.59215 41.81693, -87.59215 41.816... | bike | 2023-02-28 | 12 | 13 |
| ... | ... | ... | ... | ... | ... | ... |
| 5924 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.69646 41.70714, -87.69644 41.706... | bike | 2023-02-28 | 12 | 53 |
| 5925 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.64215 41.68508, -87.64249 41.685... | bike | 2023-02-28 | 12 | 47 |
| 5926 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ... | bike | 2023-02-28 | 12 | 26 |
| 5927 | POLYGON ((-87.80676 42.00084, -87.80676 42.000... | POLYGON ((-87.65456 41.99817, -87.65456 41.998... | bike | 2023-02-28 | 12 | 75 |


You end up with two geometry columns. Because neither is primary by default, the cell above passes `geometry="origin"` to specify which of them becomes the active geometry of the GeoDataFrame. All coordinates are present, but `mode`, `date` and `time` are constant across rows.
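The active geometry can be changed later with GeoPandas itself. The following is a minimal sketch on a hand-built GeoDataFrame with two geometry columns. The point coordinates and counts are invented for illustration and stand in for the Chicago polygons.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical two-row frame with two geometry columns, as in the export above.
gdf = gpd.GeoDataFrame(
    {
        "origin": gpd.GeoSeries([Point(0, 0), Point(0, 0)], crs="EPSG:4326"),
        "destination": gpd.GeoSeries([Point(1, 1), Point(2, 2)], crs="EPSG:4326"),
        "traffic_counts": [19, 23],
    },
    geometry="origin",  # `origin` is the active geometry
)

# `set_geometry` returns a new GeoDataFrame with a different active geometry;
# the original frame is left unchanged.
by_destination = gdf.set_geometry("destination")

active_before = gdf.geometry.name
active_after = by_destination.geometry.name
```

Spatial methods such as plotting or spatial joins operate on the active geometry, so the choice determines whether they use origins or destinations.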

Exporting the same selection with `.xvec.to_geopandas()` does not work, because both the index and the columns would be arrays of geometries.

In [4]:
traffic_counts.sel(date="2023-02-28", time=12, mode="bike").xvec.to_geopandas()

ValueError: Multiple coordinates based on xvec.GeometryIndex are not supported as GeoPandas.GeoDataFrame cannot be indexed by geometry. Try using `.xvec.to_geodataframe()` instead.

You can use the method once the selection leaves only one array of geometries, for example by fixing a single origin.

In [5]:
traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geopandas()

| mode | destination | car | bike | foot |
|---|---|---|---|---|
| 0 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 48 | 19 | 97 |
| 1 | POLYGON ((-87.59215 41.81693, -87.59231 41.816... | 77 | 23 | 25 |
| 2 | POLYGON ((-87.62880 41.80189, -87.62879 41.801... | 43 | 20 | 59 |
| 3 | POLYGON ((-87.60671 41.81681, -87.60670 41.816... | 96 | 13 | 42 |
| 4 | POLYGON ((-87.59215 41.81693, -87.59215 41.816... | 38 | 13 | 48 |
| ... | ... | ... | ... | ... |
| 72 | POLYGON ((-87.69646 41.70714, -87.69644 41.706... | 54 | 13 | 22 |
| 73 | POLYGON ((-87.64215 41.68508, -87.64249 41.685... | 76 | 10 | 82 |
| 74 | MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ... | 49 | 82 | 14 |
| 75 | POLYGON ((-87.65456 41.99817, -87.65456 41.998... | 84 | 28 | 16 |


Compared with `.xvec.to_geodataframe()` on the same selection (shown in the next cell), the constant coordinates are dropped and the resulting GeoDataFrame is much smaller.

In [6]:
traffic_counts.sel(date="2023-02-28", time=12, origin=origin[0]).xvec.to_geodataframe()

| mode | destination | date | time | origin | traffic_counts |
|---|---|---|---|---|---|
| car | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 48 |
| car | POLYGON ((-87.59215 41.81693, -87.59231 41.816... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 77 |
| car | POLYGON ((-87.62880 41.80189, -87.62879 41.801... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 43 |
| car | POLYGON ((-87.60671 41.81681, -87.60670 41.816... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 96 |
| car | POLYGON ((-87.59215 41.81693, -87.59215 41.816... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 38 |
| ... | ... | ... | ... | ... | ... |
| foot | POLYGON ((-87.69646 41.70714, -87.69644 41.706... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 22 |
| foot | POLYGON ((-87.64215 41.68508, -87.64249 41.685... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 82 |
| foot | MULTIPOLYGON (((-87.83658 41.98640, -87.83658 ... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 14 |
| foot | POLYGON ((-87.65456 41.99817, -87.65456 41.998... | 2023-02-28 | 12 | POLYGON ((-87.60914 41.84469, -87.60915 41.844... | 16 |


## Differences compared to the default pandas exports

Both methods wrap the native Xarray methods `to_pandas` and `to_dataframe`, but their behavior differs. GeoPandas does not support a GeometryArray as an index (or as column labels), so Xvec transforms the resulting DataFrame whenever that would happen. It either resets the index, so the geometries become a standard GeoSeries column, or, if the geometries index the columns, transposes the DataFrame first and then resets the index.
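The two transformations can be sketched with plain pandas and GeoPandas. This is an illustration of the idea only, not Xvec's actual implementation. The point geometries, counts and variable names are assumptions made for the example.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point

geoms = [Point(0, 0), Point(1, 1)]  # hypothetical stand-ins for the polygons

# Case 1: geometries index the rows, so resetting the index is enough.
by_rows = pd.DataFrame(
    {"car": [48, 77], "bike": [19, 23]},
    index=pd.Index(geoms, dtype=object, name="destination"),
)
gdf_rows = gpd.GeoDataFrame(
    by_rows.reset_index(), geometry="destination", crs="EPSG:4326"
)

# Case 2: geometries index the columns, so transpose first, then reset.
by_cols = pd.DataFrame(
    [[48, 77], [19, 23]],
    index=pd.Index(["car", "bike"], name="mode"),
    columns=pd.Index(geoms, dtype=object, name="destination"),
)
gdf_cols = gpd.GeoDataFrame(
    by_cols.T.reset_index(), geometry="destination", crs="EPSG:4326"
)
```

Both paths end with the geometries in a regular `destination` column and one value column per mode, which matches the layout returned by `.xvec.to_geopandas()` earlier on this page.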