# Geospatial Data in Python

## Vector Data

We are going to use [geopandas](https://geopandas.org/) to work with some vector data layers.  Geopandas is built on top of pandas; its main addition is the ability to handle **Geometries**.  For this, it relies on another package called [shapely](https://shapely.readthedocs.io/en/stable/manual.html).
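
As a minimal illustration of that relationship, the sketch below builds a small `GeoDataFrame` by hand. The names `demo_gdf`, `site_a` and `site_b` and the coordinates are made up for the example. It shows that the result is still a pandas `DataFrame`, with a geometry column holding shapely objects.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical two-row layer: one attribute column plus a geometry column
demo_gdf = gpd.GeoDataFrame(
    {'name': ['site_a', 'site_b']},
    geometry=[Point(0, 0), Point(3, 4)],
)

# A GeoDataFrame is a subclass of the pandas DataFrame
print(isinstance(demo_gdf, pd.DataFrame))

# Each geometry is a shapely object, so shapely methods are available
first, second = demo_gdf.geometry.iloc[0], demo_gdf.geometry.iloc[1]
print(demo_gdf.geom_type.tolist())
print(first.distance(second))
```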


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import geopandas as gpd


%matplotlib inline

Y = np.array([2,12,18,12,2])
X = np.array([12,12,15,18,18])

fig,ax=plt.subplots()
ax.scatter(X,Y)
ax.grid()
ax.set_title('Vertices')

## Shapely Objects

### Point(s)

In [None]:
from shapely.geometry import Point

point = Point([X[0],Y[0]])

point

In [None]:
from shapely.geometry import MultiPoint

multi_point = MultiPoint([x for x in zip(X, Y)])

multi_point

### Polygon(s)

In [None]:
from shapely.geometry import Polygon

poly = Polygon([coord for coord in zip(X, Y)])

poly
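
Shapely geometries carry geometric properties alongside their coordinates. The sketch below rebuilds the same five vertices (the name `poly_check` is only illustrative) and compares the reported area with a hand calculation: the shape is a 6 x 10 rectangle topped by a triangle with base 6 and height 6.

```python
import numpy as np
from shapely.geometry import Polygon

# Same vertices as the scatter plot above
X = np.array([12, 12, 15, 18, 18])
Y = np.array([2, 12, 18, 12, 2])

poly_check = Polygon(list(zip(X, Y)))

# Hand calculation: rectangle (6 x 10) plus triangle (0.5 * 6 * 6)
expected_area = 6 * 10 + 0.5 * 6 * 6

print(poly_check.area, expected_area)  # area in squared coordinate units
print(poly_check.bounds)               # (minx, miny, maxx, maxy)
print(poly_check.is_valid)             # False if the ring self-intersects
```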

In [None]:
from shapely.geometry import MultiPolygon

Mpol = []
for i in range(0,61,20):
    Mpol.append(Polygon([coord for coord in zip(X+i, Y+i)]))

print(Mpol)
Mpoly = MultiPolygon(Mpol)

Mpoly

### Line(s)

In [None]:
from shapely.geometry import LineString

line = LineString([x for x in zip(X, Y)])

line
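
A `LineString` built from the same vertices differs from the polygon in that it is not closed and encloses no area. A short sketch, again reusing the vertices above (`line_check` is an illustrative name):

```python
import math
import numpy as np
from shapely.geometry import LineString

# Same vertices as the scatter plot above
X = np.array([12, 12, 15, 18, 18])
Y = np.array([2, 12, 18, 12, 2])

line_check = LineString(list(zip(X, Y)))

# Two vertical segments of length 10 plus two sloped segments of sqrt(3^2 + 6^2)
expected_length = 2 * 10 + 2 * math.sqrt(3**2 + 6**2)

print(line_check.length, expected_length)
print(line_check.is_closed)  # the last vertex does not return to the first
print(line_check.area)       # lines have zero area
```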

In [None]:
from shapely.geometry import MultiLineString

Mlin = []
for i in range(0,61,20):
    Mlin.append(LineString([coord for coord in zip(X+i, Y+i)]))

print(Mlin)
Mline = MultiLineString(Mlin)

Mline

## Unzip Data

- Below is a .zip file of census sub-divisions from Simply Analytics.
- We need to unzip it and inspect the metadata

In [None]:
import zipfile
Shape_file='SimplyAnalytics_Shapefiles_2021-11-18_04_29_59_93f600838bff00a6da2283b90dbf31c8'
with zipfile.ZipFile('data/'+Shape_file+'.zip', 'r') as zip_ref:
    zip_ref.extractall('data/Census/')
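
Before extracting, it can help to list what an archive contains. The sketch below uses only the standard library and builds a small stand-in archive in a temporary directory, so it does not depend on the Simply Analytics download; the `demo.*` file names are hypothetical.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp_dir:
    zip_path = os.path.join(tmp_dir, 'demo.zip')

    # Build a stand-in archive with the parts a shapefile download usually has
    with zipfile.ZipFile(zip_path, 'w') as zip_out:
        for name in ['demo.shp', 'demo.shx', 'demo.dbf', 'demo.prj', 'variable_names.txt']:
            zip_out.writestr(name, 'placeholder')

    # List the members first, then extract everything
    with zipfile.ZipFile(zip_path, 'r') as zip_ref:
        members = zip_ref.namelist()
        zip_ref.extractall(os.path.join(tmp_dir, 'Census'))

    extracted = sorted(os.listdir(os.path.join(tmp_dir, 'Census')))

print(members)
print(extracted)
```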


## Reading the shapefile

We can rely on [geopandas](https://geopandas.org/en/stable/), a spatial extension for pandas.  We can use Geopandas to read, manipulate, and write geospatial data.

* We can open .txt files with pandas to view the metadata

In [None]:
import geopandas as gpd

# the .read_file() function reads shapefiles
BC_FSA = gpd.read_file('data/Census/'+Shape_file+'.shp')

meta_data = pd.read_csv('data/Census/variable_names.txt',header=None,sep='#')
print('Variable Names: \n',meta_data.values)

BC_FSA

## Editing Data

In [None]:

# Note: 'VALUE0' is the generic column name used in the download.
# Its meaning is listed in the variable_names.txt file that comes with the data.
BC_FSA = BC_FSA.rename(columns={
'VALUE0': 'Population',
})

BC_FSA.head()

## Plotting the Data

- Display the data and inspect the projection

In [None]:
fig,ax=plt.subplots(figsize=(8,8))
BC_FSA.plot(ax=ax)

BC_FSA.crs

## Re-project the Data

In [None]:
BC_FSA_Albers = BC_FSA.to_crs('EPSG:3005')

fig,ax=plt.subplots(figsize=(8,8))
BC_FSA_Albers.plot(ax=ax)

BC_FSA_Albers.crs
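
EPSG:3005 is the NAD83 / BC Albers projection, an equal-area projection with units of metres, which is why areas computed after re-projecting are meaningful. The sketch below illustrates this with a made-up 0.1 x 0.1 degree box near Vancouver (the name `demo` and the coordinates are assumptions, not part of the census data).

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical box in longitude/latitude (EPSG:4326)
demo = gpd.GeoDataFrame(
    {'name': ['demo box']},
    geometry=[box(-123.2, 49.2, -123.1, 49.3)],
    crs='EPSG:4326',
)

# Re-project to BC Albers so that coordinates are in metres
demo_albers = demo.to_crs('EPSG:3005')

# Area is now in square metres; divide by 1e6 for square kilometres
area_km2 = demo_albers.area.iloc[0] / 1e6

print(demo_albers.crs.to_epsg())
print(area_km2)
```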

## Inspect the Geometry

In [None]:
BC_FSA_Albers.area

## Map by a Column

In [None]:
fig,ax=plt.subplots(figsize=(8,8))
BC_FSA_Albers.plot(column='Population',ax=ax,legend=True,scheme="quantiles",edgecolor='k')
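
The `scheme="quantiles"` option bins the column so that each class holds roughly the same number of features. Geopandas hands the classification to the mapclassify package; the sketch below only approximates the idea with `pandas.qcut` on made-up population values, so its break points may not match the map exactly.

```python
import pandas as pd

# Made-up population counts for ten areas
population = pd.Series([120, 450, 800, 1500, 2300, 5200, 9800, 15000, 42000, 110000])

# Split into five classes with (roughly) equal membership
classes = pd.qcut(population, q=5, labels=False)
counts = classes.value_counts().sort_index()

print(pd.DataFrame({'population': population, 'class': classes}))
print(counts)
```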

## Spatial Overlay

In [None]:
BC_Boundary = gpd.read_file('data/Census/BC_Boudary_File.shp')

fig,ax = plt.subplots(1,2,figsize=(8,5))
BC_FSA_Albers.plot(edgecolor='k',ax=ax[0])
BC_Boundary.plot(edgecolor='k',ax=ax[1])

ax[0].set_title('Input Layer')
ax[1].set_title('Clip Layer')


## Clip

This will take a little while to process

In [None]:


BC_FSA_Clip = gpd.clip(BC_FSA_Albers,BC_Boundary)

fig,ax = plt.subplots(figsize=(5,7))
BC_FSA_Clip.plot(edgecolor='k',ax=ax)
ax.set_title('Final Result')
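
To see what `gpd.clip` does on a small scale, the sketch below clips two made-up squares with a made-up rectangular mask. The layer names and coordinates are assumptions for illustration only. Each input feature is cut to the part that falls inside the mask, and its attributes are kept.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical input layer: two adjacent 2 x 2 squares
squares = gpd.GeoDataFrame(
    {'name': ['left', 'right']},
    geometry=[box(0, 0, 2, 2), box(2, 0, 4, 2)],
)

# Hypothetical clip layer: a 2 x 1 rectangle straddling both squares
mask = gpd.GeoDataFrame(geometry=[box(1, 0, 3, 1)])

clipped = gpd.clip(squares, mask)

# Each square keeps only the 1 x 1 piece inside the mask
print(clipped[['name']].assign(area=clipped.area))
```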

## Calculate Population Density

In [None]:
BC_FSA_Clip['Pop_Density'] = BC_FSA_Clip['Population']/BC_FSA_Clip.area*1e6
BC_FSA_Clip['Pop_Density']=BC_FSA_Clip['Pop_Density'].fillna(0)

fig,ax=plt.subplots(figsize=(10,10))
BC_FSA_Clip.plot(column='Pop_Density',ax=ax,legend=True,scheme="User_Defined", 
         classification_kwds=dict(bins=[
             BC_FSA_Clip['Pop_Density'].quantile(.25),
             BC_FSA_Clip['Pop_Density'].quantile(.5),
             BC_FSA_Clip['Pop_Density'].quantile(.75),
             BC_FSA_Clip['Pop_Density'].max()]),
                    edgecolor='black',linewidth=.25)

ax.grid()

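# Set zoom (map units are metres) and turn the grid off
x = 1.225e6  # centre of the view, x
y = 0.45e6   # centre of the view, y
v = 8.5e4    # vertical half-extent
h = 8.5e4    # horizontal half-extent

ax.set_xlim(x-h,x+h)
ax.set_ylim(y-v,y+v)
ax.grid(False)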

# BC_FSA_Clip.to_file('data//BC_FSAisions.shp')
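
The `*1e6` factor in the density calculation converts people per square metre into people per square kilometre, because the BC Albers areas are in square metres. A small pandas-only sketch with made-up numbers (the column `area_m2` stands in for `BC_FSA_Clip.area`):

```python
import numpy as np
import pandas as pd

# Made-up populations and areas (square metres); one population is missing
demo = pd.DataFrame({
    'Population': [5000, 250, np.nan],
    'area_m2': [2.5e6, 1.0e8, 4.0e6],
})

# people / m^2 * 1e6 = people / km^2
demo['Pop_Density'] = demo['Population'] / demo['area_m2'] * 1e6

# Missing populations produce NaN densities, which are set to 0 here
demo['Pop_Density'] = demo['Pop_Density'].fillna(0)

print(demo)
```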



## Data Classification

In [None]:
BC_FSA_Clip.loc[((BC_FSA_Clip['Pop_Density']<=400)|(BC_FSA_Clip['Population']<=1000)),
                    'Community_Type']='Rural'

BC_FSA_Clip.loc[(BC_FSA_Clip['Community_Type']!='Rural'),
                    'Community_Type']='Urban'

print('Community Type Summary BC')
print(BC_FSA_Clip.groupby('Community_Type').count()['name'].sort_values())


fig,ax=plt.subplots(figsize=(8,8))
BC_FSA_Clip.plot(column='Community_Type',ax=ax,legend=True,cmap='Pastel2',edgecolor='k',linewidth=.25)

ax.set_xlim(x-h,x+h)
ax.set_ylim(y-v,y+v)
# ax.grid()

ax.set_title('Community Type SW BC')
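
The same rule (rural if density is at most 400 people per square kilometre or population is at most 1000, urban otherwise) can also be written in one step with `numpy.where`. A sketch on made-up values, offered as an alternative rather than a replacement for the cell above:

```python
import numpy as np
import pandas as pd

# Made-up areas: small population, low density, and neither
demo = pd.DataFrame({
    'Population': [500, 5000, 20000],
    'Pop_Density': [900.0, 150.0, 1200.0],
})

# True where either rural condition holds
is_rural = (demo['Pop_Density'] <= 400) | (demo['Population'] <= 1000)

demo['Community_Type'] = np.where(is_rural, 'Rural', 'Urban')

print(demo)
```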



# Saving Data

## Shapefiles

We can save the new layer as a shapefile.  [Shapefiles](https://en.wikipedia.org/wiki/Shapefile) are only one type of vector file.  They have certain restrictions that minimize storage space (e.g. column names are limited to 10 characters), they aren't human readable, they are restricted to one geometry type per file (polygon, line, or point), and they split the data into multiple files.

Note the **"UserWarnings"**.  This one isn't too serious, but it explains why the data came with a generic column header.

In [None]:
# Shapefiles cannot store mixed geometries, so drop any GeometryCollection rows before saving
BC_FSA_Albers.loc[BC_FSA_Albers.geom_type !='GeometryCollection'].to_file('data/Outputs/BC_FSA_Albers.shp')
BC_FSA_Clip.loc[BC_FSA_Clip.geom_type !='GeometryCollection'].to_file('data/Outputs/BC_FSA_Clip.shp')
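
The filter used before saving can be tried on a tiny made-up layer. In the sketch below (names and shapes are hypothetical), one row is a plain polygon and the other is a `GeometryCollection` mixing a point and a polygon; only the first survives the filter.

```python
import geopandas as gpd
from shapely.geometry import GeometryCollection, Point, box

# Hypothetical layer with one polygon and one mixed-geometry row
layer = gpd.GeoDataFrame(
    {'name': ['a', 'b']},
    geometry=[
        box(0, 0, 1, 1),
        GeometryCollection([Point(0, 0), box(2, 2, 3, 3)]),
    ],
)

print(layer.geom_type.tolist())

# Keep only rows whose geometry is not a GeometryCollection
keep = layer.loc[layer.geom_type != 'GeometryCollection']
print(keep['name'].tolist())
```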



## GeoJSON

We can save the new layer as a [GeoJSON](https://en.wikipedia.org/wiki/GeoJSON) file to get around the shapefile restrictions described above.
* This is a simple file type, often used in web-centered applications, that stores data as a **human-readable** dictionary.
* The file takes up more space, but is also a bit more accessible/flexible.


In [None]:
# Count features by geometry type; GeoJSON can hold mixed types in one file
print(BC_FSA_Clip.groupby(BC_FSA_Clip.geom_type).count()['Population'])
BC_FSA_Clip.to_file("data/Outputs/BC_FSA_Clip.json", driver = "GeoJSON")
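
To give a feel for what "human readable" means here, the sketch below writes a single made-up feature as GeoJSON-style text using shapely's `mapping` helper and the standard `json` module. The feature name, population, and coordinates are hypothetical, and this is not the file that `to_file` produces for the census layer.

```python
import json
from shapely.geometry import Point, mapping

# Hypothetical feature: attributes go in "properties", the shape in "geometry"
feature = {
    'type': 'Feature',
    'properties': {'name': 'demo', 'Population': 1234},
    'geometry': mapping(Point(-123.1, 49.25)),
}

# The result is plain text that can be opened in any editor
text = json.dumps(feature, indent=2)
print(text)

# Reading it back gives ordinary dictionaries and lists
round_trip = json.loads(text)
```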