# Sorting Greenspace Data

This notebook works at LSOA (Lower Layer Super Output Area) level. Greenspace data was downloaded from OS as individual polygons and divided into four types: natural, parks, sports and other. Each type was intersected with the LSOAs to identify which greenspaces fall within each LSOA. The intersected pieces now need to be aggregated to calculate the amount of greenspace within each LSOA. The full workflow is:

1. Separate out into 4 types
2. Dissolve, to remove overlapping polygons
3. Intersect with LSOAs
4. Calculate areas
5. Export as csv
6. Load into Python
7. Aggregate to LSOA
8. Join back up with LSOA geometry
9. Calculate proportions
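
As a rough illustration of steps 7 to 9, the sketch below aggregates fragment areas per LSOA, joins them back to an LSOA table and computes a greenspace proportion. The toy data and the names `fragments`, `lsoa_table` and `prop_green` are assumptions made for illustration; only `LSOA11CD`, `Area` and `Shape__Are` come from the data used in this notebook, and a plain table stands in for the LSOA geometry.

```python
import pandas as pd

# Hypothetical intersected fragments: one row per greenspace piece per LSOA (m^2)
fragments = pd.DataFrame({
    "LSOA11CD": ["E01000001", "E01000001", "E01000002"],
    "Area": [100.0, 50.0, 30.0],
})

# Hypothetical LSOA table; Shape__Are is assumed to hold the LSOA area in m^2
lsoa_table = pd.DataFrame({
    "LSOA11CD": ["E01000001", "E01000002", "E01000003"],
    "Shape__Are": [1000.0, 300.0, 500.0],
})

# Step 7: total greenspace area per LSOA
agg = fragments.groupby("LSOA11CD", as_index=False)["Area"].sum()

# Step 8: left join so LSOAs without any greenspace are kept (filled with 0)
joined = lsoa_table.merge(agg, on="LSOA11CD", how="left")
joined["Area"] = joined["Area"].fillna(0.0)

# Step 9: proportion of each LSOA covered by this greenspace type
joined["prop_green"] = joined["Area"] / joined["Shape__Are"]
print(joined)
```

A left join is used here so that LSOAs with no greenspace of a given type appear with a proportion of zero rather than being dropped.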

## Reading in Data

In [1]:
import pandas as pd 
import numba
import seaborn as sns 
import matplotlib.pyplot as plt
import geopandas as gpd
import palettable as pltt
import descartes
from pysal.viz import mapclassify 
import numpy as np
import statsmodels.api as sm
import scipy.stats as stats



In [2]:
import shapely
import rtree
from shapely.geometry import Polygon

In [3]:
#parks = gpd.read_file('GS_Parks.shp')

In [4]:
#parks.head()

Unnamed: 0,priFunc,Type,OBJECTID,Area,geometry
0,Public Park Or Garden,Park,,16.76488,"POLYGON Z ((399649.300 653037.950 0.000, 39964..."
1,Public Park Or Garden,Park,,169.0226,"POLYGON Z ((399664.550 653075.840 0.000, 39967..."
2,Public Park Or Garden,Park,,202.42683,"POLYGON Z ((399686.900 653074.660 0.000, 39968..."
3,Public Park Or Garden,Park,,10.07841,"POLYGON Z ((399665.160 653076.510 0.000, 39967..."
4,Public Park Or Garden,Park,,89.11763,"POLYGON Z ((399665.160 653076.510 0.000, 39967..."


In [3]:
lsoas = gpd.read_file('LSOAs_KeyUrban.shp')

In [4]:
lsoas.head()

Unnamed: 0,OBJECTID,LSOA11CD,LSOA11NM,LSOA11NMW,Shape__Are,Shape__Len,geometry
0,1,E01000001,City of London 001A,City of London 001A,129865.337669,2635.781429,"POLYGON ((532095.563 181577.351, 532095.125 18..."
1,2,E01000002,City of London 001B,City of London 001B,228419.333099,2708.05204,"POLYGON ((532267.728 181643.781, 532262.875 18..."
2,3,E01000003,City of London 001C,City of London 001C,59054.013168,1224.770897,"POLYGON ((532105.312 182010.574, 532104.872 18..."
3,4,E01000005,City of London 001E,City of London 001E,189577.165154,2275.832056,"POLYGON ((533610.974 181410.968, 533615.622 18..."
4,5,E01000006,Barking and Dagenham 016A,Barking and Dagenham 016A,146536.52047,1966.162225,"POLYGON ((544817.826 184346.261, 544815.791 18..."


## Dissolve the greenspace polygons

In [7]:
#dissolve all park polygons into a single geometry so overlapping areas are not double counted
#parks_outline = parks.dissolve()

In [8]:
#check it worked
#parks_outline.head()

Unnamed: 0,geometry,priFunc,Type,OBJECTID,Area
0,"MULTIPOLYGON Z (((573378.692 106943.743 0.000,...",Public Park Or Garden,Park,41186.0,16.76488
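
To show why dissolving matters before areas are calculated, here is a minimal sketch with two overlapping toy squares. The geometries are made up, and calling `dissolve()` with no `by` argument assumes a reasonably recent geopandas release.

```python
import geopandas as gpd
from shapely.geometry import box

# Two hypothetical 2 m x 2 m squares that overlap in a 1 m x 1 m corner
toy = gpd.GeoDataFrame(
    {"Type": ["Park", "Park"]},
    geometry=[box(0, 0, 2, 2), box(1, 1, 3, 3)],
)

# Summing polygon areas directly counts the overlap twice (4 + 4)
raw_area = toy.geometry.area.sum()

# Dissolving merges everything into one geometry, so the overlap counts once
dissolved = toy.dissolve()
dissolved_area = dissolved.geometry.area.iloc[0]

print(raw_area, dissolved_area)
```

The dissolved area is smaller than the raw sum by exactly the area of the overlap.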


In [9]:
#save the dissolved polygons
#parks_outline.to_file('Parks_outline.shp')

In [5]:
#read back in the dissolved polygons
#parks_outline = gpd.read_file('Parks_outline.shp')

In [6]:
#parks_outline.head()

Unnamed: 0,priFunc,Type,OBJECTID,Area,geometry
0,Public Park Or Garden,Park,41186.0,16.76488,"MULTIPOLYGON Z (((573378.692 106943.743 0.000,..."



## Intersect LSOAs with parks

In [None]:
#check CRS and convert if needed

In [7]:
#lsoas.crs

<Projected CRS: EPSG:27700>
Name: OSGB36 / British National Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore.
- bounds: (-9.0, 49.75, 2.01, 61.01)
Coordinate Operation:
- name: British National Grid
- method: Transverse Mercator
Datum: Ordnance Survey of Great Britain 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich

In [8]:
#parks_outline.crs

<Projected CRS: PROJCS["OSGB36_British_National_Grid",GEOGCS["GCS_ ...>
Name: OSGB36_British_National_Grid
Axis Info [cartesian]:
- [east]: Easting (metre)
- [north]: Northing (metre)
Area of Use:
- undefined
Coordinate Operation:
- name: unnamed
- method: Transverse Mercator
Datum: Ordnance Survey of Great Britain 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich

In [11]:
#parks_outline = parks_outline.to_crs(epsg = 27700) 

In [12]:
#parks_outline.crs

<Projected CRS: EPSG:27700>
Name: OSGB36 / British National Grid
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: United Kingdom (UK) - offshore to boundary of UKCS within 49°45'N to 61°N and 9°W to 2°E; onshore Great Britain (England, Wales and Scotland). Isle of Man onshore.
- bounds: (-9.0, 49.75, 2.01, 61.01)
Coordinate Operation:
- name: British National Grid
- method: Transverse Mercator
Datum: Ordnance Survey of Great Britain 1936
- Ellipsoid: Airy 1830
- Prime Meridian: Greenwich

In [None]:
#lsoas_parks_int = gpd.overlay(lsoas, parks_outline, how='intersection')
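
The CRS check and the intersection can be sketched on toy data as follows. The layers `toy_lsoas` and `toy_green` are hypothetical; the sketch only illustrates the general pattern of aligning the CRS, overlaying, then measuring the area of each resulting piece.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical LSOAs: two adjacent 10 m x 10 m squares in British National Grid
toy_lsoas = gpd.GeoDataFrame(
    {"LSOA11CD": ["A", "B"]},
    geometry=[box(0, 0, 10, 10), box(10, 0, 20, 10)],
    crs="EPSG:27700",
)

# Hypothetical dissolved greenspace that straddles the LSOA boundary
toy_green = gpd.GeoDataFrame(
    {"Type": ["Park"]},
    geometry=[box(8, 0, 14, 5)],
    crs="EPSG:27700",
)

# Reproject only if the two layers disagree
if toy_green.crs != toy_lsoas.crs:
    toy_green = toy_green.to_crs(toy_lsoas.crs)

# Each output row is the part of the greenspace that falls inside one LSOA
pieces = gpd.overlay(toy_lsoas, toy_green, how="intersection")
pieces["Area"] = pieces.geometry.area  # square metres in EPSG:27700
print(pieces[["LSOA11CD", "Area"]])
```

Because EPSG:27700 uses metres, the `Area` values are in square metres and can be compared directly with the LSOA areas.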



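## Load the Intersected Greenspace Tables

The outputs of steps 3 to 5 are read from CSV files, one per greenspace type, with the "other" type split into London and non-London files.

In [11]: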
sports = pd.read_csv('GS_Sport_Edit.csv')

In [12]:
sports.head()

Unnamed: 0,OBJECTID,LSOA11CD,LSOA11NM,LSOA11NMW,Shape__Are,Shape__Len,Area
0,9076,E01009345,Birmingham 053D,Birmingham 053D,502490.073685,3976.735948,14478.16459
1,8685,E01008924,Birmingham 101A,Birmingham 101A,411321.745239,4663.597926,115461.38626
2,32705,E01033629,Birmingham 106F,Birmingham 106F,935576.119431,6069.037061,201634.07831
3,9226,E01009505,Birmingham 101E,Birmingham 101E,337469.733559,4308.736789,429.85873
4,8689,E01008929,Birmingham 106C,Birmingham 106C,532836.286461,5400.6903,15372.21043


In [6]:
natural = pd.read_csv('GS_Nat_Edit.csv')

In [10]:
natural.head()

Unnamed: 0,OBJECTID,LSOA11CD,LSOA11NM,LSOA11NMW,Shape__Are,Shape__Len,Area
0,1,E01000001,City of London 001A,City of London 001A,129865.337669,2635.781429,32.638
1,2,E01000002,City of London 001B,City of London 001B,228419.333099,2708.05204,8856.764
2,3,E01000003,City of London 001C,City of London 001C,59054.013168,1224.770897,366.142
3,7,E01000008,Barking and Dagenham 015B,Barking and Dagenham 015B,195064.797699,4135.125164,1957.608
4,10,E01000011,Barking and Dagenham 016C,Barking and Dagenham 016C,91630.550575,1543.254615,383.622


In [20]:
parks = pd.read_csv('GS_Parks_Edit.csv')

In [21]:
parks.head()

Unnamed: 0,OBJECTID,LSOA11CD,LSOA11NM,LSOA11NMW,Shape__Are,Shape__Len,Area
0,6409,E01006577,Liverpool 034C,Liverpool 034C,377258.075829,3347.700025,1712.28435
1,6374,E01006539,Liverpool 047B,Liverpool 047B,855312.532181,4122.723589,306624.72571
2,6592,E01006774,Liverpool 036E,Liverpool 036E,245399.756279,4447.291992,406.33295
3,6610,E01006794,Liverpool 046B,Liverpool 046B,455504.442162,5152.160406,25163.86291
4,6412,E01006580,Liverpool 036B,Liverpool 036B,378033.34024,3204.390176,41323.96329


In [28]:
otherlon = pd.read_csv('GS_Other_Lon_Edit.csv')

In [35]:
othernotlon = pd.read_csv('GS_Other_NOTLondon_Edit.csv')

In [29]:
otherlon.head()

Unnamed: 0,OBJECTID,LSOA11CD,LSOA11NM,LSOA11NMW,Shape__Are,Shape__Len,Area
0,195,E01000199,Barnet 025B,Barnet 025B,338086.2,4082.191756,33323.28884
1,258,E01000262,Barnet 017D,Barnet 017D,2209581.0,9856.607209,963661.72991
2,279,E01000283,Barnet 012C,Barnet 012C,330542.9,4473.084544,22542.36808
3,276,E01000280,Barnet 007D,Barnet 007D,1016103.0,5545.40832,19490.5103
4,285,E01000289,Barnet 004D,Barnet 004D,281199.9,3287.721871,59972.46614


In [36]:
othernotlon.head()

Unnamed: 0,OBJECTID,LSOA11CD,LSOA11NM,LSOA11NMW,Shape__Are,Shape__Len,Area
0,10534,E01010846,Bradford 055E,Bradford 055E,447893.2,6258.086866,97710.40263
1,10447,E01010755,Bradford 055B,Bradford 055B,265920.8,2807.825025,25752.11212
2,10385,E01010693,Bradford 002B,Bradford 002B,1010691.0,8388.025224,113621.8192
3,10533,E01010845,Bradford 059B,Bradford 059B,398219.0,5252.412452,84440.77265
4,10537,E01010849,Bradford 059C,Bradford 059C,235118.1,3204.891027,17164.21293


## Keeping only Required Columns

In [71]:
sports.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 160294 entries, 0 to 160293
Data columns (total 10 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   OBJECTID    160294 non-null  int64  
 1   LSOA11CD    160294 non-null  object 
 2   LSOA11NM    160294 non-null  object 
 3   LSOA11NMW   160294 non-null  object 
 4   Shape__Are  160294 non-null  float64
 5   Shape__Len  160293 non-null  float64
 6   priFunc     160293 non-null  object 
 7   Type        160293 non-null  object 
 8   OBJECTID_2  158185 non-null  float64
 9   Area        160293 non-null  float64
dtypes: float64(4), int64(1), object(5)
memory usage: 12.2+ MB


In [13]:
sports_simp = sports[['LSOA11CD', 'Area']]

In [7]:
natural_simp = natural[['LSOA11CD','Area']]

In [22]:
parks_simp = parks[['LSOA11CD', 'Area']]

In [30]:
otherlon_simp = otherlon[['LSOA11CD', 'Area']]

In [37]:
othernotlon_simp = othernotlon[['LSOA11CD', 'Area']]
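
An alternative, sketched here on made-up CSV text, is to keep only the required columns at read time with the `usecols` argument of `pd.read_csv`. The in-memory file is a hypothetical stand-in for the exported CSVs.

```python
import io
import pandas as pd

# Hypothetical stand-in for one of the exported CSV files
csv_text = io.StringIO(
    "OBJECTID,LSOA11CD,LSOA11NM,Shape__Are,Area\n"
    "1,E01000001,City of London 001A,129865.3,32.6\n"
    "2,E01000002,City of London 001B,228419.3,8856.8\n"
)

# usecols keeps only the listed columns while the file is parsed
slim = pd.read_csv(csv_text, usecols=["LSOA11CD", "Area"])
print(slim.columns.tolist())
```

This can reduce memory use for the larger "other" tables, since the unused columns are never loaded.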

## Aggregating by LSOA

In [14]:
sports_agg = sports_simp.groupby(['LSOA11CD'], as_index = False).sum()

In [15]:
sports_agg.head()

Unnamed: 0,LSOA11CD,Area
0,E01000001,2536.14772
1,E01000003,3733.85104
2,E01000005,247.36556
3,E01000007,197.7103
4,E01000008,464.64081


In [8]:
natural_agg = natural_simp.groupby(['LSOA11CD'], as_index = False).sum()

In [9]:
natural_agg.head()

Unnamed: 0,LSOA11CD,Area
0,E01000001,32.638
1,E01000002,8856.764
2,E01000003,366.142
3,E01000008,1957.608
4,E01000011,383.622


In [23]:
parks_agg = parks_simp.groupby(['LSOA11CD'], as_index = False).sum()

In [24]:
parks_agg.head()

Unnamed: 0,LSOA11CD,Area
0,E01000001,2582.9339
1,E01000005,1538.83062
2,E01000009,22.52817
3,E01000010,38189.31363
4,E01000011,9579.57566


In [31]:
otherlon_agg = otherlon_simp.groupby(['LSOA11CD'], as_index = False).sum()

In [32]:
otherlon_agg.head()

Unnamed: 0,LSOA11CD,Area
0,E01000001,10992.60819
1,E01000002,23719.80642
2,E01000003,8305.13093
3,E01000005,21707.9388
4,E01000006,8025.92023


In [38]:
othernotlon_agg = othernotlon_simp.groupby(['LSOA11CD'], as_index = False).sum()

In [39]:
othernotlon_agg.head()

Unnamed: 0,LSOA11CD,Area
0,E01005061,155956.353
1,E01005062,157668.2871
2,E01005063,58478.92806
3,E01005065,133293.5842
4,E01005066,41116.11909


In [42]:
#other_agg is not defined in the cells above, so this cell fails on a clean top-to-bottom run
other_agg.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 27503 entries, 0 to 27502
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  27503 non-null  object 
 1   Area      27503 non-null  float64
dtypes: float64(1), object(1)
memory usage: 644.6+ KB
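
Since the same groupby is repeated for every greenspace type, it could also be written once as a loop. The sketch below uses small made-up tables; the `tables` dictionary and the `wide` layout are assumptions for illustration, not what this notebook actually does.

```python
import pandas as pd

# Hypothetical stand-ins for the *_simp tables (LSOA code plus fragment area)
tables = {
    "sports": pd.DataFrame({"LSOA11CD": ["A", "A", "B"], "Area": [1.0, 2.0, 3.0]}),
    "parks": pd.DataFrame({"LSOA11CD": ["B", "C"], "Area": [4.0, 5.0]}),
}

# One groupby per table, stored under the same key
aggregated = {
    name: df.groupby("LSOA11CD", as_index=False)["Area"].sum()
    for name, df in tables.items()
}

# Optional: one wide table with a column per greenspace type, 0 where absent
wide = (
    pd.concat(
        [agg.set_index("LSOA11CD")["Area"].rename(name)
         for name, agg in aggregated.items()],
        axis=1,
    )
    .fillna(0.0)
    .sort_index()
)
print(wide)
```

A wide table like this keeps all types on one row per LSOA, which can make the later join and proportion steps simpler.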


## Save Outputs

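In [16]:
#index=False stops the row index being written as an extra unnamed column
sports_agg.to_csv('Sports_agg.csv', index=False)

In [10]:
natural_agg.to_csv('Natural_agg.csv', index=False)

In [25]:
parks_agg.to_csv('Parks_agg.csv', index=False)

In [33]:
otherlon_agg.to_csv('Otherlon_agg.csv', index=False)

In [40]:
othernotlon_agg.to_csv('Othernotlon_agg.csv', index=False)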
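
By default `to_csv` also writes the row index, which comes back as an `Unnamed: 0` column when the file is read again. The sketch below shows the difference using an in-memory buffer and made-up values instead of real files.

```python
import io
import pandas as pd

# Hypothetical aggregated table
agg = pd.DataFrame({"LSOA11CD": ["E01000001", "E01000003"], "Area": [2536.1, 3733.9]})

# Default: the index is written as an unnamed first column
with_index = io.StringIO()
agg.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
agg.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)
print(cols_without_index)
```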

## Compare Row Counts Before and After Aggregation

In [58]:
sports_simp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69966 entries, 0 to 69965
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  69966 non-null  object 
 1   Area      69966 non-null  float64
dtypes: float64(1), object(1)
memory usage: 1.1+ MB


In [59]:
sports_agg.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4400 entries, 0 to 4399
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  4400 non-null   object 
 1   Area      4400 non-null   float64
dtypes: float64(1), object(1)
memory usage: 103.1+ KB


In [60]:
parks_simp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3187 entries, 0 to 3186
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  3187 non-null   object 
 1   Area      3187 non-null   float64
dtypes: float64(1), object(1)
memory usage: 49.9+ KB


In [61]:
parks_agg.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3187 entries, 0 to 3186
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  3187 non-null   object 
 1   Area      3187 non-null   float64
dtypes: float64(1), object(1)
memory usage: 74.7+ KB


In [62]:
natural_simp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3768 entries, 0 to 3767
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  3768 non-null   object 
 1   Area      3768 non-null   float64
dtypes: float64(1), object(1)
memory usage: 59.0+ KB


In [63]:
natural_agg.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 3768 entries, 0 to 3767
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  3768 non-null   object 
 1   Area      3768 non-null   float64
dtypes: float64(1), object(1)
memory usage: 88.3+ KB


In [64]:
otherlon_simp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 513031 entries, 0 to 513030
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   LSOA11CD  513031 non-null  object 
 1   Area      513031 non-null  float64
dtypes: float64(1), object(1)
memory usage: 7.8+ MB


In [65]:
otherlon_agg.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4812 entries, 0 to 4811
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  4812 non-null   object 
 1   Area      4812 non-null   float64
dtypes: float64(1), object(1)
memory usage: 112.8+ KB


In [66]:
othernotlon_simp.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 785537 entries, 0 to 785536
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype  
---  ------    --------------   -----  
 0   LSOA11CD  785537 non-null  object 
 1   Area      785537 non-null  float64
dtypes: float64(1), object(1)
memory usage: 12.0+ MB


In [68]:
othernotlon_agg.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4191 entries, 0 to 4190
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   LSOA11CD  4191 non-null   object 
 1   Area      4191 non-null   float64
dtypes: float64(1), object(1)
memory usage: 98.2+ KB
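
In the outputs above, sports drops from 69966 rows to 4400 after aggregation, while parks (3187) and natural (3768) keep the same row count, which suggests those two tables already held one row per LSOA. A check along the following lines could confirm that aggregation leaves exactly one row per LSOA and preserves the total area. The toy table is hypothetical.

```python
import pandas as pd

# Hypothetical fragment table with a repeated LSOA code
simp = pd.DataFrame({"LSOA11CD": ["A", "A", "B", "C"], "Area": [1.5, 2.5, 3.0, 4.0]})
agg = simp.groupby("LSOA11CD", as_index=False)["Area"].sum()

rows_before, rows_after = len(simp), len(agg)

# After aggregation there should be exactly one row per distinct LSOA code
one_row_per_lsoa = rows_after == simp["LSOA11CD"].nunique()

# Aggregation should not create or lose any area (allowing for float rounding)
area_preserved = abs(simp["Area"].sum() - agg["Area"].sum()) < 1e-6

print(rows_before, rows_after, one_row_per_lsoa, area_preserved)
```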
