# Apply Classifier to Parks

At this stage, you should have completed Step 1 and pickled a classifier that, when tested, accurately categorizes landcover types.

In this notebook, we download park shapes from OSM, apply our classifier, and calculate a variety of summary statistics for each park. The end product is two dataframes. One holds indicator values and the other holds scores from 1 (bad) to 5 (perfect). Each has one row per park, identified by its OSM ID, name and centroid coordinates.

Credits: Chris Van Diemen, Jim Groot (GCW) & Charles Fox (GOST)

First, we provide the path to the directory containing the specialized functions developed for this analysis:

In [52]:
import os
import sys

# add the Functions directory to the system path (adjust this path to your local clone of the repository)
module_path = os.path.abspath(r'C:\Users\charl\Documents\GitHub\GOST_PublicGoods\Implementations\Georgia - Urban Green Space\Functions')
if module_path not in sys.path:
    sys.path.append(module_path)

Next, we import the required modules. This requires an environment with gbdxtools installed.

**N.B.** This folder contains a file called recommended-environment.yml. I suggest creating a new conda environment from this file with the commands below and setting it as your active environment. This saves you from installing all of the packages below by hand.

- this line will create a new environment from the recommended-environment.yml file:

```conda env create -f recommended-environment.yml```

- this line will register the kernel with jupyter for you to use:

```python -m ipykernel install --user --name GBDX --display-name "Python (GBDX)" ```

In [53]:
import copy
import math
from datetime import datetime
from functools import partial

import geopandas as gpd
import geopy
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pyproj
import rasterio
import rasterio.features
import rasterio.mask
import shapely.geometry
import sklearn
import utm
from osgeo import ogr, osr
from rasterio import features
from shapely import geometry
from shapely.geometry import LinearRing, MultiPolygon, Polygon, box, mapping, shape
from shapely.ops import cascaded_union, transform

# scikit-image modules for image science operators
from skimage import filters, morphology

from gbdxtools import CatalogImage, Interface
from gbdxtools.task import env

# helper functions from the Functions directory added to sys.path above
import Somefunctions as somef

%matplotlib inline

# connect to gbdx
gbdx = Interface()

This helper converts a list of numbers, such as the bounds of a shapely geometry, into a comma-separated string without square brackets:

In [56]:
# convert a list of numbers to a comma-separated string without square brackets
def listToStringWithoutBrackets(list1):
    return str(list1).replace('[','').replace(']','')
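Below is a sketch of an equivalent helper that joins the values explicitly instead of editing the list's string form. The name `list_to_csv_string` is hypothetical and not part of the project's Functions. Unlike `listToStringWithoutBrackets`, it also accepts a tuple such as shapely's `.bounds` directly. The original only strips square brackets, which is why the notebook converts the bounds to a list first.

```python
def list_to_csv_string(values):
    """Join numbers with ', ', giving the same text as listToStringWithoutBrackets for a flat list."""
    return ", ".join(str(v) for v in values)

# shapely's .bounds is a (minx, miny, maxx, maxy) tuple; these numbers are illustrative
bounds = (44.6, 41.6, 45.0, 41.8)
print(list_to_csv_string(bounds))  # 44.6, 41.6, 45.0, 41.8
```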

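Next, we load the administrative boundary of the city we are working with and make sure it is unprojected (WGS84, EPSG:4326). We also set the minimum and maximum park size, in hectares, to include in the analysis.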

In [57]:
workspace = r'D:\GOST\Georgia'
city = 'Tbilisi'
shapefile = os.path.join(workspace, 'Admin_Boundary',r'Tbilisi_sazrvari.shp')

parkmin = 1 # hectares
parkmax = 500 # hectares

adminboundary = gpd.read_file(shapefile)
if adminboundary.crs != {'init' :'epsg:4326'}:
    adminboundary = adminboundary.to_crs({'init' :'epsg:4326'})

In this block, we generate bounding box objects from the loaded city shapefile and choose the correct UTM projection for the city.

In [32]:
# Select the right bounding city polygon and load it as a shapely object
boundpoly = adminboundary.geometry.loc[0]

# generate a string version
boundpoly_wkt = str(boundpoly)

# convert to a bounding box
bbox_large_area_float = list(boundpoly.bounds)

# convert to a bounding box in string format
bbox_large_area_str = listToStringWithoutBrackets(bbox_large_area_float)

boundpoly_shape = adminboundary.geometry.loc[0]

lat_max = adminboundary.bounds.maxy.loc[0]
lon_max = adminboundary.bounds.maxx.loc[0]
lat_min = adminboundary.bounds.miny.loc[0]
lon_min = adminboundary.bounds.minx.loc[0]

# identify the UTM zone for city of interest
zone_cal = round((183+bbox_large_area_float[0])/6,0)
EPSG = 32700-round((45+bbox_large_area_float[1])/90,0)*100+round((183+bbox_large_area_float[0])/6,0)
UTM_EPSG_code ='EPSG:%i'  % (EPSG)

print('CITY: %s, UTM ZONE: %s' % (str(city.upper()), UTM_EPSG_code))

CITY: TBILISI, UTM ZONE: EPSG:32638
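The cell above derives the UTM zone with `round`. The sketch below uses the conventional formula instead: zone = floor((lon + 180) / 6) + 1, with EPSG 326xx north of the equator and 327xx south of it. `utm_epsg_code` is a hypothetical helper and the coordinates are illustrative.

It gives the same EPSG:32638 for Tbilisi. The two methods differ on exact zone boundaries, where Python 3's round-half-to-even can select the lower zone. For example, longitude -168 falls in zone 3, but `round(15/6)` gives 2. Like the notebook, this sketch ignores the special zones around Norway and Svalbard.

```python
import math

def utm_epsg_code(lon, lat):
    """EPSG code of the WGS84 / UTM zone containing (lon, lat), using standard 6-degree zones."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise give zone 61
    base = 32600 if lat >= 0 else 32700  # 326xx = northern, 327xx = southern hemisphere
    return base + zone

# Approximate south-west corner of Tbilisi (illustrative coordinates)
print('EPSG:%i' % utm_epsg_code(44.7, 41.6))  # EPSG:32638
```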


The GBDX vectors interface (`gbdx.vectors`) can query OSM vector objects. Here we use it to search for park objects within our city boundary polygon, expressed as WKT:

In [58]:
# convert the bounding box into a well-known text format
bbox_wkt = box(*bbox_large_area_float).wkt

# query OSM vectors (results come back formatted as geojson)
# for water use: AND attributes.natural:water
# for grass/forest use: AND attributes.landuse:grass or forest
# for footway use: AND attributes.highway:footway
# for park use: item_type:Park

parks_geojson = gbdx.vectors.query(boundpoly_wkt, 
                                   query="ingest_source:OSM AND item_type:Park",
                                   index="vector-osm-*", 
                                   count=1e6)

# Count the number of parks
print('Total number of Park features:', len(parks_geojson))

Total number of Park features: 259
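Each item returned by the query is a GeoJSON-style feature. Judging from the fields the next cell reads, a feature carries `geometry.type` and `properties` with `id`, `item_type` (a list) and `name`.

The sketch below shows how one feature maps to one row. It uses a hypothetical helper, `feature_to_row`, and a made-up sample feature. Using `.get` for `name` is an assumption that unnamed parks may lack the key.

```python
def feature_to_row(idx, feature):
    """Flatten one GeoJSON-style park feature into the columns used for parks_df."""
    props = feature["properties"]
    return {
        "id": idx,
        "OSM_id": props["id"],
        "item_type": props["item_type"][0],  # item_type is a list; keep the first entry
        "name": props.get("name"),           # assumed optional for unnamed parks
        "geom_type": feature["geometry"]["type"],
    }

# A made-up feature for illustration only (not real OSM data)
sample = {
    "geometry": {"type": "Polygon",
                 "coordinates": [[[44.78, 41.70], [44.79, 41.70], [44.79, 41.71], [44.78, 41.70]]]},
    "properties": {"id": "OSM-way-0", "item_type": ["Park"], "name": ""},
}
rows = [feature_to_row(k, f) for k, f in enumerate([sample])]
print(rows[0]["OSM_id"], rows[0]["geom_type"])  # OSM-way-0 Polygon
```

Collecting such dicts in a list and calling `pd.DataFrame(rows)` once is an alternative to repeated `DataFrame.append`, which was deprecated in pandas 1.4 and removed in pandas 2.0.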


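Having queried OSM via GBDX, we now load the returned information into a pandas DataFrame. Along the way, we compute each park's area in the UTM projection and flag the polygons that fall between `parkmin` and `parkmax` hectares: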

In [59]:
# Create an empty dataframe to save all of the returned objects
parks_df = pd.DataFrame(columns=['id','OSM_id','item_type','name','geom_type','area','check'])

# Convert geojson from OSM to shapely polygons 
geom_list = []
for geojson in parks_geojson:
    geom = shape(geojson['geometry'])
    geom_list.append(geom)
    
# Generate a special 'partial' object that projects unprojected geometry objects to the target UTM geometry
project = partial(
    pyproj.transform,
    pyproj.Proj(init='epsg:4326'), # source coordinate system
    pyproj.Proj(init=UTM_EPSG_code)) # destination coordinate system

# Loop over all polygons and load information to dataframe 
i = 0;
for r in parks_geojson:
    
    # get properties
    props = r['properties']
    
    # get geometry type
    geom_type = r['geometry']['type']
    
    start =  "{0} Central Station".format(city)
    
    end = props[u'name']
   
    # measure area if geometry is polygon, otherwise set area to 0
    if geom_type == u'Polygon':
        
        park_utm = transform(project, geom_list[i])  # apply UTM projection
                
        park_area = park_utm.area/10000 # calculate area in ha
        
    else:
        park_area = 0;
              
    # load all metadata to dataframe. We are only keeping parks that are Polygon type (not line or MultiPolygon)
    # and are within the correct size bracket
    parks_df = parks_df.append({'id':i ,
                                'OSM_id':props[u'id'],
                                'item_type':props[u'item_type'][0],
                                'name':props[u'name'],
                                'geom_type':geom_type,
                                'area':park_area,
                                'check':geom_type == u'Polygon' and park_area > parkmin and park_area < parkmax
                               },ignore_index=True
                              ) #set test for desirable parks
    
    i = i + 1   
    
# get indices of all parks that pass the test set above    
park_list = parks_df.loc[parks_df.check == True]['id'];

# count number of valid parks for our analysis
print('Park features larger than {0} ha and smaller than {1} ha:'.format(parkmin, parkmax), len(parks_df[parks_df.check == True]))

# generate selection - a slightly simplified dataframe with some extraneous columns dropped
selection = parks_df[parks_df.check == True].reset_index().drop(['check','index','item_type','geom_type'],axis=1)

Park features larger than 1 ha and smaller than 500 ha: 61
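The area test in the cell above projects each polygon to UTM and divides shapely's planar `.area` (in m²) by 10,000 to get hectares. Below is a dependency-free sketch of the same arithmetic using the shoelace formula on coordinates that are already in metres. The helper names are hypothetical. Only the exterior ring is handled here, whereas shapely subtracts interior rings (holes).

```python
def polygon_area_m2(coords):
    """Planar polygon area via the shoelace formula.

    coords: open or closed ring of (x, y) pairs in a metric CRS such as UTM.
    """
    n = len(coords)
    twice_area = 0.0
    for k in range(n):
        x1, y1 = coords[k]
        x2, y2 = coords[(k + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def passes_size_filter(geom_type, area_ha, parkmin=1, parkmax=500):
    """Same test as the 'check' column: a Polygon strictly between parkmin and parkmax hectares."""
    return geom_type == "Polygon" and parkmin < area_ha < parkmax

# A 200 m x 150 m rectangle in UTM metres -> 30,000 m2 -> 3 ha
ring = [(500000, 4600000), (500200, 4600000), (500200, 4600150), (500000, 4600150)]
area_ha = polygon_area_m2(ring) / 10000
print(area_ha, passes_size_filter("Polygon", area_ha))  # 3.0 True
```

With pyproj 2 or later, the projection step can also be written as `shapely.ops.transform(pyproj.Transformer.from_crs('EPSG:4326', UTM_EPSG_code, always_xy=True).transform, geom)`. This replaces the `partial(pyproj.transform, ...)` construction, which relies on the deprecated `init=` syntax.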


Here, we display the selection sorted by area (in hectares) for visual inspection before running the main analysis on each park:

In [60]:
selection = selection.sort_values("area",ascending=False)
selection

| | id | OSM_id | name | area (ha) |
|---|---|---|---|---|
| 56 | 234 | OSM-way-363703984 | | 139.826828 |
| 0 | 0 | OSM-way-363703984 | | 109.213811 |
| 27 | 123 | OSM-relation-1180631 | | 101.921389 |
| 44 | 185 | OSM-way-588730740 | | 75.539344 |
| 34 | 150 | OSM-way-65352759 | | 47.407431 |
| 2 | 6 | OSM-way-61957808 | | 42.630342 |
| 47 | 193 | OSM-way-28065006 | | 36.542003 |
| 9 | 51 | OSM-way-142588810 | | 19.191257 |
| 10 | 55 | OSM-way-34747066 | | 14.791980 |
| 55 | 230 | OSM-way-33704334 | | 13.292134 |
55,230,OSM-way-33704334,,13.292134


In [51]:
## get trained classifier from pickle

import pickle
pick = r'D:\GOST\Georgia\Pickle'

with open(os.path.join(pick, 'trained_classifier_Tbilisi_SAVE.pickle'), 'rb') as handle:
    gs = pickle.load(handle,encoding='latin1')

AttributeError: Can't get attribute 'DeprecationDict' on <module 'sklearn.utils.deprecation' from 'C:\\Users\\charl\\Anaconda3\\envs\\GBDX\\lib\\site-packages\\sklearn\\utils\\deprecation.py'>
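The `AttributeError` above is typical of unpickling a scikit-learn estimator with a different scikit-learn version from the one that pickled it. Here, `DeprecationDict` no longer exists in the installed `sklearn.utils.deprecation`. The practical fix is to load the pickle in an environment with the same scikit-learn version used in Step 1.

To make such mismatches explicit, one option is to store the version string as a separate pickle ahead of the model. It can then be checked before the model itself is unpickled. This would require re-saving the classifier this way in Step 1. The helper names and the version string below are hypothetical, and a plain dict stands in for the classifier.

```python
import os
import pickle
import tempfile

def save_with_version(model, path, lib_version):
    """Write the library version first, then the model, as two consecutive pickles."""
    with open(path, "wb") as handle:
        pickle.dump(lib_version, handle, protocol=pickle.HIGHEST_PROTOCOL)
        pickle.dump(model, handle, protocol=pickle.HIGHEST_PROTOCOL)

def load_with_version(path, lib_version):
    """Check the stored version before unpickling the model itself."""
    with open(path, "rb") as handle:
        stored_version = pickle.load(handle)  # a plain string, loadable in any environment
        if stored_version != lib_version:
            raise RuntimeError("classifier was pickled with version %s but %s is installed"
                               % (stored_version, lib_version))
        return pickle.load(handle)  # only now are the model's classes needed

# Stand-in object and illustrative version; in the notebook these would be
# the trained classifier and sklearn.__version__.
path = os.path.join(tempfile.mkdtemp(), "demo_classifier.pickle")
save_with_version({"kind": "stand-in model"}, path, "1.2.3")
print(load_with_version(path, "1.2.3"))  # {'kind': 'stand-in model'}
```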

In [10]:
# create a dataframe to save indicator values and a separate dataframe for the scores of those values

parks_selection_df = pd.DataFrame(columns=['id','OSM_id','X_wgs','Y_wgs','name','area','Fac_Bench', 'Fac_Waste', 
                                           'Fac_Toilet', 'Fac_Water','Fac_Play', 'Fac_Hist', 
                                           'Fac_Retail', 'Fac_Fountain', 'Fac_Sports','ndvidiff','meanLAI',
                                           'wArea', 'wEccentricity', 'wMaj_Axis_Length', 'wMin_Axis_Length',
                                           'wPerimeter', 'RepPer_vegcover'])

parks_scoring_df = pd.DataFrame(columns=['id','OSM_id','X_wgs','Y_wgs','name'])

In [11]:
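def make_chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

In [12]:
# split the selection index into batches of 100 parks
# (the generator is called make_chunks so that this assignment does not overwrite it)
chunks = list(make_chunks(selection.index.values,100))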
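The cells above split the selection's index into batches of 100 parks by slicing. Below is a sketch of an alternative built on `itertools.islice`, which also works on iterators that have no `len`. The name `batched` is hypothetical here. Python 3.12 adds a built-in `itertools.batched` with similar behaviour, although it yields tuples rather than lists.

```python
from itertools import islice

def batched(iterable, n):
    """Yield successive lists of up to n items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

print(list(batched(range(7), 3)))  # [[0, 1, 2], [3, 4, 5], [6]]
```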

In [17]:
i = 1; 
##################################################################################
#                           ITERATING OVER ALL PARKS                             #
##################################################################################
for chunk in chunks:

    for park_nr in chunk:
        
        print('park number: %d of %d' % (park_nr, len(chunk)))

        selectedPark=parks_geojson[selection.id[park_nr]]
        parkshape=shape(selectedPark['geometry'])
        parkid=selectedPark['properties'][u'id']

        project = partial(
            pyproj.transform,
            pyproj.Proj(init='epsg:4326'), # source coordinate system
            pyproj.Proj(init=UTM_EPSG_code)) # destination coordinate system

        centroid_x,centroid_y = parkshape.centroid.xy

        parkshape_utm = transform(project, parkshape)  # apply projection

        x_wgs,y_wgs = parkshape.exterior.xy
        bbox_park_area = min(x_wgs), min(y_wgs), max(x_wgs), max(y_wgs)
        bbox_wkt = box(*bbox_park_area).wkt

        aoi = bbox_wkt

        ##################################################################################
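        # COUNTING AMENITIES

        # Import the function to count amenities within a park
        # (reloaded so that edits to the module are picked up; Python 3 needs importlib.reload)
        import importlib
        import get_OSM_Amenities as getOSM

        importlib.reload(getOSM)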

        amenities_df = getOSM.get_OSM_Amenities(parkid,parkshape)
        presence = amenities_df.iloc[0,1:10]>0
        percentage = (presence.sum())*100/9

        ######################### SETTING SCORE ########################################

        # Give scoring to percentage of amenities present from 1 (bad) to 5 (Perfect)

        if percentage > 80:
            amenscore = 5
        elif percentage > 60:
            amenscore = 4
        elif percentage > 40:
            amenscore = 3
        elif percentage > 20:
            amenscore = 2
        elif percentage >= 0:
            amenscore = 1

        #################################################################################
        # calculate difference in NDVI

        import Somefunctions as somef
        import Watertools as wt


        dfs,dfw,ims,imw = somef.get_SumWin(parkshape,UTM_EPSG_code)
        
        if not isinstance(dfs, pd.DataFrame):
             
            print('not enough images for this park, skip park')    
                
        else:
            print('is ok')

            imgs,imgw,resh = somef.reziseimages(ims,imw)

            ndvis,nviw,ndvidiff,ndvi = somef.NDVIdiff(imgs,imgw,parkshape_utm)

            ######################### SETTING SCORE ########################################
            ndvidiff = abs(ndvidiff)
            # Give scoring to NDVI difference between summer and winter from 1 (bad) to 5 (Perfect)
            if ndvidiff >0.8:
                greenscore = 1
            elif ndvidiff >0.6:
                greenscore = 2
            elif ndvidiff >0.4:
                greenscore = 3
            elif ndvidiff >0.2:
                greenscore = 4
            elif ndvidiff >=0:
                greenscore = 5

            #################################################################################
            # Calculate LAI from NDVI (LAImean = 10.36 NDVI - 6.17, r2 = 0.55, SEE = 0.48) following Stenberg et al. (2004);
            # only pixels with NDVI > 0.6 are included in the mean

            ndvisel = copy.copy(ndvi)
            ndvisel[ndvisel<=0.6]=0
            laimask= ndvisel==0
            LAI = (10.36*ndvisel)-6.17
            LAI=np.ma.array(LAI, mask=laimask)
            if ndvisel.max() < 0.6:
                meanLAI = 0
            else:
                meanLAI=LAI.mean()

            ######################### SETTING SCORE ########################################

            # Give scoring to mean Leaf Area Index value from 1 (bad) to 5 (Perfect)

            if meanLAI > 4:
                LAIscore = 5
            elif meanLAI > 3:
                LAIscore = 4
            elif meanLAI > 2:
                LAIscore = 3
            elif meanLAI > 1:
                LAIscore = 2
            elif meanLAI >= 0:
                LAIscore = 1

            #################################################################################
            # calculate the area of water features
            # The resized image is returned as a numpy array and no longer has metadata,
            # so we need to find out which image was resized

            if resh == 1:
                cellsize = imgw.ipe_metadata["image"]["groundSampleDistanceMeters"]
            else:
                cellsize = imgs.ipe_metadata["image"]["groundSampleDistanceMeters"]

            water = wt.Waterextract(imgw,parkshape_utm,cellsize)

            waterdf = wt.Watersegment(water,cellsize)

            # sort the waterbodies from large to small
            wbodies=waterdf['Min_Axis_Length'].sort_values(ascending=False)

            ########################## SETTING SCORE ########################################

            # Give scoring to the largest waterbody from 1 (bad) to 5 (Perfect)
            if len (wbodies)>0:
                Maxwbody=wbodies.iloc[0]
            else:
                Maxwbody=0

            if Maxwbody >40:
                watscore = 5
            elif Maxwbody >30:
                watscore = 4
            elif Maxwbody >20:
                watscore = 3
            elif Maxwbody>10:
                watscore = 2
            elif Maxwbody >=0:
                watscore = 1



            #################################################################################
            # calculate percentage of vegetation cover in the riparian zone

            # check if there is water 
            if len (waterdf) > 0:
                riparian= wt.Wateredge(water,cellsize, 20)
                vegcover = wt.Riparianveg(riparian,ndvi)
            else:
                vegcover = 0

            ########################## SETTING SCORE ########################################

            # Give scoring to the amount of vegetation in riparian zone from 1 (bad) to 5 (Perfect)

            if vegcover >80:
                ripscore = 5
            elif vegcover >60:
                ripscore = 4
            elif vegcover >40:
                ripscore = 3
            elif vegcover >20:
                ripscore = 2
            elif vegcover >=0:
                ripscore = 1    

            #################################################################################
            # classify different land use types and count 

            from PIL import Image, ImageDraw


            image_array = ims.read()

            # reshape image for classification: one row per pixel, one column per band
            n_bands, rows, cols = image_array.shape
            reshaped_data = image_array.reshape(n_bands, rows * cols)

            result = gs.predict(reshaped_data.T)
            classification = result.reshape((rows, cols))

            x,y = parkshape_utm.buffer(20).exterior.xy

            # rescale the buffered park polygon to pixel coordinates so it can be rasterized as a mask

            # subtract the minimum x and y from the UTM polygon to move its origin to (0, 0)
            x1 = np.subtract(x, min(x))
            y1 = np.subtract(y, min(y))

            # scale the x and y coordinates to the image width and height in pixels
            x2 = np.divide(x1,max(x1)/cols)
            y2 = np.divide(y1,max(y1)/rows)

            # polygon vertices in pixel coordinates
            polygon = [(x2[i], y2[i]) for i in range(len(x2))]


            imgp = Image.new('L', (cols, rows), 0)
            ImageDraw.Draw(imgp).polygon(polygon, fill=1)
            mask = np.flipud(np.array(imgp))

            classification = classification * mask


            total_area_park_ppix = selection.area[park_nr]/sum(sum(mask))

            tree_area_park = sum(sum(classification == 1)) * total_area_park_ppix

            grass_area_park = sum(sum(classification == 2)) * total_area_park_ppix

            water_area_park = waterdf.Area.sum()/10000 # waterdf.Area is in m2; dividing by 10,000 gives ha, like the other areas

            imper_area_park = sum(sum(classification == 4)) * total_area_park_ppix

            # Now calculate the percentages of the total park area

            tree_perc_park = tree_area_park/selection.area[park_nr]*100

            grass_perc_park = grass_area_park/selection.area[park_nr]*100

            water_perc_park = water_area_park/selection.area[park_nr]*100

            imper_perc_park = imper_area_park/selection.area[park_nr]*100

            ########################## SETTING SCORE ########################################

            # Give scoring to the percentage of green (grass/trees) for stormwater capture from 1 (bad) to 5 (Perfect)
            green_perc_park = round(tree_perc_park+grass_perc_park)    

            if green_perc_park >80:
                stormscore = 5
            elif green_perc_park >60:
                stormscore = 4
            elif green_perc_park >40:
                stormscore = 3
            elif green_perc_park >20:
                stormscore = 2
            elif green_perc_park >=0:
                stormscore = 1    

            ########################## SETTING SCORE ########################################

            # Give scoring to amount of impervious surface for infiltration capacity  1 (bad) to 5 (Perfect)

            if imper_perc_park >80:
                infilscore = 1
            elif imper_perc_park > 60:
                infilscore = 2
            elif imper_perc_park > 40:
                infilscore = 3
            elif imper_perc_park > 20:
                infilscore = 4
            elif imper_perc_park >= 0:
                infilscore = 5   

            ########################## SETTING SCORE ########################################

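            # Give scoring to ratio green versus grey from 1 (bad) to 5 (Perfect)
            # (a park without impervious surface gets an infinite ratio, i.e. the top score, instead of a division by zero)
            greengrey = green_perc_park/imper_perc_park if imper_perc_park > 0 else float('inf')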

            if  greengrey > 7/3:
                greyscore = 5
            elif  greengrey > 6/4:
                greyscore = 4
            elif  greengrey > 5/5:
                greyscore = 3
            elif  greengrey > 4/6:
                greyscore = 2
            elif  greengrey >= 0:
                greyscore = 1   


            #################################################################################
            # Divide the area classified as trees by the average tree crown area from Pretzsch et al. (2015) and add a monetary value

            tree_area = tree_area_park*10000
            avgcrown = math.pi*4.2**2

            # now multiply the number of trees in the park with the value of $50 per tree.  
            nrtrees = tree_area/avgcrown
            value = int(nrtrees*50) 






            from mpl_toolkits.axes_grid1 import make_axes_locatable


        #     from matplotlib import pyplot as plt
        #     f = plt.figure( figsize = (20,20))
        #     f.add_subplot(1, 2,1)
        #     r = image_array[3,:,:]
        #     g = image_array[2,:,:]
        #     b = image_array[1,:,:]
        #     rgb = np.dstack([r,g,b])
        #     f.add_subplot(1, 2, 1)
        #     plt.imshow(rgb/3000)
        #     f.add_subplot(1, 2, 2)

        #     im = plt.imshow(classification,cmap='jet')

        #     plt.colorbar(im,fraction=0.046, pad=0.04)

        #     plt.show()


            #################################################################################
            # load all data in dataframe

            parks_selection_df = parks_selection_df.append({'id':selection.id[park_nr],
                                                            'OSM_id': selection.OSM_id[park_nr],
                                                            'X_wgs':centroid_x[0],
                                                            'Y_wgs':centroid_y[0],
                                                            'name':selection.name[park_nr],
                                                            'area':selection.area[park_nr],
                                                            'Fac_Bench':amenities_df.Fac_Bench[0],
                                                            'Fac_Waste':amenities_df.Fac_Waste[0],
                                                            'Fac_Toilet':amenities_df.Fac_Toilet[0],
                                                            'Fac_Water':amenities_df.Fac_Water[0],
                                                            'Fac_Play':amenities_df.Fac_Play[0],
                                                            'Fac_Hist':amenities_df.Fac_Hist[0],
                                                            'Fac_Retail':amenities_df.Fac_Retail[0],
                                                            'Fac_Fountain':amenities_df.Fac_Fountain[0],
                                                            'Fac_Sports':amenities_df.Fac_Sports[0],
                                                            'ndvidiff':ndvidiff,
                                                            'meanLAI':meanLAI,
                                                            'wArea':waterdf.Area.sum()/10000,
                                                            'wEccentricity':waterdf.Eccentricity.mean(),
                                                            'wMaj_Axis_Length':waterdf.Maj_Axis_Length.mean(),
                                                            'wMin_Axis_Length':waterdf.Min_Axis_Length.mean(),
                                                            'wPerimeter':waterdf.Perimeter.sum(),
                                                            'RepPer_vegcover':vegcover,
                                                            'Impermeable':imper_area_park,
                                                            'Trees':tree_area_park,
                                                            'Grass':grass_area_park,
                                                            'Monetary': value},
                                                           ignore_index=True)

            parks_scoring_df = parks_scoring_df.append({'id':selection.id[park_nr],
                                                        'OSM_id': selection.OSM_id[park_nr],
                                                        'X_wgs':centroid_x[0],
                                                        'Y_wgs':centroid_y[0],
                                                        'name':selection.name[park_nr],
                                                        'Temp_LAI': LAIscore,
                                                        'Temp_Water':watscore,
                                                        'Infil_Storm':stormscore,
                                                        'Infil_Rip':ripscore,
                                                        'Infil_Inper':infilscore,
                                                        'Soc_Amen':amenscore,
                                                        'Soc_Winter':greenscore,
                                                        'Soc_Grey':greyscore,
                                                        'Monetary': value},
                                                       ignore_index=True)




            ### write intermediate results to pickle every 20 parks
            # (saved to the Pickle folder `pick` defined above; adjust if needed)

            if i%20 == 0:
                with open(os.path.join(pick, 'park_score_set_{0}_{1}.pickle'.format(city,i)), 'wb') as handle:
                    pickle.dump(parks_scoring_df, handle, protocol=pickle.HIGHEST_PROTOCOL)

                with open(os.path.join(pick, 'park_indicators_set_{0}_{1}.pickle'.format(city,i)), 'wb') as handle:
                    pickle.dump(parks_selection_df, handle, protocol=pickle.HIGHEST_PROTOCOL)

            i = i + 1;
        
        
    #     break


    print(park_nr)

park number: 0 of 43
is ok
park number: 1 of 43
is ok
park number: 2 of 43
is ok
park number: 3 of 43
is ok
park number: 4 of 43
is ok
park number: 5 of 43
is ok
park number: 6 of 43
is ok
park number: 7 of 43
is ok
park number: 8 of 43
is ok
park number: 9 of 43
is ok
park number: 10 of 43
is ok
park number: 11 of 43
not enough images for this park, skip park
park number: 12 of 43
is ok
park number: 13 of 43
is ok
park number: 14 of 43
is ok
park number: 15 of 43
is ok
park number: 16 of 43
is ok
park number: 17 of 43
is ok
park number: 18 of 43
is ok
park number: 19 of 43
is ok
park number: 20 of 43
is ok
park number: 21 of 43
is ok
park number: 22 of 43
is ok
park number: 23 of 43
is ok
park number: 24 of 43
is ok
park number: 25 of 43
is ok
park number: 26 of 43
is ok
park number: 27 of 43
is ok
park number: 28 of 43
is ok
park number: 29 of 43
is ok
park number: 30 of 43
is ok
park number: 31 of 43
is ok
park number: 32 of 43
is ok
park number: 33 of 43
is ok
park number: 34 of 43
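Every score in the loop follows the same pattern. Four thresholds with strict `>` comparisons map a value to 1-5, and the mapping is reversed for indicators where lower is better (NDVI difference, impervious share). Below is a sketch of one helper that reproduces those ladders with `bisect.bisect_left`. `score` is a hypothetical name, and the threshold tuples in the comments are copied from the loop.

There is one difference from the original. A negative value scores 1 here. The original ladders would assign nothing in that case, leaving the previous park's score in place.

```python
from bisect import bisect_left

def score(value, thresholds, higher_is_better=True):
    """Map value to 1..len(thresholds)+1 using strict '>' comparisons.

    thresholds must be ascending. bisect_left counts the thresholds strictly
    below value, which matches the notebook's 'if value > t' ladders.
    """
    n_exceeded = bisect_left(thresholds, value)
    if higher_is_better:
        return 1 + n_exceeded
    return len(thresholds) + 1 - n_exceeded

# Threshold sets taken from the loop above:
# amenscore  = score(percentage, (20, 40, 60, 80))
# greenscore = score(abs(ndvidiff), (0.2, 0.4, 0.6, 0.8), higher_is_better=False)
# LAIscore   = score(meanLAI, (1, 2, 3, 4))
# watscore   = score(Maxwbody, (10, 20, 30, 40))
# ripscore   = score(vegcover, (20, 40, 60, 80))
# stormscore = score(green_perc_park, (20, 40, 60, 80))
# infilscore = score(imper_perc_park, (20, 40, 60, 80), higher_is_better=False)
# greyscore  = score(greengrey, (4/6, 5/5, 6/4, 7/3))

print(score(80, (20, 40, 60, 80)), score(80.5, (20, 40, 60, 80)))  # 4 5
```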

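The classification step in the loop does two things. It flattens each 8-band image to one row per pixel for `gs.predict`. It then converts pixel counts to hectares by spreading the park's OSM area evenly over the pixels inside the park mask. The original's `sum(sum(mask))` equals `mask.sum()` for a 2-D array.

Below is a numpy sketch of those two steps with hypothetical helper names. Synthetic arrays stand in for the satellite image and the classifier output. Class codes 1 (trees), 2 (grass) and 4 (impervious) are the ones the loop counts.

```python
import numpy as np

def to_samples(image_array):
    """Reshape a (bands, rows, cols) raster to (rows*cols, bands): one row per pixel."""
    n_bands, rows, cols = image_array.shape
    return image_array.reshape(n_bands, rows * cols).T

def class_areas_ha(classification, mask, park_area_ha, classes=(1, 2, 4)):
    """Hectares per class code, spreading park_area_ha evenly over the pixels inside mask."""
    ha_per_pixel = park_area_ha / mask.sum()
    masked = classification * mask  # pixels outside the park become 0
    return {c: float((masked == c).sum() * ha_per_pixel) for c in classes}

# Synthetic 8-band, 4 x 5 pixel image
rows, cols = 4, 5
image = np.arange(8 * rows * cols).reshape(8, rows, cols)
samples = to_samples(image)
print(samples.shape)  # (20, 8)

# A fitted classifier would return one label per row of `samples`,
# e.g. labels = gs.predict(samples).reshape(rows, cols)
labels = np.array([[1, 1, 2, 2, 4],
                   [1, 1, 2, 2, 4],
                   [1, 2, 2, 4, 4],
                   [2, 2, 4, 4, 4]])
mask = np.ones((rows, cols), dtype=int)
print(class_areas_ha(labels, mask, park_area_ha=2.0))
```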
## Write Data to Files

In [None]:
parks_scoring_df.to_csv('parks_scoring_df_{}.csv'.format(city))
parks_selection_df.to_csv('parks_selection_df_{}.csv'.format(city))
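The loop over all parks can take a long time and may be interrupted. If that happens, the intermediate pickles written every 20 parks are the only record of the parks already processed.

Below is a sketch of a hypothetical `checkpoint` helper that writes each table to its own file in one configurable directory, following the notebook's file-name pattern. Plain dicts stand in for the two DataFrames, and a temporary directory stands in for the Pickle folder.

```python
import os
import pickle
import tempfile

def checkpoint(frames, out_dir, city, counter, every=20):
    """Pickle each named object to out_dir when counter is a multiple of every; return the paths written."""
    if counter % every != 0:
        return []
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for label, obj in frames.items():
        path = os.path.join(out_dir, "park_{0}_set_{1}_{2}.pickle".format(label, city, counter))
        with open(path, "wb") as handle:
            pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)
        paths.append(path)
    return paths

out_dir = tempfile.mkdtemp()
written = checkpoint({"score": {"park": 1}, "indicators": {"park": 1}}, out_dir, "Tbilisi", 20)
print([os.path.basename(p) for p in written])
# ['park_score_set_Tbilisi_20.pickle', 'park_indicators_set_Tbilisi_20.pickle']
```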