# Prague Pedestrian Accessibility for Children (age 10-16)

### Installing packages

I will use OSMnx, a Python package by Geoff Boeing for retrieving, constructing, analyzing, and visualizing street networks (and more) from OpenStreetMap.
<a>https://github.com/gboeing/osmnx</a>.

In [2]:
!conda install -c conda-forge/label/gcc7 osmnx

Solving environment: done

# All requested packages already installed.



Installing geocoder, a simple and consistent geocoding library.

In [3]:
!conda install -c conda-forge geocoder

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    psycopg2-2.8.2             |   py36h72c5cf5_0         163 KB  conda-forge
    gdal-2.2.2                 |   py36hc209d97_1         767 KB
    libssh2-1.8.2              |       h22169c7_2         257 KB  conda-forge
    libcurl-7.65.3             |       h20c2e04_0         588 KB
    libpq-11.2                 |       h20c2e04_0         2.7 MB
    fiona-1.7.13               |   py36hb00a9d7_1         1.8 MB  conda-forge
    freetds-1.1rc3             |       h4fe99da_0         2.4 MB  conda-forge
    ratelim-0.1.6              |             py_2           6 KB  conda-forge
    hdf5-1.10.1                |                2         5.0 MB  conda-forge
    pycurl-7.43.0.3            |   py36h16ce93b_0    

In [4]:
!conda install -c conda-forge pandana

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - pandana


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    osmnet-0.1.5               |             py_3          29 KB  conda-forge
    pandana-0.4.4              |   py36hb3f55d8_0         158 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         187 KB

The following NEW packages will be INSTALLED:

    osmnet:  0.1.5-py_3           conda-forge
    pandana: 0.4.4-py36hb3f55d8_0 conda-forge


Downloading and Extracting Packages
osmnet-0.1.5         | 29 KB     | ##################################### | 100% 
pandana-0.4.4        | 158 KB    | ##################################### | 100% 
Preparing transaction: done
Verifying transaction: done
Executing transaction: done


In [5]:
!conda update --all

Solving environment: \ 
  - defaults::libgfortran-3.0.0-1
  - conda-forge::libgfortran-3.0.0done

## Package Plan ##

  environment location: /opt/conda/envs/Python36


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    xlsxwriter-1.2.1           |             py_0         106 KB
    soupsieve-1.9.3            |           py36_0          60 KB
    beautifulsoup4-4.8.0       |           py36_0         147 KB
    mkl_fft-1.0.14             |   py36ha843d7b_0         173 KB
    fiona-1.8.4                |   py36hc38cc03_0         1.0 MB
    openjpeg-2.3.0             |       h05c96fa_1         456 KB
    jeepney-0.4.1              |             py_0          21 KB
    jupyterlab_server-1.0.6    |             py_0          26 KB
    fastcache-1.1.0            |   py36h7b6447c_0          31 KB
    dask-2.6.0                 |             py_0          12 KB
    markupsafe-1.1.1           |   py

Importing all necessary libraries

In [6]:
import osmnx as ox
import io
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import folium

import json
import urllib.request
import requests
from pandas.io.json import json_normalize
import operator
from shapely.geometry import Polygon
import geopandas as gpd
from pandana.loaders import osm


## Preparing the environment

Some data preparation and cleaning operations consume a lot of computing resources and time. To avoid repeating them, we upload the prepared datasets to IBM Cloud Storage. The "Data analysis" and "Modeling" sections then use this uploaded data.

In [77]:
# The code was removed by Watson Studio for sharing.

Define upload and download functions.

In [78]:
import sys
from ibm_botocore.client import Config
import ibm_boto3

def upload_file(credentials,local_file_name,key): 
    storage = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['apikey'],
    ibm_service_instance_id=credentials['iam_serviceid_crn'],
    ibm_auth_endpoint=credentials['auth_ep'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ep'])
    
    try:
        res=storage.upload_file(Filename=local_file_name, Bucket=credentials['bucket'],Key=key)
    except Exception as e:
        print(Exception, e)
    else:
        print('File {} Uploaded'.format(local_file_name))
        
def download_file(credentials,local_file_name,key):  
    storage = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['apikey'],
    ibm_service_instance_id=credentials['iam_serviceid_crn'],
    ibm_auth_endpoint=credentials['auth_ep'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ep'])
    try:
        res= storage.download_file(Bucket=credentials['bucket'],Key=key,Filename=local_file_name)
    except Exception as e:
        print(Exception, e)
    else:
        print('File {} Downloaded'.format(local_file_name))
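
The idea of preparing a data set once and reusing it later can be sketched without any cloud dependency. The following is a minimal sketch, not the notebook's storage code: `load_or_build` and the local JSON file are hypothetical stand-ins for the `upload_file` / `download_file` round trip above.

```python
import json
import os
import tempfile


def load_or_build(path, build):
    """Return the data cached at `path`, building and saving it on first use.

    Hypothetical helper: a local JSON file stands in for the cloud bucket.
    """
    if os.path.exists(path):
        with open(path, encoding='utf-8') as f:
            return json.load(f)
    data = build()  # the expensive preparation step
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, ensure_ascii=False)
    return data


calls = []

def expensive_prepare():
    # Stand-in for a slow cleaning or geocoding step
    calls.append(1)
    return {'praha 1': [50.08728, 14.41742]}


cache_path = os.path.join(tempfile.mkdtemp(), 'prepared.json')
first = load_or_build(cache_path, expensive_prepare)   # builds and saves
second = load_or_build(cache_path, expensive_prepare)  # reads the saved copy
```

The second call never runs `expensive_prepare`, which is the same effect the notebook gets by downloading a previously uploaded CSV instead of recomputing it.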

To build the network topology we need the geographic coordinates of the collected points of interest (POI). To retrieve the coordinates we use the geocoder package with the ArcGIS provider.

In [56]:
#!conda install -c conda-forge geocoder #Uncomment this cell to install geocoder package if it is not installed

Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python36

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    openssl-1.1.1c             |       h516909a_0         2.1 MB  conda-forge
    ratelim-0.1.6              |             py_2           6 KB  conda-forge
    geocoder-1.38.1            |             py_1          53 KB  conda-forge
    certifi-2019.9.11          |           py36_0         147 KB  conda-forge
    ca-certificates-2019.9.11  |       hecc5488_0         144 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         2.5 MB

The following NEW packages will be INSTALLED:

    geocoder:        1.38.1-py_1       conda-forge
    ratelim:         0.1.6-py_2        conda-forge

The following packages will be UPDATED:

    

Define the coordinate retrieval function. It takes a DataFrame and the name of the column that holds the address string, and adds `latitude` and `longitude` columns to the DataFrame in place.

In [57]:
import geocoder

def get_coordinates(dataFrame, index_row):
    # Geocode the address stored in column `index_row` of every row
    dict_coordinates = {}
    for index, row in dataFrame.iterrows():
        try:
            g = geocoder.arcgis(row[index_row])
            lat = g.json['lat']
            lng = g.json['lng']
            dict_coordinates[index] = [lat, lng]
        except Exception:
            # Report the address that failed; the row keeps the 0.0 defaults set below
            print ('Failed to get coordinates for {}: {}'.format(row[index_row], sys.exc_info()[0]))
    
    # Default value for rows that could not be geocoded
    dataFrame['latitude'] = 0.0
    dataFrame['longitude'] = 0.0
    
    for k, v in dict_coordinates.items():
        dataFrame.loc[dataFrame.index == k,'latitude']=v[0]
        dataFrame.loc[dataFrame.index == k,'longitude']=v[1]
    print('Done')
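
Several rows can share one address, so it can pay off to geocode each distinct address only once. The sketch below illustrates that idea under stated assumptions: `geocode_unique` and `fake_geocode` are hypothetical names, and the fake function replaces the real ArcGIS request so the example runs offline.

```python
def geocode_unique(addresses, geocode):
    """Geocode every distinct address once; return {address: (lat, lng) or None}."""
    cache = {}
    for address in addresses:
        if address in cache:
            continue  # already looked up, skip the repeated request
        try:
            cache[address] = geocode(address)
        except Exception as exc:
            print('Failed to get coordinates for {}: {}'.format(address, exc))
            cache[address] = None
    return cache


requests_made = []

def fake_geocode(address):
    # Hypothetical offline stand-in for a geocoding call
    requests_made.append(address)
    if not address:
        raise ValueError('empty address')
    return (50.0, 14.0)


addresses = ['Ostrovní 139/11 Nové Město 110 00 Praha 1',
             'Ostrovní 139/11 Nové Město 110 00 Praha 1',
             '']
coordinates = geocode_unique(addresses, fake_geocode)
```

Storing `None` for a failed lookup, rather than `0.0`, keeps failed rows easy to tell apart from real coordinates.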

## Data acquisition and cleaning

As the main data source I selected <a>http://opendata.praha.eu</a>. It is a large catalogue of data of different types covering many fields: transport, society, ecology, population, etc.  
These data sets were mainly collected and structured by the Prague Institute of Planning and Development <a>www.iprpraha.cz</a>. For my project I am mainly interested in:
* Shape and location of Prague administrative districts
* District population
* Geolocation of different types of social infrastructure

### Districts borders and population.

The first step is to determine the shape and location of the administrative districts of Prague.

In [67]:
mestky_casty_url = 'http://opendata.iprpraha.cz/CUR/DTMP/TMMESTSKECASTI_P/WGS_84/TMMESTSKECASTI_P.json'
results = requests.get(mestky_casty_url).json(encoding = "utf8")
mestky_casty = json_normalize(results['features']) 
mestky_casty.head()

Unnamed: 0,type,geometry.type,geometry.coordinates,properties.OBJECTID,properties.DAT_VZNIK,properties.DAT_ZMENA,properties.PLOCHA,properties.ID,properties.KOD_MC,properties.NAZEV_MC,properties.KOD_MO,properties.KOD_SO,properties.TID_TMMESTSKECASTI_P,properties.POSKYT,properties.ID_POSKYT,properties.STAV_ZMENA,properties.NAZEV_1,properties.Shape_Length,properties.Shape_Area
0,Feature,Polygon,"[[[14.533725418000074, 50.16223134300003], [14...",1,20181106141412,20190423111436,10183715.88,25,547310,Praha-Čakovice,94,221,25,HMP-IPR,43,U,Čakovice,0.213162,10183720.0
1,Feature,Polygon,"[[[14.293206908000059, 50.07751405400006], [14...",2,20181106141412,20181106164427,3253142.41,52,547174,Praha 17,60,213,52,HMP-IPR,43,U,Praha 17,0.095029,3253142.0
2,Feature,Polygon,"[[[14.483934895000061, 49.99241857800007], [14...",3,20181009145125,20190821104230,5234736.54,19,547051,Praha-Libuš,43,124,19,HMP-IPR,43,U,Libuš,0.200404,5234737.0
3,Feature,Polygon,"[[[14.506905018000054, 50.17143575600005], [14...",4,20170817145228,20170818091113,3380681.9,35,538124,Praha-Březiněves,86,86,35,HMP-IPR,43,U,Březiněves,0.127235,3380682.0
4,Feature,Polygon,"[[[14.43852135000003, 50.06691477800007], [14....",5,20180910110223,20180910113234,4184937.95,30,500089,Praha 2,27,27,30,HMP-IPR,43,U,Praha 2,0.134652,4184938.0


In [68]:
mestky_casty.shape

(57, 19)

In [69]:
geo_unique = np.array(mestky_casty['properties.NAZEV_MC'].apply(lambda x: x.lower()).unique())
geo_unique

array(['praha-čakovice', 'praha 17', 'praha-libuš', 'praha-březiněves',
       'praha 2', 'praha 1', 'praha 11', 'praha-zbraslav', 'praha 15',
       'praha 4', 'praha 5', 'praha 20', 'praha-dolní měcholupy',
       'praha 6', 'praha 9', 'praha 10', 'praha 14', 'praha 12',
       'praha-kolovraty', 'praha-újezd', 'praha 13', 'praha-řeporyje',
       'praha-suchdol', 'praha-ďáblice', 'praha-šeberov',
       'praha-dolní chabry', 'praha 19', 'praha-koloděje',
       'praha-satalice', 'praha-petrovice', 'praha 3',
       'praha-velká chuchle', 'praha-dolní počernice',
       'praha-přední kopanina', 'praha-královice', 'praha-kunratice',
       'praha-slivenec', 'praha-vinoř', 'praha-lochkov', 'praha-nebušice',
       'praha-benice', 'praha 18', 'praha-křeslice', 'praha-troja',
       'praha 7', 'praha-nedvězí', 'praha 21', 'praha-běchovice',
       'praha-štěrboholy', 'praha-dubeč', 'praha-lysolaje',
       'praha-lipence', 'praha 8', 'praha 22', 'praha-zličín', 'praha 16',
       'praha-

Districts population

In [70]:
url_population =  'https://www.czso.cz/documents/10180/25233177/sldb_zv.csv'
df_population = pd.read_csv(url_population,encoding = "ISO 8859-2")
df_population.head()

Unnamed: 0,typuz_naz,nazev,uzcis,uzkod,u01,u02,u03,u04,u05,u06,u07,u08,u09,u10,u11
0,kraj,Hlavní město Praha,100,3018,1268796.0,613738.0,655058.0,153622.0,908321.0,201029.0,644643.0,600730.0,92927.0,542168.0,579509.0
1,kraj,Středočeský kraj,100,3026,1289211.0,637252.0,651959.0,199300.0,895024.0,190911.0,639851.0,587539.0,286780.0,482860.0,523045.0
2,kraj,Jihočeský kraj,100,3034,628336.0,308296.0,320040.0,91119.0,435187.0,100000.0,307130.0,280844.0,123048.0,247608.0,262692.0
3,kraj,Plzeňský kraj,100,3042,570401.0,282137.0,288264.0,79469.0,396468.0,92734.0,278674.0,255278.0,105835.0,226298.0,242397.0
4,kraj,Karlovarský kraj,100,3051,295595.0,145483.0,150112.0,42159.0,207480.0,44538.0,139871.0,123100.0,39845.0,119403.0,128904.0


Cleaning the population data set. Only the Prague district rows and the total and age-group columns are relevant to this project.

In [71]:
df_population = df_population[(df_population.uzcis == 44)& (df_population.nazev.str.find('Praha') != -1)][['nazev','u01','u04', 'u05', 'u06']]
df_population.rename(columns={'nazev':'Name','u01':'Total', 'u04':'Kids', 'u05':'Middle', 'u06':'Senior'}, inplace = True)
df_population['Name'] = df_population['Name'].map(lambda x: x.lower())
df_population.shape

(57, 5)

A quick look at the district population data

In [72]:
population_unique = df_population['Name'].unique()
population_unique

array(['praha 1', 'praha 2', 'praha 3', 'praha 4', 'praha 5', 'praha 6',
       'praha 7', 'praha 8', 'praha 9', 'praha 10', 'praha-běchovice',
       'praha-benice', 'praha-březiněves', 'praha-dolní počernice',
       'praha-dubeč', 'praha 20', 'praha-klánovice', 'praha-koloděje',
       'praha-kolovraty', 'praha-královice', 'praha-křeslice',
       'praha-nedvězí', 'praha-satalice', 'praha 22', 'praha 21',
       'praha-vinoř', 'praha-lipence', 'praha-lochkov',
       'praha-přední kopanina', 'praha 16', 'praha-řeporyje',
       'praha-slivenec', 'praha 13', 'praha-\x8aeberov', 'praha-újezd',
       'praha-zbraslav', 'praha-zličín', 'praha 11', 'praha-kunratice',
       'praha-libu\x9a', 'praha 12', 'praha-velká chuchle',
       'praha-lysolaje', 'praha-nebu\x9aice', 'praha 17', 'praha-suchdol',
       'praha-ďáblice', 'praha-dolní chabry', 'praha-čakovice',
       'praha-troja', 'praha 19', 'praha 14', 'praha-dolní měcholupy',
       'praha 15', 'praha-petrovice', 'praha-\x8atěrboho

Checking the difference between the two data sets

In [73]:
districts_diff_geo = list(set(geo_unique)-set(population_unique))
districts_diff_geo

['praha-šeberov', 'praha-štěrboholy', 'praha-libuš', 'praha-nebušice']

In [74]:
districts_diff_pop = list(set(population_unique)-set(geo_unique))
districts_diff_pop

['praha-libu\x9a',
 'praha-nebu\x9aice',
 'praha-\x8aeberov',
 'praha-\x8atěrboholy']

The population data set has encoding errors. Let's fix them.

In [75]:
df_population.loc[df_population.Name == 'praha-libu\x9a', 'Name'] = 'praha-libuš'
df_population.loc[df_population.Name == 'praha-\x8aeberov', 'Name'] = 'praha-šeberov'
df_population.loc[df_population.Name == 'praha-nebu\x9aice', 'Name'] = 'praha-nebušice'
df_population.loc[df_population.Name == 'praha-\x8atěrboholy', 'Name'] = 'praha-štěrboholy'

In [76]:
population_unique = df_population['Name'].unique()
districts_diff_pop = list(set(population_unique)-set(geo_unique))
districts_diff_pop

[]
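
The bytes `\x8a` and `\x9a` are where Windows-1250 (cp1250) places Š and š, while ISO 8859-2 treats them as control characters. That suggests, as an assumption not confirmed by the data publisher, that the file is cp1250-encoded. Under that assumption the names can be repaired generically rather than one by one; `repair_name` is a hypothetical helper:

```python
def repair_name(name):
    """Re-decode a string that was read as ISO 8859-2 but was really cp1250."""
    raw = name.encode('iso-8859-2')      # recover the original bytes
    return raw.decode('cp1250').lower()  # decode correctly, then lower-case again


broken = ['praha-libu\x9a', 'praha-\x8aeberov',
          'praha-nebu\x9aice', 'praha-\x8atěrboholy', 'praha 1']
repaired = [repair_name(n) for n in broken]
print(repaired)
```

Under the same assumption, passing `encoding='cp1250'` to `pd.read_csv` would likely avoid the problem at the source.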

The district data set and the population data set now contain the same districts. Let's join the two data sets.

In [77]:
# One row per district: lower-cased name, outer ring of the polygon, area
rows = [
    [v['properties']['NAZEV_MC'].lower(),
     v['geometry']['coordinates'][0],
     v['properties']['PLOCHA']]
    for v in results['features']
]

df_prague_districts = pd.DataFrame(rows, columns=['Name', 'Geometry', 'Area'])

In [79]:
df_prague = df_prague_districts.set_index('Name').join(df_population.set_index('Name'))
# Kids_per_1000 is the number of kids per 1000 residents of the 'Middle' age group
quotient = df_prague['Middle']/1000
df_prague['Kids_per_1000'] = df_prague['Kids']/quotient
df_prague.sort_values('Name', inplace = True)
df_prague.reset_index(inplace=True)

In [80]:
df_prague.shape

(57, 8)
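
As a worked check of the ratio computed above, `Kids_per_1000` divides the number of kids by the `Middle` population expressed in thousands. The sketch below recomputes it for Praha 1 from the figures in the joined table (Kids = 2391, Middle = 22963); the function name is hypothetical.

```python
def kids_per_1000(kids, middle):
    """Kids per 1000 residents of the middle age group."""
    return kids / (middle / 1000)


praha_1 = kids_per_1000(2391, 22963)
print(round(praha_1, 6))
```

Note that the denominator is the middle age group, not the total population, so the value is a ratio between two age groups rather than a share of all residents.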

In [90]:
get_coordinates(df_prague, 'Name')

Done


In [91]:
df_prague.head()

Unnamed: 0,Name,Geometry,Area,Total,Kids,Middle,Senior,Kids_per_1000,latitude,longitude
0,praha 1,"[[14.410891049000043, 50.078674687000046], [14...",5538443.86,30561.0,2391.0,22963.0,4594.0,104.124026,50.08728,14.41742
1,praha 10,"[[14.531321086000048, 50.072240288000046], [14...",18599366.98,113200.0,12213.0,76625.0,23937.0,159.386623,50.06762,14.46016
2,praha 11,"[[14.54355294800007, 50.03618763800006], [14.5...",9793679.84,75741.0,8688.0,54983.0,11816.0,158.012477,50.03178,14.50719
3,praha 12,"[[14.450632163000023, 50.01452735600003], [14....",23317909.06,53515.0,6156.0,39699.0,7480.0,155.066878,50.00564,14.40462
4,praha 13,"[[14.320621949000042, 50.04010680700003], [14....",13196802.19,59906.0,7985.0,46514.0,5109.0,171.668745,50.05163,14.34231


Saving the data set to storage for later use

In [95]:
file_name = 'prague_district_population.csv'
df_prague.to_csv(file_name)
upload_file(storage_creds,file_name,file_name)

File prague_district_population.csv Uploaded


Explore the children population in Prague

In [25]:
# gdf is the GeoDataFrame of district polygons built from df_prague (cell In [23])
gdf_kids = gdf[['Kids_per_1000', 'geometry']].reset_index()


### Points of interest

#### Playgrounds
Data from Hřiště Praha 2014 - 2016 <a>http://www.hristepraha.cz</a>. Last update: 19.01.2018.

In [123]:
url_playgrounds = 'http://opendata.praha.eu/dataset/3c3ca9ca-fbc0-4f97-b624-ed967f5d9a24/resource/e19c2e29-5e33-4449-8847-5dc8f5b8a2f2/download/db144c03-1a0f-456f-a32b-9c48ccfc0813-playgrounds.json'
results = requests.get(url_playgrounds).json(encoding = "utf8")
df_playgrounds = json_normalize(results['features']) 
df_playgrounds.head()

Unnamed: 0,type,properties.id,properties.name,properties.url,properties.perex,properties.content,properties.district,properties.address,properties.properties,properties.image.url,geometry.type,geometry.coordinates
0,Feature,101,Sídliště Petrovice - Rezlerova,http://www.hristepraha.cz/hriste/mapa/sidliste...,"Lokalita nabízí několik pěkných menších hřišť,...",Za panelovým domem v Rezlerově ulici se rozklá...,praha-petrovice,"Rezlerova 278, 109 00 Praha-Praha-Petrovice, Č...",[],http://www.hristepraha.cz/images/img/41f5da50e...,Point,"[14.56323719, 50.038024902]"
1,Feature,43,Bohnice a Čimice - Čimice,http://www.hristepraha.cz/hriste/mapa/bohnice-...,Nedaleko od sebe leží 2 pěkná hřiště.,"Větší hřiště se rozkládá, mezi ulicemi Toruňsk...",praha-8,"Skálova 545/24, Čimice, 181 00 Praha-Praha 8, ...",[],http://www.hristepraha.cz/images/img/b07bef69a...,Point,"[14.438850403, 50.13401413]"
2,Feature,100,Na Krejcárku - hřiště 60.B,http://www.hristepraha.cz/hriste/mapa/na-krejc...,Lokalita se skvěle hodí pro rodiny s dětmi růz...,Dětské hřiště (60.A) najdete na konci ulice St...,praha-3,"Za Žižkovskou vozovnou 2716/19, Žižkov, 130 00...",[],http://www.hristepraha.cz/images/img/a0cad32d8...,Point,"[14.476410866, 50.094387054]"
3,Feature,131,Uhříněves - hřiště 82.B,http://www.hristepraha.cz/hriste/mapa/uhrineve...,"Hřiště, lesopark a další zajímavá místa, to je...","Cestu doporučujeme zahájit na Novém náměstí, k...",praha-22,"V Bytovkách 754/30, Uhříněves, 104 00 Praha-Pr...",[],http://www.hristepraha.cz/images/img/3063cb73f...,Point,"[14.593131065, 50.036453247]"
4,Feature,72,Hostivařský lesopark (východní část) - hřiště ...,http://www.hristepraha.cz/hriste/mapa/hostivar...,Trasa je vhodným polodenním rodinným výletem.,Popis: Asi 300 m od prodejny Lidl v Hornoměcho...,praha-15,"U Břehu 1111, Hostivař, 102 00 Praha-Praha 15,...",[],http://www.hristepraha.cz/images/img/2d73f6832...,Point,"[14.539891243, 50.043731689]"


In [124]:
poi_type = 'playground'

# GeoJSON stores point coordinates as [longitude, latitude]
rows = [
    [poi_type,
     v['properties']['district'].lower(),
     v['geometry']['coordinates'][1],   # latitude
     v['geometry']['coordinates'][0]]   # longitude
    for v in results['features']
]

df_prague_poi = pd.DataFrame(rows, columns=['Type', 'District_Name', 'latitude', 'longitude'])
df_prague_poi.head()

Unnamed: 0,Type,District_Name,latitude,longitude
0,playground,praha-petrovice,50.038025,14.563237
1,playground,praha-8,50.134014,14.43885
2,playground,praha-3,50.094387,14.476411
3,playground,praha-22,50.036453,14.593131
4,playground,praha-15,50.043732,14.539891


In [125]:
df_prague_poi.shape

(145, 4)
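
GeoJSON stores a point as `[longitude, latitude]`, the reverse of the usual spoken order, so the two are easy to swap. A small sketch of reading a feature defensively follows; `point_lat_lon` and `looks_like_prague` are hypothetical helpers, and the bounding box is a rough assumption for Prague, not an official limit.

```python
def point_lat_lon(feature):
    """Return (latitude, longitude) from a GeoJSON Point feature."""
    lon, lat = feature['geometry']['coordinates'][:2]  # GeoJSON order: lon, lat
    return lat, lon


def looks_like_prague(lat, lon):
    # Rough, assumed bounding box around Prague for catching swapped values
    return 49.9 <= lat <= 50.2 and 14.2 <= lon <= 14.8


feature = {
    'type': 'Feature',
    'geometry': {'type': 'Point', 'coordinates': [14.56323719, 50.038024902]},
}
lat, lon = point_lat_lon(feature)
print(lat, lon, looks_like_prague(lat, lon))
```

A check like this is cheap to run over all rows before the playground data is merged with the geocoded sport and library points.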

#### Sport facilities
Source: Magistrát hl. m. Prahy (Prague City Hall), 1 April 2019, 0:00 (UTC+02:00)

In [102]:
url_sport = 'http://opendata.praha.eu/datastore/dump/5d1ee13f-f6e9-4ee9-a1bd-48d5ca2bb867?format=json'
results = requests.get(url_sport).json(encoding = "utf8")
result = []

result.append([
    'sport',
    'praha {}'.format(v[6]),
     v[2]] for v in results['records'])
    
df_sport = pd.DataFrame([item for result in result for item in result])
df_sport.columns = ['Type', 'District_Name', 'Address']
df_sport.head()

Unnamed: 0,Type,District_Name,Address
0,sport,praha 5,"Butovická 837/41, Praha 5"
1,sport,praha 5,"Zahradníčkova, Praha 5, 150 00"
2,sport,praha 1,"Senovážné náměstí 6, Praha 1, 110 00"
3,sport,praha 12,"Zelenkova 3/530, Praha 12, 142 00"
4,sport,praha 7,"Štvanice 38, Praha 7, 170 00"


In [103]:
df_sport.isnull().values.any()

False

In [106]:
get_coordinates(df_sport, 'Address')
df_sport.head()    

Done


Unnamed: 0,Type,District_Name,Address,latitude,longitude
0,sport,praha 5,"Butovická 837/41, Praha 5",50.052154,14.360772
1,sport,praha 5,"Zahradníčkova, Praha 5, 150 00",50.068925,14.345478
2,sport,praha 1,"Senovážné náměstí 6, Praha 1, 110 00",50.085924,14.431106
3,sport,praha 12,"Zelenkova 3/530, Praha 12, 142 00",50.009011,14.447198
4,sport,praha 7,"Štvanice 38, Praha 7, 170 00",50.09669,14.44014


In [109]:
df_sport.drop(columns=['Address'], inplace = True)

In [110]:
df_sport.shape

(877, 4)

#### Libraries

In [114]:
url_libs = 'https://cs.wikipedia.org/wiki/M%C4%9Bstsk%C3%A1_knihovna_v_Praze'

f = urllib.request.urlopen(url_libs)
html = f.read()

try: 
    from BeautifulSoup import BeautifulSoup
except ImportError:
    from bs4 import BeautifulSoup

parsed_html = BeautifulSoup(html)
tag_header = parsed_html.find_all('h4')
district_tags = []

for tag in tag_header:
    tag_match = False
    district =''
    for child in tag.children:
        if child.get("class")[0] == 'mw-headline':
            district = child.get_text().lower()
            tag_match = True
    if tag_match == True:
        nextsibling = tag.next_sibling
        while  True:
            if nextsibling.find('ul') != -1 :
                lists = nextsibling.find_all('li')
                for lib in  lists:
                    district_tags.append(['library',district,lib.get_text()])
                break
            else:
                nextsibling = nextsibling.next_sibling

df_libs = pd.DataFrame(data=district_tags)
df_libs.columns = ['Type', 'District_Name', 'Address']
df_libs.head()

Unnamed: 0,Type,District_Name,Address
0,library,praha 1,"„Školská“, Nové Město, Školská 1267/30"
1,library,praha 1,"„Hradčany“, Hradčany, Pohořelec 111/25"
2,library,praha 2,"„Záhřebská“, Vinohrady, Záhřebská 158/20"
3,library,praha 2,"„Dittrrichova“, Nové Město, Dittrichova 1543/2"
4,library,praha 2,"„Ostrčilovo náměstí“, Nusle, Ostrčilovo náměst..."


In [115]:
get_coordinates(df_libs, 'Address')
df_libs.head()

Done


Unnamed: 0,Type,District_Name,Address,latitude,longitude
0,library,praha 1,"„Školská“, Nové Město, Školská 1267/30",50.079501,14.424045
1,library,praha 1,"„Hradčany“, Hradčany, Pohořelec 111/25",50.087778,14.389993
2,library,praha 2,"„Záhřebská“, Vinohrady, Záhřebská 158/20",50.07191,14.43697
3,library,praha 2,"„Dittrrichova“, Nové Město, Dittrichova 1543/2",50.07362,14.41641
4,library,praha 2,"„Ostrčilovo náměstí“, Nusle, Ostrčilovo náměst...",50.06579,14.42468


In [117]:
df_libs.drop(columns='Address', inplace = True)
df_libs.shape

(41, 4)

Combine the results

In [127]:
df_prague_poi = pd.concat([df_prague_poi,df_sport], sort=True)
df_prague_poi = pd.concat([df_prague_poi, df_libs], sort=True)
df_prague_poi.shape
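
In the outputs above, the playground rows spell numbered districts as `praha-8`, while the sport and library rows use `praha 8`. If the combined table is later grouped by district, those spellings would need to agree. A possible normalization, with the hypothetical helper `normalize_district`, might look like this:

```python
import re


def normalize_district(name):
    """Lower-case a district name and write numbered districts as 'praha N'."""
    name = name.strip().lower()
    # 'praha-8' -> 'praha 8'; named districts such as 'praha-petrovice' keep the hyphen
    return re.sub(r'^praha-(\d+)$', r'praha \1', name)


names = ['praha-8', 'Praha 5', 'praha-petrovice', 'praha-22']
normalized = [normalize_district(n) for n in names]
print(normalized)
```

Applied to the `District_Name` column, this would make the names match the form used in the district and population tables.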

In [129]:
poi_file_name = 'prague_poi.csv'
df_prague_poi.to_csv(poi_file_name)
upload_file(storage_creds,poi_file_name,poi_file_name)

File prague_poi.csv Uploaded


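#### Schools and educational centers 

Register of schools and school facilities, City of Prague (Rejstřík škol a školských zařízení - Hl. m. Praha).
Current data from the register of schools and school facilities for the City of Prague.
Source: MŠMT (Ministry of Education, Youth and Sports), 14.10.2019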

In [1]:
import urllib.request
import requests

url_schools = 'https://rejstriky.msmt.cz/opendata/vrejcz010.xml'
file_schools = 'schools.xml'
results = requests.get(url_schools)
# Save the raw bytes so the file keeps the encoding declared in the XML
with open(file_schools, 'wb') as file:
    file.write(results.content)
print('Done') 

Done


In [3]:
import xml.etree.ElementTree as et 
xtree = et.parse(file_schools)
xroot = xtree.getroot()

In [18]:
dic_scools = []
try:
    for entry in xroot.findall('PravniSubjekt'):
        place_group = entry.find('SkolyZarizeni')
        if(place_group is None):
            continue
        for place in place_group.findall('SkolaZarizeni'):
            s_id = place.find('IZO').text
            s_type = place.find('SkolaDruhTyp').text
            s_name = place.find('SkolaPlnyNazev').text
            s_capasity = place.find('SkolaKapacita').text
            s_adress = place.find('SkolaMistaVykonuCinnosti')
            s_actual_add = s_adress.find('SkolaMistoVykonuCinnosti')
            s_addres1 =  s_actual_add.find('MistoAdresa1').text
            s_addres2 =  s_actual_add.find('MistoAdresa2').text
            s_addres3 =  s_actual_add.find('MistoAdresa3').text
            print(s_id, s_name,  s_type, s_capasity, '{} {} {}'.format(s_addres1, s_addres2, s_addres3))
            dic_scools.append([s_id, s_name,  s_type, s_capasity, '{} {} {}'.format(s_addres1, s_addres2, s_addres3)])
except:
    print ('Exception', sys.exc_info()[0])     

049625918 Mateřská škola A00 52 Ostrovní 139/11 Nové Město 110 00 Praha 1
102413096 Školní jídelna L11 90 Ostrovní 139/11 Nové Město 110 00 Praha 1
107500884 Mateřská škola A00 70 Ke Kamýku 686/2 Kamýk 142 00 Praha 4
161102263 Školní jídelna - výdejna L13 70 Ke Kamýku 686/2 Kamýk 142 00 Praha 4
110034384 Mateřská škola A00 6 Smolkova 567/2 Kamýk 142 00 Praha 4
110380169 Základní škola B00 30 Smolkova 567/2 Kamýk 142 00 Praha 4
110034392 Přípravný stupeň základní školy speciální M60 6 Smolkova 567/2 Kamýk 142 00 Praha 4
060437171 Mateřská škola A00 130 Podpěrova 1879/2 Stodůlky 155 00 Praha 5
102449244 Školní jídelna L11 148 Podpěrova 1879/2 Stodůlky 155 00 Praha 5
110035585 Mateřská škola A00 18 Hábova 1571/22 Stodůlky 155 00 Praha 5
049370782 Mateřská škola A00 86 Žabovřeská 1227 Zbraslav 156 00 Praha 5
102449597 Školní jídelna L11 90 Žabovřeská 1227 Zbraslav 156 00 Praha 5
110020766 Mateřská škola A00 131 Klausova 2448/6 Stodůlky 155 00 Praha 5
110020774 Školní jídelna L11 131 Klauso
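
The loop above sits inside a single `try`, so the first record with a missing element ends the whole extraction. A more tolerant variant can be sketched with `findtext`, which returns a default instead of raising. This is only an illustration: the element names come from the code above, but the root tag `Export`, the second sample record, and the helper `parse_schools` are assumptions made for the example.

```python
import xml.etree.ElementTree as et

# Hypothetical sample; the second record deliberately lacks capacity and address
SAMPLE_XML = """<Export>
  <PravniSubjekt>
    <SkolyZarizeni>
      <SkolaZarizeni>
        <IZO>049625918</IZO>
        <SkolaDruhTyp>A00</SkolaDruhTyp>
        <SkolaPlnyNazev>Mateřská škola</SkolaPlnyNazev>
        <SkolaKapacita>52</SkolaKapacita>
        <SkolaMistaVykonuCinnosti>
          <SkolaMistoVykonuCinnosti>
            <MistoAdresa1>Ostrovní 139/11</MistoAdresa1>
            <MistoAdresa2>Nové Město</MistoAdresa2>
            <MistoAdresa3>110 00 Praha 1</MistoAdresa3>
          </SkolaMistoVykonuCinnosti>
        </SkolaMistaVykonuCinnosti>
      </SkolaZarizeni>
      <SkolaZarizeni>
        <IZO>000000000</IZO>
        <SkolaDruhTyp>L11</SkolaDruhTyp>
        <SkolaPlnyNazev>Školní jídelna</SkolaPlnyNazev>
      </SkolaZarizeni>
    </SkolyZarizeni>
  </PravniSubjekt>
</Export>"""

ADDRESS_PATH = 'SkolaMistaVykonuCinnosti/SkolaMistoVykonuCinnosti/'


def parse_schools(xroot):
    """Return [id, name, type, capacity, address] rows, tolerating missing elements."""
    rows = []
    for place in xroot.iterfind('PravniSubjekt/SkolyZarizeni/SkolaZarizeni'):
        parts = [place.findtext(ADDRESS_PATH + tag, default='')
                 for tag in ('MistoAdresa1', 'MistoAdresa2', 'MistoAdresa3')]
        address = ' '.join(p for p in parts if p)  # skip empty address parts
        rows.append([place.findtext('IZO'),
                     place.findtext('SkolaPlnyNazev'),
                     place.findtext('SkolaDruhTyp'),
                     place.findtext('SkolaKapacita'),  # None when absent
                     address])
    return rows


rows = parse_schools(et.fromstring(SAMPLE_XML))
```

Like the original loop, the sketch reads only the first place of activity for each school.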

In [22]:
import pandas as pd
columns = ['id', 'name', 'type', 'capacity', 'address']
df_schools = pd.DataFrame(dic_scools, columns = columns)
df_schools.head()

Unnamed: 0,id,name,type,capacity,address
0,49625918,Mateřská škola,A00,52,Ostrovní 139/11 Nové Město 110 00 Praha 1
1,102413096,Školní jídelna,L11,90,Ostrovní 139/11 Nové Město 110 00 Praha 1
2,107500884,Mateřská škola,A00,70,Ke Kamýku 686/2 Kamýk 142 00 Praha 4
3,161102263,Školní jídelna - výdejna,L13,70,Ke Kamýku 686/2 Kamýk 142 00 Praha 4
4,110034384,Mateřská škola,A00,6,Smolkova 567/2 Kamýk 142 00 Praha 4


In [48]:
types = df_schools['type'].unique()
for t in types:
    print(t,df_schools[df_schools.type == t].iloc[0,1])

A00 Mateřská škola
L11 Školní jídelna
L13 Školní jídelna - výdejna
B00 Základní škola
M60 Přípravný stupeň základní školy speciální
G21 Školní družina
G22 Školní klub
F10 Základní umělecká škola
C00 Střední škola
D00 Konzervatoř
M20 Školní knihovna
E00 Vyšší odborná škola
M79 Jiné účelové zařízení
H22 Domov mládeže
G11 Dům dětí a mládeže
G40 Zařízení pro další vzdělávání pedagogických pracovníků
M40 Středisko praktického vyučování
H21 Internát
K20 Speciálně pedagogické centrum
G12 Stanice zájmových činností
K10 Pedagogicko-psychologická poradna
F20 Jazyková škola s právem státní jazykové zkoušky
J12 Dětský domov se školou
J21 Středisko výchovné péče
J14 Diagnostický ústav
J11 Dětský domov
J13 Výchovný ústav
L12 Školní jídelna - vývařovna
H10 Škola v přírodě
A15 Mateřská škola (lesní mateřská škola)


In [66]:
school_types = ['B00', 'F10', 'C00']
klub_types = ['H22', 'G11']
columns_to_drop = ['id', 'type','name','capacity']
# .copy() makes an independent DataFrame, so the in-place edits below
# (and the columns added by get_coordinates) do not act on a slice of df_schools
df_sorted_schools = df_schools[df_schools.type.isin(school_types)].copy()

df_sorted_schools.drop(columns = columns_to_drop, inplace = True)
df_sorted_schools.reset_index(inplace = True)
df_sorted_schools.head()

Unnamed: 0,index,address
0,5,Smolkova 567/2 Kamýk 142 00 Praha 4
1,22,Ostrovní 2070/9 Nové Město 110 00 Praha 1
2,25,Soukenická 1088/10 Nové Město 110 00 Praha 1
3,26,Soukenická 1088/10 Nové Město 110 00 Praha 1
4,30,Písková 126/27 Modřany 143 00 Praha 4


In [67]:
get_coordinates(df_sorted_schools, 'address')

Done

In [70]:
file_name = 'prague_schools.csv'
df_sorted_schools.to_csv(file_name)
upload_file(storage_creds,file_name,file_name)


File prague_schools.csv Uploaded


In [71]:
df_sorted_klubs = df_schools[df_schools.type.isin(klub_types)].copy()  # independent copy, not a slice
df_sorted_klubs.shape

(43, 5)

In [73]:
get_coordinates(df_sorted_klubs, 'address')

Done

In [82]:
df_sorted_klubs.head()

Unnamed: 0,id,name,type,capacity,address,latitude,longitude
93,110350359,Domov mládeže,H22,100,Roškotova 1692/4 Braník 140 00 Praha 4,50.040551,14.428124
109,181038234,Dům dětí a mládeže,G11,neuvádí se,Korunní 586/2 Vinohrady 120 00 Praha 2,50.075293,14.43878
148,110036468,Domov mládeže,H22,152,Ohradní 72/24 Michle 140 00 Praha 4,50.052415,14.455805
169,110351029,Domov mládeže,H22,135,Vrbova 1233/34 Braník 147 00 Praha 4,50.031007,14.425772
191,110016718,Domov mládeže,H22,70,U závodiště 325/1 Velká Chuchle 159 00 Praha 5,50.011417,14.393238


In [86]:
file_name_sorted_klubs = 'sorted_klubs.csv'
df_sorted_klubs.to_csv(file_name_sorted_klubs)
upload_file(storage_creds, file_name_sorted_klubs, file_name_sorted_klubs)

In [23]:
polygon = [Polygon(x) for x in df_prague.Geometry]
crs = {'init': 'epsg:4326'}
gdf = gpd.GeoDataFrame(df_prague, crs=crs, geometry=polygon)

In [24]:
gdf.head()

Name,Geometry,Area,Total,Kids,Middle,Senior,Kids_per_1000,latitude,longitude,geometry
praha 1,"[[14.410891049000043, 50.078674687000046], [14...",5538443.86,30561.0,2391.0,22963.0,4594.0,104.124026,50.4669,4.86746,"POLYGON ((14.41089104900004 50.07867468700005,..."
praha 10,"[[14.531321086000048, 50.072240288000046], [14...",18599366.98,113200.0,12213.0,76625.0,23937.0,159.386623,50.4669,4.86746,"POLYGON ((14.53132108600005 50.07224028800005,..."
praha 11,"[[14.54355294800007, 50.03618763800006], [14.5...",9793679.84,75741.0,8688.0,54983.0,11816.0,158.012477,50.4669,4.86746,"POLYGON ((14.54355294800007 50.03618763800006,..."
praha 12,"[[14.450632163000023, 50.01452735600003], [14....",23317909.06,53515.0,6156.0,39699.0,7480.0,155.066878,50.4669,4.86746,"POLYGON ((14.45063216300002 50.01452735600003,..."
praha 13,"[[14.320621949000042, 50.04010680700003], [14....",13196802.19,59906.0,7985.0,46514.0,5109.0,171.668745,50.4669,4.86746,"POLYGON ((14.32062194900004 50.04010680700003,..."
