# Economic activities in Zürich

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zazuko/ssz/blob/main/notebooks/economy.ipynb)

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/zazuko/ssz/blob/main/notebooks/economy.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/zazuko/ssz/blob/main/notebooks/economy.ipynb"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://raw.githubusercontent.com/zazuko/ssz/main/notebooks/economy.ipynb" download><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

[1. Restaurants over time](#Restaurants-over-time)  
[2. Restaurants in city quartiers](#Restaurants-in-city-quartiers)  
[3. After-school care: gender-representation](#After-school-care:-gender-representation)

In [1]:
# Installing dependencies for Colab environment
!pip install mapclassify
!pip install git+https://github.com/zazuko/graphly.git


In [2]:
import mapclassify
import matplotlib
import matplotlib.cm
import matplotlib.colors

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from graphly.api_client import SparqlClient

### SPARQL endpoint

Data on some economic activities is published as linked data. It can be accessed with [SPARQL queries](https://www.w3.org/TR/rdf-sparql-query/).   
You can send queries using HTTP requests. The API endpoint is **[https://ld.zazuko.com/query/](https://ld.zazuko.com/query/).**  
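To see what such an HTTP request looks like without any client library, here is a minimal sketch using only the standard library. It assumes the endpoint follows the standard SPARQL 1.1 Protocol (query sent as a form-encoded `query` parameter in a POST body, JSON results requested through the `Accept` header). The helper `build_sparql_request` is hypothetical and only builds the request; sending it requires network access.

```python
import urllib.parse
import urllib.request


def build_sparql_request(query: str, endpoint: str = "https://ld.zazuko.com/query/") -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol POST request.

    Hypothetical helper; assumes the endpoint follows the standard protocol.
    """
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",  # ask for JSON results
        },
        method="POST",
    )


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
# Sending it needs network access:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```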
  
  
Let's use `SparqlClient` from [graphly](https://github.com/zazuko/graphly) to communicate with the database. 
Graphly will allow us to:
* send SPARQL queries
* automatically add prefixes to all queries
* format responses as `pandas` or `geopandas` data frames
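Formatting a response as a data frame essentially means turning the W3C SPARQL JSON results format (a `head.vars` list plus one `bindings` object per row) into a table. The sketch below is an illustration of that idea, not graphly's actual implementation: the function `sparql_json_to_df` is hypothetical, and the example response (including its datatypes) is assumed, with values borrowed from the first output row shown later in this notebook.

```python
import pandas as pd

XSD = "http://www.w3.org/2001/XMLSchema#"
NUMERIC_TYPES = {XSD + t for t in ("integer", "decimal", "double", "float")}


def sparql_json_to_df(result: dict) -> pd.DataFrame:
    """Hypothetical helper: SPARQL 1.1 JSON results -> DataFrame (one column per variable)."""
    columns = result["head"]["vars"]

    def convert(term: dict):
        # Numeric literals become floats; everything else is kept as a string
        if term.get("datatype") in NUMERIC_TYPES:
            return float(term["value"])
        return term["value"]

    # Unbound variables are simply missing from a binding and end up as NaN
    rows = [
        {var: convert(binding[var]) for var in columns if var in binding}
        for binding in result["results"]["bindings"]
    ]
    return pd.DataFrame(rows, columns=columns)


# Illustrative response in the W3C format; the datatypes are assumptions
example_result = {
    "head": {"vars": ["time", "restaurants", "pop"]},
    "results": {"bindings": [{
        "time": {"type": "literal", "value": "1934-12-31", "datatype": XSD + "date"},
        "restaurants": {"type": "literal", "value": "1328", "datatype": XSD + "decimal"},
        "pop": {"type": "literal", "value": "315864", "datatype": XSD + "decimal"},
    }]},
}
example_df = sparql_json_to_df(example_result)
```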

In [20]:
ENDPOINT = "https://ld.zazuko.com/query/"

sparql = SparqlClient(ENDPOINT)
sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
    "property": "<https://ld.stadt-zuerich.ch/statistics/property/>",
    "measure": "<https://ld.stadt-zuerich.ch/statistics/measure/>",
    "skos": "<http://www.w3.org/2004/02/skos/core#>",
    "ssz": "<https://ld.stadt-zuerich.ch/statistics/>",
    # Used for typed literals such as "2017-12-31"^^xsd:date
    "xsd": "<http://www.w3.org/2001/XMLSchema#>"
})

SPARQL queries can become very long. To improve readability, we will work with [prefixes](https://en.wikibooks.org/wiki/SPARQL/Prefixes).

With the `add_prefixes` method, we define persistent prefixes. Every time you send a query, `graphly` automatically adds these prefix declarations to it for you.
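Conceptually, adding prefixes amounts to prepending one `PREFIX name: <iri>` line per entry to the query text. A minimal sketch of that idea follows; `with_prefixes` is a hypothetical helper written for illustration, not graphly's API.

```python
def with_prefixes(query: str, prefixes: dict) -> str:
    """Hypothetical helper (not part of graphly): prepend PREFIX declarations to a query."""
    header = "\n".join(f"PREFIX {name}: {iri}" for name, iri in prefixes.items())
    return f"{header}\n{query}"


full_query = with_prefixes(
    "SELECT ?cube WHERE { ?cube a cube:Cube } LIMIT 5",
    {"cube": "<https://cube.link/>"},
)
print(full_query)
```

This prints the declaration `PREFIX cube: <https://cube.link/>` on the first line, followed by the original query.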

## Restaurants over time

Let's find the number of restaurants in Zürich over time. This information is available in the `AST-BTA` data cube. To put the restaurant numbers in context, let's scale them by population size. The number of inhabitants over time can be found in the `BEW` data cube.
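The query below combines the two cubes with two sub-selects that share the variable `?time`. SPARQL joins them on that shared variable, which behaves like an inner join in pandas: only years present on both sides are kept. A small pandas analogy, using toy excerpts of the values from the query output (the 1936 restaurant count is deliberately left out to show the effect):

```python
import pandas as pd

# Toy excerpts; the 1936 restaurant row is omitted on purpose
restaurants_part = pd.DataFrame({
    "time": ["1934-12-31", "1935-12-31"],
    "restaurants": [1328.0, 1327.0],
})
population_part = pd.DataFrame({
    "time": ["1934-12-31", "1935-12-31", "1936-12-31"],
    "pop": [315864.0, 317157.0, 317712.0],
})

# Joining on the shared column mirrors joining the two sub-selects on ?time
joined = restaurants_part.merge(population_part, on="time", how="inner")
print(joined)
```

The 1936 population row has no partner and is dropped, just as an unmatched `?time` would be in the SPARQL join.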

The query for the number of inhabitants and restaurants over time is:

In [21]:
query = """
SELECT *
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
    {
    SELECT ?time (SUM(?ast) AS ?restaurants)
    WHERE {
      ssz:AST-BTA a cube:Cube;
                    cube:observationSet/cube:observation ?obs_rest.   
      ?obs_rest property:TIME ?time ;     
           property:RAUM <https://ld.stadt-zuerich.ch/statistics/code/R30000> ;
           property:BTA <https://ld.stadt-zuerich.ch/statistics/code/BTA5000> ;
           measure:AST ?ast . 
    }
    GROUP BY ?time
  }
  {
    SELECT ?time ?pop
    WHERE {
      ssz:BEW a cube:Cube;
                    cube:observationSet/cube:observation ?obs_pop.   
      ?obs_pop property:TIME ?time ;     
           property:RAUM <https://ld.stadt-zuerich.ch/statistics/code/R30000>;
           measure:BEW ?pop
    }
  }  
}
ORDER BY ?time
"""

df = sparql.send_query(query)
df.head()

|   | time       | restaurants | pop      |
|---|------------|-------------|----------|
| 0 | 1934-12-31 | 1328.0      | 315864.0 |
| 1 | 1935-12-31 | 1327.0      | 317157.0 |
| 2 | 1936-12-31 | 1321.0      | 317712.0 |
| 3 | 1937-12-31 | 1321.0      | 318926.0 |
| 4 | 1938-12-31 | 1334.0      | 326979.0 |


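Let's calculate the number of restaurants per 10 000 inhabitants. Missing values are forward-filled first, so a year without data takes the value of the previous year.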

In [22]:
df = df.ffill()  # fill gaps with the previous year's value
df["Restaurants per 10 000 inhabitants"] = df["restaurants"]/df["pop"]*10000
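A quick check of the two steps on a toy frame: the first row uses the 1934 values from the output above, and the missing 1935 restaurant count is made up to show how forward-filling behaves. Each rate is `restaurants / pop * 10 000`.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "time": ["1934-12-31", "1935-12-31"],
    "restaurants": [1328.0, np.nan],  # NaN is made up to demonstrate the fill
    "pop": [315864.0, 317157.0],
})
toy = toy.ffill()  # 1935 inherits the 1934 restaurant count
toy["per_10k"] = toy["restaurants"] / toy["pop"] * 10_000
print(toy["per_10k"].round(2).tolist())  # → [42.04, 41.87]
```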

In [23]:
fig = px.line(df, x="time", y = "Restaurants per 10 000 inhabitants")
fig.update_layout(title_text='Restaurants in Zürich over time', title_x=0.5)
fig.show()

## Restaurants in city quartiers 

Let's find the number of restaurants in different parts of the city. The restaurant data is available in the `AST-BTA` data cube. To place the quartiers on the map, we need their geographic coordinates, which are available in `Wikidata`. We will get the number of restaurants per quartier from our endpoint, and each quartier's centroid from `Wikidata`.

Both pieces of information can be obtained with a single SPARQL [federated query](https://www.w3.org/TR/sparql11-federated-query/). The endpoint for Wikidata is `<https://query.wikidata.org/sparql>`.
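Wikidata returns coordinates (property P625) as WKT point literals such as `Point(8.50722 47.42111)`, with longitude first and latitude second. That order matters when plotting. A minimal stdlib sketch of parsing such a literal; `parse_wkt_point` is a hypothetical helper, and it also tolerates an optional leading CRS IRI as allowed by GeoSPARQL:

```python
import re

# Optional "<crs-iri>" prefix, then POINT(x y); case-insensitive
_WKT_POINT = re.compile(
    r"^\s*(?:<[^>]*>\s*)?POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$",
    re.IGNORECASE,
)


def parse_wkt_point(wkt: str) -> tuple:
    """Hypothetical helper: return (longitude, latitude) from a WKT point literal."""
    match = _WKT_POINT.match(wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))


lon, lat = parse_wkt_point("Point(8.50722 47.42111)")
```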

The query for the quartiers, their centroids, and the number of restaurants is:

In [24]:
# Restaurants per quartier in 2017, with coordinates from Wikidata
query = """
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>

SELECT ?place ?geometry (SUM(?ast) AS ?restaurants)
WHERE {
  ssz:AST-BTA a cube:Cube;
                  cube:observationSet/cube:observation ?obs.   
  ?obs property:TIME ?time ;     
       property:RAUM ?place_uri ;
       property:BTA <https://ld.stadt-zuerich.ch/statistics/code/BTA5000> ;
       measure:AST ?ast .
  
  ?place_uri skos:inScheme <https://ld.stadt-zuerich.ch/statistics/scheme/Quartier> ;
             schema:name ?place .

  ?place_uri schema:sameAs ?wikidata_id . 
  
  FILTER (?time = "2017-12-31"^^xsd:date)
  
  BIND(IRI(?wikidata_id ) AS ?wikidata_iri ) .
  
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidata_iri p:P625/ps:P625 ?geometry .
  }
}
GROUP BY ?place ?geometry ?time
"""

df = sparql.send_query(query)
df.head()

|   | place         | geometry                  | restaurants |
|---|---------------|---------------------------|-------------|
| 0 | Affoltern     | POINT (8.50722 47.42111)  | 35.0        |
| 1 | Lindenhof     | POINT (8.54120 47.37300)  | 65.0        |
| 2 | Hottingen     | POINT (8.56040 47.36810)  | 39.0        |
| 3 | Gewerbeschule | POINT (8.53174 47.38470)  | 121.0       |
| 4 | Altstetten    | POINT (8.48584 47.38753)  | 106.0       |


Let's classify the number of restaurants into 5 buckets. We will use the Natural Breaks classifier from the `mapclassify` library to assign each value in the `restaurants` column to one of five categories.

In [25]:
N_CATEGORIES = 5
df["text"] = df.place + "<br>Restaurants: " + df.restaurants.astype(int).astype(str)
classifier = mapclassify.NaturalBreaks(y=df["restaurants"], k=N_CATEGORIES)
df["rest_buckets"] = classifier.yb  # bucket index (0 to N_CATEGORIES - 1) for each quartier
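Natural breaks aim for classes whose values are as close to each other as possible, i.e. a small total within-class sum of squared deviations. The sketch below computes that optimum exactly with dynamic programming (the Fisher-Jenks formulation; `mapclassify` also offers it as `FisherJenks`, while `NaturalBreaks` may use a different, heuristic search and can return slightly different breaks). `fisher_jenks_bounds` and `assign_classes` are hypothetical helpers for illustration; the class assignment follows the "value ≤ upper bound" convention.

```python
from bisect import bisect_left


def fisher_jenks_bounds(values, k):
    """Upper bounds of k classes minimising the total within-class sum of squares."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")

    # Prefix sums give the sum of squared deviations of any slice xs[i:j] in O(1)
    s1, s2 = [0.0], [0.0]
    for x in xs:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def ssd(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total SSD when xs[:j] is split into c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    start = [[0] * (n + 1) for _ in range(k + 1)]  # index where the last class begins
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the recorded split points to recover the upper bounds
    bounds, j = [], n
    for c in range(k, 0, -1):
        bounds.append(xs[j - 1])
        j = start[c][j]
    return bounds[::-1]


def assign_classes(values, bounds):
    """Class index for each value: class i holds values <= bounds[i]."""
    return [bisect_left(bounds, v) for v in values]


# The five restaurant counts from the output above, split into two classes
print(fisher_jenks_bounds([35, 65, 39, 121, 106], k=2))  # → [65, 121]
```

The triple loop is O(k·n²), which is fine for a few dozen quartiers.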

The classified values can now be shown on a map, with one marker trace per category.

In [26]:
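norm = matplotlib.colors.Normalize(vmin=0, vmax=N_CATEGORIES)
colormap = matplotlib.cm.ScalarMappable(norm=norm, cmap=matplotlib.cm.viridis)
# Legend labels for each class interval (the leading underscore marks a private mapclassify helper)
labels = mapclassify.classifiers._get_mpl_labels(classifier, fmt="{:.0f}")

fig = go.Figure()

# One trace per category, so that every category gets its own legend entry
for bucket in range(N_CATEGORIES):
    subset = df[df.rest_buckets == bucket]
    fig.add_trace(go.Scattermapbox(
        mode="markers",
        lat=subset.geometry.y,  # WKT points store (lon, lat), so y is the latitude
        lon=subset.geometry.x,
        hovertext=subset.text,
        hoverinfo="text",
        name=labels[bucket],
        marker={
            # Marker area grows faster than linearly with the restaurant count
            "size": (subset.restaurants ** 1.5) * 0.6,
            "sizemode": "area",
            "sizemin": 4,
            # to_rgba returns floats in [0, 1]; to_hex turns them into a colour string plotly accepts
            "color": matplotlib.colors.to_hex(colormap.to_rgba(bucket + 1)),
        },
    ))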

fig.update_layout(
    margin={'l': 0, 't': 50, 'b': 0, 'r': 0},
    mapbox={
        'center': {"lat": 47.3815, "lon": 8.532},
        'style': "carto-darkmatter",
        'zoom': 11},
    showlegend=True,
    legend_title="Restaurants count",
    title_text='Restaurants in Zürich Quartiers', 
    title_x=0.5
)

fig.show()
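Matplotlib colormaps return RGBA channels as floats between 0 and 1, whereas CSS-style `rgba()` strings, which plotly accepts as marker colours, expect red, green, and blue as integers from 0 to 255 (alpha stays between 0 and 1). If you build such strings from matplotlib colours, the channels need rescaling. A minimal sketch with a hypothetical helper `to_plotly_rgba`; `matplotlib.colors.to_hex` is a ready-made alternative when transparency is not needed.

```python
def to_plotly_rgba(rgba: tuple) -> str:
    """Hypothetical helper: (r, g, b, a) floats in [0, 1] -> 'rgba(R, G, B, a)' with 0-255 channels."""
    r, g, b, a = rgba
    return "rgba({}, {}, {}, {})".format(round(r * 255), round(g * 255), round(b * 255), a)


print(to_plotly_rgba((1.0, 0.5, 0.0, 1.0)))  # → rgba(255, 128, 0, 1.0)
```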

## After-school care: gender-representation

Let's take a look at gender representation in the public sector. The `BES-BTA-SEX` data cube contains the number of employees in different organizations, reported separately by sex and by establishment type. Let's find the number of male and female employees in after-school care (*Hort*).

The query for the number of female and male employees in after-school care over time is:

In [27]:
query = """
SELECT ?time ?employees ?sex
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
    ssz:BES-BTA-SEX a cube:Cube;
                cube:observationSet/cube:observation ?obs.   
    ?obs property:TIME ?time ;     
        property:RAUM/skos:inScheme <https://ld.stadt-zuerich.ch/statistics/scheme/Gemeinde> ;
        property:BTA/schema:name "Horte" ;
        property:SEX/schema:name ?sex ;
        measure:BES ?employees .
}
ORDER BY ?time
"""
df = sparql.send_query(query)
df.head()

|   | time       | employees | sex      |
|---|------------|-----------|----------|
| 0 | 1966-06-30 | 86.0      | weiblich |
| 1 | 1966-06-30 | 1.0       | männlich |
| 2 | 1967-06-30 | 1.0       | männlich |
| 3 | 1967-06-30 | 86.0      | weiblich |
| 4 | 1968-06-30 | 1.0       | männlich |


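Let's pivot the table so that each sex has its own column, and translate the column names to English (*weiblich* = female, *männlich* = male):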

In [28]:
df = pd.pivot_table(df, index="time", columns="sex", values="employees")
df = df.reset_index().rename_axis(None, axis=1)
df = df.rename(columns={"männlich": "male", "weiblich": "female"})
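`pd.pivot_table` averages any duplicate (time, sex) pairs by default (`aggfunc="mean"`), so duplicates in the query result would pass unnoticed. As a stricter alternative, `DataFrame.pivot` reshapes without aggregating and raises a `ValueError` on duplicates. A sketch on the first four rows of the query output above:

```python
import pandas as pd

# First four rows of the query output above
sample = pd.DataFrame({
    "time": ["1966-06-30", "1966-06-30", "1967-06-30", "1967-06-30"],
    "employees": [86.0, 1.0, 1.0, 86.0],
    "sex": ["weiblich", "männlich", "männlich", "weiblich"],
})

# pivot() does not aggregate: duplicate (time, sex) pairs would raise ValueError
wide = sample.pivot(index="time", columns="sex", values="employees")
wide = wide.reset_index().rename_axis(None, axis=1)
wide = wide.rename(columns={"männlich": "male", "weiblich": "female"})
print(wide)
```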

In [29]:
fig = px.histogram(df, x="time", y=["female", "male"], barnorm="percent", labels={"variable": "sex"})
fig.update_layout(
    title='After-school care: gender representation', 
    title_x=0.5,
    yaxis_title="% of employees"
)
fig.show()
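`barnorm="percent"` rescales each stacked bar so that its segments add up to 100 %. When each bar covers a single date, that is the female and male share of all after-school care employees. To get those shares as numbers rather than a chart, one can divide each row by its total; a minimal pandas sketch on the 1966 values from the query output above:

```python
import pandas as pd

# 1966 values from the query output above, in the wide layout
example_counts = pd.DataFrame(
    {"female": [86.0], "male": [1.0]},
    index=pd.Index(["1966-06-30"], name="time"),
)

# Divide every row by its total and express the result in percent
shares = example_counts.div(example_counts.sum(axis=1), axis=0) * 100
print(shares.round(2))
```

Here the female share comes out at about 98.85 %.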