# Economic activities in Zürich

[1. Restaurants over time](#Restaurants-over-time)  
[2. Restaurants in city quartiers](#Restaurants-in-city-quartiers)  
[3. After-school care: gender-representation](#After-school-care:-gender-representation)

In [34]:
import mapclassify
import matplotlib
import matplotlib.cm

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from graphly.api_client import SparqlClient

### SPARQL endpoint

Data on some economic activities is published as linked data. It can be accessed with [SPARQL queries](https://www.w3.org/TR/rdf-sparql-query/).   
You can send queries using HTTP requests. The API endpoint is **[https://ld.zazuko.com/query/](https://ld.zazuko.com/query/).**  
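
As a rough illustration of what such an HTTP request could look like, here is a minimal sketch using only the Python standard library. It assumes the endpoint follows the SPARQL 1.1 Protocol (query passed as the `query` URL parameter, JSON results requested through the `Accept` header). The helper names `build_sparql_request` and `run_sparql` are made up for this example.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://ld.zazuko.com/query/"


def build_sparql_request(query, endpoint=ENDPOINT):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        # Assumption: the endpoint can return SPARQL JSON results
        headers={"Accept": "application/sparql-results+json"},
    )


def run_sparql(query, endpoint=ENDPOINT):
    """Send the request and decode the JSON response (needs network access)."""
    request = build_sparql_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


request = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 3")
print(request.full_url)
# result = run_sparql("SELECT * WHERE { ?s ?p ?o } LIMIT 3")  # requires network
```

In practice, a client library takes care of these details, which is the approach taken in the rest of this notebook.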
  
  
Let's use `SparqlClient` from [graphly](https://github.com/zazuko/graphly) to communicate with the database. 
Graphly will allow us to:
* send SPARQL queries
* automatically add prefixes to all queries
* format responses as `pandas` or `geopandas` data frames

In [35]:
ENDPOINT = "https://ld.zazuko.com/query/"

sparql = SparqlClient(ENDPOINT)
sparql.add_prefixes({
    "schema": "<http://schema.org/>",
    "cube": "<https://cube.link/>",
    "property": "<https://ld.stadt-zuerich.ch/statistics/property/>",
    "measure": "<https://ld.stadt-zuerich.ch/statistics/measure/>",
    "skos": "<http://www.w3.org/2004/02/skos/core#>",
    "ssz": "<https://ld.stadt-zuerich.ch/statistics/>"
})

SPARQL queries can become very long. To improve readability, we will work with [prefixes](https://en.wikibooks.org/wiki/SPARQL/Prefixes).
 
Using the `add_prefixes` method, we define persistent prefixes. 
Every time you send a query, `graphly` will automatically add the prefixes for you.
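
Conceptually, adding prefixes amounts to prepending `PREFIX` declarations to the query text. The sketch below illustrates the idea with a hypothetical helper, `with_prefixes`; it is not graphly's actual implementation, only an approximation of the effect.

```python
def with_prefixes(query, prefixes):
    """Prepend one PREFIX declaration per entry to a SPARQL query string.

    Hypothetical helper for illustration; `prefixes` maps a prefix name to
    an IRI already wrapped in angle brackets, as in the cell above.
    """
    header = "\n".join(f"PREFIX {name}: {iri}" for name, iri in prefixes.items())
    return f"{header}\n{query}"


full_query = with_prefixes(
    "SELECT * WHERE { ?cube a cube:Cube } LIMIT 5",
    {
        "schema": "<http://schema.org/>",
        "cube": "<https://cube.link/>",
    },
)
print(full_query)
```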

## Restaurants over time

Let's find the number of restaurants in Zürich over time. This information is available in the `AST-BTA` data cube:

```sparql
ssz:AST-BTA a cube:Cube;
    cube:observationSet/cube:observation ?obs_rest. 
```

We can access the time, place, and company type using:
* `property:TIME` for time
* `property:RAUM` for place (Raum)
* `property:BTA` for company type (Betriebsart)

The observed count can be accessed using `measure:AST`.
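
To get a feel for these dimensions, one option is to look at a handful of raw observations first. The query below is a minimal sketch that only combines the predicates listed above; the variable names (`?place`, `?type`, `?count`) and the name `preview_query` are chosen for this example, and the query is not executed here.

```python
# Sketch: list a few raw observations from the AST-BTA cube.
# Assumes the prefixes registered with `add_prefixes` are in place.
preview_query = """
SELECT ?time ?place ?type ?count
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
  ssz:AST-BTA a cube:Cube ;
      cube:observationSet/cube:observation ?obs .
  ?obs property:TIME ?time ;
       property:RAUM ?place ;
       property:BTA ?type ;
       measure:AST ?count .
}
LIMIT 5
"""

# With the client defined earlier, this could be sent as:
# preview = sparql.send_query(preview_query)
print(preview_query)
```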

In [36]:
# Number of restaurants and inhabitants over time
query = """
SELECT *
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
    {
    SELECT ?time (SUM(?ast) AS ?restaurants)
    WHERE {
      ssz:AST-BTA a cube:Cube;
                    cube:observationSet/cube:observation ?obs_rest.   
      ?obs_rest property:TIME ?time ;     
           property:RAUM <https://ld.stadt-zuerich.ch/statistics/code/R30000> ;
           property:BTA <https://ld.stadt-zuerich.ch/statistics/code/BTA5000> ;
           measure:AST ?ast . 
    }
     GROUP BY ?time
  }
  {
    SELECT ?time ?pop
    WHERE {
      ssz:BEW a cube:Cube;
                    cube:observationSet/cube:observation ?obs_pop.   
      ?obs_pop property:TIME ?time ;     
           property:RAUM <https://ld.stadt-zuerich.ch/statistics/code/R30000>;
           measure:BEW ?pop
    }
  }  
}
ORDER BY ?time
"""

df = sparql.send_query(query)
df.head()

| | time | restaurants | pop |
|---|---|---|---|
| 0 | 1934-12-31 | 1328.0 | 315864.0 |
| 1 | 1935-12-31 | 1327.0 | 317157.0 |
| 2 | 1936-12-31 | 1321.0 | 317712.0 |
| 3 | 1937-12-31 | 1321.0 | 318926.0 |
| 4 | 1938-12-31 | 1334.0 | 326979.0 |


In [37]:
# Forward-fill missing values; fillna(method="ffill") is deprecated in recent pandas
df = df.ffill()
df["Restaurants per 10 000 inhabitants"] = df["restaurants"]/df["pop"]*10000
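
The effect of this cell can be checked on a small example. The sketch below uses made-up numbers (not taken from the dataset) to show how forward-filling carries the last known value into a gap before the ratio is computed.

```python
import pandas as pd

# Toy data with invented values; the second restaurant count is missing.
toy = pd.DataFrame({
    "time": pd.to_datetime(["2000-12-31", "2001-12-31", "2002-12-31"]),
    "restaurants": [1500.0, None, 1600.0],
    "pop": [300000.0, 310000.0, 320000.0],
})

# Forward-fill: the gap in 2001 takes the 2000 value (1500.0).
toy = toy.ffill()

# Restaurants per 10 000 inhabitants.
toy["per_10k"] = toy["restaurants"] / toy["pop"] * 10_000
print(toy)
```

Note that forward-filling assumes the count did not change during the gap, which is an approximation.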

In [38]:
fig = px.line(df, x="time", y = "Restaurants per 10 000 inhabitants")
fig.update_layout(title_text='Restaurants in Zürich over time', title_x=0.5)

## Restaurants in city quartiers 

In [45]:
# Restaurants per city quarter (Quartier) in 2017, with point coordinates from Wikidata
query = """
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>

SELECT ?place ?geometry (SUM(?ast) AS ?restaurants)
WHERE {
  ssz:AST-BTA a cube:Cube;
                  cube:observationSet/cube:observation ?obs.   
  ?obs property:TIME ?time ;     
       property:RAUM ?place_uri ;
       property:BTA <https://ld.stadt-zuerich.ch/statistics/code/BTA5000> ;
                   measure:AST ?ast .
  
  ?place_uri skos:inScheme <https://ld.stadt-zuerich.ch/statistics/scheme/Quartier> ;
             schema:name ?place .

  ?place_uri schema:sameAs ?wikidata_id . 
  
  FILTER (?time = "2017-12-31"^^xsd:date)
  
  BIND(IRI(?wikidata_id ) AS ?wikidata_iri ) .
  
  SERVICE <https://query.wikidata.org/sparql> {
    ?wikidata_iri p:P625/ps:P625 ?geometry .
  }
}
GROUP BY ?place ?geometry ?time
"""

df = sparql.send_query(query)
df.head()

| | place | geometry | restaurants |
|---|---|---|---|
| 0 | Affoltern | POINT (8.50722 47.42111) | 35.0 |
| 1 | Lindenhof | POINT (8.54120 47.37300) | 65.0 |
| 2 | Hottingen | POINT (8.56040 47.36810) | 39.0 |
| 3 | Gewerbeschule | POINT (8.53174 47.38470) | 121.0 |
| 4 | Altstetten | POINT (8.48584 47.38753) | 106.0 |


In [46]:
N_CATEGORIES = 5
df["text"] = df.place + "<br>Restaurants: " + df.restaurants.astype(int).astype(str)
classifier = mapclassify.NaturalBreaks(y=df["restaurants"], k=N_CATEGORIES)
df["rest_buckets"] = df[["restaurants"]].apply(classifier) 
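
Once the class boundaries are known, assigning a value to a bucket is a simple lookup: each value goes into the first class whose upper bound is at least as large as the value. The sketch below imitates that step with `numpy.digitize` and made-up bin edges standing in for `classifier.bins`; how `NaturalBreaks` chooses the edges from the data is not reproduced here.

```python
import numpy as np

# Invented upper bounds for three classes (stand-in for classifier.bins).
upper_bounds = np.array([40, 80, 130])

# Restaurant counts taken from the first rows of the table above.
restaurants = np.array([35, 65, 39, 121, 106])

# right=True means a value equal to an upper bound stays in that class.
buckets = np.digitize(restaurants, upper_bounds, right=True)
print(buckets)
```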

In [51]:
norm = matplotlib.colors.Normalize(vmin=0, vmax=N_CATEGORIES)
colormap = matplotlib.cm.ScalarMappable(norm=norm, cmap=matplotlib.cm.viridis)
labels = mapclassify.classifiers._get_mpl_labels(classifier, fmt="{:.0f}")

fig = go.Figure()

for bucket in range(N_CATEGORIES):

    subset = df[df.rest_buckets == bucket]
    fig.add_trace(go.Scattermapbox(
        mode="markers",
        lat=subset.geometry.y,
        lon=subset.geometry.x,
        hovertext = subset.text,
        hoverinfo = "text",
        name=labels[bucket],
        marker={'size': ((subset.restaurants)**1.5)*0.6, "sizemode": "area", "sizemin": 4, "color": "rgba{}".format(colormap.to_rgba(bucket+1))}, 
    ))

fig.update_layout(
    margin={'l': 0, 't': 50, 'b': 0, 'r': 0},
    mapbox={
        'center': {"lat": 47.3815, "lon": 8.532},
        'style': "carto-darkmatter",
        'zoom': 11},
    showlegend=True,
    legend_title="Number of <br>restaurants",
    title_text='Restaurants in Zürich Quartiers', 
    title_x=0.5
)

fig.show()
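
A note on the marker colors: `to_rgba` returns channel values as floats between 0 and 1, whereas CSS-style `rgba(...)` strings conventionally use 0 to 255 for the color channels. If the colors on the map look darker than expected, converting each color to a hex string is one possible alternative. The sketch below shows that conversion; the name `bucket_colors` is introduced for this example only.

```python
import matplotlib
import matplotlib.cm
import matplotlib.colors

N_CATEGORIES = 5

norm = matplotlib.colors.Normalize(vmin=0, vmax=N_CATEGORIES)
colormap = matplotlib.cm.ScalarMappable(norm=norm, cmap="viridis")

# One hex color string ("#rrggbb") per bucket, same bucket + 1 offset as above.
bucket_colors = [
    matplotlib.colors.to_hex(colormap.to_rgba(bucket + 1))
    for bucket in range(N_CATEGORIES)
]
print(bucket_colors)
```

Each entry could then be passed as the marker `"color"` for the corresponding trace.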

## After-school care: gender-representation

In [42]:
query = """
SELECT ?time ?employees ?sex
FROM <https://lindas.admin.ch/stadtzuerich/stat>
WHERE {
    ssz:BES-BTA-SEX a cube:Cube;
                cube:observationSet/cube:observation ?obs.   
    ?obs property:TIME ?time ;     
        property:RAUM/skos:inScheme <https://ld.stadt-zuerich.ch/statistics/scheme/Gemeinde> ;
        property:BTA/schema:name "Horte" ;
        property:SEX/schema:name ?sex ;
        measure:BES ?employees .
}
ORDER BY ?time
"""
df = sparql.send_query(query)
df.head()

| | time | employees | sex |
|---|---|---|---|
| 0 | 1966-06-30 | 86.0 | weiblich |
| 1 | 1966-06-30 | 1.0 | männlich |
| 2 | 1967-06-30 | 1.0 | männlich |
| 3 | 1967-06-30 | 86.0 | weiblich |
| 4 | 1968-06-30 | 1.0 | männlich |


In [43]:
df = pd.pivot_table(df, index="time", columns="sex", values="employees")
df = df.reset_index().rename_axis(None, axis=1)
df = df.rename(columns={"männlich": "male", "weiblich": "female"})
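
The reshaping above turns the long table (one row per date and sex) into a wide one (one row per date, one column per sex). Here is a small self-contained sketch of the same three steps, using the first four rows of the output shown earlier; the names `long_df` and `wide_df` are chosen for this example.

```python
import pandas as pd

# First four rows of the query result shown above.
long_df = pd.DataFrame({
    "time": ["1966-06-30", "1966-06-30", "1967-06-30", "1967-06-30"],
    "employees": [86.0, 1.0, 1.0, 86.0],
    "sex": ["weiblich", "männlich", "männlich", "weiblich"],
})

# Long to wide: one column per value of "sex".
wide_df = pd.pivot_table(long_df, index="time", columns="sex", values="employees")

# Turn the index back into a column and drop the "sex" label on the column axis.
wide_df = wide_df.reset_index().rename_axis(None, axis=1)

# Translate the German category names.
wide_df = wide_df.rename(columns={"männlich": "male", "weiblich": "female"})
print(wide_df)
```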

In [33]:
# Stack both groups per date and normalize each bar to 100 %
fig = px.histogram(df, x="time", y=["female", "male"], barnorm="percent")
fig.update_layout(
    title='After-school care: gender representation', 
    title_x=0.5,
    yaxis_title="% of employees"
)
fig.show()
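
With `barnorm="percent"`, each bar segment is divided by the total at that date and multiplied by 100. The same shares can be computed directly in pandas, which is a handy cross-check. The sketch below does this for the 1966 and 1967 values shown earlier; the column names `female_pct` and `male_pct` are introduced for this example.

```python
import pandas as pd

# Values for 1966 and 1967 from the query output above.
shares = pd.DataFrame({
    "time": ["1966-06-30", "1967-06-30"],
    "male": [1.0, 1.0],
    "female": [86.0, 86.0],
})

# Share of each group in the total number of employees, in percent.
total = shares["male"] + shares["female"]
shares["female_pct"] = shares["female"] / total * 100
shares["male_pct"] = shares["male"] / total * 100
print(shares.round(2))
```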