# Supermarket Accessibility in Graz


## Introduction

In this exercise, we will perform a GIS analysis using Python to examine supermarket accessibility in Graz, Austria. We’ll replicate an analysis similar to a previous study of distances to the nearest Migros supermarkets in Switzerland, but this time focusing on two major Austrian chains: **Spar** and **Billa**. 

📝 **LinkedIn post:** [Arthur A. 2024](https://www.linkedin.com/posts/carbonateturi_you-are-on-average-only-78-km-away-from-activity-7234914074541633536-m11_?utm_source=share&utm_medium=member_desktop&rcm=ACoAAArGF6IBIaCC_rrWE714nEvgGuEgcd1Buy0)

### 1️⃣ Getting Started with GitHub and Setup

First, let's get the materials for this exercise. The exercise is available on GitHub at https://github.com/thibaud-c/GST.200UB. You can **clone** the repository with Git:

In your terminal:

>```shell
>git clone https://github.com/thibaud-c/GST.200UB.git
>```

💡 _Before executing the `git clone` command, make sure you are in the right folder_!

#### 🔄 Get updates of the exercise from GitHub

###### Check the **status** of the repo and see what you changed:
```sh
git status
```

###### **Pull** new changes from the origin (GitHub)
```sh
git pull
```

If you made local changes, you might get an error:
```sh
error: Your local changes to the following files would be overwritten by merge:
    notebook.ipynb
Please commit your changes or stash them before you merge.
```

🤔 We need to either **stash** or **commit** our changes.

###### Temporarily save your changes with **stash**
```sh
git stash
git status
```

> ℹ️ Save your local modifications in a hidden “stash” pile.
> Revert your working directory to the last committed state.

After `git status` you should now see “nothing to commit”.

###### **Pull** the latest updates
```sh
git pull
```

✌️ It worked!

###### Bring your changes back
```sh
git stash pop
```

###### 😱 Deal with any conflicts
If you and I edited the same cell in the same notebook, Git might show:
```sh
CONFLICT (content): Merge conflict in notebook.ipynb
...
...
both modified: ...
```

Open VS Code and go to the **Source Control** tab on the left. The conflicting file is listed under the merge changes with a ❗; click on it to resolve the conflict.

Find all highlighted sections; you should see something like:
```diff
<<<<<<< Updated upstream
-    change from origin
=======
+    your changes
>>>>>>> Stashed changes
```
Keep one of the two versions (or combine them) and delete the marker lines; the result should look like:

```sh
    change you need
```

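Check whether there is any other highlighted text. If not, click on `Resolve in Merge Editor` and then `Complete Merge`, `complete with conflict` and `Merge with conflict`. Because `git stash pop` stopped on a conflict, Git keeps your stash entry; once everything is resolved you can delete it with `git stash drop`.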

Now you are good to go 🎉

---

#### 😱 If you have an error in Python: 
1. **Read it!** What is it about? Do you understand it?
2. Discuss it with your neighbor: do they have an idea how to solve the issue?
3. Search Google to see if someone had a similar error (include your Python version).
4. (Optional) Ask ChatGPT with a **precise** prompt; do not blindly copy-paste your error!
5. Raise your hand to ask for help.

---

**Environment Setup**: This notebook uses several Python libraries that you may need to install:

- `osmnx` ([doc](https://osmnx.readthedocs.io)) for retrieving OpenStreetMap data,
- `geopandas` ([doc](https://geopandas.org)) for geospatial data handling,
- `shapely` ([doc](https://shapely.readthedocs.io)) for geometric operations,
- `keplergl` ([doc](https://github.com/keplergl/kepler.gl/blob/master/README.md)) for interactive map visualization,

plus standard libraries such as `pandas` and `matplotlib`.

You are probably used to installing Python libraries with `pip` or `conda`. These tools are great, but they can be very slow...

In this course we will use [uv](https://github.com/astral-sh/uv), which its developers advertise as 10 to 100 times faster than `pip`! ⚡

In [None]:
# the ! prefix runs a shell command from the notebook
!pip install uv 

Now, you can install any missing packages at lightning speed by running a cell with `!uv pip install` as shown below.

In [None]:
# Install required libraries (run this if needed)
!uv pip install osmnx geopandas shapely keplergl matplotlib

### 2️⃣ Principles of Good Coding Practice

Before diving into code, let's briefly introduce a few software engineering principles that will help keep our code clean, understandable, and maintainable:

- **DRY** – Don't Repeat Yourself: Avoid writing duplicate code. If you need the same logic in multiple places, consider defining a function or loop to reuse it. This makes code easier to update and less error-prone.
- **KISS** – Keep It Simple, Stupid: Strive for simplicity. Write code in a straightforward way rather than using overly complex or clever solutions. Simple code is easier to understand and debug.
- **SOC** – Separation of Concerns: Organize your code so that different functionality or concerns are separated. For example, data retrieval, data processing, and visualization could be in separate sections or functions. This makes the code modular and easier to manage.

Throughout the exercise, keep these principles in mind. We'll try to write code that is concise and clear, without unnecessary repetition, and logically structured in steps.
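As a minimal illustration of DRY (a sketch, not code you need for the exercise), the snippet below computes the area of a single, isolated circular buffer for each distance band used later in the notebook, with one helper function and one loop instead of four copy-pasted lines. The function name `circle_area_m2` is hypothetical, and the areas ignore overlaps between stores and clipping to the city boundary.

```python
import math

DISTS: list[int] = [250, 500, 1000, 2500]  # buffer radii in meters


def circle_area_m2(radius_m: float) -> float:
    """Area (m²) of a single circular buffer with the given radius in meters."""
    return math.pi * radius_m ** 2


# Repetitive version (avoid): one line per distance, easy to get out of sync
# area_250 = math.pi * 250 ** 2
# area_500 = math.pi * 500 ** 2
# ...

# DRY version: the logic lives in one function, applied to every distance
areas: dict[int, float] = {d: circle_area_m2(d) for d in DISTS}

for d, a in areas.items():
    print(f"{d:>5} m buffer: {a / 1_000_000:.3f} km²")
```

Changing the list of distances now only requires editing `DISTS`. Note also that doubling the radius quadruples the area, which is worth keeping in mind for the area-weighted average in the stretch goal.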

This notebook retrieves OpenStreetMap data with `osmnx`, processes it with `geopandas`, and renders an interactive Kepler.gl map. Distance bands use lighter blues for nearby coverage and progressively darker tones for more remote areas.


### 3️⃣ Data Acquisition with OSMnx

🎯 **Goal:** Turn a place name into a boundary polygon and project to an Austrian metric CRS for distance work.

📚 **Read:** In this section, we'll use the OSMnx library to fetch real-time data from OpenStreetMap (OSM). 

OSMnx ▸ *Geocoding & boundaries* → `geocode_to_gdf`  
https://osmnx.readthedocs.io/en/stable/osmnx.html#osmnx.geocoder.geocode_to_gdf

💡 **Information:**

> By default, OSMnx returns data in a latitude-longitude CRS (EPSG:4326). For distance calculations (like creating buffers of a certain radius in meters), it's better to work in a projected CRS where units are in meters.
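To see why degrees are a poor unit for buffers, the sketch below compares the ground length of one degree of latitude and of longitude at Graz. It assumes a spherical Earth (real projections such as EPSG:31256 use an ellipsoid) and takes Graz's latitude as roughly 47.07° N; the helper `meters_per_degree` is hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def meters_per_degree(lat_deg: float) -> tuple[float, float]:
    """Approximate ground length (m) of 1° of latitude and 1° of longitude at a given latitude."""
    one_degree_m = math.pi * EARTH_RADIUS_M / 180  # ~111 km along a meridian
    # meridians converge toward the poles, so 1° of longitude shrinks with cos(latitude)
    return one_degree_m, one_degree_m * math.cos(math.radians(lat_deg))


LAT_GRAZ = 47.07  # approximate latitude of Graz (assumption)
per_deg_lat, per_deg_lon = meters_per_degree(LAT_GRAZ)
print(f"1° of latitude  ≈ {per_deg_lat / 1000:.1f} km")
print(f"1° of longitude ≈ {per_deg_lon / 1000:.1f} km")

# A "0.01° buffer" in EPSG:4326 is therefore an ellipse on the ground, not a circle:
print(f"0.01° ≈ {0.01 * per_deg_lon:.0f} m east-west but {0.01 * per_deg_lat:.0f} m north-south")
```

In a projected CRS such as EPSG:31256 the coordinates are already in meters, so `buffer(500)` means roughly 500 m in every direction (up to the projection's small scale distortion).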

🚀 **Tasks:**
- Define a place name
- Get the boundary of the place
- Reproject the boundary to a suitable metric (projected) CRS

🧠 **Reflect:** 
- Why is a projected CRS essential for buffers?
- Which CRS would you pick for Austria? Why?


In [None]:
# import osmnx
import osmnx as ox

In [None]:
# define the place name
place_name:str = "Graz, Austria"

In [None]:
# get the boundary polygon of Graz


# reproject to an Austrian metric CRS: EPSG:31256 (MGI / Austria GK East)


👀 **Inspect your results**
- What did you download?
- What data type is it?
- What does it look like?

You can use the function `.plot()` to visualize your data.

In [None]:
# fill the gap
___.plot(color='grey') # quick look of the data in grey

### 4️⃣ Fetch supermarkets from OpenStreetMap

🎯 **Goal:** Retrieve all features tagged supermarket within Graz.

📚 **Read:** OSMnx ▸ *POI & features* → `features_from_place`  
https://osmnx.readthedocs.io/en/stable/osmnx.html#osmnx.features.features_from_place

💡 **Information:**
OSM classifies features by tags (key–value pairs). 
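The OSMnx documentation for `features_from_place` describes `tags` as a dict whose values can be `True` (any value for that key), a string (one value) or a list of strings (any of those values), and states that results are the union of the tags. The plain-Python sketch below mimics those matching rules on made-up elements, using bakeries and pharmacies so the supermarket query stays yours to write. `matches_any_tag` is a hypothetical helper, not OSMnx code (OSMnx turns the tags into an Overpass API query).

```python
from typing import Union

TagQuery = dict[str, Union[bool, str, list[str]]]


def matches_any_tag(element_tags: dict[str, str], query: TagQuery) -> bool:
    """True if the element matches at least one key/value rule of the query (union semantics)."""
    for key, wanted in query.items():
        if key not in element_tags:
            continue  # the element does not carry this key at all
        value = element_tags[key]
        if wanted is True:  # True: any value of this key matches
            return True
        if isinstance(wanted, str) and value == wanted:  # string: exact value
            return True
        if isinstance(wanted, list) and value in wanted:  # list: any of the listed values
            return True
    return False


# Made-up OSM elements (tag dictionaries) for illustration only
bakery = {"shop": "bakery", "name": "Example Bakery"}
pharmacy = {"amenity": "pharmacy"}
cafe = {"amenity": "cafe"}

query: TagQuery = {"shop": ["bakery", "butcher"], "amenity": "pharmacy"}
for element in (bakery, pharmacy, cafe):
    print(element, matches_any_tag(element, query))
```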

🚀 **Tasks:**
- Create `tags = {"__": "__"}`
- Call `features_from_place` with `place_name` and `tags`
- Observe the multi-index (`node`, `way`, `relation`).
- Which columns are present in your data? Which ones are relevant for the exercise?

🧠 **Reflect:**  Why might OSMnx return both "node" and "way" features for supermarkets?

In [None]:
tags = {"key": "value"} # define the tags to filter for supermarkets

# download all supermarkets in Graz & change the crs

# inspect the data
___.plot()

In [None]:
# check all columns
___.columns.tolist()

In [None]:
# keep only relevant columns
___ = ___[['col1', 'col2', 'col3']] # double brackets select a subset of columns (keep 'geometry' to stay a GeoDataFrame)

💡 
You can use an f-string to insert values inside `{}` directly into text.

In [None]:
print(f"Total supermarket features retrieved: {__}") 

### 5️⃣ Harmonize geometry: nodes vs ways

🎯 **Goal:** Work with **points** for all supermarkets.

📚 **Read:** Shapely ▸ `representative_point()`  
https://shapely.readthedocs.io/en/stable/manual.html#object.representative_point

🚀 **Tasks:**
- Extract `nodes` (already points)
- Extract `ways` (often polygons) and convert to `representative_point()`
- Compare the two point sets visually/numerically
- Concatenate back together into a single points GeoDataFrame

🧠 **Reflect:** What is a `representative_point()`? Why not use centroid?
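Shapely documents `representative_point()` as a cheaply computed point that is guaranteed to lie within the geometry; a centroid carries no such guarantee. The dependency-free sketch below computes the centroid of a made-up U-shaped footprint with the shoelace formula and tests it with a ray-casting point-in-polygon check. The helpers `polygon_centroid` and `point_in_polygon` are hypothetical illustrations, not Shapely's implementation.

```python
Point = tuple[float, float]

# Made-up U-shaped building footprint (projected coordinates in meters, counter-clockwise)
U_SHAPE: list[Point] = [(0, 0), (30, 0), (30, 30), (20, 30), (20, 10), (10, 10), (10, 30), (0, 30)]


def polygon_centroid(vertices: list[Point]) -> Point:
    """Centroid of a simple polygon via the shoelace formula."""
    area2 = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross  # accumulates twice the signed area
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3 * area2), cy / (3 * area2)


def point_in_polygon(pt: Point, vertices: list[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from pt crosses (odd = inside)."""
    x, y = pt
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through pt
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


c = polygon_centroid(U_SHAPE)
print(f"centroid = ({c[0]:.2f}, {c[1]:.2f}), inside footprint: {point_in_polygon(c, U_SHAPE)}")
```

The centroid lands in the empty gap between the two arms of the U. For distance bands of hundreds of meters the shift is small, but `representative_point()` at least keeps each store on its own footprint.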

In [None]:
# split the supermarkets by OSM element type (first level of the index: 'node' or 'way')
__n = ___.loc['__'].copy() # select all rows with this element type and copy them
__w = ___.loc['__'].copy() 

# transform polygons to points
__w['geometry'] = __w.geometry.representative_point()

- How many supermarkets does each of your variables contain?
- Which `brand` values are present in Graz? Do they overlap between the two variables? What do you notice for Billa or Spar supermarkets?
> you can use `.unique()` to get unique values

👀 Let's visualize the result. We will use [Matplotlib](https://matplotlib.org/stable/).

Which parameter changes the size of the points in __n and __w? Check the documentation!

In [None]:
import matplotlib.pyplot as plt  # pyplot provides subplots(), tight_layout() and show()

# Create a figure with 1 row and 2 columns to plot both data side by side
fig, axes = plt.subplots(1, 2, figsize=(18, 6))

# define a plot in the first column of the figure, change the color and size
__n.plot(ax=axes[0], color='blue', __=10)
axes[0].set_title('Your Nice Title 😎')

# define a plot in the second column of the figure, change the color and size
__w.plot(ax=axes[1], color='green', __=3)
axes[1].set_title('Your Nice Title 😎')

# Adjust layout to avoid overlap
plt.tight_layout()

# Display the plots
plt.show()

Do the datasets overlap? What is the drawback of Matplotlib for data exploration?

We will use [KeplerGL](https://docs.kepler.gl/docs/keplergl-jupyter).

In [None]:
from keplergl import KeplerGl

# create your map from the example in the documentation!
# you can add as many layers as you want with add_data(); they are displayed from the bottom to the top


Now we can concatenate `__n` and `__w`.

In [None]:
# we will use pandas to concatenate the two GeoDataFrames
import pandas as pd
import geopandas as gpd

# concatenate rows; keep the geometry column and CRS explicit
__ = __.GeoDataFrame(
    pd.concat([__n, __w], ignore_index=True), # concat() stacks the rows of both DataFrames; ignore_index=True builds a new 0..n-1 index
    geometry=__, # what is the geometry column name? 
    crs=__ # how to get the CRS of a GeoDataFrame?
)

In [None]:
# sanity check: the combined length should equal nodes + ways
print(len(__), len(__), len(__))

### 6️⃣ Filter by brand: Billa & Spar

🎯 **Goal:** Keep only known chains. The `brand` tag may be missing for some entries.

🚀 **Tasks:**
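- Drop rows where `brand` is missing.
- Create a GeoDataFrame per brand (e.g. `gdf_billa` and `gdf_spar`) using a case-insensitive string filter on `brand`; the cell below stores the brand you work on as `gdf_sup`, which the following sections use.
- Count each.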

🧠 **Reflect:**
- Which brand is more common in Graz in OSM?
- What other supermarket brands are present in Graz according to OpenStreetMap? What does that say about coverage?
- What does `str.contains('__', case=False, na=False)` accomplish? Why do we use `case=False` and `na=False`?
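`Series.str.contains` with `case=False` ignores upper/lower case, and `na=False` makes missing values count as "no match" instead of propagating NaN, which would break boolean indexing. The sketch below reproduces that logic in plain Python for a literal pattern, on a made-up list of brand values with `None` standing in for NaN; `contains_case_insensitive` is a hypothetical helper. Pandas treats the pattern as a regular expression by default, which makes no difference for a plain word like `billa`.

```python
from typing import Optional


def contains_case_insensitive(value: Optional[str], pattern: str) -> bool:
    """Plain-Python analogue of str.contains(pattern, case=False, na=False) for a literal pattern."""
    if value is None:
        return False  # na=False: a missing brand counts as "no match"
    return pattern.lower() in value.lower()  # case=False: compare in lower case


# Made-up brand values, not real OSM data
brands = ["BILLA", "Billa Plus", "SPAR", "Eurospar", None, "Hofer"]

billa = [b for b in brands if contains_case_insensitive(b, "billa")]
spar = [b for b in brands if contains_case_insensitive(b, "spar")]
print(billa)
print(spar)
```

Substring matching also catches sub-brands such as "Eurospar", which may or may not be what you want; inspect `.unique()` before trusting the filter.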

In [None]:
# drop rows with a missing value in a specific column and reset the index
__ = __.__(subset=[__]).reset_index(drop=True)

# filter your data on a specific column and reset the index
gdf_sup = __[__['__'].str.contains('__', case=False, na=False)].reset_index(drop=True)


### 7️⃣ Distance bands (buffers)

🎯 **Goal:** Create simple distance bands around stores, then dissolve overlaps to get coverage zones.

🚀 **Tasks:**
- Choose distances in meters, e.g.: `[250, 500, 1000, 2500]`
- For each brand, make buffers and `dissolve()` per distance.
- Keep the city boundary for context (clip buffers if you wish).
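Dissolving all r-meter buffers gives the set of locations whose straight-line distance to the nearest store is at most r (up to the polygon approximation of the circles). The sketch below uses that equivalence in plain Python: it computes the nearest-store distance for a few locations and assigns each to the smallest band that contains it. The store and location coordinates are made up, as if already in a projected CRS in meters, and the helpers `nearest_store_distance` and `distance_band` are hypothetical.

```python
import math
from bisect import bisect_left
from typing import Optional

DISTS: list[int] = [250, 500, 1000, 2500]  # band radii in meters, sorted ascending
Point = tuple[float, float]

# Made-up store locations in projected coordinates (meters)
stores: list[Point] = [(0, 0), (1200, 0)]


def nearest_store_distance(p: Point, stores: list[Point]) -> float:
    """Straight-line distance (m) from p to the closest store."""
    return min(math.dist(p, s) for s in stores)


def distance_band(dist_m: float, bands: list[int] = DISTS) -> Optional[int]:
    """Smallest band radius >= dist_m, or None if dist_m exceeds the largest band."""
    i = bisect_left(bands, dist_m)
    return bands[i] if i < len(bands) else None


for p in [(100, 0), (0, 400), (600, 700), (5000, 0)]:
    d = nearest_store_distance(p, stores)
    print(p, round(d), distance_band(d))
```

The dissolved-buffer approach in the notebook answers the same question geometrically, for every location in the city at once rather than for a few sample points.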



🧠 **Reflect:**
What buffer distances make sense for **walking** accessibility? 

In [None]:
DISTS:list[int] = [250, 500, 1000, 2500]  # in meters

In [None]:
gdf_sup_buf = gdf_sup.copy() # make a copy of your data to not alter the original one
for d in DISTS:
    # create buffers for each distance and add a new column to your data
    gdf_sup_buf[f'buffer_{__}m'] = gdf_sup_buf.geometry.__ # how to create a buffer of d meters around each point?

In [None]:
# dissolve the buffers of one distance into a single coverage zone

gdf_sup__ = gpd.GeoDataFrame(geometry=__).dissolve()
# repeat for each distance 

### 8️⃣ Interactive map with Kepler.gl

🎯 **Goal:** Explore your results

📚 **Read:** Kepler.gl Jupyter quickstart  
https://github.com/keplergl/kepler.gl/tree/master/bindings/kepler.gl-jupyter

🚀 **Tasks:**
Add layers for:
- City boundary
- Store points (Billa or Spar)
- The buffer layers (dissolved)

💡 **Information:** 
- You can toggle layers on/off in the Kepler layer control panel (on the right) to see the coverage of each buffer distance.
- By default, Kepler might assign random colors. You can click on a layer in the control panel to change its color or opacity. For example, you might color the smaller buffer a light blue and larger buffers progressively darker blue to mimic the idea that darker = more distant.
- Zoom and pan around Graz to see which areas are not covered by, say, the 250m or 500m buffers (these are areas more than that distance away from any store of your brand).


🧠 **Reflect:**
- What do you observe about the supermarket brand coverage in Graz? Are there parts of the city beyond 1000m (1 km) from the nearest supermarket?
- Compare your map with those of your colleagues who worked on a different brand. What differences do you observe?
- Which chain seems to have better coverage in Graz? Are there large gaps in one chain's coverage that are covered by the other?

In [None]:
kmap = KeplerGl(height=500)

# add your data to the map (don't forget the city boundary and the supermarket locations)

kmap

Following the DRY principle, which two techniques could you use to make your code cleaner? Implement one.

### 9️⃣ Stretch goal: Estimate the average distance to a supermarket in Graz

🎯 **Goal:** Calculate an area-weighted average using a representative distance (the mid-point of each ring).

🚀 **Tasks:**
1. Calculate the area of each dissolved buffer (clip it to the city boundary first)
2. Calculate the ring areas as differences of cumulative areas (e.g. A500 − A250).
3. Calculate the part of the city outside the largest buffer (`outside_area`)
4. Compute a weighted average of the rings (multiply each ring’s area by its mid-point distance); don't forget to add the `outside_area`, then divide by the city area
5. (Optional 🥊) Per-cell version: Instead of a single city number, overlay the rings with a 1 km² grid and compute the same area-weighted average per cell.

💡 **Information:**
- Using ring mid-points reduces the upward bias you’d get from weighting by the upper edge of each band.
- Pick a plausible representative distance (`OUT_MID`) for the `outside_area` beyond your largest radius.
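The stretch-goal estimate is $\bar{d} \approx \left(\sum_i A_i\, m_i + A_{\text{out}}\, m_{\text{out}}\right) / A_{\text{city}}$, where $A_i$ are the ring areas, $m_i$ their mid-point distances (`MID`) and $m_{\text{out}}$ is `OUT_MID`. The sketch below evaluates it on made-up cumulative areas (not Graz results) so you can check your own arithmetic; `area_weighted_distance` is a hypothetical helper, and the areas are in arbitrary but consistent units.

```python
MID: list[int] = [125, 375, 750, 1750]  # mid-points of each ring (m)
OUT_MID: int = 3000  # assumed representative distance beyond the largest ring (m)


def area_weighted_distance(cumulative_areas: list[float], city_area: float) -> float:
    """Area-weighted mean distance from cumulative (nested) buffer areas clipped to the city."""
    # ring areas: the first buffer, then differences of consecutive cumulative areas
    rings = [cumulative_areas[0]] + [
        outer - inner for inner, outer in zip(cumulative_areas, cumulative_areas[1:])
    ]
    outside_area = city_area - cumulative_areas[-1]  # part of the city beyond the largest buffer
    weighted = sum(a * m for a, m in zip(rings, MID)) + outside_area * OUT_MID
    return weighted / city_area


# Made-up example: cumulative covered areas for 250/500/1000/2500 m, city area 100 (same units)
avg = area_weighted_distance([10, 30, 60, 90], city_area=100)
print(f"{avg:.1f} m")
```

If your clipped buffers cover the whole city, `outside_area` is zero and `OUT_MID` drops out of the result.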


In [None]:
MID:list[int] = [125, 375, 750, 1750]  # mid-points of each ring
OUT_MID:int = 3000  # representative distance for areas beyond largest ring

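city_area = __.geometry.area.sum()  # area of the city boundary in m² (projected CRS)

# cumulative areas (m²) covered by each dissolved buffer, e.g. area_250 = gdf_sup_250.geometry.area.sum()
# clip the buffers to the city boundary first, otherwise outside_area can become negative
area_250, area_500, area_1000, area_2500 = __, __, __, __

ring_areas = [
    area_250,  # area within 0-250m
    area_500 - area_250,  # area within 250-500m
    __,  # area within 500-1000m
    __,  # area within 1000-2500m
]
outside_area = city_area - __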

💡 **Information:**
The function `zip()` pairs up the elements of two lists; it returns an iterator, so wrap it in `list()` to display the pairs.
> ```python
> t1 = [1, 2, 3, 4]
> t2 = ['a', 'b', 'c', 'd']
> list(zip(t1, t2))
> # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
> ```

In [None]:
# zip the MID values with ... 
zip_array = zip(MID, __)

# calculate the sum of the weighted distances
sum_weighted = sum(
    __*__ for m, a in zip_array # multiply each ring’s area by its mid-point distance (a generator expression: a compact for loop)
    ) + __ * __  # don't forget the outside area and its representative distance

# calculate the average distance
avg_dist_m = sum_weighted / __

print(f"Pseudo-average distance to nearest Spar/Billa (city-wide): {avg_dist_m:.0f} m")

In [None]:
# this line is to clear the output of the notebook, so that when you commit it, it is clean
!jupyter nbconvert --clear-output --inplace lab_01_ex.ipynb