# Spatial Modelling of Residential Land Values and Public Transport Accessibility in Prague
## Project synthesis (00 → 05)

This final notebook provides a compact, end-to-end summary of the project. It links together the full workflow—from data preparation and accessibility construction to exploratory spatial analysis and spatial modelling—and communicates key findings and limitations in a reader-friendly format.

The detailed computations, figures, and intermediate outputs are documented in the individual notebooks:
00_Research_Context → 01_Data_Preprocessing → 02_Accessibility_Distances → 03_Initial_Visual_Exploration → 04_Spatial_Autocorrelation → 05_Spatial_Modelling.

---

## Research question and motivation

The project is motivated by a practical real-estate logic: listings frequently emphasize proximity to public transport (e.g., “3 minutes to tram”, “5 minutes to metro”), suggesting that accessibility is capitalized into land values. Prague is a suitable case study due to its dense, hierarchical public transport network and the availability of a polygon-based land price map.

**Core question:**  
How does walk-scale public transport accessibility (distance to nearest stop by mode) relate to residential land values across Prague, and how spatially heterogeneous is this relationship?

---

## End-to-end workflow (what was built)

The pipeline is structured as a clean, reproducible sequence:

1) Define study extent (Prague boundary), align spatial layers, and prepare price polygons.  
2) Construct a defensible “residential” subset using land-use intersection (area-weighted filtering).  
3) Compute accessibility measures as Euclidean distance to nearest public transport stop by mode (metro / tram / bus / rail), including nearest stop IDs.  
4) Perform exploratory analysis to understand distributions, scaling, and spatial structure.  
5) Quantify spatial autocorrelation globally (Moran’s I) and locally (LISA).  
6) Fit and compare models: OLS (benchmark) → GWR (local coefficients) → MGWR (multiscale local coefficients), including residual diagnostics and interpretation of coefficient surfaces.

The key methodological theme is that spatial structure is not treated as an afterthought but as a first-class property that guides model choice.

---

## 00 — Research context (problem framing)

This notebook defines the urban-economic logic behind the project and clarifies what the analysis is (and is not). The focus is on residential land values and walk-scale accessibility, not on travel times, flows, or timetable-based network modelling. It also documents the main spatial data sources: land price polygons, the land-use plan used for residential filtering, public transport stops and routes by mode, and Prague administrative boundaries.

**Key outcome:**  
A clear and defensible scope that supports a portfolio-grade workflow and a coherent modelling narrative.

---

## 01 — Data preprocessing and residential filtering

This notebook constructs the spatial foundation of the analysis:

- The Prague boundary is derived and used to clip / filter all layers consistently.  
- Public transport data are filtered to Prague and separated by mode (metro / tram / bus / rail).  
- The land price map is prepared and cleaned for analysis.  
- A crucial methodological pivot is the residential filter: instead of assuming that each price polygon is purely residential, the land-use plan is intersected with price polygons and the residential share is computed (area-weighted approach). A threshold is applied to define a residential (or mixed-residential) subset, explicitly acknowledging an “uncertain band”.
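
The area-weighted filter described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the input layout, and the two thresholds (0.4 and 0.6) are hypothetical, and the residential overlap areas are assumed to have been computed beforehand by a GIS overlay of the land-use plan with the price polygons.

```python
def residential_share(polygon_area, residential_overlap_areas):
    """Share (0..1) of a price polygon covered by residential land use."""
    if polygon_area <= 0:
        raise ValueError("polygon_area must be positive")
    share = sum(residential_overlap_areas) / polygon_area
    # Clamp to [0, 1] to absorb small overlay / rounding artefacts.
    return min(max(share, 0.0), 1.0)


def classify(share, lower=0.4, upper=0.6):
    """Label a polygon by residential share.

    `lower` and `upper` are hypothetical thresholds; shares between them
    fall into the "uncertain" band.
    """
    if share >= upper:
        return "residential"
    if share >= lower:
        return "uncertain"
    return "non-residential"


# Made-up inputs: polygon id -> (polygon area, residential overlap areas), in m².
polygons = {
    "A": (1000.0, [700.0, 150.0]),
    "B": (2000.0, [900.0]),
    "C": (500.0, []),
}

labels = {
    pid: classify(residential_share(area, overlaps))
    for pid, (area, overlaps) in polygons.items()
}
```

Keeping the share as a continuous attribute, and applying the threshold only at the end, makes it easy to rerun the later steps under a stricter or looser definition of "residential".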

**Key outcome:**  
A defensible residential dataset with transparent assumptions, which limits hidden land-use contamination of the results.

---

## 02 — Accessibility measures (distance to nearest stop by mode)

Accessibility is operationalized as the Euclidean distance from a representative point of each price polygon to the nearest public transport stop of each mode. This choice is deliberate: it is interpretable, computationally tractable, and aligned with the walk-access framing of the research question.

Distances are computed via 1-nearest neighbour search (k = 1), which also enables storing the nearest stop ID for QA and interpretation.
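
A minimal sketch of the k = 1 search, assuming coordinates in a projected CRS so that distances come out in metres. The function name, the stop IDs, and the coordinates are made up for illustration. The brute-force scan is used only to keep the example dependency-free; at city scale a spatial index (for example a KD-tree queried with k = 1) is the usual choice, and the source does not state which implementation the project uses.

```python
import math


def nearest_stop(point, stops):
    """Return (stop_id, distance) for the stop closest to `point`.

    `point` is an (x, y) tuple and `stops` maps stop_id -> (x, y); both are
    assumed to share one projected CRS, so distances are in metres.
    """
    if not stops:
        raise ValueError("stops must not be empty")
    stop_id = min(stops, key=lambda sid: math.dist(point, stops[sid]))
    return stop_id, math.dist(point, stops[stop_id])


# Made-up coordinates (metres) for one representative point and two modes.
representative_point = (300.0, 0.0)
stops_by_mode = {
    "metro": {"M1": (0.0, 0.0), "M2": (1000.0, 0.0)},
    "tram": {"T1": (300.0, 400.0), "T2": (-500.0, 0.0)},
}

# One (nearest stop ID, distance) pair per mode.
accessibility = {
    mode: nearest_stop(representative_point, stops)
    for mode, stops in stops_by_mode.items()
}
```

Returning the stop ID together with the distance is what makes spot checks possible: an implausible distance can be traced back to a specific stop.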

**Key outcome:**  
A single enriched GeoPackage containing residential price polygons and mode-specific accessibility measures (including nearest stop IDs), ready for EDA and modelling.

---

## 03 — Initial visual exploration (EDA)

This notebook establishes the empirical “shape” of the problem before formal spatial statistics:

- The land price distribution is heavily right-skewed, motivating a log transformation.  
- Spatial distribution of prices shows strong central concentration and secondary structures.  
- Area and accessibility variables display scale effects and heteroskedastic patterns.  
- Scatter plots suggest that the price–accessibility relationship is not globally uniform and likely differs across parts of the city.
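
The effect of the log transformation on a right-skewed variable can be illustrated with a small sketch. The price values are made up (they are not taken from the Prague data), and `skewness` is a helper defined here that computes the population moment coefficient of skewness.

```python
import math


def skewness(values):
    """Population skewness g1 = m3 / m2**1.5 (central moments m2, m3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up prices that double at each step: a long right tail on the raw scale.
prices = [1000, 2000, 4000, 8000, 16000, 32000, 64000]
log_prices = [math.log(p) for p in prices]

raw_skew = skewness(prices)      # clearly positive (right-skewed)
log_skew = skewness(log_prices)  # approximately zero (symmetric on the log scale)
```

On the log scale, equal differences correspond to equal ratios of price, which is also why coefficients in a log-price model read as approximate proportional effects.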

**Key outcome:**  
Visual evidence that a single global linear relationship is unlikely to hold and that spatial structure must be addressed explicitly.

---

## 04 — Spatial autocorrelation (Global Moran’s I + LISA)

This notebook formally tests and localizes spatial structure:

- Global Moran’s I confirms statistically significant positive spatial autocorrelation in land prices and in accessibility variables (especially metro / tram / rail).  
- Local Moran (LISA) identifies where clustering occurs (High–High, Low–Low) and where transitions exist (High–Low, Low–High), mapping spatial regimes that align with known urban structure and transport corridors.  
- Scatter plots coloured by LISA membership show systematic regime separation, suggesting spatial heterogeneity in marginal effects.
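
The two statistics can be sketched in plain Python on a toy example. Everything here is illustrative: the six units, their values, and the chain-shaped neighbour structure are made up, weights are row-standardised, and no significance testing is done (LISA cluster maps additionally rely on conditional permutation, which is omitted). The source does not state which library the project uses for these computations.

```python
def row_standardise(neighbours):
    """Turn {i: [neighbour ids]} into {i: {j: w_ij}} with each row summing to 1."""
    return {
        i: ({j: 1.0 / len(js) for j in js} if js else {})
        for i, js in neighbours.items()
    }


def deviations(values):
    """Deviations from the mean, keyed like `values`."""
    mean = sum(values.values()) / len(values)
    return {i: v - mean for i, v in values.items()}


def global_moran(values, weights):
    """Global Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    z = deviations(values)
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    return (len(values) / s0) * cross / sum(d * d for d in z.values())


def local_moran(values, weights):
    """Local Moran I_i = z_i * lag_i / m2, plus the Moran scatter-plot quadrant."""
    z = deviations(values)
    m2 = sum(d * d for d in z.values()) / len(values)
    result = {}
    for i in values:
        lag = sum(w * z[j] for j, w in weights.get(i, {}).items())
        quadrant = ("High" if z[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        result[i] = (z[i] * lag / m2, quadrant)
    return result


# Toy data: six units on a line, high values at one end and low at the other.
values = {0: 10.0, 1: 9.0, 2: 8.0, 3: 2.0, 4: 1.0, 5: 0.0}
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
weights = row_standardise(neighbours)

moran_i = global_moran(values, weights)  # 0.75 for this toy layout
lisa = local_moran(values, weights)      # e.g. lisa[0] is (1.2, "High-High")
```

With row-standardised weights the mean of the local statistics equals the global statistic, which is one way to read LISA: it shows where the global value comes from.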

**Key outcome:**  
Spatial dependence and spatial heterogeneity are intrinsic features of the dataset, motivating spatial modelling as the correct methodological next step.

---

## 05 — Spatial modelling (OLS → GWR → MGWR)

This notebook transitions from diagnostics to modelling.

OLS is fitted first as a benchmark. It confirms the existence of strong global relationships (notably for metro and tram), but explains only a limited share of variance and—crucially—produces spatially structured residuals. Moran’s I and LISA on OLS residuals demonstrate that OLS fails in spatially systematic ways, meaning the model is structurally misspecified rather than merely noisy.

GWR is then introduced to relax stationarity by allowing coefficients to vary across space. It improves fit and reveals meaningful spatial variation in effects, but remains constrained by a single bandwidth, which limits its ability to disentangle processes operating at different spatial scales.

MGWR is finally used to allow variable-specific bandwidths (multiscale effects). This provides the most flexible decomposition of spatial mechanisms and supports a coherent interpretation of hierarchical accessibility effects across the city.
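
The difference between the three specifications can be written compactly. These are the standard textbook forms, with notation introduced here for illustration rather than taken from the modelling notebook: $y_i$ is the (possibly transformed) land price of polygon $i$, $x_{ik}$ its $k$-th predictor, $(u_i, v_i)$ its coordinates, and $b$ a kernel bandwidth.

$$
\begin{aligned}
\text{OLS:}\quad & y_i = \beta_0 + \sum_{k} \beta_k\, x_{ik} + \varepsilon_i \\
\text{GWR:}\quad & y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \\
\text{MGWR:}\quad & y_i = \beta_0^{(b_0)}(u_i, v_i) + \sum_{k} \beta_k^{(b_k)}(u_i, v_i)\, x_{ik} + \varepsilon_i
\end{aligned}
$$

In GWR the local coefficients at location $i$ are obtained by weighted least squares,

$$
\hat{\boldsymbol{\beta}}(u_i, v_i) = \left(X^\top W_i X\right)^{-1} X^\top W_i\, \mathbf{y},
$$

where $W_i$ is a diagonal matrix of kernel weights that decay with distance from $i$ and share one bandwidth $b$ across all predictors. MGWR replaces that single $b$ with a separate bandwidth $b_k$ per term, so one predictor can vary over short distances while another stays almost constant across the city.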

---

## Model hierarchy and residual spatial dependence (core diagnostic result)

A central diagnostic test is whether spatial modelling absorbs spatial dependence rather than leaving it in residuals.

- OLS residuals exhibit strong positive spatial autocorrelation, confirming misspecification under stationarity.  
- MGWR residuals show a small negative Moran’s I; it is statistically significant, but with a large N even a near-zero statistic is detectable, so the magnitude matters more than the p-value. LISA reveals only fragmented, localized micro-clusters rather than coherent regimes.  
- This pattern indicates that the dominant spatial structure has been successfully decomposed into variable-specific effects, and remaining variation is plausibly attributable to micro-scale unobservables and noise rather than omitted global spatial processes.
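
A residual check of this kind can be sketched as a permutation test on Moran's I. The setup is entirely illustrative: the "residuals" are a made-up smooth trend along a chain of 20 units (mimicking spatially structured residuals), weights are row-standardised, and the function names, `n_perm`, and `seed` are choices made for this example rather than the project's settings.

```python
import random


def morans_i(values, neighbours):
    """Global Moran's I with row-standardised weights.

    `values` is a list and `neighbours[i]` lists the indices adjacent to unit i.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Each unit contributes z_i times the mean deviation of its neighbours.
    cross = sum(z[i] * sum(z[j] for j in js) / len(js)
                for i, js in enumerate(neighbours) if js)
    s0 = sum(1 for js in neighbours if js)  # every non-empty row sums to 1
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_test(values, neighbours, n_perm=999, seed=0):
    """Return (observed I, two-sided pseudo p-value) under random relabelling."""
    rng = random.Random(seed)
    expected = -1.0 / (len(values) - 1)  # E[I] under no spatial autocorrelation
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(morans_i(shuffled, neighbours) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Made-up residuals with a smooth trend along a chain of 20 units.
n_units = 20
chain = [[j for j in (i - 1, i + 1) if 0 <= j < n_units] for i in range(n_units)]
residuals = [i - 9.5 for i in range(n_units)]

observed_i, p_value = permutation_test(residuals, chain)
```

The pseudo p-value only says whether the observed statistic is unusual under random relabelling. Comparing the size of Moran's I between model residuals (OLS versus MGWR) is what indicates how much spatial structure a model has absorbed.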

**Key implication:**  
MGWR provides a spatially well-calibrated representation of the underlying urban price formation processes within the scope of the available predictors.

---

## What the coefficient surfaces reveal (identified spatial regimes)

The MGWR coefficient maps reveal a consistent hierarchy of transport capitalization mechanisms:

- **Land area:** stronger positive effects in peripheral / low-density zones, weaker effects in the inner city where location dominates price formation.  
- **Metro:** the most structured and robust accessibility premium, with strong negative distance effects forming coherent spatial gradients.  
- **Tram:** a context-dependent accessibility layer with mixed local regimes; strong negative effects in some urban fabrics, weaker or locally reversed effects elsewhere.  
- **Rail:** corridor-based and ambivalent; effects differ by local context (integration vs. nuisance / barrier effects).  
- **Bus:** the weakest and most fragmented surface; often functions as a background service rather than as universal value-generating infrastructure.

**Key takeaway:**  
Accessibility is not a single monotonic process but a layered system operating at multiple spatial scales.

---

## Final conclusion (chapter-level synthesis)

Across the full pipeline, the project shows that Prague land values exhibit strong spatial dependence and that transport accessibility effects are highly heterogeneous across space. Global regression provides a useful baseline but fails diagnostically due to spatially structured residuals. GWR improves interpretability by uncovering local variation, while MGWR provides the most coherent multiscale decomposition, supported by substantially reduced residual spatial dependence and interpretable coefficient regimes. The results are broadly consistent with urban economic intuition—especially the strong capitalization of metro (and often tram) accessibility—while bus and rail effects are more context-dependent and regime-specific. Overall, the analysis demonstrates that multiscale spatial modelling is necessary to disentangle overlapping accessibility mechanisms and communicate a realistic, policy-relevant urban story.

---

## Limitations

The project intentionally prioritizes interpretability and a transparent pipeline, which introduces limitations:

- Accessibility is proxied by Euclidean distance rather than network travel times or service frequency.  
- The analysis is cross-sectional and cannot identify causal effects or temporal dynamics.  
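- Some predictors are structurally correlated (e.g., metro and tram accessibility), reflecting the hierarchy of the transport system; such collinearity can make local coefficient estimates less stable and harder to attribute to a single mode.  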
- The model omits socio-economic and regulatory covariates (amenities, zoning constraints, school quality, noise exposure), which likely explain part of the remaining micro-scale residual variation.

These limitations do not invalidate the spatial patterns; they define the scope of inference.

---

## Where the analysis can go next

Natural extensions of the project include:

- Network-based accessibility (walk network, travel time, generalized cost) and service quality metrics (frequency, headways).  
- A temporal design around infrastructure changes (e.g., before/after new stations or lines) to move toward causal inference.  
- A richer covariate set capturing amenities, constraints, and disamenities to explain remaining residual pockets.  
- Scenario modelling: how accessibility changes (e.g., new line openings) could shift the spatial distribution of land value premiums.

---

## Reproducibility note

The project is organized as a sequence of notebooks with clear responsibilities (preprocessing → accessibility → EDA → spatial autocorrelation → modelling). Intermediate outputs are stored as processed GeoPackages to avoid repeating expensive computations and to ensure consistent inputs across modelling steps.

For readers: start with this synthesis notebook, then follow the numbered series for full detail.
