# Transit GTFS Quality: Public Transport Accessibility Analysis

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ucid-foundation/ucid/blob/main/notebooks/03_transit_gtfs_quality.ipynb)

---

## Overview

This notebook analyzes public transit accessibility using **GTFS (General Transit Feed Specification)** data. It covers:

1. Understanding the structure of GTFS feeds
2. Computing transit accessibility scores
3. Analyzing service frequency and coverage
4. Identifying transit deserts

### Transit Score Components

| Component | Weight | Description |
|-----------|--------|-------------|
| Stop Density | 25% | Stops per km² |
| Service Frequency | 30% | Trips per hour |
| Route Diversity | 20% | Number of routes |
| Operating Hours | 15% | Service span |
| Mode Variety | 10% | Transit types |
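
The weights in the table can be read as a weighted sum of component scores. The following is a minimal sketch of that arithmetic, assuming each component is already normalized to a 0-100 scale. The dictionary keys other than `service_frequency`, the `composite_score` helper, and the example values are hypothetical and are not taken from the `ucid` API; `TransitContext` may combine components differently.

```python
# Component weights from the table above, expressed as fractions that sum to 1.0.
WEIGHTS = {
    "stop_density": 0.25,
    "service_frequency": 0.30,
    "route_diversity": 0.20,
    "operating_hours": 0.15,
    "mode_variety": 0.10,
}


def composite_score(components: dict[str, float]) -> float:
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(components)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {
    "stop_density": 80.0,
    "service_frequency": 90.0,
    "route_diversity": 70.0,
    "operating_hours": 60.0,
    "mode_variety": 50.0,
}
print(f"Composite score: {composite_score(example):.1f}")
```

Because service frequency carries the largest weight, a location with many stops but infrequent service would still score modestly under this scheme.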

---

In [None]:
# Install dependencies
%pip install -q ucid gtfs-kit

In [None]:
# Imports
import pandas as pd

import ucid
from ucid.contexts import TransitContext

print(f"UCID version: {ucid.__version__}")

---

## 1. GTFS Data Overview

### 1.1 GTFS File Structure

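A GTFS feed is a ZIP archive of comma-separated text files. The core files are listed below. Note that `calendar.txt` is only conditionally required: a feed may instead define all of its service dates in `calendar_dates.txt`.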

| File | Description |
|------|-------------|
| agency.txt | Transit agencies |
| routes.txt | Transit routes |
| trips.txt | Trips for each route |
| stops.txt | Stop locations |
| stop_times.txt | Arrival/departure times |
| calendar.txt | Service dates |
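
Since each of these files is plain CSV inside a ZIP archive, a feed can be inspected with the Python standard library alone. The sketch below builds a tiny synthetic feed in memory (the two stops are made up) and reads `stops.txt` back. The `read_gtfs_table` helper is hypothetical; in practice a library such as `gtfs-kit` would typically handle this step.

```python
import csv
import io
import zipfile


def read_gtfs_table(feed: zipfile.ZipFile, filename: str) -> list[dict[str, str]]:
    """Read one GTFS text file from an open feed archive into a list of row dicts."""
    with feed.open(filename) as raw:
        # utf-8-sig drops a byte-order mark if the file starts with one.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


# Build a tiny synthetic feed in memory so the example is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "stops.txt",
        "stop_id,stop_name,stop_lat,stop_lon\n"
        "S1,Example Stop A,41.0370,28.9850\n"
        "S2,Example Stop B,41.0400,28.9900\n",
    )

buffer.seek(0)
with zipfile.ZipFile(buffer) as feed:
    stops = read_gtfs_table(feed, "stops.txt")

print(f"Loaded {len(stops)} stops; first: {stops[0]['stop_name']}")
```

All values come back as strings, so coordinates such as `stop_lat` need an explicit `float()` conversion before any distance calculation.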

In [None]:
# Initialize Transit context
context = TransitContext()

print("Transit Context Configuration:")
print(f"  Context ID: {context.context_id}")
print(f"  Max walk distance: {context.max_walk_distance_m}m")
print(f"  Peak hours: {context.peak_hours}")

### 1.2 Route Types

In [None]:
# GTFS route types
route_types = {
    0: "Tram/Light Rail",
    1: "Subway/Metro",
    2: "Rail",
    3: "Bus",
    4: "Ferry",
    5: "Cable Tram",
    6: "Aerial Lift",
    7: "Funicular",
    11: "Trolleybus",
    12: "Monorail",
}

print("GTFS Route Types:")
for code, name in route_types.items():
    print(f"  {code}: {name}")

---

## 2. Transit Accessibility Scoring

### 2.1 Single Location Analysis

In [None]:
# Analyze transit accessibility at Taksim Square
lat, lon = 41.0370, 28.9850

result = context.compute(
    lat=lat,
    lon=lon,
    timestamp="2026W02T08",  # Morning rush hour
)

print("Transit Accessibility Score:")
print("=" * 40)
print(f"Overall Score:  {result.score}/100")
print(f"Grade:          {result.grade}")
print(f"Confidence:     {result.confidence}%")

In [None]:
# View component breakdown
print("\nComponent Breakdown:")
print("-" * 40)
for component, score in result.breakdown.items():
    filled = int(score / 5)  # one block per 5 points on a 0-100 scale
    bar = "█" * filled + "░" * (20 - filled)
    print(f"{component:18s}: {bar} {score:.1f}")

### 2.2 Nearby Stops Analysis

In [None]:
# Get nearby stops metadata
metadata = result.metadata

print("Nearby Transit Infrastructure:")
print(f"  Stops within 500m: {metadata.get('stops_500m', 'N/A')}")
print(f"  Routes available: {metadata.get('route_count', 'N/A')}")
print(f"  Transit modes: {metadata.get('modes', 'N/A')}")
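
A figure such as `stops_500m` can be approximated from raw stop coordinates with a great-circle distance filter. The sketch below uses the haversine formula and counts straight-line distance, not walking distance along streets. The helper names and the sample stops are hypothetical, and this is not necessarily how `TransitContext` derives its metadata.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stops_within(origin: tuple[float, float], stops: list[dict], radius_m: float = 500.0) -> list[dict]:
    """Return the stops whose straight-line distance from origin is at most radius_m."""
    lat0, lon0 = origin
    return [
        stop
        for stop in stops
        if haversine_m(lat0, lon0, stop["stop_lat"], stop["stop_lon"]) <= radius_m
    ]


# Hypothetical stops on a north-south line through Taksim Square.
origin = (41.0370, 28.9850)
sample_stops = [
    {"stop_id": "A", "stop_lat": 41.0370, "stop_lon": 28.9850},  # at the origin
    {"stop_id": "B", "stop_lat": 41.0400, "stop_lon": 28.9850},  # roughly 330 m north
    {"stop_id": "C", "stop_lat": 41.0470, "stop_lon": 28.9850},  # roughly 1.1 km north
]

nearby = stops_within(origin, sample_stops)
print(f"Stops within 500 m: {[stop['stop_id'] for stop in nearby]}")
```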

---

## 3. Service Frequency Analysis

### 3.1 Peak vs Off-Peak

In [None]:
# Compare peak and off-peak service
# Peak and off-peak periods to compare (label, timestamp)
time_periods = [
    ("Morning Peak", "2026W02T08"),
    ("Midday", "2026W02T12"),
    ("Evening Peak", "2026W02T18"),
    ("Night", "2026W02T22"),
]

frequency_data = []
for name, timestamp in time_periods:
    result = context.compute(lat=lat, lon=lon, timestamp=timestamp)
    frequency_data.append(
        {
            "period": name,
            "score": result.score,
            "frequency_score": result.breakdown.get("service_frequency", 0),
        }
    )

freq_df = pd.DataFrame(frequency_data)
print("Service by Time Period:")
freq_df

### 3.2 Headway Analysis

In [None]:
# Service quality thresholds
headway_quality = {
    "Excellent": "< 5 min",
    "Good": "5-10 min",
    "Acceptable": "10-15 min",
    "Poor": "15-30 min",
    "Very Poor": "> 30 min",
}

print("Headway Quality Standards:")
for quality, headway in headway_quality.items():
    print(f"  {quality:12s}: {headway}")
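
To apply these thresholds, a headway first has to be derived from scheduled departures. The sketch below takes departure times in the `HH:MM:SS` form used by `stop_times.txt`, computes the mean gap between consecutive departures, and maps it onto the bands above. The function names and sample departures are hypothetical. The table leaves exactly 10 and 15 minutes ambiguous; this sketch assumes they fall into the worse band.

```python
def parse_gtfs_time(value: str) -> int:
    """Convert a GTFS HH:MM:SS string to seconds after the start of the service day.

    GTFS allows hours of 24 or more for trips that run past midnight.
    """
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def mean_headway_min(departures: list[str]) -> float:
    """Mean gap in minutes between consecutive departures at one stop."""
    times = sorted(parse_gtfs_time(d) for d in departures)
    if len(times) < 2:
        raise ValueError("need at least two departures to compute a headway")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return sum(gaps) / len(gaps) / 60


def classify_headway(minutes: float) -> str:
    """Map a headway in minutes to the quality bands listed above."""
    if minutes < 5:
        return "Excellent"
    if minutes < 10:
        return "Good"
    if minutes < 15:  # exactly 10 min is treated as Acceptable (assumption)
        return "Acceptable"
    if minutes <= 30:  # exactly 15 min is treated as Poor (assumption)
        return "Poor"
    return "Very Poor"


# Hypothetical morning departures from a single stop.
sample_departures = ["08:00:00", "08:06:00", "08:12:00", "08:20:00"]
sample_headway = mean_headway_min(sample_departures)
print(f"Mean headway: {sample_headway:.1f} min ({classify_headway(sample_headway)})")
```

A mean hides irregular service: two buses arriving together followed by a long gap can average out to a "Good" headway, so the maximum gap is worth checking alongside it.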

---

## 4. Transit Coverage Analysis

### 4.1 Multi-Location Comparison

In [None]:
# Compare transit across Istanbul neighborhoods
locations = [
    {"name": "Taksim", "lat": 41.0370, "lon": 28.9850},
    {"name": "Kadıköy", "lat": 40.9927, "lon": 29.0276},
    {"name": "Levent", "lat": 41.0847, "lon": 29.0114},
    {"name": "Beşiktaş", "lat": 41.0428, "lon": 29.0052},
    {"name": "Bakırköy", "lat": 40.9801, "lon": 28.8725},
]

coverage_data = []
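for loc in locations:
    # Same morning-peak timestamp as Section 2.1, so scores are comparable across locations
    result = context.compute(lat=loc["lat"], lon=loc["lon"], timestamp="2026W02T08")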
    coverage_data.append(
        {
            "neighborhood": loc["name"],
            "score": result.score,
            "grade": result.grade,
        }
    )

coverage_df = pd.DataFrame(coverage_data)
print("Transit Coverage by Neighborhood:")
coverage_df.sort_values("score", ascending=False)

### 4.2 Identifying Transit Deserts

In [None]:
# Identify areas with poor transit access
transit_desert_threshold = 40

transit_deserts = coverage_df[coverage_df["score"] < transit_desert_threshold]

if not transit_deserts.empty:
    print(f"Transit Deserts (score < {transit_desert_threshold}):")
    print(transit_deserts)
else:
    print("No transit deserts identified in sample locations.")

---

## 5. Mode Analysis

### 5.1 Multi-Modal Access

In [None]:
# Analyze available transit modes
mode_weights = {
    "metro": 1.0,  # Highest capacity and speed
    "tram": 0.8,  # High capacity
    "bus": 0.6,  # Standard service
    "ferry": 0.7,  # Important for Istanbul
    "funicular": 0.5,  # Limited coverage
}

print("Transit Mode Weights:")
for mode, weight in mode_weights.items():
    bar = "█" * int(weight * 10)
    print(f"  {mode:12s}: {bar} {weight:.1f}")
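
One way such weights could feed a mode variety component is to sum the weights of the modes available at a location and normalize by the sum of all weights. This is only an illustrative sketch under that assumption; the `mode_variety_score` function is hypothetical and the actual `TransitContext` calculation may differ.

```python
# Same weights as in the cell above, repeated so the sketch is self-contained.
MODE_WEIGHTS = {
    "metro": 1.0,
    "tram": 0.8,
    "bus": 0.6,
    "ferry": 0.7,
    "funicular": 0.5,
}


def mode_variety_score(available_modes: set[str]) -> float:
    """Share of total mode weight present at a location, scaled to 0-100.

    Modes without a known weight are ignored.
    """
    total = sum(MODE_WEIGHTS.values())
    present = sum(weight for mode, weight in MODE_WEIGHTS.items() if mode in available_modes)
    return 100 * present / total


# Hypothetical mode sets for two locations.
print(f"Metro + bus:  {mode_variety_score({'metro', 'bus'}):.1f}")
print(f"All modes:    {mode_variety_score(set(MODE_WEIGHTS)):.1f}")
```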

---

## 6. Integration with UCID

### 6.1 Generate Transit UCIDs

In [None]:
from ucid import create_ucid

# Create Transit-context UCIDs for each location
transit_ucids = []

for loc in locations:
    ucid_str = create_ucid(
        city="IST",
        lat=loc["lat"],
        lon=loc["lon"],
        timestamp="2026W02T08",
        context="TRANSIT",
    )
    transit_ucids.append(
        {
            "neighborhood": loc["name"],
            "ucid": ucid_str,
        }
    )

ucid_df = pd.DataFrame(transit_ucids)
print("Transit UCIDs:")
for _, row in ucid_df.iterrows():
    print(f"  {row['neighborhood']}: {row['ucid']}")

---

## Summary

This notebook demonstrated:

1. **GTFS Data**: Understanding transit feed structure
2. **Transit Scoring**: Computing accessibility metrics
3. **Frequency Analysis**: Peak vs off-peak service levels
4. **Coverage Analysis**: Comparing neighborhoods and identifying transit deserts
5. **Mode Analysis**: Multi-modal transit evaluation

### Key Metrics

- Stop density within walking distance
- Service frequency during peak hours
- Route and mode diversity
- Operating hours span

---

*Copyright 2026 UCID Foundation. Licensed under EUPL-1.2.*