# 🚊 GTFS Transit Analysis Dashboard

## Comprehensive Analysis of Public Transit Data

This notebook demonstrates a complete analysis pipeline for GTFS (General Transit Feed Specification) data, including:

- 📊 **Data Processing & Feature Engineering**
- 🗺️ **Network Analysis & Routing Algorithms**
- 🤖 **Machine Learning for Delay Prediction**
- 📈 **Demand Forecasting**
- 📱 **Interactive Visualizations & Dashboard**

---

## 🛠️ Setup and Imports

In [27]:
# Import standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
import sys
import os

# Add the parent directory's src folder to the Python path so the custom modules resolve
parent_dir = os.path.dirname(os.getcwd())
src_path = os.path.join(parent_dir, 'src')
if src_path not in sys.path:
    sys.path.insert(0, src_path)

# Now import our custom modules (this must come after the sys.path update above)
from data_processing import GTFSProcessor
from routing import TransitRouter
from prediction import DelayPredictor, DemandForecaster
from visualization import TransitVisualizer

warnings.filterwarnings('ignore')

# Set display options
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ All imports successful!")

✅ All imports successful!


## 📂 Data Loading and Initial Exploration

In [6]:
# Initialize GTFS processor
data_path = '../data'
gtfs = GTFSProcessor(data_path)

# Load all GTFS data files
loaded_data = gtfs.load_data()

print(f"\n📊 Loaded {len(loaded_data)} GTFS data files")
for name, df in loaded_data.items():
    print(f"  - {name}: {len(df):,} records")

Loading GTFS data files...
✓ Loaded agency.txt: 1 records
✓ Loaded routes.txt: 343 records
✓ Loaded trips.txt: 56226 records
✓ Loaded stops.txt: 10911 records
✓ Loaded stop_times.txt: 961814 records
✓ Loaded calendar.txt: 204 records
✓ Loaded calendar_dates.txt: 712 records
✓ Loaded transfers.txt: 151411 records

📊 Loaded 8 GTFS data files
  - agencies: 1 records
  - routes: 343 records
  - trips: 56,226 records
  - stops: 10,911 records
  - stop_times: 961,814 records
  - calendar: 204 records
  - calendar_dates: 712 records
  - transfers: 151,411 records


In [7]:
# Get summary statistics
stats = gtfs.get_summary_statistics()

print("\n📈 Dataset Summary:")
print("=" * 40)
for key, value in stats.items():
    if isinstance(value, dict):
        print(f"{key}:")
        for k, v in value.items():
            print(f"  - {k}: {v}")
    else:
        print(f"{key}: {value:,}")


📈 Dataset Summary:
num_agencies: 1
num_routes: 343
route_types:
  - 3: 286
  - 2: 28
  - 0: 15
  - 800: 13
  - 4: 1
num_stops: 10,911
num_trips: 56,226
num_stop_times: 961,814


## 🔧 Feature Engineering and Data Processing

In [8]:
# Create engineered features
features_df = gtfs.create_features()

# Display sample of engineered features
print("\n🔍 Sample of Engineered Features:")
print("=" * 50)
display(features_df[[
    'stop_id', 'trip_id', 'route_id', 'hour', 'is_rush_hour', 
    'time_period', 'route_type_name', 'is_first_stop', 'is_last_stop'
]].head(10))

Creating engineered features...
✓ Created features for 961814 records

🔍 Sample of Engineered Features:


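|   | stop_id | trip_id | route_id | hour | is_rush_hour | time_period | route_type_name | is_first_stop | is_last_stop |
|---|---|---|---|---|---|---|---|---|---|
| 0 | U15775Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 1 | 0 |
| 1 | U15773Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 2 | U15627Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 3 | U15640Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 4 | U15639Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 5 | U15637Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 6 | U15628Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 7 | U15634Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 8 | U15631Z2 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |
| 9 | U15632Z1 | 1 | L104D99 | 4 | 0 | Night | Bus | 0 | 0 |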
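
The internals of `create_features()` are not shown in this notebook. As a rough illustration of how columns such as `hour`, `is_rush_hour`, and `time_period` could be derived from a GTFS time string, here is a minimal sketch. The helper names, the rush-hour windows, and the period boundaries are all assumptions made for this example; only the `Night` label for hour 4 is taken from the sample above.

```python
def gtfs_time_to_seconds(value: str) -> int:
    """Convert a GTFS 'H:MM:SS' string to seconds after the service-day start.

    GTFS allows hours >= 24 for trips that run past midnight, so this
    avoids datetime.time, which would reject a value like '25:10:00'.
    """
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def time_features(value: str) -> dict:
    """Derive hour-based features from a GTFS time string (hypothetical rules)."""
    # Wrap hours of 24 and above back onto the 0-23 clock.
    hour = (gtfs_time_to_seconds(value) // 3600) % 24

    # Assumed rush-hour windows: 07:00-09:59 and 16:00-18:59.
    is_rush_hour = int(7 <= hour < 10 or 16 <= hour < 19)

    # Assumed period boundaries; only "Night" appears in the notebook output.
    if hour < 6 or hour >= 22:
        time_period = "Night"
    elif hour < 10:
        time_period = "Morning"
    elif hour < 16:
        time_period = "Midday"
    else:
        time_period = "Evening"

    return {"hour": hour, "is_rush_hour": is_rush_hour, "time_period": time_period}
```

Calling `time_features("4:32:00")` with these assumed rules yields hour 4, no rush hour, and the `Night` period, which agrees with the first sample row.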


In [9]:
# Calculate travel times between stops
travel_times_df = gtfs.calculate_travel_times()

print("\n⏱️ Travel Time Analysis:")
print("=" * 40)
print(f"Total segments analyzed: {len(travel_times_df):,}")
print(f"Average travel time: {travel_times_df['travel_time_minutes'].mean():.2f} minutes")
print(f"Average distance: {travel_times_df['distance_km'].mean():.2f} km")
print(f"Average speed: {travel_times_df['speed_kmh'].mean():.2f} km/h")

# Display sample
display(travel_times_df.head())

Calculating travel times...
✓ Calculated travel times for 905588 segments

⏱️ Travel Time Analysis:
Total segments analyzed: 905,588
Average travel time: 1.81 minutes
Average distance: 0.83 km
Average speed: 25.36 km/h


|   | trip_id | from_stop_id | to_stop_id | from_stop_sequence | to_stop_sequence | departure_time | arrival_time | travel_time_minutes | distance_km | speed_kmh |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | U15775Z2 | U15773Z2 | 1 | 2 | 4:32:00 | 4:34:00 | 2.0 | 1.148881 | 34.466435 |
| 1 | 1 | U15773Z2 | U15627Z2 | 2 | 3 | 4:34:00 | 4:39:00 | 5.0 | 4.038613 | 48.463361 |
| 2 | 1 | U15627Z2 | U15640Z2 | 3 | 4 | 4:39:00 | 4:41:00 | 2.0 | 0.670503 | 20.1151 |
| 3 | 1 | U15640Z2 | U15639Z2 | 4 | 5 | 4:41:00 | 4:43:00 | 2.0 | 1.001682 | 30.050454 |
| 4 | 1 | U15639Z2 | U15637Z2 | 5 | 6 | 4:43:00 | 4:46:00 | 3.0 | 1.156107 | 23.122132 |
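
The sample rows are consistent with `speed_kmh = distance_km / (travel_time_minutes / 60)`; for example, 1.148881 km in 2.0 minutes gives about 34.47 km/h. How `calculate_travel_times()` obtains `distance_km` is not shown. A common choice for stop-to-stop distance is the great-circle (haversine) formula on `stop_lat`/`stop_lon`, and the sketch below assumes that. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def speed_kmh(distance_km: float, travel_time_minutes: float):
    """Average segment speed; returns None when the scheduled time is not positive."""
    if travel_time_minutes <= 0:
        # Consecutive stops sharing a timestamp would otherwise divide by zero.
        return None
    return distance_km / (travel_time_minutes / 60.0)
```

Returning `None` for zero-minute segments is one possible design choice; dropping or clipping such rows before averaging are alternatives.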


## 🗺️ Network Analysis and Routing

In [10]:
# Initialize transit router
router = TransitRouter(gtfs)

# Build network graph
network_graph = router.build_network(weight_type='travel_time')

# Get network statistics
network_stats = router.get_network_statistics()

print("\n🕸️ Network Analysis Results:")
print("=" * 40)
for key, value in network_stats.items():
    if isinstance(value, float):
        print(f"{key}: {value:.3f}")
    else:
        print(f"{key}: {value}")

Building transit network with travel_time weights...
✓ Built network with 10911 nodes and 15439 edges

🕸️ Network Analysis Results:
num_nodes: 10911
num_edges: 15439
is_connected: False
num_connected_components: 3256
average_degree: 2.830
network_density: 0.000
average_clustering: 0.091
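
The reported `average_degree` and `network_density` can be cross-checked from the node and edge counts alone. The sketch below assumes the standard definitions for an undirected simple graph (average degree 2E/N, density 2E/(N(N-1))); the router's own implementation is not shown, but these formulas reproduce the 2.830 figure. They also show that the density is small rather than exactly zero: it only prints as 0.000 at three decimal places.

```python
# Node and edge counts taken from the network statistics above.
num_nodes = 10_911
num_edges = 15_439

# Undirected graph: every edge contributes to the degree of two nodes.
average_degree = 2 * num_edges / num_nodes

# Fraction of all possible node pairs that are actually connected.
density = 2 * num_edges / (num_nodes * (num_nodes - 1))

print(f"average_degree: {average_degree:.3f}")
print(f"network_density: {density:.6f}")
```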


In [11]:
# Analyze network centrality
centrality_measures = router.analyze_centrality(['degree', 'betweenness', 'closeness'])

# Find most important stops
print("\n🎯 Most Important Stops by Centrality:")
print("=" * 50)

for measure, values in centrality_measures.items():
    if values:  # Check if values exist
        top_stops = sorted(values.items(), key=lambda x: x[1], reverse=True)[:5]
        print(f"\n{measure.capitalize()} Centrality:")
        for i, (stop_id, score) in enumerate(top_stops, 1):
            stop_name = "Unknown"
            if gtfs.stops is not None:
                stop_info = gtfs.stops[gtfs.stops['stop_id'] == stop_id]
                if len(stop_info) > 0:
                    stop_name = stop_info.iloc[0].get('stop_name', 'Unknown')
            print(f"  {i}. {stop_name} ({stop_id}): {score:.4f}")

Calculating centrality measures: ['degree', 'betweenness', 'closeness']
  Computing degree centrality...
  Computing betweenness centrality...
  Computing closeness centrality...
✓ Centrality analysis completed

🎯 Most Important Stops by Centrality:

Degree Centrality:
  1. Hlavní nádraží (U1146Z99): 68.0000
  2. Hlavní nádraží (U1146Z98): 66.0000
  3. Hlavní nádraží (U1146Z2): 61.0000
  4. Hlavní nádraží (U1146Z1): 61.0000
  5. Hlavní nádraží (U1146Z3): 58.0000

Betweenness Centrality:
  1. Vyškov, železniční stanice (U17446Z1): 127.3069
  2. Boskovice, aut.st. (U12635Z12): 6.5215
  3. Boskovice, aut.st. (U12635Z2): 5.7266
  4. Blansko, aut. st. (U12342Z31): 4.4403
  5. Znojmo, autobusové nádraží (U18009Z51): 4.3717

Closeness Centrality:
  1. Hlavní nádraží (U1146Z99): 0.0172
  2. Hlavní nádraží (U1146Z98): 0.0171
  3. Hlavní nádraží (U1146Z2): 0.0169
  4. Hlavní nádraží (U1146Z1): 0.0169
  5. Hlavní nádraží (U1146Z3): 0.0169

In [12]:
# Demonstrate shortest path finding
if gtfs.stops is not None and len(gtfs.stops) > 1:
    # Pick two stops for demonstration (the 1st and 11th rows of stops.txt, not random)
    stops = gtfs.stops['stop_id'].tolist()
    start_stop = stops[0]
    end_stop = stops[min(len(stops)-1, 10)]  # Nearby row index; likely, but not guaranteed, to be close on the network
    
    print("\n🎯 Shortest Path Analysis:")
    print("=" * 40)
    print(f"From: {start_stop}")
    print(f"To: {end_stop}")
    
    # Find shortest path
    try:
        path, cost = router.find_shortest_path(start_stop, end_stop)
        if path:
            print("\n✅ Path found!")
            print(f"Number of stops: {len(path)}")
            print(f"Total cost: {cost:.2f} minutes")
            print(f"Path: {' → '.join(path[:5])}{'...' if len(path) > 5 else ''}")
        else:
            print("❌ No path found between these stops")
    except Exception as e:
        print(f"⚠️ Error finding path: {e}")


🎯 Shortest Path Analysis:
From: U15775Z2
To: U15635Z2

✅ Path found!
Number of stops: 7
Total cost: 16.00 minutes
Path: U15775Z2 → U15773Z2 → U15627Z2 → U15634Z2 → U15631Z2...
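
`find_shortest_path` is a method of the project's own `TransitRouter`, and its implementation is not shown here. One standard way to find a minimum-travel-time path on a graph with non-negative edge weights is Dijkstra's algorithm. The following is a minimal sketch of that technique on a plain adjacency dictionary; the function name and the graph layout are assumptions for illustration.

```python
import heapq


def dijkstra_path(graph, start, goal):
    """Return (path, cost) for the cheapest route, or (None, inf) if unreachable.

    graph maps each node to a dict of {neighbour: non-negative weight}.
    """
    best = {start: 0.0}      # cheapest known cost to each discovered node
    previous = {}            # back-pointers used to rebuild the path
    visited = set()
    heap = [(0.0, start)]

    while heap:
        cost, node = heapq.heappop(heap)
        if node in visited:
            continue         # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(heap, (new_cost, neighbour))

    if goal not in best:
        return None, float("inf")

    # Walk the back-pointers from goal to start, then reverse.
    path = [goal]
    while path[-1] != start:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[goal]


# Toy network with travel times in minutes (made-up values).
toy_graph = {
    "A": {"B": 2.0, "C": 10.0},
    "B": {"C": 5.0},
    "C": {"D": 1.0},
    "D": {},
}
toy_path, toy_cost = dijkstra_path(toy_graph, "A", "D")
```

With 3,256 connected components in this network, many stop pairs have no path at all, so returning `(None, inf)` for unreachable pairs matters as much as the happy path.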


## 🤖 Machine Learning: Delay Prediction

In [13]:
# Initialize delay predictor
delay_predictor = DelayPredictor(gtfs)

# Prepare training data (with simulated delays)
training_data = delay_predictor.prepare_training_data(simulate_delays=True)

print("\n🎲 Training Data for Delay Prediction:")
print("=" * 45)
print(f"Training samples: {len(training_data):,}")
print(f"Features: {len([col for col in training_data.columns if col != 'delay_minutes'])}")
print("\nDelay Statistics:")
print(f"  Mean delay: {training_data['delay_minutes'].mean():.2f} minutes")
print(f"  Median delay: {training_data['delay_minutes'].median():.2f} minutes")
print(f"  Max delay: {training_data['delay_minutes'].max():.2f} minutes")
print(f"  Min delay: {training_data['delay_minutes'].min():.2f} minutes")

# Show sample
display(training_data.head())

Preparing training data for delay prediction...
Creating engineered features...
✓ Created features for 961814 records
✓ Prepared training data with 853517 records and 9 features

🎲 Training Data for Delay Prediction:
Training samples: 853,517
Features: 9

Delay Statistics:
  Mean delay: 5.01 minutes
  Median delay: 2.56 minutes
  Max delay: 204.42 minutes
  Min delay: -12.65 minutes


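|   | hour | is_rush_hour | route_type | time_period | route_type_name | stop_sequence | is_first_stop | is_last_stop | weather_condition | delay_minutes |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 0 | 3 | Night | Bus | 1 | 1 | 0 | 0 | 1.220097 |
| 1 | 4 | 0 | 3 | Night | Bus | 2 | 0 | 0 | 2 | 25.397272 |
| 2 | 4 | 0 | 3 | Night | Bus | 3 | 0 | 0 | 1 | 5.826867 |
| 3 | 4 | 0 | 3 | Night | Bus | 4 | 0 | 0 | 1 | -0.47122 |
| 4 | 4 | 0 | 3 | Night | Bus | 5 | 0 | 0 | 2 | 4.344499 |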


In [14]:
# Train the delay prediction model
training_results = delay_predictor.train_model(training_data, model_type='random_forest')

print("\n🎯 Model Training Results:")
print("=" * 40)
print(f"Training MAE: {training_results['train_mae']:.3f} minutes")
print(f"Test MAE: {training_results['test_mae']:.3f} minutes")
print(f"Training RMSE: {training_results['train_rmse']:.3f} minutes")
print(f"Test RMSE: {training_results['test_rmse']:.3f} minutes")
print(f"Training R²: {training_results['train_r2']:.3f}")
print(f"Test R²: {training_results['test_r2']:.3f}")
print(f"CV MAE: {training_results['cv_mae']:.3f} ± {training_results['cv_mae_std']:.3f}")

# Feature importance
if 'feature_importance' in training_results:
    print("\n🔍 Top 5 Most Important Features:")
    for i, feature in enumerate(training_results['feature_importance'][:5], 1):
        print(f"  {i}. {feature['feature']}: {feature['importance']:.4f}")

Training random_forest model for delay prediction...
✓ Model training completed

🎯 Model Training Results:
Training MAE: 4.055 minutes
Test MAE: 4.062 minutes
Training RMSE: 6.679 minutes
Test RMSE: 6.697 minutes
Training R²: 0.299
Test R²: 0.296
CV MAE: 4.069 ± 0.094

🔍 Top 5 Most Important Features:
  1. weather_condition: 0.5596
  2. is_rush_hour: 0.4092
  3. stop_sequence: 0.0143
  4. hour: 0.0074
  5. route_type: 0.0034
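
For reference, the three reported metrics have simple definitions, sketched below in plain Python (the project presumably uses a library implementation; this helper name is hypothetical). A test R² of 0.296 means the model explains roughly 30% of the variance in the simulated delays relative to always predicting the mean. Because the delays are simulated, the feature importances mostly reflect how the simulation was constructed.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R² for two equal-length sequences of numbers."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)          # residual sum of squares
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                   # undefined if y_true is constant

    return {"mae": mae, "rmse": rmse, "r2": r2}


# Predicting the mean for every sample gives R² = 0 by construction.
baseline = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.5, 2.5, 2.5, 2.5])
```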


In [15]:
# Demonstrate delay prediction
print("\n🔮 Delay Prediction Examples:")
print("=" * 40)

# Create sample scenarios
scenarios = [
    {
        'name': 'Morning Rush Hour - Bus',
        'hour': 8,
        'is_rush_hour': 1,
        'route_type': 3,  # Bus
        'stop_sequence': 5,
        'weather_condition': 0  # Normal weather
    },
    {
        'name': 'Midday - Subway',
        'hour': 14,
        'is_rush_hour': 0,
        'route_type': 1,  # Subway (GTFS route_type 1; absent from this feed's route_types summary above)
        'stop_sequence': 10,
        'weather_condition': 1  # Mild weather issues
    },
    {
        'name': 'Evening Rush - Tram',
        'hour': 18,
        'is_rush_hour': 1,
        'route_type': 0,  # Tram
        'stop_sequence': 3,
        'weather_condition': 2  # Severe weather
    }
]

for scenario in scenarios:
    prediction = delay_predictor.predict_delay(scenario)
    print(f"\n{scenario['name']}:")
    print(f"  Predicted delay: {prediction:.2f} minutes")
    print(f"  Scenario: {scenario['hour']:02d}:00, Weather level: {scenario['weather_condition']}")


🔮 Delay Prediction Examples:

Morning Rush Hour - Bus:
  Predicted delay: 5.34 minutes
  Scenario: 08:00, Weather level: 0

Midday - Subway:
  Predicted delay: 5.54 minutes
  Scenario: 14:00, Weather level: 1

Evening Rush - Tram:
  Predicted delay: 20.74 minutes
  Scenario: 18:00, Weather level: 2


## 📈 Demand Forecasting

In [16]:
# Initialize demand forecaster
demand_forecaster = DemandForecaster(gtfs)

# Simulate ridership data
ridership_data = demand_forecaster.simulate_ridership_data(days=30)

print("\n📊 Ridership Data Simulation:")
print("=" * 40)
print(f"Total records: {len(ridership_data):,}")
print(f"Date range: {ridership_data['date'].min()} to {ridership_data['date'].max()}")
print(f"Total ridership: {ridership_data['ridership'].sum():,}")
print(f"Average daily ridership: {ridership_data.groupby('date')['ridership'].sum().mean():.0f}")

# Show sample
display(ridership_data.head(10))

Simulating ridership data for 30 days...
✓ Simulated 28500 ridership records

📊 Ridership Data Simulation:
Total records: 28,500
Date range: 2024-01-01 to 2024-01-30
Total ridership: 1,763,327
Average daily ridership: 58778


|   | date | hour | stop_id | route_id | ridership | is_weekend | is_holiday | day_of_week | month |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2024-01-01 | 5 | U15775Z2 | L5D99 | 7 | False | True | 0 | 1 |
| 1 | 2024-01-01 | 5 | U15773Z2 | L259D99 | 5 | False | True | 0 | 1 |
| 2 | 2024-01-01 | 5 | U15627Z2 | L835D99 | 9 | False | True | 0 | 1 |
| 3 | 2024-01-01 | 5 | U15640Z2 | L812D99 | 15 | False | True | 0 | 1 |
| 4 | 2024-01-01 | 5 | U15639Z2 | L502D99 | 4 | False | True | 0 | 1 |
| 5 | 2024-01-01 | 5 | U15637Z2 | L541D99 | 4 | False | True | 0 | 1 |
| 6 | 2024-01-01 | 5 | U15628Z2 | L166D99 | 12 | False | True | 0 | 1 |
| 7 | 2024-01-01 | 5 | U15634Z2 | L668D99 | 9 | False | True | 0 | 1 |
| 8 | 2024-01-01 | 5 | U15631Z2 | L130D99 | 12 | False | True | 0 | 1 |
| 9 | 2024-01-01 | 5 | U15632Z1 | L75D99 | 9 | False | True | 0 | 1 |


In [17]:
# Analyze demand patterns
demand_patterns = demand_forecaster.get_demand_patterns()

print("\n📈 Demand Pattern Analysis:")
print("=" * 40)

# Peak hours
print("\n🕐 Peak Hours:")
for hour, ridership in demand_patterns['peak_hours'].items():
    print(f"  {hour}:00 - {ridership:.0f} average riders")

# Weekend vs weekday
print("\n📅 Weekend vs Weekday:")
for is_weekend, ridership in demand_patterns['weekend_vs_weekday'].items():
    day_type = "Weekend" if is_weekend else "Weekday"
    print(f"  {day_type}: {ridership:.0f} average riders")

# Top stops
print("\n🚏 Top 5 Busiest Stops:")
for i, (stop_id, total_ridership) in enumerate(list(demand_patterns['top_stops'].items())[:5], 1):
    print(f"  {i}. {stop_id}: {total_ridership:,.0f} total riders")


📈 Demand Pattern Analysis:

🕐 Peak Hours:
  9:00 - 101 average riders
  8:00 - 99 average riders
  7:00 - 99 average riders

📅 Weekend vs Weekday:
  Weekday: 70 average riders
  Weekend: 40 average riders

🚏 Top 5 Busiest Stops:
  1. U15431Z1: 36,530 total riders
  2. U15552Z1: 36,075 total riders
  3. U15317Z3: 36,037 total riders
  4. U15775Z2: 35,965 total riders
  5. U15627Z2: 35,940 total riders
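
How `get_demand_patterns()` aggregates the ridership records is not shown. As an illustration, the sketch below computes a per-hour mean over individual records and ranks the hours, which is one plausible reading of "average riders". The function names and the toy records are made up for this example.

```python
from collections import defaultdict


def average_ridership_by_hour(records):
    """Mean ridership per record for each hour of the day."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record["hour"]] += record["ridership"]
        counts[record["hour"]] += 1
    return {hour: totals[hour] / counts[hour] for hour in totals}


def top_hours(averages, n=3):
    """The n hours with the highest average ridership, busiest first."""
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:n]


# Toy records (invented values, not taken from the simulated dataset).
toy_records = [
    {"hour": 8, "ridership": 100},
    {"hour": 8, "ridership": 98},
    {"hour": 9, "ridership": 101},
    {"hour": 14, "ridership": 55},
]
toy_averages = average_ridership_by_hour(toy_records)
toy_peaks = top_hours(toy_averages, n=2)
```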


In [18]:
# Generate ridership forecast
forecast_data = demand_forecaster.forecast_ridership(forecast_days=7)

print("\n🔮 7-Day Ridership Forecast:")
print("=" * 40)

# Daily totals
daily_forecast = forecast_data.groupby('date').agg({
    'forecasted_ridership': 'sum',
    'confidence_lower': 'sum',
    'confidence_upper': 'sum'
}).round(0)

print("Daily Forecast Summary:")
display(daily_forecast)

# Summarize the total and average daily forecast
total_forecast = daily_forecast['forecasted_ridership'].sum()
avg_daily_forecast = daily_forecast['forecasted_ridership'].mean()
print(f"\nTotal 7-day forecast: {total_forecast:,.0f} riders")
print(f"Average daily forecast: {avg_daily_forecast:,.0f} riders")

Forecasting ridership for 7 days...
✓ Generated 6650 ridership forecasts

🔮 7-Day Ridership Forecast:
Daily Forecast Summary:


| date | forecasted_ridership | confidence_lower | confidence_upper |
|---|---|---|---|
| 2024-01-31 | 67614 | 43044 | 92351 |
| 2024-02-01 | 67207 | 43640 | 90945 |
| 2024-02-02 | 66483 | 42506 | 90607 |
| 2024-02-03 | 38003 | 24192 | 51927 |
| 2024-02-04 | 37541 | 23894 | 51326 |
| 2024-02-05 | 62184 | 36718 | 87880 |
| 2024-02-06 | 66702 | 42188 | 91334 |



Total 7-day forecast: 405,734 riders
Average daily forecast: 57,962 riders
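
One caveat on the daily bounds: the cell above sums `confidence_lower` and `confidence_upper` across records, which treats every record as hitting its bound at the same time and so tends to give a wide daily interval. If the per-record errors were independent and roughly symmetric (an assumption, not something the notebook establishes), the half-widths would combine in quadrature instead. The sketch below contrasts the two on toy numbers; the function names and field names are hypothetical.

```python
import math


def summed_bounds(rows):
    """Add point forecasts and bounds directly, as a groupby-sum does."""
    point = sum(row["forecast"] for row in rows)
    lower = sum(row["lower"] for row in rows)
    upper = sum(row["upper"] for row in rows)
    return point, lower, upper


def quadrature_bounds(rows):
    """Combine interval half-widths as a root-sum-of-squares.

    Assumes independent, symmetric errors across rows.
    """
    point = sum(row["forecast"] for row in rows)
    half_width = math.sqrt(
        sum(((row["upper"] - row["lower"]) / 2) ** 2 for row in rows)
    )
    return point, point - half_width, point + half_width


# Four identical toy forecasts of 10 riders with bounds of 6 and 14.
toy_rows = [{"forecast": 10.0, "lower": 6.0, "upper": 14.0} for _ in range(4)]
wide = summed_bounds(toy_rows)        # bounds grow linearly with row count
narrow = quadrature_bounds(toy_rows)  # bounds grow with the square root
```

Which treatment is appropriate depends on how correlated the forecast errors are; strongly correlated errors sit closer to the summed bounds.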


## 📊 Interactive Visualizations

In [19]:
# Initialize visualizer
visualizer = TransitVisualizer(gtfs, router)

print("🎨 Creating Interactive Visualizations...\n")

🎨 Creating Interactive Visualizations...



In [20]:
# Create route analysis plots
route_fig = visualizer.plot_route_analysis()
route_fig.show()

print("✅ Route analysis visualization complete!")

Creating route analysis plots...
✓ Route analysis plots created


✅ Route analysis visualization complete!


In [21]:
# Create delay prediction visualizations
delay_fig = visualizer.plot_delay_predictions(delay_predictor)
delay_fig.show()

print("✅ Delay prediction visualization complete!")

Creating delay prediction visualizations...
Preparing training data for delay prediction...
Creating engineered features...
✓ Created features for 961814 records
✓ Prepared training data with 853517 records and 9 features
✓ Delay prediction visualizations created


✅ Delay prediction visualization complete!


In [22]:
# Create network map
try:
    network_map = visualizer.plot_network_map(interactive=True)
    if network_map:
        # Save map
        network_map.save('../outputs/transit_network_map.html')
        print("✅ Interactive network map created and saved!")
        print("📁 Map saved as: ../outputs/transit_network_map.html")
    else:
        print("⚠️ Could not create interactive map (insufficient coordinate data)")
except Exception as e:
    print(f"⚠️ Map creation failed: {e}")
    print("Creating static network visualization instead...")
    visualizer.plot_network_map(interactive=False)

Creating transit network map...
✓ Interactive network map created
✅ Interactive network map created and saved!
📁 Map saved as: ../outputs/transit_network_map.html


## 📱 Comprehensive Dashboard

In [23]:
# Create comprehensive interactive dashboard
dashboard_fig = visualizer.create_interactive_dashboard(
    delay_predictor=delay_predictor,
    demand_forecaster=demand_forecaster
)

dashboard_fig.show()

print("✅ Comprehensive dashboard created!")

Creating interactive dashboard...
Forecasting ridership for 7 days...
✓ Generated 6650 ridership forecasts
✓ Interactive dashboard created


✅ Comprehensive dashboard created!


In [24]:
# Save dashboard as HTML
try:
    visualizer.save_dashboard_html(dashboard_fig, '../outputs/transit_dashboard.html')
    print("💾 Dashboard saved successfully!")
    print("📁 Dashboard saved as: ../outputs/transit_dashboard.html")
except Exception as e:
    print(f"⚠️ Could not save dashboard: {e}")

✓ Dashboard saved as ../outputs/transit_dashboard.html
💾 Dashboard saved successfully!
📁 Dashboard saved as: ../outputs/transit_dashboard.html


## 🎯 Real-time Analysis Demo

In [25]:
# Simulate real-time analysis for current hour
from datetime import datetime

current_hour = datetime.now().hour
print(f"\n⏰ Real-time Analysis for {current_hour}:00")
print("=" * 45)

# Get routes with current delays
current_delays = visualizer.find_route_with_delay(current_hour, delay_predictor)

print("\n🚨 Routes with Highest Predicted Delays:")
for i, route_data in enumerate(current_delays[:5], 1):
    route_id = route_data['route_id']
    delay = route_data['predicted_delay']
    category = route_data['delay_category']
    
    status_emoji = "🔴" if delay > 5 else "🟡" if delay > 2 else "🟢"
    print(f"  {i}. {status_emoji} {route_id}: {delay:.1f} min delay ({category})")

# Current ridership prediction
current_ridership = demand_forecaster.get_demand_patterns()['hourly'].get(current_hour, 0)
print(f"\n👥 Expected ridership this hour: {current_ridership:.0f} passengers")

# System performance summary
avg_delay = np.mean([r['predicted_delay'] for r in current_delays])
high_delay_routes = len([r for r in current_delays if r['predicted_delay'] > 5])

print("\n📊 System Performance Summary:")
print(f"  Average delay: {avg_delay:.1f} minutes")
print(f"  Routes with high delays: {high_delay_routes}/{len(current_delays)}")
print(f"  Network connectivity: {'Good' if network_stats.get('is_connected', False) else 'Limited'}")
print(f"  Total active routes: {len(gtfs.routes) if gtfs.routes is not None else 'Unknown'}")


⏰ Real-time Analysis for 12:00

🚨 Routes with Highest Predicted Delays:
  1. 🔴 L100D99: 8.8 min delay (High)
  2. 🔴 L120D99: 8.8 min delay (High)
  3. 🔴 L123D99: 8.8 min delay (High)
  4. 🔴 L124D99: 8.8 min delay (High)
  5. 🔴 L12D99: 8.8 min delay (High)

👥 Expected ridership this hour: 55 passengers

📊 System Performance Summary:
  Average delay: 5.3 minutes
  Routes with high delays: 13/20
  Network connectivity: Limited
  Total active routes: 343



## 📋 Summary and Insights

In [26]:
print("\n🎉 GTFS Transit Analysis Complete!")
print("=" * 50)

print("\n📊 Key Findings:")
print(f"  • Analyzed {stats.get('num_routes', 'N/A')} routes across {stats.get('num_stops', 'N/A')} stops")
print(f"  • Network has {network_stats['num_nodes']} nodes and {network_stats['num_edges']} connections")
print(f"  • Average travel time: {travel_times_df['travel_time_minutes'].mean():.1f} minutes")
print(f"  • ML model accuracy: {training_results['test_r2']:.3f} R²")
print(f"  • 7-day ridership forecast: {total_forecast:,.0f} passengers")

print("\n🛠️ Generated Outputs:")
print("  • Interactive network map")
print("  • Delay prediction model")
print("  • Demand forecasting system")
print("  • Comprehensive dashboard")
print("  • Real-time analysis capability")

print("\n🎯 Use Cases Demonstrated:")
print("  • Route optimization")
print("  • Delay prediction and management")
print("  • Capacity planning")
print("  • Real-time passenger information")
print("  • Network performance monitoring")

print("\n✨ This analysis provides a foundation for:")
print("  📈 Data-driven transit planning")
print("  🤖 Predictive maintenance")
print("  👥 Passenger experience optimization")
print("  📱 Real-time information systems")
print("  🌍 Sustainable transportation insights")

print("\n" + "=" * 50)
print("🚊 Thank you for exploring GTFS Transit Analysis! 🚊")
print("=" * 50)


🎉 GTFS Transit Analysis Complete!

📊 Key Findings:
  • Analyzed 343 routes across 10911 stops
  • Network has 10911 nodes and 15439 connections
  • Average travel time: 1.8 minutes
  • ML model accuracy: 0.296 R²
  • 7-day ridership forecast: 405,734 passengers

🛠️ Generated Outputs:
  • Interactive network map
  • Delay prediction model
  • Demand forecasting system
  • Comprehensive dashboard
  • Real-time analysis capability

🎯 Use Cases Demonstrated:
  • Route optimization
  • Delay prediction and management
  • Capacity planning
  • Real-time passenger information
  • Network performance monitoring

✨ This analysis provides a foundation for:
  📈 Data-driven transit planning
  🤖 Predictive maintenance
  👥 Passenger experience optimization
  📱 Real-time information systems
  🌍 Sustainable transportation insights

🚊 Thank you for exploring GTFS Transit Analysis! 🚊
