# Maintainer Contact

**Name:** THAT Le Quang (Xiel)

- **Role:** AI & DS Major Student
- **GitHub:** [thatlq1812](https://github.com/thatlq1812)
- **Email:** fxlqthat@gmail.com / thatlqse183256@fpt.edu.com / thatlq1812@gmail.com
- **Phone:** +84 33 863 6369 / +84 39 730 6450


# Traffic Forecast System - Complete Runbook

- **Version**: Academic v4.0
- **Purpose**: Complete setup and operation guide
- **Last Updated**: October 25, 2025

---

## Table of Contents

1. [Environment Setup](#1-Environment-Setup)
2. [Configuration Management](#2-Configuration-Management)
3. [Data Collection](#3-Data-Collection)
4. [Model Training](#4-Model-Training)
5. [Visualization](#5-Visualization)
6. [Monitoring](#6-Monitoring)
7. [Troubleshooting](#7-Troubleshooting)

---

## Prerequisites

- Python 3.8+
- Conda or Miniconda
- Internet connection
- Google Maps API key (optional - can use mock API)
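
The prerequisites above can be checked from Python before continuing. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is hypothetical, and the `GOOGLE_MAPS_API_KEY` variable name is taken from the warning printed in section 2.3. It only inspects the process environment, so a key stored solely in `.env` will be reported as missing.

```python
import os
import shutil
import sys


def check_prerequisites(min_python=(3, 8)):
    """Return a dict mapping each prerequisite to True/False."""
    return {
        # Python 3.8+ is required by this runbook
        "python": sys.version_info >= min_python,
        # Conda or Miniconda must be reachable on PATH
        "conda": shutil.which("conda") is not None,
        # Optional: only needed when the mock API is disabled
        "google_maps_api_key": bool(os.environ.get("GOOGLE_MAPS_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'OK' if ok else 'missing'}")
```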

## 1. Environment Setup

### 1.1 Install Miniconda (if not installed)

**Windows**:

In [None]:
# Run in terminal (not this notebook):
# Download from: https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe
# Install and restart terminal

print("For Windows: Download and install Miniconda from https://docs.conda.io/en/latest/miniconda.html")
print("Then restart your terminal and continue.")

**Linux/Mac**:

In [None]:
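%%bash
# Run this cell to install Miniconda on Linux/Mac

if ! command -v conda &> /dev/null; then
    echo "Installing Miniconda..."
    # Pick the installer that matches this OS; uname -m supplies the architecture
    case "$(uname -s)" in
        Linux)  platform="Linux" ;;
        Darwin) platform="MacOSX" ;;
        *)      echo "Unsupported OS: $(uname -s)"; exit 1 ;;
    esac
    installer="Miniconda3-latest-${platform}-$(uname -m).sh"
    curl -fsSL "https://repo.anaconda.com/miniconda/${installer}" -o ~/miniconda.sh
    bash ~/miniconda.sh -b -p "$HOME/miniconda3"
    rm ~/miniconda.sh

    # Initialize conda for bash (on macOS with zsh, run `conda init zsh` instead)
    "$HOME/miniconda3/bin/conda" init bash
    echo "Miniconda installed! Please restart your terminal."
else
    echo "Conda already installed"
    conda --version
fi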

### 1.2 Create Conda Environment

In [None]:
# Check if running in the correct directory
import os
import sys

expected_files = ['environment.yml', 'configs/project_config.yaml', 'README.md']
missing_files = [f for f in expected_files if not os.path.exists(f)]

if missing_files:
 print("ERROR: Not in project root directory!")
 print(f"Missing files: {missing_files}")
 print(f"Current directory: {os.getcwd()}")
 print("\nPlease navigate to project root before running this notebook.")
else:
 print("Project directory: OK")
 print(f"Location: {os.getcwd()}")
 print("\nReady to proceed!")

In [None]:
%%bash
# Create conda environment from environment.yml

if conda env list | grep -q "^dsp "; then
 echo "Environment 'dsp' already exists"
 echo "Updating environment..."
 conda env update -f environment.yml -n dsp
else
 echo "Creating environment 'dsp'..."
 conda env create -f environment.yml
fi

echo ""
echo "Environment ready!"
echo "Activate with: conda activate dsp"

### 1.3 Verify Installation

In [None]:
# Verify Python version and key packages
import sys
print(f"Python version: {sys.version}")
print()

# Test imports
packages = [
 'yaml',
 'pandas',
 'numpy',
 'requests',
 'pydantic',
 'sklearn',
 'matplotlib',
 'seaborn'
]

print("Testing package imports:")
for pkg in packages:
    try:
        __import__(pkg)
        print(f" {pkg}: OK")
    except ImportError as e:
        print(f" {pkg}: FAILED - {e}")

print("\nInstallation verification complete!")

## 2. Configuration Management

### 2.1 View Current Configuration

In [None]:
import yaml
from pprint import pprint

# Load configuration
with open('configs/project_config.yaml', 'r') as f:
 config = yaml.safe_load(f)

print("Current Configuration:")
print("=" * 50)
print()

print(f"Project: {config['project']['name']} v{config['project']['version']}")
print(f"Timezone: {config['globals']['timezone']}")
print()

print("Scheduler:")
print(f" Mode: {config['scheduler']['mode']}")
print(f" Enabled: {config['scheduler']['enabled']}")
print()

print("Node Selection:")
print(f" Max nodes: {config['node_selection']['max_nodes']}")
print(f" Min degree: {config['node_selection']['min_degree']}")
print(f" Min importance: {config['node_selection']['min_importance_score']}")
print()

print("Google Directions:")
print(f" Mock API: {config['google_directions']['use_mock_api']}")
print(f" Radius: {config['google_directions']['radius_km']} km")
print(f" K neighbors: {config['google_directions']['k_neighbors']}")

### 2.2 View Adaptive Schedule

In [None]:
# Display peak hours
peak_hours = config['scheduler']['adaptive']['peak_hours']['time_ranges']

print("Vietnam Peak Hours:")
print("=" * 50)
for i, range_info in enumerate(peak_hours, 1):
 print(f"{i}. {range_info['start']} - {range_info['end']}")

print()
print("Collection Intervals:")
print(f" Peak: {config['scheduler']['adaptive']['peak_interval_minutes']} minutes")
print(f" Off-peak: {config['scheduler']['adaptive']['offpeak_interval_minutes']} minutes")
print(f" Weekend: {config['scheduler']['adaptive']['weekend_interval_minutes']} minutes")
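
The settings printed above imply a simple selection rule: use one interval on weekends, another inside a peak range, and a third otherwise. The sketch below illustrates that rule under stated assumptions: `start` and `end` are `HH:MM` strings, ranges are start-inclusive and end-exclusive, and the weekend interval takes precedence over peak hours. The function names and the example values are hypothetical; the project's scheduler may resolve these cases differently.

```python
from datetime import datetime, time


def parse_hhmm(value):
    """Parse an 'HH:MM' string into a datetime.time (assumed config format)."""
    hours, minutes = value.split(":")
    return time(int(hours), int(minutes))


def collection_interval(now, adaptive):
    """Pick the collection interval in minutes for the moment `now`.

    `adaptive` mirrors config['scheduler']['adaptive'].
    Assumption: weekends override peak hours.
    """
    if now.weekday() >= 5:  # Saturday=5, Sunday=6
        return adaptive["weekend_interval_minutes"]
    current = now.time()
    for rng in adaptive["peak_hours"]["time_ranges"]:
        if parse_hhmm(rng["start"]) <= current < parse_hhmm(rng["end"]):
            return adaptive["peak_interval_minutes"]
    return adaptive["offpeak_interval_minutes"]


# Example values are made up for illustration, not taken from the project config
example = {
    "peak_hours": {
        "time_ranges": [
            {"start": "07:00", "end": "09:00"},
            {"start": "17:00", "end": "19:00"},
        ]
    },
    "peak_interval_minutes": 15,
    "offpeak_interval_minutes": 60,
    "weekend_interval_minutes": 90,
}

# Wednesday 08:30 falls inside the first peak range
print(collection_interval(datetime(2025, 10, 22, 8, 30), example))  # prints 15
```

If the YAML file stores times unquoted, PyYAML may load some of them as integers rather than strings, in which case `parse_hhmm` would need adjusting.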

### 2.3 Modify Configuration (Optional)

In [None]:
# Example: Switch between mock and real API
import yaml

def toggle_mock_api(enable_mock=True):
    """
    Toggle between mock API (free) and real Google API

    Args:
        enable_mock: True for mock API (free), False for real API
    """
    with open('configs/project_config.yaml', 'r') as f:
        config = yaml.safe_load(f)

    config['google_directions']['use_mock_api'] = enable_mock

    # Note: yaml.dump drops comments; sort_keys=False keeps the original key order
    with open('configs/project_config.yaml', 'w') as f:
        yaml.dump(config, f, default_flow_style=False, sort_keys=False)

    mode = "MOCK (free)" if enable_mock else "REAL (costs money)"
    print(f"API mode set to: {mode}")

    if not enable_mock:
        print("\nWARNING: Using real Google API!")
        print("Make sure GOOGLE_MAPS_API_KEY is set in .env")
        print("Estimated cost: $720/month for current configuration")

# Uncomment to use:
# toggle_mock_api(enable_mock=True) # Use mock API (free)
# toggle_mock_api(enable_mock=False) # Use real API (costs money)

print("Function defined. Uncomment lines above to use.")
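
Before switching to the real API, it can help to confirm that `GOOGLE_MAPS_API_KEY` is actually available. The sketch below is one way to do that with the standard library only. It assumes `.env` contains plain `KEY=VALUE` lines; the helper names `read_env_file` and `has_api_key` are hypothetical, and the project may load `.env` through a different mechanism.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip("'\"")
    return values


def has_api_key(path=".env", name="GOOGLE_MAPS_API_KEY"):
    """True if the key is set in the process environment or in the .env file."""
    return bool(os.environ.get(name) or read_env_file(path).get(name))
```

A typical use would be to call `has_api_key()` first and only call `toggle_mock_api(enable_mock=False)` when it returns `True`.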

## 3. Data Collection

### 3.1 Single Collection Run

In [None]:
%%bash
# Run one collection cycle
# This will collect traffic, weather, and node features

python scripts/collect_and_render.py --once --no-visualize

### 3.2 View Collected Data

In [None]:
import pandas as pd
import json
from datetime import datetime

# Load latest traffic snapshot
try:
    with open('data/traffic_snapshot_normalized.json', 'r') as f:
        traffic_data = json.load(f)

    df = pd.DataFrame(traffic_data)

    print("Latest Traffic Snapshot:")
    print("=" * 50)
    print(f"Records: {len(df)}")
    print(f"Timestamp: {df['timestamp'].iloc[0] if len(df) > 0 else 'N/A'}")
    print()

    print("Sample data:")
    display(df.head())

    print("\nTraffic Level Distribution:")
    print(df['traffic_level'].value_counts())

except FileNotFoundError:
    print("No traffic data found. Run a collection first.")
except Exception as e:
    print(f"Error loading data: {e}")
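
The cells in this runbook read the `timestamp`, `traffic_level`, and `duration_seconds` fields from the snapshot. A quick validation pass can catch incomplete records before they cause errors in later cells. This is a minimal sketch that assumes the snapshot JSON is a list of dictionaries; the name `find_invalid_records` is hypothetical.

```python
REQUIRED_FIELDS = ("timestamp", "traffic_level", "duration_seconds")


def find_invalid_records(records, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) pairs for records lacking required keys."""
    problems = []
    for index, record in enumerate(records):
        # A field counts as missing if the key is absent or its value is None
        missing = [key for key in required if record.get(key) is None]
        if missing:
            problems.append((index, missing))
    return problems
```

With the snapshot loaded as `traffic_data`, `find_invalid_records(traffic_data)` returns an empty list when every record is complete.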

### 3.3 Check Collection Schedule

In [None]:
%%bash
# Display collection schedule
python scripts/collect_and_render.py --print-schedule

### 3.4 Start Continuous Collection (Background)

**Note**: This will start a background process. Use with caution in notebooks.

In [None]:
import subprocess
import os

# Only uncomment if you want to start background collection
# WARNING: This will run continuously until stopped

# proc = subprocess.Popen(
# ['python', 'scripts/collect_and_render.py', '--adaptive'],
# stdout=subprocess.PIPE,
# stderr=subprocess.PIPE,
# text=True
# )
# print(f"Collection started with PID: {proc.pid}")
# print("To stop: kill the process or restart the kernel")

print("Background collection disabled in notebook.")
print("For production, use:")
print(" bash scripts/start_collection.sh")
print("or:")
print(" sudo systemctl start traffic-forecast.service")
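
If a collection process is started from Python with `subprocess.Popen`, it should also be stopped deliberately rather than left to a kernel restart. The sketch below shows one common pattern: request termination, wait briefly, and force-kill only if the process does not exit. The helper name `stop_process` and the 10-second default are assumptions, not part of the project.

```python
import subprocess
import sys


def stop_process(proc, timeout=10):
    """Ask `proc` to exit, escalating to kill if it ignores the request.

    Returns the process exit code.
    """
    if proc.poll() is not None:
        # Already finished; nothing to stop
        return proc.returncode
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Stand-in child process for demonstration; replace with the real command
    demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(f"Started PID {demo.pid}")
    print(f"Exit code after stop: {stop_process(demo)}")
```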

## 4. Model Training

### 4.1 Load Training Data

In [None]:
# Check for training data
import os
import pandas as pd

data_dir = 'data/processed'

if os.path.exists(f"{data_dir}/val_predictions.csv"):
    df_val = pd.read_csv(f"{data_dir}/val_predictions.csv")
    print("Validation Predictions:")
    print("=" * 50)
    print(f"Records: {len(df_val)}")
    display(df_val.head())

    # Calculate accuracy
    if 'actual' in df_val.columns and 'predicted' in df_val.columns:
        accuracy = (df_val['actual'] == df_val['predicted']).mean()
        print(f"\nAccuracy: {accuracy:.2%}")
else:
    print("No training data found.")
    print("Run the training pipeline first.")
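
Overall accuracy can hide poor performance on rare traffic levels. As a complement, the sketch below counts `(actual, predicted)` pairs and computes per-class recall with the standard library. It assumes the `actual` and `predicted` columns hold class labels; the function names are hypothetical, and the label values in any example are illustrative only.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    return Counter(zip(actual, predicted))


def per_class_recall(actual, predicted):
    """Fraction of each actual class that was predicted correctly.

    Both arguments must be sequences (for example lists), not one-shot iterators.
    """
    pairs = confusion_counts(actual, predicted)
    totals = Counter(actual)
    return {label: pairs[(label, label)] / count for label, count in totals.items()}
```

With the validation frame loaded, the inputs would be `df_val['actual'].tolist()` and `df_val['predicted'].tolist()`.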

### 4.2 View Model Performance

In [None]:
# Load model metadata
import json

try:
    with open('models/model_metadata.json', 'r') as f:
        metadata = json.load(f)

    print("Model Information:")
    print("=" * 50)
    print(json.dumps(metadata, indent=2))

except FileNotFoundError:
    print("No model metadata found.")

## 5. Visualization

### 5.1 Traffic Level Distribution

In [None]:
import json

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Set style
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 6)

# Load data
try:
    with open('data/traffic_snapshot_normalized.json', 'r') as f:
        traffic_data = json.load(f)
    df = pd.DataFrame(traffic_data)

    # Plot traffic levels
    plt.figure(figsize=(10, 6))
    df['traffic_level'].value_counts().sort_index().plot(kind='bar', color='steelblue')
    plt.title('Traffic Level Distribution', fontsize=14, fontweight='bold')
    plt.xlabel('Traffic Level', fontsize=12)
    plt.ylabel('Count', fontsize=12)
    plt.xticks(rotation=0)
    plt.tight_layout()
    plt.show()

except FileNotFoundError:
    print("No data available for visualization")

### 5.2 Duration Analysis

In [None]:
# Duration distribution
if 'df' in locals():
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(12, 6))

    ax_hist.hist(df['duration_seconds'], bins=30, color='coral', edgecolor='black')
    ax_hist.set_title('Duration Distribution', fontsize=12, fontweight='bold')
    ax_hist.set_xlabel('Duration (seconds)', fontsize=10)
    ax_hist.set_ylabel('Frequency', fontsize=10)

    # Pass ax explicitly; DataFrame.boxplot(by=...) otherwise draws on a new figure
    df.boxplot(column='duration_seconds', by='traffic_level', ax=ax_box)
    ax_box.set_title('Duration by Traffic Level', fontsize=12, fontweight='bold')
    fig.suptitle('')  # Remove automatic title
    ax_box.set_xlabel('Traffic Level', fontsize=10)
    ax_box.set_ylabel('Duration (seconds)', fontsize=10)

    plt.tight_layout()
    plt.show()
else:
    print("Load data first")

## 6. Monitoring

### 6.1 Check System Status

In [None]:
import os
import subprocess
from datetime import datetime

print("System Status")
print("=" * 50)
print(f"Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print()

# Check disk usage
def get_dir_size(path):
    total = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            if os.path.exists(fp):
                total += os.path.getsize(fp)
    return total / (1024 * 1024)  # MB

print("Storage Usage:")
if os.path.exists('data'):
 print(f" data/: {get_dir_size('data'):.2f} MB")
if os.path.exists('cache'):
 print(f" cache/: {get_dir_size('cache'):.2f} MB")
if os.path.exists('logs'):
 print(f" logs/: {get_dir_size('logs'):.2f} MB")
if os.path.exists('traffic_history.db'):
 print(f" database: {os.path.getsize('traffic_history.db') / (1024*1024):.2f} MB")

print()

# Count data runs
if os.path.exists('data/node'):
    runs = [d for d in os.listdir('data/node') if os.path.isdir(os.path.join('data/node', d))]
    print(f"Data runs: {len(runs)}")
    if runs:
        print(f"Latest: {max(runs)}")

### 6.2 View Recent Logs

In [None]:
# View last 20 lines of log
import os

log_file = 'logs/service.log'
if os.path.exists(log_file):
    with open(log_file, 'r') as f:
        lines = f.readlines()
    print("Recent Logs (last 20 lines):")
    print("=" * 50)
    print(''.join(lines[-20:]))
else:
    print("No log file found")
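
For large log files, reading every line into memory just to show the last 20 is wasteful. A bounded `collections.deque` keeps only the most recent lines while the file is streamed. This is a minimal sketch; the helper name `tail` is hypothetical.

```python
from collections import deque


def tail(path, n=20):
    """Return the last n lines of a text file, holding at most n lines in memory."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # deque with maxlen discards older lines as newer ones arrive
        return list(deque(f, maxlen=n))
```

Usage would look like `print(''.join(tail('logs/service.log', 20)))`.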

### 6.3 Database Query

In [None]:
import os
import sqlite3
import pandas as pd

db_path = 'traffic_history.db'

if os.path.exists(db_path):
    conn = sqlite3.connect(db_path)

    # Get table info
    tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type='table'", conn)
    print("Database Tables:")
    print(tables)
    print()

    # Query traffic history
    try:
        query = """
        SELECT * FROM traffic_history
        ORDER BY timestamp DESC
        LIMIT 10
        """
        df_history = pd.read_sql(query, conn)
        print("Recent Traffic History:")
        display(df_history)

        # Statistics
        count_query = "SELECT COUNT(*) as total FROM traffic_history"
        total = pd.read_sql(count_query, conn)['total'].iloc[0]
        print(f"\nTotal records: {total}")

    except Exception as e:
        print(f"Error querying database: {e}")

    conn.close()
else:
    print("Database not found. Run collection with history enabled.")
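
The same query can be run without pandas, with the connection closed automatically even if the query fails. The sketch below uses `contextlib.closing` and a parameterized `LIMIT`. It relies only on what the cell above shows (a `traffic_history` table with a `timestamp` column); the helper names are hypothetical.

```python
import os
import sqlite3
from contextlib import closing


def table_exists(conn, name):
    """True if a table called `name` exists in the connected database."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type='table' AND name=?", (name,)
    ).fetchone()
    return row is not None


def recent_rows(db_path, limit=10):
    """Return up to `limit` newest traffic_history rows as dicts, or [] if unavailable."""
    # sqlite3.connect would create an empty file, so check for the database first
    if not os.path.exists(db_path):
        return []
    with closing(sqlite3.connect(db_path)) as conn:
        if not table_exists(conn, "traffic_history"):
            return []
        conn.row_factory = sqlite3.Row
        cursor = conn.execute(
            "SELECT * FROM traffic_history ORDER BY timestamp DESC LIMIT ?",
            (limit,),
        )
        return [dict(row) for row in cursor.fetchall()]
```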

## 7. Troubleshooting

### 7.1 Common Issues

#### Import Errors

In [None]:
%%bash
# Reinstall packages (the %%bash magic must be the first line of the cell)
pip install -r requirements.txt --upgrade

#### Permission Errors

In [None]:
%%bash
# Fix script permissions
chmod +x scripts/*.sh
chmod 755 scripts/*.py
echo "Permissions fixed"

### 7.2 System Cleanup

In [None]:
%%bash
# Run cleanup script
bash scripts/cleanup.sh

### 7.3 Reset Configuration

In [None]:
# Back up the current config before restoring defaults by hand
import shutil
from datetime import datetime

config_file = 'configs/project_config.yaml'
backup_file = f'configs/project_config.backup.{datetime.now().strftime("%Y%m%d_%H%M%S")}.yaml'

# Create backup
shutil.copy(config_file, backup_file)
print(f"Configuration backed up to: {backup_file}")
print("\nTo restore defaults, copy from configs/project_config_template.yaml")
print("(if template exists)")
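
The backup and the restore can be combined into one step. The sketch below backs up the current config using the same timestamped naming as the cell above, then overwrites it with the template, and does nothing if the template is absent. The function name `reset_config` is hypothetical, and the existence of `configs/project_config_template.yaml` is not guaranteed.

```python
import shutil
from datetime import datetime
from pathlib import Path


def reset_config(config_path, template_path, now=None):
    """Back up config_path, then overwrite it with template_path.

    Returns the backup path, or None if the template does not exist
    (in which case nothing is changed).
    """
    config_path, template_path = Path(config_path), Path(template_path)
    if not template_path.exists():
        return None
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # e.g. project_config.yaml -> project_config.backup.20251025_120000.yaml
    backup = config_path.with_name(
        f"{config_path.stem}.backup.{stamp}{config_path.suffix}"
    )
    shutil.copy(config_path, backup)
    shutil.copy(template_path, config_path)
    return backup
```

A call might look like `reset_config('configs/project_config.yaml', 'configs/project_config_template.yaml')`.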

---

## Quick Reference

### Important Commands

```bash
# Activate environment
conda activate dsp

# Single collection
python scripts/collect_and_render.py --once

# Start continuous collection
bash scripts/start_collection.sh

# Check schedule
python scripts/collect_and_render.py --print-schedule

# Cleanup
bash scripts/cleanup.sh

# View logs
tail -f logs/service.log
```

### Configuration Files

- **Main config**: `configs/project_config.yaml`
- **Environment**: `.env`
- **Schema**: `configs/nodes_schema_v2.json`

### Data Locations

- **Raw data**: `data/node/`
- **Processed**: `data/processed/`
- **Database**: `traffic_history.db`
- **Models**: `models/`

### Documentation

- **Deployment**: `DEPLOY.md`
- **Reference**: `doc/reference/`
- **API cost analysis**: `doc/reference/GOOGLE_API_COST_ANALYSIS.md`
