# Capstone Project – Dynamic Pricing for Urban Parking

This project simulates real-time dynamic pricing for 14 smart parking lots in an urban area.  
Pricing is computed using two models:

- **Model 1:** Baseline pricing based on occupancy
- **Model 2:** Demand-based pricing using occupancy, queue length, nearby traffic, vehicle type, and special-day status

Streaming is powered by **Pathway**, and interactive visualization is done using **Bokeh**.


In [None]:
!pip install pathway




## Dataset Overview

The dataset contains live sensor updates per parking lot with the following fields:

- `SystemCodeNumber`: Unique ID for each parking lot
- `Capacity`, `Occupancy`, `QueueLength`: Real-time usage data
- `VehicleType`: Type of incoming vehicle (car/bike/truck)
- `TrafficConditionNearby`: Low, medium, or high traffic
- `IsSpecialDay`: Whether it's a holiday or event
- `LastUpdatedDate`, `LastUpdatedTime`: Date (day-first, `DD-MM-YYYY`) and time (`HH:MM:SS`) of the reading
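
As a minimal sketch of how one record maps onto these fields, the snippet below parses the first sample record with Python's standard `csv` module and combines the date and time columns into a single timestamp. Only a subset of the dataset's columns is included, and the `parse_record` helper is hypothetical (it is not part of the project code). The day-first format string matches the one used for timestamp parsing later in this notebook.

```python
import csv
import io
from datetime import datetime

# Header and first record of the dataset (subset of columns, for illustration).
SAMPLE = """SystemCodeNumber,Capacity,Occupancy,VehicleType,TrafficConditionNearby,QueueLength,IsSpecialDay,LastUpdatedDate,LastUpdatedTime
BHMBCCMKT01,577,61,car,low,1,0,04-10-2016,07:59:00
"""


def parse_record(raw):
    """Convert one CSV row (a dict of strings) into typed values."""
    return {
        "lot_id": raw["SystemCodeNumber"],
        "capacity": int(raw["Capacity"]),
        "occupancy": int(raw["Occupancy"]),
        "queue_length": int(raw["QueueLength"]),
        "vehicle_type": raw["VehicleType"],
        "traffic": raw["TrafficConditionNearby"],
        "is_special_day": raw["IsSpecialDay"] == "1",
        # Dates are day-first: 04-10-2016 is 4 October 2016.
        "timestamp": datetime.strptime(
            f"{raw['LastUpdatedDate']} {raw['LastUpdatedTime']}",
            "%d-%m-%Y %H:%M:%S",
        ),
    }


record = parse_record(next(csv.DictReader(io.StringIO(SAMPLE))))
print(record["lot_id"], record["timestamp"], record["occupancy"] / record["capacity"])
```

Reading the date month-first instead would silently turn 4 October into 10 April, so it is worth keeping the format string explicit wherever these columns are parsed.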


In [None]:
import pandas as pd

# Upload dataset.csv to Colab environment
from google.colab import files
uploaded = files.upload()

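# Load dataset. Note: if dataset.csv already exists, Colab saves the new upload
# under a different name (e.g. "dataset (3).csv") and this line reads the older copy.
df = pd.read_csv("dataset.csv")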
df.head()


Saving dataset.csv to dataset (3).csv


Unnamed: 0,ID,SystemCodeNumber,Capacity,Latitude,Longitude,Occupancy,VehicleType,TrafficConditionNearby,QueueLength,IsSpecialDay,LastUpdatedDate,LastUpdatedTime
0,0,BHMBCCMKT01,577,26.144536,91.736172,61,car,low,1,0,04-10-2016,07:59:00
1,1,BHMBCCMKT01,577,26.144536,91.736172,64,car,low,1,0,04-10-2016,08:25:00
2,2,BHMBCCMKT01,577,26.144536,91.736172,80,car,low,2,0,04-10-2016,08:59:00
3,3,BHMBCCMKT01,577,26.144536,91.736172,107,car,low,2,0,04-10-2016,09:32:00
4,4,BHMBCCMKT01,577,26.144536,91.736172,150,bike,low,2,0,04-10-2016,09:59:00


In [None]:
import pathway as pw

# Define the streaming schema
class ParkingStreamSchema(pw.Schema):
    SystemCodeNumber: str
    Capacity: float
    Occupancy: float
    QueueLength: float
    VehicleType: str
    TrafficConditionNearby: str
    IsSpecialDay: int
    LastUpdatedDate: str
    LastUpdatedTime: str


In [None]:
# -------------------------------
# 🔧 Interactive PRICING MODEL FLAG
# -------------------------------

valid_models = ["baseline", "demand"]
PRICING_MODEL = input("Select pricing model (baseline / demand): ").strip().lower()

if PRICING_MODEL not in valid_models:
    print("⚠️ Invalid input. Defaulting to 'baseline'.")
    PRICING_MODEL = "baseline"

print(f"✅ Currently using pricing model: {PRICING_MODEL.upper()}")



Select pricing model (baseline / demand): baseline
✅ Currently using pricing model: BASELINE


In [None]:
from datetime import datetime

def get_timestamp(row):
    """
    Combine LastUpdatedDate and LastUpdatedTime into a datetime object.
    """
    try:
        return datetime.strptime(
            f"{row['LastUpdatedDate']} {row['LastUpdatedTime']}",
            "%d-%m-%Y %H:%M:%S"
        )
    except (KeyError, TypeError, ValueError):
        return None  # handle bad data gracefully


In [None]:
import time

# Let's check how many unique parking lots
lot_ids = df['SystemCodeNumber'].unique()
print(f"Total unique lots: {len(lot_ids)}")

df['Timestamp'] = pd.to_datetime(df['LastUpdatedDate'] + ' ' + df['LastUpdatedTime'],
                                  format='%d-%m-%Y %H:%M:%S')

# Sort chronologically so rows are replayed in time order
df = df.sort_values(by='Timestamp')

# Reset index for clean iteration
df = df.reset_index(drop=True)


Total unique lots: 14


## Model 1: Baseline Linear Pricing

We use a simple linear formula where price increases with occupancy:

\[
\text{Price}_{t+1} = \text{Price}_t + \alpha \cdot \left( \frac{\text{Occupancy}}{\text{Capacity}} \right)
\]

- Base price = \$10
- α = 2.0
- Price is clamped between \$5 and \$20
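
To make the update rule concrete, here is a minimal sketch that applies it to the first three occupancy readings of lot `BHMBCCMKT01` (capacity 577) from the sample rows shown earlier. The `next_price` helper and the `MIN_PRICE`/`MAX_PRICE` names are illustrative assumptions, not the notebook's own function.

```python
BASE_PRICE = 10.0
ALPHA = 2.0
MIN_PRICE, MAX_PRICE = 5.0, 20.0


def next_price(prev_price, occupancy, capacity):
    """One baseline update step, clamped to [MIN_PRICE, MAX_PRICE]."""
    ratio = occupancy / capacity if capacity > 0 else 0.0
    return max(MIN_PRICE, min(MAX_PRICE, prev_price + ALPHA * ratio))


# Occupancy of lot BHMBCCMKT01 (capacity 577) in the first three sample rows.
price = BASE_PRICE
trace = []
for occupancy in [61, 64, 80]:
    price = next_price(price, occupancy, 577)
    trace.append(round(price, 2))

print(trace)  # [10.21, 10.43, 10.71]
```

The first step adds 2.0 × 61/577 ≈ 0.21 to the \$10 base price. Each later step adds to the previous price, so the price accumulates over time rather than tracking the current occupancy directly.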


In [None]:
BASE_PRICE = 10.0
ALPHA = 2.0
previous_prices = {}

def baseline_pricing(row):
    """
    Model 1: Baseline Linear Pricing
    Increases price linearly with occupancy ratio.

    Price(t+1) = Price(t) + ALPHA * (Occupancy / Capacity)

    Args:
        row (pd.Series): One row of the dataset

    Returns:
        float: Updated price
    """
    lot_id = row['SystemCodeNumber']

    # Safe type conversion with fallback
    try:
        occupancy = float(row['Occupancy'])
        capacity = float(row['Capacity'])
    except (KeyError, TypeError, ValueError):
        return BASE_PRICE  # fallback to base price if data is bad

    prev_price = previous_prices.get(lot_id, BASE_PRICE)

    # Avoid divide-by-zero
    occupancy_ratio = occupancy / capacity if capacity > 0 else 0

    new_price = prev_price + ALPHA * occupancy_ratio

    # Keep price within realistic bounds
    new_price = max(5.0, min(20.0, new_price))

    previous_prices[lot_id] = new_price

    return round(new_price, 2)


## Model 2: Demand-Based Pricing

This model adjusts pricing using multiple features to reflect actual demand.

**Formula:**

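\[
\text{Demand} = \alpha \cdot \frac{\text{Occupancy}}{\text{Capacity}} + \beta \cdot \text{QueueLength} - \gamma \cdot \text{Traffic} + \delta \cdot \text{IsSpecialDay} + \epsilon \cdot \text{VehicleTypeWeight}
\]

\[
\text{NormalizedDemand} = \min\left(\max\left(\frac{\text{Demand}}{10},\, 0\right),\, 1\right)
\]

\[
\text{Price} = \text{BasePrice} \cdot \left(1 + \lambda \cdot \text{NormalizedDemand} \right)
\]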

**Parameters:**
- α = 2.0  
- β = 0.5  
- γ = 1.0  
- δ = 2.0  
- ε = 1.5  
- λ = 0.2
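
As a worked example, the sketch below evaluates the formula step by step for the first sample row (61 of 577 spaces occupied, queue length 1, low traffic, a car, not a special day). The categorical encodings and the division by 10 in the normalization step are taken from this notebook's code cells. The constant names are chosen here for readability.

```python
# Coefficients from the parameter list above.
ALPHA, BETA, GAMMA, DELTA, EPSILON = 2.0, 0.5, 1.0, 2.0, 1.5
LAMBDA = 0.2
BASE_PRICE = 10.0

# Categorical encodings used by this notebook's helper functions.
TRAFFIC_SCORE = {"low": 1, "medium": 2, "high": 3}
VEHICLE_WEIGHT = {"car": 1.0, "bike": 0.5, "truck": 1.5}

# First sample row of the dataset.
occupancy, capacity, queue, special_day = 61, 577, 1, 0
traffic = TRAFFIC_SCORE["low"]
vehicle = VEHICLE_WEIGHT["car"]

demand = (
    ALPHA * occupancy / capacity
    + BETA * queue
    - GAMMA * traffic
    + DELTA * special_day
    + EPSILON * vehicle
)
# Scale by 10 and clip to [0, 1], as in the notebook's implementation.
normalized = min(max(demand / 10.0, 0.0), 1.0)
price = BASE_PRICE * (1 + LAMBDA * normalized)

print(round(demand, 3), round(normalized, 3), round(price, 2))  # 1.211 0.121 10.24
```

Because the normalized demand lies in [0, 1] and λ = 0.2, this model can only produce prices between \$10 and \$12, so the \$5 to \$20 clamp applied in the code never takes effect here.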


In [None]:
BASE_PRICE = 10.0
LAMBDA = 0.2

# Demand-Based Pricing (Model 2)
# Requires traffic_score() and vehicle_type_weight(), which are defined in a
# later cell of this notebook; run that cell first.

def demand_based_pricing(row):
    occupancy = float(row['Occupancy'])
    capacity = float(row['Capacity'])
    queue = float(row['QueueLength'])
    is_special_day = int(row['IsSpecialDay'])
    traffic = traffic_score(row['TrafficConditionNearby'])
    vehicle_weight = vehicle_type_weight(row['VehicleType'])

    # Coefficients for demand components
    α, β, γ, δ, ε = 2.0, 0.5, 1.0, 2.0, 1.5

    occupancy_ratio = occupancy / capacity if capacity else 0

    # Calculate raw demand score
    demand = (
        α * occupancy_ratio +
        β * queue -
        γ * traffic +
        δ * is_special_day +
        ε * vehicle_weight
    )

    # Normalize demand score to [0, 1]
    normalized_demand = min(max(demand / 10.0, 0), 1)

    # Final price based on normalized demand
    price = BASE_PRICE * (1 + LAMBDA * normalized_demand)

    return round(min(max(price, 5.0), 20.0), 2)


In [None]:
@pw.udf
def pricing_udf(
    model: str,
    lot_id: str,
    capacity: float,
    occupancy: float,
    queue: float,
    vehicle_type: str,
    traffic: str,
    is_special: int
) -> float:
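    # Baseline model (stateless: computed from the base price on every row,
    # not accumulated from the previous price as in baseline_pricing)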
    if model == "baseline":
        occ_ratio = occupancy / capacity if capacity else 0
        price = 10 + 2 * occ_ratio
        return round(min(max(price, 5.0), 20.0), 2)

    # Demand-based model
    elif model == "demand":
        traffic_map = {'low': 1, 'medium': 2, 'high': 3}
        vehicle_weight = {'car': 1.0, 'bike': 0.5, 'truck': 1.5}

        t_score = traffic_map.get(traffic.lower(), 1)
        v_weight = vehicle_weight.get(vehicle_type.lower(), 1.0)

        α, β, γ, δ, ε = 2.0, 0.5, 1.0, 2.0, 1.5
        occ_ratio = occupancy / capacity if capacity else 0

        demand = α * occ_ratio + β * queue - γ * t_score + δ * is_special + ε * v_weight
        norm_demand = min(max(demand / 10.0, 0), 1)

        price = 10 * (1 + 0.2 * norm_demand)
        return round(min(max(price, 5.0), 20.0), 2)

    return 10.0  # fallback


In [None]:
# Model selector wrapper
def compute_price(row):
    if PRICING_MODEL == "baseline":
        return baseline_pricing(row)
    elif PRICING_MODEL == "demand":
        return demand_based_pricing(row)
    else:
        raise ValueError("Invalid PRICING_MODEL flag. Use 'baseline' or 'demand'.")


In [None]:
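# price_history and time_history are created in a later cell (the defaultdict
# setup); run that cell before this loop.
for i in range(100):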
    row = df.iloc[i]
    lot_id = row['SystemCodeNumber']

    # Use real timestamp
    timestamp = get_timestamp(row)
    if timestamp is None:
        continue

    price = compute_price(row)


    # Save for visualization
    price_history[lot_id].append(price)
    time_history[lot_id].append(timestamp)

    print(f"[{timestamp}] Lot {lot_id} → ${price:.2f}")

    time.sleep(0.1)


[2016-10-04 07:59:00] Lot BHMBCCMKT01 → $10.21
[2016-10-04 07:59:00] Lot BHMNCPHST01 → $10.39
[2016-10-04 07:59:00] Lot BHMMBMMBX01 → $10.77
[2016-10-04 07:59:00] Lot BHMNCPNST01 → $11.03
[2016-10-04 07:59:00] Lot Shopping → $10.64
[2016-10-04 07:59:00] Lot BHMEURBRD01 → $10.50
[2016-10-04 07:59:00] Lot Broad Street → $10.52
[2016-10-04 07:59:00] Lot Others-CCCPS8 → $10.67
[2016-10-04 07:59:00] Lot Others-CCCPS105a → $10.71
[2016-10-04 07:59:00] Lot Others-CCCPS119a → $10.14
[2016-10-04 07:59:00] Lot BHMBCCTHL01 → $10.62
[2016-10-04 07:59:00] Lot Others-CCCPS135a → $10.56
[2016-10-04 07:59:00] Lot Others-CCCPS202 → $10.37
[2016-10-04 07:59:00] Lot Others-CCCPS98 → $10.38
[2016-10-04 08:25:00] Lot Others-CCCPS8 → $11.40
[2016-10-04 08:25:00] Lot BHMNCPNST01 → $12.13
[2016-10-04 08:25:00] Lot Others-CCCPS105a → $11.48
[2016-10-04 08:25:00] Lot Others-CCCPS202 → $10.83
[2016-10-04 08:25:00] Lot Others-CCCPS135a → $11.31
[2016-10-04 08:25:00] Lot BHMBCCTHL01 → $11.29
[2016-10-04 08:25:00] 

In [None]:
!pip install bokeh



In [None]:
from collections import defaultdict

# Store price history per lot
price_history = defaultdict(list)
time_history = defaultdict(list)


In [None]:
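# Note: previous_prices keeps its state between runs, so prices here may continue
# from an earlier loop. Call previous_prices.clear() first for a fresh start.
for i in range(100):
    row = df.iloc[i]
    lot_id = row['SystemCodeNumber']
    time_stamp = row['Timestamp']  # parsed datetime column built earlier

    price = baseline_pricing(row)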

    # Save price and timestamp for plotting
    price_history[lot_id].append(price)
    time_history[lot_id].append(time_stamp)

    print(f"[{time_stamp}] Lot {lot_id} → ${price:.2f}")
    time.sleep(0.1)


[2016-10-04 07:59:00] Lot BHMBCCMKT01 → $13.19
[2016-10-04 07:59:00] Lot BHMNCPHST01 → $17.67
[2016-10-04 07:59:00] Lot BHMMBMMBX01 → $19.25
[2016-10-04 07:59:00] Lot BHMNCPNST01 → $19.87
[2016-10-04 07:59:00] Lot Shopping → $19.86
[2016-10-04 07:59:00] Lot BHMEURBRD01 → $19.84
[2016-10-04 07:59:00] Lot Broad Street → $20.00
[2016-10-04 07:59:00] Lot Others-CCCPS8 → $17.02
[2016-10-04 07:59:00] Lot Others-CCCPS105a → $17.78
[2016-10-04 07:59:00] Lot Others-CCCPS119a → $12.10
[2016-10-04 07:59:00] Lot BHMBCCTHL01 → $17.94
[2016-10-04 07:59:00] Lot Others-CCCPS135a → $18.83
[2016-10-04 07:59:00] Lot Others-CCCPS202 → $15.05
[2016-10-04 07:59:00] Lot Others-CCCPS98 → $14.50
[2016-10-04 08:25:00] Lot Others-CCCPS8 → $17.74
[2016-10-04 08:25:00] Lot BHMNCPNST01 → $20.00
[2016-10-04 08:25:00] Lot Others-CCCPS105a → $18.56
[2016-10-04 08:25:00] Lot Others-CCCPS202 → $15.50
[2016-10-04 08:25:00] Lot Others-CCCPS135a → $19.59
[2016-10-04 08:25:00] Lot BHMBCCTHL01 → $18.60
[2016-10-04 08:25:00] 

## Real-Time Pricing Trends

The following Bokeh plot shows price variation over time for selected parking lots.

In the plot, prices change in small steps and stay within the \$5 to \$20 bounds.
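
The bounded behaviour can also be checked numerically by iterating the baseline update rule. The sketch below is illustrative only: the constant 50% occupancy ratio and the 12 steps are made-up inputs, not values from the dataset, and `simulate` is a hypothetical helper. Since the occupancy ratio is never negative, each step can only raise the price until it reaches the cap.

```python
ALPHA = 2.0
MIN_PRICE, MAX_PRICE = 5.0, 20.0


def simulate(start_price, occupancy_ratio, steps):
    """Apply the baseline update repeatedly and return the price after each step."""
    prices = []
    price = start_price
    for _ in range(steps):
        price = max(MIN_PRICE, min(MAX_PRICE, price + ALPHA * occupancy_ratio))
        prices.append(price)
    return prices


prices = simulate(start_price=10.0, occupancy_ratio=0.5, steps=12)
print(prices)
```

With these inputs the price climbs by \$1 per update and stays at \$20 from the tenth update onward, which suggests that the baseline model, left running, saturates at the upper bound for any lot that is not empty.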


In [None]:
from bokeh.plotting import figure, show, output_notebook
output_notebook()

# Create Bokeh figure with real time on X-axis
p = figure(
    title="Price Trends per Parking Lot",
    x_axis_label='Timestamp',
    y_axis_label='Price ($)',
    x_axis_type='datetime',  # IMPORTANT: makes the X-axis show time correctly
    width=900,
    height=450
)

# Plot up to 50 lots (this dataset has 14, so all are drawn)
for lot_id in list(price_history.keys())[:50]:
    x_vals = time_history[lot_id]  # ⬅️ Real timestamps here
    y_vals = price_history[lot_id]
    p.line(x_vals, y_vals, line_width=2, legend_label=str(lot_id))

# Legend settings
p.legend.title = "Lot IDs"
p.legend.click_policy = "hide"

# Show the plot
show(p)


In [None]:
def traffic_score(level):
    return {'low': 1, 'medium': 2, 'high': 3}.get(str(level).lower(), 1)

def vehicle_type_weight(vtype):
    return {'car': 1.0, 'bike': 0.5, 'truck': 1.5}.get(str(vtype).lower(), 1.0)


In [None]:
import pandas as pd
from bokeh.plotting import figure, show, output_notebook

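# Load Pathway output (written by the Pathway pipeline cell at the end of this notebook)
df_out = pd.read_csv("pathway_output.csv")
output_notebook()

# Convert timestamp to datetime; the strings are day-first (DD-MM-YYYY HH:MM:SS)
df_out['timestamp'] = pd.to_datetime(df_out['timestamp'], format='%d-%m-%Y %H:%M:%S')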

# Plot sample lots
p = figure(title="Pathway Pricing Output", x_axis_type="datetime", width=900, height=450)
for lot in df_out['lot_id'].unique()[:5]:  # first 5 lots only
    df_lot = df_out[df_out['lot_id'] == lot]
    p.line(df_lot['timestamp'], df_lot['price'], legend_label=lot, line_width=2)

p.legend.title = "Lot IDs"
p.legend.click_policy = "hide"
show(p)




## Visualization of Pricing Behavior

We visualize how real-time prices respond to:
- Occupancy level
- Traffic congestion
- Queue length
- Vehicle type and special days

We also compare multiple lots to simulate competitive pricing.
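
Before plotting, a small sweep over the demand model's inputs gives a sense of these responses. The sketch below reimplements the Model 2 formula with this notebook's coefficients. The `demand_price` helper and the chosen input values (a half-full lot with a queue of 2) are hypothetical and for illustration only.

```python
TRAFFIC_SCORE = {"low": 1, "medium": 2, "high": 3}
VEHICLE_WEIGHT = {"car": 1.0, "bike": 0.5, "truck": 1.5}


def demand_price(occ_ratio, queue, traffic, vehicle, special_day=0):
    """Model 2 price for one observation (coefficients as in this notebook)."""
    demand = (
        2.0 * occ_ratio
        + 0.5 * queue
        - 1.0 * TRAFFIC_SCORE[traffic]
        + 2.0 * special_day
        + 1.5 * VEHICLE_WEIGHT[vehicle]
    )
    normalized = min(max(demand / 10.0, 0.0), 1.0)
    return round(10.0 * (1 + 0.2 * normalized), 2)


# Vary one input at a time around a half-full lot with a queue of 2.
for traffic in ("low", "medium", "high"):
    print("traffic", traffic, demand_price(0.5, 2, traffic, "car"))
for vehicle in ("bike", "car", "truck"):
    print("vehicle", vehicle, demand_price(0.5, 2, "low", vehicle))
```

Because the traffic term is subtracted in the formula, heavier nearby traffic lowers the price in this sweep, while a heavier vehicle type or a special day raises it.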


In [None]:
# 👇 Set model you want to test: "baseline" or "demand"
model_choice = "demand"

# Set up the input stream from dataset.csv
input_table = pw.io.csv.read(
    "dataset.csv",
    schema=ParkingStreamSchema,
    mode="streaming",  # simulates real-time
    autocommit_duration_ms=1000,
)

# Apply pricing logic per row
result = input_table.select(
    lot_id = input_table.SystemCodeNumber,
    timestamp = input_table.LastUpdatedDate + " " + input_table.LastUpdatedTime,
    price = pricing_udf(
        model_choice,
        input_table.SystemCodeNumber,
        input_table.Capacity,
        input_table.Occupancy,
        input_table.QueueLength,
        input_table.VehicleType,
        input_table.TrafficConditionNearby,
        input_table.IsSpecialDay,
    )
)

# Write the output stream to a CSV file
pw.io.csv.write(result, filename="pathway_output.csv")

# Run the Pathway pipeline (in streaming mode this call blocks until interrupted)
pw.run()
