<a href="https://colab.research.google.com/github/tejaswinimoguram1206/Capstone-project-SA2025/blob/main/Capstoneproject.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
##Dynamic Pricing for Urban Parking Lots - Capstone Project
##Summer Analytics 2025

In [2]:
#1. Setup and Data Loading
import pandas as pd
import numpy as np
import datetime
import math # For Haversine distance calculation

# Bokeh for real-time visualization
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, DatetimeTickFormatter, NumeralTickFormatter
from bokeh.io import push_notebook, output_notebook
output_notebook() # Enable Bokeh to display plots in the notebook


In [3]:
df = pd.read_csv('dataset.csv')

In [5]:
df.shape

(18368, 12)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 18368 entries, 0 to 18367
Data columns (total 12 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   ID                      18368 non-null  int64  
 1   SystemCodeNumber        18368 non-null  object 
 2   Capacity                18368 non-null  int64  
 3   Latitude                18368 non-null  float64
 4   Longitude               18368 non-null  float64
 5   Occupancy               18368 non-null  int64  
 6   VehicleType             18368 non-null  object 
 7   TrafficConditionNearby  18368 non-null  object 
 8   QueueLength             18368 non-null  int64  
 9   IsSpecialDay            18368 non-null  int64  
 10  LastUpdatedDate         18368 non-null  object 
 11  LastUpdatedTime         18368 non-null  object 
dtypes: float64(2), int64(5), object(5)
memory usage: 1.7+ MB


In [7]:
df.head()

Unnamed: 0,ID,SystemCodeNumber,Capacity,Latitude,Longitude,Occupancy,VehicleType,TrafficConditionNearby,QueueLength,IsSpecialDay,LastUpdatedDate,LastUpdatedTime
0,0,BHMBCCMKT01,577,26.144536,91.736172,61,car,low,1,0,04-10-2016,07:59:00
1,1,BHMBCCMKT01,577,26.144536,91.736172,64,car,low,1,0,04-10-2016,08:25:00
2,2,BHMBCCMKT01,577,26.144536,91.736172,80,car,low,2,0,04-10-2016,08:59:00
3,3,BHMBCCMKT01,577,26.144536,91.736172,107,car,low,2,0,04-10-2016,09:32:00
4,4,BHMBCCMKT01,577,26.144536,91.736172,150,bike,low,2,0,04-10-2016,09:59:00


In [9]:
#2. Data Preprocessing and Feature Engineering

df['DateTime'] = pd.to_datetime(df['LastUpdatedDate'] + ' ' + df['LastUpdatedTime'], format='%d-%m-%Y %H:%M:%S')
df = df.sort_values(by=['SystemCodeNumber', 'DateTime']).reset_index(drop=True)


In [10]:
# Extract time-based features
df['HourOfDay'] = df['DateTime'].dt.hour
df['DayOfWeek'] = df['DateTime'].dt.dayofweek # Monday=0, Sunday=6
df['MinuteOfDay'] = df['DateTime'].dt.hour * 60 + df['DateTime'].dt.minute
df['IsWeekend'] = df['DayOfWeek'].apply(lambda x: 1 if x >= 5 else 0)
print("Extracted time-based features ('HourOfDay', 'DayOfWeek', 'MinuteOfDay', 'IsWeekend').")

Extracted time-based features ('HourOfDay', 'DayOfWeek', 'MinuteOfDay', 'IsWeekend').


In [11]:
def get_time_of_day_category(hour):
    if 8 <= hour < 10:
        return 'MorningPeak' # E.g., Commute peak
    elif 10 <= hour < 12:
        return 'MidMorning'
    elif 12 <= hour < 14:
        return 'LunchPeak' # E.g., Lunchtime rush
    elif 14 <= hour < 17: # Data up to 4:30 PM (16:30)
        return 'Afternoon'
    else:
        return 'OffPeak'

In [12]:
df['TimeOfDayCategory'] = df['HourOfDay'].apply(get_time_of_day_category)
print("Created 'TimeOfDayCategory' feature.")

Created 'TimeOfDayCategory' feature.


In [13]:
df = pd.get_dummies(df, columns=['VehicleType', 'TrafficConditionNearby', 'TimeOfDayCategory'], drop_first=True)
print("One-hot encoded 'VehicleType', 'TrafficConditionNearby', and 'TimeOfDayCategory'.")


One-hot encoded 'VehicleType', 'TrafficConditionNearby', and 'TimeOfDayCategory'.


In [18]:
df['OccupancyRate'] = df['Occupancy'] / df['Capacity']
df.fillna({'OccupancyRate':0}, inplace=True)
df['OccupancyRate'] = df['OccupancyRate'].replace([np.inf, -np.inf], 0)
df['OccupancyRate'] = np.clip(df['OccupancyRate'], 0, 1)

In [19]:
df['QueueLengthRate'] = df['QueueLength'] / df['Capacity']
df.fillna({'QueueLengthRate': 0} , inplace=True)
df['QueueLengthRate'] = df['QueueLengthRate'].replace([np.inf, -np.inf], 0)
print("Calculated 'QueueLengthRate'.")

Calculated 'QueueLengthRate'.


In [20]:
df['DemandProxy'] = df['OccupancyRate'] + df['QueueLengthRate']
df['DemandProxy'] = np.clip(df['DemandProxy'], 0, 2)
print("Calculated 'DemandProxy' (OccupancyRate + QueueLengthRate).")

Calculated 'DemandProxy' (OccupancyRate + QueueLengthRate).


In [21]:
def haversine_distance(lat1, lon1, lat2, lon2):
    R = 6371 # Earth radius in kilometers
    lat1_rad = math.radians(lat1)
    lon1_rad = math.radians(lon1)
    lat2_rad = math.radians(lat2)
    lon2_rad = math.radians(lon2)

    dlon = lon2_rad - lon1_rad
    dlat = lat2_rad - lat1_rad

    a = math.sin(dlat / 2)**2 + math.cos(lat1_rad) * math.cos(lat2_rad) * math.sin(dlon / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    return distance

In [22]:
parking_space_coords = df[['SystemCodeNumber', 'Latitude', 'Longitude']].drop_duplicates().set_index('SystemCodeNumber')


In [24]:
PROXIMITY_THRESHOLD_KM = 1.5

In [25]:
nearby_spaces_map = {}
for sys_code1, coords1 in parking_space_coords.iterrows():
    nearby_list = []
    for sys_code2, coords2 in parking_space_coords.iterrows():
        if sys_code1 != sys_code2:
            dist = haversine_distance(coords1['Latitude'], coords1['Longitude'],
                                      coords2['Latitude'], coords2['Longitude'])
            if dist <= PROXIMITY_THRESHOLD_KM:
                nearby_list.append(sys_code2)
    nearby_spaces_map[sys_code1] = nearby_list

df['NearbyCompetitionPressure'] = 0.0

In [26]:
hourly_avg_demand = df.groupby(['SystemCodeNumber', 'DateTime'])['DemandProxy'].mean().reset_index()
hourly_avg_demand.rename(columns={'DemandProxy': 'HourlyAvgDemandProxy'}, inplace=True)

In [27]:
temp_nearby_pressure = []

for dt_slice in df['DateTime'].unique():
    current_time_slice = df[df['DateTime'] == dt_slice].copy()

    for idx, row in current_time_slice.iterrows():
        current_sys_code = row['SystemCodeNumber']


        nearby_competitors_ids = nearby_spaces_map.get(current_sys_code, [])

        nearby_demands = current_time_slice[
            current_time_slice['SystemCodeNumber'].isin(nearby_competitors_ids)
        ]['DemandProxy']

        if not nearby_demands.empty:
            avg_nearby_demand = nearby_demands.mean()
        else:
            avg_nearby_demand = 0.0

        temp_nearby_pressure.append({'ID': row['ID'], 'NearbyCompetitionPressure': avg_nearby_demand})

In [28]:
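nearby_pressure_df = pd.DataFrame(temp_nearby_pressure)
# df already holds the 0.0 placeholder 'NearbyCompetitionPressure', so the merge suffixes the
# clashing columns: '_x' keeps the placeholder and '_y' holds the computed values used below.
df = df.merge(nearby_pressure_df, on='ID', how='left')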

In [70]:
class CustomDynamicPricingModel:
    def __init__(self, base_price=5.0, demand_proxy_coeff=10.0,
                 competition_coeff=5.0, traffic_high_coeff=2.0,
                 special_day_coeff=3.0, morning_peak_coeff=1.5,
                 lunch_peak_coeff=2.0, max_price=30.0):

        self.base_price = base_price
        self.demand_proxy_coeff = demand_proxy_coeff
        self.competition_coeff = competition_coeff
        self.traffic_high_coeff = traffic_high_coeff
        self.special_day_coeff = special_day_coeff
        self.morning_peak_coeff = morning_peak_coeff
        self.lunch_peak_coeff = lunch_peak_coeff
        self.max_price = max_price

        self.coefficients = {
            'Base_Price': self.base_price,
            'Demand_Proxy_Effect': self.demand_proxy_coeff,
            'Competition_Effect': self.competition_coeff,
            'Traffic_High_Effect': self.traffic_high_coeff,
            'Special_Day_Effect': self.special_day_coeff,
            'Morning_Peak_Effect': self.morning_peak_coeff,
            'Lunch_Peak_Effect': self.lunch_peak_coeff
        }

    def predict_price(self, features):
        # Initialize current_price with the base price
        current_price = self.base_price

        # Additive effects from demand and competitive pressure
        current_price = current_price + self.demand_proxy_coeff * features['DemandProxy']
        current_price = current_price + self.competition_coeff * features['NearbyCompetitionPressure_y'] # '_y' is the computed pressure; '_x' is the merge's 0.0 placeholder

        # Additive effects from categorical conditions (one-hot encoded features)
        if 'TrafficConditionNearby_high' in features and features['TrafficConditionNearby_high'] == 1:
            current_price += self.traffic_high_coeff
        # You could add more specific traffic conditions here if needed.

        if 'IsSpecialDay' in features and features['IsSpecialDay'] == 1:
            current_price += self.special_day_coeff

        if 'TimeOfDayCategory_MorningPeak' in features and features['TimeOfDayCategory_MorningPeak'] == 1:
            current_price += self.morning_peak_coeff
        if 'TimeOfDayCategory_LunchPeak' in features and features['TimeOfDayCategory_LunchPeak'] == 1:
            current_price += self.lunch_peak_coeff


        # Clamp the price to the range [base_price, max_price]
        current_price = max(self.base_price, current_price) # Price should not drop below base
        current_price = min(self.max_price, current_price) # Price should not exceed max


        return current_price

In [73]:
pricing_model = CustomDynamicPricingModel()
print("Custom Dynamic Pricing Model initialized.")
print("Model Coefficients for reference:")
for key, value in pricing_model.coefficients.items():
    print(f"- {key}: {value}")

# Apply the pricing model to the entire preprocessed DataFrame
# This simulates the batch application of the model on processed features
df['PredictedPrice'] = df.apply(lambda row: pricing_model.predict_price(row), axis=1)
print("\nPredicted prices calculated for all data points.")
print(df[['DateTime', 'SystemCodeNumber', 'OccupancyRate', 'QueueLength', 'NearbyCompetitionPressure_y', 'PredictedPrice', 'IsSpecialDay']].head())

Custom Dynamic Pricing Model initialized.
Model Coefficients for reference:
- Base_Price: 5.0
- Demand_Proxy_Effect: 10.0
- Competition_Effect: 5.0
- Traffic_High_Effect: 2.0
- Special_Day_Effect: 3.0
- Morning_Peak_Effect: 1.5
- Lunch_Peak_Effect: 2.0

Predicted prices calculated for all data points.
             DateTime SystemCodeNumber  OccupancyRate  QueueLength  \
0 2016-10-04 07:59:00      BHMBCCMKT01       0.105719            1   
1 2016-10-04 08:25:00      BHMBCCMKT01       0.110919            1   
2 2016-10-04 08:59:00      BHMBCCMKT01       0.138648            2   
3 2016-10-04 09:32:00      BHMBCCMKT01       0.185442            2   
4 2016-10-04 09:59:00      BHMBCCMKT01       0.259965            2   

   NearbyCompetitionPressure_y  PredictedPrice  IsSpecialDay  
0                     0.273793        7.443489             0  
1                     0.331949        9.286263             0  
2                     0.419889       10.020589             0  
3                     0.

In [78]:
unique_parking_spaces = df['SystemCodeNumber'].unique()
# Plot at most 4 spaces to keep the output manageable; adjust as needed.
num_spaces_to_plot = min(4, len(unique_parking_spaces))
spaces_for_viz = unique_parking_spaces[:num_spaces_to_plot]

plots = []

for space_id in spaces_for_viz:
    space_df = df[df['SystemCodeNumber'] == space_id].sort_values('DateTime').copy()

    # Create ColumnDataSource for this specific parking space
    source = ColumnDataSource(space_df)

    # Create a new plot
    p = figure(x_axis_type="datetime",
               title=f"Dynamic Pricing & Demand for Parking Space: {space_id}",
               height=350, width=900,
               tools="pan,wheel_zoom,box_zoom,reset,save",
               x_axis_label="Time", y_axis_label="Value") # Generic Y-label, as it shows different metrics

    # Line for Predicted Price
    p.line(x='DateTime', y='PredictedPrice', source=source,
           line_width=2, color='blue', legend_label="Predicted Price ($)")

    # Line for Occupancy Rate (as a proxy for demand/utilization)
    p.line(x='DateTime', y='OccupancyRate', source=source,
           line_width=1.5, color='green', legend_label="Occupancy Rate (0-1)", line_dash='dashed')

    # Line for Nearby Competition Pressure
    # This visualizes a key input to the pricing model
    p.line(x='DateTime', y='NearbyCompetitionPressure_y', source=source,
           line_width=1.5, color='red', legend_label="Nearby Competition (Demand Proxy)", line_dash='dotted')

    # Format X-axis for datetime
    p.xaxis.formatter = DatetimeTickFormatter(
        hours="%H:%M",
        days="%d %b %H:%M",
        months="%d %b %Y",
        years="%Y"
    )
    p.xaxis.major_label_orientation = np.pi/4 # Rotate labels for readability

    p.legend.location = "top_left"
    p.legend.click_policy="hide" # Allow hiding/showing lines by clicking legend

    plots.append(p)

# Stack all plots vertically in a single layout
from bokeh.layouts import column
show(column(*plots))

# Project Report: Dynamic Pricing for Urban Parking Lots

## 1. Project Objective and Background
Urban parking spaces are a finite resource, and static pricing often leads to suboptimal utilization: lots become overcrowded at peak times and sit underused during off-peak hours. The primary goal of this capstone project is to develop an intelligent, data-driven dynamic pricing engine that improves the utilization of 14 urban parking spaces. The system simulates a real-time data stream and continuously adjusts parking prices in response to fluctuating demand, competitive dynamics, and environmental conditions.

## 2. Data Description and Preprocessing
The project uses a dataset of 73 days of operational data from 14 distinct urban parking spaces, sampled at 18 time points per day (8:00 AM to 4:30 PM at 30-minute intervals). Each record in `dataset.csv` contains:
* **Location Information:** `Latitude` and `Longitude` for each parking space, essential for calculating proximity to competitors.
* **Parking Lot Features:** `Capacity` (maximum vehicles), `Occupancy` (current vehicles), and `QueueLength` (vehicles waiting for entry).
* **Vehicle Information:** `VehicleType` (car, bike, or truck).
* **Environmental Conditions:** `TrafficConditionNearby` and `IsSpecialDay` (e.g., holidays, events).
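As a quick consistency check on these figures, the sketch below builds the nominal daily sampling grid and the resulting upper bound on the row count. The date used is arbitrary (taken from the first reading shown by `df.head()`), and the grid is an idealization: recorded timestamps such as 07:59 or 08:25 drift slightly off it.

```python
import pandas as pd

# Nominal daily grid from the description: 8:00 AM to 4:30 PM every 30 minutes.
# (The date is arbitrary; real timestamps drift slightly off this grid.)
schedule = pd.date_range("2016-10-04 08:00", "2016-10-04 16:30", freq="30min")
points_per_day = len(schedule)

# Upper bound on rows if all 14 lots reported at every slot for 73 days
n_lots, n_days = 14, 73
expected_rows = n_lots * n_days * points_per_day

print(points_per_day)  # 18
print(expected_rows)   # 18396
```

The bound of 18,396 rows is slightly above the 18,368 rows reported by `df.shape`, which suggests that a small number of readings are missing from the dataset.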

**Key Preprocessing Steps:**
* **DateTime Unification:** `LastUpdatedDate` and `LastUpdatedTime` columns were combined and converted into a single `DateTime` object for time-series analysis and sorting.
* **Temporal Feature Extraction:** `HourOfDay`, `DayOfWeek`, `MinuteOfDay`, and `IsWeekend` were derived to capture daily and weekly demand patterns. A `TimeOfDayCategory` feature buckets each hour, following common urban parking demand profiles, into MorningPeak (8:00-10:00), MidMorning (10:00-12:00), LunchPeak (12:00-14:00), Afternoon (14:00-17:00), and OffPeak (all other hours).
* **Categorical Encoding:** `VehicleType`, `TrafficConditionNearby`, and `TimeOfDayCategory` were one-hot encoded with `drop_first=True` (the first category of each is dropped as the reference level), turning categorical text into numeric indicator columns suitable for the pricing model.
* **Occupancy Rate Calculation:** `OccupancyRate` was computed as `Occupancy / Capacity`, a normalized measure of current utilization. Division-by-zero results were set to 0, and values were clipped to the range [0, 1].
* **Queue Length Rate:** `QueueLength` was normalized by `Capacity` to get `QueueLengthRate`, reflecting the intensity of waiting vehicles relative to the lot's size.
* **Demand Proxy Formulation:** A composite `DemandProxy` was engineered by summing `OccupancyRate` and `QueueLengthRate` and clipping the result to [0, 2]. It serves as a direct indicator of instantaneous demand pressure on a parking space.
* **Nearby Competition Pressure:** To quantify competitive influence, `NearbyCompetitionPressure` was calculated for each parking space at each timestamp as the average `DemandProxy` of all other parking spaces within 1.5 km (`PROXIMITY_THRESHOLD_KM`, with distances computed by the Haversine formula) that have a reading at that same timestamp; it is 0 when no such neighbor exists. This captures how crowded or in-demand neighboring facilities are, which feeds directly into the competitive pricing adjustment.
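The notebook computes the competition feature with nested `iterrows` loops, which scale poorly with the number of timestamps. The sketch below is a vectorized alternative, not the notebook's own implementation: it computes pairwise Haversine distances with NumPy broadcasting, pivots `DemandProxy` into a timestamp-by-lot table, and averages each lot's neighbors with a matrix product. `haversine_matrix` and `add_demand_features` are hypothetical helper names, the input is assumed to already have a parsed `DateTime` column, and, like the original, only readings that share an exact timestamp are compared.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0
PROXIMITY_THRESHOLD_KM = 1.5  # same radius as the notebook


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in km between points given in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # 2*arcsin(sqrt(a)) equals the 2*atan2(sqrt(a), sqrt(1 - a)) form used in the notebook
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def add_demand_features(df):
    """Hypothetical helper: add DemandProxy and NearbyCompetitionPressure columns."""
    occupancy_rate = (df["Occupancy"] / df["Capacity"]).clip(0, 1)
    queue_rate = df["QueueLength"] / df["Capacity"]
    df = df.assign(DemandProxy=(occupancy_rate + queue_rate).clip(0, 2))

    lots = df.drop_duplicates("SystemCodeNumber")
    lot_ids = lots["SystemCodeNumber"].tolist()
    dist = haversine_matrix(lots["Latitude"], lots["Longitude"])
    # neighbour[j, k] = 1.0 if lot k is within the radius of lot j (never the lot itself)
    neighbour = ((dist <= PROXIMITY_THRESHOLD_KM)
                 & ~np.eye(len(lot_ids), dtype=bool)).astype(float)

    # One row per timestamp, one column per lot; NaN where a lot has no reading
    wide = df.pivot_table(index="DateTime", columns="SystemCodeNumber",
                          values="DemandProxy").reindex(columns=lot_ids)
    values = wide.to_numpy(dtype=float)
    present = (~np.isnan(values)).astype(float)

    sums = np.nan_to_num(values) @ neighbour.T  # total neighbor demand per (time, lot)
    counts = present @ neighbour.T              # number of neighbors reporting
    pressure = np.divide(sums, counts, out=np.zeros_like(sums), where=counts > 0)

    long = (pd.DataFrame(pressure, index=wide.index, columns=lot_ids)
            .reset_index()
            .melt(id_vars="DateTime", var_name="SystemCodeNumber",
                  value_name="NearbyCompetitionPressure"))
    return df.merge(long, on=["DateTime", "SystemCodeNumber"], how="left")


# Toy data: lots A and B are about 0.7 km apart, C is several km away; B misses 08:30
toy = pd.DataFrame({
    "SystemCodeNumber": ["A", "B", "C", "A", "C"],
    "Latitude": [26.1445, 26.1500, 26.2000, 26.1445, 26.2000],
    "Longitude": [91.7362, 91.7400, 91.8000, 91.7362, 91.8000],
    "Capacity": [100, 100, 100, 100, 100],
    "Occupancy": [50, 30, 80, 60, 90],
    "QueueLength": [0, 10, 0, 5, 0],
    "DateTime": pd.to_datetime(["2016-10-04 08:00"] * 3 + ["2016-10-04 08:30"] * 2),
})
out = add_demand_features(toy)
print(out[["DateTime", "SystemCodeNumber", "DemandProxy", "NearbyCompetitionPressure"]])
```

In the toy run, A at 08:00 sees B's demand (0.4) and B sees A's (0.5), while C has no neighbors within 1.5 km and A at 08:30 has no neighbor reporting, so both fall back to 0, matching the notebook's rule.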

## 3. Dynamic Pricing Model (Custom Implementation)

The dynamic pricing logic is encapsulated in a custom Python class, `CustomDynamicPricingModel`. In line with the project's constraints, the model is built from scratch using only `numpy` and `pandas`. It is an interpretable, rule-based linear model: the final price is the base price plus additive adjustments driven by the input features, clamped to the range [`base_price`, `max_price`] (5.0 and 30.0 by default).
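To make the additive rule concrete, here is a standalone restatement of `predict_price` using the model's default coefficients. `rule_based_price` is a hypothetical helper name; unlike the class, it recomputes `DemandProxy` from raw counts and takes the competition pressure and the one-hot flags as plain arguments, which all default to off.

```python
# Default coefficients of CustomDynamicPricingModel
BASE_PRICE, MAX_PRICE = 5.0, 30.0
COEFFS = {"demand": 10.0, "competition": 5.0, "traffic_high": 2.0,
          "special_day": 3.0, "morning_peak": 1.5, "lunch_peak": 2.0}


def rule_based_price(occupancy, capacity, queue_length, competition_pressure,
                     traffic_high=False, special_day=False,
                     morning_peak=False, lunch_peak=False):
    """Hypothetical standalone version of the additive pricing rule."""
    occupancy_rate = min(max(occupancy / capacity, 0.0), 1.0)
    demand_proxy = min(max(occupancy_rate + queue_length / capacity, 0.0), 2.0)
    price = (BASE_PRICE
             + COEFFS["demand"] * demand_proxy
             + COEFFS["competition"] * competition_pressure
             + COEFFS["traffic_high"] * traffic_high  # booleans count as 0 or 1
             + COEFFS["special_day"] * special_day
             + COEFFS["morning_peak"] * morning_peak
             + COEFFS["lunch_peak"] * lunch_peak)
    # Floor at the base price, cap at the maximum price
    return min(max(price, BASE_PRICE), MAX_PRICE)


# First two printed rows for BHMBCCMKT01 (capacity 577)
print(round(rule_based_price(61, 577, 1, 0.273793), 4))                     # 7.4435
print(round(rule_based_price(64, 577, 1, 0.331949, morning_peak=True), 4))  # 9.2863
# A saturated lot on a special day at lunch with heavy traffic hits the cap
print(rule_based_price(577, 577, 577, 2.0, traffic_high=True,
                       special_day=True, lunch_peak=True))                 # 30.0
```

The first two results agree with the printed `PredictedPrice` values (7.443489 and 9.286263) up to the rounding of the displayed pressure values. Only the second row receives the +1.5 MorningPeak adjustment, because its 08:25 timestamp falls in the 8:00-10:00 bucket while the 07:59 reading is OffPeak.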

**Demand Function:**
The core of our pricing strategy is the `DemandProxy`. The model's logic follows a functional form that adjusts prices based on the perceived demand and competitive environment: