# EV Charging Duration Classification with Decision Trees

## Background
As electric vehicle (EV) adoption continues to grow globally, understanding charging behavior is critical for optimizing infrastructure, energy consumption, and user experience. This project explores global EV charging session data to classify the duration of a charging session as short, medium, or long using a decision tree classifier. By analyzing session-specific features such as battery capacity, temperature, and energy delivered, we aim to develop a simple, interpretable model that can support real-time decision-making at charging stations.

## Data
The dataset used is from Kaggle’s Global EV Charging Behavior 2024. It includes 800 charging sessions across various countries, with features such as:
- Battery Capacity (kWh)
- Energy Delivered (kWh)
- Charging Cost ($)
- Temperature (°C)
- Station Utilization Rate (%)
- Charging Duration (mins) (used to generate the target class)

A new categorical feature was created:  
- Duration Category: Short (≤ 60 mins), Medium (61–120 mins), Long (> 120 mins)
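The binning can be sketched with `pandas.cut`, assuming durations are stored in minutes. Right-closed intervals put exactly 60 minutes in Short and exactly 120 in Medium. Non-integer values such as 60.5 fall into Medium. The helper `categorize_duration` is a hypothetical name and is not part of the notebook.

```python
import numpy as np
import pandas as pd


def categorize_duration(minutes: pd.Series) -> pd.Series:
    """Bin charging durations (minutes) into Short / Medium / Long.

    Intervals are right-closed: (-inf, 60] -> Short, (60, 120] -> Medium,
    (120, inf) -> Long, matching the thresholds described above.
    """
    return pd.cut(
        minutes,
        bins=[-np.inf, 60, 120, np.inf],
        labels=["Short", "Medium", "Long"],
    )


# Boundary values show which class each edge falls into.
durations = pd.Series([30, 60, 61, 120, 121, 300])
print(categorize_duration(durations).tolist())
# ['Short', 'Short', 'Medium', 'Medium', 'Long', 'Long']
```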

## Advantages
- Interpretable: Decision trees are easy to understand and visualize, making results explainable to non-technical stakeholders.
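- Minimal preprocessing: Each split compares one feature against a threshold, so features need no scaling or normalization. (In scikit-learn, categorical inputs such as text labels must still be encoded as numbers.)
- Fast and lightweight: A prediction follows a single root-to-leaf path of at most `max_depth` comparisons, which suits low-latency use at charging stations.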
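To illustrate why scaling is unnecessary, the sketch below fits the same tree twice on synthetic data from `make_classification` (a stand-in, not the Kaggle dataset). The first fit uses the raw features, and the second multiplies every feature by 1000, as a kWh-to-Wh unit change would. Because splits depend only on the ordering of values, the chosen split features and the training-set predictions should match. Only the thresholds scale.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the session features.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)

# Identical settings; one tree sees raw values, the other values scaled by 1000.
raw = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_syn, y_syn)
scaled = DecisionTreeClassifier(max_depth=5, random_state=42).fit(X_syn * 1000, y_syn)

# Same split features at every node and same predictions on the (rescaled) inputs.
same_features = np.array_equal(raw.tree_.feature, scaled.tree_.feature)
same_preds = np.array_equal(raw.predict(X_syn), scaled.predict(X_syn * 1000))
print(same_features, same_preds)
```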

## Disadvantages
- Prone to overfitting: Without pruning or limiting depth, decision trees may capture noise.
- Limited generalization: Performance may not match that of more complex ensemble methods (e.g., Random Forests).
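- Coarse probability estimates: `predict_proba` returns the class proportions in the leaf a sample lands in, so a shallow tree produces only a handful of distinct, often poorly calibrated probabilities, unlike the smooth outputs of logistic regression.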
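A quick way to see how coarse these probabilities are is to count the distinct probability vectors a shallow tree produces. The sketch uses synthetic data from `make_classification` (not the charging data) and a depth-2 tree, which has at most four leaves.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic three-class data standing in for the charging sessions.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)

clf_demo = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X_syn, y_syn)

# Each row is the class distribution of training samples in the leaf the input reaches.
proba = clf_demo.predict_proba(X_syn)
print(proba[:3].round(2))

# Every sample in the same leaf gets the same row, so distinct rows <= number of leaves.
n_distinct = len(np.unique(proba, axis=0))
print("distinct probability vectors:", n_distinct, "| leaves:", clf_demo.get_n_leaves())
```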

## Function
The task in this notebook is to classify electric vehicle (EV) charging sessions into three categories — short, medium, or long — based on session features such as battery capacity, energy delivered, charging cost, ambient temperature, and station utilization rate. The classification is performed using a decision tree model, which is trained on historical session data and evaluated on its ability to accurately predict the duration category of unseen data.

This notebook walks through the full pipeline from data loading and feature engineering to model training, evaluation, and visualization.


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import warnings
warnings.filterwarnings("ignore")

In [None]:
# load csv into a dataframe
df = pd.read_csv("Global_EV_Charging_Behavior_2024.csv")
df.head()

In [None]:
# check for missing values and get a sense of the data
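df.info()  # column dtypes and non-null counts
print(df.describe())  # summary statistics; only a cell's last expression is displayed automatically
df.isnull().sum()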

In [None]:
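features = ["Battery Capacity (kWh)", "Energy Delivered (kWh)", "Charging Cost ($)", "Temperature (°C)", "Station Utilization Rate (%)"]  # assumed column names; verify with df.columns
df["Duration Category"] = pd.cut(df["Charging Duration (mins)"], bins=[-np.inf, 60, 120, np.inf], labels=["Short", "Medium", "Long"])  # <=60 / 61-120 / >120 mins
X = df[features]  # input features we want the model to learn from
y = df["Duration Category"]  # labels we're trying to predict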

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)  # split into 70% train, 30% test
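The split above is unstratified. If the three duration classes are imbalanced, a random split can leave the test set with noticeably different class proportions. Passing `stratify` to `train_test_split` is an option the notebook does not use. The sketch below shows it on hypothetical labels (60 Short, 30 Medium, 10 Long) with a placeholder feature column.

```python
from collections import Counter

import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels and a placeholder feature column.
y_demo = np.array(["Short"] * 60 + ["Medium"] * 30 + ["Long"] * 10)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps class proportions (60/30/10) in both train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print(Counter(y_te))
```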

In [None]:
clf = DecisionTreeClassifier(max_depth=5, random_state=42)  # create decision tree with limited depth
clf.fit(X_train, y_train)  # train the model on training data
y_pred = clf.predict(X_test)  # predict on test set
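The depth limit `max_depth=5` is fixed by hand. One alternative is to choose it by 5-fold cross-validated grid search on the training split only. The sketch below runs on synthetic data from `make_classification` with an illustrative grid of depths. Variable names carry a `_syn` suffix so they do not overwrite the notebook's own `X_train`/`y_train`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (not the Kaggle sessions).
X_syn, y_syn = make_classification(
    n_samples=600, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)
X_tr_syn, X_te_syn, y_tr_syn, y_te_syn = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=42
)

# Candidate depths; None lets the tree grow until its leaves are pure.
param_grid = {"max_depth": [2, 3, 4, 5, 6, 8, None]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42), param_grid, cv=5)
search.fit(X_tr_syn, y_tr_syn)  # cross-validation sees only the training split

print("best max_depth:", search.best_params_["max_depth"])
print("mean CV accuracy:", round(search.best_score_, 3))
print("held-out accuracy:", round(search.score(X_te_syn, y_te_syn), 3))
```

Comparing the mean cross-validated accuracy with the held-out score gives a rough check on whether the chosen depth still overfits.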

In [None]:
# see how it did on the test set
print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred, labels=clf.classes_)  # row/column order matches the tick labels below

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=clf.classes_, yticklabels=clf.classes_)
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()
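The per-class recall and precision in the classification report can be read off the confusion matrix. Recall divides each diagonal entry by its row sum (the actual count). Precision divides it by its column sum (the predicted count). A small check on hypothetical labels: the manual values should match scikit-learn's `recall_score` and `precision_score` with `average=None`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true and predicted labels for eight sessions.
y_true_demo = ["Short", "Short", "Medium", "Medium", "Medium", "Long", "Long", "Short"]
y_pred_demo = ["Short", "Medium", "Medium", "Medium", "Long", "Long", "Short", "Short"]
labels = ["Long", "Medium", "Short"]

# Rows are actual classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, labels=labels)

# Recall = diagonal / row sums; precision = diagonal / column sums.
recall = np.diag(cm_demo) / cm_demo.sum(axis=1)
precision = np.diag(cm_demo) / cm_demo.sum(axis=0)
print(cm_demo)
print("manual recall:", recall.round(2), "| sklearn:", recall_score(y_true_demo, y_pred_demo, labels=labels, average=None).round(2))
print("manual precision:", precision.round(2), "| sklearn:", precision_score(y_true_demo, y_pred_demo, labels=labels, average=None).round(2))
```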

In [None]:
plt.figure(figsize=(20,10))
plot_tree(clf, filled=True, feature_names=features, class_names=clf.classes_)
plt.title("Decision Tree Visualization")
plt.show()
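When the plotted tree gets crowded, scikit-learn's `export_text` prints the same splits as indented rules. These are easy to paste into a report for non-technical readers. A sketch on synthetic data, with hypothetical short feature names standing in for the notebook's columns:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data and hypothetical feature names.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=5, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)
names = ["battery_kwh", "energy_kwh", "cost_usd", "temp_c", "utilization_pct"]

clf_demo = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_syn, y_syn)

# Each line is one split ("name <= threshold") or a leaf ("class: ...").
rules = export_text(clf_demo, feature_names=names)
print(rules)
```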

## Conclusion
- We built a Decision Tree model to classify charging durations into short, medium, or long.
- In the tree visualization, each node is colored by its majority class (purple for short, green for medium, and orange-brown earth tones for long), with stronger shading for purer nodes.
- Further improvements could include:
  - Encoding categorical features (EV Model, City, etc.)
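One way to encode those columns is scikit-learn's `OneHotEncoder` inside a `ColumnTransformer` and `Pipeline`; `pd.get_dummies` also works. The sketch below uses that approach as one reasonable choice, not the notebook's method. The toy rows, model names, and cities are hypothetical. Only the column names EV Model and City come from the note above. Setting `handle_unknown="ignore"` encodes a category unseen during training as all zeros instead of raising an error.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy sessions: two categorical columns and one numeric column.
train = pd.DataFrame({
    "EV Model": ["Model A", "Model B", "Model A", "Model C", "Model B", "Model C"],
    "City": ["Berlin", "Oslo", "Oslo", "Berlin", "Tokyo", "Tokyo"],
    "Energy Delivered (kWh)": [12.0, 45.5, 30.2, 8.1, 60.3, 25.0],
})
duration_labels = ["Short", "Long", "Medium", "Short", "Long", "Medium"]

# One-hot encode the categorical columns; pass numeric columns through unchanged.
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(handle_unknown="ignore"), ["EV Model", "City"])],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("tree", DecisionTreeClassifier(max_depth=3, random_state=42)),
])
model.fit(train, duration_labels)

# "Madrid" was never seen in training; it is encoded as all zeros.
new_session = pd.DataFrame({
    "EV Model": ["Model A"], "City": ["Madrid"], "Energy Delivered (kWh)": [20.0],
})
print(model.predict(new_session))
```

Keeping the encoder inside the pipeline means it is fit only on training rows, so the same encoding is applied consistently at prediction time.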
  
