# Initial feature engineering

This notebook performs initial feature engineering on the cleaned traffic dataset. It adds basic temporal features (`hour`, `day_of_week`, `is_weekend`, `is_rush_hour`) and a numeric `severity_level`, then saves the dataset for use in later notebooks.

------

I load the cleaned and formatted traffic dataset (from Notebook 1) and add new features in preparation for modeling. I extract hour and weekday from the timestamp to allow downstream models to learn temporal patterns in congestion.


In [7]:
import pandas as pd

# Loading cleaned data (saved from notebook 1)
df = pd.read_csv("../data/cleaned_tfl_road_status.csv")

# Parse the timestamp once, then extract hour (0–23) and weekday (0 = Monday, 6 = Sunday)
ts = pd.to_datetime(df['timestamp'])
df['hour'] = ts.dt.hour
df['day_of_week'] = ts.dt.weekday

# Weekend flag
df['is_weekend'] = df['day_of_week'].isin([5, 6]).astype(int)


# Flag observations that fall within typical London rush hours.
# Whole clock hours are matched, so the windows are 07:00–09:59 and 16:00–18:59.
# Weekends are not excluded. Encoded as 1 for rush hour, 0 otherwise.
df['is_rush_hour'] = df['hour'].isin([7, 8, 9, 16, 17, 18]).astype(int)

# Severity level mapping
severity_map = {
    'Good': 0,
    'No Exceptional Delays': 0,
    'Minor': 1,
    'Minor Delays': 1,
    'Serious Delays': 2,
    'Serious': 2,
    'Severe Delays': 3
}
df['severity_level'] = df['status'].map(severity_map)

# Save the updated dataset, which now includes the time-based features and severity_level
df.to_csv("../data/engineered_traffic_data.csv", index=False)
print("Engineered dataset saved")


Engineered dataset saved
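
Note that `is_rush_hour` as defined above is based on the hour alone, so it is also set on Saturdays and Sundays. If that turns out to matter for modeling, a weekday-only variant could be derived from the columns already created. The following is a minimal sketch on a toy frame; the timestamps are made up for illustration and the column name `is_weekday_rush_hour` is hypothetical, not part of the saved dataset.

```python
import pandas as pd

# Toy observations; timestamps are illustrative, not taken from the dataset
toy = pd.DataFrame({
    "timestamp": [
        "2025-03-10 08:15:00",  # Monday morning
        "2025-03-10 12:00:00",  # Monday midday
        "2025-03-15 08:15:00",  # Saturday morning
        "2025-03-14 17:30:00",  # Friday evening
    ]
})

# Same feature definitions as in the cell above
ts = pd.to_datetime(toy["timestamp"])
toy["hour"] = ts.dt.hour
toy["day_of_week"] = ts.dt.weekday  # 0 = Monday, 6 = Sunday
toy["is_weekend"] = toy["day_of_week"].isin([5, 6]).astype(int)
toy["is_rush_hour"] = toy["hour"].isin([7, 8, 9, 16, 17, 18]).astype(int)

# Hypothetical extra column: rush hour only counts on weekdays
toy["is_weekday_rush_hour"] = (
    (toy["is_rush_hour"] == 1) & (toy["is_weekend"] == 0)
).astype(int)

print(toy[["timestamp", "is_rush_hour", "is_weekday_rush_hour"]])
```

In this toy example the Saturday 08:15 row is flagged by `is_rush_hour` but not by the weekday-only variant.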


In [8]:
# List any status values that severity_map does not cover
unmapped = df[~df['status'].isin(severity_map.keys())]['status'].unique()
print("Unmapped statuses:", unmapped)



Unmapped statuses: []


In [9]:
# Any status missing from severity_map would show up as NaN after .map()
print("NaNs in severity_level:", df['severity_level'].isnull().sum())


NaNs in severity_level: 0
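
The two checks above (unmapped statuses and NaNs after mapping) could also be folded into the mapping step, so that an unexpected status stops the notebook instead of silently producing NaN. This is a minimal sketch, not what the notebook currently does; `map_severity`, `toy_map`, and `toy_status` are hypothetical names used only for illustration.

```python
import pandas as pd

def map_severity(status: pd.Series, mapping: dict) -> pd.Series:
    """Map status labels to severity levels, raising if a label is unmapped.

    Missing (NaN) statuses are not treated as unmapped; they stay NaN.
    """
    unmapped = sorted(set(status.dropna().unique()) - set(mapping))
    if unmapped:
        raise ValueError(f"Unmapped statuses: {unmapped}")
    return status.map(mapping)

# Toy inputs: a subset of severity_map and a few status labels
toy_map = {"Good": 0, "Minor": 1, "Serious": 2}
toy_status = pd.Series(["Good", "Minor", "Serious", "Good"])

print(map_severity(toy_status, toy_map).tolist())
```

On the real data this would be called as `map_severity(df['status'], severity_map)` in place of the plain `.map()` call.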


In [10]:
print(df['status'].value_counts())


status
Good       63108
Minor      11823
Serious     4557
Name: count, dtype: int64
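
To read these counts as proportions, the shares can be computed directly. The sketch below rebuilds the counts printed above as a small Series so that it runs on its own; on the actual frame, `df['status'].value_counts(normalize=True)` should give the same shares.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({"Good": 63108, "Minor": 11823, "Serious": 4557}, name="count")

# Share of each status among all observations, rounded to three decimals
shares = (counts / counts.sum()).round(3)
print(shares)
```

Roughly 79% of observations are `Good`, about 15% are `Minor`, and under 6% are `Serious`, so the classes are noticeably imbalanced.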


In [11]:
df


Unnamed: 0,road,status,description,timestamp,hour,weekday,day_of_week,is_weekend,is_rush_hour,severity_level
0,A1,Good,No Exceptional Delays,2025-03-10 00:08:00,0,Monday,0,0,0,0
1,Western Cross Route,Good,No Exceptional Delays,2025-03-10 00:08:00,0,Monday,0,0,0,0
2,Southern River Route,Good,No Exceptional Delays,2025-03-10 00:08:00,0,Monday,0,0,0,0
3,Inner Ring,Good,No Exceptional Delays,2025-03-10 00:08:00,0,Monday,0,0,0,0
4,Farringdon Cross Route,Good,No Exceptional Delays,2025-03-10 00:08:00,0,Monday,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...
79483,A2,Good,No Exceptional Delays,2025-05-20 23:34:00,23,Tuesday,1,0,0,0
79484,A1,Good,No Exceptional Delays,2025-05-20 23:34:00,23,Tuesday,1,0,0,0
79485,Southern River Route,Good,No Exceptional Delays,2025-05-20 23:34:00,23,Tuesday,1,0,0,0
79486,A24,Minor,Minor Delays,2025-05-20 23:34:00,23,Tuesday,1,0,0,1



## Summary

This notebook adds basic temporal features (`hour`, `day_of_week`, `is_weekend`, `is_rush_hour`) and a numeric `severity_level` to the cleaned traffic data. The temporal features are useful for capturing daily and weekly congestion patterns. Further feature engineering (e.g., lagged congestion, weather, and baseline probabilities) will be applied in subsequent notebooks.
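
As a rough preview of what a lagged congestion feature might look like, the sketch below shifts `severity_level` by one observation within each road. It is an assumption about one possible approach, not the method used in the later notebooks: the toy rows, the 30-minute spacing, and the column name `severity_lag_1` are all hypothetical.

```python
import pandas as pd

# Toy data: two roads, three observations each (values are illustrative)
toy = pd.DataFrame({
    "road": ["A1", "A2", "A1", "A2", "A1", "A2"],
    "timestamp": pd.to_datetime([
        "2025-03-10 00:08:00", "2025-03-10 00:08:00",
        "2025-03-10 00:38:00", "2025-03-10 00:38:00",
        "2025-03-10 01:08:00", "2025-03-10 01:08:00",
    ]),
    "severity_level": [0, 1, 1, 1, 2, 0],
})

# Order observations chronologically within each road before shifting
toy = toy.sort_values(["road", "timestamp"]).reset_index(drop=True)

# Previous observation's severity on the same road; NaN for each road's first row
toy["severity_lag_1"] = toy.groupby("road")["severity_level"].shift(1)

print(toy)
```

Grouping by `road` before shifting keeps one road's last observation from leaking into the first row of the next road.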
