Ensemble Methods for `clean_business_df` and `clean_economy_df`
- Bagging and Pasting
- Random Forest

In [7]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.preprocessing import LabelEncoder

In [8]:
business_df = pd.read_csv('../../data/clean/clean_business_df.csv')
economy_df = pd.read_csv('../../data/clean/clean_economy_df.csv')

In [9]:
business_df.head()

Unnamed: 0,flight_date,airline_name,flight_code,departure_time,departure_city,arrival_time,arrival_city,flight_duration,stops,price,departure_time_group,arrival_time_group
0,2022-02-11,Air India,AI-868,18:00,Delhi,20:00,Mumbai,120,0,25612,Evening,Evening
1,2022-02-11,Air India,AI-624,19:00,Delhi,21:15,Mumbai,135,0,25612,Evening,Night
2,2022-02-11,Air India,AI-531,20:00,Delhi,20:45,Mumbai,1485,1,42220,Evening,Evening
3,2022-02-11,Air India,AI-839,21:25,Delhi,23:55,Mumbai,1590,1,44450,Night,Night
4,2022-02-11,Air India,AI-544,17:15,Delhi,23:55,Mumbai,400,1,46690,Afternoon,Night


In [10]:
economy_df.head()

Unnamed: 0,flight_date,airline_name,flight_code,departure_time,departure_city,arrival_time,arrival_city,flight_duration,stops,price,departure_time_group,arrival_time_group
0,2022-02-11,SpiceJet,SG-8709,18:55,Delhi,21:05,Mumbai,130,0,5953,Evening,Night
1,2022-02-11,SpiceJet,SG-8157,06:20,Delhi,08:40,Mumbai,140,0,5953,Morning,Morning
2,2022-02-11,Air Asia,I5-764,04:25,Delhi,06:35,Mumbai,130,0,5956,Early Morning,Morning
3,2022-02-11,Vistara,UK-995,10:20,Delhi,12:35,Mumbai,135,0,5955,Morning,Afternoon
4,2022-02-11,Vistara,UK-963,08:50,Delhi,11:10,Mumbai,140,0,5955,Morning,Morning


### Data Preprocessing
- Convert 'flight_date' to datetime and extract the departure and arrival hour as numeric time features
- Encode categorical features
- Define Features (X) and Target (y)
- Split Data into Training and Testing sets
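
The four steps above can be sketched end to end on a small toy frame. This is a minimal illustration, not the notebook's actual pipeline: `toy_df`, the `day_of_week` feature, the choice of `feature_cols`, and the `test_size` and `random_state` values are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy rows that mimic a few of the columns shown above (values are illustrative).
toy_df = pd.DataFrame({
    "flight_date": ["2022-02-11", "2022-02-11", "2022-02-12", "2022-02-12", "2022-02-13"],
    "airline_name": ["Air India", "Vistara", "Air India", "SpiceJet", "Vistara"],
    "flight_duration": [120, 135, 1485, 140, 140],
    "stops": [0, 0, 1, 0, 0],
    "price": [25612, 5955, 42220, 5953, 5955],
})

# 1. Parse the date and derive a numeric feature from it (assumed feature: day of week).
toy_df["flight_date"] = pd.to_datetime(toy_df["flight_date"])
toy_df["day_of_week"] = toy_df["flight_date"].dt.dayofweek  # Monday=0 ... Sunday=6

# 2. Encode a categorical column as integer labels.
encoder = LabelEncoder()
toy_df["airline_encoded"] = encoder.fit_transform(toy_df["airline_name"])

# 3. Define the feature matrix X and the target y.
feature_cols = ["airline_encoded", "day_of_week", "flight_duration", "stops"]
X = toy_df[feature_cols]
y = toy_df["price"]

# 4. Hold out part of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

`LabelEncoder` assigns integers in sorted label order, which imposes an arbitrary ordering on the airlines. Tree-based models such as the ones used in this notebook are usually tolerant of that, but it is a design choice worth keeping in mind.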

In [12]:
print(business_df.dtypes)

flight_date             object
airline_name            object
flight_code             object
departure_time          object
departure_city          object
arrival_time            object
arrival_city            object
flight_duration          int64
stops                    int64
price                    int64
departure_time_group    object
arrival_time_group      object
dtype: object


In [13]:
print(economy_df.dtypes)

flight_date             object
airline_name            object
flight_code             object
departure_time          object
departure_city          object
arrival_time            object
arrival_city            object
flight_duration          int64
stops                    int64
price                    int64
departure_time_group    object
arrival_time_group      object
dtype: object


Convert 'flight_date' to datetime and extract the departure and arrival hour from the 'HH:MM' time strings

In [14]:
# business
# Parse the date column, then take the hour part of each 'HH:MM' string as an integer
business_df['flight_date'] = pd.to_datetime(business_df['flight_date'])
business_df['departure_hour'] = business_df['departure_time'].str.split(':').str[0].astype(int)
business_df['arrival_hour'] = business_df['arrival_time'].str.split(':').str[0].astype(int)

In [15]:
# economy
# Same conversion as for the business data
economy_df['flight_date'] = pd.to_datetime(economy_df['flight_date'])
economy_df['departure_hour'] = economy_df['departure_time'].str.split(':').str[0].astype(int)
economy_df['arrival_hour'] = economy_df['arrival_time'].str.split(':').str[0].astype(int)
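
Since the same three statements run on both DataFrames, the conversion could be wrapped in one helper. The sketch below is one possible way to do that; `add_hour_features` is a hypothetical name, not something defined in this notebook, and it assumes every time value is an 'HH:MM' string.

```python
import pandas as pd

def add_hour_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with flight_date parsed and integer hour columns added."""
    out = df.copy()
    out["flight_date"] = pd.to_datetime(out["flight_date"])
    for col in ("departure_time", "arrival_time"):
        # "18:55" -> 18; assumes every value is an "HH:MM" string.
        hour_col = col.replace("_time", "_hour")
        out[hour_col] = out[col].str.split(":").str[0].astype(int)
    return out

# Small sample with the same column names as the cleaned CSVs.
sample = pd.DataFrame({
    "flight_date": ["2022-02-11", "2022-02-11"],
    "departure_time": ["18:55", "06:20"],
    "arrival_time": ["21:05", "08:40"],
})
sample_out = add_hour_features(sample)
```

With this helper, both cells reduce to `business_df = add_hour_features(business_df)` and `economy_df = add_hour_features(economy_df)`. Returning a copy leaves the input frame untouched, which makes the cell safe to re-run.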

Encode categorical features