## Dataset
The dataset has 11 columns: 10 input features and the target variable, price.

1. **Airline**: The name of the airline company is stored in the airline column. It is a categorical feature having 6 different airlines.
2. **Flight**: Flight stores information regarding the plane's flight code. It is a categorical feature.
3. **Source City**: City from which the flight takes off. It is a categorical feature having 6 unique cities.
4. **Departure Time**: A derived categorical feature created by grouping time periods into bins. It stores the departure time and has 6 unique time labels.
5. **Stops**: A categorical feature with 3 distinct values that stores the number of stops between the source and destination cities.
6. **Arrival Time**: This is a derived categorical feature created by grouping time intervals into bins. It has six distinct time labels and keeps information about the arrival time.
7. **Destination City**: City where the flight will land. It is a categorical feature having 6 unique cities.
8. **Class**: A categorical feature that contains information on seat class; it has two distinct values: Business and Economy.
9. **Duration**: A continuous feature that displays the overall amount of time it takes to travel between cities in hours.
10. **Days Left**: A derived feature calculated by subtracting the booking date from the trip date.
11. **Price**: The target variable; it stores the ticket price.
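
The two derived features above can be illustrated with a minimal sketch. The source does not state the actual bin boundaries or the raw column names, so the `booking_date` and `departure` columns, the sample rows, and the four-hour cut points below are all assumptions chosen only to show the technique.

```python
import pandas as pd

# Hypothetical raw bookings; column names and values are illustrative assumptions
bookings = pd.DataFrame({
    "booking_date": pd.to_datetime(["2022-02-10", "2022-02-10", "2022-02-01"]),
    "departure": pd.to_datetime(["2022-02-11 18:55", "2022-02-11 06:20", "2022-03-22 23:40"]),
})

# days_left: trip date minus booking date, in whole days
trip_date = bookings["departure"].dt.normalize()
bookings["days_left"] = (trip_date - bookings["booking_date"]).dt.days

# departure_time: bin the departure hour into six labels
# (assumed cut points; right=False makes each interval [start, end))
edges = [0, 4, 8, 12, 16, 20, 24]
labels = ["Late_Night", "Early_Morning", "Morning", "Afternoon", "Evening", "Night"]
bookings["departure_time"] = pd.cut(
    bookings["departure"].dt.hour, bins=edges, labels=labels, right=False
)

print(bookings[["days_left", "departure_time"]])
```

With these assumed edges, an 18:55 departure falls into `Evening` and a 06:20 departure into `Early_Morning`.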

In [1]:
# import libraries needed for exploratory data analysis (eda) and feature engineering (fe)
import os
import time
import datetime
import pandas as pd
pd.set_option('display.max_columns',None)
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')

# import libraries needed for model training
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor,AdaBoostRegressor, ExtraTreesRegressor, BaggingRegressor, GradientBoostingRegressor
from sklearn.svm import SVR
from catboost import CatBoostRegressor
from xgboost import XGBRegressor


In [2]:
pd.set_option('display.max_columns',None) #display all possible columns
for dirname, _, filenames in os.walk('../data'): 
    for filename in filenames:
        print(os.path.join(dirname, filename)) #list all files in the data directory

../data\business.csv
../data\clean_dataset.csv
../data\economy.csv


In [3]:
df=pd.read_csv('../data/clean_dataset.csv') #load data into dataframe
df.head(5) #display head (top 5 rows)

Unnamed: 0.1,Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,0,SpiceJet,SG-8709,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1,5953
1,1,SpiceJet,SG-8157,Delhi,Early_Morning,zero,Morning,Mumbai,Economy,2.33,1,5953
2,2,AirAsia,I5-764,Delhi,Early_Morning,zero,Early_Morning,Mumbai,Economy,2.17,1,5956
3,3,Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Economy,2.25,1,5955
4,4,Vistara,UK-963,Delhi,Morning,zero,Morning,Mumbai,Economy,2.33,1,5955


In [4]:
df.tail(5) #display tail (last 5 rows)

Unnamed: 0.1,Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
300148,300148,Vistara,UK-822,Chennai,Morning,one,Evening,Hyderabad,Business,10.08,49,69265
300149,300149,Vistara,UK-826,Chennai,Afternoon,one,Night,Hyderabad,Business,10.42,49,77105
300150,300150,Vistara,UK-832,Chennai,Early_Morning,one,Night,Hyderabad,Business,13.83,49,79099
300151,300151,Vistara,UK-828,Chennai,Early_Morning,one,Evening,Hyderabad,Business,10.0,49,81585
300152,300152,Vistara,UK-822,Chennai,Morning,one,Evening,Hyderabad,Business,10.08,49,81585


In [5]:
print("Shape: ", df.shape) #get total shape of dataset, total rows and columns
print("Number of Columns:", df.shape[1])
print("Number of Rows:", df.shape[0])

Shape:  (300153, 12)
Number of Columns: 12
Number of Rows: 300153


In [6]:
df.info() #quick info about data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300153 entries, 0 to 300152
Data columns (total 12 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   Unnamed: 0        300153 non-null  int64  
 1   airline           300153 non-null  object 
 2   flight            300153 non-null  object 
 3   source_city       300153 non-null  object 
 4   departure_time    300153 non-null  object 
 5   stops             300153 non-null  object 
 6   arrival_time      300153 non-null  object 
 7   destination_city  300153 non-null  object 
 8   class             300153 non-null  object 
 9   duration          300153 non-null  float64
 10  days_left         300153 non-null  int64  
 11  price             300153 non-null  int64  
dtypes: float64(1), int64(3), object(8)
memory usage: 27.5+ MB


In [7]:
df.describe().transpose() #statistics for numerical datatypes

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Unnamed: 0,300153.0,150076.0,86646.852011,0.0,75038.0,150076.0,225114.0,300152.0
duration,300153.0,12.221021,7.191997,0.83,6.83,11.25,16.17,49.83
days_left,300153.0,26.004751,13.561004,1.0,15.0,26.0,38.0,49.0
price,300153.0,20889.660523,22697.767366,1105.0,4783.0,7425.0,42521.0,123071.0


In [8]:
df.drop('Unnamed: 0',axis=1, inplace = True) #drop unwanted column permanently

In [9]:
df.isna().sum() #number of missing values per column

airline             0
flight              0
source_city         0
departure_time      0
stops               0
arrival_time        0
destination_city    0
class               0
duration            0
days_left           0
price               0
dtype: int64

In [10]:
df.dropna() #returns a copy without rows containing NA values; df itself is unchanged unless the result is assigned back

Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,SpiceJet,SG-8709,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1,5953
1,SpiceJet,SG-8157,Delhi,Early_Morning,zero,Morning,Mumbai,Economy,2.33,1,5953
2,AirAsia,I5-764,Delhi,Early_Morning,zero,Early_Morning,Mumbai,Economy,2.17,1,5956
3,Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Economy,2.25,1,5955
4,Vistara,UK-963,Delhi,Morning,zero,Morning,Mumbai,Economy,2.33,1,5955
...,...,...,...,...,...,...,...,...,...,...,...
300148,Vistara,UK-822,Chennai,Morning,one,Evening,Hyderabad,Business,10.08,49,69265
300149,Vistara,UK-826,Chennai,Afternoon,one,Night,Hyderabad,Business,10.42,49,77105
300150,Vistara,UK-832,Chennai,Early_Morning,one,Night,Hyderabad,Business,13.83,49,79099
300151,Vistara,UK-828,Chennai,Early_Morning,one,Evening,Hyderabad,Business,10.00,49,81585


In [11]:
print("Number of Duplicates: ", df.duplicated().sum())

Number of Duplicates:  0


In [12]:
df.drop_duplicates() #returns a copy without duplicate rows; df itself is unchanged unless the result is assigned back

Unnamed: 0,airline,flight,source_city,departure_time,stops,arrival_time,destination_city,class,duration,days_left,price
0,SpiceJet,SG-8709,Delhi,Evening,zero,Night,Mumbai,Economy,2.17,1,5953
1,SpiceJet,SG-8157,Delhi,Early_Morning,zero,Morning,Mumbai,Economy,2.33,1,5953
2,AirAsia,I5-764,Delhi,Early_Morning,zero,Early_Morning,Mumbai,Economy,2.17,1,5956
3,Vistara,UK-995,Delhi,Morning,zero,Afternoon,Mumbai,Economy,2.25,1,5955
4,Vistara,UK-963,Delhi,Morning,zero,Morning,Mumbai,Economy,2.33,1,5955
...,...,...,...,...,...,...,...,...,...,...,...
300148,Vistara,UK-822,Chennai,Morning,one,Evening,Hyderabad,Business,10.08,49,69265
300149,Vistara,UK-826,Chennai,Afternoon,one,Night,Hyderabad,Business,10.42,49,77105
300150,Vistara,UK-832,Chennai,Early_Morning,one,Night,Hyderabad,Business,13.83,49,79099
300151,Vistara,UK-828,Chennai,Early_Morning,one,Evening,Hyderabad,Business,10.00,49,81585
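
Both `dropna()` and `drop_duplicates()` return a new DataFrame rather than modifying the one they are called on. The flight data has no missing values or duplicates, so the calls above change nothing either way, but the distinction matters on messier data. A minimal sketch on a made-up `toy` frame (values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy frame with one duplicated row and one missing price (hypothetical values)
toy = pd.DataFrame({
    "airline": ["Vistara", "Vistara", "Indigo", "Indigo"],
    "price": [5955.0, 5955.0, np.nan, 4200.0],
})

toy.dropna()               # result is discarded; toy still has all rows
rows_before = len(toy)

# Assign the result back to keep the cleaned data
cleaned = toy.dropna().drop_duplicates().reset_index(drop=True)
rows_after = len(cleaned)

print(rows_before, rows_after)
```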


In [13]:
df.nunique() #number of unique values in each column

airline                 6
flight               1561
source_city             6
departure_time          6
stops                   3
arrival_time            6
destination_city        6
class                   2
duration              476
days_left              49
price               12157
dtype: int64

In [14]:
df.columns #show all columns

Index(['airline', 'flight', 'source_city', 'departure_time', 'stops',
       'arrival_time', 'destination_city', 'class', 'duration', 'days_left',
       'price'],
      dtype='object')

In [15]:
numerical_features = [feature for feature in df.columns if df[feature].dtype != 'O']
categorical_features = [feature for feature in df.columns if df[feature].dtype == 'O']

print('Numerical Features : {} : {}'.format(len(numerical_features), numerical_features))
print('Categorical Features : {} : {}'.format(len(categorical_features), categorical_features))


Numerical Features : 3 : ['duration', 'days_left', 'price']
Categorical Features : 8 : ['airline', 'flight', 'source_city', 'departure_time', 'stops', 'arrival_time', 'destination_city', 'class']


In [16]:
#get unique values in categorical columns
for column in categorical_features:
    unique_values = df[column].unique()
    print(f"Unique values in column '{column}': {unique_values}")

Unique values in column 'airline': ['SpiceJet' 'AirAsia' 'Vistara' 'GO_FIRST' 'Indigo' 'Air_India']
Unique values in column 'flight': ['SG-8709' 'SG-8157' 'I5-764' ... '6E-7127' '6E-7259' 'AI-433']
Unique values in column 'source_city': ['Delhi' 'Mumbai' 'Bangalore' 'Kolkata' 'Hyderabad' 'Chennai']
Unique values in column 'departure_time': ['Evening' 'Early_Morning' 'Morning' 'Afternoon' 'Night' 'Late_Night']
Unique values in column 'stops': ['zero' 'one' 'two_or_more']
Unique values in column 'arrival_time': ['Night' 'Morning' 'Early_Morning' 'Afternoon' 'Evening' 'Late_Night']
Unique values in column 'destination_city': ['Mumbai' 'Bangalore' 'Kolkata' 'Hyderabad' 'Chennai' 'Delhi']
Unique values in column 'class': ['Economy' 'Business']


In [17]:
df.info() #quick info about data

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300153 entries, 0 to 300152
Data columns (total 11 columns):
 #   Column            Non-Null Count   Dtype  
---  ------            --------------   -----  
 0   airline           300153 non-null  object 
 1   flight            300153 non-null  object 
 2   source_city       300153 non-null  object 
 3   departure_time    300153 non-null  object 
 4   stops             300153 non-null  object 
 5   arrival_time      300153 non-null  object 
 6   destination_city  300153 non-null  object 
 7   class             300153 non-null  object 
 8   duration          300153 non-null  float64
 9   days_left         300153 non-null  int64  
 10  price             300153 non-null  int64  
dtypes: float64(1), int64(2), object(8)
memory usage: 25.2+ MB


In [18]:
df.describe().transpose() #statistics for numerical datatypes

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
duration,300153.0,12.221021,7.191997,0.83,6.83,11.25,16.17,49.83
days_left,300153.0,26.004751,13.561004,1.0,15.0,26.0,38.0,49.0
price,300153.0,20889.660523,22697.767366,1105.0,4783.0,7425.0,42521.0,123071.0


In [19]:
x = df.drop(columns=['price']) #dataframe with all columns used as inputs for prediction
y = df['price'] #series with the target values to be predicted


In [20]:
print(x.head())
print(type(x)) #datatype of x

    airline   flight source_city departure_time stops   arrival_time  \
0  SpiceJet  SG-8709       Delhi        Evening  zero          Night   
1  SpiceJet  SG-8157       Delhi  Early_Morning  zero        Morning   
2   AirAsia   I5-764       Delhi  Early_Morning  zero  Early_Morning   
3   Vistara   UK-995       Delhi        Morning  zero      Afternoon   
4   Vistara   UK-963       Delhi        Morning  zero        Morning   

  destination_city    class  duration  days_left  
0           Mumbai  Economy      2.17          1  
1           Mumbai  Economy      2.33          1  
2           Mumbai  Economy      2.17          1  
3           Mumbai  Economy      2.25          1  
4           Mumbai  Economy      2.33          1  
<class 'pandas.core.frame.DataFrame'>


In [21]:
print(y.head())
print(type(y)) #datatype of y

0    5953
1    5953
2    5956
3    5955
4    5955
Name: price, dtype: int64
<class 'pandas.core.series.Series'>


## Data Encoding & Feature Scaling

**Encoding** transforms **categorical data** into numerical representations that machine learning algorithms can work with.
Common types of encoding:
1. One-Hot Encoding (OHE)
   Good for categories with no inherent order or relationship. Each category is represented as a binary vector. This is the most widely used technique.
2. Label Encoding
   Each category is assigned an integer value. Suitable for features with two distinct categories (e.g. Business and Economy class) or for target labels, because with more categories the integers imply an order that may not exist.
3. Ordinal Encoding
   Similar to label encoding, but an explicit mapping can be provided for the integer assignments, so it suits categories with a natural order (e.g. t-shirt size or education degree).
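
As a rough illustration of the difference, the sketch below encodes a three-row sample with scikit-learn's `OneHotEncoder` and `OrdinalEncoder`. Treating `stops` as ordered (`zero < one < two_or_more`) is an assumption made for the example; the notebook itself one-hot encodes every categorical column.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

sample = pd.DataFrame({
    "class": ["Economy", "Business", "Economy"],
    "stops": ["zero", "two_or_more", "one"],
})

# One-hot: one binary column per category, no order implied
ohe = OneHotEncoder()
class_onehot = ohe.fit_transform(sample[["class"]]).toarray()

# Ordinal: integers follow the explicit order given in `categories`
ordinal = OrdinalEncoder(categories=[["zero", "one", "two_or_more"]])
stops_ordinal = ordinal.fit_transform(sample[["stops"]])

print(ohe.categories_[0])      # column order of the one-hot output
print(class_onehot)
print(stops_ordinal.ravel())
```

One-hot encoding adds a column per category, which is why high-cardinality columns such as `flight` inflate the feature count considerably.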

Scaling puts numerical features on a comparable range so that no feature dominates purely because of its units. StandardScaler is one of the most common scalers applied to numerical features.

Standardization is a data preparation method that involves adjusting the input (features) by first centering them (subtracting the mean from each data point) and then dividing them by the standard deviation, resulting in the data having a mean of 0 and a standard deviation of 1.

**StandardScaler** is used to standardize the input data in a way that ensures that the data points have a balanced scale, which is crucial for machine learning algorithms, especially those that are sensitive to differences in feature scales.
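
To make the definition concrete, the following sketch standardizes a handful of duration values by hand and compares the result with `StandardScaler`. The four sample values are arbitrary; note that `StandardScaler` uses the population standard deviation (`ddof=0`), which is also NumPy's default.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few flight durations in hours (arbitrary sample values)
durations = np.array([[2.17], [2.33], [10.08], [13.83]])

# Manual standardization: subtract the mean, divide by the standard deviation
manual = (durations - durations.mean(axis=0)) / durations.std(axis=0)

# Same operation through scikit-learn
scaled = StandardScaler().fit_transform(durations)

print(np.allclose(manual, scaled))
print(scaled.mean(), scaled.std())
```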


**ColumnTransformer** allows different features of the input to be transformed separately and the features generated by each transformer will be concatenated to form a single feature space.
In this example, one-hot encoding is applied to the categorical features and StandardScaler to the numerical features.

In [22]:
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.compose import ColumnTransformer

numerical_features = x.select_dtypes(exclude="object").columns
categorical_features = x.select_dtypes(include="object").columns

numerical_transformer = StandardScaler()
ohe_transformer = OneHotEncoder()

preprocessor = ColumnTransformer(
    [
        ("OneHotEncoder", ohe_transformer, categorical_features),
        ("StandardScaler", numerical_transformer, numerical_features),
    ]
)
X = preprocessor.fit_transform(x)   #encode and scale the source data x and save the result in X
print(f"Shape of original data (x): {x.shape}")
print(f"Shape of transformed data (X): {X.shape}")


Shape of original data (x): (300153, 10)
Shape of transformed data (X): (300153, 1598)
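
The jump from 10 to 1598 columns follows directly from the unique-value counts reported by `df.nunique()` earlier: each categorical column contributes one one-hot column per category, and the two numerical features contribute one column each. A quick arithmetic check, with the counts copied from that output:

```python
# Unique-value counts per categorical column, copied from the df.nunique() output
categorical_cardinality = {
    "airline": 6, "flight": 1561, "source_city": 6, "departure_time": 6,
    "stops": 3, "arrival_time": 6, "destination_city": 6, "class": 2,
}
numerical_columns = ["duration", "days_left"]

one_hot_columns = sum(categorical_cardinality.values())
total_columns = one_hot_columns + len(numerical_columns)

print(one_hot_columns, total_columns)
```

Almost all of the added width comes from the `flight` column alone.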


## Split Training & Test Data

- Data needs to be split into training and test sets. Refer to https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
- x is the original dataset of independent features.
- X is the encoded dataset of independent features.
- y is the dependent variable that needs to be predicted.
- fit_transform() is applied to the training dataset.
- transform() is applied to the test dataset.
- The fit() method learns the parameters of each transformer, for example the mean and variance of each numerical feature for StandardScaler.
- The transform() method transforms the features using those learned parameters.
- The fit_transform() method is used on the training data so that the parameters are learned and the training data is transformed in one step.
- To avoid bias (data leakage), the test data is never fitted; it is only transformed. Note that cell 22 above calls fit_transform() on the full dataset before the split, so this separation is not enforced in the notebook as written.
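
The order of operations described in the list can be sketched as follows: split first, fit the preprocessor on the training rows only, then reuse it on the test rows. The `raw` frame is a small made-up sample, and `handle_unknown="ignore"` is an added assumption so that categories seen only in the test rows do not raise an error; neither is part of the notebook's own pipeline.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Small made-up sample with the same kind of columns as the flight data
raw = pd.DataFrame({
    "airline": ["Vistara", "Indigo", "Vistara", "AirAsia", "Indigo", "Vistara"],
    "duration": [2.25, 2.17, 10.08, 2.17, 5.5, 13.83],
    "price": [5955, 5953, 69265, 5956, 7425, 79099],
})
features, target = raw.drop(columns="price"), raw["price"]

# Split first so that no statistic is learned from the test rows
x_tr, x_te, y_tr, y_te = train_test_split(
    features, target, test_size=0.33, random_state=0
)

prep = ColumnTransformer([
    # Categories unseen during fit are encoded as all zeros instead of failing
    ("ohe", OneHotEncoder(handle_unknown="ignore"), ["airline"]),
    ("scale", StandardScaler(), ["duration"]),
])
X_tr = prep.fit_transform(x_tr)   # learn categories, mean and variance from training data
X_te = prep.transform(x_te)       # apply the learned parameters to the test data

print(X_tr.shape, X_te.shape)
```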


In [23]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.2,random_state=None) #using 20% to test and 80% for training.
print(f"Shape of training data : {X_train.shape}")
print(f"Shape of test data : {X_test.shape}")

Shape of training data : (240122, 1598)
Shape of test data : (60031, 1598)


## Regression Model Performance Metrics

### MAE (Mean Absolute Error)
The MAE value itself indicates the average absolute error between predicted and actual values. The smaller the MAE, the better the model’s predictions align with the actual data.

### MSE (Mean Squared Error)
Mean squared error (MSE) measures the amount of error in statistical models. It assesses the average squared difference between the observed and predicted values. When a model has no error, the MSE equals zero. As model error increases, its value increases. The mean squared error is also known as the mean squared deviation (MSD).

### RMSE (Root Mean Square Error)
The root mean square error (RMSE) measures the typical difference between a statistical model’s predicted values and the actual values. Mathematically, it is the square root of the MSE, which equals the standard deviation of the residuals when their mean is zero. Residuals represent the distance between the regression line and the data points. Use the root mean square error to assess the amount of error in a regression or other statistical model. A value of 0 means that the predicted values perfectly match the actual values, which rarely happens in practice. Low RMSE values indicate that the model fits the data well and has more precise predictions.
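
The three error metrics differ only in how the residuals are aggregated, which a short hand computation makes visible. The prices and predictions below are made up for illustration.

```python
import numpy as np

actual = np.array([5953, 5956, 5955, 7425])      # made-up ticket prices
predicted = np.array([6000, 5900, 6100, 7000])   # made-up model predictions

errors = actual - predicted        # residuals: [-47, 56, -145, 425]
mae = np.mean(np.abs(errors))      # average absolute error
mse = np.mean(errors ** 2)         # average squared error
rmse = np.sqrt(mse)                # square root brings the unit back to price

print(mae, mse, rmse)
```

Because squaring emphasizes large residuals, RMSE is never smaller than MAE; here the single 425 error pulls RMSE well above the MAE of 168.25.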


### R-Squared (R²)
R-Squared (R²) is a statistical measure of the proportion of variance in the dependent variable that can be explained by the independent variables.
In other words, R-Squared shows how well a regression model, built on the independent variables, predicts the observed outcome (the dependent variable).
R-Squared is also commonly known as the coefficient of determination. It is a goodness-of-fit measure for linear regression analysis. Higher R-squared values suggest a better fit, but they do not necessarily mean the model is a good predictor in an absolute sense.
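
As a worked example, R² can be computed as one minus the ratio of the residual sum of squares to the total sum of squares, and checked against scikit-learn's `r2_score`. The values are arbitrary.

```python
import numpy as np
from sklearn.metrics import r2_score

actual = np.array([3.0, 5.0, 7.0, 9.0])
predicted = np.array([2.5, 5.5, 7.0, 9.5])

ss_res = np.sum((actual - predicted) ** 2)       # variation left unexplained by the model
ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation around the mean
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(actual, predicted))
```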

### Adjusted R-Squared (R²)
Adjusted R-squared addresses a limitation of R-squared, especially in multiple regression (models with more than one independent variable): in ordinary least squares, R-squared never decreases when a variable is added, even an irrelevant one. Unlike R-squared, adjusted R-squared penalizes the addition of unnecessary variables.
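
The usual formula is Adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of predictors. A minimal sketch follows; the helper name `adjusted_r2` and the sample numbers are illustrative.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same R² and sample size, different number of predictors
few = adjusted_r2(0.90, n_samples=101, n_features=5)
many = adjusted_r2(0.90, n_samples=101, n_features=50)

print(round(few, 4), round(many, 4))
```

With the same raw R², the model using 50 predictors is penalized far more than the one using 5. When n is much larger than p, the adjustment becomes negligible.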

In [24]:
#Initialise dataframe for Regression Performance Metrics
performance_metrics={
    'Model Name':[], 
    'MAE':[] ,
#    'MSE':[] ,
    'RMSE':[] ,
    'R2 Score':[],
    'Adjusted R2 Score':[] ,
    'Training Duration':[],
    'Predection Duration':[],
    'Evaluation Duration':[]
    }
df_ModelPerformance=pd.DataFrame(performance_metrics)
print(type(df_ModelPerformance))
df_ModelPerformance.head()

<class 'pandas.core.frame.DataFrame'>


Unnamed: 0,Model Name,MAE,RMSE,R2 Score,Adjusted R2 Score,Training Duration,Predection Duration,Evaluation Duration


In [25]:
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error, root_mean_squared_error

#Define a function to evaluate model
def evaluate_model(true, predicted):
    mae = mean_absolute_error(true, predicted)
    mse = mean_squared_error(true, predicted)
    rmse = root_mean_squared_error(true, predicted)
    r2_square = r2_score(true, predicted)
    return mae, mse, rmse, r2_square

In [26]:
#Define Models

models = {
    "Linear": LinearRegression(),
    "Lasso": Lasso(alpha=0.1),
    "Ridge": Ridge(),
    "Bagging": BaggingRegressor(),
    "ExtraTrees": ExtraTreesRegressor(),
    "SVR": SVR(),
    "K-Neighbors": KNeighborsRegressor(n_neighbors=5),
    "Random Forest": RandomForestRegressor(),
    "Decision Tree": DecisionTreeRegressor(),
    "XGBRegressor": XGBRegressor(), 
    "CatBoosting": CatBoostRegressor(verbose=False),
    "AdaBoost": AdaBoostRegressor()
}

for key, value in models.items():
    model_name = key
    model = value
    test_performance_metrics = {}

    print('-'*80)
    
    t1=time.time()
    print(f'{datetime.datetime.fromtimestamp(t1).strftime("%Y-%m-%d %H:%M:%S")} - {model_name} - performing training')
    model.fit(X_train, y_train) # Training the Model with training dataset

    # Predicting values for the training and test datasets
    
    t2=time.time()
    print(f'{datetime.datetime.fromtimestamp(t2).strftime("%Y-%m-%d %H:%M:%S")} - {model_name} - predecting training dataset')
    y_train_pred = model.predict(X_train)
    
    t3=time.time()
    print(f'{datetime.datetime.fromtimestamp(t3).strftime("%Y-%m-%d %H:%M:%S")} - {model_name} - predecting test dataset')
    y_test_pred = model.predict(X_test)
    
    # Evaluating Model Performance
    
    t4=time.time()
    print(f'{datetime.datetime.fromtimestamp(t4).strftime("%Y-%m-%d %H:%M:%S")} - {model_name} - evaluating performance of training dataset')
    model_train_mae ,model_train_mse, model_train_rmse, model_train_r2 = evaluate_model(y_train, y_train_pred)
    
    
    t5=time.time()
    print(f'{datetime.datetime.fromtimestamp(t5).strftime("%Y-%m-%d %H:%M:%S")} - {model_name} - evaluating performance of test dataset')
    model_test_mae ,model_test_mse, model_test_rmse, model_test_r2 = evaluate_model(y_test, y_test_pred)    
    
    t6=time.time()
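    # n is the number of training samples, p the number of encoded features the model was fitted on
    model_train_adjusted_r2 = (1 - (1-model_train_r2)*(len(y_train)-1)/(len(y_train)-X_train.shape[1]-1))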
    model_train_mae = round(model_train_mae,2)
    #model_train_mse = round(model_train_mse,2)
    model_train_rmse = round(model_train_rmse,2)
    model_train_r2 = round(model_train_r2,2)
    model_train_adjusted_r2 = round(model_train_adjusted_r2,2)
    model_train_duration = round(float(t2-t1),2)
    model_train_pred_duration = round(float(t3-t2),2)
    model_train_eval_duration = round(float(t5-t4),2)

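    # n is the number of test samples, p the number of encoded features the model was fitted on
    model_test_adjusted_r2 = (1 - (1-model_test_r2)*(len(y_test)-1)/(len(y_test)-X_test.shape[1]-1))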
    model_test_mae = round(model_test_mae,2)
    #model_test_mse = round(model_test_mse,2)
    model_test_rmse = round(model_test_rmse,2)
    model_test_r2 = round(model_test_r2,2)
    model_test_adjusted_r2 = round(model_test_adjusted_r2,2)
    model_test_duration = round(float(0),2)
    model_test_pred_duration = round(float(t4-t3),2)
    model_test_eval_duration = round(float(t6-t5),2)
    
    
    train_performance_metrics=pd.DataFrame({'Model Name':f'{model_name} (Train)', 
                                        'MAE':[model_train_mae] ,
                                        #'MSE':[model_train_mse] ,
                                        'RMSE':[model_train_rmse] ,
                                        'R2 Score':[model_train_r2],
                                        'Adjusted R2 Score':[model_train_adjusted_r2],
                                        'Training Duration':[model_train_duration],
                                        'Predection Duration':[model_train_pred_duration],
                                        'Evaluation Duration':[model_train_eval_duration]
                                        })

    test_performance_metrics=pd.DataFrame({'Model Name':f'{model_name} (Test)', 
                                        'MAE':[model_test_mae] ,
                                        #'MSE':[model_test_mse] ,
                                        'RMSE':[model_test_rmse] ,
                                        'R2 Score':[model_test_r2],
                                        'Adjusted R2 Score':[model_test_adjusted_r2],
                                         'Training Duration':[model_test_duration],
                                        'Predection Duration':[model_test_pred_duration],
                                        'Evaluation Duration':[model_test_eval_duration]
                                        })

    df_ModelPerformance = pd.concat([train_performance_metrics,df_ModelPerformance], ignore_index=True)
    df_ModelPerformance = pd.concat([test_performance_metrics,df_ModelPerformance], ignore_index=True)
print('-'*80)

--------------------------------------------------------------------------------
2024-09-27 18:16:04 - Linear - performing training
2024-09-27 18:16:09 - Linear - predecting training dataset
2024-09-27 18:16:09 - Linear - predecting test dataset
2024-09-27 18:16:09 - Linear - evaluating performance of training dataset
2024-09-27 18:16:09 - Linear - evaluating performance of test dataset
--------------------------------------------------------------------------------
2024-09-27 18:16:09 - Lasso - performing training
2024-09-27 18:22:20 - Lasso - predecting training dataset
2024-09-27 18:22:20 - Lasso - predecting test dataset
2024-09-27 18:22:20 - Lasso - evaluating performance of training dataset
2024-09-27 18:22:20 - Lasso - evaluating performance of test dataset
--------------------------------------------------------------------------------
2024-09-27 18:22:20 - Ridge - performing training
2024-09-27 18:22:21 - Ridge - predecting training dataset
2024-09-27 18:22:21 - Ridge - predec

In [60]:
pd.set_option('display.max_columns',None)
df_ModelPerformance
#filepath = f'../outputs/{time.strftime("%Y%m%d_%H%M%S")}_ModelPerformance.csv'
#df_ModelPerformance.to_csv(filepath)  
#df_ModelPerformance.drop(df_ModelPerformance.tail(1).index,inplace=True)

Unnamed: 0,Model Name,MAE,RMSE,R2 Score,Adjusted R2 Score,Training Duration,Predection Duration,Evaluation Duration
0,AdaBoost (Test),3695.92,5797.02,0.93,0.93,0.0,0.11,0.0
1,AdaBoost (Train),3728.14,5837.92,0.93,0.93,9.49,0.46,0.01
2,CatBoosting (Test),1793.4,3110.6,0.98,0.98,0.0,0.17,0.0
3,CatBoosting (Train),1777.7,3076.7,0.98,0.98,32.95,0.62,0.03
4,XGBRegressor (Test),1962.24,3398.99,0.98,0.98,0.0,0.09,0.0
5,XGBRegressor (Train),1937.71,3355.04,0.98,0.98,3.82,0.26,0.01
6,Decision Tree (Test),885.53,2924.46,0.98,0.98,0.0,0.03,0.0
7,Decision Tree (Train),11.04,221.15,1.0,1.0,42.06,0.1,0.02
8,Random Forest (Test),859.79,2341.46,0.99,0.99,0.0,2.19,0.0
9,Random Forest (Train),320.68,892.13,1.0,1.0,3292.77,10.43,0.02
