## 1. Business Understanding (Stakeholders input)
- Goal
- Hypothesis
- Analytical Questions
- More Information about the project if applicable
## 2. Data Understanding
- Load Dataset
- Clean dataset
- EDA (info, describe, duplicates, appropriate columns, check for unique values)
  - Univariate Analysis (histogram, check for outliers, calculate skewness, density plots, etc.)
  - Bivariate Analysis (data types, correlation heatmap, violin plot, pair plots, etc.)
  - Multivariate Analysis (PCA)
  - Further analysis
- Answer Analytical Questions
- Test Hypothesis
## 3. Data Preparation
## 4. Modelling & Evaluation


# Business Understanding

### Objective:
The primary goal of this project is to accurately predict the estimated time of arrival (ETA) for Yassir trips. This will enhance the reliability of Yassir's services, potentially increasing customer satisfaction and retention while optimizing resource allocation and cost management.

### Stakeholders:

- **Customers:** Require reliable and accurate ETAs to plan their journeys better.
- **Drivers:** Benefit from improved route planning and time management.
- **Yassir Management:** Needs accurate ETA predictions to improve service efficiency, resource allocation, and customer satisfaction.

### Success Criteria:

- **Operational Efficiency:** Better resource management and reduced operational costs.
- **Customer Experience:** Enhanced satisfaction due to accurate ETA predictions.
- **Market Competitiveness:** Improved reliability can make Yassir more attractive compared to competitors.

### Business Questions
1. How do weather conditions affect the ETA of Yassir trips?
   Understanding how factors such as temperature, rainfall, and wind speed influence travel times can support more accurate ETA predictions.
2. What is the impact of trip distance on ETA accuracy?
   Investigating whether longer or shorter trips have more variance in ETA predictions can help refine the model.
3. How do different times of the day affect ETA predictions?
   Analyzing time-based patterns (e.g., rush hours vs. non-rush hours) can help improve the predictive model.
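
The time-of-day question above can be explored by deriving a few calendar features from the trip timestamp. The following is a minimal sketch, not the project's final feature set: the helper name `add_time_features`, the rush-hour windows, and the two sample rows are assumptions made for illustration, and the hours are in UTC rather than local time.

```python
import pandas as pd

def add_time_features(df: pd.DataFrame, ts_col: str = "timestamp") -> pd.DataFrame:
    """Return a copy of df with hour, weekday, and a rush-hour flag."""
    out = df.copy()
    ts = pd.to_datetime(out[ts_col], utc=True)
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday = 0
    # Assumed rush-hour windows (illustrative only): 07:00-09:59 and 16:00-18:59
    out["is_rush_hour"] = out["hour"].isin([7, 8, 9, 16, 17, 18])
    return out

# Made-up sample rows, only to show the shape of the output
sample = pd.DataFrame({
    "timestamp": ["2019-12-04T08:15:00Z", "2019-12-04T13:40:00Z"],
    "eta": [900, 600],
})
features = add_time_features(sample)
print(features[["hour", "day_of_week", "is_rush_hour"]])
```

Grouping the real training data by `is_rush_hour` or `hour` and comparing the ETA distributions would then give a first answer to question 3.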

### Hypothesis

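Null Hypothesis (H0): The ETA for Yassir trips is not significantly influenced by weather conditions (particularly rainfall and wind speed), trip distance, or the time of day.

Alternative Hypothesis (H1): The ETA for Yassir trips is significantly influenced by weather conditions (particularly rainfall and wind speed), trip distance, and the time of day.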
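
One possible way to start checking the hypothesis is a rank correlation between a candidate factor and the ETA. The sketch below computes Spearman's rho as the Pearson correlation of the ranks, using pandas only. The helper name `spearman_rho` and the toy values are assumptions for illustration. A formal test would also need a p-value, which `scipy.stats.spearmanr` can provide.

```python
import pandas as pd

def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return x.rank().corr(y.rank())

# Made-up values in which ETA grows with distance (illustrative only)
toy_trips = pd.DataFrame({
    "trip_distance": [1200, 3500, 5000, 8000, 15000],  # meters
    "eta": [240, 520, 610, 1100, 1500],                # seconds
})
rho = spearman_rho(toy_trips["trip_distance"], toy_trips["eta"])
print(f"Spearman rho: {rho:.2f}")
```

A rho close to 0 would be consistent with no monotonic relationship. Values near 1 or -1 point to a strong monotonic relationship.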

# Data Understanding

In [None]:
# Import the necessary packages
import os
import warnings

import pandas as pd

# Hide warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

print("🛬 Imported all packages.", "Warnings hidden. 👻")

### Data Reading

In [None]:
BASE_DIR = '../'
ENV_FILE = os.path.join(BASE_DIR, '.env')
TEST_FILE = os.path.join(BASE_DIR, 'Data/Test.csv')
TRAIN_FILE = os.path.join(BASE_DIR, 'Data/Train.csv')
WEATHER_FILE = os.path.join(BASE_DIR, 'Data/Weather.csv')
MODELS = os.path.join(BASE_DIR, 'models/')

In [None]:
# Date columns to parse
parse_dates = ['Timestamp']

# Load CSV files into the Notebook
train_df = pd.read_csv(TRAIN_FILE, parse_dates=parse_dates)

test_df = pd.read_csv(TEST_FILE, parse_dates=parse_dates)

weather_df = pd.read_csv(WEATHER_FILE)
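
Since the weather data is loaded separately from the trips, it will eventually have to be joined to them. The sketch below shows one way this might look, under loud assumptions: that the weather file has one row per day with a `date` column, and that a column such as `total_precipitation` exists. Neither is confirmed by this notebook, and the rows here are made up.

```python
import pandas as pd

# Made-up trips and daily weather rows (column names are assumptions)
trips = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2019-12-04T20:01:50Z", "2019-12-05T09:30:00Z"], utc=True
    ),
    "eta": [900, 1200],
})
weather = pd.DataFrame({
    "date": ["2019-12-04", "2019-12-05"],
    "total_precipitation": [0.0, 2.5],
})

# Build a plain-string join key so both sides share the same dtype
trips["date"] = trips["timestamp"].dt.strftime("%Y-%m-%d")

# Left join keeps every trip; validate guards against duplicate weather days
merged = trips.merge(weather, on="date", how="left", validate="many_to_one")
print(merged)
```

The `validate="many_to_one"` argument raises an error if a date appears more than once in the weather table, which would otherwise silently duplicate trips.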

### Data Cleaning

- Standardize column names: use snake case

In [None]:
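# Lowercase all column names (e.g. 'Trip_distance' becomes 'trip_distance')
train_df.columns = train_df.columns.str.lower()  # Train

test_df.columns = test_df.columns.str.lower()  # Test

weather_df.columns = weather_df.columns.str.lower()  # Weather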

### Exploratory Data Analysis

In [None]:
test_df.head()

In [None]:
test_df.info()

In [None]:
train_df.head()

In [None]:
weather_df.head()

In [None]:
train_df.info()

In [None]:
# Check for missing values
train_df.isna().sum()
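
The missing-value check can be combined with the other basic EDA checks from the outline (data types, unique values, duplicates) in a single summary. This is a small sketch with a hypothetical helper, `quality_report`, run on made-up rows. It is one convenient layout, not a required step.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing count, and number of unique values."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "unique": df.nunique(),
    })

# Made-up rows: one exact duplicate and one missing ETA
toy_df = pd.DataFrame({
    "id": ["a", "b", "b", "c"],
    "eta": [300, 450, 450, None],
})
report = quality_report(toy_df)
n_duplicates = int(toy_df.duplicated().sum())

print(report)
print("Duplicate rows:", n_duplicates)
```

Calling `quality_report(train_df)` would give the same overview for the training data.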

Cleaning the Timestamp column and changing the data type

In [None]:

# The timestamps were parsed when the CSV files were loaded (parse_dates),
# so no manual string cleanup of the 'T' and 'Z' characters is needed.
# Drop the UTC marker to keep plain (timezone-naive) datetimes.
train_df['timestamp'] = pd.to_datetime(train_df['timestamp'], utc=True).dt.tz_localize(None)
test_df['timestamp'] = pd.to_datetime(test_df['timestamp'], utc=True).dt.tz_localize(None)

# Confirm the resulting data types
train_df.dtypes


In [None]:
train_df.head()

### Convert seconds to hours, minutes, and seconds

In [None]:
# Function to convert seconds to hours, minutes, and seconds
def convert_seconds(eta):
    hours = eta // 3600
    minutes = (eta % 3600) // 60
    seconds = eta % 60
    return f"{hours}h {minutes}m {seconds}s"

# Apply the function to the 'eta' column (column names were lowercased above)
# and create a new column 'time_taken'
train_df['time_taken'] = train_df['eta'].apply(convert_seconds)

train_df.head()
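
As an alternative to formatting the duration by hand, pandas can represent seconds as timedeltas, which remain usable in arithmetic. The following sketch uses made-up ETA values. Whether a timedelta column is preferable to a display string is a design choice, not something this notebook prescribes.

```python
import pandas as pd

# Made-up ETA values in seconds
eta_seconds = pd.Series([45, 3725, 86399])

# Convert to timedeltas; these still support sums, means, and comparisons
eta_timedelta = pd.to_timedelta(eta_seconds, unit="s")

# Break each duration into hours, minutes, and seconds
components = eta_timedelta.dt.components[["hours", "minutes", "seconds"]]

# Total duration in minutes, convenient for plots
total_minutes = eta_timedelta.dt.total_seconds() / 60

print(components)
```

A formatted string is easier to read in a table, while the timedelta keeps the numeric meaning of the value.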


### Function to convert meters to kilometers

In [None]:
# Function to convert meters to kilometers
def convert_to_km(trip_distance):
    return trip_distance / 1000

# Apply the function to the 'trip_distance' column (column names were lowercased above)
# and create a new column 'distance_km'
train_df['distance_km'] = convert_to_km(train_df['trip_distance'])

train_df.head()

In [None]:
# Apply the function to the test data
test_df['distance_km'] = convert_to_km(test_df['trip_distance'])

test_df.head()
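
With both distance and duration available, one possible sanity check is the implied average speed of each trip, which can expose outliers before modelling. The sketch below is illustrative only: the rows are made up and the 150 km/h threshold is an assumption, not a value taken from the data.

```python
import pandas as pd

# Made-up trips: distance in meters, ETA in seconds
trips = pd.DataFrame({
    "trip_distance": [1500, 12000, 40000, 900],
    "eta": [300, 1200, 600, 5],
})

# Average speed in km/h = kilometers divided by hours
trips["speed_kmh"] = (trips["trip_distance"] / 1000) / (trips["eta"] / 3600)

# Assumed plausibility threshold (illustrative, tune to the real data)
MAX_PLAUSIBLE_KMH = 150
trips["implausible"] = trips["speed_kmh"] > MAX_PLAUSIBLE_KMH

print(trips)
```

Rows flagged as implausible are candidates for closer inspection rather than automatic removal.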

# Data Preparation

# Modeling and Evaluation

# Deployment