# Coffee Shop Revenue Prediction with Random Forest Regressor

In this notebook, we'll build a machine learning model to predict coffee shop revenue using a Random Forest Regressor. We'll follow these steps:

1. Data Loading and EDA
2. Feature Engineering
3. Model Training with Hyperparameter Tuning
4. Model Evaluation
5. Saving the Model

In [1]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## 1. Data Loading and EDA

In [2]:
df = pd.read_csv('coffee_shop_revenue.csv')

In [3]:
df.head()

| | Number_of_Customers_Per_Day | Average_Order_Value | Operating_Hours_Per_Day | Number_of_Employees | Marketing_Spend_Per_Day | Location_Foot_Traffic | Daily_Revenue |
|---|---|---|---|---|---|---|---|
| 0 | 152 | 6.74 | 14 | 4 | 106.62 | 97 | 1547.81 |
| 1 | 485 | 4.50 | 12 | 8 | 57.83 | 744 | 2084.68 |
| 2 | 398 | 9.09 | 6 | 6 | 91.76 | 636 | 3118.39 |
| 3 | 320 | 8.48 | 17 | 4 | 462.63 | 770 | 2912.20 |
| 4 | 156 | 7.44 | 17 | 2 | 412.52 | 232 | 1663.42 |


In [4]:
df.info()

In [5]:
df.isnull().sum()

| Column | Null count |
|---|---|
| Number_of_Customers_Per_Day | 0 |
| Average_Order_Value | 0 |
| Operating_Hours_Per_Day | 0 |
| Number_of_Employees | 0 |
| Marketing_Spend_Per_Day | 0 |
| Location_Foot_Traffic | 0 |
| Daily_Revenue | 0 |


In [6]:
df.duplicated().sum()

np.int64(0)
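
The checks above confirm there are no missing values or duplicate rows. A natural follow-up, not part of the original notebook, is to look at summary statistics and at how strongly each feature correlates with `Daily_Revenue`. The sketch below is only an illustration: it builds a small synthetic frame (`demo`, with made-up values and only three of the real feature names) so that it runs without the CSV. With the real data, `df` would take the place of `demo`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for df (assumption: all values are made up for illustration)
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Number_of_Customers_Per_Day": rng.integers(50, 500, size=n),
    "Average_Order_Value": rng.uniform(2.5, 10.0, size=n).round(2),
    "Marketing_Spend_Per_Day": rng.uniform(10, 500, size=n).round(2),
})
# Revenue is constructed as customers * order value plus noise, purely for the demo
demo["Daily_Revenue"] = (
    demo["Number_of_Customers_Per_Day"] * demo["Average_Order_Value"]
    + rng.normal(0, 50, size=n)
)

# Summary statistics for every numeric column
summary = demo.describe()

# Pearson correlation of each feature with the target, strongest first
target_corr = (
    demo.corr()["Daily_Revenue"]
    .drop("Daily_Revenue")
    .sort_values(ascending=False)
)

print(summary.loc[["mean", "min", "max"]])
print(target_corr)
```

Features with a correlation near zero are not necessarily useless to a Random Forest, which can pick up non-linear effects, so this is a quick orientation rather than a feature-selection rule.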

## 2. Feature Engineering

In [7]:
# Features: every column except the target
X = df.drop('Daily_Revenue', axis=1)
# Target: the daily revenue we want to predict
Y = df['Daily_Revenue']

In [8]:
X.shape

(2000, 6)

In [9]:
Y.shape

(2000,)

## 3. Model Training with Hyperparameter Tuning

In [10]:
from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)



In [11]:
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()

In [12]:
from sklearn.model_selection import GridSearchCV
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5],
    'min_samples_leaf': [1, 2]
}

In [13]:
ge_search = GridSearchCV(
    estimator=model,
    param_grid=param_grid,
    n_jobs=-1,
    verbose=1
)


In [14]:
ge_search.fit(X_train, Y_train)

Fitting 5 folds for each of 24 candidates, totalling 120 fits
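
After the search finishes, the fitted `GridSearchCV` object exposes which of the candidates won. The notebook does not show this step, so the following is a minimal sketch under stated assumptions: it uses synthetic data from `make_regression` and a deliberately tiny grid (`small_grid`, a hypothetical name) so that it runs in a few seconds. On the real search, the same attributes can be read from `ge_search`.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (assumption: stands in for X_train / Y_train)
X_demo, y_demo = make_regression(
    n_samples=120, n_features=6, noise=10.0, random_state=0
)

# Deliberately tiny grid so the sketch runs quickly
small_grid = {"n_estimators": [10, 20], "max_depth": [3, None]}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=small_grid,
    cv=3,
)
search.fit(X_demo, y_demo)

# Winning hyperparameter combination
print(search.best_params_)
# Mean cross-validated score of that combination (R^2 for regressors by default)
print(search.best_score_)
# Estimator refit on all of X_demo with the best parameters (refit=True by default)
best_model = search.best_estimator_
```

Because `refit=True` is the default, calling `predict` on the search object delegates to this refit best estimator.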


In [15]:
y_pred = ge_search.predict(X_test)

## 4. Model Evaluation



In [16]:
from sklearn.metrics import r2_score
r2_score(Y_test, y_pred)

0.9476893259739528
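
R² describes the share of variance explained, but it does not say how large the errors are in revenue units. As a possible complement, not part of the original evaluation, the sketch below computes the mean absolute error (MAE) and root mean squared error (RMSE) next to R². The arrays `y_true_demo` and `y_pred_demo` are made-up values for illustration; in the notebook they would be `Y_test` and `y_pred`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values for illustration; replace with Y_test and y_pred
y_true_demo = np.array([1500.0, 2100.0, 3100.0, 2900.0, 1650.0])
y_pred_demo = np.array([1550.0, 2000.0, 3000.0, 3000.0, 1700.0])

# Average absolute miss, in the same units as Daily_Revenue
mae = mean_absolute_error(y_true_demo, y_pred_demo)
# Square root of the mean squared error; penalizes large misses more heavily
rmse = np.sqrt(mean_squared_error(y_true_demo, y_pred_demo))
# Coefficient of determination, as used in the notebook
r2 = r2_score(y_true_demo, y_pred_demo)

print(f"MAE:  {mae:.2f}")
print(f"RMSE: {rmse:.2f}")
print(f"R^2:  {r2:.4f}")
```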

## 5. Save the Trained Model

In [17]:
import pickle as pk

In [18]:
with open('model.pkl', 'wb') as fs:
    pk.dump(ge_search, fs)
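
A saved model is only useful if it can be loaded back and still produces the same predictions. The following sketch illustrates that round trip under stated assumptions: it trains a tiny forest on synthetic data as a stand-in for the fitted `ge_search` object, and it writes to a temporary file (`demo_model.pkl`, a hypothetical name) so that it does not touch `model.pkl`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny synthetic model (assumption: stands in for the fitted ge_search object)
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 1, size=(50, 6))
y_demo = X_demo.sum(axis=1)
fitted = RandomForestRegressor(n_estimators=10, random_state=0).fit(X_demo, y_demo)

# Write to a temporary directory so the sketch does not overwrite model.pkl
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(fitted, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model should reproduce the original predictions
same_predictions = np.allclose(fitted.predict(X_demo), restored.predict(X_demo))
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and it is safest to load them with the same scikit-learn version that was used to save them.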