In [46]:
import pandas as pd
import os

# Rocinante 36 Model

In [47]:
file_path = 'Exhibit+2.xlsx'
R_data = pd.read_excel(file_path, sheet_name='Rocinante Models')
R_data

Unnamed: 0,Cars,"Sales \n(in 1,000 units)",Price\n(in lakh rupees),Mileage\n(Km/ltr),Top speed (Km/hr)
0,Rocinante 1,171.877,6.1,15.8,168.2
1,Rocinante 2,139.796,6.1,12.1,149.6
2,Rocinante 3,178.947,9.9,17.0,173.4
3,Rocinante 4,140.022,5.8,11.6,170.6
4,Rocinante 5,186.476,10.0,17.2,175.0
5,Rocinante 6,192.123,6.5,17.6,173.1
6,Rocinante 7,175.085,5.5,16.0,184.6
7,Rocinante 8,146.882,8.4,13.0,175.7
8,Rocinante 9,202.847,6.6,19.3,166.7
9,Rocinante 10,149.933,8.8,13.3,175.4


In [48]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import numpy as np

In [49]:
# Prepare the data
X_rocinante = R_data[['Price\n(in lakh rupees)', 'Mileage\n(Km/ltr)', 'Top speed (Km/hr)']]
y_rocinante = R_data['Sales \n(in 1,000 units)']

### Step-1: Split the data into training and testing sets (80% training, 20% testing)

In [50]:
X_train_roc, X_test_roc, y_train_roc, y_test_roc = train_test_split(X_rocinante, y_rocinante, test_size=0.2, random_state=42)

# Initialize and train a Linear Regression model
model_rocinante = LinearRegression()
model_rocinante.fit(X_train_roc, y_train_roc)

# Make predictions on the test set
y_pred_roc = model_rocinante.predict(X_test_roc)


In [51]:
X_train_roc.shape, X_test_roc.shape

((28, 3), (7, 3))
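
The shapes above imply that the sheet holds 35 rows, so the ten rows displayed earlier are only part of the data. The sketch below reproduces that arithmetic. It assumes that `train_test_split` rounds a fractional `test_size` up and gives the remaining rows to the training set, which is my understanding of scikit-learn's behaviour; `split_sizes` is a hypothetical helper, not a library function.

```python
import math

def split_sizes(n_samples, test_size=0.2):
    """Approximate the train/test row counts for a fractional test_size."""
    n_test = math.ceil(test_size * n_samples)  # test share is rounded up (assumed)
    n_train = n_samples - n_test               # training set gets the remainder
    return n_train, n_test

print(split_sizes(35))  # (28, 7)
```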

### Step-2: Calculation of RMSE (Root Mean Squared Error)

In [52]:
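# Calculate RMSE for the Rocinante model
# np.sqrt of the MSE works on every scikit-learn version; `squared=False` is deprecated in recent releases
rmse_roc = np.sqrt(mean_squared_error(y_test_roc, y_pred_roc))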

# Print the RMSE result
print(f'Root Mean Squared Error(RMSE) for Rocinante Model : {rmse_roc}')

Root Mean Squared Error(RMSE) for Rocinante Model : 1.1436014350493942


#### The Root Mean Squared Error (RMSE) for the Rocinante Model is 1.14. Because sales are recorded in thousands of units, the model's predictions on the test set miss actual sales by a typical margin of about 1,140 units.

### Interpretation:
#### The prediction error is small relative to the scale of the target: the sales figures shown above range from roughly 140 to 203 thousand units, so an RMSE of 1.14 is under 1% of a typical value. A lower RMSE generally means a better fit, and here it suggests the linear model predicts Rocinante sales closely from price, mileage, and top speed. This estimate rests on only 7 test rows, so it should be read as indicative rather than precise.
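
To make the metric concrete, here is a minimal sketch that computes the RMSE by hand: square each error, average the squares to get the mean squared error (MSE), then take the square root. The `actual` and `predicted` lists are made-up illustrative numbers, not values from Exhibit 2, and `rmse` is a hypothetical helper.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    mse = sum(squared_errors) / len(squared_errors)  # mean squared error
    return math.sqrt(mse)                            # back to the target's units

# Illustrative sales figures in 1,000 units (assumed, not from the dataset)
actual = [170.0, 140.0, 180.0, 150.0]
predicted = [171.0, 139.0, 182.0, 150.0]

print(round(rmse(actual, predicted), 3))  # 1.225
```

The square root matters when reading the result: the MSE is in squared units, while the RMSE is in the same units as sales.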

### Predicting both training and testing data

In [53]:
y_train_pred = model_rocinante.predict(X_train_roc)
y_test_pred = model_rocinante.predict(X_test_roc)

### Calculation of RMSE for both training and testing data

In [54]:
# mean_squared_error returns the MSE, so take the square root to get the RMSE
train_rmse_roc = np.sqrt(mean_squared_error(y_train_roc, y_train_pred))
test_rmse_roc = np.sqrt(mean_squared_error(y_test_roc, y_test_pred))

In [55]:
print(f"Training RMSE: {train_rmse_roc:.2f}")
print(f"Testing RMSE: {test_rmse_roc:.2f}")

Training RMSE: 2.36
Testing RMSE: 1.14


### Step-3: Determining whether the model is Overfitting or Underfitting

In [56]:
if train_rmse_roc < test_rmse_roc:
    print("The model may be overfitting.")
elif train_rmse_roc > test_rmse_roc:
    print("The model may be underfitting.")
else:
    print("The model is performing consistently on both training and testing data.")

The model may be underfitting.


## Interpretation
### The check flags possible underfitting of the training data, but the evidence is weak. Here's why:

### The training error is higher than the testing error: the RMSE is about 2.36 on the training set versus 1.14 on the test set (mean squared errors of 5.58 and 1.31).
### In a good model, the Training RMSE and Testing RMSE should be similar, indicating that the model has learned patterns that carry over to new data.
### A higher training error can mean that the model is too simple and isn't capturing all the patterns present in the data. However, underfitting normally shows up as high error on both sets, and the test set here has only 7 rows, so the gap may simply reflect which rows happened to land in the test split.
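
Because the test set holds only 7 rows, a single split can make the test error look better or worse than it really is. One way to check is k-fold cross-validation, which averages the error over several splits. The sketch below is an illustration on synthetic data: `X_demo` and `y_demo` are hypothetical stand-ins with the same shape as the Rocinante features, and the coefficients used to generate them are arbitrary.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: 35 rows, 3 features (not the Exhibit 2 data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(35, 3))
y_demo = X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=1.0, size=35)

# 5 shuffled folds; scikit-learn reports the RMSE negated so that higher is better
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(
    LinearRegression(), X_demo, y_demo,
    cv=cv, scoring="neg_root_mean_squared_error",
)
fold_rmse = -scores  # flip the sign back to a positive RMSE

print("Per-fold RMSE:", np.round(fold_rmse, 2))
print("Mean RMSE:", round(float(fold_rmse.mean()), 2))
print("Std of RMSE:", round(float(fold_rmse.std()), 2))
```

If the per-fold values vary widely, a train/test gap from one split is not strong evidence of underfitting or overfitting.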

# Marengo 32 Model

In [57]:
M_data = pd.read_excel(file_path, sheet_name='Marengo Models')
M_data

Unnamed: 0,Cars,"Sales \n(in 1,000 units)",Price\n(in lakh rupees),Mileage\n(Km/ltr),Top speed (Km/hr)
0,Marengo 1,20.896,42.5,9.3,199.4
1,Marengo 2,31.048,36.0,9.7,235.2
2,Marengo 3,29.904,54.7,16.6,240.8
3,Marengo 4,28.792,42.7,11.7,232.5
4,Marengo 5,16.776,44.9,13.7,188.8
5,Marengo 6,18.928,35.5,9.6,184.2
6,Marengo 7,22.776,51.3,13.7,207.7
7,Marengo 8,36.824,30.4,12.6,249.5
8,Marengo 9,22.216,38.4,16.2,175.8
9,Marengo 10,35.456,32.2,9.6,245.6


In [58]:
# Prepare the data
X_marengo  = M_data[['Price\n(in lakh rupees)', 'Mileage\n(Km/ltr)', 'Top speed (Km/hr)']]
y_marengo  = M_data['Sales \n(in 1,000 units)']

### Step-1: Split the data into training and testing sets (80% training, 20% testing)

In [59]:
X_train_mar, X_test_mar, y_train_mar, y_test_mar = train_test_split(X_marengo, y_marengo, test_size=0.2, random_state=42)

# Initialize and train a Linear Regression model
model_marengo = LinearRegression()
model_marengo.fit(X_train_mar, y_train_mar)

# Make predictions on the test set
y_pred_mar = model_marengo.predict(X_test_mar)

In [60]:
X_train_mar.shape, X_test_mar.shape

((24, 3), (7, 3))

### Step-2: Calculation of RMSE (Root Mean Squared Error)

In [61]:
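# Calculate RMSE for the Marengo model
# np.sqrt of the MSE works on every scikit-learn version; `squared=False` is deprecated in recent releases
rmse_mar = np.sqrt(mean_squared_error(y_test_mar, y_pred_mar))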

# Print the RMSE result
print(f'Root Mean Squared Error(RMSE) for Marengo Model : {rmse_mar}')

Root Mean Squared Error(RMSE) for Marengo Model : 2.538527222170119


#### The Root Mean Squared Error (RMSE) for the Marengo Model is 2.54. Because sales are recorded in thousands of units, the model's predictions for Marengo vehicles miss actual sales by a typical margin of about 2,540 units.

### Interpretation:
#### The error is noticeable relative to the scale of the target: the sales figures shown above range from roughly 17 to 37 thousand units, so an RMSE of 2.54 is on the order of 10% of a typical value. A lower RMSE generally indicates a better fit to the data, so the model is only moderately accurate and there is room to reduce prediction errors.
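
RMSE values for the two models are hard to compare directly because the sales volumes differ by almost an order of magnitude. One option is to divide each test RMSE by the mean sales figure. The sketch below does this with the test RMSE values printed in Step-2 and, as a stated assumption, uses only the ten rows displayed in each table rather than the full sheets; `relative_rmse` is a hypothetical helper.

```python
# Sales (in 1,000 units) for the ten rows displayed in each table above
rocinante_sales = [171.877, 139.796, 178.947, 140.022, 186.476,
                   192.123, 175.085, 146.882, 202.847, 149.933]
marengo_sales = [20.896, 31.048, 29.904, 28.792, 16.776,
                 18.928, 22.776, 36.824, 22.216, 35.456]

def relative_rmse(rmse, sales):
    """RMSE expressed as a fraction of the mean sales figure."""
    return rmse / (sum(sales) / len(sales))

roc_rel = relative_rmse(1.1436014350493942, rocinante_sales)
mar_rel = relative_rmse(2.538527222170119, marengo_sales)

print(f"Rocinante: {roc_rel:.1%}")  # Rocinante: 0.7%
print(f"Marengo: {mar_rel:.1%}")    # Marengo: 9.6%
```

On this rough scale the Marengo model's error is more than ten times larger than the Rocinante model's, even though the raw RMSE values differ by only a factor of about two.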

### Predicting both training and testing data

In [62]:
y_train_pred = model_marengo.predict(X_train_mar)
y_test_pred = model_marengo.predict(X_test_mar)

### Calculation of RMSE for both training and testing data

In [63]:
# mean_squared_error returns the MSE, so take the square root to get the RMSE
train_rmse_mar = np.sqrt(mean_squared_error(y_train_mar, y_train_pred))
test_rmse_mar = np.sqrt(mean_squared_error(y_test_mar, y_test_pred))

In [64]:
print(f"Training RMSE: {train_rmse_mar:.2f}")
print(f"Testing RMSE: {test_rmse_mar:.2f}")

Training RMSE: 2.12
Testing RMSE: 2.54


### Step-3: Determining whether the model is Overfitting or Underfitting

In [65]:
if train_rmse_mar < test_rmse_mar:
    print("The model may be overfitting.")
elif train_rmse_mar > test_rmse_mar:
    print("The model may be underfitting.")
else:
    print("The model is performing consistently on both training and testing data.")

The model may be overfitting.


## Interpretation

### The model appears to be overfitting, though the gap is moderate. Here's why:

### The training error is lower than the testing error: the RMSE is about 2.12 on the training set versus 2.54 on the test set (mean squared errors of 4.50 and 6.44).
### In a well-performing model, both Training RMSE and Testing RMSE should be close in value, indicating that the model generalizes well to new data.
### Here, the lower training error means the model fits the training data better than it predicts new data. With only 7 test rows the size of this gap is uncertain, so the overfitting signal should be treated as tentative.
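
A strict `<` / `>` comparison labels any difference, however small, as overfitting or underfitting. A minimal sketch of a more tolerant check is shown below. The 10% `tolerance` is an arbitrary assumption, `compare_rmse` is a hypothetical helper, and the input values are illustrative rather than taken from the outputs above.

```python
def compare_rmse(train_rmse, test_rmse, tolerance=0.10):
    """Return the relative train/test gap and a cautious label for it."""
    gap = (test_rmse - train_rmse) / train_rmse
    if gap > tolerance:
        label = "possible overfitting"
    elif gap < -tolerance:
        # A lower test error is more often a small-sample effect than underfitting
        label = "test error below training error (check the split)"
    else:
        label = "consistent"
    return gap, label

# Illustrative values (assumed)
gap, label = compare_rmse(train_rmse=2.1, test_rmse=2.5)
print(f"{gap:.0%} gap: {label}")  # 19% gap: possible overfitting
```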