# Multiple Linear Regression on the Toy Sales Dataset

## Importing the libraries

In [13]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

## Importing the dataset

To keep things simple, the dataset is structured so that the target variable is the last column in the table. Features and target can then be separated by column position:

* **X typically denotes the feature variables**, which are all the columns in the table except the last one
* **y typically denotes the single target variable**, which is the last column in the table

In [14]:
dataset = pd.read_csv('Toy-Sales.csv')
# Features: every row, all columns except the last
X = dataset.iloc[:, :-1].values
# Target: every row, last column only
y = dataset.iloc[:, -1].values
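
To see what the two `iloc` slices select, here is a minimal sketch on a three-row table built from the first rows of the data printed below. The column names (`Price`, `AdExp`, `PromExp`, `Sales`) are assumptions for illustration; the actual headers in `Toy-Sales.csv` are not shown here.

```python
import pandas as pd

# Hypothetical column names; the values are the first three rows of the dataset.
toy = pd.DataFrame({
    "Price": [8.75, 8.99, 7.50],
    "AdExp": [50.04, 50.74, 50.14],
    "PromExp": [61.13, 60.19, 59.16],
    "Sales": [73959, 71544, 78587],
})

X_demo = toy.iloc[:, :-1].values  # every row, all columns except the last
y_demo = toy.iloc[:, -1].values   # every row, last column only

print(X_demo.shape)  # (3, 3)
print(y_demo.shape)  # (3,)
```

Because the split is positional, it only works as intended while the target stays in the last column.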

In [15]:
print("Feature variables of the entire dataset")
print(X)

Feature variables of the entire dataset
[[ 8.75 50.04 61.13]
 [ 8.99 50.74 60.19]
 [ 7.5  50.14 59.16]
 [ 7.25 50.27 60.38]
 [ 7.4  51.25 59.71]
 [ 8.5  50.65 59.88]
 [ 8.4  50.87 60.14]
 [ 7.9  50.15 60.08]
 [ 7.25 48.24 59.9 ]
 [ 8.7  50.19 59.68]
 [ 8.4  51.11 59.83]
 [ 8.1  51.49 59.77]
 [ 8.4  50.1  59.29]
 [ 7.4  49.24 60.4 ]
 [ 8.   50.04 59.89]
 [ 8.3  49.46 60.06]
 [ 8.1  51.62 60.51]
 [ 8.2  49.78 58.93]
 [ 8.99 48.6  60.09]
 [ 7.99 49.   61.  ]
 [ 8.5  48.   59.  ]
 [ 7.9  54.   59.5 ]
 [ 7.99 48.7  58.  ]
 [ 8.25 50.   60.5 ]]


In [16]:
print("Target variable of the entire dataset")
print(y)

Target variable of the entire dataset
[73959 71544 78587 80364 78771 71986 74885 73345 76659 71880 73598 74893
 69003 78542 72543 74247 76253 72582 69022 76200 69701 77005 70987 75643]


## Training the Multiple Linear Regression model on the entire dataset

In [17]:
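from sklearn.linear_model import LinearRegression
# Default settings: ordinary least squares with an intercept term
regressor = LinearRegression()
# Fit on all rows; no separate test set is held out here
regressor.fit(X, y)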
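
With its default settings, `LinearRegression` fits an ordinary least squares model with an intercept. As a rough illustration of what that means, the sketch below solves the same kind of problem with NumPy alone on made-up, noise-free data. The names `X_synth` and `y_synth` and the true coefficients are invented for this example and are not taken from the toy sales data.

```python
import numpy as np

# Made-up data generated exactly from y = 2 + 3*x1 - 1*x2 (no noise).
X_synth = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 6.0]])
y_synth = 2.0 + 3.0 * X_synth[:, 0] - 1.0 * X_synth[:, 1]

# Prepend a column of ones so the first fitted parameter acts as the intercept.
design = np.column_stack([np.ones(len(X_synth)), X_synth])

# Least squares: choose the parameters that minimise the sum of squared residuals.
params, *_ = np.linalg.lstsq(design, y_synth, rcond=None)
intercept_synth, coefs_synth = params[0], params[1:]

print(intercept_synth, coefs_synth)
```

Since the data contain no noise, the recovered parameters should match the generating values (2, 3, -1) up to floating-point error.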

## The model parameters are the coefficients and the intercept
### Determining the coefficients 

In [18]:
# coef_ holds one coefficient per feature, in the same order as the columns of X
coef_array = regressor.coef_
print("Coefficient for the first feature variable: Price($) is ", coef_array[0])
print("Coefficient for the second feature variable: AdExp ($000) is ", coef_array[1])
print("Coefficient for the third feature variable: PromExp ($000) is ", coef_array[2])


Coefficient for the first feature variable: Price($) is  -5055.269865920846
Coefficient for the second feature variable: AdExp ($000) is  648.6121402597208
Coefficient for the third feature variable: PromExp ($000) is  1802.6109561246017
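
Indexing `coef_array` by position works, but it is easy to mislabel a coefficient if the column order ever changes. One alternative, sketched below, pairs each name with its coefficient using `zip`. The `feature_names` list is typed in by hand from the labels above, and the coefficient values are copied from the output rather than read from `regressor`.

```python
# Labels taken from the print statements above, in the same order as the columns of X.
feature_names = ["Price ($)", "AdExp ($000)", "PromExp ($000)"]

# Values copied from the output above; in the notebook this would be regressor.coef_.
coef_array = [-5055.269865920846, 648.6121402597208, 1802.6109561246017]

# Pair each name with its coefficient so labels and values cannot drift apart.
coef_by_name = dict(zip(feature_names, coef_array))
for name, coef in coef_by_name.items():
    print(f"{name}: {coef:.2f}")
```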


### Determining the intercept

In [19]:
print("Intercept for the regression model is ", regressor.intercept_)

Intercept for the regression model is  -25096.83292187014
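
Putting the coefficients and the intercept together gives the fitted equation. Values are rounded to two decimals, and $\widehat{\text{Sales}}$ is an assumed name for the predicted target, since the header of the target column is not shown here:

$$
\widehat{\text{Sales}} = -25096.83 - 5055.27\,\text{Price} + 648.61\,\text{AdExp} + 1802.61\,\text{PromExp}
$$

Each coefficient is the change in the predicted target for a one-unit increase in that feature with the other two held fixed. For example, a price one dollar higher lowers the prediction by roughly 5055.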


## Generating predictions for 3 different scenarios

### Each scenario involves a different combination of values for the feature variables

In [20]:
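# Each scenario lists values in the same column order as X:
# Price ($), AdExp ($000), PromExp ($000)
scenario1 = [9.10, 52.00, 61.00]
scenario2 = [7.10, 48.00, 57.00]
scenario3 = [8.10, 50.00, 60.00]
scenarios = [scenario1, scenario2, scenario3]
print(scenarios)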

[[9.1, 52.0, 61.0], [7.1, 48.0, 57.0], [8.1, 50.0, 60.0]]


In [21]:
# predict expects a 2D input: one row per scenario, one column per feature
scenario_pred = regressor.predict(scenarios)
for index, prediction in enumerate(scenario_pred):
    print(f"Prediction for scenario {index+1} is {prediction}")


Prediction for scenario 1 is 72587.31091535634
Prediction for scenario 2 is 72892.95826166074
Prediction for scenario 3 is 74542.74554463314
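
These predictions can be checked by hand: each one is the intercept plus the sum of every coefficient multiplied by the matching feature value. The sketch below does this in plain Python, with the coefficients and intercept copied from the outputs above rather than read from `regressor`. The helper `predict_by_hand` is a name made up for this example.

```python
# Parameters copied from the fitted model's printed output.
coefs = [-5055.269865920846, 648.6121402597208, 1802.6109561246017]
intercept = -25096.83292187014

# Same three scenarios as above: Price ($), AdExp ($000), PromExp ($000).
scenarios = [
    [9.10, 52.00, 61.00],
    [7.10, 48.00, 57.00],
    [8.10, 50.00, 60.00],
]

def predict_by_hand(features):
    """Return intercept + sum(coefficient * feature value)."""
    return intercept + sum(c * x for c, x in zip(coefs, features))

manual_preds = [predict_by_hand(s) for s in scenarios]
for index, prediction in enumerate(manual_preds, start=1):
    print(f"Scenario {index}: {prediction:.2f}")
```

The results should agree with `regressor.predict` up to floating-point rounding, which confirms that the model is a weighted sum of the features plus a constant.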
