# Multiple Linear Regression


```
y = b0 + b1*x1 + b2*x2 + b3*x3 + ... + bn*xn
```
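
As a minimal sketch of the equation above, a prediction is the intercept plus the dot product of the coefficients and the feature values. The coefficient and feature values below are made up for illustration only.

```python
import numpy as np

# Hypothetical coefficients: b0 is the intercept, b holds b1..b3
b0 = 10.0
b = np.array([2.0, 0.5, -1.0])

# One observation with three feature values x1..x3 (illustrative numbers)
x = np.array([3.0, 4.0, 1.0])

# y = b0 + b1*x1 + b2*x2 + b3*x3
y = b0 + b @ x
print(y)
```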

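Feature scaling is not needed for multiple linear regression: each coefficient compensates for the scale of its feature, so the predictions are the same with or without scaling.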
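
One way to check this claim is to fit ordinary least squares on raw and on standardised features and compare the fitted values. The sketch below uses synthetic data and a hypothetical helper `fit_predict` built on `np.linalg.lstsq`; it is not part of the notebook's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: two features on very different scales (illustrative only)
X = np.column_stack([rng.uniform(0, 1, 50), rng.uniform(0, 100_000, 50)])
y = 3.0 + 2.0 * X[:, 0] + 0.0005 * X[:, 1] + rng.normal(0, 0.1, 50)


def fit_predict(features, target):
    """Fit least squares with an intercept and return the fitted values."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return design @ coef


# Standardise each feature to zero mean and unit variance
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

pred_raw = fit_predict(X, y)
pred_scaled = fit_predict(X_scaled, y)

# The fitted values should agree up to floating-point error
print(np.allclose(pred_raw, pred_scaled))
```

The coefficients themselves differ between the two fits; only the predictions are unchanged.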

#### Dummy variable trap

Categorical variables (e.g. a city or country string column that is converted into columns of 0s and 1s) are represented as dummy variables in multiple linear regression.

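The rule is to **never include all** of the dummy variables in the equation; **always omit one**. In every row the full set of dummy columns sums to 1, which duplicates the intercept term, so one of them is redundant.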
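
The redundancy can be seen by comparing the number of columns with the rank of the design matrix. This is a small sketch with a hypothetical three-category column; the category names are placeholders, not values read from the dataset.

```python
import numpy as np

# Hypothetical categorical column with three categories
states = np.array(
    ["New York", "California", "Florida", "New York", "Florida", "California"]
)
categories = np.unique(states)  # sorted unique values

# One 0/1 column per category
dummies = (states[:, None] == categories[None, :]).astype(float)
intercept = np.ones((len(states), 1))

# Intercept + all dummies: the dummy columns sum to the intercept column,
# so the matrix has fewer independent columns than it has columns
full = np.hstack([intercept, dummies])

# Omitting one dummy column removes the redundancy
reduced = np.hstack([intercept, dummies[:, 1:]])

print(full.shape[1], np.linalg.matrix_rank(full))
print(reduced.shape[1], np.linalg.matrix_rank(reduced))
```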

## Importing the libraries


In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


## Importing the dataset


In [None]:
dataset = pd.read_csv("50_startups.csv")
# all the rows, all the columns except the last one
X = dataset.iloc[:, :-1].values
# all the rows, only the last column
y = dataset.iloc[:, -1].values

## Encoding categorical data

Encoding the independent variable

In [None]:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# [-1] encodes only the last column of X (which is State)
ct = ColumnTransformer(transformers=[("encoder", OneHotEncoder(), [-1])], remainder="passthrough")
columns_before_transformation = len(X[0])
X = np.array(ct.fit_transform(X))
columns_after_transformation = len(X[0])
encoded_columns_length = (columns_after_transformation - columns_before_transformation) + 1
# print the encoded columns (the transformer places them first)
# print(X[:, :encoded_columns_length])

## Splitting the dataset into the Training and Test sets


In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
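
The split above is random, so the rows in each set change on every run. If reproducible results are wanted, `train_test_split` accepts a `random_state` seed. The sketch below uses placeholder arrays with 50 rows rather than the real dataset, and the seed value 0 is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with 50 rows (not the real dataset)
X_demo = np.arange(100).reshape(50, 2)
y_demo = np.arange(50)

# The same seed gives the same partition on every call
split_a = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)
split_b = train_test_split(X_demo, y_demo, test_size=0.2, random_state=0)

X_train_a, X_test_a, y_train_a, y_test_a = split_a
print(X_train_a.shape, X_test_a.shape)
print(all(np.array_equal(a, b) for a, b in zip(split_a, split_b)))
```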

## Train the Multiple Linear Regression model on the Training set


In [None]:
from sklearn.linear_model import LinearRegression

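# LinearRegression fits ordinary least squares on every column of X_train.
# Its solver tolerates the redundant dummy column, so the predictions are not
# affected and no column has to be dropped by hand here.
# It does NOT perform backward elimination or any other feature selection,
# and it does not compute p-values; that requires a separate step.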
regressor = LinearRegression()
regressor.fit(X=X_train, y=y_train)
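
A quick way to confirm that no feature is filtered out is to look at the fitted `coef_` attribute, which holds one coefficient per input column. The sketch below uses synthetic data as a stand-in for the encoded matrix; the names `X_demo`, `y_demo`, and `model` are illustrative, not part of the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 40

# Synthetic stand-in: 3 one-hot columns + 2 numeric columns
category = rng.integers(0, 3, n)
one_hot = np.eye(3)[category]
useful = rng.normal(size=n)
noise_only = rng.normal(size=n)  # unrelated to the target
X_demo = np.column_stack([one_hot, useful, noise_only])
y_demo = 5.0 + 2.0 * useful + rng.normal(scale=0.1, size=n)

model = LinearRegression().fit(X_demo, y_demo)

# One coefficient per input column, including the unrelated one
print(model.coef_.shape)
print(model.coef_, model.intercept_)
```

The unrelated column still receives a (small) coefficient; removing such features would need an explicit selection step.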


## Predicting the Test set results


In [None]:
y_pred = regressor.predict(X=X_test)
np.set_printoptions(precision=2)
# reshape into 2D arrays with a single column
y_pred_reshaped = y_pred.reshape(len(y_pred), 1)
y_test_reshaped = y_test.reshape(len(y_test), 1)
# axis=1 concatenates horizontally, giving a 2D array
# [[pred, test], [pred, test], ...] for side-by-side comparison
y_pred_and_test_concat = np.concatenate((y_pred_reshaped, y_test_reshaped), axis=1)
print(y_pred_and_test_concat)
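
Besides reading the pairs by eye, the comparison can be summarised with a couple of numbers. The sketch below computes the mean absolute error and R² by hand with NumPy; the `actual` and `predicted` arrays are made-up values standing in for `y_test` and `y_pred`.

```python
import numpy as np

# Illustrative values standing in for y_test and y_pred
actual = np.array([100.0, 150.0, 200.0, 250.0])
predicted = np.array([110.0, 140.0, 210.0, 240.0])

residuals = actual - predicted

# Mean absolute error: average size of the prediction error
mae = np.mean(np.abs(residuals))

# R^2: 1 minus the ratio of residual to total sum of squares
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((actual - actual.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mae, r2)
```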


## Visualising the results


In [None]:
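plt.plot(y_pred, color="blue", label="Estimations")
plt.plot(y_test, color="red", label="Actuals")
plt.title("Estimations vs Actuals")
# the x axis is the position of each sample in the test set
plt.xlabel("Test sample index")
plt.ylabel("Profit")
plt.legend()
plt.show()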
