<h2>Model Development</h2>

<h3>1- Linear Regression and Multiple Linear Regression</h3>

<h4>Overview of Linear Regression:</h4>

Linear Regression is a fundamental statistical method used for predicting or modeling the relationship between a dependent variable (response) and one or more independent variables (predictors). It assumes a linear relationship between the variables, meaning that the relationship can be approximated by a straight line.

In its simplest form, linear regression is known as "Simple Linear Regression," which involves only one independent variable. The equation of a simple linear regression can be represented as:

y = mx + b

Where:

1. y is the dependent variable (response) we want to predict.
2. x is the independent variable (predictor) that influences the dependent variable.
3. m is the slope of the regression line, representing the change in y for a one-unit change in x.
4. b is the intercept of the regression line, representing the value of y when x is zero.

The main goal in linear regression is to find the best-fitting line, that is, the line that minimizes the sum of squared differences between the actual observed values and the predicted values (the vertical distances between the data points and the regression line).
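As a minimal sketch of what "best-fitting" means, the slope and intercept of a simple linear regression can be computed directly from the standard least-squares formulas. The data below is made up for illustration (it lies exactly on y = 2x + 1) and is not taken from the automobile dataset.

```python
import numpy as np

# Toy data (made up for illustration): y is exactly 2*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Closed-form ordinary least squares estimates for one predictor:
#   m = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))**2)
#   b = mean(y) - m * mean(x)
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - m * x_mean

# Residuals are the vertical distances between observed and predicted values
residuals = y - (m * x + b)
sse = np.sum(residuals ** 2)  # the quantity the fitted line minimizes

print(m, b, sse)
```

Because the toy points lie exactly on a line, the residuals vanish; with real data the sum of squared residuals would be positive but as small as any straight line can make it.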

<h4>Overview of Multiple Linear Regression:</h4>

Multiple Linear Regression is an extension of simple linear regression, allowing for more than one independent variable to predict the dependent variable. It is used when the relationship between the dependent variable and the predictors is more complex and cannot be adequately represented by a single independent variable.

The equation of multiple linear regression can be represented as:

y = b0 + b1*x1 + b2*x2 + ... + bn*xn

Where:

1. y is the dependent variable (response) we want to predict.
2. x1, x2, ..., xn are the independent variables (predictors) that influence the dependent variable.
3. b0 is the intercept, representing the value of y when all predictors are zero.
4. b1, b2, ..., bn are the coefficients (slopes), each representing the change in y for a one-unit change in the corresponding predictor while the other predictors are held constant.

In multiple linear regression, the model estimates the coefficients (slopes) and the intercept that best fit the data to minimize the sum of squared differences between the observed and predicted values.
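The estimation described above can be sketched with NumPy's least-squares solver. The two predictors and the response below are made up for illustration (generated from y = 1 + 2*x1 + 3*x2), so the recovered coefficients can be checked by eye; this is one possible way to solve the problem, not necessarily what any particular library does internally.

```python
import numpy as np

# Toy data (made up): y = 1 + 2*x1 + 3*x2 exactly
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Prepend a column of ones so the first coefficient acts as the intercept b0
X_design = np.column_stack([np.ones(len(X)), X])

# lstsq returns the coefficients that minimize the sum of squared residuals
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
b0, b1, b2 = coeffs

print(b0, b1, b2)
```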

Multiple linear regression is widely used in various fields such as economics, social sciences, engineering, and machine learning, where there are multiple factors influencing the dependent variable, and understanding their combined effect is essential for making predictions or inferences.

<h3>Linear Regression</h3>

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('automobileEDA.csv', header=0)

In [3]:
df.head()

Unnamed: 0.1,Unnamed: 0,symboling,normalized-losses,make,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,length,...,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,price,fuel-type-diesel,fuel-type-gas,aspiration-std,aspiration-turbo
0,55,3,150.0,mazda,two,hatchback,rwd,front,95.3,169.0,...,9.4,135,6000,16,23,15645,False,True,True,False
1,66,0,93.0,mercedes-benz,two,hardtop,rwd,front,106.7,187.5,...,21.5,123,4350,22,25,28176,True,False,False,True
2,145,0,85.0,subaru,four,wagon,4wd,front,96.9,173.6,...,7.7,111,4800,23,23,11694,False,True,False,True
3,128,3,150.0,saab,two,hatchback,fwd,front,99.1,186.6,...,9.31,110,5250,21,28,11850,False,True,True,False
4,139,0,102.0,subaru,four,sedan,fwd,front,97.2,172.0,...,9.0,94,5200,26,32,9960,False,True,True,False



One example of a Data Model that we will be using is:

<h3>Simple Linear Regression</h3>

Simple Linear Regression is a method to help us understand the relationship between two variables:

1. The predictor/independent variable (X)
2. The response/dependent variable that we want to predict (Y)

The result of Linear Regression is a linear function that predicts the response (dependent) variable as a function of the predictor (independent) variable.

<h3>Linear Function</h3>

Yhat = a + b*X

- a refers to the intercept of the regression line, in other words: the value of Y when X is 0
- b refers to the slope of the regression line, in other words: the amount by which Y changes when X increases by 1 unit

Let's load the modules for linear regression:

In [4]:
from sklearn.linear_model import LinearRegression

In [5]:
lm = LinearRegression()
lm

<h4>How could "highway-mpg" help us predict car price?</h4>

For this example, we want to look at how highway-mpg can help us predict car price. Using simple linear regression, we will create a linear function with "highway-mpg" as the predictor variable and the "price" as the response variable.

In [6]:
X = df[['highway-mpg']]
Y = df[['price']]

In [7]:
lm.fit(X, Y)

In [9]:
Yhat = lm.predict(X)
Yhat[0:5]

array([[18271.09302326],
       [16593.57271781],
       [18271.09302326],
       [14077.29225963],
       [10722.25164873]])
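The nested brackets in the output above come from passing the target as a one-column DataFrame (`df[['price']]`) rather than a Series. The sketch below illustrates the difference on a small made-up frame; the column names mirror the notebook, but the values are invented for illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Small made-up frame; column names mirror the notebook, values are illustrative
toy = pd.DataFrame({
    "highway-mpg": [20, 25, 30, 35],
    "price": [20000, 16000, 12000, 8000],
})

X_toy = toy[["highway-mpg"]]  # predictors must be 2-D: (n_samples, n_features)

# 2-D target (one-column DataFrame), as in the notebook
model_2d = LinearRegression().fit(X_toy, toy[["price"]])
pred_2d = model_2d.predict(X_toy)      # shape (4, 1)

# 1-D target (Series)
model_1d = LinearRegression().fit(X_toy, toy["price"])
pred_1d = model_1d.predict(X_toy)      # shape (4,)

print(pred_2d.shape, model_2d.coef_.shape, model_2d.intercept_.shape)
print(pred_1d.shape, model_1d.coef_.shape)
```

Either form fits the same line; the choice only affects the shapes of `predict`, `coef_`, and `intercept_`.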

<h4>What is the value of the intercept (a)?</h4>

In [10]:
lm.intercept_

array([37562.57653593])

<h4>What is the value of the slope (b)?</h4>

In [11]:
lm.coef_

array([[-838.76015272]])
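As a quick check, the intercept and slope printed above can be plugged into Yhat = a + b*X by hand. The sketch below copies the two fitted values from the outputs above and the "highway-mpg" values of the first five rows from the `df.head()` output; the result should agree with `Yhat[0:5]` up to rounding of the printed digits.

```python
import numpy as np

# Values copied from the lm.intercept_ and lm.coef_ outputs above
a = 37562.57653593   # intercept
b = -838.76015272    # slope for "highway-mpg"

# highway-mpg of the first five rows, read off the df.head() output
highway_mpg = np.array([23, 25, 23, 28, 32])

# Yhat = a + b * X, evaluated by hand
manual_yhat = a + b * highway_mpg
print(manual_yhat)
```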

<h4>Question #1 a):</h4>

Create a linear regression object called "lm1".

In [26]:
lm1 = LinearRegression()

In [27]:
lm1

<h4>Question #1 b):</h4>

Train the model using "engine-size" as the independent variable and "price" as the dependent variable.

In [28]:
lm1.fit(df[['engine-size']], df[['price']])
lm1

<h4>Question #1 c):</h4>

Find the slope and intercept of the model.

In [29]:
lm1.coef_

array([[183.03594426]])

In [30]:
lm1.intercept_

array([-8324.5192125])

In [31]:
X = df[['engine-size']]

<h4>Question #1 d):</h4>

What is the equation of the predicted line? You can use x and yhat, or "engine-size" and "price".

In [32]:
lm1_pred = lm1.predict(X)
lm1_pred[0:5]  # slice the predictions, not the model object

In [33]:
# using X and Y, with the intercept and slope reported by lm1 above
# Yhat = intercept + slope * X
Yhat = -8324.52 + 183.04*X

Price = -8324.52 + 183.04*df['engine-size']
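To avoid retyping rounded numbers, the equation can also be assembled from the fitted values. The sketch below copies the numbers from the `lm1.intercept_` and `lm1.coef_` outputs above; `predict_price` is a hypothetical helper introduced here for illustration, and 130 is an arbitrary example engine size.

```python
# Values copied from the lm1.intercept_ and lm1.coef_ outputs above
intercept = -8324.5192125
slope = 183.03594426

def predict_price(engine_size):
    """Evaluate the fitted line for a given engine size (hypothetical helper)."""
    return intercept + slope * engine_size

# Human-readable form of the fitted line, rounded to two decimals
equation = f"price = {intercept:.2f} + {slope:.2f} * engine-size"
print(equation)

# Predicted price for an arbitrary example engine size
print(predict_price(130))
```

In the notebook itself, the same values could be read from `lm1.intercept_[0]` and `lm1.coef_[0][0]` instead of being typed in.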

In [21]:
from sklearn.metrics import r2_score

In [22]:
# lm1_pred is created by the lm1.predict(X) cell above, so that cell must run first
r2 = r2_score(Y, lm1_pred)
r2
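For intuition about what `r2_score` measures, the coefficient of determination for a single response can be written as 1 - SS_res / SS_tot. The sketch below implements that definition with NumPy on made-up numbers; `r_squared` is a hypothetical helper for illustration, not part of scikit-learn.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # variation left unexplained
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_obs = np.array([10.0, 12.0, 15.0, 19.0])           # made-up observations

perfect = r_squared(y_obs, y_obs)                        # predictions equal observations
baseline = r_squared(y_obs, np.full(4, y_obs.mean()))    # always predict the mean

print(perfect, baseline)
```

A model that reproduces every observation scores 1, while one that always predicts the mean scores 0, which gives a scale for reading the value returned for `lm1`.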