# **Introduction to Regression:**

Regression is a statistical technique used for predicting continuous values. In this context, we explore its application in estimating CO2 emissions from cars based on various features.

![Alt text](image-6.png)

### **Variables in Regression:**

- **Dependent Variable (Y):** Represents the target we want to predict, such as CO2 emissions.
  
- **Independent Variables (X):** Factors influencing the dependent variable, like engine size, cylinders, and fuel consumption.

**Note:** In regression, the dependent variable must be continuous; it cannot be a discrete value. The independent variable or variables, however, can be measured on either a categorical or a continuous scale.


### **Types of Regression:**

- **Simple Regression:** Involves one independent variable to estimate a dependent variable, e.g., predicting CO2 emission using engine size.

- **Multiple Regression:** Utilizes more than one independent variable, e.g., predicting CO2 emission based on both engine size and the number of cylinders.

![Alt text](image-7.png)

### **Regression Process:**

1. **Data Collection:** Gather historical data with labeled CO2 emissions for various cars.

2. **Model Building:** Employ regression to create an estimation model based on the collected data.

3. **Prediction:** Apply the trained model to predict CO2 emissions for new, unseen cars.

### **Applications of Regression:**

- **Sales Forecasting:** Predict a salesperson's sales from variables such as age, education, and experience.

- **Psychology Studies:** Determine individual satisfaction based on demographic and psychological factors.

- **Real Estate:** Predict house prices by considering factors like size and number of bedrooms.

- **Employment Income:** Predict income by analyzing variables such as hours worked, education, and experience.

### **Types of Regression Models:**

- **Linear vs. Non-linear:**
  - **Linear Regression:** Assumes a straight-line relationship between variables.
  - **Non-linear Regression:** Deals with more complex or curved relationships.

![Alt text](image-8.png)

## Simple Linear Regression

#### **Understanding the Data**

Consider a dataset related to CO2 emissions from cars, including engine size, cylinders, fuel consumption, and CO2 emissions. Can we predict CO2 emissions using variables like engine size? Yes, with linear regression.

![Alt text](image-10.png)

#### **Linear Regression Basics**

Linear regression fits a linear model that approximates the relationship between two or more variables. In simple linear regression, a dependent variable (e.g., CO2 emissions) is predicted from a single independent variable (e.g., engine size).



#### **Types of Linear Regression Models**

There are two types:

1. **Simple Linear Regression:** Uses one independent variable to estimate a dependent variable.
2. **Multiple Linear Regression:** Involves more than one independent variable.
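
The difference between the two types can be sketched as two small prediction functions. The function names and the coefficient values here are hypothetical, chosen only to show the shape of each model; `theta0` stands for the intercept and the remaining `theta` values for the per-variable coefficients.

```python
def predict_simple(theta0: float, theta1: float, x1: float) -> float:
    """Simple linear regression: one independent variable."""
    return theta0 + theta1 * x1


def predict_multiple(theta0: float, thetas: list[float], xs: list[float]) -> float:
    """Multiple linear regression: one coefficient per independent variable."""
    if len(thetas) != len(xs):
        raise ValueError("need exactly one coefficient per feature")
    return theta0 + sum(t * x for t, x in zip(thetas, xs))


# Hypothetical coefficients, not fitted to any real data.
simple = predict_simple(100.0, 40.0, 2.0)                    # engine size only
multiple = predict_multiple(100.0, [30.0, 8.0], [2.0, 4.0])  # engine size, cylinders
print(simple, multiple)
```

The simple model is just the special case of the multiple model with a single feature.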

![Alt text](image-9.png)

#### **Working of Linear Regression**

Visualize a scatter plot of engine size (independent) against CO2 emissions (dependent). Linear regression fits a line through the data, modeling the relationship.

![Alt text](image-11.png)

Assume for a moment that the line is a good fit for the data. We can then use it to predict the emission of a car that is not in the dataset. For example, for a car with engine size 2.4, reading off the line gives an emission of about 214.


#### **Linear Regression Equation**

The equation has the form:

$\hat{y} = \theta_0 + \theta_1 x_1$

Here $\hat{y}$ is the predicted value of the dependent variable, $x_1$ is the independent variable, $\theta_0$ is the intercept, and $\theta_1$ is the slope.

$\theta_0$ and $\theta_1$ are the coefficients of the line, that is, the parameters we adjust to fit the data.

![Alt text](image-12.png)

#### **Parameter Estimation**

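The objective is to find the $\theta_0$ (intercept) and $\theta_1$ (slope) that minimize the Mean Squared Error (MSE), the average squared difference between the actual and predicted values. For simple linear regression, both parameters can be calculated directly from the data using the averages of x and y.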

![Alt text](image-13.png)

Here, $\bar{x}$ and $\bar{y}$ are averages of x and y.
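
Written out, the standard least-squares expressions for a single independent variable are given below, followed by a minimal Python sketch of the same calculation. The index $i$, the sample size $n$, the function names, and the data values are assumptions introduced for illustration; the sample is invented and is not the dataset shown in the figures.

$$
\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
$$

$$
\theta_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2},
\qquad
\theta_0 = \bar{y} - \theta_1\,\bar{x}
$$

```python
def fit_simple_linear(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (theta0, theta1) that minimize the MSE of theta0 + theta1 * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    theta1 = s_xy / s_xx
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1


def mean_squared_error(xs, ys, theta0: float, theta1: float) -> float:
    """Average squared difference between actual and predicted values."""
    return sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical sample, invented for illustration only.
engine_sizes = [1.6, 2.0, 2.5, 3.0, 3.6]
co2 = [185.0, 200.0, 225.0, 240.0, 265.0]

theta0, theta1 = fit_simple_linear(engine_sizes, co2)
fitted_mse = mean_squared_error(engine_sizes, co2, theta0, theta1)
# A line with a different slope fits the same data worse.
shifted_mse = mean_squared_error(engine_sizes, co2, theta0, theta1 + 5.0)
print(f"theta0={theta0:.2f}, theta1={theta1:.2f}")
print(f"MSE fitted={fitted_mse:.2f}, MSE shifted={shifted_mse:.2f}")
```

Comparing the two MSE values shows what "minimize" means here: any change to the fitted slope or intercept increases the error on the data used for fitting.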

![Alt text](image-14.png)

So, the equation of the fitted line is:

**$\hat{y} = 125.74 + 39 x_1$**


#### **Predictions with Linear Regression**

Once $\theta_0$ and $\theta_1$ are known, the line is fully determined. To predict the CO2 emissions of a new car, substitute its engine size into the line equation.
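
As a small worked example, the fitted equation from the previous section can be evaluated directly. The constant and function names below are hypothetical, and the engine sizes are arbitrary sample inputs; only the two parameter values come from the text.

```python
THETA_0 = 125.74  # intercept from the fitted equation above
THETA_1 = 39.0    # slope from the fitted equation above


def predict_co2(engine_size: float) -> float:
    """Evaluate y_hat = theta_0 + theta_1 * x_1 for one car."""
    return THETA_0 + THETA_1 * engine_size


for size in (2.0, 2.4, 3.5):
    print(f"engine size {size}: predicted CO2 = {predict_co2(size):.2f}")
```

For an engine size of 2.4 this evaluates to about 219.34. A value read off a plot by eye will only approximate the result of the equation.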

#### **Utility of Linear Regression**

Linear regression is one of the most basic regression techniques. It is fast, requires no parameter tuning, and is easy to understand and interpret, which makes it a common first choice for predicting continuous values.

![Alt text](image-15.png)