Regression is a statistical method for modeling the relationship between one or more independent variables (also known as predictors, features, or inputs) and a dependent variable (also known as the outcome, target, or response). The main goal of regression analysis is to predict the value of the dependent variable, usually a continuous quantity, from the values of the independent variables.

Regression models are used to forecast future trends, quantify how strongly variables are related, and predict outcomes for new observations. Regression analysis is widely used in fields such as economics, finance, psychology, engineering, and the social sciences.

There are several types of regression models, including:

- Simple linear regression:
    This is the simplest form of regression, where one independent variable is used to predict a dependent variable. It assumes that there is a linear relationship between the two variables.

- Multiple linear regression:
    This type of regression uses two or more independent variables to predict a dependent variable. It is useful when several factors affect the outcome.

- Polynomial regression:
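    This type of regression fits a polynomial equation to the data. It is useful when the relationship between the variables is curved rather than a straight line. The model is still linear in its coefficients, so it can be fitted with the same techniques as linear regression.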

- Logistic regression:
    Despite its name, this model is used for classification: the dependent variable is categorical (most often binary), and the model predicts the probability of an event occurring.

- Ridge regression:
    This type of regression is used when the independent variables are highly correlated with one another (multicollinearity). It adds an L2 penalty on the size of the coefficients to the least-squares objective, which shrinks the coefficients and makes the estimates more stable.
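
For reference, the model types listed above can be written in a common notation. This is a sketch under assumed symbols that do not appear in the text: $\beta_j$ for the coefficients, $\varepsilon$ for the error term, $p$ for the number of independent variables, $d$ for the polynomial degree, $m$ for the number of observations, and $\lambda \ge 0$ for the ridge penalty strength.

$$
\begin{aligned}
\text{Simple linear:} \quad & y = \beta_0 + \beta_1 x + \varepsilon \\
\text{Multiple linear:} \quad & y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon \\
\text{Polynomial:} \quad & y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_d x^d + \varepsilon \\
\text{Logistic:} \quad & P(y = 1 \mid x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \dots + \beta_p x_p)}} \\
\text{Ridge:} \quad & \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{m} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2
\end{aligned}
$$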

Regression analysis involves several steps: data collection, data cleaning, variable selection, model building, model validation, and interpretation of results. How well a regression model fits can be measured with metrics such as R-squared, Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE). Higher is better for R-squared, while lower is better for the three error metrics.
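
The metrics named above can be computed directly from the true and predicted values. The following is a minimal sketch using NumPy; the function name `regression_metrics` and the sample numbers are made up for illustration.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R-squared, MSE, RMSE, and MAE for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    mse = np.mean(residuals ** 2)        # average squared error
    rmse = np.sqrt(mse)                  # same units as the target
    mae = np.mean(np.abs(residuals))     # average absolute error

    # R-squared: share of the target's variance explained by the model
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    return {"r2": r2, "mse": mse, "rmse": rmse, "mae": mae}

# Hypothetical values, chosen only to exercise the function
metrics = regression_metrics([3.0, 5.0, 7.0, 9.0], [2.5, 5.0, 7.5, 9.0])
print(metrics)
```

RMSE and MAE are expressed in the units of the dependent variable, which usually makes them easier to interpret than MSE.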

In summary, regression is a statistical method used to establish a relationship between independent variables and a dependent variable to make predictions and understand the relationships between variables.

# **Linear Regression**

## *Scratch*

In [2]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

In [None]:
class LinearRegression:
    """Linear regression trained with batch gradient descent."""

    def __init__(self, learning_rate, iterations):
        self.learning_rate = learning_rate
        self.iterations = iterations

    def fit(self, X, Y):
        """Train the model on features X and targets Y."""
        # m = number of training examples, n = number of features
        self.m, self.n = X.shape
        # Initialize weights and bias to zero
        self.W = np.zeros(self.n)
        self.b = 0
        self.X = X
        self.Y = Y

        # Gradient descent: one full-batch update per iteration
        for _ in range(self.iterations):
            self.update_weights()

        return self

    def update_weights(self):
        """Perform one gradient descent step on the mean squared error."""
        Y_pred = self.predict(self.X)

        # Gradients of the mean squared error with respect to W and b
        dW = -(2 * self.X.T.dot(self.Y - Y_pred)) / self.m
        db = -2 * np.sum(self.Y - Y_pred) / self.m

        # Move the parameters against the gradient
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db

        return self

    def predict(self, X):
        """Hypothesis function h(x) = X.dot(W) + b."""
        return X.dot(self.W) + self.b
     
  
# Driver code

def main():
    # Load the dataset: features in all but the last column, target in the last
    df = pd.read_csv("salary_data.csv")
    X = df.iloc[:, :-1].values
    Y = df.iloc[:, -1].values

    # Split the dataset into training and test sets
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=1/3, random_state=0)

    # Train the model
    model = LinearRegression(iterations=1000, learning_rate=0.01)
    model.fit(X_train, Y_train)

    # Predict on the test set
    Y_pred = model.predict(X_test)

    print("Predicted values ", np.round(Y_pred[:3], 2))
    print("Real values      ", Y_test[:3])
    print("Trained W        ", round(model.W[0], 2))
    print("Trained b        ", round(model.b, 2))

    # Visualize the fit on the test set
    plt.scatter(X_test, Y_test, color='blue')
    plt.plot(X_test, Y_pred, color='orange')
    plt.title('Salary vs Experience')
    plt.xlabel('Years of Experience')
    plt.ylabel('Salary')
    plt.show()


# Run the driver (the original defined main() but never called it)
if __name__ == "__main__":
    main()
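
The gradient expressions in `update_weights` follow from the mean squared error cost. The derivation below is a sketch; the symbols $J$ (cost) and $\alpha$ (standing for `learning_rate`) are notation assumed here, while $W$, $b$, $X$, $Y$, and $m$ match the names in the code.

$$
\begin{aligned}
\hat{y}_i &= x_i^\top W + b \\
J(W, b) &= \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 \\
\frac{\partial J}{\partial W} &= -\frac{2}{m} X^\top (Y - \hat{Y}) \\
\frac{\partial J}{\partial b} &= -\frac{2}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i) \\
W &\leftarrow W - \alpha \frac{\partial J}{\partial W}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}
\end{aligned}
$$

The two partial derivatives correspond to `dW` and `db`, and the last line is the update applied once per iteration.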

## *Package*

In [None]:
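import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Load and split the data here, since the variables inside main() are local to it
df = pd.read_csv("salary_data.csv")
X, y = df.iloc[:, :-1].values, df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

regr = LinearRegression()
regr.fit(X_train, y_train)
print(regr.score(X_test, y_test))  # R-squared on the test set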


# **Multiple Linear Regression**
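
As a starting point for this section, here is a minimal sketch of multiple linear regression solved by ordinary least squares with NumPy. The data is synthetic and noise-free, generated from coefficients chosen only for illustration, so the fit should recover them.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 1 + 2*x1 - 3*x2
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]

# Prepend a column of ones so the intercept is estimated with the weights
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares solution of X_design @ coef = y
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, weights = coef[0], coef[1:]

print("intercept:", round(intercept, 3))
print("weights:  ", np.round(weights, 3))
```

Unlike the gradient descent class in the scratch section, this solves for the coefficients in a single step, which is practical when the number of features is small.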

# **Polynomial Regression**
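
As a starting point for this section, here is a minimal sketch of polynomial regression using `np.polyfit`. The data is synthetic and noise-free, generated from a quadratic chosen only for illustration.

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 2*x^2 - x + 0.5
x = np.linspace(-2.0, 2.0, 21)
y = 2.0 * x**2 - x + 0.5

# Fit a degree-2 polynomial; coefficients are returned highest degree first
coeffs = np.polyfit(x, y, deg=2)

# Evaluate the fitted polynomial at a new point
y_new = np.polyval(coeffs, 3.0)

print("coefficients:", np.round(coeffs, 3))
print("prediction at x = 3:", round(float(y_new), 3))
```

The degree is a modeling choice: too low a degree underfits a curved relationship, while too high a degree tends to fit noise.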

# **Ridge Regression**
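
As a starting point for this section, here is a minimal sketch of ridge regression using its closed-form solution in NumPy. The helper name `ridge_fit`, the penalty strength `lam`, and the synthetic data with two nearly identical features are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression; the intercept is not penalized."""
    # Center the data so the intercept can be recovered separately
    X_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - X_mean, y - y_mean
    n_features = X.shape[1]

    # Solve (Xc^T Xc + lam * I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    b = y_mean - X_mean @ w
    return w, b

# Two highly correlated features (multicollinearity)
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)  # nearly a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=100)

w_ols, b_ols = ridge_fit(X, y, lam=0.0)      # lam = 0 is ordinary least squares
w_ridge, b_ridge = ridge_fit(X, y, lam=1.0)  # lam > 0 shrinks the weights

print("OLS weights:  ", np.round(w_ols, 3))
print("Ridge weights:", np.round(w_ridge, 3))
```

With correlated features, the unpenalized weights can become large and unstable, and the penalty pulls them toward smaller values. In practice `lam` is usually chosen by validation.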