### Linear regression

Linear regression analysis is used to predict the value of one variable based on the value of another variable.
The variable you want to predict is called the dependent variable. The variable you use to predict the
dependent variable's value is called the independent variable.

This form of analysis estimates the coefficients of a linear equation, involving one or more independent variables,
that best predict the value of the dependent variable. Linear regression fits a straight line (or, with several
independent variables, a plane) that minimizes the discrepancies between predicted and actual output values.
Simple linear regression calculators use the “least squares” method to find the best-fit line for a set of
paired data. You then estimate the value of Y (dependent variable) from X (independent variable).
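
As a minimal sketch of the least-squares idea, the slope and intercept of a best-fit line can be computed directly from paired data. The arrays `x` and `y` below are made-up illustration values, not data from this notebook.

```python
import numpy as np

# Hypothetical paired data: x is the independent variable, y the dependent one.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Closed-form least-squares estimates for the line y = b + w * x.
x_mean, y_mean = x.mean(), y.mean()
w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
b = y_mean - w * x_mean

# Predictions and residuals (actual minus predicted) for the fitted line.
y_pred = b + w * x
residuals = y - y_pred

print(f"slope w = {w:.3f}, intercept b = {b:.3f}")
print(f"sum of squared residuals = {np.sum(residuals ** 2):.3f}")
```

No other straight line gives a smaller sum of squared residuals on these points, which is what "best fit" means here.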

### Why linear regression is important

Linear-regression models are relatively simple and provide an easy-to-interpret mathematical formula that can generate 
predictions. Linear regression can be applied to various areas in business and academic study.

You’ll find linear regression used in everything from the biological, behavioral, environmental and social sciences
to business. Linear-regression models are a well-established way to make predictions from data. Because
linear regression is a long-established statistical procedure, the properties of linear-regression models are well understood,
and the models can be trained very quickly.

#### Supervised learning methods: 
The training data consists of past examples with labels, which are then used to build the model.


#### Regression:
The output variable to be predicted is continuous in nature, e.g. scores of a student, diamond prices, etc.

#### Classification: 
The output variable to be predicted is categorical in nature, e.g. classifying incoming emails as spam or ham,
or predicting Yes or No, True or False, 0 or 1.


#### Unsupervised learning methods:
The past data has no predefined labels assigned to it.

#### Simple Linear Regression
Linear regression is one of the simplest and most widely used statistical regression methods for predictive analysis in machine learning.
It models a linear relationship between the independent (predictor) variable, plotted on the X-axis, and the dependent
(output) variable, plotted on the Y-axis. With a single independent variable, it is called simple linear regression.

In [25]:
# Use the interactive notebook backend for plots
%matplotlib notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Local helper module (not part of scikit-learn) that provides load_crime_dataset
from adspy_shared_utilities import load_crime_dataset

In [26]:
# synthetic dataset for simple regression
from sklearn.datasets import make_regression
plt.figure()
plt.title('Sample regression problem with one input variable')
X_R1, y_R1 = make_regression(n_samples = 100, n_features=1,
                            n_informative=1, bias = 150.0,
                            noise = 30, random_state=0)
plt.scatter(X_R1, y_R1, marker= 'o', s=50)
plt.show()

# Communities and Crime dataset
(X_crime, y_crime) = load_crime_dataset()

In [27]:
# linear model for regression 
X_train, X_test, y_train, y_test = train_test_split(X_R1,y_R1,random_state=0)

linreg = LinearRegression().fit(X_train,y_train)

print("linear regression coef(w) :  {}".format(linreg.coef_))
print("linear regression intercept(b) :  {:.3f}".format(linreg.intercept_))
print("R-square score(test set) :  {:.3f}".format(linreg.score(X_test,y_test)))
print("R-square score(training set) :  {:.3f}".format(linreg.score(X_train,y_train)))

linear regression coef(w) :  [45.70870465]
linear regression intercept(b) :  148.446
R-square score(test set) :  0.492
R-square score(training set) :  0.679


In [28]:
### Linear regression example plot

plt.figure(figsize=(6,4))
plt.scatter(X_R1,y_R1,marker='o',s=40,alpha=0.8)
plt.plot(X_R1, linreg.coef_*X_R1 + linreg.intercept_,'r-')
plt.title('Least-squares linear regression')
plt.xlabel('Feature value (x)')
plt.ylabel('Target value (y)')
plt.show()

In [29]:
X_train, X_test, y_train, y_test = train_test_split(X_crime, y_crime, random_state=0)

linreg = LinearRegression().fit(X_train,y_train)

print('Crime dataset')
print('linear model intercept: {}'
     .format(linreg.intercept_))
print('linear model coeff:\n{}'
     .format(linreg.coef_))
print('R-squared score (training): {:.3f}'
     .format(linreg.score(X_train, y_train)))
print('R-squared score (test): {:.3f}'
     .format(linreg.score(X_test, y_test)))

Crime dataset
linear model intercept: -1728.1306725810568
linear model coeff:
[ 1.61892346e-03 -9.43009110e+01  1.36067510e+01 -3.13380670e+01
 -8.15482715e-02 -1.69455128e+01 -2.42730375e-03  1.53013232e+00
 -1.39193248e-02 -7.72112833e+00  2.28112354e+01 -5.65708295e+00
  9.34751364e+00  2.06969566e-01 -7.43413626e+00  9.65856476e-03
  4.38030290e-03  4.79754625e-03 -4.46469212e+00 -1.60907140e+01
  8.82778012e+00 -5.06734503e-01 -1.42198055e+00  8.17551991e+00
 -3.87048268e+00 -3.54209213e+00  4.48758304e+00  9.30645715e+00
  1.73644996e+02  1.18220766e+01  1.51120836e+02 -3.29613007e+02
 -1.35343395e+02  6.95380108e-01 -2.38369008e+01  2.77038981e+00
  3.82248925e-01  4.38813358e+00 -1.06410851e+01 -4.92294176e-03
  4.14031827e+01 -1.16206866e-03  1.18568968e+00  1.75418465e+00
 -3.68283678e+00  1.59679443e+00 -8.42180230e+00 -3.79703897e+01
  4.74076990e+01 -2.50768374e+01 -2.88246410e-01 -3.65633234e+01
  1.89516080e+01 -4.53336736e+01  6.82698598e+02  1.04478671e+02
 -3.28575414

##### cost function

What is a loss/cost function? In machine learning, the “loss” measures the difference between the predicted value and the actual value. The function that quantifies this loss as a single real number during the training phase is known as the “loss function”. Loss functions are used in supervised learning algorithms that rely on optimization techniques, such as linear regression and logistic regression. The terms cost function and loss function are often used interchangeably, although strictly speaking the loss is usually computed for a single example and the cost is the average loss over the whole training set.

##### most used regression cost functions
1. Mean Error (ME)
2. Mean Absolute Error (MAE)
3. Mean Squared Error (MSE)
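
The three regression cost functions (ME, MAE, MSE) can be sketched in a few lines of NumPy. The arrays are made-up illustration values, and the error is taken here as predicted minus actual, which is an assumed sign convention.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.0, 5.0, 4.0, 8.0])

# Assumed sign convention: error = predicted - actual.
errors = y_pred - y_true

me = errors.mean()            # Mean Error: positive and negative errors can cancel
mae = np.abs(errors).mean()   # Mean Absolute Error: average magnitude of the errors
mse = (errors ** 2).mean()    # Mean Squared Error: penalizes large errors more heavily

print(f"ME = {me:.3f}, MAE = {mae:.3f}, MSE = {mse:.3f}")
```

Because errors of opposite sign cancel in ME, it says more about systematic over- or under-prediction than about overall accuracy. MAE and MSE are the more common training objectives.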
##### cross entropy or log loss for classification
Cross entropy is a loss function that quantifies the difference between two probability distributions. In classification, these are the true label distribution and the class probabilities predicted by the model, so a confident wrong prediction is penalized heavily.
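
A minimal sketch of binary cross entropy (log loss) for 0/1 labels follows. The function name `binary_cross_entropy`, the clipping constant `eps`, and the sample probabilities are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Average log loss for 0/1 labels and predicted probabilities of class 1."""
    # Clip probabilities so that log(0) never occurs.
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Hypothetical labels and two sets of predicted probabilities.
y_true = np.array([1, 0, 1, 1])
confident = np.array([0.9, 0.1, 0.8, 0.95])   # close to the true labels
unsure = np.array([0.6, 0.4, 0.5, 0.55])      # correct side, but hesitant

loss_confident = binary_cross_entropy(y_true, confident)
loss_unsure = binary_cross_entropy(y_true, unsure)

print(f"confident predictions: {loss_confident:.3f}")
print(f"unsure predictions:    {loss_unsure:.3f}")
```

The loss falls as the predicted probabilities move toward the true labels, and it grows without bound as a prediction becomes confidently wrong.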

##### gradient descent
Gradient descent is an iterative optimization algorithm used to minimize the cost (loss) function of a machine learning model. At each step it moves the model parameters a small amount in the direction of the negative gradient of the cost. It is a popular choice for a variety of algorithms, including linear regression, logistic regression, and neural networks.
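
As a rough sketch, gradient descent for a one-variable linear model with an MSE cost might look like this. The synthetic data, the learning rate of 0.1, and the 2000 iterations are illustrative assumptions.

```python
import numpy as np

# Synthetic data (assumed for illustration): y is roughly 3 * x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.1, size=200)

# Start from arbitrary parameters and repeatedly step against the gradient.
w, b = 0.0, 0.0
learning_rate = 0.1  # assumed step size

for step in range(2000):
    y_pred = w * x + b               # current predictions of the linear model
    error = y_pred - y
    grad_w = 2 * np.mean(error * x)  # d(MSE)/dw
    grad_b = 2 * np.mean(error)      # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w = {w:.3f}, b = {b:.3f}")
```

For linear regression the same minimum can be found in closed form. Gradient descent becomes useful when no closed form exists or the dataset is too large to solve directly.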
##### hypothesis
In supervised machine learning, a hypothesis is a candidate function that maps inputs to outputs. Learning is the search for the hypothesis that best approximates the unknown target function mapping features to the target, which is why it is also called function approximation.
##### overfitting
When a model has low bias and high variance, it ends up memorizing the training data, which causes overfitting. An overfitted model is specific to its training data rather than general. This usually leads to high training accuracy and much lower test accuracy.
##### underfitting
When a model has high bias and low variance, it is too simple to capture the underlying patterns in the data, which causes underfitting. This usually leads to low accuracy on both the training data and the test data.
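
One hedged way to see both effects is to fit polynomials of increasing degree to the same noisy data and compare training and test error. The sine-shaped target, the noise level, the sample sizes, and the degrees 1, 3 and 9 are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples from an assumed non-linear target, sin(2*pi*x)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, size=n)
    return x, y

x_train, y_train = make_data(20)
x_test, y_test = make_data(200)

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree on the training set only.
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(y_train, model(x_train)), mse(y_test, model(x_test)))
    print(f"degree {degree}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

The degree-1 model is too simple for this target, so its error stays high on both sets (underfitting). Training error can only go down as the degree grows. When a high-degree model's test error is noticeably worse than its training error, that gap is the typical sign of overfitting.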
##### bias
In the simplest terms, bias is the difference between the model's average prediction and the true value it is trying to predict. It arises because the model makes simplifying assumptions when it trains on the data provided, and these assumptions may not hold when the model is introduced to the testing/validation data.
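##### variance
Variance is the amount by which the model's predictions would change if it were trained on a different training set. A high-variance model is sensitive to small fluctuations in the training data.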


##### most used metrics 
1. Coefficient of Determination or R-Squared (R2)
2. Root Mean Squared Error (RMSE) and Residual Standard Error (RSE)
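
These metrics can be computed by hand as a sanity check. The values below are made up, and the RSE line assumes a model with one predictor (`p = 1`), so the residual degrees of freedom are `n - 2`.

```python
import numpy as np

# Hypothetical actual and predicted values.
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 6.5])

n = len(y_true)
p = 1  # assumed number of predictors in the model

ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares

r2 = 1 - ss_res / ss_tot              # coefficient of determination
rmse = np.sqrt(ss_res / n)            # root mean squared error
rse = np.sqrt(ss_res / (n - p - 1))   # residual standard error

print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}, RSE = {rse:.3f}")
```

For scikit-learn regressors such as `LinearRegression`, the `score()` method returns this same R2 quantity.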

The fit() method learns (estimates) parameters from the training dataset, for example the coefficients of a model or the mean and standard deviation used by a scaler.


The transform() method applies the parameters learned by fit() to transform the data into a more suitable form for the model.


The fit_transform() method combines both steps, calling fit() and then transform() on the same data.
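
A short sketch with scikit-learn's `StandardScaler` shows how the three methods relate. The tiny arrays are made-up illustration values.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical single-feature training and test data.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
X_test = np.array([[2.5], [5.0]])

scaler = StandardScaler()
scaler.fit(X_train)                          # learn mean and scale from training data only
X_train_scaled = scaler.transform(X_train)   # apply the learned parameters
X_test_scaled = scaler.transform(X_test)     # reuse the same parameters on the test data

# fit_transform() does fit() followed by transform() in one call.
X_train_scaled_2 = StandardScaler().fit_transform(X_train)

print("learned mean:", scaler.mean_)
print("same result:", np.allclose(X_train_scaled, X_train_scaled_2))
```

Calling `fit()` (or `fit_transform()`) only on the training data and then `transform()` on the test data keeps information from the test set out of the preprocessing step.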

### What is correlation


Correlation is a statistical measure of the relationship between two variables. There are different types of correlation
coefficients, such as the Pearson coefficient (linear relationships) and the Spearman coefficient (monotonic, rank-based
relationships), which capture different kinds of statistical dependence but not necessarily causation. The Pearson
correlation coefficient is calculated as the covariance of the two variables divided by the product of
their standard deviations. The coefficients range in value from -1 (perfect inverse correlation) to 1 (perfect direct correlation),
with zero indicating no linear correlation.
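
To make the difference between the two coefficients concrete, here is a small NumPy sketch. The helper names `pearson` and `ranks` and the data are assumptions, and the simple ranking shown only works for data without ties.

```python
import numpy as np

def pearson(a, b):
    """Pearson coefficient: covariance divided by the product of standard deviations."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return np.sum(a_c * b_c) / np.sqrt(np.sum(a_c ** 2) * np.sum(b_c ** 2))

def ranks(a):
    """Rank positions of the values in a (assumes there are no ties)."""
    return np.argsort(np.argsort(a)).astype(float)

# Hypothetical data: y increases with x, but not linearly.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = x ** 3

pearson_r = pearson(x, y)                   # measures linear association
spearman_rho = pearson(ranks(x), ranks(y))  # Pearson applied to ranks gives Spearman

print(f"Pearson r = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Because `y` always increases with `x`, the Spearman coefficient is exactly 1, while the Pearson coefficient is below 1 because the relationship is not a straight line.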