# Summary 

----


### Step by step approach

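```python
import numpy as np
from sklearn import metrics, preprocessing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumes `df` is a pandas DataFrame already loaded, with the target in a `charges` column.

# --- Label encoding ---
# label_encoder object knows how to understand word labels (maps each label to an integer).
label_encoder = preprocessing.LabelEncoder()
for col in df.select_dtypes(include="object").columns:
    df[col] = label_encoder.fit_transform(df[col])

# --- Train/test split (test_size and random_state are example values) ---
X = df.drop(columns="charges")
y = df["charges"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# --- Standardize --- normally applied to the independent variables (features).
# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# --- Model creation & prediction ---
lr = LinearRegression(fit_intercept=True)  # instantiate Linear Regression model
model = lr.fit(X_train, y_train)
y_pred = model.predict(X_test)

# --- Model evaluation ---
MAE = metrics.mean_absolute_error(y_test, y_pred)
MSE = metrics.mean_squared_error(y_test, y_pred)
RMSE = np.sqrt(MSE)
R_Sq = r2_score(y_test, y_pred)
```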
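The walkthrough above needs a real `df`. Below is a small self-contained check of two of its steps, using made-up toy values (the labels and numbers are illustrative, not from the original dataset). It shows that `LabelEncoder` maps text labels to integers in sorted order, and it recomputes the four evaluation metrics by hand from their definitions.

```python
import numpy as np
from sklearn import metrics, preprocessing

# --- Label encoding on hypothetical yes/no labels ---
label_encoder = preprocessing.LabelEncoder()
codes = label_encoder.fit_transform(["yes", "no", "no", "yes"])
print(label_encoder.classes_)  # labels are sorted: ['no' 'yes']
print(codes)                   # [1 0 0 1]

# --- Metrics on toy values, by hand and with scikit-learn ---
y_true = np.array([3.0, 5.0, 2.0, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])
err = y_true - y_pred

mae = np.mean(np.abs(err))    # mean absolute error
mse = np.mean(err ** 2)       # mean squared error
rmse = np.sqrt(mse)           # root mean squared error
r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)

assert np.isclose(mae, metrics.mean_absolute_error(y_true, y_pred))
assert np.isclose(mse, metrics.mean_squared_error(y_true, y_pred))
assert np.isclose(r2, metrics.r2_score(y_true, y_pred))
print(mae, mse, rmse, r2)  # 0.625 0.5625 0.75 0.847...
```

The scikit-learn documentation intends `LabelEncoder` for target labels. For input feature columns, `OrdinalEncoder` or `OneHotEncoder` is the usual choice.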

```python
LinearRegression(
    *,
    fit_intercept=True,
    normalize=False,  # deprecated in scikit-learn 1.0, removed in 1.2
    copy_X=True,
    n_jobs=None,
    positive=False,
)
```

* One of the parameters set when instantiating a linear regression model is __fit_intercept__, which controls whether the model is fitted with a y-intercept.
* Fitting with and without a `y intercept` generally produces __different slope coefficients__.


Equation | Constructor call
--- | ---
y = mx + c | `LinearRegression()` (default, `fit_intercept=True`)
y = mx + c | `LinearRegression(fit_intercept=True)`
y = mx | `LinearRegression(fit_intercept=False)`
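Here is a minimal sketch of the table above, using made-up data that lie exactly on y = 2x + 5. With an intercept, the model recovers slope 2 and intercept 5. Without one, the line is forced through the origin, so the slope changes to absorb the offset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2.0 * X.ravel() + 5.0  # synthetic data: y = 2x + 5

with_c = LinearRegression(fit_intercept=True).fit(X, y)   # y = mx + c
no_c = LinearRegression(fit_intercept=False).fit(X, y)    # y = mx

print(with_c.coef_[0], with_c.intercept_)  # slope ~2.0, intercept ~5.0
print(no_c.coef_[0], no_c.intercept_)      # slope ~3.667, intercept 0.0
```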


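* __normalize=True__ rescales each feature before fitting by subtracting its mean (so the feature has mean 0) and dividing by its l2-norm. It is ignored when `fit_intercept=False`. The parameter was deprecated in scikit-learn 1.0 and removed in 1.2. The recommended replacement is a `StandardScaler` applied before the model.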
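Since `normalize` no longer exists in current scikit-learn releases, here is a sketch of the usual replacement: a `StandardScaler` placed in front of the model in a pipeline. This is not numerically identical to the old `normalize=True`, which divided by the l2-norm rather than the standard deviation. For plain least squares with an intercept, though, rescaling the features changes the coefficients but not the predictions. The data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=10.0, scale=3.0, size=(50, 2))  # synthetic features, mean ~10
y = X @ np.array([1.5, -2.0]) + 4.0

# Each feature is rescaled to mean 0 and standard deviation 1 before fitting.
scaled_model = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)
plain_model = LinearRegression().fit(X, y)

X_scaled = scaled_model.named_steps["standardscaler"].transform(X)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))  # ~[0 0] and ~[1 1]

# Scaling changes the coefficients but not the OLS predictions.
print(scaled_model.named_steps["linearregression"].coef_, plain_model.coef_)
print(np.allclose(scaled_model.predict(X), plain_model.predict(X)))  # True
```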

* __n_jobs__ sets the number of jobs (processors) used for the computation. From the scikit-learn docstring:

    The number of jobs to use for the computation. This will only provide
    speedup for n_targets > 1 and sufficiently large problems.
    ``None`` means 1 unless in a ``joblib.parallel_backend`` context.
    ``-1`` means using all processors. See the scikit-learn Glossary
    entry for ``n_jobs`` for more details.

* Implementation note from the scikit-learn `LinearRegression` docstring:

    From the implementation point of view, this is just plain Ordinary
    Least Squares (`scipy.linalg.lstsq`) or Non-Negative Least Squares
    (`scipy.optimize.nnls`) wrapped as a predictor object.
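To check the implementation note, this sketch solves the same least-squares problem directly with `scipy.linalg.lstsq`, adding a column of ones so the intercept is estimated too. It then compares the result with `LinearRegression` on synthetic data.

```python
import numpy as np
from scipy.linalg import lstsq
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = X @ np.array([0.5, -1.0, 2.0]) + 3.0 + rng.normal(scale=0.1, size=30)

# Append a column of ones so lstsq estimates the intercept as the last coefficient.
X1 = np.column_stack([X, np.ones(len(X))])
coef, *_ = lstsq(X1, y)

lr = LinearRegression().fit(X, y)
print(coef[:-1], coef[-1])     # slopes and intercept from plain lstsq
print(lr.coef_, lr.intercept_) # the same values from LinearRegression
```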

* __positive__ (bool, default `False`)

    When set to `True`, forces the coefficients to be positive (non-negative; the fit then uses `scipy.optimize.nnls`). This option is only supported for dense arrays.
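Here is a minimal sketch of `positive=True` on synthetic data where one true coefficient is negative. Ordinary least squares returns a negative coefficient, while the constrained fit keeps every coefficient non-negative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
nn = LinearRegression(positive=True).fit(X, y)
print(ols.coef_)  # second coefficient is negative (about -1)
print(nn.coef_)   # every coefficient is >= 0
```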



* A dense array stores every element in memory, in the order the elements are defined, even when an element's value is undefined (or zero). A sparse array (for example a `scipy.sparse` matrix) stores only the non-empty entries and their positions.

    Dense is in opposition to sparse, and generally is used when talking about storage. For example, this array is dense:

        a = [undefined, undefined, 2]

    It can be stored in memory exactly like that: a sequence of three locations, the first two being undefined, the third being 2.
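In scikit-learn terms, "dense" usually means a NumPy array and "sparse" a `scipy.sparse` matrix. This small sketch shows the storage difference, then converts the sparse data with `.toarray()` before fitting with `positive=True`. The matrix and targets are made up.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LinearRegression

dense = np.array([[0.0, 0.0, 2.0],
                  [0.0, 1.0, 0.0],
                  [3.0, 0.0, 0.0],
                  [0.0, 0.0, 4.0]])
sp = sparse.csr_matrix(dense)
print(dense.size)  # 12 values stored, zeros included
print(sp.nnz)      # 4 non-zero values stored

y = np.array([1.0, 2.0, 3.0, 4.0])
# positive=True needs dense input, so convert the sparse matrix first.
model = LinearRegression(positive=True).fit(sp.toarray(), y)
print(model.coef_)
```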

----


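For hyper-parameter tuning, `fit_intercept` is the main model parameter that influences model accuracy. `positive` also changes the fit, by constraining the coefficients, while `copy_X` and `n_jobs` only affect memory use and speed.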
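The sketch below tunes these parameters with `GridSearchCV` on synthetic data generated with a large intercept (10), so the search should prefer `fit_intercept=True`. The grid is simply both settings of each boolean parameter.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(3)
X = rng.normal(size=(60, 2))
y = X @ np.array([2.0, 1.0]) + 10.0 + rng.normal(scale=0.5, size=60)

search = GridSearchCV(
    LinearRegression(),
    param_grid={"fit_intercept": [True, False], "positive": [False, True]},
    scoring="r2",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)  # fit_intercept should be True here
```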

# Bias and Variance

***Very important***

* For a good model, the distribution plot of the prediction errors (residuals) should be approximately normal.

Case | Error distribution | High kurtosis (kurtosis > 3, i.e. excess kurtosis > 0) | Insight
--- | --- | --- | ---
1 | Normally distributed | No | Best model, with small prediction errors
2 | Skewed | No | Model is biased
3 | Skewed | Yes | Model is biased and has high variance
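To apply the table, this sketch fits a model on synthetic data with normally distributed noise and computes the skewness and kurtosis of the residuals with `scipy.stats`. Note that `scipy.stats.kurtosis` returns excess kurtosis by default (`fisher=True`, where a normal distribution gives 0). Pass `fisher=False` to get the raw value, where a normal distribution gives 3.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
X = rng.normal(size=(500, 1))
y = 3.0 * X.ravel() + 1.0 + rng.normal(size=500)  # normal noise

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)

skewness = stats.skew(residuals)
excess_kurt = stats.kurtosis(residuals)                  # normal -> about 0
pearson_kurt = stats.kurtosis(residuals, fisher=False)   # normal -> about 3
print(skewness, excess_kurt, pearson_kurt)
```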

<img src="https://learnopencv.com/wp-content/uploads/2017/02/Bias-Variance-Tradeoff-In-Machine-Learning-1.png" />

<img src="https://community.alteryx.com/t5/image/serverpage/image-id/52874iE986B6E19F3248CF?v=v2" />

# Kurtosis

Kurtosis is a statistical measure that defines how heavily the tails of a distribution differ from the tails of a normal distribution. In other words, kurtosis identifies whether the tails of a given distribution contain extreme values.
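For reference, here is the standard moment-based definition for a random variable X with mean μ and standard deviation σ:

$$
\operatorname{Kurt}[X] = \frac{\mathbb{E}\left[(X-\mu)^4\right]}{\sigma^4},
\qquad
\text{excess kurtosis} = \operatorname{Kurt}[X] - 3
$$

A normal distribution has kurtosis 3, so its excess kurtosis is 0. The three categories below are distinguished by the sign of the excess kurtosis.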

<img src="https://cdn.corporatefinanceinstitute.com/assets/kurtosis.png" />

reference - https://corporatefinanceinstitute.com/resources/knowledge/other/kurtosis/



## 1. Mesokurtic

Data that follows a mesokurtic distribution shows an excess kurtosis of zero or close to zero. This means that if the data follows a normal distribution, it follows a mesokurtic distribution.
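A quick check with `scipy.stats`: the theoretical excess kurtosis of the normal distribution is exactly 0, and a large synthetic normal sample gives a value close to 0.

```python
import numpy as np
from scipy import stats

# Theoretical excess kurtosis ("k" = Fisher's kurtosis) of the standard normal.
print(stats.norm.stats(moments="k"))  # 0.0

rng = np.random.default_rng(5)
sample = rng.normal(size=100_000)
print(stats.kurtosis(sample))  # close to 0 (mesokurtic)
```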

<img src="https://cdn.corporatefinanceinstitute.com/assets/kurtosis1.png" />


## 2. Leptokurtic

Leptokurtic indicates a positive excess kurtosis. The leptokurtic distribution shows heavy tails on either side, indicating large outliers. In finance, a leptokurtic distribution shows that the investment returns may be prone to extreme values on either side. Therefore, an investment whose returns follow a leptokurtic distribution is considered to be risky.
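The sketch below uses the Laplace distribution as an example of a leptokurtic distribution (excess kurtosis 3). It compares how often synthetic Laplace and normal samples, both scaled to variance 1, land more than 4 standard deviations from the mean.

```python
import numpy as np
from scipy import stats

# Laplace: a classic leptokurtic (heavy-tailed) distribution.
print(stats.laplace.stats(moments="k"))  # 3.0

rng = np.random.default_rng(6)
n = 200_000
normal = rng.normal(size=n)
laplace = rng.laplace(scale=1 / np.sqrt(2), size=n)  # scale chosen so variance is 1

# Share of draws more than 4 standard deviations from the mean.
tail_normal = np.mean(np.abs(normal) > 4)
tail_laplace = np.mean(np.abs(laplace) > 4)
print(tail_normal, tail_laplace)  # the Laplace tail share is far larger
print(stats.kurtosis(laplace))    # close to 3
```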

<img src="https://cdn.corporatefinanceinstitute.com/assets/kurtosis3.png" />

## 3. Platykurtic

A platykurtic distribution shows a negative excess kurtosis. It has thin (light) tails, which means outliers are fewer and less extreme than in a normal distribution. In the finance context, a platykurtic distribution of investment returns is desirable for investors, because there is a small probability that the investment would experience extreme returns.
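The sketch below uses the uniform distribution as an example of a platykurtic distribution (excess kurtosis -1.2). A synthetic uniform sample on [-1, 1] never strays more than about 1.73 standard deviations from its mean.

```python
import numpy as np
from scipy import stats

# Uniform: a classic platykurtic (light-tailed) distribution.
print(stats.uniform.stats(moments="k"))  # -1.2

rng = np.random.default_rng(7)
sample = rng.uniform(-1.0, 1.0, size=100_000)
z = (sample - sample.mean()) / sample.std()
print(stats.kurtosis(sample))  # close to -1.2
print(np.abs(z).max())         # about 1.73: no extreme values at all
```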

<img src="https://cdn.corporatefinanceinstitute.com/assets/kurtosis2.png" />