# Introduction to supervised learning and linear regression
## Basic Knowledge
In order to be successful in this course, you will need a working knowledge of the following:

- Familiarity with programming in a Python development environment

- Familiarity with Jupyter notebooks

- Fundamental understanding of Calculus, Linear Algebra, Probability, and Statistics

- Familiarity with Exploratory Data Analysis, Feature Engineering, handling missing values, and handling categorical values


## Types of Machine Learning

A model is a small thing that captures a larger thing.

A good model is going to omit unimportant details while retaining what's important. 

A map is a model of the world.


## Fit parameters and hyperparameters

Our framework estimates a relationship between the features and target:

$$
y_p = f(\Omega, x)
$$

Here, $\Omega$ (the fit parameters) represents the aspects of the model that we estimate (fit) using the data.

To implement our approach, we make decisions regarding how to produce these estimates.

These decisions lead to hyperparameters, which are an important part of the machine learning
workflow (though not explicit components of the model).
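The difference between the two kinds of parameters can be sketched in plain Python. In this minimal example the function `fit_line`, its `fit_intercept` flag, and the toy data are all hypothetical illustrations, not part of the course material: the flag plays the role of a hyperparameter (chosen by us before fitting), while the returned intercept and slope are fit parameters (estimated from the data).

```python
def fit_line(xs, ys, fit_intercept=True):
    """Least-squares fit of y ~ beta0 + beta1 * x.

    `fit_intercept` is a hyperparameter: we choose it before fitting.
    The returned (beta0, beta1) are fit parameters: estimated from the data.
    """
    n = len(xs)
    if fit_intercept:
        x_mean = sum(xs) / n
        y_mean = sum(ys) / n
        sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
        sxx = sum((x - x_mean) ** 2 for x in xs)
        beta1 = sxy / sxx
        beta0 = y_mean - beta1 * x_mean
    else:
        # Line forced through the origin: only the slope is estimated.
        beta1 = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        beta0 = 0.0
    return beta0, beta1


# Hypothetical toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

params_with_intercept = fit_line(xs, ys, fit_intercept=True)
params_through_origin = fit_line(xs, ys, fit_intercept=False)
print(params_with_intercept)
print(params_through_origin)
```

The same data produces different fit parameters depending on the hyperparameter choice, which is why such choices are treated as part of the workflow.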

Two main modeling approaches:
- Regression: y is numeric.

  - E.g.: stock price, box office revenue, location (x,y coordinates).

- Classification: y is categorical.

  - E.g.: face recognition, customer churn, which word comes next.

$$
y_p = f(\Omega, x)
$$


$x$: Input.

$y_p$: Output (values predicted by the model).

$f(\cdot)$: Prediction function that generates predictions from $x$ and $\Omega$.



Data scientists will train the model to find the best parameters by looking at past data.


$J(y, y_p)$: Loss.

Most ML models define a quantitative score for how "good" their predictions are.

The loss typically measures how close the predictions are to the true values.

Update rule: determines how to update the parameters, typically by searching for the parameters that minimize the loss function $J$.
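As one possible illustration of how a prediction function, a loss, and an update rule fit together, here is a minimal sketch that uses gradient descent on a mean squared error loss for a straight-line model. Gradient descent, the learning rate of 0.05, the step count, and the toy data are assumptions chosen for this example; the notes above do not prescribe a specific update rule.

```python
def predict(omega, x):
    """Prediction function f(Omega, x) for a straight line."""
    beta0, beta1 = omega
    return beta0 + beta1 * x


def loss(omega, xs, ys):
    """Mean squared error J between predictions and true values."""
    return sum((predict(omega, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def update(omega, xs, ys, learning_rate):
    """One gradient-descent step that moves omega toward lower loss."""
    beta0, beta1 = omega
    m = len(xs)
    errors = [predict(omega, x) - y for x, y in zip(xs, ys)]
    grad0 = 2 * sum(errors) / m
    grad1 = 2 * sum(e * x for e, x in zip(errors, xs)) / m
    return (beta0 - learning_rate * grad0, beta1 - learning_rate * grad1)


# Hypothetical toy data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

omega = (0.0, 0.0)
initial_loss = loss(omega, xs, ys)
for _ in range(2000):
    omega = update(omega, xs, ys, learning_rate=0.05)
final_loss = loss(omega, xs, ys)

print(initial_loss, final_loss, omega)
```

On this data the loss shrinks toward zero and the parameters approach the values used to generate the data.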

## Supervised Machine Learning

### Interpretation and Prediction

Interpretation:
- In some cases, the primary objective is to train a model to find insights from the data.
- In $y_p = f(\Omega, x)$, the interpretation approach uses $\Omega$ to give us insight into a system.

Common workflow:
- Gather $x$, $y$; train the model by finding the $\Omega$ that gives the best prediction $y_p = f(\Omega, x)$.
- Focus on $\Omega$ (rather than $y_p$) to generate insights.

Example interpretation exercises:
- $x$ = customer demographics, $y$ = sales data; examine $\Omega$ to understand loyalty by segment
- $x$ = car safety features, $y$ = traffic accidents; examine $\Omega$ to understand what makes cars safer
- $x$ = marketing budget, $y$ = movie revenue; examine $\Omega$ to understand marketing effectiveness
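The marketing example can be sketched as follows. The numbers are made up purely for illustration, and a simple least-squares line stands in for whatever model a real project would use. The point is that after fitting we look at the estimated parameters (here the slope) rather than at individual predictions.

```python
# Hypothetical data: marketing budget and movie revenue, both in $ millions.
budget = [10.0, 20.0, 30.0, 40.0, 50.0]
revenue = [35.0, 62.0, 95.0, 118.0, 150.0]

n = len(budget)
x_mean = sum(budget) / n
y_mean = sum(revenue) / n

# Closed-form least-squares estimates for a single feature.
sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(budget, revenue))
sxx = sum((x - x_mean) ** 2 for x in budget)
slope = sxy / sxx
intercept = y_mean - slope * x_mean

# Interpretation focuses on the parameters, not on the predictions.
print(f"Each extra $1M of marketing is associated with about ${slope:.2f}M more revenue.")
```

The slope describes an association in this toy data set; whether it reflects a causal effect is a separate question.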

Prediction:
- In some cases, the primary objective is to make the best prediction.
- In $y_p = f(\Omega, x)$, the prediction approach compares $y_p$ with $y$.
- The focus is on performance metrics, which measure the quality of the model's predictions.
- Performance metrics usually involve some measure of closeness between $y_p$ and $y$.
- Without focusing on interpretability, we risk having a black-box model.

Example prediction exercises:
- $x$ = customer purchase history, $y$ = customer churn; focus on predicting customer churn
- $x$ = financial information, $y$ = flagged default/non-default; focus on predicting loan default
- $x$ = purchase history, $y$ = next purchase; focus on predicting the next purchase
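A few common performance metrics can be written in a handful of lines. The sketch below is only illustrative: the metric choices (mean squared error, mean absolute error, accuracy) and all values are assumptions, and the predictions are hard-coded rather than produced by a trained model.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared distance between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average absolute distance between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of categorical predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical numeric targets and predictions (regression).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 3.0, 8.0]
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)

# Hypothetical churn labels and predictions (classification).
churn_true = ["churn", "stay", "stay", "churn", "stay"]
churn_pred = ["churn", "stay", "churn", "churn", "stay"]
acc = accuracy(churn_true, churn_pred)

print(mse, mae, acc)
```

Lower is better for the two error metrics, while higher is better for accuracy.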

### Two Common Approaches

Interpretation and prediction in Supervised Machine Learning

The majority of projects will call for a balance between the two.

Interpretation can provide insight into improvements in prediction, and vice versa.

Not all models allow both: Supervised Machine Learning models provide varying levels of support for interpretation vs. prediction.

## ML Framework: Takeaways

Machine Learning is the subset of AI that focuses on model building to support a goal of
interpretation and/or prediction.

ML algorithms:
- Use past experience to build a model that is useful for future experience.
- Follow a general form: $y_p = f(\Omega,x)$

## Supervised learning overview

![Supervised learning overview](./images/01_SupervisedLearningOverview.jpg "Supervised learning overview")

![Numeric predicting: Movie Revenue](./images/02_NumericPredicting_MovieRevenue.png "Numeric predicting: Movie Revenue")



![Classification: Category Answers](./images/03_ClassificationCategoricalAnswers.png "Classification: Category Answers")


![Classification: Category Answers Examples](./images/04_ClassificationCategoricalAnswers.png "Classification: Category Answers Example")



### What is Needed for Classification?

Model data with:
- Features that can be quantified
- Labels that are known
- Method to measure similarity
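To make the three ingredients concrete, here is a minimal sketch of a one-nearest-neighbor classifier. Nearest-neighbor is used here only as an illustration (the notes do not name a specific classifier), Euclidean distance is one assumed way to measure similarity, and the customer features and labels are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Similarity measure: a smaller distance means more similar examples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def nearest_neighbor_predict(train_features, train_labels, query):
    """Return the known label of the training example closest to the query."""
    distances = [euclidean_distance(f, query) for f in train_features]
    nearest_index = distances.index(min(distances))
    return train_labels[nearest_index]


# Quantified features: (monthly spend, number of support calls). Hypothetical data.
train_features = [(20.0, 5.0), (25.0, 4.0), (80.0, 0.0), (90.0, 1.0)]
# Known labels for each training example.
train_labels = ["churn", "churn", "stay", "stay"]

prediction_a = nearest_neighbor_predict(train_features, train_labels, (22.0, 6.0))
prediction_b = nearest_neighbor_predict(train_features, train_labels, (85.0, 0.0))
print(prediction_a, prediction_b)
```

In practice features on very different scales are usually rescaled before distances are computed, since otherwise the largest-valued feature dominates the similarity measure.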

## Linear Regression

![Calculating the Residuals](./images/05_CalculatingTheResiduals.png "Calculating the Residuals")


### Minimizing the Error Function

$$
J(\beta_0,\beta_1)=\dfrac{1}{2m}\sum_{i=1}^{m}\left(\left(\beta_0+\beta_1 x_{obs}^{(i)}\right)-y_{obs}^{(i)}\right)^2
$$
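The error function can be evaluated directly for any candidate pair of coefficients. The sketch below uses made-up observations and a small hand-picked list of candidates, both of which are assumptions for illustration; it shows that "minimizing" simply means preferring the candidate with the lowest value of $J$.

```python
def cost(beta0, beta1, x_obs, y_obs):
    """J(beta0, beta1): sum of squared residuals divided by 2m."""
    m = len(x_obs)
    squared_residuals = [((beta0 + beta1 * x) - y) ** 2 for x, y in zip(x_obs, y_obs)]
    return sum(squared_residuals) / (2 * m)


# Hypothetical observations that lie close to the line y = 2x.
x_obs = [1.0, 2.0, 3.0, 4.0]
y_obs = [2.0, 4.1, 5.9, 8.2]

# Hand-picked candidate (beta0, beta1) pairs to compare.
candidates = [(0.0, 0.0), (0.0, 2.0), (-0.05, 2.04)]
costs = {c: cost(c[0], c[1], x_obs, y_obs) for c in candidates}
best = min(costs, key=costs.get)

for candidate, value in costs.items():
    print(candidate, round(value, 5))
print("lowest cost:", best)
```

A real fitting procedure searches over all possible coefficient values rather than a short list, but the comparison it makes is the same.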

## Modeling Best Practice

- Use a cost function to fit the model

- Develop multiple models

- Compare the results and choose the best one
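These three steps can be sketched in a few lines. Everything here is a hypothetical illustration: two deliberately simple models (a constant mean baseline and a straight line), a small made-up data set, and mean squared error on held-out validation points as the comparison criterion.

```python
def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training targets."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Straight-line model fitted by least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    beta1 = sxy / sxx
    beta0 = y_mean - beta1 * x_mean
    return lambda x: beta0 + beta1 * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical training and validation data.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
x_val = [2.5, 4.5]
y_val = [5.0, 9.1]

# Develop multiple models, then compare them on data not used for fitting.
models = {
    "mean baseline": fit_mean(x_train, y_train),
    "straight line": fit_line(x_train, y_train),
}
val_errors = {name: mse(model, x_val, y_val) for name, model in models.items()}
best_name = min(val_errors, key=val_errors.get)
print(val_errors, best_name)
```

Comparing on held-out data rather than on the training data is a design choice made in this sketch so that the comparison reflects how the models behave on examples they have not seen.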

![Other Measures Of Error](./images/06_OtherMeasuresOfError.png "Other Measures Of Error")



SSE: sum of squared errors.

TSS: total sum of squares (the "total squared error").

SSE measures the distance between the truth and our predictions, similar to the summed portion of the cost function we saw earlier. TSS measures the distance between the truth and the average value of the truth. SSE is the variation that the model leaves unexplained: with a line fitted through the data points, it is what the line was not able to explain. TSS is the total variation in the target.
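Both quantities are straightforward to compute. The sketch below uses hypothetical true values and predictions, and also combines the two into the coefficient of determination, $R^2 = 1 - SSE/TSS$, which is the standard way to express the share of total variation that a model explains.

```python
# Hypothetical true values and model predictions.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.5, 3.5, 6.5, 7.5]

y_mean = sum(y_true) / len(y_true)

# SSE: variation the model fails to explain (truth vs. predictions).
sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))

# TSS: total variation in the target (truth vs. its own mean).
tss = sum((t - y_mean) ** 2 for t in y_true)

# R^2: fraction of the total variation explained by the model.
r_squared = 1 - sse / tss

print(sse, tss, r_squared)  # 1.0 20.0 0.95
```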

## Linear Regression: The Syntax

```python
# Import the class containing the regression method
from sklearn.linear_model import LinearRegression

# Create an instance of the class
LR = LinearRegression()

# Fit the instance on the training data, then predict on the test data
# (X_train, y_train, and X_test are assumed to be defined already)
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)
```



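Note: the feature matrix must be a 2-D array with at least one feature column. Fitting on an array that has no columns fails with:

`ValueError: Found array with 0 feature(s) (shape=(2, 0)) while a minimum of 1 is required by LinearRegression.`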
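For a version of the syntax that runs end to end, the following sketch generates synthetic data so that `X_train`, `y_train`, and `X_test` are defined. The data-generating line (intercept 3, slope 2, Gaussian noise), the random seeds, and the 80/20 split are assumptions made only for this example. Note that `X` is built with shape `(n_samples, n_features)`, which avoids the zero-feature error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): y = 3 + 2x plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))  # 2-D array: 100 samples, 1 feature
y = 3.0 + 2.0 * X[:, 0] + rng.normal(0, 0.5, size=100)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Same three steps as above: create, fit, predict.
LR = LinearRegression()
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)

# The fitted parameters are exposed as attributes after fitting.
print("intercept:", LR.intercept_)
print("slope:", LR.coef_[0])
```

If a single feature is stored as a 1-D array, `reshape(-1, 1)` converts it to the 2-D shape that `fit` expects.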