# ![](https://ga-dash.s3.amazonaws.com/production/assets/logo-9f88ae6c9c3871690e33280fcf557f33.png) Model Validation and the Train-Test Split Lab
Week 3 | Lesson 2.4

### LEARNING OBJECTIVES
*After this lesson, you will be able to:*
- Explain the connection between the bias-variance tradeoff and the train-test split
- Perform a split of data into testing and training sets
- Make a prediction on the ISE value using a Linear Regression

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import sklearn as skl

## Minimize the sum of the Bias and Variance

Minimizing bias and variance together is harder than minimizing either one alone. In essence, we seek a model that is simple enough to generalize (low variance) yet still fits our known data well (low bias). To do this, we split our data into two sets:

- a training set
- a test set

In [3]:
data_file_location = '../../../data/istanbul_stocks.csv'
istanbul_stocks_df = pd.read_csv(data_file_location)

### Display a Single Row from the Head of the DataFrame

In [4]:
istanbul_stocks_df.head(1)

|   | date | SP | DAX | FTSE | NIKKEI | BOVESPA | EU | EM | ISE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 5-Jan-09 | -0.004679 | 0.002193 | 0.003894 | 0.0 | 0.03119 | 0.012698 | 0.028524 | 0.035754 |


### Describe the Summary Statistics of the Dataset

In [5]:
istanbul_stocks_df.describe()

|   | SP | DAX | FTSE | NIKKEI | BOVESPA | EU | EM | ISE |
|---|---|---|---|---|---|---|---|---|
| count | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 | 536.0 |
| mean | 0.000643 | 0.000721 | 0.00051 | 0.000308 | 0.000935 | 0.000471 | 0.000936 | 0.001629 |
| std | 0.014093 | 0.014557 | 0.012656 | 0.01485 | 0.015751 | 0.01299 | 0.010501 | 0.016264 |
| min | -0.054262 | -0.052331 | -0.054816 | -0.050448 | -0.053849 | -0.048817 | -0.038564 | -0.062208 |
| 25% | -0.004675 | -0.006212 | -0.005808 | -0.007407 | -0.007215 | -0.005952 | -0.004911 | -0.006669 |
| 50% | 0.000876 | 0.000887 | 0.000409 | 0.0 | 0.000279 | 0.000196 | 0.001077 | 0.002189 |
| 75% | 0.006706 | 0.008224 | 0.007428 | 0.007882 | 0.008881 | 0.007792 | 0.006423 | 0.010584 |
| max | 0.068366 | 0.058951 | 0.050323 | 0.061229 | 0.063792 | 0.067042 | 0.047805 | 0.068952 |


### Sort Parameters by their Correlation with `ISE`

#### Just get the Names

#### Don't need `ISE`!
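One possible way to work through the three headings above, sketched on synthetic data so it runs without the CSV. The frame `demo_df` and its values are invented for illustration; with the real data you would start from `istanbul_stocks_df` and may need to drop the non-numeric `date` column first. The name `features_names` matches the variable printed later in this lab.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for istanbul_stocks_df (random values, not real returns).
rng = np.random.default_rng(0)
demo_df = pd.DataFrame(rng.normal(size=(100, 3)), columns=['SP', 'DAX', 'EM'])
demo_df['ISE'] = (0.8 * demo_df['EM'] + 0.3 * demo_df['DAX']
                  + rng.normal(scale=0.1, size=100))

# Correlation of every column with ISE, largest first.
correlations = demo_df.corr()['ISE'].sort_values(ascending=False)

# Just get the names, then drop ISE itself (it correlates perfectly with itself).
features_names = correlations.index.drop('ISE').tolist()
print(features_names)
```

Sorting on the raw correlation ranks strongly negative features last; sorting on `correlations.abs()` instead is a reasonable alternative if the sign does not matter to you.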

---

# Best Practices in Developing Predictive Models

1. Clearly state the problem you wish to solve
1. Clearly state the model you will develop to solve the problem
1. Clearly state a metric you will use to assess your performance
1. Clearly define a benchmark against which you will measure the performance of your model using the metric you selected

## Modeling Changes in the Istanbul Stock Market

### Problem Statement

### Solution Statement

### Metric Selection


<img src="../2.3-Train-Test-Split/assets/regression_metrics.png" width="600px">

#### Import the `metric` you will use from `sklearn`

#### Define a `metric` function

In [None]:
def metric(y_true, y_pred):
    return None
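
The lab leaves the choice of metric to you. As a minimal sketch, assuming mean squared error is the metric selected, the placeholder above could be filled in like this:

```python
from sklearn.metrics import mean_squared_error


def metric(y_true, y_pred):
    # Mean squared error: the average of the squared residuals (lower is better).
    # Choosing MSE here is an assumption, not the lab's prescribed answer.
    return mean_squared_error(y_true, y_pred)


# Residuals are 0, 0 and 2, so the result is 4 / 3.
print(metric([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```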

### Benchmark 

---

# The Train-Test Split

The process looks as follows:

1. Split the data into two (not necessarily equally sized) sets, the training set and the test set
1. Set the test set aside
1. Fit the model to the best of our abilities using the training set
1. Evaluate the model separately using both the training set and the test set
   - the evaluation of the model using the training set can be taken to signify bias
   - the evaluation of the model using the test set can be taken to signify variance
1. Repeat steps 3 and 4 until an optimal sum of bias and variance is reached
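
The steps above can be sketched end to end on synthetic data. Everything here (the random data, the 25% test size, the `random_state`, and mean squared error as the metric) is an assumption chosen for illustration, not a setting taken from the lab.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data: 200 rows, 3 features, a linear signal plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

# Steps 1-2: split, and keep the test set out of the fitting step.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Step 3: fit on the training set only.
model = LinearRegression().fit(X_train, y_train)

# Step 4: evaluate separately on each set.
train_error = mean_squared_error(y_train, model.predict(X_train))
test_error = mean_squared_error(y_test, model.predict(X_test))
print(f"train MSE: {train_error:.3f}, test MSE: {test_error:.3f}")
```

Step 5 then amounts to changing the model (for example, the set of features) and repeating the fit and evaluation while watching both numbers.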

### Prepare the Data

Pull the target vector off of the dataframe.

Drop the target vector to prepare the feature matrix.

In [None]:
istanbul_stocks_target = None
istanbul_stocks_feature = None
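
A minimal sketch of the two preparation steps, using a tiny invented frame with a similar column layout. The names `demo_df`, `demo_target`, and `demo_feature` are hypothetical, and dropping `date` along with the target is an assumption made because a linear regression needs numeric inputs.

```python
import pandas as pd

# Tiny stand-in frame (values and date labels are invented).
demo_df = pd.DataFrame({
    'date': ['day-1', 'day-2', 'day-3'],
    'SP': [0.01, -0.02, 0.00],
    'EM': [0.02, -0.01, 0.01],
    'ISE': [0.03, -0.02, 0.01],
})

# Pull the target vector off of the dataframe.
demo_target = demo_df['ISE']

# Drop the target (and the non-numeric date) to leave the feature matrix.
demo_feature = demo_df.drop(columns=['ISE', 'date'])

print(demo_feature.columns.tolist())
```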

### Step 1: Split the data into a Training Set and a Test Set

#### Import `train_test_split` from `sklearn`

### Forward Selection

We will use forward selection to develop our models.

#### Display the Feature Names

#### Import `LinearRegression` from `sklearn`

#### Let's store the errors

In [None]:
errors_training_set = []
errors_test_set = []

### Step 2: Fit the Model

Here we fit a linear model using a single feature: the first entry in `features_names`.

#### Prepare the data for fitting

In [None]:
print(features_names[:1])

#### Build Model with One Feature

### Step 3: Evaluate the Model

### Step 2: Fit the Model

Here we fit a linear model using two features: the first two entries in `features_names`.

#### Prepare the data for fitting

In [None]:
print(features_names[:2])

#### Build Model with Two Features

### Step 3: Evaluate the Model

### Step 2: Fit the Model

Here we fit a linear model using three features: the first three entries in `features_names`.

#### Prepare the data for fitting

In [None]:
print(features_names[:3])

#### Build Model with Three Features

### Step 3: Evaluate the Model

## Let Python Do The Work
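
The manual fit-and-evaluate steps can be folded into a loop. The sketch below runs on synthetic data with placeholder column names `f1` to `f4`, and assumes mean squared error as the metric and a `features_names` list that is already ordered by correlation with the target. It adds features in that fixed order, which is simpler than a greedy stepwise search.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; f1..f4 are placeholder names.
rng = np.random.default_rng(1)
X = pd.DataFrame(rng.normal(size=(300, 4)), columns=['f1', 'f2', 'f3', 'f4'])
y = 2.0 * X['f1'] + 1.0 * X['f2'] + 0.5 * X['f3'] + rng.normal(scale=0.5, size=300)

# Assumed to be sorted by correlation with the target already.
features_names = ['f1', 'f2', 'f3', 'f4']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)

errors_training_set = []
errors_test_set = []

# Grow the feature set one name at a time: [:1], [:2], [:3], ...
for k in range(1, len(features_names) + 1):
    cols = features_names[:k]
    model = LinearRegression().fit(X_train[cols], y_train)
    errors_training_set.append(
        mean_squared_error(y_train, model.predict(X_train[cols])))
    errors_test_set.append(
        mean_squared_error(y_test, model.predict(X_test[cols])))

print(errors_training_set)
print(errors_test_set)
```

For nested least-squares models the training error cannot increase as features are added, so the test error is the more informative curve when deciding where to stop.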

## Plot the Training and the Testing Error as you Forward Select Features

Make sure to include a legend.
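
A possible plotting helper, assuming the two error lists hold one value per model in the order the features were added. The function name `plot_errors` is hypothetical, and the numbers in the usage line are placeholders, not results from the Istanbul data.

```python
import matplotlib.pyplot as plt


def plot_errors(errors_training_set, errors_test_set):
    # x-axis: number of features in each model (1, 2, 3, ...).
    n_features = range(1, len(errors_training_set) + 1)
    fig, ax = plt.subplots()
    ax.plot(n_features, errors_training_set, marker='o', label='training error')
    ax.plot(n_features, errors_test_set, marker='o', label='test error')
    ax.set_xlabel('number of features')
    ax.set_ylabel('error (chosen metric)')
    ax.legend()
    return ax


# Placeholder numbers purely to exercise the function.
ax = plot_errors([0.9, 0.5, 0.4], [1.0, 0.6, 0.55])
```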