Applied machine learning is an empirical skill. You cannot get better at it by reading books and articles alone. You have to practice.

### 18.1 Practice Machine Learning With Projects
#### 18.1.1 Use A Structured Step-By-Step Process

Any predictive modeling machine learning project can be broken down into six common tasks:
1. Define Problem.
2. Summarize Data.
3. Prepare Data.
4. Evaluate Algorithms.
5. Improve Results.
6. Present Results.


### 18.2 Machine Learning Project Template in Python
#### 18.2.1 Template Summary

```python
# Python Project Template

# 1. Prepare Problem
# a) Load libraries
# b) Load dataset

# 2. Summarize Data
# a) Descriptive statistics
# b) Data visualizations

# 3. Prepare Data
# a) Data Cleaning
# b) Feature Selection
# c) Data Transforms

# 4. Evaluate Algorithms
# a) Split-out validation dataset
# b) Test options and evaluation metric
# c) Spot-Check Algorithms
# d) Compare Algorithms

# 5. Improve Accuracy
# a) Algorithm Tuning
# b) Ensembles

# 6. Finalize Model
# a) Predictions on validation dataset
# b) Create standalone model on entire training dataset
# c) Save model for later use
```

### 18.3 Machine Learning Project Template Steps
#### 18.3.1 Prepare Problem
This step is about loading everything you need to start working on the problem:
- Python modules, classes and functions that you intend to use.
- Your dataset, loaded from CSV.

It is also the place where you might need to make a reduced sample of your dataset if it is too large to work with. Your dataset should be small enough to build a model or create a visualization within a minute, ideally within 30 seconds. You can always scale up well-performing models later.
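As a rough illustration of this step, the sketch below loads a CSV with Pandas and optionally reduces it to a random sample. The helper `load_dataset`, the `max_rows` parameter, and the inline CSV text are all hypothetical stand-ins for a real file and are not part of the template itself.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
CSV_TEXT = """sepal_length,sepal_width,label
5.1,3.5,a
4.9,3.0,a
6.2,2.9,b
5.9,3.0,b
6.7,3.1,b
5.0,3.4,a
"""


def load_dataset(source, max_rows=None, seed=7):
    """Load a CSV and optionally reduce it to a random sample of rows."""
    data = pd.read_csv(source)
    if max_rows is not None and len(data) > max_rows:
        # A fixed seed keeps the reduced sample reproducible between runs.
        data = data.sample(n=max_rows, random_state=seed).reset_index(drop=True)
    return data


full = load_dataset(io.StringIO(CSV_TEXT))
small = load_dataset(io.StringIO(CSV_TEXT), max_rows=4)
print(full.shape)
print(small.shape)
```

In a real project `source` would usually be a file path or URL. Sampling at load time keeps every later step fast while you explore.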

#### 18.3.2 Summarize Data
This step is about better understanding the data that you have available.
- Descriptive statistics such as summaries.
- Data visualizations such as plots with Matplotlib, ideally using convenience functions from Pandas.
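The two bullets above might look something like the following sketch. The data is randomly generated and the column names `x1`, `x2` and `label` are assumptions made for illustration. The non-interactive `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no window is opened

import numpy as np
import pandas as pd

# Synthetic data standing in for a real dataset (assumption).
rng = np.random.default_rng(7)
data = pd.DataFrame({
    "x1": rng.normal(0.0, 1.0, 100),
    "x2": rng.normal(5.0, 2.0, 100),
    "label": rng.integers(0, 2, 100),
})

# a) Descriptive statistics
print(data.shape)                             # rows and columns
print(data.dtypes)                            # type of each attribute
summary = data.describe()                     # count, mean, std, min, quartiles, max
class_counts = data.groupby("label").size()   # class distribution
correlations = data.corr()                    # pairwise Pearson correlations
print(summary)
print(class_counts)

# b) Data visualizations using Pandas convenience functions
histograms = data.hist()                      # one histogram per attribute
scatter = pd.plotting.scatter_matrix(data)    # pairwise scatter plots
```

In an interactive session you would typically call `matplotlib.pyplot.show()` after each plot instead of selecting the `Agg` backend.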

#### 18.3.3 Prepare Data

This step is about preparing the data in such a way that it best exposes the structure of the problem and the relationships between your input attributes and the output variable.
- Cleaning data by removing duplicates, marking missing values and even imputing missing values.
- Feature selection where redundant features may be removed and new features developed.
- Data transforms where attributes are scaled or redistributed in order to best expose the structure of the problem later to learning algorithms.
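The three preparation tasks above can be sketched with Pandas and scikit-learn as follows. The tiny dataset, the rule that a zero in `x1` or `x2` marks a missing value, and the choice of mean imputation, `SelectKBest` and `StandardScaler` are all assumptions picked for illustration; other techniques may suit a given problem better.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Small made-up dataset; the second and third rows are duplicates.
data = pd.DataFrame({
    "x1": [1.0, 2.0, 2.0, 0.0, 5.0, 6.0, 7.0],
    "x2": [10.0, 20.0, 20.0, 40.0, 50.0, 0.0, 70.0],
    "x3": [3.0, 1.0, 1.0, 2.0, 3.0, 1.0, 2.0],
    "label": [0, 0, 0, 0, 1, 1, 1],
})

# a) Data cleaning: remove duplicates, mark missing values, impute them.
data = data.drop_duplicates().reset_index(drop=True)
features = data[["x1", "x2", "x3"]].copy()
# Assumption: a zero in x1 or x2 is an invalid reading, so mark it as missing.
features[["x1", "x2"]] = features[["x1", "x2"]].replace(0.0, np.nan)
imputed = SimpleImputer(strategy="mean").fit_transform(features)

# b) Feature selection: keep the two features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=2)
selected = selector.fit_transform(imputed, data["label"])

# c) Data transforms: rescale each kept feature to zero mean and unit variance.
scaled = StandardScaler().fit_transform(selected)
print(selector.get_support())
print(scaled.shape)
```

This sketch fits every transform on the full dataset for brevity. When estimating model skill, it is generally safer to fit transforms inside each cross-validation fold, for example with a scikit-learn `Pipeline`, so that no information leaks from the held-out data.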

#### 18.3.4 Evaluate Algorithms
This step is about finding a subset of machine learning algorithms that perform well on your data. It involves steps such as:
- Separating out a validation dataset to use for later confirmation of the skill of your developed model.
- Defining test options with scikit-learn, such as cross-validation, and choosing the evaluation metric.
- Spot-checking a suite of linear and nonlinear machine learning algorithms.
- Comparing the estimated accuracy of algorithms.
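A minimal sketch of these four steps is shown below. The synthetic classification dataset, the 80/20 split, 10-fold cross-validation, the accuracy metric and the particular four algorithms are assumptions chosen for illustration, not a prescribed configuration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for a prepared dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)

# a) Split out a validation dataset that is not touched until the end.
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7
)

# b) Test options and evaluation metric.
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scoring = "accuracy"

# c) Spot-check a mix of linear and nonlinear algorithms.
models = {
    "LR": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "CART": DecisionTreeClassifier(random_state=7),
}

# d) Compare algorithms by mean and spread of cross-validated accuracy.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} ({scores.std():.3f})")
```

Reusing the same `kfold` object for every model means each algorithm is scored on identical splits, which makes the comparison fairer.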

#### 18.3.5 Improve Accuracy
There are two different ways to improve the accuracy of your models:
- Use scikit-learn to search for the combination of parameters for each algorithm that yields the best results.
- Combine the predictions of multiple models into an ensemble prediction using ensemble techniques.
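Both approaches can be sketched with scikit-learn as below. The synthetic dataset, the choice of k-nearest neighbors with a grid over `n_neighbors`, and the use of a random forest as the ensemble are assumptions for illustration; the same pattern applies to other algorithms and ensemble methods.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for the training dataset (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
kfold = KFold(n_splits=5, shuffle=True, random_state=7)

# a) Algorithm tuning: exhaustive grid search over one parameter of KNN.
param_grid = {"n_neighbors": [1, 3, 5, 7, 9]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid, scoring="accuracy", cv=kfold)
grid.fit(X, y)
print(f"Best: {grid.best_score_:.3f} using {grid.best_params_}")

# b) Ensembles: a random forest combines the predictions of many decision trees.
ensemble = RandomForestClassifier(n_estimators=50, random_state=7)
ensemble_scores = cross_val_score(ensemble, X, y, cv=kfold, scoring="accuracy")
print(f"RF: {ensemble_scores.mean():.3f} ({ensemble_scores.std():.3f})")
```

For larger parameter spaces, `RandomizedSearchCV` offers the same interface while sampling a fixed number of parameter combinations instead of trying them all.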

#### 18.3.6 Finalize Model
Finalizing a model may involve sub-tasks such as:
- Using an optimal model tuned by scikit-learn to make predictions on unseen data.
- Creating a standalone model using the parameters tuned by scikit-learn.
- Saving an optimal model to file for later use.
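The sub-tasks above might be carried out roughly as follows. The synthetic dataset, the logistic regression model, the value `C=1.0` standing in for a tuned parameter, and the file name `finalized_model.pkl` are all hypothetical choices made for this sketch.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic data and split standing in for the project's own (assumption).
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, test_size=0.20, random_state=7
)

# b) Create a standalone model on the entire training dataset.
# C=1.0 is a placeholder for whatever value tuning selected.
model = LogisticRegression(C=1.0, max_iter=1000)
model.fit(X_train, y_train)

# a) Make predictions on the held-out validation dataset.
predictions = model.predict(X_validation)
validation_accuracy = accuracy_score(y_validation, predictions)
print(f"Validation accuracy: {validation_accuracy:.3f}")

# c) Save the model to file, then load it back to confirm it still works.
model_path = os.path.join(tempfile.mkdtemp(), "finalized_model.pkl")
with open(model_path, "wb") as handle:
    pickle.dump(model, handle)

with open(model_path, "rb") as handle:
    restored = pickle.load(handle)
print((restored.predict(X_validation) == predictions).all())
```

A pickled model should be loaded with the same scikit-learn version that saved it, and pickle files should only be loaded from sources you trust.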