# FLIP(01): Advanced Data Science
**(Tools Module 04: TPOT)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use this package, but you are NOT allowed to change or distribute it.

Prepared by and for 
**Student Members** |
2006-2018 [TULIP Lab](http://www.tulip.org.au)

---


# Session 03 - Built-in TPOT configurations

TPOT comes with a handful of default operators and parameter configurations that we believe work well for optimizing machine learning pipelines. Below is a list of the current built-in configurations that come with TPOT.

<table>
   <tr>
      <td>Configuration Name</td>
      <td>Description</td>
      <td>Operators</td>
   </tr>
   <tr>
      <td>Default TPOT</td>
      <td>TPOT will search over a broad range of preprocessors, feature constructors, feature selectors, models, and parameters to find a series of operators that minimize the error of the model predictions. Some of these operators are complex and may take a long time to run, especially on larger datasets.
<br><br>
Note: This is the default configuration for TPOT. To use this configuration, use the default value (None) for the config_dict parameter.</td>
      <td><font color='blue'>Classification
<br><br>
Regression</font></td>
   </tr>
   <tr>
      <td>TPOT light</td>
      <td>TPOT will search over a restricted range of preprocessors, feature constructors, feature selectors, models, and parameters to find a series of operators that minimize the error of the model predictions. Only simpler, fast-running operators are used in these pipelines, so TPOT light is useful for quickly finding simple pipelines for a classification or regression problem.
<br><br>
This configuration works for both the TPOTClassifier and TPOTRegressor.</td>
      <td><font color='blue'>Classification
<br><br>
Regression</font></td>
   </tr>
   <tr>
      <td>TPOT MDR</td>
      <td>TPOT will search over a series of feature selectors and Multifactor Dimensionality Reduction (MDR) models to find a series of operators that maximize prediction accuracy. The TPOT MDR configuration is specialized for genome-wide association studies (GWAS) and is described in detail online.
<br><br>
Note that TPOT MDR may be slow to run because the feature selection routines are computationally expensive, especially on large datasets.</td>
      <td><font color='blue'>Classification
<br><br>
Regression</font></td>
   </tr>
   <tr>
      <td>TPOT sparse</td>
      <td>TPOT uses a configuration dictionary with a one-hot encoder and the operators normally included in TPOT that also support sparse matrices.
<br><br>
This configuration works for both the TPOTClassifier and TPOTRegressor.</td>
      <td><font color='blue'>Classification
<br><br>
Regression</font></td>
   </tr>
</table>
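To make the TPOT MDR row more concrete, here is a minimal pure-Python sketch of the basic Multifactor Dimensionality Reduction step: every combination of genotypes across the selected features (a "cell") is labeled high-risk or low-risk by comparing its case-to-control ratio with the overall case-to-control ratio in the training data. It illustrates the idea only and is not the implementation TPOT MDR actually calls; the helper names `fit_mdr` and `predict_mdr`, the 0/1/2 genotype coding, the toy data, and the handling of ties and unseen cells are all assumptions of this sketch.

```python
from collections import Counter


def fit_mdr(X, y):
    """Map each observed multi-locus genotype (a "cell") to 1 (high risk) or 0 (low risk).

    Hypothetical helper: X is a list of genotype tuples, y holds 1 for cases, 0 for controls.
    """
    cases, controls = Counter(), Counter()
    for row, label in zip(X, y):
        if label == 1:
            cases[tuple(row)] += 1
        else:
            controls[tuple(row)] += 1

    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    # The overall case:control ratio in the training data is the threshold
    threshold = n_cases / n_controls if n_controls else float("inf")

    cell_labels = {}
    for cell in set(cases) | set(controls):
        if controls[cell] == 0:
            cell_labels[cell] = 1  # only cases were observed in this cell
        else:
            # In this sketch, a ratio exactly equal to the threshold counts as low risk
            cell_labels[cell] = int(cases[cell] / controls[cell] > threshold)
    return cell_labels


def predict_mdr(cell_labels, X, default_label=0):
    """Collapse each row to its cell's risk label; unseen cells get default_label."""
    return [cell_labels.get(tuple(row), default_label) for row in X]


# Toy data (hypothetical): two SNP features coded 0/1/2, label 1 = case, 0 = control
X = [(0, 0), (0, 0), (0, 1), (1, 1), (1, 1), (1, 1), (2, 2), (2, 2), (0, 1), (2, 2)]
y = [0, 0, 0, 1, 1, 0, 1, 1, 0, 0]

labels = fit_mdr(X, y)
print(predict_mdr(labels, [(1, 1), (0, 0), (2, 0)]))  # → [1, 0, 0]
```

The effect is to collapse many genotype combinations into one binary high-risk/low-risk feature, which is where the "dimensionality reduction" in the name comes from.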

To use any of these configurations, simply pass the string name of the configuration to the `config_dict` parameter (or `-config` on the command line). For example, to use the "TPOT light" configuration:

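```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the 8x8 handwritten digits dataset bundled with scikit-learn
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25,
                                                    random_state=42)

# Restrict the search to the simpler, fast-running operators of "TPOT light"
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2,
                      config_dict='TPOT light', random_state=42)
tpot.fit(X_train, y_train)

# Report accuracy on the held-out split, then export the best pipeline as Python code
print(tpot.score(X_test, y_test))
tpot.export('tpot_mnist_pipeline.py')
```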
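Switching configurations only means changing the string passed to `config_dict`. The helper below is a hypothetical rule of thumb (`choose_config` is not part of TPOT) that turns the guidance in the configuration table into a `config_dict` value; the priority order (sparse input first, then GWAS data, then a need for speed) is an assumption of this sketch. It uses only the standard library, so it builds the keyword arguments rather than calling TPOT itself.

```python
from typing import Optional

# Built-in configuration names from the table; None selects Default TPOT
BUILTIN_CONFIGS = (None, "TPOT light", "TPOT MDR", "TPOT sparse")


def choose_config(sparse_input: bool = False, gwas: bool = False,
                  quick: bool = False) -> Optional[str]:
    """Hypothetical rule of thumb mapping dataset traits to a config_dict value."""
    if sparse_input:
        return "TPOT sparse"  # one-hot encoder plus operators that support sparse matrices
    if gwas:
        return "TPOT MDR"     # specialized for genome-wide association studies
    if quick:
        return "TPOT light"   # only simpler, fast-running operators
    return None               # Default TPOT: the broadest (and slowest) search


# Keyword arguments mirroring the example above; pass them as TPOTClassifier(**tpot_kwargs)
tpot_kwargs = dict(generations=5, population_size=20, verbosity=2,
                   config_dict=choose_config(quick=True))
print(tpot_kwargs["config_dict"])  # → TPOT light
```

Sparse input is checked first because, according to the table, TPOT sparse is the configuration restricted to operators that support sparse matrices.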