# FLIP(01): Advanced Data Science
**(Tools Module 04: TPOT)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use this package, but NOT allowed to change or distribute it.

Prepared by and for 
**Student Members** |
2006-2018 [TULIP Lab](http://www.tulip.org.au)

---


# Session 00 - TPOT with code

The TPOT interface is designed to be as similar as possible to that of scikit-learn.

TPOT can be imported just like any regular Python module. To import TPOT, type:

```python
from tpot import TPOTClassifier
```

then create an instance of TPOT as follows:

```python
pipeline_optimizer = TPOTClassifier()
```

It's also possible to use TPOT for regression problems with the `TPOTRegressor` class. Other than the class name, a `TPOTRegressor` is used the same way as a `TPOTClassifier`. You can read more about the `TPOTClassifier` and `TPOTRegressor` classes in the API documentation.
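
As a minimal sketch of the regression case, the snippet below mirrors the classifier workflow covered in this session, with `TPOTRegressor` swapped in. It assumes the same constructor parameters used for `TPOTClassifier` elsewhere in this session (`generations`, `population_size`, `cv`, `random_state`, `verbosity`). The `load_diabetes` dataset and the helper name `run_tpot_regression` are illustrative choices, not part of the original material.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Illustrative regression data (an assumption; any numeric X, y would do).
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, train_size=0.75, test_size=0.25, random_state=42
)


def run_tpot_regression(X_train, y_train, X_test, y_test):
    """Hypothetical helper: optimize a regression pipeline and return its test score."""
    # Imported here so the data preparation above runs even without TPOT installed.
    from tpot import TPOTRegressor

    # Same constructor arguments as the classifier example; only the class differs.
    regressor = TPOTRegressor(generations=5, population_size=20, cv=5,
                              random_state=42, verbosity=2)
    regressor.fit(X_train, y_train)
    return regressor.score(X_test, y_test)
```

Calling `run_tpot_regression(X_train, y_train, X_test, y_test)` starts the search, which can take several minutes.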

Some example code with custom TPOT parameters might look like:

```python
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
```

Now TPOT is ready to optimize a pipeline for you. You can tell TPOT to optimize a pipeline based on a data set with the `fit` function:

```python
pipeline_optimizer.fit(X_train, y_train)
```

The `fit` function runs the genetic programming algorithm to find the highest-scoring pipeline based on the average k-fold cross-validation score. The best pipeline is then trained on the entire set of provided samples, and the TPOT instance can be used as a fitted model.
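
To make the idea of an average k-fold cross-validation score concrete, here is a minimal sketch of how one candidate pipeline could be scored. It calls scikit-learn's `cross_val_score` directly rather than reproducing TPOT's internals, and the pipeline (scaling followed by a k-nearest-neighbors classifier) is an arbitrary stand-in chosen for illustration, not something TPOT is guaranteed to produce.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()

# One hypothetical candidate pipeline (an assumption for illustration).
candidate = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

# Score the candidate on 5 folds; each fold is held out once for validation.
fold_scores = cross_val_score(candidate, digits.data, digits.target, cv=5)

# The average over the folds is the kind of number used to rank candidates.
mean_score = fold_scores.mean()
print(f"fold scores: {fold_scores}")
print(f"mean CV score: {mean_score:.3f}")
```

A search procedure would repeat this scoring for many candidate pipelines and keep the ones with the highest mean score.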

You can then proceed to evaluate the final pipeline on the testing set with the `score` function:

```python
print(pipeline_optimizer.score(X_test, y_test))
```
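
The sketch below illustrates what a held-out `score` call amounts to for a classifier. It uses a plain scikit-learn estimator as a stand-in for the fitted TPOT object and assumes accuracy as the metric, which is scikit-learn's default for classifiers. If TPOT is configured with a different scoring metric, the reported number will differ.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42
)

# Stand-in for the fitted TPOT pipeline (an assumption for illustration).
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# score() predicts on the test set and compares against the true labels.
test_score = model.score(X_test, y_test)
manual_accuracy = accuracy_score(y_test, model.predict(X_test))
print(test_score, manual_accuracy)
```

The two printed values match, since `score` here is accuracy computed on the supplied test samples.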

Finally, you can tell TPOT to export the corresponding Python code for the optimized pipeline to a text file with the `export` function:

```python
pipeline_optimizer.export('tpot_exported_pipeline.py')
```

Once this code finishes running, `tpot_exported_pipeline.py` will contain the Python code for the optimized pipeline.

Below is a full example script using TPOT to optimize a pipeline, score it, and export the best pipeline to a file.

```python
from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

# Load the digits data and hold out 25% of it for testing.
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25)

# Evolve a population of 20 pipelines for 5 generations, scoring each by 5-fold CV.
pipeline_optimizer = TPOTClassifier(generations=5, population_size=20, cv=5,
                                    random_state=42, verbosity=2)
pipeline_optimizer.fit(X_train, y_train)

# Evaluate the best pipeline on the held-out test set.
print(pipeline_optimizer.score(X_test, y_test))

# Write the best pipeline out as standalone Python code.
pipeline_optimizer.export('tpot_exported_pipeline.py')
```