# Automated Machine Learning Classification Example

[Example adapted from the **TPOT** website](https://epistasislab.github.io)

**Iris Dataset Description:** <br>
The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant.<br>
One class is linearly separable from the other 2; the latter are NOT linearly separable from each other ([UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/iris)).<br>
<br>
**Attributes (columns)**
1. sepal length in cm 
2. sepal width in cm 
3. petal length in cm 
4. petal width in cm 
5. class: Iris Setosa / Iris Versicolour / Iris Virginica
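
**Sample rows**

| sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | class |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | **Iris-setosa** |
| 7.0 | 3.2 | 4.7 | 1.4 | **Iris-versicolor** |
| 5.9 | 3.0 | 5.1 | 1.8 | **Iris-virginica** |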
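
The claim that one class is linearly separable can be checked directly. The sketch below assumes scikit-learn's bundled copy of the dataset (where label `0` is setosa) and uses a single cut-off on petal length; the 2.5 cm value is an illustrative choice, not something taken from the source. A threshold on one feature is the simplest possible linear decision boundary.

```python
# Sketch: check that Iris-setosa is linearly separable from the other two
# classes using one threshold on petal length (attribute 3, column index 2).
# The 2.5 cm cut-off is an illustrative assumption.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
petal_length = iris.data[:, 2]      # petal length in cm
is_setosa = iris.target == 0        # class 0 is setosa in load_iris()

# Predict "setosa" whenever the petal is shorter than the cut-off.
predicted_setosa = petal_length < 2.5
separable = bool(np.all(predicted_setosa == is_setosa))

print("setosa max petal length:", petal_length[is_setosa].max())
print("others min petal length:", petal_length[~is_setosa].min())
print("perfectly separated:", separable)
```

No comparable single cut-off exists between versicolor and virginica, whose petal-length ranges overlap.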


------------------------------------

#### Loading Python Libraries ####

In [1]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
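
The filter above is scoped to one category: it hides `FutureWarning` messages (often emitted by TPOT's dependencies) while letting other warnings through. A minimal, standard-library-only sketch that records which warnings survive:

```python
# Sketch: simplefilter(action='ignore', category=FutureWarning) only
# silences FutureWarning; other categories are still reported.
import warnings

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")                                  # record everything...
    warnings.simplefilter(action="ignore", category=FutureWarning)   # ...except FutureWarning
    warnings.warn("deprecated API", FutureWarning)
    warnings.warn("something else", UserWarning)

categories = [w.category.__name__ for w in caught]
print(categories)  # ['UserWarning']
```

Filters added later take precedence, which is why the `ignore` rule wins over `always` for `FutureWarning`.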

In [2]:
from tpot import TPOTClassifier   # TPOT library for automatically searching and evaluating classifier pipelines.

In [3]:
from sklearn.datasets import load_iris # The Iris dataset ships with scikit-learn (https://scikit-learn.org/).

In [4]:
from sklearn.model_selection import train_test_split # Function that splits the dataset into training and test sets.

In [5]:
iris = load_iris() # Load the data

In [6]:
iris.data[0:5] # Sanity check: first five rows of the loaded data

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

In [7]:
iris.target # Sanity check: class labels (0 = setosa, 1 = versicolor, 2 = virginica)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
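
Rather than eyeballing the label array, the class balance described at the top (3 classes of 50 instances each) can be confirmed by counting labels. A short sketch using `np.bincount` and the dataset's `target_names`:

```python
# Sketch: confirm the three classes have 50 instances each and map the
# integer labels in iris.target back to species names.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
counts = np.bincount(iris.target)   # occurrences of labels 0, 1, 2
for label, (name, n) in enumerate(zip(iris.target_names, counts)):
    print(label, name, n)
# 0 setosa 50
# 1 versicolor 50
# 2 virginica 50
```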

#### Splitting the dataset: 75% for training the model and 25% for testing it ####

In [8]:
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, train_size=0.75, test_size=0.25)

In [9]:
X_train.shape, X_test.shape, y_train.shape, y_test.shape # Sanity check: shapes of the resulting train and test sets

((112, 4), (38, 4), (112,), (38,))
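
The 112/38 sizes follow from how scikit-learn rounds fractional sizes: the test count is rounded up and the train count rounded down. Because the split above sets no seed, its exact contents (and the final score) change between runs. The sketch below shows the arithmetic and a variant that adds `stratify` to keep class proportions and `random_state` for reproducibility; the seed value 42 is an arbitrary assumption.

```python
# Sketch: where the 112/38 split comes from, plus a stratified,
# reproducible variant of the same split.
import math
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
n = len(iris.target)              # 150 samples

n_test = math.ceil(0.25 * n)      # ceil(37.5) = 38
n_train = math.floor(0.75 * n)    # floor(112.5) = 112
print(n_train, n_test)            # 112 38

# stratify keeps the 1:1:1 class ratio; random_state=42 is an arbitrary seed.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.75, test_size=0.25,
    stratify=iris.target, random_state=42)
print(X_train.shape, X_test.shape)
print(np.bincount(y_test))        # 12 or 13 samples per class
```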

#### Instantiating the classifier from the TPOT automated machine learning library ####

In [10]:
tpot = TPOTClassifier(verbosity=2, max_time_mins=2)

![TPOTClassifier](./img/TPOTClassifier.png)

[Click here for details on each parameter](https://epistasislab.github.io/tpot/api/)
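
To see what the classifier automates, here is a deliberately simplified, brute-force stand-in: score a few hand-picked candidate models with 5-fold cross-validated accuracy on the training data and keep the best one. This is not how TPOT works internally (it evolves pipelines with genetic programming over a much larger search space); the candidates and seeds below are illustrative assumptions. As far as we know, TPOTClassifier's defaults are also 5-fold CV with accuracy scoring, which is what its "internal CV score" reports.

```python
# Sketch: brute-force model selection with cross-validation, as a toy
# stand-in for an AutoML search. Candidates and seeds are assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, train_size=0.75, test_size=0.25, random_state=0)

candidates = {
    "GaussianNB": GaussianNB(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "Scaled kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean 5-fold CV accuracy on the training set for each candidate.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the winner on the full training set, then evaluate once on the test set.
best_model = candidates[best_name].fit(X_train, y_train)
print(cv_scores)
print("best:", best_name, "test accuracy:", best_model.score(X_test, y_test))
```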

#### Training the model ####

In [11]:
tpot.fit(X_train, y_train)

Optimization Progress:   0%|          | 0/100 [00:00<?, ?pipeline/s]


Generation 1 - Current best internal CV score: 0.9735177865612649

Generation 2 - Current best internal CV score: 0.9822134387351777

2.01 minutes have elapsed. TPOT will close down.
TPOT closed during evaluation in one generation.


TPOT closed prematurely. Will use the current best pipeline.

Best pipeline: GaussianNB(MultinomialNB(SelectFromModel(input_matrix, criterion=entropy, max_features=0.05, n_estimators=100, threshold=0.30000000000000004), alpha=10.0, fit_prior=False))


TPOTClassifier(max_time_mins=2, verbosity=2)
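
The odd value `threshold=0.30000000000000004` in the best pipeline is not a tuned number: it is 0.3 carrying binary floating-point error. It most likely comes from a hyperparameter grid built in steps of 0.05 (our assumption about TPOT's default configuration). The sketch below reproduces the effect with a hypothetical grid in plain Python:

```python
# Sketch: stepping a grid in increments of 0.05 yields 0.30000000000000004
# instead of 0.3, because 0.05 has no exact binary representation.
import math

grid = [i * 0.05 for i in range(21)]   # hypothetical grid: 0.0, 0.05, ..., 1.0
print(grid[6])             # 0.30000000000000004
print(grid[6] == 0.3)      # False
print(round(grid[6], 2))   # 0.3
```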

#### Evaluating the model ####

In [12]:
print(tpot.score(X_test, y_test))

0.9473684210526315
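
Assuming the default accuracy scoring, this number is simply the fraction of the 38 held-out samples that were classified correctly, so 0.947... corresponds to 36 correct predictions and 2 errors. A small worked example; the label arrays are hypothetical and only chosen to contain two mistakes:

```python
# Sketch: accuracy = fraction of test samples whose predicted class matches
# the true class. The label lists below are hypothetical.
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

reported = 0.9473684210526315
n_test = 38
n_correct = round(reported * n_test)
print(n_correct, n_test - n_correct)    # 36 2

# Two mislabeled samples out of 38 reproduce the reported score.
y_true = [0] * 13 + [1] * 13 + [2] * 12
y_pred = y_true[:36] + [1, 1]           # last two virginica samples predicted as versicolor
print(accuracy(y_true, y_pred))         # 0.9473684210526315
```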


#### Exporting the model ####

In [13]:
import os
os.makedirs('./exp', exist_ok=True)  # export() cannot create missing directories
tpot.export('./exp/tpot_iris_pipeline.py')
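
`export()` writes Python source code that rebuilds and retrains the pipeline. If the goal is instead to reuse the already-fitted model, a common alternative is to serialize the fitted scikit-learn object with `joblib`. The sketch below assumes a plain scikit-learn pipeline as a stand-in for TPOT's fitted pipeline (exposed, as far as we know, as `tpot.fitted_pipeline_`) and writes to a temporary directory instead of `./exp`:

```python
# Sketch: persist and reload a fitted scikit-learn pipeline with joblib.
# The GaussianNB pipeline is a hypothetical stand-in for tpot.fitted_pipeline_.
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

iris = load_iris()
pipeline = make_pipeline(StandardScaler(), GaussianNB()).fit(iris.data, iris.target)

out_dir = tempfile.mkdtemp()                        # stand-in for ./exp
path = os.path.join(out_dir, "iris_pipeline.joblib")
joblib.dump(pipeline, path)                         # serialize the fitted object

restored = joblib.load(path)
same = np.array_equal(restored.predict(iris.data), pipeline.predict(iris.data))
print(path, same)
```

Unlike the exported script, a pickled model is tied to the library versions that created it, so it is best reloaded in the same environment.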

#### Generated code ####