# 🔥 What Is TPOT?
## TPOT (Tree-based Pipeline Optimization Tool) is an AutoML framework that uses genetic programming, a tree-based form of genetic algorithm (GA), to search for the best combination of preprocessing, feature selection, model, and hyperparameters.
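To make the evolutionary search idea concrete, here is a toy sketch of a genetic search over pipelines, written with plain scikit-learn. It is **not** TPOT's implementation. As we understand it, TPOT represents pipelines as trees (so operators can be chained and stacked), builds on the DEAP library, and selects survivors on both CV score and pipeline complexity. The search space, the `evolve` helper, and the mutation/crossover rules below are all hypothetical simplifications. The sketch does keep one property of TPOT 0.12's documented budget: it evaluates `population_size + generations × offspring_size` pipelines, with `offspring_size` equal to `population_size`.

```python
import random

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, tiny search space: an optional scaler plus one model and one hyperparameter.
PREPROCESSORS = {"none": None, "standard": StandardScaler, "minmax": MinMaxScaler}
MODELS = {  # name -> (class, fixed kwargs, tuned hyperparameter, candidate values)
    "logreg": (LogisticRegression, {"max_iter": 1000}, "C", [0.01, 0.1, 1.0, 10.0, 25.0]),
    "knn": (KNeighborsClassifier, {}, "n_neighbors", [1, 3, 5, 11, 21]),
    "tree": (DecisionTreeClassifier, {"random_state": 0}, "max_depth", [1, 2, 3, 5, None]),
}


def random_individual(rng):
    """An individual is a (preprocessor, model, hyperparameter value) tuple."""
    model = rng.choice(list(MODELS))
    return (rng.choice(list(PREPROCESSORS)), model, rng.choice(MODELS[model][3]))


def build_pipeline(individual):
    """Turn an individual into an (unfitted) scikit-learn pipeline."""
    prep, model, value = individual
    cls, fixed, param, _ = MODELS[model]
    steps = [] if PREPROCESSORS[prep] is None else [PREPROCESSORS[prep]()]
    steps.append(cls(**fixed, **{param: value}))
    return make_pipeline(*steps)


def mutate(individual, rng):
    """Re-sample one randomly chosen gene."""
    prep, model, value = individual
    gene = rng.randrange(3)
    if gene == 0:
        prep = rng.choice(list(PREPROCESSORS))
    elif gene == 1:
        model = rng.choice(list(MODELS))
        value = rng.choice(MODELS[model][3])  # keep the hyperparameter valid for the new model
    else:
        value = rng.choice(MODELS[model][3])
    return (prep, model, value)


def crossover(a, b):
    """Child takes the preprocessing step from `a` and the model + hyperparameter from `b`."""
    return (a[0], b[1], b[2])


def evolve(X, y, population_size=6, generations=3, seed=0):
    rng = random.Random(seed)
    evaluations = 0

    def fitness(individual):
        nonlocal evaluations
        evaluations += 1
        return cross_val_score(build_pipeline(individual), X, y, cv=3).mean()

    population = []
    for _ in range(population_size):
        ind = random_individual(rng)
        population.append((ind, fitness(ind)))

    for _ in range(generations):
        offspring = []
        for _ in range(population_size):  # offspring_size == population_size
            parent_a, parent_b = rng.sample(population, 2)
            if rng.random() < 0.5:
                child = crossover(parent_a[0], parent_b[0])
            else:
                child = mutate(parent_a[0], rng)
            offspring.append((child, fitness(child)))
        # (mu + lambda) survival: keep the best individuals among parents and offspring
        population = sorted(population + offspring, key=lambda p: p[1], reverse=True)[:population_size]

    best, best_score = population[0]
    return best, best_score, evaluations


X, y = load_iris(return_X_y=True)
best, best_score, n_evals = evolve(X, y)
print(best, round(best_score, 4), n_evals)  # n_evals is 6 + 3 * 6 = 24
```

With the notebook's settings below (`generations=5`, `population_size=20`), the same formula gives 20 + 5 × 20 = 120 pipeline evaluations.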

### 📌 TPOT Implementation Steps
#### We'll start by trying TPOT on the Iris dataset to keep things simple.

# Classification

In [9]:
# 1️⃣ Install TPOT
# pip install tpot==0.12.2  # the version this notebook was run with
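The outputs in this notebook come from TPOT 0.12.2, while a newer major version (1.0.0) exists, and a new major version may change the API. A small sketch for checking which version is installed before running the cells. `installed_version` is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


tpot_version = installed_version("tpot")
if tpot_version is None:
    print("TPOT is not installed")
elif not tpot_version.startswith("0."):
    print(f"TPOT {tpot_version} found; this notebook was run with 0.12.x")
else:
    print(f"TPOT {tpot_version} found")
```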

In [3]:
# 2️⃣ Import Libraries

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from tpot import TPOTClassifier




In [5]:
# 3️⃣ Load Dataset & Split Data
# Load the Iris dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [6]:
# 4️⃣ Run TPOT to Search for the Best Model
# Create the AutoML model with TPOT
tpot = TPOTClassifier(generations=5, population_size=20, verbosity=2, random_state=42)

# Fit the model to the training data
tpot.fit(X_train, y_train)


Version 0.12.2 of tpot is outdated. Version 1.0.0 was released Wednesday February 26, 2025.




Generation 1 - Current best internal CV score: 0.9833333333333334

Generation 2 - Current best internal CV score: 0.9833333333333334

Generation 3 - Current best internal CV score: 0.9833333333333334

Generation 4 - Current best internal CV score: 0.9833333333333334

Generation 5 - Current best internal CV score: 0.9833333333333334

Best pipeline: LogisticRegression(MultinomialNB(input_matrix, alpha=10.0, fit_prior=False), C=25.0, dual=False, penalty=l2)
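In TPOT's pipeline notation, a classifier nested inside another one (here `MultinomialNB` inside `LogisticRegression`) acts as a feature-generating step. As we understand TPOT 0.12, such an inner model is wrapped in a `StackingEstimator` that adds the inner model's predicted labels (and, for classifiers, class probabilities) as extra columns in front of the original features. The sketch below reproduces that idea with a hypothetical `AppendPredictions` transformer written in plain scikit-learn, so it does not need TPOT. It is an approximation, not TPOT's own class.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


class AppendPredictions(TransformerMixin, BaseEstimator):
    """Hypothetical stand-in for TPOT's StackingEstimator: fit an inner classifier and
    prepend its class probabilities and predicted labels to the original features."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        X = np.asarray(X)
        proba = self.estimator_.predict_proba(X)               # one column per class
        pred = self.estimator_.predict(X).reshape(-1, 1)       # one column of labels
        return np.hstack([proba, pred, X])                     # then the original features


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Hyperparameters copied from the "Best pipeline" line above;
# penalty="l2" and dual=False are LogisticRegression's defaults, so they are not repeated.
pipeline = make_pipeline(
    AppendPredictions(MultinomialNB(alpha=10.0, fit_prior=False)),
    LogisticRegression(C=25.0, max_iter=1000),
)
pipeline.fit(X_train, y_train)
test_accuracy = pipeline.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")
```

`MultinomialNB` only accepts non-negative features, which is why it can appear here: all Iris measurements are positive.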


In [7]:
# 5️⃣ Evaluate the Best Model Found by TPOT
# Check accuracy on the test set (TPOTClassifier's default scoring metric is accuracy)
accuracy = tpot.score(X_test, y_test)
print(f"TPOT accuracy: {accuracy:.4f}")

TPOT accuracy: 1.0000
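The perfect test score is based on a small test set: 20% of Iris's 150 samples is 30 samples, so a single misclassification moves accuracy by 1/30, about 3.3 percentage points. The internal CV score of 0.9833 and the test score of 1.0000 are therefore within one or two samples of each other. A quick check of that granularity, reusing the notebook's split settings:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
_, X_test, _, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

n_test = len(y_test)
step = 1 / n_test  # accuracy change caused by one misclassified test sample
print(n_test, f"{step:.4f}")  # 30 0.0333
```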


In [8]:
# 6️⃣ Save the Best Pipeline Found
# Export the best pipeline as a Python script
tpot.export('best_pipeline_tpot.py')
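`export()` writes the pipeline as Python source code, which is re-fitted when the script runs. The already-fitted model is available as `tpot.fitted_pipeline_`, a regular scikit-learn `Pipeline`, so one option is to save it with joblib. The sketch below uses a stand-in pipeline in place of `tpot.fitted_pipeline_` so that it runs without TPOT. The file name is hypothetical. If the pipeline contains TPOT's own classes (such as its stacking wrapper), TPOT must also be installed when the file is loaded.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Stand-in for tpot.fitted_pipeline_ (any fitted scikit-learn Pipeline)
fitted_pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_train, y_train)

path = os.path.join(tempfile.mkdtemp(), "best_pipeline_tpot.joblib")  # hypothetical file name
joblib.dump(fitted_pipeline, path)

# Later, or in another process: load and predict without re-fitting
restored = joblib.load(path)
same = bool((restored.predict(X_test) == fitted_pipeline.predict(X_test)).all())
print(same)
```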


## 🎯 Conclusion
#### ✅ TPOT automatically searches for a well-performing combination of preprocessing, model, and hyperparameters, judged by internal cross-validation.
#### ✅ TPOT uses genetic programming (an evolutionary, GA-style search) to find the best pipeline among those it evaluates.
#### ✅ The result can be exported to a Python file and used as a starting point for your own code.

# Regression

In [10]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from tpot import TPOTRegressor


In [11]:
# Load the diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Split data (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


In [12]:
# Create the AutoML model with TPOTRegressor
tpot = TPOTRegressor(generations=5, population_size=20, verbosity=2, random_state=42)

# Fit the model to the training data
tpot.fit(X_train, y_train)






Generation 1 - Current best internal CV score: -3125.1907746343336


Generation 2 - Current best internal CV score: -3125.1907746343336

Generation 3 - Current best internal CV score: -3125.1907746343336

Generation 4 - Current best internal CV score: -3125.1907746343336

Generation 5 - Current best internal CV score: -3117.5005043352603

Best pipeline: ElasticNetCV(RidgeCV(input_matrix), l1_ratio=0.75, tol=0.001)
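The internal CV scores above are negative because TPOTRegressor's default scoring function is `neg_mean_squared_error`: scikit-learn negates error metrics so that "greater is better" holds for every score. A CV score of -3117.50 therefore means a cross-validated MSE of about 3117.5. Taking the square root gives an RMSE in the same units as the diabetes target. `neg_mse_to_rmse` below is a hypothetical helper for that conversion.

```python
import math


def neg_mse_to_rmse(neg_mse):
    """Convert a scikit-learn 'neg_mean_squared_error' score into an RMSE."""
    if neg_mse > 0:
        raise ValueError("expected a non-positive negative-MSE score")
    return math.sqrt(-neg_mse)


# CV scores copied from the TPOT output above
for label, score in [("Generation 1", -3125.1907746343336), ("Generation 5", -3117.5005043352603)]:
    print(f"{label}: CV RMSE ≈ {neg_mse_to_rmse(score):.2f}")
```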


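In [13]:
# Check the score on the test set.
# Note: for TPOTRegressor, score() uses the default scoring function,
# neg_mean_squared_error, so this value is a negative MSE, NOT an R² score.
neg_mse = tpot.score(X_test, y_test)
print(f"TPOT test score (negative MSE): {neg_mse:.4f}")


TPOT test score (negative MSE): -2874.6751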
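On a TPOTRegressor, `tpot.score` reports the configured scoring metric (negative MSE by default), not R². To get an actual R², one option is to compute it from predictions with `sklearn.metrics.r2_score`, e.g. `r2_score(y_test, tpot.predict(X_test))`. The sketch below uses a `RidgeCV` model as a stand-in for the TPOT pipeline so that it runs without TPOT, and shows how the two metrics relate: R² = 1 − MSE / Var(y_test).

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Stand-in regressor; with TPOT you would use y_pred = tpot.predict(X_test)
model = RidgeCV().fit(X_train, y_train)
y_pred = model.predict(X_test)

r2 = r2_score(y_test, y_pred)             # 1 - SS_res / SS_tot
mse = mean_squared_error(y_test, y_pred)  # what neg_mean_squared_error negates
print(f"R²: {r2:.4f}  MSE: {mse:.2f}  neg MSE: {-mse:.2f}")
```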


In [14]:
# Export the best pipeline as a Python script
tpot.export('best_pipeline_tpot_regression.py')
