# Material Overview

Briefly explain, in your own words, the differences between grid search, randomized search, Bayesian search CV, and Optuna.

source: https://www.youtube.com/watch?v=t-INgABWULw

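The older methods (grid and randomized search) spend much of their budget on uninformed trials: grid search exhaustively evaluates every combination in the grid, while randomized search evaluates a fixed number of randomly sampled combinations. The newer methods (Bayesian search and Optuna) use the results of earlier trials to decide which hyperparameters to try next, so they typically reach a good configuration in fewer trials. Optuna, whose default sampler (TPE) is itself a Bayesian-style method, is usually a strong choice for efficiency, though no method guarantees the highest score on every problem.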
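To make the difference between the two older methods concrete, here is a minimal sketch that counts how many configurations each one evaluates. The search space values are illustrative assumptions, not taken from this notebook, and the data is loaded with `sklearn.datasets.load_iris` (instead of seaborn) so the block runs without a download.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Illustrative search space (assumed values, not from the notebook)
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [2, 4, 8],
    "min_samples_leaf": [1, 4],
}

# Grid search: evaluates every combination, here 2 * 3 * 2 = 12
grid = GridSearchCV(
    RandomForestClassifier(random_state=69),
    param_grid, cv=3, scoring="accuracy",
)
grid.fit(X, y)

# Randomized search: evaluates only n_iter randomly sampled combinations
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=69),
    param_grid, n_iter=5, cv=3, scoring="accuracy", random_state=69,
)
rand.fit(X, y)

n_grid = len(grid.cv_results_["params"])
n_rand = len(rand.cv_results_["params"])
print(f"Grid search tried {n_grid} configurations")
print(f"Randomized search tried {n_rand} configurations")
```

The grid's cost grows multiplicatively with every added hyperparameter value, while the randomized search cost is fixed by `n_iter`. Neither one uses earlier scores to guide later choices, which is what Bayesian-style methods add.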

# Import Data & Libraries

In [1]:
# run this only once
!pip install optuna -q


In [4]:
# import the required libraries here
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, classification_report
import optuna

In [5]:
df = sns.load_dataset('iris')
df.head()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


# Data Preprocessing

In [7]:
# convert the categorical variable to numeric
le = LabelEncoder()
df['species'] = le.fit_transform(df['species'])

In [8]:
# separate the features (X) from the target (y)
X = df.drop(['species'], axis=1)
y = df['species']

# Dataset Splitting

In [9]:
# split with an 80:20 ratio
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=69)

# Base Model Random Forest

In [11]:
# use a random forest classifier
rfr = RandomForestClassifier(random_state=69)
rfr.fit(X_train, y_train)

In [12]:
y_pred = rfr.predict(X_test)

In [13]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 0.967
Precision: 0.970
Recall: 0.967
F1 Score: 0.967
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.89      1.00      0.94         8
           2       1.00      0.92      0.96        12

    accuracy                           0.97        30
   macro avg       0.96      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30



# Optuna

In [14]:
def objective(trial):
    n_estimators = trial.suggest_int('n_estimators', 100, 1000)
    max_depth = trial.suggest_int('max_depth', 2, 50)
    min_samples_split = trial.suggest_int('min_samples_split', 2, 32)
    min_samples_leaf = trial.suggest_int('min_samples_leaf', 1, 32)

    model = RandomForestClassifier(n_estimators=n_estimators,
                                   max_depth=max_depth,
                                   min_samples_split=min_samples_split,
                                   min_samples_leaf=min_samples_leaf,
                                   random_state=69)

    score = cross_val_score(model, X_train, y_train, cv=5, scoring='accuracy')

    return score.mean()

The hyperparameters to tune depend on the algorithm used. Here we use Random Forest, so the ones we tune are *n_estimators*, *max_depth*, *min_samples_split*, and *min_samples_leaf*.

In [15]:
study = optuna.create_study(direction='maximize')

[I 2025-09-29 14:03:48,409] A new study created in memory with name: no-name-1bd6bc0c-1f50-448a-b9aa-73ab77ee6e09


In [None]:
study.optimize(objective, n_trials=100)

[I 2025-09-29 14:04:56,386] Trial 0 finished with value: 0.8916666666666668 and parameters: {'n_estimators': 385, 'max_depth': 8, 'min_samples_split': 5, 'min_samples_leaf': 23}. Best is trial 0 with value: 0.8916666666666668.
[I 2025-09-29 14:04:59,354] Trial 1 finished with value: 0.9 and parameters: {'n_estimators': 366, 'max_depth': 42, 'min_samples_split': 27, 'min_samples_leaf': 22}. Best is trial 1 with value: 0.9.
[I 2025-09-29 14:05:03,035] Trial 2 finished with value: 0.9583333333333334 and parameters: {'n_estimators': 473, 'max_depth': 27, 'min_samples_split': 25, 'min_samples_leaf': 1}. Best is trial 2 with value: 0.9583333333333334.
[I 2025-09-29 14:05:06,505] Trial 3 finished with value: 0.9583333333333334 and parameters: {'n_estimators': 449, 'max_depth': 10, 'min_samples_split': 13, 'min_samples_leaf': 9}. Best is trial 2 with value: 0.9583333333333334.
[I 2025-09-29 14:05:08,742] Trial 4 finished with value: 0.9583333333333334 and parameters: {'n_estimators': 191, 'max

This may take a while, so just wait and see ^^
<br>
The source recommends setting n_trials to 100, since there seems to be no significant score increase after 100 trials. Running more trials is also inefficient, because the wait becomes quite long.

In [17]:
study.best_params

{'n_estimators': 473,
 'max_depth': 27,
 'min_samples_split': 25,
 'min_samples_leaf': 1}

Below are the hyperparameter tuning results from Optuna.

In [18]:
# check the hyperparameter tuning results from Optuna
print(f"Best Hyperparameters: {study.best_params}")

Best Hyperparameters: {'n_estimators': 473, 'max_depth': 27, 'min_samples_split': 25, 'min_samples_leaf': 1}


# Random Forest Using Optuna

In [19]:
# store the best hyperparameters in a new variable
best_params = study.best_params

In [20]:
best_model = RandomForestClassifier(**best_params, random_state=69)

best_model.fit(X_train, y_train)

In [22]:
y_pred = best_model.predict(X_test)

In [21]:
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print(f"Precision: {precision_score(y_test, y_pred, average='weighted'):.3f}")
print(f"Recall: {recall_score(y_test, y_pred, average='weighted'):.3f}")
print(f"F1 Score: {f1_score(y_test, y_pred, average='weighted'):.3f}")
print(classification_report(y_test, y_pred))

Accuracy: 0.967
Precision: 0.970
Recall: 0.967
F1 Score: 0.967
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        10
           1       0.89      1.00      0.94         8
           2       1.00      0.92      0.96        12

    accuracy                           0.97        30
   macro avg       0.96      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30



The scores do not improve over the model before Optuna, because the base model's score was already good: its accuracy of 0.967 means 29 of the 30 test samples were already classified correctly.
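With only 30 test samples, a single prediction changes accuracy by about 3.3 percentage points, so small differences between the two models are hard to see on one split. A rough alternative is to compare them with cross-validation, as in the sketch below. It assumes the best hyperparameters reported above and uses `load_iris` with a shuffled 5-fold split, so the folds differ from the notebook's train/test split. Because the tuning already saw part of this data, the tuned score may be slightly optimistic.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Best hyperparameters as reported by the study in this notebook
best_params = {
    "n_estimators": 473,
    "max_depth": 27,
    "min_samples_split": 25,
    "min_samples_leaf": 1,
}

# Same folds for both models so the comparison is paired
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=69)

base_scores = cross_val_score(
    RandomForestClassifier(random_state=69), X, y, cv=cv, scoring="accuracy"
)
tuned_scores = cross_val_score(
    RandomForestClassifier(**best_params, random_state=69),
    X, y, cv=cv, scoring="accuracy",
)

print(f"Base : {base_scores.mean():.3f} +/- {base_scores.std():.3f}")
print(f"Tuned: {tuned_scores.mean():.3f} +/- {tuned_scores.std():.3f}")
```

If the two means differ by less than the spread across folds, the tuning has not produced a measurable gain on this dataset.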