### Automated Machine Learning
#### Laboratory 5

The [`Optuna`](https://optuna.readthedocs.io/en/stable/index.html) package

In [None]:
# Install Optuna, an automatic hyperparameter optimization framework (uncomment the line below to run it).
# !pip install optuna

In [None]:
# Import the core Optuna library
import optuna
# Import the Gaussian Process sampler (GPSampler) to compare with the default TPE sampler.
# Note: GPSampler additionally requires `scipy` and `torch` to be installed.
from optuna.samplers import GPSampler
# Import basic ML components from scikit-learn for the HPO demo
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

----
#### *Task 1*
----

For a dataset of your choice (apply initial preprocessing first), prepare an *objective function* that will serve as the input to the optimization in Optuna.
Use an SVM model and optimize its *C* and *kernel* hyperparameters.

(*) Also take into account other hyperparameters that depend on the chosen value of *kernel*.

In [None]:
# Define the Objective Function
# This function takes a `trial` object and returns the score to be optimized.
def objective(trial):
    
    # Load the data 
    X, y = ...

    # === Define the Hyperparameter Search Space ===

    # Suggest a value for the C parameter 
    svc_c = trial.suggest_float(...)

    # Suggest a kernel type using categorical sampling
    kernel = trial.suggest_categorical(...)
    
    params = {...}

    # === Create model and evaluate it ===
    model = ...

    # Calculate the cross-validation score (accuracy) across 3 folds
    # The mean of the scores is returned for maximization.
    score = cross_val_score(model, X, y, n_jobs=-1, cv=3)
    accuracy = score.mean()

    return accuracy



----
#### *Task 2*
----

Run the optimization for the function from *Task 1*:

a) using the default TPE sampler

b) using GPSampler (Gaussian Process Sampler)

Report which configuration is the best and what metric value it achieved.

In [None]:
# Create a Study and Run the Optimization using the default TPE Sampler
# A "study" is an optimization session. `direction="maximize"` specifies that we want to find the highest score.
study_tpe = optuna.create_study(...)


# Start the optimization. `n_trials` is the total number of parameter combinations (trials) to test.
study_tpe.optimize(objective, n_trials=100)

# Review the results from the TPE Sampler
print("Optimization finished.")
print(f"Number of finished trials: {len(study_tpe.trials)}")

print("Best trial:")
best_trial = ...
print(f"  Value (Accuracy): {best_trial.value:.4f}")

print("  Best parameters:")


In [None]:
# Create a study using the Gaussian Process Sampler (GPSampler)



### Pruning (Multi-Fidelity HPO)

Pruning stops unpromising trials early, based on the intermediate results each trial reports during training, so less compute is spent on configurations that are unlikely to win. The **SuccessiveHalvingPruner** is used here.
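To build intuition for successive halving, here is a simplified, synchronous toy version in plain Python. It is not Optuna's implementation: Optuna's pruner uses an asynchronous variant and decides per trial, whereas this sketch evaluates all candidates in lockstep. The function names, the toy score, and the reduction factor are assumptions made for illustration.

```python
def toy_score(alpha, budget):
    # Hypothetical learning curve: quality peaks at alpha = 0.3 and
    # improves towards its final value as the budget grows.
    return (1.0 - abs(alpha - 0.3)) * budget / (budget + 1)


def successive_halving(configs, evaluate, min_resource=1, reduction_factor=3):
    survivors = list(configs)
    resource = min_resource
    while len(survivors) > 1:
        # Score every surviving configuration with the current budget.
        ranked = sorted(survivors, key=lambda c: evaluate(c, resource), reverse=True)
        # Keep only the top 1/reduction_factor of them (at least one).
        survivors = ranked[: max(1, len(ranked) // reduction_factor)]
        # Give the survivors a larger budget in the next round.
        resource *= reduction_factor
    return survivors[0]


candidates = [i / 10 for i in range(9)]  # alpha values 0.0, 0.1, ..., 0.8
best_alpha = successive_halving(candidates, toy_score)
print(best_alpha)
```

Nine candidates are cut to three after the first round and to one after the second, so most of the budget goes to the configurations that looked best early on.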

In [None]:
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

In [None]:
# Initialize the Successive Halving Pruner
pruner = optuna.pruners.SuccessiveHalvingPruner()

# Create a study with the Pruner enabled
study_prunned = optuna.create_study(direction="maximize", pruner=pruner)


In [None]:
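# Load the data and hold out a validation split for the intermediate scores
X, y = load_iris(return_X_y=True)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
# partial_fit needs the full list of class labels on its first call
classes = np.unique(y)

# Note: this redefines `objective` from Task 1
def objective(trial):
    # Regularization strength of the linear model
    alpha = trial.suggest_float("alpha", 0.0, 1.0)
    clf = SGDClassifier(alpha=alpha)
    n_train_iter = 100

    for step in range(n_train_iter):
        # Train incrementally: one pass over the training data per step
        clf.partial_fit(X_train, y_train, classes=classes)

        # Report the validation accuracy so the pruner can compare trials
        intermediate_value = clf.score(X_valid, y_valid)
        trial.report(intermediate_value, step)

        # Stop this trial early if the pruner judges it unpromising
        if trial.should_prune():
            raise optuna.TrialPruned()

    return clf.score(X_valid, y_valid)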

In [None]:


# Run the optimization, enabling early stopping for poor-performing trials
study_prunned.optimize(objective, n_trials=100)

# Print the results from the study with pruning
print("\nResults from Optimization with pruning:")
print(f"  Best Accuracy: {study_prunned.best_value:.4f}")
print(f"  Best Parameters: {study_prunned.best_params}")

In [None]:
# Calculate and display the parameter importance from the optimized study
optuna.importance.get_param_importances(study_tpe)

In [None]:
# Import plotly for visualisation
from plotly.io import show

# Plot the relationship between two parameters as a contour plot.
# The names in `params` must match the names passed to `trial.suggest_*` in the objective.
contour_tpe = optuna.visualization.plot_contour(study_tpe, params=["kernel", "C"])
show(contour_tpe)

# Plot optimization history of all trials in a study.
history_tpe = optuna.visualization.plot_optimization_history(study_tpe)
show(history_tpe)

# Plot the high-dimensional parameter relationships in a study.
parallel_tpe = optuna.visualization.plot_parallel_coordinate(study_tpe)
show(parallel_tpe)