
In [1]:
!pip install datasets
!pip install optuna

Collecting datasets
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess<0.70.17 (from datasets)
  Downloading multiprocess-0.70.16-py311-none-any.whl.metadata (7.2 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.2.0-py3-none-any.whl (480 kB)
Downloading dill-0.3.8-py3-none-any.whl (116 kB)
Downloading fsspec-2024.9.0-py3-none-any.whl

In [2]:
import torch
import pandas as pd
import numpy as np
import json
import os
import warnings
import random
from accelerate import Accelerator
from transformers import BertTokenizerFast, EarlyStoppingCallback, AutoModelForSequenceClassification, Trainer, TrainingArguments
from matplotlib import pyplot as plt
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from datasets import Dataset
import optuna


In [3]:
torch.cuda.is_available()

True

In [4]:
warnings.filterwarnings('ignore')

### Load data
1. Load the raw CSV file
2. Drop unused columns and standardize the rating (the regression target); the review text is the input
3. Split into train and test sets (90% / 10%)
4. Load the tokenizer
5. Load the pre-trained model

In [5]:
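# Load the raw reviews and drop the columns that are not used as input or target.
df = pd.read_csv('./cvs_products.csv')
df1 = df.drop(['product', 'store', 'link', 'CVS'], axis=1)

# Standardize the rating (regression target) to zero mean and unit variance.
# Note: the scaler is fit on all rows, including those later held out for testing.
scaler = StandardScaler()
df1['rating_standard'] = scaler.fit_transform(df['rating'].values.reshape(-1, 1)).ravel()

# Hold out 10% of the reviews for evaluation.
X_train, X_test, y_train, y_test = train_test_split(df1['review'], df1['rating_standard'], test_size=0.1, random_state=42)
train_df = pd.DataFrame({'text': X_train, 'label': y_train.astype(float)})
test_df = pd.DataFrame({'text': X_test, 'label': y_test.astype(float)})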

train_dataset = Dataset.from_pandas(train_df)
test_dataset = Dataset.from_pandas(test_df)
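
Because the model is trained on standardized ratings, its predictions and the reported MSE are in standard-deviation units rather than rating points. The sketch below uses plain Python and made-up ratings to show the arithmetic for mapping values back to the original scale. The helper names are illustrative and not part of the notebook; with the fitted `StandardScaler` above, the equivalent pieces would be `scaler.inverse_transform(...)` and `scaler.scale_`.

```python
from statistics import fmean, pstdev


def fit_standardizer(values):
    """Return (mean, std) using the population std, as StandardScaler does."""
    return fmean(values), pstdev(values)


def standardize(values, mean, std):
    """Map raw ratings to zero-mean, unit-variance values."""
    return [(v - mean) / std for v in values]


def destandardize(values, mean, std):
    """Map standardized values (e.g. model predictions) back to ratings."""
    return [v * std + mean for v in values]


def mse_in_rating_units(mse_standardized, std):
    """An MSE measured on standardized labels scales by std**2."""
    return mse_standardized * std ** 2


ratings = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up ratings, for illustration only
mean, std = fit_standardizer(ratings)
scaled = standardize(ratings, mean, std)
restored = destandardize(scaled, mean, std)
```

The same conversion applies to the validation losses logged by the studies: multiplying a logged MSE by the squared standard deviation of the raw ratings expresses it in squared rating points.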

In [6]:
def preprocess_function(examples, tokenizer):
    return tokenizer(examples['text'], padding='max_length', truncation=True, max_length=256)

google_bert_tokenizer = BertTokenizerFast.from_pretrained('google-bert/bert-base-chinese')
google_bert_train_dataset = train_dataset.map(lambda x: preprocess_function(x, google_bert_tokenizer), batched=True)
google_bert_test_dataset = test_dataset.map(lambda x: preprocess_function(x, google_bert_tokenizer), batched=True)


Map:   0%|          | 0/8656 [00:00<?, ? examples/s]

Map:   0%|          | 0/962 [00:00<?, ? examples/s]

In [7]:
def compute_metrics(p):
    preds, labels = p
    preds = preds.squeeze()
    mse = mean_squared_error(labels, preds)
    r2 = r2_score(labels, preds)
    return {"mse": mse, "r2": r2}
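
To help read the metrics this function reports, here is a small plain-Python sketch of MSE and R2 with made-up labels. It shows that a model predicting a constant equal to the label mean gets R2 = 0 and an MSE equal to the label variance. Several trials in the studies below report a validation MSE of about 0.979 with R2 near 0, which would be consistent with the model collapsing to a near-constant output; that reading is an interpretation, not something the notebook states.

```python
def mse(labels, preds):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(labels, preds)) / len(labels)


def r2(labels, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(labels) / len(labels)
    ss_res = sum((y - p) ** 2 for y, p in zip(labels, preds))
    ss_tot = sum((y - mean) ** 2 for y in labels)
    return 1.0 - ss_res / ss_tot


def label_variance_from(mse_value, r2_value):
    """Since R2 = 1 - MSE / Var(labels), the variance can be recovered."""
    return mse_value / (1.0 - r2_value)


labels = [-1.5, -0.5, 0.0, 0.5, 1.5]  # made-up standardized labels
label_mean = sum(labels) / len(labels)
constant_preds = [label_mean] * len(labels)

baseline_mse = mse(labels, constant_preds)  # equals the label variance
baseline_r2 = r2(labels, constant_preds)    # zero for the mean predictor

# Epoch-3 metrics of trial 0 in the first study, copied from its log.
implied_variance = label_variance_from(0.401035, 0.590313)
```

The implied test-label variance comes out close to 0.979, which matches the MSE at which the collapsed trials plateau.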

In [9]:
checkpoint_path = './google_bert'
google_bert_model = AutoModelForSequenceClassification.from_pretrained(checkpoint_path, config=f'{checkpoint_path}/config.json')


**Optuna** hyperparameter tuning

1. **Hyperparameter Selection for Fine-Tuning** (search space of the first two studies; the `objective` cell below shows the ranges used for the third study):
   - **Learning Rate**: Sampled from a log-uniform distribution between `1e-5` and `1e-3`.
   - **Dropout Rate**: Sampled uniformly between `0.1` and `0.5`.
   - **Batch Size**: Chosen from `32` and `64`.
   - **Weight Decay**: Sampled from a log-uniform distribution between `0.01` and `0.1`.

   These hyperparameters directly affect training, and Optuna explores different combinations to find the one with the lowest validation loss.

2. **Comparing with Pre-Trained Model Config**:
   - **Warm-up Steps**: The pre-trained model's training arguments used `500` warm-up steps. Here they are reduced to `300` to shorten each trial.
   - **Number of Epochs**: Reduced to `3` in the first two studies so that each sampled configuration can be evaluated quickly; the third study uses `5`.

3. **Accelerating Training with Mixed Precision**:
   - **Mixed Precision (FP16)**: Enabled through `fp16=True` in `TrainingArguments` to reduce GPU memory usage and speed up training. An `Accelerator(mixed_precision="fp16")` object is also created in the objective, but it is not passed to the `Trainer`.

4. **Optuna for Hyperparameter Optimization**:
   - **Study Creation**: A study is created using Optuna’s `create_study` function with the objective of minimizing the validation loss.
   - **Optimization Process**: The `study.optimize` method is used to run the trials (in this case, 10 trials), each with a unique set of hyperparameters, to optimize the model.
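
As an illustration of what "log-uniform" means for the learning rate and weight decay, the sketch below draws samples whose logarithm is uniformly distributed. It is a plain-Python stand-in for the distribution, not Optuna's sampler; `sample_loguniform` is a hypothetical helper. In current Optuna releases the non-deprecated spelling of `suggest_loguniform(name, low, high)` is `trial.suggest_float(name, low, high, log=True)`.

```python
import math
import random


def sample_loguniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
samples = [sample_loguniform(rng, 1e-5, 1e-3) for _ in range(10_000)]

# The geometric midpoint of [1e-5, 1e-3] is 1e-4, so roughly half of the
# samples fall below it; a plain uniform draw would put only ~9% there.
share_below_1e4 = sum(s < 1e-4 for s in samples) / len(samples)
```

This is why a log scale suits learning rates: each order of magnitude in the range gets about the same number of trials.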


In [12]:
def objective(trial, model, model_name, train_dataset, eval_dataset):

    # Define hyperparameters to tune
    learning_rate = trial.suggest_loguniform('learning_rate', 1e-6, 1e-4)
    dropout = trial.suggest_uniform('dropout', 0.2, 0.5)
    batch_size = trial.suggest_categorical('batch_size', [16, 32, 64])
    weight_decay = trial.suggest_loguniform('weight_decay', 0.01, 0.1)

    # Set dropout in model config (dropout is not set in training_args)
    model.config.attention_probs_dropout_prob = dropout
    model.config.hidden_dropout_prob = dropout

    # Set training_args based on fine-tune hyperparameters
    training_args = TrainingArguments(
        output_dir=f'./results/{model_name}_trial_{trial.number}',
        logging_dir=f'./logs/{model_name}_trial_{trial.number}',
        logging_steps=100,
        save_strategy='epoch',
        evaluation_strategy='epoch',
        warmup_steps=300,
        num_train_epochs=5,
        learning_rate=learning_rate,
        per_device_train_batch_size=batch_size,
        per_device_eval_batch_size=batch_size,
        weight_decay=weight_decay,
        load_best_model_at_end=True,
        fp16=True,
        seed=42,
    )

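    # Note: this Accelerator is not passed to the Trainer; mixed precision
    # during training is controlled by fp16=True in TrainingArguments above.
    accelerator = Accelerator(mixed_precision="fp16")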

    # Initialize Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=2)],
    )

    trainer.train()

    # Evaluation
    eval_result = trainer.evaluate()

    return eval_result['eval_loss']
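
One thing worth checking in the objective above: the same `model` object is passed to every trial, so each trial appears to continue from the weights left by the previous one. The epoch-1 training loss in the first study falls from 0.2104 in trial 0 to 0.0139 in trial 6, which would fit that reading. In addition, BERT builds its dropout layers when the model is constructed, so changing `model.config` afterwards is not expected to change the dropout actually applied. The sketch below shows one possible way to give each trial a fresh model with the sampled dropout. The helper names are hypothetical, and it assumes the checkpoint directory holds both the config and the weights.

```python
def dropout_overrides(dropout):
    """Config attributes to override so dropout is set at construction time."""
    return {
        "hidden_dropout_prob": dropout,
        "attention_probs_dropout_prob": dropout,
    }


def load_fresh_model(checkpoint_path, dropout):
    """Reload the checkpoint so every trial starts from the same weights."""
    # Imported lazily so dropout_overrides can be used without transformers.
    from transformers import AutoConfig, AutoModelForSequenceClassification

    # Keyword arguments that match config attributes override the stored values.
    config = AutoConfig.from_pretrained(checkpoint_path, **dropout_overrides(dropout))
    return AutoModelForSequenceClassification.from_pretrained(checkpoint_path, config=config)
```

Inside `objective`, a call such as `model = load_fresh_model('./google_bert', dropout)` would then replace the shared `model` argument, at the cost of reloading the checkpoint once per trial.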


#### First study:
10 trials based on the pre-trained Google BERT model; best trial: trial 0, MSE 0.40103545784950256

Hyperparameters:
```
learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-3)
dropout = trial.suggest_uniform('dropout', 0.1, 0.5)
batch_size = trial.suggest_categorical('batch_size', [32, 64])
weight_decay = trial.suggest_loguniform('weight_decay', 0.01, 0.1)
n_epoch = 3

```


In [None]:
study = optuna.create_study(
    study_name="study0205",
    direction='minimize',
    storage='sqlite:////content/optuna_db/study.db',
    load_if_exists=True
)

study.optimize(
    lambda trial: objective(
        trial,
        google_bert_model,
        "google-bert-base_optuna",
        google_bert_train_dataset,
        google_bert_test_dataset
    ),
    n_trials=10
)

[I 2025-02-05 07:44:44,231] A new study created in RDB with name: study0205



Epoch,Training Loss,Validation Loss,Mse,R2
1,0.2104,0.474196,0.474196,0.515574
2,0.1738,0.425345,0.425345,0.565479
3,0.1236,0.401035,0.401035,0.590313


[I 2025-02-05 07:51:20,057] Trial 0 finished with value: 0.40103545784950256 and parameters: {'learning_rate': 4.8640483052755485e-05, 'dropout': 0.16892734772481646, 'batch_size': 32, 'weight_decay': 0.03068760271236806}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.1574,0.517342,0.517342,0.471497
2,0.3777,0.510313,0.510313,0.478678
3,0.1863,0.453165,0.453165,0.537058


[I 2025-02-05 07:57:13,284] Trial 1 finished with value: 0.4531651735305786 and parameters: {'learning_rate': 0.00014515455816385876, 'dropout': 0.18264516830092262, 'batch_size': 32, 'weight_decay': 0.053120852070534345}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.135,0.485826,0.485826,0.503693
2,0.1374,0.490336,0.490336,0.499085
3,0.0941,0.42888,0.42888,0.561867


[I 2025-02-05 08:03:01,179] Trial 2 finished with value: 0.42888039350509644 and parameters: {'learning_rate': 7.348590114714911e-05, 'dropout': 0.2581720212322267, 'batch_size': 32, 'weight_decay': 0.025390044596121965}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.084,0.43056,0.43056,0.560151
2,0.0748,0.495935,0.495935,0.493365
3,0.0749,0.454899,0.454899,0.535287


[I 2025-02-05 08:09:20,963] Trial 3 finished with value: 0.4305596649646759 and parameters: {'learning_rate': 6.532192802264272e-05, 'dropout': 0.24747800175688678, 'batch_size': 64, 'weight_decay': 0.04558766627001453}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0532,0.431281,0.431281,0.559415
2,0.0684,0.475355,0.475355,0.51439
3,0.0472,0.426706,0.426706,0.564088


[I 2025-02-05 08:16:26,764] Trial 4 finished with value: 0.4267061650753021 and parameters: {'learning_rate': 4.088754286650167e-05, 'dropout': 0.2259343393037393, 'batch_size': 32, 'weight_decay': 0.018632751644231563}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0287,0.421922,0.421922,0.568975
2,0.037,0.422618,0.422618,0.568264
3,0.0265,0.431961,0.431961,0.55872


[I 2025-02-05 08:23:58,730] Trial 5 finished with value: 0.4219222664833069 and parameters: {'learning_rate': 2.066688805604789e-05, 'dropout': 0.21981233907657002, 'batch_size': 32, 'weight_decay': 0.04920605659586631}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0139,0.440215,0.440215,0.550288
2,0.0534,0.443735,0.443735,0.546692
3,0.0418,0.436182,0.436182,0.554408


[I 2025-02-05 08:31:00,651] Trial 6 finished with value: 0.436181902885437 and parameters: {'learning_rate': 4.7213605898642805e-05, 'dropout': 0.11843898810491744, 'batch_size': 32, 'weight_decay': 0.018615062441723042}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0504,0.463146,0.463146,0.526862
2,0.1198,1.424842,1.424842,-0.455582
3,0.4836,0.499042,0.499042,0.490192


[I 2025-02-05 08:38:51,384] Trial 7 finished with value: 0.4631456732749939 and parameters: {'learning_rate': 0.0002799091545763998, 'dropout': 0.3476373045475534, 'batch_size': 64, 'weight_decay': 0.019870359058480377}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0973,0.609962,0.609962,0.376879
2,0.4796,1.001441,1.001441,-0.023046
3,1.0925,0.978949,0.978949,-6.9e-05


[I 2025-02-05 08:46:05,532] Trial 8 finished with value: 0.6099615097045898 and parameters: {'learning_rate': 0.0006227600324999825, 'dropout': 0.11189933826384837, 'batch_size': 64, 'weight_decay': 0.012148360811198815}. Best is trial 0 with value: 0.40103545784950256.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.179,1.08387,1.08387,-0.107254
2,0.883,0.777609,0.777609,0.205615
3,0.5855,0.64933,0.64933,0.336661


[I 2025-02-05 08:53:05,210] Trial 9 finished with value: 0.6493299603462219 and parameters: {'learning_rate': 0.00031823888380612815, 'dropout': 0.1448191125732774, 'batch_size': 32, 'weight_decay': 0.03184021249848639}. Best is trial 0 with value: 0.40103545784950256.
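
A quick way to read the ten results above is to group them by learning rate. The sketch below copies the `(learning_rate, value)` pairs from the log (rounded) and splits them at `1e-4`, a threshold chosen here for illustration. With only ten trials, and other hyperparameters varying at the same time, the pattern is indicative rather than conclusive.

```python
# (learning_rate, best validation MSE) for trials 0-9, rounded from the log above.
first_study = [
    (4.864e-05, 0.401035),
    (1.452e-04, 0.453165),
    (7.349e-05, 0.428880),
    (6.532e-05, 0.430560),
    (4.089e-05, 0.426706),
    (2.067e-05, 0.421922),
    (4.721e-05, 0.436182),
    (2.799e-04, 0.463146),
    (6.228e-04, 0.609962),
    (3.182e-04, 0.649330),
]

threshold = 1e-4  # illustrative split point, not a value from the notebook
low_lr = [value for lr, value in first_study if lr <= threshold]
high_lr = [value for lr, value in first_study if lr > threshold]

worst_low = max(low_lr)   # worst result among the smaller learning rates
best_high = min(high_lr)  # best result among the larger learning rates
print(f"lr <= 1e-4: {len(low_lr)} trials, worst MSE {worst_low:.4f}")
print(f"lr >  1e-4: {len(high_lr)} trials, best MSE {best_high:.4f}")
```

In this study every trial with a learning rate above `1e-4` scored worse than every trial at or below it, which fits the lower upper bound used for the learning rate in the third study.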


In [None]:
study_load = optuna.load_study(
    study_name="study0205",
    storage="sqlite:////content/optuna_db/study.db"
)

# Report on the study that was just loaded from storage.
print(f"Study name: {study_load.study_name}")
print(f"Study direction: {study_load.direction}")
print(f"Number of trials: {len(study_load.trials)}")

Study name: study0205
Study direction: 1
Number of trials: 10


In [None]:
for trial in study_load.trials:
    print(f"Trial {trial.number}:")
    print(f"  Value: {trial.value}")
    print(f"  Params: {trial.params}")
    print(f"  State: {trial.state}")
    print()

Trial 0:
  Value: 0.40103545784950256
  Params: {'learning_rate': 4.8640483052755485e-05, 'dropout': 0.16892734772481646, 'batch_size': 32, 'weight_decay': 0.03068760271236806}
  State: 1

Trial 1:
  Value: 0.4531651735305786
  Params: {'learning_rate': 0.00014515455816385876, 'dropout': 0.18264516830092262, 'batch_size': 32, 'weight_decay': 0.053120852070534345}
  State: 1

Trial 2:
  Value: 0.42888039350509644
  Params: {'learning_rate': 7.348590114714911e-05, 'dropout': 0.2581720212322267, 'batch_size': 32, 'weight_decay': 0.025390044596121965}
  State: 1

Trial 3:
  Value: 0.4305596649646759
  Params: {'learning_rate': 6.532192802264272e-05, 'dropout': 0.24747800175688678, 'batch_size': 64, 'weight_decay': 0.04558766627001453}
  State: 1

Trial 4:
  Value: 0.4267061650753021
  Params: {'learning_rate': 4.088754286650167e-05, 'dropout': 0.2259343393037393, 'batch_size': 32, 'weight_decay': 0.018632751644231563}
  State: 1

Trial 5:
  Value: 0.4219222664833069
  Params: {'learning_ra

#### Second study:
10 trials. Hyperparameters:

```
learning_rate = trial.suggest_loguniform('learning_rate', 1e-5, 1e-3)
dropout = trial.suggest_uniform('dropout', 0.1, 0.5)
batch_size = trial.suggest_categorical('batch_size', [32, 64])
weight_decay = trial.suggest_loguniform('weight_decay', 0.01, 0.1)
n_epoch = 3

```

Best model: trial 2, MSE 0.405130535364151

In [11]:
study1 = optuna.create_study(
    study_name="study0206",
    direction='minimize',
    storage='sqlite:////content/optuna_db/study1.db',
    load_if_exists=True
)

study1.optimize(
    lambda trial: objective(
        trial,
        google_bert_model,
        "google-bert-base_optuna",
        google_bert_train_dataset,
        google_bert_test_dataset
    ),
    n_trials=10
)

[I 2025-02-06 06:29:19,554] A new study created in RDB with name: study0206



Epoch,Training Loss,Validation Loss,Mse,R2
1,0.2134,0.417236,0.417236,0.573763
2,0.1553,0.416259,0.416259,0.574761
3,0.111,0.406589,0.406589,0.58464


[I 2025-02-06 06:35:51,414] Trial 0 finished with value: 0.406588613986969 and parameters: {'learning_rate': 3.303861972058671e-05, 'dropout': 0.478388549159093, 'batch_size': 32, 'weight_decay': 0.03317830696087596}. Best is trial 0 with value: 0.406588613986969.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.2094,0.742512,0.742512,0.241469
2,0.7144,1.032709,1.032709,-0.054989
3,1.0704,0.979327,0.979327,-0.000455


[I 2025-02-06 06:41:12,776] Trial 1 finished with value: 0.7425116896629333 and parameters: {'learning_rate': 0.000595593360102189, 'dropout': 0.34243982912058724, 'batch_size': 64, 'weight_decay': 0.013348753001201176}. Best is trial 0 with value: 0.406588613986969.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.2821,0.452216,0.452216,0.538028
2,0.1669,0.475759,0.475759,0.513977
3,0.1169,0.405131,0.405131,0.586129


[I 2025-02-06 06:47:05,763] Trial 2 finished with value: 0.405130535364151 and parameters: {'learning_rate': 6.409999442818139e-05, 'dropout': 0.4581814267188169, 'batch_size': 32, 'weight_decay': 0.011672927754269086}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0867,0.418369,0.418369,0.572605
2,0.0999,0.524108,0.524108,0.464585
3,0.122,0.430482,0.430482,0.56023


[I 2025-02-06 06:52:26,231] Trial 3 finished with value: 0.418368935585022 and parameters: {'learning_rate': 8.752208511407435e-05, 'dropout': 0.3027074991711967, 'batch_size': 64, 'weight_decay': 0.026416166071885613}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0628,0.427291,0.427291,0.563491
2,0.0715,0.430184,0.430184,0.560536
3,0.0532,0.406917,0.406917,0.584304


[I 2025-02-06 06:58:17,672] Trial 4 finished with value: 0.40691718459129333 and parameters: {'learning_rate': 2.7303462427767877e-05, 'dropout': 0.40884073733232307, 'batch_size': 32, 'weight_decay': 0.05781254828360568}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0425,0.422051,0.422051,0.568844
2,0.072,0.547957,0.547957,0.440221
3,0.1229,0.439601,0.439601,0.550915


[I 2025-02-06 07:03:42,562] Trial 5 finished with value: 0.4220510721206665 and parameters: {'learning_rate': 0.00010971748610464268, 'dropout': 0.1760610549255994, 'batch_size': 64, 'weight_decay': 0.023755003357959895}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.041,0.491554,0.491554,0.497841
2,0.0671,0.461281,0.461281,0.528768
3,0.0615,0.415124,0.415124,0.57592


[I 2025-02-06 07:09:39,957] Trial 6 finished with value: 0.4151236116886139 and parameters: {'learning_rate': 6.083650017142084e-05, 'dropout': 0.41684246671372527, 'batch_size': 32, 'weight_decay': 0.03692108030932481}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0637,0.499118,0.499118,0.490114
2,0.0927,0.453148,0.453148,0.537076
3,0.0706,0.415443,0.415443,0.575594


[I 2025-02-06 07:15:23,471] Trial 7 finished with value: 0.4154433310031891 and parameters: {'learning_rate': 8.226827197229555e-05, 'dropout': 0.2378410177553461, 'batch_size': 32, 'weight_decay': 0.07306172759079142}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.0395,0.425753,0.425753,0.565062
2,0.0468,0.410553,0.410553,0.58059
3,0.0414,0.419699,0.419699,0.571247


[I 2025-02-06 07:21:25,899] Trial 8 finished with value: 0.4105530381202698 and parameters: {'learning_rate': 1.3107492671970576e-05, 'dropout': 0.3111989221087835, 'batch_size': 64, 'weight_decay': 0.017589304638153815}. Best is trial 2 with value: 0.405130535364151.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.3447,1.023978,1.023978,-0.046069
2,1.0749,0.993177,0.993177,-0.014604
3,1.1195,0.978918,0.978918,-3.7e-05


[I 2025-02-06 07:26:59,958] Trial 9 finished with value: 0.9789178371429443 and parameters: {'learning_rate': 0.0004332301285870981, 'dropout': 0.39807683431666774, 'batch_size': 32, 'weight_decay': 0.028970557867292298}. Best is trial 2 with value: 0.405130535364151.


In [15]:
for trial in study1.trials:
    print(f"Trial {trial.number}:")
    print(f"  Value: {trial.value}")
    print(f"  Params: {trial.params}")
    print(f"  State: {trial.state}")
    print()

Trial 0:
  Value: 0.406588613986969
  Params: {'learning_rate': 3.303861972058671e-05, 'dropout': 0.478388549159093, 'batch_size': 32, 'weight_decay': 0.03317830696087596}
  State: 1

Trial 1:
  Value: 0.7425116896629333
  Params: {'learning_rate': 0.000595593360102189, 'dropout': 0.34243982912058724, 'batch_size': 64, 'weight_decay': 0.013348753001201176}
  State: 1

Trial 2:
  Value: 0.405130535364151
  Params: {'learning_rate': 6.409999442818139e-05, 'dropout': 0.4581814267188169, 'batch_size': 32, 'weight_decay': 0.011672927754269086}
  State: 1

Trial 3:
  Value: 0.418368935585022
  Params: {'learning_rate': 8.752208511407435e-05, 'dropout': 0.3027074991711967, 'batch_size': 64, 'weight_decay': 0.026416166071885613}
  State: 1

Trial 4:
  Value: 0.40691718459129333
  Params: {'learning_rate': 2.7303462427767877e-05, 'dropout': 0.40884073733232307, 'batch_size': 32, 'weight_decay': 0.05781254828360568}
  State: 1

Trial 5:
  Value: 0.4220510721206665
  Params: {'learning_rate': 0.0

#### Third study:
10 trials; best trial in the log below: trial 6, MSE 0.6846094131469727

Hyperparameters:

```
learning_rate = trial.suggest_loguniform('learning_rate', 1e-6, 1e-4)
dropout = trial.suggest_uniform('dropout', 0.2, 0.5)
batch_size = trial.suggest_categorical('batch_size', [16, 32, 64])
weight_decay = trial.suggest_loguniform('weight_decay', 0.01, 0.1)
n_epoch = 5

```

In [17]:
study2 = optuna.create_study(
    study_name="study0206_2",
    direction='minimize',
    storage='sqlite:////content/optuna_db/study2.db',
    load_if_exists=True
)

study2.optimize(
    lambda trial: objective(
        trial,
        google_bert_model,
        "google-bert-base_optuna2",
        google_bert_train_dataset,
        google_bert_test_dataset
    ),
    n_trials=10
)

[I 2025-02-06 07:36:20,343] Using an existing study with name 'study0206_2' instead of creating a new one.


Epoch,Training Loss,Validation Loss,Mse,R2
1,1.0121,0.983676,0.983676,-0.004898
2,1.0123,0.979204,0.979204,-0.000329
3,1.0603,0.978882,0.978882,-0.0
4,1.0625,0.979287,0.979287,-0.000414
5,1.0043,0.981483,0.981483,-0.002658


[I 2025-02-06 07:45:10,440] Trial 1 finished with value: 0.9788817167282104 and parameters: {'learning_rate': 5.901410531556497e-05, 'dropout': 0.3589858570912592, 'batch_size': 64, 'weight_decay': 0.015588794923307621}. Best is trial 1 with value: 0.9788817167282104.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.976,0.984316,0.984316,-0.005551
2,1.0488,0.917858,0.917858,0.06234
3,1.1094,0.910308,0.910308,0.070053
4,0.8015,0.892955,0.892955,0.08778
5,0.8876,0.885105,0.885105,0.0958


[I 2025-02-06 07:56:02,595] Trial 2 finished with value: 0.8851050734519958 and parameters: {'learning_rate': 3.6774344623805236e-06, 'dropout': 0.44690639861682807, 'batch_size': 16, 'weight_decay': 0.03692246250688206}. Best is trial 2 with value: 0.8851050734519958.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.8724,0.881613,0.881614,0.099366
2,0.8548,0.889875,0.889875,0.090926
3,0.8374,0.809059,0.809059,0.173486
4,0.8423,0.769156,0.769156,0.214251
5,0.7207,0.760692,0.760692,0.222897


[I 2025-02-06 08:04:59,475] Trial 3 finished with value: 0.7606922388076782 and parameters: {'learning_rate': 8.703435741816198e-06, 'dropout': 0.45559921258069774, 'batch_size': 64, 'weight_decay': 0.03702943269866212}. Best is trial 3 with value: 0.7606922388076782.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.8465,0.873151,0.873151,0.108012
2,0.7739,0.757942,0.757942,0.225706
3,0.8417,0.755259,0.755259,0.228447
4,0.7355,0.74177,0.74177,0.242227
5,0.6509,0.697446,0.697446,0.287508


[I 2025-02-06 08:14:45,900] Trial 4 finished with value: 0.6974456906318665 and parameters: {'learning_rate': 2.661731443765297e-05, 'dropout': 0.3795964672627359, 'batch_size': 32, 'weight_decay': 0.06089272211770329}. Best is trial 4 with value: 0.6974456906318665.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.6286,0.702411,0.702411,0.282435
2,0.6291,0.694069,0.694069,0.290957
3,0.6556,0.701461,0.701461,0.283406
4,0.6737,0.694673,0.694673,0.29034


[I 2025-02-06 08:21:49,239] Trial 5 finished with value: 0.6940690875053406 and parameters: {'learning_rate': 1.2656933649771517e-06, 'dropout': 0.3316755349315288, 'batch_size': 64, 'weight_decay': 0.038774589257273374}. Best is trial 5 with value: 0.6940690875053406.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.622,0.694896,0.694896,0.290112
2,0.6236,0.686631,0.686631,0.298556
3,0.6552,0.691791,0.691791,0.293285
4,0.6591,0.684609,0.684609,0.300621
5,0.5591,0.702797,0.702797,0.282041


[I 2025-02-06 08:30:51,266] Trial 6 finished with value: 0.6846094131469727 and parameters: {'learning_rate': 6.938702413741023e-06, 'dropout': 0.47115309687918017, 'batch_size': 64, 'weight_decay': 0.05550162406107162}. Best is trial 6 with value: 0.6846094131469727.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.9248,0.881056,0.881056,0.099936
2,1.0568,0.911601,0.911602,0.068731
3,1.1878,0.977486,0.977486,0.001425


[I 2025-02-06 08:37:18,426] Trial 7 finished with value: 0.8810560703277588 and parameters: {'learning_rate': 6.690848695076825e-05, 'dropout': 0.3474747855572402, 'batch_size': 16, 'weight_decay': 0.06896045973397652}. Best is trial 6 with value: 0.6846094131469727.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.8708,0.871385,0.871385,0.109815
2,0.8583,0.868283,0.868283,0.112984
3,0.8736,0.85171,0.85171,0.129915
4,0.8782,0.828982,0.828982,0.153134
5,0.7951,0.8359,0.8359,0.146066


[I 2025-02-06 08:47:14,672] Trial 8 finished with value: 0.8289816379547119 and parameters: {'learning_rate': 4.935134361808865e-06, 'dropout': 0.21420770123932137, 'batch_size': 64, 'weight_decay': 0.07664155486106117}. Best is trial 6 with value: 0.6846094131469727.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.9102,0.927848,0.927848,0.052134
2,1.1183,0.964122,0.964122,0.015078
3,1.21,0.982154,0.982154,-0.003343


[I 2025-02-06 08:53:34,521] Trial 9 finished with value: 0.927848219871521 and parameters: {'learning_rate': 3.338569704547321e-05, 'dropout': 0.34814672111293676, 'batch_size': 16, 'weight_decay': 0.08169203923064387}. Best is trial 6 with value: 0.6846094131469727.


Epoch,Training Loss,Validation Loss,Mse,R2
1,0.9292,0.845555,0.845555,0.136203
2,0.8407,1.128994,1.128994,-0.153351
3,1.0417,0.977863,0.977863,0.001041


[I 2025-02-06 08:58:49,332] Trial 10 finished with value: 0.8455546498298645 and parameters: {'learning_rate': 8.548489753402403e-05, 'dropout': 0.26739351392114663, 'batch_size': 64, 'weight_decay': 0.027834598316543658}. Best is trial 6 with value: 0.6846094131469727.
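
Several trials above end before epoch 5 because the objective registers `EarlyStoppingCallback(early_stopping_patience=2)`. The sketch below is a simplified plain-Python model of that rule, under the assumption that any decrease in validation loss counts as an improvement. It is not the library's implementation, and `run_with_patience` is a hypothetical helper. The loss sequences are copied from the log above.

```python
def run_with_patience(val_losses, patience=2):
    """Return (last_epoch, stopped_early) for a sequence of validation losses."""
    best = float("inf")
    stale = 0  # consecutive evaluations without a new best loss
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch, True
    return len(val_losses), False


trial_5 = [0.702411, 0.694069, 0.701461, 0.694673]
trial_6 = [0.694896, 0.686631, 0.691791, 0.684609, 0.702797]
trial_7 = [0.881056, 0.911601, 0.977486]

print(run_with_patience(trial_5))  # stops after two epochs without a new best
print(run_with_patience(trial_6))  # the improvement at epoch 4 resets the counter
print(run_with_patience(trial_7))  # best loss at epoch 1, stops at epoch 3
```

Because `load_best_model_at_end=True` is set, the value Optuna records for a trial is the best validation loss seen, not the loss at the epoch where training stopped.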


In [18]:
study0205 = optuna.load_study(study_name="study0205", storage="sqlite:///optuna_db/study0205.db")

In [None]:
finetuned_model = AutoModelForSequenceClassification.from_pretrained('./google_finetuned', config = './google_finetuned/config.json')

In [None]:
study0205.optimize(
    lambda trial: objective(
        trial,
        finetuned_model,
        "google-bert-base_finetuned",
        google_bert_train_dataset,
        google_bert_test_dataset
    ),
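    # n_startup_trials is an argument of optuna.samplers.TPESampler, not of
    # Study.optimize(); to set it, pass sampler=TPESampler(n_startup_trials=0) to load_study().
    n_trials=5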
)