# Hyperparameter tuning with Ray Tune

In our project, hyperparameter tuning is performed using Ray Tune. The process works as follows:

1. First, we define a tuning function that receives a configuration dictionary. This dictionary contains different hyperparameter values, such as batch size, learning rate, scheduler gamma, weight decay, and dropout rate, which are explored using grid search.
2. Inside the tuning function, we create a registry of model components, including the model, optimizer, and scheduler. This registry is configured based on the current hyperparameter combination from the configuration.
3. We then run the model training by calling the training launcher with the appropriate validation settings, number of epochs, and batch size taken from the configuration. During the training, the system evaluates the model performance on a validation set.
4. After training, we extract a performance metric (referred to as the "score"). This score is reported back to Ray Tune using the reporting function, which allows the tuner to compare the results across different hyperparameter configurations.
5. A Ray Tune Tuner is set up to run multiple trials concurrently. It uses a resource specification to allocate the necessary CPU and GPU resources for each trial.
6. Finally, Ray Tune aggregates the results into a dataframe and identifies the best hyperparameter configuration based on the maximum score. These results are saved to a CSV file for further analysis.
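
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `train_and_validate`, `tune_model`, `build_tuner`, the `epochs=1` setting, and the per-trial resource numbers are all hypothetical, and the placeholder score carries no meaning. The trainable returns its final metric as a dictionary instead of calling the reporting function, because the name of the reporting call differs between Ray versions.

```python
from typing import Any


def train_and_validate(config: dict[str, Any], epochs: int) -> float:
    """Hypothetical stand-in for the project's training launcher.

    A real implementation would build the model, optimizer, and scheduler
    from `config`, train for `epochs`, and return the validation score.
    """
    # Placeholder value so the sketch runs; it is NOT a real validation score.
    return 1.0 / (1.0 + float(config["dropout_rate"]))


def tune_model(config: dict[str, Any]) -> dict[str, float]:
    """Trainable: one call corresponds to one hyperparameter combination."""
    score = train_and_validate(config, epochs=1)  # epochs=1 is an assumption
    # Returning a dict hands the final metric back to Ray Tune.
    return {"score": score}


def build_tuner():
    """Assemble a grid-search Tuner over the five hyperparameters."""
    # Imported lazily so `tune_model` can be exercised without Ray installed.
    from ray import tune

    param_space = {
        "batch_size": tune.grid_search([64, 256]),
        "learning_rate": tune.grid_search([0.0001, 0.001]),
        "scheduler_gamma": tune.grid_search([0.1, 0.5]),
        "weight_decay": tune.grid_search([0.0001, 0.01]),
        "dropout_rate": tune.grid_search([0.2, 0.4]),
    }
    # Assumed allocation: fractional GPU shares let several trials run at once.
    trainable = tune.with_resources(tune_model, {"cpu": 2, "gpu": 0.25})
    return tune.Tuner(
        trainable,
        param_space=param_space,
        tune_config=tune.TuneConfig(metric="score", mode="max"),
    )
```

Under these assumptions, `results = build_tuner().fit()` would run the trials, `results.get_dataframe()` would give the per-trial table that can be written out with `to_csv`, and `results.get_best_result()` would return the trial with the maximum score.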


In [1]:
import pandas as pd

from luna16 import settings

In [2]:
result_grid_path = settings.DATA_DIR / "result_grid.csv"
result_grid = pd.read_csv(result_grid_path)

In [3]:
result_grid.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32 entries, 0 to 31
Data columns (total 20 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   score                     32 non-null     float64
 1   timestamp                 32 non-null     int64  
 2   checkpoint_dir_name       0 non-null      float64
 3   done                      32 non-null     bool   
 4   training_iteration        32 non-null     int64  
 5   trial_id                  32 non-null     object 
 6   date                      32 non-null     object 
 7   time_this_iter_s          32 non-null     float64
 8   time_total_s              32 non-null     float64
 9   pid                       32 non-null     int64  
 10  hostname                  32 non-null     object 
 11  node_ip                   32 non-null     object 
 12  time_since_restore        32 non-null     float64
 13  iterations_since_restore  32 non-null     int64  
 14  config/batch

In [4]:
result_grid.head()

Unnamed: 0,score,timestamp,checkpoint_dir_name,done,training_iteration,trial_id,date,time_this_iter_s,time_total_s,pid,hostname,node_ip,time_since_restore,iterations_since_restore,config/batch_size,config/learning_rate,config/scheduler_gamma,config/weight_decay,config/dropout_rate,logdir
0,0.381805,1740087791,,False,1,728c0_00000,2025-02-20_21-43-11,525.243697,525.243697,446984,42486ab3fb6b,192.168.4.2,525.243697,1,64,0.0001,0.1,0.0001,0.2,728c0_00000
1,0.249871,1740088154,,False,1,728c0_00001,2025-02-20_21-49-14,887.913648,887.913648,446983,42486ab3fb6b,192.168.4.2,887.913648,1,256,0.0001,0.1,0.0001,0.2,728c0_00001
2,0.265861,1740087834,,False,1,728c0_00002,2025-02-20_21-43-54,567.771235,567.771235,446982,42486ab3fb6b,192.168.4.2,567.771235,1,64,0.0001,0.1,0.0001,0.4,728c0_00002
3,0.195672,1740088087,,False,1,728c0_00003,2025-02-20_21-48-07,821.33043,821.33043,446985,42486ab3fb6b,192.168.4.2,821.33043,1,256,0.0001,0.1,0.0001,0.4,728c0_00003
4,0.870337,1740088306,,False,1,728c0_00004,2025-02-20_21-51-46,488.568903,488.568903,468034,42486ab3fb6b,192.168.4.2,488.568903,1,64,0.001,0.1,0.0001,0.2,728c0_00004


# First Run

In our hyperparameter tuning experiment with Ray Tune, we ran a grid search over two values for each of five hyperparameters, which gives 2^5 = 32 trials (matching the 32 rows in the result grid):

- **batch size** (64, 256)
- **learning rate** (0.0001, 0.001) 
- **scheduler gamma** (0.1, 0.5)
- **weight decay** (0.0001, 0.01)
- **dropout rate** (0.2, 0.4) 
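
To make the size of the search space concrete, the short sketch below enumerates every combination of the values listed above with `itertools.product`. The variable names are illustrative, and the enumeration order here need not match the order in which Ray Tune schedules its trials.

```python
from itertools import product

# Two candidate values per hyperparameter, as listed above.
search_space = {
    "batch_size": [64, 256],
    "learning_rate": [0.0001, 0.001],
    "scheduler_gamma": [0.1, 0.5],
    "weight_decay": [0.0001, 0.01],
    "dropout_rate": [0.2, 0.4],
}

# Cartesian product of all value lists: one dict per trial configuration.
configs = [
    dict(zip(search_space, values))
    for values in product(*search_space.values())
]

print(len(configs))  # 32
```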

The tuning ran on a machine with an NVIDIA A40 GPU (48GB of VRAM), 9 vCPU cores, and 50GB of RAM, which allowed several trials to run in parallel.

## Results & Conclusion

Based on the results captured in the grid, several insights can be drawn from our hyperparameter tuning:

- The trial with a batch size of 64, a learning rate of 0.001, a scheduler gamma of 0.5, a weight decay of 0.0001, and a dropout rate of 0.2 achieved one of the highest scores (around 0.96). This suggests that the smaller batch size combined with the higher learning rate improves performance.
- In contrast, trials using a batch size of 256 generally scored lower. In the first ten trials, every batch-size-256 trial scored below its batch-size-64 counterpart with otherwise identical settings, and also took longer (roughly 720 to 890 seconds versus 490 to 570 seconds).
- Only two dropout rates were tested, so the grid cannot separate moderate values from extremes. In the first ten trials, a dropout rate of 0.2 outscored 0.4 in every matched pair. Scheduler gamma and weight decay also influence the score, but their combined effect is best judged from the full 32-row grid.
- Overall, the grid search exposed the effect of each hyperparameter and points to a batch size of 64 with a learning rate of 0.001 as the strongest region of the search space.


In [10]:
relevant_results = result_grid[
    [
        "score",
        "time_total_s",
        "config/batch_size",
        "config/learning_rate",
        "config/scheduler_gamma",
        "config/weight_decay",
        "config/dropout_rate",
    ]
]
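# Note: head(10) runs first, so this sorts only the first ten trials.
# For the ten best trials overall, use sort_values("score", ascending=False).head(10).
relevant_results.head(10).sort_values("score", ascending=False)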

Unnamed: 0,score,time_total_s,config/batch_size,config/learning_rate,config/scheduler_gamma,config/weight_decay,config/dropout_rate
4,0.870337,488.568903,64,0.001,0.1,0.0001,0.2
6,0.735348,509.909528,64,0.001,0.1,0.0001,0.4
8,0.643052,528.94148,64,0.0001,0.5,0.0001,0.2
5,0.512301,809.875896,256,0.001,0.1,0.0001,0.2
0,0.381805,525.243697,64,0.0001,0.1,0.0001,0.2
9,0.325113,720.099079,256,0.0001,0.5,0.0001,0.2
7,0.274573,830.995315,256,0.001,0.1,0.0001,0.4
2,0.265861,567.771235,64,0.0001,0.1,0.0001,0.4
1,0.249871,887.913648,256,0.0001,0.1,0.0001,0.2
3,0.195672,821.33043,256,0.0001,0.1,0.0001,0.4
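
The batch-size observation can be checked against the ten trials listed above. The sketch below copies the `config/batch_size` and `score` columns from that listing and averages the score per batch size using only the standard library. These ten rows all share a weight decay of 0.0001, so this is a partial view of the grid, not a summary of all 32 trials.

```python
from collections import defaultdict
from statistics import mean

# (batch_size, score) pairs copied from the ten rows listed above.
rows = [
    (64, 0.870337),
    (64, 0.735348),
    (64, 0.643052),
    (256, 0.512301),
    (64, 0.381805),
    (256, 0.325113),
    (256, 0.274573),
    (64, 0.265861),
    (256, 0.249871),
    (256, 0.195672),
]

# Group the scores by batch size.
scores_by_batch = defaultdict(list)
for batch_size, score in rows:
    scores_by_batch[batch_size].append(score)

# Average within each group.
mean_by_batch = {b: mean(s) for b, s in scores_by_batch.items()}

for b, m in sorted(mean_by_batch.items()):
    print(f"batch_size={b}: mean score {m:.3f}")
# batch_size=64: mean score 0.579
# batch_size=256: mean score 0.312
```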
