Hyperparameter Optimization Module #151

Merged · 125 commits · May 14, 2024

Commits
8d3be23
1. bug fixed. 2. kernel extension. 3. batch GP implementation.
middleyuan Jun 30, 2023
7f0e3ff
update dependencies
middleyuan Jul 10, 2023
cf3d4e8
explicitly import scipy.linalg
middleyuan Jul 10, 2023
359eecc
add cartpole configs for gpmpc
middleyuan Jul 10, 2023
89a29b8
add hyperparameter optimization module
middleyuan Jul 10, 2023
9e2a7ef
catch all exceptions in hpo for debugging purposes.
middleyuan Jul 10, 2023
27454da
put cartpole configs for gpmpc under the folder of gpmpc
middleyuan Jul 10, 2023
17e408b
add hpo scripts
middleyuan Jul 10, 2023
bb4e1b0
1. include pandas 2. change rel import in gpmpc_experiment.py 3. remo…
middleyuan Jul 10, 2023
14a6db1
rename config to match default algo name.
middleyuan Jul 10, 2023
e6a2e3d
remove old configs
middleyuan Jul 10, 2023
0484388
add tests
middleyuan Jul 10, 2023
84830df
edit bash file with correct arg name
middleyuan Jul 10, 2023
e69e048
add another host in gpmpc_hpo.sh
middleyuan Jul 10, 2023
097e1c2
change to new dir in gpmpc_hpo.sh
middleyuan Jul 10, 2023
405dcea
1. fix a small bug 2. add test_train_gpmpc_cartpole
middleyuan Jul 11, 2023
549ff3e
add a hpo parallelism test
middleyuan Jul 11, 2023
81b5602
saving before running hpo
middleyuan Jul 11, 2023
a5ad5f2
I think the bug is that it reaches the goal in the first step.
middleyuan Jul 11, 2023
ce4d75e
1. PPO configs. 2. Make cartpole init states harder. 3. First version…
middleyuan Jul 18, 2023
b40566c
Re-organize a bit (file name, remove __init__.py in test folders).
middleyuan Jul 18, 2023
23f571d
1. HPO strategies. 2. test on hpo for ppo. 3. another way to save che…
middleyuan Jul 22, 2023
802edb6
update gitignore
middleyuan Jul 24, 2023
02d1c33
change configs
middleyuan Jul 24, 2023
20d3a7f
update bash for hpo on gpmpc
middleyuan Jul 24, 2023
ad96f6f
add prior arg in gpmpc_sampler
middleyuan Jul 24, 2023
5318c25
1. HPO effort evaluations. 2. Bash file for hpo strategy evaluation.
middleyuan Jul 24, 2023
924d3b3
update dependencies
middleyuan Jul 25, 2023
14ae2aa
add the freedom to choose between random sampler and TPE sampler.
middleyuan Jul 26, 2023
c0b1b34
1. add strategy 5. 2. add unit test accordingly.
middleyuan Aug 3, 2023
f5c3a5a
1. prior configs. 2. update eval.py, sen.sh, and .gitignore.
middleyuan Aug 3, 2023
0e1248a
gpmpc hpo strategy study
middleyuan Aug 4, 2023
a0feec7
refactor the code
middleyuan Aug 7, 2023
bd39347
1. hpo on sac. 2. add activation arg in sac and fix a small bug.
middleyuan Aug 8, 2023
4342b2a
fix typos
middleyuan Aug 8, 2023
1e1f7cf
change to two jobs
middleyuan Aug 9, 2023
b087c87
change num of repetitions to make sure it at least has same num of sa…
middleyuan Aug 9, 2023
59b4220
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Aug 9, 2023
4c22c86
reduce the budget
middleyuan Aug 9, 2023
fe02a65
toy example
middleyuan Aug 10, 2023
3d33487
consider 4 versions of noisy functions.
middleyuan Aug 11, 2023
714a76d
include var study
middleyuan Aug 13, 2023
2663cde
improve visualization in toy examples
middleyuan Aug 14, 2023
249a284
updated visualization improvement in toy examples.
middleyuan Aug 14, 2023
e03cb33
change naming
middleyuan Aug 14, 2023
ee29967
final experiment setup
middleyuan Aug 30, 2023
3a6448d
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Aug 30, 2023
831e186
final experiment setup
middleyuan Aug 30, 2023
452937d
modify seeding
middleyuan Aug 30, 2023
7b4a844
Ignore runtime error for hpo
middleyuan Aug 31, 2023
bfd6f21
Merge branch 'hpo-on-ppo' into hpo-on-gpmpc
middleyuan Aug 31, 2023
e310829
Merge branch 'hpo-on-sac' into hpo-on-gpmpc
middleyuan Aug 31, 2023
c45e3e4
merge from sac
middleyuan Aug 31, 2023
8596fc8
fix a bug in hpo_sampler.py
middleyuan Aug 31, 2023
df01b80
final design to show possible lower compute time.
middleyuan Aug 31, 2023
9faae33
1. hpo on ddpg. 2. fix a small bug in ddpg_utils.
middleyuan Aug 31, 2023
de90501
relax the threshold
middleyuan Aug 31, 2023
4322f67
relax the threshold
middleyuan Sep 1, 2023
038f046
make rl_hpo_strategy_eval.sh automatic.
middleyuan Sep 4, 2023
4703d8b
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Sep 4, 2023
807e358
Merge branch 'hpo-on-ppo' into hpo-on-ddpg
middleyuan Sep 4, 2023
6590134
fix a bug in rl_hpo_strategy_eval.sh
middleyuan Sep 4, 2023
8c594c3
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Sep 4, 2023
87cbd1e
Merge branch 'hpo-on-ppo' into hpo-on-ddpg
middleyuan Sep 4, 2023
8211c52
add gpmpc_hpo_strategy_eval.sh
middleyuan Sep 4, 2023
26536e9
fix a small bug
middleyuan Sep 4, 2023
5bca27a
fix the budget (trial) bug in configs.
middleyuan Sep 5, 2023
f083b64
prepare comparing hpo strategy on gpmpc
middleyuan Sep 5, 2023
bed32ce
fix a bug in gpmpc_hpo_strategy.sh
middleyuan Sep 5, 2023
9503398
fix bugs in bash files
middleyuan Sep 5, 2023
6cd934f
fix the trial bug in config
middleyuan Sep 5, 2023
ecb27b1
fix a function bug in eval.py
middleyuan Sep 5, 2023
69d416b
Merge branch 'hpo-on-ppo' into hpo-on-ddpg
middleyuan Sep 5, 2023
a1b0756
1. add hpo resume functionality. 2. make eval function more general.
middleyuan Sep 6, 2023
c88e7e7
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Sep 6, 2023
a9aa83b
update configs
middleyuan Sep 6, 2023
7f791aa
make main.sh general
middleyuan Sep 6, 2023
d8d047b
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Sep 6, 2023
1e16f06
Merge branch 'hpo-on-ppo' into hpo-on-ddpg
middleyuan Sep 6, 2023
c4a8ded
resume previous config with trial increased.
middleyuan Sep 6, 2023
80e59ec
fix the sorting bug.
middleyuan Sep 11, 2023
1a3a98a
Merge branch 'hpo-on-ppo' into hpo-on-ddpg
middleyuan Sep 11, 2023
43387d5
Merge branch 'hpo-on-ppo' into hpo-on-sac
middleyuan Sep 11, 2023
952b267
fix sorting bug
middleyuan Sep 11, 2023
7f62195
a small bug fixed
middleyuan Sep 12, 2023
d5109f4
fix a bug on computing reward
middleyuan Sep 12, 2023
1618395
Merge branch 'hpo-on-ppo' into hpo-on-gpmpc
middleyuan Sep 14, 2023
778128b
add resume functionality
middleyuan Sep 14, 2023
61ea053
edit main bash file and fix some typos
middleyuan Sep 14, 2023
1caae68
simply assign zero if numerical issues happen during HPO
middleyuan Sep 14, 2023
8583978
Merge branch 'hpo-on-ppo' into hpo-on-gpmpc
middleyuan Sep 14, 2023
f9f701d
adjust eval
middleyuan Sep 16, 2023
5caf0c6
change to boxenplot
middleyuan Sep 19, 2023
eea138a
fix typo
middleyuan Sep 21, 2023
362ea75
add reliable_stats
middleyuan Sep 24, 2023
3649c44
Merge branch 'hpo-on-gpmpc' into hpo-on-ppo
middleyuan Sep 24, 2023
d6e0b08
Merge branch 'hpo-on-ddpg' into hpo-on-ppo
middleyuan Sep 24, 2023
55624cd
update outdated configs
middleyuan Sep 24, 2023
6e9bbc9
Merge remote-tracking branch 'origin/hpo-on-sac' into hpo-on-ppo
middleyuan Sep 24, 2023
02d75ad
update jupyter notebooks
middleyuan Oct 5, 2023
00f6608
update jupyter notebooks.
middleyuan Oct 23, 2023
b5749d0
final update for appendix
middleyuan Oct 24, 2023
e4d616d
update readme
middleyuan Oct 24, 2023
ef29146
fix typo
middleyuan Oct 25, 2023
47f1598
1. clean up code for ppo controller, hyperparameter module. 2. Test o…
middleyuan Apr 10, 2024
83ec989
test training with given optimized hp files.
middleyuan Apr 10, 2024
e890c04
1. test hpo with and without MySQL. 2. update README.
middleyuan Apr 10, 2024
e9499cd
remove discrepancy of readme.
middleyuan Apr 10, 2024
c45a975
update readme
middleyuan Apr 10, 2024
8350b63
1. remove 'pandas' and 'seaborn' in package dependencies. 2. move tes…
middleyuan Apr 15, 2024
ee9ec34
-
middleyuan Apr 15, 2024
a260695
update config_overrides in examples of rl
middleyuan Apr 15, 2024
13ef164
run pre-commit hooks to improve linting
middleyuan Apr 16, 2024
3bbd0ba
1. ignore W503 and W504 as they conflict in pre-commit-config. 2. run…
middleyuan Apr 16, 2024
56f2738
add activation config to the examples that use RL.
middleyuan Apr 16, 2024
ba837c5
1. standardize hpo template in the examples. 2. remove _learn(). 3. a…
middleyuan Apr 17, 2024
51c601e
run pre-commit hooks.
middleyuan Apr 17, 2024
fb572f2
add gpmpc hpo test without using mysql
middleyuan Apr 17, 2024
c5fbeed
1. update config of cartpole task. 2. add max_steps and exponentiated…
middleyuan Apr 19, 2024
f8e3d0c
1. add bash files to automate hpo pipeline for gpmpc. 2. update gpmpc…
middleyuan Apr 22, 2024
1f62d3b
Merge remote-tracking branch 'upstream/main' into hpo
middleyuan Apr 22, 2024
ffb29de
match .gitignore to upstream/main.
middleyuan Apr 22, 2024
1508634
update for review
middleyuan Apr 24, 2024
896ac9f
update based on the review comments.
middleyuan May 13, 2024
3b49f7b
fix typo in readme.
middleyuan May 14, 2024
1 change: 1 addition & 0 deletions examples/cbf/config_overrides/ppo_config.yaml
@@ -2,6 +2,7 @@ algo: ppo
algo_config:
  # model args
  hidden_dim: 64
  activation: "relu"
  norm_obs: False
  norm_reward: False
  clip_obs: 10.0
1 change: 1 addition & 0 deletions examples/cbf/config_overrides/sac_config.yaml
@@ -2,6 +2,7 @@ algo: sac
algo_config:
  # model args
  hidden_dim: 256
  activation: "relu"
  use_entropy_tuning: False

  # optim args
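The `activation` key added to the PPO and SAC configs above selects the network nonlinearity by name. As a rough sketch of how such a string-to-module lookup can work in PyTorch (the table below is illustrative; the PR's controllers define their own mapping):

import torch.nn as nn

# Hypothetical name-to-module table, for illustration only.
ACTIVATIONS = {'relu': nn.ReLU, 'leaky_relu': nn.LeakyReLU, 'tanh': nn.Tanh}

def make_activation(name):
    # Fail loudly on an unsupported name instead of silently defaulting.
    if name not in ACTIVATIONS:
        raise ValueError(f'Unsupported activation: {name}')
    return ACTIVATIONS[name]()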
67 changes: 67 additions & 0 deletions examples/hpo/gp_mpc/config_overrides/cartpole/cartpole_stab.yaml
@@ -0,0 +1,67 @@
task_config:
  constraints:
  - constraint_form: default_constraint
    constrained_variable: input
  - constraint_form: default_constraint
    constrained_variable: state
    upper_bounds:
    - 100
    - 100
    - 100
    - 100
    lower_bounds:
    - -100
    - -100
    - -100
    - -100
  cost: quadratic
  ctrl_freq: 15
  disturbances:
    observation:
    - disturbance_func: white_noise
      std: 0.0001
  done_on_violation: false
  episode_len_sec: 10
  gui: false
  inertial_prop:
    cart_mass: 1.0
    pole_length: 0.5
    pole_mass: 0.1
  inertial_prop_randomization_info: null
  info_in_reset: false
  init_state:
    init_x: 0.0
    init_x_dot: 0.0
    init_theta: 0.0
    init_theta_dot: 0.0
  init_state_randomization_info:
    init_x:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_x_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_theta:
      distrib: 'uniform'
      low: -0.2
      high: 0.2
    init_theta_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
  normalized_rl_action_space: false
  prior_prop:
    cart_mass: 1.0
    pole_length: 0.5
    pole_mass: 0.1
  pyb_freq: 750
  randomized_inertial_prop: false
  randomized_init: true
  task: stabilization
  task_info:
    stabilization_goal: [0]
    stabilization_goal_tolerance: 0.005
  use_constraint_penalty: false
  verbose: false
@@ -0,0 +1,66 @@
algo: gp_mpc
algo_config:
  additional_constraints: null
  deque_size: 10
  eval_batch_size: 10
  gp_approx: mean_eq
  gp_model_path: null
  horizon: 20
  prior_info:
    prior_prop:
      cart_mass: 1.0
      pole_length: 0.5
      pole_mass: 0.1
  initial_rollout_std: 0.0
  input_mask: null
  learning_rate:
  - 0.01
  - 0.01
  - 0.01
  - 0.01
  normalize_training_data: false
  online_learning: false
  optimization_iterations:
  - 3000
  - 3000
  - 3000
  - 3000
  overwrite_saved_data: false
  prior_param_coeff: 1.5
  prob: 0.95
  q_mpc:
  - 1
  - 1
  - 1
  - 1
  r_mpc:
  - 0.1
  kernel: Matern
  sparse_gp: True
  n_ind_points: 40
  inducing_point_selection_method: 'kmeans'
  recalc_inducing_points_at_every_step: false
  soft_constraints:
    gp_soft_constraints: false
    gp_soft_constraints_coeff: 0
    prior_soft_constraints: true
    prior_soft_constraints_coeff: 10
  target_mask: null
  train_iterations: null
  test_data_ratio: 0.2
  use_prev_start: true
  warmstart: true
  num_epochs: 5
  num_samples: 75
  num_test_episodes_per_epoch: 2
  num_train_episodes_per_epoch: 2
  same_test_initial_state: true
  same_train_initial_state: false
  rand_data_selection: false
  terminate_train_on_done: True
  terminate_test_on_done: False
  parallel: True

device: cpu
restore: null
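The `kernel: Matern` entry above is one of the hyperparameters this PR exposes (the optimized file further down switches it to 'RBF'). A minimal sketch of mapping such a kernel name onto GPyTorch, assuming the stock `gpytorch.kernels` API; the PR's GP model performs its own construction, possibly with batch shapes for the batched GP mentioned in the commit history:

import torch
import gpytorch

def build_kernel(name, batch_shape=torch.Size([])):
    # Illustrative lookup between config strings and GPyTorch kernels.
    if name == 'Matern':
        base = gpytorch.kernels.MaternKernel(nu=2.5, batch_shape=batch_shape)
    elif name == 'RBF':
        base = gpytorch.kernels.RBFKernel(batch_shape=batch_shape)
    else:
        raise ValueError(f'Unknown kernel: {name}')
    # ScaleKernel adds a learnable outputscale on top of the base kernel.
    return gpytorch.kernels.ScaleKernel(base, batch_shape=batch_shape)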
@@ -0,0 +1,36 @@
hpo_config:

  hpo: True  # do hyperparameter optimization
  load_if_exists: True  # this should be set to True if hpo is run in parallel
  use_database: False  # this is set to True if MySQL is used
  objective: [exponentiated_avg_return]  # [other metrics defined in base_experiment.py]
  direction: [maximize]  # [maximize, maximize]
  dynamical_runs: False  # if True, dynamically increase runs
  warm_trials: 20  # number of trials to run before dynamical runs
  approximation_threshold: 5  # this is only used when dynamical_runs is True
  repetitions: 5  # number of samples of performance for each objective query
  alpha: 1  # significance level for CVaR
  use_gpu: True
  dashboard: False
  seed: 24
  save_n_best_hps: 3
  # budget
  trials: 40

  # hyperparameters
  hps_config:
    horizon: 20
    learning_rate:
    - 0.01
    - 0.01
    - 0.01
    - 0.01
    optimization_iterations:
    - 3000
    - 3000
    - 3000
    - 3000
    kernel: Matern
    n_ind_points: 35
    num_epochs: 5
    num_samples: 75
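This `hpo_config` maps fairly directly onto an Optuna study: the sampler is chosen on the command line (TPE or random, per the commit history), `use_database` toggles MySQL storage for parallel workers, and `load_if_exists` lets those workers join one study. A minimal sketch of that wiring, with a made-up study name and DSN (the real logic lives in safe_control_gym/hyperparameters/hpo.py):

import optuna
from optuna.samplers import TPESampler

use_database = False  # mirrors use_database in hpo_config
# Hypothetical MySQL DSN; None falls back to Optuna's in-memory storage.
storage = 'mysql://user:password@localhost/hpo' if use_database else None

study = optuna.create_study(
    study_name='gp_mpc_hpo',      # made-up name, for illustration
    sampler=TPESampler(seed=24),  # seed: 24 from hpo_config
    direction='maximize',         # direction: [maximize]
    storage=storage,
    load_if_exists=True,          # load_if_exists: True for parallel HPO
)
# study.optimize(objective, n_trials=40)  # trials: 40 is the budget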
@@ -0,0 +1,7 @@
horizon: 35
kernel: 'RBF'
n_ind_points: 40
num_epochs: 5
num_samples: 75
optimization_iterations: [2800, 2800, 2800, 2800]
learning_rate: [0.023172075157730145, 0.023172075157730145, 0.023172075157730145, 0.023172075157730145]
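A file like the seven-line one above is both an output of HPO (via `save_n_best_hps`) and the input to retraining: passing it back with `--func train --opt_hps path/to/hyperparameters.yaml` (a placeholder path) lets hpo_experiment.py, shown next, override the matching `algo_config` keys before training starts.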
119 changes: 119 additions & 0 deletions examples/hpo/hpo_experiment.py
@@ -0,0 +1,119 @@
"""Template hyperparameter optimization/hyperparameter evaluation script.

"""
import os
from functools import partial

import yaml

import matplotlib.pyplot as plt
import numpy as np

from safe_control_gym.envs.benchmark_env import Environment, Task

from safe_control_gym.hyperparameters.hpo import HPO
from safe_control_gym.experiments.base_experiment import BaseExperiment
from safe_control_gym.utils.configuration import ConfigFactory
from safe_control_gym.utils.registration import make
from safe_control_gym.utils.utils import set_device_from_config, set_dir_from_config, set_seed_from_config


def hpo(config):
"""Hyperparameter optimization.

Usage:
* to start HPO, use with `--func hpo`.

"""

# Experiment setup.
if config.hpo_config.hpo:
set_dir_from_config(config)
set_seed_from_config(config)
set_device_from_config(config)

# HPO
hpo = HPO(config.algo,
config.task,
config.sampler,
config.load_study,
config.output_dir,
config.task_config,
config.hpo_config,
**config.algo_config)

if config.hpo_config.hpo:
hpo.hyperparameter_optimization()
print('Hyperparameter optimization done.')


def train(config):
"""Training for a given set of hyperparameters.

Usage:
* to start training, use with `--func train`.

"""
# Override algo_config with given yaml file
if config.opt_hps == '':
# if no opt_hps file is given
pass
else:
# if opt_hps file is given
with open(config.opt_hps, 'r') as f:
opt_hps = yaml.load(f, Loader=yaml.FullLoader)
for hp in opt_hps:
if isinstance(config.algo_config[hp], list) and not isinstance(opt_hps[hp], list):
config.algo_config[hp] = [opt_hps[hp]] * len(config.algo_config[hp])
else:
config.algo_config[hp] = opt_hps[hp]
# Experiment setup.
set_dir_from_config(config)
set_seed_from_config(config)
set_device_from_config(config)

# Define function to create task/env.
env_func = partial(make, config.task, output_dir=config.output_dir, **config.task_config)
# Create the controller/control_agent.
# Note:
# eval_env will take config.seed * 111 as its seed
# env will take config.seed as its seed
control_agent = make(config.algo,
env_func,
training=True,
checkpoint_path=os.path.join(config.output_dir, 'model_latest.pt'),
output_dir=config.output_dir,
use_gpu=config.use_gpu,
seed=config.seed,
**config.algo_config)
control_agent.reset()

eval_env = env_func(seed=config.seed * 111)
# Run experiment
experiment = BaseExperiment(eval_env, control_agent)
experiment.launch_training()
results, metrics = experiment.run_evaluation(n_episodes=1, n_steps=None, done_on_max_steps=True)
control_agent.close()

return eval_env.X_GOAL, results, metrics


MAIN_FUNCS = {'hpo': hpo, 'train': train}


if __name__ == '__main__':

# Make config.
fac = ConfigFactory()
fac.add_argument('--func', type=str, default='train', help='main function to run.')
fac.add_argument('--opt_hps', type=str, default='', help='yaml file as a result of HPO.')
fac.add_argument('--load_study', type=bool, default=False, help='whether to load study from a previous HPO.')
fac.add_argument('--sampler', type=str, default='TPESampler', help='which sampler to use in HPO.')
# merge config
config = fac.merge()

# Execute.
func = MAIN_FUNCS.get(config.func, None)
if func is None:
raise Exception('Main function {} not supported.'.format(config.func))
func(config)
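One subtlety in `train()` above: an optimized file may store a scalar where `algo_config` expects one value per GP output dimension (e.g. `learning_rate`). The override loop broadcasts scalars across the list; a standalone illustration of that rule with made-up values:

# Self-contained rerun of the broadcast rule from train().
algo_config = {'learning_rate': [0.01, 0.01, 0.01, 0.01], 'horizon': 20}
opt_hps = {'learning_rate': 0.0232, 'horizon': 35}  # made-up optimized values

for hp in opt_hps:
    if isinstance(algo_config[hp], list) and not isinstance(opt_hps[hp], list):
        # A scalar replaces every entry of a list-valued hyperparameter.
        algo_config[hp] = [opt_hps[hp]] * len(algo_config[hp])
    else:
        algo_config[hp] = opt_hps[hp]

print(algo_config)
# {'learning_rate': [0.0232, 0.0232, 0.0232, 0.0232], 'horizon': 35}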
61 changes: 61 additions & 0 deletions examples/hpo/rl/config_overrides/cartpole/cartpole_stab.yaml
@@ -0,0 +1,61 @@
task_config:
  info_in_reset: True
  ctrl_freq: 15
  pyb_freq: 750
  physics: pyb

  # state initialization
  init_state:
    init_x: 0.0
    init_x_dot: 0.0
    init_theta: 0.0
    init_theta_dot: 0.0
  randomized_init: True
  randomized_inertial_prop: False
  normalized_rl_action_space: True

  init_state_randomization_info:
    init_x:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_x_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1
    init_theta:
      distrib: 'uniform'
      low: -0.2
      high: 0.2
    init_theta_dot:
      distrib: 'uniform'
      low: -0.1
      high: 0.1

  task: stabilization
  task_info:
    stabilization_goal: [0]
    stabilization_goal_tolerance: 0.005

  inertial_prop:
    pole_length: 0.5
    cart_mass: 1
    pole_mass: 0.1

  episode_len_sec: 10
  cost: rl_reward
  obs_goal_horizon: 1

  # RL Reward
  rew_state_weight: [1, 1, 1, 1]
  rew_act_weight: 0.1
  rew_exponential: True

  # constraints
  constraints:
  - constraint_form: default_constraint
    constrained_variable: state
  - constraint_form: default_constraint
    constrained_variable: input
  done_on_out_of_bound: True
  done_on_violation: False
@@ -0,0 +1,13 @@
activation: leaky_relu
actor_lr: 0.0007948148615930024
clip_param: 0.1
critic_lr: 0.007497368468753617
entropy_coef: 0.00010753631441212628
gae_lambda: 0.8
gamma: 0.98
hidden_dim: 32
max_env_steps: 72000
mini_batch_size: 128
opt_epochs: 5
rollout_steps: 150
target_kl: 1.587713889686473e-07