Antmaze results #2

Open
dljzx opened this issue Jan 18, 2022 · 3 comments

Comments

@dljzx

dljzx commented Jan 18, 2022

No description provided.

@dljzx
Author

dljzx commented Jan 18, 2022

Thanks for your work on CQL. It works well in many environments, but it performs poorly in the AntMaze environment. Could you look into why?

@young-geng
Owner

Use the following hyperparameters for AntMaze:

python -m SimpleSAC.conservative_sac_main \
    --env 'antmaze-medium-diverse-v2' \
    --cql.cql_min_q_weight=5.0 \
    --cql.cql_max_target_backup=True \
    --cql.cql_target_action_gap=0.2 \
    --orthogonal_init=True \
    --cql.cql_lagrange=True \
    --cql.cql_temp=1.0 \
    --cql.policy_lr=1e-4 \
    --cql.qf_lr=3e-4 \
    --cql.cql_clip_diff_min=-200 \
    --reward_scale=10.0 \
    --reward_bias=-5.0 \
    --policy_arch='256-256' \
    --qf_arch='256-256-256' \
    --policy_log_std_multiplier=0.0 \
    --eval_period=50 \
    --eval_n_trajs=100 \
    --n_epochs=1200 \
    --bc_epochs=40 \
    --logging.output_dir './experiment_output'
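The `--reward_scale=10.0` and `--reward_bias=-5.0` flags reshape the reward before training. Below is a minimal sketch that assumes the transform is affine (r' = reward_scale * r + reward_bias) and that D4RL AntMaze rewards are sparse (1.0 on reaching the goal, 0.0 otherwise). `transform_reward` is a hypothetical helper for illustration, not a SimpleSAC function:

```python
# Assumption: --reward_scale and --reward_bias apply r' = reward_scale * r + reward_bias
# to every dataset reward before training. transform_reward is a hypothetical helper.
def transform_reward(reward: float, reward_scale: float = 10.0, reward_bias: float = -5.0) -> float:
    """Affine reward transform; defaults match the AntMaze flag values above."""
    return reward_scale * reward + reward_bias


# Assumption: AntMaze rewards are sparse, 1.0 at the goal and 0.0 at every other step.
sparse_rewards = [0.0, 0.0, 1.0]
print([transform_reward(r) for r in sparse_rewards])  # prints [-5.0, -5.0, 5.0]
```

Under these assumptions a non-goal step maps to -5 and a goal step to +5, so the two outcomes sit symmetrically around zero instead of at 0 and 1.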

@dljzx
Author

dljzx commented Jan 21, 2022


Thanks for the code update; it worked. By the way, your code uses behavior cloning (BC) for the first 40 epochs, but this trick is not mentioned in the paper. Why is BC so important in the AntMaze environment, and what happens if we don't use it?
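The BC warm-up being asked about can be pictured as a switch in the actor objective. The sketch below assumes that for the first `bc_epochs` epochs the policy is trained to maximize the log-likelihood of the dataset actions (keeping the entropy term), and afterwards switches to the usual SAC objective of maximizing Q. The function names and the exact loss form are illustrative assumptions, not code taken from SimpleSAC:

```python
from statistics import fmean
from typing import Sequence


def use_bc(epoch: int, bc_epochs: int) -> bool:
    """True while inside the hypothetical BC warm-up (epochs 0 .. bc_epochs - 1)."""
    return epoch < bc_epochs


def policy_loss(
    epoch: int,
    bc_epochs: int,
    alpha: float,
    log_pi: Sequence[float],            # log pi(a~|s) for actions a~ sampled from the policy
    log_prob_dataset: Sequence[float],  # log pi(a|s) for the dataset actions a
    q_new_actions: Sequence[float],     # Q(s, a~) for the sampled actions
) -> float:
    """Hypothetical SAC-style actor loss with a behavior-cloning warm-up phase."""
    if use_bc(epoch, bc_epochs):
        # Warm-up: imitate the dataset by maximizing the likelihood of its actions;
        # the Q-function does not enter the actor loss here.
        per_sample = [alpha * lp - ld for lp, ld in zip(log_pi, log_prob_dataset)]
    else:
        # Standard SAC actor objective: maximize Q minus the entropy penalty.
        per_sample = [alpha * lp - q for lp, q in zip(log_pi, q_new_actions)]
    return fmean(per_sample)


print(policy_loss(0, 40, 0.5, [-1.0, -2.0], [-0.5, -1.5], [10.0, 20.0]))   # prints 0.25
print(policy_loss(40, 40, 0.5, [-1.0, -2.0], [-0.5, -1.5], [10.0, 20.0]))  # prints -15.75
```

With `bc_epochs=40`, epochs 0 through 39 use the imitation term and epoch 40 onward uses Q. Under this sketch's assumptions, setting `bc_epochs` to 0 would skip the warm-up entirely, since `use_bc` is then false for every epoch.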
