Can't reproduce the result on HandReach environment #2

Open
poisonwine opened this issue May 10, 2021 · 0 comments

Comments


poisonwine commented May 10, 2021

I used the command 'python -m baselines.ve_run --alg=her --env=HandReach-v0 --num_timesteps=4000000 --size_ensemble=3 --log_path=./data/test_handreach' to train on the HandReach environment, but the algorithm seems to have little effect on this environment. In the paper, the test success rate reaches about 40% after 2 million training steps, but in my run the success rate peaks at roughly 25% and is very unstable. Part of the log file is shown below; could you offer any advice?
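In case it is useful for comparison, here is a minimal sketch of how one might launch the same run with several seeds to gauge run-to-run variance; it assumes that baselines.ve_run accepts the same --seed flag as the upstream baselines.run entry point (an assumption about this fork).

    import subprocess

    # Launch the HER + value-ensemble run with several seeds so the curves can
    # be compared; the --seed flag is hypothetical here, mirroring baselines.run.
    for seed in (0, 1, 2):
        subprocess.run(
            [
                "python", "-m", "baselines.ve_run",
                "--alg=her",
                "--env=HandReach-v0",
                "--num_timesteps=4000000",
                "--size_ensemble=3",
                f"--seed={seed}",
                f"--log_path=./data/test_handreach_seed{seed}",
            ],
            check=True,
        )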

Logging to ./data/test_HandReach
Training her on goal:HandReach-v0 with arguments
{'size_ensemble': 3}
before mpi_fork: rank 0 num_cpu
after mpi_fork: rank 0 num_cpu 1
Creating a DDPG agent with action space 20 x 1.0...
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_disagreement_fun_name: std
_hidden: 256
_layers: 3
_max_u: 1.0
_n_candidates: 1000
_network_class: baselines.her.actor_critic:ActorCritic
_noise_eps: 0.2
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_random_eps: 0.3
_relative_goals: False
_replay_k: 4
_replay_strategy: future
_rollout_batch_size: 2
_size_ensemble: 3
_test_with_polyak: False
_ve_batch_size: 1000
_ve_buffer_size: 1000000
_ve_lr: 0.001
_ve_replay_k: 4
_ve_replay_strategy: none
_ve_use_Q: True
_ve_use_double_network: True
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'buffer_size': 1000000, 'hidden': 256, 'layers': 3, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'polyak': 0.95, 'batch_size': 256, 'Q_lr': 0.001, 'pi_lr': 0.001, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'action_l2': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 'ddpg', 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'sample_transitions': <function make_sample_her_transitions.._sample_her_transitions at 0x7f848c173a60>, 'gamma': 0.98, 'bc_loss': 0, 'q_filter': 0, 'num_demo': 100, 'demo_batch_size': 128, 'prm_loss_weight': 0.001, 'aux_loss_weight': 0.0078, 'info': {'env_name': 'HandReach-v0'}}
demo_batch_size: 128
env_name: HandReach-v0
env_type: goal
gamma: 0.98
gs_params: {'n_candidates': 1000, 'disagreement_fun_name': 'std'}
make_env: <function prepare_params..make_env at 0x7f848c260f28>
n_batches: 40
n_cycles: 50
n_epochs: 800
n_test_rollouts: 10
num_cpu: 1
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
total_timesteps: 4000000
ve_n_batches: 100
ve_params: {'size_ensemble': 3, 'buffer_size': 1000000, 'lr': 0.001, 'batch_size': 1000, 'use_Q': True, 'use_double_network': True, 'hidden': 256, 'layers': 3, 'norm_eps': 0.01, 'norm_clip': 5, 'max_u': 1.0, 'clip_obs': 200.0, 'relative_goals': False, 'input_dims': {'o': 63, 'u': 20, 'g': 15, 'info_is_success': 1}, 'T': 50, 'scope': 've', 'rollout_batch_size': 2, 'subtract_goals': <function simple_goal_subtract at 0x7f848c260158>, 'clip_pos_returns': True, 'clip_return': 49.99999999999996, 'sample_transitions': <function make_sample_her_transitions.._sample_her_transitions at 0x7f848c173ae8>, 'gamma': 0.98, 'polyak': 0.95}
Training...

| ddpg/stats_g/mean | 0.673 |
| ddpg/stats_g/std | 0.0189 |
| ddpg/stats_o/mean | 0.31 |
| ddpg/stats_o/std | 0.7 |
| epoch | 0 |
| test/episode | 20 |
| test/mean_Q | -2.89 |
| test/success_rate | 0 |
| test/sum_rewards | -49 |
| test/timesteps | 1e+03 |
| time_eval | 1.38 |
| time_rollout | 18.1 |
| time_train | 25.7 |
| time_ve | 311 |
| timesteps | 5e+03 |
| train/actor_loss | -1.62 |
| train/critic_loss | 0.0384 |
| train/episode | 100 |
| train/success_rate | 0 |
| train/sum_rewards | -49 |
| train/timesteps | 5e+03 |
| ve/loss | 0.00142 |
| ve/stats_disag/mean | 0.1 |
| ve/stats_disag/std | 0.0299 |
| ve/stats_g/mean | 0.672 |
| ve/stats_g/std | 0.0195 |
| ve/stats_o/mean | 0.302 |
| ve/stats_o/std | 0.701 |
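
The success-rate trend is easier to judge when plotted over all epochs rather than read off individual log blocks. Below is a minimal sketch, assuming the logger writes a progress.csv under the --log_path directory with the same column names as the console table above (timesteps, test/success_rate); both are assumptions about this fork's logging setup.

    import csv
    import matplotlib.pyplot as plt

    # Read the logger's CSV output and plot test success rate against timesteps.
    timesteps, success = [], []
    with open("./data/test_handreach/progress.csv") as f:
        for row in csv.DictReader(f):
            if row.get("test/success_rate"):
                timesteps.append(float(row["timesteps"]))
                success.append(float(row["test/success_rate"]))

    plt.plot(timesteps, success)
    plt.xlabel("timesteps")
    plt.ylabel("test/success_rate")
    plt.title("HandReach-v0, size_ensemble=3")
    plt.savefig("handreach_success_rate.png")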
