
Offline version of AWR #5

Open
FineArtz opened this issue May 22, 2021 · 1 comment

Comments

@FineArtz
Hi, I am trying to modify AWR into an offline (fully off-policy) version. The paper states that one can simply treat the dataset as the replay buffer and that no further modifications are needed. However, I notice that if I remove the sampling in rl_agent.train, line 105 in rl_agent.py:

train_return, train_path_count, new_sample_count = self._rollout_train(self._samples_per_iter)

then new_sample_count stays 0, so the number of update steps is also 0.

Could you point out the proper way to modify the code to obtain offline AWR?

@xbpeng (Owner) commented May 25, 2021

You can just change the code so that the number of update steps does not depend on new_sample_count, e.g. replace the current computation

critic_steps = int(np.ceil(self._critic_steps * new_sample_count / self._samples_per_iter))

with a constant.
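
A minimal sketch of that change, assuming _critic_steps and _actor_steps are the per-iteration hyperparameters already defined in rl_agent.py and that the actor step count is computed in the same way:

# Offline variant: use the configured step counts directly instead of
# scaling them by new_sample_count, which stays 0 when no new rollouts
# are collected.
critic_steps = self._critic_steps
actor_steps = self._actor_steps

With this change, each training iteration performs a fixed number of critic and actor updates on the fixed dataset, regardless of how many new samples were collected.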
