What is the reason for returning mean in SAC get_action function if it's never used? #333

sudonymously · 2022-12-23T10:47:23Z

Problem Description

In the script sac_continuous_action.py, the get_action function in the Actor class returns action, log_prob, and mean. action and log_prob are used, but mean never is. Is there a reason to return that value when nothing in the code uses it? As a newcomer, I find it a little confusing why it is needed.

Checklist

I have installed dependencies via poetry install (see CleanRL's installation guideline).
I have checked that there is no similar issue in the repo.
I have checked the documentation site and found no relevant information in GitHub issues.

Current Behavior

Works as expected

Expected Behavior

Works as expected

Possible Solution

Remove mean from the values returned by the get_action function in the Actor class.


dosssman · 2023-01-10T05:14:54Z

Greetings.
Sorry for the late answer.

In the original implementation, the mean is used for deterministic evaluation of the agent.
Intuitively, acting on the mean corresponds to the greedy policy and should generally yield the best performance.

CleanRL reports the episodic return collected during training (i.e. with stochastic action sampling), so it never needs the mean itself. The mean is kept for compatibility with the original implementation.

Leaving mean in the code should also make it easier for researchers who build on top of it to access the mean directly if they need it for their experiments / evaluation.
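To make the distinction concrete, here is a minimal, framework-free sketch of the idea, not the actual CleanRL code. It assumes a scalar action and a tanh-squashed Gaussian policy. The names `select_action`, `deterministic`, and `rng` are hypothetical and only for illustration; the real script works on batched PyTorch tensors.

```python
import math
import random


def get_action(mean, log_std, action_scale=1.0, action_bias=0.0, rng=random):
    """Scalar sketch of a tanh-squashed Gaussian policy head.

    Returns (action, log_prob, mean_action), mirroring the three return
    values discussed in this issue.
    """
    std = math.exp(log_std)
    # Stochastic path: sample a pre-squash value, as done during training.
    x_t = rng.gauss(mean, std)
    y_t = math.tanh(x_t)
    action = y_t * action_scale + action_bias
    # Gaussian log-density of the sample ...
    log_prob = (
        -0.5 * ((x_t - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2 * math.pi)
    )
    # ... followed by the tanh change-of-variables correction.
    log_prob -= math.log(action_scale * (1 - y_t ** 2) + 1e-6)
    # Deterministic path: squash the mean itself, with no sampling involved.
    mean_action = math.tanh(mean) * action_scale + action_bias
    return action, log_prob, mean_action


def select_action(mean, log_std, deterministic=False, **kwargs):
    """Hypothetical helper: sampled action for training, mean action for evaluation."""
    action, _, mean_action = get_action(mean, log_std, **kwargs)
    return mean_action if deterministic else action
```

With `deterministic=True` the result depends only on the policy's mean, so repeated calls on the same input return the same action. That is what a separate evaluation loop would rely on.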

Hope it helps.
