In the script sac_continuous_action.py, the get_action function in the Actor class returns action, log_prob, mean. action and log_prob are used, but mean is never used. Is there a reason to return that value when it's never used in the code? As a newcomer, I find it a little confusing why it is needed.
In the original implementation, the mean is used for deterministic evaluation of the agent.
Intuitively, using the mean corresponds to the greediest policy and would typically result in the best performance.
CleanRL reports the episodic return collected during training (i.e. with stochastic action sampling), so the mean is unused there; it is kept for compatibility with the original implementation.
Leaving mean in the code should also make it easier for researchers who build on top of it to access it directly if they need it for their experiments or evaluation.
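To illustrate the difference between the sampled action and the mean, here is a minimal, dependency-free sketch. It is not CleanRL's code: it assumes a scalar tanh-squashed Gaussian policy, omits the action scaling and bias a real actor would apply, and the select_action helper and its deterministic flag are hypothetical names introduced only for this example.

```python
import math
import random


def get_action(mean, log_std, rng):
    """Scalar tanh-squashed Gaussian policy head (illustrative only).

    Returns (action, log_prob, mean_action), mirroring the three-value
    return discussed in this issue.
    """
    std = math.exp(log_std)
    # Reparameterised sample: x = mean + std * eps, with eps ~ N(0, 1)
    x = mean + std * rng.gauss(0.0, 1.0)
    action = math.tanh(x)
    # Gaussian log-density of the pre-squash sample x
    log_prob = -0.5 * ((x - mean) / std) ** 2 - log_std - 0.5 * math.log(2 * math.pi)
    # Change-of-variables correction for the tanh squashing
    log_prob -= math.log(1.0 - action ** 2 + 1e-6)
    # Deterministic ("greedy") action: squash the mean directly, no noise
    mean_action = math.tanh(mean)
    return action, log_prob, mean_action


def select_action(mean, log_std, rng, deterministic=False):
    """Hypothetical helper: sampled action for training, mean for evaluation."""
    action, _, mean_action = get_action(mean, log_std, rng)
    return mean_action if deterministic else action
```

With deterministic=True the helper returns the same action for the same policy output on every call, which is what an evaluation loop would rely on; with the default it returns a fresh stochastic sample, as during training.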
Problem Description
In the script sac_continuous_action.py, the get_action function in the Actor class returns action, log_prob, mean. action and log_prob are used, but mean is never used. Is there a reason to return that value when it's never used in the code? As a newcomer, I find it a little confusing why it is needed.
Checklist
poetry install (see CleanRL's installation guideline).
Current Behavior
Works as expected
Expected Behavior
Works as expected
Possible Solution
Remove the mean returned in the get_action function in the Actor class.