Naming and typing improvements in Actor/Critic/Policy forwards #1032
Conversation
@MischaPanch As of now, if you agree with the naming, there are other places, as you mentioned, where `logits` should be replaced. I will continue once we agree the naming is fine for you.
Codecov Report. Attention: Patch coverage is …

```
@@            Coverage Diff             @@
##           master    #1032      +/-   ##
==========================================
+ Coverage   88.15%   88.21%   +0.05%
==========================================
  Files         100      100
  Lines        8180     8297     +117
==========================================
+ Hits         7211     7319     +108
- Misses        969      978       +9
```
Hi guys, I think `logits` should also be renamed in these places:

- `ImitationPolicy` class: tianshou/tianshou/policy/imitation/base.py, lines 74 to 78 in 4756ee8
- `DiscreteSACPolicy` class: tianshou/tianshou/policy/modelfree/discrete_sac.py, lines 109 to 115 in 4756ee8
- `SACPolicy` class: tianshou/tianshou/policy/modelfree/sac.py, lines 176 to 198 in 4756ee8
- `REDQPolicy` class (also, maybe `loc_scale` should be renamed here? @MischaPanch): tianshou/tianshou/policy/modelfree/redq.py, lines 147 to 166 in 4756ee8
- `Net` class: tianshou/tianshou/utils/net/common.py, lines 258 to 284 in 4756ee8
- `Actor` class: tianshou/tianshou/utils/net/continuous.py, lines 74 to 85 in 4756ee8
- `Actor` class: tianshou/tianshou/utils/net/discrete.py, lines 69 to 82 in 4756ee8

So basically for all actors and all policies that use actors. It also seems to affect the `Net` class, whose exact relation to policies and actors I still have to figure out. Any objections? @MischaPanch @opcode81
What's wrong with `loc_scale`?
Looks good, go ahead! But you should also rename things inside of some `BatchProtocol`s, as partially discussed above.
Because the logical flow I have seen is usually that the output of the policy network goes straight into the distribution:

```python
loc_scale, h = self.actor(batch.obs, state=state, info=batch.info)
loc, scale = loc_scale
dist = Independent(Normal(loc, scale), 1)
```

I just wanted to keep consistency.
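The flow described above can be sketched end to end with a toy actor. The class name and sizes below are purely illustrative (this is not tianshou's actual `ActorProb`); only the `(loc, scale) -> Independent(Normal(...), 1)` pattern is taken from the snippet.

```python
# Toy sketch of the loc_scale -> Gaussian flow described above.
# ToyGaussianActor is a hypothetical stand-in, not a tianshou class.
import torch
from torch.distributions import Independent, Normal


class ToyGaussianActor(torch.nn.Module):
    def __init__(self, obs_dim: int, act_dim: int) -> None:
        super().__init__()
        self.mu = torch.nn.Linear(obs_dim, act_dim)
        self.sigma = torch.nn.Linear(obs_dim, act_dim)

    def forward(self, obs, state=None, info=None):
        loc = self.mu(obs)
        # softplus keeps the scale strictly positive
        scale = torch.nn.functional.softplus(self.sigma(obs)) + 1e-5
        return (loc, scale), state


actor = ToyGaussianActor(obs_dim=4, act_dim=2)
obs = torch.randn(8, 4)
loc_scale, h = actor(obs)
loc, scale = loc_scale
# Independent(..., 1) turns the per-dimension Normals into one joint
# diagonal Gaussian over the whole action vector.
dist = Independent(Normal(loc, scale), 1)
action = dist.sample()
print(action.shape)  # torch.Size([8, 2])
```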
In this case it is not passed to a general action dist, but to a Gaussian, where it is used as loc and scale. So here the name `loc_scale` is appropriate.
Hmm, I bet that in the case of a continuous actor (tianshou/tianshou/utils/net/continuous.py, lines 74 to 85 in 4756ee8), the output of line 85 that is named `logits` is also a mean and standard deviation, no? But it is still called `logits`.
Here you don't know how the output of `forward` will be used (into which kind of continuous action dist it would go), so you shouldn't call it `loc_scale`.

If tianshou only supported Gaussians for continuous actors, then the output of the continuous actor could safely carry that name. The fact that tianshou right now implicitly depends on that assumption is a separate problem.

In this PR, the goal should be to make the naming at least a bit better and consistent with the current interfaces. If you know that the variable has to be of the form `(loc, scale)`, then name it accordingly.

That's what I mean when I say the var names should reflect semantics as precisely as possible.
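The distinction argued above can be made concrete with a toy snippet (illustrative names and sizes, not tianshou's API): the same "output of `forward`" slot genuinely holds `logits` only when it feeds a `Categorical`, while for a Gaussian it is a `(loc, scale)` pair, so a generic name is the only one the interface can justify.

```python
# Illustration of the naming argument: the semantics of the actor output
# depend on which distribution consumes it.
import torch
from torch.distributions import Categorical, Independent, Normal

# Discrete case: raw scores feeding a Categorical really are logits.
action_dist_input = torch.randn(8, 4)  # (batch, num_actions)
discrete_dist = Categorical(logits=action_dist_input)

# Continuous Gaussian case: the same "dist input" slot is a (loc, scale)
# pair; calling it `logits` would be wrong, and calling it `loc_scale`
# is only valid once a Gaussian is assumed.
loc, scale = torch.zeros(8, 2), torch.ones(8, 2)
continuous_dist = Independent(Normal(loc, scale), 1)

print(discrete_dist.sample().shape)    # torch.Size([8])
print(continuous_dist.sample().shape)  # torch.Size([8, 2])
```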
Understood. So wherever the output goes into a general distribution I will use the generic name, but where a more accurate name (e.g. `loc_scale`) already applies, I will keep it.
Force-pushed from 26fb7ed to 51f393a ("…anges. Also changed docstrings.")
Thanks @arnaujc91, this brings us an important step towards removing wrong names across the source code!

The issue is not fully complete at this point, since all kinds of `BatchProtocol` still carry `logits`, but it can't be fully completed before introducing an `Algorithm` abstraction as part of release 2.0.0 anyway. So for the 1.x.y versions, this is likely as far as we can push it.

I will address the comments myself and merge this.
tianshou/policy/modelfree/pg.py (outdated)

```python
action_dist_input_BD, hidden_BH = self.actor(batch.obs, state=state, info=batch.info)
# in the case that self.action_type == "discrete", the dist should always be Categorical, and D=A
# therefore action_dist_input_BD is equivalent to logits_BA
if self.action_type == "discrete":
```
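One way to see why the if-else on `self.action_type` becomes unnecessary: with the unified `dist_fn` interface this PR introduces (a single argument in both the continuous and discrete case), the branching moves into the user-supplied callable. The sketch below is a hedged illustration of that idea with made-up function names, not tianshou's actual code.

```python
# Hypothetical sketch of a unified dist_fn: one callable, one argument,
# regardless of whether the action space is discrete or continuous.
import torch
from torch.distributions import Categorical, Distribution, Independent, Normal


def discrete_dist_fn(action_dist_input: torch.Tensor) -> Distribution:
    # the single argument is a logits tensor
    return Categorical(logits=action_dist_input)


def continuous_dist_fn(loc_scale: tuple[torch.Tensor, torch.Tensor]) -> Distribution:
    # the single argument is a (loc, scale) pair
    loc, scale = loc_scale
    return Independent(Normal(loc, scale), 1)


# The policy's forward can now call dist_fn(actor_output) uniformly,
# without inspecting self.action_type:
d1 = discrete_dist_fn(torch.randn(2, 5))
d2 = continuous_dist_fn((torch.zeros(2, 3), torch.ones(2, 3)))
print(d1.sample().shape)  # torch.Size([2])
print(d2.sample().shape)  # torch.Size([2, 3])
```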
This if-else seems weird and unnecessary, I'll look into it
Required a breaking change to fix; see d0d9d45. It also required a change in the (interiors of the) high-level interfaces.

@opcode81 @maxhuettenrauch please have a look at that commit. Max, I believe you had written a todo on how to improve the typing of the dist-fn; this commit addresses it.
I ran several low-level and high-level examples, things work. I think I haven't missed anything :)
```diff
@@ -29,7 +29,7 @@ class TD3Policy(DDPGPolicy[TTD3TrainingStats], Generic[TTD3TrainingStats]):
     """Implementation of TD3, arXiv:1802.09477.

     :param actor: the actor network following the rules in
-        :class:`~tianshou.policy.BasePolicy`. (s -> logits)
+        :class:`~tianshou.policy.BasePolicy`. (s -> actions)
```
Not sure that's true; the actor might be expected to output dist inputs. Gonna look into it.
tianshou/utils/net/continuous.py (outdated)

```python
logits, hidden = self.preprocess(obs, state)
logits = self.max_action * torch.tanh(self.last(logits))
return logits, hidden
```

Suggested docstring: `"""Mapping: s -> action_values, hidden_state."""`
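The snippet above shows why `logits` is a misnomer here: the final output is a tanh-squashed action, bounded by `max_action`. A minimal standalone sketch of that squashing (toy layer sizes, not the actual `Actor` class):

```python
# Sketch of the tanh squashing from the diff above: the last layer's
# output is mapped into [-max_action, max_action], i.e. it is an
# action value, not a logit. Sizes are illustrative.
import torch

max_action = 2.0
last = torch.nn.Linear(16, 4)       # stand-in for self.last
hidden_out = torch.randn(8, 16)     # stand-in for the preprocess output
action_values = max_action * torch.tanh(last(hidden_out))

# every entry is guaranteed to lie inside the action bounds
print(bool(action_values.abs().max() <= max_action))  # True
```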
Follow naming convention
This simplifies the interfaces and the implementations
…l#1032)

Closes thu-ml#917

### Internal Improvements
- Better variable names related to model outputs (logits, dist input, etc.). thu-ml#1032
- Improved typing for actors and critics, using Tianshou classes like `Actor`, `ActorProb`, etc., instead of just `nn.Module`. thu-ml#1032
- Added interfaces for most `Actor` and `Critic` classes to enforce the presence of `forward` methods. thu-ml#1032
- Simplified `PGPolicy` forward by unifying the `dist_fn` interface (see associated breaking change). thu-ml#1032
- Use `.mode` of distribution instead of relying on knowledge of the distribution type. thu-ml#1032

### Breaking Changes
- Changed interface of `dist_fn` in `PGPolicy` and all subclasses to take a single argument in both continuous and discrete cases. thu-ml#1032

---------

Co-authored-by: Arnau Jimenez <arnau.jimenez@zeiss.com>
Co-authored-by: Michael Panchenko <m.panchenko@appliedai.de>
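The `.mode` change noted in the list above can be illustrated with plain torch distributions (requires torch >= 1.13, where the `mode` property was added): asking the distribution for its mode removes the need to branch on the distribution type (argmax for a `Categorical`, `loc` for a Gaussian). Toy values, not tianshou code.

```python
# Sketch: dist.mode works uniformly across distribution types,
# so the policy never needs to know which distribution it holds.
import torch
from torch.distributions import Categorical, Independent, Normal

cat = Categorical(logits=torch.tensor([[0.1, 2.0, -1.0]]))
gauss = Independent(Normal(torch.tensor([[0.5, -0.5]]), torch.ones(1, 2)), 1)

print(cat.mode)    # tensor([1]) -- the argmax of the logits
print(gauss.mode)  # equals loc: tensor([[ 0.5000, -0.5000]])
```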