-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade gym #613
Upgrade gym #613
Conversation
Codecov Report
@@ Coverage Diff @@
## master #613 +/- ##
==========================================
- Coverage 93.69% 93.25% -0.45%
==========================================
Files 72 72
Lines 4805 4890 +85
==========================================
+ Hits 4502 4560 +58
- Misses 303 330 +27
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
I'll take a look later today. Thanks for the contribution anyway! cc @ultmaster |
This reverts commit 558a5ea.
@Trinkle23897 can you please take another look at this PR? I've implemented your suggested changes and also made |
tianshou/env/venvs.py
Outdated
self.workers[i].send(None, **kwargs) | ||
ret_list = [self.workers[i].recv() for i in id] | ||
|
||
if "return_info" in kwargs and kwargs["return_info"]: | ||
obs_list = [r[0] for r in ret_list] | ||
else: | ||
obs_list = ret_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to check it by type? Or do you have any other better approach?
if isinstance(ret_list[0], (tuple, list)) and len(ret_list[0]) == 2 and isinstance(ret_list[0][1], dict):
# return obs, info
# return obs
I know it's a little bit confusing since the observation may also be a tuple. However, I personally don't like this gym's API. Usually the user won't change the environment return data type during the whole process, so the return_info
argument should be placed in __init__
instead of reset
function.
From this point of view, envpool uses gym_reset_return_info
option in initialization. However, this method cannot support envpool. If possible, would you please add a test case for envpool? Though it only affects the venv wrapper part.
P.S. gym will finally remove this argument in reset function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, we don't support tuple observation space and recommend using dict space. This is quite a strong assumption you can use here. If anyone uses tuple observation and meet exception, we can raise the corresponding hint for them by saying something like please change your observation space from tuple to array or dict space
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think checking by type is a workable approach, adopted it in 364b46e .
Also added test for envpool, and added exception for using tuple observation in the same commit
tianshou/env/venvs.py
Outdated
has_infos = isinstance(ret_list[0], (tuple, list)) and len( | ||
ret_list[0] | ||
) == 2 and isinstance(ret_list[0][1], dict) | ||
if has_infos: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use self.has_infos
to reduce checking overhead for all classes?
if self.has_info is None:
self.has_info = isinstance(ret_list[0], (tuple, list)) and len(
ret_list[0]
) == 2 and isinstance(ret_list[0][1], dict)
if self.has_info:
...
else:
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
implemented in 662a68a. (although I did if hasattr(self, "reset_returns_info")
, as mypy complains about typing in a bunch of places if I do self.has_info is None
). A warning that with this change, users may experience a surprise if they call reset
with return_info
set to True
and then False
, but maybe this isn't likely to happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you ever tested with a full training pipeline?
There are some options:
- venv.reset only returns obs, no change to collector
- vecv.reset only returns (obs, info), change collector to adapt
- venv.reset can return either (obs) or (obs, info), collector needs to handle both cases
I personally favor the second approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you ever tested with a full training pipeline?
You mean run it with one of the scripts in the examples folder? Haven't done that yet but can do.
I'm not sure if the second approach would work in all cases, since there could be some environments that don't return info along with obs right now. It would also be jumping ahead of the planned Gym API changes. If the 3rd option also sound good with you, I can implement that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if there's no info, we attach an empty dict. This can save a lot of code compared with 3.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the 3rd option also sound good with you, I can implement that.
I'm ok with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added support to collector for option 3 in 75ecd18
Also I ran python examples/box2d/lunarlander_dqn.py
and it works fine
tianshou/data/collector.py
Outdated
obs = self.preprocess_fn(obs=obs, | ||
env_id=np.arange(self.env_num)).get("obs", obs) | ||
if self.gym_reset_return_info: | ||
obs, info = self.env.reset(return_info=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you cannot do this, at least envpool will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my other comment just now #613 (comment) . With my proposal we shouldn't run into this issue. Open to other ideas as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be fixed now
tianshou/data/collector.py
Outdated
def _reset_env_with_ids( | ||
self, local_ids: Union[List[int], np.ndarray], global_ids: Union[List[int], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a test to check the correctness of info storage order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some checks to info storage in test_collector
in c2ff71f , hopefully that's what you were looking for.
def collect( | ||
self, | ||
n_step: Optional[int] = None, | ||
n_episode: Optional[int] = None, | ||
random: bool = False, | ||
render: Optional[float] = None, | ||
no_grad: bool = True, | ||
gym_reset_kwargs: Optional[Dict[str, Any]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance this can be a callable? Sometimes I want different reset kwargs for different environments, e.g., different seeds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to just get this PR wrapped up, and introduce additional functionality in future PRs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
BTW it's better to update the docs to show how to use this |
fixes some deprecation warnings due to new changes in gym version 0.23: - use `env.np_random.integers` instead of `env.np_random.randint` - support `seed` and `return_info` arguments for reset (addresses thu-ml#605)
make format
(required)make commit-checks
(required)fixes some deprecation warnings due to new changes in gym version 0.23:
env.np_random.integers
instead ofenv.np_random.randint
seed
andreturn_info
arguments for reset (addresses Setting seed, return_info, options for reset #605)