Upgrade gym #613

Merged: 33 commits into thu-ml:master on Jun 27, 2022

@ycheng517 (Contributor) commented Apr 27, 2022

  • I have marked all applicable categories:
    • exception-raising fix
    • algorithm implementation fix
    • documentation modification
    • new feature
  • I have reformatted the code using make format (required)
  • I have checked the code using make commit-checks (required)
  • If applicable, I have mentioned the relevant/related issue(s)
  • If applicable, I have listed every item in this Pull Request below

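Fixes some deprecation warnings due to new changes in gym version 0.23:

- use `env.np_random.integers` instead of `env.np_random.randint`
- support `seed` and `return_info` arguments for reset (addresses #605)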

@codecov-commenter commented Apr 27, 2022

Codecov Report

Merging #613 (edfcbcf) into master (aba2d01) will decrease coverage by 0.44%.
The diff coverage is 76.76%.

@@            Coverage Diff             @@
##           master     #613      +/-   ##
==========================================
- Coverage   93.69%   93.25%   -0.45%     
==========================================
  Files          72       72              
  Lines        4805     4890      +85     
==========================================
+ Hits         4502     4560      +58     
- Misses        303      330      +27     
Flag        Coverage Δ
unittests   93.25% <76.76%> (-0.45%) ⬇️

Impacted Files                    Coverage Δ
tianshou/env/worker/ray.py        76.59% <50.00%> (-8.78%) ⬇️
tianshou/env/worker/dummy.py      76.47% <53.84%> (-12.82%) ⬇️
tianshou/env/worker/subproc.py    87.02% <65.11%> (-7.06%) ⬇️
tianshou/env/worker/base.py       64.81% <66.66%> (+1.85%) ⬆️
tianshou/env/pettingzoo_env.py    90.00% <77.77%> (-1.49%) ⬇️
tianshou/env/venv_wrappers.py     85.71% <83.33%> (+0.52%) ⬆️
tianshou/env/venvs.py             94.07% <92.85%> (-0.33%) ⬇️
tianshou/data/collector.py        94.24% <100.00%> (+0.42%) ⬆️

@Trinkle23897 (Collaborator) commented Apr 27, 2022

I'll take a look later today. Thanks for the contribution anyway!

cc @ultmaster

@Trinkle23897 linked an issue Apr 27, 2022 that may be closed by this pull request

Resolved review threads: test/modelbased/test_ppo_icm.py (outdated), setup.py, tianshou/env/pettingzoo_env.py (outdated)

@ycheng517 (Contributor Author) commented

@Trinkle23897 can you please take another look at this PR? I've implemented your suggested changes and also made reset support return_info and seed in the venvs.

Comment on lines 202 to 208
    self.workers[i].send(None, **kwargs)
ret_list = [self.workers[i].recv() for i in id]

if "return_info" in kwargs and kwargs["return_info"]:
    obs_list = [r[0] for r in ret_list]
else:
    obs_list = ret_list

@Trinkle23897 (Collaborator) commented May 14, 2022

Is it possible to check it by type? Or do you have any other better approach?

if isinstance(ret_list[0], (tuple, list)) and len(ret_list[0]) == 2 and isinstance(ret_list[0][1], dict):
  # return obs, info
# return obs

I know it's a little confusing, since the observation itself may also be a tuple. However, I personally don't like this part of gym's API. Users usually don't change the environment's return type partway through a run, so the return_info argument belongs in __init__ rather than in the reset function.

For this reason, envpool takes a gym_reset_return_info option at initialization. However, the kwargs-based check above cannot support envpool. If possible, would you please add a test case for envpool? It only affects the venv wrapper part, though.

P.S. gym will eventually remove this argument from the reset function.
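
For readers following the thread, the type-based check suggested above can be sketched in isolation. This is a minimal illustration, not the code that was merged: the helper names `looks_like_obs_info` and `split_reset_results` are hypothetical, and plain Python lists stand in for real observations.

```python
from typing import Any, Dict, List, Tuple


def looks_like_obs_info(ret: Any) -> bool:
    """Heuristic: a 2-element tuple/list whose second item is a dict."""
    return (
        isinstance(ret, (tuple, list))
        and len(ret) == 2
        and isinstance(ret[1], dict)
    )


def split_reset_results(
    ret_list: List[Any],
) -> Tuple[List[Any], List[Dict[str, Any]]]:
    """Split per-worker reset results into observations and infos.

    If the first result does not look like (obs, info), every result is
    treated as a bare observation and the info list is left empty.
    """
    if ret_list and looks_like_obs_info(ret_list[0]):
        return [r[0] for r in ret_list], [r[1] for r in ret_list]
    return list(ret_list), []


# Workers that returned (obs, info) pairs.
obs, infos = split_reset_results(
    [([0.0, 1.0], {"k": 1}), ([2.0, 3.0], {"k": 2})]
)
# Workers that returned bare observations.
bare_obs, no_infos = split_reset_results([[0.0, 1.0], [2.0, 3.0]])
```

The heuristic is ambiguous by construction: an observation that is itself a 2-element tuple ending in a dict would be misread as (obs, info), which is the confusion the comment above points out.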

Collaborator commented

Also, we don't support tuple observation spaces and recommend using a dict space instead. That is quite a strong assumption you can rely on here. If anyone uses a tuple observation and hits an exception, we can raise a hint for them, something like "please change your observation space from tuple to array or dict space".
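
A hedged sketch of what such a hint could look like follows. The function name `check_observation_type`, the choice of `TypeError`, and the exact message are assumptions made for illustration; the thread does not show the wording or exception type used in the actual commit.

```python
from typing import Any


def check_observation_type(obs: Any) -> Any:
    """Reject tuple observations with an actionable message (hypothetical helper)."""
    if isinstance(obs, tuple):
        raise TypeError(
            "Tuple observation space is not supported. "
            "Please change your observation space from tuple to array or dict space."
        )
    return obs


# Dict observations pass through unchanged.
checked = check_observation_type({"position": [0.0, 1.0]})
```

Failing early with a message like this lets the rest of the code assume that a tuple returned from reset always means (obs, info).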

Contributor Author commented

I think checking by type is a workable approach; I adopted it in 364b46e.
I also added a test for envpool and an exception for tuple observations in the same commit.

Comment on lines 205 to 208
has_infos = isinstance(ret_list[0], (tuple, list)) and len(
    ret_list[0]
) == 2 and isinstance(ret_list[0][1], dict)
if has_infos:

@Trinkle23897 (Collaborator) commented May 21, 2022

Can we use self.has_info to reduce checking overhead for all classes?

if self.has_info is None:
    self.has_info = (
        isinstance(ret_list[0], (tuple, list))
        and len(ret_list[0]) == 2
        and isinstance(ret_list[0][1], dict)
    )
if self.has_info:
    ...
else:
    ...

@ycheng517 (Contributor Author) commented May 22, 2022

Implemented in 662a68a, although I used if hasattr(self, "reset_returns_info") instead, because mypy complains about typing in a bunch of places if I do self.has_info is None. One caveat: with this change, users may be surprised if they call reset with return_info set to True and then later with False, but that seems unlikely to happen.
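
The caching idea and the caveat mentioned above can be illustrated with a small standalone class. This is a sketch under stated assumptions: `CachedResetSplitter` is a hypothetical name, and it follows the reviewer's `has_info is None` variant rather than the `hasattr` variant used in the commit.

```python
from typing import Any, Dict, List, Optional, Tuple


class CachedResetSplitter:
    """Hypothetical helper: decide once whether reset returns (obs, info)."""

    def __init__(self) -> None:
        self.has_info: Optional[bool] = None  # unknown until the first reset

    def split(
        self, ret_list: List[Any]
    ) -> Tuple[List[Any], List[Dict[str, Any]]]:
        if self.has_info is None:
            first = ret_list[0]
            self.has_info = (
                isinstance(first, (tuple, list))
                and len(first) == 2
                and isinstance(first[1], dict)
            )
        if self.has_info:
            return [r[0] for r in ret_list], [r[1] for r in ret_list]
        return list(ret_list), []


splitter = CachedResetSplitter()
# First reset returns (obs, info), so has_info is cached as True.
first_obs, first_infos = splitter.split([([0.1, 0.2, 0.3], {"seed": 0})])
# A later reset returns a bare observation, but the stale flag still
# splits it: the first two elements are taken as "obs" and "info".
stale_obs, stale_infos = splitter.split([[0.5, 0.7, 0.9]])
```

The second call shows the surprise: the cached flag saves the repeated type check, but it silently mis-splits results if the return shape changes between resets.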

@Trinkle23897 (Collaborator) commented May 22, 2022

Have you ever tested with a full training pipeline?

There are some options:

  • venv.reset only returns obs, no change to collector
  • venv.reset only returns (obs, info), change collector to adapt
  • venv.reset can return either (obs) or (obs, info), collector needs to handle both cases

I personally favor the second approach.

@ycheng517 (Contributor Author) commented May 24, 2022

> Have you ever tested with a full training pipeline?

You mean running it with one of the scripts in the examples folder? I haven't done that yet, but I can.

I'm not sure the second approach would work in all cases, since some environments currently don't return info along with obs. It would also be jumping ahead of the planned Gym API changes. If the third option also sounds good to you, I can implement that.

@Trinkle23897 (Collaborator) commented May 24, 2022

I mean that if there's no info, we attach an empty dict. This can save a lot of code compared with option 3.
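
The "attach an empty dict" variant of option 2 could look roughly like the sketch below. The function name `normalize_reset_result` is hypothetical, and the detection rule is the same type-based heuristic discussed earlier in the thread, so it carries the same assumption that observations are never tuples.

```python
from typing import Any, Dict, Tuple


def normalize_reset_result(ret: Any) -> Tuple[Any, Dict[str, Any]]:
    """Always return (obs, info), attaching an empty dict when info is absent."""
    if (
        isinstance(ret, (tuple, list))
        and len(ret) == 2
        and isinstance(ret[1], dict)
    ):
        return ret[0], ret[1]
    return ret, {}


# An environment that returns only an observation.
obs_a, info_a = normalize_reset_result([1.0, 2.0, 3.0])
# An environment that returns (obs, info).
obs_b, info_b = normalize_reset_result(([4.0, 5.0, 6.0], {"seed": 3}))
```

With this shape, a caller can always unpack two values and never needs a separate code path for environments that lack info, which is the code saving the comment refers to.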

Collaborator commented

> If the third option also sounds good to you, I can implement that.

I'm ok with it.

Contributor Author commented

Added support to the collector for option 3 in 75ecd18.

I also ran python examples/box2d/lunarlander_dqn.py, and it works fine.

obs = self.preprocess_fn(obs=obs,
                         env_id=np.arange(self.env_num)).get("obs", obs)
if self.gym_reset_return_info:
    obs, info = self.env.reset(return_info=True)

Collaborator commented

I think you cannot do this, at least envpool will fail

Contributor Author commented

See my other comment just now, #613 (comment). With my proposal we shouldn't run into this issue. I'm open to other ideas as well.

Contributor Author commented

should be fixed now

Comment on lines 161 to 162
def _reset_env_with_ids(
    self, local_ids: Union[List[int], np.ndarray], global_ids: Union[List[int],

Collaborator commented

Could you please add a test to check the correctness of info storage order?

Contributor Author commented

can do

Contributor Author commented

Added some checks on info storage to test_collector in c2ff71f; hopefully that's what you were looking for.

def collect(
    self,
    n_step: Optional[int] = None,
    n_episode: Optional[int] = None,
    random: bool = False,
    render: Optional[float] = None,
    no_grad: bool = True,
    gym_reset_kwargs: Optional[Dict[str, Any]] = None,

Contributor commented

Any chance this can be a callable? Sometimes I want different reset kwargs for different environments, e.g., different seeds.
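
To make the request concrete, here is one way a dict-or-callable argument could be resolved into per-environment kwargs. This is purely illustrative and was not part of this PR: `resolve_reset_kwargs` and the `ResetKwargs` alias are hypothetical names, and the callable is assumed to take an environment id and return a dict.

```python
from typing import Any, Callable, Dict, List, Optional, Union

# Either one dict shared by all environments, or a function of the env id.
ResetKwargs = Union[Dict[str, Any], Callable[[int], Dict[str, Any]]]


def resolve_reset_kwargs(
    gym_reset_kwargs: Optional[ResetKwargs], env_ids: List[int]
) -> List[Dict[str, Any]]:
    """Return one kwargs dict per environment id."""
    if gym_reset_kwargs is None:
        return [{} for _ in env_ids]
    if callable(gym_reset_kwargs):
        return [dict(gym_reset_kwargs(i)) for i in env_ids]
    # A plain dict is copied so each environment gets its own instance.
    return [dict(gym_reset_kwargs) for _ in env_ids]


# Different seed per environment via a callable.
per_env = resolve_reset_kwargs(lambda i: {"seed": 100 + i}, [0, 1, 2])
# Same kwargs for every environment via a plain dict.
shared = resolve_reset_kwargs({"seed": 7}, [0, 1])
```

Accepting both forms would keep the existing dict usage working while allowing per-environment seeds.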

Contributor Author commented

I'd like to just get this PR wrapped up, and introduce additional functionality in future PRs

Resolved review thread: tianshou/env/worker/subproc.py (outdated)

@Trinkle23897 (Collaborator) left a comment

LGTM

@Trinkle23897 (Collaborator) commented

BTW, it would be better to update the docs to show how to use this return_info feature, but we can do that in a follow-up PR.

@Trinkle23897 merged commit 43792bf into thu-ml:master Jun 27, 2022
BFAnas pushed a commit to BFAnas/tianshou that referenced this pull request May 5, 2024
fixes some deprecation warnings due to new changes in gym version 0.23:
- use `env.np_random.integers` instead of `env.np_random.randint`
- support `seed` and `return_info` arguments for reset (addresses thu-ml#605)
Successfully merging this pull request may close these issues.

Setting seed, return_info, options for reset
5 participants