This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Model-Based RL: Player #1330

Merged - 24 commits merged into tensorflow:master on Jan 8, 2019

Conversation

konradczechowski
Contributor

Utilities and scripts for:

  1. Interactive human play with an environment simulated by the world model.
  2. Recording trajectories from the real and simulated environments, using a policy trained in the MBRL pipeline.

@googlebot added the "cla: yes (PR author has signed CLA)" label on Dec 28, 2018
@konradczechowski
Contributor Author

It seems that commit 6e0177c fixes one bug in the <Python 2.7, TF_VERSION="1.12.*"> configuration (the dopamine import) and reveals another, probably older, problem (exceptions.OSError: [Errno 2] No such file or directory).

Travis before this PR: https://travis-ci.org/tensorflow/tensor2tensor/jobs/472985666
Travis after: https://travis-ci.org/tensorflow/tensor2tensor/jobs/473105651
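For context, guarding an optional dependency like dopamine usually looks something like the sketch below; the module path and fallback behaviour here are assumptions, not necessarily what commit 6e0177c actually does.

# Minimal sketch of guarding an optional dependency so that environments
# without dopamine installed do not fail at import time.
try:
  from dopamine.agents.dqn import dqn_agent
except ImportError:
  dqn_agent = None  # callers must handle the missing dependency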

Contributor

@lukaszkaiser lukaszkaiser left a comment


A number of comments - let's put in a bit of effort to make this code more unified. Thanks!

FLAGS = flags.FLAGS


def make_simulated_env(real_env, world_model_dir, hparams, random_starts):
Contributor

This does not feel specific to the player at all. Let's have unified utilities and not duplicate this function from rlmb.

Contributor Author

I've moved the initial_frame_chooser function generation to a different module; now make_simulated_env (which I renamed to make_simulated_gym_env) has little if any code repetition with rlmb train_agent(). (6a4503a)
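For illustration, a frame-chooser factory of the kind moved into the shared module could look roughly like this; the function name and signature are assumptions, not the actual rl_utils code.

import numpy as np

def make_initial_frame_chooser(starting_frames, random_starts=True):
  """Returns fn(batch_size) -> batch of initial frames for the simulated env."""
  starting_frames = np.asarray(starting_frames)

  def initial_frame_chooser(batch_size):
    if random_starts:
      indices = np.random.randint(len(starting_frames), size=batch_size)
    else:
      indices = np.zeros(batch_size, dtype=np.int64)
    return starting_frames[indices]

  return initial_frame_chooser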

Contributor Author

I've moved make_simulated_env to rl.py.

import rl_utils
from envs.simulated_batch_gym_env import FlatBatchEnv
from tensor2tensor.models.research.rl import get_policy
from tensor2tensor.rl.trainer_model_based import make_simulated_env_fn, \
Contributor

The trainer_model_based file is a binary, not a library; it should not be imported. If we have many general-purpose functions there, it's a sign we need a separate library with utilities - should we create one?

Contributor Author

I've removed the trainer_model_based imports and moved some functions from there to gym_env and rl_utils.
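In terms of imports, the change amounts to something like the following; the rl_utils module path is an assumption, while the rl.py path matches the import already shown above and the commit list below.

# Before: importing helpers from the trainer_model_based binary.
# from tensor2tensor.rl.trainer_model_based import make_simulated_env_fn

# After: importing them from library modules instead.
from tensor2tensor.models.research.rl import make_simulated_env_fn
from tensor2tensor.rl import rl_utils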

return flat_env


def last_epoch(data_dir):
Contributor

Feels like this already exists somewhere?

Contributor Author

As far as I know it does not. Epoch numbers for start_new_epoch(...) (which loads data when restarting) are defined by the trainer_model_based.py main loop (and not read from disk).
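For illustration only, a last_epoch helper consistent with the return statement shown below could scan the data directory like this; the on-disk naming scheme is an assumption.

import os
import re

def last_epoch(data_dir):
  """Infers the largest epoch number from entries named like '..._<epoch>'."""
  epochs_str = [
      name.rsplit("_", 1)[-1]
      for name in os.listdir(data_dir)
      if re.match(r".+_\d+$", name)
  ]
  assert epochs_str, "No epoch data found in {}".format(data_dir)
  return max([int(epoch_str) for epoch_str in epochs_str])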

return max([int(epoch_str) for epoch_str in epochs_str])


def load_t2t_env(hparams, data_dir, which_epoch_data=None,
Contributor

This function should probably be part of T2TBatchGymEnv as a method to allow easy initialization?

Contributor Author

load_t2t_env uses setup_env(...) (from rl_utils; it unpacks hparams and parses the game name), so I'll move setup_env() to T2TEnv (as a static method) as well.
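As a sketch of the resulting call site; the module path, method placement, and argument names are assumptions based on this thread, not the actual API.

from tensor2tensor.data_generators.gym_env import T2TGymEnv  # assumed module path

def load_t2t_env_sketch(hparams, data_dir, which_epoch_data=None):
  # Hypothetical: setup_env as a static method on the env class, then load
  # the recorded data for the requested (or latest) epoch.
  env = T2TGymEnv.setup_env(hparams, batch_size=1, max_num_noops=0)
  env.start_new_epoch(which_epoch_data or last_epoch(data_dir), data_dir)
  return env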

return t2t_env


def join_and_check(output_dir, subdirectory):
Contributor

Let's not have special wrapper functions for the os module; let's just not check whether the path exists unless strictly necessary.

Contributor Author

Checks are useful for failing faster when incorrect data is given, but I've removed it in 3a3a103.
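With the helper gone, the call site reduces to a plain join, roughly as below; the subdirectory name here is only an example.

import os

# join_and_check(output_dir, "policy") becomes simply:
policy_dir = os.path.join(output_dir, "policy")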

assert os.path.exists(path), "{} does not exist".format(path)
return path

class SimulatedEnv(Env):
Contributor

Could one of our existing SimulatedEnv classes already expose that?

Contributor Author

Good point - we want the gym.Env interface (not the "batch" one), but the FlatBatchEnv wrapper is enough here. I've cleaned it up in e6c7c08.
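For intuition, the FlatBatchEnv idea is a thin wrapper that exposes a batch env of size 1 as a plain gym.Env; a rough sketch follows, where the batch env's attribute and method names are assumptions.

import gym

class FlatBatchEnvSketch(gym.Env):
  """Exposes a batch env with batch_size == 1 as a single gym.Env."""

  def __init__(self, batch_env):
    assert batch_env.batch_size == 1
    self._batch_env = batch_env
    self.observation_space = batch_env.observation_space
    self.action_space = batch_env.action_space

  def reset(self):
    return self._batch_env.reset()[0]

  def step(self, action):
    obs, rewards, dones = self._batch_env.step([action])
    return obs[0], rewards[0], dones[0], {}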

random_starts=random_starts)


class PPOPolicyInferencer(object):
Contributor

Why a new class? Could our Policy class not provide the appropriate methods already?

Contributor Author

There are two different interfaces:

  • PolicyLearner is heavy - it requires a simulated env (not in __init__, though) and performs high-level training and evaluation.
  • PPOPolicyInferencer only loads a policy from disk and provides a low-level, non-TF interface for policy inference.

PolicyLearner also infers the policy, but does it inside the graph; the whole idea of PPOPolicyInferencer is to provide a Python API for other purposes.

For these reasons all the methods are different. We could merge these classes, but this would require removing the base_event_dir and total_num_epochs arguments from PolicyLearner.__init__ (or making them optional) and initializing the inference graph in PPOLearner.__init__ (which would be unused in the rlmb pipeline). I'm not sure that would make things clearer. We could also just move PPOPolicyInferencer to ppo_learner.py. Do you think that would suffice?
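For intuition only, the "restore a policy from disk and expose a plain Python inference call" pattern described above boils down to something like the sketch below; it is not the actual PPOPolicyInferencer API.

import numpy as np
import tensorflow as tf

class PolicyInferencerSketch(object):
  """Loads a policy checkpoint and runs inference outside any training loop."""

  def __init__(self, build_policy_logits, observation_shape, policy_dir):
    self._graph = tf.Graph()
    with self._graph.as_default():
      self._obs = tf.placeholder(tf.float32, [None] + list(observation_shape))
      self._logits = build_policy_logits(self._obs)
      self._sess = tf.Session(graph=self._graph)
      tf.train.Saver().restore(
          self._sess, tf.train.latest_checkpoint(policy_dir))

  def infer(self, observations):
    logits = self._sess.run(self._logits, {self._obs: observations})
    return np.argmax(logits, axis=-1)  # greedy actions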


"""

Run this script with the same parameters as trainer_model_based.py /
Contributor

I think recording should be an option passed to the player -- let's not make another binary just for that. Let's merge it with player.py and add a flag there to set up recording.
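In flag terms the suggestion is roughly the following in player.py; the flag names and help strings are illustrative, not the actual flags.

# Using the same `flags` module as the FLAGS = flags.FLAGS line shown above.
flags.DEFINE_boolean("record", False,
                     "Whether to record the played episodes to video.")
flags.DEFINE_string("record_dir", "",
                    "Directory to write recordings to.")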

@konradczechowski
Contributor Author

@lukaszkaiser Thanks for all the comments! I'll walk through them.

@konradczechowski
Contributor Author

@lukaszkaiser The discussions about evaluation (and whether we need record_ppo.py) and about merging PPOLearner with PPOPolicyInferencer are still open. I've applied the rest of the changes you suggested; what do you think of these?

We plan to work on evaluation next week, so it would be convenient for us to merge this PR first if possible. If adding the record_ppo.py file is a problem, you can remove it from this PR.

@konradczechowski
Contributor Author

I've corrected the relative imports (which broke the Python 2 build) and rebased on current master.
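For reference, the import fix amounts to replacing the relative imports shown earlier with absolute ones, roughly as below; the exact target module paths are assumptions.

# Relative imports that broke the Python 2 build:
# import rl_utils
# from envs.simulated_batch_gym_env import FlatBatchEnv

# Absolute imports instead:
from tensor2tensor.rl import rl_utils
from tensor2tensor.rl.envs.simulated_batch_gym_env import FlatBatchEnv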

Contributor

@lukaszkaiser lukaszkaiser left a comment


Thanks, let's get it in.

@lukaszkaiser lukaszkaiser merged commit 186ffc3 into tensorflow:master Jan 8, 2019
tensorflow-copybara pushed a commit that referenced this pull request Jan 8, 2019
PiperOrigin-RevId: 228267563
kpe pushed a commit to kpe/tensor2tensor that referenced this pull request Mar 2, 2019
* SimulatedEnv with gym-like interface.

* Initial Player

* Player: Add reward header, keyboard reset, CLI, move to player.py

* Player: add WAIT mode and a few CLI options.

* Introduce Policy Inferencer

* Recording videos for ppo and player, some refactoring.

* Player refactor. Add real env recording with PPO agent.

* Extend CLI documentation. Remove some imports.

* Pylint

* Extend documentation.

* Correct dopamine import.

* Move gym.utils.play to global imports.

* Remove SimulatedEnv (unnecessary wrapper for FlatBatchEnv<SimulatedBatchGymEnv>)

* Replace join_and_check with os.path.join.

* Move generation of initial_frame_chooser function to rl_utils.

* Move make_simulated_env_fn from trainer_model_based.py to rl.py

* Remove trainer_model_based imports, clean up player and record_ppo FLAGS.

* Move setup_env and load_t2t_gym_env to T2TGymEnv.

* Correct relative imports.

* Custom policy world_model and data paths for player.

* Enable BatchGymEnv to load directly from checkpoint file.

* Small fix for record_ppo.

* Remove unused record_ppo.py.
kpe pushed a commit to kpe/tensor2tensor that referenced this pull request Mar 2, 2019
PiperOrigin-RevId: 228267563