Conversation
It seems that commit 6e0177c fixes one bug for configuration <Python 2.7 TF_VERSION="1.12.*"> (dopamine import) and reveals another, probably older problem (exceptions.OSError: [Errno 2] No such file or directory). Travis before this PR: https://travis-ci.org/tensorflow/tensor2tensor/jobs/472985666
A number of comments - let's put a bit of effort to make this code more unified. Thanks!
tensor2tensor/rl/player_utils.py (Outdated)

    FLAGS = flags.FLAGS

    def make_simulated_env(real_env, world_model_dir, hparams, random_starts):
This does not feel specific to the player at all. Let's have unified utilities and not duplicate this function from rlmb.
I've moved initial_frame_chooser function generation to a different module; now make_simulated_env (which I renamed to make_simulated_gym_env) has little if any code repetition with rlmb train_agent(). (6a4503a)
I've moved make_simulated_env to rl.py.
tensor2tensor/rl/player_utils.py (Outdated)

    import rl_utils
    from envs.simulated_batch_gym_env import FlatBatchEnv
    from tensor2tensor.models.research.rl import get_policy
    from tensor2tensor.rl.trainer_model_based import make_simulated_env_fn, \
The trainer_model_based file is a binary, not a library; it should not be imported. If we have many general-purpose functions there, it's a sign we need a separate utilities library. Should we create one?
I've removed the trainer_model_based imports and moved some functions from there to gym_env and rl_utils.
tensor2tensor/rl/player_utils.py (Outdated)

    return flat_env

    def last_epoch(data_dir):
Feels like this already exists somewhere?
As far as I know it does not. Epoch numbers for start_new_epoch(...) (which loads data when restarting) are defined by the trainer_model_based.py main loop (and not read from disk), so the player needs its own way to find the last epoch on disk.
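For context, a minimal sketch of what such a last_epoch helper could look like. The directory naming scheme (epoch_<N> subdirectories of data_dir) is an assumption for illustration, not taken from the actual t2t code:

```python
import os
import re


def last_epoch(data_dir):
    """Return the highest epoch number found among subdirectories of data_dir.

    Assumes epoch data lives in directories named like 'epoch_7'; the exact
    naming scheme here is illustrative only.
    """
    epochs = [
        int(m.group(1))
        for name in os.listdir(data_dir)
        for m in [re.match(r"epoch_(\d+)$", name)]
        if m is not None
    ]
    if not epochs:
        raise ValueError("No epoch directories found in {}".format(data_dir))
    return max(epochs)
```

This lets the player pick up the most recent epoch's data without the main training loop being involved.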
tensor2tensor/rl/player_utils.py (Outdated)

    return max([int(epoch_str) for epoch_str in epochs_str])

    def load_t2t_env(hparams, data_dir, which_epoch_data=None,
This function should probably be part of T2TBatchGymEnv as a method to allow easy initialization?
load_t2t_env uses setup_env(...) (from rl_utils; it unpacks hparams and parses the game name), so I'll move setup_env() to T2TEnv (as a static method) as well.
tensor2tensor/rl/player_utils.py (Outdated)

    return t2t_env

    def join_and_check(output_dir, subdirectory):
Let's not have special wrapper functions for the os module; let's just not check whether the path exists unless strictly necessary.
Checks are useful for faster failure when incorrect data is given, but I've removed it in 3a3a103.
tensor2tensor/rl/player_utils.py (Outdated)

    assert os.path.exists(path), "{} does not exists".format(path)
    return path

    class SimulatedEnv(Env):
Could one of our existing SimulatedEnv classes already expose that?
Good point: we want the gym.Env interface (not the "batch" one), but the FlatBatchEnv wrapper is enough here. I've cleaned it up in e6c7c08.
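To illustrate the idea being discussed: a batch environment of batch size 1 can be flattened so it exposes a plain gym.Env-style step/reset interface. This is only a sketch; the actual FlatBatchEnv lives in envs/simulated_batch_gym_env, and the method names on the wrapped env are assumptions here:

```python
class FlatBatchEnvSketch(object):
    """Illustrative sketch of the FlatBatchEnv idea: wrap a batch
    environment (batch size 1) so callers can use single observations
    and actions instead of batches."""

    def __init__(self, batch_env):
        self._batch_env = batch_env

    def reset(self):
        # The batch env returns a batch of observations; unwrap element 0.
        return self._batch_env.reset()[0]

    def step(self, action):
        # Wrap the single action into a batch of one, unwrap the results.
        obs, rewards, dones = self._batch_env.step([action])
        return obs[0], rewards[0], dones[0], {}
```

With such a wrapper, code that expects a gym.Env (like gym.utils.play used by the player) can drive a simulated batch environment directly.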
    random_starts=random_starts)

    class PPOPolicyInferencer(object):
Why a new class? Could our Policy class not provide the appropriate methods already?
There are two different interfaces:

- PolicyLearner is heavy: it requires a simulated env (not in __init__ though) and performs high-level training and evaluation.
- PPOPolicyInferencer only loads a policy from disk and provides a low-level, non-TF interface for policy inference.

PolicyLearner also infers the policy, but does it inside the graph; the whole idea of PPOPolicyInferencer is to provide a Python API for different purposes. For these reasons all the methods are different. We could merge these classes; this would require removing the base_event_dir and total_num_epochs arguments from PolicyLearner.__init__ (or making them optional) and initializing the inference graph in PPOLearner.__init__ (but this would be unused in the rlmb pipeline). I'm not sure that would make things clearer. We could also just move PPOPolicyInferencer to ppo_learner.py. Do you think that would suffice?
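To make the distinction concrete, here is a hypothetical minimal shape of such an inference-only wrapper. The names, the load_policy_fn parameter, and the greedy-action choice are all illustrative assumptions, not the actual t2t PPOPolicyInferencer API (which builds a TF graph and restores a checkpoint):

```python
class PolicyInferencerSketch(object):
    """Illustrative sketch: load policy parameters once, then expose a
    plain-Python inference call, with no training or evaluation logic."""

    def __init__(self, load_policy_fn, checkpoint_dir):
        # load_policy_fn is assumed to return a function obs -> action logits.
        self._policy = load_policy_fn(checkpoint_dir)

    def infer(self, observation):
        """Return the greedy action for a single observation."""
        logits = self._policy(observation)
        return max(range(len(logits)), key=lambda i: logits[i])
```

The point of the split is that a tool like the player only needs infer(), while the rlmb pipeline needs the full training interface of PolicyLearner.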
tensor2tensor/rl/record_ppo.py (Outdated)

    """
    Run this script with the same parameters as trainer_model_based.py /
I think that recording should be an option passed to the player; let's not make another binary only for that. Let's merge it with player.py and have a flag there to set up recording.
@lukaszkaiser Thanks for all the comments! I'll walk through them.
Force-pushed f3d54ce to e6c7c08.
@lukaszkaiser Discussions about evaluation (and whether record_ppo.py is needed) and about merging PPOLearner with PPOPolicyInferencer are still open. I've applied the rest of the changes you suggested; what do you think of these? We plan to work on evaluation next week, so it would be convenient for us to merge this PR first if possible. If adding the record_ppo.py file is a problem, you can remove it from this PR.
Force-pushed 59f7011 to af51202.
I've corrected the relative imports (which break Python 2) and rebased on current master.
Force-pushed eadc29c to d8afb35.
Force-pushed d8afb35 to 5614fd8.
Thanks, let's get it in.
PiperOrigin-RevId: 228267563

* SimulatedEnv with gym-like interface.
* Initial Player
* Player: Add reward header, keyboard reset, CLI, move to player.py
* Player: add WAIT mode, few CLI options.
* Introduce Policy Inferencer
* Recording videos for ppo and player, some refactoring.
* Player refactor. Add real env recording with PPO agent.
* Extend CLI documentation. Remove some imports.
* Pylint
* Extend documentation.
* Correct dopamine import.
* Move gym.utils.play to global imports.
* Remove SimulatedEnv (unnecessary wrapper for FlatBatchEnv<SimulatedBatchGymEnv>)
* Replace join_and_check with os.path.join.
* Move generation of initial_frame_chooser function to rl_utils.
* Move make_simulated_env_fn from trainer_model_based.py to rl.py
* Remove trainer_model_based imports, clean up player and record_ppo FLAGS.
* Move setup_env and load_t2t_gym_env to T2TGymEnv.
* Correct relative imports.
* Custom policy world_model and data paths for player.
* Enable BatchGymEnv to load directly from checkpoint file.
* Small fix record_ppo.
* Remove unused record_ppo.py.
Utilities and scripts for