# Playing with Garage
Name: Shota Takeshima
* garage is a toolkit for developing and evaluating reinforcement learning algorithms, and an accompanying library of state-of-the-art implementations built using that toolkit.
* I tried to replicate the [meta-RL example](https://garage.readthedocs.io/en/latest/user/meta_multi_task_rl_exp.html#) from the garage documentation, working out what each line of the code means.

## Installation
* garage supports Python 3.6 or later.

In [1]:
!python --version

Python 3.6.9
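
The same check can be done from inside Python, which is handy when the notebook kernel and the shell `python` might differ. This is a minimal sketch; the helper name `python_is_supported` and the constant `MIN_PYTHON` are made up for illustration, and the minimum version is taken from the note above.

```python
import sys

# Minimum version stated in this notebook (garage supports Python 3.6 or later).
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


print("Python version OK:", python_is_supported())
```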


In [2]:
# install garage
!echo "abcd" > mujoco_fake_key
!rm -rf garage
!git clone --depth 1 https://github.com/rlworkgroup/garage/
# Note: an error occurs while this setup script runs...
!cd garage && bash scripts/setup_colab.sh --mjkey ../mujoco_fake_key --no-modify-bashrc > /dev/null

Cloning into 'garage'...
remote: Enumerating objects: 755, done.
remote: Counting objects: 100% (755/755), done.
remote: Compressing objects: 100% (663/663), done.
remote: Total 755 (delta 203), reused 219 (delta 78), pack-reused 0
Receiving objects: 100% (755/755), 2.97 MiB | 31.66 MiB/s, done.
Resolving deltas: 100% (203/203), done.
start of setup_colab.sh




debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-like program is installed, so the dialog based frontend cannot be used. at /usr/share/perl5/Debconf/FrontEnd/Dialog.pm line 76, <> line 88.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin: 


Cloning into '/tmp/tmp.mnEP3VfKQ7/glfw'...
remote: Enumerating objects: 61, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (26

In [3]:
!garage examples

2021-02-01 05:53:14.755134: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
# torch
torch/bc_point.py (garage.examples.torch.bc_point)
torch/bc_point_deterministic_policy.py (garage.examples.torch.bc_point_deterministic_policy)
torch/ddpg_pendulum.py (garage.examples.torch.ddpg_pendulum)
torch/dqn_atari.py (garage.examples.torch.dqn_atari)
torch/dqn_cartpole.py (garage.examples.torch.dqn_cartpole)
torch/maml_ppo_half_cheetah_dir.py (garage.examples.torch.maml_ppo_half_cheetah_dir)
torch/maml_trpo_half_cheetah_dir.py (garage.examples.torch.maml_trpo_half_cheetah_dir)
torch/maml_trpo_metaworld_ml10.py (garage.examples.torch.maml_trpo_metaworld_ml10)
torch/maml_trpo_metaworld_ml1_push.py (garage.examples.torch.maml_trpo_metaworld_ml1_push)
torch/maml_trpo_metaworld_ml45.py (garage.examples.torch.maml_trpo_metaworld_ml45)
torch/maml_vpg_half_cheetah_dir.py (garage.examples.torch.maml_vpg_half_cheetah_dir)
torch/mtppo_met

**Restart this notebook here.** The restart is needed so that the newly installed packages are recognized.
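
After restarting, one quick way to confirm that the installation is visible to the kernel is to ask the import system whether it can find the package. This is a small sketch using only the standard library; the helper name `is_installed` is made up for illustration.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the import system can locate a top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Prints False if the kernel was not restarted or the install failed.
print("garage importable:", is_installed("garage"))
```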

In [1]:
"""This is an example to train a task with TRPO algorithm.

Here it runs CartPole-v1 environment with 100 iterations.

Results:
    AverageReturn: 100
    RiseTime: itr 13
"""
from garage import wrap_experiment
from garage.envs import GymEnv
from garage.experiment.deterministic import set_seed
from garage.np.baselines import LinearFeatureBaseline
from garage.sampler import LocalSampler
from garage.tf.algos import TRPO
from garage.tf.policies import CategoricalMLPPolicy
from garage.trainer import TFTrainer

* Prepare `env`, `policy`, `baseline` (an estimate of the value function), `sampler`, `algo` (the algorithm that optimizes the policy, such as TRPO or MAML), and `trainer`.
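
To make the role of the baseline and the `discount` argument more concrete, here is a simplified sketch in plain Python. It computes discounted returns for one episode and subtracts a baseline estimate to get advantages. It is not garage's implementation (garage's own advantage estimation may differ in detail); the function names and the example numbers are made up for illustration.

```python
def discounted_returns(rewards, discount):
    """Return G_t = r_t + discount * G_{t+1} for every step of one episode."""
    returns = []
    running = 0.0
    # Walk backwards so each step can reuse the return of the following step.
    for reward in reversed(rewards):
        running = reward + discount * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, baseline_values):
    """Advantage = observed return minus the baseline's predicted value."""
    return [g - b for g, b in zip(returns, baseline_values)]


# Hypothetical episode: reward 1 at each of three steps, discount 0.5.
rewards = [1.0, 1.0, 1.0]
returns = discounted_returns(rewards, discount=0.5)
print(returns)

# Hypothetical baseline predictions for the same three states.
print(advantages(returns, baseline_values=[1.5, 1.5, 1.5]))
```

A positive advantage means the episode went better from that step than the baseline predicted, so the policy update makes the chosen action more likely; a negative one does the opposite.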

In [2]:
@wrap_experiment
def trpo_cartpole(ctxt=None, seed=1):
    """Train TRPO with CartPole-v1 environment.

    Args:
        ctxt (garage.experiment.ExperimentContext): The experiment
            configuration used by Trainer to create the snapshotter.
        seed (int): Used to seed the random number generator to produce
            determinism.

    """
    set_seed(seed)
    with TFTrainer(ctxt) as trainer:
        env = GymEnv('CartPole-v1')

        policy = CategoricalMLPPolicy(name='policy',
                                      env_spec=env.spec,
                                      hidden_sizes=(32, 32))

        baseline = LinearFeatureBaseline(env_spec=env.spec)

        sampler = LocalSampler(agents=policy,
                               envs=env,
                               max_episode_length=env.spec.max_episode_length,
                               is_tf_worker=True)

        algo = TRPO(env_spec=env.spec,
                    policy=policy,
                    baseline=baseline,
                    sampler=sampler,
                    discount=0.99,
                    max_kl_step=0.01)

        trainer.setup(algo, env)
        trainer.train(n_epochs=100, batch_size=4000)

In [3]:
trpo_cartpole()

2021-02-01 07:04:18 | [trpo_cartpole] Logging to /content/data/local/experiment/trpo_cartpole_1




Instructions for updating:
Prefer Variable.assign which has equivalent behavior in 2.X.
2021-02-01 07:04:20 | [trpo_cartpole] Obtaining samples...
2021-02-01 07:04:29 | [trpo_cartpole] epoch #0 | Optimizing policy...
2021-02-01 07:04:29 | [trpo_cartpole] epoch #0 | Computing loss before
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | Computing KL before
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | Optimizing
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | Start CG optimization: #parameters: 1282, #inputs: 195, #subsample_inputs: 195
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | computing loss before
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | computing gradient
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | gradient computed
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | computing descent direction
2021-02-01 07:04:34 | [trpo_cartpole] epoch #0 | descent direction computed
2021-02-01 07:04:35 | [trpo_cartpole] epoch #0 | backtrack iters: 7
2021-02-01 07:04:35 | [trpo_

NoSuchDisplayException: ignored
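
The first log line above reports the experiment directory, `/content/data/local/experiment/trpo_cartpole_1`. A sketch for inspecting the tabular results after training follows. It assumes the directory contains a CSV file named `progress.csv`; that file name is an assumption, not something shown in the output above, so the code checks that the file exists before reading it. The helper name `read_last_row` is made up for illustration.

```python
import csv
from pathlib import Path


def read_last_row(csv_path):
    """Return the final row of a CSV file as a dict, or None if it has no rows."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return rows[-1] if rows else None


# Directory taken from the log line; "progress.csv" is an assumed file name.
progress = Path("/content/data/local/experiment/trpo_cartpole_1") / "progress.csv"
if progress.exists():
    print(read_last_row(progress))
else:
    print("no progress file found at", progress)
```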

In [None]:
* TODO
