## 1. Import Dependencies

In [1]:
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.env_util import make_atari_env
from stable_baselines3.common.vec_env import VecFrameStack
from stable_baselines3 import A2C
import gym
import os

## 2. Test Environment

Download the Atari ROMs needed to create the environment [here](http://www.atarimania.com/roms/Rooms.rar).

Don't forget to extract the archive before importing the ROMs.

In [2]:
!python -m atari_py.import_roms ROMS/

copying adventure.bin from ROMS/Adventure (1980) (Atari, Warren Robinett) (CX2613, CX2613P) (PAL).bin to C:\Users\asus\AppData\Local\Programs\Python\Python39\lib\site-packages\atari_py\atari_roms\adventure.bin
copying air_raid.bin from ROMS/Air Raid (Men-A-Vision) (PAL) ~.bin to C:\Users\asus\AppData\Local\Programs\Python\Python39\lib\site-packages\atari_py\atari_roms\air_raid.bin
copying alien.bin from ROMS/Alien (1982) (20th Century Fox Video Games, Douglas 'Dallas North' Neubauer) (11006) ~.bin to C:\Users\asus\AppData\Local\Programs\Python\Python39\lib\site-packages\atari_py\atari_roms\alien.bin
copying amidar.bin from ROMS/Amidar (1982) (Parker Brothers, Ed Temple) (PB5310) ~.bin to C:\Users\asus\AppData\Local\Programs\Python\Python39\lib\site-packages\atari_py\atari_roms\amidar.bin
copying assault.bin from ROMS/Assault (AKA Sky Alien) (1983) (Bomb - Onbase) (CA281).bin to C:\Users\asus\AppData\Local\Programs\Python\Python39\lib\site-packages\atari_py\atari_roms\assault.bin
...

In [3]:
environment_name = 'Breakout-v0'
env = gym.make(environment_name)

In [4]:
env.reset()

array([[[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       ...,

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]],

       [[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        ...,
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]]], dtype=uint8)

In [5]:
env.action_space

Discrete(4)

In [6]:
env.observation_space

Box(0, 255, (210, 160, 3), uint8)

In [7]:
episodes = 5
for episode in range(1, episodes+1):
    obs = env.reset()  # start a new game
    done = False
    score = 0

    while not done:
        env.render()  # show the game window
        action = env.action_space.sample()  # pick a random action
        obs, reward, done, info = env.step(action)
        score += reward  # accumulate the episode reward
    print("Episode:{} Score:{}".format(episode, score))
# env.close()



Episode:1 Score:3.0
Episode:2 Score:0.0
Episode:3 Score:1.0
Episode:4 Score:4.0
Episode:5 Score:1.0
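
The loop above unpacks four values from `env.step`, which matches the older `gym` API this notebook uses. Newer Gymnasium releases return five values, with `terminated` and `truncated` replacing `done`. A minimal sketch of a compatibility helper follows; `unpack_step` is a hypothetical name, not part of either library, and the tuples are stand-ins so the snippet runs without any ROMs.

```python
def unpack_step(result):
    """Normalize an env.step() result to (obs, reward, done, info).

    Hypothetical helper: accepts the 4-tuple of the older gym API or the
    5-tuple (obs, reward, terminated, truncated, info) of newer APIs.
    """
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
        # An episode is over if it ended naturally or was cut off.
        return obs, reward, terminated or truncated, info
    if len(result) == 4:
        return tuple(result)
    raise ValueError(f"unexpected step result of length {len(result)}")


# Stand-in tuples instead of a real environment.
old_style = ("frame", 1.0, False, {})
new_style = ("frame", 1.0, False, True, {})

print(unpack_step(old_style))
print(unpack_step(new_style))
```

With such a helper, the body of the `while not done:` loop could stay the same regardless of which API version is installed.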


In [8]:
env.close()

## 3. Vectorize Environment and Train Model

In [11]:
env = make_atari_env('Breakout-v0', n_envs=4, seed=0)
env = VecFrameStack(env, n_stack=4)
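
`VecFrameStack(env, n_stack=4)` hands the agent the last four frames at every step, so it can infer the ball's direction and speed, which a single frame cannot show. The sketch below illustrates the sliding-window idea in plain Python. `FrameStack` here is a hypothetical toy class, not the Stable Baselines3 implementation, and frames are represented by labels instead of image arrays.

```python
from collections import deque


class FrameStack:
    """Toy sliding window over the most recent n_stack frames."""

    def __init__(self, n_stack, blank=0):
        self.n_stack = n_stack
        self.blank = blank
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_frame):
        # Start from blank frames so the stack always has n_stack entries.
        self.frames.clear()
        self.frames.extend([self.blank] * (self.n_stack - 1))
        self.frames.append(first_frame)
        return list(self.frames)

    def step(self, new_frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(new_frame)
        return list(self.frames)


stack = FrameStack(n_stack=4)
print(stack.reset("f0"))  # [0, 0, 0, 'f0']
for name in ["f1", "f2", "f3", "f4"]:
    stacked = stack.step(name)
print(stacked)            # ['f1', 'f2', 'f3', 'f4']
```

Assuming the default Atari preprocessing in `make_atari_env` (84 by 84 grayscale frames), stacking four frames should give each environment an observation of shape `(84, 84, 4)`.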

In [12]:
log_path = os.path.join('Training', 'Logs')
model = A2C('CnnPolicy', env, verbose=1, tensorboard_log=log_path)

Using cuda device
Wrapping the env in a VecTransposeImage.


In [13]:
model.learn(total_timesteps=100000)

Logging to Training\Logs\A2C_1
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 273      |
|    ep_rew_mean        | 1.43     |
| time/                 |          |
|    fps                | 56       |
|    iterations         | 100      |
|    time_elapsed       | 35       |
|    total_timesteps    | 2000     |
| train/                |          |
|    entropy_loss       | -1.38    |
|    explained_variance | -0.303   |
|    learning_rate      | 0.0007   |
|    n_updates          | 99       |
|    policy_loss        | -0.0287  |
|    value_loss         | 0.00545  |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 268      |
|    ep_rew_mean        | 1.36     |
| time/                 |          |
|    fps                | 64       |
|    iterations         | 200      |
|    time_elapsed       | 62       |
|    total_timesteps    | 4000     |
...

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 301      |
|    ep_rew_mean        | 1.94     |
| time/                 |          |
|    fps                | 68       |
|    iterations         | 1400     |
|    time_elapsed       | 409      |
|    total_timesteps    | 28000    |
| train/                |          |
|    entropy_loss       | -1       |
|    explained_variance | 0.779    |
|    learning_rate      | 0.0007   |
|    n_updates          | 1399     |
|    policy_loss        | 0.00243  |
|    value_loss         | 0.0762   |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 314      |
|    ep_rew_mean        | 2.15     |
| time/                 |          |
|    fps                | 68       |
|    iterations         | 1500     |
|    time_elapsed       | 439      |
|    total_timesteps    | 30000    |
| train/                |          |
...

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 432      |
|    ep_rew_mean        | 4.97     |
| time/                 |          |
|    fps                | 68       |
|    iterations         | 2800     |
|    time_elapsed       | 815      |
|    total_timesteps    | 56000    |
| train/                |          |
|    entropy_loss       | -0.489   |
|    explained_variance | 0.978    |
|    learning_rate      | 0.0007   |
|    n_updates          | 2799     |
|    policy_loss        | -0.00977 |
|    value_loss         | 0.0419   |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 434      |
|    ep_rew_mean        | 4.95     |
| time/                 |          |
|    fps                | 68       |
|    iterations         | 2900     |
|    time_elapsed       | 843      |
|    total_timesteps    | 58000    |
| train/                |          |
...

------------------------------------
| rollout/              |          |
|    ep_len_mean        | 478      |
|    ep_rew_mean        | 5.67     |
| time/                 |          |
|    fps                | 69       |
|    iterations         | 4200     |
|    time_elapsed       | 1207     |
|    total_timesteps    | 84000    |
| train/                |          |
|    entropy_loss       | -0.632   |
|    explained_variance | 0.822    |
|    learning_rate      | 0.0007   |
|    n_updates          | 4199     |
|    policy_loss        | 0.0117   |
|    value_loss         | 0.0267   |
------------------------------------
------------------------------------
| rollout/              |          |
|    ep_len_mean        | 480      |
|    ep_rew_mean        | 5.67     |
| time/                 |          |
|    fps                | 69       |
|    iterations         | 4300     |
|    time_elapsed       | 1235     |
|    total_timesteps    | 86000    |
| train/                |          |
...

<stable_baselines3.a2c.a2c.A2C at 0x22c5dd584f0>
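
The `iterations` and `total_timesteps` rows in the log above are related by simple arithmetic: each A2C iteration collects `n_steps` transitions from each of the `n_envs` parallel environments. The sketch below assumes the Stable Baselines3 default of `n_steps=5` for A2C (the notebook does not set it) together with the `n_envs=4` used here. The function name is hypothetical.

```python
def timesteps_per_iteration(n_steps, n_envs):
    """Transitions gathered in one rollout across all parallel envs."""
    return n_steps * n_envs


N_STEPS = 5   # assumed A2C default; not set explicitly in the notebook
N_ENVS = 4    # from make_atari_env(..., n_envs=4)

per_iter = timesteps_per_iteration(N_STEPS, N_ENVS)
print(per_iter)             # 20
print(100 * per_iter)       # 2000, matches the first log entry
print(100_000 // per_iter)  # 5000 iterations for the full run
```

Under that assumption the other log entries line up as well, for example 1400 iterations at 28000 timesteps.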

## 4. Save and Reload Model

In [15]:
a2c_path = os.path.join('Training', 'Saved_Model', 'A2C_Breakout_100K_Model')
model.save(a2c_path)
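
`os.path.join` builds the save location in a platform-independent way. As far as I know, Stable Baselines3 appends a `.zip` extension when the path has none, so the file to look for on disk would be `A2C_Breakout_100K_Model.zip`. A small sketch of that path handling follows; `expected_archive` is a hypothetical helper that mirrors this assumed behavior, not a library function.

```python
import os


def expected_archive(path, suffix="zip"):
    """Return the file name a save would likely produce (assumed behavior)."""
    _, ext = os.path.splitext(path)
    # Only add the suffix when the path has no extension of its own.
    return path if ext else f"{path}.{suffix}"


a2c_path = os.path.join("Training", "Saved_Model", "A2C_Breakout_100K_Model")
archive = expected_archive(a2c_path)
print(os.path.basename(archive))  # A2C_Breakout_100K_Model.zip
```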



In [16]:
del model

In [18]:
model = A2C.load(a2c_path, env)

Wrapping the env in a VecTransposeImage.


## 5. Evaluate and Test

In [19]:
env = make_atari_env('Breakout-v0', n_envs=1, seed=0)
env = VecFrameStack(env, n_stack=4)

In [21]:
evaluate_policy(model, env, n_eval_episodes=50, render=True)

(7.2, 2.078460969082653)
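
The tuple returned by `evaluate_policy` is `(mean_reward, std_reward)` over the evaluation episodes, so the trained agent averaged 7.2 points per episode with a standard deviation of about 2.08. For a rough comparison, the same statistics can be computed for the five random-agent scores printed in Section 2. This sketch uses only the standard library and the population standard deviation, which I believe is what `evaluate_policy` reports.

```python
from statistics import mean, pstdev

# Scores printed by the random-agent loop in Section 2.
random_scores = [3.0, 0.0, 1.0, 4.0, 1.0]

random_mean = mean(random_scores)
random_std = pstdev(random_scores)  # population standard deviation

# Mean reward reported by evaluate_policy for the trained model.
trained_mean = 7.2

print(f"random agent:  mean={random_mean:.2f}, std={random_std:.2f}")
print(f"trained agent: mean={trained_mean:.2f}")
print(f"improvement:   {trained_mean / random_mean:.1f}x")
```

Five random episodes is a small sample, so the ratio is only indicative.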

In [22]:
env.close()

Note: This notebook is a code-along for [this video](https://www.youtube.com/watch?v=Mut_u40Sqz4&ab_channel=NicholasRenotte).

To keep the repository clean, only the notebook is pushed to GitHub; all other files are left out.

This is not the end yet: there are still two projects that I will finish next.