# **NES Space Invaders with RAM**
## *TFG Reinforcement Learning through the GymRetro Platform.*

In this notebook we show how to train and load a Tensorforce DQN agent that plays Space Invaders for the NES, using Gym Retro with RAM observations.

### Required installs:
Run the following cell only if you are using Google Colab, or if you have not yet installed the required libraries locally.

In [None]:
!pip install gym-retro
!pip install tensorforce
# The following versions are required to work properly with the latest (0.6.5) version of Tensorforce.
!pip install keras==2.6.0
!pip install gym==0.21.0

### Google Drive:
The following code mounts Google Drive in Google Colab, which is useful for saving the trained agent.

In [None]:
from google.colab import drive

drive.mount('/content/gdrive')
# Change 'dir' to the folder in which you want to store the results.
# Keep the trailing slash: later cells append sub-paths directly to root_path.
root_path = './gdrive/My Drive/dir/'
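
The later cells append to log files under `root_path`, and `open(..., 'a')` creates a missing file but not a missing folder. The sketch below shows one way to create the folders up front. The helper names `agent_paths` and `ensure_dirs` are hypothetical, the folder layout is an assumption taken from the paths used later in this notebook, and the demo runs in a temporary folder instead of Google Drive.

```python
import os
import tempfile


def agent_paths(root, agent_name):
    """Return the folders for the saved agent and the episode logs (assumed layout)."""
    return {
        "agent": os.path.join(root, "NES", agent_name),
        "episodes": os.path.join(root, "NES", "Episodes", agent_name),
    }


def ensure_dirs(paths):
    """Create every folder in the mapping; existing folders are left untouched."""
    for path in paths.values():
        os.makedirs(path, exist_ok=True)
    return paths


# Demo in a temporary folder; in Colab you would pass root_path instead.
demo_root = tempfile.mkdtemp()
paths = ensure_dirs(agent_paths(demo_root, "AGENT_NAME"))
print(paths["episodes"])
```

Calling `ensure_dirs` once before training avoids a `FileNotFoundError` the first time the metrics are written.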

### Required imports:

In [None]:
import retro
import gym
import time
import numpy as np
from tensorforce import Agent, Environment

In [None]:
# This code imports the required ROM into the system (SpaceInvaders-Nes).
# Locally, the ROM may need to be imported differently.
!python3 -m retro.import './gdrive/My Drive/Path_to_your_rom'

### Discretizer

The following code, adapted from [the Gym Retro discretizer example](https://github.com/openai/retro/blob/master/retro/examples/discretizer.py), reduces all possible button combinations to the few actions that interest us. The combo list can be changed as needed.

In [None]:
class Discretizer(gym.ActionWrapper):
    """
    Wrap a gym environment and make it use discrete actions.
    Args:
        combos: ordered list of lists of valid button combinations
    """

    def __init__(self, env, combos):
        super().__init__(env)
        assert isinstance(env.action_space, gym.spaces.MultiBinary)
        buttons = env.unwrapped.buttons
        self._decode_discrete_action = []
        for combo in combos:
            arr = np.array([False] * env.action_space.n)
            for button in combo:
                arr[buttons.index(button)] = True
            self._decode_discrete_action.append(arr)

        self.action_space = gym.spaces.Discrete(len(self._decode_discrete_action))

    def action(self, act):
        return self._decode_discrete_action[act].copy()


class SpaceInvadersNesDiscretizer(Discretizer):
    def __init__(self, env):
        # We allow the character to stay still, move either way, shoot standing still, or shoot while moving either way.
        super().__init__(env=env, combos=[[], ['LEFT'], ['RIGHT'], ['A'], ['LEFT', 'A'], ['RIGHT', 'A']])
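
To see what the discretizer produces, the sketch below rebuilds the lookup table outside of Gym. It uses plain Python lists instead of NumPy arrays so that it runs without any dependency. The button order in `BUTTONS` is an assumption about the NES core; in the notebook the real order comes from `env.unwrapped.buttons`. The helper `build_action_table` is hypothetical.

```python
# Assumed NES button order; check env.unwrapped.buttons for the real one.
BUTTONS = ['B', None, 'SELECT', 'START', 'UP', 'DOWN', 'LEFT', 'RIGHT', 'A']
COMBOS = [[], ['LEFT'], ['RIGHT'], ['A'], ['LEFT', 'A'], ['RIGHT', 'A']]


def build_action_table(buttons, combos):
    """Map each discrete action index to a list of 0/1 flags, one per button."""
    table = []
    for combo in combos:
        flags = [0] * len(buttons)
        for button in combo:
            flags[buttons.index(button)] = 1
        table.append(flags)
    return table


table = build_action_table(BUTTONS, COMBOS)
for index, flags in enumerate(table):
    pressed = [name for name, on in zip(BUTTONS, flags) if on]
    print(index, flags, pressed)
```

The agent only ever chooses an index from 0 to 5; the wrapper translates it into the full button vector that the emulator expects.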

### Creation or loading of agent:

Run the first cell to create a new agent, or the second cell to load a previously saved one.
Throughout the notebook, change *AGENT_NAME* to the name of the agent you are loading or saving.

In [None]:
env = retro.make(game='SpaceInvaders-Nes', obs_type=retro.Observations.RAM)
env = SpaceInvadersNesDiscretizer(env)
environment = Environment.create(environment=env)

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='dqn',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    batch_size=32,
    memory=1000,
    exploration=0.05,
    # Setting this to GPU will not work in Google Colab. Locally, it depends on having the right versions installed.
    # See https://www.tensorflow.org/install/gpu?hl=es-419 for more on this
    config=dict(device='CPU'),
    # The following setting saves TensorBoard information in the chosen folder. Uncomment it to enable it.
    # summarizer=dict(
    #    directory=root_path + 'NES/data/summaries/AGENT_NAME',
    #    summaries='all'
    # ),
    # The following setting is necessary for applying XAI techniques afterwards.
    tracking='all'
)
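
For discrete actions, the `exploration=0.05` setting means that the agent outputs a uniformly random action with probability 0.05 and otherwise follows its policy. The sketch below illustrates that idea with a hand-written epsilon-greedy rule. It is a conceptual illustration only, not Tensorforce's internal implementation; the function `epsilon_greedy` and the example Q-values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


rng = random.Random(0)
# Made-up action values for the six discretized actions.
q_values = [0.1, 0.7, 0.2, 0.0, 0.3, 0.4]
picks = [epsilon_greedy(q_values, 0.05, rng) for _ in range(10_000)]
greedy_share = picks.count(1) / len(picks)
print(f"Greedy action chosen in {greedy_share:.1%} of steps")
```

With six actions, the greedy action is expected in roughly 0.95 + 0.05 / 6 of the steps, because a random draw can also land on it.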

In [None]:
# The directory must match the one passed to agent.save() in the training cell.
agent = Agent.load(directory=root_path + 'NES/AGENT_NAME')

### Agent training:

Run this cell if you want to start training from the beginning. Be careful not to run it otherwise, or you may lose previous data.

In [None]:
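# Gym Retro allows only one emulator instance per process: if `env` already exists
# from an earlier cell, call env.close() before running this cell.
env = retro.make(game='SpaceInvaders-Nes', obs_type=retro.Observations.RAM)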
env = SpaceInvadersNesDiscretizer(env)
environment = Environment.create(environment=env)

episode_reward = []
episodeTimes = []
episodeTimeSteps = []
trainingStart = time.time()

# Train for 100 episodes
for i in range(100):

    # Initialize episode
    states = environment.reset()
    terminal = False
    rewardTotal = 0
    currentEpisodeTimeSteps = 0
    episodeStart = time.time()

    # Main training loop
    while not terminal:
        # Episode timestep
        currentEpisodeTimeSteps += 1
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
        rewardTotal += reward

    # End of episode
    episodeEnd = time.time()
    timeEpisode = episodeEnd - episodeStart
    episodeTimes.append(timeEpisode)
    episode_reward.append(rewardTotal)
    episodeTimeSteps.append(currentEpisodeTimeSteps)
    print('End of episode', i)
    # Append the buffered metrics to disk every 10 episodes, then clear the buffers
    if len(episodeTimes) == 10:
        with open(root_path + 'NES/Episodes/AGENT_NAME/rewards_per_episode.txt', 'a') as f:
            for item in episode_reward:
                f.write("%s\n" % item)

        with open(root_path + 'NES/Episodes/AGENT_NAME/timesteps_per_episode.txt', 'a') as f:
            for item in episodeTimeSteps:
                f.write("%s\n" % item)

        with open(root_path + 'NES/Episodes/AGENT_NAME/times_per_episode.txt', 'a') as f:
            for item in episodeTimes:
                f.write("%s\n" % item)
        episode_reward = []
        episodeTimes = []
        episodeTimeSteps = []
    # Save the agent every 10 episodes, in the same directory that the loading cells read from.
    # This can potentially be done with the "saver" setting in the agent too.
    if i % 10 == 9:
        agent.save(directory=root_path + 'NES/AGENT_NAME')

agent.close()
environment.close()
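
The training cell writes one value per line to `rewards_per_episode.txt`. The sketch below shows one way to read such a file back and smooth it with a moving average, which makes the learning trend easier to see. The helpers `read_values` and `moving_average` are hypothetical, and the demo writes made-up rewards to a temporary file instead of reading real training logs.

```python
import os
import tempfile


def read_values(path):
    """Read one float per line, skipping blank lines."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def moving_average(values, window):
    """Return the mean of each run of `window` consecutive values."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        if i >= window - 1:
            averages.append(running / window)
    return averages


# Demo with made-up rewards, written in the same format as the training cell.
demo_path = os.path.join(tempfile.mkdtemp(), "rewards_per_episode.txt")
with open(demo_path, "a") as f:
    for reward in [10.0, 20.0, 30.0, 40.0]:
        f.write("%s\n" % reward)

rewards = read_values(demo_path)
smoothed = moving_average(rewards, window=2)
print(smoothed)  # [15.0, 25.0, 35.0]
```

For a real run, a window of around 10 episodes matches the interval at which the metrics are flushed to disk.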

### Check results of training:

You can run this cell to load your agent and check some metrics about the current performance.
In our experience, running this code without `agent.observe()` and with `independent=True` and `deterministic=True` passed to `agent.act()` caused the agent to get stuck repeating a single action. This does not happen during training, which is why the cell below keeps training the agent while evaluating it. Feel free to try whichever approach works for you.

In [None]:
agent = Agent.load(directory=root_path +'NES/AGENT_NAME')
# `record` makes Gym Retro save a .bk2 replay of each episode in the given folder.
env = retro.make(game='SpaceInvaders-Nes', obs_type=retro.Observations.RAM, record=root_path)
env = SpaceInvadersNesDiscretizer(env)
environment = Environment.create(environment=env)

episodeTimes = []
episodeTimeSteps = []
episodeRewards = []
for _ in range(2):
    episodeStart = time.time()
    # Initialize episode
    states = environment.reset()
    terminal = False
    currentEpisodeTimeSteps = 0
    currentReward = 0
    while not terminal:
        # Episode timestep
        currentEpisodeTimeSteps += 1
        actions = agent.act(states=states)
        # Alternatively, try passing independent=True and deterministic=True to agent.act() and removing agent.observe()
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
        currentReward += reward
    
    episodeEnd = time.time()
    timeEpisode = episodeEnd - episodeStart
    episodeTimes.append(timeEpisode)
    episodeTimeSteps.append(currentEpisodeTimeSteps)
    episodeRewards.append(currentReward)
    
environment.close()
    
avgEpisodeTime = sum(episodeTimes) / len(episodeTimes)
bestEpisodeTime = max(episodeTimes)
avgEpisodeTimeSteps = sum(episodeTimeSteps) / len(episodeTimeSteps)
bestEpisodeTimeSteps = max(episodeTimeSteps)
avgEpisodeReward = sum(episodeRewards) / len(episodeRewards)
bestEpisodeReward = max(episodeRewards)

In [None]:
print(f"Average timesteps per episode: {avgEpisodeTimeSteps} timesteps")
print(f"Episode with most timesteps: {bestEpisodeTimeSteps} timesteps")
print(f"Average reward per episode: {avgEpisodeReward}")
print(f"Best episode reward: {bestEpisodeReward}")
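
For comparison, the alternative mentioned above (acting with `independent=True` and `deterministic=True`, and never calling `agent.observe()`) can be sketched as follows. The loop follows the evaluation pattern from the Tensorforce documentation; the wrapper function `evaluate` is a hypothetical helper, and whether this mode gives sensible play for a given agent is something to check empirically, given the behaviour described earlier.

```python
def evaluate(agent, environment, episodes):
    """Run episodes without updating the agent and return the reward of each one."""
    rewards = []
    for _ in range(episodes):
        states = environment.reset()
        # Independent acting requires the internal states to be passed explicitly.
        internals = agent.initial_internals()
        terminal = False
        total = 0.0
        while not terminal:
            actions, internals = agent.act(
                states=states,
                internals=internals,
                independent=True,    # do not record this step for training
                deterministic=True,  # no exploration
            )
            states, terminal, reward = environment.execute(actions=actions)
            total += reward
        rewards.append(total)
    return rewards
```

In the notebook this would be called as `evaluate(agent, environment, 2)` after loading the agent and creating the environment.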