# **Atari 2600 Space Invaders with Vision**
## *TFG Reinforcement Learning through the GymRetro Platform.*

In this notebook we show how to train and load a Tensorforce DQN agent that plays the Atari 2600 game Space Invaders, using the Arcade Learning Environment integrated into Gym and GymRetro.

### Previous installs:
Run the following cell only if you are using Google Colab, or if you have not yet installed the required libraries locally.

In [None]:
!pip install tensorforce
# These pinned versions are required to work properly with the latest version of Tensorforce (0.6.5).
!pip install keras==2.6.0
!pip install "gym[atari,accept-rom-license]==0.21.0"
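After installing, a quick sanity check can confirm that the installed versions match the pins. The sketch below uses the standard-library `importlib.metadata`; `check_versions` and `REQUIRED` are hypothetical names, and treating `tensorforce==0.6.5` as required is an assumption based on the comment in the install cell.

```python
from importlib import metadata

# Versions named in the install cell (tensorforce 0.6.5 is assumed from the comment there).
REQUIRED = {"tensorforce": "0.6.5", "keras": "2.6.0", "gym": "0.21.0"}


def check_versions(required, lookup=metadata.version):
    """Return {package: (required, installed)} for every package that does not match.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, wanted in required.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# Print one line per mismatch; no output means everything matches.
for package, (wanted, installed) in check_versions(REQUIRED).items():
    print(f"{package}: expected {wanted}, found {installed}")
```

If a mismatch shows up on Colab after reinstalling, restarting the runtime is usually needed before the new version is picked up.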

### Google drive:
The following cell mounts Google Drive in Colab, so that the trained agent and the training statistics can be saved there.

In [None]:
from google.colab import drive

drive.mount('/content/gdrive')
root_path = '/content/gdrive/My Drive/dir/'  # Change 'dir' to the folder where you want to store the results (keep the trailing slash)
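The later cells build every path as `root_path + 'Atari/...'`, so `root_path` must end with a slash; otherwise the folder name and `Atari` are glued together. If you prefer not to rely on the trailing slash, `os.path.join` from the standard library inserts the separator for you. A small illustration, using the same placeholder folder:

```python
import os

root_path = "/content/gdrive/My Drive/dir"  # placeholder folder, no trailing slash

# Plain concatenation silently drops the separator:
broken = root_path + "Atari/AGENT_NAME"
# os.path.join inserts exactly one separator between components:
fixed = os.path.join(root_path, "Atari", "AGENT_NAME")

print(broken)  # /content/gdrive/My Drive/dirAtari/AGENT_NAME
print(fixed)   # /content/gdrive/My Drive/dir/Atari/AGENT_NAME
```

`os.path.join` gives the same result whether or not `root_path` already ends with a slash, which makes it the safer choice when the folder is edited by hand.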

### Needed imports:

In [None]:
import time
from tensorforce import Agent, Environment

### Creation or loading of agent:

Run the first cell if you are creating the agent for the first time.
Run the second cell if you want to load a previously saved agent.

In [None]:
environment = Environment.create(environment='gym', level='SpaceInvaders-v0')

# Instantiate a Tensorforce agent
agent = Agent.create(
    agent='dqn',
    environment=environment,  # alternatively: states, actions, (max_episode_timesteps)
    batch_size=32,
    memory=10000,
    # Setting this to GPU will not work in Google Colab. Locally, it depends on having the right versions installed.
    # See https://www.tensorflow.org/install/gpu?hl=es-419 for more on this
    config=dict(device='CPU'),
    # This setting lets the agent preprocess the state: frames are resized to 105x80, converted to grayscale and normalized.
    # It can be changed to a different preprocessing, or removed for no preprocessing.
    state_preprocessing=[
        dict(type='image', height=105, width=80, grayscale=True),
        dict(type='linear_normalization')
    ],
    # The following setting saves TensorBoard information in the chosen folder. Uncomment it to enable it.
    #summarizer=dict(
    #    directory=root_path + 'Atari/data/summaries/AGENT_NAME',
    #    summaries='all'
    #),
    # The following setting is necessary for applying XAI (explainable AI) techniques afterwards.
    tracking='all'
)

In [None]:
agent = Agent.load(directory=root_path + 'Atari/AGENT_NAME')
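The `state_preprocessing` list in the creation cell asks Tensorforce to resize each frame to 105x80, convert it to grayscale and normalize it. Space Invaders frames from Gym are 210x160 RGB `uint8` arrays, so this halves each dimension and drops the color channels. The NumPy sketch below approximates that pipeline to show what the network ends up seeing. It is an illustration only: the grayscale weights, the 2x2 averaging and the [0, 1] target range are assumptions that may differ from Tensorforce's implementation, and `preprocess_frame` is a hypothetical name.

```python
import numpy as np


def preprocess_frame(frame):
    """Approximate the 'image' + 'linear_normalization' preprocessing.

    frame: uint8 array of shape (210, 160, 3), one RGB Space Invaders frame.
    Returns a float32 array of shape (105, 80, 1) with values in [0, 1].
    """
    # Luminance-weighted grayscale (ITU-R BT.601 weights; an assumption,
    # not necessarily the weights Tensorforce uses).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = frame.astype(np.float32) @ weights
    # Halve each spatial dimension by averaging 2x2 blocks: 210x160 -> 105x80.
    h, w = gray.shape
    small = gray.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Min-max scaling with the known pixel bounds [0, 255], clipped for float rounding.
    normalized = np.clip(small / 255.0, 0.0, 1.0)
    # Keep a single channel axis, as image-based networks usually expect.
    return normalized[..., np.newaxis]
```

Halving the resolution and dropping color cuts each observation from 100,800 to 8,400 values, which makes every network update considerably cheaper.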

### Agent training:

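Run this cell to train the agent created or loaded above (a freshly created agent starts training from scratch). Be careful not to run it unintentionally: every 10 episodes it saves the agent to `Atari/AGENT_NAME`, overwriting whatever was stored there, so you may lose previous data.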

In [None]:
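import os

# If running locally (not on Colab), you can pass visualize=True to Environment.create() to watch the training process.
# Note that this slows training down.
environment = Environment.create(environment='gym', level='SpaceInvaders-v0')

# Make sure the folder for the per-episode statistics exists, otherwise open(..., 'a') below fails
os.makedirs(root_path + 'Atari/Episodes/AGENT_NAME', exist_ok=True)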

episode_reward = []
episodeTimes = []
episodeTimeSteps = []
trainingStart = time.time()

# Train for 100 episodes
for i in range(100):

    # Initialize episode
    states = environment.reset()
    terminal = False
    rewardTotal = 0
    currentEpisodeTimeSteps = 0
    episodeStart = time.time()

    # Main training loop
    while not terminal:
        # Episode timestep
        currentEpisodeTimeSteps += 1
        actions = agent.act(states=states)
        states, terminal, reward = environment.execute(actions=actions)
        agent.observe(terminal=terminal, reward=reward)
        rewardTotal += reward
    
    # End of episode
    episodeEnd = time.time()
    timeEpisode = episodeEnd - episodeStart
    episodeTimes.append(timeEpisode)
    episode_reward.append(rewardTotal)
    episodeTimeSteps.append(currentEpisodeTimeSteps)  
    print('End of episode', i)
    # Every 10 episodes, append the collected statistics to text files and reset the lists
    if len(episodeTimes) == 10:
        with open(root_path +'Atari/Episodes/AGENT_NAME/rewards_per_episode.txt', 'a') as f:
            for item in episode_reward:
                f.write("%s\n" % item)
        
        with open(root_path +'Atari/Episodes/AGENT_NAME/timesteps_per_episode.txt', 'a') as f:
            for item in episodeTimeSteps:
                f.write("%s\n" % item)
        
        with open(root_path +'Atari/Episodes/AGENT_NAME/times_per_episode.txt', 'a') as f:
            for item in episodeTimes:
                f.write("%s\n" % item)
        episode_reward = []
        episodeTimes = []
        episodeTimeSteps = []
    # Save the agent every 10 episodes.
    # This could also be done with the agent's "saver" setting.
    if i % 10 == 9:
        agent.save(directory=root_path + 'Atari/AGENT_NAME')
    
trainingEnd = time.time()
trainingTime = trainingEnd - trainingStart
print(f"Total training time: {trainingTime:.1f} s")

agent.close()
environment.close()
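The three `with open(...)` blocks in the training cell repeat the same pattern, and `open(..., 'a')` fails if the `Episodes/AGENT_NAME` folder does not exist yet. A minimal sketch of hypothetical helpers: `append_values` creates the folder first and writes one value per line (the same format as the training cell), while `read_values` and `moving_average` read a file such as `rewards_per_episode.txt` back and smooth the learning curve. The names and the default window of 10 are assumptions.

```python
import os


def append_values(path, values):
    """Append one value per line to `path`, creating parent folders if needed."""
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "a") as f:
        for value in values:
            f.write(f"{value}\n")


def read_values(path):
    """Read a file written by append_values back as a list of floats."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def moving_average(values, window=10):
    """Mean of every run of `window` consecutive values (empty if too few values)."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]
```

For example, `moving_average(read_values(root_path + 'Atari/Episodes/AGENT_NAME/rewards_per_episode.txt'))` would give the average reward over each block of 10 consecutive training episodes, which is easier to read than the raw, noisy per-episode rewards.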

### Check results of training:

Run this cell to load your agent and compute some metrics about its current performance. Note that this evaluation loop differs from the training loop: it never calls `agent.observe()`, and it calls `agent.act()` with `independent=True` and `deterministic=True`. This lets us measure the agent's performance without training it.

In [None]:
agent = Agent.load(directory=root_path + 'Atari/AGENT_NAME')
environment = Environment.create(environment='gym', level='SpaceInvaders-v0')

episodeTimes = []
episodeTimeSteps = []
episodeRewards = []
for i in range(10):
    episodeStart = time.time()
    # Initialize episode
    states = environment.reset()
    terminal = False
    currentEpisodeTimeSteps = 0
    currentReward = 0
    
    while not terminal:
        # Episode timestep
        currentEpisodeTimeSteps += 1
        actions = agent.act(states=states, independent=True, deterministic=True)
        states, terminal, reward = environment.execute(actions=actions)
        currentReward += reward
    
    episodeEnd = time.time()
    timeEpisode = episodeEnd - episodeStart
    episodeTimes.append(timeEpisode)
    episodeTimeSteps.append(currentEpisodeTimeSteps)
    episodeRewards.append(currentReward)
    
environment.close()
agent.close()

avgEpisodeTime = sum(episodeTimes) / len(episodeTimes)
bestEpisodeTime = max(episodeTimes)
avgEpisodeTimeSteps = sum(episodeTimeSteps) / len(episodeTimeSteps)
bestEpisodeTimeSteps = max(episodeTimeSteps)
avgEpisodeReward = sum(episodeRewards) / len(episodeRewards)
bestEpisodeReward = max(episodeRewards)
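The evaluation cell calls `agent.act()` with `deterministic=True`. According to the Tensorforce documentation, this disables action sampling and exploration, and is only valid together with `independent=True`, which means the call is not followed by `observe()`. For a DQN, that roughly amounts to always picking the action with the highest estimated Q-value. The sketch below is a generic NumPy illustration of greedy versus epsilon-greedy selection, not Tensorforce's internal code; `select_action`, `epsilon` and the example Q-values are hypothetical.

```python
import numpy as np


def select_action(q_values, epsilon, deterministic, rng):
    """Pick an action index from a vector of estimated Q-values.

    deterministic=True: always the greedy (argmax) action, as in evaluation.
    deterministic=False: epsilon-greedy, i.e. a uniformly random action with
    probability epsilon, otherwise the greedy action.
    """
    if not deterministic and rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
# Six hypothetical Q-values, one per Space Invaders action.
q = [0.1, 0.5, 0.2, 0.0, 0.3, 0.4]
print(select_action(q, epsilon=0.1, deterministic=True, rng=rng))  # 1
```

In a typical DQN setup, training uses the exploratory branch so the agent keeps trying new actions, while evaluation uses the greedy branch so the metrics reflect only what the agent has learned.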

In [None]:
print(f"Average time steps per episode: {avgEpisodeTimeSteps} timesteps")
print(f"Most timesteps in a single episode: {bestEpisodeTimeSteps} timesteps")
print(f"Average reward per episode: {avgEpisodeReward}")
print(f"Best episode reward: {bestEpisodeReward}")
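Averages over only 10 evaluation episodes can be noisy, so it may help to report the spread as well. A minimal sketch using the standard-library `statistics` module; `summarize` is a hypothetical helper, and the commented usage lines assume the lists filled by the evaluation cell.

```python
import statistics


def summarize(values):
    """Mean, sample standard deviation, minimum and maximum of per-episode results."""
    return {
        "mean": statistics.mean(values),
        # stdev needs at least two data points
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }

# Usage with the lists from the evaluation cell:
# for name, values in [("reward", episodeRewards), ("timesteps", episodeTimeSteps)]:
#     print(name, summarize(values))
```

A large standard deviation relative to the mean suggests running more evaluation episodes before comparing two agents.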