# Project 3 - Pacman

# Members:
RA 183374 - Helena Steck

RA 187251 - Tainá Turella C. dos Santos

## Introduction
In Project 3 - Pacman, the group was asked to apply its knowledge of genetic algorithms and reinforcement learning to teach Pacman to defeat the ghosts while he feeds himself in a maze.
This project was harder than the previous ones because we first had to understand how the game code works and then implement the algorithms so that the two could interact without friction.

## Code & Difficulties
We used the Pacman implementation available from [Berkeley](https://inst.eecs.berkeley.edu/~cs188/sp19/assets/files/search.zip "Search.zip") and, on top of it, trained our model using reinforcement learning.
We were not able to work out how to implement the theory we learned about genetic algorithms, so this report contains no information about genetic training and no comparison between the two models.

## Reinforcement Learning
The definition of Reinforcement Learning available in [Wikipedia](https://en.wikipedia.org/wiki/Reinforcement_learning "Reinforcement learning") is:
> *Reinforcement learning (RL) is an area of machine learning concerned with how intelligent agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.
Reinforcement learning differs from supervised learning in not needing labelled input/output pairs be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge).
The environment is typically stated in the form of a Markov decision process (MDP), because many reinforcement learning algorithms for this context use dynamic programming techniques. The main difference between the classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP and they target large MDPs where exact methods become infeasible.*

Our implementation is based on these concepts and uses Approximate Q-Learning, a reinforcement learning algorithm built on estimating the value of an action in a given state. The algorithm computes the quality of a state-action combination; after each iteration the weight of each feature is updated, and at the end of training the resulting weights are used as parameters for the Agents.
With Approximate Q-Learning we approximate the Q-value (the expected cumulative reward) as a linear combination of the features used in the model.
One thing the group believes should be discussed is the impact that biases can have on reinforcement learning algorithms. For example, in this project it was very hard to teach Pacman to eat the ghosts when they are "scared", because at all other times it is crucial for Pacman to run away from them, since they can be harmful. This is a simple example, but it made us question some things during development.
Approximate Q-Learning has advantages, but it also has disadvantages: the linear approximation limits how accurately the values can be learned, and it requires well-chosen features. It is a more general algorithm, but it is not perfect.

### Training

With the previous explanation in place, we can now describe how the Pacman Agents are trained.

#### Reward modelling

We use as reward the difference between the Score before an action is done and after that action. Thus the reward is given by: $$reward = currentScore - previousScore$$

#### State/features modelling

We use as state features for our Approximate Q-Learning agent the following:

- Number of ghosts 1 square from Pacman after the action is done
- Distance to closest food after the action is done
- Whether or not a food will be eaten after the action is done
- A bias term (a constant feature that gives the linear function an offset)

#### Training episodes

Even though we saw improvements quickly with a low number of training episodes, we chose to use 1000 training episodes. Some feature values occur rarely, such as a non-zero number of ghosts 1 square from Pacman, and the weights associated with those features are only updated when we actually encounter them.

#### Calculating the Q-Value

To calculate the Q-Value we simply take the dot product between the feature vector of the state and the corresponding weights. Assume $s$ represents a state and $a$ an action, then we have:

$$QValue(s, a) = weights \cdot state(s, a)$$

where $weights$ and $state(s, a)$ are vectors of the same length: the learned weights and the feature values for the state-action pair.
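
As a small illustration of the dot product above, the sketch below stores both vectors as plain dictionaries keyed by feature name. The weight and feature values are hypothetical and chosen only for the example; `q_value` is an illustrative helper, not part of the Berkeley code.

```python
def q_value(weights, features):
    """Dot product of two sparse vectors stored as dicts.

    Features that have no weight yet are treated as having weight 0.
    """
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical values, for illustration only.
weights = {"bias": 1.0, "eats-food": 2.0, "closest-food": -3.0}
features = {"bias": 0.1, "eats-food": 0.1, "closest-food": 0.05}

print(q_value(weights, features))  # approximately 0.15
```

A negative weight on `closest-food` lowers the Q-Value of actions that leave Pacman far from food, which is the kind of behaviour we would expect the training to produce.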

#### Updating the weights

To update the weights we use an update rule derived from the recursive form of the [Bellman Equation](https://en.wikipedia.org/wiki/Bellman_equation):

$$ weights = weights + \alpha * \left( reward + \gamma * \max_{a} QValue(s_{current}, a) - QValue(s_{prev}, a_{prev}) \right) * state(s_{prev}, a_{prev})$$

Where $\alpha$ is the learning rate, $\gamma$ is the discount factor, $reward$ is the reward calculated as shown before, $\max_{a} QValue(s_{current}, a)$ is the highest QValue over the legal actions in the state reached after the action is applied (0 if there are none), $QValue(s_{prev}, a_{prev})$ is the QValue of the state and action that led to it, and $state(s_{prev}, a_{prev})$ is the feature vector from the QValue formula, so each weight changes in proportion to its feature.
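
To make the update concrete, here is a minimal sketch of a single step, mirroring the `update` method shown in the code section: each weight moves by the learning rate times the difference times the value of its feature. The function name `update_weights` and all the numbers are assumptions made for the example.

```python
def update_weights(weights, features, reward, gamma, alpha, q_prev, max_q_next):
    """Return the weights after one approximate Q-learning step.

    q_prev     - QValue of the state/action that was just taken
    max_q_next - highest QValue available in the resulting state
    """
    difference = (reward + gamma * max_q_next) - q_prev
    new_weights = dict(weights)
    for name, value in features.items():
        new_weights[name] = new_weights.get(name, 0.0) + alpha * difference * value
    return new_weights


# Hypothetical step: all weights start at zero and the action earned 10 points.
weights = {"bias": 0.0, "eats-food": 0.0}
features = {"bias": 1.0, "eats-food": 1.0}
new_weights = update_weights(
    weights, features, reward=10.0, gamma=0.8, alpha=0.5, q_prev=0.0, max_q_next=0.0
)
print(new_weights)  # {'bias': 5.0, 'eats-food': 5.0}
```

Features with value 0 leave their weights untouched, which is why rarely seen features need many episodes before their weights move.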

#### Exploration vs Exploitation

When Pacman is in training mode we allow it to sometimes act regardless of the QValue it predicts. This is called `Exploration`: Pacman randomly selects one valid action, acts on it, and applies the update to the weights based on the outcome. This behaviour is controlled by the $\epsilon$ (epsilon) variable, which sets the probability that Pacman ignores the known QValues (`Exploitation`) and takes a random action (`Exploration`) instead.
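
The choice between the two modes can be sketched as an epsilon-greedy rule. This is a simplified stand-alone version, assuming the QValues of the legal actions are already available in a dictionary; `epsilon_greedy` and the action values are illustrative, not taken from the game code.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action from a dict mapping action -> QValue.

    With probability epsilon a random action is returned (exploration),
    otherwise the action with the highest QValue (exploitation).
    """
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)


# Hypothetical QValues for three legal actions.
q_values = {"North": 0.2, "South": -0.5, "East": 0.7}
print(epsilon_greedy(q_values, epsilon=0.0))  # East, since epsilon=0 disables exploration
```

Setting `epsilon` to 0 corresponds to what the agent does once training is finished: it always exploits the learned QValues.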

#### End conditions

We did not put any conditions to end the game, other than winning it, or losing it. Thus some games can take more time than others.


### Results

The following table shows the win-rate of running each Agent on each available layout:

|                        | smallClassic Agent | mediumClassic Agent | originalClassic Agent |
|:----------------------:|:------------------:|:-------------------:|:---------------------:|
|   smallClassic layout  |        62.3%       |         62.0%       |          62.0%        |
|  mediumClassic layout  |        73.7%       |         72.8%       |          71.2%        |
| originalClassic layout |        44.0%       |         45.0%       |                       |


The following table shows the average score of 1000 runs for each agent on each map:

|                        | smallClassic Agent | mediumClassic Agent | originalClassic Agent |
|:----------------------:|:------------------:|:-------------------:|:---------------------:|
|   smallClassic layout  |        524.234     |         523.457     |          567.344      |
|  mediumClassic layout  |        965.37      |         951.669     |          923.765      |
| originalClassic layout |        1380.767    |         1393.844    |         1360.602      |


### Discussion

We can see from the results tables that, whichever agent we use, they all perform very similarly on all of the maps. This is probably because of the low number of features. With the features we mapped, the algorithm has no way of dealing with being trapped by ghosts, or even of knowing that it is allowed to eat ghosts after it eats a pill. As a result, the Agent will always try to run away from ghosts instead of chasing them, even when it is able to eat them. To solve this problem, a better mapping of the state should be used.
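
One possible direction for a better mapping, which we did not implement or test, would be to count scared and non-scared ghosts separately so that they can receive different weights. The sketch below is only an illustration under simplifying assumptions: ghosts are given as `(position, scared_timer)` pairs, adjacency is measured with Manhattan distance and ignores walls, and the feature names are hypothetical. In the Berkeley code the scared timer appears to be available as `scaredTimer` on each ghost state.

```python
def ghost_features(next_pos, ghosts):
    """Count adjacent ghosts, split by whether they are scared.

    next_pos - (x, y) position of Pacman after the action
    ghosts   - list of ((x, y), scared_timer) pairs
    """
    def adjacent(a, b):
        # Manhattan distance of at most 1 (walls are ignored in this sketch)
        return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1

    features = {"active-ghosts-1-step-away": 0.0, "scared-ghosts-1-step-away": 0.0}
    for pos, scared_timer in ghosts:
        if adjacent(next_pos, pos):
            if scared_timer > 0:
                features["scared-ghosts-1-step-away"] += 1.0
            else:
                features["active-ghosts-1-step-away"] += 1.0
    return features


# Hypothetical situation: one dangerous ghost above, one scared ghost to the right.
print(ghost_features((1, 1), [((1, 2), 0), ((2, 1), 5), ((5, 5), 0)]))
```

With two separate features, training could in principle learn a negative weight for active ghosts and a positive one for scared ghosts.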


## Disclaimer:
This project was equally divided between the two group members. Therefore we confirm that the methods, the discussions and the report elaboration were created by both of us. We used the code available at [Berkeley](https://inst.eecs.berkeley.edu/~cs188/sp19/assets/files/search.zip "Search.zip") to run pacman and we used their tutorial as a guide on how to implement the methods of Reinforcement Learning.

We did the entire project during calls on Google Meet, using pair programming, so that we could help each other during the development phase.

# Code

We start by importing the necessary files from the pacman game and numpy for helping with math.

In [1]:
%gui tk
from game import Agent, Actions
from pacman import runGames, readCommand
from graphicsDisplay import PacmanGraphics
from textDisplay import NullGraphics
from ghostAgents import RandomGhost
import numpy as np
import util
import layout
import time
import random

Now we define a base class for our agent, which abstracts all the game-related details for us and also sets up the state needed for training and for running without training.

In [2]:
class BaseAgent(Agent):
    def __init__(self, alpha=1.0, epsilon=0.05, gamma=0.8, numTraining=10):
        """
        alpha    - learning rate
        epsilon  - exploration rate
        gamma    - discount factor
        numTraining - number of training episodes
        """
        self.alpha = float(alpha)
        self.epsilon = float(epsilon)
        self.gamma = float(gamma)
        self.numTraining = int(numTraining)

        self.episodesSoFar = 0
        self.accumTrainRewards = 0.0
        self.accumTestRewards = 0.0


    def getLegalActions(self, state):
        """
        Get the actions available for a given
        state. This is what you should use to
        obtain legal actions for a state
        """
        return state.getLegalActions()


    def observeTransition(self, state, action, nextState, deltaReward):
        """
        Inform agent that a transition has
        been observed. This will result in a call to self.update
        on the same arguments
        """
        self.episodeRewards += deltaReward
        self.update(state, action, nextState, deltaReward)


    def startEpisode(self):
        """
        Called when new episode is starting
        """
        self.lastState = None
        self.lastAction = None
        self.episodeRewards = 0.0


    def stopEpisode(self):
        """
        Called when episode is done
        """
        if self.episodesSoFar < self.numTraining:
            self.accumTrainRewards += self.episodeRewards
        else:
            self.accumTestRewards += self.episodeRewards
        self.episodesSoFar += 1
        if self.episodesSoFar >= self.numTraining:
            # Take off the training wheels
            self.epsilon = 0.0  # no exploration
            self.alpha = 0.0  # no learning


    def doAction(self, state, action):
        """
        Called by inherited class when
        an action is taken in a state
        """
        self.lastState = state
        self.lastAction = action

        
    def observationFunction(self, state):
        """
        Called by Pacman game after a new state is generated
        """
        if self.lastState is not None:
            reward = state.getScore() - self.lastState.getScore()
            self.observeTransition(
                self.lastState, self.lastAction, state, reward
            )
        return state

    
    def registerInitialState(self, state):
        """
        Called by Pacman game at the start of a game
        """
        self.startEpisode()
        if self.episodesSoFar == 0:
            print("Beginning %d episodes of Training" % (self.numTraining))


    def final(self, state):
        """
        Called by Pacman game at the terminal state
        """
        deltaReward = state.getScore() - self.lastState.getScore()
        self.observeTransition(
            self.lastState, self.lastAction, state, deltaReward
        )
        self.stopEpisode()

        # Make sure these attributes exist before first use
        if "episodeStartTime" not in self.__dict__:
            self.episodeStartTime = time.time()
        if "lastWindowAccumRewards" not in self.__dict__:
            self.lastWindowAccumRewards = 0.0
        self.lastWindowAccumRewards += state.getScore()

        NUM_EPS_UPDATE = 100
        if self.episodesSoFar % NUM_EPS_UPDATE == 0:
            print("Reinforcement Learning Status:")
            windowAvg = self.lastWindowAccumRewards / float(NUM_EPS_UPDATE)
            if self.episodesSoFar <= self.numTraining:
                trainAvg = self.accumTrainRewards / float(self.episodesSoFar)
                print(
                    "\tCompleted %d out of %d training episodes"
                    % (self.episodesSoFar, self.numTraining)
                )
                print("\tAverage Rewards over all training: %.2f" % (trainAvg))
            else:
                testAvg = float(self.accumTestRewards) / (
                    self.episodesSoFar - self.numTraining
                )
                print(
                    "\tCompleted %d test episodes"
                    % (self.episodesSoFar - self.numTraining)
                )
                print("\tAverage Rewards over testing: %.2f" % testAvg)
            print(
                "\tAverage Rewards for last %d episodes: %.2f"
                % (NUM_EPS_UPDATE, windowAvg)
            )
            print(
                "\tEpisode took %.2f seconds"
                % (time.time() - self.episodeStartTime)
            )
            self.lastWindowAccumRewards = 0.0
            self.episodeStartTime = time.time()

        if self.episodesSoFar == self.numTraining:
            msg = "Training Done (turning off epsilon and alpha)"
            print("%s\n%s" % (msg, "-" * len(msg)))

Now we implement a function to help us filter the features of our pacman game state. This is important to reduce the complexity in training our Agent by reducing the state space.

In [3]:
def closestFood(pos, food, walls):
    """
    Calculate the distance to the food closest to our pacman by
    also taking in account the walls.

    We do this by doing a fringe search on the 'graph'
    represented by our game board and then for each position
    we check if it has a food, if it has we found our target and
    return the distance.
    """
    fringe = [(pos[0], pos[1], 0)]
    expanded = set()
    while fringe:
        # pop the first pos from the fringe
        pos_x, pos_y, dist = fringe.pop(0)

        # we already visited it: continue
        if (pos_x, pos_y) in expanded:
            continue

        # else: add it to visited locations
        expanded.add((pos_x, pos_y))

        # if we find a food at this location then return the current distance
        if food[pos_x][pos_y]:
            return dist

        # otherwise spread out from the location to its neighbours
        nbrs = Actions.getLegalNeighbors((pos_x, pos_y), walls)
        for nbr_x, nbr_y in nbrs:
            fringe.append((nbr_x, nbr_y, dist + 1))

    # no food found(probably we won the game? if not this will probably bug our Pacman :D)
    return None
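
To see the search in isolation, the sketch below runs the same breadth-first idea on a tiny hand-made board, without the game classes. It is a simplified stand-in, not the function above: walls and food are plain sets of coordinates, the board size is passed explicitly, and `collections.deque` replaces the list so that popping from the front stays cheap.

```python
from collections import deque


def closest_food_distance(start, food, walls, width, height):
    """Breadth-first search from start; returns steps to the nearest food or None."""
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        (x, y), dist = queue.popleft()
        if (x, y) in food:
            return dist
        # expand to the four neighbouring squares that are free and unvisited
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (x + dx, y + dy)
            in_bounds = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if in_bounds and nxt not in walls and nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# 3x3 toy board: Pacman at (0, 0), a wall at (1, 0) and food at (2, 0).
walls = {(1, 0)}
food = {(2, 0)}
print(closest_food_distance((0, 0), food, walls, width=3, height=3))  # 4
```

The straight-line distance to the food is 2, but the wall forces a detour of 4 steps, which is why the search takes walls into account.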

In [4]:
def getFeatures(state, action):
    """
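    Returns the following features (all divided by 10 before returning):
    - bias (a constant term, 1.0 before scaling)
    - whether food will be eaten (only set when no ghost is one step away)
    - how far away the closest food is (normalised by the board area)
    - the number of ghosts one step away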
    """
    # extract the grid of food and wall locations and get the ghost locations
    food = state.getFood()
    walls = state.getWalls()
    ghosts = state.getGhostPositions()

    features = util.Counter()

    features["bias"] = 1.0

    # compute the location of pacman after he takes the action
    if action is not None:
        x, y = state.getPacmanPosition()
        dx, dy = Actions.directionToVector(action)
        next_x, next_y = int(x + dx), int(y + dy)
    else:
        x, y = state.getPacmanPosition()
        next_x, next_y = x, y

    # count the number of ghosts 1-step away
    features["#-of-ghosts-1-step-away"] = sum(
        (next_x, next_y) in Actions.getLegalNeighbors(g, walls)
        for g in ghosts
    )

    # if there is no danger of ghosts then add the food feature
    if not features["#-of-ghosts-1-step-away"] and food[next_x][next_y]:
        features["eats-food"] = 1.0

    dist = closestFood((next_x, next_y), food, walls)
    if dist is not None:
        # make the distance a number less than one otherwise the update
        # will diverge wildly
        features["closest-food"] = float(dist) / (
            walls.width * walls.height
        )
    features.divideAll(10.0)
    return features

Now that we have selected which features we want to have in our state, we can implement a class that trains our Agent using an Approximate Q-Learning strategy.

In [5]:
class ApproximateQAgent(BaseAgent):
    """
    Approximate Q-Learning agent.

    Keeps one weight per feature, computes Q-values as the dot
    product of the weights and the features, and updates the
    weights after every observed transition.
    """
    def __init__(self, weights = None, **args):
        BaseAgent.__init__(self, **args)
        if weights is None:
            weights = util.Counter()
        self.weights = weights

    
    def getWeights(self):
        return self.weights
    

    def getAction(self, state):
        """
        Compute the action to take in the current state.  With
        probability self.epsilon, we should take a random action and
        take the best policy action otherwise.  Note that if there are
        no legal actions, which is the case at the terminal state, you
        should choose None as the action
        """
        # get legal actions 
        action = None
        legalActions = self.getLegalActions(state)
        
        # no action: return
        if len(legalActions) == 0:
            self.doAction(state, action)
            return action

        # explore new action or use Q-Value?
        explore = util.flipCoin(self.epsilon)
        
        # not exploring: choose action with highest Q-Value
        if not explore:
            totalRewards = [self.getQValue(state, a) for a in legalActions]
            action = legalActions[np.argmax(totalRewards)]
        
        # exploring: choose a random action
        else:
            action = random.choice(legalActions)

        # inform base class of action picked
        self.doAction(state, action)
        
        # return picked action
        return action


    def getQValue(self, state, action):
        """
        Should return Q(state,action) = w * featureVector
        where * is the dotProduct operator
        """
        features = getFeatures(state, action)
        return self.weights * features


    def update(self, state, action, nextState, reward):
        """
        Should update weights based on transition
        """
        # get maximum Q-Value after this state change
        legalNextActions = self.getLegalActions(nextState)
        if len(legalNextActions) > 0:
            max_q_value = np.max(
                [self.getQValue(nextState, a) for a in legalNextActions]
            )
        else:
            max_q_value = 0

        # apply the Bellman equation to update the weights
        difference = (reward + self.gamma * max_q_value) - self.getQValue(state, action)

        incremented_features = getFeatures(state, action)
        for k in incremented_features:
            incremented_features[k] *= self.alpha * difference

        self.weights += incremented_features


    def final(self, state):
        "Called at the end of each game."
        # call the super-class final method
        BaseAgent.final(self, state)

        # training finished: print current weights
        if self.episodesSoFar == self.numTraining:
            print(self.weights)

And now we can train our RL model:

In [6]:
NUM_TRAINING = 1000

In [14]:
smallClassicAgent = ApproximateQAgent(numTraining=NUM_TRAINING)
runGames(layout.getLayout('smallClassic'),
         smallClassicAgent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_TRAINING,
         numTraining=NUM_TRAINING,
         record=False)

Beginning 1000 episodes of Training
Reinforcement Learning Status:
	Completed 100 out of 1000 training episodes
	Average Rewards over all training: 540.81
	Average Rewards for last 100 episodes: 540.81
	Episode took 6.28 seconds
Reinforcement Learning Status:
	Completed 200 out of 1000 training episodes
	Average Rewards over all training: 558.37
	Average Rewards for last 100 episodes: 575.92
	Episode took 6.66 seconds
Reinforcement Learning Status:
	Completed 300 out of 1000 training episodes
	Average Rewards over all training: 556.66
	Average Rewards for last 100 episodes: 553.24
	Episode took 6.57 seconds
Reinforcement Learning Status:
	Completed 400 out of 1000 training episodes
	Average Rewards over all training: 554.98
	Average Rewards for last 100 episodes: 549.93
	Episode took 6.46 seconds
Reinforcement Learning Status:
	Completed 500 out of 1000 training episodes
	Average Rewards over all training: 561.08
	Average Rewards for last 100 episodes: 585.51
	Episode took 14.49 second

[]

In [12]:
mediumClassicAgent = ApproximateQAgent(numTraining=NUM_TRAINING)
runGames(layout.getLayout('mediumClassic'),
         mediumClassicAgent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_TRAINING,
         numTraining=NUM_TRAINING,
         record=False)

Beginning 1000 episodes of Training
Reinforcement Learning Status:
	Completed 100 out of 1000 training episodes
	Average Rewards over all training: 959.01
	Average Rewards for last 100 episodes: 959.01
	Episode took 14.13 seconds
Reinforcement Learning Status:
	Completed 200 out of 1000 training episodes
	Average Rewards over all training: 928.58
	Average Rewards for last 100 episodes: 898.15
	Episode took 14.00 seconds
Reinforcement Learning Status:
	Completed 300 out of 1000 training episodes
	Average Rewards over all training: 946.53
	Average Rewards for last 100 episodes: 982.42
	Episode took 14.80 seconds
Reinforcement Learning Status:
	Completed 400 out of 1000 training episodes
	Average Rewards over all training: 921.52
	Average Rewards for last 100 episodes: 846.51
	Episode took 13.98 seconds
Reinforcement Learning Status:
	Completed 500 out of 1000 training episodes
	Average Rewards over all training: 947.13
	Average Rewards for last 100 episodes: 1049.56
	Episode took 14.52 s

[]

In [13]:
originalClassicAgent = ApproximateQAgent(numTraining=NUM_TRAINING)
runGames(layout.getLayout('originalClassic'),
         originalClassicAgent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_TRAINING,
         numTraining=NUM_TRAINING,
         record=False)

Beginning 1000 episodes of Training
Reinforcement Learning Status:
	Completed 100 out of 1000 training episodes
	Average Rewards over all training: 1250.24
	Average Rewards for last 100 episodes: 1250.24
	Episode took 98.91 seconds
Reinforcement Learning Status:
	Completed 200 out of 1000 training episodes
	Average Rewards over all training: 1373.81
	Average Rewards for last 100 episodes: 1497.37
	Episode took 112.17 seconds
Reinforcement Learning Status:
	Completed 300 out of 1000 training episodes
	Average Rewards over all training: 1390.33
	Average Rewards for last 100 episodes: 1423.38
	Episode took 106.33 seconds
Reinforcement Learning Status:
	Completed 400 out of 1000 training episodes
	Average Rewards over all training: 1357.46
	Average Rewards for last 100 episodes: 1258.86
	Episode took 97.57 seconds
Reinforcement Learning Status:
	Completed 500 out of 1000 training episodes
	Average Rewards over all training: 1330.42
	Average Rewards for last 100 episodes: 1222.23
	Episode t

[]

Now that we have trained one agent for each map, we can compare how each one performs on the other maps and see the impact the training map has on the final weights.

## Running smallClassicAgent

We start by running the smallClassicAgent on each of the maps and measuring the percentage of games won.

In [33]:
NUM_RUNNING = 1000

In [34]:
agent = ApproximateQAgent(weights = smallClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('smallClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman died! Score: -48
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 961
Pacman emerges victorious! Score: 987
Pacman emerges victorious! Score: 982
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 977
Pacman emerges victorious! Score: 972
Pacman died! Score: -102
Pacman died! Score: -302
Pacman emerges victorious! Score: 987
Pacman emerges victorious! Score: 970
Pacman emerges victorious! Score: 980
Pacman died! Score: -41
Pacman emerges victorious! Score: 968
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 983
Pacman died! Score: -44
Pacman emerges victorious! Score: 983
Pacman died! Score: -148
Pacman emerges victorious! Score: 973
Pacman emerges victorious! Score: 982
Pacman died! Score: -139
Pacman emerges victorious! Score: 977
Pacman emerges victorious! Score: 974
Pacman emerges victorious! Score: 982
Pacman died! Score: -35
Pacman d

Pacman emerges victorious! Score: 970
Pacman emerges victorious! Score: 981
Pacman died! Score: -358
Pacman emerges victorious! Score: 978
Pacman died! Score: -36
Pacman died! Score: -297
Pacman died! Score: -141
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 967
Pacman died! Score: -323
Pacman emerges victorious! Score: 935
Pacman emerges victorious! Score: 963
Pacman died! Score: -240
Pacman died! Score: -134
Pacman died! Score: -137
Pacman emerges victorious! Score: 966
Pacman died! Score: -63
Pacman died! Score: -393
Pacman died! Score: -80
Pacman emerges victorious! Score: 955
Pacman emerges victorious! Score: 978
Pacman died! Score: -359
Pacman died! Score: -337
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 979
Pacman died! Score: -42
Pacman emerges victorious! Score: 973
Pacman emerges victorious! Score: 979
Pacman died! Score: -65
Pacman emerges victorious! Score: 981
Pacman emerges victori

Pacman died! Score: -42
Pacman died! Score: -57
Pacman emerges victorious! Score: 974
Pacman died! Score: -289
Pacman emerges victorious! Score: 973
Pacman died! Score: -281
Pacman emerges victorious! Score: 985
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 962
Pacman emerges victorious! Score: 987
Pacman emerges victorious! Score: 951
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 961
Pacman emerges victorious! Score: 980
Pacman died! Score: -378
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 979
Pacman died! Score: -96
Pacman died! Score: -167
Reinforcement Learning Status:
	Completed 500 out of 1000 training episodes
	Average Rewards over all training: 533.65
	Average Rewards for last 100 episodes: 525.03
	Episode took 6.35 seconds
Pacman emerges victorious! Score: 981
Pacman died! Score: -133
Pacman emerges victorious! Score: 977
Pacman died! Score: -326
Pacman died! Score: -37
Pacman emerges victorious! Sc

Pacman emerges victorious! Score: 967
Pacman died! Score: -343
Pacman emerges victorious! Score: 979
Pacman died! Score: -101
Pacman emerges victorious! Score: 965
Pacman emerges victorious! Score: 964
Pacman emerges victorious! Score: 961
Pacman emerges victorious! Score: 963
Pacman emerges victorious! Score: 970
Pacman died! Score: -341
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 984
Pacman died! Score: -344
Pacman emerges victorious! Score: 978
Pacman died! Score: -59
Pacman emerges victorious! Score: 953
Pacman emerges victorious! Score: 986
Pacman emerges victorious! Score: 982
Pacman died! Score: -404
Pacman died! Score: -175
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 977
Pacman emerges victorious! Score: 967
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 970
Pacman died! Score: -307
Pacman died! Score: -130
Pacman emerges victorious! Score: 971
Pacman emerges v

Pacman emerges victorious! Score: 984
Pacman died! Score: -349
Pacman emerges victorious! Score: 974
Pacman died! Score: -340
Pacman died! Score: -258
Pacman emerges victorious! Score: 967
Pacman died! Score: -287
Pacman emerges victorious! Score: 976
Pacman died! Score: -56
Pacman emerges victorious! Score: 985
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 953
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 983
Pacman died! Score: -264
Pacman died! Score: -323
Pacman emerges victorious! Score: 982
Pacman emerges victorious! Score: 981
Pacman died! Score: -163
Pacman emerges victorious! Score: 954
Pacman emerges victorious! Score: 973
Pacman emerges victorious! Score: 985
Pacman emerges victorious! Score: 978
Pacman died! Score: -440
Pacman died! Score: -258
Pacman died! Score: -271
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 964
Pacman emerges victorious! Score: 974
Pacm

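As we can see from the output above, running the `smallClassic` agent on the map it was trained on, the `smallClassic` map, gives a 62.3% win rate. Note that the agent is constructed with `numTraining=NUM_TRAINING`, so, as the "training episodes" status lines in the output show, it keeps exploring and updating its weights during these runs.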
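
A win rate like this can be computed from the printed game results by counting the two kinds of lines. The helper below is a sketch of that counting, assuming the output has been captured as text; `win_rate` is an illustrative name, and the sample uses the first four result lines of the run above.

```python
def win_rate(log_text):
    """Fraction of games won, based on the result lines printed by the game."""
    wins = losses = 0
    for line in log_text.splitlines():
        if line.startswith("Pacman emerges victorious!"):
            wins += 1
        elif line.startswith("Pacman died!"):
            losses += 1
    total = wins + losses
    return wins / total if total else 0.0


sample = """Pacman died! Score: -48
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 961"""

print(win_rate(sample))  # 0.75
```

Lines that are neither a win nor a loss, such as the status reports, are simply skipped.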

In [35]:
agent = ApproximateQAgent(weights = smallClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('mediumClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman emerges victorious! Score: 1339
Pacman emerges victorious! Score: 1350
Pacman emerges victorious! Score: 1324
Pacman died! Score: -331
Pacman emerges victorious! Score: 1303
Pacman died! Score: 106
Pacman emerges victorious! Score: 1309
Pacman died! Score: 279
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1337
Pacman died! Score: -363
Pacman emerges victorious! Score: 1313
Pacman emerges victorious! Score: 1345
Pacman died! Score: 33
Pacman died! Score: 123
Pacman emerges victorious! Score: 1348
Pacman emerges victorious! Score: 1326
Pacman died! Score: -265
Pacman died! Score: 92
Pacman emerges victorious! Score: 1330
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1534
Pacman emerges victorious! Score: 1333
Pacman died! Score: -245
Pacman died! Score: 79
Pacman emerges victorious! Score: 1328
Pacman died! Score: 114
Pacman emerges victorious! Score: 1333
Pacman emerges victorious! Score: 13

Pacman emerges victorious! Score: 1321
Pacman died! Score: -221
Pacman emerges victorious! Score: 1513
Pacman emerges victorious! Score: 1336
Pacman emerges victorious! Score: 1319
Pacman emerges victorious! Score: 1307
Pacman died! Score: -115
Pacman emerges victorious! Score: 1316
Pacman died! Score: -6
Pacman emerges victorious! Score: 1313
Pacman emerges victorious! Score: 1508
Pacman died! Score: -340
Pacman emerges victorious! Score: 1325
Pacman emerges victorious! Score: 1333
Pacman died! Score: -248
Pacman emerges victorious! Score: 1338
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1338
Pacman emerges victorious! Score: 1347
Pacman emerges victorious! Score: 1351
Pacman emerges victorious! Score: 1324
Pacman emerges victorious! Score: 1534
Pacman emerges victorious! Score: 1347
Pacman emerges victorious! Score: 1343
Pacman died! Score: -380
Pacman died! Score: -423
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1321
Pacman e

Pacman emerges victorious! Score: 1316
Pacman emerges victorious! Score: 1311
Pacman died! Score: -393
Pacman emerges victorious! Score: 1308
Pacman emerges victorious! Score: 1335
Pacman died! Score: -386
Pacman died! Score: -309
Pacman emerges victorious! Score: 1336
Pacman emerges victorious! Score: 1314
Pacman emerges victorious! Score: 1333
Pacman died! Score: 165
Pacman emerges victorious! Score: 1341
Pacman died! Score: -377
Pacman died! Score: -332
Pacman emerges victorious! Score: 1348
Pacman emerges victorious! Score: 1300
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1326
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1348
Pacman emerges victorious! Score: 1346
Pacman emerges victorious! Score: 1304
Pacman died! Score: -358
Pacman emerges victorious! Score: 1334
Pacman emerges victorious! Score: 1316
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1323
Pacman 

Pacman emerges victorious! Score: 1327
Pacman emerges victorious! Score: 1324
Pacman emerges victorious! Score: 1321
Pacman emerges victorious! Score: 1315
Pacman emerges victorious! Score: 1312
Pacman emerges victorious! Score: 1340
Pacman emerges victorious! Score: 1340
Pacman died! Score: 3
Pacman emerges victorious! Score: 1346
Pacman emerges victorious! Score: 1348
Pacman died! Score: -53
Reinforcement Learning Status:
	Completed 700 out of 1000 training episodes
	Average Rewards over all training: 952.47
	Average Rewards for last 100 episodes: 1012.36
	Episode took 13.78 seconds
Pacman died! Score: -374
Pacman emerges victorious! Score: 1325
Pacman died! Score: -359
Pacman emerges victorious! Score: 1326
Pacman emerges victorious! Score: 1516
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1340
Pacman emerges victorious! Score: 1324
Pacman died! Score: -186
Pacman died! Score: 83
Pacman emerges victorious! Score: 1312
Pacman emerges victorious! Score: 133

Pacman emerges victorious! Score: 1331
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1516
Pacman emerges victorious! Score: 1326
Pacman emerges victorious! Score: 1333
Pacman died! Score: 696
Pacman emerges victorious! Score: 1336
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1493
Pacman emerges victorious! Score: 1330
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1519
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1539
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1321
Pacman died! Score: -369
Pacman emerges victorious! Score: 1338
Pacman emerges victorious! Score: 1350
Pacman emerges victorious! Score: 1315
Pacman died! Score: -367
Pacman emerges victorious! Score: 1312
Pacman emerges victorious! Score: 1315
Pacman emerges victorious! Score: 1350
Pacman died! Score: 188
Pacman died! Score: -366
Pacman emerges vict

As the output above shows, the `smallClassic` agent wins 73.7% of its games on the `mediumClassic` map.
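The win rate quoted above is the number of games won divided by the total number of games played. A minimal sketch of that computation follows, assuming the printed log has been captured as a string. The helper name `win_rate_from_log` and the sample text are illustrative and not part of the project code.

```python
import re

# Matches one complete per-game result line, e.g. "Pacman died! Score: -331".
RESULT_LINE = re.compile(r"^Pacman (emerges victorious|died)! Score: -?\d+$")

def win_rate_from_log(log_text):
    """Return the fraction of games won, based on the per-game result lines."""
    wins = 0
    total = 0
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if match is None:
            # Status lines and lines cut off before the score are skipped.
            continue
        total += 1
        if match.group(1) == "emerges victorious":
            wins += 1
    if total == 0:
        raise ValueError("no game results found in log")
    return wins / total

# Illustrative sample, not the full log of any run in this report.
sample_log = """Pacman emerges victorious! Score: 1339
Pacman died! Score: -331
Pacman emerges victorious! Score: 1303
Pacman emerges victorious! Score: 1309
"""
print(f"win rate: {win_rate_from_log(sample_log):.1%}")
```

Counting only complete result lines keeps the training status messages out of the total.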

In [36]:
agent = ApproximateQAgent(weights = smallClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('originalClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman died! Score: 251
Pacman died! Score: 306
Pacman died! Score: 49
Pacman emerges victorious! Score: 2417
Pacman emerges victorious! Score: 2607
Pacman died! Score: 468
Pacman emerges victorious! Score: 2437
Pacman died! Score: 2
Pacman died! Score: 1220
Pacman died! Score: 1170
Pacman died! Score: 39
Pacman died! Score: -8
Pacman died! Score: 913
Pacman emerges victorious! Score: 2440
Pacman died! Score: 137
Pacman emerges victorious! Score: 2428
Pacman emerges victorious! Score: 2422
Pacman emerges victorious! Score: 2430
Pacman emerges victorious! Score: 2604
Pacman emerges victorious! Score: 2608
Pacman died! Score: 316
Pacman died! Score: 0
Pacman emerges victorious! Score: 2436
Pacman emerges victorious! Score: 2404
Pacman died! Score: -167
Pacman died! Score: 110
Pacman emerges victorious! Score: 2440
Pacman died! Score: 839
Pacman died! Score: 545
Pacman died! Score: -312
Pacman died! Score: 861
Pacman died! Score: -158
Pacman emerges vic

Pacman emerges victorious! Score: 2426
Pacman died! Score: 1371
Pacman emerges victorious! Score: 2442
Pacman emerges victorious! Score: 2613
Pacman emerges victorious! Score: 2453
Pacman died! Score: 1330
Pacman died! Score: 928
Pacman died! Score: -164
Pacman died! Score: 1205
Pacman emerges victorious! Score: 2440
Pacman died! Score: 97
Pacman died! Score: 1218
Pacman died! Score: 138
Pacman died! Score: 346
Pacman died! Score: 213
Pacman died! Score: 166
Pacman emerges victorious! Score: 2412
Pacman died! Score: 957
Pacman emerges victorious! Score: 2449
Pacman emerges victorious! Score: 2441
Pacman died! Score: -205
Pacman emerges victorious! Score: 2447
Pacman emerges victorious! Score: 2403
Pacman died! Score: 609
Pacman died! Score: 1317
Pacman emerges victorious! Score: 2411
Pacman died! Score: -24
Pacman emerges victorious! Score: 2410
Pacman died! Score: 721
Pacman emerges victorious! Score: 2429
Pacman died! Score: 287
Pacman died! Score: -134
Pacman died! Score: 375
Pacman

Pacman emerges victorious! Score: 2418
Pacman emerges victorious! Score: 2447
Pacman died! Score: 1203
Pacman emerges victorious! Score: 2410
Pacman emerges victorious! Score: 2365
Pacman died! Score: -58
Pacman emerges victorious! Score: 2592
Pacman died! Score: 1081
Pacman died! Score: 403
Pacman emerges victorious! Score: 2365
Pacman emerges victorious! Score: 2414
Pacman emerges victorious! Score: 2441
Pacman died! Score: 570
Pacman emerges victorious! Score: 2397
Pacman emerges victorious! Score: 2395
Pacman died! Score: 1026
Pacman emerges victorious! Score: 2415
Pacman emerges victorious! Score: 2408
Pacman died! Score: 1152
Pacman emerges victorious! Score: 2447
Pacman emerges victorious! Score: 2411
Pacman emerges victorious! Score: 2444
Pacman died! Score: 172
Pacman emerges victorious! Score: 2439
Pacman died! Score: -160
Pacman died! Score: 318
Pacman emerges victorious! Score: 2411
Pacman emerges victorious! Score: 2624
Pacman emerges victorious! Score: 2377
Pacman died! S

Pacman emerges victorious! Score: 2438
Pacman died! Score: 104
Pacman emerges victorious! Score: 2411
Pacman died! Score: 1177
Pacman emerges victorious! Score: 2401
Pacman died! Score: -358
Pacman emerges victorious! Score: 2436
Pacman died! Score: 707
Pacman emerges victorious! Score: 2403
Pacman died! Score: 575
Pacman died! Score: -97
Pacman died! Score: 991
Pacman died! Score: -312
Pacman emerges victorious! Score: 2448
Pacman died! Score: 921
Pacman died! Score: 1246
Pacman died! Score: 1176
Pacman emerges victorious! Score: 2610
Pacman emerges victorious! Score: 2404
Pacman died! Score: 205
Pacman emerges victorious! Score: 2417
Pacman emerges victorious! Score: 2442
Pacman died! Score: 111
Pacman died! Score: 1094
Pacman emerges victorious! Score: 2623
Pacman died! Score: 1181
Pacman died! Score: -369
Pacman died! Score: 23
Pacman died! Score: 663
Pacman died! Score: 209
Pacman emerges victorious! Score: 2415
Pacman died! Score: 1389
Pacman died! Score: 1066
Pacman emerges vict

As the output above shows, the `smallClassic` agent wins 44% of its games on the `originalClassic` map.

## Running mediumClassicAgent

Now we run the `mediumClassicAgent` on all the maps and record the percentage of games won.
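The evaluation cells in this part differ only in the layout name, so the same procedure can be expressed as a loop over the maps. The sketch below assumes a callable that plays a batch of games on a named map and returns one boolean per game. Both `evaluate_on_maps` and `fake_play_games` are hypothetical names introduced for illustration, and the canned results are not measured data.

```python
def evaluate_on_maps(play_games, map_names):
    """Return {map name: win rate} given a callable that plays games on one map.

    play_games(map_name) must return a list with one boolean per game
    (True for a win). It is a hypothetical wrapper, not a project function.
    """
    win_rates = {}
    for name in map_names:
        results = play_games(name)
        win_rates[name] = sum(results) / len(results)
    return win_rates

def fake_play_games(map_name):
    # Stand-in results for demonstration only; these are not measured data.
    canned = {
        "smallClassic": [True, False, True, True],
        "mediumClassic": [True, True, False, False],
        "originalClassic": [False, False, True, False],
    }
    return canned[map_name]

maps = ["smallClassic", "mediumClassic", "originalClassic"]
print(evaluate_on_maps(fake_play_games, maps))
```

In the notebook, `play_games` could wrap the `runGames` call used in these cells, provided the win or loss of each game can be read from its return value. That is an assumption about the project code, not something shown in this report.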

In [37]:
agent = ApproximateQAgent(weights = mediumClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('smallClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman emerges victorious! Score: 982
Pacman died! Score: -178
Pacman emerges victorious! Score: 977
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Score: 972
Pacman died! Score: -150
Pacman died! Score: -209
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 964
Pacman died! Score: -107
Pacman emerges victorious! Score: 963
Pacman died! Score: -141
Pacman died! Score: -329
Pacman emerges victorious! Score: 966
Pacman emerges victorious! Score: 966
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 969
Pacman died! Score: -92
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 968
Pacman emerges victorious! Score: 957
Pacman died! Score: -33
Pacman emerges victorious! Score: 978
Pacman died! Score: -341
Pacman died! Score: -117
Pacman died! Score: -299
Pacman died! Score: -325
Pacman died! Score: 

Pacman died! Score: -60
Pacman emerges victorious! Score: 964
Pacman emerges victorious! Score: 969
Pacman died! Score: -314
Pacman died! Score: -103
Pacman emerges victorious! Score: 985
Pacman died! Score: -49
Pacman died! Score: -337
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 972
Pacman emerges victorious! Score: 964
Pacman died! Score: -71
Pacman emerges victorious! Score: 972
Pacman died! Score: -122
Pacman died! Score: -38
Pacman emerges victorious! Score: 967
Pacman died! Score: -64
Pacman died! Score: -248
Pacman died! Score: -115
Pacman emerges victorious! Score: 972
Pacman emerges victorious! Score: 1183
Pacman died! Score: -195
Pacman died! Score: -272
Pacman emerges victorious! Score: 970
Pacman died! Score: -234
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Score: 962
Pacman died! Score: -298
Pacman died! Score: -125
Pacman emerges victorious! Score: 977
Pacman died! Score: -124
Pacman emerges victorious! Score: 980
Pacman em

	Completed 500 out of 1000 training episodes
	Average Rewards over all training: 493.00
	Average Rewards for last 100 episodes: 501.13
	Episode took 6.12 seconds
Pacman emerges victorious! Score: 984
Pacman died! Score: -438
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 968
Pacman died! Score: -348
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 1134
Pacman emerges victorious! Score: 966
Pacman emerges victorious! Score: 977
Pacman died! Score: -147
Pacman died! Score: -115
Pacman died! Score: -49
Pacman died! Score: -220
Pacman emerges victorious! Score: 965
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Score: 958
Pacman died! Score: -344
Pacman died! Score: -353
Pacman emerges victorious! Score: 973
Pacman emerges victorious! Score: 968
Pacman emerges victorious! Score: 924
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 962
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Sco

Pacman died! Score: -175
Pacman emerges victorious! Score: 975
Pacman emerges victorious! Score: 966
Pacman died! Score: -337
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 965
Pacman emerges victorious! Score: 975
Pacman emerges victorious! Score: 966
Pacman emerges victorious! Score: 968
Pacman emerges victorious! Score: 967
Pacman died! Score: -78
Pacman died! Score: -305
Pacman emerges victorious! Score: 967
Pacman emerges victorious! Score: 985
Pacman emerges victorious! Score: 952
Pacman emerges victorious! Score: 976
Pacman died! Score: 6
Pacman died! Score: -344
Pacman died! Score: -274
Pacman emerges victorious! Score: 961
Pacman died! Score: -254
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 980
Pacman died! Score: -221
Pacman died! Score: -127
Pacman died! Score: -320
Pacman died! Score: -58
Pacman died! Score: -112
Pacman emerges victorious! Score: 987
Pacman died! Score: -318
Pacman emerges victorious! Score: 976
Pacman em

Pacman emerges victorious! Score: 977
Pacman died! Score: 137
Pacman emerges victorious! Score: 973
Pacman emerges victorious! Score: 982
Pacman died! Score: -341
Pacman died! Score: -429
Pacman emerges victorious! Score: 962
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 1180
Pacman emerges victorious! Score: 978
Pacman died! Score: -62
Pacman emerges victorious! Score: 970
Pacman emerges victorious! Score: 979
Pacman died! Score: -360
Pacman emerges victorious! Score: 972
Pacman died! Score: -384
Pacman emerges victorious! Score: 972
Pacman died! Score: -63
Pacman died! Score: -310
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 985
Reinforcement Learning Status:
	Completed 1000 out of 1000 training episodes
	Average Rewards over all training: 523.46
	Average Rewards for last 100 episodes: 438.04
	Episode took 6.29 seconds
Training Done (turning off epsilon and alpha)
---------------------------------------------
{'bias': 279.606907168

As the output above shows, the `mediumClassic` agent wins 62% of its games on the `smallClassic` map.

In [38]:
agent = ApproximateQAgent(weights = mediumClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('mediumClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman died! Score: 95
Pacman emerges victorious! Score: 1326
Pacman died! Score: -368
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1348
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1328
Pacman emerges victorious! Score: 1315
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1330
Pacman emerges victorious! Score: 1350
Pacman died! Score: 256
Pacman emerges victorious! Score: 1536
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1311
Pacman emerges victorious! Score: 1502
Pacman emerges victorious! Score: 1331
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1328
Pacman emerges victorious! Score: 1329
Pacman died! Score: -241
Pacman died! Score: -193
Pacman emerges victorious! Score: 1332
Pacman died! Score: -378
Pacman emerges victorious! Score: 1347
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 133

Pacman emerges victorious! Score: 1335
Pacman died! Score: -156
Pacman died! Score: 5
Pacman emerges victorious! Score: 1330
Pacman emerges victorious! Score: 1339
Pacman emerges victorious! Score: 1310
Pacman emerges victorious! Score: 1335
Pacman died! Score: -359
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1336
Pacman died! Score: -65
Pacman emerges victorious! Score: 1318
Pacman emerges victorious! Score: 1334
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1344
Pacman emerges victorious! Score: 1523
Pacman emerges victorious! Score: 1532
Pacman emerges victorious! Score: 1336
Pacman emerges victorious! Score: 1338
Pacman emerges victorious! Score: 1341
Pacman died! Score: -386
Pacman emerges victorious! Score: 1332
Pacman emerges victorious! Score: 1310
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1320
Pacman emerges victorious! Score: 1332
Pacman died! Score: -

Pacman emerges victorious! Score: 1303
Pacman emerges victorious! Score: 1550
Pacman died! Score: -19
Pacman emerges victorious! Score: 1328
Pacman emerges victorious! Score: 1307
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1333
Pacman emerges victorious! Score: 1343
Pacman died! Score: 314
Pacman emerges victorious! Score: 1294
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1321
Pacman died! Score: 280
Pacman emerges victorious! Score: 1337
Pacman emerges victorious! Score: 1296
Pacman died! Score: -36
Pacman emerges victorious! Score: 1314
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1538
Pacman emerges victorious! Score: 1344
Pacman emerges victorious! Score: 1339
Pacman died! Score: 32
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1317
Pacman died! Score: -79
Pacman emerges victorious! Score: 1318
Pacman died! Score: 49
Pacman died! Sc

Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1346
Pacman died! Score: -356
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1292
Pacman emerges victorious! Score: 1338
Pacman emerges victorious! Score: 1334
Pacman emerges victorious! Score: 1325
Pacman emerges victorious! Score: 1349
Pacman emerges victorious! Score: 1519
Pacman died! Score: -66
Pacman emerges victorious! Score: 1346
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1337
Pacman emerges victorious! Score: 1325
Pacman emerges victorious! Score: 1305
Pacman emerges victorious! Score: 1333
Pacman emerges victorious! Score: 1318
Pacman died! Score: 63
Reinforcement Learning Status:
	Completed 700 out of 1000 training episodes
	Average Rewards over all training: 974.22
	Average Rewards for last 100 episodes: 955.60
	Episode took 13.67 seconds
Pacman died! Score: -36
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1330
Pacman e

Pacman emerges victorious! Score: 1517
Pacman emerges victorious! Score: 1337
Pacman died! Score: -321
Pacman died! Score: 289
Pacman died! Score: 50
Pacman emerges victorious! Score: 1263
Pacman died! Score: -247
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1316
Pacman emerges victorious! Score: 1545
Pacman emerges victorious! Score: 1327
Pacman emerges victorious! Score: 1721
Pacman died! Score: -388
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1326
Pacman emerges victorious! Score: 1335
Pacman died! Score: -419
Pacman emerges victorious! Score: 1328
Pacman emerges victorious! Score: 1327
Pacman died! Score: 48
Pacman emerges victorious! Score: 1330
Pacman died! Score: -9
Pacman emerges victorious! Score: 1334
Pacman emerges victorious! Score: 1328
Pacman emerges victorious! Score: 1327
Pacman emerges victorious! Score: 1327
Pacman died! Score: -321
Pacman emerges victorious! Score: 1721
Pa

As the output above shows, the `mediumClassic` agent wins 72.8% of its games on the `mediumClassic` map.

In [39]:
agent = ApproximateQAgent(weights = mediumClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('originalClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman emerges victorious! Score: 2433
Pacman died! Score: 1158
Pacman emerges victorious! Score: 2462
Pacman died! Score: 695
Pacman died! Score: 703
Pacman died! Score: 1184
Pacman died! Score: 1264
Pacman emerges victorious! Score: 2446
Pacman died! Score: -199
Pacman died! Score: 621
Pacman died! Score: 395
Pacman emerges victorious! Score: 2433
Pacman died! Score: 625
Pacman emerges victorious! Score: 2405
Pacman died! Score: 144
Pacman emerges victorious! Score: 2419
Pacman died! Score: 1617
Pacman died! Score: 1133
Pacman died! Score: 84
Pacman died! Score: 797
Pacman died! Score: 1105
Pacman died! Score: 1539
Pacman died! Score: -51
Pacman emerges victorious! Score: 2449
Pacman emerges victorious! Score: 2629
Pacman died! Score: -313
Pacman died! Score: 32
Pacman died! Score: 1249
Pacman emerges victorious! Score: 2439
Pacman emerges victorious! Score: 2649
Pacman emerges victorious! Score: 2637
Pacman emerges victorious! Score: 2445
Pacman d

Pacman died! Score: 1261
Pacman emerges victorious! Score: 2601
Pacman emerges victorious! Score: 2446
Pacman emerges victorious! Score: 2436
Pacman emerges victorious! Score: 2419
Pacman emerges victorious! Score: 2451
Pacman emerges victorious! Score: 2634
Pacman emerges victorious! Score: 2373
Pacman emerges victorious! Score: 2431
Pacman died! Score: 197
Pacman died! Score: 1
Pacman emerges victorious! Score: 2521
Pacman died! Score: 493
Pacman emerges victorious! Score: 2457
Pacman emerges victorious! Score: 2603
Pacman emerges victorious! Score: 2426
Pacman died! Score: 508
Pacman emerges victorious! Score: 2440
Pacman emerges victorious! Score: 2449
Pacman emerges victorious! Score: 2395
Pacman died! Score: 517
Pacman died! Score: -153
Pacman died! Score: 899
Pacman died! Score: -348
Pacman died! Score: 138
Pacman died! Score: -243
Pacman died! Score: 1153
Pacman emerges victorious! Score: 2622
Pacman emerges victorious! Score: 2414
Pacman emerges victorious! Score: 2590
Pacman 

Pacman died! Score: 885
Pacman emerges victorious! Score: 2397
Pacman died! Score: 1341
Pacman emerges victorious! Score: 2440
Pacman died! Score: 109
Pacman died! Score: 1256
Pacman died! Score: 98
Pacman died! Score: -133
Pacman emerges victorious! Score: 2438
Pacman died! Score: 379
Pacman emerges victorious! Score: 2410
Pacman died! Score: 817
Pacman died! Score: 716
Pacman died! Score: 876
Pacman emerges victorious! Score: 2571
Pacman died! Score: 550
Pacman died! Score: 71
Pacman emerges victorious! Score: 2455
Pacman died! Score: 821
Pacman died! Score: 660
Pacman died! Score: 1369
Pacman emerges victorious! Score: 2637
Pacman emerges victorious! Score: 2418
Pacman died! Score: 1339
Pacman died! Score: 962
Pacman emerges victorious! Score: 2444
Pacman emerges victorious! Score: 2428
Pacman emerges victorious! Score: 2434
Pacman died! Score: 1193
Pacman died! Score: 1564
Pacman died! Score: 433
Pacman emerges victorious! Score: 2409
Pacman emerges victorious! Score: 2453
Pacman e

Pacman died! Score: 1049
Pacman emerges victorious! Score: 2469
Pacman emerges victorious! Score: 2462
Pacman emerges victorious! Score: 2608
Pacman died! Score: -198
Pacman died! Score: -157
Pacman emerges victorious! Score: 2418
Pacman died! Score: 256
Pacman died! Score: 94
Pacman emerges victorious! Score: 2634
Pacman emerges victorious! Score: 2453
Pacman emerges victorious! Score: 2449
Pacman emerges victorious! Score: 2642
Pacman emerges victorious! Score: 2641
Pacman died! Score: 923
Pacman emerges victorious! Score: 2635
Pacman emerges victorious! Score: 2422
Pacman died! Score: 1052
Pacman emerges victorious! Score: 2448
Pacman died! Score: -179
Pacman emerges victorious! Score: 2605
Pacman died! Score: 406
Pacman emerges victorious! Score: 2418
Pacman emerges victorious! Score: 2421
Pacman died! Score: 88
Pacman died! Score: -243
Pacman died! Score: 954
Pacman emerges victorious! Score: 2635
Pacman emerges victorious! Score: 2407
Pacman emerges victorious! Score: 2419
Pacman

As the output above shows, the `mediumClassic` agent wins 45% of its games on the `originalClassic` map.
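The win rates reported so far can be gathered into a small lookup to compare how each agent transfers across maps. The sketch below uses only the five percentages quoted in this part of the report. The dictionary layout and the helper name `best_map_per_agent` are illustrative choices.

```python
# Win rates in percent as reported in the text, keyed by
# (map the agent was trained on, map the games were played on).
reported = {
    ("smallClassic", "mediumClassic"): 73.7,
    ("smallClassic", "originalClassic"): 44.0,
    ("mediumClassic", "smallClassic"): 62.0,
    ("mediumClassic", "mediumClassic"): 72.8,
    ("mediumClassic", "originalClassic"): 45.0,
}

def best_map_per_agent(rates):
    """For each agent, return the map with its highest reported win rate."""
    best = {}
    for (agent, game_map), rate in rates.items():
        if agent not in best or rate > best[agent][1]:
            best[agent] = (game_map, rate)
    return best

for agent, (game_map, rate) in best_map_per_agent(reported).items():
    print(f"{agent}: best on {game_map} ({rate}%)")
```

Among the figures listed here, both agents do best on `mediumClassic` and worst on `originalClassic`.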

## Running originalClassicAgent

Now we run the `originalClassicAgent` on all the maps and record the percentage of games won.

In [40]:
agent = ApproximateQAgent(weights = originalClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('smallClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman emerges victorious! Score: 1161
Pacman died! Score: -440
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 965
Pacman emerges victorious! Score: 959
Pacman emerges victorious! Score: 963
Pacman died! Score: -43
Pacman emerges victorious! Score: 1181
Pacman died! Score: -200
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 958
Pacman emerges victorious! Score: 977
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 974
Pacman emerges victorious! Score: 969
Pacman died! Score: -100
Pacman emerges victorious! Score: 984
Pacman died! Score: -116
Pacman died! Score: -394
Pacman died! Score: -312
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 970
Pacman emerges victorious! Score: 965
Pacman emerges victorious! Score: 973
Pacman died! Score: -447
Pacman emerges victorious! Score: 987
Pacman died! Score: -119
Pacman died! Sco

Pacman died! Score: -67
Pacman emerges victorious! Score: 979
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 1163
Pacman died! Score: -457
Pacman emerges victorious! Score: 961
Pacman emerges victorious! Score: 978
Pacman died! Score: -49
Pacman died! Score: -116
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 974
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 962
Pacman died! Score: -31
Pacman died! Score: -440
Pacman died! Score: -198
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 976
Pacman died! Score: -447
Pacman died! Score: -159
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 982
Pacman emerges victorious! Score: 978
Pacman emerges victorious! Score: 972
Pacman emerges victorious! Score: 975
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 965
Pacman emerges vi

Pacman emerges victorious! Score: 962
Pacman emerges victorious! Score: 1364
Pacman emerges victorious! Score: 958
Pacman emerges victorious! Score: 966
Pacman died! Score: -311
Pacman emerges victorious! Score: 984
Pacman emerges victorious! Score: 1344
Pacman emerges victorious! Score: 979
Pacman died! Score: -184
Pacman died! Score: -123
Pacman emerges victorious! Score: 1177
Pacman emerges victorious! Score: 941
Pacman emerges victorious! Score: 974
Pacman emerges victorious! Score: 974
Pacman emerges victorious! Score: 983
Pacman emerges victorious! Score: 981
Reinforcement Learning Status:
	Completed 500 out of 1000 training episodes
	Average Rewards over all training: 546.93
	Average Rewards for last 100 episodes: 641.87
	Episode took 6.90 seconds
Pacman emerges victorious! Score: 971
Pacman emerges victorious! Score: 951
Pacman emerges victorious! Score: 984
Pacman died! Score: -48
Pacman died! Score: -126
Pacman emerges victorious! Score: 955
Pacman emerges victorious! Score: 

Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 969
Pacman died! Score: -297
Pacman emerges victorious! Score: 967
Pacman died! Score: -241
Pacman emerges victorious! Score: 976
Pacman died! Score: -377
Pacman emerges victorious! Score: 974
Pacman died! Score: -115
Pacman died! Score: -72
Pacman emerges victorious! Score: 982
Pacman emerges victorious! Score: 981
Pacman emerges victorious! Score: 968
Pacman died! Score: -77
Pacman emerges victorious! Score: 960
Pacman emerges victorious! Score: 986
Pacman emerges victorious! Score: 987
Pacman emerges victorious! Score: 982
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 980
Pacman died! Score: -258
Pacman died! Score: -287
Pacman died! Score: -420
Pacman emerges victorious! Score: 975
Pacman died! Score: -358
Pacman emerges victorious! Score: 965
Pacman died! Score: -218
Pacman died! Score: -54
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 976
Pacman emerges vict

Pacman died! Score: -241
Pacman emerges victorious! Score: 971
Pacman died! Score: -316
Pacman died! Score: -330
Pacman died! Score: -191
Pacman emerges victorious! Score: 966
Pacman died! Score: -303
Pacman emerges victorious! Score: 985
Pacman emerges victorious! Score: 976
Pacman died! Score: -326
Pacman emerges victorious! Score: 974
Pacman died! Score: -371
Pacman died! Score: -398
Pacman died! Score: -241
Pacman emerges victorious! Score: 979
Pacman died! Score: -99
Pacman emerges victorious! Score: 969
Pacman emerges victorious! Score: 980
Pacman died! Score: -204
Pacman died! Score: -291
Pacman emerges victorious! Score: 964
Pacman emerges victorious! Score: 973
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 967
Pacman emerges victorious! Score: 976
Pacman emerges victorious! Score: 980
Pacman emerges victorious! Score: 963
Pacman died! Score: -330
Pacman died! Score: -288
Pacman emerges victorious! Score: 966
Pacman died! Score: -189
Pacman emerges vic

As the output above shows, the `originalClassic` agent wins 66% of its games on the `smallClassic` map.

In [41]:
agent = ApproximateQAgent(weights = originalClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('mediumClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);

Beginning 1000 episodes of Training
Pacman emerges victorious! Score: 1318
Pacman died! Score: -167
Pacman emerges victorious! Score: 1298
Pacman emerges victorious! Score: 1322
Pacman died! Score: 54
Pacman died! Score: -377
Pacman emerges victorious! Score: 1542
Pacman died! Score: 298
Pacman emerges victorious! Score: 1340
Pacman emerges victorious! Score: 1320
Pacman emerges victorious! Score: 1321
Pacman died! Score: -387
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1349
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1336
Pacman died! Score: -52
Pacman died! Score: -372
Pacman emerges victorious! Score: 1334
Pacman emerges victorious! Score: 1540
Pacman died! Score: 302
Pacman emerges victorious! Score: 1318
Pacman emerges victorious! Score: 1326
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1535
Pacman emerges victorious! Score: 1520
Pacman emerges victorious! Score: 1500
Pacman emerges victorious! S

Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1320
Pacman died! Score: -299
Pacman emerges victorious! Score: 1326
Pacman died! Score: 283
Pacman emerges victorious! Score: 1331
Pacman died! Score: 255
Pacman died! Score: 119
Pacman emerges victorious! Score: 1319
Pacman emerges victorious! Score: 1344
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1328
Pacman emerges victorious! Score: 1332
Pacman emerges victorious! Score: 1314
Pacman emerges victorious! Score: 1323
Pacman died! Score: -38
Pacman emerges victorious! Score: 1320
Pacman emerges victorious! Score: 1336
Pacman died! Score: 59
Pacman emerges victorious! Score: 1328
Pacman died! Score: 65
Pacman died! Score: -386
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1318
Pacman emerges victorious! Score: 1350
Pacman died! Score: 49
Pacman emerges victorious! Score: 1305
Pacman died! Score: -339
Pacman emerges victorious! Score: 1303
Pacman emerges vict

Pacman emerges victorious! Score: 1339
Pacman emerges victorious! Score: 1330
Pacman emerges victorious! Score: 1516
Pacman died! Score: -305
Pacman emerges victorious! Score: 1346
Pacman emerges victorious! Score: 1307
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1339
Pacman emerges victorious! Score: 1308
Pacman emerges victorious! Score: 1297
Pacman died! Score: 191
Pacman died! Score: -357
Pacman died! Score: -341
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1348
Pacman emerges victorious! Score: 1348
Pacman emerges victorious! Score: 1331
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1527
Pacman emerges victorious! Score: 1338
Pacman emerges victorious! Score: 1351
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1342
Pacman emerges victorious! Score: 1337
Pacman emerges victorious! Score: 1532
Pacm

Pacman died! Score: 115
Pacman emerges victorious! Score: 1326
Pacman emerges victorious! Score: 1345
Pacman emerges victorious! Score: 1323
Pacman emerges victorious! Score: 1335
Pacman died! Score: -42
Pacman emerges victorious! Score: 1309
Pacman emerges victorious! Score: 1328
Pacman died! Score: -270
Pacman died! Score: -429
Pacman emerges victorious! Score: 1333
Pacman emerges victorious! Score: 1335
Pacman emerges victorious! Score: 1310
Reinforcement Learning Status:
	Completed 700 out of 1000 training episodes
	Average Rewards over all training: 909.82
	Average Rewards for last 100 episodes: 879.27
	Episode took 13.53 seconds
Pacman died! Score: -431
Pacman died! Score: -330
Pacman emerges victorious! Score: 1322
Pacman emerges victorious! Score: 1339
Pacman emerges victorious! Score: 1329
Pacman died! Score: -280
Pacman died! Score: -29
Pacman died! Score: -412
Pacman emerges victorious! Score: 1507
Pacman emerges victorious! Score: 1349
Pacman emerges victorious! Score: 1339

Pacman emerges victorious! Score: 1348
Pacman died! Score: -374
Pacman emerges victorious! Score: 1349
Pacman died! Score: -244
Pacman emerges victorious! Score: 1345
Pacman emerges victorious! Score: 1336
Pacman emerges victorious! Score: 1543
Pacman died! Score: -387
Pacman emerges victorious! Score: 1332
Pacman emerges victorious! Score: 1345
Pacman emerges victorious! Score: 1339
Pacman died! Score: 306
Pacman emerges victorious! Score: 1320
Pacman emerges victorious! Score: 1319
Pacman died! Score: -337
Pacman died! Score: 106
Pacman emerges victorious! Score: 1285
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1543
Pacman emerges victorious! Score: 1334
Pacman died! Score: -393
Pacman emerges victorious! Score: 1301
Pacman emerges victorious! Score: 1333
Pacman died! Score: -324
Pacman emerges victorious! Score: 1341
Pacman emerges victorious! Score: 1320
Pacman emerges victorious! Score: 1343
Pacman emerges victorious! Score: 1530
Pacman emerges victori

As the output above shows, the `originalClassic` agent achieves a 71.2% win rate when run on the `mediumClassic` map.
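A win rate like this can be recomputed from the printed log. The snippet below is a minimal sketch, not part of the original notebook: `summarize_log` is a hypothetical helper name, and it assumes the log has been captured as a string with one game result per line in the two formats shown above. Lines cut off by the notebook's output truncation (for example `Pacman emerges victorious! S`) do not match either pattern and are skipped.

```python
import re

# Patterns for the two complete result lines printed by the game.
WIN_RE = re.compile(r"^Pacman emerges victorious! Score: (-?\d+)$")
LOSS_RE = re.compile(r"^Pacman died! Score: (-?\d+)$")


def summarize_log(log_text):
    """Count wins and losses and compute win rate and mean score.

    Hypothetical helper: incomplete (truncated) lines are ignored.
    """
    wins, losses, scores = 0, 0, []
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        win = WIN_RE.match(line)
        loss = LOSS_RE.match(line)
        if win:
            wins += 1
            scores.append(int(win.group(1)))
        elif loss:
            losses += 1
            scores.append(int(loss.group(1)))
    games = wins + losses
    return {
        "wins": wins,
        "losses": losses,
        "win_rate": wins / games if games else 0.0,
        "mean_score": sum(scores) / games if games else 0.0,
    }


# A short excerpt of the log above, including one truncated line.
sample_log = """Pacman emerges victorious! S
Pacman emerges victorious! Score: 1329
Pacman emerges victorious! Score: 1320
Pacman died! Score: -299
Pacman emerges victorious! Score: 1326
Pacman died! Score: 283
"""

summary = summarize_log(sample_log)
print(f"{summary['wins']} wins, {summary['losses']} losses, "
      f"win rate {summary['win_rate']:.1%}")
```

Because truncated lines are dropped, a figure computed this way from the abbreviated output shown here may differ slightly from one computed over the full run.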

In [42]:
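# Start from the weights learned by the originalClassic agent.
# Note: numTraining=NUM_TRAINING is passed to the agent itself, which is why
# the log below still begins with "Beginning 1000 episodes of Training".
agent = ApproximateQAgent(weights=originalClassicAgent.getWeights(), numTraining=NUM_TRAINING)
runGames(layout.getLayout('originalClassic'),
         agent,
         [RandomGhost(1), RandomGhost(2), RandomGhost(3), RandomGhost(4)],
         NullGraphics(),  # no display
         numGames=NUM_RUNNING,
         numTraining=0,
         record=False);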

Beginning 1000 episodes of Training
Pacman died! Score: 280
Pacman died! Score: 608
Pacman died! Score: 630
Pacman emerges victorious! Score: 2625
Pacman died! Score: -137
Pacman emerges victorious! Score: 2642
Pacman died! Score: 1043
Pacman died! Score: 307
Pacman died! Score: 1111
Pacman died! Score: 776
Pacman died! Score: 1312
Pacman emerges victorious! Score: 2426
Pacman died! Score: 923
Pacman emerges victorious! Score: 2402
Pacman emerges victorious! Score: 2428
Pacman died! Score: 1352
Pacman died! Score: 446
Pacman died! Score: 1384
Pacman died! Score: 93
Pacman died! Score: 116
Pacman died! Score: 73
Pacman died! Score: 1185
Pacman died! Score: 1384
Pacman died! Score: 1341
Pacman died! Score: 1204
Pacman emerges victorious! Score: 2612
Pacman died! Score: 1176
Pacman died! Score: 102
Pacman died! Score: 374
Pacman emerges victorious! Score: 2398
Pacman died! Score: 244
Pacman emerges victorious! Score: 2390
Pacman emerges victorious! Score: 2380
Pacman emerges victorious! S

Pacman died! Score: 375
Pacman died! Score: 1285
Pacman died! Score: 1250
Pacman died! Score: 91
Pacman emerges victorious! Score: 2405
Pacman died! Score: 1084
Pacman died! Score: 1189
Pacman died! Score: 53
Pacman emerges victorious! Score: 2407
Pacman died! Score: 238
Pacman died! Score: 1112
Pacman died! Score: 1674
Pacman died! Score: -48
Pacman died! Score: 108
Pacman died! Score: 114
Pacman died! Score: 1
Pacman died! Score: 270
Pacman emerges victorious! Score: 2450
Pacman died! Score: 1401
Pacman died! Score: 1244
Pacman died! Score: 103
Pacman died! Score: -189
Pacman died! Score: 1287
Pacman died! Score: 584
Pacman died! Score: 1116
Pacman died! Score: 1233
Pacman emerges victorious! Score: 2436
Pacman died! Score: 70
Pacman died! Score: 581
Pacman died! Score: 644
Pacman died! Score: -127
Pacman emerges victorious! Score: 2432
Pacman died! Score: 1145
Pacman died! Score: 1458
Pacman died! Score: 81
Pacman died! Score: 256
Pacman emerges victorious! Score: 2450
Pacman emerge

Pacman emerges victorious! Score: 2433
Pacman died! Score: 36
Pacman died! Score: 392
Pacman died! Score: -242
Pacman died! Score: 149
Pacman emerges victorious! Score: 2456
Pacman died! Score: -297
Pacman died! Score: -351
Pacman died! Score: 1339
Pacman emerges victorious! Score: 2445
Pacman emerges victorious! Score: 2418
Pacman died! Score: 876
Pacman emerges victorious! Score: 2434
Pacman died! Score: 256
Pacman died! Score: 245
Pacman emerges victorious! Score: 2452
Pacman died! Score: 1278
Pacman emerges victorious! Score: 2599
Pacman died! Score: 115
Pacman emerges victorious! Score: 2461
Pacman died! Score: 1190
Pacman emerges victorious! Score: 2398
Pacman died! Score: 902
Pacman died! Score: 102
Pacman emerges victorious! Score: 2452
Pacman died! Score: 46
Pacman emerges victorious! Score: 2627
Pacman emerges victorious! Score: 2410
Pacman emerges victorious! Score: 2630
Pacman died! Score: 603
Pacman emerges victorious! Score: 2424
Pacman emerges victorious! Score: 2620
Pac

Pacman died! Score: 1360
Pacman died! Score: 589
Pacman emerges victorious! Score: 2350
Pacman emerges victorious! Score: 2439
Pacman emerges victorious! Score: 2746
Pacman emerges victorious! Score: 2608
Pacman died! Score: 476
Pacman died! Score: 495
Pacman emerges victorious! Score: 2435
Pacman died! Score: -222
Pacman emerges victorious! Score: 2651
Pacman died! Score: 858
Pacman died! Score: 143
Pacman died! Score: 670
Pacman emerges victorious! Score: 2629
Pacman emerges victorious! Score: 2437
Pacman died! Score: 1380
Pacman emerges victorious! Score: 2636
Pacman emerges victorious! Score: 2377
Pacman died! Score: 1373
Pacman died! Score: 238
Pacman died! Score: 698
Pacman died! Score: 974
Pacman emerges victorious! Score: 2421
Pacman died! Score: 1114
Pacman emerges victorious! Score: 2442
Pacman died! Score: 276
Pacman emerges victorious! Score: 2412
Pacman emerges victorious! Score: 2433
Pacman emerges victorious! Score: 2411
Pacman died! Score: 431
Reinforcement Learning Sta

As the output above shows, the `originalClassic` agent achieves a 47% win rate when run on the `originalClassic` map.