# Creating an AI to Play OpenAI's CartPole Simulation

### Deep Q-Network (DQN)

Because the state space of this environment is continuous, enumerating every possible state-action pair in a table is inefficient and consumes a substantial amount of resources for a fairly simple environment.  Instead, we will be using a deep Q-network (DQN).  A DQN approximates the optimal action-value function with a neural network, as opposed to building a Q-table.
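To get a feel for why a table scales poorly here, the sketch below counts the entries a Q-table would need if each of CartPole's four state variables were split into equal-width bins. The binning scheme and the `q_table_entries` helper are illustrative assumptions, not something taken from the tutorial this notebook follows.

```python
# Hypothetical discretization: each state variable is split into the same
# number of equal-width bins. CartPole has 4 state variables and 2 actions.
NUM_STATE_VARIABLES = 4  # cart position, cart velocity, pole angle, pole angular velocity
NUM_ACTIONS = 2          # push left, push right


def q_table_entries(bins_per_variable: int) -> int:
    """Return the number of (state, action) cells in a tabular Q-function."""
    return bins_per_variable ** NUM_STATE_VARIABLES * NUM_ACTIONS


for bins in (10, 50, 100):
    print(f"{bins:>3} bins per variable -> {q_table_entries(bins):,} Q-table entries")
```

With 10 bins per variable the table already needs 20,000 entries, and 100 bins push it to 200 million, which is the kind of growth a function approximator avoids.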

In this approach, a simple neural network will be used to approximate the optimal value function for the CartPole scenario!
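For reference, the quantity the network approximates can be written out in standard Q-learning notation. The symbols below are not defined elsewhere in this notebook and are introduced here as assumptions: $s$ and $a$ are the current state and action, $r$ is the reward, $s'$ is the next state, $\gamma \in [0, 1)$ is a discount factor, and $\theta$ denotes the network weights. The optimal action-value function satisfies the Bellman optimality equation:

$$Q^*(s, a) = \mathbb{E}\left[\, r + \gamma \max_{a'} Q^*(s', a') \,\right]$$

A DQN replaces $Q^*$ with a network $Q(s, a; \theta)$ and is typically trained to shrink the temporal-difference error between its prediction and the bootstrapped target:

$$\delta = Q(s, a; \theta) - \left( r + \gamma \max_{a'} Q(s', a'; \theta) \right)$$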

The tutorial I will be following can be found [here](https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html).

In [1]:
# Enables intellisense (press TAB after the .)
%config IPCompleter.greedy=True

import gym
import math
import random
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from collections import namedtuple
from itertools import count
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as T


### Defining the environment and the plot

In [3]:
env = gym.make('CartPole-v0').unwrapped

is_ipython = 'inline' in matplotlib.get_backend()

if is_ipython:
    from IPython import display
plt.ion()

WARN: gym.spaces.Box autodetected dtype as <class 'numpy.float32'>. Please provide explicit dtype.


The DQN will use replay memory: the agent stores the transitions it observes, and random batches drawn from this memory are used to train the network.  Because the batches are sampled randomly, the transitions that make up a batch are decorrelated.

There are two classes involved with this: the `Transition` class and the `ReplayMemory` class.

- `Transition`: A named tuple that represents a single transition in the environment.
- `ReplayMemory`: A cyclic buffer of bounded size that holds the most recent transitions.  It provides a `.sample()` method that returns a random batch of transitions.

In [None]:
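Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'reward'))

class ReplayMemory(object):
    def __init__(self, capacity):
        # Assumption: fields follow the cyclic-buffer description above.
        self.capacity = capacity  # maximum number of stored transitions
        self.memory = []          # holds Transition tuples
        self.position = 0         # index of the next write (wraps around)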
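
The bounded buffer and its `.sample()` method described above could be filled in along the following lines. This is a minimal sketch, not necessarily the notebook's final implementation: the `capacity` parameter, the `push` method, and the use of `collections.deque` to evict old transitions are assumptions, and the pushed values are placeholders rather than real CartPole data.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple('Transition', ('state', 'action', 'next_state', 'reward'))


class ReplayMemory:
    """Bounded buffer: once full, each new transition evicts the oldest one."""

    def __init__(self, capacity):
        # deque(maxlen=...) discards the oldest item automatically when full.
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        """Store one transition given as (state, action, next_state, reward)."""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        """Return batch_size distinct transitions chosen uniformly at random."""
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


# Usage with placeholder values: push 5 transitions into a buffer of size 3.
memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(step, 0, step + 1, 1.0)

batch = memory.sample(2)
# Transpose a list of Transitions into one Transition of tuples (one per field).
fields = Transition(*zip(*batch))
print(len(memory), fields.state, fields.reward)
```

Only the three most recent transitions survive in this example, and each sampled batch is drawn without regard to the order in which the transitions were stored, which is what breaks up the correlation between consecutive steps.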