## Deep Q Learning (DQN) agent on the CartPole-v0 task from the OpenAI Gym

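- Contents
    - Task Objective
    - Import Packages
    - Replay Memory
    - DQN algorithm

### Task Objective

- The agent must choose between two actions (moving the cart left or right) so that the pole attached to the cart stays upright.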
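
As a rough illustration of what "staying upright" means, the sketch below encodes the failure condition commonly documented for Gym's CartPole: the episode ends when the pole tilts more than about 12 degrees from vertical or the cart moves more than 2.4 units from the center of the track. The threshold values and the helper name `is_failure` are assumptions based on the Gym environment description, not something defined in this notebook.

```python
import math

# Assumed thresholds, following Gym's CartPole description (not defined in this notebook).
THETA_LIMIT = math.radians(12)   # maximum pole angle from vertical, in radians
X_LIMIT = 2.4                    # maximum cart distance from the track center

LEFT, RIGHT = 0, 1               # the two discrete actions: push the cart left or right


def is_failure(cart_x, pole_theta):
    """Return True if the cart left the track or the pole tipped too far (hypothetical helper)."""
    return abs(cart_x) > X_LIMIT or abs(pole_theta) > THETA_LIMIT
```

The agent's job is to pick `LEFT` or `RIGHT` at every step so that this condition stays false for as long as possible.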

### Import Packages

- First, we need the gym environment:
    `pip install gym`

- And the following PyTorch packages:
    - neural networks (torch.nn)
    - optimization (torch.optim)
    - automatic differentiation (torch.autograd)
    - utilities for vision tasks (torchvision - a separate package).

In [None]:
import gym
import math
import random
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from collections import namedtuple
from itertools import count
from PIL import Image

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms as T

env = gym.make('CartPole-v0').unwrapped

# set up matplotlib
is_ipython = 'inline' in matplotlib.get_backend()
if is_ipython:
    from IPython import display

plt.ion()

# use the first GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

### Replay Memory

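- `Transition`: a named tuple representing a single transition in the environment.
It essentially maps `(state, action)` pairs to their `(next_state, reward)` result (the state is the screen difference image).

- `ReplayMemory`: a cyclic buffer of bounded size that holds the most recently observed transitions.
It also implements a `.sample()` method that selects a random batch of transitions for training.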

In [None]:
Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))

class ReplayMemory(object):

    def __init__(self, capacity):
        self.capacity = capacity
        self.memory = []
        self.position = 0

    def push(self, *args):
        """Saves a transition."""
        if len(self.memory) < self.capacity:
            self.memory.append(None)
        self.memory[self.position] = Transition(*args)
        self.position = (self.position + 1) % self.capacity

    def sample(self, batch_size):
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)
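
The same bounded, overwrite-the-oldest behavior can also be sketched with `collections.deque`, which drops the oldest entry automatically once `maxlen` is reached. This is an alternative shown for comparison, not the implementation used in this notebook; the class name `DequeReplayMemory` is hypothetical, and plain integers stand in for real state tensors.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple('Transition',
                        ('state', 'action', 'next_state', 'reward'))


class DequeReplayMemory:
    """Bounded replay buffer; a deque with maxlen drops the oldest item automatically."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        """Save a transition, evicting the oldest one when the buffer is full."""
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Copy to a list so random.sample receives a plain sequence.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


memory = DequeReplayMemory(capacity=3)
for step in range(5):
    # Placeholder integers stand in for real state tensors.
    memory.push(step, 0, step + 1, 1.0)

batch = memory.sample(2)  # two distinct transitions drawn uniformly at random
```

After five pushes into a buffer of capacity 3, only the three most recent transitions remain, which is the same eviction order the index-based `position` counter produces.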

### DQN algorithm

$$\mathcal{L} = \frac{1}{|B|}\sum_{(s, a, s', r) \ \in \ B} \mathcal{L}(\delta)$$
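
The formula above does not spell out $\delta$ or the inner loss $\mathcal{L}(\delta)$. A common choice in DQN, assumed here rather than taken from this section, is the temporal-difference error $\delta = Q(s, a) - (r + \gamma \max_{a'} Q(s', a'))$ combined with the Huber loss. The plain-Python sketch below illustrates that assumption on scalar values; the helper names and the discount factor `gamma` are hypothetical.

```python
def td_error(q_sa, reward, next_q_values, gamma=0.99):
    """TD error: Q(s, a) minus the bootstrapped target (assumed definition of delta)."""
    # A terminal next state contributes no future value.
    future = gamma * max(next_q_values) if next_q_values is not None else 0.0
    return q_sa - (reward + future)


def huber(delta, beta=1.0):
    """Huber loss: quadratic for small errors, linear for large ones."""
    abs_delta = abs(delta)
    if abs_delta <= beta:
        return 0.5 * delta * delta
    return beta * (abs_delta - 0.5 * beta)


def batch_loss(deltas):
    """Average the per-transition loss over the batch B, as in the formula above."""
    return sum(huber(d) for d in deltas) / len(deltas)


# One non-terminal and one terminal transition with made-up Q-values.
deltas = [
    td_error(q_sa=1.0, reward=1.0, next_q_values=[0.0, 2.0], gamma=0.5),
    td_error(q_sa=1.5, reward=1.0, next_q_values=None),
]
loss = batch_loss(deltas)
```

The linear tail of the Huber loss keeps a single large TD error from dominating the batch average, which is one reason it is often preferred over a squared error when Q-value estimates are noisy.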
    