<a href="https://colab.research.google.com/github/wjdqlsdlsp/AI_using_pytorch-reinforce-learning/blob/main/%E1%84%80%E1%85%AA%E1%84%8C%E1%85%A65_Cliff_Walking_with_DQN_%E1%84%87%E1%85%A1%E1%86%A8%E1%84%8C%E1%85%A5%E1%86%BC%E1%84%87%E1%85%B5%E1%86%AB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **[Artificial Intelligence] Assignment 5: Implementing DQN for the Cliff Walking Example**
*   **The goal is to complete the DQN class and examine the results.**
*   All of the basic code is already written in the notebook below. You only need to complete the empty functions.
*   **Caution: do not apply DQN through an external library. Implement DQN in the given functions as covered in class.** Referring to the various DQN implementations available on the web is fine.
*   **Report contents**: describe the DQN algorithm you completed and analyze what the results mean. Attach your code and its execution results. The code must include detailed comments. Submit the report as a PDF.
*   Submit the report on Blackboard by 11:59 PM on December 16. Late submissions receive 0 points.
*   **A 5-point bonus is awarded for additionally implementing Deep Sarsa and including the related content in the report.**
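For the Deep Sarsa bonus, the main difference from DQN is the bootstrap target. The standard forms for a transition (s, a, r, s') are summarized below. This is a textbook summary, not code from this notebook:

$$
y_{\text{DQN}} = r + \gamma \max_{a'} Q_{\theta^-}(s', a'), \qquad
y_{\text{Sarsa}} = r + \gamma \, Q_{\theta}(s', a')
$$

Here θ denotes the main-network weights and θ⁻ the target-network weights. In the Sarsa target, a' is the action that the ε-greedy policy actually selects in s', not the maximizing action. For a terminal s', both targets reduce to y = r.

The training loop below already picks next_action before calling update, so this action is available. Whether Deep Sarsa should also keep the replay buffer and target network is a design choice left to your implementation.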

# **Copy this notebook to your own Drive and work from the copy.**

This assignment uses the same environment as the previous one. The QtoPolicy class is therefore also reused unchanged. It takes learned Q-values and prints the corresponding greedy policy.

In [None]:
import numpy as np
import random
from tqdm import tqdm
from collections import defaultdict, namedtuple, deque
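# Cliff Walking environment. This notebook uses the pre-0.26 Gym API:
# reset() returns only the state, and step() returns 4 values.
from gym.envs.toy_text.cliffwalking import CliffWalkingEnv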

import torch
import torch.nn as nn
import torch.optim as optim

In [None]:
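class QtoPolicy:
    def __init__(self):
        # Symbols for actions 0-3 (up, right, down, left); 'X' marks the goal
        self.action = ['↑', '→', '↓', '←', 'X']

    def printPolicy(self, Q):
        # Greedy action index for each of the 48 states (-1 if the state has no Q entry)
        policy = np.array([np.argmax(Q[key]) if key in Q else -1 for key in np.arange(48)])
        # Greedy state values max_a Q(s, a) (computed but not printed)
        v = ([np.max(Q[key]) if key in Q else 0 for key in np.arange(48)])
        actions = np.stack([self.action for _ in range(len(policy))], axis=0)
        # Overwrite the bottom row: start (36) shows ↑, cliff cells (37-46) show ←, goal (47) shows X
        policy[36:] = np.array([0] + [3] * 10 + [4])

        # np.take flattens `actions`, whose first row is the symbol list,
        # so each action index maps to its symbol; print as a 4 x 12 grid
        print(np.take(actions, np.reshape(policy, (4, 12))))
        print('')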
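You can check the state numbering that QtoPolicy relies on directly. The sketch below makes these assumptions:

*   The grid is the standard 4 × 12 Cliff Walking layout.
*   States are numbered row by row: start at 36, cliff at 37 to 46, goal at 47.

This layout is why `printPolicy` overwrites `policy[36:]`. `to_row_col` is a hypothetical helper for illustration.

```python
import numpy as np

N_ROWS, N_COLS = 4, 12

# States are numbered row by row: state = row * N_COLS + col
grid = np.arange(N_ROWS * N_COLS).reshape(N_ROWS, N_COLS)

START = 36                   # bottom-left corner
CLIFF = list(range(37, 47))  # bottom row between start and goal
GOAL = 47                    # bottom-right corner


def to_row_col(state):
    """Hypothetical helper: convert a state index to (row, col)."""
    return divmod(state, N_COLS)


# The bottom row of the grid holds exactly the start, the cliff cells, and the goal
bottom_row = grid[-1].tolist()
print(bottom_row)
print(to_row_col(START), to_row_col(GOAL))
```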

Create a replay buffer class for DQN's experience replay. In DQN, each stored transition is (state, action, reward, next_state).

In [None]:
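# Replay buffer for experience replay
# One stored transition: (state, action, reward, next_state)
Transition = namedtuple('Transition',
                        ('state', 'action', 'reward', 'next_state'))


class ReplayBuffer(object):

    def __init__(self, capacity):
        # Fixed-size FIFO buffer: the oldest transitions are dropped once capacity is reached
        self.buffer = deque([], maxlen=capacity)

    def push(self, *args):
        # Store one transition
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniformly sample a mini-batch without replacement
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)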
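The least obvious step in the replay pipeline is `Transition(*zip(*transitions))` in `optimize_model`. It transposes a list of sampled transitions into a single `Transition` whose fields are tuples, which are then concatenated into batch tensors.

The standalone sketch below uses toy transitions with the same tensor shapes the agent stores: a (1, 48) one-hot state, and a (1, 1) action and reward. `random.seed(0)` is there only for reproducibility.

```python
import random
from collections import deque, namedtuple

import torch

Transition = namedtuple('Transition', ('state', 'action', 'reward', 'next_state'))

random.seed(0)
buffer = deque([], maxlen=500)

# Push toy transitions shaped like the agent's: one-hot (1, 48) states, (1, 1) action/reward
for s in range(10):
    state = torch.zeros(1, 48)
    state[0, s] = 1.0
    next_state = torch.zeros(1, 48)
    next_state[0, s + 1] = 1.0
    buffer.append(Transition(state, torch.tensor([[1]]), torch.tensor([[-1]]), next_state))

transitions = random.sample(buffer, 4)   # list of 4 Transition objects
batch = Transition(*zip(*transitions))   # one Transition; each field is a tuple of 4 tensors

state_batch = torch.cat(batch.state)     # (4, 48)
action_batch = torch.cat(batch.action)   # (4, 1)
reward_batch = torch.cat(batch.reward)   # (4, 1)
print(state_batch.shape, action_batch.shape, reward_batch.shape)
```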

Define, in PyTorch, the neural network used as the Q-network in DQN.

In [None]:
class DNN(nn.Module):
    # Define the DNN architecture and its layer sizes
    def __init__(self, inputs, outputs):
        super(DNN, self).__init__()
        self.x_dim = inputs
        self.y_dim = outputs
        self.fc_variable_no = 100  # hidden units per layer

        # Network layers: input -> 3 hidden layers (100 units each) -> output
        self.fc_in = nn.Linear(self.x_dim, self.fc_variable_no)
        self.fc_hidden1 = nn.Linear(self.fc_variable_no, self.fc_variable_no)
        self.fc_hidden2 = nn.Linear(self.fc_variable_no, self.fc_variable_no)
        self.fc_hidden3 = nn.Linear(self.fc_variable_no, self.fc_variable_no)
        self.fc_out = nn.Linear(self.fc_variable_no, self.y_dim)
        self.relu = nn.ReLU()

    # Forward pass
    def forward(self, x):
        # Flatten the input to (batch, x_dim)
        x = torch.reshape(x, [-1, self.x_dim])
        x = self.relu(self.fc_in(x))
        x = self.relu(self.fc_hidden1(x))
        x = self.relu(self.fc_hidden2(x))
        x = self.relu(self.fc_hidden3(x))
        # Linear output layer: one Q-value per action
        x = self.fc_out(x)
        return x
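As a quick sanity check of the architecture, the sketch below rebuilds the same layer sizes with `nn.Sequential`. This is a stand-in for `DNN(48, 4)`, not the class itself. It confirms the shapes and the parameter count:

*   A single one-hot state of shape (1, 48) produces one row of 4 Q-values.
*   A batch of shape (32, 48) produces (32, 4).

```python
import torch
import torch.nn as nn

# Stand-in with the same layers as DNN(48, 4): 48 -> 100 -> 100 -> 100 -> 100 -> 4, ReLU in between
net = nn.Sequential(
    nn.Linear(48, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 100), nn.ReLU(),
    nn.Linear(100, 4),
)

one_hot = torch.zeros(1, 48)
one_hot[0, 36] = 1.0                 # start state
single = net(one_hot)                # Q-values for one state
batch = net(torch.eye(48)[:32])      # 32 one-hot states at once

n_params = sum(p.numel() for p in net.parameters())
print(single.shape, batch.shape, n_params)
```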

Define the DQN algorithm class. Use the given hyperparameter values.

In [None]:
class DQN:
    def __init__(self):
        self.state_no = 48  # number of states
        self.action_no = 4  # number of actions
        self.alpha = 0.001  # learning rate
        self.gamma = 0.99  # discount factor
        self.epsilon = 0.2  # epsilon for epsilon-greedy exploration

        self.batch_size = 32  # mini-batch size for experience replay
        self.training_interval = 10  # train the main Q-network every 10 time steps
        self.target_update_interval = 100  # copy main weights into the target Q-network every 100 time steps

        self.main_net = DNN(self.state_no, self.action_no)  # main Q-network (trained)
        self.target_net = DNN(self.state_no, self.action_no)  # target Q-network (used to compute targets)

        # Initialize the fixed target Q-network with the same weights as the main Q-network
        self.target_net.load_state_dict(self.main_net.state_dict())
        self.target_net.eval()

        self.optimizer = optim.Adam(self.main_net.parameters(), lr=self.alpha)
        # Replay buffer for experience replay (capacity: 500 transitions)
        self.buffer = ReplayBuffer(500)

    # State indices carry no ordinal meaning, so each state is
    # one-hot encoded to make learning more effective
    def one_hot_state(self, state):
        one_hot_encoded = np.zeros((1, self.state_no))
        one_hot_encoded[0, state] = 1

        return one_hot_encoded
    
    # Compute the Q-values of every state with the main Q-network (used to print the policy)
    def get_q_values(self):
        q_values = defaultdict(lambda: [0.0] * self.action_no)
        # No gradients are needed just to read out Q-values
        with torch.no_grad():
            for i in range(self.state_no):
                state = torch.tensor(self.one_hot_state(i)).float()
                q_values[i] = self.main_net(state).squeeze(0).tolist()
        return q_values

    # Train the main Q-network on one mini-batch sampled from the replay buffer
    def optimize_model(self):
        # Skip training until the buffer holds at least one full batch
        if len(self.buffer) < self.batch_size:
            return

        # Sample a random mini-batch of transitions
        transitions = self.buffer.sample(self.batch_size)

        # Transpose the batch: a list of Transitions -> one Transition whose fields are tuples
        batch = Transition(*zip(*transitions))

        # Mask of non-terminal transitions (terminal ones are stored with next_state=None)
        non_final_mask = torch.tensor(tuple(map(lambda s: s is not None,
                                                batch.next_state)), dtype=torch.bool)
        # Concatenate each field into a batch tensor
        state_batch = torch.cat(batch.state)                       # (B, 48)
        action_batch = torch.cat(batch.action)                     # (B, 1)
        reward_batch = torch.cat(batch.reward).squeeze(1).float()  # (B,)

        # Q(s, a) from the main network for the actions actually taken: (B, 1)
        state_action_values = self.main_net(state_batch).gather(1, action_batch)

        # max_a' Q_target(s', a') from the target network; stays 0 for terminal transitions
        next_state_values = torch.zeros(self.batch_size)
        if non_final_mask.any():
            non_final_next_states = torch.cat([s for s in batch.next_state
                                               if s is not None])
            with torch.no_grad():
                next_state_values[non_final_mask] = self.target_net(non_final_next_states).max(1)[0]

        # TD target r + gamma * max_a' Q_target(s', a'), shape (B,)
        target_state_action_values = reward_batch + self.gamma * next_state_values

        # Huber (smooth L1) loss; unsqueeze so both tensors are (B, 1) and nothing broadcasts to (B, B)
        criterion = nn.SmoothL1Loss()
        loss = criterion(state_action_values, target_state_action_values.unsqueeze(1))

        # Backpropagate
        self.optimizer.zero_grad()
        loss.backward()
        # Clip each gradient element to [-10, 10]
        for param in self.main_net.parameters():
            param.grad.data.clamp_(-10, 10)
        # Apply the gradient update
        self.optimizer.step()

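    # DQN update: store the transition, then train and sync the target network on schedule
    def update(self, state, action, reward, next_state, time_step):
        # The goal (last state index, 47) is the only terminal state in Cliff Walking;
        # terminal transitions are stored with next_state=None so they are not bootstrapped
        terminal = next_state[0, self.state_no - 1] == 1
        # Store the transition in the replay buffer
        self.buffer.push(torch.from_numpy(state).float(),
                         torch.tensor(action).reshape((-1, 1)),
                         torch.tensor(reward).reshape((-1, 1)),
                         None if terminal else torch.from_numpy(next_state).float())
        # Every training_interval steps, run one optimization step
        if (time_step + 1) % self.training_interval == 0:
            self.optimize_model()
        # Every target_update_interval steps, copy the main network weights into the target network
        if (time_step + 1) % self.target_update_interval == 0:
            self.target_net.load_state_dict(self.main_net.state_dict())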


    # Epsilon-greedy policy
    def act(self, state):
        # With probability epsilon, take a random action (exploration)
        if np.random.rand() < self.epsilon:
            action = np.random.choice(self.action_no)
        # Otherwise, take the greedy action (exploitation)
        else:
            with torch.no_grad():
                # Convert the one-hot state to a tensor for the model
                state = torch.from_numpy(state).float()
                # Q-values of all actions from the main network
                q_values = self.main_net(state)
                # Choose the action with the largest Q-value
                action = torch.argmax(q_values, 1).item()
        return action
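The bootstrap target is where shape mistakes tend to hide. Each reward is stored as a (1, 1) tensor, so `torch.cat` yields a (B, 1) reward batch. Adding that to a (B,)-shaped vector of next-state values silently broadcasts to (B, B).

The standalone sketch below uses random stand-in tensors, not the trained networks. It shows three things:

*   the target y = r + γ max_a' Q_target(s', a');
*   a zero bootstrap value for a terminal transition;
*   the broadcasting pitfall next to the shape-safe version.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, n_actions, gamma = 4, 4, 0.99

# Stand-ins for network outputs (random numbers, not real Q-values)
q_main = torch.randn(batch_size, n_actions)     # like main_net(state_batch)
q_target = torch.randn(batch_size, n_actions)   # like target_net(next_state_batch)

action_batch = torch.tensor([[0], [1], [2], [3]])         # (B, 1), int64 as gather requires
reward_batch = torch.tensor([[-1], [-1], [-100], [-1]])   # (B, 1), one (1, 1) tensor per transition
non_final_mask = torch.tensor([True, True, True, False])  # last transition reaches the goal

# Q(s, a) of the taken actions: (B, 1)
q_sa = q_main.gather(1, action_batch)

# max_a' Q_target(s', a') for non-terminal transitions, 0 for terminal ones: (B,)
next_values = torch.zeros(batch_size)
next_values[non_final_mask] = q_target[non_final_mask].max(1)[0]

# Pitfall: (B,) + (B, 1) silently broadcasts to (B, B)
broadcast_target = next_values * gamma + reward_batch

# Shape-safe version: flatten rewards to (B,), then compare with q_sa as (B, 1)
target = reward_batch.squeeze(1).float() + gamma * next_values
loss = nn.SmoothL1Loss()(q_sa, target.unsqueeze(1))
print(broadcast_target.shape, target.shape, loss.item())
```

The same idea carries over to the agent: store terminal transitions without a next state and keep every term of the target at shape (B,).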


As in the previous assignment, load the Cliff Walking environment from OpenAI Gym and create the QtoPolicy object, which prints the greedy policy for given Q-values.


In [None]:
env = CliffWalkingEnv()
policy = QtoPolicy()

Create the DQN agent and train it for 5000 episodes.

The training interval and the target Q-network update interval are counted in global time steps that keep running across episode boundaries. This time_step counter is passed to the DQN class.
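Because time_step keeps counting across episodes, the training schedule depends only on the total number of environment steps. The arithmetic sketch below uses the class's interval and batch-size values. The 1000-step horizon is arbitrary.

It also shows a subtlety. `update()` pushes the transition before optimizing, so the buffer holds time_step + 1 transitions at each step. The first few scheduled optimization calls therefore return early, before the buffer holds a full batch of 32.

```python
training_interval = 10        # same values as the DQN class
target_update_interval = 100
batch_size = 32
total_steps = 1000            # illustrative number of global time steps

# time_step values at which update() calls optimize_model() / syncs the target network
optimize_calls = [t for t in range(total_steps) if (t + 1) % training_interval == 0]
target_syncs = [t for t in range(total_steps) if (t + 1) % target_update_interval == 0]

# The buffer holds t + 1 transitions at step t (capacity 500 never limits this check);
# optimize_model() returns early until that reaches batch_size
first_real_update = next(t for t in optimize_calls if t + 1 >= batch_size)

print(len(optimize_calls), len(target_syncs), first_real_update)
```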

In [None]:
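agent_DQN = DQN()
time_step = 0  # global time step, not reset between episodes
for ep in tqdm(range(5000)):
    done = False
    state = env.reset()  # pre-0.26 Gym: reset() returns the start state index
    state = agent_DQN.one_hot_state(state)
    action = agent_DQN.act(state)
    ep_reward = 0
    ep_steps = 0
    # Print the current greedy policy every 50 episodes
    if ep % 50 == 0:
        print("\n")
        policy.printPolicy(agent_DQN.get_q_values())
    while not done:
        # Take the action and observe the next state and reward
        next_state, reward, done, info = env.step(action)

        next_state = agent_DQN.one_hot_state(next_state)

        # Choose the next action with the epsilon-greedy policy
        next_action = agent_DQN.act(next_state)

        # Store the transition and train on schedule
        agent_DQN.update(state, action, reward, next_state, time_step)
        time_step = time_step + 1

        ep_reward += reward
        state = next_state
        action = next_action
        ep_steps = ep_steps + 1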





[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  1%|          | 52/5000 [00:40<12:34,  6.56it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '↓' '→' '↓' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  2%|▏         | 101/5000 [00:53<13:29,  6.05it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '↓' '→' '↓' '→' '→' '↓' '↓' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '↑' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  3%|▎         | 150/5000 [01:02<16:01,  5.05it/s]



[['→' '↓' '→' '→' '→' '→' '↓' '→' '↓' '↓' '↓' '→']
 ['→' '→' '↓' '↓' '↓' '→' '↓' '↓' '→' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  4%|▍         | 200/5000 [01:11<29:14,  2.74it/s]



[['→' '→' '→' '→' '→' '→' '↓' '→' '→' '→' '→' '→']
 ['→' '↓' '↓' '→' '→' '↓' '↓' '↓' '→' '→' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  5%|▌         | 251/5000 [01:19<09:00,  8.78it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  6%|▌         | 300/5000 [01:25<10:42,  7.32it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  7%|▋         | 349/5000 [01:32<08:49,  8.78it/s]



[['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↑']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  8%|▊         | 400/5000 [01:40<08:53,  8.62it/s]



[['↓' '↓' '→' '↓' '↓' '→' '↓' '→' '→' '↓' '↓' '↓']
 ['↓' '↓' '→' '↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



  9%|▉         | 451/5000 [01:47<09:20,  8.12it/s]



[['↓' '↓' '↓' '→' '↓' '↓' '→' '↓' '↓' '→' '↓' '↓']
 ['↓' '↓' '↓' '→' '↓' '↓' '↓' '→' '↓' '→' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 10%|█         | 502/5000 [01:55<07:44,  9.69it/s]



[['←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←']
 ['←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←']
 ['←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '←']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 11%|█         | 551/5000 [02:02<09:55,  7.47it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 12%|█▏        | 600/5000 [02:08<14:13,  5.15it/s]



[['↓' '↓' '↓' '←' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '→' '→' '↓' '↓' '←' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 13%|█▎        | 650/5000 [02:16<10:19,  7.02it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '↑' '→' '→' '→']
 ['→' '→' '↑' '→' '↑' '↑' '→' '→' '→' '↑' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 14%|█▍        | 701/5000 [02:24<11:48,  6.07it/s]



[['↓' '↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '→' '↓' '↓']
 ['↓' '↓' '↓' '↓' '→' '↓' '→' '↓' '↓' '→' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 15%|█▌        | 752/5000 [02:33<09:06,  7.77it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 16%|█▌        | 803/5000 [02:39<07:00,  9.97it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 17%|█▋        | 851/5000 [02:46<07:09,  9.66it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 18%|█▊        | 900/5000 [02:52<07:30,  9.11it/s]



[['→' '→' '↓' '→' '→' '→' '→' '↓' '↓' '→' '↓' '↓']
 ['→' '→' '→' '→' '↓' '→' '↓' '←' '→' '→' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 19%|█▉        | 949/5000 [03:01<06:47,  9.93it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 20%|██        | 1000/5000 [03:09<15:03,  4.42it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '↓' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 21%|██        | 1050/5000 [03:18<12:20,  5.34it/s]



[['→' '→' '→' '↑' '→' '→' '↑' '→' '↑' '→' '→' '→']
 ['→' '→' '→' '↑' '↑' '↑' '↑' '→' '→' '→' '→' '↑']
 ['↑' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 22%|██▏       | 1100/5000 [03:28<11:51,  5.48it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '←' '←' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 23%|██▎       | 1151/5000 [03:36<08:24,  7.63it/s]



[['↓' '→' '↓' '→' '→' '↓' '→' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '↓' '→' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 24%|██▍       | 1200/5000 [03:45<11:54,  5.32it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '↑' '→' '→' '↓' '↑' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 25%|██▌       | 1250/5000 [03:52<06:59,  8.94it/s]



[['←' '←' '→' '←' '→' '←' '←' '→' '→' '←' '→' '←']
 ['←' '←' '←' '→' '←' '←' '←' '→' '↓' '←' '←' '→']
 ['←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '←']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 26%|██▌       | 1301/5000 [03:59<09:06,  6.77it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '→' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '→' '↓' '→' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 27%|██▋       | 1351/5000 [04:07<13:34,  4.48it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '←' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 28%|██▊       | 1400/5000 [04:15<10:30,  5.71it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '↓' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 29%|██▉       | 1451/5000 [04:23<09:48,  6.03it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '↓' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 30%|███       | 1501/5000 [04:31<07:45,  7.52it/s]



[['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '→' '↑' '↑']
 ['↑' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↑']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 31%|███       | 1550/5000 [04:37<06:06,  9.42it/s]



[['→' '→' '→' '←' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '←']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 32%|███▏      | 1601/5000 [04:45<07:17,  7.77it/s]



[['←' '←' '←' '→' '←' '←' '→' '←' '←' '←' '←' '←']
 ['←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←']
 ['←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '←']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 33%|███▎      | 1650/5000 [04:52<07:44,  7.22it/s]



[['→' '→' '→' '→' '→' '→' '←' '→' '←' '→' '→' '→']
 ['→' '→' '→' '←' '←' '←' '→' '→' '→' '→' '→' '←']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 34%|███▍      | 1701/5000 [05:00<07:17,  7.53it/s]



[['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↑']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 35%|███▌      | 1750/5000 [05:08<10:22,  5.22it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 36%|███▌      | 1800/5000 [05:16<09:28,  5.63it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↑' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '↑' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 37%|███▋      | 1852/5000 [05:24<06:02,  8.68it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 38%|███▊      | 1901/5000 [05:32<10:12,  5.06it/s]



[['↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '→' '→' '→' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 39%|███▉      | 1949/5000 [05:40<14:04,  3.61it/s]



[['↓' '↓' '↓' '→' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓']
 ['→' '→' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 40%|████      | 2000/5000 [05:48<08:32,  5.86it/s]



[['↓' '↓' '↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 41%|████      | 2052/5000 [05:56<07:11,  6.84it/s]



[['←' '→' '→' '→' '←' '→' '→' '←' '→' '→' '→' '←']
 ['←' '→' '←' '←' '→' '→' '←' '→' '←' '→' '→' '→']
 ['←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '←']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 42%|████▏     | 2100/5000 [06:04<07:56,  6.08it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 43%|████▎     | 2152/5000 [06:12<07:07,  6.66it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 44%|████▍     | 2200/5000 [06:19<04:57,  9.42it/s]



[['→' '→' '↓' '→' '→' '→' '→' '→' '→' '→' '↓' '→']
 ['→' '↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 45%|████▌     | 2250/5000 [06:29<09:34,  4.79it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↑' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 46%|████▌     | 2299/5000 [06:35<04:07, 10.90it/s]



[['↓' '←' '←' '←' '↓' '←' '←' '↓' '←' '←' '←' '←']
 ['←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←']
 ['←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 47%|████▋     | 2351/5000 [06:44<04:55,  8.95it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '↑' '→' '→' '→' '→' '→' '→' '→' '↑' '→']
 ['↑' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↑']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 48%|████▊     | 2402/5000 [06:55<04:35,  9.44it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 49%|████▉     | 2450/5000 [07:03<07:26,  5.71it/s]



[['↓' '→' '→' '→' '→' '↓' '↓' '→' '→' '→' '↓' '↓']
 ['→' '↓' '→' '→' '↓' '→' '→' '→' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 50%|████▉     | 2498/5000 [07:11<08:17,  5.03it/s]



[['→' '↓' '↓' '→' '→' '→' '→' '↓' '↓' '↓' '→' '→']
 ['↓' '→' '↓' '→' '→' '→' '↓' '→' '→' '↓' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 51%|█████     | 2550/5000 [07:21<06:03,  6.75it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 52%|█████▏    | 2601/5000 [07:30<07:33,  5.28it/s]



[['↓' '↓' '↓' '↓' '→' '↓' '↓' '→' '→' '↓' '↓' '↓']
 ['↓' '→' '↓' '↓' '→' '↓' '→' '→' '↓' '↓' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 53%|█████▎    | 2651/5000 [07:38<05:39,  6.92it/s]



[['→' '→' '→' '↓' '→' '→' '↓' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '↓' '→' '→']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 54%|█████▍    | 2701/5000 [07:47<04:33,  8.40it/s]



[['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑']
 ['↑' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↑']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 55%|█████▌    | 2750/5000 [07:55<05:26,  6.90it/s]



[['↓' '↓' '↓' '↓' '↓' '←' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '←' '↓' '←' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 56%|█████▌    | 2802/5000 [08:04<06:17,  5.82it/s]



[['↓' '↓' '↓' '→' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '→' '↓' '↓' '→' '↓' '→' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 57%|█████▋    | 2851/5000 [08:11<06:11,  5.79it/s]



[['→' '←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '←' '→' '→' '→' '→' '→' '→' '→' '←' '→']
 ['←' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 58%|█████▊    | 2901/5000 [08:22<08:40,  4.03it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 59%|█████▉    | 2951/5000 [08:30<05:58,  5.71it/s]



[['↓' '↓' '↓' '←' '→' '←' '↓' '↓' '↓' '↓' '←' '↓']
 ['↓' '↓' '←' '↓' '↓' '←' '↓' '↓' '→' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 60%|██████    | 3000/5000 [08:37<05:54,  5.64it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 61%|██████    | 3051/5000 [08:48<05:23,  6.02it/s]



[['↓' '↓' '↓' '↑' '→' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['→' '↓' '←' '→' '→' '→' '↓' '←' '←' '↓' '↓' '↓']
 ['↓' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 62%|██████▏   | 3098/5000 [09:05<11:31,  2.75it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 63%|██████▎   | 3150/5000 [09:18<05:32,  5.56it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '←' '←' '→' '→' '←' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 64%|██████▍   | 3201/5000 [09:27<04:05,  7.33it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '→' '↓' '→' '↓' '↓']
 ['→' '↓' '↓' '→' '↓' '→' '↓' '↓' '↓' '↓' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '↑' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 65%|██████▌   | 3251/5000 [09:37<04:03,  7.19it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 66%|██████▌   | 3300/5000 [09:49<10:15,  2.76it/s]



[['↓' '↓' '↓' '↓' '←' '↓' '←' '←' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '←' '↓' '←' '↓' '←' '↓']
 ['↓' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 67%|██████▋   | 3351/5000 [10:01<04:26,  6.19it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 68%|██████▊   | 3400/5000 [10:12<06:08,  4.34it/s]



[['↓' '→' '↓' '↓' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '↓' '↓' '→' '↓' '↓' '↓' '→' '→' '→' '→' '→']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 69%|██████▉   | 3451/5000 [10:20<03:38,  7.09it/s]



[['↓' '→' '→' '↓' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↓' '↓' '↓' '↓' '↓' '→' '→' '→' '↓' '→' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 70%|██████▉   | 3499/5000 [10:32<04:55,  5.08it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 71%|███████   | 3549/5000 [10:43<04:20,  5.58it/s]



[['↓' '↓' '↓' '↓' '↑' '↑' '↑' '↓' '↑' '↑' '↑' '↓']
 ['↓' '↓' '↓' '↓' '↑' '↓' '↓' '↑' '↑' '↑' '↑' '↓']
 ['↓' '←' '→' '→' '←' '←' '→' '←' '←' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 72%|███████▏  | 3600/5000 [10:52<03:49,  6.09it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '←' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 73%|███████▎  | 3650/5000 [11:02<04:20,  5.19it/s]



[['↓' '↓' '↓' '↓' '↑' '↑' '↓' '↑' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↑' '↓' '↓' '↓' '↓' '↓' '↑' '↑' '↓']
 ['↓' '→' '→' '→' '→' '→' '←' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 74%|███████▍  | 3702/5000 [11:11<04:39,  4.64it/s]



[['↓' '↑' '↑' '↓' '↑' '↑' '↓' '↑' '↑' '↓' '↓' '↓']
 ['↓' '↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 75%|███████▌  | 3752/5000 [11:23<02:45,  7.54it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 76%|███████▌  | 3801/5000 [11:32<02:42,  7.39it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 77%|███████▋  | 3849/5000 [11:40<02:30,  7.63it/s]



[['↓' '↓' '↓' '↓' '→' '↓' '→' '→' '↓' '↓' '↓' '↓']
 ['↓' '→' '↓' '↓' '↓' '↓' '→' '↓' '↓' '→' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 78%|███████▊  | 3902/5000 [11:49<03:09,  5.78it/s]



[['→' '→' '→' '↓' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 79%|███████▉  | 3951/5000 [12:02<10:38,  1.64it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 80%|████████  | 4001/5000 [12:17<04:10,  3.98it/s]



[['↓' '↓' '↓' '→' '→' '↓' '↓' '→' '↓' '→' '→' '→']
 ['→' '↓' '↓' '→' '→' '→' '→' '→' '↓' '↓' '→' '→']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 81%|████████  | 4052/5000 [12:25<02:18,  6.87it/s]



[['↓' '↓' '↓' '↓' '↑' '↓' '↓' '↑' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↑' '↑' '↓' '↓' '↑' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 82%|████████▏ | 4101/5000 [12:36<02:59,  5.02it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 83%|████████▎ | 4150/5000 [12:45<02:25,  5.85it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '↓' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 84%|████████▍ | 4199/5000 [12:54<02:36,  5.11it/s]



[['→' '→' '↓' '↓' '↓' '→' '↓' '→' '→' '↓' '↓' '↓']
 ['↓' '↓' '→' '↓' '→' '→' '→' '↓' '↓' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 85%|████████▌ | 4251/5000 [13:05<03:20,  3.74it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↑' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 86%|████████▌ | 4300/5000 [13:15<01:58,  5.91it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '←' '↓' '←' '←' '←' '↓']
 ['↓' '←' '←' '→' '←' '←' '←' '←' '←' '←' '←' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 87%|████████▋ | 4350/5000 [13:27<02:24,  4.49it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 88%|████████▊ | 4401/5000 [13:38<01:41,  5.90it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '←' '→' '→']
 ['→' '→' '→' '←' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 89%|████████▉ | 4452/5000 [13:49<01:43,  5.29it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '←' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↑' '←' '←' '←' '←' '←' '↑' '↑' '←' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 90%|█████████ | 4500/5000 [13:57<00:58,  8.48it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 91%|█████████ | 4551/5000 [14:07<01:02,  7.19it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 92%|█████████▏| 4601/5000 [14:16<01:15,  5.26it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '←' '←' '→' '←' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 93%|█████████▎| 4650/5000 [14:28<01:03,  5.47it/s]



[['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→']
 ['→' '→' '→' '↓' '→' '→' '→' '→' '→' '→' '↑' '→']
 ['→' '↑' '↑' '↑' '↑' '→' '→' '→' '↑' '↑' '→' '→']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 94%|█████████▍| 4701/5000 [14:39<00:58,  5.13it/s]



[['→' '→' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 95%|█████████▍| 4749/5000 [14:48<01:06,  3.76it/s]



[['→' '↓' '↓' '←' '→' '↓' '↓' '↓' '↓' '→' '↓' '↓']
 ['→' '↓' '↓' '→' '↓' '↓' '↓' '→' '↓' '←' '←' '↓']
 ['↓' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 96%|█████████▌| 4800/5000 [14:58<00:54,  3.68it/s]



[['↓' '↓' '↓' '↓' '→' '↓' '↓' '→' '↓' '↓' '→' '↓']
 ['↓' '→' '→' '→' '↓' '→' '↓' '→' '↓' '↑' '↓' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 97%|█████████▋| 4850/5000 [15:07<00:27,  5.49it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 98%|█████████▊| 4901/5000 [15:20<00:15,  6.51it/s]



[['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓' '↓']
 ['↓' '↓' '↓' '↓' '↓' '↓' '↓' '↑' '↓' '↓' '↓' '↓']
 ['↓' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↑' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



 99%|█████████▉| 4951/5000 [15:30<00:14,  3.34it/s]



[['→' '→' '→' '→' '↓' '→' '→' '↓' '→' '→' '→' '→']
 ['→' '→' '→' '→' '↓' '→' '→' '→' '↓' '→' '→' '↓']
 ['→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



100%|██████████| 5000/5000 [15:39<00:00,  5.32it/s]


Print the policy learned from the trained Q-values.
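Reading the arrow grid alone can be misleading, because `printPolicy` overwrites the bottom row, including the start state, with fixed symbols. One way to check whether the greedy policy actually reaches the goal is to follow it from the start state.

The sketch below re-implements the standard Cliff Walking rules in a hypothetical `step` helper instead of calling Gym. It assumes these rules:

*   each step costs -1;
*   stepping into the cliff costs -100 and resets the agent to the start;
*   the episode ends at state 47;
*   actions 0 to 3 are up, right, down, and left.

```python
import numpy as np

N_ROWS, N_COLS = 4, 12
START, GOAL = 36, 47
CLIFF = set(range(37, 47))
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def step(state, action):
    """Hypothetical re-implementation of the Cliff Walking transition rule."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    # Moves off the grid leave the agent at the border
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    next_state = row * N_COLS + col
    if next_state in CLIFF:
        return START, -100, False  # fell off the cliff: back to the start
    return next_state, -1, next_state == GOAL


def greedy_rollout(q_values, max_steps=100):
    """Follow argmax_a Q(s, a) from the start; return (reached_goal, steps, total_reward)."""
    state, total_reward = START, 0
    for n in range(max_steps):
        action = int(np.argmax(q_values[state]))
        state, reward, done = step(state, action)
        total_reward += reward
        if done:
            return True, n + 1, total_reward
    return False, max_steps, total_reward


# Example: Q-values that prefer the safe path (up, right along row 2, then down)
safe_q = {s: [0.0, 0.0, 0.0, 0.0] for s in range(48)}
safe_q[START][0] = 1.0      # up from the start
for s in range(24, 35):
    safe_q[s][1] = 1.0      # right along row 2
safe_q[35][2] = 1.0         # down into the goal
print(greedy_rollout(safe_q))
```

With the trained agent, you would call it as `greedy_rollout(agent_DQN.get_q_values())`. A result with `reached_goal=False` means the greedy policy loops or keeps falling off the cliff, even if most arrows in the printed grid look reasonable.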

In [None]:
print('Learned policy by DQN')
policy.printPolicy(agent_DQN.get_q_values())

Learned policy by DQN
[['→' '↓' '→' '→' '↓' '→' '↓' '↓' '↓' '→' '→' '→']
 ['→' '→' '→' '→' '↓' '↓' '↓' '→' '↓' '↓' '→' '↓']
 ['↓' '→' '→' '→' '→' '→' '→' '→' '→' '→' '→' '↓']
 ['↑' '←' '←' '←' '←' '←' '←' '←' '←' '←' '←' 'X']]



In [None]:
env.close()