# Q-Table Learning

From Arthur Juliani's *Simple Reinforcement Learning with TensorFlow* series, [Part 0: Q-Learning with Tables and Neural Networks](https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0)

#### Load dependencies

In [3]:
import gym
import numpy as np

#### Load environment

In [4]:
# note: newer gym releases register this environment as 'FrozenLake-v1'
env = gym.make('FrozenLake-v0')

#### Implement Q-Table learning algorithm
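
For reference, the update applied inside the training cell below can be written as a single rule. This is a sketch of the standard tabular Q-learning update; the symbols $\alpha$ and $\gamma$ are notation introduced here and correspond to the variables `lr` and `y` in the code, and $s'$ corresponds to `s1`.

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

As a worked step with the values used below ($\alpha = 0.8$, $\gamma = 0.95$): starting from an all-zero table, a transition that yields $r = 1$ gives $Q(s, a) = 0 + 0.8\,(1 + 0.95 \cdot 0 - 0) = 0.8$.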

In [22]:
# initialize table with zeros
Q = np.zeros([env.observation_space.n,env.action_space.n])

# set learning parameters
lr = .8              # learning rate
y = .95              # discount factor
num_episodes = 2000

# create list to contain the total reward of each episode
rList = []

for i in range(num_episodes):
    
    # reset environment and get first new observation
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    
    # Q-Table learning algorithm proper
    while j < 99:
        j+=1
        
        # choose an action greedily from the Q-table, with Gaussian noise that shrinks as 1/(i+1)
        a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
        
        # get new state and reward from environment
        s1,r,d,_ = env.step(a)
        
        # update Q-Table with new knowledge
        Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])
        rAll += r
        s = s1
        if d:
            break
        
    rList.append(rAll)
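
The loop above depends on `gym` and on the stochastic FrozenLake environment. To exercise the same update rule in isolation, here is a minimal sketch on a hypothetical deterministic environment: the `Corridor` class, the `train` function, and the choice of 500 episodes are illustrative assumptions and not part of the original notebook. The action selection and Q-table update mirror the cell above.

```python
import numpy as np


class Corridor:
    """Hypothetical stand-in for FrozenLake: states 0..4 in a row, goal at state 4."""

    n_states = 5
    n_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        # deterministic move; stepping left from state 0 stays at state 0
        self.s = max(0, self.s - 1) if a == 0 else self.s + 1
        done = self.s == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.s, reward, done, {}


def train(env, num_episodes=500, lr=0.8, y=0.95, max_steps=99, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros([env.n_states, env.n_actions])
    rList = []
    for i in range(num_episodes):
        s = env.reset()
        rAll = 0.0
        for t in range(max_steps):
            # greedy choice with Gaussian noise that shrinks as 1/(i+1)
            noise = rng.standard_normal(env.n_actions) / (i + 1)
            a = int(np.argmax(Q[s, :] + noise))
            s1, r, d, _ = env.step(a)
            # same tabular update as in the notebook cell
            Q[s, a] += lr * (r + y * np.max(Q[s1, :]) - Q[s, a])
            rAll += r
            s = s1
            if d:
                break
        rList.append(rAll)
    return Q, rList


Q, rList = train(Corridor())
# greedy action in each non-terminal state
print(np.argmax(Q[:-1], axis=1))
```

On this toy corridor the greedy action should come out as 1 (right) in every non-terminal state, and the row for the terminal state stays at zero because it is never updated.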

In [23]:
print("Score over time: " + str(sum(rList)/num_episodes))

Score over time: 0.412
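
The single average above mixes early, mostly exploratory episodes with later ones. One way to see whether the score improves during training is to average over consecutive blocks of episodes. The following is a minimal sketch; `block_means` is a hypothetical helper, and the sample list is made up for illustration rather than taken from the run above.

```python
def block_means(rewards, block=500):
    """Mean reward over consecutive, non-overlapping blocks of episodes."""
    return [
        sum(rewards[k:k + block]) / len(rewards[k:k + block])
        for k in range(0, len(rewards), block)
    ]


# made-up per-episode returns, for illustration only
sample = [0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
print(block_means(sample, block=4))  # [0.25, 0.75]
```

Applied to the notebook's own `rList`, for example as `block_means(rList, block=500)`, this would give one average per 500 episodes.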


In [24]:
print("Final Q-Table Values")
print(Q)

Final Q-Table Values
[[  1.95794645e-01   7.98826762e-03   4.10328004e-03   8.63883936e-03]
 [  3.04429803e-05   9.17821040e-04   1.12488574e-05   1.65540225e-01]
 [  1.15063310e-03   3.61097970e-03   7.49965224e-04   1.88455197e-01]
 [  1.76734015e-04   3.44417464e-05   1.41545602e-04   1.06767465e-01]
 [  2.20482749e-01   9.30150400e-05   3.36725400e-03   3.22847942e-03]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  9.87291109e-03   3.06552186e-05   4.28245310e-07   1.59966881e-06]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  4.22466729e-03   7.46397943e-04   2.48410934e-04   3.24207802e-01]
 [  2.54973730e-04   6.03919178e-01   1.42776898e-03   1.70675046e-03]
 [  9.16967042e-02   5.40232219e-04   1.18406234e-03   5.11685596e-04]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   1.56975109e-03   4.28588148e-01   0