# **Sonic the Hedgehog - Using Q-learning**

## A simple platform game to make Sonic reach the Goal using **Reinforcement Learning**



## **States**:
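   * Start state S (cell 0)<br>
   * Goal state G (cell 20, the only terminal state; a collision with an obstacle also ends the game)<br>
   * Obstacle_1 (must be jumped over)<br>
   * Obstacle_2 (must be slid under)<br>
   * All remaining cells, which contain no obstacle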
   
## **Actions**:
   * Move Straight (action 0)<br>
   * Low Jump (action 1)<br>
   * High Jump (action 2)<br>
   * Slide (action 3)<br>
   
## **Game Description**:
The agent (Sonic) must learn an optimal policy that takes it from the Start state to the Goal state while avoiding the obstacles on the way and maximizing its net reward. Sonic can only move forward. Obstacle_1 is cleared by jumping and Obstacle_2 is cleared by sliding. At certain Obstacle_1 cells, choosing the high jump earns a small extra reward.
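
To make the layout concrete, the track can be drawn as a single line of text. This is only an illustrative sketch: the obstacle positions are copied from the `env` function in this notebook, while the helper name `render_track` and the symbols (`S` start, `G` goal, `J` jump obstacle, `L` slide obstacle, `.` free cell) are hypothetical choices made for this example.

```python
# Obstacle positions copied from the env function in this notebook.
JUMP_CELLS = [3, 7, 14]    # Obstacle_1: must be jumped over
SLIDE_CELLS = [11, 17]     # Obstacle_2: must be slid under
NUM_CELLS = 21             # cells 0..20; cell 20 is the goal


def render_track():
    """Return a one-line text picture of the track (hypothetical helper)."""
    symbols = []
    for cell in range(NUM_CELLS):
        if cell == 0:
            symbols.append('S')
        elif cell == NUM_CELLS - 1:
            symbols.append('G')
        elif cell in JUMP_CELLS:
            symbols.append('J')
        elif cell in SLIDE_CELLS:
            symbols.append('L')
        else:
            symbols.append('.')
    return ''.join(symbols)


print(render_track())  # prints S..J...J...L..J..L..G
```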

### **Defining Reward**:

* When the agent needs to jump: **-0.05** reward for jumping, and the game continues; **-1** reward for moving straight or sliding, and the game ends. At certain obstacles, a high jump earns an additional **+0.02** (a net reward of **-0.03**).

* When the agent needs to slide: **-0.05** reward for sliding, and the game continues; **-1** reward for moving straight or jumping to any height, and the game ends.

* When the agent needs to move straight: **-0.05** reward for going straight, and the game continues; **-0.2** reward for jumping to any height or sliding (to discourage unnecessary jumps and slides), and the game continues.

* When the agent reaches the goal state: **+1** reward, and the game ends.
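
The reward rules above can be read as a small lookup keyed by what lies directly ahead and which action is taken. The following is a minimal sketch rather than the notebook's implementation: the situation names, the `high_jump_bonus` flag, and the function name `step_reward` are all hypothetical, and the action numbering (0 to 3) follows the column order used in the printed Q tables.

```python
# Hypothetical labels for what lies directly ahead of the agent.
NEED_JUMP, NEED_SLIDE, GOAL_AHEAD, CLEAR = 'jump', 'slide', 'goal', 'clear'
# Action numbering follows the Q-table columns in this notebook.
STRAIGHT, LOW_JUMP, HIGH_JUMP, SLIDE = 0, 1, 2, 3


def step_reward(situation, action, high_jump_bonus=False):
    """Return (reward, game_over) for one move, per the reward bullets."""
    if situation == NEED_JUMP:
        if action in (LOW_JUMP, HIGH_JUMP):
            # Some obstacles pay +0.02 extra for choosing the high jump.
            extra = 0.02 if (action == HIGH_JUMP and high_jump_bonus) else 0.0
            return -0.05 + extra, False
        return -1.0, True               # hit the obstacle
    if situation == NEED_SLIDE:
        if action == SLIDE:
            return -0.05, False
        return -1.0, True               # hit the obstacle
    if situation == GOAL_AHEAD and action == STRAIGHT:
        return 1.0, True                # reached the goal
    # Nothing in the way: going straight is cheap, anything else is penalized.
    if action == STRAIGHT:
        return -0.05, False
    return -0.2, False


print(step_reward(NEED_JUMP, SLIDE))    # prints (-1.0, True)
print(step_reward(CLEAR, HIGH_JUMP))    # prints (-0.2, False)
```

Keeping the rules in one pure function like this makes them easy to check against the bullet list before they are wired into an environment.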

### **Example**:
If an Obstacle_1 lies between cells 2 and 4, the agent needs to jump from cell 2 to cell 4. It also needs to learn whether jumping higher earns the extra reward at that obstacle.

In [0]:
import random

In [0]:
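def env(state,action):
  """Return (reward, next_state, is_over) for taking `action` in cell `state`.

  Actions: 0 = straight, 1 = low jump, 2 = high jump, 3 = slide.
  """
  # obstacle_1[1]: cells holding an obstacle that must be jumped over
  # obstacle_1[0]: matching flags, 1 if a high jump there earns the +0.02 bonus
  obstacle_1 = [[0,1,1],[3,7,14]]
  # cells holding an obstacle that must be slid under
  obstacle_2 = [11,17]
  
  # when agent needs to jump
  if((state+1) in obstacle_1[1]):
    if(action==1 or action==2):    
      if(obstacle_1[0][obstacle_1[1].index(state+1)] and action==2):
        # high jump at a bonus obstacle: -0.05 + 0.02
        return -0.03,state+2,0
      else:
        return -0.05,state+1,0
    else:
      # moved straight or slid into the obstacle: game over
      return -1,state+1,1
    
  # when agent needs to slide
  elif((state+1) in obstacle_2):
    if(action==3):
      return -0.05,state+2,0
    else:
      # moved straight or jumped into the obstacle: game over
      return -1,state+1,1
  
  # when goal achieved
  elif(state==19 and action==0):
    return 1,state+1,1
  
  # in all other cases
  else:
    if(action==0):
      return -0.05,state+1,0
    else:
      # penalty for an unnecessary jump or slide
      return -0.2,state+1,0      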

In [0]:
def qlearning():
  alpha = 0.1
  discount = 0.9
  epsilon = 0.2
  num_actions = 4
  cells = 21
  episodes = 100000
  
  Q = [[] for i in range(cells)]
  print('Q values before game-play: ')
  print('            STRAIGHT   LOW JUMP   HIGH JUMP   SLIDE')
  
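  for i in range(cells):
    if(i==cells-1):
      # the goal (terminal) state has no future value
      Q[i] = [0]*num_actions
    else:
      # random initial estimates for every non-terminal state
      Q[i] = [random.uniform(0,1) for j in range(num_actions)]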
      
  for i in range(cells):
    print('state %d:      %s' %(i,Q[i]))
  
  print('Playing the Game %d times..........' %(episodes))
  
  for i in range(episodes):
    currstate = 0
    
    while(currstate!=cells-1):
      
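      # epsilon-greedy: exploit the best known action with probability
      # 1-epsilon, otherwise explore with a uniformly random action
      prob = random.uniform(0,1)
      if(prob>epsilon):
        curraction = Q[currstate].index(max(Q[currstate]))
      else:
        curraction = random.randint(0,num_actions-1)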
        
      reward, nextstate, is_over = env(currstate, curraction)
      
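      # a game-ending move has no future value, so do not bootstrap from the next state
      target = reward if is_over else reward + discount * max(Q[nextstate])
      Q[currstate][curraction] = Q[currstate][curraction] + alpha * (target - Q[currstate][curraction])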
      
      if(is_over):
        break
      currstate = nextstate
  
  print('Q values for each state-action pair after game play:')
  print('            STRAIGHT   LOW JUMP   HIGH JUMP   SLIDE')
  for i in range(cells):
    # round a copy for display so the learned Q values are not modified
    rounded_list = [round(q,3) for q in Q[i]]
    print('state %d:          %s' %(i,rounded_list))
    
  print('Policy after self-play starting from 0:')

  currstate = 0
  while(currstate!=cells-1):
    curraction = Q[currstate].index(max(Q[currstate]))
    reward, nextstate, is_over = env(currstate, curraction)
    print('Current state is %d and immediate reward is %.3f' %(currstate,reward))
    if(is_over):
      break
    currstate = nextstate
    
  print('Final state is 20')
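
The update performed inside the training loop can be isolated as a small pure function, which makes it easy to check one step by hand. The sketch below is a minimal illustration, assuming the `alpha = 0.1` and `discount = 0.9` values set in `qlearning`. The helper name `q_update` is hypothetical, and dropping the future-value term on a game-ending move is an assumption of this sketch (a common Q-learning convention).

```python
def q_update(q_sa, reward, max_q_next, is_over, alpha=0.1, discount=0.9):
    """Return the new Q(s, a) after one Q-learning step (hypothetical helper).

    q_sa        current estimate for the state-action pair
    reward      immediate reward returned by the environment
    max_q_next  largest Q value available in the next state
    is_over     True if the move ended the game
    """
    # Assumption: a game-ending move contributes no future value.
    target = reward if is_over else reward + discount * max_q_next
    # Move the old estimate a fraction alpha toward the target.
    return q_sa + alpha * (target - q_sa)


# Ordinary step: target = -0.05 + 0.9 * 0.9 = 0.76, so 0.5 moves to about 0.526.
print(round(q_update(0.5, -0.05, 0.9, False), 3))  # prints 0.526

# Crash into an obstacle: target = -1, so 0.5 moves to about 0.35.
print(round(q_update(0.5, -1, 0.9, True), 3))      # prints 0.35
```

With a small `alpha`, each visit nudges the estimate only a tenth of the way toward its target, which is why many episodes are needed before the table settles.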

In [4]:
if __name__=='__main__':
  qlearning()

Q values before game-play: 
            STRAIGHT   LOW JUMP   HIGH JUMP   SLIDE
state 0:      [0.49707710876286537, 0.060882833306884265, 0.05997944739912564, 0.2889898153020589]
state 1:      [0.22831006901659556, 0.9487221279911136, 0.8714656098187759, 0.352629646282798]
state 2:      [0.6682953657009333, 0.1877733541290052, 0.5682167679949548, 0.1277812960503455]
state 3:      [0.5370295397038074, 0.1406907590459534, 0.322121923465274, 0.6036732999489672]
state 4:      [0.09028275999094937, 0.9662621068319686, 0.3752602774623579, 0.30287516115177016]
state 5:      [0.19936934856597155, 0.11061428397651318, 0.6813402596159617, 0.7192921923990617]
state 6:      [0.8331723648851661, 0.058432012358005325, 0.37189332311703793, 0.7150805209873978]
state 7:      [0.5164150860063151, 0.25593213181034324, 0.6267812726470896, 0.9922128671378846]
state 8:      [0.3695306340459902, 0.5073918806524085, 0.4891936896916528, 0.4374258868808665]
state 9:      [0.21171068691437123, 0.4153722633608283