## Introduction To Reinforcement Learning

Some Common Terms
 - Agent
 - Environment
 - Action, State, Rewards, Observations
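
To make these terms concrete before touching Gym, here is a minimal sketch of the agent-environment loop in plain Python. `CoinGuessEnv`, `RandomAgent` and `run_episode` are hypothetical names invented for this illustration and are not part of Gym; the toy `step()` returns only three values to keep things short.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment: guess a hidden coin for a fixed number of turns."""

    def __init__(self, max_turns=5, seed=0):
        self.max_turns = max_turns
        self.rng = random.Random(seed)
        self.turn = 0

    def reset(self):
        # Start a new episode and return the initial state (the turn counter).
        self.turn = 0
        return self.turn

    def step(self, action):
        # Reward 1 for a correct guess of the hidden coin, else 0.
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else 0
        self.turn += 1
        done = self.turn >= self.max_turns
        return self.turn, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and acts at random."""

    def __init__(self, seed=1):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.randint(0, 1)


def run_episode(env, agent):
    # The agent observes a state and picks an action; the environment
    # answers with the next state and a reward until the episode ends.
    state = env.reset()
    total_reward, done = 0, False
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward, state


total, final_state = run_episode(CoinGuessEnv(), RandomAgent())
print("Total Reward :", total, "| Final State :", final_state)
```

The agent only ever sees what the environment returns, which is the same contract the Gym environments in the rest of this notebook follow.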

## 1. Interacting With The OpenAI Gym API

In [1]:
import gym

#### 1. Create Environment

In [2]:
# There Are Over 100 Different Environments Available In Gym API
env = gym.make('CartPole-v0')

#### 2. Comes With Certain Important Methods/Attributes

 - action_space
 - observation_space
 - reset()
 - step()
 - render()

##### 2.1 reset ( ) --- Returns The Initial State And Also Resets The Environment !

In [3]:
print(env.reset())

[ 0.02967125 -0.00613176 -0.03830356  0.02326004]


#### In This Game Environment The State Is Defined By These 4 Parameters, In This Order --

 - Location Of The Cart (Cart Position)
 - Velocity Of The Cart
 - Pole Angle
 - Angular Velocity Of The Rod (Pole Velocity At Tip)
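
As a small sketch, the 4 values can be given names so that later code does not depend on remembering index positions. The `CartPoleState` tuple and its field names are illustrative assumptions, not part of Gym; the order assumed here is the CartPole observation order (cart position, cart velocity, pole angle, pole angular velocity).

```python
from collections import namedtuple

# Field names are illustrative; the order follows the CartPole observation vector.
CartPoleState = namedtuple(
    "CartPoleState",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# Values copied from the reset() output shown above.
observation = [0.02967125, -0.00613176, -0.03830356, 0.02326004]
state = CartPoleState(*observation)

print("Cart Position :", state.cart_position)
print("Pole Angle (rad) :", state.pole_angle)
```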


##### 2.2 render ( )

In [4]:
# Draws The Current State 100 Times ; The Cart Does Not Move Because step() Is Never Called
for t in range(100):
    env.render()
env.close()

##### 2.3 "action_space" --- Consists Of All Possible Actions That Can Be Performed In The Game Environment ! 

In [5]:
print(env.action_space) # Discrete(n) Holds The Integer Actions 0 .. n-1 ; Here 0 = Push Left, 1 = Push Right

Discrete(2)


In [6]:
print(env.action_space.n)

2
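
To show what a discrete action space amounts to, here is a minimal stand-in written in plain Python. `SimpleDiscrete` is a hypothetical class for illustration only; it imitates the idea of `n` integer actions with a `sample()` method and is not Gym's own implementation.

```python
import random


class SimpleDiscrete:
    """Hypothetical stand-in for a discrete action space with n actions."""

    def __init__(self, n, seed=None):
        self.n = n
        self.rng = random.Random(seed)

    def sample(self):
        # Pick one of the integers 0 .. n-1 uniformly at random.
        return self.rng.randrange(self.n)

    def contains(self, action):
        # Valid actions are exactly the integers 0 .. n-1.
        return isinstance(action, int) and 0 <= action < self.n


space = SimpleDiscrete(2, seed=0)
samples = [space.sample() for _ in range(10)]
print(samples)
print(space.contains(1), space.contains(2))
```

With `n = 2` every sample is either 0 or 1, which is why a random CartPole policy can only push left or right.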


##### 2.4 "observation_space" --- A Box : An 'n' Dimension Array Of Continuous Values With Lower And Upper Bounds !

In [7]:
print(env.observation_space)

Box(4,)


In [8]:
print(env.observation_space.shape)
print(env.observation_space.shape[0])

(4,)
4
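
A box-shaped space can be pictured as one lower and one upper bound per dimension. The sketch below is a hypothetical `SimpleBox` stand-in, not Gym's class, and the bounds are assumptions for illustration: cart position within ±4.8, pole angle within ±24 degrees (in radians), both velocities unbounded.

```python
import math


class SimpleBox:
    """Hypothetical stand-in for a box-shaped space with per-dimension bounds."""

    def __init__(self, low, high):
        assert len(low) == len(high)
        self.low, self.high = list(low), list(high)
        self.shape = (len(self.low),)

    def contains(self, point):
        # A point belongs to the box when every coordinate lies within its bounds.
        return len(point) == self.shape[0] and all(
            lo <= x <= hi for lo, x, hi in zip(self.low, point, self.high)
        )


# Assumed bounds: position, velocity, angle, angular velocity.
high = [4.8, math.inf, math.radians(24), math.inf]
low = [-h for h in high]
box = SimpleBox(low, high)

print(box.shape)
print(box.contains([0.02967125, -0.00613176, -0.03830356, 0.02326004]))
print(box.contains([5.0, 0.0, 0.0, 0.0]))
```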


##### 2.5 step ( )

In [9]:
env.reset()
for t in range(100):
    # random_action = env.action_space.sample()
    # env.step(random_action) # Randomly Moves Left Or Right
    _, _, done, _ = env.step(env.action_space.sample())
    if done:
        break
    env.render()
env.close()

#### Every Time Step Earns 1 Point. To Win This Game One Needs 200 Points, i.e. The Rod Must Stay Balanced For 200 Time Steps !

## 2. Playing Games With A Random Strategy

 - **Game Episode** -- Entire Game Play, From Start To Game Over !
 - Step ( ) Function In More Detail
 - Game Over ?

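#### Step ( ) --- Step Function Returns 4 Things

 - New Observation/State
 - Reward (1 Point For Every Time Step In This Game)
 - Done (True/False) -- True Once The Game Episode Is Over
 - Other Info (Extra Diagnostic Details, Not Needed To Play)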
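
The 4 return values can be seen in isolation with a toy environment. `ToyBalanceEnv` below is a hypothetical class written for this sketch and does not use Gym; its `step()` only imitates the 4-value shape described above, with a step cap playing the role of a time limit.

```python
import random


class ToyBalanceEnv:
    """Hypothetical environment whose step() imitates the 4-value return described above."""

    def __init__(self, limit=3, max_steps=20):
        self.limit = limit            # episode ends once |position| exceeds this
        self.max_steps = max_steps    # ... or once this many steps were played
        self.position = 0
        self.steps = 0

    def reset(self):
        self.position, self.steps = 0, 0
        return self.position

    def step(self, action):
        # Action 1 moves right, any other action moves left.
        self.position += 1 if action == 1 else -1
        self.steps += 1
        out_of_range = abs(self.position) > self.limit
        timed_out = self.steps >= self.max_steps
        observation = self.position
        reward = 1.0                       # one point per time step played
        done = out_of_range or timed_out   # game over ?
        info = {"steps": self.steps, "out_of_range": out_of_range, "timed_out": timed_out}
        return observation, reward, done, info


rng = random.Random(0)
env = ToyBalanceEnv()
observation = env.reset()
total_reward, done = 0.0, False
while not done:
    action = rng.randint(0, 1)  # random strategy
    observation, reward, done, info = env.step(action)
    total_reward += reward

print("Final Observation :", observation)
print("Total Reward :", total_reward, "| Info :", info)
```

Because the reward is 1 per step, the total reward equals the number of steps survived, which is the same bookkeeping the CartPole score relies on.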

##### How To Play Multiple Game Episodes

In [10]:
for e in range(20): # Episode
    # Play 20 Episodes ; Every Episode Starts From A Fresh State
    observation = env.reset()
    for t in range(50): # At Most 50 Time Steps Per Episode
        env.render()
        action = env.action_space.sample() # Random Strategy : Pick Left Or Right At Random
        observation, reward, done, other_info = env.step(action)
        if done:
            # Game Episode Is Over ; t Is The Index Of The Last Time Step Played
            print("Game Episode : {} / {} ! High Score : {}".format(e, 20, t))
            break
env.close()
print("All 20 Episodes Over !!")

Game Episode : 0 / 20 ! High Score : 35
Game Episode : 1 / 20 ! High Score : 9
Game Episode : 2 / 20 ! High Score : 9
Game Episode : 3 / 20 ! High Score : 20
Game Episode : 4 / 20 ! High Score : 14
Game Episode : 5 / 20 ! High Score : 42
Game Episode : 6 / 20 ! High Score : 13
Game Episode : 7 / 20 ! High Score : 20
Game Episode : 8 / 20 ! High Score : 16
Game Episode : 9 / 20 ! High Score : 14
Game Episode : 10 / 20 ! High Score : 23
Game Episode : 11 / 20 ! High Score : 29
Game Episode : 12 / 20 ! High Score : 12
Game Episode : 13 / 20 ! High Score : 34
Game Episode : 14 / 20 ! High Score : 10
Game Episode : 15 / 20 ! High Score : 13
Game Episode : 16 / 20 ! High Score : 15
Game Episode : 17 / 20 ! High Score : 9
Game Episode : 18 / 20 ! High Score : 11
Game Episode : 19 / 20 ! High Score : 26
All 20 Episodes Over !!
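
A quick way to judge the random strategy is to summarise the scores printed above. The sketch below simply copies those 20 numbers into a list; the variable name `scores` is chosen here for illustration.

```python
from statistics import mean

# Scores copied from the 20 episodes printed above.
scores = [35, 9, 9, 20, 14, 42, 13, 20, 16, 14,
          23, 29, 12, 34, 10, 13, 15, 9, 11, 26]

print("Episodes :", len(scores))
print("Mean Score :", mean(scores))
print("Best :", max(scores), "| Worst :", min(scores))
```

Even the best random episode stays far below the 200 time steps needed to win, which suggests that acting at random is not enough for this game.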
