In [None]:
import gym

# 1st: Tutorial from the OpenAI Gym docs

In [None]:
env = gym.make("CartPole-v0")

In [None]:
env.reset()

In [None]:
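for _ in range(3000):
    env.render()
    # Take a random action; the classic Gym API returns (observation, reward, done, info)
    _, _, done, _ = env.step(env.action_space.sample())
    if done:
        env.reset()  # start a new episode instead of stepping past the end of this one
env.close()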

# 2nd: Sampling Actions

In [None]:
env = gym.make("CartPole-v0")

In [None]:
for episode in range(20):
    observation = env.reset()  # returns the initial observation: an object describing the starting state of the environment
    
    for time_step in range(100):
        env.render()
        print(f"Obs {time_step + 1}:{observation}")
        
        action = env.action_space.sample()
        # step returns the next observation S_{t+1}, the reward for taking A_t,
        # done (True once the episode has ended), and info (diagnostic data for debugging)
        observation, reward, done, info = env.step(action)
        if done:
            print(f"Episode {episode + 1} finished after {time_step + 1} time steps")
            break
env.close()

### Shouldn't episodes last more time steps as more episodes are played?
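Not with this loop: the action is drawn with `env.action_space.sample()`, so neither `observation` nor `reward` ever influences the next action, and nothing is learned between episodes. The sketch below illustrates the point without Gym. `ToyBalanceEnv`, `run_episode`, `random_policy`, and `feedback_policy` are hypothetical names introduced here for illustration; the toy environment only mimics the classic `reset()` / `step()` interface used above.

```python
import random


class ToyBalanceEnv:
    """Hypothetical stand-in for a Gym environment (not part of Gym).

    The state is an integer position. Action 0 moves left, action 1 moves
    right, and the episode ends once the position leaves [-limit, limit].
    """

    def __init__(self, limit=3):
        self.limit = limit
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        self.position += 1 if action == 1 else -1
        done = abs(self.position) > self.limit
        reward = 1.0  # +1 for every step taken
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the number of time steps it lasted."""
    observation = env.reset()
    for step in range(1, max_steps + 1):
        observation, reward, done, info = env.step(policy(observation))
        if done:
            return step
    return max_steps


def random_policy(observation):
    return random.choice([0, 1])  # ignores the observation entirely


def feedback_policy(observation):
    return 0 if observation > 0 else 1  # always move back toward the center


random.seed(0)
env = ToyBalanceEnv()
random_lengths = [run_episode(env, random_policy) for _ in range(20)]
feedback_lengths = [run_episode(env, feedback_policy) for _ in range(20)]
print("random:  ", random_lengths)
print("feedback:", feedback_lengths)
```

The random policy's episode lengths fluctuate with no trend, while the policy that reads the observation survives until the step cap. Episodes only get longer once the action depends on what the agent observes, whether through a hand-written rule like this one or a learned policy.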

# 3rd: Spaces

In [None]:
env = gym.make("CartPole-v0")

In [None]:
env.action_space  # Discrete(2): each action is one of two discrete values, 0 or 1

In [None]:
env.observation_space  # a Box with 4 entries: each observation is an array of 4 floats; the bounds differ per dimension (see .high and .low below)

In [None]:
env.observation_space.high

In [None]:
env.observation_space.low

This introspection can be helpful to write generic code that works for many different environments. Box and Discrete are the most common Spaces. You can sample from a Space or check that something belongs to it:

In [None]:
from gym import spaces 
space = spaces.Discrete(8)

In [None]:
space # set = {0,1,2,3,4,5,6,7}

In [None]:
x = space.sample()

In [None]:
assert space.contains(x)

In [None]:
assert space.n == 8
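The cells above only exercise `Discrete`. Below is a minimal sketch of the same checks on a `Box`, assuming a Gym release whose `Box` constructor accepts `shape` and `dtype`. The bounds and shape are arbitrary illustrative values, not taken from any environment.

```python
import numpy as np
from gym import spaces

# Illustrative bounds and shape only (an assumption, not from a real environment)
box = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)

sample = box.sample()  # a float32 array drawn from within the bounds
inside = box.contains(sample)

# A point outside the bounds is rejected
outside = box.contains(np.array([2.0, 0.0, 0.0], dtype=np.float32))

print(sample, inside, outside)
print(box.low, box.high)  # per-dimension bounds, each with shape (3,)
```

As with `Discrete`, `sample()` and `contains()` let generic code work without knowing the environment in advance; `low` and `high` play the role that `n` plays for a discrete space.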

# 4th: Available environments

In [None]:
from gym import envs

In [None]:
list(envs.registry.all())[:10]

This will give you a list of EnvSpec objects. These define parameters for a particular task, including the number of trials to run and the maximum number of steps. For example, EnvSpec(Hopper-v1) defines an environment where the goal is to get a 2D simulated robot to hop; EnvSpec(Go9x9-v0) defines a Go game on a 9x9 board.
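A single spec can also be looked up by ID. The sketch below assumes a classic Gym release in which `gym.spec` is available and `EnvSpec` exposes `max_episode_steps` and `reward_threshold`; attribute names have changed between Gym versions, so treat them as assumptions to check against the installed release.

```python
import gym

# Look up the registered spec without constructing the environment
spec = gym.spec("CartPole-v0")

print(spec.id)                 # the environment ID string
print(spec.max_episode_steps)  # step cap applied to each episode
print(spec.reward_threshold)   # average reward at which the task counts as solved
```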



### Registering a custom environment:

https://github.com/openai/gym/blob/master/gym/envs/__init__.py

# 5th: Gym Motivation

Reinforcement learning (RL) is the subfield of machine learning concerned with decision making and motor control. It studies how an agent can learn how to achieve goals in a complex, uncertain environment. It’s exciting for two reasons:

* RL is very general, encompassing all problems that involve making a sequence of decisions: for example, controlling a robot’s motors so that it’s able to run and jump, making business decisions like pricing and inventory management, or playing video games and board games. RL can even be applied to supervised learning problems with sequential or structured outputs.
* RL algorithms have started to achieve good results in many difficult environments. RL has a long history, but until recent advances in deep learning, it required lots of problem-specific engineering. DeepMind’s Atari results, BRETT from Pieter Abbeel’s group, and AlphaGo all used deep RL algorithms which did not make too many assumptions about their environment, and thus can be applied in other settings.

However, RL research is also slowed down by two factors:

* The need for better benchmarks. In supervised learning, progress has been driven by large labeled datasets like ImageNet. In RL, the closest equivalent would be a large and diverse collection of environments. However, the existing open-source collections of RL environments don’t have enough variety, and they are often difficult to even set up and use.
* Lack of standardization of environments used in publications. Subtle differences in the problem definition, such as the reward function or the set of actions, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

Gym is an attempt to fix both problems.

# Source: 

https://gym.openai.com/docs/