<a href="https://colab.research.google.com/github/vickkiee/dgadata/blob/main/MLF_Reinforcement_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Machine Learning Foundations - **Reinforcement Learning**
- Note: Please download and run this notebook locally

Reinforcement Learning does not start off with any data. Instead, we build an agent that interacts with an environment. Whenever the agent makes the right decision, we give it a positive reward. Its entire goal is to maximize those rewards.
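
To make the reward idea concrete, here is a minimal sketch of an agent that learns purely from rewards. It is a toy two-action problem with an epsilon-greedy agent, not the cart-pole task or the PPO algorithm used later in this notebook. The function name `train_agent`, the reward probabilities, and the other parameters are assumptions chosen only for illustration.

```python
import random


def train_agent(steps=2000, epsilon=0.1, seed=0):
    """Toy agent: learn which of two actions is rewarded more often."""
    rng = random.Random(seed)
    # Hypothetical environment: chance that each action earns a reward.
    reward_prob = [0.2, 0.8]
    value = [0.0, 0.0]   # the agent's running estimate of each action's reward
    count = [0, 0]       # how many times each action has been tried
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(2)                     # explore: try a random action
        else:
            action = 0 if value[0] >= value[1] else 1     # exploit: best action so far
        reward = 1 if rng.random() < reward_prob[action] else 0
        count[action] += 1
        # Incremental average of the rewards seen for this action.
        value[action] += (reward - value[action]) / count[action]
        total_reward += reward

    return value, total_reward


value, total_reward = train_agent()
print("Estimated value of each action:", value)
print("Total reward collected:", total_reward)
```

The agent starts with no data at all. Every estimate it holds comes from rewards it collected by acting, which is the same principle the cart-pole agent relies on.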

We will be using an environment in which a pole is attached to a cart. The pole is wobbly and continuously falls over. We want our cart to balance the pole by continuously moving right or left, so that the pole's center of gravity never crosses the threshold at which the pole tips over.
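
The threshold can be sketched as a simple check. The numbers below are assumptions taken from Gym's CartPole implementation (a pole angle beyond about 12 degrees, or a cart position beyond 2.4 units from the center, ends the episode); the helper name `episode_over` is hypothetical and is not part of Gym.

```python
import math

# Assumed limits, as used in Gym's CartPole environment.
X_THRESHOLD = 2.4                             # max cart distance from the center
THETA_THRESHOLD = 12 * 2 * math.pi / 360      # 12 degrees, expressed in radians


def episode_over(cart_position, pole_angle):
    """Return True when the cart or the pole has gone past its limit."""
    return abs(cart_position) > X_THRESHOLD or abs(pole_angle) > THETA_THRESHOLD


print(episode_over(0.0, 0.0))                 # upright pole, centered cart
print(episode_over(0.0, math.radians(15)))    # pole tilted 15 degrees
```

This is why the environment resets slightly before the pole is flat on the ground: the episode ends as soon as the tilt passes the limit.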

Running the cells below installs the required libraries and imports the ones we need. Note: this notebook only works on your local machine (not Colab or other server-hosted notebooks).

In [None]:
!pip install stable-baselines3[extra]
!pip install gym[all]

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting stable-baselines3[extra]
  Downloading stable_baselines3-1.7.0-py3-none-any.whl (171 kB)
Collecting gym==0.21
  Downloading gym-0.21.0.tar.gz (1.5 MB)
  Preparing metadata (setup.py) ... done
Collecting importlib-metadata~=4.13
  Downloading importlib_metadata-4.13.0-py3-none-any.whl (23 kB)
Collecting rich
  Downloading rich-13.3.1-py3-none-any.whl (239 kB)
Collecting autorom[accept-rom-license]~=0.4.2
  Downloading AutoROM-0.4.2-py3-none-any.whl (16 kB)
Collecting ale-py==0.7.4
  Downloading ale_py-0.7.4

In [None]:
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy

### **First:** Let's observe what the environment looks like and watch the pole fall over.
- Note: The environment resets every time the pole is just about to fall
- We will see the pole fall over 50 times

Running the cell below will open another window that displays the environment (cart and pole).

In [None]:
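environment_name = "CartPole-v0"
env = gym.make(environment_name)
episodes = 50
for episode in range(1, episodes + 1):
    state = env.reset()  # start a new episode
    done = False
    score = 0

    while not done:
        env.render()  # draw the current frame in a separate window
        action = env.action_space.sample()  # pick a random action (push left or right)
        # gym 0.21 API: step() returns four values
        n_state, reward, done, info = env.step(action)
        score += reward  # accumulate the reward for this episode
    print('Episode:{} Score:{}'.format(episode, score))
env.close()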

### **Second:** Now we let our agent learn how to balance the pole in the environment

**Our Goal:** To keep the pole from wobbling and tipping over as it did above.

What we are now going to do is build an agent and let it play in the environment for a set number of timesteps, spread over many episodes. The agent is built in such a way that it learns from experience whenever the pole falls over. We expect the agent to perform better in later episodes, because in every episode it gathers data itself and learns how to balance the pole.

Running the cell below will take some time as the agent begins to learn.

In [None]:
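env = gym.make(environment_name)
env = DummyVecEnv([lambda: env])  # wrap the environment in a vectorized wrapper
model = PPO('MlpPolicy', env, verbose=1)  # PPO agent with a multilayer-perceptron policy
model.learn(total_timesteps=2000)  # train for 2,000 environment steps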

### **Third:** We observe our trained agent as it balances the pole

In [None]:
evaluate_policy(model, env, n_eval_episodes=15, render=True)
env.close()

From observation we see that the agent is attempting to balance the pole but is not doing a great job. The pole still falls over eventually.

### **Finally:** We let our agent learn for a longer period of time and observe how it performs
- You will notice the cart balances the pole a lot better now that it was allowed to learn for more time
- The environment moves on to the next episode not because the pole falls over but simply because it runs out of time for that episode
- It is also a lot smoother
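
The two ways an episode can end can be sketched as follows. This assumes the settings Gym documents for `CartPole-v0`, which this notebook does not state itself: a reward of +1 for every step and a limit of 200 steps per episode. The function `run_episode` and its parameter `pole_falls_at` are hypothetical stand-ins for the real environment.

```python
def run_episode(pole_falls_at, max_steps=200):
    """Return (score, reason) for one simulated episode.

    pole_falls_at is the hypothetical step at which the pole would fall,
    or None if the agent never lets it fall.
    """
    score = 0
    for step in range(1, max_steps + 1):
        score += 1  # +1 reward for every step the episode lasts
        if pole_falls_at is not None and step >= pole_falls_at:
            return score, "pole fell"
    # The loop finished without the pole falling: the time limit was reached.
    return score, "time limit"


print(run_episode(25))     # a weak agent: the pole falls early
print(run_episode(None))   # a strong agent: the episode is cut off by the time limit
```

Under these assumptions, an agent that balances the pole for the whole episode cannot score more than 200, which is why a well-trained agent's episodes all end the same way.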

In [None]:
env = gym.make(environment_name)
env = DummyVecEnv([lambda: env])
model = PPO('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=20000)  # ten times more training than before
evaluate_policy(model, env, n_eval_episodes=15, render=True)
env.close()