# Introduction to Reinforcement Learning

Reinforcement learning is one of the dominant machine learning paradigms. In it, an agent learns **what to do, that is, how to map** situations to actions, so as to maximize a numerical reward.

![image.png](attachment:image.png)

A reinforcement learning problem consists of an agent in an environment. The agent performs an action and receives a reward from the environment. This reward is usually a measure of how well the agent is doing. The agent's goal is to maximize the total reward it collects from the environment.

# RL vs. Supervised and Unsupervised Learning

You have learned about supervised and unsupervised methods. Let's highlight how they differ from reinforcement learning.


## Supervised Learning

So far, you have worked with supervised learning algorithms, which build models by minimizing a loss over a labeled dataset. In general, these algorithms produce outputs $y$ from inputs $x$ using some function $f$ with parameters $\theta$. Your labeled dataset contains the outputs $\hat{y}$ that you want your model to produce. A loss function $L$ takes the model output $y=f_\theta(x)$ and the target $\hat{y}$ and returns a scalar loss. Choosing $\theta$ so that this loss is minimal gives you the model $f_\theta$ that best fits the data.

This loss function is a form of supervision for your model: it tells the model how bad its predictions are. If the loss function is differentiable, you can use gradient descent or another gradient-based optimization algorithm to find a good $\theta$. The gradient of the loss with respect to the parameters $\theta$ points in the direction in which the loss increases fastest, so moving the parameters in the opposite direction reduces the loss. In other words, the loss function tells you how bad the model is, and it also tells you in which direction to change the parameters to make the model better.
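
As a minimal sketch of this idea, the snippet below fits a one-parameter model $f_\theta(x) = \theta x$ with plain gradient descent. The toy dataset, the mean squared error loss, and the learning rate are assumptions made only for illustration; they are not taken from a specific example in this unit.

```python
# Toy labeled dataset, assumed to follow the relation target = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]  # desired outputs, written \hat{y} in the text


def f(theta, x):
    """Model f_theta(x): a line through the origin with slope theta."""
    return theta * x


def loss(theta):
    """Mean squared error between model outputs and targets."""
    return sum((f(theta, x) - t) ** 2 for x, t in zip(xs, targets)) / len(xs)


def grad(theta):
    """Derivative of the mean squared error with respect to theta."""
    return sum(2 * (f(theta, x) - t) * x for x, t in zip(xs, targets)) / len(xs)


theta = 0.0           # initial parameter guess
learning_rate = 0.05  # step size (hypothetical value)
for step in range(100):
    # Move the parameter against the gradient to reduce the loss.
    theta -= learning_rate * grad(theta)

print("theta:", round(theta, 3), "loss:", round(loss(theta), 6))
```

The gradient supplies both pieces of information described above: its size reflects how far the model is from fitting the data, and its sign tells the update which way to move $\theta$.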


## Unsupervised Learning

We then looked at unsupervised learning algorithms, which aim to find patterns in data without explicit supervision. The supervision in these algorithms comes, in some way, from the structure of the model itself. For example, in autoencoders, the supervision comes from the fact that a bottleneck forces the network to compress information, and a reconstruction loss forces the network to keep the important features when compressing. For this reason, such unsupervised learning methods are sometimes said to be self-supervised.

## Reinforcement Learning

How is RL different, then? In reinforcement learning, an agent optimizes a non-differentiable, stochastic objective by choosing good actions. All the agent gets is a reward signal, which is drawn from some reward distribution. Most of the time, the agent does not know what this distribution looks like. It must learn to perform the best actions despite this uncertainty.
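
The sketch below illustrates this setting under simple assumptions: there are three actions, each paying a reward of 1 with a hidden probability, and the agent only ever sees sampled rewards. The probabilities and the epsilon-greedy strategy are hypothetical choices made for illustration; epsilon-greedy is one simple way to act here, not a method introduced in the text above.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Hidden from the agent: probability that each action pays a reward of 1.
# These numbers are made up for illustration.
TRUE_REWARD_PROBS = [0.2, 0.5, 0.8]


def pull(action):
    """Environment: return a stochastic reward (1 or 0) for the chosen action."""
    return 1.0 if random.random() < TRUE_REWARD_PROBS[action] else 0.0


def run_epsilon_greedy(steps=5000, epsilon=0.1):
    """Estimate each action's mean reward from samples and act mostly greedily."""
    n_actions = len(TRUE_REWARD_PROBS)
    counts = [0] * n_actions
    estimates = [0.0] * n_actions
    for _ in range(steps):
        if random.random() < epsilon:
            action = random.randrange(n_actions)  # explore a random action
        else:
            # Exploit the action with the highest estimated reward so far.
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = pull(action)
        counts[action] += 1
        # Incremental sample average of the rewards seen for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


estimates, counts = run_epsilon_greedy()
print("estimated mean rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

Note that no gradient of the reward is ever computed. The agent improves only by trying actions and averaging the noisy rewards it receives.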

# Intuition for Reinforcement Learning
 
Let's make reinforcement learning concrete with an example. Imagine a baby who is struggling to learn how to walk. The baby goes through several stages while interacting with the environment.
 
![image.png](attachment:image.png)
 
1. First, the baby watches how you walk and tries to imitate it.
2. The baby then realizes that walking is not simple and tries to crawl instead. Once the baby learns how to crawl, the parents give a candy as a reward (positive reward).
3. Next, the baby tries to stand up in order to walk. Ouch! The baby gets hurt, feels pain (negative reward), and starts crying.
4. Now the real challenge begins. The baby learns that, before walking, it is necessary to keep balance while standing.
 
Finally, the baby learns how to walk. At first, the candy reward motivates the child, and the baby then works out the right actions on its own. That is how humans learn: by trial and error. An RL system is conceptually the same. Reinforcement learning means learning what steps to take and how to map different situations to actions, all in an attempt to receive a reward.

# Basic Concepts and Terminology

Some of the basic terms in reinforcement learning are:

![image.png](attachment:image.png)

* Agent: The agent is the decision-making entity.
* Environment: The environment is the world with which the agent interacts.
* Action: An action is one of the possible steps that the agent can take.
* State: The state is the current condition returned by the environment.
* Reward: The reward is the immediate feedback from the environment that evaluates the last action.

<!-- 
* Policy: The policy is the agent's approach to determine the next move/action based on the current state. Naturally, it is an agent's way of behaving at the given time. 
* Value Function = The value function specifies the long term good of an agent. -->

Now, let's look at these terms in a concrete example:

![image-2.png](attachment:image-2.png)

In the figure above, the **agent** is a fox and the **environment** is a square grid world. The fox's **actions** are the moves it can make from one cell to another. The fox is currently in the first row and first column of the grid; this position is its current **state**. The meat is the **reward**.
<!-- Eating maximum meat is likely to have a maximum reward to our agent, i.e., fox. To eat maximum meat, the fox has to go near the opponent, i.e., tiger, which may lead to death. So, the fox decides to eat the meat near to him rather than near to the opponent. This is known as the **policy**. -->
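
The interaction between the fox and the grid world can be sketched as a simple loop. The grid size, the location of the meat, the reward values, and the random action choice below are all assumptions made for illustration; the figure does not specify them.

```python
import random

GRID_SIZE = 3         # assumed size of the square grid world
START_STATE = (0, 0)  # the fox starts in the first row, first column
MEAT_STATE = (2, 2)   # assumed location of the meat
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Environment: apply an action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Moves that would leave the grid are clipped to the boundary.
    row = min(max(state[0] + d_row, 0), GRID_SIZE - 1)
    col = min(max(state[1] + d_col, 0), GRID_SIZE - 1)
    next_state = (row, col)
    done = next_state == MEAT_STATE
    reward = 1.0 if done else 0.0  # reward only when the fox reaches the meat
    return next_state, reward, done


def run_episode(max_steps=200, seed=0):
    """Agent: pick random actions until the meat is found or steps run out."""
    rng = random.Random(seed)
    state, total_reward = START_STATE, 0.0
    for t in range(max_steps):
        action = rng.choice(list(ACTIONS))         # the agent chooses an action
        state, reward, done = step(state, action)  # the environment responds
        total_reward += reward
        if done:
            return state, total_reward, t + 1
    return state, total_reward, max_steps


final_state, total_reward, n_steps = run_episode()
print("final state:", final_state, "total reward:", total_reward, "steps:", n_steps)
```

Each pass through the loop is one round of the cycle described above: the agent picks an action, and the environment returns the new state and a reward. A learning agent would replace the random choice with a rule that uses past rewards.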

# Applications of Reinforcement Learning

1. Solving games

    Games are a popular application for reinforcement learning algorithms. They are fun to work on, and are widely used to benchmark RL algorithms.
    
    Hand-crafting rule-based agents to play games is hard for several reasons. Some games have simple rules, but the way they evolve over time is very complex. For example, chess has simple rules, but the gameplay can get quite complicated. Exhaustive tree search is infeasible in most games because of their branching factor. The average branching factor in chess is about 35 per move, so the number of board states to evaluate grows exponentially with the number of moves you look ahead. Other games, particularly video games, have continuous state spaces, which again makes exhaustive search infeasible.

    Reinforcement Learning approaches can help solve these issues, as we will see later in the unit.

2. Advertisement

    An RL agent can choose which advertisements to show to users online so as to maximize a metric such as click-through rate (CTR).

3. Simulation and Control

    RL can be used to control complex robots and machines, especially when suitable control commands cannot be computed quickly. RL is also used in autonomous vehicles, particularly for vehicle control.

4. Stock trading

    Trading agents can use RL methods to learn to optimize the return from trading stocks.

5. Hyperparameter Optimisation

    The loss is usually not differentiable with respect to the hyperparameters, so they cannot be optimized using gradient descent. Reinforcement learning methods can be used to optimize hyperparameters, with some inverse of the loss as the reward.
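
To get a feel for the branching-factor argument in the games example above, the short calculation below counts how many move sequences an exhaustive search would have to consider at a given depth. It assumes a constant branching factor of 35 at every move, which is a simplification of the average quoted for chess; the function name is hypothetical.

```python
BRANCHING_FACTOR = 35  # average number of moves per position in chess (from the text)


def sequences_at_depth(depth, branching=BRANCHING_FACTOR):
    """Approximate number of move sequences of length `depth`."""
    return branching ** depth


for depth in (1, 2, 4, 6, 8, 10):
    print(f"depth {depth:2d}: about {sequences_at_depth(depth):.2e} sequences")
```

Even at a depth of 10 moves the count is already above $10^{15}$, which is why exhaustive search is not practical here.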









# Key Takeaways

- Reinforcement learning is one of the dominant machine learning paradigms. In it, an agent learns what to do, that is, how to map situations to actions, so as to maximize a numerical reward.

- RL vs. Supervised and Unsupervised Learning
  - Supervised Learning
    - A parametric function
    - Labeled dataset
    - Differentiable loss
  - Unsupervised Learning
    - No explicit supervision
    - Finds structure/patterns in data
  - Reinforcement Learning
    - Optimizes a reward
    - The reward can be stochastic
    - The reward is non-differentiable

- Basic Terminology
  - Agent: The agent is the decision-making entity.
  - Environment: The environment is the world with which the agent interacts.
  - Action: An action is one of the possible steps that the agent can take.
  - State: The state is the current condition returned by the environment.
  - Reward: The reward is the immediate feedback from the environment that evaluates the last action.

- Solving games, advertisement, simulation and control, stock trading, and hyperparameter optimization are some of the applications of reinforcement learning.





# References

* Books
   * Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd edition
       * See Chapter 1, page 8 for an extended example: Tic-Tac-Toe
       * See Chapter 1, page 13 for the early history of reinforcement learning