## <span style="color:#be2d24">OpenAI Gym in Gazebo</span>

### <span style="color:#be2d24">About OpenAI Gym and this simulation</span>

[<img align="right" src="img/logo/openai.png" style="width:324px;height:118px;" />](https://openai.com)

In this course, you will learn how to use OpenAI Gym together with Gazebo, writing your code in [Python](https://www.python.org).

OpenAI is a non-profit artificial intelligence (AI) research company that aims to "freely collaborate" with other institutions and researchers by making its patents and research open to the public. It was founded with a focus on creating AI that has a positive human impact.

In April 2016, OpenAI introduced "Gym," a platform for developing and comparing reinforcement learning algorithms. Reinforcement learning is an area of machine learning in which an intelligent agent (for example, a robot) learns the best behavior in an environment by trial and error. The agent takes actions in the environment so as to maximize its cumulative reward. It is similar to teaching a dog a new trick: reward it when it does the right thing and punish it when it does the wrong thing. The dog eventually learns to behave in the way that keeps the rewards coming.
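
This trial-and-error loop maps onto the interface Gym environments expose: `reset()` returns an initial observation, and `step(action)` returns `(observation, reward, done, info)`. That is the classic Gym API that gym-gazebo was built on; newer Gym and Gymnasium releases changed these signatures. The sketch below uses a hypothetical `ToyCorridorEnv` in place of a Gazebo simulation, so it runs without Gym or ROS installed.

```python
import random


class ToyCorridorEnv(object):
    """Hypothetical stand-in for a simulated environment (not part of Gym or gym-gazebo).

    The agent starts at position 0 and must reach position `length`.
    It follows the classic Gym interface:
    reset() -> observation, step(action) -> (observation, reward, done, info).
    """

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position

    def step(self, action):
        # Action 1 moves toward the goal; action 0 moves back (never below 0).
        if action == 1:
            self.position += 1
        else:
            self.position = max(0, self.position - 1)
        done = self.position >= self.length
        reward = 10 if done else -1  # goal bonus vs. a small cost per step
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Run one episode with `policy` (observation -> action) and return the total reward."""
    observation = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)


def random_policy(observation):
    return rng.choice([0, 1])  # pure trial and error


def forward_policy(observation):
    return 1  # what a trained agent should converge to here


env = ToyCorridorEnv(length=5)
print(run_episode(env, random_policy))
print(run_episode(env, forward_policy))  # 6: four steps at -1, then +10 at the goal
```

A learning algorithm such as Q-learning replaces the fixed policy with one that improves from the rewards returned by `step()`.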

In this example, we will see how a Turtlebot learns to navigate through an environment without hitting obstacles. The Turtlebot uses a reinforcement learning method known as Q-learning.
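
In its standard form, Q-learning keeps a table of estimates $Q(s, a)$ of the long-term reward for taking action $a$ in state $s$. After every step it moves the estimate toward the reward $r$ just received plus the discounted value of the best action available in the next state $s'$, where $\alpha$ is the learning rate and $\gamma$ the discount factor:

$$
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
$$

When a step ends the episode (for example, a collision), there is no next state, and the term $\gamma \max_{a'} Q(s', a')$ is taken as 0. SARSA, which one of the Circuit-2 scripts uses, differs only in replacing $\max_{a'} Q(s', a')$ with $Q(s', a')$ for the action $a'$ the agent actually takes next.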

### <span style="color:#be2d24;">Running the simulation</span>

Four environments are already available for testing your simulations. Each one is launched with its own launch file:

- Circuit: `GazeboCircuitTurtlebotLidar_v0.launch`
- Circuit2: `GazeboCircuit2TurtlebotLidar_v0.launch`
- Round: `GazeboRoundTurtlebotLidar_v0.launch`
- Maze: `GazeboMazeTurtlebotLidar_v0.launch`

<table>
<caption>The various environments already available:</caption>
<colgroup>
<col width="20%" />
<col width="20%" />
</colgroup>
<tbody>
<tr class="odd">
<td align="right">Circuit<br><img src="img/env/circuit.png" style="width:124px;height:218px;" alt="" /></td>
<td align="left">Circuit2<br><img src="img/env/circuit2.png" style="width:124px;height:218px;" alt="" /></td>
</tr>
<tr class="even">
<td align="right">Round<br><img src="img/env/round.png" style="width:124px;height:218px;" alt="" /></td>
<td align="left">Maze<br><img src="img/env/maze.png" style="width:124px;height:218px;" alt="" /></td>
</tr>
</tbody>
</table>

Try out the existing environments before developing your own environments for training the robot. An environment defines the robot's possible actions and the rewards for them. For example, in the available environments, the Turtlebot has three possible actions:

- Forward (with a reward of 5 points)
- Left (with a reward of 1 point)
- Right (with a reward of 1 point)

If the Turtlebot collides with a wall, the training episode ends with a penalty of 200 points (a reward of -200). The Turtlebot has to learn to navigate through the environment from the rewards it collects over many episodes, and the Q-learning algorithm is what makes this possible. Let's see how it works.
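
The reward scheme above can be sketched in a few lines of Python, together with an epsilon-greedy action choice and one tabular Q-learning update. This is a minimal illustration, not the code in the `gym_construct` scripts: the action indices, the state labels `"clear"` and `"near_wall"`, and the values `alpha=0.2` and `gamma=0.8` are assumptions chosen for the example.

```python
import random

# Action indices are an assumption; the rewards follow the scheme described above.
FORWARD, LEFT, RIGHT = 0, 1, 2
ACTIONS = (FORWARD, LEFT, RIGHT)
STEP_REWARD = {FORWARD: 5, LEFT: 1, RIGHT: 1}
COLLISION_PENALTY = -200


def reward_for(action, collided):
    """Per-step reward: -200 on a collision (which ends the episode), else the action's reward."""
    if collided:
        return COLLISION_PENALTY
    return STEP_REWARD[action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise take the best-valued action."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Unseen (state, action) pairs count as 0; ties go to the first action in ACTIONS.
    return max(ACTIONS, key=lambda a: q.get((state, a), 0.0))


def q_update(q, state, action, reward, next_state, done, alpha=0.2, gamma=0.8):
    """Apply one tabular Q-learning update to the dict q and return the new Q(s, a)."""
    old = q.get((state, action), 0.0)
    if done:
        target = reward  # terminal step: no future reward to bootstrap from
    else:
        target = reward + gamma * max(q.get((next_state, a), 0.0) for a in ACTIONS)
    q[(state, action)] = old + alpha * (target - old)
    return q[(state, action)]


# Usage with hypothetical discretized states "clear" and "near_wall":
q = {}
rng = random.Random(0)
print(q_update(q, "clear", FORWARD, reward_for(FORWARD, False), "clear", False))  # 1.0
print(q_update(q, "near_wall", FORWARD, reward_for(FORWARD, True), None, True))   # -40.0
print(choose_action(q, "clear", 0.0, rng))      # 0 (FORWARD)
print(choose_action(q, "near_wall", 0.0, rng))  # 1 (LEFT): forward now looks worse than turning
```

After a single crash, driving forward near a wall already scores below turning, so the greedy choice switches to a turn; repeating this over many episodes is how the Turtlebot learns the track.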

First, add the required directories to Python's module search path, as shown below:

In [1]:
import sys
sys.path.append("/usr/local/lib/python2.7/dist-packages/")
sys.path.append("/home/ubuntu/gym-gazebo")
sys.path.append("/home/user/catkin_ws/src/gym_construct/src")
%matplotlib inline

The Python scripts in the `gym_construct/src/` folder (listed below) simulate the reinforcement learning techniques for a Turtlebot. The number of episodes is currently set to 20.
Feel free to increase the number of episodes in the Python scripts (up to 5000) to actually train the robot to navigate the entire environment.
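
Why do 20 episodes fall short? Q-learning scripts commonly start with a high exploration rate epsilon and shrink it after every episode, so early episodes consist mostly of random moves. Assuming a multiplicative decay (the starting value 0.9, the factor 0.999, and the floor 0.05 below are hypothetical, not read from the `gym_construct` scripts), the exploration rate after a given number of episodes works out as follows:

```python
def epsilon_after(episodes, initial=0.9, decay=0.999, minimum=0.05):
    """Exploration rate after `episodes` episodes of multiplicative decay, clipped at `minimum`."""
    return max(minimum, initial * decay ** episodes)


for n in (20, 500, 5000):
    # 20 episodes: about 0.882 (still mostly random); 5000: clipped at the 0.05 floor
    print(n, round(epsilon_after(n), 3))
```

Under these assumed values, an agent that has run only 20 episodes still picks a random action almost nine times out of ten, which is why longer runs are needed before the learned Q-values drive its behavior.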

Uncomment only the script for the environment you want to run:

In [2]:
## Circuit-1 Environment --> Q-learning
#%run /home/user/catkin_ws/src/gym_construct/src/circuit_turtlebot_lidar_qlearn.py

## Circuit-2 Environment --> Q-learning
#%run /home/user/catkin_ws/src/gym_construct/src/circuit2_turtlebot_lidar_qlearn.py

## Circuit-2 Environment --> SARSA
#%run /home/user/catkin_ws/src/gym_construct/src/circuit2_turtlebot_lidar_sarsa.py

## Round Environment --> Q-learning
%run /home/user/catkin_ws/src/gym_construct/src/round_turtlebot_lidar_test.py

## Maze Environment --> Q-learning
#%run /home/user/catkin_ws/src/gym_construct/src/maze_turtlebot_lidar_qlearn.py



<div>The previous step produced output files and changed the ROS environment, so restart the kernel at this point. <p><img align="left" src="img/env/restart.png"/></p> </div>

<br/>
<p>Then, set up the path again, as shown below.</p>


In [None]:
import sys
sys.path.append("/usr/local/lib/python2.7/dist-packages/")
sys.path.append("/home/ubuntu/gym-gazebo")
sys.path.append("/home/user/catkin_ws/src/gym_construct/src")
%matplotlib inline

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
%run /home/user/catkin_ws/src/gym_construct/src/display_plot.py
plt.show()
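
The contents of `display_plot.py` are not shown here. If you want to inspect a training run yourself, keep in mind that per-episode rewards are usually noisy, and a trailing moving average makes the trend easier to see. A minimal sketch (the function name and default window size are illustrative):

```python
def moving_average(values, window=10):
    """Trailing moving average; the first points average over however many values exist so far."""
    averaged = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value that left the window
        averaged.append(running_sum / min(i + 1, window))
    return averaged


print(moving_average([1, 2, 3, 4], window=2))  # [1.0, 1.5, 2.5, 3.5]
```

Given a list of per-episode rewards from your own run (hypothetically named `episode_rewards`), `plt.plot(moving_average(episode_rewards, 50))` would draw the smoothed learning curve.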

### <span style="color:#be2d24">Define your own environment for training the robot!</span>
<div>
<p>You can define your own environment, with its own robot actions and rewards. </p>
<p>To do so, edit the sample environment <code>gazebo_myenv_turtlebot_lidar.py</code> in the <code>gym-myenv</code> folder and add that folder to the path. </p>
</div>
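
What goes into such an environment? The pure-Python sketch below shows the usual pieces: a discretized laser scan as the state, the three actions, and the reward rules described earlier. It is hypothetical throughout and is not the contents of `gazebo_myenv_turtlebot_lidar.py`. A real gym-myenv environment would subclass the Gym/gym-gazebo base classes and exchange messages with Gazebo through ROS; here the injected callables `read_scan` and `send_action` stand in for that, and the thresholds and beam count are assumed values.

```python
import math

FORWARD, LEFT, RIGHT = 0, 1, 2  # assumed action indices


class MyTurtlebotEnvSketch(object):
    """Hypothetical custom environment skeleton (no Gym, ROS, or Gazebo needed).

    `read_scan()` stands in for reading laser ranges from the simulator and
    `send_action(action)` for publishing a velocity command; both are assumptions.
    """

    action_rewards = {FORWARD: 5, LEFT: 1, RIGHT: 1}  # edit these to reshape behavior
    collision_penalty = -200
    min_range = 0.2  # metres; any reading closer than this counts as a collision (assumed)
    max_range = 6.0  # value substituted for 'inf' readings (assumed)
    num_beams = 5    # number of beams kept in the state (assumed)

    def __init__(self, read_scan, send_action):
        self.read_scan = read_scan
        self.send_action = send_action

    def discretize(self, ranges):
        """Downsample the scan to `num_beams` integer ranges and detect a collision."""
        step = max(1, len(ranges) // self.num_beams)
        state = []
        for r in ranges[::step][:self.num_beams]:
            if math.isinf(r):
                state.append(int(self.max_range))
            elif math.isnan(r):
                state.append(0)
            else:
                state.append(int(r))
        collided = any(0 < r < self.min_range for r in ranges)
        return tuple(state), collided

    def reset(self):
        state, _ = self.discretize(self.read_scan())
        return state

    def step(self, action):
        self.send_action(action)
        state, collided = self.discretize(self.read_scan())
        reward = self.collision_penalty if collided else self.action_rewards[action]
        return state, reward, collided, {}  # a collision ends the episode


# Usage with fake 10-beam scans instead of a simulator:
scans = iter([[1.5] * 10, [2.7] * 10, [0.1] + [1.0] * 9])
env = MyTurtlebotEnvSketch(read_scan=lambda: next(scans), send_action=lambda a: None)
print(env.reset())        # (1, 1, 1, 1, 1)
print(env.step(FORWARD))  # ((2, 2, 2, 2, 2), 5, False, {})
print(env.step(LEFT))     # ((0, 1, 1, 1, 1), -200, True, {})
```

Changing `action_rewards`, `collision_penalty`, or the way the scan is discretized is the kind of edit that gives a new environment its own behavior.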

In [None]:
sys.path.append("/home/user/catkin_ws/src/gym-myenv")

Make sure that the Python script in `gym_construct/src/` that loads the environment `GazeboMyenvTurtlebotLidar_v0` contains the line
```
import gym_myenv
```

Then run the Python script:

```
## Myenv Environment --> Q-learning
%run /home/user/catkin_ws/src/gym_construct/src/myenv_turtlebot_lidar_qlearn.py
``` 
That's all. Now it's up to you!

### <span style="color:#be2d24">Credits</span>

Simulation - gazebo_gym (http://wiki.ros.org/kobuki_gazebo)