# Getting Familiar with Gym

`Gym` serves as an interface for RL experiments, providing standardized APIs to interact with a variety of environments. `MuJoCo`, on the other hand, provides the physics engine that underpins many of these environments.

* `Gym` is a standardized API for reinforcement learning that comes with a broad collection of reference environments. It is designed to be simple and Pythonic while still able to represent general RL problems. Users can create an environment, reset it, execute actions chosen by a user-defined policy, and reset it again when an episode terminates or is truncated. See the [Gym Documentation](https://www.gymlibrary.dev/index.html) for details.
* `MuJoCo`, which stands for Multi-Joint dynamics with Contact, is a high-performance physics engine for simulating complex dynamic systems, with a focus on robotics, biomechanics, and other areas that require fast and accurate simulation. DeepMind open-sourced it in 2022, making it freely available. `Gym` provides several environments built on it, such as `Ant`, `HalfCheetah`, `Hopper`, and `Humanoid`. You can see more in the [Gym Documentation on MuJoCo](https://www.gymlibrary.dev/environments/mujoco/index.html).

In this tutorial, you'll learn how to use `Gym`'s interface with `MuJoCo` environments and a few others. `Gym` is designed to be general, so you can switch between environments with few changes to your code.

## Installation

Before we dive into the code, we strongly recommend that you read `installation.md` and follow the installation procedure. If you followed it correctly, the requirements listed below are already installed.

In [None]:
!pip install mujoco==2.2.0
!pip install gym==0.25.2
!pip install tensorboard==2.10.0
!pip install tensorboardX==2.5.1
!pip install matplotlib==3.5.3
!pip install ipython==7.34.0
!pip install moviepy==1.0.3
!pip install pyvirtualdisplay==3.0
!pip install torch==1.13.1
!pip install opencv-python==4.6.0.66
!pip install ipdb==0.13.9
!pip install swig==4.0.2
!pip install box2d-py==2.3.8
!pip install mediapy

!pip install "gym[classic_control,toy_text]==0.25.2"

In [None]:
import gym
import mediapy as media
import matplotlib.pyplot as plt

## Creating a Gym Environment

Gym is our gateway to various reinforcement learning environments, including MuJoCo.

You can create a Gym environment with `gym.make`. Here's how to do this with the `Ant-v4` environment, which is built on `MuJoCo`.

The following code also uses `env.reset` and `env.render`:
- `env.reset()`: This method resets the environment to its initial state, and returns the observation of the environment corresponding to the initial state.
- `env.render(mode="rgb_array")`: This method renders the current state as an image.

In [None]:
# Initialize the Ant environment
env_name = "Ant-v4"
env = gym.make(env_name, terminate_when_unhealthy=False)
print(f"Initialized environment: {env_name}")

initial_obs = env.reset()
rendered_state = env.render(mode="rgb_array")
plt.imshow(rendered_state)

The `Ant-v4` environment simulates a quadruped robot in 3D space. The ant consists of one torso (a free rotational body) with four legs attached to it, each leg having two links. The goal is to coordinate the four legs to move in the forward (right) direction by applying torques on the eight hinges connecting the two links of each leg and the torso (nine parts and eight hinges).

The environment defines two attributes, `observation_space` and `action_space`.
The code below prints both.
The observation space of `Ant-v4` has 27 dimensions, and the action space has 8 dimensions.
To find out what each dimension of the observations and actions means, see the environment's documentation:
[Ant Documentation](https://www.gymlibrary.dev/environments/mujoco/ant/)

In [None]:
obs_space = env.observation_space
action_space = env.action_space
print(f"The observation space: {obs_space}")
print(f"The action space: {action_space}")
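
To make the idea of a bounded continuous action space concrete, here is a minimal sketch that does not use Gym at all. `SimpleBox` is a hypothetical stand-in written for this tutorial, not `gym.spaces.Box`, and the bounds of -1 to 1 per dimension are an assumption taken from the linked Ant documentation.

```python
import random


class SimpleBox:
    """Hypothetical stand-in for a bounded continuous space (not gym.spaces.Box)."""

    def __init__(self, low, high, size):
        self.low = low
        self.high = high
        self.size = size

    def sample(self):
        # Draw each dimension independently and uniformly within the bounds.
        return [random.uniform(self.low, self.high) for _ in range(self.size)]

    def contains(self, x):
        # A point belongs to the space if it has the right length and is in bounds.
        return len(x) == self.size and all(self.low <= v <= self.high for v in x)


# Assumed shape of the Ant action space: 8 torques, each in [-1, 1].
ant_like_actions = SimpleBox(low=-1.0, high=1.0, size=8)
action = ant_like_actions.sample()
print(len(action), ant_like_actions.contains(action))
```

The real `action_space` object offers the same two operations, `sample()` and `contains()`, which is why the same code can drive very different environments.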

As discussed above, `gym.make` can create many other environments through the same general interface.

Here are some examples that are not based on `MuJoCo` but illustrate the different kinds of observation and action spaces that are possible:

In [None]:
def create_and_visualize_gym_env(env_name):
    env = gym.make(env_name)
    env.reset()
    rendered_image = env.render(mode="rgb_array")
    plt.title(env_name)
    plt.imshow(rendered_image)
    plt.show()

    print(f"The observation space of {env_name}: {env.observation_space}")
    print(f"The action space of {env_name}: {env.action_space}")
    return env

create_and_visualize_gym_env("CartPole-v1")

`CartPole-v1` has a discrete action space, where the agent can choose to push the cart to the left or to the right.
The goal of the agent is to keep the pole balanced on the cart for as long as possible.

In [None]:
create_and_visualize_gym_env("FrozenLake-v1")

The `FrozenLake-v1` environment uses a 4x4 grid map by default. The observation is a single integer that identifies the tile the agent currently stands on.
In this environment, the agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

You can see that `FrozenLake-v1` has a discrete observation space as well as a discrete action space.
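
A discrete observation in a grid world is just a tile index. The sketch below shows one way to convert between that index and a (row, column) pair, assuming the default 4x4 map and row-major numbering (index = row * number of columns + column). The helper names `state_to_cell` and `cell_to_state` are hypothetical and not part of Gym.

```python
def state_to_cell(state, n_cols=4):
    """Convert a tile index into (row, col), assuming row-major numbering."""
    return divmod(state, n_cols)


def cell_to_state(row, col, n_cols=4):
    """Convert (row, col) back into a tile index."""
    return row * n_cols + col


print(state_to_cell(0))     # (0, 0): the top-left tile
print(state_to_cell(15))    # (3, 3): the bottom-right tile
print(cell_to_state(1, 1))  # 5
```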

## Interacting with the Environment

To interact with the environment, you can use the methods of the `Env` class. Here are the two most important ones.
- `reset`: This function resets the environment to its initial state, and returns the observation of the environment corresponding to the initial state.
- `step`: This function takes an action as input and applies it to the environment, which transitions to a new state. The `step` function returns four things:

1. `observation`: The observation of the environment's state after the action is applied.
2. `reward`: The reward the environment gives for executing the action passed to `step`.
3. `done`: Whether the episode has ended. If true, you should stop the simulation or reset the environment to start a new episode.
4. `info`: Additional environment-specific information, such as the number of lives left, or general information that is useful for debugging.
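
The `reset`/`step` contract above can be sketched without Gym. `CountdownEnv` below is a hypothetical toy class written only for illustration. It mimics the interface (an observation from `reset`, a four-element tuple from `step`) but is not a real Gym environment.

```python
import random


class CountdownEnv:
    """Hypothetical toy environment; it only mimics the reset/step interface."""

    def __init__(self, start=3):
        self.start = start
        self.remaining = start

    def reset(self):
        # Return the observation that corresponds to the initial state.
        self.remaining = self.start
        return self.remaining

    def step(self, action):
        # Every action moves the counter one step closer to zero.
        self.remaining -= 1
        observation = self.remaining
        reward = 1.0 if action == 1 else 0.0
        done = self.remaining == 0
        info = {"remaining": self.remaining}
        return observation, reward, done, info


toy_env = CountdownEnv(start=3)
observation = toy_env.reset()
done = False
num_steps = 0
while not done:
    action = random.choice([0, 1])  # stand-in for env.action_space.sample()
    observation, reward, done, info = toy_env.step(action)
    num_steps += 1
print(f"Episode ended after {num_steps} steps; last info: {info}")
```

The loop stops as soon as `done` is true, which is the same check a rollout over a real environment needs.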


Let's test this with a short snippet that takes a single environment step with a random action. Note that you can sample a random action from the action space using `env.action_space.sample()`.

In [None]:
# Create the environment
env = gym.make("Ant-v4", terminate_when_unhealthy=False)

# Reset the environment and see the initial observation
obs = env.reset()
print("The initial observation is {}".format(obs))

# Sample a random action from the entire action space
random_action = env.action_space.sample()

# Take the action and get the new observation
new_obs, reward, done, info = env.step(random_action)
print("The new observation is {}".format(new_obs))
print("The reward is {}".format(reward))
print("Has the episode ended?: {}".format(done))
print("Additional information: {}".format(info))


In [None]:
rendered_image = env.render(mode="rgb_array")
plt.imshow(rendered_image)

Now that you know how to take an environment step, let's collect and visualize trajectories from the environment with a random policy.

In reinforcement learning, an `episode` is a sequence of steps taken by an agent from the initial state to a terminal state.
We will now simulate the environment under a random policy, capturing a frame at each step. The `run_episode` function runs for at most `frames_per_episode` steps, stopping early if the episode ends, and returns the rendered frames.

In [None]:
# Define parameters for our simulation
frames_per_episode = 300  # Number of steps (frames) per episode

In [None]:
# Function to simulate an episode with a random policy
def run_episode(env, frames_per_episode):
    frames = []  # Collect frames for video rendering
    env.reset()
    for _ in range(frames_per_episode):
        frames.append(env.render(mode="rgb_array"))
        action = env.action_space.sample()  # Random action
        new_obs, reward, done, info = env.step(action)
        if done:
            break
    return frames

# Visualize one episode with a random policy
frames = run_episode(env, frames_per_episode)

media.show_video(frames, fps=30)
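
Besides the frames, it is often useful to know how much reward an episode collected. The variant below is a sketch: `run_episode_with_return` is a hypothetical helper, not part of Gym, and it assumes the same four-value `step` return and `render(mode="rgb_array")` call used elsewhere in this tutorial.

```python
def run_episode_with_return(env, max_steps):
    """Random-policy rollout that also sums the rewards (hypothetical helper)."""
    frames = []           # Rendered frames, one per step taken
    episode_return = 0.0  # Undiscounted sum of rewards
    env.reset()
    for _ in range(max_steps):
        frames.append(env.render(mode="rgb_array"))
        action = env.action_space.sample()  # Random action
        _, reward, done, _ = env.step(action)
        episode_return += reward
        if done:
            break
    return frames, episode_return
```

With the `env` created above, it could be called as `frames, episode_return = run_episode_with_return(env, frames_per_episode)`.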

You can reset the environment at any time using `env.reset()`. Note that the return value of `reset` is the observation of the initial state.

In [None]:
ob = env.reset()
print("initial observation: {}\n".format(ob))
rendered_image = env.render(mode="rgb_array")
media.show_image(rendered_image)

Now let's visualize random trajectories in the other environments we discussed earlier.

In [None]:
env = gym.make("CartPole-v1")
ob = env.reset()

# Visualize 10 episodes with a random policy
whole_frames = []
for _ in range(10):
    frames = run_episode(env, frames_per_episode=100)
    whole_frames.extend(frames)

media.show_video(whole_frames, fps=30)

In [None]:
env = gym.make("FrozenLake-v1")
ob = env.reset()

# Visualize 10 episodes with a random policy
whole_frames = []
for _ in range(10):
    frames = run_episode(env, frames_per_episode=100)
    whole_frames.extend(frames)

media.show_video(whole_frames, fps=10)

## Conclusion

You have now learned how to create and visualize a Gym environment and how to simulate an episode with a random policy! Feel free to try other environments from the [Gym documentation](https://www.gymlibrary.dev/environments/mujoco/).