# gymnasium Custom Environment - Create a custom environment

The gymnasium package includes a collection of environments for testing Reinforcement Learning (RL) algorithms. For example, the previous notebook used the FrozenLake environment to test a TD-learning method. While these environments are great testbeds, we often want to customize a provided environment to see how an agent behaves under different conditions. It is also of great interest to create our own custom environment and test our algorithms on it.
gymnasium provides an easy way to do both. In this series of notebooks, we will learn
- How to edit an existing environment in gymnasium (last notebook)
- How to create a custom environment with gymnasium (this notebook)

In this notebook, we will create a fun environment to play Pokemon Red Game. This is motivated by [this cool work](https://www.youtube.com/watch?v=DcYLT37ImBY&t=1741s) by Peter Whidden and [another work](https://github.com/Baekalfen/PyBoy) by Asger Anders Lund Hansen, Mads Ynddal, and Troels Ynddal. The code is mainly adapted from [Peter's git repository](https://github.com/PWhiddy/PokemonRedExperiments) but simplified to convey the key points to define a custom environment. 

## 4 necessary functions to define a custom environment

As reviewed in the previous notebook, a gymnasium environment has four key functions, listed below (obtained from the [official documentation](https://gymnasium.farama.org/api/env/)):
- reset() : Resets the environment to an initial state, required before calling step. Returns the first agent observation for an episode and information, i.e. metrics, debug info.

- step() : Updates an environment with actions returning the next agent observation, the reward for taking that actions, if the environment has terminated or truncated due to the latest action and information from the environment about the step, i.e. metrics, debug info.

- render() : Renders the environments to help visualise what the agent see, examples modes are “human”, “rgb_array”, “ansi” for text.

- close() : Closes the environment, important when external software is used, i.e. pygame for rendering, databases

When designing a custom environment, we inherit from the "Env" class of gymnasium. Then, we re-define these four functions based on our needs. Inheriting from the "Env" class is crucial because it:

- provides access to a rich set of base functionalities and utilities within the Gymnasium library, such as methods for seeding randomness.
- ensures that the custom environment adheres to the Gymnasium framework's standardized interface, allowing it to be used interchangeably with other Gym environments.
- facilitates the integration with other Gymnasium tools and plugins, enhancing the environment's capabilities and simplifying the development and testing process.

By inheriting from the Env class, we can focus on defining the unique aspects of our custom environment such as its observation space, action space, and dynamics, while leveraging the established infrastructure provided by Gymnasium for simulation control, rendering, and interaction with learning algorithms.


## Pokemon Red Game environment

Let's start defining the pokemon environment.

To create a Pokemon Red Game environment, we use a Python-based Game Boy emulator called [PyBoy](https://github.com/Baekalfen/PyBoy).
In the Pokemon Red Game environment, there are 7 commands (i.e. actions) the agent can use to explore the world:
- Press arrow up
- Press arrow down
- Press arrow right 
- Press arrow left
- Press A button
- Press B button
- Press start button

These are the same commands we use to play Pokemon Red ourselves! The state an agent can be in is defined by the game map. The observation is the game screen, a 144x160x3 array of pixel values (i.e. one 144x160 image for each of the R, G, and B channels). We can design the reward function however we want. For simplicity, in this tutorial we define the reward as the sum of the levels of all pokemons caught so far. (Note that training an agent to actually play Pokemon Red requires a more sophisticated reward function.)
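
The reward described above can be sketched in a few lines. The function names and the level values are made up for illustration, and `level_gain_reward` is only one hypothetical step toward a more sophisticated reward, not the one used in Peter's repository.

```python
def level_sum_reward(levels):
    """Reward described in the text: the sum of all Pokemon levels."""
    return sum(levels)


def level_gain_reward(previous_levels, current_levels):
    """Hypothetical variant: reward only the levels gained since the last step."""
    return sum(current_levels) - sum(previous_levels)


# Made-up level values for six Pokemon slots, for illustration only
before = [6, 3, 0, 0, 0, 0]
after = [7, 3, 0, 0, 0, 0]

print(level_sum_reward(after))           # 10
print(level_gain_reward(before, after))  # 1
```

The first version returns the same large value on every step, while the second is non-zero only when something actually changes.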


## Initialize the environment

As I mentioned above, we will create our class by inheriting from the Env class of gymnasium. Then, we will implement the four essential functions (reset, step, render, and close) for our new custom class. Before defining these functions, we will first look at the initialization. This initialization process is invoked when the environment is first created, and it establishes the key characteristics of the environment.

During initialization, we define several critical aspects:
- Action space: A set of all possible actions that an agent can take in the environment. It's a way to outline what actions are available for the agent to choose from at any given step

- Observation space: A size or shape of the observations that the agent receives from the environment. Essentially, it describes the form and structure of the data the agent uses to make decisions

- Action frequency: The number of emulator frames between two consecutive actions. In this tutorial, we apply one action every 24 frames. This setting controls the pace at which the agent can act within the game environment

- PyBoy object: An object to interface with the actual game environment provided by PyBoy. It acts as the bridge between our custom gymnasium environment and the Game Boy game we aim to interact with

- Initial state: A starting state of the agent when the environment is initialized. For the purpose of this tutorial, we will set the initial state to be the moment after choosing the first pokemon, as demonstrated in Peter Whidden's work

In [2]:
import sys

import numpy as np
import matplotlib.pyplot as plt
from pyboy import PyBoy

from gymnasium import Env, spaces
from pyboy.utils import WindowEvent
from skimage.transform import resize
from IPython.display import clear_output


In [3]:
class RedGymEnv(Env):
    def __init__(self, config):
        super().__init__()  # initialize the parent Env class
        # Define action space
        self.valid_actions = [
            WindowEvent.PRESS_ARROW_DOWN,
            WindowEvent.PRESS_ARROW_LEFT,
            WindowEvent.PRESS_ARROW_RIGHT,
            WindowEvent.PRESS_ARROW_UP,
            WindowEvent.PRESS_BUTTON_A,
            WindowEvent.PRESS_BUTTON_B,
        ]
        self.action_space = spaces.Discrete(len(self.valid_actions))
        
        self.valid_actions.extend([
            WindowEvent.PRESS_BUTTON_START,
            WindowEvent.PASS
        ])

        self.release_arrow = [
            WindowEvent.RELEASE_ARROW_DOWN,
            WindowEvent.RELEASE_ARROW_LEFT,
            WindowEvent.RELEASE_ARROW_RIGHT,
            WindowEvent.RELEASE_ARROW_UP
        ]

        self.release_button = [
            WindowEvent.RELEASE_BUTTON_A,
            WindowEvent.RELEASE_BUTTON_B
        ]
        
        # Define observation space
        self.output_shape = (144, 160, 1)
        self.output_full_shape = (144, 160, 3) # 3: RGB
        self.observation_space = spaces.Box(low=0, high=255, shape=self.output_full_shape, dtype=np.uint8)

        # Define action frequency
        self.act_freq = config['action_freq']

        # Create pyboy object
        head = 'SDL2'
        self.pyboy = PyBoy(
                config['gb_path'],
                debugging=False,
                disable_input=False,
                window_type=head,
                hide_window='--quiet' in sys.argv,
            )

        # Initialize the state
        self.init_state = config['init_state']
        with open(self.init_state, "rb") as f:
            self.pyboy.load_state(f)  

        # Initialize a generator of a game image
        self.screen = self.pyboy.botsupport_manager().screen()

        # Initialize variables to monitor agent's state and reward        
        self.agent_stats = []
        self.total_reward = 0

## Render the environment

Next, we will define a render function. This function returns the pixel values of the game screen at any given moment. The screen array returned by PyBoy has shape (144, 160, 3), i.e. the 144x160 resolution of the Game Boy's display with three color channels (RGB).

In [4]:
def render(self):
    game_pixels_render = self.screen.screen_ndarray() # (144, 160, 3)
    return game_pixels_render
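
The frame returned by render is a plain NumPy array, so it can be post-processed like any other image. As a hedged example, the sketch below averages the three color channels into one, which gives the (144, 160, 1) shape stored in `self.output_shape`. The `to_grayscale` helper and the constant-colored test frame are assumptions for illustration; the environment in this notebook returns the full RGB frame.

```python
import numpy as np


def to_grayscale(frame):
    """Average the RGB channels into a single uint8 channel (keeps a trailing axis)."""
    gray = frame.mean(axis=2, keepdims=True)
    return gray.astype(np.uint8)


# Placeholder frame standing in for the output of render()
frame = np.full((144, 160, 3), 200, dtype=np.uint8)
frame[..., 0] = 100  # red channel

gray = to_grayscale(frame)
print(gray.shape)  # (144, 160, 1)
print(gray.dtype)  # uint8
```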


## Reset the environment

Next, we will define a reset function. When we run multiple episodes of simulation, we call the reset function at the beginning of each episode to reset the environment to a predefined initial state. Note that the initialization function (__init__) is called only once, when the environment is first created. After that, the reset function is called at the beginning of each new episode, ensuring that each episode starts from a consistent state.
Within this function, for our specific case, we will initialize the state and the total reward value as follows:

In [5]:
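def reset(self, seed=None, options=None):
    # follow gymnasium's reset signature and seed the parent class
    super().reset(seed=seed)

    # restart game, skipping credits
    with open(self.init_state, "rb") as f:
        self.pyboy.load_state(f)  
    
    # reset reward value
    self.total_reward = 0
    return self.render(), {}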

## Take a step in the environment

Next, we will define the step function, which takes an action as its argument. This function moves the agent based on the specified action and returns the new state, the obtained reward, and whether the episode is terminated or truncated. For simplicity, we don't consider any termination or truncation condition in this implementation. Thus, the episode ends only when we stop the execution of the code.


In [6]:
def step(self, action):
    
    # take an action
    # press button
    self.pyboy.send_input(self.valid_actions[action])
    for i in range(self.act_freq):
        # release action not to keep taking the action
        if i == 8:
            if action < 4:
                # release arrow
                self.pyboy.send_input(self.release_arrow[action])
            if action > 3 and action < 6:
                # release button 
                self.pyboy.send_input(self.release_button[action - 4])
            if self.valid_actions[action] == WindowEvent.PRESS_BUTTON_START:
                self.pyboy.send_input(WindowEvent.RELEASE_BUTTON_START)
                
        # render pyBoy image at the last frame of each block
        if i == self.act_freq-1:
            self.pyboy._rendering(True)
        self.pyboy.tick()

    # store the new agent state obtained from the corresponding memory address
    # memory addresses from https://datacrystal.romhacking.net/wiki/Pok%C3%A9mon_Red/Blue:RAM_map
    X_POS_ADDRESS, Y_POS_ADDRESS = 0xD362, 0xD361
    LEVELS_ADDRESSES = [0xD18C, 0xD1B8, 0xD1E4, 0xD210, 0xD23C, 0xD268]    
    x_pos = self.pyboy.get_memory_value(X_POS_ADDRESS)
    y_pos = self.pyboy.get_memory_value(Y_POS_ADDRESS)
    levels = [self.pyboy.get_memory_value(a) for a in LEVELS_ADDRESSES]
    self.agent_stats.append({
        'x': x_pos, 'y': y_pos, 'levels': levels
    })

    # store the new screen image (i.e. new observation) and reward    
    obs_memory = self.render()
    new_reward = sum(levels)  # reward: sum of the levels of all pokemons
    
    # for simplicity, don't handle terminated or truncated conditions here
    terminated = False # no terminal state is defined
    truncated = False # no max number of steps

    return obs_memory, new_reward, terminated, truncated, {}
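
The press-and-release timing inside step can be easier to follow in isolation. The sketch below replaces PyBoy with a hypothetical `FakeEmulator` that only records the inputs it receives, and uses plain strings instead of `WindowEvent` values; the frame numbers (release at frame 8, 24 frames per action) mirror the code above.

```python
class FakeEmulator:
    """Stand-in for PyBoy that only records what it is asked to do."""

    def __init__(self):
        self.frame = 0
        self.log = []

    def send_input(self, event):
        self.log.append((self.frame, event))

    def tick(self):
        self.frame += 1  # advance the game by one frame


def press_and_release(emulator, press, release, act_freq=24, release_frame=8):
    """Press at frame 0, release at `release_frame`, then idle until `act_freq`."""
    emulator.send_input(press)
    for i in range(act_freq):
        if i == release_frame:
            emulator.send_input(release)
        emulator.tick()


emu = FakeEmulator()
press_and_release(emu, "PRESS_A", "RELEASE_A")
print(emu.log)    # [(0, 'PRESS_A'), (8, 'RELEASE_A')]
print(emu.frame)  # 24
```

Each call to step therefore advances the game by `act_freq` frames, with the button held down only for the first 8 of them.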

## Close the environment

Lastly, we will define a close function to ensure proper cleanup of any resources used during the simulation. We will reuse the close function from the parent class and, in addition, include code that terminates the PyBoy session.

In [7]:
def close(self):
    self.pyboy.stop() # terminate pyboy session
    super().close() # call close function of parent's class
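
Because close releases the emulator, it should run even when an error interrupts the simulation. One way to guarantee this, shown here as a sketch, is the standard library's `contextlib.closing`, which calls `close()` on any object when the `with` block exits. `DummyEnv` is a hypothetical stand-in that fails on purpose.

```python
from contextlib import closing


class DummyEnv:
    """Stand-in environment whose step always fails, to show that close still runs."""

    def __init__(self):
        self.closed = False

    def step(self, action):
        raise RuntimeError("emulator crashed")

    def close(self):
        self.closed = True


env = DummyEnv()
try:
    with closing(env):
        env.step(0)
except RuntimeError as err:
    print("caught:", err)  # caught: emulator crashed

print(env.closed)  # True
```

A try/finally block that calls `env.close()` in the finally clause achieves the same thing.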

## Integrate all functions and define a whole class

Let's integrate all functions, define the whole RedGymEnv class, and test our implementation!

In [8]:
class RedGymEnv(Env):
    def __init__(self, config):
        super().__init__()  # initialize the parent Env class
        # Define action space
        self.valid_actions = [
            WindowEvent.PRESS_ARROW_DOWN,
            WindowEvent.PRESS_ARROW_LEFT,
            WindowEvent.PRESS_ARROW_RIGHT,
            WindowEvent.PRESS_ARROW_UP,
            WindowEvent.PRESS_BUTTON_A,
            WindowEvent.PRESS_BUTTON_B,
        ]
        self.action_space = spaces.Discrete(len(self.valid_actions))
        
        self.valid_actions.extend([
            WindowEvent.PRESS_BUTTON_START,
            WindowEvent.PASS
        ])

        self.release_arrow = [
            WindowEvent.RELEASE_ARROW_DOWN,
            WindowEvent.RELEASE_ARROW_LEFT,
            WindowEvent.RELEASE_ARROW_RIGHT,
            WindowEvent.RELEASE_ARROW_UP
        ]

        self.release_button = [
            WindowEvent.RELEASE_BUTTON_A,
            WindowEvent.RELEASE_BUTTON_B
        ]
        
        # Define observation space
        self.output_shape = (144, 160, 1)
        self.output_full_shape = (144, 160, 3) # 3: RGB
        self.observation_space = spaces.Box(low=0, high=255, shape=self.output_full_shape, dtype=np.uint8)

        # Define action frequency
        self.act_freq = config['action_freq']

        # Create pyboy object
        head = 'SDL2'
        self.pyboy = PyBoy(
                config['gb_path'],
                debugging=False,
                disable_input=False,
                window_type=head,
                hide_window='--quiet' in sys.argv,
            )

        # Initialize the state
        self.init_state = config['init_state']
        with open(self.init_state, "rb") as f:
            self.pyboy.load_state(f)  

        # Initialize a generator of a game image
        self.screen = self.pyboy.botsupport_manager().screen()

        # Initialize variables to monitor agent's state and reward        
        self.agent_stats = []
        self.total_reward = 0
        
    def render(self):
        game_pixels_render = self.screen.screen_ndarray() # (144, 160, 3)
        return game_pixels_render


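    def reset(self, seed=None, options=None):
        # follow gymnasium's reset signature and seed the parent class
        super().reset(seed=seed)

        # restart game, skipping credits
        with open(self.init_state, "rb") as f:
            self.pyboy.load_state(f)  
        
        # reset reward value
        self.total_reward = 0
        return self.render(), {}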
    
    def step(self, action):
        
        # take an action
        # press button
        self.pyboy.send_input(self.valid_actions[action])
        for i in range(self.act_freq):
            # release action not to keep taking the action
            if i == 8:
                if action < 4:
                    # release arrow
                    self.pyboy.send_input(self.release_arrow[action])
                if action > 3 and action < 6:
                    # release button 
                    self.pyboy.send_input(self.release_button[action - 4])
                if self.valid_actions[action] == WindowEvent.PRESS_BUTTON_START:
                    self.pyboy.send_input(WindowEvent.RELEASE_BUTTON_START)
                    
            # render pyBoy image at the last frame of each block
            if i == self.act_freq-1:
                self.pyboy._rendering(True)
            self.pyboy.tick()

        # store the new agent state obtained from the corresponding memory address
        # memory addresses from https://datacrystal.romhacking.net/wiki/Pok%C3%A9mon_Red/Blue:RAM_map
        X_POS_ADDRESS, Y_POS_ADDRESS = 0xD362, 0xD361
        LEVELS_ADDRESSES = [0xD18C, 0xD1B8, 0xD1E4, 0xD210, 0xD23C, 0xD268]    
        x_pos = self.pyboy.get_memory_value(X_POS_ADDRESS)
        y_pos = self.pyboy.get_memory_value(Y_POS_ADDRESS)
        levels = [self.pyboy.get_memory_value(a) for a in LEVELS_ADDRESSES]
        self.agent_stats.append({
            'x': x_pos, 'y': y_pos, 'levels': levels
        })

        # store the new screen image (i.e. new observation) and reward    
        obs_memory = self.render()
        new_reward = sum(levels)  # reward: sum of the levels of all pokemons
        
        # for simplicity, don't handle terminated or truncated conditions here
        terminated = False # no terminal state is defined
        truncated = False # no max number of steps

        return obs_memory, new_reward, terminated, truncated, {}

    def close(self):
        self.pyboy.stop() # terminate pyboy session
        super().close() # call close function of parent's class
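
The class above never ends an episode on its own. If a step limit is needed, one option is a small wrapper that counts steps and sets the truncated flag, sketched below. `StepLimit` and `EndlessEnv` are hypothetical names written for this example; gymnasium also ships a ready-made `gymnasium.wrappers.TimeLimit` wrapper with a `max_episode_steps` argument that serves the same purpose.

```python
class StepLimit:
    """Hypothetical wrapper: reports truncated=True after `max_steps` steps."""

    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self, **kwargs):
        self.steps = 0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.steps += 1
        truncated = truncated or self.steps >= self.max_steps
        return obs, reward, terminated, truncated, info


class EndlessEnv:
    """Stub with the same return shapes as RedGymEnv, but no emulator behind it."""

    def reset(self, seed=None, options=None):
        return 0, {}

    def step(self, action):
        return 0, 0.0, False, False, {}


env = StepLimit(EndlessEnv(), max_steps=3)
obs, info = env.reset()
steps_taken = 0
truncated = False
while not truncated:
    obs, reward, terminated, truncated, info = env.step(0)
    steps_taken += 1
print(steps_taken)  # 3
```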
            

## Visualize the current state

In the code below, after initializing the environment, we take a random action for 30 steps and visualize the Pokemon game screen using the render function.
Since we need a Pokemon Red ROM file to run this environment, we cannot run it on Kaggle. Here is how you can run the code below:
1. Download this notebook
2. Legally obtain a Pokemon Red ROM file (you can find this using Google)
3. Download the has_pokedex_nballs.state file from [this github repository](https://github.com/PWhiddy/PokemonRedExperiments)
4. Update the two path variables below based on where each file is on your machine
5. Uncomment the cell below
6. Ready to run!
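
A wrong path is the most likely problem when following these steps, so it can help to check both files before creating the environment. The `build_env_config` helper below is a hypothetical convenience function, not part of PyBoy or gymnasium; it only builds the same config dictionary that RedGymEnv expects.

```python
from pathlib import Path


def build_env_config(rom_path, init_state_path, action_freq=24):
    """Hypothetical helper: verify both files exist, then build the config dict."""
    for label, path in (("ROM", rom_path), ("initial state", init_state_path)):
        if not path or not Path(path).is_file():
            raise FileNotFoundError(f"{label} file not found: {path!r}")
    return {
        "action_freq": action_freq,
        "init_state": str(init_state_path),
        "gb_path": str(rom_path),
    }


# With the empty placeholder paths, the helper fails early with a clear message
try:
    build_env_config("", "")
except FileNotFoundError as err:
    print(err)  # ROM file not found: ''
```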

In [9]:
ROM_PATH = ""
INIT_STATE_FILE_PATH = ""

In [10]:
# env_config = {
#             'action_freq': 24, 'init_state': INIT_STATE_FILE_PATH,
#             'gb_path': ROM_PATH
#         }
# env = RedGymEnv(env_config)
# env.reset()
# states = []
# rewards = []

# try:
#     for i in range(30): # run for 30 steps
#         random_action = np.random.choice(list(range(len(env.valid_actions))),size=1)[0]
#         observation, reward, terminated, truncated, _ = env.step(random_action)
#         states.append(observation)
#         rewards.append(reward)

#         # Display the current state of the environment
#         clear_output(wait=True)
#         plt.imshow(env.render())
#         plt.show()
# finally:
#     env.close()


After setting up your environment, you should see something like the video below.

In [11]:
from IPython.display import HTML

HTML('<div align="center"><iframe align = "middle" width="790" height="440" src="https://www.youtube.com/embed/kX_hQjFWqs4?si=Hh5toQP2m2fczMCe" title="Pokemon 30 steps" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></div>')


It is working: Red is now navigating the game screen in response to the action commands. With gymnasium, we've successfully created a custom environment for training RL agents.

In future notebooks, I plan to use this environment for training RL agents. Stay tuned for updates and progress!

## Reference

- PokemonRedExperiments github repository by Peter Whidden (https://github.com/PWhiddy/PokemonRedExperiments)
- PyBoy github repository (https://github.com/Baekalfen/PyBoy)