# Build Single-Agent Environment

XuanCe lets you create and run your own custom environments in addition to the built-in ones.

We need to install XuanCe before getting started.

In [1]:
!pip install xuance


## Step 1: Create and Register a New Environment

First, you need to prepare an original environment, i.e., a Markov decision process.
Then define a new environment class that inherits from XuanCe's base class ``RawEnvironment``.
After defining a new class of environment, you need to add it to the ``REGISTRY_ENV``.

Here is an example:

In [4]:
import numpy as np
from gymnasium.spaces import Box
from xuance.environment import RawEnvironment, REGISTRY_ENV

class MyNewEnv(RawEnvironment):
    def __init__(self, env_config):
        super(MyNewEnv, self).__init__()
        self.env_id = env_config.env_id  # The environment id.
        self.observation_space = Box(-np.inf, np.inf, shape=[18, ])  # Define observation space.
        self.action_space = Box(-np.inf, np.inf, shape=[5, ])  # Define action space. In this example, the action space is continuous.
        self.max_episode_steps = 32  # The max episode length.
        self._current_step = 0  # The count of steps of current episode.

    def reset(self, **kwargs):  # Reset your environment.
        self._current_step = 0
        return self.observation_space.sample(), {}

    def step(self, action):  # Run a step with an action.
        self._current_step += 1
        observation = self.observation_space.sample()
        rewards = np.random.random()
        terminated = False
        truncated = self._current_step >= self.max_episode_steps  # Truncate once the step limit is reached.
        info = {}
        return observation, rewards, terminated, truncated, info

    def render(self, *args, **kwargs):  # Render your environment; return an image when render_mode is "rgb_array".
        return np.ones([64, 64, 3])  # Placeholder image with shape (height, width, channels).

    def close(self):  # Close your environment.
        return
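
Before wiring the class into XuanCe, it can help to check the episode contract on its own. The sketch below uses a hypothetical stand-in class, `ToyEnv`, that mirrors only the `reset`/`step` return signature and the truncation rule of `MyNewEnv`. It relies on the standard library alone, so it is not XuanCe's `RawEnvironment` and involves no Gymnasium spaces; `rollout` is likewise an illustrative helper, not part of any library.

```python
import random


class ToyEnv:
    """Hypothetical stand-in that mirrors MyNewEnv's reset/step contract."""

    def __init__(self, max_episode_steps=32):
        self.max_episode_steps = max_episode_steps
        self._current_step = 0

    def reset(self):
        self._current_step = 0
        return [0.0] * 18, {}  # (observation, info)

    def step(self, action):
        self._current_step += 1
        observation = [random.random() for _ in range(18)]
        reward = random.random()
        terminated = False  # this toy task has no terminal state
        truncated = self._current_step >= self.max_episode_steps
        return observation, reward, terminated, truncated, {}


def rollout(env):
    """Run one episode with placeholder actions; return (length, total reward)."""
    env.reset()
    length, total_reward = 0, 0.0
    done = False
    while not done:
        action = [0.0] * 5  # placeholder 5-dimensional action
        _, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        length += 1
        done = terminated or truncated
    return length, total_reward


episode_length, episode_return = rollout(ToyEnv())
print(episode_length)  # the episode ends when the step limit is hit
```

Because `terminated` is always `False` here, the episode length is determined entirely by `max_episode_steps`, which is the same behaviour the example environment above relies on.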


## Step 2: Create the Config File

Next, create a YAML configuration file by following Step 1 of the Further Usage guide.

The following example configures the DDPG algorithm and writes the settings to a file named ``ddpg_new_env.yaml``.

In [15]:
import textwrap

yaml_content = textwrap.dedent("""
    dl_toolbox: "torch"  # The deep learning toolbox. Choices: "torch", "mindspore", "tensorlayer"
    project_name: "XuanCe_Benchmark"
    logger: "tensorboard"  # Choices: tensorboard, wandb.
    wandb_user_name: "your_user_name"
    render: True
    render_mode: 'rgb_array' # Choices: 'human', 'rgb_array'.
    fps: 50
    test_mode: False
    device: "cpu"
    distributed_training: False
    master_port: '12355'

    agent: "DDPG"
    env_name: "MyNewEnv"
    env_id: "new-v1"
    env_seed: 1
    vectorize: "DummyVecEnv"
    policy: "DDPG_Policy"
    representation: "Basic_Identical"
    learner: "DDPG_Learner"
    runner: "DRL"

    representation_hidden_size:  # If you choose Basic_Identical representation, then ignore this value
    actor_hidden_size: [400, 300]
    critic_hidden_size: [400, 300]
    activation: "leaky_relu"
    activation_action: 'tanh'

    seed: 19089
    parallels: 4  # number of environments
    buffer_size: 200000  # replay buffer size
    batch_size: 100
    learning_rate_actor: 0.001
    learning_rate_critic: 0.001
    gamma: 0.99
    tau: 0.005

    start_noise: 0.5
    end_noise: 0.1
    training_frequency: 1
    running_steps: 1000000  # 1M
    start_training: 10000

    use_grad_clip: False  # gradient clipping
    grad_clip_norm: 0.5
    use_obsnorm: False
    use_rewnorm: False
    obsnorm_range: 5
    rewnorm_range: 5

    test_steps: 10000
    eval_interval: 5000
    test_episode: 5

    log_dir: "./logs/ddpg/"
    model_dir: "./models/ddpg/"
""")

with open("ddpg_new_env.yaml", "w") as f:
    f.write(yaml_content)
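
The cell above indents the YAML text to match the surrounding Python code and relies on `textwrap.dedent` to strip that common indentation before writing the file. The following minimal sketch shows the effect on a shortened, illustrative three-key string (not the full configuration), so that the keys land at column 0.

```python
import textwrap

# A shortened stand-in for the configuration text, indented like the cell above.
raw = """
    agent: "DDPG"
    env_name: "MyNewEnv"
    parallels: 4  # number of environments
"""

dedented = textwrap.dedent(raw)  # removes the shared 4-space prefix
lines = dedented.strip().splitlines()

# After dedenting, no line begins with a space.
starts_at_column_zero = all(not line.startswith(" ") for line in lines)
print(lines[0])
print(starts_at_column_zero)
```

Note that the dedented text still begins with a newline, because the string literal opens with a line break; this only produces an empty first line in the written file.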

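## Step 3: Read the Parameters

Load the YAML file into a namespace, then register ``MyNewEnv`` under the name given by ``env_name``.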

In [16]:
import argparse
from xuance.common import get_configs
configs_dict = get_configs(file_dir="ddpg_new_env.yaml")
configs = argparse.Namespace(**configs_dict)

REGISTRY_ENV[configs.env_name] = MyNewEnv
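
The two ideas in this cell can be illustrated without XuanCe. The sketch below assumes the loader returns an ordinary dictionary and that the registry behaves like a mapping from name to class, which is how the assignment above uses it. The names `registry` and `DummyEnv` and the three-key dictionary are hypothetical stand-ins; how `make_envs` performs its lookup internally is not shown here.

```python
import argparse

# Hypothetical subset of the dictionary a config loader might return.
configs_dict = {"env_name": "MyNewEnv", "env_id": "new-v1", "parallels": 4}

# Namespace turns dictionary keys into attributes: configs.env_name, etc.
configs = argparse.Namespace(**configs_dict)


class DummyEnv:
    """Stand-in environment class that only stores its id."""

    def __init__(self, config):
        self.env_id = config.env_id


registry = {}  # stand-in for a name-to-class registry
registry[configs.env_name] = DummyEnv  # register under the configured name

# A later lookup by name recovers the class, which can then be instantiated.
env_class = registry[configs.env_name]
env = env_class(configs)
print(env.env_id, configs.parallels)
```

Keying the registry on `configs.env_name` means the YAML file alone decides which environment class is built, so the `env_name` value in the config must match the registered key exactly.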

## Step 4: Make Your Environment and Run it with XuanCe

You can now make your environment and run it directly with XuanCe's algorithms.

Here is an example using the DDPG algorithm:

In [None]:
from xuance.environment import make_envs
from xuance.torch.agents import DDPG_Agent

envs = make_envs(configs)  # Make parallel environments.
Agent = DDPG_Agent(config=configs, envs=envs)  # Create a DDPG agent from XuanCe.
Agent.train(configs.running_steps // configs.parallels)  # Train the model for numerous steps.
Agent.save_model("final_train_model.pth")  # Save the model to model_dir.
Agent.finish()  # Finish the training.
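
The argument passed to `train` divides the total step budget by the number of parallel environments. The arithmetic below works this out for the values in `ddpg_new_env.yaml`; the episode count assumes that every episode runs to the 32-step limit set in `MyNewEnv`, and that one training iteration advances each parallel environment by one step.

```python
running_steps = 1_000_000   # from the config file
parallels = 4               # number of parallel environments
max_episode_steps = 32      # step limit defined in MyNewEnv

# Number of iterations passed to Agent.train(...).
train_steps = running_steps // parallels

# Full episodes each environment completes, under the assumptions above.
episodes_per_env = train_steps // max_episode_steps

print(train_steps, episodes_per_env)
```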


## Full code

The full code for the steps above is available at:
[https://github.com/agi-brain/xuance/blob/master/examples/new_environments/ddpg_new_env.py](https://github.com/agi-brain/xuance/blob/master/examples/new_environments/ddpg_new_env.py)