# Quickstart

In this section, we walk through the main use cases of the Sapientino environment.

The environment is designed to be configurable.
At the moment, there is no default goal to achieve,
so the reward must be customized before the environment is used.
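Because there is no default goal, the reward logic is up to the user. The function below is a minimal sketch of a task-specific reward computed from one agent's observation dictionary. The goal (beeping on a cell of a chosen color) and the name `goal_reward` are hypothetical. The function only reads the `beep` and `color` observation fields described later in this section and does not call any `gym_sapientino` API.

```python
from typing import Any, Mapping


def goal_reward(agent_obs: Mapping[str, Any], target_color: int, bonus: float = 1.0) -> float:
    """Hypothetical reward: `bonus` when the agent beeps on `target_color`, else 0.

    `agent_obs` is a single agent's observation dictionary; only its
    `beep` (0 or 1) and `color` (integer id) fields are used.
    """
    beeped = bool(agent_obs["beep"])
    on_target = int(agent_obs["color"]) == target_color
    return bonus if (beeped and on_target) else 0.0


# Usage with dictionaries shaped like a Sapientino observation
print(goal_reward({"beep": 1, "color": 4}, target_color=4))  # 1.0
print(goal_reward({"beep": 0, "color": 4}, target_color=4))  # 0.0
```

In practice, a function like this would typically be applied inside a Gymnasium wrapper around the environment.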

```python
import gymnasium as gym
import gym_sapientino as sapientino
```

## Building the environment

First, we set up an agent configuration:

```python
agent_config = sapientino.configurations.SapientinoAgentConfiguration(
    initial_position=[1, 1],
    commands=sapientino.actions.GridCommand,  # directional (grid) commands
    # other agent parameters can be passed here
)
```

Next, we define the configuration for the environment:

```python
agent_configs = [agent_config]
environment_config = sapientino.configurations.SapientinoConfiguration(
    agent_configs=agent_configs,
    reward_outside_grid=-1.0,
    reward_duplicate_beep=-1.0,
    reward_per_step=-0.01,
    # grid_map=ascii_str,  # optional custom map; ascii_str is not defined in this example
)
```

The arguments are:

- `agent_configs`: the list of agent configurations (provide more than one for a multi-agent setting).
- `reward_outside_grid`: the reward given when the robot tries to move outside the grid.
- `reward_duplicate_beep`: the reward given when the robot beeps in a cell where it has already beeped.
- `reward_per_step`: the reward given at each step.
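To see how these values interact over an episode, here is a small worked example. It assumes, without confirmation from the library, that the two penalties are simply added to `reward_per_step` on the steps where they occur, with no discounting; `episode_return` is a hypothetical helper.

```python
def episode_return(num_steps: int, outside_attempts: int, duplicate_beeps: int,
                   reward_per_step: float = -0.01,
                   reward_outside_grid: float = -1.0,
                   reward_duplicate_beep: float = -1.0) -> float:
    """Undiscounted return, ASSUMING penalties add on top of the per-step reward."""
    return (num_steps * reward_per_step
            + outside_attempts * reward_outside_grid
            + duplicate_beeps * reward_duplicate_beep)


# 20 steps, 1 attempt to leave the grid, 2 duplicate beeps:
# 20 * (-0.01) + 1 * (-1.0) + 2 * (-1.0) = -3.2
print(round(episode_return(20, 1, 2), 2))  # -3.2
```

Under this assumption, with the values configured above, a single mistake costs as much as 100 ordinary steps.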

Then, instantiate the environment:

```python
env = sapientino.Sapientino(environment_config)
print(f"Observation space: {env.observation_space}")
print(f"Action space: {env.action_space}")
# Gymnasium's reset() returns an (observation, info) pair
initial_state = env.reset()
print(f"Initial state: {initial_state}")
```

```
Observation space: Tuple(Dict('angle': Box(0.0, 360.0, (1,), float32), 'beep': Discrete(2), 'color': Discrete(11), 'discrete_x': Discrete(7), 'discrete_y': Discrete(5), 'theta': Discrete(4), 'velocity': Box(-0.1, 0.2, (1,), float32), 'x': Box(0.0, 7.0, (1,), float32), 'y': Box(0.0, 5.0, (1,), float32)))
Action space: Tuple(Discrete(6))
Initial state: (({'discrete_x': 1, 'discrete_y': 1, 'x': array([1.], dtype=float32), 'y': array([1.], dtype=float32), 'velocity': array([0.], dtype=float32), 'theta': 1, 'angle': array([90.], dtype=float32), 'beep': 0, 'color': 4},), {})
```
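The `reset`/`step` calls follow the standard Gymnasium API, so a generic rollout loop can drive the environment. The sketch below is written against that API only. `run_episode` and `StubEnv` are hypothetical names, and `StubEnv` merely mimics the shape of a single-agent Sapientino observation so that the example runs without the library installed.

```python
import random
from typing import Any, Callable, Optional, Tuple


def run_episode(env: Any, policy: Callable[[Any], Any],
                max_steps: int = 100, seed: Optional[int] = None) -> Tuple[float, int]:
    """Roll out one episode with the Gymnasium reset/step API.

    Returns the undiscounted total reward and the number of steps taken.
    """
    observation, info = env.reset(seed=seed)
    total_reward, steps = 0.0, 0
    while steps < max_steps:
        action = policy(observation)
        # Gymnasium step() returns a 5-tuple
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps


class StubEnv:
    """Hypothetical stand-in: single agent, -0.01 per step, terminates after 3 steps."""

    def reset(self, seed: Optional[int] = None):
        self.t = 0
        return ({"beep": 0, "color": 0},), {}

    def step(self, action):
        self.t += 1
        observation = ({"beep": 0, "color": 0},)
        return observation, -0.01, self.t >= 3, False, {}


# A random policy for a Tuple(Discrete(6)) action space: one index per agent
total, steps = run_episode(StubEnv(), lambda obs: (random.randrange(6),))
print(round(total, 2), steps)  # -0.03 3
```

With the real environment, `env.action_space.sample()` returns a random valid action, so `run_episode(env, lambda obs: env.action_space.sample())` would be the analogous call.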


The observation space
is a tuple of dictionaries (one for each agent) of the following form:

- `x`, the continuous $x$-coordinate of the robot in the grid.
- `y`, the continuous $y$-coordinate of the robot in the grid.
- `discrete_x`, the $x$-coordinate of the robot, discretized to a grid cell index.
- `discrete_y`, the $y$-coordinate of the robot, discretized to a grid cell index.
- `velocity`, the speed of the robot.
- `angle`, the heading of the robot, in degrees.
- `theta`, the orientation of the robot in the grid
  (that is, $0^\circ$, $90^\circ$, $180^\circ$ or $270^\circ$, discretized to an integer between $0$ and $3$).
  This attribute is mainly meaningful in the `differential` mode (see below).
- `beep`, a boolean that tells whether the last action was a beep.
- `color`, the currently observed color (blank color is $0$).

In the single-agent configuration,
the observation is a tuple containing a single such dictionary.
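Since `env.reset()` returns an `(observation, info)` pair, the observation tuple is its first element. The hypothetical helper `summarize` below turns such a tuple into one readable summary per agent. It uses the values of the initial state printed above, with plain lists standing in for the one-element `float32` arrays.

```python
from typing import Any, Dict, List, Tuple


def scalar(value: Any) -> float:
    """Unwrap a 1-element container (e.g. a shape-(1,) Box value) into a float."""
    return float(value[0]) if hasattr(value, "__len__") else float(value)


def summarize(observation: Tuple[Dict[str, Any], ...]) -> List[Dict[str, Any]]:
    """Build one summary dict per agent from a Sapientino-style observation tuple."""
    summaries = []
    for agent_id, agent_obs in enumerate(observation):
        summaries.append({
            "agent": agent_id,
            "cell": (int(agent_obs["discrete_x"]), int(agent_obs["discrete_y"])),
            "position": (scalar(agent_obs["x"]), scalar(agent_obs["y"])),
            "angle": scalar(agent_obs["angle"]),
            "beeped": bool(agent_obs["beep"]),
            "color": int(agent_obs["color"]),
        })
    return summaries


# Values from the printed initial state (lists instead of NumPy arrays)
obs = ({"discrete_x": 1, "discrete_y": 1, "x": [1.0], "y": [1.0], "velocity": [0.0],
        "theta": 1, "angle": [90.0], "beep": 0, "color": 4},)
print(summarize(obs))
# [{'agent': 0, 'cell': (1, 1), 'position': (1.0, 1.0), 'angle': 90.0, 'beeped': False, 'color': 4}]
```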

The action space is either "directional" (up, down, left, right)
or "differential" ("turn left", "turn right", "forward", "backward"),
plus a "nop" action and a "beep" action.
The boolean argument `differential` in the agent configuration
controls the action space of the associated agent.
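To keep policies readable, one can map action names to indices. The index order used below is an assumption for illustration only, not taken from `gym_sapientino`; the library's command classes define the real order. Since the printed action space is a `Tuple`, a single agent's action would be passed as a one-element tuple, e.g. `(action_index("beep"),)`.

```python
from typing import Tuple

# Hypothetical index order (NOT confirmed against gym_sapientino)
DIRECTIONAL: Tuple[str, ...] = ("up", "down", "left", "right", "nop", "beep")
DIFFERENTIAL: Tuple[str, ...] = ("turn left", "turn right", "forward", "backward", "nop", "beep")


def action_index(label: str, differential: bool = False) -> int:
    """Map a readable action label to an index under the ASSUMED ordering above."""
    table = DIFFERENTIAL if differential else DIRECTIONAL
    return table.index(label)


# Both command sets have 6 actions, matching Discrete(6) in the printed action space
assert len(DIRECTIONAL) == len(DIFFERENTIAL) == 6
print(action_index("beep"), action_index("forward", differential=True))  # 5 2
```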

Example of a directional agent:

<center>
    <img src="/directional.gif">
</center>

Example of a differential agent:

<center>
    <img src="/differential.gif">
</center>

With `continuous=True`, you can enable a continuous
state space:

<center>
    <img src="/continuous.gif">
</center>
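The observation carries both the real-valued `x`, `y` and the integer `discrete_x`, `discrete_y`. The sketch below shows one plausible mapping between them, assuming the discrete coordinates are obtained by flooring and clipping to the 7-by-5 grid implied by the printed observation space. The function `discretize` is hypothetical, and this is not confirmed to be how the library computes these fields.

```python
import math
from typing import Tuple


def discretize(x: float, y: float, width: int = 7, height: int = 5) -> Tuple[int, int]:
    """ASSUMED mapping from continuous (x, y) to a grid cell: floor, then clip."""
    cell_x = min(max(math.floor(x), 0), width - 1)
    cell_y = min(max(math.floor(y), 0), height - 1)
    return cell_x, cell_y


print(discretize(1.0, 1.0))   # (1, 1), as in the printed initial state
print(discretize(3.7, 4.99))  # (3, 4)
print(discretize(7.0, 5.0))   # (6, 4): the upper bound is clipped into the last cell
```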

## Multi-agent setup

Multiple agents can share the same grid: pass more than one agent configuration in `agent_configs`.
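With several agents, the observation tuple holds one dictionary per agent. Extrapolating from the single-agent `Tuple(Discrete(6))` action space, the step action would then be a tuple with one action per agent, in the same order. A minimal sketch under that assumption, with hypothetical names `joint_action`, `always` and `random_policy`:

```python
import random
from typing import Any, Callable, Dict, Sequence, Tuple

# One policy per agent: maps that agent's observation dict to an action index
Policy = Callable[[Dict[str, Any]], int]


def joint_action(observation: Tuple[Dict[str, Any], ...],
                 policies: Sequence[Policy]) -> Tuple[int, ...]:
    """Build the joint action: one index per agent, in observation order."""
    if len(observation) != len(policies):
        raise ValueError("expected exactly one policy per agent")
    return tuple(policy(agent_obs) for policy, agent_obs in zip(policies, observation))


def always(action: int) -> Policy:
    """Hypothetical fixed policy that ignores the observation."""
    return lambda agent_obs: action


def random_policy(agent_obs: Dict[str, Any]) -> int:
    """Uniformly random action for a Discrete(6) action space."""
    return random.randrange(6)


# Two agents: first with two fixed policies, then one fixed and one random
obs = ({"beep": 0, "color": 4}, {"beep": 0, "color": 0})
print(joint_action(obs, [always(4), always(5)]))  # (4, 5)
print(joint_action(obs, [always(4), random_policy]))
```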