# Operation Safe Passage: Gym Environment

The `OSPGym` class is a Gymnasium-compatible wrapper around the Mission Controller.  
It provides a reinforcement learning (RL) interface where:

- **Actions** are encoded in a fixed `MultiDiscrete` vector.
- **Observations** are returned as a flat `Box` of floats.
- **Rewards** can use the default shaping or a custom function.
- Standard Gymnasium API (`reset`, `step`, `render`) is implemented.

```python
from operation_safe_passage.controller.osp_gym import OSPGym
import numpy as np
```

## Initialization

The environment requires:
- `params.json`: agent and scanner parameters, processing times, etc.
- `network.json`: map generated by the Map Generator.

Arguments:
- `param_path`: path to params file.
- `network_path`: path to network file.
- `output_dir`: where outputs are written.
- `max_steps`: episode truncation step limit (default: 2000).
- `reward_fn`: optional custom reward function.
- `seed`: RNG seed for reproducibility.

```python
env = OSPGym(
    param_path="config/params_agents.json",
    network_path="config/premade_network.json",
    output_dir="output",
    max_steps=200,
)

obs, info = env.reset()
print("Observation shape:", obs.shape)
print("Info keys:", info.keys())
```

## Action Space

The action vector is a `MultiDiscrete` array whose entries follow this fixed order:

```text
[ UAV0_move, UAV0_scan, UAV1_move, UAV1_scan, ..., UGV0_move, UGV1_move, ... ]
```

- **Move**:  
  0–5 = directions (`E, NE, NW, W, SW, SE`)  
  6 = noop (no move)

- **Scan (UAV only)**:  
  0..S-1 = index of the scanner to use, where S is the number of scanners on UAV i  
  S = no scan

- **UGVs**:  
  Only have a `move` entry (no scan).
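To make the layout concrete, here is a small standard-library sketch that builds the number of choices per slot (the `nvec` of a `MultiDiscrete` space) and decodes a flat action back into per-agent commands. The helper names `build_nvec` and `decode_action` are hypothetical, and the sketch assumes each UAV's scan slot has `S + 1` choices based on that UAV's own scanner count; neither is taken from the `OSPGym` source.

```python
from typing import Dict, List, Sequence

# Move indices 0-5 are hex directions in this order; 6 is the no-op.
DIRECTIONS = ["E", "NE", "NW", "W", "SW", "SE"]
MOVE_NOOP = len(DIRECTIONS)  # 6


def build_nvec(scanners_per_uav: Sequence[int], num_ugvs: int) -> List[int]:
    """Number of choices for each MultiDiscrete slot, in the documented order."""
    nvec: List[int] = []
    for num_scanners in scanners_per_uav:
        nvec.append(len(DIRECTIONS) + 1)  # move: 6 directions + no-op
        nvec.append(num_scanners + 1)     # scan: S scanners + "no scan"
    nvec.extend([len(DIRECTIONS) + 1] * num_ugvs)  # UGVs have a move slot only
    return nvec


def decode_action(
    action: Sequence[int], scanners_per_uav: Sequence[int], num_ugvs: int
) -> List[Dict[str, object]]:
    """Split a flat action vector into readable per-agent commands (None = no-op)."""
    expected = 2 * len(scanners_per_uav) + num_ugvs
    if len(action) != expected:
        raise ValueError(f"expected {expected} entries, got {len(action)}")
    commands: List[Dict[str, object]] = []
    i = 0
    for uav_idx, num_scanners in enumerate(scanners_per_uav):
        move, scan = int(action[i]), int(action[i + 1])
        i += 2
        commands.append({
            "agent": f"UAV{uav_idx}",
            "move": None if move == MOVE_NOOP else DIRECTIONS[move],
            "scan": None if scan == num_scanners else scan,
        })
    for ugv_idx in range(num_ugvs):
        move = int(action[i])
        i += 1
        commands.append({
            "agent": f"UGV{ugv_idx}",
            "move": None if move == MOVE_NOOP else DIRECTIONS[move],
        })
    return commands


print(build_nvec([2, 3], num_ugvs=2))  # -> [7, 3, 7, 4, 7, 7]
print(decode_action([0, 1, 6, 3, 5, 6], [2, 3], num_ugvs=2))
```

If these assumptions match your configuration, `build_nvec(...)` should equal `env.action_space.nvec` for the same agents, which is a quick way to confirm the slot ordering.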

```python
print("Action space:", env.action_space)

# Example random action
action = env.action_space.sample()
print("Random action:", action)
```

## Observation Space

Observations are returned as a flat float vector (`Box`).

For each agent, values are concatenated in this order:

```text
[ time,
distance_to_goal,
distance_to_goal_weighted (or -1),
current.weight,
current.uav_estimate (or -1 if not present),
current.temperature,
current.wind_speed,
current.visibility,
current.precipitation,
neighbor_weight[E,NE,NW,W,SW,SE] (missing -> -1),
distances_to_other_agents (padded to len(agents)-1 with -1)
]
```

- **time**: mission time (float).  
- **distance_to_goal**: hex distance from agent to mission end.  
- **distance_to_goal_weighted**: terrain-weighted distance (or -1 if not applicable).  
- **weight**: node risk weight (e.g., probability of mine).  
- **uav_estimate**: UAV scan probability estimate for the current node (or -1 if no estimate is present).  
- **environment values**: temperature, wind speed, visibility, precipitation.  
- **neighbors**: terrain weights of the six adjacent nodes, ordered `E, NE, NW, W, SW, SE` (-1 where a neighbor is missing).  
- **agent distances**: distances to all other agents, padded with -1 to `len(agents)-1` entries.  
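Because the neighbor list and the distance list are both padded to fixed lengths, every agent contributes the same number of values: 9 scalars, 6 neighbor weights and `len(agents) - 1` distances. The sketch below slices a flat observation into labelled per-agent records. It assumes the agent blocks are simply concatenated back to back in a consistent agent order; `per_agent_length` and `split_observation` are hypothetical helpers, not part of `OSPGym`.

```python
from typing import Dict, List, Sequence

DIRECTIONS = ["E", "NE", "NW", "W", "SW", "SE"]
# time, distance_to_goal, distance_to_goal_weighted, weight, uav_estimate,
# temperature, wind_speed, visibility, precipitation
NUM_SCALARS = 9


def per_agent_length(num_agents: int) -> int:
    """Scalars + one weight per neighbor direction + padded distances to other agents."""
    return NUM_SCALARS + len(DIRECTIONS) + (num_agents - 1)


def split_observation(obs: Sequence[float], num_agents: int) -> List[Dict[str, object]]:
    """Slice the flat observation into one labelled record per agent."""
    n = per_agent_length(num_agents)
    if len(obs) != n * num_agents:
        raise ValueError(f"expected {n * num_agents} values, got {len(obs)}")
    records: List[Dict[str, object]] = []
    for a in range(num_agents):
        chunk = [float(x) for x in obs[a * n:(a + 1) * n]]
        records.append({
            "time": chunk[0],
            "distance_to_goal": chunk[1],
            "distance_to_goal_weighted": chunk[2],  # -1 if not applicable
            "weight": chunk[3],
            "uav_estimate": chunk[4],               # -1 if not present
            "temperature": chunk[5],
            "wind_speed": chunk[6],
            "visibility": chunk[7],
            "precipitation": chunk[8],
            "neighbor_weights": dict(zip(DIRECTIONS, chunk[9:15])),  # missing -> -1
            "distances_to_other_agents": chunk[15:],                 # padded with -1
        })
    return records
```

For example, with three agents each block holds 9 + 6 + 2 = 17 values, so the full observation should have length 51; comparing `per_agent_length(n) * n` with `len(obs)` is a cheap check that the assumed layout holds.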

```python
# Advance the environment one step with the random action sampled above
obs, reward, terminated, truncated, info = env.step(action)

print("Reward:", reward)
print("Terminated:", terminated)
print("Truncated:", truncated)
print("Obs length:", len(obs))
```

## Rewards

Default shaping:

- `-1` per step  
- `+1000` if any UGV reaches the goal  
- `-10` if episode truncates without success
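The default shaping can be written as a small function, which also makes episode returns easy to work out by hand. This is an illustrative sketch, not the `OSPGym` implementation: it assumes the -1 step penalty also applies on the final step and that the success bonus or truncation penalty is added on top of it. `shaped_reward` and `episode_return` are hypothetical names.

```python
from typing import List


def shaped_reward(ugv_reached_goal: bool, truncated: bool) -> float:
    """Illustrative version of the default shaping (assumed, not the OSPGym source)."""
    reward = -1.0            # per-step penalty, assumed to apply on every step
    if ugv_reached_goal:
        reward += 1000.0     # success bonus
    elif truncated:
        reward -= 10.0       # truncation penalty when no UGV reached the goal
    return reward


def episode_return(num_steps: int, success: bool) -> float:
    """Total reward for an episode that ends (by success or truncation) at num_steps."""
    rewards: List[float] = [shaped_reward(False, False)] * (num_steps - 1)
    rewards.append(shaped_reward(ugv_reached_goal=success, truncated=not success))
    return sum(rewards)


print(episode_return(200, success=False))  # -> -210.0
print(episode_return(50, success=True))    # -> 950.0
```

Under these assumptions, a failed episode with `max_steps=200` returns 199 × (-1) + (-1 - 10) = -210, while reaching the goal on step 50 returns 49 × (-1) + 999 = 950.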

You can override with a custom function:

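```python
def my_reward(state, terminated, truncated):
    # Small per-step penalty plus a bonus when the episode terminates
    return -0.5 + (200 if terminated else 0)

# The parameter and network paths are still required alongside reward_fn
env = OSPGym(
    param_path="config/params_agents.json",
    network_path="config/premade_network.json",
    output_dir="output",
    max_steps=200,
    reward_fn=my_reward,
)
```

```python
obs, info = env.reset()
done = False
total_reward = 0.0

# Run one episode with a uniformly random policy
while not done:
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated

print("Episode total reward:", total_reward)
```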
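A single random episode is noisy, so it can help to average several seeded rollouts as a baseline before training. The sketch below wraps the loop above in a hypothetical `run_random_episodes` helper. It relies only on the standard Gymnasium calls `reset(seed=...)`, `step`, `action_space.seed` and `action_space.sample`, and assumes `OSPGym.reset` accepts the `seed` keyword as the Gymnasium API specifies.

```python
from typing import Any, List


def run_random_episodes(env: Any, num_episodes: int, seed: int = 0) -> List[float]:
    """Roll out a uniformly random policy and return each episode's total reward."""
    # Seeding the action space makes sample() reproducible; reset(seed=...) seeds the env.
    env.action_space.seed(seed)
    returns: List[float] = []
    for episode in range(num_episodes):
        obs, info = env.reset(seed=seed + episode)
        done = False
        total = 0.0
        while not done:
            action = env.action_space.sample()
            obs, reward, terminated, truncated, info = env.step(action)
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return returns
```

For example, `returns = run_random_episodes(env, num_episodes=5)` followed by `sum(returns) / len(returns)` gives a mean random-policy return to compare trained agents against.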

## Render

The `render()` method provides a text summary:

- Mission time
- Each agent’s distance to goal
- Each agent’s current weight

Example:
```text
time=50.0
uav_alpha: d_goal=35, w=40.00
UGV_0: d_goal=37, w=100.00
```
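Since the summary is plain text, logging it over an episode is easier if it is parsed into numbers. The sketch below assumes `render()` returns a string in exactly the format shown above and that agent names contain no whitespace; `parse_render` is a hypothetical helper, not part of `OSPGym`.

```python
import re
from typing import Dict, Tuple

# Matches lines such as "uav_alpha: d_goal=35, w=40.00"
AGENT_LINE = re.compile(
    r"^(?P<name>\S+): d_goal=(?P<d>-?\d+(?:\.\d+)?), w=(?P<w>-?\d+(?:\.\d+)?)$"
)


def parse_render(text: str) -> Tuple[float, Dict[str, Tuple[float, float]]]:
    """Return (mission time, {agent: (distance_to_goal, weight)})."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("time="):
        raise ValueError("expected first line 'time=<float>'")
    time = float(lines[0][len("time="):])
    agents: Dict[str, Tuple[float, float]] = {}
    for ln in lines[1:]:
        m = AGENT_LINE.match(ln)
        if m is None:
            raise ValueError(f"unrecognised line: {ln!r}")
        agents[m["name"]] = (float(m["d"]), float(m["w"]))
    return time, agents


sample = "time=50.0\nuav_alpha: d_goal=35, w=40.00\nUGV_0: d_goal=37, w=100.00"
print(parse_render(sample))
# -> (50.0, {'uav_alpha': (35.0, 40.0), 'UGV_0': (37.0, 100.0)})
```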

```python
print(env.render())
```

## Summary

- **OSPGym** wraps the Mission Controller for Gymnasium.
- Action space: flat `MultiDiscrete` vector encoding moves/scans.
- Observation space: flat `Box` array of concatenated agent features.
- Default reward: -1 per step, +1000 on success, -10 on truncation.
- Compatible with RL libraries such as `stable-baselines3`.