# Overcooked Tutorial
This notebook demonstrates a couple of common use cases of the Overcooked-AI library, including loading and evaluating agents and visualizing trajectories.


In [1]:
import sys

sys.path.append("src")

In [2]:
from overcooked_ai_py.agents.agent import AgentPair, RandomAgent
from overcooked_ai_py.agents.benchmarking import AgentEvaluator
from overcooked_ai_py.visualization.state_visualizer import StateVisualizer

# Here we create an evaluator for the cramped_room layout
layout = "cramped_room"
ae = AgentEvaluator.from_layout_name(mdp_params={"layout_name": layout, "old_dynamics": True}, 
                                     env_params={"horizon": 400})

ap = AgentPair(RandomAgent(), RandomAgent())

trajs = ae.evaluate_agent_pair(ap, 10)

trajs2 = ae.evaluate_human_model_pair(1)


StateVisualizer().display_rendered_trajectory(trajs2, ipython_display=True)

Avg rew: 0.00 (std: 0.00, se: 0.00); avg len: 400.00; : 100%|██████████| 10/10 [00:00<00:00, 25.70it/s]
Avg rew: 200.00 (std: 0.00, se: 0.00); avg len: 400.00; : 100%|██████████| 1/1 [00:00<00:00, 13.45it/s]


interactive(children=(IntSlider(value=0, description='timestep', max=399), Output()), _dom_classes=('widget-in…

# Deprecated material that requires BC and RL training (see README for details)

# Getting started: Training your agent

You can train BC agents using files under the `human_aware_rl/imitation` directory. 

In [3]:
layout = "cramped_room"  # any compatible layout
from human_aware_rl.imitation.behavior_cloning_tf2 import get_bc_params, train_bc_model
from human_aware_rl.static import CLEAN_2019_HUMAN_DATA_TRAIN

params_to_override = {
    # Layouts to train on
    "layouts": [layout],
    # Layout the agents will be evaluated on.
    # Usually the same as the training layout, but because of refactoring some old layouts
    # have more than one name, so the two values may need to be adjusted accordingly.
    "layout_name": layout, 
    "data_path": CLEAN_2019_HUMAN_DATA_TRAIN,
    "epochs": 10,
    "old_dynamics": True,
}

bc_params = get_bc_params(**params_to_override)
train_bc_model("tutorial_notebook_results/BC", bc_params, verbose = True)

2025-04-01 15:27:21.010507: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-01 15:27:21.011163: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-04-01 15:27:21.013562: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.
2025-04-01 15:27:21.018862: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1743510441.028601  162213 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743510441.03

Loading data from /home/benjamin/Projects/overcooked_ai/src/human_aware_rl/static/human_data/cleaned/2019_hh_trials_train.pickle
Number of trajectories processed for each layout: {'cramped_room': 14}
Epoch 1/10


2025-04-01 15:27:31.394058: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
Expected: Overcooked_observation
Received: inputs=['Tensor(shape=(None, 96))']


446/446 - 1s - 3ms/step - loss: 0.9473 - sparse_categorical_accuracy: 0.7215 - val_loss: 0.8799 - val_sparse_categorical_accuracy: 0.7058 - learning_rate: 0.0010
Epoch 2/10
446/446 - 1s - 1ms/step - loss: 0.8432 - sparse_categorical_accuracy: 0.7245 - val_loss: 0.8174 - val_sparse_categorical_accuracy: 0.7056 - learning_rate: 0.0010
Epoch 3/10
446/446 - 1s - 1ms/step - loss: 0.8050 - sparse_categorical_accuracy: 0.7234 - val_loss: 0.7863 - val_sparse_categorical_accuracy: 0.7020 - learning_rate: 0.0010
Epoch 4/10
446/446 - 1s - 1ms/step - loss: 0.7885 - sparse_categorical_accuracy: 0.7234 - val_loss: 0.7751 - val_sparse_categorical_accuracy: 0.6980 - learning_rate: 0.0010
Epoch 5/10
446/446 - 1s - 1ms/step - loss: 0.7751 - sparse_categorical_accuracy: 0.7227 - val_loss: 0.7703 - val_sparse_categorical_accuracy: 0.7046 - learning_rate: 0.0010
Epoch 6/10
446/446 - 1s - 1ms/step - loss: 0.7693 - sparse_categorical_accuracy: 0.7242 - val_loss: 0.7671 - val_sparse_categorical_accuracy: 0.70

<Functional name=functional, built=True>

# 1): Loading trained agents
This section shows how to load pretrained agents.

## 1.1) Loading BC agent
The BC (behavior cloning) agents are trained separately without using Ray. The previous section showed how to train a BC agent; to load a trained agent, we can use the load_bc_model function.

In [4]:
from human_aware_rl.imitation.behavior_cloning_tf2 import load_bc_model
# This is the same path you used when training the BC agent
bc_model_path = "tutorial_notebook_results/BC"
bc_model, bc_params = load_bc_model(bc_model_path)
bc_model, bc_params

(<Functional name=functional, built=True>,
 {'eager': True,
  'use_lstm': False,
  'cell_size': 256,
  'data_params': {'layouts': ['cramped_room'],
   'check_trajectories': False,
   'featurize_states': True,
   'data_path': '/home/benjamin/Projects/overcooked_ai/src/human_aware_rl/static/human_data/cleaned/2019_hh_trials_train.pickle'},
  'mdp_params': {'layout_name': 'cramped_room', 'old_dynamics': True},
  'env_params': {'horizon': 400,
   'mlam_params': {'start_orientations': False,
    'wait_allowed': False,
    'counter_goals': [],
    'counter_drop': [],
    'counter_pickup': [],
    'same_motion_goals': True}},
  'mdp_fn_params': {},
  'mlp_params': {'num_layers': 2, 'net_arch': [64, 64]},
  'training_params': {'epochs': 10,
   'validation_split': 0.15,
   'batch_size': 64,
   'learning_rate': 0.001,
   'use_class_weights': False},
  'evaluation_params': {'ep_length': 400, 'num_games': 1, 'display': False},
  'action_shape': (6,),
  'observation_shape': (96,)})
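
The shapes in the parameters above fit together as a small fully connected network. The sketch below is only an illustration of that relationship, assuming plain dense layers with biases going from the 96-dimensional observation through the two 64-unit layers in net_arch to 6 action outputs. The helper name dense_param_count is hypothetical and not part of the library.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers."""
    return sum(
        n_in * n_out + n_out  # weight matrix plus bias vector
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Values copied from the bc_params output above.
observation_dim = 96   # 'observation_shape': (96,)
net_arch = [64, 64]    # 'mlp_params': {'net_arch': [64, 64]}
num_actions = 6        # 'action_shape': (6,)

# Assumption: one dense layer per net_arch entry, then a dense output layer.
layer_sizes = [observation_dim, *net_arch, num_actions]

print(layer_sizes)                     # [96, 64, 64, 6]
print(dense_param_count(layer_sizes))  # 10758
```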

Because the model was trained with TensorFlow, we need to wrap it so that it is compatible with other agents. We do this by converting it to an RLlib-compatible policy class and wrapping that policy in an RlLibAgent.

In [5]:
from human_aware_rl.imitation.behavior_cloning_tf2 import _get_base_ae, BehaviorCloningPolicy
bc_policy = BehaviorCloningPolicy.from_model(bc_model, bc_params, stochastic=True)
# We need the featurization function that is specifically defined for BC agent
# The easiest way to do it is to create a base environment from the configuration and extract the featurization function
# The environment is also needed to do evaluation

base_ae = _get_base_ae(bc_params)
base_env = base_ae.env

from human_aware_rl.rllib.rllib import RlLibAgent
bc_agent0 = RlLibAgent(bc_policy, 0, base_env.featurize_state_mdp)
bc_agent0

bc_agent1 = RlLibAgent(bc_policy, 1, base_env.featurize_state_mdp)
bc_agent1

<human_aware_rl.rllib.rllib.RlLibAgent at 0x7ec356344760>

Now we have BC agents that are ready for evaluation.
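
The policy above was created with stochastic=True. As a rough illustration of what that usually means, the sketch below contrasts sampling an action from a probability distribution with always taking the most likely action. It is not the library's implementation: the logits are made-up values, and select_action and softmax are hypothetical helpers written only for this example.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits, stochastic, rng):
    """Pick an action index, either by sampling or by taking the argmax."""
    probs = softmax(logits)
    if stochastic:
        # Sample an index in proportion to its probability.
        return rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Deterministic: always the most likely action.
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical scores for 6 actions (matching 'action_shape': (6,)).
logits = [2.0, 0.5, 0.1, -1.0, 0.0, 1.5]
rng = random.Random(0)

greedy = [select_action(logits, False, rng) for _ in range(5)]
sampled = [select_action(logits, True, rng) for _ in range(1000)]

print(greedy)                # the same action every time
print(sorted(set(sampled)))  # several different actions appear
```

Because a sampled policy can behave differently on every run, a single game is a noisy estimate of performance, which is why evaluation is typically repeated over several games.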

## 1.2) Loading & Creating Agent Pair

To do evaluation, we need a pair of agents, or an AgentPair. We can either load a pair of agents directly with the load_agent_pair function, or create an AgentPair manually from two separate RlLibAgent instances.

To create an AgentPair manually, we can pair together any two RlLibAgent objects. Here we pair the two BC agents created above, **bc_agent0** and **bc_agent1**, by passing them as arguments to the AgentPair constructor. The same approach works for other RlLibAgent objects, such as a PPO agent.

In [6]:
from human_aware_rl.rllib.rllib import AgentPair
ap_bc = AgentPair(bc_agent0, bc_agent1)
ap_bc

<overcooked_ai_py.agents.agent.AgentPair at 0x7ec35ada6110>

# 2): Evaluating AgentPair

To evaluate an AgentPair, we first need to create an AgentEvaluator. You can create an AgentEvaluator in various ways, but the simplest is from a layout name.

You can modify the settings of the layout by changing the **mdp_params** argument, but most of the time you should only need to include "layout_name", which is the layout you want to evaluate the agent pair on, and "old_dynamics", which determines whether the environment follows the design in the NeurIPS 2019 paper, in which cooking starts automatically once all ingredients are present.

With **env_params**, you can change how many steps one evaluation runs for by setting "horizon". The default is 400, which means the game runs for 400 timesteps.
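
As a minimal sketch of the two dictionaries described above, the helper below assembles them for a given layout and horizon. The function make_evaluator_params is hypothetical and only builds plain dictionaries; the commented-out last line shows where they would be passed, following the call used in this notebook.

```python
def make_evaluator_params(layout_name, horizon=400, old_dynamics=True):
    """Return (mdp_params, env_params) in the shape used in this notebook."""
    mdp_params = {"layout_name": layout_name, "old_dynamics": old_dynamics}
    env_params = {"horizon": horizon}
    return mdp_params, env_params


# Example: a shorter, 200-timestep evaluation on the same layout.
mdp_params, env_params = make_evaluator_params("cramped_room", horizon=200)
print(mdp_params)
print(env_params)

# With the library installed, these would be passed along as:
# ae_short = AgentEvaluator.from_layout_name(mdp_params=mdp_params, env_params=env_params)
```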

In [7]:
from overcooked_ai_py.agents.benchmarking import AgentEvaluator
# Here we create an evaluator for the cramped_room layout
layout = "cramped_room"
ae = AgentEvaluator.from_layout_name(mdp_params={"layout_name": layout, "old_dynamics": True}, 
                                     env_params={"horizon": 400})
ae

<overcooked_ai_py.agents.benchmarking.AgentEvaluator at 0x7ec356816950>

To run evaluations, we can use the evaluate_agent_pair method associated with the AgentEvaluator:

In [8]:
# ap_bc: the AgentPair we created earlier
# 10: number of evaluation games to run (more than one because the policy is stochastic)
trajs = ae.evaluate_agent_pair(ap_bc, 10)
trajs

Avg rew: 50.00 (std: 20.49, se: 6.48); avg len: 400.00; : 100%|██████████| 10/10 [06:03<00:00, 36.31s/it]


{'ep_dones': array([[False, False, False, ..., False, False, True],
        [False, False, False, ..., False, False, True],
        [False, False, False, ..., False, False, True],
        ...,
        [False, False, False, ..., False, False, True],
        [False, False, False, ..., False, False, True],
        [False, False, False, ..., False, False, True]], dtype=object),
 'mdp_params': array([{'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'], ['X', 'D', 'X', 'S', 'X']], 'start_player_positions': [(1, 2), (3, 1)], 'start_bonus_orders': [], 'rew_shaping_params': {'PLACEMENT_IN_POT_REW': 3, 'DISH_PICKUP_REWARD': 3, 'SOUP_PICKUP_REWARD': 5, 'DISH_DISP_DISTANCE_REW': 0, 'POT_DISTANCE_REW': 0, 'SOUP_DISTANCE_REW': 0}, 'start_all_orders': [{'ingredients': ['onion', 'onion', 'onion']}]},
        {'layout_name': 'cramped_room', 'terrain': [['X', 'X', 'P', 'X', 'X'], ['O', ' ', ' ', ' ', 'O'], ['X', ' ', ' ', ' ', 'X'],

The result returned by the AgentEvaluator contains detailed information about the evaluation runs, including the actions taken by each agent at each timestep. Usually you don't need to interact with it directly, but the most direct performance measure can be retrieved with trajs["ep_returns"], which holds the total sparse reward of each evaluation run.

In [9]:
trajs["ep_returns"]

array([80, 80, 40, 20, 60, 40, 60, 20, 40, 60])
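
The summary printed in the progress bar can be recovered from these per-episode returns. The sketch below copies the ten values shown above into a plain list and uses only the standard library. It assumes the printed "std" is the population standard deviation and "se" is that value divided by the square root of the number of games, which is consistent with the numbers in the output.

```python
import math
import statistics

# Per-episode sparse returns copied from the output above.
ep_returns = [80, 80, 40, 20, 60, 40, 60, 20, 40, 60]

mean_return = statistics.fmean(ep_returns)
# Population standard deviation (divides by N, not N - 1).
std_return = statistics.pstdev(ep_returns)
# Standard error of the mean.
se_return = std_return / math.sqrt(len(ep_returns))

print(f"Avg rew: {mean_return:.2f} (std: {std_return:.2f}, se: {se_return:.2f})")
# Avg rew: 50.00 (std: 20.49, se: 6.48)
```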

In [12]:
result = ae.evaluate_agent_pair(ap_bc, 1, 400)

Avg rew: 60.00 (std: 0.00, se: 0.00); avg len: 400.00; : 100%|██████████| 1/1 [00:36<00:00, 36.53s/it]


# 3): Visualization

We can also visualize the trajectories of agents. One way is to run the web demo with the agents you choose; specific instructions can be found in the [overcooked_demo](https://github.com/HumanCompatibleAI/overcooked_ai/tree/master/src/overcooked_demo) module, which requires some setup. A simpler way is to use the StateVisualizer, which uses the information returned by the AgentEvaluator to create a simple dynamic visualization. You can also check out [this Colab Notebook](https://colab.research.google.com/drive/1AAVP2P-QQhbx6WTOnIG54NXLXFbO7y6n#scrollTo=6Xlu54MkiXCR), which lets you play with fixed agents.

In [13]:
from overcooked_ai_py.visualization.state_visualizer import StateVisualizer
StateVisualizer().display_rendered_trajectory(trajs, ipython_display=True)

interactive(children=(IntSlider(value=0, description='timestep', max=399), Output()), _dom_classes=('widget-in…

This should spawn a window where you can see what the agents are doing at each timestep. You can drag the slider to go forward and backward in time.