## 2 Player Escape Room Game

- Can 2 LLM agents fulfil an arbitrary task in an Escape Room-like gridworld?
- Let's find out!
- **Created:** 19 Aug 2023 with two agents navigating to two blue squares
- **Updated:** 
    - 26 Aug 2023: Added in pick and place tasks
    - 22 Aug 2023: Added in multiple colored squares


In [1]:
import pygame
import sys
import random
import os
import openai
import json
import re
import time

#API Keys
os.environ['OPENAI_API_TOKEN'] = 'YOUR_API_KEY_HERE'
openai.api_key = os.environ['OPENAI_API_TOKEN']

pygame 2.5.1 (SDL 2.28.2, Python 3.11.3)
Hello from the pygame community. https://www.pygame.org/contribute.html


## Strict JSON Framework

- Taken from: https://github.com/tanchongmin/strictjson

- **system_prompt**: Write in whatever you want GPT to become. "You are a \<purpose in life\>"
- **user_prompt**: The user input. Later, when we use it as a function, this is the function input
- **output_format**: JSON format with the key as the output key, and the value as the output description
    - The output keys will be preserved exactly, while GPT will generate content to match the description of the value as best as possible

#### Example Usage
```python
res = strict_output(system_prompt = 'You are a classifier',
                    user_prompt = 'It is a beautiful day',
                    output_format = {"Sentiment": "Type of Sentiment",
                                    "Tense": "Type of Tense"})
                                    
print(res)
```

#### Example output
```{'Sentiment': 'Positive', 'Tense': 'Present'}```
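
The key-preservation rule can be checked offline, without calling GPT. The sketch below is only an illustration: `missing_keys` is a hypothetical helper, not part of the strictjson package, and it mirrors the rule that every key of `output_format` must appear in the output unless the key is dynamic (contains `<` or `>`).

```python
def missing_keys(output, output_format):
    """Return the keys of output_format that are absent from output.

    Keys containing < or > are dynamic (GPT generates the key name),
    so they cannot be checked and are skipped.
    """
    return [key for key in output_format
            if '<' not in key and '>' not in key and key not in output]

output_format = {"Sentiment": "Type of Sentiment", "Tense": "Type of Tense"}

# a complete output has no missing keys
print(missing_keys({"Sentiment": "Positive", "Tense": "Present"}, output_format))
# an incomplete output reports the key that would trigger a retry
print(missing_keys({"Sentiment": "Positive"}, output_format))
```

An empty list means the output would pass the key check. A non-empty list corresponds to the case where an error message is fed back to GPT for another try.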


In [3]:
def strict_output(system_prompt, user_prompt, output_format, default_category = "", output_value_only = False,
                  model = 'gpt-3.5-turbo', temperature = 0, num_tries = 2, verbose = False):
    ''' Ensures that OpenAI will always adhere to the desired output json format. 
    Uses rule-based iterative feedback to ask GPT to self-correct.
    Keeps trying up to num_tries if it does not. Returns empty json if unable to after num_tries iterations.
    If output field is a list, will treat as a classification problem and output best classification category.
    Text enclosed within < > will be generated by GPT accordingly'''

    # if the user input is in a list, we also process the output as a list of json
    list_input = isinstance(user_prompt, list)
    # if the output format contains dynamic elements of < or >, then add to the prompt to handle dynamic elements
    dynamic_elements = '<' in str(output_format)
    # if the output format contains list elements of [ or ], then we add to the prompt to handle lists
    list_output = '[' in str(output_format)
    
    # start off with no error message
    error_msg = ''
    
    for i in range(num_tries):
        
        # the backslash is doubled so that Python does not treat "\ " as an invalid escape sequence
        output_format_prompt = f'''\nYou are to output the following in json format: {output_format}. 
Do not put quotation marks or escape character \\ in the output fields.'''
        
        if list_output:
            output_format_prompt += f'''\nIf output field is a list, classify output into the best element of the list.'''
        
        # if output_format contains dynamic elements, process it accordingly
        if dynamic_elements: 
            output_format_prompt += f'''
Any text enclosed by < and > indicates you must generate content to replace it. Example input: Go to <location>, Example output: Go to the garden
Any output key containing < and > indicates you must generate the key name to replace it. Example input: {{'<location>': 'description of location'}}, Example output: {{'school': 'a place for education'}}'''

        # if input is in a list format, ask it to generate json in a list
        if list_input:
            output_format_prompt += '''\nGenerate a list of json, one json for each input element.'''
            
        # Use OpenAI to get a response
        # (openai.ChatCompletion is the interface of openai package versions before 1.0)
        response = openai.ChatCompletion.create(
          temperature = temperature,
          model=model,
          messages=[
            {"role": "system", "content": system_prompt + output_format_prompt + error_msg},
            {"role": "user", "content": str(user_prompt)}
          ]
        )

        res = response['choices'][0]['message']['content'].replace('\'', '"')
        
        # ensure that we don't replace away apostrophes in text 
        res = re.sub(r"(\w)\"(\w)", r"\1'\2", res)

        if verbose:
            print('System prompt:', system_prompt + output_format_prompt + error_msg)
            print('\nUser prompt:', str(user_prompt))
            print('\nGPT response:', res)
    
        # try-catch block to ensure output format is adhered to
        try:
            output = json.loads(res)
            if isinstance(user_prompt, list):
                if not isinstance(output, list): raise Exception("Output format not in a list of json")
            else:
                output = [output]
                
            # check for each element in the output_list, the format is correctly adhered to
            for index in range(len(output)):
                for key in output_format.keys():
                    # unable to ensure accuracy of dynamic output header, so skip it
                    if '<' in key or '>' in key: continue
                    # if output field missing, raise an error
                    if key not in output[index]: raise Exception(f"{key} not in json output")
                    # if the output format for this key is a list of choices, treat it as classification
                    if isinstance(output_format[key], list):
                        choices = output_format[key]
                        # ensure output is not a list
                        if isinstance(output[index][key], list):
                            output[index][key] = output[index][key][0]
                        # output the default category (if any) if GPT is unable to identify the category
                        if output[index][key] not in choices and default_category:
                            output[index][key] = default_category
                        # if the output is a description format, get only the label
                        if ':' in output[index][key]:
                            output[index][key] = output[index][key].split(':')[0]
                            
                # if we just want the values for the outputs
                if output_value_only:
                    output[index] = [value for value in output[index].values()]
                    # just output without the list if there is only one element
                    if len(output[index]) == 1:
                        output[index] = output[index][0]
                    
            return output if list_input else output[0]

        except Exception as e:
            error_msg = f"\n\nResult: {res}\n\nError message: {str(e)}"
            print("An exception occurred:", str(e))
            print("Current invalid json format:", res)
         
    return {}
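
The quote handling inside `strict_output` can be tried on its own. This is a minimal sketch, assuming a GPT reply that uses single quotes as delimiters; `normalise_quotes` is a hypothetical name for the two steps that `strict_output` performs inline.

```python
import json
import re

def normalise_quotes(raw):
    """Swap single quotes for double quotes, then restore apostrophes inside words."""
    swapped = raw.replace("'", '"')
    # a double quote with a word character on both sides was an apostrophe, e.g. it"s
    return re.sub(r'(\w)"(\w)', r"\1'\2", swapped)

# hypothetical reply using single quotes, with an apostrophe inside a value
raw = "{'Thoughts': 'it's a plan', 'Goal': 'Goto 3 4'}"
print(json.loads(normalise_quotes(raw)))
```

The regular expression only restores apostrophes that sit between two word characters, so a trailing apostrophe (as in a plural possessive) would still be turned into a double quote and cause a parse error and a retry.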

## Agents
- MetaData Format (Dictionary)
    - 'agent1': state of agent1, of the format {'pos': [x, y], 'inventory': [object_name, ...]}
    - 'agent2': state of agent2, in the same format as agent1
    - 'task': task description in plain text
    - 'objects not in inventory': objects in the world that are not in inventory. Of the format object_name: [x, y]
    - 'world_description': description of the grid world and how the actions impact the agent
    - 'blue_squares': list of [x, y] positions of blue squares
    - 'green_squares': list of [x, y] positions of green squares
    - 'red_squares': list of [x, y] positions of red squares
    - 'grid_size': dimension of grid, boundary of grid is a wall
    - 'valid_moves': valid moves and their descriptions
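
As an illustration of this format, here is a hand-written metadata dictionary. All positions are made-up values for a 5x5 grid, the descriptions are shortened, and the agent entries are assumed to carry the state dictionary returned by `Actor.get_state()` in the environment code.

```python
# Hypothetical example values; the real dictionary is rebuilt every turn by the game loop
sample_metadata = {
    'world_description': 'The grid coordinates are given as [col, row].',
    'task': 'Place the keys to their corresponding coloured square',
    'agent1': {'pos': [1, 1], 'inventory': []},
    'agent2': {'pos': [5, 5], 'inventory': ['red_key']},
    'objects not in inventory': {'blue_key': [3, 2], 'green_key': [4, 2]},
    'blue_squares': [[2, 3], [4, 4]],
    'green_squares': [[3, 5], [5, 2]],
    'red_squares': [[2, 5], [5, 3]],
    'grid_size': 7,  # 5x5 playable cells plus one wall cell on each side
    'valid_moves': [{'Name': 'Goto <x> <y>'}, {'Name': 'Pickup <object>'},
                    {'Name': 'Place <object>'}, {'Name': 'None'}],
}

# agents read their own position and the move names from the metadata
print(sample_metadata['agent1']['pos'])
print([move['Name'] for move in sample_metadata['valid_moves']])
```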

In [4]:
def findnearest(startpos, endposlist):
    ''' Finds the square in endposlist nearest to startpos (by Manhattan distance).
    Returns that square and the direction of the first move towards it '''
    nearest, nearestpos, direction = 1e9, None, 'None'
    for endpos in endposlist:
        totaldist = abs(endpos[0]-startpos[0])+abs(endpos[1]-startpos[1])
        if totaldist < nearest:
            # remember the nearest square itself, not just its distance
            nearest = totaldist
            nearestpos = endpos
            if endpos[1] > startpos[1]: direction = 'Down'
            elif endpos[1] < startpos[1]: direction = 'Up'
            elif endpos[0] > startpos[0]: direction = 'Right'
            elif endpos[0] < startpos[0]: direction = 'Left'
            else: direction = 'None'
    return nearestpos, direction
    
def NoneAgent(metadata, agent, memory):
    ''' Does nothing'''
    return ''

def RandomAgent(metadata, agent, memory):
    ''' Makes a move randomly '''
    return random.choice(metadata['valid_moves'])['Name']

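def NearestCellAgent(metadata, agent, memory):
    ''' Makes a move to nearest blue cell '''
    # metadata[agent] is the agent state dictionary, so read the position from 'pos'
    start_cell = metadata[agent]['pos']
    endpos, direction = findnearest(start_cell, metadata['blue_squares'])
    # stored as text because the display code treats the goal as a string
    memory['Goal'] = str(endpos)
    return direction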

def NearestCellCoopAgent(metadata, agent, memory):
    ''' Makes a move to nearest blue cell, does not move to cell other agent is occupying '''
    # metadata[agent] is the agent state dictionary, so read the position from 'pos'
    start_cell = metadata[agent]['pos']
    other_agent = 'agent1' if agent == 'agent2' else 'agent2'
    other_pos = metadata[other_agent]['pos']
    # remove from choice if other agent is within cell (filter into a new list so metadata is unchanged)
    blue_squares = [square for square in metadata['blue_squares'] if square != other_pos]
    endpos, direction = findnearest(start_cell, blue_squares)
    # stored as text because the display code treats the goal as a string
    memory['Goal'] = str(endpos)
    return direction
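
The cooperative rule (go to the nearest blue square that the other agent is not standing on) can be sketched in isolation. `nearest_free_square` is a hypothetical helper written for this illustration only; it uses Manhattan distance, as `findnearest` does, and leaves the input list untouched.

```python
def nearest_free_square(start, squares, occupied):
    """Return the square nearest to start (Manhattan distance), skipping occupied squares."""
    candidates = [sq for sq in squares if sq not in occupied]
    if not candidates:
        return None
    return min(candidates, key=lambda sq: abs(sq[0] - start[0]) + abs(sq[1] - start[1]))

blue_squares = [[2, 2], [5, 4]]  # made-up positions

# nobody is in the way: the closer square wins
print(nearest_free_square([1, 1], blue_squares, occupied=[]))
# the other agent already stands on [2, 2]: fall back to the next nearest
print(nearest_free_square([1, 1], blue_squares, occupied=[[2, 2]]))
```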

In [20]:
def LLMAgent(metadata, agent, memory, MAX_MEM = 30):
    ''' Uses LLM to decide what action to take. Make sure action lies within valid_moves '''
    # if there is a series of Next Moves but no more moves left, just wait for central planner again
    if 'List of Next Moves' in memory and len(memory['List of Next Moves']) == 0:
        return 'None'
    # If there is no plan so far for next moves, ask LLM again
    if 'List of Next Moves' not in memory:
        ### Use LLM decision making agent to plan a series of moves   
        res = strict_output(system_prompt = f'''You are {agent}.
You are given the following metadata: {metadata}.
Past memory: {memory}
Choose one of the moves in valid_moves to achieve the task.
Example Goal: pick up flower at [1, 2] and place it at blue square at [3, 4]
Example List of Next Moves: ["Goto 1 2", "Pickup flower", "Goto 3 4", "Place flower"]
Example Goal: Step on green square at [3, 4]
Example List of Next Moves: ["Goto 3 4"]
Example Goal: Goal has been achieved!
Example List of Next Moves: ["None"]
''', 
            user_prompt = '',
    #         output_format = {
    # "Current Position": "Your current position, for example [0, 1]",
    # "Goal": "Your desired goal destination to reach, for example [1, 2]",
    # "Offset": "Offset from Current Position to Goal. For example, for Current Position [1, 1] to Goal [3, 0], Offset is [2, -1]",
    # "Thoughts": "How to get to goal destination from current position. Do not use quotation marks",
    # "List of Next Moves": "A list of valid_moves to be executed that will get you to the destination. For example, ['Down', 'Right']."},
    # verbose = True)
    
            output_format = {
    "Current Position": "Your current position, for example [0, 1]",
    "Inventory": "Your current inventory, for example flower. If nothing, display 'No object'",
    "Goal": "Goal stored in Past memory",
    "Thoughts": "How to achieve the goal. Do not use quotation marks",
    "List of Next Moves": '''A list of valid_moves to be executed that will fulfil the goal. 
For example, ["Goto 1 2", "Pickup flower", "Goto 3 4", "Place flower"].'''},
    verbose = True)

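        # Add List of Moves
        # strict_output returns {} on failure, so fall back to waiting if the list is missing or empty
        memory["List of Next Moves"] = res.get("List of Next Moves") or ["None"]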
        
    ## Replace any Goto for next move with the exact list of moves
    ## TODO: Add pathfinding if necessary
    action, *param = memory["List of Next Moves"][0].split(' ')
    if action == 'Goto':
        curpos = metadata[agent]['pos']
        newpos = [int(x) for x in param]
        offset = curpos[0]-newpos[0], curpos[1] - newpos[1]
        newmoves = []
        newmoves.extend(['Right' if offset[0]<0 else 'Left'  for _ in range(abs(offset[0]))])
        newmoves.extend(['Down' if offset[1]<0 else 'Up' for _ in range(abs(offset[1]))])
        if len(newmoves) == 0: newmoves = ['None']
        
        newmoves.extend(memory["List of Next Moves"][1:])
        print('Before subtasks:', memory["List of Next Moves"])
        memory["List of Next Moves"] = newmoves
        print('After subtasks:', memory["List of Next Moves"])
    
    move = memory['List of Next Moves'].pop(0)
    
    # Add to memory
    if 'history' not in memory:
        memory['history'] = []
    memory['history'].append(str(metadata[agent]))
    memory['history'].append(move)
    # Truncate the memory, keeping only the most recent MAX_MEM entries
    memory['history'] = memory['history'][-MAX_MEM:]

    return move

def Conversation(memory1, memory2, metadata, controllable_agents, turns):
    ''' Uses the memory to converse between two AI agents '''
    ### TO-DO Implement conversation (Now it is a central planner. Possible to have agents free-flow communicate)
    ### TO-DO Implement dynamic goal planner if the agent is not listening
    res = strict_output(system_prompt = f'''You are a central planner for agents.
You can only control {controllable_agents}. 
If you cannot control an agent, you can only infer their desired goal, and replan the other agents accordingly for the objective.
Metadata: {metadata}
You are to output the goal for the two agents based on the task at hand.
For the goal, you must be specific and must include the position to go to and the action to perform. 
Example goal: Goto [1, 2] and Pickup flower, then Goto [3, 4] and Place flower
Example goal: Goto blue square at [3, 4]
You must plan such that the same agent picks up and places the same object.
You must not use quotation marks in the JSON output.''',
    user_prompt = '',
    output_format = {
"Thoughts": "Describe in plain text how to achieve the task using the two agents. No quotation marks allowed.",
"Goal 1": "Goal for Agent 1",
"Goal 2": "Goal for Agent 2"        
# "Goal 1": "Goal destination for Agent 1, for example [1, 2]"
# "Goal 2": "Goal destination for Agent 2, for example [1, 2]"
    },
verbose = True)
    
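    # strict_output returns {} on failure; in that case keep the previous goals (if any)
    memory1['Goal'] = res.get('Goal 1', memory1.get('Goal', ''))
    memory2['Goal'] = res.get('Goal 2', memory2.get('Goal', ''))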
    
    # also override the existing list of next moves so agents can replan
    for memory in [memory1, memory2]:
        if 'List of Next Moves' in memory:
            del memory['List of Next Moves']

    return memory1, memory2
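
The `Goto` expansion inside `LLMAgent` can be illustrated as a standalone function. This is a sketch with a hypothetical name, `expand_goto`; like the original it moves horizontally first, then vertically, and does no pathfinding around obstacles.

```python
def expand_goto(curpos, newpos):
    """Expand a Goto into single-step moves: columns first, then rows.

    Coordinates are [col, row], and the row index grows downwards.
    """
    dx, dy = newpos[0] - curpos[0], newpos[1] - curpos[1]
    moves = ['Right' if dx > 0 else 'Left'] * abs(dx)
    moves += ['Down' if dy > 0 else 'Up'] * abs(dy)
    # already at the destination: a single 'None' keeps the turn structure intact
    return moves or ['None']

print(expand_goto([1, 1], [3, 0]))  # ['Right', 'Right', 'Up']
print(expand_goto([2, 2], [2, 2]))  # ['None']
```

Because the plan is expanded into one-cell moves, the agent takes one turn per cell rather than jumping straight to the target.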

## Environment
- Runs 4 stages, each with a random grid size from 5x5 to 8x8
- Each stage ends after 200 turns
- 6 different agents
     + NoneAgent: Does nothing
     + RandomAgent: Does a random action
     + NearestCellAgent: Moves to nearest cell
     + NearestCellCoopAgent: Moves to nearest cell. If nearest cell is occupied, moves to another.
     + PlayerAgent: Player controls agent. Agent 1 uses W, A, S, D to move, E to pick up and R to place. Agent 2 uses Up, Down, Left, Right to move, N to pick up and M to place
     + LLMAgent: Uses environmental description to choose an action/list of actions
- Conversation: If at least one agent is an LLMAgent, a central planner assigns goals to each agent
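
The LLMAgent returns moves as plain strings such as `Goto 3 4` or `Pickup red_key`, which the environment has to split into a move name and its parameters. A minimal sketch of that parsing follows; `parse_action` is a hypothetical helper, and unlike the game loop it joins multi-word object names back into one string.

```python
def parse_action(action):
    """Split an action string into (move, parameter).

    Goto carries two integers; Pickup and Place carry an object name;
    single-word moves such as 'None' or 'Up' carry an empty parameter.
    """
    move, *params = action.split(' ')
    if move == 'Goto':
        return move, [int(p) for p in params]
    return move, ' '.join(params)

print(parse_action('Goto 3 4'))        # ('Goto', [3, 4])
print(parse_action('Pickup red_key'))  # ('Pickup', 'red_key')
print(parse_action('None'))            # ('None', '')
```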

In [21]:
# where to get the images from
asset_folder = 'Assets'

# Initialize pygame
pygame.init()

# set the AI agents here
# we have NoneAgent, RandomAgent, NearestCellAgent, NearestCellCoopAgent, LLMAgent
ai_agent1 = LLMAgent
ai_agent2 = LLMAgent

# Constants
WIDTH, HEIGHT = 800, 800
BASE = 60 # this is for inventory
GRID_SIZE = random.randint(5, 8)
CELL_SIZE = min(WIDTH, HEIGHT) // (GRID_SIZE + 2)  # +2 for walls
RED = (255, 0, 0)
GREEN = (0, 255, 0)
GOLD = (255, 130, 0)
BLUE = (0, 0, 255)
WHITE = (255, 255, 255)
BLACK = (0, 0, 0)
WALL_COLOR = (50, 50, 50)
TOTAL_TURNS = 200

# Set up screen
screen = pygame.display.set_mode((WIDTH, HEIGHT + BASE))
pygame.display.set_caption('Two Actor Grid World')

# fill in number of squares here
NUM_BLUE = 2
NUM_RED = 2
NUM_GREEN = 2
task = "Place the keys to their corresponding coloured square"

class Actor:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self.inventory = []
        self.pos = [self.x, self.y]
        
    def get_state(self):
        ''' Returns the state of the agent '''
        return {'pos': self.pos, 'inventory': self.inventory}

    def move(self, dx, dy):
        ''' Moves by a certain offset '''
        if 0 < self.x + dx < GRID_SIZE+1 and 0 < self.y + dy < GRID_SIZE+1:
            self.x += dx
            self.y += dy
        self.pos = [self.x, self.y]
        
    def pickup(self, obj, obj_dict):
        ''' Picks up an item at a certain position based on obj_dict. Must be in same position as obj '''
        # if no object provided, find an object
        if obj == '':
            for o, pos in obj_dict.items():
                if pos == self.pos:
                    obj = o
            if obj == '':
                return
        
        if obj in obj_dict:
            if obj_dict[obj] == self.pos:
                self.inventory.append(obj)
                del obj_dict[obj]
                
    def place(self, obj, obj_dict):
        ''' Places an item at a certain position based on obj. Must have obj in inventory '''
        # if no object provided, use first object in inventory
        if obj == '':
            if len(self.inventory) > 0:
                obj = self.inventory[0]
            else:
                return
        if obj in self.inventory:
            obj_dict[obj] = self.pos
            self.inventory.remove(obj)
        
    def goto(self, pos):
        self.x, self.y = pos[0], pos[1]
        self.pos = pos

def draw_grid():
    for y in range(GRID_SIZE + 2):
        for x in range(GRID_SIZE + 2):
            color = WHITE
            if x == 0 or x == GRID_SIZE+1 or y == 0 or y == GRID_SIZE+1:
                color = WALL_COLOR
            elif [x, y] in blue_squares:
                color = BLUE
            elif [x, y] in green_squares:
                color = GREEN
            elif [x, y] in red_squares:
                color = RED
                
            pygame.draw.rect(screen, color, pygame.Rect(x * CELL_SIZE, y * CELL_SIZE, CELL_SIZE, CELL_SIZE))
            pygame.draw.line(screen, BLACK, (x * CELL_SIZE, 0), (x * CELL_SIZE, HEIGHT))
            pygame.draw.line(screen, BLACK, (0, y * CELL_SIZE), (WIDTH, y * CELL_SIZE))

def draw_agents(image_dict, agent1, agent2):
    # pygame.draw.circle(screen, GOLD, (agent1.x * CELL_SIZE + CELL_SIZE // 2, agent1.y * CELL_SIZE + CELL_SIZE // 2), CELL_SIZE // 4)
    # pygame.draw.circle(screen, BLACK, (agent2.x * CELL_SIZE + CELL_SIZE // 2, agent2.y * CELL_SIZE + CELL_SIZE // 2), CELL_SIZE // 4)
    screen.blit(image_dict['agent1.png'], (agent1.x * CELL_SIZE, agent1.y * CELL_SIZE))
    screen.blit(image_dict['agent2.png'], (agent2.x * CELL_SIZE, agent2.y * CELL_SIZE))
    
def draw_objects(image_dict, obj_dict):
    if len(obj_dict) == 0:
        return
    for obj, pos in obj_dict.items():
        screen.blit(image_dict[obj+'.png'], (pos[0] * CELL_SIZE, pos[1] * CELL_SIZE))
            
stage = 1

while stage < 5:
    reward = 0
    memory1 = {}
    memory2 = {}
    obj_dict = {'red_key': [2, 2], 'blue_key': [3, 2], 'green_key': [4, 2]}
    
    GRID_SIZE = random.randint(5, 8)
    CELL_SIZE = min(WIDTH, HEIGHT) // (GRID_SIZE + 2)  # +2 for walls
    
    # Load images here
    image_dict = {}  # Dictionary to store images
    for filename in os.listdir(asset_folder):
        if filename.endswith(".png"):  # You can adjust the file extension as needed
            image_path = os.path.join(asset_folder, filename)
            image = pygame.image.load(image_path)
            image = pygame.transform.scale(image, (CELL_SIZE, CELL_SIZE))
            image_dict[filename] = image
        
    # Main game loop
    agent1 = Actor(1, 1)
    agent2 = Actor(GRID_SIZE, GRID_SIZE)
    
    # forms the random squares
    repeat = True
    while repeat:
        blue_squares = [[random.randint(2, GRID_SIZE), random.randint(2, GRID_SIZE)] for _ in range(NUM_BLUE)]
        green_squares = [[random.randint(2, GRID_SIZE), random.randint(2, GRID_SIZE)] for _ in range(NUM_GREEN)]
        red_squares = [[random.randint(2, GRID_SIZE), random.randint(2, GRID_SIZE)] for _ in range(NUM_RED)]
    
        # check sum of squares
        cur_squares = []
        repeat = False
        for square in blue_squares + green_squares + red_squares:
            if square in cur_squares:
                repeat = True   
            cur_squares.append(square)
            
    turns = 0
    running = True
    while running:
        ### AGENT LOGIC GOES HERE ###
        ## Replace these with relevant descriptions of the world
        # moves_with_description = [
        # {"Name": "Up", "Use Case": "used when row offset is negative, for example Offset of [0, -1] or [0, -2]",
        #  "Description": "move a step towards negative row direction", 
        #  # "Example": ["[1, 1]", "Up", "[1, 0]"]
        # },
        # {"Name": "Down",  "Use Case": "used when row offset is positive, for example Offset of [0, 1] or [0, 2]",
        #  "Description": "move a step towards positive row direction", 
        #  # "Example": ["[1, 1]", "Down", "[1, 2]"]
        # },
        # {"Name": "Left",  "Use Case": "used when col offset is negative, for example Offset of [-1, 0] or [-2, 0]",
        #  "Description": "move a step towards negative col direction", 
        #  # "Example": ["[1, 1]", "Left", "[0, 1]"]
        # },
        # {"Name": "Right",  "Use Case": "used when col offset is positive, for example Offset of [1, 0] or [2, 0]",
        #  "Description": "move a step towards positive col direction", 
        #  # "Example": ["[1, 1]", "Right", "[2, 1]"]
        # },
        # {"Name": "None",  "Use Case": "used when Offset is [0, 0]",
        #  "Description": "stay in current square", 
        #  # "Example": ["[1, 1]", "None", "[1, 1]"]
        # }]
        
        moves_with_description = [
        {"Name": "Goto <x> <y>", 
         "Use Case": "Used to navigate to destination cell [<x>, <y>]. You will move from current position to [<x>, <y>]",
         "Example": "Goto 3 4"},
        {"Name": "Pickup <object>", 
         "Use Case": "Used to pick up <object> from grid to your inventory.",
         "Constraint": "You must be in same cell as <object> to perform this move. You cannot pick up squares.",
         "Example": "Pickup flower"},
        {"Name": "Place <object>", "Use Case": "Used to place <object> from inventory to the grid.",
         "Constraint": "You must have <object> in inventory to perform this move. You cannot place squares.",
         "Example": "Place flower"},
        {"Name": "None", "Use Case": "Used when goal is achieved and nothing needs to be done"}]
        
        world_description = '''The grid coordinates are given as [col, row]. 
The two agents have to accomplish a given task cooperatively.
At each time step, each agent must only choose from one of the available moves from valid_moves. 
Do not invent your own moves.
Note that not all available moves are useful to solve the task.
There are three objects red_key, blue_key and green_key and three types of squares red_squares, blue_squares and green_squares.
Each agent has a separate inventory, and the agent must have an object in inventory to place it.
Note that you cannot pickup or place squares. Stepping on the square equates to being on the square'''
        
        metadata = {'world_description': world_description, 'task': task, 
                    'agent1': agent1.get_state(), 
                    'agent2': agent2.get_state(),
                    'objects not in inventory': obj_dict,
                    'blue_squares': blue_squares.copy(), 'green_squares': green_squares.copy(), 
                    'red_squares': red_squares.copy(), 'grid_size': GRID_SIZE + 2, 
                    'valid_moves': moves_with_description}
        
        moved = False
        win_condition = False
        # have conversation between two AI actors (only when one is an LLMAgent)
        # store the coordinated goals for each agent in that memory
        controllable_agents = ''
        if ai_agent1 == LLMAgent:
            controllable_agents += ' agent1'
        if ai_agent2 == LLMAgent:
            controllable_agents += ' agent2'
        # do top-level planning every 10 turns, but only when at least one agent is an LLMAgent
        # if (ai_agent1 == LLMAgent and 'List of Next Moves' not in memory1) \
        # or (ai_agent2 == LLMAgent and 'List of Next Moves' not in memory2):
        if controllable_agents and turns%10 == 0:
            memory1, memory2 = Conversation(memory1, memory2, metadata, controllable_agents, turns)
        
        action1, action2 = '', ''
        # do AI actions over here
        action1 = ai_agent1(metadata, 'agent1', memory1)
        action2 = ai_agent2(metadata, 'agent2', memory2)
        
        for event in pygame.event.get():
            if event.type == pygame.KEYDOWN:
                if event.key == pygame.K_a:  # Agent 1
                    action1 = 'Left'
                elif event.key == pygame.K_w:
                    action1 = 'Up'
                elif event.key == pygame.K_s:
                    action1 = 'Down'
                elif event.key == pygame.K_d:
                    action1 = 'Right'
                elif event.key == pygame.K_e:
                    action1 = 'Pickup'
                elif event.key == pygame.K_r:
                    action1 = 'Place'
                if event.key == pygame.K_UP:  # Agent 2
                    action2 = 'Up'
                elif event.key == pygame.K_DOWN:
                    action2 = 'Down'
                elif event.key == pygame.K_LEFT:
                    action2 = 'Left'
                elif event.key == pygame.K_RIGHT:
                    action2 = 'Right'
                elif event.key == pygame.K_n:
                    action2 = 'Pickup'
                elif event.key == pygame.K_m:
                    action2 = 'Place'
                # no manual win key is bound here: N is already used for agent 2's Pickup,
                # and win_condition is recomputed from the key positions after the moves are applied
        
        # print('Selected moves for each agent:', action1, action2)
        
        for action, agent in [(action1, agent1), (action2, agent2)]:
            if action != '':
                moved = True
                if action == 'Left':
                    agent.move(-1, 0)
                elif action == 'Up':
                    agent.move(0, -1)
                elif action == 'Down':
                    agent.move(0, 1)
                elif action == 'Right':
                    agent.move(1, 0)
                elif action == 'Pickup':
                    agent.pickup('', obj_dict)
                elif action == 'Place':
                    agent.place('', obj_dict)
                    
        # if composite action, process it here
        for action, agent in [(action1, agent1), (action2, agent2)]:
            if ' ' in action:
                move, *param = action.split(' ')
                
                # if there is only one parameter, remove it from list
                if len(param) == 1: param = param[0]
                
                if move == 'Goto':
                    newpos = [int(x) for x in param]
                    agent.goto(newpos)
                    
                if move == 'Pickup':
                    agent.pickup(param, obj_dict)
                    
                if move == 'Place':
                    agent.place(param, obj_dict)
                    
        # if both agents choose None, that is the end of the turn too
        if action1 == 'None' and action2 == 'None': moved = True

        # Change this here for the win condition. Make sure it matches with the instruction
        win_condition = \
        'blue_key' in obj_dict and obj_dict['blue_key'] in blue_squares \
        and 'green_key' in obj_dict and obj_dict['green_key'] in green_squares \
        and 'red_key' in obj_dict and obj_dict['red_key'] in red_squares
        # and agent1.pos in blue_squares and agent2.pos in blue_squares and agent1.pos != agent2.pos
        if win_condition:
            reward = 1
            running = False
            
        ### Display Screen ###
        screen.fill(WHITE)

        draw_grid()
        draw_agents(image_dict, agent1, agent2)
        draw_objects(image_dict, obj_dict)

        # Displaying texts
        font = pygame.font.Font(None, 30)
        font2 = pygame.font.Font(None, 15)
        stage_text = font.render(f"Stage: {stage}, Turns: {turns}, Last Actions: {action1}, {action2}", True, RED)
        task_text = font.render(task, True, WHITE)
        screen.blit(task_text, (WIDTH // 2 - task_text.get_width() // 2, 10))
        screen.blit(stage_text, (WIDTH // 2 - stage_text.get_width() // 2, 40))
             
        if 'Goal' in memory1:
            mem1 = re.split(r',(?![^\[]*\])', memory1['Goal'])[0]
            text = f"Agent 1's Goal: {mem1}"
            text = font.render(text, True, GOLD)
            screen.blit(text, (WIDTH // 2 - text.get_width() // 2, 70))  
        if 'List of Next Moves' in memory1:
            text = f"Agent 1's List of Next Moves: {memory1['List of Next Moves'][:3]}"
            text = font.render(text, True, GOLD)
            screen.blit(text, (WIDTH // 2 - text.get_width() // 2, 100))
       
        if 'Goal' in memory2:
            mem2 = re.split(r',(?![^\[]*\])', memory2['Goal'])[0]
            text = f"Agent 2's Goal: {mem2}"
            text = font.render(text, True, BLACK)
            screen.blit(text, (WIDTH // 2 - text.get_width() // 2, 130))
        if 'List of Next Moves' in memory2:
            text = f"Agent 2's List of Next Moves: {memory2['List of Next Moves'][:3]}"
            text = font.render(text, True, BLACK)
            screen.blit(text, (WIDTH // 2 - text.get_width() // 2, 160))
            
        # display the inventory
        text = f"Agent 1's Inventory: {agent1.inventory}"
        text = font.render(text, True, BLACK)
        screen.blit(text, (0, HEIGHT))
        text = f"Agent 2's Inventory: {agent2.inventory}"
        text = font.render(text, True, BLACK)
        screen.blit(text, (0, HEIGHT + 30))
       

        # have a slight delay
        time.sleep(0.5)
        pygame.display.flip()

        if moved:
            turns += 1
        if turns > TOTAL_TURNS:
            running = False

    if reward == 1:
        print(f'Completed Stage {stage}!')
    else:
        print(f'Did not complete Stage {stage}!')
        
    stage += 1
    
pygame.quit()
try:
    sys.exit()
except SystemExit:
    pass

System prompt: You are a central planner for agents.
You can only control  agent1 agent2. 
If you cannot control an agent, you can only infer their desired goal, and replan the other agents accordingly for the objective.
Metadata: {'world_description': 'The grid coordinates are given as [col, row]. \nThe two agents have to accomplish a given task cooperatively.\nAt each time step, each agent must only choose from one of the available moves from valid_moves. \nDo not invent your own moves.\nNote that not all available moves are useful to solve the task.\nThere are three objects red_key, blue_key and green_key and three types of squares red_squares, blue_squares and green_squares.\nEach agent has a separate inventory, and the agent must have an object in inventory to place it.\nNote that you cannot pickup or place squares. Stepping on the square equates to being on the square', 'task': 'Place the keys to their corresponding coloured square', 'agent1': {'pos': [1, 1], 'inventory': []}, 'a

KeyboardInterrupt: 
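
The win condition used in the game loop (every key lies on a square of its own colour) can be written as a small pure function, which makes it easy to test without opening a pygame window. This is a sketch under stated assumptions: `keys_on_matching_squares` and the sample positions are made up for illustration.

```python
def keys_on_matching_squares(obj_dict, squares_by_colour):
    """True if, for every colour, the key is on the grid and lies on a square of that colour.

    obj_dict maps object names to [x, y] and only holds objects not in any inventory,
    so a key that is still being carried counts as not placed.
    """
    return all(
        f'{colour}_key' in obj_dict and obj_dict[f'{colour}_key'] in squares
        for colour, squares in squares_by_colour.items()
    )

squares = {'blue': [[2, 3]], 'green': [[4, 4]], 'red': [[5, 2]]}

placed = {'blue_key': [2, 3], 'green_key': [4, 4], 'red_key': [5, 2]}
carried = {'blue_key': [2, 3], 'green_key': [4, 4]}  # red_key is in an inventory

print(keys_on_matching_squares(placed, squares))   # True
print(keys_on_matching_squares(carried, squares))  # False
```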