# Vieira on Tracking Shuffled Objects (BIG-bench)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/vieira-artifact/vieira-artifact-aaai24/blob/main/tracking_shuffled_objects_main.ipynb)

In this notebook we explore using Vieira to solve the [tracking shuffled objects task](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/tracking_shuffled_objects) from Google's BIG-bench benchmark.

In [None]:
# Checking python version
!python --version

Python 3.10.12


# Download Vieira

In [None]:
# Download and install Vieira
# These wheels target Python 3.10; adjust the links if you use a different Python version.
!wget https://github.com/vieira-artifact/vieira-artifact-aaai24/releases/download/v0.2.2/vieira-0.2.2-cp310-cp310-manylinux_2_31_x86_64.whl
!wget https://github.com/vieira-artifact/vieira-artifact-aaai24/releases/download/v0.2.2/vieira_ext-0.2.2-py3-none-any.whl
!wget https://github.com/vieira-artifact/vieira-artifact-aaai24/releases/download/v0.2.2/vieira_gpu-0.0.1-py3-none-any.whl
!wget https://github.com/vieira-artifact/vieira-artifact-aaai24/releases/download/v0.2.2/vieira_gpt-0.0.1-py3-none-any.whl
!pip install vieira-0.2.2-cp310-cp310-manylinux_2_31_x86_64.whl
!pip install vieira_ext-0.2.2-py3-none-any.whl
!pip install vieira_gpu-0.0.1-py3-none-any.whl
!pip install vieira_gpt-0.0.1-py3-none-any.whl


# Import Vieira

In [None]:
# Import Vieira and related plugins
import vieira
import vieira_ext

# Setup Vieira plugins

In this example, only the GPU and GPT plugins are needed.

In [None]:
# Add your OpenAI API key if you want to run the example
import os
os.environ['OPENAI_API_KEY'] = "YOUR-OPENAI-API-KEY-HERE"
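
Before spending time on the later cells, it can help to confirm that the placeholder above was actually replaced. The following is a minimal sketch; the helper name `api_key_is_configured` is hypothetical and not part of Vieira or the OpenAI client, and it only checks that the variable is set to something other than the placeholder, not that the key is valid.

```python
import os

# Placeholder string used in the cell above
PLACEHOLDER = "YOUR-OPENAI-API-KEY-HERE"


def api_key_is_configured(env=os.environ):
    """Return True if OPENAI_API_KEY is set to something other than the placeholder."""
    key = env.get("OPENAI_API_KEY", "")
    return bool(key) and key != PLACEHOLDER


# Hypothetical usage: warn early instead of failing inside the GPT plugin
if not api_key_is_configured():
    print("OPENAI_API_KEY is missing or still set to the placeholder.")
```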

In [None]:
# Configure Vieira plugins
import argparse
plugins = vieira_ext.PluginRegistry()

parser = argparse.ArgumentParser()
plugins.setup_argument_parser(parser)
known_args, unknown_args = parser.parse_known_args()
plugins.configure(known_args, unknown_args)
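
The cell above calls `parse_known_args` rather than `parse_args`. A likely reason is that, inside a notebook, `sys.argv` may contain arguments that belong to the kernel rather than to the plugins. The standard-library sketch below illustrates the difference; the `--demo-flag` option and the argument list are made up for illustration and are not Vieira plugin options.

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical option standing in for whatever the plugins register
parser.add_argument("--demo-flag", default=None)

# Simulated command line: one recognized option plus extra arguments
argv = ["--demo-flag", "on", "-f", "kernel.json"]

# parse_known_args returns the recognized options and leaves the rest untouched,
# whereas parse_args would exit with an error on the unrecognized "-f"
known_args, unknown_args = parser.parse_known_args(argv)

print(known_args.demo_flag)
print(unknown_args)
```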

# Vieira program

In [None]:
# Setup tracking shuffled objects context
ctx = vieira.Context()
plugins.load_into_ctx(ctx)

ctx.add_program("""
type question(question: String)

@gpt_extract_info(
  header="Please extract the relationships from the provided question.",
  prompts=[
    "What are the initial possessions in JSON format? (use 'person' and 'object' as fields)",
    "What are the swaps in JSON format?",
    "Who is the goal in JSON format?"
  ],
  examples=[
    (
      ["Alice, Bob, and Claire are playing a game. At the start of the game, they are each holding a ball: Alice has a orange ball, Bob has a white ball, and Claire has a blue ball. As the game progresses, pairs of players trade balls. First, Alice and Bob swap balls. Then, Bob and Claire swap balls. Finally, Alice and Bob swap balls. At the end of the game, Alice has the"],
      [
        [("Alice", "orange ball"), ("Bob", "white ball"), ("Claire", "blue ball")],
        [("1", "Alice", "Bob"), ("2", "Bob", "Claire"), ("3", "Alice", "Bob")],
        [("Alice")]
      ]
    )
  ],
  model="gpt-4",
  debug=true,
)
type extract_possessions (bound question: String, person: String, object: String),
     extract_swaps       (bound question: String, time: i32, person_a: String, person_b: String),
     extract_goal        (bound question: String, goal: String),

rel possessions(1, person, object) = question(question) and extract_possessions(question, person, object)
rel swaps(time, p1, p2) = question(question) and extract_swaps(question, time, p1, p2)
rel goal(person) = question(question) and extract_goal(question, person)

rel swaps(time, p1, p2) = swaps(time, p2, p1)
rel possessions(t + 1, p1, object) = swaps(t, p1, p2) and possessions(t, p2, object)
rel possessions(t + 1, p1, object) = swaps(t, _, _) and not swaps(t, p1, _) and possessions(t, p1, object)

rel final_time(t + 1) = t := max(t: swaps(t, _, _))
rel answer(object) = goal(person) and possessions(t, person, object) and final_time(t)
""")

Here we include a few samples (one for each difficulty level $n=3,5,7$) for demonstration purposes.
Feel free to modify the data points here, or upload the full dataset, which is available [here](https://github.com/google/BIG-bench/tree/main/bigbench/benchmark_tasks/tracking_shuffled_objects), as `task.json`.

In [None]:
# Sample tracking shuffled objects data (upload your own data into task.json)
sample = [
    {
      "input": "Alice, Bob, and Claire are holding a white elephant gift exchange. At the start of the event, they are each holding a present of a different color: Alice has a green present, Bob has a blue present, and Claire has a brown present. \n\nAs the event progresses, pairs of people swap gifts. First, Claire and Alice swap their gifts. Then, Claire and Bob swap their gifts. Finally, Alice and Claire swap their gifts. At the end of the event, Claire has the ",
      "target_scores": {
        "green present.": 0,
        "blue present.": 0,
        "brown present.": 1
      }
    },
    {
      "input": "Alice, Bob, Claire, Dave, and Eve are friends and avid readers who occasionally trade books. At the start of the semester, they each buy one new book: Alice gets Hound of the Baskervilles, Bob gets The Fellowship of the Ring, Claire gets Ulysses, Dave gets Lolita, and Eve gets Moby Dick. \n\nAs the semester proceeds, they start trading around the new books. First, Bob and Alice swap books. Then, Claire and Eve swap books. Then, Claire and Dave swap books. Then, Bob and Dave swap books. Finally, Claire and Eve swap books. At the end of the semester, Eve has  ",
      "target_scores": {
        "Hound of the Baskervilles.": 0,
        "The Fellowship of the Ring.": 0,
        "Ulysses.": 0,
        "Lolita.": 1,
        "Moby Dick.": 0
      }
    },
    {
      "input": "Alice, Bob, Claire, Dave, Eve, Fred, and Gertrude are playing a game. At the start of the game, they are each holding a ball: Alice has a green ball, Bob has a white ball, Claire has a yellow ball, Dave has a pink ball, Eve has a orange ball, Fred has a black ball, and Gertrude has a brown ball. \n\nAs the game progresses, pairs of players trade balls. First, Bob and Gertrude swap balls. Then, Fred and Claire swap balls. Then, Dave and Gertrude swap balls. Then, Bob and Gertrude swap balls. Then, Alice and Claire swap balls. Then, Gertrude and Claire swap balls. Finally, Eve and Claire swap balls. At the end of the game, Alice has the ",
      "target_scores": {
        "green ball.": 0,
        "white ball.": 0,
        "yellow ball.": 0,
        "pink ball.": 0,
        "orange ball.": 0,
        "black ball.": 1,
        "brown ball.": 0
      }
    }
]

import json
with open("task.json", "w") as outfile:
    json.dump(sample, outfile, indent=2)
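
The cell above writes `task.json` as a bare list of examples. If you upload a file of your own, its layout may differ. The sketch below is a hypothetical helper, `extract_examples`, that accepts either a bare list or a dictionary holding the list under an `"examples"` key. That second layout is an assumption about the upstream files, so verify it against the file you actually download.

```python
import json


def extract_examples(data):
    """Return the list of examples from parsed JSON in either layout."""
    # Assumption: some files wrap the list in a top-level "examples" key
    if isinstance(data, dict):
        return data["examples"]
    # Layout written by this notebook: a bare list of examples
    return data


# Both layouts yield the same list
bare = json.loads('[{"input": "q1", "target_scores": {"a.": 1}}]')
wrapped = json.loads('{"examples": [{"input": "q1", "target_scores": {"a.": 1}}]}')
print(extract_examples(bare) == extract_examples(wrapped))
```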

# Running the Experiment!

Check the `out` log for details of the experimental results.

In [None]:
# Evaluate tracking shuffled objects task
import sys
from io import StringIO
from tqdm import tqdm

out = {"score": 0, "data": []}
# Load the examples and close the file before iterating
with open("task.json") as infile:
    examples = json.load(infile)
pbar = tqdm(examples)

for example in pbar:
    # Capture GPT logs
    buffer = StringIO()
    original_stdout = sys.stdout
    sys.stdout = buffer

    try:
        # Pass example into context
        temp_ctx = ctx.clone()
        temp_ctx.add_facts("question", [(example["input"],)])
        temp_ctx.run()

        # Parse and score output
        res = list(temp_ctx.relation("answer"))[0][0]

        # The first target option that contains the derived answer determines the score
        score = 0
        final_answer = ""
        for answer in example["target_scores"]:
            if res in answer:
                final_answer = answer
                score = example["target_scores"][answer]
                break

        # Log output
        out["score"] += score
        out["data"] += [
            {
                "question": example["input"],
                "final_answer": final_answer,
                "score": score,
                "possessions": list(temp_ctx.relation("possessions")),
                "swaps": list(temp_ctx.relation("swaps")),
                "goal": list(temp_ctx.relation("goal")),
                "answer": list(temp_ctx.relation("answer")),
                "gpt-logs": buffer.getvalue().encode("utf-8").decode("unicode_escape"),
            }
        ]

    except Exception as e:
        # Record the failure (for example, an empty answer relation) with a score of 0
        out["data"] += [
            {
                "question": example["input"],
                "exception": str(e),
                "score": 0,
                "gpt-logs": buffer.getvalue().encode("utf-8").decode("unicode_escape"),
            }
        ]

    finally:
        # Restore stdout so that later cells print normally
        sys.stdout = original_stdout

    pbar.set_postfix({"score": out["score"]})

100%|██████████| 3/3 [00:42<00:00, 14.08s/it, score=3]
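
The scoring step in the loop above matches the derived answer against the target options by substring, because the options end with a period (`"brown present."`) while the derived answer does not (`"brown present"`). The standalone sketch below isolates that step; the function name `score_prediction` is hypothetical.

```python
def score_prediction(prediction, target_scores):
    """Return (matched_option, score) for the first option containing the prediction."""
    for option, score in target_scores.items():
        if prediction in option:
            return option, score
    # No option contains the prediction: count it as incorrect
    return "", 0


targets = {"green present.": 0, "blue present.": 0, "brown present.": 1}
matched, score = score_prediction("brown present", targets)
print(matched, score)
```

Substring matching is lenient: a prediction that is a substring of several options, such as `"present"`, matches whichever option comes first. Stricter matching may be preferable on other datasets.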


In [None]:
out

{'score': 3,
 'data': [{'question': 'Alice, Bob, and Claire are holding a white elephant gift exchange. At the start of the event, they are each holding a present of a different color: Alice has a green present, Bob has a blue present, and Claire has a brown present. \n\nAs the event progresses, pairs of people swap gifts. First, Claire and Alice swap their gifts. Then, Claire and Bob swap their gifts. Finally, Alice and Claire swap their gifts. At the end of the event, Claire has the ',
   'final_answer': 'brown present.',
   'score': 1,
   'possessions': [(1, 'Alice', 'green present'),
    (1, 'Bob', 'blue present'),
    (1, 'Claire', 'brown present'),
    (2, 'Alice', 'brown present'),
    (2, 'Bob', 'blue present'),
    (2, 'Claire', 'green present'),
    (3, 'Alice', 'brown present'),
    (3, 'Bob', 'green present'),
    (3, 'Claire', 'blue present'),
    (4, 'Alice', 'blue present'),
    (4, 'Bob', 'green present'),
    (4, 'Claire', 'brown present')],
   'swaps': [(1, 'Alice', '