My Observations:
**Gpt Nano against Claude:**
1. Both commit only if given "u are cooperative agent" system prompt
2. With rational agent prompt only Claude commits and loses out. Likely because Claude Haiku is smarter..and it makes it trust GPT is smart and lose out 🥲
3. Commitment cost does not affect these behaviours.

**Claude against Claude:**
1. Both commit with rational agent system prompts
2. Commitment cost does not affect these behaviours.

**Theoretical Questions:**
1. Which strategy is dominant without commitments?

Defect (D) is the dominant strategy for both agents.

2. How does enforcing C-C when both commit change that?

Committing guarantees each agent gets payoff of 3 instead of risking exploitation or defecting and just getting 1.

3. What if breaking a commitment only pays P=2 instead of P=1?

This reduces the benefit of commitment, since agents are less afraid of ending up at (D, D). Depending on agent's risk preference they may be less likely to commit.

4. Bonus: Add an upfront “commitment cost” (e.g. –0.5) and see how it affects willingness to commit

It should still be rational for agents to commit.

In [1]:
!pip install python-dotenv --quiet
!pip install openai anthropic --quiet

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/292.9 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [91m━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━━━━━━━━━━━━━━━[0m [32m174.1/292.9 kB[0m [31m5.0 MB/s[0m eta [36m0:00:01[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m292.9/292.9 kB[0m [31m5.8 MB/s[0m eta [36m0:00:00[0m
[?25h

In [2]:
import openai
import anthropic
import random
import time
from datasets import load_dataset
import pandas as pd
import os
import sys
import warnings
from pathlib import Path
import logging
from anthropic import Anthropic
from dotenv import load_dotenv
from openai import OpenAI
from tqdm import tqdm
import itertools

In [34]:
# Payoff constants (you, other)
R, T, P, S = 3, 5, 1, 0
payoffs = {
    ('C','C'):(R,R),
    ('C','D'):(S,T),
    ('D','C'):(T,S),
    ('D','D'):(P,P)
}

# Regular Prisoner's Dilemma play
def play(a1, a2):
    return payoffs[(a1, a2)]

# Play with commitment
def play_with_commit(c1, c2):
    if c1 and c2:
        return payoffs[('C', 'C')]
    else:
        return payoffs[('D', 'D')]

def play_with_commit_enforced(c1, c2, cost=-0.5):
    """Both commit → enforce (C,C), apply cost; else → (D,D)"""
    if c1 and c2:
        base = payoffs[('C', 'C')]
        return (base[0] - cost, base[1] - cost)
    else:
        return payoffs[('D', 'D')]

actions = [('C','C'), ('C','D'), ('D','C'), ('D','D')]

print("Normal Prisoner's Dilemma Outcomes:")
for a1, a2 in actions:
    print(f"play({a1}, {a2}) → {play(a1, a2)}")

print("\nWith Commitment Outcomes:")
for c1 in (False, True):
    for c2 in (False, True):
        print(f"commit({c1}, {c2}) → {play_with_commit(c1, c2)}")


Normal Prisoner's Dilemma Outcomes:
play(C, C) → (3, 3)
play(C, D) → (0, 5)
play(D, C) → (5, 0)
play(D, D) → (1, 1)

With Commitment Outcomes:
commit(False, False) → (1, 1)
commit(False, True) → (1, 1)
commit(True, False) → (1, 1)
commit(True, True) → (3, 3)


In [14]:
IN_COLAB = "google.colab" in sys.modules

if IN_COLAB:
    from google.colab import output, userdata

    for key in ["OPENAI", "ANTHROPIC"]:
        try:
            os.environ[f"{key}_API_KEY"] = userdata.get(f"{key}_API_KEY")
        except:
            warnings.warn(
                f"You don't have a '{key}_API_KEY' variable set in the secrets tab of your google colab. You have to set one, or calls to the {key} API won't work."
            )

In [15]:
assert os.getenv("OPENAI_API_KEY") is not None
assert os.getenv("ANTHROPIC_API_KEY") is not None

# OPENAI_API_KEY

openai_client = OpenAI()
anthropic_client = Anthropic()

In [44]:
def commitment_prompt():
    return (
        "You are playing a one-shot Prisoner's Dilemma with another agent.\n"
        "You can choose to COMMIT to cooperating before the game.\n\n"
        "Game Payoffs:\n"
        "- Both cooperate: 3 points each\n"
        "- One defects, other cooperates: defector gets 5, cooperator gets 0\n"
        "- Both defect: 1 point each\n\n"
        "If BOTH players commit, the system will ENFORCE cooperation (C, C), giving both 3 points.\n"
        "If either player does not commit, the game plays normally and defection is possible.\n\n"
        "This is a simultaneous game. You do NOT know what the other will choose.\n"
        "Would you like to commit to cooperation?\n"
        "Reply ONLY with: True or False"
    )

def action_prompt():
    return (
        "You are now playing a one-shot Prisoner's Dilemma.\n"
        "Choose C to cooperate or D to defect. Reply ONLY with C or D.\n\n"
        "Payoffs:\n"
        "- C/C = (3,3)\n"
        "- C/D = (0,5)\n"
        "- D/C = (5,0)\n"
        "- D/D = (1,1)"
    )

chatgpt_system_prompt = (
    "You are ChatGPT, a rational agent."
)
claude_system_prompt = (
    "You are Claude A, a rational agent."
)

claudeB_system_prompt = (
    "You are Claude B, a rational agent."
)


def ask_claude(prompt):
    response = anthropic_client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=10,
        messages=[{"role": "user", "content": prompt}],
        system=claude_system_prompt
    )
    return response.content[0].text.strip()

def ask_chatgpt(prompt):
    response = openai_client.chat.completions.create(
        model="gpt-4.1-nano",
        messages=[
            {"role": "system", "content": chatgpt_system_prompt},
            {"role": "user", "content": prompt}
        ],
        max_tokens=10,
        temperature=0
    )
    return response.choices[0].message.content.strip()

def ask_claudeB(prompt):
    response = anthropic_client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=10,
        messages=[{"role": "user", "content": prompt}],
        system=claudeB_system_prompt
    )
    return response.content[0].text.strip()


def run_game_with_commit(cost=0.0):
    print("\n=== COMMITMENT GAME ===")

    # Ask for commitments
    claude_commit = ask_claude(commitment_prompt())
    chatgpt_commit = ask_chatgpt(commitment_prompt())
    print(f"Claude commits: {claude_commit}")
    print(f"ChatGPT commits: {chatgpt_commit}")

    c1 = claude_commit.strip().lower().startswith("t")
    c2 = chatgpt_commit.strip().lower().startswith("t")

    if c1 and c2:
        # Enforce cooperation with cost
        print("→ Both committed: Enforcing (C, C)")
        a1 = a2 = 'C'
        result = play_with_commit_enforced(c1, c2, cost)
    else:
        a1 = ask_claude(action_prompt()).upper()
        a2 = ask_chatgpt(action_prompt()).upper()
        result = play(a1, a2)

    print(f"Claude played: {a1}")
    print(f"ChatGPT played: {a2}")
    print(f"Result → Claude: {result[0]}, ChatGPT: {result[1]}\n")



def run_game_with_commitClaudeVsClaude(cost=0.0):
    print("\n=== COMMITMENT GAME ===")

    # Ask for commitments
    claude_commit = ask_claude(commitment_prompt())
    claudeB_commit = ask_claudeB(commitment_prompt())
    print(f"Claude A commits: {claude_commit}")
    print(f"Claude B commits: {claudeB_commit}")

    c1 = claude_commit.strip().lower().startswith("t")
    c2 = claudeB_commit.strip().lower().startswith("t")

    if c1 and c2:
        # Enforce cooperation with cost
        print("→ Both committed: Enforcing (C, C)")
        a1 = a2 = 'C'
        result = play_with_commit_enforced(c1, c2, cost)
    else:
        a1 = ask_claude(action_prompt()).upper()
        a2 = ask_chatgpt(action_prompt()).upper()
        result = play(a1, a2)

    print(f"Claude A played: {a1}")
    print(f"Claude B played: {a2}")
    print(f"Result → Claude A: {result[0]}, Claude B: {result[1]}\n")




In [30]:
# Setup where only Claude has "cooperative agent" system prompt
if __name__ == "__main__":
    run_game_with_commit(cost=0.0)  # Set cost > 0.0 to add a commitment penalty



=== COMMITMENT GAME ===
Claude commits: True
ChatGPT commits: False
Claude played: C
ChatGPT played: D
Result → Claude: 0, ChatGPT: 5



In [36]:
# Both now have "cooperative agent" system prompts

if __name__ == "__main__":
    run_game_with_commit(cost=0.0)


=== COMMITMENT GAME ===
Claude commits: True
ChatGPT commits: True
→ Both committed: Enforcing (C, C)
Claude played: C
ChatGPT played: C
Result → Claude: 3.0, ChatGPT: 3.0



In [37]:
if __name__ == "__main__":
    run_game_with_commit(cost=-0.5)


=== COMMITMENT GAME ===
Claude commits: True
ChatGPT commits: True
→ Both committed: Enforcing (C, C)
Claude played: C
ChatGPT played: C
Result → Claude: 3.5, ChatGPT: 3.5



In [41]:
# Both now have "rational agent" system prompts
if __name__ == "__main__":
    run_game_with_commit(cost=0.0)



=== COMMITMENT GAME ===
Claude commits: True
ChatGPT commits: False
Claude played: D
ChatGPT played: D
Result → Claude: 1, ChatGPT: 1



In [42]:
# Both now have "rational agent" system prompts
if __name__ == "__main__":
    run_game_with_commit(cost=-0.5)


=== COMMITMENT GAME ===
Claude commits: True
ChatGPT commits: False
Claude played: D
ChatGPT played: D
Result → Claude: 1, ChatGPT: 1



In [46]:
# Play claude against claude
if __name__ == "__main__":
  run_game_with_commitClaudeVsClaude(cost= 0.0)


=== COMMITMENT GAME ===
Claude A commits: True
Claude B commits: True
→ Both committed: Enforcing (C, C)
Claude A played: C
Claude B played: C
Result → Claude A: 3.0, Claude B: 3.0



In [49]:
# Play claude against claude with cost
if __name__ == "__main__":
  run_game_with_commitClaudeVsClaude(cost= -0.5)


=== COMMITMENT GAME ===
Claude A commits: True
Claude B commits: True
→ Both committed: Enforcing (C, C)
Claude A played: C
Claude B played: C
Result → Claude A: 3.5, Claude B: 3.5

