
In [1]:
! pip install langchain_community tiktoken langchain-openai langchainhub chromadb langchain langgraph tavily-python


In [5]:

from enum import Enum
from abc import ABC, abstractmethod
from typing import List, Tuple, Dict
import random

class Action(Enum):
    COOPERATE = "C"  # Work together for mutual benefit (stay silent with other thief)
    DEFECT = "D"     # Betray the other player for personal gain (snitch on other thief)

class PayoffMatrix:
    """
    Represents the payoff matrix for the Prisoner's Dilemma.

    Design choice: Using Enum instead of raw strings makes this more robust.
    A static type checker can catch invalid actions at development time, and IDE
    autocomplete helps prevent typos. Enum members are hashable, so tuples of them
    work directly as dictionary keys.
    """

    def __init__(self, matrix: Dict[Tuple[Action, Action], Tuple[int, int]]):
        """
        matrix format: {(player1_action, player2_action): (player1_payoff, player2_payoff)}
        Now using Action enum instead of strings for type safety.
        """
        self.matrix = matrix
        self.validate_matrix()

    def validate_matrix(self):
        """Ensure the matrix has all required combinations"""
        required_combinations = [
            (Action.COOPERATE, Action.COOPERATE),
            (Action.COOPERATE, Action.DEFECT),
            (Action.DEFECT, Action.COOPERATE),
            (Action.DEFECT, Action.DEFECT)
        ]
        for combo in required_combinations:
            if combo not in self.matrix:
                raise ValueError(f"Missing combination {combo} in payoff matrix")

    def get_payoffs(self, action1: Action, action2: Action) -> Tuple[int, int]:
        """
        Get payoffs for both players given their actions.

        Design choice: The type hints document that only Action enum values are
        expected. Python does not enforce hints at runtime, but a static checker
        can flag misuse, and an unknown key raises KeyError here.
        """
        return self.matrix[(action1, action2)]

    def display(self):
        """Display the payoff matrix in a readable format"""
        print("\nPayoff Matrix (Player1, Player2):")
        print("In Prisoner's Dilemma context:")
        print("COOPERATE = Stay silent (cooperate with other thief)")
        print("DEFECT = Snitch on the other thief")
        print("\n           Player2")
        print("         COOP   DEFECT")
        coop_coop = self.matrix[(Action.COOPERATE, Action.COOPERATE)]
        coop_def = self.matrix[(Action.COOPERATE, Action.DEFECT)]
        def_coop = self.matrix[(Action.DEFECT, Action.COOPERATE)]
        def_def = self.matrix[(Action.DEFECT, Action.DEFECT)]

        print(f"Player1 COOP  {coop_coop}   {coop_def}")
        print(f"        DEF   {def_coop}   {def_def}")
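The docstrings above argue for Enum over raw strings. A minimal sketch of what that buys in practice, assuming the same two-member `Action` enum (redeclared here so the snippet runs on its own; the `parsed` and `payoffs` names are illustrative only):

```python
from enum import Enum


class Action(Enum):
    COOPERATE = "C"
    DEFECT = "D"


# Lookup by value converts a raw string into the matching member.
parsed = Action("C")
print(parsed is Action.COOPERATE)

# An unknown value fails immediately instead of travelling on as a bad string.
try:
    Action("X")
except ValueError as exc:
    print(f"rejected: {exc}")

# Tuples of members are hashable, so they can key a payoff dictionary.
payoffs = {(Action.DEFECT, Action.COOPERATE): (5, 0)}
print(payoffs[(Action.DEFECT, Action.COOPERATE)])
```

Note that this check happens when the string is converted, not through the type hints, which Python does not enforce at runtime.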


In [6]:

class Agent(ABC):
    """
    Abstract base class for all agents.

    Design choice: Now all agent methods work with Action enum instead of strings.
    This makes the interface much cleaner and less error-prone. The type hints
    make it clear what each method expects and returns.
    """

    def __init__(self, name: str):
        self.name = name
        self.history: List[Action] = []  # Track opponent's actions
        self.my_actions: List[Action] = []  # Track my own actions
        self.total_payoff = 0

    @abstractmethod
    def decide(self) -> Action:
        """
        Return Action.COOPERATE or Action.DEFECT

        Design choice: The return type is explicitly the Action enum, so a static
        type checker can flag a subclass that returns anything else.
        """
        pass

    def update_history(self, opponent_action: Action, my_action: Action, payoff: int):
        """
        Update the agent's knowledge after each round.

        Design choice: Parameters are typed as Action enum, keeping the same
        types throughout the entire game flow.
        """
        self.history.append(opponent_action)
        self.my_actions.append(my_action)
        self.total_payoff += payoff

    def reset(self):
        """Reset agent's history and payoff"""
        self.history = []
        self.my_actions = []
        self.total_payoff = 0

class AlwaysCooperateAgent(Agent):
    """
    Agent that always cooperates (stays silent).

    Design choice: Now returns Action.COOPERATE instead of string 'C'.
    This makes the code self-documenting and type-safe.
    """

    def decide(self) -> Action:
        return Action.COOPERATE

class AlwaysDefectAgent(Agent):
    """
    Agent that always defects (snitches).

    Design choice: Returns Action.DEFECT - much clearer intent than 'D' string.
    """

    def decide(self) -> Action:
        return Action.DEFECT

class TitForTatAgent(Agent):
    """
    Agent that cooperates first, then copies opponent's last move.

    Design choice: Because the history stores Action values, copying the
    opponent's last move is a direct return with no string comparison or mapping.
    """

    def decide(self) -> Action:
        if not self.history:  # First move
            return Action.COOPERATE
        return self.history[-1]  # Copy opponent's last move

class RandomAgent(Agent):
    """
    Agent that makes random decisions.

    Design choice: Comparing random.random() against cooperation_probability
    gives a biased coin; random.choice() over the enum members would only
    support a 50/50 split.
    """

    def __init__(self, name: str, cooperation_probability: float = 0.5):
        super().__init__(name)
        self.cooperation_probability = cooperation_probability

    def decide(self) -> Action:
        if random.random() < self.cooperation_probability:
            return Action.COOPERATE
        else:
            return Action.DEFECT

class GrudgeAgent(Agent):
    """
    Agent that cooperates until opponent defects once, then always defects.

    Design choice: Checking `Action.DEFECT in self.history` is much more
    readable than checking for 'D' strings. The intent is crystal clear.
    """

    def __init__(self, name: str):
        super().__init__(name)
        self.been_betrayed = False

    def decide(self) -> Action:
        if Action.DEFECT in self.history:
            self.been_betrayed = True
        return Action.DEFECT if self.been_betrayed else Action.COOPERATE

    def reset(self):
        super().reset()
        self.been_betrayed = False

class GenerousAgent(Agent):
    """
    Agent that follows tit-for-tat but occasionally forgives defections.

    Design choice: Added this new agent to showcase how enum makes it easy
    to add complex strategies. The logic is very readable with enum comparisons.
    """

    def __init__(self, name: str, forgiveness_rate: float = 0.1):
        super().__init__(name)
        self.forgiveness_rate = forgiveness_rate

    def decide(self) -> Action:
        if not self.history:
            return Action.COOPERATE

        last_opponent_action = self.history[-1]

        # If opponent cooperated, we cooperate
        if last_opponent_action == Action.COOPERATE:
            return Action.COOPERATE

        # If opponent defected, sometimes forgive (cooperate anyway)
        if random.random() < self.forgiveness_rate:
            return Action.COOPERATE

        # Otherwise, retaliate
        return Action.DEFECT
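The agents above all follow the same pattern: subclass `Agent` and implement `decide()` from the stored opponent history. As a sketch of how another strategy might plug in, here is a hypothetical `TitForTwoTatsAgent` (not part of the notebook) that retaliates only after two consecutive defections. The base class is a trimmed copy so the snippet is self-contained.

```python
from abc import ABC, abstractmethod
from enum import Enum
from typing import List


class Action(Enum):
    COOPERATE = "C"
    DEFECT = "D"


class Agent(ABC):
    """Trimmed copy of the notebook's base class, just enough for this sketch."""

    def __init__(self, name: str):
        self.name = name
        self.history: List[Action] = []  # opponent's past actions

    @abstractmethod
    def decide(self) -> Action:
        ...


class TitForTwoTatsAgent(Agent):
    """Hypothetical strategy: defect only after two opponent defections in a row."""

    def decide(self) -> Action:
        last_two = self.history[-2:]
        if len(last_two) == 2 and all(a is Action.DEFECT for a in last_two):
            return Action.DEFECT
        return Action.COOPERATE


agent = TitForTwoTatsAgent("Patient")
agent.history = [Action.DEFECT]
after_one = agent.decide()   # a single defection is tolerated
agent.history = [Action.DEFECT, Action.DEFECT]
after_two = agent.decide()   # two in a row trigger retaliation
print(after_one, after_two)
```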


In [7]:
class Game:
    """
    Main game engine that runs the Prisoner's Dilemma simulation.

    Design choice: All game logic now works with Action enum. This makes
    the code much more maintainable and prevents runtime errors from typos.
    """

    def __init__(self, payoff_matrix: PayoffMatrix):
        self.payoff_matrix = payoff_matrix
        self.results = []

    def play_round(self, agent1: Agent, agent2: Agent) -> Dict:
        """
        Play a single round between two agents.

        Design choice: Return dictionary now contains Action enum values instead
        of strings, making it easier to process results programmatically.
        """
        action1 = agent1.decide()
        action2 = agent2.decide()

        payoff1, payoff2 = self.payoff_matrix.get_payoffs(action1, action2)

        # Update agents with the results
        agent1.update_history(action2, action1, payoff1)
        agent2.update_history(action1, action2, payoff2)

        return {
            'round': len(agent1.my_actions),
            'agent1_action': action1,
            'agent2_action': action2,
            'agent1_payoff': payoff1,
            'agent2_payoff': payoff2,
            'agent1_name': agent1.name,
            'agent2_name': agent2.name
        }

    def play_game(self, agent1: Agent, agent2: Agent, rounds: int = 10) -> List[Dict]:
        """
        Play multiple rounds between two agents.

        Design choice: Using a mapping dictionary to convert enum to readable strings
        for display. This keeps the internal logic clean while making output user-friendly.
        """
        # Reset agents before starting
        agent1.reset()
        agent2.reset()

        game_results = []

        # Mapping for display purposes
        action_descriptions = {
            Action.COOPERATE: "Stay Silent (Cooperate with other thief)",
            Action.DEFECT: "Snitch (Betray other thief)"
        }

        for round_num in range(rounds):
            result = self.play_round(agent1, agent2)
            game_results.append(result)

            # Print round results
            print(f"\nRound {round_num + 1}:")
            print(f"{agent1.name}: {action_descriptions[result['agent1_action']]} -> Payoff: {result['agent1_payoff']}")
            print(f"{agent2.name}: {action_descriptions[result['agent2_action']]} -> Payoff: {result['agent2_payoff']}")

        return game_results

    def analyze_strategy_effectiveness(self, agents: List[Agent], rounds_per_match: int = 10):
        """
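        Play a round-robin between all agents and aggregate per-strategy statistics.

        Every pair of agents plays one match of `rounds_per_match` rounds. For each
        agent this returns the total payoff, the number of matches played, the
        average payoff per match, and the average cooperation rate across matches.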
        """
        strategy_stats = {}

        for agent in agents:
            strategy_stats[agent.name] = {
                'total_payoff': 0,
                'games_played': 0,
                'cooperation_rate': 0,
                'payoff_per_game': 0
            }

        # Run all combinations
        for i in range(len(agents)):
            for j in range(i + 1, len(agents)):
                agent1, agent2 = agents[i], agents[j]

                game_results = self.play_game(agent1, agent2, rounds_per_match)

                # Update statistics
                strategy_stats[agent1.name]['total_payoff'] += agent1.total_payoff
                strategy_stats[agent1.name]['games_played'] += 1

                strategy_stats[agent2.name]['total_payoff'] += agent2.total_payoff
                strategy_stats[agent2.name]['games_played'] += 1

                # Calculate cooperation rates
                agent1_cooperations = sum(1 for action in agent1.my_actions if action == Action.COOPERATE)
                agent2_cooperations = sum(1 for action in agent2.my_actions if action == Action.COOPERATE)

                strategy_stats[agent1.name]['cooperation_rate'] += agent1_cooperations / len(agent1.my_actions)
                strategy_stats[agent2.name]['cooperation_rate'] += agent2_cooperations / len(agent2.my_actions)

        # Calculate averages
        for agent_name, stats in strategy_stats.items():
            if stats['games_played'] > 0:
                stats['payoff_per_game'] = stats['total_payoff'] / stats['games_played']
                stats['cooperation_rate'] = stats['cooperation_rate'] / stats['games_played']

        return strategy_stats

    def tournament(self, agents: List[Agent], rounds_per_match: int = 10):
        """Run a comprehensive tournament with detailed analysis."""
        print("\n" + "="*60)
        print("PRISONER'S DILEMMA TOURNAMENT")
        print("="*60)

        tournament_results = {}

        for i in range(len(agents)):
            for j in range(i + 1, len(agents)):
                agent1, agent2 = agents[i], agents[j]
                print(f"\n{agent1.name} vs {agent2.name}")
                print("-" * 40)

                game_results = self.play_game(agent1, agent2, rounds_per_match)

                match_key = f"{agent1.name} vs {agent2.name}"
                tournament_results[match_key] = {
                    'agent1_total': agent1.total_payoff,
                    'agent2_total': agent2.total_payoff,
                    'rounds': game_results
                }

                print("\nFinal Scores:")
                print(f"{agent1.name}: {agent1.total_payoff}")
                print(f"{agent2.name}: {agent2.total_payoff}")

        # Detailed analysis
        print("\n" + "="*60)
        print("STRATEGY ANALYSIS")
        print("="*60)

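        # Note: analyze_strategy_effectiveness() replays every pairing (and prints
        # each round again), so for randomized agents these statistics come from a
        # fresh set of matches rather than the ones reported above.
        strategy_stats = self.analyze_strategy_effectiveness(agents, rounds_per_match)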

        # Sort by average payoff per game
        sorted_strategies = sorted(strategy_stats.items(),
                                 key=lambda x: x[1]['payoff_per_game'],
                                 reverse=True)

        print(f"\n{'Strategy':<20} {'Avg Payoff':<12} {'Cooperation Rate':<18} {'Total Points'}")
        print("-" * 70)

        for strategy_name, stats in sorted_strategies:
            print(f"{strategy_name:<20} {stats['payoff_per_game']:<12.2f} "
                  f"{stats['cooperation_rate']:<18.2%} {stats['total_payoff']}")

        return tournament_results, strategy_stats
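To make the round mechanics concrete, the following standalone sketch replays Tit-for-Tat against Always Defect for five rounds under the classic payoffs. It uses plain functions rather than the notebook's classes; `play`, `tit_for_tat`, and `always_defect` are illustrative names, not part of the code above.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Action(Enum):
    COOPERATE = "C"
    DEFECT = "D"


C, D = Action.COOPERATE, Action.DEFECT
CLASSIC = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

# A strategy maps the opponent's history to the next action.
Strategy = Callable[[List[Action]], Action]


def tit_for_tat(opponent_history: List[Action]) -> Action:
    return opponent_history[-1] if opponent_history else C


def always_defect(opponent_history: List[Action]) -> Action:
    return D


def play(s1: Strategy, s2: Strategy, rounds: int) -> Tuple[int, int]:
    seen_by_1: List[Action] = []  # what player 1 has observed of player 2
    seen_by_2: List[Action] = []
    total1 = total2 = 0
    for _ in range(rounds):
        # Both players decide before either sees the other's move.
        a1, a2 = s1(seen_by_1), s2(seen_by_2)
        p1, p2 = CLASSIC[(a1, a2)]
        seen_by_1.append(a2)
        seen_by_2.append(a1)
        total1 += p1
        total2 += p2
    return total1, total2


totals = play(tit_for_tat, always_defect, rounds=5)
print(totals)  # (4, 9)
```

Tit-for-Tat is exploited once (0 vs 5) and then both sides score 1 per round, which gives 0 + 4 = 4 against 5 + 4 = 9.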


In [8]:
def demonstrate_custom_scenarios():
    """
    Demonstrate how different payoff matrices affect strategy effectiveness.

    Design choice: This shows the flexibility of the enum-based design.
    We can easily test how different reward structures change optimal strategies.
    """
    print(f"\n{'='*60}")
    print("CUSTOM SCENARIO: COOPERATION-FRIENDLY ENVIRONMENT")
    print("="*60)

    cooperation_friendly_matrix = create_custom_payoff_matrix()
    cooperation_friendly_matrix.display()

    agents = [
        TitForTatAgent("Tit-for-Tat"),
        AlwaysDefectAgent("Always Defect"),
        GenerousAgent("Generous", 0.3)
    ]

    game = Game(cooperation_friendly_matrix)
    results, stats = game.tournament(agents, rounds_per_match=5)


In [9]:
def create_classic_payoff_matrix() -> PayoffMatrix:
    """
    Create the classic Prisoner's Dilemma payoff matrix using Action enum.

    Design choice: Now using Action enum as keys instead of string tuples.
    This provides type safety and makes the matrix definition much clearer.
    The enum values make it obvious what each outcome represents.
    """
    return PayoffMatrix({
        (Action.COOPERATE, Action.COOPERATE): (3, 3),  # Both stay silent - reward: short sentence for both
        (Action.COOPERATE, Action.DEFECT): (0, 5),     # I stay silent, partner snitches - worst outcome for me
        (Action.DEFECT, Action.COOPERATE): (5, 0),     # I snitch, partner stays silent - best outcome for me
        (Action.DEFECT, Action.DEFECT): (1, 1)         # Both snitch - punishment: long sentence for both
    })

def create_custom_payoff_matrix() -> PayoffMatrix:
    """
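    Example of a custom payoff matrix that heavily rewards cooperation.

    Note: with a mutual-cooperation payoff (5) above the temptation payoff (3),
    defecting against a cooperator no longer pays, so this is not a strict
    Prisoner's Dilemma. It is closer to a coordination (stag hunt) game.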

    Design choice: Demonstrating how easy it is to create different game
    scenarios while maintaining the same agent logic and enum structure.
    """
    return PayoffMatrix({
        (Action.COOPERATE, Action.COOPERATE): (5, 5),  # High reward for mutual cooperation
        (Action.COOPERATE, Action.DEFECT): (0, 3),     # Reduced advantage for defection
        (Action.DEFECT, Action.COOPERATE): (3, 0),     # Reduced advantage for defection
        (Action.DEFECT, Action.DEFECT): (1, 1)         # Same punishment for mutual defection
    })
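A payoff table is conventionally called a Prisoner's Dilemma when temptation > reward > punishment > sucker's payoff, and, for the repeated game, when 2 × reward > temptation + sucker's payoff. The sketch below checks those conditions for the two matrices defined above. `is_prisoners_dilemma` is a hypothetical helper, it takes a plain dict rather than a `PayoffMatrix`, and it assumes a symmetric game by reading only player 1's payoffs.

```python
from enum import Enum
from typing import Dict, Tuple


class Action(Enum):
    COOPERATE = "C"
    DEFECT = "D"


C, D = Action.COOPERATE, Action.DEFECT
Matrix = Dict[Tuple[Action, Action], Tuple[int, int]]


def is_prisoners_dilemma(matrix: Matrix) -> bool:
    """Check T > R > P > S and 2R > T + S using player 1's payoffs only."""
    reward = matrix[(C, C)][0]      # R: both cooperate
    sucker = matrix[(C, D)][0]      # S: I cooperate, opponent defects
    temptation = matrix[(D, C)][0]  # T: I defect, opponent cooperates
    punishment = matrix[(D, D)][0]  # P: both defect
    ordered = temptation > reward > punishment > sucker
    return ordered and 2 * reward > temptation + sucker


classic = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}
custom = {(C, C): (5, 5), (C, D): (0, 3), (D, C): (3, 0), (D, D): (1, 1)}

classic_ok = is_prisoners_dilemma(classic)
custom_ok = is_prisoners_dilemma(custom)
print(classic_ok, custom_ok)  # True False
```

The custom matrix fails because its temptation payoff (3) is lower than its reward (5).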

In [12]:
def main():
    """
    Main demonstration function.

    Design choice: Comprehensive demo showing the power of enum-based design.
    The code is much more readable and maintainable than string-based approach.
    """

    print("PRISONER'S DILEMMA: THE HEIST AFTERMATH")
    print("="*50)
    print("You and your partner have been caught after a heist!")
    print("Police have you in separate rooms...")
    print()
    print("COOPERATE = Stay silent, don't rat out your partner")
    print("DEFECT = Snitch on your partner to get a better deal")

    # Create the classic payoff matrix
    payoff_matrix = create_classic_payoff_matrix()
    payoff_matrix.display()

    # Create different types of agents
    agents = [
        AlwaysCooperateAgent("Loyal Thief"),
        AlwaysDefectAgent("Backstabber"),
        TitForTatAgent("Eye-for-Eye"),
        RandomAgent("Unpredictable", 0.7),  # 70% chance to cooperate
        GrudgeAgent("Never Forgive"),
        GenerousAgent("Sometimes Forgive", 0.2)  # 20% forgiveness rate
    ]

    # Create and run the game
    game = Game(payoff_matrix)

    # Run tournament with analysis
    tournament_results, strategy_stats = game.tournament(agents, rounds_per_match=2)

    print(f"\n{'='*60}")
    print("KEY INSIGHTS")
    print("="*60)

    best_strategy = max(strategy_stats.items(), key=lambda x: x[1]['payoff_per_game'])
    most_cooperative = max(strategy_stats.items(), key=lambda x: x[1]['cooperation_rate'])

    print(f"Most Effective Strategy: {best_strategy[0]} (avg {best_strategy[1]['payoff_per_game']:.2f} points per game)")
    print(f"Most Cooperative Strategy: {most_cooperative[0]} ({most_cooperative[1]['cooperation_rate']:.1%} cooperation rate)")

    print("\nNote: In this prisoner's dilemma context:")
    print("- Higher scores = shorter prison sentences")
    print("- Cooperation = mutual loyalty between thieves")
    print("- Defection = betraying your partner for personal gain")
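Because `RandomAgent` and `GenerousAgent` draw from the `random` module, the tournament's scores can differ between runs. A minimal sketch of how seeding makes such draws repeatable, using a dedicated `random.Random` instance (the `sample_choices` helper is hypothetical and only mirrors the `random.random() < cooperation_probability` test used above):

```python
import random
from typing import List


def sample_choices(seed: int, n: int = 5, cooperation_probability: float = 0.7) -> List[str]:
    """Draw n cooperate/defect decisions from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, leaves global state untouched
    return ["C" if rng.random() < cooperation_probability else "D" for _ in range(n)]


first = sample_choices(seed=42)
second = sample_choices(seed=42)
print(first == second)  # True
```

In the notebook itself, calling `random.seed(...)` once before `main()` would have a comparable effect, since the agents use the module-level generator.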

In [13]:
if __name__ == "__main__":
    main()
    demonstrate_custom_scenarios()

PRISONER'S DILEMMA: THE HEIST AFTERMATH
You and your partner have been caught after a heist!
Police have you in separate rooms...

COOPERATE = Stay silent, don't rat out your partner
DEFECT = Snitch on your partner to get a better deal

Payoff Matrix (Player1, Player2):
In Prisoner's Dilemma context:
COOPERATE = Stay silent (cooperate with other thief)
DEFECT = Snitch on the other thief

           Player2
         COOP   DEFECT
Player1 COOP  (3, 3)   (0, 5)
        DEF   (5, 0)   (1, 1)

PRISONER'S DILEMMA TOURNAMENT

Loyal Thief vs Backstabber
----------------------------------------

Round 1:
Loyal Thief: Stay Silent (Cooperate with other thief) -> Payoff: 0
Backstabber: Snitch (Betray other thief) -> Payoff: 5

Round 2:
Loyal Thief: Stay Silent (Cooperate with other thief) -> Payoff: 0
Backstabber: Snitch (Betray other thief) -> Payoff: 5

Final Scores:
Loyal Thief: 0
Backstabber: 10

Loyal Thief vs Eye-for-Eye
----------------------------------------

Round 1:
Loyal Thief: Stay Si