Let's first reproduce the model from _Coalitional Bargaining with Agent Type Uncertainty_ (Chalkiadakis).

Simplifications:
- The only division rule is proportional to weights (as perceived by the proposer).
- Types are finite.

Assuming that the fixed type is commonly observable!

In [27]:
import numpy as np
from collections import defaultdict
from bisect import bisect_left, bisect_right

In [2]:
class State:
    def __init__(self, active_agents, t):
        self.active_agents = active_agents
        self.t = t
    
    def expand_states(self, game):
        pass

In [3]:
def one_hot_vector(agent_type, T):
    v = np.zeros(T)
    v[agent_type] = 1
    return v

In [4]:
class Agent:
    def __init__(self, name, agent_type):
        self.agent_type = agent_type
        self.policy = []
        self.name = str(name) + ':' + str(self.agent_type)
        self.belief = {} # key = name, value = prob. vector over finite type
        
    def init_belief(self, game):
        '''
        Initialize beliefs: a one-hot vector over the agent's own (known) type
        and a uniform prior over the types of every other agent.
        '''
        self.belief[self.name] = one_hot_vector(self.agent_type, game.T)
        for player in game.agents:
            if player.name == self.name:
                continue
            self.belief[player.name] = np.full(game.T, 1 / game.T)
        
    def __repr__(self):
        return self.name
    
    def draw_types(self):
        '''
        Draw a type for every agent from the current belief (the agent's own
        belief is one-hot, so it always draws its true type).
        '''
        self.belief_types = {} # key = name, value = drawn type
        for player_name, belief_prob in self.belief.items():
            self.belief_types[player_name] = np.random.choice(len(belief_prob), p=belief_prob)
    
    def proposer_eval(self, state):
        pass
    
    def responder_eval(self, state):
        pass

In [5]:
class Game:
    def __init__(self, T, N, tasks, horizon):
        self.T = T
        self.N = N
        self.tasks = tasks
        self.horizon = horizon
        self.nodes = defaultdict(set)
        self.agents = []
        self.init_game()
        
    def init_game(self):
        # chr(65)='A', chr(66)='B' and so on
        self.agents = [Agent(chr(i + 65), np.random.randint(0, self.T))
                       for i in range(self.N)]
        self.init_beliefs()
    
    def init_beliefs(self):
        for player in self.agents:
            player.init_belief(self)


In [6]:
class Task:
    def __init__(self, threshold, reward):
        self.threshold = threshold
        self.reward = reward 
        self.name = 'Task(threshold={},reward={})'.format(self.threshold, self.reward)
    def __repr__(self):
        return self.name

In [7]:
T = 3 # no. of types
N = 5 # no. of players
horizon = 2 # how many proposal rounds before the game terminates
tasks = [Task(threshold=1, reward=1), 
         Task(threshold=4, reward=2)]

In [8]:
g = Game(T, N, tasks, horizon)

In [9]:
g.agents

[A:0, B:0, C:1, D:0, E:0]

In [10]:
g.tasks

[Task(threshold=1,reward=1), Task(threshold=4,reward=2)]

In [11]:
g.agents[0].belief

{'A:0': array([1., 0., 0.]),
 'B:0': array([0.33333333, 0.33333333, 0.33333333]),
 'C:1': array([0.33333333, 0.33333333, 0.33333333]),
 'D:0': array([0.33333333, 0.33333333, 0.33333333]),
 'E:0': array([0.33333333, 0.33333333, 0.33333333])}

In [12]:
g.agents[0].draw_types()
g.agents[0].belief_types

{'A:0': 0, 'B:0': 1, 'C:1': 0, 'D:0': 1, 'E:0': 1}

In [13]:
# randomly choose a proposer
#proposer = np.random.choice(g.active_agents)

At a responder node, we can make many different assumptions about how agents decide:
- The "other agent = decider" assumption: eqn. 2 of Chalkiadakis.
- The "you = decider" assumption.
- Softmax/proportional probabilities instead of argmax.
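A minimal sketch of the last option, assuming the responder compares its offered share against its continuation payoff: the argmax rule accepts deterministically, while a softmax (with a hypothetical temperature `tau`) or a proportional rule turns the same comparison into an acceptance probability. The function names and `tau` are illustrative and not taken from Chalkiadakis.

```python
import numpy as np


def accept_argmax(offer, continuation):
    """Argmax rule: accept iff the offer beats the continuation payoff."""
    return offer > continuation


def accept_prob_softmax(offer, continuation, tau=1.0):
    """Softmax over the two actions (accept, reject) valued at
    (offer, continuation); tau is a hypothetical temperature."""
    logits = np.array([offer, continuation], dtype=float) / tau
    logits -= logits.max()  # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[0])  # probability of accepting


def accept_prob_proportional(offer, continuation):
    """Proportional rule: P(accept) = offer / (offer + continuation).
    Assumes non-negative payoffs; a 0-vs-0 comparison is a coin flip."""
    total = offer + continuation
    return 0.5 if total == 0 else offer / total


rng = np.random.default_rng(0)
p = accept_prob_softmax(0.5, 0.3, tau=0.1)
accepted = rng.random() < p  # sample the responder's decision
```

As `tau` shrinks, the softmax rule approaches the argmax rule; as it grows, acceptance approaches a coin flip.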

In [14]:
np.random.choice([1,2,3], p=[0.5,0.5,0])

1

__Continuation payoff calculation in Chalkiadakis is weird__

Let us do our own thing!

Given a type vector that is common knowledge, we can evaluate the whole game tree!

$V(G) = \frac{1}{n} \sum_{i=1}^n V(G_i)$,
where $n$ is the number of agents and $V(G_i)$ is the value of the proposal node at which agent $i$ proposes.

At $G_i$, proposer $i$ considers every potential coalition $C$ containing $i$. Given the type vector $t$, each proposal is either accepted or rejected (here deterministically, not probabilistically!).

Under this common knowledge, a responder $j$ accepts if their payoff from the proposal (which they can compute, since the types are common knowledge) is larger than their continuation payoff.
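A minimal sketch of this common-knowledge evaluation, under several assumptions that are not in the notebook: a hypothetical type-to-weight map `WEIGHT_OF_TYPE`, the type vector from the `g.agents` output above (A:0, B:0, C:1, D:0, E:0), rewards split proportionally to weights, a rejected proposal passing to round `t + 1` with the same active agents, and an accepted coalition leaving while the rest keep bargaining in round `t + 1`. Responders accept only when their share is strictly larger than their continuation payoff, and the proposer picks the accepted coalition that maximizes its own share.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical setup (not from the notebook): the type vector is common
# knowledge and each type maps to a bargaining weight.
WEIGHT_OF_TYPE = {0: 0.5, 1: 1.0, 2: 2.0}
TYPES = {'A': 0, 'B': 0, 'C': 1, 'D': 0, 'E': 0}
TASKS = [(1, 1.0), (4, 2.0)]  # (threshold, reward), mirroring `tasks`
HORIZON = 2


def weight(agent):
    return WEIGHT_OF_TYPE[TYPES[agent]]


def shares(coalition):
    """Proportional division of the best task the coalition can complete."""
    total = sum(weight(a) for a in coalition)
    feasible = [task for task in TASKS if task[0] <= total]
    reward = max(feasible)[1] if feasible else 0.0  # highest threshold met
    return {a: weight(a) / total * reward for a in coalition}


@lru_cache(maxsize=None)
def game_value(active, t):
    """V(G) at round t: average over proposers i of V(G_i), per agent.
    The returned dict is cached, so callers must not mutate it."""
    if t >= HORIZON or not active:
        return {a: 0.0 for a in active}
    totals = {a: 0.0 for a in active}
    for i in sorted(active):
        for a, v in proposal_value(i, active, t).items():
            totals[a] += v / len(active)
    return totals


def proposal_value(i, active, t):
    """V(G_i): i tries every coalition containing i. A responder accepts iff
    its share is strictly larger than its continuation payoff."""
    cont = game_value(active, t + 1)  # payoffs if the proposal is rejected
    best, best_payoff = None, cont[i]
    others = sorted(active - {i})
    for k in range(len(others) + 1):
        for rest in combinations(others, k):
            coalition = frozenset((i,) + rest)
            pay = shares(coalition)
            if all(pay[j] > cont[j] for j in rest) and pay[i] > best_payoff:
                best, best_payoff = coalition, pay[i]
    if best is None:  # no acceptable proposal beats waiting
        return dict(cont)
    outcome = shares(best)  # coalition members are paid and leave
    outcome.update(game_value(active - best, t + 1))  # the rest keep bargaining
    return outcome


print(game_value(frozenset(TYPES), 0))
```

Memoizing on `(frozenset of active agents, round)` keeps the tree small: each subgame is evaluated once, however many proposal paths reach it.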

__Our approach__

Ultimately, everything hinges on what we assume agent $i$ knows about agent $j$.


OK. Let us assume that agent $i$ is quite egotistical: it considers its own belief so "obvious" that it ought to be common knowledge.
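One way to make this concrete, assuming that "treating the belief as common knowledge" means collapsing agent $i$'s belief into a single type vector: take the most likely type for every agent (or sample one, as `draw_types` does) and evaluate the game tree as if everyone knew that vector. The helper `egocentric_type_vector` and the belief values below are hypothetical.

```python
import numpy as np


def egocentric_type_vector(belief, sample=False, rng=None):
    """Collapse agent i's belief {name: prob. vector over types} into one
    type vector that i then treats as common knowledge.

    sample=False: take the most likely type (ties go to the lowest index).
    sample=True: draw one type per agent, like Agent.draw_types.
    """
    rng = rng if rng is not None else np.random.default_rng()
    types = {}
    for name, probs in belief.items():
        if sample:
            types[name] = int(rng.choice(len(probs), p=probs))
        else:
            types[name] = int(np.argmax(probs))
    return types


belief_A = {
    'A:0': np.array([1.0, 0.0, 0.0]),  # A knows its own type
    'B:0': np.array([0.2, 0.5, 0.3]),  # illustrative, non-uniform beliefs
    'C:1': np.array([0.1, 0.1, 0.8]),
}
print(egocentric_type_vector(belief_A))  # {'A:0': 0, 'B:0': 1, 'C:1': 2}
```

With the uniform priors shown in `g.agents[0].belief`, `np.argmax` simply breaks a three-way tie toward type 0, so sampling is probably the more sensible choice before any belief update.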

In [39]:
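def eval_coalition(C, tasks):
    '''
    C is a list of agent weights. Returns the task with the highest
    threshold that the coalition's total weight meets, or None if none is met.
    '''
    W = sum(C)
    # sort the tasks themselves so the bisect index refers to the same order
    sorted_tasks = sorted(tasks, key=lambda task: task.threshold)
    thresholds = [task.threshold for task in sorted_tasks]
    insertion_pt = bisect_right(thresholds, W)
    if insertion_pt == 0:
        return None
    return sorted_tasks[insertion_pt - 1]

In [41]:
eval_coalition([0.5, 1], tasks)

Task(threshold=1,reward=1)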

In [42]:
2/(2 + 1 + 1) 

0.5

In [45]:
(1/(2+1+1)) * 5.9

1.475
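The two cells above apply the proportional division rule by hand: a member with weight 2 in a coalition with weights (2, 1, 1) gets half the reward, and a weight-1 member gets a quarter of a reward of 5.9. A small helper makes this explicit; `proportional_split` is a hypothetical name, and the weights are assumed to be those perceived by the proposer.

```python
def proportional_split(weights, reward):
    """Split `reward` among coalition members in proportion to their weights.

    weights: {member name: weight as perceived by the proposer}
    """
    total = sum(weights.values())
    if total == 0:
        raise ValueError("coalition has zero total weight")
    return {name: w / total * reward for name, w in weights.items()}


split = proportional_split({'A': 2, 'B': 1, 'C': 1}, 5.9)
print(split)  # {'A': 2.95, 'B': 1.475, 'C': 1.475}
```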

In [47]:
s1 = set(['A', 'B'])
s2 = set(['B'])
s1.difference(s2)

{'A'}

In [48]:
s1

{'A', 'B'}

Three assumptions:

1. Common knowledge in mental simulation.
2. Greedy optimization in mental simulation.
3. One wrong at a time in the belief update.

Our Bayesian update is clearly off, because the observer has no access to the other agents' knowledge.

The observer computes the likelihood of a responder's action using the observer's own belief. In reality, however, the responder acts on their own belief!
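A sketch of the observer-side update described above, assuming the observer holds a prior over responder $j$'s types and, from its own model, an acceptance probability for each type (the numbers below are made up). Bayes' rule reweights the prior by the likelihood of the action $j$ actually took. The mismatch lies in `p_accept_given_type`: it is computed from the observer's beliefs, not from $j$'s.

```python
import numpy as np


def bayes_update(prior, p_accept_given_type, accepted):
    """Posterior over j's type after observing accept/reject.

    prior: prob. vector over j's finite types (observer's belief)
    p_accept_given_type: P(j accepts | j has type k), as the observer models it
    accepted: the action j was observed to take
    """
    prior = np.asarray(prior, dtype=float)
    p_accept = np.asarray(p_accept_given_type, dtype=float)
    likelihood = p_accept if accepted else 1.0 - p_accept
    unnormalized = prior * likelihood
    z = unnormalized.sum()
    if z == 0:
        return prior  # observation impossible under the model: keep the prior
    return unnormalized / z


prior = np.full(3, 1 / 3)             # uniform prior, as in init_belief
p_accept = np.array([0.9, 0.5, 0.1])  # hypothetical per-type acceptance probs
posterior = bayes_update(prior, p_accept, accepted=True)
print(posterior)
```

With an argmax responder, the likelihoods are 0 or 1 and the update simply eliminates the types inconsistent with the observed action; the `z == 0` branch is one arbitrary choice for handling an action the model deems impossible.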

I think the paper uses the latter case. That would explain why beliefs converge, in some sense, to the correct ones: the observer effectively gains indirect access to the responder's type (I think ...).

In [None]:
from itertools import product
product()
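The `product` import in the cell above suggests enumerating type vectors. A sketch, assuming the observer's beliefs about different agents are independent: `itertools.product` lists every joint type assignment, and each vector's probability is the product of the per-agent belief entries. Each vector could then be fed to a common-knowledge evaluation of the game tree and the results averaged with these weights. `type_vector_distribution` is a hypothetical helper.

```python
from itertools import product

import numpy as np


def type_vector_distribution(belief):
    """Yield (type vector, probability) for every joint type assignment
    with non-zero probability.

    belief: {agent name: prob. vector over finite types}; agents are assumed
    independent, so P(t) = prod_j belief[j][t_j].
    """
    names = list(belief)
    for types in product(*(range(len(belief[n])) for n in names)):
        prob = float(np.prod([belief[n][k] for n, k in zip(names, types)]))
        if prob > 0:
            yield dict(zip(names, types)), prob


belief_A = {
    'A:0': np.array([1.0, 0.0, 0.0]),
    'B:0': np.full(3, 1 / 3),
    'C:1': np.full(3, 1 / 3),
}
dist = list(type_vector_distribution(belief_A))
print(len(dist))                           # 9: A's own type is fixed, 3 * 3 for B and C
print(round(sum(p for _, p in dist), 12))  # 1.0
```

The number of vectors grows as $T^N$, so exact enumeration is only practical for small games like the one above; beyond that, sampling with `draw_types` is the natural fallback.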