# Calculating impact factor

Tristan Miller, 4/28/2018

The impact factor of a card is a measure of how much the presence of that card in the supply affects the likelihood of gaining cards.  Specifically, I calculate the sum, over every supply pile (including the card's own), of the absolute change in that pile's "gain frequency".  In mathematical terms:

$\mathrm{Impact}(X) = \sum_Y \Big | P(\mathrm{gain}(Y)|\mathrm{supply}(X)) - P(\mathrm{gain}(Y))\Big|$

Here, X and Y are supply piles.  gain(Y) refers to the subset of games where Y was gained, and supply(X) refers to the subset of games where X is in the supply.  Since cards can only be gained in games where they are in the supply (ignoring Black Market exceptions), we have the following expressions:

$P(\mathrm{gain}(Y)|\mathrm{supply}(X)) =$ <font color='red'>$P(\mathrm{gain}(Y)|\mathrm{supply}(X) \& \mathrm{supply}(Y))$</font> <font color='blue'>$P(\mathrm{supply}(Y)|\mathrm{supply}(X))$</font>

$P(\mathrm{gain}(Y)) =$ <font color='red'>$P(\mathrm{gain}(Y)|\mathrm{supply}(Y))$</font> <font color='blue'>$P(\mathrm{supply}(Y))$</font>

In each expression, the red term is estimated from the game logs, but the blue term is calculated *a priori*, using the standard rules for randomly choosing the cards in the supply.  I do this because the game-log data are biased towards older cards, so supply frequencies measured from the logs would be skewed as well.
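
As a toy illustration of the two expressions above, here is a minimal sketch that computes Impact(X) for one card from arrays of red and blue terms indexed by Y. The function name `impact_from_terms` and all the numbers are invented for the example; they do not come from the game logs.

```python
import numpy as np

def impact_from_terms(red_cond, blue_cond, red_base, blue_base):
    """Impact(X) for one card X, given arrays indexed by Y:
    red_cond[Y]  = P(gain(Y) | supply(X) & supply(Y))   (from game logs)
    blue_cond[Y] = P(supply(Y) | supply(X))              (a priori)
    red_base[Y]  = P(gain(Y) | supply(Y))                (from game logs)
    blue_base[Y] = P(supply(Y))                          (a priori)
    """
    freq_given_x = red_cond * blue_cond   # P(gain(Y) | supply(X))
    freq = red_base * blue_base           # P(gain(Y))
    return np.sum(np.abs(freq_given_x - freq))

# Hypothetical values for Y = [X itself, another kingdom card, Copper].
k = 10 / 206                              # chance a given kingdom card is in the supply
red_cond = np.array([0.5, 0.60, 0.1])
blue_cond = np.array([1.0, k, 1.0])
red_base = np.array([0.5, 0.45, 0.2])
blue_base = np.array([k, k, 1.0])
print(impact_from_terms(red_cond, blue_cond, red_base, blue_base))
```

The first term shows why a card's own pile counts toward its impact factor: conditioning on supply(X) raises X's blue term from 10/206 to 1, even though its red term is unchanged.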

I also tried some other variants, which can be found in the code.  For example, rather than the probability that a card was gained, I looked at the average number of copies gained; I was less satisfied with this measure.  Another community member inspired the related idea of the synergy factor, which is explained further below.

In [1]:
#packages
import pickle

import numpy as np
import pandas as pd

In [2]:
#open pickles
with open('gain_matrices.pkl','rb') as f:
    num_games, game_gains, total_gains = pickle.load(f)
with open('card_list.pkl','rb') as f:
    card_list,card_dict = pickle.load(f)

In [8]:
#This is the a priori coding of the blue terms.

#card_weights[row, col] is the probability that [col] is in the kingdom given that [row] is.
card_weights = np.zeros((len(card_list),len(card_list)))
#Each of the 206 kingdom cards has roughly a 10/206 chance of being in a given kingdom,
#and I treat these chances as independent of one another
card_weights += 10/206
#Some cards are in every kingdom
card_weights[:,card_dict['Copper']] = 1
card_weights[:,card_dict['Silver']] = 1
card_weights[:,card_dict['Gold']] = 1
card_weights[:,card_dict['Estate']] = 1
card_weights[:,card_dict['Duchy']] = 1
card_weights[:,card_dict['Province']] = 1
card_weights[:,card_dict['Curse']] = 1
#Colony and Platinum are 2.5 times as common (and I won't worry about the correlation with Prosperity cards)
card_weights[:,card_dict['Colony']] *= 2.5
card_weights[:,card_dict['Platinum']] *= 2.5
card_weights[card_dict['Colony'],card_dict['Platinum']] = 1
card_weights[card_dict['Platinum'],card_dict['Colony']] = 1
#Ruins are three times as common, and guaranteed when a looter is in the kingdom
card_weights[:,card_dict['Ruins']] *= 3
card_weights[card_dict['Death Cart'],card_dict['Ruins']] = 1
card_weights[card_dict['Marauder'],card_dict['Ruins']] = 1
card_weights[card_dict['Cultist'],card_dict['Ruins']] = 1
card_weights[card_dict['Ruins'],card_dict['Death Cart']] = 1/3
card_weights[card_dict['Ruins'],card_dict['Marauder']] = 1/3
card_weights[card_dict['Ruins'],card_dict['Cultist']] = 1/3
#Potions are 9 times as common, and guaranteed if one of the potion cards is in the kingdom
card_weights[:,card_dict['Potion']] *= 9
potion_cards = ['Transmute','Scrying Pool','University','Apothecary','Familiar','Alchemist',"Philosopher's Stone",'Golem','Possession']
for card in potion_cards:
    card_weights[card_dict[card],card_dict['Potion']] = 1
    card_weights[card_dict['Potion'],card_dict[card]] = 1/9
#Finally, every card is guaranteed to appear in a game with itself
np.fill_diagonal(card_weights,1)
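
The reverse entries for Ruins and Potion (1/3 and 1/9) follow from Bayes' rule applied to the weights above. This treats P(supply(Ruins)) as exactly three times the single-card rate, the same approximation the cell makes. For Death Cart, one of the three Looters:

$$P(\mathrm{supply}(\mathrm{Death\ Cart})|\mathrm{supply}(\mathrm{Ruins})) = \frac{P(\mathrm{supply}(\mathrm{Ruins})|\mathrm{supply}(\mathrm{Death\ Cart}))\,P(\mathrm{supply}(\mathrm{Death\ Cart}))}{P(\mathrm{supply}(\mathrm{Ruins}))} = \frac{1 \cdot \frac{10}{206}}{3 \cdot \frac{10}{206}} = \frac{1}{3}$$

The same calculation with the factor of 9 gives the Potion entries, e.g. for Golem:

$$P(\mathrm{supply}(\mathrm{Golem})|\mathrm{supply}(\mathrm{Potion})) = \frac{1 \cdot \frac{10}{206}}{9 \cdot \frac{10}{206}} = \frac{1}{9}$$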

In [9]:
#element-wise division: the red terms, P(gain(col) | supply(row) & supply(col))
prc_gains = game_gains / num_games
avg_gains = total_gains / num_games

#Apply the a priori weighting: multiply each entry by card_weights[row, col], the blue term.
#The diagonal weight is 1, so a card's own gain frequency is left unscaled.
#(To try the calculation without weights, use prc_gains and avg_gains directly.)
impact_prc = prc_gains * card_weights
impact_avg = avg_gains * card_weights

#Copper is in every supply, so its row holds the unconditional gain frequency of each card.
#Subtract that row from every row.
copper_prc = impact_prc[card_dict['Copper'],:].copy()
copper_avg = impact_avg[card_dict['Copper'],:].copy()
impact_prc -= copper_prc
impact_avg -= copper_avg

#Finally, calculate the impact factor for each card as a sum of absolute differences
#card_impact_prc = np.sum(impact_prc * impact_prc,axis = 1) ** 0.5 #problem: this squares the weight too
#card_impact_avg = np.sum(impact_avg * impact_avg,axis = 1) ** 0.5
card_impact_prc = np.sum(np.abs(impact_prc),axis = 1)
card_impact_avg = np.sum(np.abs(impact_avg),axis = 1)
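
To make the matrix bookkeeping concrete, here is a self-contained toy version of the cell above. It assumes, as the cell appears to, that row i of each matrix is conditioned on card i being in the supply, and it uses index 0 as a stand-in for Copper (present in every game). All counts and weights are invented.

```python
import numpy as np

# Hypothetical counts for three piles; index 0 stands in for Copper (always in the supply).
# num_games[i, j]: games with both i and j in the supply
# game_gains[i, j]: how many of those games had j gained
num_games = np.array([[100., 10., 10.],
                      [10., 10., 2.],
                      [10., 2., 10.]])
game_gains = np.array([[70., 5., 4.],
                       [6., 5., 2.],
                       [9., 1., 4.]])
# A priori P(supply(col) | supply(row)); the diagonal is 1.
card_weights = np.array([[1., 0.1, 0.1],
                         [1., 1., 0.1],
                         [1., 0.1, 1.]])

prc_gains = game_gains / num_games           # red terms
impact = prc_gains * card_weights            # red times blue: P(gain(col) | supply(row))
base = impact[0, :].copy()                   # Copper row: unconditional P(gain(col))
impact -= base
card_impact = np.sum(np.abs(impact), axis=1)
print(card_impact)
```

The Copper stand-in gets an impact factor of zero by construction, since its row is subtracted from itself.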

In [10]:
#want to estimate error margins
prc_margin = np.sum( card_weights ** 2 * prc_gains ** 2 / num_games , axis = 1) ** 0.5
avg_margin = np.sum( card_weights ** 2 * avg_gains ** 2 / num_games , axis = 1) ** 0.5
print(np.mean(prc_margin))
print(np.mean(avg_margin))

0.020445697077629766
0.06721717651659934


In [11]:
#sort cards by impact factor
cards_sorted_prc = sorted(card_list , key = lambda card: -card_impact_prc[card_dict[card]])
cards_sorted_avg = sorted(card_list , key = lambda card: -card_impact_avg[card_dict[card]])
sorted_impact = sorted(card_impact_prc, reverse=True)

cards_sorted_prc is based on gain frequency, while cards_sorted_avg is based on the average number of copies of each card gained.  After looking at both lists, I decided that cards_sorted_prc is more meaningful.  cards_sorted_avg places Ill-Gotten Gains at the top, which makes sense since it leads to a lot of Coppers and Curses being gained, but most people wouldn't consider it stronger than Mountebank.

In [12]:
#use pandas to print out the list
table = pd.DataFrame({'Rank':range(1,len(cards_sorted_prc)+1),'Card':cards_sorted_prc,'Impact':sorted_impact})
table = table.set_index('Rank')
print(table)

                     Card    Impact
Rank                               
1                 Rebuild  2.973071
2              Mountebank  2.550012
3                   Goons  2.459193
4                 Cultist  2.425984
5        Ill-Gotten Gains  2.378828
6              Ambassador  2.359535
7                Governor  2.228160
8              University  2.186465
9            Scrying Pool  2.168881
10             Tournament  2.163019
11               Familiar  2.014834
12                 Minion  2.002180
13               Swindler  1.999885
14        JackOfAllTrades  1.988608
15                 Chapel  1.956012
16             Masquerade  1.944556
17                 Colony  1.936160
18               Platinum  1.936160
19             Soothsayer  1.868006
20        Fishing Village  1.835029
21            Fool's Gold  1.811470
22                  Wharf  1.803391
23               Marauder  1.802965
24                  Witch  1.800978
25                Sea Hag  1.753880
26               Torturer  1

In [7]:
#make a table that can be posted to f.ds
#table_string = '[table][tr][td]Rank[/td][td]Card[/td][td]Impact[/td][/tr]\n'
#for i in range(217):
#    table_string += '[tr][td]%i[/td][td]%s[/td][td]%.2f[/td][/tr]\n' % (i+1,cards_sorted_prc[i],sorted_impact[i])
#table_string += '[/table]'

#2 column version
table_string = '[table][tr][td]Rank[/td][td]Card[/td][td]Impact[/td][td]         [/td][td]Rank[/td][td]Card[/td][td]Impact[/td][/tr]\n'
for i in range(109):
    table_string += '[tr][td]%i[/td][td]%s[/td][td]%.2f[/td]' % (i+1,cards_sorted_prc[i],sorted_impact[i])
    table_string += '[td][/td]'
    if i < 108:
        table_string += '[td]%i[/td][td]%s[/td][td]%.2f[/td][/tr]\n' % (i+110,cards_sorted_prc[i+109],sorted_impact[i+109])

table_string += '[/tr][/table]'
print(table_string)

[table][tr][td]Rank[/td][td]Card[/td][td]Impact[/td][td]         [/td][td]Rank[/td][td]Card[/td][td]Impact[/td][/tr]
[tr][td]1[/td][td]Rebuild[/td][td]2.97[/td][td][/td][td]110[/td][td]Throne Room[/td][td]1.03[/td][/tr]
[tr][td]2[/td][td]Mountebank[/td][td]2.55[/td][td][/td][td]111[/td][td]Cellar[/td][td]1.02[/td][/tr]
[tr][td]3[/td][td]Goons[/td][td]2.46[/td][td][/td][td]112[/td][td]Market[/td][td]1.01[/td][/tr]
[tr][td]4[/td][td]Cultist[/td][td]2.43[/td][td][/td][td]113[/td][td]Stonemason[/td][td]1.01[/td][/tr]
[tr][td]5[/td][td]Ill-Gotten Gains[/td][td]2.38[/td][td][/td][td]114[/td][td]Ghost Ship[/td][td]1.00[/td][/tr]
[tr][td]6[/td][td]Ambassador[/td][td]2.36[/td][td][/td][td]115[/td][td]Remodel[/td][td]1.00[/td][/tr]
[tr][td]7[/td][td]Governor[/td][td]2.23[/td][td][/td][td]116[/td][td]Oasis[/td][td]0.99[/td][/tr]
[tr][td]8[/td][td]University[/td][td]2.19[/td][td][/td][td]117[/td][td]Vagrant[/td][td]0.99[/td][/tr]
[tr][td]9[/td][td]Scrying Pool[/td][td]2.17[/td][td][/td][td]118[/td
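
The forum-table cell hard-codes 217 cards split 109/108. A length-independent version might look like the following sketch; `two_column_bbcode` is a hypothetical helper, not part of the notebook, and it emits the same tags as the cell above.

```python
def two_column_bbcode(names, values):
    """Format ranked (name, value) pairs as a two-column BBCode table."""
    n = len(names)
    half = (n + 1) // 2  # the left column gets the extra row when n is odd

    def cells(i):
        # One rank/name/value triple; ranks are 1-based.
        return '[td]%i[/td][td]%s[/td][td]%.2f[/td]' % (i + 1, names[i], values[i])

    header = '[td]Rank[/td][td]Card[/td][td]Impact[/td]'
    out = '[table][tr]' + header + '[td]         [/td]' + header + '[/tr]\n'
    for i in range(half):
        row = cells(i) + '[td][/td]'
        if i + half < n:
            row += cells(i + half)
        out += '[tr]' + row + '[/tr]\n'
    return out + '[/table]'

print(two_column_bbcode(['Rebuild', 'Mountebank', 'Goons'], [2.973071, 2.550012, 2.459193]))
```

For 217 cards, (217 + 1) // 2 = 109, which reproduces the 109/108 split used above.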

# Synergy factor
The synergy factor is an alternative metric that measures how much a card affects gain percentages rather than gain frequencies.  The difference between these two concepts is that "gain frequency" looks at all games, whereas "gain percentage" looks only at games where the card is available.  In mathematical terms:

$\mathrm{Synergy}(X) = \sum_Y \Big| P(\mathrm{gain}(Y)|\mathrm{supply}(X)\&\mathrm{supply}(Y)) - P(\mathrm{gain}(Y)|\mathrm{supply}(Y))\Big|$

In other words, we drop the blue terms from the equations at the top of the notebook.  The main effect is that the weight of non-kingdom cards, whose blue term is 1 for the base cards, is greatly reduced.  Note also that the synergy factor includes no contribution from the card's *own* gain percentage: for $Y = X$ both terms equal $P(\mathrm{gain}(X)|\mathrm{supply}(X))$, so the difference is zero.  This is a qualitatively distinct concept that, in my opinion, does not quite match the concept of "card strength".  But it is still useful, and it's what I use to classify card types in my later PCA analysis.
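
Dropping the blue terms can be sketched on a toy matrix of gain percentages. The numbers are invented, index 0 again stands in for Copper, and the notebook additionally zeroes the non-kingdom columns, which this sketch omits. The diagonal cancels automatically, because conditioning on supply(X) twice is the same as conditioning on it once.

```python
import numpy as np

# Hypothetical gain percentages: prc[i, j] = P(gain(j) | supply(i) & supply(j)).
# Row 0 stands in for Copper, so it holds P(gain(j) | supply(j)).
prc = np.array([[0.7, 0.5, 0.4],
                [0.6, 0.5, 1.0],
                [0.9, 0.5, 0.4]])

diff = prc - prc[0, :]                  # red term minus base red term, no blue weights
synergy = np.sum(np.abs(diff), axis=1)
print(np.diag(diff))                    # a card's own term is always zero
print(synergy)
```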

In [9]:
#open pickles if not already done
with open('gain_matrices.pkl','rb') as f:
    num_games, game_gains, total_gains = pickle.load(f)
with open('card_list.pkl','rb') as f:
    card_list,card_dict = pickle.load(f)

In [25]:
#Only kingdom cards should contribute to the synergy factor.  card_excluder is broadcast
#across columns, so multiplying by it zeroes the non-kingdom columns (the Y terms in the sum).
card_excluder = np.ones(len(card_list))
non_kingdom_cards = ['Copper','Silver','Gold','Estate','Duchy','Province','Curse','Colony','Platinum','Ruins','Potion','Prince','Walled Village']

for card in non_kingdom_cards:
    card_excluder[card_dict[card]] = 0

In [26]:
#element-wise division
prc_gains = game_gains / num_games

#Exclude non-kingdom cards by using card_excluder
synergies = prc_gains * card_excluder

#Now from each row, subtract the vector from the average game
base_gain = synergies[card_dict['Copper'],:].copy()
synergies -= base_gain

#Finally, calculate the synergy factor for each card
syn_factor = np.sum(abs(synergies),axis = 1)
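
The exclusion relies on NumPy broadcasting: multiplying an (n, n) matrix by a length-n vector scales each column j by the vector's j-th entry. A minimal demonstration on a made-up 3x3 matrix, with the row-wise alternative for contrast:

```python
import numpy as np

m = np.arange(1.0, 10.0).reshape(3, 3)
excluder = np.array([1.0, 0.0, 1.0])     # exclude pile 1

# A 1-D vector broadcasts along the last axis, so this zeroes column 1.
print(m * excluder)

# Zeroing row 1 instead would need the vector reshaped into a column.
print(m * excluder[:, None])
```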

In [27]:
#want to estimate error margins in the synergy factor
syn_margin = np.sum( (prc_gains * card_excluder) ** 2 / num_games , axis = 1) ** 0.5
print(np.mean(syn_margin))

#And the error margins in any particular synergy strength
print(np.mean(prc_gains) / 600**0.5)

0.31180894049892066
0.0202347400204194


In [28]:
#sort cards by synergy factor
#And remove the base cards and potion.  I don't care about those
cards_sorted_syn = sorted(card_list , key = lambda card: -syn_factor[card_dict[card]])[:210]
sorted_syn = sorted(syn_factor, reverse=True)[:210]

In [29]:
#find top two synergies and top anti-synergy
syn_card_1 = ['']*len(sorted_syn)
syn_card_2 = ['']*len(sorted_syn)
syn_card_3 = ['']*len(sorted_syn)
syn_factor_1 = [0.0]*len(sorted_syn)
syn_factor_2 = [0.0]*len(sorted_syn)
syn_factor_3 = [0.0]*len(sorted_syn)

for i, card in enumerate(cards_sorted_syn):
    syn_contributions = synergies[card_dict[card],:]
    #sorted_contributors = sorted(card_list, key = lambda card: -abs(syn_contributions[card_dict[card]]))
    #sorted_contributions = sorted(syn_contributions, key = lambda contribution: -abs(contribution))
    sorted_contributors = sorted(card_list, key = lambda card: -syn_contributions[card_dict[card]])
    sorted_contributions = sorted(syn_contributions, key = lambda contribution: -contribution)
    syn_card_1[i] = sorted_contributors[0]
    syn_card_2[i] = sorted_contributors[1]
    syn_card_3[i] = sorted_contributors[-1]
    syn_factor_1[i] = sorted_contributions[0]
    syn_factor_2[i] = sorted_contributions[1]
    syn_factor_3[i] = sorted_contributions[-1]
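
The same three picks can be made with one `np.argsort` per row instead of sorting names and values separately. This sketch assumes a list of names whose position i labels entry i of the row; the notebook looks indices up through card_dict, so check that card_list is ordered the same way before swapping it in. `top_synergies` is a hypothetical helper name.

```python
import numpy as np

def top_synergies(row, names):
    """Return (name, value) pairs for the largest, second-largest and smallest
    entries of one row of synergy contributions.  Assumes names[i] labels row[i]."""
    order = np.argsort(-row, kind='stable')  # largest contribution first
    return tuple((names[k], float(row[k])) for k in (order[0], order[1], order[-1]))

# Toy row: the first three values are rounded from Rebuild's row in the forum
# table output of this section; the Duke value is invented.
names = ['Feodum', 'Feast', 'Laboratory', 'Duke']
row = np.array([0.25, 0.24, -0.34, 0.10])
print(top_synergies(row, names))
```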

In [30]:
#use pandas to print out the list
table = pd.DataFrame({'Rank':range(1,211),'Card':cards_sorted_syn,'Synergy':sorted_syn,'1st synergy':syn_card_1,'syn1 strength':syn_factor_1})
table = table.set_index('Rank')
print(table)

           1st synergy                Card    Synergy  syn1 strength
Rank                                                                
1               Feodum             Rebuild  25.800888       0.245498
2         Treasure Map              Chapel  16.934039       0.293513
3               Feodum     JackOfAllTrades  16.902519       0.207735
4                 Rats            Governor  16.836481       0.185476
5           Death Cart             Cultist  16.624026       0.180185
6           Possession          Masquerade  16.598802       0.146210
7                 Bank               Wharf  16.584568       0.184255
8              Library          University  16.130394       0.302616
9               Trader    Ill-Gotten Gains  15.792083       0.197256
10          Lighthouse              Minion  15.217949       0.183729
11              Quarry               Goons  14.252552       0.224409
12               Envoy     Fishing Village  13.790295       0.283946
13           Storeroom         Foo

In [16]:
#make a table that can be posted to f.ds
table_string = '[table][tr][td]Rank[/td][td]Card[/td][td]Synergy Factor[/td][td]Top synergy[/td][td]          [/td][td]2nd synergy[/td][td]          [/td][td]Top anti-synergy[/td][td]          [/td][/tr]\n'
for i in range(len(cards_sorted_syn)):
    table_string += '[tr][td]%i[/td][td]%s[/td][td]%.1f[/td]' % (i+1,cards_sorted_syn[i],sorted_syn[i])
    table_string += '[td]%s[/td][td]%.2f[/td][td]%s[/td][td]%.2f[/td][td]%s[/td][td]%.2f[/td][/tr]' % (syn_card_1[i],syn_factor_1[i],syn_card_2[i],syn_factor_2[i],syn_card_3[i],syn_factor_3[i])
    #table_string += '[td]%.2f[/td][td]%s[/td][td]%.2f[/td][td]%s[/td][td]%.2f[/td][td]%s[/td][/tr]' % (syn_factor_1[i],syn_card_1[i],syn_factor_2[i],syn_card_2[i],syn_factor_3[i],syn_card_3[i])
#every row is already closed with [/tr] inside the loop, so only the table needs closing
table_string += '[/table]'
print(table_string)

[table][tr][td]Rank[/td][td]Card[/td][td]Synergy Factor[/td][td]Top synergy[/td][td]          [/td][td]2nd synergy[/td][td]          [/td][td]Top anti-synergy[/td][td]          [/td][/tr]
[tr][td]1[/td][td]Rebuild[/td][td]25.8[/td][td]Feodum[/td][td]0.25[/td][td]Feast[/td][td]0.24[/td][td]Laboratory[/td][td]-0.34[/td][/tr][tr][td]2[/td][td]Chapel[/td][td]16.9[/td][td]Treasure Map[/td][td]0.29[/td][td]Market Square[/td][td]0.14[/td][td]Lookout[/td][td]-0.42[/td][/tr][tr][td]3[/td][td]JackOfAllTrades[/td][td]16.9[/td][td]Feodum[/td][td]0.21[/td][td]Duke[/td][td]0.13[/td][td]Sea Hag[/td][td]-0.30[/td][/tr][tr][td]4[/td][td]Governor[/td][td]16.8[/td][td]Rats[/td][td]0.19[/td][td]Militia[/td][td]0.18[/td][td]Bandit Camp[/td][td]-0.34[/td][/tr][tr][td]5[/td][td]Cultist[/td][td]16.6[/td][td]Death Cart[/td][td]0.18[/td][td]Trader[/td][td]0.16[/td][td]Rabble[/td][td]-0.27[/td][/tr][tr][td]6[/td][td]Masquerade[/td][td]16.6[/td][td]Possession[/td][td]0.15[/td][td]Bandit Camp[/td][td]0.12[/td][td]

# Streamlining analysis
I created a Python module that streamlines all the above calculations.  Below, I run it again and save the output to CSV files.

In [17]:
#reload module
import sys
if 'analysis_functions' in sys.modules:
    del sys.modules['analysis_functions']
from analysis_functions import *
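
Deleting the entry from `sys.modules` forces the next import to re-execute the module. The standard library's `importlib.reload` does the same job for a module that is already imported. A sketch with a throwaway module written to a temporary directory (the module name `demo_mod` is made up for the example):

```python
import importlib
import pathlib
import sys
import tempfile

# Write a throwaway module (hypothetical name demo_mod) into a temporary directory.
tmp = pathlib.Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp))
(tmp / 'demo_mod.py').write_text('VERSION = 1\n')

import demo_mod
print(demo_mod.VERSION)

# Edit the source, then re-execute it in place.  The new text has a different size,
# so a cached .pyc from the first import cannot be mistaken for up to date.
(tmp / 'demo_mod.py').write_text('VERSION = 2  # edited\n')
importlib.reload(demo_mod)
print(demo_mod.VERSION)
```

In the notebook, the equivalent would be `import analysis_functions`, then `importlib.reload(analysis_functions)`, followed by the same `from analysis_functions import *`.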

In [10]:
impact = get_impact()
impact.head()

                  Card    metric
Rank
1              Rebuild  2.973071
2           Mountebank  2.550012
3                Goons  2.459193
4              Cultist  2.425984
5     Ill-Gotten Gains  2.378828


In [13]:
synergy = get_synergy()
synergy.head()

                 Card     metric
Rank
1             Rebuild  25.800888
2              Chapel  16.934039
3     JackOfAllTrades  16.902519
4            Governor  16.836481
5             Cultist  16.624026


In [14]:
#also, calculate the "total gains" impact factor
num_games, game_gains, total_gains, card_list, card_dict = init_data()
card_weights = get_card_weights(card_list,card_dict)
card_impact_alt = calculate_impact(total_gains,num_games,card_weights,card_dict)
impact_alt = sort_metric( card_impact_alt, card_list, card_dict )
impact_alt.head()

                  Card     metric
Rank
1     Ill-Gotten Gains  11.836082
2              Rebuild  11.754391
3              Cultist  11.517386
4             Governor  11.509131
5           Mountebank  11.335032


In [16]:
impact.to_csv('impact_rankings.csv')
synergy.to_csv('Synergy_rankings.csv')
impact_alt.to_csv('total_gain_impact_rankings.csv')
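
Each ranking is a DataFrame indexed by Rank, so reading a saved file back with pandas needs `index_col` to restore that index. A sketch using an in-memory buffer and the first rows of the impact table above, rather than the real CSV files:

```python
import io

import pandas as pd

# First rows of the impact ranking, rebuilt by hand for the example.
impact = pd.DataFrame({'Rank': [1, 2, 3],
                       'Card': ['Rebuild', 'Mountebank', 'Goons'],
                       'metric': [2.973071, 2.550012, 2.459193]}).set_index('Rank')

buf = io.StringIO()
impact.to_csv(buf)          # writes the Rank index as the first column
buf.seek(0)
restored = pd.read_csv(buf, index_col='Rank')
print(restored.equals(impact))
```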