# Average the predictions

Researchers provided predictions about which social variables would correlate with novelty and which would predict precocity. Here we average those predictions and measure how much they vary.

In [1]:
import pandas as pd
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

### The social variables we're talking about

I've provided the complete list, but the six variables that matter for the rankings below (2, 3, 7, 8, 9, and 10) are in bold.

1. Books that got the most reviews.
2. **Books about which most was written (adding up the length of each review).**
3. **Books more positively reviewed.**
4. Books reviewed by specific publications (we can itemize, say, 10 leading publications in the BRD).
5. Books widely reviewed by little magazines; this is one way of defining an "avant garde."
6. Books published by particularly prestigious publishers (e.g., Knopf).
7. **Books that won Pulitzer/Nobel prizes.**
8. **Bestsellers (the top 10 per year from Unsworth’s list).**
9. **We can use principal component analysis on the whole reception matrix, and then “rotate” the components to find one that tends to distinguish wide-circulation venues (like newspapers) from little magazines. This is another way of defining “avant garde,” and arguably better than the absolute count in (5) at identifying books that get relatively more attention in intellectual venues than in mass-market ones.**
10. **A retrospective definition of the early-20c avant-garde extracted from recent 21c secondary sources by David Bishop and Liza Senatorova.**


### Predictions about novelty

We each created a ranked list that predicted how well these variables would correlate with novelty.

In [2]:
novelty = dict()
novelty['d'] = [9, 3, 2, 7, 8, 10]
novelty['l'] = [10, 9, 7, 2, 8, 3]
novelty['t'] = [9, 10, 8, 2, 7, 3]
novelty['w'] = [10, 9, 2, 7, 3, 8]
novelty['y'] = [10, 9, 8, 7, 2, 3]

### Convert into a list of ranks

Each researcher gave us a ranked list of items. Since the item numbers are only labels, we need to convert each list into a list of ranks before averaging.

For each (real) item, we want to know: what rank did it have in this list?

Note that this is not the same as asking, "for each position in the list, which item from the arbitrary order 2, 3, 7, 8, 9, 10 appears there?" Since the original ordering of those items was arbitrary, a number representing "order in the original list" has no substantive meaning.

We will use that arbitrary order (2, 3, 7, etc.) to *organize* our lists, because they need some fixed order in common. But the numbers we're interested in, and want to average, are the *rankings* provided by the five of us. Those are the numbers with a substantive meaning ("how strongly will this item correlate with novelty?").

In [3]:
def rankedlisttolistofranks(adict):
    
    ''' Converts a dictionary where each value is a
    researcher-specific ordered list of six items
    into a dictionary where each value is a list aligned with the
    masterlist [2, 3, 7, 8, 9, 10]
    and each element i of these new lists reports the rank of
    masterlist[i] in the researcher's ordered list.'''
    
    #sanity check
    assert len(adict) == 5
    
    masterlist = [2, 3, 7, 8, 9, 10]
    
    newdict = dict()
    for k, v in adict.items():
        assert len(v) == 6   # sanity check  
        ranklist = []
        for item in masterlist:
            ranklist.append(v.index(item))
        newdict[k] = ranklist
    return newdict

Using this function we can convert our ranked lists into lists of ranks.

In [4]:
noveltyranks = rankedlisttolistofranks(novelty)
noveltyranks

{'d': [2, 1, 3, 4, 0, 5],
 'l': [3, 5, 2, 4, 1, 0],
 't': [3, 5, 4, 2, 0, 1],
 'w': [2, 4, 3, 5, 1, 0],
 'y': [4, 5, 3, 2, 1, 0]}
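The two representations are inverse permutations of each other, so one way to sanity-check the conversion is to go back: ordering the masterlist by a researcher's ranks should recover that researcher's original ranked list. A minimal, self-contained sketch, assuming the same `novelty` lists and masterlist order as above; the helper `listofrankstorankedlist` is hypothetical, written only for this check.

```python
import numpy as np

masterlist = [2, 3, 7, 8, 9, 10]

# Ranked lists as entered above
novelty = {
    'd': [9, 3, 2, 7, 8, 10],
    'l': [10, 9, 7, 2, 8, 3],
    't': [9, 10, 8, 2, 7, 3],
    'w': [10, 9, 2, 7, 3, 8],
    'y': [10, 9, 8, 7, 2, 3],
}
# Same conversion as rankedlisttolistofranks, as a dict comprehension
noveltyranks = {k: [v.index(item) for item in masterlist] for k, v in novelty.items()}


def listofrankstorankedlist(ranks):
    '''Hypothetical inverse of the conversion for one researcher:
    np.argsort returns masterlist positions ordered from rank 0 upward.'''
    order = np.argsort(ranks)
    return [masterlist[i] for i in order]


# Round trip: every researcher's original ranked list comes back unchanged
for k in novelty:
    assert listofrankstorankedlist(noveltyranks[k]) == novelty[k]
print('round trip OK')
```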

### A function to average the ranks

In [5]:
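def averagealltheranks(adict):
    '''Returns the element-wise mean of the rank lists in adict, as a
    numpy array aligned with the masterlist [2, 3, 7, 8, 9, 10].'''
    averageranks = np.zeros(6)
    for k, v in adict.items():
        averageranks = averageranks + v  # adding a list to an array works element-wise
    averageranks = averageranks / len(adict)
    return averageranks
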
averagenovranks = averagealltheranks(noveltyranks)
averagenovranks

array([2.8, 4. , 3. , 3.4, 0.6, 1.2])

That's our list of average ranks, aligned with the master order 2, 3, 7, 8, 9, 10. Sorting the items by average rank, lowest first, gives a consensus ranking:

In [7]:
tuplelist = zip(averagenovranks, [2, 3, 7, 8, 9, 10])
avgnovorder = [x[1] for x in sorted(tuplelist)]
avgnovorder

[9, 10, 2, 7, 8, 3]
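Sorting the zipped tuples breaks any tie in average rank by item number, which is arbitrary (there happen to be no ties here). One alternative, our suggestion rather than part of the original analysis, is `scipy.stats.rankdata`, which gives tied values the mean of the ranks they span, the same tie handling `spearmanr` uses. The sketch re-enters the average ranks so it runs on its own.

```python
import numpy as np
from scipy.stats import rankdata

masterlist = [2, 3, 7, 8, 9, 10]
averagenovranks = np.array([2.8, 4.0, 3.0, 3.4, 0.6, 1.2])

# 1-based consensus rank per item; tied averages would share a mean rank
consensusranks = rankdata(averagenovranks)
for item, r in zip(masterlist, consensusranks):
    print(item, r)

# Items sorted by consensus rank; a stable sort keeps masterlist order for ties
consensusorder = [masterlist[i] for i in np.argsort(consensusranks, kind='stable')]
print(consensusorder)  # [9, 10, 2, 7, 8, 3]
```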

### Calculating Kendall's W

[Kendall's W](https://en.wikipedia.org/wiki/Kendall%27s_W) is a measure of agreement between different rankings.
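For reference, here is the formula `kendall_w` implements. There are $m$ raters and $n$ items, and $R_i$ is the sum of the ranks item $i$ received:

$$
W = \frac{12\,S}{m^2\,(n^3 - n)}, \qquad S = \sum_{i=1}^{n} \bigl(R_i - \bar{R}\bigr)^2, \qquad \bar{R} = \frac{1}{n}\sum_{i=1}^{n} R_i
$$

In the code, `S = n*np.var(rating_sums)` computes the same $S$, because `np.var` defaults to the population variance (dividing by $n$). With no ties, $W$ runs from 0 (all rank sums equal, no overall agreement) to 1 (identical rankings). For the 4 × 4 test matrix `the_ratings`, the rank sums are 5, 11, 10, 14, so $\bar{R} = 10$ and

$$
S = 25 + 1 + 0 + 16 = 42, \qquad W = \frac{12 \cdot 42}{4^2\,(4^3 - 4)} = \frac{504}{960} = 0.525,
$$

which matches the value printed by the cell.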

In [8]:
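def kendall_w(expt_ratings):
    '''Kendall's W for a (raters x items) numpy array of untied ranks.'''
    if expt_ratings.ndim != 2:
        raise ValueError('ratings matrix must be 2-dimensional')
    m = expt_ratings.shape[0]  # raters
    n = expt_ratings.shape[1]  # items rated
    denom = m**2 * (n**3 - n)
    rating_sums = np.sum(expt_ratings, axis=0)  # total rank received by each item
    S = n * np.var(rating_sums)  # sum of squared deviations of the rank sums from their mean

    return 12 * S / denom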

the_ratings = np.array([[1,2,3,4],[2,3,1,4],[1,3,2,4],[1,3,4,2]])
m = the_ratings.shape[0]
n = the_ratings.shape[1]

W = kendall_w(the_ratings)

count = 0
for trial in range(1000):
    perm_trial = []
    for _ in range(m):
        perm_trial.append(list(np.random.permutation(range(1, 1+n))))
    count += 1 if kendall_w(np.array(perm_trial)) > W else 0

print ('Calculated value of W:', W, ' Permutation values exceed it in', count, 'out of 1000 cases')
    

Calculated value of W: 0.525  Permutation values exceed it in 67 out of 1000 cases


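For this small 4 × 4 test matrix, that gives us both a value of *W* and something analogous to a p-value: the share of random rankings whose *W* exceeds the observed one, i.e., how often agreement this strong could arise by chance.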

In [9]:
count / 1000

0.067
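This notebook repeats the permutation loop verbatim for each ranking matrix. A reusable helper might look like the self-contained sketch below, which also restates `kendall_w`. The name `permutation_test_w` is hypothetical, and two choices differ from the loop above; both are our suggestions, not the original analysis. First, it seeds a `numpy.random.Generator` so reruns give the same estimate. Second, it counts permutations whose W is *at least* as large as the observed value and adds one to numerator and denominator, a common convention that keeps the estimate from ever being exactly zero. Its estimates will therefore not match the counts printed in this notebook exactly.

```python
import numpy as np


def kendall_w(ratings):
    '''Kendall's W for an (m raters x n items) matrix of untied ranks.'''
    ratings = np.asarray(ratings)
    if ratings.ndim != 2:
        raise ValueError('ratings matrix must be 2-dimensional')
    m, n = ratings.shape
    rating_sums = ratings.sum(axis=0)
    S = n * np.var(rating_sums)  # sum of squared deviations of the rank sums
    return 12 * S / (m**2 * (n**3 - n))


def permutation_test_w(ratings, trials=1000, seed=None):
    '''Hypothetical helper: returns (W, p), where p estimates how often
    independently shuffled rankings agree at least as strongly as the
    observed ones, using the (count + 1) / (trials + 1) convention.'''
    ratings = np.asarray(ratings)
    m, n = ratings.shape
    rng = np.random.default_rng(seed)
    observed = kendall_w(ratings)
    count = 0
    for _ in range(trials):
        # Each rater gets an independent random ordering of 0..n-1;
        # W is unchanged if every rank is shifted by a constant.
        shuffled = np.array([rng.permutation(n) for _ in range(m)])
        # Small tolerance so floating-point noise does not hide exact ties
        if kendall_w(shuffled) >= observed - 1e-12:
            count += 1
    return observed, (count + 1) / (trials + 1)


the_ratings = [[1, 2, 3, 4], [2, 3, 1, 4], [1, 3, 2, 4], [1, 3, 4, 2]]
W, p = permutation_test_w(the_ratings, trials=1000, seed=0)
print('W =', W, ' estimated p =', round(p, 3))
```

The same call works on `novelty_array` or `precocity_array`, replacing each copy of the loop.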

In [10]:
novelty_array = np.array([x for x in noveltyranks.values()])

m = novelty_array.shape[0]
n = novelty_array.shape[1]

W = kendall_w(novelty_array)

count = 0
for trial in range(1000):
    perm_trial = []
    for _ in range(m):
        perm_trial.append(list(np.random.permutation(range(1, 1+n))))
    count += 1 if kendall_w(np.array(perm_trial)) > W else 0

print ('Calculated value of W:', W, ' Permutation values exceed it in', count, 'out of 1000 cases')
print ("raters: ",m," items rated: ", n, "and W = ",'%.8f' % W)
print("---------------------------------------------")

Calculated value of W: 0.49714285714285716  Permutation values exceed it in 11 out of 1000 cases
raters:  5  items rated:  6 and W =  0.49714286
---------------------------------------------


### How much the typical prediction diverges from the average

A simpler and more familiar way to describe this variation may be to measure the average Spearman correlation of each individual ranking with the average of all the rankings (`averagenovranks`).

In an earlier version I suggested measuring this pairwise. We could still do that, but this measure seemed more comparable to the one we'll get after the experiment: the Spearman correlation of a single set of measurements with our predicted average.

In [11]:
allzs = []
allrhos = []

for key, ranking in noveltyranks.items():
    r, p = spearmanr(ranking, averagenovranks)
    z = np.arctanh(r)
    allzs.append(z)
    allrhos.append(r)

print('The average Spearman rho is', np.tanh(np.mean(allzs)))
print('Note not the same as', np.mean(allrhos))
print('The range of rhos is ', allrhos)

The average Spearman rho is 0.7589548509215696
Note not the same as 0.68
The range of rhos is  [0.08571428571428573, 0.8857142857142858, 0.8285714285714287, 0.8857142857142858, 0.7142857142857143]
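The cell above averages the correlations on Fisher's z scale (`np.arctanh`) and transforms the mean back with `np.tanh`, which is why its result differs from the plain mean of the rhos. For positive correlations the Fisher average is never lower than the plain mean, because `arctanh` is convex on [0, 1). A standalone sketch of that averaging step follows; the function name `fisher_average` is ours. Since `arctanh` is infinite at exactly ±1, a ranking that matched the average perfectly would break the computation, so the sketch clips values slightly inside the interval. That clipping is our assumption, not a standard rule.

```python
import numpy as np


def fisher_average(rhos, eps=1e-12):
    '''Averages correlation coefficients on Fisher's z scale.
    Values of exactly +/-1 are clipped just inside (-1, 1) (an assumption).'''
    r = np.clip(np.asarray(rhos, dtype=float), -1 + eps, 1 - eps)
    z = np.arctanh(r)                  # Fisher z-transform
    return float(np.tanh(z.mean()))    # back-transform the mean z


# With 6 items every Spearman rho is a multiple of 1/35; these are the
# five novelty rhos printed above, written as exact fractions.
novelty_rhos = [3/35, 31/35, 29/35, 31/35, 25/35]
print(fisher_average(novelty_rhos))  # ≈ 0.759
print(np.mean(novelty_rhos))         # ≈ 0.68
```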


### Predictions about precocity

Now we'll do the same thing for precocity.

In [12]:
precocity = dict()
precocity['d'] = [8, 7, 2, 10, 9, 3]
precocity['l'] = [7, 10, 8, 2, 9, 3]
precocity['t'] = [2, 10, 8, 7, 9, 3]
precocity['w'] = [10, 7, 3, 2, 8, 9]
precocity['y'] = [9, 8, 10, 7, 2, 3]

precocityranks = rankedlisttolistofranks(precocity)
precocityranks

{'d': [2, 5, 1, 0, 4, 3],
 'l': [3, 5, 0, 2, 4, 1],
 't': [0, 5, 3, 2, 4, 1],
 'w': [3, 2, 1, 4, 5, 0],
 'y': [4, 5, 3, 1, 0, 2]}

In [13]:
averageprecranks = averagealltheranks(precocityranks)
averageprecranks

array([2.4, 4.4, 1.6, 1.8, 3.4, 1.4])

In [14]:
pearsonr(averagenovranks, averageprecranks)

(0.18789425188745115, 0.7214753549798081)

In [15]:
tuplelist = zip(averageprecranks, [2, 3, 7, 8, 9, 10])
avgprecorder = [x[1] for x in sorted(tuplelist)]
avgprecorder

[10, 7, 8, 2, 9, 3]

In [16]:
precocity_array = np.array([x for x in precocityranks.values()])

m = precocity_array.shape[0]
n = precocity_array.shape[1]

W = kendall_w(precocity_array)

count = 0
for trial in range(1000):
    perm_trial = []
    for _ in range(m):
        perm_trial.append(list(np.random.permutation(range(1, 1+n))))
    count += 1 if kendall_w(np.array(perm_trial)) > W else 0

print ('Calculated value of W:', W, ' Permutation values exceed it in', count, 'out of 1000 cases')
print('raters:', m, 'items rated:', n)

Calculated value of W: 0.3965714285714286  Permutation values exceed it in 51 out of 1000 cases
raters: 5 items rated: 6


In [17]:
allzs = []
allrhos = []

for key, ranking in precocityranks.items():
    r, p = spearmanr(ranking, averageprecranks)
    z = np.arctanh(r)
    allzs.append(z)
    allrhos.append(r)

print('The average Spearman rho is', np.tanh(np.mean(allzs)))
print('Note not the same as', np.mean(allrhos))
print('The range of rhos is ', allrhos)

The average Spearman rho is 0.6756864375860854
Note not the same as 0.6
The range of rhos is  [0.6, 0.942857142857143, 0.6, 0.6, 0.2571428571428572]


### Relation of novelty predictions to precocity predictions

Finally, how do the two rankings agree?

In [18]:
spearmanr(avgprecorder, avgnovorder)

SpearmanrResult(correlation=0.3142857142857143, pvalue=0.5440932944606414)
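`avgprecorder` and `avgnovorder` are lists of item numbers in consensus order. Passing them to `spearmanr` therefore pairs up *item numbers* position by position, which is the kind of "order in the original list" number that, as noted above, has no substantive meaning. A comparison that keeps the items aligned may be more meaningful: correlate the two arrays of average ranks, which share the masterlist order, as the earlier `pearsonr` cell does. A sketch using Spearman's rho on those arrays follows. By our hand calculation this rho is 9/35 ≈ 0.257. We have not reproduced its p-value here.

```python
import numpy as np
from scipy.stats import spearmanr

# Average ranks from above, both aligned with masterlist [2, 3, 7, 8, 9, 10]
averagenovranks = np.array([2.8, 4.0, 3.0, 3.4, 0.6, 1.2])
averageprecranks = np.array([2.4, 4.4, 1.6, 1.8, 3.4, 1.4])

# Spearman's rho between predicted novelty and predicted precocity, item by item
rho, p = spearmanr(averagenovranks, averageprecranks)
print(rho)  # ≈ 0.257 (= 9/35)
```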