In [15]:
import scipy.stats as stats
import sklearn.cluster as clust
import sklearn.metrics as met
import numpy as np
import math
import statsmodels.api as sm
import statsmodels

# Cohen's kappa

Cohen's kappa is a statistic used to measure the inter-rater reliability between 2 raters who label the same items. It corrects the observed agreement for the agreement expected by chance. \
It is computed as follows. \
Suppose that we ask two annotators, A and B, to label a sample with YES or NO, and we get the following results:
$$
\begin{array}{|c|c|c|}
\hline
 & YES (A) & NO (A) \\
 \hline
 YES (B) & a & b \\
 \hline
 NO (B) & c & d  \\
\hline
\end{array}
$$
Let $ p_0 $ be the observed proportionate agreement, defined as:
$$ p_0 = \frac{a+d}{a+b+c+d} $$
If A and B labelled independently, the estimated probability that they both say YES would be:
$$ p_{YES} = \frac{a+b}{a+b+c+d} \cdot \frac{a+c}{a+b+c+d} $$
The estimated probability that A and B both say NO would be:
$$ p_{NO} = \frac{b+d}{a+b+c+d} \cdot \frac{c+d}{a+b+c+d} $$
The estimated overall probability that A and B agree by chance is then:
$$  p_e = p_{YES} + p_{NO} $$
Cohen's kappa is then computed as:
$$ \kappa = \frac{p_0 - p_e}{1 - p_e} $$ 
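To make the formulas concrete, here is a minimal pure-Python sketch that computes kappa directly from the four cells $a, b, c, d$ of the table above. The function name `cohen_kappa_from_counts` and the counts in the example are hypothetical, chosen only for illustration; the sketch does not handle the degenerate case $p_e = 1$.

```python
def cohen_kappa_from_counts(a, b, c, d):
    """Cohen's kappa from the 2x2 table above (hypothetical helper).

    a: both YES, b: B says YES and A says NO,
    c: B says NO and A says YES, d: both NO.
    """
    n = a + b + c + d
    p_0 = (a + d) / n                      # observed agreement
    p_yes = ((a + b) / n) * ((a + c) / n)  # chance that both say YES
    p_no = ((b + d) / n) * ((c + d) / n)   # chance that both say NO
    p_e = p_yes + p_no                     # agreement expected by chance
    return (p_0 - p_e) / (1 - p_e)


# Hypothetical counts: 20 joint YES, 15 joint NO, 15 disagreements
print(cohen_kappa_from_counts(20, 5, 10, 15))  # ≈ 0.4 (p_0 = 0.7, p_e = 0.5)
```

Here the annotators agree on 70% of the items, but 50% agreement would already be expected by chance, so kappa only credits the remaining 20 points out of the 50 available.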
$\kappa = 1$ means perfect agreement and $\kappa \le 0$ means no agreement beyond chance. Intermediate values are usually interpreted with the following scale (Landis and Koch):
$$
\begin{array}{|c|c|}
\hline
\kappa & \text{\textbf{agreement}} \\
 \hline
< 0 & \text{poor agreement} \\
 \hline
0.01 - 0.2 & \text{slight agreement} \\
 \hline
 0.21 - 0.40 & \text{fair agreement} \\
 \hline
 0.41 - 0.6 & \text{moderate agreement} \\
 \hline
 0.61 - 0.8 & \text{substantial agreement} \\
 \hline
 0.81 - 1.0 & \text{almost perfect agreement} \\
 \hline
\end{array}
$$
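The scale above can be encoded as a small lookup. This is a sketch with a hypothetical helper, `interpret_kappa`. The table leaves small gaps between its ranges (for example between 0 and 0.01), and the sketch assigns such values to the next band up. That assignment is an arbitrary choice, not part of the scale.

```python
def interpret_kappa(kappa):
    """Return the agreement label of the scale above (hypothetical helper)."""
    if kappa < 0:
        return "poor agreement"
    # (upper bound, label) pairs; values falling in the gaps between the
    # table's ranges are assigned to the next band up (assumption of this sketch)
    bands = [
        (0.20, "slight agreement"),
        (0.40, "fair agreement"),
        (0.60, "moderate agreement"),
        (0.80, "substantial agreement"),
        (1.00, "almost perfect agreement"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


print(interpret_kappa(0.4375))  # moderate agreement
print(interpret_kappa(-0.2))    # poor agreement
```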

In [2]:
constant_size = 50 # size of the part that is identical in the two samples
random_size = 1000 # size of the part that may differ in the two samples 
constant = np.random.randint(2, size=constant_size) 
annot_A = np.concatenate( [np.random.randint(2, size=random_size), constant] )
annot_B = np.concatenate( [np.random.randint(2, size=random_size), constant] )
print(f"Annotations of A : {annot_A}")
print(f"Annotations of B : {annot_B}")
score = met.cohen_kappa_score(annot_A, annot_B)
print(f"Score : {score}")

Annotations of A : [0 0 1 ... 0 0 1]
Annotations of B : [1 1 1 ... 0 0 1]
Score : 0.016928855261009068
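The score is close to 0 because the first 1000 labels of each annotator are drawn independently at random, so only the 50 shared labels push the agreement above chance level. To check what `met.cohen_kappa_score` does on binary labels, the following sketch (hypothetical helper `cohen_kappa_from_labels`, assuming labels are encoded as 1 = YES and 0 = NO) builds the $a, b, c, d$ table from two label arrays and applies the formulas above. For binary labels its result should agree with scikit-learn's unweighted score.

```python
import numpy as np


def cohen_kappa_from_labels(annot_A, annot_B):
    """Cohen's kappa from two 0/1 label arrays (hypothetical helper)."""
    A = np.asarray(annot_A)
    B = np.asarray(annot_B)
    # Cells of the 2x2 table, with B on the rows and A on the columns
    a = np.sum((B == 1) & (A == 1))
    b = np.sum((B == 1) & (A == 0))
    c = np.sum((B == 0) & (A == 1))
    d = np.sum((B == 0) & (A == 0))
    n = a + b + c + d
    p_0 = (a + d) / n
    p_e = ((a + b) / n) * ((a + c) / n) + ((b + d) / n) * ((c + d) / n)
    return float((p_0 - p_e) / (1 - p_e))


print(cohen_kappa_from_labels([1, 1, 0, 0, 1], [1, 0, 0, 0, 1]))  # ≈ 0.615 (= 8/13)
```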


# Fleiss' kappa

Fleiss' kappa is similar to Cohen's kappa, but it measures the agreement within a group of annotators, which can contain more than 2 annotators. \
We consider $n$ annotated tweets (indexed by $i = 1,\dots, n$), each labelled by the same number $k$ of annotators (indexed by $j = 1,\dots,k$). \
Let $Y_i$ be the number of annotators that labelled the $i^{th}$ tweet YES, and $N_i$ the number of annotators that labelled it NO (so $Y_i + N_i = k$). \
Let $p_{YES}$ be the proportion of YES ratings:
$$ p_{YES} = \frac{1}{n k}\sum_{i=1}^n Y_i$$
Let $p_{NO}$ be the proportion of NO ratings:
$$ p_{NO} = \frac{1}{n k}\sum_{i=1}^n N_i$$
And let $\overline{P}_e$ be the agreement expected by chance:
$$ \overline{P}_e = (p_{YES})^2 + (p_{NO})^2$$
Let $P_i$ be the proportion of annotator pairs that agree on the $i^{th}$ tweet:
$$ P_i = \frac{Y_i(Y_i -1) + N_i(N_i - 1)}{k(k-1)}  $$ 
And let $\overline{P}$ be the mean of all the $P_i$:
$$ \overline{P} = \frac{1}{n}\sum_{i=1}^n P_i $$
We compute Fleiss' kappa using the following formula:
$$ \kappa = \frac{\overline{P} - \overline{P}_e}{1 - \overline{P}_e} $$

Let's assume we have the following results (1 = YES, 0 = NO):
$$ 
\begin{array}{|c|ccc|}
\hline
& \text{annotator A} & \text{annotator B} & \text{annotator C} \\
\hline
\text{tweet } 1 & 1 & 0 & 1 \\
\text{tweet } 2 & 1 & 1 & 0 \\
\text{tweet } 3 & 1 & 1 & 0 \\
\text{tweet } 4 & 1 & 1 & 1 \\
\text{tweet } 5 & 0 & 0 & 0 \\
\text{tweet } 6 & 0 & 1 & 0 \\
\text{tweet } 7 & 0 & 0 & 0 \\
\text{tweet } 8 & 1 & 1 & 1 \\
\text{tweet } 9 & 1 & 1 & 1 \\
\text{tweet } 10 & 1 & 1 & 1 \\
\text{tweet } 11 & 1 & 0 & 0 \\
\text{tweet } 12 & 0 & 0 & 0 \\
\hline
\end{array}
$$
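Before calling statsmodels, here is a sketch that applies the formulas above directly to this table with NumPy. The helper name `fleiss_kappa_binary` is hypothetical, and it only handles YES/NO labels encoded as 1/0 with the same number of annotators per tweet. By hand: $\sum_i Y_i = 20$, so $p_{YES} = 20/36$ and $\overline{P}_e = 41/81$. Seven tweets are unanimous ($P_i = 1$) and five are split 2 to 1 ($P_i = 1/3$), so $\overline{P} = 13/18$ and $\kappa = 35/80 = 0.4375$.

```python
import numpy as np


def fleiss_kappa_binary(ratings):
    """Fleiss' kappa for an (n tweets x k annotators) 0/1 matrix (hypothetical helper)."""
    R = np.asarray(ratings)
    n, k = R.shape
    Y = R.sum(axis=1)                  # Y_i: number of YES per tweet
    N = k - Y                          # N_i: number of NO per tweet
    p_yes = Y.sum() / (n * k)
    p_no = N.sum() / (n * k)
    P_e = p_yes ** 2 + p_no ** 2       # agreement expected by chance
    P_i = (Y * (Y - 1) + N * (N - 1)) / (k * (k - 1))
    P_bar = P_i.mean()                 # mean observed agreement
    return float((P_bar - P_e) / (1 - P_e))


data = [[1, 0, 1], [1, 1, 0], [1, 1, 0], [1, 1, 1], [0, 0, 0], [0, 1, 0],
        [0, 0, 0], [1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 0, 0], [0, 0, 0]]
print(fleiss_kappa_binary(data))  # ≈ 0.4375
```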

In [16]:
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
data = [[1,0,1], [1,1,0], [1,1,0], [1,1,1], [0,0,0], [0,1,0], [0,0,0], [1,1,1], [1,1,1], [1,1,1], [1,0,0], [0,0,0]]
# aggregate_raters returns a (count table, categories) tuple: only the table goes to fleiss_kappa
table, categories = aggregate_raters(data)
score = fleiss_kappa(table, method='fleiss')
print(f"Score : {score}")  # ≈ 0.4375 for this table (moderate agreement)

# K-means

In [5]:

data = np.array( [[1,1,0], [1,1,1], [0,1,1], [1,1,0], [0,1,0], [0,0,0], [0,1,1] ])
clusters = clust.KMeans(n_clusters=2, random_state=42, n_init="auto").fit(data)
print(clusters.labels_)

[1 0 0 1 1 1 0]
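K-means (Lloyd's algorithm) alternates two steps until the assignments stop changing. First, it assigns every point to its nearest centroid. Then it moves every centroid to the mean of its points. The sketch below implements these steps with NumPy on the same data. The helper `lloyd_kmeans` and the initialisation (the first two points) are assumptions for illustration. scikit-learn's `KMeans` uses k-means++ initialisation by default, so in general the two can end in different local optima.

```python
import numpy as np


def lloyd_kmeans(X, init_centroids, max_iter=100):
    """Plain Lloyd iterations from given initial centroids (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Assignment step: squared Euclidean distance from each point to each centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty)
        for j in range(len(centroids)):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids


data = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1]])
# Hypothetical initialisation: the first two points
labels, centroids = lloyd_kmeans(data, data[:2])
print(labels)  # [0 1 1 0 0 0 1]
```

On this data the sketch converges after two iterations to the same partition as the output above, with the cluster ids swapped. Cluster ids are arbitrary in K-means.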


# Cosine similarity

For two vectors $\overrightarrow{u}$ and $\overrightarrow{v}$:
$$ \cos (\overrightarrow{u}, \overrightarrow{v}) = \frac{\langle \overrightarrow{u} \; , \; \overrightarrow{v} \rangle}{ \lVert \overrightarrow{u} \rVert \cdot \lVert \overrightarrow{v} \rVert }  $$
This value is called the cosine similarity. It is the cosine of the angle between the two vectors, so it ranges from $-1$ (opposite directions) to $1$ (same direction) and ignores the vectors' lengths. This makes it useful when only the direction of the vectors matters. \
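A minimal NumPy sketch of the formula follows. The helper name `cosine_similarity` is hypothetical, and the sketch does not guard against zero vectors, for which the similarity is undefined.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (hypothetical helper, u and v non-zero)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


print(cosine_similarity([1, 2, 3], [2, 4, 6]))  # same direction: ≈ 1.0
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal: 0.0
print(cosine_similarity([1, 2], [-1, -2]))      # opposite directions: ≈ -1.0
```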
Fun fact: in the probability world, $ \mathbb{E}(XY) $ can be seen as an inner product between (square-integrable) random variables, with associated norm $\sqrt{\mathbb{E}(X^2)}$. The associated cosine similarity is then:
$$ \text{similarity}(X,Y) = \frac{ \mathbb{E}(XY) }{\sqrt{\mathbb{E}(X^2)\mathbb{E}(Y^2)}} $$
Applied to the centered variables, this similarity is exactly the correlation:
$$ \text{similarity}(X-\mathbb{E}(X),Y-\mathbb{E}(Y)) =  \frac{ \mathbb{E}((X-\mathbb{E}(X))(Y-\mathbb{E}(Y))) }{\sqrt{\mathbb{E}((X-\mathbb{E}(X))^2)\mathbb{E}((Y-\mathbb{E}(Y))^2)}} = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} = Corr(X,Y) $$
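The same identity holds for samples, where the expectations are replaced by empirical means. The sketch below uses two small hypothetical samples. It checks that the cosine similarity of the centered vectors equals the Pearson correlation returned by `np.corrcoef`. Here the centered vectors have dot product 3 and squared norms 5, so both values are $3/5$.

```python
import numpy as np

# Hypothetical samples
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

# Cosine similarity of the centered vectors
xc = x - x.mean()
yc = y - y.mean()
centered_cosine = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))

# Pearson correlation coefficient
pearson = np.corrcoef(x, y)[0, 1]

print(centered_cosine, pearson)  # both ≈ 0.6
```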

# Chi2 Test

The $ \chi^2 $ test of independence is a hypothesis test used to check whether two categorical features measured on the same sample are independent. \
For example, let's take a sample of $n$ annotations $ A_1, \dots, A_n $. We focus on two features: whether the annotation was correct (noted $ x_k $ for the $k^{th}$ annotation), which can be YES or NO; and the dataset the rated tweet comes from (noted $ y_k $ for the $k^{th}$ annotation), which can be ISARC or SIGN. In this example each feature only takes two values, but in general there can be more. \
The hypotheses are $ \mathcal{H_0} : \text{The two features are independent} $ and $ \mathcal{H_1} : \text{The two features are not independent} $. \
Let $I = \{ YES, NO \}$ be the set of all possible values taken by the first feature, and $J = \{ ISARC, SIGN \}$ be the set of all possible values taken by the second feature. \
Let $N_{i,j}$ be the number of observed annotations for which the first feature is equal to $ i $ and the second feature is equal to $ j $ \
Let $N_{i,.}$ be the number of observed annotations for which the first feature is equal to $ i $ \
Let $N_{.,j}$ be the number of observed annotations for which the second feature is equal to $ j $ \
If the two features are independent (i.e. under $\mathcal{H}_0$), then:
$$ \mathbb{P}( x_k = i , y_k = j) =\mathbb{P}( x_k = i)\times \mathbb{P}(y_k = j)  $$ 
so in our sample we should observe:
$$ N_{i,j} \approx \frac{N_{i,.} \times N_{.,j}}{n}$$
We use the following test statistic:
$$ T = \sum_{i \in I, j \in J} \frac{ \left( N_{i,j} - \frac{N_{i,.} \times N_{.,j}}{n} \right)^2 }{ \frac{N_{i,.} \times N_{.,j}}{n} } $$ 
(It measures the squared gap between each observed count $ N_{i,j} $ and its expected count $ \frac{N_{i,.} \times N_{.,j}}{n}$ under $\mathcal{H}_0$, each term being divided by that expected count.) \
Under $ \mathcal{H}_0 $ and for a large enough sample (a common rule of thumb is that every expected count should be at least 5), $T$ approximately follows $\chi^2( (|I| - 1)(|J| - 1) )$. Therefore, we reject $\mathcal{H}_0$ with a Type I error of $\alpha$ if:
$$ T > \chi_{(|I| - 1)(|J| - 1)}(1 - \alpha) \qquad (\leftarrow \text{\tiny quantile of the chi2 distribution})$$ 
and the p-value is 
$$ \hat{\alpha} = 1 - F_{  \chi^2( (|I| - 1)(|J| - 1) ) }( T ) \qquad (\leftarrow \text{\tiny cdf of the chi2 distribution}) $$
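For a 2x2 table there is $(2-1)(2-1) = 1$ degree of freedom, and $\chi^2(1)$ is the law of $Z^2$ with $Z \sim \mathcal{N}(0,1)$. The p-value is therefore $\mathbb{P}(|Z| > \sqrt{T}) = \text{erfc}(\sqrt{T/2})$, which needs only the standard library. The sketch below computes the expected counts, $T$ and this p-value from scratch, applied to the example contingency table given below. The helper `chi2_test_2x2` is hypothetical, is restricted to 2x2 tables, and applies no continuity correction. For larger tables, the cdf of the right $\chi^2$ distribution (e.g. `scipy.stats.chi2.sf`) would be needed instead.

```python
import math

import numpy as np


def chi2_test_2x2(obs):
    """Chi2 test of independence on a 2x2 table, no continuity correction (hypothetical helper)."""
    obs = np.asarray(obs, dtype=float)
    if obs.shape != (2, 2):
        raise ValueError("this sketch only handles 2x2 tables (1 degree of freedom)")
    n = obs.sum()
    row_totals = obs.sum(axis=1, keepdims=True)  # N_{i,.}
    col_totals = obs.sum(axis=0, keepdims=True)  # N_{.,j}
    expected = row_totals * col_totals / n       # N_{i,.} * N_{.,j} / n
    T = float(((obs - expected) ** 2 / expected).sum())
    # P(chi2(1) > T) = P(|Z| > sqrt(T)) = erfc(sqrt(T / 2))
    p_value = math.erfc(math.sqrt(T / 2))
    return T, p_value


T, p_value = chi2_test_2x2([[5000, 2500], [3000, 1580]])
print(T, p_value)  # T ≈ 1.72, p-value ≈ 0.19
```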

For example, suppose we obtain the following results:
$$ 
\begin{array}{|c|cc|}
\hline
 & CORRECT & INCORRECT \\
 \hline
 SIGN & 5000 & 2500 \\
 ISARC & 3000 & 1580 \\
\hline
\end{array}
$$
(this table is called the contingency table of the observations)

In [6]:
obs = np.array( [[5000, 2500],[3000, 1580]] )
# correction=False disables Yates' continuity correction (applied by default to 2x2 tables),
# so the statistic matches the formula for T; unpacking works with old and new SciPy versions
statistic, pvalue, dof, expected = stats.chi2_contingency(obs, correction=False)
print(f"The test statistic : {statistic}")
print(f"The p-value : {pvalue}")  # roughly 0.19 here, above 5%: H0 is not rejected
print(f"That means we would have to accept a Type I error of at least {round(pvalue * 100, 2)}% to reject the null hypothesis")
print("Usually, the null hypothesis is only rejected when the p-value is below 5% (sometimes 1%)")