In [1]:
ROOT_DIR=!git rev-parse --show-toplevel
%cd {ROOT_DIR[0]}

/home/witiko/documents/Práce/2017/09/segmentation-experiments/SemEvalTask3/segmentation-experiments


# Stating the hypotheses

In [21]:
hypotheses = []

class Hypothesis(object):
    def __init__(self, pvalue, desc):
        self.pvalue = pvalue
        self.desc = desc
    def __repr__(self):
        return "%s (p-value: %f)" % (self.desc, self.pvalue)

## Comment relevance probability distributions
The distributions of $X_i$, $i=1,2,\ldots,10$, differ, where $X_i\sim B(\theta_i)$ is a Bernoulli random variable that indicates whether the comment at position $i$ in a thread is relevant; that is, the parameters $\theta_i$ are not all equal.

In [5]:
from filenames import SUBTASK_A_TRAIN_DATASET_FNAMES
from preprocessing import retrieve_comment_relevancies

import numpy as np

!LC_ALL=C make -C datasets &>/dev/null
trials = [[] for _ in range(10)]  # trials[i]: relevance labels of the comments at position i+1
for relevancies in retrieve_comment_relevancies(SUBTASK_A_TRAIN_DATASET_FNAMES):
    for i, relevance in enumerate(relevancies):
        trials[i].append(relevance)
# x[i]: (number of relevant comments, number of comments) at position i+1
x = [(sum(labels), len(labels)) for labels in trials]
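`retrieve_comment_relevancies` and the dataset files belong to this repository and are not reproduced here. Assuming it yields one sequence of 0/1 (or boolean) relevance labels per thread, ordered by comment position, the per-position aggregation above can be sketched on toy data as follows. The helper `position_counts` and the toy threads are hypothetical, not part of the project.

```python
from typing import Iterable, List, Sequence, Tuple


def position_counts(threads: Iterable[Sequence[int]], n_positions: int = 10) -> List[Tuple[int, int]]:
    """Return (number of relevant comments, number of comments) for each thread position.

    Hypothetical helper: each thread is a sequence of 0/1 (or bool) relevance labels
    ordered by the position of the comment in the thread.
    """
    successes = [0] * n_positions
    totals = [0] * n_positions
    for relevancies in threads:
        # Threads longer than n_positions are truncated instead of raising IndexError.
        for i, relevance in enumerate(relevancies[:n_positions]):
            successes[i] += int(relevance)
            totals[i] += 1
    return list(zip(successes, totals))


# Toy data: three threads with four comments each (not the SemEval data).
toy_threads = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]]
counts = position_counts(toy_threads, n_positions=4)
print(counts)  # [(2, 3), (2, 3), (2, 3), (0, 3)]
# Relative frequencies, i.e. the maximum-likelihood estimates of θ_i.
print([s / n if n else float("nan") for s, n in counts])
```

Counting into two fixed-size lists avoids storing every label, which the notebook's list-of-lists approach does; for 2,410 threads either choice is cheap.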

The relative frequencies $\hat P(X_i=1)$ estimated from the annotated SemEval Task 3 subtask A training data:

In [6]:
for i, (successes, n_trials) in enumerate(x):  # n_trials avoids shadowing the `trials` list
    print("^P(X_%d=1) = %f\t(%d trials)" % (i+1, successes / n_trials, n_trials))

^P(X_1=1) = 0.629876	(2410 trials)
^P(X_2=1) = 0.519087	(2410 trials)
^P(X_3=1) = 0.533195	(2410 trials)
^P(X_4=1) = 0.481328	(2410 trials)
^P(X_5=1) = 0.474274	(2410 trials)
^P(X_6=1) = 0.451452	(2410 trials)
^P(X_7=1) = 0.438589	(2410 trials)
^P(X_8=1) = 0.425311	(2410 trials)
^P(X_9=1) = 0.418257	(2410 trials)
^P(X_10=1) = 0.419502	(2410 trials)


Let $X_{i,k}$ denote the relevance of the comment at position $i$ in thread $k$. Assuming that the threads are independent, $Y_i = \sum_{k=1}^{2410} X_{i,k} \sim Bi(2410, \theta_i)$. We will use Fisher's exact test ([1](https://en.wikipedia.org/wiki/Fisher%27s_exact_test), [2](http://udel.edu/~mcdonald/statfishers.html), [3](http://www.itl.nist.gov/div898/handbook/prc/section3/prc33.htm)) to compute the one-tailed $p$-values of $H_0: \theta_i=\theta_j$ against $H_1: \theta_i>\theta_j$ for all $i<j$.

In [23]:
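from scipy.stats import fisher_exact
for i, (successes_i, trials_i) in enumerate(x):
    for j, (successes_j, trials_j) in enumerate(x):
        if i >= j:
            continue  # test each pair of positions with i < j exactly once
        # 2×2 contingency table: rows are relevant / irrelevant comments,
        # columns are positions i+1 / j+1.
        a = successes_i
        b = successes_j
        c = trials_i - successes_i
        d = trials_j - successes_j
        # alternative="greater": the population odds ratio exceeds 1, i.e. θ_i > θ_j.
        _, pvalue = fisher_exact([[a, b], [c, d]], alternative="greater")
        hypotheses.append(Hypothesis(pvalue, "Comment relevance: θ%d = θ%d" % (i+1, j+1)))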
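For a single pair of positions, the one-tailed $p$-value of the table `[[a, b], [c, d]]` is the upper tail of a hypergeometric distribution: under $H_0$ and conditional on the margins, the top-left cell $A$ satisfies $P(A = x) = \binom{a+b}{x}\binom{c+d}{a+c-x} / \binom{a+b+c+d}{a+c}$, and the $p$-value is $P(A \ge a)$. The following minimal pure-Python sketch computes this with exact integer arithmetic via `math.comb` (Python 3.8+); the name `fisher_greater` is hypothetical, and `scipy.stats.fisher_exact` remains the practical choice.

```python
from math import comb


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact test p-value for [[a, b], [c, d]], alternative "greater".

    Rows: relevant / irrelevant comments; columns: position i / position j.
    Under H0 and conditional on the margins, the top-left cell A is hypergeometric.
    """
    relevant = a + b   # first row total: relevant comments at both positions
    trials_i = a + c   # first column total: comments at position i
    total = a + b + c + d
    denominator = comb(total, trials_i)
    # P(A >= a): sum the probabilities of all tables at least as extreme as the observed one.
    upper = min(relevant, trials_i)
    numerator = sum(
        comb(relevant, x) * comb(total - relevant, trials_i - x)
        for x in range(a, upper + 1)
    )
    return numerator / denominator  # a single final division of exact integers


# Fisher's "lady tasting tea" table: P(A >= 3) = 17/70 ≈ 0.242857.
print(fisher_greater(3, 1, 1, 3))
```

With the counts from `x`, `fisher_greater(successes_i, successes_j, trials_i - successes_i, trials_j - successes_j)` should agree with the `pvalue` returned by `fisher_exact` up to floating-point rounding.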

# Testing the hypotheses
We will try to reject the hypotheses while controlling the false discovery rate at 5 % using the [Benjamini–Hochberg procedure](https://en.wikipedia.org/wiki/False_discovery_rate#Benjamini.E2.80.93Hochberg_procedure).
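The procedure can equivalently be phrased through BH-adjusted $p$-values: for the sorted $p$-values $p_{(1)} \le \dots \le p_{(m)}$, the adjusted value is $\tilde p_{(k)} = \min_{j \ge k} \min\{1, \tfrac{m}{j} p_{(j)}\}$, and exactly the hypotheses with $\tilde p \le \alpha$ are rejected. A minimal sketch on a plain list of $p$-values follows; the name `bh_adjust` is hypothetical, and `statsmodels.stats.multitest.multipletests(..., method="fdr_bh")` offers the same adjustment.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0  # caps the adjusted values at 1
    # Walk from the largest p-value down so that the adjusted values are monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.01, 0.20, 0.03, 0.005]
adjusted = bh_adjust(pvalues)
print([q <= 0.05 for q in adjusted])  # [True, False, True, True]
```

Applied to `[h0.pvalue for h0 in hypotheses]`, the hypotheses whose adjusted value is at most `alpha` should be the ones the step-up rule rejects; the adjusted values additionally show how far each hypothesis is from the threshold.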

In [51]:
alpha = 0.05
Pi = sorted(hypotheses, key=lambda h0: h0.pvalue)  # p-values in ascending order
m = len(Pi)
# Find the largest k such that P_(k) <= (k / m) * alpha; reject H_(1), ..., H_(k).
for k in range(m, 0, -1):
    if Pi[k-1].pvalue <= (k / m) * alpha:
        break
else:
    k = 0  # no hypothesis can be rejected
print("------------ Rejected hypotheses -------------")
for j, h0 in enumerate(Pi, start=1):
    if j == k+1:
        print("\n------- Hypotheses we could not reject -------")
    print(h0)

------------ Rejected hypotheses -------------
Comment relevance: θ1 = θ9 (p-value: 0.000000)
Comment relevance: θ1 = θ10 (p-value: 0.000000)
Comment relevance: θ1 = θ8 (p-value: 0.000000)
Comment relevance: θ1 = θ7 (p-value: 0.000000)
Comment relevance: θ1 = θ6 (p-value: 0.000000)
Comment relevance: θ1 = θ5 (p-value: 0.000000)
Comment relevance: θ1 = θ4 (p-value: 0.000000)
Comment relevance: θ3 = θ9 (p-value: 0.000000)
Comment relevance: θ3 = θ10 (p-value: 0.000000)
Comment relevance: θ1 = θ2 (p-value: 0.000000)
Comment relevance: θ3 = θ8 (p-value: 0.000000)
Comment relevance: θ2 = θ9 (p-value: 0.000000)
Comment relevance: θ2 = θ10 (p-value: 0.000000)
Comment relevance: θ1 = θ3 (p-value: 0.000000)
Comment relevance: θ3 = θ7 (p-value: 0.000000)
Comment relevance: θ2 = θ8 (p-value: 0.000000)
Comment relevance: θ3 = θ6 (p-value: 0.000000)
Comment relevance: θ2 = θ7 (p-value: 0.000000)
Comment relevance: θ2 = θ6 (p-value: 0.000002)
Comment relevance: θ4 = θ9 (p-value: 0.000006)
Comment re