# Generalized binomial coefficients

Here we give a simple model of subsets of size $K$ drawn from a base set $\mathcal{Y} = \{1, \ldots, N\}$.  We assume that each element $i \in \mathcal{Y}$ has a weight $w_i \ge 0$.

The probability of a set $Y \subseteq \mathcal{Y}$ is

$$
p(Y) \propto \begin{cases}
\prod_{i \in Y} w_i & \textbf{if } |Y|=K \\
0 & \textbf{otherwise}
\end{cases}
$$

We can compute the normalization constant of this distribution using a generalized version of the dynamic programming algorithm for computing [binomial coefficients](https://en.wikipedia.org/wiki/Binomial_coefficient).
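
As a sketch of the recurrence behind that algorithm, let $E_k(n)$ denote the total weight of all size-$k$ subsets of the first $n$ elements (this notation is introduced here for illustration; it mirrors the table `E` in the code). Each size-$k$ subset of the first $n$ elements either omits element $n$ or contains it, which gives

$$
E_k(n) = \begin{cases}
1 & \textbf{if } k = 0 \\
0 & \textbf{if } k > 0,\ n = 0 \\
E_k(n-1) + E_{k-1}(n-1)\, w_n & \textbf{otherwise}
\end{cases}
$$

and the normalization constant is $Z = E_K(N)$. When every $w_i = 1$, the last case reduces to Pascal's rule $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$. For example, with $w = (2, 3, 5)$ and $K = 2$ we get $Z = 2 \cdot 3 + 2 \cdot 5 + 3 \cdot 5 = 31$.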

In [7]:
import numpy as np
from hypergraphs.semirings import LazySort, flatten

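def subsets(w, K, zero, one):
    "Total weight of all size-K subsets of w, in the semiring with identities (zero, one)."
    N = len(w)
    # E[k,n] = total weight of the size-k subsets of the first n elements
    E = np.full((K+1,N+1), zero)
    E[0,:] = one                     # initialization: the empty subset has weight `one`
    for k in range(1, K+1):
        for n in range(N):
            # either skip element n, or add it to a size-(k-1) subset
            E[k,n+1] = E[k,n] + E[k-1,n] * w[n]
    return E[K,N]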

Below, we check that we match the binomial coefficients when all the weights are 1.

In [11]:
from scipy.special import binom
N,K = 15,4
want = binom(N, K)
got = subsets(np.ones(N), K, 0.0, 1.0)
assert np.allclose(want, got)
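
The all-ones check does not exercise the weights themselves. A minimal sketch of a stronger test, assuming small `N` so that explicit enumeration is affordable, compares the dynamic program against a brute-force sum over all size-`K` subsets. The helper `subsets_brute_force` is a hypothetical name introduced here, and `subsets` is repeated only so that the block runs on its own.

```python
from itertools import combinations
import numpy as np

def subsets(w, K, zero, one):
    "Same dynamic program as above, repeated so this block is self-contained."
    N = len(w)
    E = np.full((K+1, N+1), zero)
    E[0, :] = one
    for k in range(1, K+1):
        for n in range(N):
            E[k, n+1] = E[k, n] + E[k-1, n] * w[n]
    return E[K, N]

def subsets_brute_force(w, K):
    "Sum of products over all size-K subsets by explicit enumeration (exponential time)."
    return sum(np.prod([w[i] for i in Y]) for Y in combinations(range(len(w)), K))

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
w = rng.uniform(0, 1, size=8)
for K in range(len(w) + 1):          # includes the edge cases K = 0 and K = N
    assert np.isclose(subsets(w, K, 0.0, 1.0), subsets_brute_force(w, K))
```

The brute-force sum visits all $\binom{N}{K}$ subsets, whereas the dynamic program fills a table with $(K+1)(N+1)$ entries.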

Below, we use the `LazySort` semiring to enumerate every subset that the sum ranges over, in order of decreasing score.

In [8]:
N = 6
K = 3 
ws = np.random.uniform(0, 1, size=N)
z = subsets([LazySort(w, i) for i, w in enumerate(ws)], K, LazySort.zero(), LazySort.one())
Z = subsets(ws, K, 0.0, 1.0)  # normalization constant computed in the sum-product semiring
for x in z:
    print(f'{x.score/Z:.3f} {set(flatten(x.data))}')  # print subsets sorted by decreasing probability

0.194 {0, 3, 4}
0.113 {3, 4, 5}
0.102 {0, 3, 5}
0.097 {0, 4, 5}
0.070 {1, 3, 4}
0.063 {0, 1, 3}
0.060 {0, 1, 4}
0.038 {2, 3, 4}
0.037 {1, 3, 5}
0.035 {1, 4, 5}
0.034 {0, 2, 3}
0.032 {0, 2, 4}
0.031 {0, 1, 5}
0.020 {2, 3, 5}
0.019 {2, 4, 5}
0.017 {0, 2, 5}
0.012 {1, 2, 3}
0.012 {1, 2, 4}
0.010 {0, 1, 2}
0.006 {1, 2, 5}
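
For small `N`, a ranking of this kind can be cross-checked without `LazySort` by scoring every subset explicitly and sorting. The following is a brute-force sketch, not the lazy enumeration used above; `ranked_subsets` is a hypothetical helper, and because it draws its own weights from a fixed seed, its printed values will differ from the run above.

```python
from itertools import combinations
import numpy as np

def ranked_subsets(w, K):
    "All size-K subsets paired with their probabilities, most probable first."
    scored = [(float(np.prod([w[i] for i in Y])), set(Y))
              for Y in combinations(range(len(w)), K)]
    Z = sum(score for score, _ in scored)          # normalization constant
    # sort on the probability only, so that ties never compare the sets
    return sorted(((score / Z, Y) for score, Y in scored),
                  key=lambda t: t[0], reverse=True)

rng = np.random.default_rng(0)                     # fixed seed, chosen arbitrarily
ws = rng.uniform(0, 1, size=6)
for p, Y in ranked_subsets(ws, 3):
    print(f'{p:.3f} {Y}')
```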


As usual, we can use the outside algorithm to compute marginals, such as the probability $p(i \in Y)$ that element $i$ belongs to the subset.

**TODO**: compute the marginals via the outside algorithm.
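
A minimal sketch of what that computation might look like in the sum-product semiring is given below. It is a forward-backward style pass written directly with floats, not the generic semiring version: the `inside` table is the same as `E` in `subsets`, and the `outside` table holds the total weight of all ways to complete a partial choice to size `K`. The names `inclusion_marginals` and `inclusion_marginals_brute_force` are hypothetical and introduced only for this illustration.

```python
from itertools import combinations
import numpy as np

def inclusion_marginals(w, K):
    "P(i in Y) for each element i, where p(Y) is proportional to prod_{i in Y} w[i] and |Y| = K."
    w = np.asarray(w, dtype=float)
    N = len(w)
    # inside[k, n]: total weight of the size-k subsets of the first n elements
    inside = np.zeros((K + 1, N + 1))
    inside[0, :] = 1.0
    for k in range(1, K + 1):
        for n in range(N):
            inside[k, n + 1] = inside[k, n] + inside[k - 1, n] * w[n]
    # outside[k, n]: total weight of choosing the remaining K - k elements
    # from elements n, ..., N-1
    outside = np.zeros((K + 1, N + 1))
    outside[K, N] = 1.0
    for n in reversed(range(N)):
        for k in range(K + 1):
            outside[k, n] = outside[k, n + 1]                  # skip element n
            if k < K:
                outside[k, n] += w[n] * outside[k + 1, n + 1]  # take element n
    Z = inside[K, N]
    # element n is taken on the step (k, n) -> (k + 1, n + 1), which has weight w[n]
    return np.array([
        sum(inside[k, n] * w[n] * outside[k + 1, n + 1] for k in range(K))
        for n in range(N)
    ]) / Z

def inclusion_marginals_brute_force(w, K):
    "The same marginals by explicit enumeration, for checking on small inputs."
    p, Z = np.zeros(len(w)), 0.0
    for Y in combinations(range(len(w)), K):
        score = np.prod([w[i] for i in Y])
        Z += score
        for i in Y:
            p[i] += score
    return p / Z

rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
w = rng.uniform(0, 1, size=6)
K = 3
p = inclusion_marginals(w, K)
assert np.allclose(p, inclusion_marginals_brute_force(w, K))
assert np.isclose(p.sum(), K)       # every subset contains exactly K elements
```

The final assertion is a useful sanity check in general: since each subset has exactly $K$ elements, the inclusion probabilities must sum to $K$.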