# Generalized binomial coefficients

Here we give a simple model of subsets of size $K$ drawn from a base set $\mathcal{Y} = \{1, \ldots, N\}$.  We assume that each element $i \in \mathcal{Y}$ has a nonnegative weight $w_i$.

The probability of a set $Y \subseteq \mathcal{Y}$ is

$$
p(Y) \propto \begin{cases}
\prod_{i \in Y} w_i & \textbf{if } |Y|=K \\
0 & \textbf{otherwise}
\end{cases}
$$

We can compute the normalization constant of this distribution using a generalized version of the dynamic programming algorithm for computing [binomial coefficients](https://en.wikipedia.org/wiki/Binomial_coefficient).
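As a rough illustration of the recurrence behind that statement, here is a minimal standalone sketch that does not use the `hypergraphs` library. It assumes the normalization constant satisfies $Z(n, k) = Z(n-1, k) + w_n \, Z(n-1, k-1)$, which reduces to Pascal's rule when every weight is 1. The function name `subset_normalizer` is hypothetical and is not the implementation of `subsets`.

```python
import numpy as np
from itertools import combinations
from math import comb

def subset_normalizer(ws, K):
    """Sum, over all size-K subsets, of the product of the subset's weights."""
    # e[k] holds the total weight of size-k subsets of the items seen so far.
    e = np.zeros(K + 1)
    e[0] = 1.0  # the empty subset has weight 1 (empty product)
    for w in ws:
        # Update from large k to small k so each item is used at most once.
        for k in range(K, 0, -1):
            e[k] += w * e[k - 1]
    return e[K]

# Check against brute-force enumeration on a small example.
ws = np.array([0.5, 2.0, 1.5, 3.0])
brute = sum(np.prod(ws[list(Y)]) for Y in combinations(range(len(ws)), 2))
assert np.isclose(subset_normalizer(ws, 2), brute)

# With unit weights the result is the ordinary binomial coefficient.
assert np.isclose(subset_normalizer(np.ones(15), 4), comb(15, 4))
```

The sketch runs in $O(NK)$ time and keeps only a length-$(K+1)$ table, whereas brute-force enumeration touches all $\binom{N}{K}$ subsets.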

In [6]:
import numpy as np
from hypergraphs.apps.subsets import subsets
from hypergraphs.semirings import flatten
from arsenal.nb import psource

In [7]:
psource(subsets)

Below, we check that we match the binomial coefficients when all the weights are 1.

In [8]:
from hypergraphs.semirings.sampling.lazy2 import Sample as Sampling
from hypergraphs.semirings import LazySort as Sort
from hypergraphs.semirings.float import Float
from collections import defaultdict

In [9]:
from scipy.special import binom
N,K = 15,4
want = binom(N, K)
got = subsets(np.ones(N), K, Float)
assert np.allclose(want, got)

Below, we use the `LazySort` semiring to enumerate all of the structures that we summed over, ordered from the highest-scoring subset to the lowest.

In [10]:
N = 6
K = 3 
ws = np.random.uniform(0, 1, size=N)
z = subsets(ws, K, Sort)
Z = subsets(ws, K, Float)  # normalization constant computed in the sum-product semiring
for x in z:
    print(f'{x.score/Z:.3f} {set(flatten(x.data))}')  # print subsets sorted by decreasing probability

0.248 {1, 3, 5}
0.178 {0, 1, 3}
0.141 {1, 3, 4}
0.089 {1, 2, 3}
0.049 {0, 3, 5}
0.045 {0, 1, 5}
0.038 {3, 4, 5}
0.035 {1, 4, 5}
0.028 {0, 3, 4}
0.025 {0, 1, 4}
0.024 {2, 3, 5}
0.022 {1, 2, 5}
0.017 {0, 2, 3}
0.016 {0, 1, 2}
0.014 {2, 3, 4}
0.013 {1, 2, 4}
0.007 {0, 4, 5}
0.004 {0, 2, 5}
0.003 {2, 4, 5}
0.002 {0, 2, 4}


As usual, we can use the outside algorithm to compute marginal sums.

**TODO**: compute the marginals via the outside algorithm.
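For reference, the marginal inclusion probabilities can also be written in closed form as $p(i \in Y) = w_i \, Z_{K-1}(w_{-i}) / Z_K(w)$, where $Z_k$ is the sum over size-$k$ subsets and $w_{-i}$ drops element $i$. The sketch below computes them this way and checks the result against brute-force enumeration. It is not the outside algorithm and does not use the `hypergraphs` library; the names `esp` and `inclusion_marginals` are hypothetical, and it assumes $K \ge 1$.

```python
import numpy as np
from itertools import combinations

def esp(ws, K):
    """Sum, over all size-K subsets of ws, of the product of the weights."""
    e = np.zeros(K + 1)
    e[0] = 1.0
    for w in ws:
        for k in range(K, 0, -1):  # descending k: each item used at most once
            e[k] += w * e[k - 1]
    return e[K]

def inclusion_marginals(ws, K):
    """p(i in Y) for every element i, assuming K >= 1."""
    ws = np.asarray(ws, dtype=float)
    Z = esp(ws, K)
    # Subsets containing i: pick i, then K-1 of the remaining elements.
    return np.array([ws[i] * esp(np.delete(ws, i), K - 1) / Z
                     for i in range(len(ws))])

def brute_force_marginals(ws, K):
    """Same quantity by explicit enumeration; only feasible for small N."""
    ws = np.asarray(ws, dtype=float)
    m, Z = np.zeros(len(ws)), 0.0
    for Y in combinations(range(len(ws)), K):
        p = np.prod(ws[list(Y)])
        Z += p
        m[list(Y)] += p
    return m / Z

ws = np.array([0.2, 0.9, 0.1, 0.8, 0.3, 0.5])  # made-up example weights
m = inclusion_marginals(ws, 3)
assert np.allclose(m, brute_force_marginals(ws, 3))
assert np.isclose(m.sum(), 3)  # every subset has exactly K elements
```

This leave-one-out version costs $O(N^2 K)$; an outside (backward) pass over the same dynamic program would be expected to give all marginals in $O(NK)$.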

Below, we use the sampling semiring to draw ten random subsets from the distribution.

In [11]:
from arsenal.iterextras import take
for _, d in take(10, subsets(ws, K, Sampling)):  # draw 10 samples with the sampling semiring
    print(f'{set(flatten(d))}')

{3, 4, 5}
{0, 1, 3}
{0, 1, 3}
{0, 1, 5}
{1, 2, 5}
{1, 2, 3}
{1, 3, 5}
{0, 3, 5}
{1, 3, 5}
{0, 1, 3}
