# Summing over subsets of size K

Here we give a simple model over subsets of size $K$ of a base set $\mathcal{Y} = \{1, \ldots, N\}$.  We assume that each element $i \in \mathcal{Y}$ has a nonnegative weight $w_i$.

The probability of a set $Y \subseteq \mathcal{Y}$ is

$$
p(Y) \propto \begin{cases}
\prod_{i \in Y} w_i & \textbf{if } |Y|=K \\
0 & \textbf{otherwise}
\end{cases}
$$

We can compute the normalization constant of this distribution, $Z = \sum_{Y : |Y| = K} \prod_{i \in Y} w_i$, using a generalization of the dynamic programming algorithm for computing [binomial coefficients](https://en.wikipedia.org/wiki/Binomial_coefficient).
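
Written out, the recurrence behind this generalization looks as follows. The notation $E[k, n]$ is introduced here (chosen to match the table in the code) for the total weight of all size-$k$ subsets of the first $n$ elements:

$$
E[k, n] = E[k, n-1] + w_n \, E[k-1, n-1], \qquad E[0, n] = 1, \qquad E[k, 0] = 0 \ \text{ for } k > 0
$$

The first term covers subsets that exclude element $n$, and the second covers those that include it. The normalization constant is $E[K, N]$. Setting every $w_i = 1$ recovers Pascal's rule, $\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$.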

In [1]:
import numpy as np
from hypergraphs.semirings import LazySort, flatten

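def subsets(w, K, zero, one):
    "Sum over subsets of size K, using the semiring given by zero, one, + and *."
    N = len(w)
    # E[k,n]: semiring sum over all size-k subsets of the first n elements
    E = np.full((K+1,N+1), zero)
    E[0,:] = one                     # base case: the empty subset
    for k in range(1, K+1):
        for n in range(N):
            # either skip element n, or add it to a size-(k-1) subset
            E[k,n+1] = E[k,n] + E[k-1,n] * w[n]
    return E[K,N]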
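
As a sanity check, the dynamic program can be compared against explicit enumeration on a small instance. The sketch below is a float-only restatement of the recurrence; the names `subsets_float` and `brute_force` and the weights are illustrative assumptions, not part of the `hypergraphs` library.

```python
from itertools import combinations
import numpy as np

def subsets_float(w, K):
    """Sum over all size-K subsets of the product of weights (ordinary + and *)."""
    N = len(w)
    E = np.zeros((K + 1, N + 1))
    E[0, :] = 1.0                      # the empty subset has weight 1
    for k in range(1, K + 1):
        for n in range(N):
            # either skip element n, or include it in a size-(k-1) subset
            E[k, n + 1] = E[k, n] + E[k - 1, n] * w[n]
    return E[K, N]

def brute_force(w, K):
    """Enumerate every size-K subset explicitly; only feasible for small N."""
    return sum(np.prod([w[i] for i in Y]) for Y in combinations(range(len(w)), K))

w = np.array([0.5, 2.0, 1.5, 0.25, 3.0, 1.0])   # arbitrary illustrative weights
assert np.isclose(subsets_float(w, 3), brute_force(w, 3))

# with unit weights the sum just counts subsets: C(6, 3) = 20
assert subsets_float(np.ones(6), 3) == 20
```

The dynamic program fills a $(K+1) \times (N+1)$ table, whereas the enumeration visits all $\binom{N}{K}$ subsets.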

Below, we use the `LazySort` semiring to enumerate every structure that we summed over, ordered from the highest-scoring subset to the lowest.

In [2]:
N = 6
K = 3
ws = np.random.uniform(0, 1, size=N)
z = subsets([LazySort(w, i) for i, w in enumerate(ws)], K, LazySort.zero(), LazySort.one())
Z = subsets(ws, K, 0.0, 1.0)  # normalization constant computed in the sum-product semiring
for x in z:
    print(f'{x.score/Z:.3f} {set(flatten(x.data))}')  # print subsets sorted by decreasing probability

0.533 {1, 3, 5}
0.130 {2, 3, 5}
0.084 {1, 2, 3}
0.076 {1, 2, 5}
0.034 {0, 3, 5}
0.032 {3, 4, 5}
0.022 {0, 1, 3}
0.021 {1, 3, 4}
0.020 {0, 1, 5}
0.019 {1, 4, 5}
0.005 {0, 2, 3}
0.005 {2, 3, 4}
0.005 {0, 2, 5}
0.005 {2, 4, 5}
0.003 {0, 1, 2}
0.003 {1, 2, 4}
0.001 {0, 3, 4}
0.001 {0, 4, 5}
0.001 {0, 1, 4}
0.000 {0, 2, 4}
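
For readers without the `hypergraphs` package, a listing of the same kind can be produced on a small instance by explicit enumeration. This sketch does not use or reproduce `LazySort`; it materializes every subset and sorts them, and the function name `ranked_subsets` and the weights are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def ranked_subsets(w, K):
    """Return (probability, subset) pairs for all size-K subsets, highest first."""
    scored = [(float(np.prod([w[i] for i in Y])), set(Y))
              for Y in combinations(range(len(w)), K)]
    Z = sum(score for score, _ in scored)          # normalization constant
    return sorted(((s / Z, Y) for s, Y in scored),
                  key=lambda pair: pair[0], reverse=True)

# small illustrative instance: 4 elements, subsets of size 2
for p, Y in ranked_subsets([0.5, 2.0, 1.5, 0.25], 2):
    print(f'{p:.3f} {Y}')
```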


As usual, we can use the outside algorithm to compute marginal sums, such as the probability that a given element $i$ belongs to $Y$.
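
One way to make this concrete for the subset model is a forward-backward sketch in plain NumPy, shown below. It is not the `hypergraphs` library's outside routine; the function name `inclusion_marginals`, the tables `F` and `B`, and the weights are assumptions made for illustration. The idea is that element $i$ belongs to $Y$ exactly when $j$ elements are chosen before it and $K-1-j$ after it, for some $j$.

```python
from itertools import combinations
import numpy as np

def inclusion_marginals(w, K):
    """Return p(i in Y) for each i, where p(Y) is proportional to prod_{i in Y} w_i, |Y| = K."""
    w = np.asarray(w, dtype=float)
    N = len(w)
    # F[k, n]: total weight of size-k subsets drawn from elements 0, ..., n-1
    F = np.zeros((K + 1, N + 1))
    F[0, :] = 1.0
    for k in range(1, K + 1):
        for n in range(N):
            F[k, n + 1] = F[k, n] + F[k - 1, n] * w[n]
    # B[k, n]: total weight of size-k subsets drawn from elements n, ..., N-1
    B = np.zeros((K + 1, N + 1))
    B[0, :] = 1.0
    for k in range(1, K + 1):
        for n in reversed(range(N)):
            B[k, n] = B[k, n + 1] + B[k - 1, n + 1] * w[n]
    Z = F[K, N]  # assumed nonzero (K <= N and positive weights)
    # include element i, with j elements before it and K-1-j after it
    return np.array([
        w[i] * sum(F[j, i] * B[K - 1 - j, i + 1] for j in range(K)) / Z
        for i in range(N)
    ])

w = np.array([0.5, 2.0, 1.5, 0.25, 3.0, 1.0])   # arbitrary illustrative weights
m = inclusion_marginals(w, 3)

# brute-force check: add up the probabilities of the subsets containing each element
Ys = list(combinations(range(len(w)), 3))
Z = sum(np.prod(w[list(Y)]) for Y in Ys)
brute = [sum(np.prod(w[list(Y)]) for Y in Ys if i in Y) / Z for i in range(len(w))]
assert np.allclose(m, brute)
assert np.isclose(m.sum(), 3)   # every set has exactly K = 3 elements
```

The marginals sum to $K$ because every set with nonzero probability contains exactly $K$ elements.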