# PID-style synergy demo: XOR example

This notebook illustrates a minimal notion of synergy using a tiny XOR example, then connects it conceptually to compositional programs in the DSL.


In [9]:
import itertools
import math
import numpy as np
import pandas as pd


def entropy(probs):
    """Return the Shannon entropy in bits of a probability vector; zero entries are skipped."""
    values = [float(p) for p in probs if p > 0.0]
    if not values:
        return 0.0
    arr = np.array(values, dtype=float)
    return float(-np.sum(arr * np.log2(arr)))


def joint_distribution():
    """Return joint P(X, Y, T) for XOR with X, Y ~ Bernoulli(0.5)."""
    rows = []
    for x, y in itertools.product([0, 1], repeat=2):
        t = x ^ y  # XOR
        p = 0.25
        rows.append({"X": x, "Y": y, "T": t, "p": p})
    return pd.DataFrame(rows)


def marginal(df, vars_):
    """Sum the probability column "p" over every variable not listed in vars_."""
    return df.groupby(vars_)["p"].sum().reset_index()


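def mutual_information(df, vars_a, vars_b):
    """Return I(A; B) in bits from a joint distribution df with a probability column "p".

    vars_a and vars_b are lists of column names; all other columns are marginalized out.
    """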
    joint_cols = list(vars_a) + list(vars_b)
    joint = marginal(df, joint_cols)
    a = marginal(df, vars_a)
    b = marginal(df, vars_b)

    # Build lookup tables
    pa = {tuple(row[vars_a]): row["p"] for _, row in a.iterrows()}
    pb = {tuple(row[vars_b]): row["p"] for _, row in b.iterrows()}

    mi = 0.0
    for _, row in joint.iterrows():
        key_a = tuple(row[vars_a])
        key_b = tuple(row[vars_b])
        p_ab = row["p"]
        p_a = pa[key_a]
        p_b = pb[key_b]
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


df = joint_distribution()
print("Joint distribution for XOR:")
print(df)


Joint distribution for XOR:
   X  Y  T     p
0  0  0  0  0.25
1  0  1  1  0.25
2  1  0  1  0.25
3  1  1  0  0.25


In [10]:
# Compute mutual informations and a simple synergy measure for XOR

I_X_T = mutual_information(df, ["X"], ["T"])
I_Y_T = mutual_information(df, ["Y"], ["T"])
I_XY_T = mutual_information(df, ["X", "Y"], ["T"])

synergy = I_XY_T - I_X_T - I_Y_T

print("I(T; X) = {:.3f} bits".format(I_X_T))
print("I(T; Y) = {:.3f} bits".format(I_Y_T))
print("I(T; X, Y) = {:.3f} bits".format(I_XY_T))
print("Synergy (toy definition) = {:.3f} bits".format(synergy))


I(T; X) = 0.000 bits
I(T; Y) = 0.000 bits
I(T; X, Y) = 1.000 bits
Synergy (toy definition) = 1.000 bits


## Interpretation and link to compositional programs

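- In the XOR example, I(T; X) = 0 and I(T; Y) = 0 because each input on its own is statistically independent of the target: whatever the value of X, T is 0 or 1 with probability 0.5, and the same holds for Y.
- Yet I(T; X, Y) = 1 bit, because the pair (X, Y) determines T exactly and H(T) = 1 bit. Our toy synergy measure, Synergy = I(T; X, Y) - I(T; X) - I(T; Y), is therefore 1 bit.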
- Intuitively, **only the combination (X, Y) carries information** about T; neither component is informative on its own.
- This is an extreme case of the whole exceeding the sum of its parts (0 + 0 bits from the individual inputs, 1 bit from the pair), and a clean example of purely synergistic information.
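
To see why this difference is only a toy measure, it helps to evaluate it on gates other than XOR. The sketch below is self-contained and does not reuse the helpers defined earlier; `mi_bits`, `toy_synergy`, and `gate_joint` are hypothetical names introduced here for illustration. In a two-source PID, the difference I(T; X, Y) - I(T; X) - I(T; Y) corresponds to synergy minus redundancy, so it can be small or even negative when the inputs share information about T.

```python
import itertools
import math
from collections import defaultdict


def mi_bits(joint, idx_a, idx_b):
    """I(A; B) in bits, where joint maps outcome tuples to probabilities."""
    pa, pb, pab = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        a = tuple(outcome[i] for i in idx_a)
        b = tuple(outcome[i] for i in idx_b)
        pa[a] += p
        pb[b] += p
        pab[(a, b)] += p
    return sum(
        p * math.log2(p / (pa[a] * pb[b]))
        for (a, b), p in pab.items()
        if p > 0
    )


def toy_synergy(joint):
    """I(T; X, Y) - I(T; X) - I(T; Y) for outcomes ordered as (x, y, t)."""
    return (
        mi_bits(joint, [0, 1], [2])
        - mi_bits(joint, [0], [2])
        - mi_bits(joint, [1], [2])
    )


def gate_joint(gate):
    """Joint over (x, y, t) for independent fair bits x, y and t = gate(x, y)."""
    return {
        (x, y, gate(x, y)): 0.25
        for x, y in itertools.product([0, 1], repeat=2)
    }


xor_joint = gate_joint(lambda x, y: x ^ y)
and_joint = gate_joint(lambda x, y: x & y)
# Fully redundant case: X and Y are the same fair bit, and T copies it.
copy_joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

for name, joint in [("XOR", xor_joint), ("AND", and_joint), ("COPY", copy_joint)]:
    print("{}: toy synergy = {:.3f} bits".format(name, toy_synergy(joint)))
```

On these three gates the measure should come out at 1 bit for XOR, roughly 0.19 bits for AND, and -1 bit for COPY. The negative value in the redundant case is the reason to read the quantity as a net indicator rather than as a full decomposition.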

Analogy to the DSL setting:

- Think of X and Y as two primitive programs, and T as a task or listener classification.
- In some tasks, neither primitive alone is discriminative for T, but **a specific composition of them** is.
- In that case, the composed program can carry extra task-relevant information beyond what is available from the parts in isolation.
- This aligns with the idea that learning reusable higher-level abstractions in the DSL can create new, synergistic concepts that are more than the sum of their primitives.
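
The analogy can be made concrete with a small sketch. Everything here is an assumption made for illustration: plain Python predicates stand in for DSL primitives, `compose_xor` is a hypothetical composition operator, and the accepted set `{1, 3, 4, 6}` is an invented listener task. None of this reflects the DSL's actual API. The sketch scores each program by the empirical mutual information between its output and the task label.

```python
import math
from collections import Counter


# Hypothetical stand-ins for two DSL primitives (not the DSL's real API).
def is_even(n):
    return n % 2 == 0


def is_small(n):
    return n < 4


def compose_xor(p, q):
    """Hypothetical composition: true when exactly one of p, q holds."""
    return lambda n: p(n) != q(n)


def empirical_mi_bits(pairs):
    """I(A; B) in bits from a list of (a, b) samples, using empirical frequencies."""
    n = len(pairs)
    c_ab = Counter(pairs)
    c_a = Counter(a for a, _ in pairs)
    c_b = Counter(b for _, b in pairs)
    # p_ab / (p_a * p_b) simplifies to c_ab * n / (c_a * c_b).
    return sum(
        (c / n) * math.log2(c * n / (c_a[a] * c_b[b]))
        for (a, b), c in c_ab.items()
    )


inputs = list(range(8))
# Assumed listener task: accept exactly these inputs.
accepted = {1, 3, 4, 6}
labels = [n in accepted for n in inputs]

programs = {
    "is_even": is_even,
    "is_small": is_small,
    "compose_xor(is_even, is_small)": compose_xor(is_even, is_small),
}
scores = {
    name: empirical_mi_bits([(prog(n), t) for n, t in zip(inputs, labels)])
    for name, prog in programs.items()
}
for name, score in scores.items():
    print("I(T; {}) = {:.3f} bits".format(name, score))
```

With this choice of inputs and labels, each primitive alone scores 0 bits while the composed program scores 1 bit, mirroring the XOR result above. A different task or a different composition operator would generally give intermediate values.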
