### Example Setup: Fully Confounded IV

**DAG**
- Z → X → Y
- U → {Z, X, Y} (U unobserved; allows arbitrary confounding)

**Target**
$$
P(Y=1 \mid do(X=1))
$$

**Experiments (cost 1 each), Budget B = 1**
- Z1: observe $P(X=1 \mid do(Z=1))$
- Z0: observe $P(X=1 \mid do(Z=0))$

**LP decision variables (response-type with instrument):**
$$
q_{z,x_0,x_1,y_0,y_1}
= \Pr\!\big(Z=z,\; X(0)=x_0,\; X(1)=x_1,\; Y(0)=y_0,\; Y(1)=y_1\big),
\quad z,x_0,x_1,y_0,y_1\in\{0,1\}.
$$
- Total variables: $2^5 = 32$.
- Normalization & nonnegativity: $\sum q_{z,x_0,x_1,y_0,y_1}=1,\; q\ge 0$.

**Target as a linear form in $q$:**
$$
P(Y=1 \mid do(X=1))
= \sum_{z,x_0,x_1,y_0,y_1:\; y_1=1} q_{z,x_0,x_1,y_0,y_1}.
$$

**Observational consistency (if used):**
$$
P(Z=z, X=x, Y=y)
= \sum_{x_0,x_1,y_0,y_1:\; x = x_z,\; y = y_x} q_{z,x_0,x_1,y_0,y_1}.
$$

**Interventional constraints for experiments:**
Let $\tilde q_{x_0,x_1,y_0,y_1} := \sum_{z'} q_{z',x_0,x_1,y_0,y_1}$. Then
$$
P(X=1 \mid do(Z=z))
= \sum_{x_0,x_1,y_0,y_1:\; x_z=1} \tilde q_{x_0,x_1,y_0,y_1}.
$$

In [3]:
# Generate Synthetic data (sample joint directly)
import itertools
import numpy as np
import pandas as pd
import cvxpy as cp
import itertools
from scipy.special import expit

def create_consistent_observational_data(seed=42):
    """
    Create observational data that's guaranteed to be consistent
    by construction from a simple response-type distribution
    """
    print("\n" + "="*60)
    print("CREATING CONSISTENT OBSERVATIONAL DATA BY CONSTRUCTION")
    print("="*60)

    np.random.seed(seed)  # Set seed for reproducibility

    # Define a simple response-type distribution manually
    variables = list(itertools.product([0,1], [0,1], [0,1], [0,1], [0,1]))
    n_vars = len(variables)
    
    # Create a simple uniform-ish distribution over response types
    q_true = np.ones(n_vars) / n_vars  # Uniform distribution
    
    # Alternatively, create a more interesting distribution
    q_true = np.random.dirichlet(np.ones(n_vars))  # Random but valid distribution
    
    print(f"True response-type distribution (first 5 components):")
    for i in range(5):
        z, x0, x1, y0, y1 = variables[i]
        print(f"  q[{i}] = {q_true[i]:.4f}: (z={z}, x0={x0}, x1={x1}, y0={y0}, y1={y1})")
    
    # Now compute the implied observational distribution
    obs_probs = {}
    for z_obs in [0, 1]:
        for x_obs in [0, 1]:
            for y_obs in [0, 1]:
                prob = 0.0
                
                for i, (z, x0, x1, y0, y1) in enumerate(variables):
                    # Check if this response type contributes
                    if z == z_obs:
                        x_potential = x1 if z_obs == 1 else x0
                        if x_potential == x_obs:
                            y_potential = y1 if x_obs == 1 else y0
                            if y_potential == y_obs:
                                prob += q_true[i]
                
                obs_probs[(z_obs, x_obs, y_obs)] = prob
    
    print(f"\nImplied observational distribution:")
    total_prob = 0
    for key, prob in obs_probs.items():
        print(f"  P{key} = {prob:.4f}")
        total_prob += prob
    print(f"  Total probability: {total_prob:.4f}")
    
    # Compute experimental result P(X=1|do(Z=1))
    exp_prob = 0.0
    for i, (z, x0, x1, y0, y1) in enumerate(variables):
        if z == 1 and x1 == 1:  # Z=1 and X(1)=1
            exp_prob += q_true[i]
    
    print(f"\nImplied experimental result:")
    print(f"  P(X=1|do(Z=1)) = {exp_prob:.4f}")
    
    return obs_probs, exp_prob, q_true

# Example: LB under Z1

In [4]:
import numpy as np
import pandas as pd
import cvxpy as cp
import itertools
from scipy.special import expit

def fix_observational_constraint_mapping():
    """
    The issue is in how we map P(Z,X,Y) to response-type variables q[z,x0,x1,y0,y1]
    
    Key insight: P(Z=z, X=x, Y=y) should equal the sum of all q[z',x0,x1,y0,y1] where:
    - The observed Z=z (this selects z'=z)  
    - The observed X=x matches the response X(z) = x_z
    - The observed Y=y matches the response Y(x) = y_x
    """
    print("="*60)
    print("FIXING OBSERVATIONAL CONSTRAINT MAPPING")  
    print("="*60)
    
    # Let's trace through the logic step by step
    variables = list(itertools.product([0,1], [0,1], [0,1], [0,1], [0,1]))
    print(f"Total variables: {len(variables)}")
    print("Variable format: (z, x0, x1, y0, y1)")
    print("  z: observed instrument")
    print("  x0, x1: potential responses of X under Z=0, Z=1") 
    print("  y0, y1: potential responses of Y under X=0, X=1")
    
    # Example: P(Z=1, X=0, Y=1) 
    # This means we observed Z=1, X=0, Y=1
    z_obs, x_obs, y_obs = 1, 0, 1
    print(f"\nAnalyzing: P(Z={z_obs}, X={x_obs}, Y={y_obs})")
    
    contributing_vars = []
    for i, (z, x0, x1, y0, y1) in enumerate(variables):
        # For this observation to be possible from this response type:
        
        # 1. The instrument must match: z == z_obs
        if z != z_obs:
            continue
            
        # 2. The observed X must match the potential response under the observed Z
        # If Z=1 was observed, then X must equal x1 (response under Z=1)
        # If Z=0 was observed, then X must equal x0 (response under Z=0)
        x_potential = x1 if z_obs == 1 else x0
        if x_potential != x_obs:
            continue
            
        # 3. The observed Y must match the potential response under the observed X
        # If X=1 was observed, then Y must equal y1 (response under X=1)
        # If X=0 was observed, then Y must equal y0 (response under X=0)  
        y_potential = y1 if x_obs == 1 else y0
        if y_potential != y_obs:
            continue
            
        # If we get here, this response type contributes to the observed probability
        contributing_vars.append((i, z, x0, x1, y0, y1))
    
    print(f"Contributing response types:")
    for i, z, x0, x1, y0, y1 in contributing_vars:
        print(f"  q[{i}]: (z={z}, x0={x0}, x1={x1}, y0={y0}, y1={y1})")
        
        # Verify the logic
        x_under_z = x1 if z == 1 else x0
        y_under_x = y1 if x_obs == 1 else y0  
        print(f"    -> Z={z}, X(Z={z})={x_under_z}, Y(X={x_obs})={y_under_x}")
    
    return contributing_vars

def test_fixed_lp(obs_probs, p_val, q_true):
    """
    Test the LP with consistent data
    
    Parameters:
    -----------
    obs_probs : dict
        Observational probabilities P(Z,X,Y)
    p_val : float or None
        Value for experimental constraint P(X=1|do(Z=1))
        If None, experimental constraint is NOT included
    q_true : array
        True response-type distribution (for validation)
    """
    print("\n" + "="*60)
    if p_val is None:
        print("TESTING LP WITHOUT EXPERIMENTAL CONSTRAINT")
    else:
        print(f"TESTING LP WITH p = {p_val}")
    print("="*60)
    
    # Set up variables
    variables = list(itertools.product([0,1], [0,1], [0,1], [0,1], [0,1]))
    n_vars = len(variables)
    
    q = cp.Variable(n_vars, nonneg=True)
    
    # Target: P(Y=1|do(X=1))
    theta = np.zeros(n_vars)
    for i, (z, x0, x1, y0, y1) in enumerate(variables):
        if y1 == 1:
            theta[i] = 1.0
    
    # Observational constraints (using corrected mapping)
    obs_constraints = []
    for (z_obs, x_obs, y_obs), observed_prob in obs_probs.items():
        obs_coeff = np.zeros(n_vars)
        
        for i, (z, x0, x1, y0, y1) in enumerate(variables):
            if z == z_obs:  # Instrument matches
                x_potential = x1 if z_obs == 1 else x0  # X response under observed Z
                if x_potential == x_obs:
                    y_potential = y1 if x_obs == 1 else y0  # Y response under observed X
                    if y_potential == y_obs:
                        obs_coeff[i] = 1.0
        
        obs_constraints.append(obs_coeff @ q == observed_prob)
    
    # Start with basic constraints
    constraints = [
        cp.sum(q) == 1,  # normalization
        *obs_constraints  # observations
    ]
    
    # Conditionally add experimental constraint
    if p_val is not None:
        # Create parameter and experimental constraint
        p = cp.Parameter()
        p.value = p_val
        
        # Experimental constraint coefficients
        experiment_coeff = np.zeros(n_vars)
        for i, (z, x0, x1, y0, y1) in enumerate(variables):
            if z == 1 and x1 == 1:
                experiment_coeff[i] = 1.0
        
        # Add experimental constraint
        constraints.append(experiment_coeff @ q == p)
        
        print(f"LP setup:")
        print(f"  Variables: {n_vars}")
        print(f"  Constraints: {len(constraints)} total")
        print(f"    - 1 normalization")
        print(f"    - 1 experimental (P(X=1|do(Z=1)) = {p_val})") 
        print(f"    - {len(obs_constraints)} observational")
        
    else:
        print(f"LP setup:")
        print(f"  Variables: {n_vars}")
        print(f"  Constraints: {len(constraints)} total")
        print(f"    - 1 normalization")
        print(f"    - 0 experimental (NO experimental constraint)")
        print(f"    - {len(obs_constraints)} observational")
    
    # Test feasibility
    test_problem = cp.Problem(cp.Minimize(0), constraints)
    test_problem.solve(verbose=False)
    
    print(f"\nFeasibility test: {test_problem.status}")
    
    if test_problem.status == cp.OPTIMAL:
        print("✅ LP is feasible!")
        
        # Test bounds on the target
        lb_problem = cp.Problem(cp.Minimize(theta @ q), constraints)
        ub_problem = cp.Problem(cp.Maximize(theta @ q), constraints)
        
        lb_problem.solve(verbose=False)
        ub_problem.solve(verbose=False)
        
        if lb_problem.status == cp.OPTIMAL and ub_problem.status == cp.OPTIMAL:
            lb = lb_problem.value
            ub = ub_problem.value
            
            # Compute true target value for comparison
            true_target = theta @ q_true
            
            print(f"\nBounds on P(Y=1|do(X=1)):")
            print(f"  Lower bound: {lb:.4f}")
            print(f"  Upper bound: {ub:.4f}")
            print(f"  Width: {ub - lb:.4f}")
            print(f"  True value: {true_target:.4f}")
            print(f"  True in bounds? {lb <= true_target <= ub}")
            
            if p_val is None:
                print(f"\n💡 Without experimental constraint:")
                print(f"   The bounds are [{lb:.4f}, {ub:.4f}] - this shows what")
                print(f"   P(Y=1|do(X=1)) could be given only observational data!")
            
            return True, lb, ub
        else:
            print("❌ Bounds computation failed")
            return False, None, None
    else:
        print("❌ LP is infeasible")
        return False, None, None

# Example usage functions
def create_example_data():
    """
    Create example data to demonstrate the modified function
    """
    # Create simple consistent data
    variables = list(itertools.product([0,1], [0,1], [0,1], [0,1], [0,1]))
    n_vars = len(variables)
    
    # Random but valid response-type distribution
    np.random.seed(42)
    q_true = np.random.dirichlet(np.ones(n_vars))
    
    # Compute observational probabilities from q_true
    obs_probs = {}
    for z_obs in [0, 1]:
        for x_obs in [0, 1]:
            for y_obs in [0, 1]:
                prob = 0.0
                for i, (z, x0, x1, y0, y1) in enumerate(variables):
                    if z == z_obs:
                        x_potential = x1 if z_obs == 1 else x0
                        if x_potential == x_obs:
                            y_potential = y1 if x_obs == 1 else y0
                            if y_potential == y_obs:
                                prob += q_true[i]
                obs_probs[(z_obs, x_obs, y_obs)] = prob
    
    # Compute true experimental result
    true_exp_result = sum(q_true[i] for i, (z, x0, x1, y0, y1) in enumerate(variables) 
                         if z == 1 and x1 == 1)
    
    return obs_probs, q_true, true_exp_result

def demonstrate_usage():
    """
    Demonstrate the modified function with and without experimental constraints
    """
    print("="*70)
    print("DEMONSTRATING MODIFIED test_fixed_lp FUNCTION")
    print("="*70)
    
    # Create example data
    obs_probs, q_true, true_exp_result = create_example_data()
    
    print("Example observational data:")
    for key, prob in obs_probs.items():
        print(f"  P{key} = {prob:.4f}")
    print(f"True experimental result: P(X=1|do(Z=1)) = {true_exp_result:.4f}")
    
    # Test 1: Without experimental constraint (p_val = None)
    print(f"\n" + "="*50)
    print("TEST 1: NO EXPERIMENTAL CONSTRAINT")
    print("="*50)
    
    success1, lb1, ub1 = test_fixed_lp(obs_probs, p_val=None, q_true=q_true)
    
    # Test 2: With experimental constraint at true value
    print(f"\n" + "="*50)
    print("TEST 2: WITH EXPERIMENTAL CONSTRAINT (TRUE VALUE)")
    print("="*50)
    
    success2, lb2, ub2 = test_fixed_lp(obs_probs, p_val=true_exp_result, q_true=q_true)
    
    # Test 3: With experimental constraint at different value
    print(f"\n" + "="*50)
    print("TEST 3: WITH EXPERIMENTAL CONSTRAINT (DIFFERENT VALUE)")
    print("="*50)
    
    different_p = 0.5 if abs(true_exp_result - 0.5) > 0.1 else 0.7
    success3, lb3, ub3 = test_fixed_lp(obs_probs, p_val=different_p, q_true=q_true)
    
    # Summary
    print(f"\n" + "="*70)
    print("SUMMARY")
    print("="*70)
    
    if success1:
        print(f"✅ Without experimental constraint: bounds = [{lb1:.4f}, {ub1:.4f}], width = {ub1-lb1:.4f}")
    else:
        print(f"❌ Without experimental constraint: infeasible")
    
    if success2:
        print(f"✅ With true experimental value: bounds = [{lb2:.4f}, {ub2:.4f}], width = {ub2-lb2:.4f}")
    else:
        print(f"❌ With true experimental value: infeasible")
    
    if success3:
        print(f"✅ With different experimental value: bounds = [{lb3:.4f}, {ub3:.4f}], width = {ub3-lb3:.4f}")
    else:
        print(f"❌ With different experimental value: infeasible")
    
    if success1 and success2:
        print(f"\n💡 Key insight:")
        print(f"   Adding experimental constraint tightened bounds from {ub1-lb1:.4f} to {ub2-lb2:.4f}")
        print(f"   This shows the value of experimental data!")


In [7]:
obs_probs, true_exp_result, q_true = create_consistent_observational_data(seed=1233)

p_vals = [true_exp_result, None]
for p_val in p_vals:
    test_fixed_lp(obs_probs, p_val, q_true)


CREATING CONSISTENT OBSERVATIONAL DATA BY CONSTRUCTION
True response-type distribution (first 5 components):
  q[0] = 0.0381: (z=0, x0=0, x1=0, y0=0, y1=0)
  q[1] = 0.0263: (z=0, x0=0, x1=0, y0=0, y1=1)
  q[2] = 0.0651: (z=0, x0=0, x1=0, y0=1, y1=0)
  q[3] = 0.0489: (z=0, x0=0, x1=0, y0=1, y1=1)
  q[4] = 0.0295: (z=0, x0=0, x1=1, y0=0, y1=0)

Implied observational distribution:
  P(0, 0, 0) = 0.2673
  P(0, 0, 1) = 0.1840
  P(0, 1, 0) = 0.1289
  P(0, 1, 1) = 0.1261
  P(1, 0, 0) = 0.0673
  P(1, 0, 1) = 0.1440
  P(1, 1, 0) = 0.0272
  P(1, 1, 1) = 0.0552
  Total probability: 1.0000

Implied experimental result:
  P(X=1|do(Z=1)) = 0.0824

TESTING LP WITH p = 0.0824214759111186
LP setup:
  Variables: 32
  Constraints: 10 total
    - 1 normalization
    - 1 experimental (P(X=1|do(Z=1)) = 0.0824214759111186)
    - 8 observational

Feasibility test: optimal
✅ LP is feasible!

Bounds on P(Y=1|do(X=1)):
  Lower bound: 0.1813
  Upper bound: 0.8439
  Width: 0.6627
  True value: 0.5301
  True in bo