# Conditional Bernoulli

## Question 1

`Definition of the objects`  
Let $N$ be a natural number, and consider the vector of probabilities $p = (p_1, p_2, \dots, p_N)$, where each $p_i$ belongs to the interval $(0,1)$ for $i = 1, \dots, N$.  
The sample space is given by $\Omega = \{0,1\}^N.$  
We assume that $I$ is an integer satisfying $I \leq \frac{N}{2}$.  
Let $g$ be the probability density function of the random vector  
$$
(X_1, \dots, X_N)
$$
where the $X_i$ are independent Bernoulli random variables $X_i \sim \mathsf{B}(p_i).$  
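Finally, let $f$ be the probability density function of the conditional Bernoulli distribution $\mathsf{CB}(p, I)$, i.e. the law of $(X_1, \dots, X_N)$ conditioned on the event $\{X_1 + \dots + X_N = I\}$:  
$$
f(x) = \frac{g(x)\,\mathbb{1}\{\|x\|_1 = I\}}{\mathbb{P}(X_1 + \dots + X_N = I)}, \qquad x \in \Omega.
$$  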
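
As a small sanity check of these definitions, the sketch below builds $g$ and $f$ by brute force over $\Omega$ for an illustrative $p$ and $I$ (both arbitrary assumptions), renormalising $g$ on the sequences with exactly $I$ ones, which is also what `pdf_cb` does further below. It checks that both functions sum to one and that $f$ vanishes on the other sequences. The helper names `g_pmf` and `f_pmf` are hypothetical.

```python
import itertools
import numpy as np

def g_pmf(x, p):
    """Product-Bernoulli pmf g(x) = prod_i p_i^{x_i} (1 - p_i)^{1 - x_i}."""
    return float(np.prod(np.where(np.asarray(x) == 1, p, 1 - p)))

def f_pmf(x, p, I):
    """CB(p, I) pmf: g renormalised on the sequences with exactly I ones."""
    # Normalising constant P(X_1 + ... + X_N = I), by brute force over Omega
    z = sum(g_pmf(y, p) for y in itertools.product((0, 1), repeat=len(p)) if sum(y) == I)
    return g_pmf(x, p) / z if sum(x) == I else 0.0

p = np.array([0.5, 0.25, 0.45, 0.78])  # illustrative values (assumption)
I = 2
omega = list(itertools.product((0, 1), repeat=len(p)))  # Omega = {0,1}^N
total_g = sum(g_pmf(x, p) for x in omega)
total_f = sum(f_pmf(x, p, I) for x in omega)
print(total_g, total_f)  # both equal 1 up to floating-point rounding
```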

`Justification of the approach`  
We will use the following proposition from the lecture:
> Let $f$, $g$ be probability density functions (PDFs) such that the support of $g$ contains the support of $f$ and
> 
> $$
> f \leq M g \quad \text{with } M \geq 1.
> $$
> 
> ### Accept-Reject Algorithm
> Repeat:  
> 1. Draw $X \sim g$ and $U \sim U[0,1]$.  
> 2. Until $U \leq \frac{f(X)}{M g(X)}$.
> 
> ### Properties:
> - $X \sim f$ 
> - The number of draws until acceptance follows a **Geometric** distribution:  
>   
>   $$
>   \text{Geometric}(1/M).
>   $$  
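
The proposition can be illustrated with a generic accept-reject routine. The sketch below is an assumption written for this note (the function `accept_reject`, the toy target on $\{0, 1, 2\}$ and the uniform proposal are hypothetical); it returns each accepted draw together with the number of proposals used, so that one can check both properties: the draws follow $f$, and the mean number of proposals is close to $M$, the mean of a $\text{Geometric}(1/M)$ variable.

```python
import numpy as np

def accept_reject(f, g, sample_g, M, rng):
    """Generic accept-reject: returns one draw from f and the number of proposals used.

    f, g     : target and proposal densities (callables)
    sample_g : callable drawing one proposal X ~ g with the generator rng
    M        : constant such that f <= M g on the support of f
    """
    n_draws = 0
    while True:
        n_draws += 1
        x = sample_g(rng)
        u = rng.uniform()
        if u <= f(x) / (M * g(x)):  # accept with probability f(x) / (M g(x))
            return x, n_draws

# Toy example (assumption): target f on {0, 1, 2}, uniform proposal g on {0, 1, 2}
f_vals = np.array([0.2, 0.5, 0.3])
g_vals = np.array([1 / 3, 1 / 3, 1 / 3])
M = float(np.max(f_vals / g_vals))  # smallest valid M, about 1.5

rng = np.random.default_rng(0)
draws, counts = zip(*(accept_reject(lambda x: f_vals[x], lambda x: g_vals[x],
                                    lambda r: int(r.integers(3)), M, rng)
                      for _ in range(20000)))
print(np.bincount(draws, minlength=3) / len(draws))  # close to f_vals
print(np.mean(counts))  # close to M
```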

The lecture's proposition applies here because $g$ does not vanish on $\Omega$ (every $p_i$ lies in $(0,1)$): the support of $g$ is the whole of $\Omega$ and therefore contains the support of $f$.  
We choose the smallest possible $M$ in order to minimize the number of draws, whose expected value is $M$:
$$
M = \underset{x \in \Omega}{\sup} \; \frac{f(x)}{g(x)}
$$


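Since $f$ is $g$ renormalised on $\{x \in \Omega : \|x\|_1 = I\}$, the ratio $f/g$ only takes the values $0$ and $1/\mathbb{P}(X_1 + \dots + X_N = I)$. Hence
$$
M = \frac{1}{\mathbb{P}(X_1 + \dots + X_N = I)}
\quad\text{and}\quad
\frac{f(x)}{M g(x)} = \mathbb{1}\{\|x\|_1 = I\} \quad \text{for all } x \in \Omega.
$$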
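
For small $N$ the supremum can also be computed by brute force. The sketch below uses the vector `p` from the notebook's `RejectionSampling` example and $I = 2$ (a choice made for illustration); it enumerates $\Omega$, checks that $f/g$ is constant on the support of $f$, and compares $M$ with $1/\mathbb{P}(X_1 + \dots + X_N = I)$. The helpers `g_pmf` and `f_pmf` are hypothetical.

```python
import itertools
import numpy as np

p = np.array([0.5, 0.25, 0.45, 0.78, 0.1, 0.8])  # vector used in the RejectionSampling example
I = 2

def g_pmf(x):
    """Product-Bernoulli pmf g(x)."""
    return float(np.prod(np.where(np.asarray(x) == 1, p, 1 - p)))

omega = list(itertools.product((0, 1), repeat=len(p)))
z = sum(g_pmf(x) for x in omega if sum(x) == I)  # P(X_1 + ... + X_N = I)

def f_pmf(x):
    """Conditional Bernoulli pmf f(x) = g(x) 1{||x||_1 = I} / z."""
    return g_pmf(x) / z if sum(x) == I else 0.0

ratios = [f_pmf(x) / g_pmf(x) for x in omega]  # g > 0 everywhere on Omega
M = max(ratios)
nonzero = [r for r in ratios if r > 0]
print(M, 1 / z, max(nonzero) - min(nonzero))  # M matches 1/z; the spread is ~0
```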

With such an $M$, $X$ is rejected if and only if $f(X) = 0$ (almost surely, since the event $\{U \leq 0\}$ has probability zero). In other words, $X$ is rejected if and only if it does not contain exactly $I$ ones. The algorithm can therefore be simplified to:

> ### Accept-Reject Algorithm
> Repeat:  
> 1. Draw $X \sim g$
> 2. Until $\|X\|_1 = I$.
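
The simplified algorithm can be sketched as follows; the function name `sample_cb_naive` and the use of a seeded `numpy.random.Generator` are assumptions, not the lecture's code. The sketch also counts the draws: by the proposition, their mean should be close to $M = 1/\mathbb{P}(X_1 + \dots + X_N = I)$.

```python
import itertools
import numpy as np

def sample_cb_naive(p, I, rng):
    """Simplified accept-reject: redraw independent Bernoulli(p_i) vectors until ||X||_1 = I.
    Returns the accepted vector and the number of draws used."""
    n_draws = 0
    while True:
        n_draws += 1
        x = (rng.uniform(size=len(p)) < p).astype(int)  # X_i ~ B(p_i), independent
        if x.sum() == I:
            return x, n_draws

p = np.array([0.5, 0.25, 0.45, 0.78, 0.1, 0.8])  # illustrative values
I = 2
rng = np.random.default_rng(1)
results = [sample_cb_naive(p, I, rng) for _ in range(10000)]
mean_draws = np.mean([n for _, n in results])

# Exact P(X_1 + ... + X_N = I) by enumeration, to compare 1 / z with the mean number of draws
z = sum(np.prod(np.where(np.array(x) == 1, p, 1 - p))
        for x in itertools.product((0, 1), repeat=len(p)) if sum(x) == I)
print(mean_draws, 1 / z)  # the two values should be close
```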


### Algorithm from the lecture (no changes)

```python
import numpy as np
import itertools
```

```python
class RejectionSampling:
    """Accept-reject sampler for CB(p, I), with the product-Bernoulli proposal g."""

    def __init__(self, p, I):
        self.p = np.asarray(p, dtype=float)
        self.I = I
        self.N = len(self.p)
        self.g = self.pdf_bernoulli(self.p)  # proposal pmf (independent Bernoulli)
        self.f = self.pdf_cb(self.p, I)      # target pmf (conditional Bernoulli)
        self.M = self.compute_M()            # M = sup f/g
        self._sample = None

    def compute_M(self):
        """Computes M = sup_x f(x)/g(x); f = 0 outside the sequences with I ones,
        so the supremum is taken over those sequences only."""
        return max(self.f(seq) / self.g(seq)
                   for seq in self.generate_sequences(self.N, self.I))

    def pdf_cb(self, p, I):
        """Returns the conditional Bernoulli pmf f = g 1{sum x = I} / P(sum X = I)."""
        g = self.pdf_bernoulli(p)
        proba = sum(g(x) for x in self.generate_sequences(len(p), I))  # P(sum X = I)

        def f(x):
            return g(x) / proba if np.sum(x) == I else 0.0

        return f

    @staticmethod
    def generate_sequences(N, I):
        """Generates all binary sequences of length N with exactly I ones."""
        sequences = []
        for pos in itertools.combinations(range(N), I):
            seq = np.zeros(N, dtype=int)  # start with all zeros
            seq[list(pos)] = 1            # set the chosen positions to 1
            sequences.append(seq)
        return np.array(sequences)

    @staticmethod
    def pdf_bernoulli(p):
        """Returns the pmf g of independent Bernoulli(p_i) variables."""
        def g(x):
            return np.prod(np.where(x == 1, p, 1 - p))
        return g

    def sample(self, L=1):
        """Draws L samples from CB(p, I) with the accept-reject algorithm."""
        samples = []
        while len(samples) < L:
            X = np.array([np.random.binomial(1, p_i) for p_i in self.p])  # X ~ g
            U = np.random.uniform(0, 1)
            if U <= self.f(X) / (self.g(X) * self.M):  # accept w.p. f(X) / (M g(X))
                samples.append(X)
        self._sample = np.array(samples)
        return self._sample
```
    


```python
rejection = RejectionSampling(np.array([.5, .25, .45, .78, .1, .8]), 2)
```

```python
def generate_sequences(N, I):
    """Generates all binary sequences of length N with exactly I ones."""
    # Each combination of I positions out of N gives the indices that should be 1
    positions = itertools.combinations(range(N), I)
    sequences = []
    for pos in positions:
        seq = np.zeros(N, dtype=int)  # start with all zeros
        seq[list(pos)] = 1            # set the specified positions to 1
        sequences.append(seq)
    return np.array(sequences)


def pdf_bernoulli(p):
    """Returns the pmf g of independent Bernoulli(p_i) variables."""
    def g(x):
        return np.prod(np.where(x == 1, p, 1 - p))
    return g


def pdf_cb(p, I):
    """Returns the pmf f of CB(p, I), i.e. g renormalised on the sequences with I ones."""
    g = pdf_bernoulli(p)
    proba = 0  # P(X_1 + ... + X_N = I)
    for x in generate_sequences(len(p), I):
        proba = proba + g(x)

    def f(x):
        if sum(x) != I:
            return 0
        return g(x) / proba
    return f
```
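
In `pdf_cb`, the normalising constant `proba` $= \mathbb{P}(X_1 + \dots + X_N = I)$ is obtained by summing $g$ over all $\binom{N}{I}$ sequences, which grows quickly with $N$. A standard alternative, sketched below and not taken from the lecture, is the $O(N I)$ recursion on the distribution of the partial sums $S_k = X_1 + \dots + X_k$. The names `prob_sum_equals` and `prob_sum_brute` are hypothetical; the second one reproduces the enumeration of `pdf_cb` for comparison.

```python
import itertools
import numpy as np

def prob_sum_equals(p, I):
    """P(X_1 + ... + X_N = I) for independent X_i ~ B(p_i), by dynamic programming.

    dist[j] holds P(S_k = j) after processing the first k coordinates; only j = 0..I is needed.
    """
    dist = np.zeros(I + 1)
    dist[0] = 1.0
    for p_k in p:
        # P(S_k = j) = P(S_{k-1} = j) (1 - p_k) + P(S_{k-1} = j - 1) p_k
        dist[1:] = dist[1:] * (1 - p_k) + dist[:-1] * p_k
        dist[0] *= 1 - p_k
    return dist[I]

def prob_sum_brute(p, I):
    """Same quantity by enumerating the C(N, I) positions of the ones, as in pdf_cb."""
    N = len(p)
    total = 0.0
    for pos in itertools.combinations(range(N), I):
        x = np.zeros(N, dtype=int)
        x[list(pos)] = 1
        total += np.prod(np.where(x == 1, p, 1 - p))
    return total

rng = np.random.default_rng(2)
p = rng.uniform(0, 1, 12)
print(prob_sum_equals(p, 4), prob_sum_brute(p, 4))  # should agree up to rounding error
```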

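```python
def rejection_sampling(p, I, M, L):
    """Draws L samples from CB(p, I) with the accept-reject algorithm, given M = sup f/g."""
    samples = []
    g = pdf_bernoulli(p)
    f = pdf_cb(p, I)
    while len(samples) < L:
        X = np.array([np.random.binomial(1, p_i) for p_i in p])  # X ~ g
        U = np.random.uniform(0, 1)
        if U <= f(X) / (g(X) * M):  # accept with probability f(X) / (M g(X))
            samples.append(X)
    return np.array(samples)
```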

```python
N = 25
I = 6
L = 100
p = np.random.uniform(0, 1, N)
# p = np.array([0.5] * N)
g = pdf_bernoulli(p)
f = pdf_cb(p, I)
# M = sup f/g; f vanishes outside the sequences with I ones, so only those are checked
M = max(f(seq) / g(seq) for seq in generate_sequences(N, I))

print(p)
print(M)
samples = rejection_sampling(p, I, M, L)
np.mean(samples, axis=0)
```


```text
[0.98579461 0.90450562 0.33091814 0.38850484 0.48420812 0.26216353
 0.20804913 0.35523267 0.4442211  0.74350317 0.19028161 0.28732861
 0.63485561 0.22126845 0.27062426 0.72482066 0.84317296 0.37861926
 0.09519849 0.93239711 0.27410838 0.33914277 0.97160981 0.15972198
 0.4834197 ]
381.6988840100144
```

```text
array([0.93, 0.63, 0.11, 0.07, 0.09, 0.07, 0.02, 0.1 , 0.18, 0.31, 0.05,
       0.05, 0.23, 0.03, 0.07, 0.38, 0.53, 0.08, 0.02, 0.84, 0.05, 0.1 ,
       0.9 , 0.02, 0.14])
```

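Below is the distribution the samples should follow, namely the exact marginals $\mathbb{P}(X_i = 1 \mid X_1 + \dots + X_N = I)$; it matches the empirical means well, given that only $L = 100$ samples were drawn: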

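```python
# Exact marginals: proba_binom[i] = sum over the sequences x with I ones of x_i f(x)
proba_binom = [0] * N
for seq in generate_sequences(N, I):
    for i in range(N):
        proba_binom[i] = proba_binom[i] + seq[i] * f(seq)
```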

```python
np.array(proba_binom)
```

```text
array([0.94278735, 0.67384812, 0.07909963, 0.10003919, 0.14296832,
       0.05771242, 0.04311331, 0.08756489, 0.12360054, 0.359029  ,
       0.03868577, 0.06513778, 0.24289939, 0.04651959, 0.06016059,
       0.33515493, 0.52619813, 0.09621885, 0.0175743 , 0.75568855,
       0.06118276, 0.08190533, 0.88889804, 0.03144939, 0.14256381])
```
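
The double loop above computes, for each $i$, $\mathbb{P}(X_i = 1 \mid X_1 + \dots + X_N = I) = \sum_{x : \|x\|_1 = I} x_i f(x)$. A vectorised NumPy version of the same sum might look like the sketch below (the helper name `exact_marginals` is hypothetical). Since every sequence in the support has exactly $I$ ones, the marginals must sum to $I$, which is a cheap consistency check.

```python
import itertools
import numpy as np

def exact_marginals(p, I):
    """Exact P(X_i = 1 | X_1 + ... + X_N = I) for each i, by enumerating the support of f."""
    p = np.asarray(p, dtype=float)
    N = len(p)
    combos = list(itertools.combinations(range(N), I))
    seqs = np.zeros((len(combos), N), dtype=int)  # one row per sequence with I ones
    for row, pos in enumerate(combos):
        seqs[row, list(pos)] = 1
    weights = np.prod(np.where(seqs == 1, p, 1 - p), axis=1)  # g(x) for each row
    f_vals = weights / weights.sum()                           # f(x) for each row
    return f_vals @ seqs                                       # sum_x x_i f(x), for each i

p = np.array([0.75369619, 0.74696197, 0.13564033])  # p printed in the N = 3 experiment
print(exact_marginals(p, 2))
```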

### Simplified algorithm

```python
# Simple method: draw N independent Bernoulli variables, then check whether their sum equals I;
# if so, accept the vector, otherwise start again (this redefines rejection_sampling without M)
def rejection_sampling(p, I, L):
    samples = []
    while len(samples) < L:
        X = np.array([np.random.binomial(1, p_i) for p_i in p])  # X ~ g
        if X.sum() == I:
            samples.append(X)
    return np.array(samples)
```
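
Drawing the $N$ Bernoulli variables one at a time in a Python loop is slow when many proposals are rejected. A possible vectorised variant is sketched below; the function name `rejection_sampling_batch` and its `batch` and `rng` parameters are assumptions. It draws a whole batch of proposals at once and keeps the rows whose sum equals $I$; since the rows are i.i.d. draws from $g$, the accepted rows, taken in order, are still i.i.d. draws from $\mathsf{CB}(p, I)$.

```python
import numpy as np

def rejection_sampling_batch(p, I, L, batch=10000, rng=None):
    """Simplified accept-reject, vectorised: draw `batch` proposals X ~ g at a time
    and keep those with exactly I ones, until L samples have been accepted."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=float)
    accepted = []
    n_accepted = 0
    while n_accepted < L:
        X = (rng.uniform(size=(batch, len(p))) < p).astype(int)  # rows are i.i.d. ~ g
        keep = X[X.sum(axis=1) == I]                             # rows with ||X||_1 = I
        accepted.append(keep)
        n_accepted += len(keep)
    return np.concatenate(accepted)[:L]

rng = np.random.default_rng(3)
p = np.array([0.75369619, 0.74696197, 0.13564033])  # p printed in the N = 3 experiment
samples = rejection_sampling_batch(p, 2, 100000, rng=rng)
print(samples.mean(axis=0))  # compare with the exact marginals of the N = 3 experiment
```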

```python
# Check that this method produces the conditional Bernoulli distribution with the right probabilities
N = 3
I = 2
L = 1000
p = np.random.uniform(0, 1, N)
f = pdf_cb(p, I)
print(p)
samples = rejection_sampling(p, I, L)
np.mean(samples, axis=0)
```

```text
[0.75369619 0.74696197 0.13564033]
```

```text
array([0.949, 0.949, 0.102])
```

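```python
# Exact marginals P(X_i = 1 | X_1 + ... + X_N = I), for comparison with the empirical means
proba_binom = [0] * N
for seq in generate_sequences(N, I):
    for i in range(N):
        proba_binom[i] = proba_binom[i] + seq[i] * f(seq)
np.array(proba_binom)
```

```text
array([0.95356706, 0.95186747, 0.09456548])
```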

### Conclusion

These two algorithms should, unless I have made an error in my reasoning, produce exactly the same distribution, namely $\mathsf{CB}(p, I)$. Their results still need to be studied in more detail.