### Ex1. C = A x B with A and B uniform. What’s the distribution of C?

Define two sets A and B, and draw one value from each uniformly and independently. Define C = f(A, B). C is uniform if
- No two pairs (a, b) give the same outcome c.
- (or, in other words) Each pair (a, b) gives a unique outcome c.

Generalisation. **C is uniform if the number of pairs (a, b) that give the same outcome c is the same for all possible outcomes c.**

___
Example.

A is either 0 or 1\
B is either 2 or 3

C is either 0, 2, or 3.
- 0 is given twice, by pairs {0;2}, {0;3}
- 2 and 3 are given once, respectively by pairs {1;2}, {1;3}

P(C=0) = 1/2\
P(C=2) = 1/4\
P(C=3) = 1/4

The distribution of C is not uniform.
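
The counting argument in this example can be checked by brute-force enumeration. The sketch below is a minimal illustration, assuming every (a, b) pair is equally likely; the helper names `outcome_distribution` and `is_uniform` are hypothetical and not part of the original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def outcome_distribution(a_values, b_values, f):
    """Return P(C = c) for C = f(A, B), assuming all (a, b) pairs are equally likely."""
    pairs = list(product(a_values, b_values))
    counts = Counter(f(a, b) for a, b in pairs)
    return {c: Fraction(n, len(pairs)) for c, n in counts.items()}


def is_uniform(distribution):
    """C is uniform when every outcome has the same probability."""
    return len(set(distribution.values())) == 1


dist = outcome_distribution([0, 1], [2, 3], lambda a, b: a * b)
print(dist)              # probabilities of C = 0, 2 and 3
print(is_uniform(dist))  # False: 0 is produced by two pairs, 2 and 3 by one pair each
```

Swapping the lambda for another function, for example `lambda a, b: a + b`, applies the same check to a different f(A, B).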

### Ex2. Histogram up and look for a closed form solution
Let
* A be a continuous variable, uniformly distributed between 0.5 and 1.5
* B be a continuous variable, uniformly distributed between 3.5 and 4.5

Plot a histogram of 1,000 to 10k samples of C = A x B.

Discretise C into bins of width 1 starting at 0.5.

Closed form solution. If we had infinite samples, what proportion would go in each bin of C?

Do this for any function f(A, B).
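
To try the exercise with an arbitrary f(A, B), the sampling and binning can be wrapped in one function. This is a minimal sketch under stated assumptions: A and B are independent uniforms on the ranges above, and the helper `bin_proportions` with its parameters is a hypothetical name introduced here for illustration.

```python
import math
import random
from collections import Counter


def bin_proportions(f, n, a_range=(0.5, 1.5), b_range=(3.5, 4.5),
                    bin_start=0.5, bin_width=1.0, seed=None):
    """Estimate P(C in bin) for C = f(A, B) with A and B independent and uniform."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        c = f(rng.uniform(*a_range), rng.uniform(*b_range))
        # Index k of the bin [bin_start + k * bin_width, bin_start + (k + 1) * bin_width)
        counts[math.floor((c - bin_start) / bin_width)] += 1
    # Key each proportion by the lower edge of its bin
    return {bin_start + k * bin_width: counts[k] / n for k in sorted(counts)}


product_bins = bin_proportions(lambda a, b: a * b, 10_000, seed=0)
sum_bins = bin_proportions(lambda a, b: a + b, 10_000, seed=0)
print(product_bins)
print(sum_bins)
```

The proportions are estimates only; they get closer to the closed form values as `n` grows.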

In [19]:
import random
import plotly.express as px
import plotly.graph_objects as go

In [3]:
# Let
# * A be a continuous variable, uniformly distributed between 0.5 and 1.5
# * B be a continuous variable, uniformly distributed between 3.5 and 4.5

def get_a():
    # Create a variable that takes a random value between .5 and 1.5
    # Note: you could also use random.random() to get a value between 0 and 1 and then add 0.5 to it
    return random.uniform(.5, 1.5)

def get_b():
    return random.uniform(3.5, 4.5)

def get_c(a, b):
    return a*b
  
def generate_c(n):
    # Create a list of n values of c
    c_list = []
    for i in range(n):
        a = get_a()
        b = get_b()
        c = get_c(a, b)
        c_list.append(c)
    return c_list

generate_c(5)


[2.488713693778195,
 5.346587642068136,
 5.807225639933586,
 5.410712617449955,
 4.299894150937093]

In [40]:
# Plot a histogram of 1,000 to 100k samples of C = A x B.
# Discretise C into bins of width 1 starting at 0.5, i.e. 0.5-1.5, 1.5-2.5, 2.5-3.5, etc.

# -> The shape of the histogram depends on how many outcome values of C fall in each bin.
# The bins below 1.5 and above 7.5 will be empty.
# C is in the range 0.5*3.5=1.75 to 1.5*4.5=6.75. Therefore, the bins 1.5-2.5 and 6.5-7.5 are only partly covered by
## the range of C. Those bins will have fewer values than the other bins.
# The bins that lie entirely between 0.5*4.5=2.25 and 1.5*3.5=5.25 (2.5-3.5 and 3.5-4.5) will have the same number of values.
# Towards the lower and upper boundaries, the density falls off (logarithmically, which looks roughly linear close to
## each boundary). This is because the set of inputs that give the same outcome gets smaller near the boundaries.

def plot_c(n, bin_width):
    sample = generate_c(n)
    print("Min and max of sample:", min(sample), max(sample))
    # Use plotly express to generate a histogram of the sample values of C; the bins start at 0.5 and use the given bin_width
    fig = px.histogram(x=sample, range_x=[0.5, 10.5])
    fig.update_traces(xbins=dict( # bins used for histogram
            start=0.5,
            end=10,
            size=bin_width
        ))
    # Update figure size
    fig.update_layout(
        width=800,
        height=400,
        title=f"Sample of {n} values of C = A x B (uniform x uniform)",
    )
    fig.show()

plot_c(1000, 1)
plot_c(100000, 1)
plot_c(100000, 0.1)

Min and max of sample: 1.847821989407504 6.673050552635568


Min and max of sample: 1.751390017733969 6.7378513783233265


Min and max of sample: 1.754926380737839 6.7370439787715775


#### Closed form solution. If we had infinite samples, what proportion would go in each bin of C?

Do not think about the probability of A or of B separately, but about the probability of A x B, which is distributed very differently.

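$P(C \in \text{bin}) = \frac{\#\,\text{bin outcomes}}{\#\,\text{outcomes}}$

with
* bin outcomes = number of inputs (A, B) that give an outcome falling in the bin
* outcomes = total number of inputs (A, B)

The pair (A, B) is a bivariate random variable with a joint probability density function, and C = A x B is a function of that pair. For continuous A and B, counting inputs means measuring the area of the region of (A, B) values whose product falls in the bin, relative to the total area.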
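
One way to reach a closed form, sketched here under the assumption that A and B are independent: the joint density is then 1 on the square $[0.5, 1.5] \times [3.5, 4.5]$, and the density of C follows from integrating over the values of A that are compatible with a given product c.

$$f_C(c) = \int_{0.5}^{1.5} f_B\!\left(\frac{c}{a}\right)\frac{1}{a}\,da = \ln\frac{\min(1.5,\ c/3.5)}{\max(0.5,\ c/4.5)}, \qquad 1.75 \le c \le 6.75$$

Resolving the min and max gives three pieces:

$$f_C(c) = \begin{cases} \ln\dfrac{c}{1.75} & 1.75 \le c \le 2.25 \\[2mm] \ln\dfrac{4.5}{3.5} & 2.25 \le c \le 5.25 \\[2mm] \ln\dfrac{6.75}{c} & 5.25 \le c \le 6.75 \end{cases}$$

The proportion in a bin $[l, u]$ is then $P(l \le C \le u) = \int_l^u f_C(c)\,dc$. For example, the bin 2.5-3.5 lies entirely in the flat middle piece, so its proportion is $\ln(4.5/3.5) \approx 0.2513$.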

In [None]:
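# Empirical estimate of P(C in bin) for C = A x B: the proportion of samples that fall in each bin.
# With infinite samples these proportions would equal the closed form solution.
import math
from collections import Counter

c_samples = generate_c(100000)  # generate_c() is defined in an earlier cell

# Bins have width 1 and start at 0.5, so bin k covers [0.5 + k, 1.5 + k)
bin_counts = Counter(math.floor(c - 0.5) for c in c_samples)

# Proportion of samples per bin, keyed by the lower edge of the bin
p_bin = {0.5 + k: count / len(c_samples) for k, count in sorted(bin_counts.items())}
p_bin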
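
For comparison with sampled proportions, the exact bin probabilities can be computed from the cumulative distribution of C. The sketch below assumes A and B are independent, in which case the density of C is ln(c/1.75) on 1.75-2.25, the constant ln(4.5/3.5) on 2.25-5.25, and ln(6.75/c) on 5.25-6.75. The names `cdf_c` and `bin_probability` are hypothetical helpers introduced for illustration.

```python
import math

# Break points of the density: 0.5*3.5, 0.5*4.5, 1.5*3.5, 1.5*4.5
C_MIN, KNEE_LO, KNEE_HI, C_MAX = 1.75, 2.25, 5.25, 6.75
FLAT = math.log(4.5 / 3.5)  # density on the flat middle section


def cdf_c(c):
    """P(C <= c) for C = A x B with A ~ U(0.5, 1.5), B ~ U(3.5, 4.5), assumed independent."""
    if c <= C_MIN:
        return 0.0
    if c <= KNEE_LO:
        # Integral of ln(t / 1.75) from 1.75 to c
        return c * math.log(c / C_MIN) - c + C_MIN
    if c <= KNEE_HI:
        return cdf_c(KNEE_LO) + (c - KNEE_LO) * FLAT
    if c <= C_MAX:
        # Integral of ln(6.75 / t) from 5.25 to c
        tail = (c * math.log(C_MAX / c) + c) - (KNEE_HI * math.log(C_MAX / KNEE_HI) + KNEE_HI)
        return cdf_c(KNEE_HI) + tail
    return 1.0


def bin_probability(lower, upper):
    """Exact proportion of C that falls between lower and upper."""
    return cdf_c(upper) - cdf_c(lower)


# Bins of width 1 starting at 0.5; only 1.5-2.5 up to 6.5-7.5 can be non-empty
for k in range(1, 7):
    lower, upper = k + 0.5, k + 1.5
    print(f"[{lower}, {upper}): {bin_probability(lower, upper):.4f}")
```

Working through the cumulative distribution avoids integrating each bin separately, since a bin may straddle two pieces of the density.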