# week 9: poisson processes, discrete traits, and CTMCs

## eeb463

## tomo parins-fukuchi

## thomas bayes

- okay we are going to revisit some things

![bg right h:700](images/tbayes.png)

## bayesian inference

- the likelihood is the probability of the data given a model, $P(D|M)$


## bayesian inference

- imagine you are in your apartment and hear a huge crash above you
- what is the probability of hearing this crash if: 
    - $M_{0}$: minions are bowling in the apartment upstairs?
    - $M_{1}$: the neighbor who just moved in dropped a large box of stuff that scattered?
       



## bayesian inference

- $L(M_{0}) \approx 1$ <- you would almost certainly hear noises if minions were bowling upstairs
- $L(M_{1}) \approx 1$ <- this scenario would also almost certainly yield a banging sound

## bayesian inference

- **conclusion:** the likelihood is profoundly useful but limited as a basis for inference: here both models predict the crash equally well, so the likelihood alone cannot tell them apart
- _we want a way to assess $P(M|D)$_

## bayes' theorem

$$ P(M | D) = \frac{P(M) P(D | M)}{P(D)} $$

- $P(M | D)$ is called the **posterior** probability
- $P(M)$ is the **prior** probability
    - what was the probability of the model before we collected any observations?
- $P(D | M)$ is the likelihood
- $P(D)$ is the probability of the data across the entire universe of possible models
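
As a rough numerical illustration of the formula, here is a sketch using the crash example. The prior values are made up for illustration (they are not from the lecture), and the two models are assumed to be the only possibilities, so $P(D)$ is just a sum over both.

```python
# hypothetical priors: bowling minions are far less plausible than a clumsy neighbor
prior = {"M0_minions": 0.001, "M1_neighbor": 0.999}
# likelihoods: both models would almost certainly produce a crash
like = {"M0_minions": 0.99, "M1_neighbor": 0.99}

# P(D): sum of prior * likelihood over every model under consideration
p_data = sum(prior[m] * like[m] for m in prior)

# Bayes' theorem, applied to each model in turn
posterior = {m: prior[m] * like[m] / p_data for m in prior}
for m, p in posterior.items():
    print(m, round(p, 4))
```

Because the two likelihoods are equal, the posterior here is driven entirely by the prior.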


## bayesian inference

- under maximum likelihood, we aimed to find the minimum along a negative log likelihood surface
- we call this the maximum likelihood estimate





In [None]:
import distributions as dist
import numpy as np
import matplotlib.pyplot as plt

norm = dist.normal(1.0,1.0)

surf = [-norm.log_pdf(i) for i in np.linspace(0.,2.,30)]

plt.plot(np.linspace(0.,2.,30),surf)
plt.ylabel("negative log-likelihood")
plt.xlabel("mean estimate")
plt.show()

## bayesian inference

- for bayesian inference, we want the estimate with the highest **posterior probability** (i.e., $P(M|D)$)
- one problem with this is that $P(D)$ is usually very difficult to compute, since it requires summing (or integrating) over every possible model


$$ P(M | D) = \frac{P(M) P(D | M)}{P(D)} $$



## bayesian inference

- bayesian inference can target several types of estimates
    - **maximum _a posteriori_ (MAP)**: what parameter value has the highest posterior prob?
        - which values of &mu; and &sigma; maximize $P(M|D)$?
        - this is a point estimate, similar to MLE
    - **posterior distribution**: also common to estimate the entire posterior probability _distribution_
        - $P(M|D)$, $P(M)$, $P(D|M)$ reflect distributions over a range of param. values
        - for anole svl, we want the _distribution_ of $P(M|D)$ over different values of &mu; and &sigma;
        - usually involves calculating or approximating a very complicated integral
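
For a single parameter, the distinction between the MAP estimate and the full posterior distribution can be sketched with a brute-force grid. Everything specific here is an assumption made for illustration: the five SVL values are made up, the sd is treated as known, and the normal(120, 20) prior on the mean is arbitrary.

```python
import math

def norm_logpdf(x, mean, sd):
    # log density of x under a normal(mean, sd)
    return -math.log(sd * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mean) / sd) ** 2

data = [128.0, 131.5, 125.2, 134.8, 129.9]      # made-up SVL measurements
sd = 10.0                                       # assumed known for this sketch

grid = [100.0 + 0.5 * i for i in range(121)]    # candidate means from 100 to 160

# unnormalized log posterior = log prior + log likelihood
log_post = [norm_logpdf(mu, 120.0, 20.0) + sum(norm_logpdf(x, mu, sd) for x in data)
            for mu in grid]

# normalize over the grid (subtract the max first for numerical stability)
biggest = max(log_post)
w = [math.exp(lp - biggest) for lp in log_post]
total = sum(w)
post = [wi / total for wi in w]

map_mu = grid[post.index(max(post))]                   # point estimate (MAP)
post_mean = sum(mu * p for mu, p in zip(grid, post))   # summary of the whole distribution
print("MAP:", map_mu, "posterior mean:", round(post_mean, 2))
```

A grid like this only works for one or two parameters; with more, the number of grid points explodes, which is part of the motivation for sampling instead.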
        

## bayesian inference

- with maximum likelihood, we searched for the peak on the likelihood surface by plugging and chugging
- here, we will seek a way to approximate the **posterior probability distribution** using random sampling

        

## markov chain monte carlo (MCMC)

- construct a random walk (i.e., a "markov chain") that converges on the posterior distribution
- simulates draws from the posterior distribution to approximate the integral
  - run many iterations, each time simulate a single sample from the posterior
  - we can make these draws given a function that is proportional to the posterior probability

$$ P(M | D) = \frac{P(M) P(D | M)}{P(D)} \propto P(M) P(D | M) $$


## metropolis-hastings algorithm

- discovered in 1953 by a group of former manhattan project scientists
- can draw samples from a target distribution if given a function _proportional to_ the density of the target

$$ P(M | D) = \frac{P(M) P(D | M)}{P(D)} \propto P(M) P(D | M) $$

- since numerator is proportional to the posterior, we can use it for MH simulations
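
To see why the proportional form is enough, it may help to write out the ratio of posterior probabilities for two candidate models, labelled $M$ and $M'$ (the prime is notation introduced just for this illustration):

$$ \frac{P(M' | D)}{P(M | D)} = \frac{P(M') P(D | M') / P(D)}{P(M) P(D | M) / P(D)} = \frac{P(M') P(D | M')}{P(M) P(D | M)} $$

$P(D)$ appears in both the numerator and the denominator and cancels, so comparing two models never requires computing it.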

## metropolis-hastings algorithm (the recipe)

1. initialize parameter vector, $x$ (could simply be random) and start the chain (for loop)
2. update parameters to create new vector $x'$
    - these proposals are drawn from proposal distributions of the user's choice
3. compute the ratio $R = \frac{P(x')P(D|x')}{P(x)P(D|x)}$
4. accept $x'$ with probability $\min(1, R)$ 
    - always accept if new params better than old (i.e., $P(x')P(D|x') > P(x)P(D|x)$)
    - otherwise, we accept the _worse_ param values with probability $R$
5. return to 2 and repeat many times
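
The recipe can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function names `log_target` and `metropolis` are hypothetical, the target is an arbitrary unnormalized normal density, and the proposal is assumed symmetric (so no correction for the proposal distribution is needed). Working with logs avoids numerical underflow when probabilities get tiny.

```python
import math
import random

def log_target(x):
    # hypothetical unnormalized log density: a normal with mean 3 and sd 1,
    # with the normalizing constant deliberately left out
    return -0.5 * (x - 3.0) ** 2

def metropolis(n_iter, start, step_sd, seed=1):
    rng = random.Random(seed)
    x = start                                   # step 1: initialize
    lp = log_target(x)
    draws = []
    for _ in range(n_iter):
        x_star = x + rng.gauss(0.0, step_sd)    # step 2: symmetric proposal
        lp_star = log_target(x_star)
        log_r = lp_star - lp                    # step 3: log of the ratio R
        # step 4: accept with probability min(1, R)
        if log_r >= 0.0 or rng.random() < math.exp(log_r):
            x, lp = x_star, lp_star
        draws.append(x)                         # step 5: record the state and repeat
    return draws

draws = metropolis(20000, start=0.0, step_sd=1.0)
kept = draws[2000:]                             # drop the first 10% as burn-in
print("mean of draws:", round(sum(kept) / len(kept), 2))
```

Note that when a proposal is rejected the current value is recorded again; those repeats are what make high-probability values show up more often in the chain.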


## metropolis-hastings algorithm

- once the chain has converged, each repetition of steps 2-4 yields a single draw from the posterior distribution
- let's go back through our example to illustrate


## bayesian inference

- let's bring back our simulated anoles for an example:
    - model SVL measurements for a population of lizards using a normal distribution
    - **what is the posterior distribution of values the mean and sd can take?**

In [None]:
import math

def norm_ll(mean, sd, x):
    # log density of a single observation x under a normal(mean, sd)
    dens = math.log((1.0/(sd*math.sqrt(2.*math.pi))))+((-.5 * (((x - mean)/sd)**2.)))
    return dens

def svl_norm_ll(mean,sd, data):
    return sum([norm_ll(mean,sd, i) for i in data])

In [None]:
import random, math

svl = dist.normal(130.0,10.0).sample(50)
mean = 115.
sd = 10.
mean_prop = dist.normal(0., 4.)
ll = svl_norm_ll(mean,sd,svl)


gen = 100000
thin = 100
gens = []
samples = []
mean_samp = []
accept = 0
for i in range(gen):
    mean_star = mean + mean_prop.sample(1)[0]
        
    ll_star = svl_norm_ll(mean_star,sd,svl)
    
    ratio = math.exp(ll_star - ll)
    if ratio >= 1. or random.random() < ratio:
        accept+=1
        ll = ll_star
        mean = mean_star

    if i % thin == 0:

        gens.append(i)
        samples.append(ll)
        mean_samp.append(mean)
        

print("proportion of proposals accepted:",accept/gen)

In [None]:
postburn = gen // thin // 10  # discard the first 10% of the retained samples as "burn-in"
postburn_mu = mean_samp[postburn:]
plt.plot(gens[postburn:],postburn_mu)
plt.xlabel("generation")
plt.ylabel("mean estimate")
plt.show()

## bayesian inference

- parameter values with higher posterior probability will be visited more often in the chain

- we can plot a histogram of all our simulated draws
  - this is an approximation of the posterior distribution of $\mu$

- resulting distribution can give us a single "best" estimate of $\mu$ but also our confidence in that estimate

In [None]:
plt.hist(postburn_mu)
plt.ylabel("count")
plt.xlabel("mean estimate")
lower = np.quantile(postburn_mu,0.025)
upper = np.quantile(postburn_mu,0.975)
plt.axvline(dist.mean(postburn_mu),color="black")
plt.axvline(lower,color="grey")
plt.axvline(upper,color="grey")

plt.show()

## bayesian inference

- let's go back over our OU example from last week

In [None]:
## simulate our data

theta = 90.0
sigma = 1.0

dt = 0.1
total_myr = 10.0
n_iter = int(total_myr / dt)


norm = dist.normal(mean=0.0, sd = 1.0)
curtime = 0.0
times = []
y_t = 100.0
pheno = []

for i in range(n_iter-1):
    pheno.append(y_t)
    times.append(curtime)
    delta_y = (  ( theta - y_t )  * dt) + (norm.sample(1)[0] * sigma * math.sqrt(dt))
    y_t += delta_y
    curtime += dt
        
plt.plot(times,pheno)
plt.show()


### our 2-parameter likelihood function

In [None]:
def stasis_ts_ll(theta, sigma, times, data):
    ll = 0.0
    for i in range(1,len(times)):
        cur_val = data[i]
        last_val = data[i-1]
        delta_y = cur_val - last_val
        delta_t = times[i] - times[i-1]
        mean = (  theta - last_val ) * delta_t 
        sd = sigma * math.sqrt(delta_t)
        stepll = math.log((1.0 / (sd*math.sqrt(2.*math.pi)))) + ((-.5 * (((delta_y - mean) / sd) ** 2.)))
        ll += stepll
    return ll


### write a function to generate >0 proposals for $\sigma$

In [None]:
## define function to propose new values for sigma

def generate_sigma_proposal(sigma, prop_dist):
    while True:
        sigma_star = sigma + prop_dist.sample(1)[0]
        if sigma_star > 0.:  # make sure we aren't proposing negative values for sigma
            break
    return sigma_star

### MCMC on a 2-parameter model



In [None]:
def stasis_MCMC(gen,thin,start_p,theta_prior,sigma_prior,theta_prop,sigma_prop):
    theta = start_p[0]
    sigma = start_p[1]
    ll = stasis_ts_ll(theta,sigma,times,pheno) + theta_prior.log_pdf(theta) + sigma_prior.log_pdf(sigma)
    
    theta_samp = [theta]
    sigma_samp = [sigma]
    gens = [0]
    accept = 0

    for i in range(gen):

        if random.random() < .5:                   ## half the time we update theta, the other half sigma
            theta_star = theta + theta_prop.sample(1)[0]
            sigma_star = sigma
        else:
            theta_star = theta
            sigma_star = generate_sigma_proposal(sigma,sigma_prop)

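        # ll and ll_star hold the unnormalized log posterior (log likelihood + log priors)
        ll_star = stasis_ts_ll(theta_star,sigma_star,times,pheno) + theta_prior.log_pdf(theta_star) + sigma_prior.log_pdf(sigma_star)
        if ll_star - ll > 0.:
            ratio = 1.0   # proposal is better, so always accept (random.random() is always < 1.0)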
        else:
            ratio = math.exp(ll_star - ll)
        if random.random() < ratio:
            accept+=1
            ll = ll_star
            theta = theta_star
            sigma = sigma_star

        if i % thin == 0:
            gens.append(i)
            theta_samp.append(theta)
            sigma_samp.append(sigma)
    
    print("MCMC complete. Acceptance ratio:",accept / gen)
    return gens,theta_samp,sigma_samp

### MCMC on a 2-parameter model



In [None]:
import random, math


theta, sigma = 80., 1.3
theta_prior = dist.normal(80., 5.)
sigma_prior = dist.normal(2.3, 0.25)
theta_prop = dist.normal(0.0, 0.8)
sigma_prop = dist.normal(0., 0.23)

gen = 100000
thin = 100

gens, theta_samp, sigma_samp = stasis_MCMC(gen,thin,[theta,sigma],theta_prior,sigma_prior,theta_prop,sigma_prop)

    


## markov chain for &theta; and &sigma;

- let's discard the first 10% of samples as burn-in

In [None]:
burnin = gen // 100 // 10 
theta_pb = theta_samp[burnin:] # remove first 10% of samples
sigma_pb = sigma_samp[burnin:]
gens_pb = gens[burnin:]

## markov chain for &theta; and &sigma;

In [None]:
fig, (ax1, ax2) = plt.subplots(2)
ax1.plot(gens_pb, theta_pb)
ax1.xaxis.set_tick_params(labelbottom=False)
ax1.set(xlabel="", ylabel="theta value")
ax2.plot(gens_pb, sigma_pb)
ax2.set(xlabel="generation", ylabel="sigma value")

plt.show()

## estimated posterior distribution of theta and sigma

In [None]:
fig, (ax1, ax2) = plt.subplots(1,2)
ax1.hist(theta_pb)
ax1.set(xlabel="theta", ylabel="")
ax2.hist(sigma_pb)
ax2.set(xlabel="sigma", ylabel="")

plt.show()

### compare the posterior to the prior for $\sigma$

In [None]:
prior_samp = [sigma_prior.sample(1)[0] for _ in range(len(sigma_samp))]

weights = np.ones_like(prior_samp) / len(prior_samp)
plt.hist(prior_samp,alpha=0.5,label="prior",weights=weights)
plt.hist(sigma_samp,alpha=0.5,label="posterior",weights=weights)
plt.xlabel("sigma")
plt.legend(loc='upper left')
plt.show()

break!

## poisson processes

- now we will explore a new class of stochastic models
- these will concern themselves with modelling the stochastic timing of discrete events

## **how often do things happen?**

- amino acid substitutions along a protein
- speciation events in a large clade
- fossil deposition events

## **how often do things happen?**

- might imagine a stochastic **rate**:
    - substitution rate
    - speciation rate
    - fossilization rate

## **how often do things happen?**

- might imagine a stochastic **rate**:
    - substitution rate: how many substitutions occur per million years?
    - speciation rate: how many new species form per million years?
    - fossilization rate: how many fossils are preserved per million years?

## protein evolution

- one common use of a poisson process is to model stochastic changes in a protein sequence
- imagine amino acid substitutions occur randomly at some rate
- simulating the timing of changes should be straightforward

In [None]:
# how many substitutions occur in a protein after 1000 time steps?

import random
import matplotlib.pyplot as plt

n_subs = 0
stt = [n_subs]
rate = 0.05
for i in range(999):
    r = random.random()
    if r < rate:
        n_subs += 1
    stt.append(n_subs)

plt.plot([i for i in range(1000)], stt,"o")
plt.xlabel("time step")
plt.ylabel("# substitutions")
plt.show()

## **how many events can we expect in time window $\Delta t$?**

- Simulation under this model is straightforward but there are mathematical expressions that are also useful
- In particular, a situation like this is often modelled using the **Poisson** distribution

In [None]:
subs = []
for _ in range(500):
    n_subs = 0
    rate = 0.05
    for i in range(999):
        r = random.random()
        if r < rate:
            n_subs += 1
    subs.append(n_subs)

plt.hist(subs)
plt.xlabel("# substitutions")
plt.ylabel("count")
plt.show()

## **Poisson distribution**

- Discrete probability distribution
- 1 parameter:
  - &lambda; -- how often do events happen?


## **Poisson PMF**

- Probability of _k_ events in an interval, $t$:

$$\frac{(\lambda t)^ke^{-\lambda t}}{k!}$$

In [None]:
def factorial(n):
    nums = [i for i in range(n,1,-1)]
    prod = 1
    for i in nums:
        prod *= i
    return prod

In [None]:
import math

def poisson_pmf(lam, t, k):
    num = ((lam*t)**k) * math.exp(-(lam*t))
    denom = factorial(k)
    return num / denom

probs = []
x = []
for i in range(30,70):
    probs.append(poisson_pmf(0.05,1000,i))
    x.append(i)
    
plt.plot(x,probs,"o")
plt.xlabel("# substitutions")
plt.ylabel("probability")
plt.show()


In [None]:
plt.plot(x,probs,"o")

plt.hist(subs,alpha=0.5,color="grey",density=True)
plt.xlabel("# substitutions")
plt.ylabel("probability")
plt.show()


## **poisson distribution**

- Poisson distribution can be derived from the binomial distribution
    - Divided into sub-intervals, what is the probability of 'success' (there is an event) vs not (no event) ?
    - Limiting case of the binomial where the number of trials approaches infinity
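
The limiting argument can be checked numerically. In this sketch (helper names are hypothetical, and the expected count of 5 events is arbitrary), an interval is cut into `n` sub-intervals, each with success probability `expected / n`, and the binomial probability of seeing exactly `k` events is compared with the Poisson value as `n` grows.

```python
import math

def binom_pmf(n, p, k):
    # probability of exactly k successes in n independent trials
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

def poisson_from_mean(mean, k):
    # Poisson probability of k events when the expected count is `mean` (= lambda * t)
    return mean**k * math.exp(-mean) / math.factorial(k)

expected = 5.0   # expected number of events in the whole interval
k = 3
for n in (10, 100, 1000, 100000):
    p = expected / n          # chance of an event in any one sub-interval
    print(n, round(binom_pmf(n, p, k), 5))
print("poisson", round(poisson_from_mean(expected, k), 5))
```

The expected count stays fixed while the sub-intervals get finer, which is the sense in which the Poisson is a limit of the binomial.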
    


## **poisson process**

- \# events (e.g., mutations) over time _t_ follows Poisson distribution
- Evolution of mutations over time can be modelled by a **Poisson _process_**
    - index a poisson distribution over time
- Poisson distribution/process implies additional properties we will explore next week

## **poisson process**

- For now we will make a departure to explore a specific type of Poisson process


## discrete trait evolution

- suppose we are studying a species of trilobite and are interested in the evolution of their proboscis 

![h:500 center](images/discrete0.svg)



## discrete trait evolution

- imagine that, over time, we tend to see the gain and loss of spines from the end of the proboscis

![h:500 center](images/discrete1.svg)



## discrete trait evolution

- many organisms display traits like this that are discontinuous
- another example is DNA or protein sequences

## discrete trait evolution

- point mutations become substitutions and substitutions occur at some rate

![h:500 center](images/dna_substitution.svg)


## discrete trait evolution

- we can extend the simple poisson process introduced above to incorporate rates of different kinds of events


## discrete trait evolution

- we will represent each character state using integers:

![h:500 center](images/discrete3.svg)



In [None]:
# simulate character history over 10 million years
# will arbitrarily set dt to 0.1, so that we will observe (sample) the process every 100kyr

import random
import matplotlib.pyplot as plt

char_state = 0
states = [char_state]
rate = 0.03
dt = 0.1
curtime = 0.
times = [curtime]
while curtime < 10.:
    curtime += dt
    r = random.random()
    if r < rate:
        char_state = abs(1 - char_state)  # swap current character state to other
    states.append(char_state)
    times.append(curtime)
    

plt.plot(times,states)
plt.show()

## discrete trait evolution

- here we have simulated changes between 0 and 1 according to a Poisson process
- what if there is asymmetry in different types of changes?

## discrete trait evolution

- gains

![h:500 center](images/discrete2.svg)



## discrete trait evolution

- losses

![h:500 center](images/discrete1.svg)



## discrete trait evolution

- this is easy to simulate as a poisson process
- we can set up a poisson process with two kinds of events


## discrete trait evolution

- might have a rate of loss ($r_l$) and a rate of gain ($r_g$)

![h:500 center](images/discrete4.svg)



## discrete trait evolution

- we can build a matrix of rates encompassing each kind of event

![h:500 center](images/rate_mat.svg)



## discrete trait evolution

- we typically call this the _Q_ matrix
- formulate it so that the rows sum to 0

![h:500 center](images/rate_mat1.svg)
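
Written out for the two-state case, and assuming state 0 is the spineless form so that a gain is the change $0 \rightarrow 1$ (the figure's own labelling may differ), the matrix would look like:

$$ Q = \begin{pmatrix} -r_g & r_g \\ r_l & -r_l \end{pmatrix} $$

Each diagonal entry is the negative of the total rate of leaving that state, which is what makes every row sum to 0.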



## discrete trait evolution

- this matrix gives the instantaneous rate of all types of changes
- this type of model is called a **continuous-time Markov chain (CTMC)**
- we can also use it to compute the probabilities of different types of changes given _dt_
    - we call this the _P_ matrix
    
$$ P = e^{Qdt}$$


In [None]:
from scipy.linalg import expm

gain = 0.2
loss = 0.1
dt = 0.1

Q = np.array(
    [ 
    [-gain,gain], 
    [loss,-loss] 
    ])

P = expm(Q*dt)

print(P)


## discrete trait evolution

- we could now modify our simulation code to accommodate this matrix of rates rather than just a single poisson rate 

In [None]:
# simulate character history over 10 million years
# will arbitrarily set dt to 0.1, so that we will observe (sample) the process every 100kyr

import random
import matplotlib.pyplot as plt

def sim_discrete(Q, time_window, dt = 0.1):
    char_state = 0
    states = [char_state]
    curtime = 0.
    times = [curtime]
    P = expm(Q*dt)  # dt is fixed, so the transition probabilities only need to be computed once
    while curtime < time_window:
        curtime += dt
        cur_row = P[char_state]  # we'll pull out the row corresponding to our current state
        prob_change = cur_row[abs(1 - char_state)]  # find probability of change by pulling index opposite of char_state
        r = random.random()
        if r < prob_change:
            char_state = abs(1 - char_state)  # swap current character state to other
        states.append(char_state)
        times.append(curtime)
    return times, states

times,states = sim_discrete(Q, 10.)
plt.plot(times,states)
plt.show()

## a note on time and _dt_

- like Brownian motion, this is a continuous time model
- we just _sample_ the process at different points
- so _dt_ can be arbitrary
    - _P_ is a function of both _Q_ and _dt_
    - how does it change as _dt_ increases?
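
One way to get a feel for this is to compute _P_ over a range of _dt_ values and compare the probability of a gain with the naive guess `gain * dt`. The sketch below is dependency-free on purpose: `mat_mul` and `mat_exp` are hypothetical stand-ins that use a truncated Taylor series, which is only sensible for small matrices with small entries. In practice `scipy.linalg.expm`, as used in the rest of the lecture, would be the natural choice.

```python
def mat_mul(A, B):
    # plain matrix product for small square matrices stored as lists of lists
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(A, terms=30):
    # truncated Taylor series: I + A + A^2/2! + A^3/3! + ...
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, A)]   # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

gain, loss = 0.2, 0.05
Q = [[-gain, gain], [loss, -loss]]

for dt in (0.01, 0.1, 1.0, 10.0):
    Qdt = [[q * dt for q in row] for row in Q]
    P = mat_exp(Qdt)
    naive = gain * dt      # what P[0][1] would be under the approximation P ~ I + Q*dt
    print(dt, round(P[0][1], 5), round(naive, 5))
```

For small _dt_ the two columns should nearly agree. For large _dt_ the naive value keeps growing (eventually past 1), while the entries of _P_ remain valid probabilities and each row still sums to 1.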

In [None]:
def set_Q(gain,loss):
    Q = np.array(
        [ 
        [-gain, gain], 
        [loss, -loss] 
        ])
    return Q

In [None]:
from scipy.linalg import expm

dt = 0.9
gain = 0.2
loss = 0.05

Q = set_Q(gain,loss)

P = expm(Q*dt)

print(P)


## discrete trait evolution

- one more property of CTMCs to discuss
- the **stationary distribution**:
  - what are the probabilities of each state when time in the chain $\rightarrow \infty$ ?
  - you will derive this in the homework, but first, we'll simulate

In [None]:
gain = 0.2
loss = 0.05

Q = set_Q(gain,loss)

all_states = []

for _ in range(10):
    times, states = sim_discrete(Q, 100.)
    times = times[100:]
    states = states[100:]
    all_states += states
    plt.plot(times,states,color='grey',alpha=0.2)
    
plt.show()


In [None]:
p0 = np.array(all_states.count(0)) / float(len(all_states))
p1 = np.array(all_states.count(1)) / float(len(all_states))

fig, ax  = plt.subplots()
ax.bar([1, 2], [p0, p1], width=1,
       tick_label=['state 0', 'state 1'], align='center')
plt.ylabel("probability")
plt.show()

## discrete trait evolution

- at the stationary distribution, the ratio of the state probabilities matches the ratio of the rates (here, $p_1 / p_0 \approx r_g / r_l$)
- we will explore more in the homework

In [None]:
print(p1 / p0)
print(gain / loss) 

## discrete trait evolution

- we can also perform statistical inference under this model
- each cell in _P_ gives probability of going from one state to another
    - e.g., $P(0\rightarrow1 | Q)$ 
    - each of these probabilities can be used as a likelihood


## discrete trait evolution

![h:500 center](images/ctmc.svg)


In [None]:
# write a function to calculate the log-likelihood of a particular time series under the CTMC

def calc_disc_loglike(times, data, gain, loss):
    ll = 0.0
    for i in range(1,len(times)):
        cur_state = data[i]
        last_state = data[i-1]
        dt = times[i] - times[i-1]
        Q = set_Q(gain,loss)
        P = expm(Q*dt)
        like = P[last_state][cur_state]
        stepll = math.log(like)
        ll += stepll
    return ll

In [None]:
# visualize log-likelihood surface over values of gain rate parameter

test_gains = np.linspace(0.1,0.4,20)

likes = [calc_disc_loglike(times,states,i,loss) for i in test_gains]
plt.plot(test_gains,likes)
plt.show()

In [None]:
# compute the 2d log-likelihood surface over a grid of both rates (plotted in the next cell)

times,states = sim_discrete(Q, 10.)

X = np.linspace(.01, .4, 40)
Y = np.linspace(.01, .4, 40)
Xgrid, Ygrid = np.meshgrid(X, Y)
z = []

for yi in Y: 
    y_ll = []
    for xi in X:
        ll = calc_disc_loglike(times,states,xi,yi)
        y_ll.append(ll)
        
    z.append(y_ll)


In [None]:
# visualize 2d likelihood surface for both rates


levels = np.linspace(np.min(z), np.max(z), 15)  # np.min/np.max search the whole grid; min(min(z)) only searches one row
fig, ax = plt.subplots()
CS = ax.contour(Xgrid,Ygrid,z,levels=levels)
ax.clabel(CS, inline=True, fontsize=10)
plt.xlabel("gain rate")
plt.ylabel("loss rate")
plt.show()

## discrete trait evolution


- statistical estimation of rate parameters is tricky due to low information
- high stochastic variability across simulated datasets b/c small sample size
- we could increase statistical power by:
    - sampling for a longer timespan
    - adding more lineages

## discrete trait evolution

- this approach is still very useful. this model is frequently used in biology to:
    - infer phylogenetic trees using DNA, protein, and morphological data
    - reconstruct rates of change in a trait (using datasets larger than simulated here)
    - model spread of infectious disease through a population 

## discrete trait evolution

- these models are used for all kinds of things, even outside of biology

![h:500 center](images/financial_markov.png)


# END

everything below is discard stuff that i wanted to keep around

In [None]:
def gen_positive_prop(val, prop_dist): 
    # keep proposing until we draw a strictly positive value
    while True:
        val_star = val + prop_dist.sample(1)[0]
        if val_star > 0.:
            return val_star

In [None]:
def discrete_MCMC(times, states, gen,thin,start_p,gain_prior,loss_prior,gain_prop,loss_prop):
    gain = start_p[0]
    loss = start_p[1]
    ll = calc_disc_loglike(times,states,gain,loss) + gain_prior.log_pdf(gain) + loss_prior.log_pdf(loss)
    
    gain_samp = [gain]
    loss_samp = [loss]
    gens = [0]
    accept = 0

    for i in range(gen):
        if random.random() < .5:                   ## half the time we update gain, the other half loss
            gain_star = gen_positive_prop(gain,gain_prop)
            loss_star = loss
        else:
            gain_star = gain
            loss_star = gen_positive_prop(loss,loss_prop)
            
        ll_star = calc_disc_loglike(times,states,gain_star,loss_star) + gain_prior.log_pdf(gain_star) + loss_prior.log_pdf(loss_star)
        if ll_star - ll > 0.:
            ratio = 1.1
        else:
            ratio = math.exp(ll_star - ll)
        if random.random() < ratio:
            accept+=1
            ll = ll_star
            gain = gain_star
            loss = loss_star

        if i % thin == 0:
            print(i)
            gens.append(i)
            gain_samp.append(gain)
            loss_samp.append(loss)
    
    print("MCMC complete. Acceptance ratio:",accept / gen)
    return gens,gain_samp,loss_samp
    
            

In [None]:
gain, loss = 0.1 , 0.1
gain_prior = dist.normal(0.3, 0.1)
loss_prior = dist.normal(0.3, 0.1)
gain_prop = dist.normal(0.0, 0.05)
loss_prop = dist.normal(0.0, 0.05)

gen = 1000
thin = 10

gens, theta_samp, sigma_samp = discrete_MCMC(times, states, gen,thin,[gain,loss],gain_prior,loss_prior,gain_prop,loss_prop)

    


In [None]:
burnin = gen // thin // 10 
theta_pb = theta_samp[burnin:] # remove first 10% of samples
sigma_pb = sigma_samp[burnin:]
gens_pb = gens[burnin:]

## markov chain for gain and loss rates

In [None]:
fig, (ax1, ax2) = plt.subplots(2)
ax1.plot(gens_pb, theta_pb)
ax1.xaxis.set_tick_params(labelbottom=False)
ax1.set(xlabel="", ylabel="gain rate")
ax2.plot(gens_pb, sigma_pb)
ax2.set(xlabel="generation", ylabel="loss rate")

plt.show()

## estimated posterior distribution of rates

In [None]:
fig, (ax1, ax2) = plt.subplots(1,2)
ax1.hist(theta_pb)
ax1.set(xlabel="gain rate", ylabel="")
ax2.hist(sigma_pb)
ax2.set(xlabel="loss rate", ylabel="")

plt.show()

In [None]:
prior_samp = [loss_prior.sample(1)[0] for _ in range(len(sigma_samp))]  # sigma_samp holds the loss-rate samples here

weights = np.ones_like(prior_samp) / len(prior_samp)
plt.hist(prior_samp,alpha=0.5,label="prior",weights=weights)
plt.hist(sigma_samp,alpha=0.5,label="posterior",weights=weights)
plt.xlabel("loss rate")
plt.legend(loc='upper left')
plt.show()