# 1. Why am I getting this error:  

## Not enough samples to build a trace.  

I would like to know how to reproduce **Adapting the model to death data** in a newer version of pymc3.  

Versions:
pymc3:  3.7
Python: 3.7.4


In [1]:
# importing packages


%pylab inline 
# magic function in IPython; it loads the major numerical and plotting libraries

import pandas as pd
import pymc3 as pm
# probabilistic programming language

# import class for constructing random walks
from pymc3.distributions.timeseries import GaussianRandomWalk
import theano.tensor as tt

#import some special methods
from scipy.special import logit,expit
import scipy.stats as stats

# set some plotting parameters to create ggplot-style plots
import seaborn as sns
import matplotlib as mpl
sns.set_context(context='talk',font_scale=1.5)
plt.style.use('ggplot')
mpl.rcParams['axes.labelsize'] = 24

#interactive plotting tools
from ipywidgets import interactive
matplotlib.rcParams['figure.figsize'] = [15,8]


Populating the interactive namespace from numpy and matplotlib


In [9]:
print(pm.__version__)
from platform import python_version
print(python_version())

3.7
3.7.4


Then I define two functions: one creates the model, and the other generates the trace (samples?) and the ppc (posterior predictive checks).  

I changed some parts to account for API changes in newer pymc3 versions.

In [16]:

def create_timeseries_death_model(N_mean = 10000, N_sd = 1000, 
                            p0 = 0.01,
                            mu_w=0.0, sigma_w = 0.1,
                            mu_pd = 0.1, sigma_pd = 0.05,
                            overdoses = None,deaths = None):
    """
    Create pymc3 time-series overdose model
    
    Parameters
    ----------
    N_mean : float
        mean of population size
    N_sd : float
        standard deviation of population size
    p0 : float
        initial prevalence of overdoses
    mu_w : float
        drift for overdose random walk
    sigma_w : float
        standard deviation of the steps of the overdose random walk
    mu_pd : float
        mean probability of death following an overdose
    sigma_pd : float
        sd probability of death following an overdose
    overdoses : numpy array
        overdose data
    deaths : numpy array
        death data
    
    Returns
    -------
    
    Pymc3 model
    
    """
    
    # instantiate model
    model = pm.Model()

    # create elements of the model
    with model:
        # define population size random variable. PyMC3 needs a label for the RV as the first value.
        N = pm.Normal('N',mu=N_mean, sd=N_sd)

        # define random walk process
        w = GaussianRandomWalk('w',mu=mu_w, sd=sigma_w, shape=n_months, init=pm.Normal.dist(mu=logit(p0), sd=0.1))
        
        # convert random walk into probability
        p = pm.Deterministic('p',pm.math.invlogit(w))
        
        # probability of death following an overdose
        pd = pm.Beta('p_d',mu=mu_pd,sd=sigma_pd)
        

        # add likelihood terms only for the data that is supplied (with no data, the model contains only the priors)
        if overdoses is not None:
            x = pm.Poisson('overdoses',mu=N*p,shape=(n_months,), observed=overdoses)
            
        if deaths is not None:
            x = pm.Poisson('deaths',mu=N*p*pd,shape=(n_months,), observed=deaths)
    
    return model

def generate_ppc_from_model(model):
    '''
    Generate trace and ppc from a pymc3 model using variational Bayes.
    
    Parameters
    ----------
    
    model : pymc3 model
    
    Return
    ------
    
    dict
        dictionary containing the trace and ppc
    '''
    # generate trace of posterior
    
### this is the original code
#     print('Running variational fitting...')
#     with model:
#         v_params = pm.variational.advi(n=100000)

#     print('Generating trace...')
#     with model:
#         trace = pm.variational.sample_vp(v_params, draws=1000)
###

    # new code
    print('Running variational fitting and generating trace...')
    with model:
        trace = pm.sample(1000, init='advi', n_init=100000)

    print('Generate ppc...')
    # calculate ppc
    with model:
### this is the original code
#         ppc = pm.sample_ppc(trace)
###
        # new code
        ppc = pm.sample_posterior_predictive(trace)
    return {'trace':trace,'ppc':ppc}
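
To make the structure of `create_timeseries_death_model` easier to check, here is a rough forward simulation of the same generative story using only the standard library. It is a sketch, not part of the original notebook: the helper names `beta_params_from_mean_sd` and `simulate_expected_counts` are made up, it returns the Poisson means `N*p` and `N*p*p_d` rather than Poisson draws, and the mean/sd to alpha/beta conversion is my understanding of how `pm.Beta(mu=..., sd=...)` is parameterised.

```python
import math
import random


def logit(p):
    return math.log(p / (1.0 - p))


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def beta_params_from_mean_sd(mu, sd):
    # Assumed conversion from (mean, sd) to the usual (alpha, beta) of a Beta
    kappa = mu * (1.0 - mu) / sd ** 2 - 1.0
    return mu * kappa, (1.0 - mu) * kappa


def simulate_expected_counts(n_months=12, N_mean=10000, N_sd=1000, p0=0.01,
                             mu_w=0.0, sigma_w=0.1,
                             mu_pd=0.1, sigma_pd=0.05, seed=0):
    """Draw one prior sample; return (p, N*p, N*p*p_d) for each month."""
    rng = random.Random(seed)
    N = rng.gauss(N_mean, N_sd)                      # population size
    alpha, beta = beta_params_from_mean_sd(mu_pd, sigma_pd)
    p_d = rng.betavariate(alpha, beta)               # P(death | overdose)
    w = rng.gauss(logit(p0), 0.1)                    # initial state on the logit scale
    rows = []
    for month in range(n_months):
        if month > 0:
            w += rng.gauss(mu_w, sigma_w)            # random-walk step with drift mu_w
        p = expit(w)                                 # back to a probability
        rows.append((p, N * p, N * p * p_d))
    return rows


for p, overdoses_mean, deaths_mean in simulate_expected_counts():
    print(round(p, 4), round(overdoses_mean, 1), round(deaths_mean, 1))
```

With the default arguments the Beta prior on `p_d` works out to roughly `alpha = 3.5`, `beta = 31.5`.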

There were not many resources online, but I found this:  
https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/issues/385  
I followed the reply in that link to modify the variational Bayes part of the code.  
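
One thing to keep in mind: as far as I can tell, `pm.sample(..., init='advi')` uses ADVI only to initialise NUTS, so the trace it returns comes from MCMC and not from the variational approximation. For comparison, here is a minimal sketch of a pure-ADVI version that is closer to the old `advi` + `sample_vp` calls. It assumes `pm.fit(n=..., method='advi')` and `approx.sample(draws)` behave in pymc3 3.7 as documented; the function name `generate_ppc_with_advi` and its arguments are made up for illustration.

```python
def generate_ppc_with_advi(model, n_fit=100000, draws=1000):
    """Fit the model with ADVI only, then draw a trace and a ppc from the approximation."""
    # imported here so the sketch can be defined without touching the notebook's imports
    import pymc3 as pm

    with model:
        # mean-field ADVI; returns a variational approximation object
        approx = pm.fit(n=n_fit, method='advi')
        # draw samples from the fitted approximation (no NUTS involved)
        trace = approx.sample(draws)
        # posterior predictive samples based on those draws
        ppc = pm.sample_posterior_predictive(trace)
    return {'trace': trace, 'ppc': ppc}
```

It would be called the same way as `generate_ppc_from_model`, for example `fit_vb = generate_ppc_with_advi(model)`, which should make it possible to compare a variational fit and an MCMC fit of the same model side by side.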

Now I import the data set, build the model, and then generate the trace and ppc.


In [11]:
data = pd.read_csv('./data/data_sample.csv')

In [13]:
# define the variables needed
n_months = 12
n_samples = 1000

In [17]:
model = create_timeseries_death_model(overdoses=data['overdoses'].values,
                                     deaths=data['deaths'].values)

In [18]:
fit = generate_ppc_from_model(model)

Auto-assigning NUTS sampler...
Initializing NUTS using advi...


Running variational fitting and generating trace...


Average Loss = 202.52:  16%|█▌        | 16212/100000 [00:10<00:55, 1516.34it/s] 
Convergence achieved at 16300
Interrupted at 16,299 [16%]: Average Loss = 7,188.9
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [p_d, w, N]
Sampling 2 chains: 100%|██████████| 3000/3000 [00:37<00:00, 80.13draws/s] 
The estimated number of effective samples is smaller than 200 for some parameters.
  0%|          | 0/2000 [00:00<?, ?it/s]

Generate ppc...


100%|██████████| 2000/2000 [00:06<00:00, 304.37it/s]


# 2. I am getting a different result.

(Put the ppc part here, from after the time variation is added. I used MCMC, whereas Mike used VB.)