
I have some trouble translating a model from PyMC3 #109

Closed
laygr opened this issue Jun 19, 2018 · 4 comments

@laygr

laygr commented Jun 19, 2018

I'm very excited about this library, so I really want to learn how to use it. I'm translating the "Cheating among students" example from the book Probabilistic Programming & Bayesian Methods for Hackers from PyMC3 to ZhuSuan.

It is a clever way of inferring how many students cheated on an exam. It involves a "privacy algorithm" in which each student answers whether they cheated either truthfully or at random, depending on two coin flips:

First coin flip:

  • Heads => they answer truthfully, that is, whether or not they cheated
  • Tails => they answer according to the second coin flip

Second coin flip:

  • Heads => they answer that they cheated
  • Tails => they answer that they didn't cheat

This way, we can't point to an individual student and claim that they cheated, since only the student knows the results of the coin flips, so their privacy remains protected.

After the survey, we observe that 35 students out of 100 answered that they cheated. The inference problem is to recover the true probability of cheating. Note that even if no student cheated, we would still expect about 25 students to answer that they cheated (100 × 0.5 × 0.5 = 25, from the students whose first flip was tails and second flip was heads).
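
For intuition, here is a quick NumPy simulation of the scheme (the true cheating frequency of 0.05 is just a made-up value for illustration):

import numpy as np

np.random.seed(0)
N = 100
true_p = 0.05  # hypothetical true cheating frequency, illustration only

cheated = np.random.rand(N) < true_p
first_flip = np.random.rand(N) < 0.5    # heads => answer truthfully
second_flip = np.random.rand(N) < 0.5   # heads => answer "I cheated"
answers = np.where(first_flip, cheated, second_flip)

# After marginalizing the coin flips, P("yes") = 0.5 * true_p + 0.25,
# so with true_p = 0 we would still expect a "yes" proportion of 0.25.
print(answers.mean(), 0.5 * true_p + 0.25)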

However, it seems I'm getting something wrong since my results don't look similar to the results obtained in the original code.

This is my graphical model (a running Jupyter notebook for this code can be found here):

def create_model(observed):
  with zs.BayesianNet(observed=observed) as model:

    # This is what we want to infer: the probability that a student cheats.
    cheating_freq = zs.Uniform('cheating_freq', minval=0.0, maxval=1.0, group_ndims=1)

    # Per-student cheating indicators and the two coin flips.
    # (N, n_ones -- presumably a float vector of N ones -- and the logit
    # helper are defined earlier in the notebook.)
    not_cheated = zs.Bernoulli('not_cheated', logits=n_ones * logit(cheating_freq), group_ndims=1)
    first_coin_flips = zs.Bernoulli('first_coin_flips', logits=n_ones * logit(0.5), group_ndims=1)
    second_coin_flips = zs.Bernoulli('second_coin_flips', logits=n_ones * logit(0.5), group_ndims=1)

    # Each student answers truthfully on heads, and with the second flip on tails.
    answers = first_coin_flips * not_cheated + (1 - first_coin_flips) * second_coin_flips

    observed_proportion = tf.cast(tf.reduce_sum(answers) / N, tf.float32)
    observations = zs.Binomial('observations', logits=logit(observed_proportion),
                               n_experiments=tf.constant(N), group_ndims=1)
  return model

This is the code that sets up the HMC sampler:

n_chains = 1
n_iters = 40000
burnin = 20000
n_leapfrogs = 5

# Boolean placeholders so step-size / mass adaptation can be switched off
# partway through sampling.
adapt_step_size = tf.placeholder(tf.bool, shape=[], name="adapt_step_size")
adapt_mass = tf.placeholder(tf.bool, shape=[], name="adapt_mass")
hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,
             adapt_step_size=adapt_step_size, adapt_mass=adapt_mass,
             target_acceptance_rate=0.9)

def log_joint(observed):
  model = create_model(observed)
  cf, nc = model.local_log_prob(['cheating_freq', 'not_cheated'])
  return cf + nc

qcheating_freq = tf.Variable(0.1 * tf.ones([n_chains, 1]), trainable=True)
obs = tf.constant([35])
sample_op, hmc_info = hmc.sample(log_joint,
                                 observed={'observations': obs},
                                 latent={'cheating_freq': qcheating_freq})

And this is the code for running the inference:

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  samples = []
  print('Sampling...')
  for i in range(n_iters):
    # Adapt the step size and mass only during the first half of burn-in.
    _, fc_sample, acc, ss = sess.run(
        [sample_op, hmc_info.samples['cheating_freq'],
         hmc_info.acceptance_rate, hmc_info.updated_step_size],
        feed_dict={adapt_step_size: i < burnin // 2,
                   adapt_mass: i < burnin // 2})
    if i % 250 == 0:
      print('Sample {}: Acceptance rate = {}, updated step size = {}'
            .format(i, np.mean(acc), ss))
    if i >= burnin:
      samples.append(fc_sample)
  print('Finished.')
  samples = np.vstack(samples)

If you have the time, I'd really appreciate your help.

@laygr
Author

laygr commented Jun 20, 2018

A better name for the variable not_cheated would be cheated, since a 1 means that the student cheated and a 0 means that they didn't.

@meta-inf
Member

Basically, you specified a joint distribution over cheating_freq, not_cheated, and observations (which you forgot to include in log_joint), but only sampled a subset of the variables (cheating_freq). You could:

  • include not_cheated in the sampler, but it's discrete, and HMC can't work on it without further augmentation;
  • derive the joint distribution in which not_cheated is marginalized out, but in general that's hard (for this particular coin-flip scheme it happens to have a closed form; see the sketch after this list).

So I guess instead of a single HMC that samples everything at once, you should use Gibbs like the PyMC3 tutorials did.
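
For reference, integrating out both coin flips and not_cheated gives P(answer = "yes" | cheating_freq) = 0.5 * cheating_freq + 0.25, so the count of "yes" answers is Binomial(N, 0.5 * cheating_freq + 0.25). A minimal, untested sketch of that marginalized model, reusing the logit helper and N from the code above:

def create_marginalized_model(observed):
  with zs.BayesianNet(observed=observed) as model:
    cheating_freq = zs.Uniform('cheating_freq', minval=0.0, maxval=1.0, group_ndims=1)
    # Marginal probability of a "yes" answer under the two-coin scheme.
    p_yes = 0.5 * cheating_freq + 0.25
    observations = zs.Binomial('observations', logits=logit(p_yes),
                               n_experiments=tf.constant(N), group_ndims=1)
  return model

def marginal_log_joint(observed):
  model = create_marginalized_model(observed)
  # Note that the observations term is included here; it was the missing
  # piece in the original log_joint.
  cf, ob = model.local_log_prob(['cheating_freq', 'observations'])
  return cf + ob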

@laygr
Author

laygr commented Jun 20, 2018

I did try several configurations, but I kept getting Python errors; it's good to know that there was no way of succeeding with HMC alone, thanks! I had considered it an almighty algorithm. Perhaps I should take the Probabilistic Graphical Models specialization on Coursera.
I didn't see a Gibbs sampler in the docs.

@meta-inf
Member

meta-inf commented Jun 20, 2018 via email
