Dirichlet + Categorical or Dirichlet + Multinomial toy example? #106

Closed

keskinm opened this issue May 26, 2018 · 5 comments


keskinm commented May 26, 2018

Hello
Would it be possible to add a small example that does the following:
Latent <- zs.distributions.Dirichlet(alpha_parameters)
Observations <- zs.distributions.Multinomial(Latent, n_experiments) ?

The goal is to recover alpha_parameters when only Observations are available, using HMC or variational inference.

(My ultimate goal is to do:
Latent1 <- zs.distributions.Dirichlet(alpha_parameters)
Latent2 <- zs.distributions.Dirichlet(alpha_parameters)
Latent3 <- conv2d(Latent1, Latent2)
Observations <- zs.distributions.Multinomial(Latent3, n_experiments)
and recover alpha_parameters again. I could work that out from a toy Dirichlet + Categorical/Multinomial example, and I think such an example would be helpful for everyone.)
Thanks a lot


csy530216 commented May 26, 2018

My personal opinion is that the reason we use the Dirichlet distribution as the prior is that it is the conjugate prior of the Multinomial distribution. However, ZhuSuan currently does not support detecting conjugacy and deriving the analytic update formulas for variational inference (this is under development, see here, so you may want to wait a while :) ). Therefore, instead of using the Dirichlet distribution, which is defined on the probability simplex, we first sample from a normal distribution (each dimension independent) and transform the sample onto the probability simplex with softmax (Latent0 <- Normal, Latent1 = softmax(Latent0)), i.e. the logistic-normal distribution. The latent variable then lives in an unconstrained space, which is convenient. You can also see how to approximate a Dirichlet distribution with a given parameter by a logistic-normal distribution in this paper.
By the way, unlike standard LDA, I think the model in your ultimate goal is not conjugate, which makes the conjugacy between the Dirichlet and the Multinomial irrelevant. You can still use the black-box inference methods in ZhuSuan, such as VI and HMC. Since your model is not conjugate, I think there is essentially no loss in replacing the Dirichlet with a logistic normal.
By the way, to learn more about the background and implementation of the logistic-normal topic model in ZhuSuan, you can refer to this tutorial :)
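
For concreteness, here is a minimal sketch of the logistic-normal construction described above: sample an unconstrained Gaussian vector and map it onto the probability simplex with softmax. The shapes and parameter values are only illustrative, not part of the original discussion.

import tensorflow as tf

K = 5
mean = tf.zeros([K])      # illustrative location of the Gaussian
stddev = tf.ones([K])     # illustrative scale of the Gaussian

# latent0 is unconstrained (lives in R^K)
latent0 = mean + stddev * tf.random_normal([K])
# softmax maps latent0 onto the probability simplex: a logistic-normal sample
latent1 = tf.nn.softmax(latent0)

with tf.Session() as sess:
    print(sess.run(latent1))  # nonnegative entries that sum to 1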


keskinm commented May 27, 2018

Hello

Thank you very much for your reply.
Indeed, putting a Dirichlet prior is not important; what I actually care about is recovering the latents themselves, and the priors are just a means to do it, so replacing it with a logistic normal should not be a problem for me.

I am therefore trying to do this:

observations <- zs.Multinomial(toy_logits, n_experiments)

latent0 <- zs.Normal(latent0_mean, latent0_stdev, group_ndims=0)
latent1 <- tf.nn.softmax(latent0)
model_observations <- zs.Multinomial(latent1, n_experiments)

I have a piece of code that should be a good start:

import numpy as np
import tensorflow as tf
import zhusuan as zs

observations = zs.Multinomial(name='observations',logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100)

latent0_mean = tf.placeholder(tf.float32, shape=[5,], name='latent0_mean')
latent0_stdev = tf.placeholder(tf.float32, shape=[5,],name='latent0_stdev')

Latent0_mean = np.zeros(5, dtype=np.float32)
Latent0_stdev = np.zeros(5, dtype=np.float32)

def lognormalmult(observed, latent0_mean, latent0_stdev):
    with zs.BayesianNet(observed=observed) as model:
        latent0 = zs.Normal('latent0', mean=latent0_mean, logstd=latent0_stdev, group_ndims=0)
        latent1 = tf.nn.softmax(latent0)
        x = zs.Multinomial('x', tf.log(latent1),normalize_logits=False,dtype=tf.float32,n_experiments=100)
    return model

tf.set_random_seed(1)
kernel_width = 0.1
n_chains = 1
n_iters = 200
n_leapfrogs = 5

# Build the computation graph
def log_joint(observed):
    model = lognormalmult(observed, latent0_mean, latent0_stdev)
    log_platent0, log_px = model.local_log_prob(['latent0', 'x'])
    return log_platent0 + log_px

hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,target_acceptance_rate=0.9)
x = tf.Variable(tf.zeros([n_chains, 5]), name='x')
latent0 = tf.Variable(tf.zeros([n_chains, 5]), name='latent0')
sample_op, hmc_info = hmc.sample(log_joint, observed={'x': x}, latent={'latent0': latent0})

# Run the inference
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    samples = []
    print('Sampling...')
    for i in range(n_iters):
        _, x_sample = sess.run([sample_op, hmc_info.samples['x']],
        feed_dict={latent0_mean: Latent0_mean, latent0_stdev:Latent0_stdev})
        samples.append(x_sample)
    print('Finished.')
    samples = np.vstack(samples)
    print('latent0 mean:')
    print(np.mean(samples,0))
    print('latent0_stdev:')
    print(np.std(samples, 0))

In particular, I don't know where to use "observations", and whether I can do it this way.

@csy530216

In

observations = zs.Multinomial(name='observations',logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100)

zs.Multinomial is only supposed to be used in a BayesianNet context. You can use

observations = zs.distributions.Multinomial(name='observations', logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100).sample(size_of_dataset)

My understanding is that your code intends to infer the global latent variable latent0, which corresponds to the logits parameter of the assumed multinomial distribution of the observations. Based on that, I modified your code as follows, and this version runs:

import numpy as np
import tensorflow as tf
import zhusuan as zs

N = 50
observations = zs.distributions.Multinomial(name='observations',logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100).sample(N)

latent0_mean = tf.placeholder(tf.float32, shape=[5,], name='latent0_mean')
latent0_logstdev = tf.placeholder(tf.float32, shape=[5,],name='latent0_logstdev')

Latent0_mean = np.zeros(5, dtype=np.float32)
Latent0_logstdev = np.zeros(5, dtype=np.float32)

def lognormalmult(observed, latent0_mean, latent0_logstdev):
    with zs.BayesianNet(observed=observed) as model:
        latent0 = zs.Normal('latent0', mean=latent0_mean, logstd=latent0_logstdev, group_ndims=1, n_samples=n_chains)
        x = zs.Multinomial('x', tf.expand_dims(latent0, 1),n_experiments=100)
    return model

tf.set_random_seed(1)
kernel_width = 0.1
n_chains = 2
n_iters = 200
n_leapfrogs = 5

# Build the computation graph
def log_joint(observed):
    model = lognormalmult(observed, latent0_mean, latent0_logstdev)
    log_platent0, log_px = model.local_log_prob(['latent0', 'x'])
    return log_platent0 + tf.reduce_sum(log_px, -1)

hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,target_acceptance_rate=0.9,adapt_step_size=True)
latent0 = tf.Variable(tf.zeros([n_chains, 5]), name='latent0')
sample_op, hmc_info = hmc.sample(log_joint, observed={'x': observations}, latent={'latent0': latent0})

# Run the inference
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    samples = []
    print('Sampling...')
    for i in range(n_iters):
        _, latent0_sample = sess.run([sample_op, hmc_info.samples['latent0']],
        feed_dict={latent0_mean: Latent0_mean, latent0_logstdev:Latent0_logstdev})
        samples.append(latent0_sample)
    print('Finished.')
    samples = np.vstack(samples)
    print('the generative logits')
    print(np.log([1.,5.,10.,5.,1.]))
    print('latent0 mean (note that logits=[a,b] and logits=[a+k,b+k] give the same distribution):')
    print(np.mean(samples,0))
    print('latent0_stdev:')
    print(np.std(samples,0))
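
Since the logits are only identified up to an additive constant (as the print above notes), one way to sanity-check the result is to compare probabilities after applying softmax. This is a small illustrative check added here, assuming the script above has just been run (it reuses the samples array):

# softmax of the posterior-mean logits should be close to the normalized
# generative probabilities [1, 5, 10, 5, 1] / 22
true_probs = np.array([1., 5., 10., 5., 1.])
true_probs /= true_probs.sum()

est_logits = np.mean(samples, 0)                     # posterior mean, shape [5]
est_probs = np.exp(est_logits - np.max(est_logits))  # numerically stable softmax
est_probs /= est_probs.sum()

print('true probabilities:     ', true_probs)
print('estimated probabilities:', est_probs)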


keskinm commented May 31, 2018

Hello
Thanks a lot, that works well: when I do a multinomial sampling with the obtained logits, I get data similar to the original "observations".

However, I run into the same kind of problem as in Edward (whereas in Pyro it works):
Before tackling my "ultimate goal" (with a logistic normal instead of a Dirichlet in ZhuSuan, as you suggest), I want to try something simpler first, like:

Constant1 <- tf.constant([..])
Constant2 <- tf.constant([..])
Observations <- zs.distributions.Multinomial(Constant1 + Constant2, n_experiments)
Latent0 <- zs.distributions.Normal(params)
Observations_model <- zs.distributions.Multinomial(Latent0 + Constant2, n_experiments)

but this fails to retrieve the correct params, so I cannot obtain a Latent0 similar to Constant1 and thus data similar to the original (see also https://discourse.edwardlib.org/t/a-little-example-on-dirichlet-multinomial-inference/802 if interested; it is the same thing with a Dirichlet instead of a Normal).

Code:

import numpy as np
import tensorflow as tf
import zhusuan as zs

N = 500
constant1=tf.constant([40.,5.,10.,20.,80])
constant2=tf.constant([10.,2.,5.,3.,50.])
observations = zs.distributions.Multinomial(name='observations',logits=tf.log(constant1+constant2), n_experiments=8000).sample(N)

latent0_mean = tf.placeholder(tf.float32, shape=[5,], name='latent0_mean')
latent0_logstdev = tf.placeholder(tf.float32, shape=[5,],name='latent0_logstdev')

Latent0_mean = np.zeros(5, dtype=np.float32)
Latent0_logstdev = np.zeros(5, dtype=np.float32)

def lognormalmult(observed, latent0_mean, latent0_logstdev):
    with zs.BayesianNet(observed=observed) as model:
        latent0 = zs.Normal('latent0', mean=latent0_mean, logstd=latent0_logstdev, group_ndims=1)

        constant2 = tf.constant([10.,2.,5.,3.,50.])
        
        x = zs.Multinomial('x', logits=(latent0+constant2), n_experiments=8000)
    return model

tf.set_random_seed(1)
kernel_width = 0.1
n_chains = 2
n_iters = 2000
n_leapfrogs = 5

# Build the computation graph
def log_joint(observed):
    model = lognormalmult(observed, latent0_mean, latent0_logstdev)
    log_platent0, log_px = model.local_log_prob(['latent0', 'x'])
    return log_platent0 + tf.reduce_sum(log_px, -1)

hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,target_acceptance_rate=0.9,adapt_step_size=True)
latent0 = tf.Variable(tf.zeros([5,]), name='latent0')
sample_op, hmc_info = hmc.sample(log_joint, observed={'x': observations}, latent={'latent0': latent0})

# Run the inference(same thing as before)..

(I removed n_chains to make it simpler)

@csy530216

Sorry for the (very) late reply. The problem is at:

observations = zs.distributions.Multinomial(name='observations',logits=tf.log(constant1+constant2), n_experiments=8000).sample(N)

And

x = zs.Multinomial('x', logits=(latent0+constant2), n_experiments=8000)

You can see that what should hold is 'latent0 + constant2 = tf.log(constant1 + constant2)' (up to an additive constant), and this does not imply 'latent0 = tf.log(constant1)'.
Sorry that the code in my earlier reply was not clear enough: I should have written constant1 = np.log([1.,5.,10.,5.,1.]) and replaced the other log([1.,5.,10.,5.,1.]) with constant1. The log was only there to show the conversion between (unnormalized) probabilities and logits; here we only need to work with the logits, so there is no need to use log.
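
For illustration, here is a minimal sketch of the corrected setup along these lines, with constant1 and constant2 defined directly as logit vectors so that no extra log is needed and latent0 should be recovered as constant1 up to an additive constant. Variable names follow the code above; the specific values are only illustrative, and inference (HMC over latent0) proceeds exactly as before:

import numpy as np
import tensorflow as tf
import zhusuan as zs

N = 500
# constant1 and constant2 are already logits (log unnormalized probabilities)
constant1 = np.log([40., 5., 10., 20., 80.]).astype(np.float32)
constant2 = np.log([10., 2., 5., 3., 50.]).astype(np.float32)

# generate data whose logits are constant1 + constant2 (no tf.log here)
observations = zs.distributions.Multinomial(
    name='observations', logits=tf.constant(constant1 + constant2),
    n_experiments=8000).sample(N)

def lognormalmult(observed, latent0_mean, latent0_logstdev):
    with zs.BayesianNet(observed=observed) as model:
        # prior over latent0, interpreted as a logit vector
        latent0 = zs.Normal('latent0', mean=latent0_mean,
                            logstd=latent0_logstdev, group_ndims=1)
        # the model's logits are latent0 + constant2; inference can only pin
        # down latent0 + constant2 up to an additive constant, so latent0
        # should come out close to constant1 + c for some constant c
        x = zs.Multinomial('x', logits=latent0 + tf.constant(constant2),
                           n_experiments=8000)
    return model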
