Dirichlet + Categorical or Dirichlet + Multinomial toy example? #106

Closed

keskinm opened this issue May 26, 2018 · 5 comments


keskinm commented May 26, 2018

Hello
Would it be possible to add a small example that does the following:
Latent <- zs.distributions.Dirichlet(alpha_parameters)
Observations <- zs.distributions.Multinomial(Latent, n_experiments) ?

The goal is to recover alpha_parameters when only Observations are available, using HMC or variational inference.

(My ultimate goal is to do:
Latent1 <- zs.distributions.Dirichlet(alpha_parameters)
Latent2 <- zs.distributions.Dirichlet(alpha_parameters)
Latent3 <- conv2d(Latent1, Latent2)
Observations <- zs.distributions.Multinomial(Latent3, n_experiments)
and recover alpha_parameters again. I could work that out from a toy Dirichlet + Categorical/Multinomial example, and I think such an example would be helpful for everyone.)
Thanks a lot


csy530216 commented May 26, 2018

My personal opinion is that the reason we use the Dirichlet distribution as the prior is that it is the conjugate prior of the Multinomial distribution. However, ZhuSuan currently does not support detecting conjugacy and deriving the analytic update formulas for variational inference (this is under development, see here, so you may want to wait a while :) ). Therefore, instead of using the Dirichlet distribution, which is defined on the probability simplex, we first sample from a normal distribution (each dimension independent) and transform the sample onto the probability simplex with softmax (Latent0 <- Normal, Latent1 = softmax(Latent0)), i.e. the logistic-normal distribution. The latent variable then lives in an unconstrained space, which is convenient. You can also see how to approximate a Dirichlet distribution with a given parameter by a logistic-normal distribution in this paper.
By the way, unlike standard LDA, I think the model in your ultimate goal is not conjugate, which makes the conjugacy between the Dirichlet and the Multinomial irrelevant. You can still use the black-box inference methods in ZhuSuan, such as VI and HMC. Since your model is not conjugate, I think there is essentially no loss in replacing the Dirichlet with a logistic normal.
By the way, to learn more about the background and implementation of the logistic-normal topic model in ZhuSuan, you can refer to this tutorial :)
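
For concreteness, here is a minimal sketch of the logistic-normal construction described above: sample an unconstrained Gaussian vector and map it onto the probability simplex with softmax. The shapes and parameter values are only illustrative, not part of the original discussion.

import tensorflow as tf

K = 5
mean = tf.zeros([K])      # illustrative location of the Gaussian
stddev = tf.ones([K])     # illustrative scale of the Gaussian

# latent0 is unconstrained (lives in R^K)
latent0 = mean + stddev * tf.random_normal([K])
# softmax maps latent0 onto the probability simplex: a logistic-normal sample
latent1 = tf.nn.softmax(latent0)

with tf.Session() as sess:
    print(sess.run(latent1))  # nonnegative entries that sum to 1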


keskinm commented May 27, 2018

Hello

Thank you very much for your reply.
Indeed, putting a Dirichlet prior is not important; what I actually care about is recovering the latents themselves, and the priors are just a means to do it, so replacing it with a logistic normal should not be a problem for me.

I am therefore trying to do this:

observations <- zs.Multinomial(toy_logits, n_experiments)

latent0 <- zs.Normal(latent0_mean, latent0_stdev, group_ndims=0)
latent1 <- tf.nn.softmax(latent0)
model_observations <- zs.Multinomial(latent1, n_experiments)

I have a piece of code that should be a good start:

import numpy as np
import tensorflow as tf
import zhusuan as zs

observations = zs.Multinomial(name='observations',logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100)

latent0_mean = tf.placeholder(tf.float32, shape=[5,], name='latent0_mean')
latent0_stdev = tf.placeholder(tf.float32, shape=[5,],name='latent0_stdev')

Latent0_mean = np.zeros(5, dtype=np.float32)
Latent0_stdev = np.zeros(5, dtype=np.float32)

def lognormalmult(observed, latent0_mean, latent0_stdev):
    with zs.BayesianNet(observed=observed) as model:
        latent0 = zs.Normal('latent0', mean=latent0_mean, logstd=latent0_stdev, group_ndims=0)
        latent1 = tf.nn.softmax(latent0)
        x = zs.Multinomial('x', tf.log(latent1),normalize_logits=False,dtype=tf.float32,n_experiments=100)
    return model

tf.set_random_seed(1)
kernel_width = 0.1
n_chains = 1
n_iters = 200
n_leapfrogs = 5

# Build the computation graph
def log_joint(observed):
    model = lognormalmult(observed, latent0_mean, latent0_stdev)
    log_platent0, log_px = model.local_log_prob(['latent0', 'x'])
    return log_platent0 + log_px

hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,target_acceptance_rate=0.9)
x = tf.Variable(tf.zeros([n_chains, 5]), name='x')
latent0 = tf.Variable(tf.zeros([n_chains, 5]), name='latent0')
sample_op, hmc_info = hmc.sample(log_joint, observed={'x': x}, latent={'latent0': latent0})

# Run the inference
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    samples = []
    print('Sampling...')
    for i in range(n_iters):
        _, x_sample = sess.run([sample_op, hmc_info.samples['x']],
        feed_dict={latent0_mean: Latent0_mean, latent0_stdev:Latent0_stdev})
        samples.append(x_sample)
    print('Finished.')
    samples = np.vstack(samples)
    print('latent0 mean:')
    print(np.mean(samples,0))
    print('latent0_stdev:')
    print(np.std(samples, 0))

In particular, I don't know where to use "observations", and whether I can do it this way.

@csy530216

In

observations = zs.Multinomial(name='observations',logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100)

zs.Multinomial is only supposed to be used in a BayesianNet context. You can use

observations = zs.distributions.Multinomial(name='observations', logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100).sample(size_of_dataset)

My understanding is that your code intends to infer the global latent variable latent0, which corresponds to the logits parameter of the assumed multinomial distribution of the observations. Based on that, I modified your code as follows, and this version runs:

import numpy as np
import tensorflow as tf
import zhusuan as zs

N = 50
observations = zs.distributions.Multinomial(name='observations',logits=tf.log([1.,5.,10.,5.,1.]), n_experiments=100).sample(N)

latent0_mean = tf.placeholder(tf.float32, shape=[5,], name='latent0_mean')
latent0_logstdev = tf.placeholder(tf.float32, shape=[5,],name='latent0_logstdev')

Latent0_mean = np.zeros(5, dtype=np.float32)
Latent0_logstdev = np.zeros(5, dtype=np.float32)

def lognormalmult(observed, latent0_mean, latent0_logstdev):
    with zs.BayesianNet(observed=observed) as model:
        latent0 = zs.Normal('latent0', mean=latent0_mean, logstd=latent0_logstdev, group_ndims=1, n_samples=n_chains)
        x = zs.Multinomial('x', tf.expand_dims(latent0, 1),n_experiments=100)
    return model

tf.set_random_seed(1)
kernel_width = 0.1
n_chains = 2
n_iters = 200
n_leapfrogs = 5

# Build the computation graph
def log_joint(observed):
    model = lognormalmult(observed, latent0_mean, latent0_logstdev)
    log_platent0, log_px = model.local_log_prob(['latent0', 'x'])
    return log_platent0 + tf.reduce_sum(log_px, -1)

hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,target_acceptance_rate=0.9,adapt_step_size=True)
latent0 = tf.Variable(tf.zeros([n_chains, 5]), name='latent0')
sample_op, hmc_info = hmc.sample(log_joint, observed={'x': observations}, latent={'latent0': latent0})

# Run the inference
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    samples = []
    print('Sampling...')
    for i in range(n_iters):
        _, latent0_sample = sess.run([sample_op, hmc_info.samples['latent0']],
        feed_dict={latent0_mean: Latent0_mean, latent0_logstdev:Latent0_logstdev})
        samples.append(latent0_sample)
    print('Finished.')
    samples = np.vstack(samples)
    print('the generative logits')
    print(np.log([1.,5.,10.,5.,1.]))
    print('latent0 mean (note that logits=[a,b] and logits=[a+k,b+k] give the same distribution):')
    print(np.mean(samples,0))
    print('latent0_stdev:')
    print(np.std(samples,0))
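
Since the logits are only identified up to an additive constant (as the print above notes), one way to sanity-check the result is to compare probabilities after applying softmax. This is a small illustrative check added here, assuming the script above has just been run (it reuses the samples array):

# softmax of the posterior-mean logits should be close to the normalized
# generative probabilities [1, 5, 10, 5, 1] / 22
true_probs = np.array([1., 5., 10., 5., 1.])
true_probs /= true_probs.sum()

est_logits = np.mean(samples, 0)                     # posterior mean, shape [5]
est_probs = np.exp(est_logits - np.max(est_logits))  # numerically stable softmax
est_probs /= est_probs.sum()

print('true probabilities:     ', true_probs)
print('estimated probabilities:', est_probs)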


keskinm commented May 31, 2018

Hello
Thanks a lot, that works well: when I do a multinomial sampling with the obtained logits, I get data similar to the original "observations".

However, I run into the same kind of problem as in Edward (whereas in Pyro it works):
Before tackling my "ultimate goal" (with a logistic normal instead of a Dirichlet in ZhuSuan, as you suggest), I want to try something simpler first, like:

Constant1 <- tf.constant([..])
Constant2 <- tf.constant([..])
Observations <- zs.distributions.Multinomial(Constant1 + Constant2, n_experiments)
Latent0 <- zs.distributions.Normal(params)
Observations_model <- zs.distributions.Multinomial(Latent0 + Constant2, n_experiments)

but this fails to retrieve the correct params, so I cannot obtain a Latent0 similar to Constant1 and thus data similar to the original (see also https://discourse.edwardlib.org/t/a-little-example-on-dirichlet-multinomial-inference/802 if interested; it is the same thing with a Dirichlet instead of a Normal).

Code:

import numpy as np
import tensorflow as tf
import zhusuan as zs

N = 500
constant1=tf.constant([40.,5.,10.,20.,80])
constant2=tf.constant([10.,2.,5.,3.,50.])
observations = zs.distributions.Multinomial(name='observations',logits=tf.log(constant1+constant2), n_experiments=8000).sample(N)

latent0_mean = tf.placeholder(tf.float32, shape=[5,], name='latent0_mean')
latent0_logstdev = tf.placeholder(tf.float32, shape=[5,],name='latent0_logstdev')

Latent0_mean = np.zeros(5, dtype=np.float32)
Latent0_logstdev = np.zeros(5, dtype=np.float32)

def lognormalmult(observed, latent0_mean, latent0_logstdev):
    with zs.BayesianNet(observed=observed) as model:
        latent0 = zs.Normal('latent0', mean=latent0_mean, logstd=latent0_logstdev, group_ndims=1)

        constant2 = tf.constant([10.,2.,5.,3.,50.])
        
        x = zs.Multinomial('x', logits=(latent0+constant2), n_experiments=8000)
    return model

tf.set_random_seed(1)
kernel_width = 0.1
n_chains = 2
n_iters = 2000
n_leapfrogs = 5

# Build the computation graph
def log_joint(observed):
    model = lognormalmult(observed, latent0_mean, latent0_logstdev)
    log_platent0, log_px = model.local_log_prob(['latent0', 'x'])
    return log_platent0 + tf.reduce_sum(log_px, -1)

hmc = zs.HMC(step_size=1e-3, n_leapfrogs=n_leapfrogs,target_acceptance_rate=0.9,adapt_step_size=True)
latent0 = tf.Variable(tf.zeros([5,]), name='latent0')
sample_op, hmc_info = hmc.sample(log_joint, observed={'x': observations}, latent={'latent0': latent0})

# Run the inference(same thing as before)..

(I removed n_chains to make it simpler)

@csy530216

Sorry for the (very) late reply. The problem is at:

observations = zs.distributions.Multinomial(name='observations',logits=tf.log(constant1+constant2), n_experiments=8000).sample(N)

And

x = zs.Multinomial('x', logits=(latent0+constant2), n_experiments=8000)

You can see that what should hold is 'latent0 + constant2 = tf.log(constant1 + constant2)' (up to an additive constant), and this does not imply 'latent0 = tf.log(constant1)'.
Sorry that the code in my earlier reply was not clear enough: I should have written constant1 = np.log([1.,5.,10.,5.,1.]) and replaced the other log([1.,5.,10.,5.,1.]) with constant1. The log was only there to show the conversion between (unnormalized) probabilities and logits; here we only need to work with the logits, so there is no need to use log.
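
For illustration, here is a minimal sketch of the corrected setup along these lines, with constant1 and constant2 defined directly as logit vectors so that no extra log is needed and latent0 should be recovered as constant1 up to an additive constant. Variable names follow the code above; the specific values are only illustrative, and inference (HMC over latent0) proceeds exactly as before:

import numpy as np
import tensorflow as tf
import zhusuan as zs

N = 500
# constant1 and constant2 are already logits (log unnormalized probabilities)
constant1 = np.log([40., 5., 10., 20., 80.]).astype(np.float32)
constant2 = np.log([10., 2., 5., 3., 50.]).astype(np.float32)

# generate data whose logits are constant1 + constant2 (no tf.log here)
observations = zs.distributions.Multinomial(
    name='observations', logits=tf.constant(constant1 + constant2),
    n_experiments=8000).sample(N)

def lognormalmult(observed, latent0_mean, latent0_logstdev):
    with zs.BayesianNet(observed=observed) as model:
        # prior over latent0, interpreted as a logit vector
        latent0 = zs.Normal('latent0', mean=latent0_mean,
                            logstd=latent0_logstdev, group_ndims=1)
        # the model's logits are latent0 + constant2; inference can only pin
        # down latent0 + constant2 up to an additive constant, so latent0
        # should come out close to constant1 + c for some constant c
        x = zs.Multinomial('x', logits=latent0 + tf.constant(constant2),
                           n_experiments=8000)
    return model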
