Incomplete documentation for add_factor #115

nathanielvirgo · 2022-02-14T03:08:18Z

The documentation for add_factor currently says:

log_potentials – Optional array of shape (num_val_configs,) or (num_factors, num_val_configs). If specified, it contains the log of the potential value for every possible configuration. If none, it is assumed the log potential is uniform 0 and such an array is automatically initialized.

However, in order to use it one would need to know the order in which the factors should be specified. Could this be added to the documentation?

The text was updated successfully, but these errors were encountered:

StannisZhou · 2022-02-14T04:45:30Z

Thanks for bringing this up! This is actually a bug in the documentation. log_potentials in this case should always be an array of shape (num_val_configs,), and add_factor should always be used to add one factor at a time.

This bug in the documentation was a result of copying the docstring for log_potentials from EnumerationFactorGroup. I will open a PR to fix this.

See

PGMax/examples/rbm.py

Line 82 in 2d86b75

fg.add_factor(

and Sections 2 and 3 of our blog post for an example on how to use add_factor. factor_configs and log_potentials should be a literal translation of factor definitions (see for example Tables 1 and 2 in the blog post).

nathanielvirgo · 2022-02-14T04:54:03Z

Thank you for working on it! Note that we'll still need to know which element of the array of shape (num_val_configs,) should correspond to which outcome. The examples in the blog post don't help so much with that, because arrays containing the correct data are just assumed to exist already.

StannisZhou · 2022-02-14T05:02:00Z

factor_configs specifies the list of valid configurations. log_potentials is assumed to contain one element for each valid configuration in factor_configs and the order is assumed to match.

If the arrays containing the correct data don't already exist, I assume you should be able to create the necessary arrays in the right format easily (e.g. maybe you allow all possible combinations in which case you can use itertools.product to create the necessary factor_configs and specify log_potentials accordingly).

The package only helps you specify the model once the factor definitions are clear. If it's unclear what value to use for each configuration that sounds like the factor definition is still unclear, and more work is needed to clarify how to define the factor but that is on the modeling side and is orthogonal to how add_factor is used.

nathanielvirgo · 2022-02-15T04:17:48Z

The values to use for each factor are perfectly clear in my mind, that isn't the issue at all. I'm actually just trying to implement something simple like the sprinkler example, because that's a more useful starting point for me than the complex machine learning examples in the blog post.

The actual issue is that I hadn't understood that the valid configurations are stored in a list with an arbitrary order, instead of exhaustively. This is partly because the documentation doesn't say what num_val_configs is, and I assumed it would be defined elsewhere as something like (no. of outcomes of child 1)*(number of outcomes of child 2)*.... I gather now that it's actually just the length of the arrays you pass to the function, which makes sense. This is partly just me failing to correctly guess something that was fairly obvious, but it also could be spelt out more clearly.

StannisZhou · 2022-02-15T04:35:53Z

Yes. The main reason for explicitly enumerating the valid configurations is to support non-standard factors (e.g. the RCN example or structured higher-order factors, where only a small number of all possible configurations are valid). We tried to mention this and define num_valid_configs in

PGMax/pgmax/fg/graph.py

Line 112 in f7f3e3f

factor_configs: Array of shape (num_val_configs, num_variables)

but looks like it's still unclear.

For the usual pairwise factors with an exhaustive list of configurations one can use

PGMax/pgmax/fg/graph.py

Line 127 in f7f3e3f

def add_factor_group(self, factory: Callable, *args, **kwargs) -> None:

with

PGMax/pgmax/fg/groups.py

Line 814 in f7f3e3f

class PairwiseFactorGroup(FactorGroup):

, which uses the intuitive log_potential_matrix. See

PGMax/examples/rbm.py

Line 110 in f7f3e3f

# fg.add_factor_group(

for an example.

We will add additional clarifications in the log_potentials docstring, and also add some examples on directed models/Bayesian networks!

nathanielvirgo · 2022-02-15T08:56:21Z

Thank you!

I am afraid I am still completely confused. Here is some code that's attempting to create a factor graph corresponding to a simple Bayesian network and add the first of two factors. However, on running it I get an error, pasted below, which I don't understand. I must still be misunderstanding something about how these functions are meant to be used.

import jax
import matplotlib.pyplot as plt
import numpy as np
          
from pgmax.fg import graph
from pgmax.fg import groups

A = groups.VariableDict(2, ("A",))
B = groups.VariableDict(3, ("B",))

network = graph.FactorGraph([A,B])

network.add_factor(
    ["A"],
    np.array([0,1]),
    np.log(np.array([0.5,0.5]))
)

Here is the error:

% python3.8 test01.py
Traceback (most recent call last):
  File "test01.py", line 13, in <module>
    network.add_factor(
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/graph.py", line 119, in add_factor
    factor_group = groups.EnumerationFactorGroup(
  File "<string>", line 7, in __init__
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 576, in __post_init__
    MappingProxyType(self._get_variables_to_factors()),
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 721, in _get_variables_to_factors
    [
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 725, in <listcomp>
    tuple(self.variable_group[self.variable_names_for_factors[ii]]),
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 216, in __getitem__
    raise ValueError(
ValueError: The name needs to have at least 2 elements to index from a composite variable group.

This doesn't make sense to me because I would have thought a name is just an arbitrary string, but if I try giving them longer names instead the error changes to this:

% python3.8 test01.py
Traceback (most recent call last):
  File "test01.py", line 13, in <module>
    network.add_factor(
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/graph.py", line 119, in add_factor
    factor_group = groups.EnumerationFactorGroup(
  File "<string>", line 7, in __init__
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 576, in __post_init__
    MappingProxyType(self._get_variables_to_factors()),
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 721, in _get_variables_to_factors
    [
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 725, in <listcomp>
    tuple(self.variable_group[self.variable_names_for_factors[ii]]),
  File "/Users/nathaniel/Library/Python/3.8/lib/python/site-packages/pgmax/fg/groups.py", line 220, in __getitem__
    variable_group = self.variable_group_container[curr_name[0]]
TypeError: list indices must be integers or slices, not str

nathanielvirgo · 2022-02-16T01:45:03Z

Doing it the following way solves my problem.

I would say this is also an issue with the documentation for add_factor. The documentation says

variable_names – A list containing the connected variable names.

But in fact variable_names is not a list of names but a list of tuples (group_name, variable_index_within_group). The only exception is if only a single group is added to the graph.

(In this case I had to split the variables into two groups because they don't have the same number of outcomes. This strikes me as a bit awkward, so I've posted a separate issue about it.)

import numpy as np
          
from pgmax.fg import graph
from pgmax.fg import groups

A = groups.NDVariableArray(2,shape=(1,))
B = groups.NDVariableArray(3,shape=(1,))

network = graph.FactorGraph(dict(A=A, B=B))

network.add_factor(
    [("A",0)],
    np.array([[0],[1]]),
    np.log(np.array([0.5,0.5]))
)

StannisZhou · 2022-02-16T04:44:15Z

Thanks for raising these questions! It's very helpful to know what is confusing the users, and we will enhance the documentation accordingly. For your questions:

Using VariableDict, the implementation should be

import numpy as np
from pgmax.fg import graph, groups

A = groups.VariableDict(2, ("A",))
B = groups.VariableDict(3, ("B",))

network = graph.FactorGraph([A, B])

network.add_factor(
    [(0, "A")], 
    np.array([[0], [1]]), 
    np.log(np.array([0.5, 0.5]))
)

When you construct the factor graph with [A, B], you end up with a CompositeVariableGroup. The way to refer to an individual variable is to use a tuple (e.g. (0, "A"), where the first element refers to the first VariableDict in the CompositeVariableGroup, and the second element refers to variable "A" within the VariableDict). The reasoning behind this choice is to allow potentially duplicated names in different variable groups (e.g. two variables both named "A" in two different variable groups). Also note the factor_configs should always be a 2D array (of shape (num_val_configs (=2 in this case), num_variables (=1 in this case)).
2. We need to make additional clarifications in the documentation, but by variable names we mean tuples of the type (group_name, variable_index_within_group). Each factor graph contains a set of variables, and each variable should have a unique name. The tuples are our way of assigning unique names to individual variables.
3. As you mentioned in #118, what you really want here is a way to specify a group of variables with a varying number of states. This sounds like a good idea, and should be easy. We will add support for this.

StannisZhou added bug Something isn't working documentation Improvements or additions to documentation labels Feb 14, 2022

StannisZhou mentioned this issue Feb 14, 2022

Get rid of redundant array shape for log_potentials #116

Merged

StannisZhou closed this as completed in #116 Feb 14, 2022

StannisZhou reopened this Feb 15, 2022

This was referenced Feb 19, 2022

Support python 3.9/3.10; Improve documentation for add_factor; Bump up version for new release #120

Merged

Improve documentation for variables/variable groups #121

Open

StannisZhou closed this as completed in #120 Feb 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incomplete documentation for add_factor #115

Incomplete documentation for add_factor #115

nathanielvirgo commented Feb 14, 2022

StannisZhou commented Feb 14, 2022

nathanielvirgo commented Feb 14, 2022

StannisZhou commented Feb 14, 2022 •

edited

nathanielvirgo commented Feb 15, 2022 •

edited

StannisZhou commented Feb 15, 2022 •

edited

nathanielvirgo commented Feb 15, 2022

nathanielvirgo commented Feb 16, 2022 •

edited

StannisZhou commented Feb 16, 2022 •

edited

Incomplete documentation for add_factor #115

Incomplete documentation for add_factor #115

Comments

nathanielvirgo commented Feb 14, 2022

StannisZhou commented Feb 14, 2022

nathanielvirgo commented Feb 14, 2022

StannisZhou commented Feb 14, 2022 • edited

nathanielvirgo commented Feb 15, 2022 • edited

StannisZhou commented Feb 15, 2022 • edited

nathanielvirgo commented Feb 15, 2022

nathanielvirgo commented Feb 16, 2022 • edited

StannisZhou commented Feb 16, 2022 • edited

StannisZhou commented Feb 14, 2022 •

edited

nathanielvirgo commented Feb 15, 2022 •

edited

StannisZhou commented Feb 15, 2022 •

edited

nathanielvirgo commented Feb 16, 2022 •

edited

StannisZhou commented Feb 16, 2022 •

edited