Fix incorrect math equation renderings broken by backtick #18386

Merged 4 commits on Apr 20, 2018
22 changes: 8 additions & 14 deletions tensorflow/contrib/bayesflow/python/ops/monte_carlo_impl.py
@@ -44,15 +44,13 @@ def expectation_importance_sampler(f,
     n=None,
     seed=None,
     name='expectation_importance_sampler'):
-  r"""Monte Carlo estimate of `\\(E_p[f(Z)] = E_q[f(Z) p(Z) / q(Z)]\\)`.
+  r"""Monte Carlo estimate of \\(E_p[f(Z)] = E_q[f(Z) p(Z) / q(Z)]\\).
 
-  With `\\(p(z) := exp^{log_p(z)}\\)`, this `Op` returns
+  With \\(p(z) := exp^{log_p(z)}\\), this `Op` returns
 
-  ```
   \\(n^{-1} sum_{i=1}^n [ f(z_i) p(z_i) / q(z_i) ], z_i ~ q,\\)
   \\(\approx E_q[ f(Z) p(Z) / q(Z) ]\\)
   \\(= E_p[f(Z)]\\)
-  ```
 
   This integral is done in log-space with max-subtraction to better handle the
   often extreme values that `f(z) p(z) / q(z)` can take on.
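For reference, the estimator this docstring describes can be sanity-checked in a few lines of NumPy. This is an illustrative sketch, not this module's implementation; the choices of `p`, `q`, and `f` are assumptions with the known answer \\(E_p[Z^2] = 1\\):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(loc=0.0, scale=2.0, size=100_000)   # z_i ~ q = Normal(0, 2^2)

log_p = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)      # log density of p = Normal(0, 1)
log_q = -0.5 * (z / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)
f = z**2                                           # E_p[f(Z)] = 1 exactly

# n^{-1} sum_i f(z_i) p(z_i) / q(z_i)  ~=  E_p[f(Z)]
estimate = np.mean(f * np.exp(log_p - log_q))
print(estimate)  # close to 1.0
```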
@@ -121,14 +119,12 @@ def expectation_importance_sampler_logspace(
     name='expectation_importance_sampler_logspace'):
   r"""Importance sampling with a positive function, in log-space.
 
-  With `\\(p(z) := exp^{log_p(z)}\\)`, and `\\(f(z) = exp{log_f(z)}\\)`,
+  With \\(p(z) := exp^{log_p(z)}\\), and \\(f(z) = exp{log_f(z)}\\),
   this `Op` returns
 
-  ```
   \\(Log[ n^{-1} sum_{i=1}^n [ f(z_i) p(z_i) / q(z_i) ] ], z_i ~ q,\\)
   \\(\approx Log[ E_q[ f(Z) p(Z) / q(Z) ] ]\\)
   \\(= Log[E_p[f(Z)]]\\)
-  ```
 
   This integral is done in log-space with max-subtraction to better handle the
   often extreme values that `f(z) p(z) / q(z)` can take on.
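The log-space variant is the same computation pushed through the log-sum-exp trick. Continuing the assumed example above (with f(z) = z², so log f = 2 log|z|), a minimal sketch of the stabilized estimate:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(scale=2.0, size=100_000)            # z_i ~ q = Normal(0, 2^2)

log_f = 2.0 * np.log(np.abs(z))                    # f(z) = z^2 > 0
log_p = -0.5 * z**2 - 0.5 * np.log(2 * np.pi)      # p = Normal(0, 1)
log_q = -0.5 * (z / 2.0)**2 - np.log(2.0) - 0.5 * np.log(2 * np.pi)

log_values = log_f + log_p - log_q
c = np.max(log_values)                             # max-subtraction
log_estimate = c + np.log(np.mean(np.exp(log_values - c)))
print(np.exp(log_estimate))                        # again close to 1.0
```

Subtracting `c` keeps the largest exponentiated term at exp(0) = 1, so nothing overflows even when the unshifted terms would.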
@@ -196,13 +192,11 @@ def _logspace_mean(log_values):

 def expectation(f, samples, log_prob=None, use_reparametrization=True,
                 axis=0, keep_dims=False, name=None):
-  """Computes the Monte-Carlo approximation of `\\(E_p[f(X)]\\)`.
+  """Computes the Monte-Carlo approximation of \\(E_p[f(X)]\\).
 
   This function computes the Monte-Carlo approximation of an expectation, i.e.,
 
-  ```none
   \\(E_p[f(X)] \approx= m^{-1} sum_i^m f(x_j), x_j\ ~iid\ p(X)\\)
-  ```
 
   where:

@@ -216,8 +210,8 @@ def expectation(f, samples, log_prob=None, use_reparametrization=True,
   parameterless distribution (e.g.,
   `Normal(Y; m, s) <=> Y = sX + m, X ~ Normal(0,1)`), we can swap gradient and
   expectation, i.e.,
-  `grad[ Avg{ \\(s_i : i=1...n\\) } ] = Avg{ grad[\\(s_i\\)] : i=1...n }` where
-  `S_n = Avg{\\(s_i\\)}` and `\\(s_i = f(x_i), x_i ~ p\\)`.
+  grad[ Avg{ \\(s_i : i=1...n\\) } ] = Avg{ grad[\\(s_i\\)] : i=1...n } where
+  S_n = Avg{\\(s_i\\)} and \\(s_i = f(x_i), x_i ~ p\\).
 
   However, if p is not reparameterized, TensorFlow's gradient will be incorrect
   since the chain-rule stops at samples of non-reparameterized distributions.
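The swap identity can be checked numerically without any autodiff machinery. A sketch under the stated reparameterization Y = sX + m with f(y) = y², for which d/ds E[f(Y)] = 2s in closed form (the numbers below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1_000_000)       # X ~ Normal(0, 1): parameterless
m, s = 0.5, 2.0                      # Normal(Y; m, s) <=> Y = s*X + m

# Per-sample pathwise gradient: d/ds f(s*x_i + m) = 2*(s*x_i + m)*x_i.
per_sample_grad = 2.0 * (s * x + m) * x

# Avg{ grad[s_i] } should match grad[ Avg{s_i} ] = d/ds E[f(Y)] = 2*s = 4.
print(per_sample_grad.mean())        # close to 4.0
```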
@@ -296,7 +290,7 @@ def expectation(f, samples, log_prob=None, use_reparametrization=True,
   Args:
     f: Python callable which can return `f(samples)`.
     samples: `Tensor` of samples used to form the Monte-Carlo approximation of
-      `\\(E_p[f(X)]\\)`. A batch of samples should be indexed by `axis`
+      \\(E_p[f(X)]\\). A batch of samples should be indexed by `axis`
       dimensions.
     log_prob: Python callable which can return `log_prob(samples)`. Must
       correspond to the natural-logarithm of the pdf/pmf of each sample. Only
@@ -317,7 +311,7 @@ def expectation(f, samples, log_prob=None, use_reparametrization=True,

   Returns:
     approx_expectation: `Tensor` corresponding to the Monte-Carlo approximation
-      of `\\(E_p[f(X)]\\)`.
+      of \\(E_p[f(X)]\\).
 
   Raises:
     ValueError: if `f` is not a Python `callable`.
4 changes: 2 additions & 2 deletions tensorflow/contrib/factorization/python/ops/kmeans.py
@@ -374,11 +374,11 @@ def __init__(self,
         than `num_clusters`, a TensorFlow runtime error occurs.
       distance_metric: The distance metric used for clustering. One of:
         * `KMeansClustering.SQUARED_EUCLIDEAN_DISTANCE`: Euclidean distance
-          between vectors `u` and `v` is defined as `\\(||u - v||_2\\)`
+          between vectors `u` and `v` is defined as \\(||u - v||_2\\)
           which is the square root of the sum of the absolute squares of
           the elements' difference.
         * `KMeansClustering.COSINE_DISTANCE`: Cosine distance between vectors
-          `u` and `v` is defined as `\\(1 - (u . v) / (||u||_2 ||v||_2)\\)`.
+          `u` and `v` is defined as \\(1 - (u . v) / (||u||_2 ||v||_2)\\).
       random_seed: Python integer. Seed for PRNG used to initialize centers.
       use_mini_batch: A boolean specifying whether to use the mini-batch k-means
         algorithm. See explanation above.
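As a quick reference for the two metrics in this hunk, a small NumPy sketch (the vectors are assumed examples; this is not KMeansClustering's internal code):

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 0.0, 4.0])

euclidean = np.linalg.norm(u - v)                  # ||u - v||_2
cosine = 1.0 - u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
print(euclidean, cosine)  # ~2.449, ~0.163
```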
@@ -6,43 +6,39 @@ Monte Carlo integration and helpers.
 ## Background
 
 Monte Carlo integration refers to the practice of estimating an expectation with
-a sample mean. For example, given random variable `Z in \\(R^k\\)` with density `p`,
+a sample mean. For example, given random variable Z in \\(R^k\\) with density `p`,
 the expectation of function `f` can be approximated like:
 
-```
 $$E_p[f(Z)] = \int f(z) p(z) dz$$
 $$ ~ S_n
 := n^{-1} \sum_{i=1}^n f(z_i), z_i\ iid\ samples\ from\ p.$$
-```
 
-If `\\(E_p[|f(Z)|] < infinity\\)`, then `\\(S_n\\) --> \\(E_p[f(Z)]\\)` by the strong law of large
-numbers. If `\\(E_p[f(Z)^2] < infinity\\)`, then `\\(S_n\\)` is asymptotically normal with
-variance `\\(Var[f(Z)] / n\\)`.
+If \\(E_p[|f(Z)|] < infinity\\), then \\(S_n\\) --> \\(E_p[f(Z)]\\) by the strong law of large
+numbers. If \\(E_p[f(Z)^2] < infinity\\), then \\(S_n\\) is asymptotically normal with
+variance \\(Var[f(Z)] / n\\).
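Both claims are easy to see empirically; a sketch with an assumed p = Uniform(0, 1) and f(z) = z², for which \\(E_p[f(Z)] = 1/3\\):

```python
import numpy as np

rng = np.random.default_rng(0)
for n in (100, 10_000, 1_000_000):
    z = rng.uniform(size=n)                 # z_i iid from p = Uniform(0, 1)
    s_n = np.mean(z**2)                     # S_n --> E_p[Z^2] = 1/3
    stderr = np.sqrt(np.var(z**2) / n)      # from Var[f(Z)] / n
    print(n, round(s_n, 4), "+/-", round(stderr, 4))
```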

 Practitioners of Bayesian statistics often find themselves wanting to estimate
-`\\(E_p[f(Z)]\\)` when the distribution `p` is known only up to a constant. For
+\\(E_p[f(Z)]\\) when the distribution `p` is known only up to a constant. For
 example, the joint distribution `p(z, x)` may be known, but the evidence
-`\\(p(x) = \int p(z, x) dz\\)` may be intractable. In that case, a parameterized
-distribution family `\\(q_\lambda(z)\\)` may be chosen, and the optimal `\\(\lambda\\)` is the
-one minimizing the KL divergence between `\\(q_\lambda(z)\\)` and
-`\\(p(z | x)\\)`. We only know `p(z, x)`, but that is sufficient to find `\\(\lambda\\)`.
+\\(p(x) = \int p(z, x) dz\\) may be intractable. In that case, a parameterized
+distribution family \\(q_\lambda(z)\\) may be chosen, and the optimal \\(\lambda\\) is the
+one minimizing the KL divergence between \\(q_\lambda(z)\\) and
+\\(p(z | x)\\). We only know `p(z, x)`, but that is sufficient to find \\(\lambda\\).


## Log-space evaluation and subtracting the maximum

 Care must be taken when the random variable lives in a high dimensional space.
-For example, the naive importance sample estimate `\\(E_q[f(Z) p(Z) / q(Z)]\\)`
-involves the ratio of two terms `\\(p(Z) / q(Z)\\)`, each of which must have tails
-dropping off faster than `\\(O(|z|^{-(k + 1)})\\)` in order to have finite integral.
+For example, the naive importance sample estimate \\(E_q[f(Z) p(Z) / q(Z)]\\)
+involves the ratio of two terms \\(p(Z) / q(Z)\\), each of which must have tails
+dropping off faster than \\(O(|z|^{-(k + 1)})\\) in order to have finite integral.
 This ratio would often be zero or infinity up to numerical precision.
 
 For that reason, we write
 
-```
 $$Log E_q[ f(Z) p(Z) / q(Z) ]$$
 $$ = Log E_q[ \exp\{Log[f(Z)] + Log[p(Z)] - Log[q(Z)] - C\} ] + C,$$ where
 $$C := Max[ Log[f(Z)] + Log[p(Z)] - Log[q(Z)] ].$$
-```

The maximum value of the exponentiated term will be 0.0, and the expectation
can be evaluated in a stable manner.
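A tiny numeric check of that identity (the log-values are assumed, chosen so that the naive form overflows float64):

```python
import numpy as np

log_values = np.array([1000.0, 999.0, 998.0])  # Log[f] + Log[p] - Log[q] per sample
# Naive evaluation overflows: np.exp(1000.0) == inf in float64.
c = np.max(log_values)                         # C in the identity above
stable = c + np.log(np.mean(np.exp(log_values - c)))
print(stable)  # ~999.31; largest exponentiated term is exp(0) = 1
```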
Expand Down
2 changes: 1 addition & 1 deletion tensorflow/python/ops/nn_ops.py
@@ -1155,7 +1155,7 @@ def atrous_conv2d(value, filters, rate, padding, name=None):

   Returns:
     A `Tensor` with the same type as `value`.
-    Output shape with `'VALID`` padding is:
+    Output shape with `'VALID'` padding is:
 
         [batch, height - 2 * (filter_width - 1),
          width - 2 * (filter_height - 1), out_channels].
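For readers checking the shape arithmetic: with VALID padding an atrous filter behaves like a dilated filter of effective size filter + (filter - 1) * (rate - 1). A hedged helper illustrating that arithmetic (an assumed example, not part of nn_ops):

```python
def atrous_valid_output_size(in_size, filter_size, rate):
    # Inserting (rate - 1) zeros between filter taps gives an effective
    # filter size of filter_size + (filter_size - 1) * (rate - 1); VALID
    # padding then yields in_size - effective_size + 1 output positions.
    effective_size = filter_size + (filter_size - 1) * (rate - 1)
    return in_size - effective_size + 1

print(atrous_valid_output_size(32, 3, 2))  # height 32, 3-tap filter, rate 2 -> 28
```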