# An Introduction to Models in Pyro

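The basic unit of a probabilistic program is the *stochastic function*. This is an arbitrary Python callable that combines two ingredients:

- deterministic Python code; and

- primitive stochastic functions that call a random number generator.


Concretely, a stochastic function can be any Python object with a `__call__()` method, such as a function, a method, or a PyTorch `nn.Module`.

Throughout the tutorials and documentation, we will often call stochastic functions *models*, since stochastic functions can be used to represent the process by which data are generated. Expressing models as stochastic functions means that **models can be composed, reused, imported, and serialized just like regular Python callables**.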
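As a rough illustration of the three kinds of callable mentioned above, the sketch below defines one stochastic function of each kind using only `torch.distributions`. The names `coin_flip`, `NoisyScale`, and `NoisyLinear` are hypothetical and chosen just for this example.

```python
import torch
from torch import nn


def coin_flip():
    """A plain function: one primitive draw and nothing else."""
    return torch.distributions.Bernoulli(0.5).sample()


class NoisyScale:
    """Any object with __call__: scales its input and adds Gaussian noise."""

    def __init__(self, factor, noise_scale):
        self.factor = factor
        self.noise_scale = noise_scale

    def __call__(self, x):
        noise = torch.distributions.Normal(0.0, self.noise_scale).sample()
        return self.factor * x + noise


class NoisyLinear(nn.Module):
    """An nn.Module: a deterministic linear map followed by a random draw."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(1, 1)

    def forward(self, x):
        mean = self.linear(x)  # deterministic part
        return torch.distributions.Normal(mean, 1.0).sample()  # stochastic part


print(coin_flip())
print(NoisyScale(2.0, 0.1)(torch.tensor(3.0)))
print(NoisyLinear()(torch.tensor([1.0])))
```

All three can be called, passed around, and composed like any other Python callable.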

In [2]:
import torch
import pyro

pyro.set_rng_seed(101)

## Primitive Stochastic Functions

Primitive stochastic functions, or distributions, are an important class of stochastic functions for which we can explicitly compute the probability of the outputs given the inputs.  As of PyTorch 0.4 and Pyro 0.2, Pyro uses PyTorch's [distribution library](http://pytorch.org/docs/master/distributions.html). You can also create custom distributions using [transforms](http://pytorch.org/docs/master/distributions.html#module-torch.distributions.transforms).

Using primitive stochastic functions is easy. For example, to draw a sample `x` from the unit normal distribution $\mathcal{N}(0,1)$ we do the following:

In [3]:
loc = 0.   # mean zero
scale = 1. # unit variance
normal = torch.distributions.Normal(loc, scale) # create a normal distribution object
x = normal.rsample() # draw a sample from N(0,1)
print("sample", x)
print("log prob", normal.log_prob(x)) # score the sample from N(0,1)

sample tensor(-1.3905)
log prob tensor(-1.8857)


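Here, `torch.distributions.Normal` is a subclass of the `Distribution` class; it takes parameters and provides methods for sampling and scoring. Pyro's distribution library `pyro.distributions` is a thin wrapper around `torch.distributions` because we want to make use of PyTorch's fast tensor math and autograd capabilities during inference.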
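To make the scoring step concrete, the following sketch compares `log_prob` with the closed-form log density of the unit normal, $\log \mathcal{N}(x; 0, 1) = -\tfrac{1}{2}x^2 - \tfrac{1}{2}\log(2\pi)$. The value of `x` is hard-coded to the sample printed above, purely for illustration.

```python
import math

import torch

normal = torch.distributions.Normal(0.0, 1.0)
x = torch.tensor(-1.3905)  # the sample value shown in the output above

# Closed-form log density of N(0, 1) evaluated at x.
closed_form = -0.5 * x**2 - 0.5 * math.log(2 * math.pi)

print("log_prob   ", normal.log_prob(x))
print("closed form", closed_form)
```

Both values should agree with the `log prob` of about -1.8857 printed earlier.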

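## A Simple Model

All probabilistic programs are built up by composing primitive stochastic functions and deterministic computation. Since we are ultimately interested in probabilistic programming because we want to model things in the real world, let's start with a model of something concrete.

Suppose we have a collection of data with daily mean temperatures and cloud cover. We want to reason about how temperature interacts with whether the day was sunny or cloudy. A simple stochastic function that describes how that data might have been generated is given by: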

In [3]:
def weather():
    cloudy = torch.distributions.Bernoulli(0.3).sample()
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = torch.distributions.Normal(mean_temp, scale_temp).rsample()
    return cloudy, temp.item()

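Let's go through this line by line. First, in line 2 we define a binary random variable `cloudy`, which is drawn from a Bernoulli distribution with parameter `0.3`. Since the Bernoulli distribution returns `0` or `1`, in line 3 we convert the value `cloudy` to a string so that the return value of `weather` is easier to parse. So according to this model it is cloudy 30% of the time and sunny 70% of the time.

In lines 4-5 we define the parameters used to sample the temperature in line 6. These parameters depend on the particular value of `cloudy` sampled in line 2. For example, the mean temperature is 55 degrees (Fahrenheit) on cloudy days and 75 degrees on sunny days. Finally, in line 7 we return the two values `cloudy` and `temp`.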
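One informal way to check this reading of the model is to draw many samples and compare the empirical statistics with the parameters in the code. The sketch below repeats the PyTorch-only `weather` definition so that it runs on its own; the sample size and seed are arbitrary choices.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability


def weather():
    cloudy = torch.distributions.Bernoulli(0.3).sample()
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = torch.distributions.Normal(mean_temp, scale_temp).rsample()
    return cloudy, temp.item()


n = 5000
draws = [weather() for _ in range(n)]
cloudy_temps = [t for c, t in draws if c == 'cloudy']
sunny_temps = [t for c, t in draws if c == 'sunny']

frac_cloudy = len(cloudy_temps) / n
mean_cloudy = sum(cloudy_temps) / len(cloudy_temps)
mean_sunny = sum(sunny_temps) / len(sunny_temps)

print("fraction cloudy:", frac_cloudy)     # should be close to 0.3
print("mean temp, cloudy:", mean_cloudy)   # should be close to 55
print("mean temp, sunny:", mean_sunny)     # should be close to 75
```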


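However, `weather` is entirely independent of Pyro: it only calls PyTorch. If we want to use this model for anything other than sampling fake data, we need to turn it into a Pyro program.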

## The `pyro.sample` Primitive

To turn `weather` into a Pyro program, we'll replace the `torch.distribution`s with `pyro.distribution`s and the `.sample()` and `.rsample()` calls with calls to `pyro.sample`, one of the core language primitives in Pyro. Using `pyro.sample` is as simple as calling a primitive stochastic function with one important difference:

In [4]:
x = pyro.sample("my_sample", pyro.distributions.Normal(loc, scale))
print(x)

tensor(-0.8152)


Just like a direct call to `torch.distributions.Normal().rsample()`, this returns a sample from the unit normal distribution. The crucial difference is that this sample is _named_. Pyro's backend uses these names to uniquely identify sample statements and _change their behavior at runtime_ depending on how the enclosing stochastic function is being used. As we will see, this is how Pyro can implement the various manipulations that underlie inference algorithms.

Now that we've introduced `pyro.sample` and `pyro.distributions` we can rewrite our simple model as a Pyro program:

In [5]:
def weather():
    cloudy = pyro.sample('cloudy', pyro.distributions.Bernoulli(0.3))
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]
    temp = pyro.sample('temp', pyro.distributions.Normal(mean_temp, scale_temp))
    return cloudy, temp.item()

for _ in range(3):
    print(weather())

('cloudy', 64.5440444946289)
('sunny', 94.37557983398438)
('sunny', 72.5186767578125)


Procedurally, `weather()` is still a non-deterministic Python callable that returns two random samples. Because the randomness is now invoked with `pyro.sample`, however, it is much more than that. In particular `weather()` specifies a joint probability distribution over two named random variables: `cloudy` and `temp`. As such, it defines a probabilistic model that we can reason about using the techniques of probability theory. For example we might ask: if I observe a temperature of 70 degrees, how likely is it to be cloudy? How to formulate and answer these kinds of questions will be the subject of the next tutorial.
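For a model this small, the question about a 70-degree day can also be worked out by hand with Bayes' rule, which gives a feel for what "reasoning about the joint distribution" means. The sketch below is plain arithmetic using `torch.distributions`, not Pyro's inference machinery, and it reuses the parameters from `weather()`.

```python
import torch

temp = torch.tensor(70.0)  # the observed temperature
prior_cloudy = 0.3         # P(cloudy), from the Bernoulli parameter

# Likelihood of the observation under each weather condition.
lik_cloudy = torch.distributions.Normal(55.0, 10.0).log_prob(temp).exp()
lik_sunny = torch.distributions.Normal(75.0, 15.0).log_prob(temp).exp()

# Joint densities p(cloudy, temp) and p(sunny, temp).
joint_cloudy = prior_cloudy * lik_cloudy
joint_sunny = (1.0 - prior_cloudy) * lik_sunny

# Bayes' rule: normalize the joint over the two conditions.
posterior_cloudy = joint_cloudy / (joint_cloudy + joint_sunny)
print("P(cloudy | temp = 70):", posterior_cloudy)
```

The result is roughly 0.18, lower than the prior of 0.3, since 70 degrees is more typical of a sunny day under this model.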

## Universality: Stochastic Recursion, Higher-order Stochastic Functions, and Random Control Flow

We've now seen how to define a simple model. Building off of it is easy. For example:

In [6]:
def ice_cream_sales():
    cloudy, temp = weather()
    expected_sales = 200. if cloudy == 'sunny' and temp > 80.0 else 50.
    ice_cream = pyro.sample('ice_cream', pyro.distributions.Normal(expected_sales, 10.0))
    return ice_cream

This kind of modularity, familiar to any programmer, is obviously very powerful. But is it powerful enough to encompass all the different kinds of models we'd like to express?

It turns out that because Pyro is embedded in Python, stochastic functions can contain arbitrarily complex deterministic Python and randomness can freely affect control flow. For example, we can construct recursive functions that terminate their recursion nondeterministically, provided we take care to pass `pyro.sample` unique sample names whenever it's called. For example we can define a geometric distribution that counts the number of failures until the first success like so:

In [7]:
def geometric(p, t=None):
    if t is None:
        t = 0
    x = pyro.sample("x_{}".format(t), pyro.distributions.Bernoulli(p))
    if x.item() == 1:
        return 0
    else:
        return 1 + geometric(p, t + 1)
    
print(geometric(0.5))

0


Note that the names `x_0`, `x_1`, etc., in `geometric()` are generated dynamically and that different executions can have different numbers of named random variables. 

We are also free to define stochastic functions that accept as input or produce as output other stochastic functions:

In [8]:
def normal_product(loc, scale):
    z1 = pyro.sample("z1", pyro.distributions.Normal(loc, scale))
    z2 = pyro.sample("z2", pyro.distributions.Normal(loc, scale))
    y = z1 * z2
    return y

def make_normal_normal():
    mu_latent = pyro.sample("mu_latent", pyro.distributions.Normal(0, 1))
    fn = lambda scale: normal_product(mu_latent, scale)
    return fn

print(make_normal_normal()(1.))

tensor(2.1493)


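Here `make_normal_normal()` is a stochastic function that takes no arguments and returns another stochastic function of one argument, `scale`. Executing `make_normal_normal()` and then calling the function it returns generates three named random variables: `mu_latent`, `z1`, and `z2`.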

The fact that Pyro supports arbitrary Python code like this (iteration, recursion, higher-order functions, etc.) in conjunction with random control flow means that Pyro stochastic functions are _universal_, i.e. they can be used to represent any computable probability distribution. As we will see in subsequent tutorials, this is incredibly powerful.

It is worth emphasizing that this is one reason why Pyro is built on top of PyTorch: dynamic computational graphs are an important ingredient in allowing for universal models that can benefit from GPU-accelerated tensor math.

## Next Steps

We've shown how we can use stochastic functions and primitive distributions to represent models in Pyro. In order to learn models from data and reason about them we need to be able to do inference. This is the subject of the [next tutorial](intro_part_ii.ipynb).