# Lesson 1

褚则伟 zeweichu@gmail.com

[Reference](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html)


What is PyTorch?
================

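PyTorch is a Python-based scientific computing library with the following features:

- It is similar to NumPy, but it can run on GPUs
- It can define deep learning models, and train and run them flexibly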

Tensors
---------------


A Tensor is similar to NumPy's ndarray; the key difference is that a Tensor can also run on a GPU to accelerate computation.


In [1]:
import torch

Construct an uninitialized 5x3 matrix:

In [2]:
x = torch.empty(5, 3)
x

tensor([[4.0273e-01, 6.8103e-43, 4.0274e-01],
        [6.8103e-43, 4.0274e-01, 6.8103e-43],
        [4.0274e-01, 6.8103e-43, 4.0277e-01],
        [6.8103e-43, 4.0277e-01, 6.8103e-43],
        [4.0278e-01, 6.8103e-43, 4.0278e-01]])

Construct a randomly initialized matrix:

In [4]:
x = torch.rand(5, 3)
x

tensor([[0.5000, 0.5785, 0.3924],
        [0.5834, 0.3587, 0.7411],
        [0.2405, 0.2649, 0.7328],
        [0.2475, 0.9244, 0.8744],
        [0.4679, 0.5440, 0.9812]])

Construct a matrix filled with zeros, with dtype long:

In [6]:
x = torch.zeros(5, 3, dtype=torch.long)
x

tensor([[0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0],
        [0, 0, 0]])

In [7]:
x.dtype

torch.int64

In [10]:
x = torch.zeros(5, 3).long()
x.dtype

torch.int64

Construct a tensor directly from data:

In [11]:
x = torch.tensor([5.5, 3])
x

tensor([5.5000, 3.0000])

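You can also construct a tensor from an existing tensor. These methods reuse the properties of the original tensor, such as its dtype, unless new values are provided.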

In [13]:
x = x.new_ones(5, 3)
x

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [14]:
x = torch.randn_like(x, dtype=torch.float)
x

tensor([[ 0.0504, -0.8949, -0.8185],
        [-0.5997,  0.5454, -0.0252],
        [ 0.2941,  0.3912, -0.6967],
        [-0.3260, -1.2392,  1.9875],
        [ 0.9372, -2.2931,  1.1308]])

Get the shape of a tensor:

In [16]:
x.shape

torch.Size([5, 3])

In [17]:
x.size()

torch.Size([5, 3])

<div class="alert alert-info"><h4>Note</h4><p>``x.size()`` returns a ``torch.Size``, which is a subclass of tuple and supports all tuple operations.</p></div>
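
Because ``torch.Size`` behaves like a tuple, the usual tuple idioms apply to it. A minimal sketch (the variable names are illustrative and not part of the lesson):

```python
import torch

x = torch.zeros(5, 3)

# torch.Size is a tuple subclass, so unpacking and indexing work as usual
rows, cols = x.shape
is_tuple = isinstance(x.shape, tuple)
last_dim = x.size()[-1]

print(rows, cols, is_tuple, last_dim)
```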

Operations
---------------


There are many kinds of tensor operations. We start with addition.



In [19]:
y = torch.rand(5, 3)
y

tensor([[0.5474, 0.5270, 0.7894],
        [0.0124, 0.3016, 0.8688],
        [0.2528, 0.7705, 0.1715],
        [0.8004, 0.8628, 0.8776],
        [0.4489, 0.7268, 0.8369]])

In [21]:
x + y

tensor([[ 0.5979, -0.3678, -0.0291],
        [-0.5873,  0.8470,  0.8435],
        [ 0.5469,  1.1618, -0.5252],
        [ 0.4744, -0.3764,  2.8651],
        [ 1.3861, -1.5663,  1.9676]])

Another syntax for addition:


In [22]:
torch.add(x, y)

tensor([[ 0.5979, -0.3678, -0.0291],
        [-0.5873,  0.8470,  0.8435],
        [ 0.5469,  1.1618, -0.5252],
        [ 0.4744, -0.3764,  2.8651],
        [ 1.3861, -1.5663,  1.9676]])

Addition: passing an output tensor as an argument

In [23]:
result = torch.empty(5, 3)
torch.add(x, y, out=result)
# equivalent to result = x + y, so preallocating result as above gains little here
result

tensor([[ 0.5979, -0.3678, -0.0291],
        [-0.5873,  0.8470,  0.8435],
        [ 0.5469,  1.1618, -0.5252],
        [ 0.4744, -0.3764,  2.8651],
        [ 1.3861, -1.5663,  1.9676]])

In-place addition

In [24]:
y

tensor([[0.5474, 0.5270, 0.7894],
        [0.0124, 0.3016, 0.8688],
        [0.2528, 0.7705, 0.1715],
        [0.8004, 0.8628, 0.8776],
        [0.4489, 0.7268, 0.8369]])

In [25]:
y.add_(x)
y

tensor([[ 0.5979, -0.3678, -0.0291],
        [-0.5873,  0.8470,  0.8435],
        [ 0.5469,  1.1618, -0.5252],
        [ 0.4744, -0.3764,  2.8651],
        [ 1.3861, -1.5663,  1.9676]])

<div class="alert alert-info"><h4>Note</h4><p>Every in-place operation has a name ending in ``_``.
    For example, ``x.copy_(y)`` and ``x.t_()`` both modify ``x``.</p></div>
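
The difference between the two flavours can be seen by comparing ``add`` with ``add_``. The following is a small sketch, with illustrative tensors, that checks whether the storage address changes:

```python
import torch

x = torch.ones(3)
y = torch.full((3,), 2.0)

z = x.add(y)              # out-of-place: returns a new tensor, x is unchanged
x_before = x.clone()

ptr = x.data_ptr()
x.add_(y)                 # in-place: overwrites x's own storage
same_storage = x.data_ptr() == ptr

print(x_before, z, x, same_storage)
```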

NumPy-style indexing works on PyTorch tensors.


In [26]:
x[:, 1:]

tensor([[-0.8949, -0.8185],
        [ 0.5454, -0.0252],
        [ 0.3912, -0.6967],
        [-1.2392,  1.9875],
        [-2.2931,  1.1308]])

Resizing: to resize/reshape a tensor, use ``Tensor.view``:

In [27]:
x = torch.rand(4,4)
x

tensor([[0.3336, 0.5088, 0.8541, 0.0020],
        [0.6179, 0.8664, 0.7470, 0.0266],
        [0.2415, 0.3528, 0.0577, 0.4929],
        [0.3647, 0.8183, 0.8700, 0.7377]])

In [28]:
y = x.view(16)
y

tensor([0.3336, 0.5088, 0.8541, 0.0020, 0.6179, 0.8664, 0.7470, 0.0266, 0.2415,
        0.3528, 0.0577, 0.4929, 0.3647, 0.8183, 0.8700, 0.7377])

In [30]:
z = x.view(2, 8)
z

tensor([[0.3336, 0.5088, 0.8541, 0.0020, 0.6179, 0.8664, 0.7470, 0.0266],
        [0.2415, 0.3528, 0.0577, 0.4929, 0.3647, 0.8183, 0.8700, 0.7377]])

In [31]:
z = x.view(-1, 8)
z

tensor([[0.3336, 0.5088, 0.8541, 0.0020, 0.6179, 0.8664, 0.7470, 0.0266],
        [0.2415, 0.3528, 0.0577, 0.4929, 0.3647, 0.8183, 0.8700, 0.7377]])
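
Two details of ``view`` are easy to miss: a ``-1`` dimension is inferred from the remaining ones, and the result shares storage with the original tensor. A minimal sketch with an illustrative tensor; the use of ``reshape`` for the non-contiguous case is an addition that the lesson itself does not cover:

```python
import torch

x = torch.arange(16.0).view(4, 4)
z = x.view(-1, 8)          # -1 is inferred as 16 / 8 = 2

# A view shares storage with its source, so a write through one is seen by the other
z[0, 0] = 100.0
shared = x[0, 0].item() == 100.0

# A transposed tensor is not contiguous, so .view(16) would raise an error;
# .reshape() copies the data when a view is not possible
t = x.t()
flat = t.reshape(16)

print(z.shape, shared, t.is_contiguous(), flat.shape)
```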

If you have a one-element tensor, use ``.item()`` to get its value as a Python number.

In [32]:
x = torch.randn(1)
x

tensor([1.2592])

In [34]:
x.item()   # extracts the value as a Python float

1.259178876876831

In [35]:
z.transpose(1, 0)

tensor([[0.3336, 0.2415],
        [0.5088, 0.3528],
        [0.8541, 0.0577],
        [0.0020, 0.4929],
        [0.6179, 0.3647],
        [0.8664, 0.8183],
        [0.7470, 0.8700],
        [0.0266, 0.7377]])

In [33]:
#dir(x)    # data, grad, and grad_fn are the useful attributes

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_priority__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__contains__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dict__',
 '__dir__',
 '__div__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__idiv__',
 '__ilshift__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pow__',
 '__radd__',
 '__rdiv__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rfloordiv__',
 '__rmul__',
 '__rpow__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__setattr__',
 '__se

**Further reading**


  Many more Tensor operations, including transposing, indexing, slicing,
  mathematical operations, linear algebra, and random numbers, are described at
  `<https://pytorch.org/docs/torch>`.

Converting between NumPy and Tensors
------------------------------------

Converting between a Torch Tensor and a NumPy array is easy.

The Torch Tensor and the NumPy array share the same underlying memory (when the Tensor is on the CPU), so changing one also changes the other.

Converting a Torch Tensor to a NumPy array


In [37]:
a = torch.ones(5)
a

tensor([1., 1., 1., 1., 1.])

In [38]:
b = a.numpy()
b

array([1., 1., 1., 1., 1.], dtype=float32)

Change a value in the NumPy array.

In [39]:
b[1] = 2
a

tensor([1., 2., 1., 1., 1.])

Converting a NumPy ndarray to a Torch Tensor

In [40]:
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
b

tensor([1., 1., 1., 1., 1.], dtype=torch.float64)

In [41]:
np.add(a, 1, out=a)
a

array([2., 2., 2., 2., 2.])

In [42]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

In [43]:
a = a + 1  # unlike np.add(a, 1, out=a), which writes into the existing buffer, this allocates a new array, so b no longer tracks a
a

array([3., 3., 3., 3., 3.])

In [44]:
b

tensor([2., 2., 2., 2., 2.], dtype=torch.float64)

In [45]:
torch.add(b, 1, out=b)
b

tensor([3., 3., 3., 3., 3.], dtype=torch.float64)

In [46]:
a

array([3., 3., 3., 3., 3.])

All Tensors on the CPU support conversion to NumPy and back.
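
Whether memory is shared depends on how the Tensor is created. As a small sketch with illustrative arrays: ``torch.from_numpy`` shares the buffer, while ``torch.tensor`` copies the data.

```python
import numpy as np
import torch

a = np.ones(3)
shared = torch.from_numpy(a)   # shares memory with a
copied = torch.tensor(a)       # always copies the data

a[0] = 5.0                     # write through the NumPy array

print(shared)   # sees the change
print(copied)   # keeps the original values
```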

CUDA Tensors
------------

Tensors can be moved to another device with the ``.to`` method.



In [47]:
torch.cuda.is_available()

True

In [50]:
if torch.cuda.is_available():
    device = torch.device("cuda")
    y = torch.ones_like(x, device=device)
    x = x.to(device)
    z = x + y
    print(z)
    print(z.to("cpu", torch.double))

tensor([2.2592], device='cuda:0')
tensor([2.2592], dtype=torch.float64)


In [53]:
# A Tensor on the GPU cannot be converted to NumPy directly; move it to the CPU first
# y.data.numpy()   # fails while y is on the GPU
y.to("cpu").data.numpy()

array([1.], dtype=float32)
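
Code that has to run on machines with and without a GPU usually picks the device once and reuses it. The following is a minimal sketch of that pattern, not part of the original lesson; the tensor names are illustrative:

```python
import torch

# Use the GPU when one is present, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.ones(3, device=device)
y = x * 2                      # the result lives on the same device as x

# .cpu() returns the tensor itself when it is already on the CPU,
# so this line is safe on either kind of machine
y_np = y.cpu().numpy()
print(device, y_np)
```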

In [None]:
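model = model.cuda()  # move the model to the GPU; anything used in a GPU computation must be moved there first (assumes model is an nn.Module, like the ones defined later in this lesson)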


Warm-up: a two-layer neural network in NumPy
--------------------------------------------

A fully connected ReLU network with one hidden layer and no bias, trained to predict y from x with an L2 loss.<br>
* $h = XW_1$
* $a = \max(0, h)$
* $\hat{y} = aW_2$

This implementation uses only NumPy to compute the forward pass, the loss, and the backward pass.
* forward pass
* loss
* backward pass

A NumPy ndarray is a generic n-dimensional array. It knows nothing about deep learning, gradients, or computation graphs; it is just a data structure for numerical computation.
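
Since NumPy cannot differentiate anything for us, the backward pass has to be derived by hand with the chain rule. A sketch of that derivation, assuming the shapes used in the code ($X$ is $N \times D_{in}$, so the hidden layer is $h = XW_1$, $a = \max(0, h)$ and $\hat{y} = aW_2$), with $\odot$ denoting the element-wise product:

$$
\begin{aligned}
L &= \sum_{n,k} (\hat{y}_{nk} - y_{nk})^2 \\
\frac{\partial L}{\partial \hat{y}} &= 2(\hat{y} - y) \\
\frac{\partial L}{\partial W_2} &= a^\top \frac{\partial L}{\partial \hat{y}} \\
\frac{\partial L}{\partial a} &= \frac{\partial L}{\partial \hat{y}} W_2^\top \\
\frac{\partial L}{\partial h} &= \frac{\partial L}{\partial a} \odot \mathbb{1}[h \ge 0] \\
\frac{\partial L}{\partial W_1} &= X^\top \frac{\partial L}{\partial h}
\end{aligned}
$$

The indicator $\mathbb{1}[h \ge 0]$ mirrors the code, which zeroes the gradient only where `h < 0`; the value chosen exactly at $h = 0$ is a convention.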



In [55]:
N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6

for it in range(500):
    # forward pass
    h = x.dot(w1)   # N*H
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)   # N*D_out
    
    # compute loss
    loss = np.square(y_pred-y).sum()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h<0] = 0
    grad_w1 = x.T.dot(grad_h)
    
    # update the weights w1 and w2
    w1 -= learning_rate  * grad_w1
    w2 -= learning_rate * grad_w2

0 29228837.983687624
1 24198719.633842565
2 23304362.816860273
3 23138236.945732567
4 21440415.137940705
5 17677044.729448378
6 12694815.893696152
7 8181481.081893332
8 4953331.466966225
9 2977358.010056023
10 1858002.7976885657
11 1235154.7663061763
12 880951.6961961221
13 668692.0478266304
14 532429.4887859911
15 438439.6370128431
16 369591.88163513603
17 316454.4263452521
18 273924.58833620267
19 239011.14064763562
20 209854.62084150483
21 185155.22523234415
22 164041.22216083
23 145867.86745498492
24 130127.7509176284
25 116431.02644030296
26 104461.6063021034
27 93966.22293483978
28 84729.62457731686
29 76579.02304912546
30 69363.8234690314
31 62956.305641464955
32 57250.98319896429
33 52160.08516658566
34 47605.818911365845
35 43520.440433598415
36 39849.35769058479
37 36544.9635105283
38 33563.42940815117
39 30870.142830796656
40 28431.835909946523
41 26218.916704759213
42 24211.412922572483
43 22384.30790953869
44 20719.529471764206
45 19199.11507354322
46 17808.62042018764
47 
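
Hand-written gradients are easy to get wrong, so it is worth comparing them against finite differences. The following is a sketch under stated assumptions: the sizes are tiny and hypothetical (chosen for speed), the helper `loss_fn` is introduced only for this check, and only `grad_w1` is verified.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D_in, H, D_out = 4, 5, 3, 2      # tiny hypothetical sizes, for speed
x = rng.standard_normal((N, D_in))
y = rng.standard_normal((N, D_out))
w1 = rng.standard_normal((D_in, H))
w2 = rng.standard_normal((H, D_out))

def loss_fn(w1, w2):
    # forward pass and L2 loss, same as in the training loop
    h_relu = np.maximum(x.dot(w1), 0)
    return np.square(h_relu.dot(w2) - y).sum()

# Analytic gradient of the loss w.r.t. w1, same formulas as the training loop
h = x.dot(w1)
h_relu = np.maximum(h, 0)
grad_y_pred = 2.0 * (h_relu.dot(w2) - y)
grad_h = grad_y_pred.dot(w2.T)
grad_h[h < 0] = 0
grad_w1 = x.T.dot(grad_h)

# Central finite differences, one weight at a time
eps = 1e-6
num_grad_w1 = np.zeros_like(w1)
for i in range(D_in):
    for j in range(H):
        step = np.zeros_like(w1)
        step[i, j] = eps
        num_grad_w1[i, j] = (loss_fn(w1 + step, w2) - loss_fn(w1 - step, w2)) / (2 * eps)

max_abs_diff = np.abs(grad_w1 - num_grad_w1).max()
print(max_abs_diff)   # should be tiny if the formulas agree
```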


PyTorch: Tensors
----------------

This time we use PyTorch tensors to build the forward pass, compute the loss, and run the backward pass.

A PyTorch Tensor is much like a NumPy ndarray. The biggest difference is that a PyTorch Tensor can run on either the CPU or the GPU. To run on the GPU, move the Tensor to a CUDA device.


In [56]:
N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H)
w2 = torch.randn(H, D_out)

learning_rate = 1e-6

for it in range(500):
    # forward pass
    h = x.mm(w1)   # N*H
    h_relu = h.clamp(min=0)   # clamp acts like a clip: values below 0 are set to 0 (ReLU)
    y_pred = h_relu.mm(w2)   # N*D_out
    
    # compute loss
    loss = (y_pred-y).pow(2).sum().item()
    print(it, loss)
    
    # Backward pass
    # compute the gradient
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h<0] = 0
    grad_w1 = x.t().mm(grad_h)
    
    # update the weights w1 and w2
    w1 -= learning_rate  * grad_w1
    w2 -= learning_rate * grad_w2

0 30791678.0
1 26445046.0
2 26094528.0
3 25603024.0
4 22585934.0
5 17110796.0
6 11192685.0
7 6643192.0
8 3854957.25
9 2337365.0
10 1537026.75
11 1102976.0
12 849898.6875
13 688406.5
14 575851.625
15 491679.0
16 425364.125
17 371280.78125
18 326206.375
19 288096.875
20 255565.078125
21 227582.28125
22 203352.046875
23 182242.671875
24 163824.5625
25 147687.671875
26 133457.5
27 120875.078125
28 109716.1953125
29 99784.171875
30 90923.578125
31 82982.46875
32 75860.0390625
33 69458.78125
34 63693.89453125
35 58488.7265625
36 53781.82421875
37 49522.078125
38 45655.3125
39 42136.515625
40 38925.75390625
41 35998.23046875
42 33322.546875
43 30874.09765625
44 28629.70703125
45 26570.5
46 24679.21484375
47 22940.9765625
48 21341.61328125
49 19867.580078125
50 18507.841796875
51 17256.236328125
52 16099.46875
53 15028.9267578125
54 14037.8076171875
55 13119.4931640625
56 12267.853515625
57 11477.5146484375
58 10743.4365234375
59 10061.3984375
60 9426.9599609375
61 8836.337890625
62 8286.03222

389 0.0014583675656467676
390 0.0014074859209358692
391 0.001359925139695406
392 0.001312759704887867
393 0.0012702061794698238
394 0.0012273528845980763
395 0.0011851646704599261
396 0.0011452364269644022
397 0.0011088816681876779
398 0.0010719896527007222
399 0.001038785558193922
400 0.0010048933327198029
401 0.0009719434310682118
402 0.0009411567589268088
403 0.0009119894239120185
404 0.0008824139949865639
405 0.0008561529102735221
406 0.0008304755319841206
407 0.0008055629441514611
408 0.0007815311546437442
409 0.000758456124458462
410 0.0007338350987993181
411 0.0007123203831724823
412 0.000692638277541846
413 0.0006712055183015764
414 0.000650823290925473
415 0.0006328407325781882
416 0.000615122786257416
417 0.0005976948887109756
418 0.0005814839387312531
419 0.0005641697207465768
420 0.0005480671534314752
421 0.0005319354822859168
422 0.000517569191288203
423 0.0005045298021286726
424 0.000489511585328728
425 0.00047623959835618734
426 0.00046370894415304065
427 0.0004513983731

A simple autograd example

In [65]:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

y = w * x + b   #  y = 2*1+3

y.backward()
# dy / dw = x
print(w.grad)
print(b.grad)
print(x.grad)

tensor(1.)
tensor(1.)
tensor(2.)



PyTorch: Tensors and autograd
-------------------------------

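One of PyTorch's most important features is autograd: once you have defined the forward pass and computed the loss, PyTorch can automatically compute the gradient of the loss with respect to every model parameter.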

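A PyTorch Tensor represents a node in the computation graph. If ``x`` is a Tensor with ``x.requires_grad=True``, then ``x.grad`` is another Tensor holding the gradient of some scalar (usually the loss) with respect to ``x``.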


In [90]:
N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

learning_rate = 1e-6

for it in range(500):
    # forward pass
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
    # compute loss
    loss = (y_pred-y).pow(2).sum()  #  computation graph
    print(it, loss.item())
    
    # Backward pass
    # compute the gradient
    loss.backward()
    
    with torch.no_grad():
        # update the weights w1 and w2
        w1 -= learning_rate  * w1.grad
        w2 -= learning_rate * w2.grad

        w1.grad.zero_()
        w2.grad.zero_()

0 31616036.0
1 26910740.0
2 23115910.0
3 18284996.0
4 13019692.0
5 8476639.0
6 5308443.0
7 3353285.5
8 2222494.75
9 1566421.125
10 1171352.875
11 917815.75
12 743820.8125
13 616951.625
14 519623.0625
15 442455.40625
16 379774.625
17 328044.9375
18 284879.25
19 248502.0
20 217591.625
21 191160.09375
22 168469.421875
23 148881.375
24 131900.125
25 117140.6015625
26 104250.53125
27 92962.9375
28 83061.46875
29 74354.421875
30 66671.0078125
31 59879.7578125
32 53865.0546875
33 48525.4296875
34 43776.35546875
35 39549.33203125
36 35775.55078125
37 32400.7109375
38 29378.3125
39 26669.7421875
40 24238.06640625
41 22050.6796875
42 20081.69921875
43 18305.859375
44 16704.373046875
45 15258.2119140625
46 13951.43359375
47 12767.556640625
48 11694.2470703125
49 10720.9619140625
50 9835.8359375
51 9031.12890625
52 8298.5986328125
53 7631.13330078125
54 7022.35205078125
55 6466.5947265625
56 5959.0322265625
57 5495.00341796875
58 5070.2724609375
59 4681.19775390625
60 4324.720703125
61 3997.661132

404 0.0003731283650267869
405 0.0003634344320744276
406 0.00035453628515824676
407 0.0003459243453107774
408 0.0003372969222255051
409 0.00032950006425380707
410 0.0003206252004019916
411 0.00031325581949204206
412 0.0003054137050639838
413 0.00029831999563612044
414 0.00029064068803563714
415 0.00028391333762556314
416 0.0002772982115857303
417 0.0002702448982745409
418 0.0002639677841216326
419 0.0002585936163086444
420 0.00025243283016607165
421 0.00024709024000912905
422 0.00024126422067638487
423 0.00023520219838246703
424 0.00023066128778737038
425 0.00022541631187777966
426 0.00022062752395868301
427 0.00021593015117105097
428 0.00021119722805451602
429 0.0002068128960672766
430 0.0002025668800342828
431 0.00019842162146233022
432 0.00019428404630161822
433 0.00018990600074175745
434 0.00018565051141195
435 0.00018189370166510344
436 0.0001781819883035496
437 0.00017459750233683735
438 0.00017080614634323865
439 0.00016727806359995157
440 0.00016375993436668068
441 0.00016084051

In [80]:
N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)


w1 = torch.randn(D_in, H, requires_grad=True)
w2 = torch.randn(H, D_out, requires_grad=True)

y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
# compute loss
loss = (y_pred-y).pow(2).sum()  #  computation graph
loss.backward()

w1.grad

tensor([[ -4759.0889,  -3616.0903,  -3843.6018,  ..., -11072.5781,
          -2451.3350,  -1766.1671],
        [ -7967.2842,    526.7576,   -284.1399,  ...,   4579.7217,
          -3875.2419,  -6572.7490],
        [  3489.5291,  -5378.6558,  31709.0645,  ...,   3620.1050,
         -12476.5996,  10473.9258],
        ...,
        [  6210.0361,   3189.0911,  -2162.0293,  ..., -10237.3037,
          -3920.1753,   1129.4575],
        [  6645.1338,   -698.3760,   3044.2788,  ...,   1304.6511,
           2051.7354,  -5745.3848],
        [   941.8529,   2244.2927,  12504.7578,  ..., -11249.3877,
           9321.3955,   9463.3652]])

In [89]:
y_pred = x.mm(w1).clamp(min=0).mm(w2)
    
# compute loss
loss = (y_pred-y).pow(2).sum()  #  computation graph
loss.backward()

w1.grad

tensor([[ -47590.8945,  -36160.9023,  -38436.0195,  ..., -110725.7812,
          -24513.3516,  -17661.6719],
        [ -79672.8438,    5267.5767,   -2841.3989,  ...,   45797.2227,
          -38752.4180,  -65727.4922],
        [  34895.2891,  -53786.5586,  317090.6562,  ...,   36201.0508,
         -124766.0078,  104739.2500],
        ...,
        [  62100.3555,   31890.9141,  -21620.2930,  ..., -102373.0391,
          -39201.7539,   11294.5742],
        [  66451.3359,   -6983.7598,   30442.7910,  ...,   13046.5127,
           20517.3516,  -57453.8438],
        [   9418.5293,   22442.9277,  125047.5781,  ..., -112493.8906,
           93213.9609,   94633.6641]])
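
In the two cells above, the second `w1.grad` is ten times the first. That is what happens when `loss.backward()` is called repeatedly without zeroing: each call adds to `.grad` instead of overwriting it, which is why the training loop calls `zero_()` after every update. A minimal sketch of the same effect on a scalar (the tensor `w` is illustrative):

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()      # d(3w)/dw = 3

(3 * w).backward()         # no zeroing: the new gradient is added to the old one
second = w.grad.item()

w.grad.zero_()             # reset, as the training loop does after each update
(3 * w).backward()
third = w.grad.item()

print(first, second, third)
```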


PyTorch: nn
-----------


This time we use PyTorch's nn package to build the network.
PyTorch autograd still builds the computation graph
and computes the gradients for us automatically.
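
Each ``nn.Linear`` layer owns its weight and bias as parameters with ``requires_grad=True``, which is how autograd and ``model.parameters()`` find them. A small sketch with hypothetical sizes:

```python
import torch

layer = torch.nn.Linear(4, 3)      # maps 4 input features to 3 outputs

# weight is stored as (out_features, in_features); bias as (out_features,)
shapes = {name: tuple(p.shape) for name, p in layer.named_parameters()}
all_trainable = all(p.requires_grad for p in layer.parameters())

out = layer(torch.randn(2, 4))     # a batch of 2 samples
print(shapes, all_trainable, out.shape)
```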




In [101]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
# key point: re-initialize the weights from a standard normal distribution, as in the earlier examples
torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)
#model = model.cuda()

loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-6

for it in range(500):
    # forward pass
    y_pred = model(x)   # model.forward()
    
    # compute loss
    loss = loss_fn(y_pred, y)  # sum of squared errors; builds the computation graph
    print(it, loss.item())
    
    
    model.zero_grad()
    # Backward pass
    # compute the gradient
    loss.backward()
    
    with torch.no_grad():
        for param in model.parameters():   # param (tensor, grad)
            param -= learning_rate * param.grad

0 35238972.0
1 33932324.0
2 37197840.0
3 38079456.0
4 31585246.0
5 20214708.0
6 10358612.0
7 4953301.5
8 2588858.25
9 1606277.25
10 1155294.5
11 908820.375
12 748443.25
13 631018.5
14 538933.125
15 464240.78125
16 402280.40625
17 350378.40625
18 306484.4375
19 269148.40625
20 237199.09375
21 209730.34375
22 186008.375
23 165437.0625
24 147528.90625
25 131894.3125
26 118202.921875
27 106162.625
28 95544.0
29 86154.015625
30 77827.5
31 70433.21875
32 63848.4609375
33 57966.515625
34 52700.1484375
35 47975.8984375
36 43729.375
37 39906.140625
38 36459.12109375
39 33346.390625
40 30535.552734375
41 27990.8125
42 25682.162109375
43 23584.416015625
44 21676.525390625
45 19939.251953125
46 18356.3984375
47 16912.81640625
48 15593.1484375
49 14386.158203125
50 13282.5390625
51 12270.5078125
52 11342.6025390625
53 10491.02734375
54 9710.822265625
55 8995.533203125
56 8337.0966796875
57 7730.38671875
58 7171.224609375
59 6655.3779296875
60 6179.17138671875
61 5739.3876953125
62 5332.9736328125
6

408 0.00012189468543510884
409 0.00011925929720746353
410 0.00011706296936608851
411 0.00011444018309703097
412 0.0001122946705436334
413 0.00010994556942023337
414 0.0001074321917258203
415 0.00010558062786003575
416 0.00010360986198065802
417 0.00010137570643564686
418 9.942957694875076e-05
419 9.783959831111133e-05
420 9.589402179699391e-05
421 9.460079309064895e-05
422 9.269714064430445e-05
423 9.082678298000246e-05
424 8.929644536692649e-05
425 8.765991515247151e-05
426 8.612287638243288e-05
427 8.462368714390323e-05
428 8.30735734780319e-05
429 8.1426820543129e-05
430 8.028095180634409e-05
431 7.868491957196966e-05
432 7.71208869991824e-05
433 7.561939128208905e-05
434 7.443901267834008e-05
435 7.359635492321104e-05
436 7.223497959785163e-05
437 7.088216807460412e-05
438 6.959830352570862e-05
439 6.855576793896034e-05
440 6.739485979778692e-05
441 6.618841143790632e-05
442 6.495422712760046e-05
443 6.412174843717366e-05
444 6.30939903203398e-05
445 6.219585338840261e-05
446 6.119


PyTorch: optim
--------------

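This time we no longer update the model's weights by hand; instead we let the optim package update the parameters for us.
The optim package provides a range of optimization algorithms, including SGD+momentum, RMSProp, and Adam.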
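
For plain SGD without momentum, ``optimizer.step()`` applies the same rule as the manual update used earlier, ``w -= lr * w.grad``. A minimal sketch on a single illustrative parameter:

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

loss = (w ** 2).sum()
loss.backward()                        # gradient is 2 * w = [2, 4]

expected = w.detach() - 0.1 * w.grad   # the manual update rule
optimizer.step()                       # plain SGD applies the same rule

matches = torch.allclose(w.detach(), expected)
print(w, matches)
```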


In [109]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)

torch.nn.init.normal_(model[0].weight)
torch.nn.init.normal_(model[2].weight)

loss_fn = nn.MSELoss(reduction='sum')
# learning_rate = 1e-4
# optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

learning_rate = 1e-6
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for it in range(500):
    # forward pass
    y_pred = model(x)   # model.forward()
    
    # compute loss
    loss = loss_fn(y_pred, y)  # sum of squared errors; builds the computation graph
    print(it, loss.item())
    
    
    optimizer.zero_grad()
    # Backward pass
    # compute the gradient
    loss.backward()
    
    # update model parameters
    optimizer.step()

0 30046644.0
1 26872956.0
2 28097106.0
3 28902388.0
4 26177116.0
5 19417960.0
6 11955322.0
7 6495286.0
8 3477305.25
9 1995551.25
10 1280588.5
11 912005.8125
12 700790.125
13 564917.3125
14 468560.75
15 395407.46875
16 337437.5
17 290232.53125
18 251112.59375
19 218351.125
20 190684.0625
21 167157.421875
22 147045.3125
23 129786.7109375
24 114939.4140625
25 102085.625
26 90920.421875
27 81176.609375
28 72648.046875
29 65172.3125
30 58591.8828125
31 52785.984375
32 47647.140625
33 43085.46875
34 39026.28125
35 35407.8359375
36 32174.26171875
37 29280.078125
38 26685.390625
39 24352.009765625
40 22250.689453125
41 20356.501953125
42 18646.419921875
43 17099.05859375
44 15697.3388671875
45 14425.6923828125
46 13269.6142578125
47 12217.634765625
48 11258.74609375
49 10383.6025390625
50 9584.3984375
51 8853.5498046875
52 8184.451171875
53 7571.14306640625
54 7008.18603515625
55 6491.107421875
56 6015.79443359375
57 5578.5224609375
58 5176.0947265625
59 4805.8056640625
60 4464.119140625
61 41

429 5.89289229537826e-05
430 5.805636101285927e-05
431 5.718878674088046e-05
432 5.6409957323921844e-05
433 5.519092519534752e-05
434 5.436176434159279e-05
435 5.337605398381129e-05
436 5.264154606265947e-05
437 5.1787013944704086e-05
438 5.111621067044325e-05
439 5.0486116379033774e-05
440 4.982515747542493e-05
441 4.89888661832083e-05
442 4.852009078604169e-05
443 4.779280425282195e-05
444 4.688429908128455e-05
445 4.6387413021875545e-05
446 4.554789120447822e-05
447 4.469692794373259e-05
448 4.39111863670405e-05
449 4.324377732700668e-05
450 4.274714592611417e-05
451 4.195357541902922e-05
452 4.1634189983597025e-05
453 4.100361547898501e-05
454 4.048449409310706e-05
455 4.005493246950209e-05
456 3.954900967073627e-05
457 3.889204890583642e-05
458 3.8488626159960404e-05
459 3.787735477089882e-05
460 3.7536727177212015e-05
461 3.7010198866482824e-05
462 3.6553730751620606e-05
463 3.6164652556180954e-05
464 3.554927388904616e-05
465 3.528537490637973e-05
466 3.48001231031958e-05
467 3.


PyTorch: custom nn Modules
--------------------------

We can define a model by subclassing nn.Module. When you need a model more complex than a Sequential, write your own nn.Module subclass.
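
One case that a plain Sequential cannot express is a layer whose input is reused further down, such as a skip connection. The following is a sketch only; `ResidualBlock` is a hypothetical module introduced for illustration and is not part of the lesson:

```python
import torch

class ResidualBlock(torch.nn.Module):
    """Hypothetical block: adds the input back onto the transformed output."""

    def __init__(self, dim):
        super().__init__()
        self.linear = torch.nn.Linear(dim, dim)

    def forward(self, x):
        # x is used twice, which a purely sequential chain cannot do
        return x + torch.relu(self.linear(x))

block = ResidualBlock(8)
out = block(torch.randn(2, 8))
n_params = len(list(block.parameters()))   # weight and bias of the inner Linear
print(out.shape, n_params)
```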



In [110]:
import torch.nn as nn

N, D_in, H, D_out = 64, 1000, 100, 10   # N: number of training samples, D_in: input features, H: hidden units, D_out: output features

# create random training data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        
        # define the model architecture
        self.linear1 = torch.nn.Linear(D_in, H, bias=False)
        self.linear2 = torch.nn.Linear(H, D_out, bias=False)
    
    def forward(self, x):
        y_pred = self.linear2(self.linear1(x).clamp(min=0))
        return y_pred

model = TwoLayerNet(D_in, H, D_out)
loss_fn = nn.MSELoss(reduction='sum')
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


for it in range(500):
    # forward pass
    y_pred = model(x)   # model.forward()
    
    # compute loss
    loss = loss_fn(y_pred, y)  # sum of squared errors; builds the computation graph
    print(it, loss.item())
    
    
    optimizer.zero_grad()
    # Backward pass
    # compute the gradient
    loss.backward()
    
    # update model parameters
    optimizer.step()

0 677.4461059570312
1 660.1953125
2 643.40283203125
3 627.0911254882812
4 611.3529052734375
5 596.0787353515625
6 581.1334838867188
7 566.6232299804688
8 552.5125122070312
9 538.8779296875
10 525.659912109375
11 512.8281860351562
12 500.3941650390625
13 488.2879638671875
14 476.56317138671875
15 465.1693420410156
16 454.06756591796875
17 443.27435302734375
18 432.8061218261719
19 422.67767333984375
20 412.8088684082031
21 403.2757568359375
22 394.0106201171875
23 384.9906921386719
24 376.22393798828125
25 367.6571044921875
26 359.3021240234375
27 351.13427734375
28 343.1561279296875
29 335.37164306640625
30 327.7716369628906
31 320.35675048828125
32 313.0692138671875
33 305.9927978515625
34 299.08563232421875
35 292.3280029296875
36 285.73370361328125
37 279.2716064453125
38 272.9289855957031
39 266.7222595214844
40 260.62432861328125
41 254.62646484375
42 248.77110290527344
43 243.0587615966797
44 237.45980834960938
45 231.94979858398438
46 226.5487518310547
47 221.24937438964844
48 2

404 0.0005624009063467383
405 0.0005429052980616689
406 0.0005240599857643247
407 0.0005058430833742023
408 0.00048823212273418903
409 0.0004712125810328871
410 0.00045476065133698285
411 0.00043887001811526716
412 0.000423512130510062
413 0.00040866376366466284
414 0.00039432247285731137
415 0.00038046445115469396
416 0.0003670684527605772
417 0.00035413316800259054
418 0.00034162792144343257
419 0.0003295606002211571
420 0.00031788949854671955
421 0.00030662002973258495
422 0.00029573836945928633
423 0.00028522347565740347
424 0.0002750659768935293
425 0.0002652580151334405
426 0.0002557891421020031
427 0.0002466423320583999
428 0.00023780978517606854
429 0.00022928013640921563
430 0.0002210510428994894
431 0.0002130962529918179
432 0.0002054244832834229
433 0.00019800462177954614
434 0.00019085522217210382
435 0.00018394859216641635
436 0.00017728273815009743
437 0.00017084863793570548
438 0.00016464114014524966
439 0.0001586447178851813
440 0.00015286104462575167
441 0.000147278682

# FizzBuzz

FizzBuzz is a simple game. The rules: count up from 1; say fizz on a multiple of 3, buzz on a multiple of 5, and fizzbuzz on a multiple of 15; otherwise say the number itself.

We can write a small program that decides whether to return the number itself or fizz, buzz, or fizzbuzz.

In [1]:
# encode the desired output as a class index into [number, "fizz", "buzz", "fizzbuzz"]
def fizz_buzz_encode(i):
    if i % 15 == 0: return 3
    elif i % 5 == 0: return 2
    elif i % 3 == 0: return 1
    else:            return 0

def fizz_buzz_decode(i, prediction):
    return [str(i), "fizz", "buzz", "fizzbuzz"][prediction]

print(fizz_buzz_decode(1, fizz_buzz_encode(1)))
print(fizz_buzz_decode(2, fizz_buzz_encode(2)))
print(fizz_buzz_decode(5, fizz_buzz_encode(5)))
print(fizz_buzz_decode(12, fizz_buzz_encode(12)))
print(fizz_buzz_decode(15, fizz_buzz_encode(15)))

1
2
buzz
fizz
fizzbuzz


In [2]:
[str(1), "fizz", "buzz", "fizzbuzz"]

['1', 'fizz', 'buzz', 'fizzbuzz']

We first define the model's inputs and outputs (the training data).

In [3]:
import numpy as np
import torch

NUM_DIGITS = 15

def binary_encode(i, num_digits):
    return np.array([i >> d & 1 for d in range(num_digits)])

trX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(101, 2 ** NUM_DIGITS)])
trY = torch.LongTensor([fizz_buzz_encode(i) for i in range(101, 2 ** NUM_DIGITS)])

trX.shape

torch.Size([32667, 15])
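
`binary_encode` stores the least-significant bit first, so each row of `trX` is the number's binary digits in reverse reading order. A small sketch that makes this visible; `binary_decode` is a hypothetical inverse added here for illustration only:

```python
import numpy as np

def binary_encode(i, num_digits):
    # bit d of i, least-significant bit first
    return np.array([i >> d & 1 for d in range(num_digits)])

def binary_decode(bits):
    # hypothetical inverse of binary_encode, for illustration only
    return int(sum(int(b) << d for d, b in enumerate(bits)))

encoded = binary_encode(5, 4)      # 5 is 0101 in binary
decoded = binary_decode(encoded)
print(encoded, decoded)
```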

Then we define the model in PyTorch.

In [6]:
# Define the model
NUM_HIDDEN1 = 100
NUM_HIDDEN2 = 60
model = torch.nn.Sequential(
    torch.nn.Linear(NUM_DIGITS, NUM_HIDDEN1),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN1, NUM_HIDDEN2),
    torch.nn.ReLU(),
    torch.nn.Linear(NUM_HIDDEN2, 4)
)

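- To teach the model the FizzBuzz game, we need a loss function and an optimization algorithm.
- The optimizer repeatedly adjusts the parameters to reduce the loss, so that the model reaches as low a loss as possible on this task.
- A low loss usually means the model performs well; a high loss means it performs poorly.
- Since FizzBuzz is essentially a classification problem, we use the cross-entropy loss.
- For the optimizer we use stochastic gradient descent (SGD).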
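
``torch.nn.CrossEntropyLoss`` expects raw scores (logits) and integer class indices, and applies log-softmax internally, which is why the model above ends in a plain ``Linear`` layer. A minimal sketch with made-up logits for the four FizzBuzz classes:

```python
import torch

logits = torch.tensor([[2.0, 0.5, 0.1, -1.0]])   # raw scores for 4 classes
target = torch.tensor([0])                        # class index, not one-hot

loss = torch.nn.CrossEntropyLoss()(logits, target)

# Same value computed by hand: negative log-softmax at the target class
manual = -torch.log_softmax(logits, dim=1)[0, 0]

matches = torch.allclose(loss, manual)
print(loss.item(), manual.item(), matches)
```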

In [7]:
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

The training code follows.

In [8]:
# start training it
BATCH_SIZE = 128

for epoch in range(5000):
    for start in range(0, len(trX), BATCH_SIZE):
        end = start + BATCH_SIZE
        batchX = trX[start:end]
        batchY = trY[start:end]
        
        y_pred = model(batchX)
        loss = loss_fn(y_pred, batchY)
        
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    # find loss on training data
    loss = loss_fn(model(trX), trY).item()
    print('Epoch: ', epoch, 'Loss:', loss)

Epoch:  0 Loss: 1.138267993927002
Epoch:  1 Loss: 1.1378307342529297
Epoch:  2 Loss: 1.1375750303268433
Epoch:  3 Loss: 1.1374017000198364
Epoch:  4 Loss: 1.137266993522644
Epoch:  5 Loss: 1.13716721534729
Epoch:  6 Loss: 1.1370795965194702
Epoch:  7 Loss: 1.1370128393173218
Epoch:  8 Loss: 1.136949896812439
Epoch:  9 Loss: 1.1368873119354248
Epoch:  10 Loss: 1.1368368864059448
Epoch:  11 Loss: 1.1367844343185425
Epoch:  12 Loss: 1.136731505393982
Epoch:  13 Loss: 1.136678695678711
Epoch:  14 Loss: 1.1366205215454102
Epoch:  15 Loss: 1.136557936668396
Epoch:  16 Loss: 1.1364970207214355
Epoch:  17 Loss: 1.136421799659729
Epoch:  18 Loss: 1.136338233947754
Epoch:  19 Loss: 1.1362360715866089
Epoch:  20 Loss: 1.1361300945281982
Epoch:  21 Loss: 1.1359829902648926
Epoch:  22 Loss: 1.135834813117981
Epoch:  23 Loss: 1.135672688484192
Epoch:  24 Loss: 1.1354960203170776
Epoch:  25 Loss: 1.1353040933609009
Epoch:  26 Loss: 1.1351165771484375
Epoch:  27 Loss: 1.1348793506622314
Epoch:  28 Los

Epoch:  226 Loss: 0.14506632089614868
Epoch:  227 Loss: 0.14081089198589325
Epoch:  228 Loss: 0.5455402135848999
Epoch:  229 Loss: 0.10637131333351135
Epoch:  230 Loss: 0.28111568093299866
Epoch:  231 Loss: 0.245904803276062
Epoch:  232 Loss: 0.22363275289535522
Epoch:  233 Loss: 0.2082826793193817
Epoch:  234 Loss: 0.20129109919071198
Epoch:  235 Loss: 0.15661315619945526
Epoch:  236 Loss: 0.13792641460895538
Epoch:  237 Loss: 0.09344135224819183
Epoch:  238 Loss: 0.07729433476924896
Epoch:  239 Loss: 0.05960829555988312
Epoch:  240 Loss: 0.09379560500383377
Epoch:  241 Loss: 0.09196894615888596
Epoch:  242 Loss: 0.059797193855047226
Epoch:  243 Loss: 0.09074312448501587
Epoch:  244 Loss: 0.08205784857273102
Epoch:  245 Loss: 0.09499765932559967
Epoch:  246 Loss: 0.05609648674726486
Epoch:  247 Loss: 0.09355001896619797
Epoch:  248 Loss: 0.08232072740793228
Epoch:  249 Loss: 0.06323523074388504
Epoch:  250 Loss: 0.05026042088866234
Epoch:  251 Loss: 0.14104458689689636
Epoch:  252 Los

Epoch:  439 Loss: 0.0056419288739562035
Epoch:  440 Loss: 0.005249716341495514
Epoch:  441 Loss: 0.005272258538752794
Epoch:  442 Loss: 0.005351168569177389
Epoch:  443 Loss: 0.005225026980042458
Epoch:  444 Loss: 0.005404407158493996
Epoch:  445 Loss: 0.005112088285386562
Epoch:  446 Loss: 0.005360560491681099
Epoch:  447 Loss: 0.005011062603443861
Epoch:  448 Loss: 0.0051924255676567554
Epoch:  449 Loss: 0.004995359107851982
Epoch:  450 Loss: 0.005179122556000948
Epoch:  451 Loss: 0.004964151885360479
Epoch:  452 Loss: 0.0049764420837163925
Epoch:  453 Loss: 0.004921347834169865
Epoch:  454 Loss: 0.005083159543573856
Epoch:  455 Loss: 0.004995716270059347
Epoch:  456 Loss: 0.004817563574761152
Epoch:  457 Loss: 0.0049244193360209465
Epoch:  458 Loss: 0.004777616821229458
Epoch:  459 Loss: 0.004909788258373737
Epoch:  460 Loss: 0.004752723034471273
Epoch:  461 Loss: 0.00477380957454443
Epoch:  462 Loss: 0.004658168647438288
Epoch:  463 Loss: 0.0049100336618721485
...

Epoch:  647 Loss: 0.0017645381158217788
Epoch:  648 Loss: 0.0017914645140990615
Epoch:  649 Loss: 0.0017902274848893285
Epoch:  650 Loss: 0.0017523820279166102
Epoch:  651 Loss: 0.0017184412572532892
Epoch:  652 Loss: 0.0017474258784204721
Epoch:  653 Loss: 0.00173443544190377
Epoch:  654 Loss: 0.0017155759269371629
Epoch:  655 Loss: 0.0017276947619393468
Epoch:  656 Loss: 0.001683209789916873
Epoch:  657 Loss: 0.0016832036199048162
Epoch:  658 Loss: 0.0016827438957989216
Epoch:  659 Loss: 0.0016603353433310986
Epoch:  660 Loss: 0.0016437646700069308
Epoch:  661 Loss: 0.0016427726950496435
Epoch:  662 Loss: 0.0016352564562112093
Epoch:  663 Loss: 0.0016116272890940309
Epoch:  664 Loss: 0.0016137947095558047
Epoch:  665 Loss: 0.0015883657615631819
Epoch:  666 Loss: 0.001580648124217987
Epoch:  667 Loss: 0.0015681576915085316
Epoch:  668 Loss: 0.0015622233040630817
Epoch:  669 Loss: 0.0015531626995652914
Epoch:  670 Loss: 0.0015053011011332273
Epoch:  671 Loss: 0.0015365466242656112
...

Epoch:  853 Loss: 0.0007040233467705548
Epoch:  854 Loss: 0.0007047598483040929
Epoch:  855 Loss: 0.0007018456817604601
Epoch:  856 Loss: 0.0006984574138186872
Epoch:  857 Loss: 0.0006973659037612379
Epoch:  858 Loss: 0.0006946188514120877
Epoch:  859 Loss: 0.000691103283315897
Epoch:  860 Loss: 0.0006922857137396932
Epoch:  861 Loss: 0.0006902097375132143
Epoch:  862 Loss: 0.0006850683130323887
Epoch:  863 Loss: 0.0006865118630230427
Epoch:  864 Loss: 0.0006811614730395377
Epoch:  865 Loss: 0.0006824195152148604
Epoch:  866 Loss: 0.0006787504535168409
Epoch:  867 Loss: 0.0006756570073775947
Epoch:  868 Loss: 0.0006763159180991352
Epoch:  869 Loss: 0.0006739312084391713
Epoch:  870 Loss: 0.0006705901469103992
Epoch:  871 Loss: 0.0006703922990709543
Epoch:  872 Loss: 0.0006672871531918645
Epoch:  873 Loss: 0.0006639096536673605
Epoch:  874 Loss: 0.0006655961042270064
Epoch:  875 Loss: 0.0006629239651374519
Epoch:  876 Loss: 0.000659655430354178
Epoch:  877 Loss: 0.0006608147523365915
...

KeyboardInterrupt: 

Finally, we use the trained model to play FizzBuzz on the numbers 1 through 100.

In [9]:
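# Encode the integers 1..100 as binary feature vectors and run the model on them
testX = torch.Tensor([binary_encode(i, NUM_DIGITS) for i in range(1, 101)])
with torch.no_grad():  # inference only, so gradient tracking is not needed
    testY = model(testX)

# testY.max(1)[1] holds the index of the highest-scoring class for each input
predictions = zip(range(1, 101), testY.max(1)[1].tolist())

print([fizz_buzz_decode(i, x) for (i, x) in predictions])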

['1', '2', 'fizz', '4', 'buzz', 'fizz', '7', '8', 'fizz', 'buzz', '11', 'fizz', '13', '14', 'fizzbuzz', '16', '17', 'fizz', '19', 'buzz', 'fizz', '22', '23', 'fizz', 'buzz', '26', 'fizz', '28', '29', 'fizzbuzz', '31', '32', 'fizz', '34', 'buzz', 'fizz', '37', '38', 'fizz', 'buzz', '41', 'fizz', '43', '44', 'fizzbuzz', '46', '47', 'fizz', '49', 'buzz', 'fizz', '52', '53', 'fizz', 'buzz', '56', 'fizz', '58', '59', 'fizzbuzz', '61', '62', 'fizz', '64', 'buzz', 'fizz', '67', '68', 'fizz', 'buzz', '71', 'fizz', '73', '74', 'fizzbuzz', '76', '77', 'fizz', '79', 'buzz', 'fizz', '82', '83', 'fizz', 'buzz', '86', 'fizz', '88', '89', 'fizzbuzz', '91', '92', 'fizz', '94', 'buzz', 'fizz', '97', '98', 'fizz', 'buzz']
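
To check a printed list like the one above by eye, it can help to generate the expected answers with ordinary arithmetic. The following is a minimal sketch, not part of the original notebook: the function name `reference_fizz_buzz` is hypothetical, and the lowercase labels are chosen only to match the strings printed above.

```python
def reference_fizz_buzz(i):
    """Return the FizzBuzz label for integer i using plain arithmetic."""
    if i % 15 == 0:          # divisible by both 3 and 5
        return "fizzbuzz"
    if i % 5 == 0:
        return "buzz"
    if i % 3 == 0:
        return "fizz"
    return str(i)            # otherwise the number itself, as a string


# Expected labels for 1..100, in the same format as the model's decoded output.
expected = [reference_fizz_buzz(i) for i in range(1, 101)]
print(expected[:15])
```

Comparing `expected` element by element with the decoded predictions gives the same information as the label-index comparison, but in human-readable form.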


In [10]:
testY.max(1)

torch.return_types.max(
values=tensor([17.7681, 17.1494, 13.3867, 17.8413, 12.3776, 19.1203, 25.4462, 14.3620,
        16.1881, 20.8343, 12.9536, 15.7778, 15.9730, 14.4690, 18.2388, 12.8996,
        16.2242, 12.7267, 12.2100, 16.9453, 14.9159, 21.9203, 19.6804, 16.6676,
        14.7927, 12.9554, 18.4135, 16.3170, 15.4389, 20.3333, 12.5983, 18.0805,
        16.2848, 20.6795, 14.3171, 10.0244, 19.8468, 17.7737, 15.7562, 10.8759,
        12.6595, 17.4195, 16.9138,  6.4123, 18.7537, 10.8812, 15.3287, 17.7650,
        16.1690, 16.9489, 14.3415, 17.7596, 19.7313, 14.6282, 17.9475, 13.4497,
        13.6473, 17.7920, 16.5408, 22.9646, 11.9299, 18.0382, 20.1401, 18.4846,
        12.7421, 20.2961, 25.7020, 15.0136, 15.8155, 17.5675, 17.4281, 14.8811,
        15.9034, 14.4747, 16.9561,  8.4638, 14.9793, 19.2727, 15.5045, 16.3638,
        15.3432, 22.4011, 20.7857, 16.8445, 16.9367, 15.6769, 19.8419, 16.8357,
        15.6094, 19.8485, 14.4447, 16.5610, 14.1885, 12.5160, 12.3527, 10.3508,
        ...
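
Calling `max(1)` on a 2-D tensor takes the maximum along dimension 1 and returns a pair of `values` and `indices`, which is why the code above uses `[1]` to pick out the predicted class indices. The sketch below illustrates this on a small made-up tensor; the logits and the class order (0 = number, 1 = fizz, 2 = buzz, 3 = fizzbuzz) are assumptions for illustration, not values from the trained model.

```python
import torch

# Hypothetical scores for 3 inputs over 4 classes
# (assumed class order: 0 = number, 1 = fizz, 2 = buzz, 3 = fizzbuzz).
logits = torch.tensor([[ 2.0, 0.1, -1.0,  0.3],
                       [ 0.2, 3.5,  0.0, -0.4],
                       [-1.0, 0.0,  0.5,  4.0]])

# max along dim 1 returns both the largest score per row and its column index.
values, indices = logits.max(1)

# argmax along dim 1 returns only the column index of the largest score.
same = logits.argmax(dim=1)

print(indices.tolist())
print(torch.equal(indices, same))
```

When only the predicted class is needed, `argmax(dim=1)` states the intent a little more directly than `max(1)[1]`.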

In [11]:
# Count the predictions that match the ground-truth labels, then show the per-number comparison
print(np.sum(testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1,101)])))
testY.max(1)[1].numpy() == np.array([fizz_buzz_encode(i) for i in range(1,101)])

100


array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True])