# Performance Tips

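+ Write type-stable code
  - Julia runs a program in two stages, compilation and execution. Code whose variable types can be determined at compile time is type stable; if the types can only be known once the program actually runs, the code is type unstable. Use `@code_warntype` to check.
  - The Julia compiler infers types automatically, so type annotations are generally unnecessary, except where the types would otherwise be ambiguous.
+ Avoid memory allocations
  - Where appropriate, use array views instead of copies
+ Parallel computing (for `for` loops whose result does not depend on the iteration order)
  - `@simd`: single instruction, multiple data (SIMD)
  - `@turbo`: a more advanced `@simd` from LoopVectorization.jl; it can replace `@inbounds @simd`
  - `Threads.@threads`: multiple threads in a single process, sharing one memory space
  - Multiple processes with distributed data (or with the data copied to every other Julia process); each Julia process has its own memory
  - Of these, prefer `@turbo` first, then multithreading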
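A minimal sketch of `Threads.@threads` from the list above, assuming Julia was started with several threads (e.g. `julia --threads 4`); with a single thread it still runs correctly, just serially. The function name `threaded_square!` is hypothetical.

```julia
using Base.Threads   # exports @threads and nthreads

# Hypothetical example: square each element in parallel.
# Every iteration writes a different index of `out`, so no two threads touch the same element.
function threaded_square!(out, x)
    @threads for i in eachindex(out, x)
        out[i] = x[i]^2
    end
    return out
end

x = collect(1.0:8.0)
out = similar(x)
threaded_square!(out, x)
println("threads used: ", nthreads(), ", result: ", out)
```

A reduction such as `s += x[i]` inside `@threads` would instead be a data race, because all threads would update the same `s`; for reductions, `@simd`/`@turbo` or per-thread partial sums are the safer choice.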

## Avoid global variables
+ Avoid defining global variables such as `global x = 0`
+ Defining global constants with `const x = 0` is good for performance

## Comparing performance with @btime

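`@btime` (from BenchmarkTools.jl) reports execution time and memory allocations. Julia's built-in `@time` includes compilation time on the first call. Prefer `@btime`: like `@benchmark`, it runs the code many times and reports the minimum time, which excludes compilation; the trade-off is that benchmarking itself can be slow. When benchmarking with global variables, interpolate them with `$` (e.g. `@btime f($a)`) so that global-variable access is not part of the measurement.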
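To make the idea concrete, here is a rough sketch using only Base: warm up once so compilation is excluded, then keep the minimum of repeated timings. `min_time` is a hypothetical helper, not part of BenchmarkTools, and it skips the statistical care BenchmarkTools takes (evaluations per sample, timer resolution, GC).

```julia
# Hypothetical helper illustrating the idea behind @btime (not BenchmarkTools' implementation)
function min_time(f; n = 1_000)
    f()                                 # warm-up call: triggers compilation, not measured
    best = Inf
    for _ in 1:n
        best = min(best, @elapsed f())  # @elapsed returns wall-clock seconds as a Float64
    end
    return best
end

first_call = @elapsed mapreduce(x -> sin(x), +, 1:10)   # may include compilation time
t = min_time(() -> mapreduce(x -> sin(x), +, 1:10))
println("first call: ", first_call, " s; minimum over repeats: ", t, " s")
```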

In [1]:
using BenchmarkTools

In [2]:
@btime mapreduce(x->sin(x), +, 1:10)

  46.768 ns (0 allocations: 0 bytes)


1.4112

In [3]:
@benchmark mapreduce(x->sin(x), +, 1:10)

BenchmarkTools.Trial: 10000 samples with 990 evaluations.
 Range (min … max):  45.657 ns … 110.505 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     46.566 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   49.887 ns ±   6.228 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

## Checking type stability with @code_warntype

In [4]:
function mysum(A)
    s = 0
    for a in A
        s += a
    end
    return s
end

mysum (generic function with 1 method)

In [5]:
a = rand(10^7)
@code_warntype mysum(a)

MethodInstance for mysum(::Vector{Float64})
  from mysum(A) in Main at In[4]:1
Arguments
  #self#::Core.Const(mysum)
  A::Vector{Float64}
Locals
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Union{Float64, Int64}
  a::Float64
Body::Union{Float64, Int64}
1 ─       (s = 0)
│   %2  = A::Vector{Float64}
│         (@_3 = Base.iterate(%2))
│   %4  = (@_3 === nothing)::Bool
│   %5  = Base.not_int(%4)::Bool
└──       goto #4 if not %5
2 ┄ %7  = @_3::Tuple{Float64, Int64}
│         (a = Core.getfield(%7, 1))
│   %9  = Core.getfield(%7, 2)::Int64
│         (s = s + a)
│         (@_3 = Base.iterate(%2, %9))
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %1

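Pay attention to red text in the output: in the REPL, `@code_warntype` highlights type-unstable annotations in red. Here `s` is `Union{Float64, Int64}`, i.e. it may be either a `Float64` or an `Int64`: it starts as the `Int64` literal `0` and becomes a `Float64` once elements of `A` are added, so the function is type unstable. Yellow annotations, such as `Union{Nothing, Tuple{Float64, Int64}}` for the iterator state `@_3`, mark small unions that the compiler handles efficiently and are usually harmless.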
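A minimal sketch of the fix, assuming only the standard library: start the accumulator at `zero(eltype(A))` so `s` keeps one type throughout (`mysum_stable` is a hypothetical name). `Test.@inferred` throws an error when the inferred return type differs from the actual one, which makes it a handy programmatic complement to reading `@code_warntype`.

```julia
using Test   # provides @inferred

# Same as the notebook's mysum: `s` starts as the Int64 literal 0
function mysum(A)
    s = 0
    for a in A
        s += a
    end
    return s
end

# Hypothetical stable variant: the accumulator starts with A's element type
function mysum_stable(A)
    s = zero(eltype(A))
    for a in A
        s += a
    end
    return s
end

A = rand(100)
r = @inferred mysum_stable(A)   # passes: inferred return type is Float64

# @inferred throws because the actual type (Float64) differs from the inferred Union{Float64, Int64}
unstable_detected = try
    @inferred mysum(A)
    false
catch err
    err isa ErrorException
end
```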

In [164]:
function mysum3(A)
    s = zero(eltype(A))   # start from a zero of A's element type
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

mysum3 (generic function with 1 method)

In [171]:
@code_warntype mysum3(rand(10))

With `s = zero(eltype(A))`, every variable is concretely typed: `s` and the return value (`Body`) are inferred as `Float64` and `i` as `Int64`, so nothing is shown in red. Note that `zero(A)` (without `eltype`) would instead create a zero *vector* of the same size as `A`, and `s += A[i]` would then fail, because `+` is not defined between a vector and a scalar.

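In [103]:
iterate(rand(10))

(0.5840, 2)

`iterate` returns a `(value, state)` tuple, or `nothing` once the iteration is finished; this is why the iterator variable `@_3` in the `@code_warntype` output above has type `Union{Nothing, Tuple{Float64, Int64}}`.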
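Roughly speaking, a `for` loop is rewritten into calls to `iterate`. The sketch below spells that out by hand; `mysum_manual` is a hypothetical name, and this is a simplification of what the compiler generates, not its exact lowering.

```julia
# What `for a in A ... end` roughly turns into, using the iteration protocol
function mysum_manual(A)
    s = zero(eltype(A))
    next = iterate(A)              # `nothing` if A is empty, else (first element, state)
    while next !== nothing
        (a, state) = next
        s += a
        next = iterate(A, state)   # advance using the returned state
    end
    return s
end

v = [1.0, 2.0, 3.0]
total = mysum_manual(v)
```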


## Broadcasting vs. loops

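R and Matlab vectorized code is relatively fast because:
+ the vectorized operations are implemented as `for` loops in C, so they are not limited by the R or Matlab interpreter itself;
+ at the C level the numeric types are known, which enables more efficient optimizations (such as SIMD).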

In [8]:
N = 1000
a = randn(N)
b = randn(N)
c = rand(N)
d = randn(N) * 2;

In [9]:
function testdot(a, b, c, d)
    return sum(a .* b .+ c./ d .- 1) 
end

testdot (generic function with 1 method)

In [10]:
@btime testdot($a, $b, $c, $d)

  1.300 μs (1 allocation: 7.94 KiB)


-858.9943

In [11]:
function testloop(a, b, c, d)
    s = zero(eltype(a))
    @inbounds @simd for i in eachindex(a)
        s += a[i] * b[i] + c[i] / d[i] - 1
    end
    return s;
end

@btime testloop($a, $b, $c, $d)

  309.442 ns (0 allocations: 0 bytes)


-858.9943

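Note that the loop version here is clearly faster than the vectorized version (309 ns vs. 1.30 μs). The reasons:
+ The vectorized code uses more memory: it stores its intermediate result in a temporary array (here one vector of length `N`, the single 7.94 KiB allocation).
+ The loop uses `@inbounds`, which skips array bounds checks (combined with `eachindex`, the indices are guaranteed to be in range).
+ The loop uses `@simd`, CPU-level data parallelism (single instruction, multiple data). It works best on Intel CPUs with AVX instructions: the registers are wide enough (256 bits) that one instruction processes 4 loaded `Float64` values at once (4 × 64 = 256 bits). Caveats: apply it to the innermost loop, and only when the order of iterations does not affect the result, because it allows floating-point reductions to be reordered.
+ **In my tests, a loop without `@simd` and `@inbounds` is slower than the vectorized version.**

In practice, vectorized expressions are concise and easy to read. If you rewrite one as a `for` loop for speed, put the loop in its own function to keep the rest of the program concise.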
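To illustrate the trade-off, a sketch with three equivalent versions of `testdot` (the function names are hypothetical and the data are deterministic stand-ins for `randn`): the fused broadcast, the loop wrapped in its own function as suggested above, and `sum` with a function argument, which keeps a one-line style without allocating the temporary array. The results agree only up to floating-point rounding, since `@simd` and `sum` may add terms in a different order.

```julia
# Deterministic test data; d stays in [1, 3], so there is no division by a value near zero
n = 1000
a = [sin(i) for i in 1:n]
b = [cos(i) for i in 1:n]
c = fill(0.5, n)
d = [2.0 + sin(i) for i in 1:n]

# 1. Fused broadcast: concise, but allocates one temporary vector before `sum`
shifted_dot_bcast(a, b, c, d) = sum(a .* b .+ c ./ d .- 1)

# 2. Explicit loop in its own function: no temporary array
function shifted_dot_loop(a, b, c, d)
    s = zero(eltype(a))
    @inbounds @simd for i in eachindex(a, b, c, d)
        s += a[i] * b[i] + c[i] / d[i] - 1
    end
    return s
end

# 3. `sum` with a function over the indices: one line, also no temporary array
shifted_dot_gen(a, b, c, d) = sum(i -> a[i] * b[i] + c[i] / d[i] - 1, eachindex(a, b, c, d))

s_bcast = shifted_dot_bcast(a, b, c, d)
s_loop = shifted_dot_loop(a, b, c, d)
s_gen = shifted_dot_gen(a, b, c, d)
```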

In [12]:
@code_llvm testloop(a, b, c, d)

;  @ In[11]:1 within `testloop`
; Function Attrs: uwtable
define double @julia_testloop_3654({}* nonnull align 16 dereferenceable(40) %0, {}* nonnull align 16 dereferenceable(40) %1, {}* nonnull align 16 dereferenceable(40) %2, {}* nonnull align 16 dereferenceable(40) %3) #0 {
top:
;  @ In[11]:3 within `testloop`
; ┌ @ simdloop.jl:69 within `macro expansion`
; │┌ @ abstractarray.jl:285 within `eachindex`
; ││┌ @ abstractarray.jl:116 within `axes1`
; │││

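``` llvm
@llvm.experimental.vector.reduce.v2.fadd.f64.v4f64(double 0.000000e+00, <4 x double> %bin.rdx)
```
If the `@code_llvm` output contains a line like the one above, a reduction over `<4 x double>` vectors, you can be fairly sure that `@simd` took effect. The exact intrinsic name depends on the LLVM version bundled with Julia.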
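To check this without scrolling through the whole dump, one can capture the IR as a string and search it. A sketch, assuming x86-64 hardware; the `<N x double>` pattern is my heuristic for vectorized code, not an official API, and `simd_sum` is a hypothetical name.

```julia
using InteractiveUtils   # provides code_llvm (loaded automatically in the REPL)

# Hypothetical example function: a reduction the compiler may vectorize thanks to @simd
function simd_sum(x)
    s = zero(eltype(x))
    @inbounds @simd for i in eachindex(x)
        s += x[i]
    end
    return s
end

# Capture the optimized LLVM IR as a string instead of printing it
ir = sprint(io -> code_llvm(io, simd_sum, (Vector{Float64},); debuginfo = :none))

# Heuristic: vector types such as <4 x double> (AVX) or <2 x double> (SSE2) indicate SIMD code
vectorized = occursin(r"<\d+ x double>", ir)
println("vectorized: ", vectorized)
```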

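`@simd` has its own limitations: in my experience it only works for **addition, subtraction, multiplication, and division**; when the computation involves functions such as `sin` or `exp`, it has no effect.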

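The LoopVectorization.jl package provides the `@turbo` macro, which gives even larger speedups. It also relies on SIMD instructions, but it analyzes and rewrites the loop itself and can vectorize functions such as `exp` and `sin`.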

## `@turbo`

In [13]:
using LoopVectorization

> “This library provides the `@turbo` macro, which may be used to **prefix a for loop or broadcast statement**. It then tries to vectorize the loop to improve runtime performance.”

### Speeding up loops

In [14]:
function noturbo(x1, x2, x3, x4)
    s = zero(eltype(x1))
    for i in eachindex(x1)
        s+= exp(x1[i]) * sin(x2[i]) + x3[i] / x4[i] + 1
    end
    return s
end
function withturbo(x1, x2,x3, x4)
    s = zero(eltype(x1))
    @turbo for i in eachindex(x1)
        s+= exp(x1[i]) * sin(x2[i]) + x3[i] / x4[i] + 1
    end
    return s
end

withturbo (generic function with 1 method)

In [15]:
N = 10000
x1 = randn(N)
x2 = randn(N)
x3 = randn(N)
x4 = randn(N);

In [16]:
@btime noturbo($x1, $x2, $x3, $x4)

  161.600 μs (0 allocations: 0 bytes)


144007.2952

In [17]:
@btime withturbo($x1, $x2, $x3, $x4)

  25.000 μs (0 allocations: 0 bytes)


144007.2952

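It speeds up `sin` and `exp` too: about 6.5× faster than the plain loop here (161.6 μs → 25.0 μs), which makes it far more capable than `@simd`.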

### Speeding up broadcasts

In [18]:
f_noturbo(x1, x2, x3, x4) = sum(x1 .* x2 .* x3 ./ exp.(x4))
f_turbo(x1, x2, x3, x4) = sum(@turbo x1 .* x2 .* x3 ./ exp.(x4))
f2_turbo(x1, x2, x3, x4) = @turbo sum(x1 .* x2 .* x3 ./ exp.(x4))

f2_turbo (generic function with 1 method)

In [19]:
@btime f_noturbo($x1, $x2, $x3, $x4)

  77.000 μs (2 allocations: 78.17 KiB)


188.0698

In [20]:
@btime f_turbo($x1, $x2, $x3, $x4)

  14.100 μs (2 allocations: 78.17 KiB)


188.0698

In [21]:
@btime f2_turbo($x1, $x2, $x3, $x4)

  14.200 μs (2 allocations: 78.17 KiB)


188.0698

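Impressive: broadcasts get much faster as well, about 5.5× (77.0 μs → 14.1 μs). The allocations are unchanged, because the broadcast result is still materialized before `sum`.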

### Extra test

In practice I often need to compute probability density values, so let's see whether these computations get faster too.

<font style="color:red;font-size:18pt"> TL;DR: <br> continuous distributions get a decent speedup; <br> discrete distributions throw an error</font>

In [22]:
using Distributions, Random
seed = Random.seed!(2022 - 8 - 9)

TaskLocalRNG()

In [23]:
fd_noturbo(x) = sum(logpdf.(Normal(), x))
fd_turbo(x) = sum(@turbo logpdf.(Normal(), x))

fd_turbo (generic function with 1 method)

In [24]:
X = randn(seed, N);

In [25]:
@btime fd_noturbo($X)

  58.700 μs (2 allocations: 78.17 KiB)


-14260.7614

In [26]:
@btime fd_turbo($X)

  27.900 μs (2 allocations: 78.17 KiB)


-14260.7614

Impressive: about 2× faster (58.7 μs → 27.9 μs). Next, try a discrete distribution, the Poisson distribution:

In [27]:
fd2_noturbo(x) = sum(logpdf.(Poisson(10), x))
fd2_turbo(x) = sum(@turbo logpdf.(Poisson(10), x))

fd2_turbo (generic function with 1 method)

In [28]:
X2 = rand(seed, Poisson(10), N);

In [29]:
@btime fd2_noturbo($X2)

  345.000 μs (2 allocations: 78.17 KiB)


-25611.7098

In [30]:
X2 = Vector{Float64}(X2);

In [31]:
using Test

In [32]:
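@test_broken fd2_turbo(X2)

Test Broken
  Expression: fd2_turbo(X2)

Even after converting `X2` to `Vector{Float64}`, `@turbo` fails on the Poisson `logpdf`; `@test_broken` records this as a known failure instead of stopping the notebook.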

## Val()

`Val` can give the Julia compiler extra type information: it lifts a value into a type parameter.

In [61]:
using StaticArrays

In [62]:
SVector{4, Float64}(1:4)

4-element SVector{4, Float64} with indices SOneTo(4):
 1.0000
 2.0000
 3.0000
 4.0000

In [63]:
function static(v::Vector)
    return SVector{length(v), eltype(v)}(v)
end

static (generic function with 1 method)

In [64]:
@code_warntype static([1, 2, 3.0])

MethodInstance for static(::Vector{Float64})
  from static(v::Vector) in Main at In[63]:1
Arguments
  #self#::Core.Const(static)
  v::Vector{Float64}
Body::Any
1 ─ %1 = Main.length(v)::Int64
│   %2 = Main.eltype(v)::Core.Const(Float64)
│   %3 = Core.apply_type(Main.SVector, %1, %2)::Type{SVector{_A, Float64}} where _A
│   %4 = (%3)(v)::Any
└──      return %4



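Why is this reported as type unstable? The length of a `Vector` is not part of its type, so at compile time the compiler still does not know `length(v)`; it is only known at run time. `eltype(v)`, by contrast, is part of the type (`Vector{Float64}`), so Julia can infer it at compile time.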

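Julia's definition of the `Val` struct shows that `Val(x)` just instantiates a struct with no fields, giving `Val{x}()`; the value is carried only in the type parameter:
```julia
struct Val{x} end
Val(x) = Val{x}()
```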

In [65]:
function static2(v::Vector, ::Val{l}) where {l}
    return SVector{l, eltype(v)}(v)
end

static2 (generic function with 1 method)

In [66]:
function static3(v::Vector, l::Int64)
    return SVector{l, eltype(v)}(v)
end

static3 (generic function with 1 method)

In [67]:
@code_warntype static2([1, 2, 3], Val(3))

MethodInstance for static2(::Vector{Int64}, ::Val{3})
  from static2(v::Vector, ::Val{l}) where l in Main at In[65]:1
Static Parameters
  l = 3
Arguments
  #self#::Core.Const(static2)
  v::Vector{Int64}
  _::Core.Const(Val{3}())
Body::SVector{3, Int64}
1 ─ %1 = $(Expr(:static_parameter, 1))::Core.Const(3)
│   %2 = Main.eltype(v)::Core.Const(Int64)
│   %3 = Core.apply_type(Main.SVector, %1, %2)::Core.Const(SVector{3, Int64})
│   %4 = (%3)(v)::SVector{3, Int64}
└──      return %4



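Why is the code above type stable? “It tells Julia the value `l` as type information that is already known at compile time.” -JonnyChen. Because `3` is a type parameter of the argument type `Val{3}`, Julia compiles a method specialized for `l = 3` and infers the return type `SVector{3, Int64}`.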

In [68]:
@code_warntype static3([1, 2, 3.0], 3)

MethodInstance for static3(::Vector{Float64}, ::Int64)
  from static3(v::Vector, l::Int64) in Main at In[66]:1
Arguments
  #self#::Core.Const(static3)
  v::Vector{Float64}
  l::Int64
Body::Any
1 ─ %1 = Main.eltype(v)::Core.Const(Float64)
│   %2 = Core.apply_type(Main.SVector, l, %1)::Type{SVector{_A, Float64}} where _A
│   %3 = (%2)(v)::Any
└──      return %3



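Why is this one type unstable? `l` is passed as an ordinary `Int64` value, so the compiler knows only its type, not its value. The output shows the type `SVector{_A, Float64} where _A`, with an unknown length parameter `_A`: the concrete `SVector` type can only be constructed at run time, once the value of `l` is known.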
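The same idea works without StaticArrays: Base's `ntuple` accepts either an `Int` length or a `Val`. A sketch comparing the two (function names hypothetical); `Base.return_types` reports what inference concludes for a given argument type. Note that `Val` only helps when the value is a compile-time constant (a literal, or already a type parameter): writing `Val(n)` with a run-time `n` merely moves the dynamic dispatch to the call site.

```julia
using Test   # provides @inferred

# Length known only at run time: inference cannot pin down NTuple{n, Float64}
squares_runtime(n::Int) = ntuple(i -> Float64(i)^2, n)

# Length carried in the type via Val: Julia specializes the method on N
squares_val(::Val{N}) where {N} = ntuple(i -> Float64(i)^2, Val(N))

t = @inferred squares_val(Val(3))                       # passes: inferred as NTuple{3, Float64}
rt = only(Base.return_types(squares_runtime, (Int,)))   # what inference concludes for any Int
println(rt, " is concrete: ", isconcretetype(rt))
```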

## Avoid declaring abstract element types in containers

Containers include arrays, matrices, vectors, tuples, named tuples, structs, user-defined structs, and so on.

In [69]:
struct badstructs
    apple::AbstractString 
end

In [70]:
struct goodstructs{T<:AbstractString}
    apple::T 
end

In [71]:
typeof(badstructs("apple"))

badstructs

In [72]:
typeof(goodstructs("apple"))

goodstructs{String}


In [73]:
struct badstruct2
    a::AbstractMatrix
end

In [74]:
struct goodstruct2{T <: AbstractMatrix}
    a::T
end
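A sketch of why the parametric version helps, mirroring `badstruct2`/`goodstruct2` under new hypothetical names: `fieldtype` shows the declared field type, which is abstract for the first struct and concrete once the type parameter is fixed for the second.

```julia
using Test   # provides @inferred

struct BadBox
    a::AbstractMatrix             # abstract field type, like badstruct2
end

struct GoodBox{T<:AbstractMatrix}
    a::T                          # concrete once T is fixed, like goodstruct2
end

bad = BadBox(rand(3, 3))
good = GoodBox(rand(3, 3))

# fieldtype returns the declared type of a field
bad_field = fieldtype(typeof(bad), :a)    # AbstractMatrix: abstract
good_field = fieldtype(typeof(good), :a)  # Matrix{Float64}: concrete

box_total(box) = sum(box.a)
r = @inferred box_total(good)             # passes: inferred as Float64
```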

## Use array views instead of slices where appropriate

In [75]:
a = rand(1000, 1000);

In [76]:
@btime a[1:10, 3:10];

  114.590 ns (1 allocation: 736 bytes)


In [77]:
@btime @view a[1:10, 3:10];

  30.120 ns (1 allocation: 64 bytes)


In [78]:
@btime a[1:100:1000, 3:30:900];

  387.437 ns (1 allocation: 2.50 KiB)


In [79]:
@btime @view a[1:100:1000, 3:30:900];

  28.342 ns (1 allocation: 80 bytes)


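The gain is especially large for non-contiguous slices: 387 ns → 28 ns, roughly a 14× speedup (compared with about 4× for the contiguous slice above).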
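One consequence worth keeping in mind: a view shares memory with its parent array, so writing through the view changes the original, while a slice is an independent copy. A minimal sketch with made-up data:

```julia
A = reshape(collect(1.0:12.0), 3, 4)   # 3×4 matrix, column-major: A[1, 2] == 4.0

s = A[1:2, 2:3]          # slicing copies the selected elements into a new Matrix
v = @view A[1:2, 2:3]    # a SubArray that refers to A's memory; no data copied

s[1, 1] = -1.0           # changes only the copy; A is untouched
v[1, 1] = 100.0          # writes through to the parent: A[1, 2] is now 100.0
```

Use `copy` on a view when an independent array is needed.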

## Do not use global variables inside functions

### Using global variables inside a function (Bad)

In [80]:
# Bad way
I = 100
x = randn(I)
function sumI1()
    s = zero(eltype(x))
    for i in 1:I
        s+=x[i]
    end
    return s
end

sumI1 (generic function with 1 method)

In [81]:
@code_warntype sumI1()

MethodInstance for sumI1()
  from sumI1() in Main at In[80]:4
Arguments
  #self#::Core.Const(sumI1)
Locals
  @_2::Any
  s::Any
  i::Any
Body::Any
1 ─ %1  = Main.eltype(Main.x)::Any
│         (s = Main.zero(%1))
│   %3  = (1:Main.I)::Any
│         (@_2 = Base.iterate(%3))
│   %5  = (@_2 === nothing)::Bool
│   %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_2::Any
│         (i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Any
│   %11 = s::Any
│   %12 = Base.getindex(Main.x, i)::Any
│         (s = %11 + %12)
│         (@_2 = Base.iterate(%3, %10))
│   %15 = (@_2 === nothing)::Bool

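`I` and `x` are global variables used inside the function. The program runs without error, but it is type unstable: a non-constant global can be rebound to a value of any type at any time, so the compiler cannot know the types of `I` and `x` (and therefore of `s` and `i`) at compile time; they are only known at run time.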
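The most direct fix, not shown in the original notebook, is to pass the data in as arguments; Julia then compiles the function for the concrete argument types. A sketch with the hypothetical name `sum_first`:

```julia
using Test   # provides @inferred

# Hypothetical rewrite of sumI1: the data come in as arguments, not as globals
function sum_first(x, n)
    s = zero(eltype(x))
    for i in 1:n
        s += x[i]
    end
    return s
end

x = randn(100)
r = @inferred sum_first(x, 100)   # passes: inferred as Float64
```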

### Using global constants inside a function (Good)

In [82]:
# Good way 1
const CI = 100
const Cx = randn(CI)
function sumI2()
    s = zero(eltype(Cx))
    for i in 1:CI
        s+=Cx[i]
    end
    return s
end



sumI2 (generic function with 1 method)

In [83]:
@code_warntype sumI2()

MethodInstance for sumI2()
  from sumI2() in Main at In[82]:4
Arguments
  #self#::Core.Const(sumI2)
Locals
  @_2::Union{Nothing, Tuple{Int64, Int64}}
  s::Float64
  i::Int64
Body::Float64
1 ─ %1  = Main.eltype(Main.Cx)::Core.Const(Float64)
│         (s = Main.zero(%1))
│   %3  = (1:Main.CI)::Core.Const(1:100)
│         (@_2 = Base.iterate(%3))
│   %5  = (@_2::Core.Const((1, 1)) === nothing)::Core.Const(false)
│   %6  = Base.not_int(%5)::Core.Const(true)
└──       goto #4 if not %6
2 ┄ %8  = @_2::Tuple{Int64, Int64}
│         (i = Core.getfield(%8, 1))
│   %10 = Core.getfield(%8, 2)::Int64
│   %11 = s::Float64
│   %12 = Base.getindex(Main.Cx, i)::Float64
│         (s = %11 + %12)
│         (@_2 = Base.iterate(%3,

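Turning the global variables into global constants lets Julia infer the variable types at compile time, so the compiler can fully optimize the code.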
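A compact way to see the difference without reading `@code_warntype`: `@inferred` succeeds only when inference matches the actual return type, and `Base.return_types` shows what inference concluded. The names below are hypothetical.

```julia
using Test   # provides @inferred

N_global = 100          # non-constant global: may be rebound to a value of any type
const N_const = 100     # constant global: its type is fixed

count_global() = sum(1:N_global)
count_const() = sum(1:N_const)

r = @inferred count_const()                             # passes: inferred as Int64
rt_global = only(Base.return_types(count_global, ()))   # inference cannot give a concrete type
println("count_global is inferred as ", rt_global)
```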

### Passing data through a struct (Good, if the struct's fields are type-annotated)

In [84]:
# Good way 2
Base.@kwdef mutable struct data
    I::Int = 100                   # type annotation required
    x::Vector{Float64} = randn(I)  # type annotation required
end
function sumI3(d::data)
    x, I = d.x, d.I
    s = zero(eltype(x))
    for i in 1:I
        s+=x[i]
    end
    return s
end

sumI3 (generic function with 1 method)

In [85]:
@code_warntype sumI3(data())

MethodInstance for sumI3(::data)
  from sumI3(d::data) in Main at In[84]:6
Arguments
  #self#::Core.Const(sumI3)
  d::data
Locals
  @_3::Union{Nothing, Tuple{Int64, Int64}}
  s::Float64
  I::Int64
  x::Vector{Float64}
  i::Int64
Body::Float64
1 ─ %1  = Base.getproperty(d, :x)::Vector{Float64}
│   %2  = Base.getproperty(d, :I)::Int64
│         (x = %1)
│         (I = %2)
│   %5  = Main.eltype(x)::Core.Const(Float64)
│         (s = Main.zero(%5))
│   %7  = (1:I)::Core.PartialStruct(UnitRange{Int64}, Any[Core.Const(1), Int64])
│         (@_3 = Base.iterate(%7))
│   %9  = (@_3 === nothing)::Bool
│   %10 = Base.not_int(%9)::Bool
└──       goto #4 if not %10
2 ┄ %12 = @_3::Tuple{Int64, Int64}
│         (i =

### Passing data through a struct (Bad, if the struct's fields are not type-annotated)

In [86]:
# Bad way 
Base.@kwdef mutable struct data2
    I = 100            
    x = randn(I) 
end
function sumI4(d::data2)
    x, I = d.x, d.I
    s = zero(eltype(x))
    for i in 1:I
        s+=x[i]
    end
    return s
end

sumI4 (generic function with 1 method)

In [87]:
@code_warntype sumI4(data2())

MethodInstance for sumI4(::data2)
  from sumI4(d::data2) in Main at In[86]:6
Arguments
  #self#::Core.Const(sumI4)
  d::data2
Locals
  @_3::Any
  s::Any
  I::Any
  x::Any
  i::Any
Body::Any
1 ─ %1  = Base.getproperty(d, :x)::Any
│   %2  = Base.getproperty(d, :I)::Any
│         (x = %1)
│         (I = %2)
│   %5  = Main.eltype(x)::Any
│         (s = Main.zero(%5))
│   %7  = (1:I)::Any
│         (@_3 = Base.iterate(%7))
│   %9  = (@_3 === nothing)::Bool
│   %10 = Base.not_int(%9)::Bool
└──       goto #4 if not %10
2 ┄ %12 = @_3::Any
│         (i = Core.getfield(%12, 1))
│   %14 = Core.getfield(%12, 2)

# Summary

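+ Declaring global constants also gives type stability, but it makes the code harder to read. A better option is to **group related constants into an immutable struct with concretely typed fields**, pass it to functions as an argument, and use `@unpack` to extract its fields conveniently inside the function body.
+ Type declarations on function arguments are unnecessary for performance, since Julia compiles a specialized method for the concrete argument types anyway; a loosely declared (abstract) variable type can even make the code type unstable.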
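The same pattern with only Base, as a sketch: since Julia 1.7 the property-destructuring syntax `(; a, b) = obj` does what `@unpack` does here, without an extra package. `Params` and `param_sum` are hypothetical names.

```julia
using Test   # provides @inferred

# Hypothetical parameter bundle: immutable, with concretely typed fields
struct Params
    n::Int
    x::Vector{Float64}
end

function param_sum(p::Params)
    (; n, x) = p          # property destructuring (Julia ≥ 1.7), similar to UnPack's @unpack
    s = zero(eltype(x))
    for i in 1:n
        s += x[i]
    end
    return s
end

p = Params(3, [1.0, 2.0, 3.0])
r = @inferred param_sum(p)   # passes: inferred as Float64
```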

Below is a maximum likelihood (MLE) estimate of a linear regression.

In [26]:
using Optim
using LinearAlgebra
using Random
using Distributions
using UnPack
#--------------------Linear Regression---------------------#
const seed = Random.seed!(2022 - 3 - 19)
Base.@kwdef struct LR # immutable struct
    I::Int64          # fields must be type-annotated
    X::Vector{Float64}
    𝜀::Vector{Float64}
    y::Vector{Float64}
    function LR(I::Int64) # inner constructor (same name as the struct): validate the input before creating the struct
        if I > 0
            X = rand(seed, Normal(0, 1), I)
            𝜀 = rand(seed, Normal(0, 9), I)
            y = @__dot__ 0.8 + 3 * X + 𝜀
            new(I, X, 𝜀, y) # create the struct with new()
        else
            error("I has to be a positive integer!")
        end
    end 
end

LR

In [27]:
function loglike(d::LR, pars)
    @unpack I, X, y = d # use @unpack to bind the struct's fields to local variables
    loglike = zero(eltype(pars))
    for i in 1:I
        loglike += logpdf(Normal(pars[1] + pars[2] * X[i], pars[3]^2), y[i])
    end
    return -loglike
end

loglike (generic function with 1 method)

In [30]:
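lr_data = LR(2000)  # generate the sample once; calling LR(2000) inside the closure would draw new data on every evaluation
fit = optimize(pars -> loglike(lr_data, pars), [0.0, 0, 1], Newton(), Optim.Options(show_trace=false); autodiff=:forward)
params = Optim.minimizer(fit)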

3-element Vector{Float64}:
 0.8080
 2.8415
 3.0043

In [32]:
# type stable now
using BenchmarkTools
@benchmark loglike(LR(1000), [0.0, 1, 2])

BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  24.800 μs …   5.720 ms  ┊ GC (min … max): 0.00% … 98.41%
 Time  (median):     31.000 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.903 μs ± 108.178 μs  ┊ GC (mean ± σ):  5.61% ±  1.97%

In [33]:
@code_warntype loglike(LR(1000), [0.0, 1, 2])

MethodInstance for loglike(::LR, ::Vector{Float64})
  from loglike(d::LR, pars) in Main at In[27]:1
Arguments
  #self#::Core.Const(loglike)
  d::LR
  pars::Vector{Float64}
Locals
  @_4::Union{Nothing, Tuple{Int64, Int64}}
  317::LR
  loglike::Float64
  y::Vector{Float64}
  X::Vector{Float64}
  I::Int64
  i::Int64
Body::Float64
1 ─       (317 = d)
│   %2  = Base.getproperty(UnPack, :unpack)::Core.Const(UnPack.unpack)
│   %3  = 317::LR
│   %4  = Core.apply_type(Main.Val, :I)::Core.Const(Val{:I})
│   %5  = (%4)()::Core.Const(Val{:I}())
│         (I = (%2)(%3, %5))
│   %7  = Base.getproperty(UnPack, :unpack)::Core.Const(UnPack.unpack)
│   %8  = 317::LR
│   %9  = Core.apply_type(Main.Val, :X)::Core.Const(Val{:X})