# Parallel Computing

In [4]:
using Distributed

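Asynchronous programming suits I/O-bound tasks, not CPU-bound ones, so it is rarely needed here.
Coroutines (tasks) cannot use multiple CPU cores because they all share a single system thread.
I still haven't fully figured out async :sad: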
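A minimal sketch of why tasks help with I/O-bound work, assuming `sleep` stands in for a network or disk wait (the 0.5 s duration is an arbitrary choice for illustration). The three waits overlap even though they all run on one thread:

```julia
# Three simulated I/O waits started as tasks; @sync blocks until all finish.
elapsed = @elapsed begin
    @sync for i in 1:3
        @async sleep(0.5)   # sleep yields to the scheduler, like a blocking I/O call
    end
end

# The waits overlap, so the total is close to one wait rather than three.
println("elapsed: ", round(elapsed; digits = 2), " s")
```

Replacing `sleep` with a CPU-heavy computation would not show the same gain, because the tasks would take turns on the same thread.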

# Multiprocessing

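Julia's multiprocessing is built on message passing, the same model as the Message Passing Interface (MPI), although communication in Julia is generally one-sided.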

In [55]:
nprocs()

4

In [6]:
addprocs(3)

3-element Vector{Int64}:
 2
 3
 4

In [7]:
nprocs()

4
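A small sketch of managing worker processes, assuming a machine where two extra local workers can be started (the variable names are only for illustration). `workers()` lists worker IDs, `myid()` gives the ID of the current process, and `rmprocs` shuts workers down again:

```julia
using Distributed

new_pids = addprocs(2)        # start two local worker processes
println("processes: ", nprocs())
println("workers:   ", workers())
println("this one:  ", myid())

rmprocs(new_pids)             # shut the two workers down again
```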

+ Remote calls
+ Remote references

In [8]:
# Remote call:
# the local process starts a computation on a remote worker

using Distributed

In [9]:
# The current process has ID 1; from it, ask process 2 to generate a 3×3 random matrix
r = remotecall(rand, 2, 3, 3)

Future(2, 1, 5, nothing)

In [10]:
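# `whence` is the process that created the Future, `where` is the process that runs the call, `v` holds the cached result
# remotecall only records the link between the caller, the callee and the computation; it returns a Future, not the result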
dump(r)

Future
  where: Int64 2
  whence: Int64 1
  id: Int64 5
  v: Nothing nothing


In [11]:
# The first fetch transfers the remote result to the local process and caches it in the Future; the remote copy is then deleted

fetch(r)

3×3 Matrix{Float64}:
 0.383441  0.524586   0.6817
 0.939697  0.753465   0.550684
 0.551175  0.0886111  0.911647

In [12]:
dump(r)

Future
  where: Int64 2
  whence: Int64 1
  id: Int64 5
  v: Some{Any}
    value: Array{Float64}((3, 3)) [0.383441000998924 0.5245862108081003 0.6816995555618623; 0.9396970852443103 0.7534649534013604 0.5506840844155196; 0.5511747090834356 0.08861108455656086 0.9116473321514136]


In [13]:
# Generate a 3×3 matrix on process 3 and return the result
fetch(remotecall(rand, 3, 3, 3))

3×3 Matrix{Float64}:
 0.788435  0.701898  0.0194382
 0.240865  0.824585  0.902491
 0.204739  0.537878  0.570161
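The `fetch(remotecall(...))` pattern above can also be written with `remotecall_fetch`, which performs the call and the fetch in one step. A short sketch, assuming at least two workers are available (the guard adds them if not):

```julia
using Distributed

# Make sure there are workers to call; this guard is an assumption of the sketch.
nworkers() < 2 && addprocs(2)
w = first(workers())

# Run rand(3, 3) on worker w and return the matrix directly.
A = remotecall_fetch(rand, w, 3, 3)

# remotecall_wait blocks until the remote call has finished, then returns the Future.
f = remotecall_wait(rand, w, 2, 2)
B = fetch(f)
```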

`spawn` means to produce or give rise to; here it launches a computation on a worker.

In [14]:
# Generate a 4×4 matrix on process 4; returns a Future, not the result
m1 = @spawnat 4 rand(4, 4)

Future(4, 1, 9, nothing)

In [15]:
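# Let Julia pick a process to generate a 4×4 matrix; returns a Future, not the result
# (since Julia 1.3, `@spawnat :any` is the preferred spelling of Distributed's `@spawn`)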
m2 = @spawn rand(4, 4)

Future(2, 1, 10, nothing)

In [16]:
fetch(m1)

4×4 Matrix{Float64}:
 0.428936  0.312901  0.742059  0.302812
 0.915653  0.419801  0.926646  0.225822
 0.487022  0.212901  0.650086  0.0653473
 0.676529  0.822042  0.799576  0.646381

In [17]:
fetch(m2)

4×4 Matrix{Float64}:
 0.45018   0.064     0.361389  0.513537
 0.225267  0.101173  0.304447  0.991288
 0.152127  0.403045  0.758481  0.183584
 0.903989  0.686444  0.13446   0.500997

In [18]:
# @fetch expr           is equivalent to fetch(@spawn expr)
# @fetchfrom pid expr   is equivalent to fetch(@spawnat pid expr)

@fetch rand(3, 3)

3×3 Matrix{Float64}:
 0.323678  0.162866  0.841077
 0.360195  0.237249  0.722177
 0.960139  0.486465  0.567638

In [20]:
@fetchfrom 1 rand(3, 3)

3×3 Matrix{Float64}:
 0.69721   0.939192  0.956687
 0.330665  0.266357  0.700392
 0.664401  0.698287  0.153467

In [21]:
# Each process has its own global scope; @everywhere evaluates a definition on every process
@everywhere function rand2(ndim)
    return 2 * rand(ndim, ndim)
end

In [22]:
@fetch rand2(10)

10×10 Matrix{Float64}:
 0.67724    0.839544   0.686618  0.0808221  …  1.33776   1.19296   1.43195
 1.36918    1.33536    1.41483   0.47121       0.304063  0.953734  0.997376
 0.434412   0.235603   1.19216   0.649332      0.962741  0.724932  0.347885
 0.485139   0.800376   0.218011  0.310004      0.218498  1.47966   1.68337
 0.827663   0.305805   1.18762   1.33322       1.50443   1.37378   1.71195
 1.07752    0.0749084  0.104501  0.672066   …  1.32255   0.109508  1.65545
 0.0619149  0.764861   0.559341  1.0316        0.117856  1.89191   1.24812
 1.00854    0.470405   1.80526   1.7117        1.29276   0.901951  0.197122
 0.595526   1.23157    1.0167    0.694855      0.255305  0.864201  0.554252
 0.639782   1.03889    1.29056   1.61451       1.54819   0.48125   0.391184

In [24]:
@fetchfrom 2 rand2(10)

10×10 Matrix{Float64}:
 0.704298  0.828989  0.499955   1.64886   …  0.353699  1.0976     1.06081
 0.644881  1.80985   0.835609   1.11237      0.891741  0.704004   1.00079
 0.609898  1.97494   1.88459    1.57673      0.803528  0.607008   0.425467
 0.264796  0.722974  0.964188   0.865909     1.7846    0.678075   1.61752
 1.79087   1.67052   0.505739   1.68245      1.45593   1.17963    0.0734138
 1.94099   1.65656   0.216484   1.73239   …  1.12944   1.02279    0.604975
 0.935169  0.189206  1.83624    0.617376     1.67436   1.2987     0.612559
 1.48079   1.85966   0.324703   0.614302     0.93944   0.0349604  0.521829
 1.525     1.23933   1.30451    0.138455     1.56679   1.32988    1.27248
 1.63734   1.94416   0.0261426  0.547749     1.48371   1.31425    1.04688

In [25]:
@everywhere using Random

@everywhere Random.seed!(123)

In [26]:
@fetchfrom 3 rand2(3)

3×3 Matrix{Float64}:
 1.5369   0.790906  1.17204
 1.88103  0.626488  0.104266
 1.34792  1.32511   0.537279

In [29]:
@fetchfrom 2 rand2(3)

3×3 Matrix{Float64}:
 1.5369   0.790906  1.17204
 1.88103  0.626488  0.104266
 1.34792  1.32511   0.537279
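The two calls above return the same matrix because every process was seeded with 123. When independent but reproducible streams are wanted, one option is to give each worker its own explicitly seeded generator. A sketch, where `seeded_rand` and the `123 + pid` seeding rule are hypothetical choices for illustration:

```julia
using Distributed

nworkers() < 2 && addprocs(2)   # assumption: two local workers are acceptable
@everywhere using Random

# Hypothetical helper: draw n numbers from a generator seeded explicitly,
# so the result does not depend on any global RNG state.
@everywhere function seeded_rand(seed, n)
    rng = MersenneTwister(seed)
    return rand(rng, n)
end

w1, w2 = workers()[1], workers()[2]
a = remotecall_fetch(seeded_rand, w1, 123 + w1, 3)   # seed offset by worker ID
b = remotecall_fetch(seeded_rand, w2, 123 + w2, 3)
```

Passing the generator explicitly also sidesteps differences between Julia versions in how the default RNG is shared between tasks.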

In [30]:
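# @distributed splits the loop's iterations across the worker processes; the iterations
# run in no guaranteed order, and the (+) reducer combines the partial results
@distributed (+) for i = 1:200000000
    Int64(rand(Bool))
end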

100001967
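Because the result above is random, a deterministic reduction makes it easier to check that `@distributed (+)` really sums the value of every iteration. A minimal sketch, assuming two local workers:

```julia
using Distributed

nworkers() < 2 && addprocs(2)   # assumption: two local workers are acceptable

# Each iteration's last expression is the value handed to the (+) reducer.
total = @distributed (+) for i in 1:1000
    i^2
end

# Compare with the same sum computed serially on the local process.
println(total == sum(abs2, 1:1000))
```

The order in which iterations run does not matter here because addition of integers is associative and commutative.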

## Shared arrays

In [31]:
using SharedArrays, Distributed

In [32]:
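a = SharedArray{Float64}(1, 10)
# Without a reducer, @distributed returns a Task immediately and does not wait;
# prefix the loop with @sync if the result is needed right away
@distributed for i in 1:10
    a[i] = i
end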

Task (runnable) @0x000000001214a720

In [33]:
a

1×10 SharedMatrix{Float64}:
 1.0  2.0  3.0  4.0  5.0  6.0  7.0  8.0  9.0  10.0
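The loop above returned a `Task` rather than waiting for the workers, so reading `a` straight away could race with the writes. A sketch of the same idea with `@sync`, assuming two local workers on the same machine (a `SharedArray` relies on shared memory):

```julia
using Distributed

nworkers() < 2 && addprocs(2)   # assumption: two local workers are acceptable
@everywhere using SharedArrays

a = SharedArray{Float64}(10)

# @sync blocks until every worker has finished its share of the iterations.
@sync @distributed for i in 1:10
    a[i] = i^2
end

println(collect(a))
```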

# Multithreading

In [34]:
Threads.nthreads()

6

In [35]:
versioninfo()

Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-10400F CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6
  JULIA_PKG_SERVER = https://mirrors.ustc.edu.cn/julia


In [71]:
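using LinearAlgebra
using Random
# Limit BLAS to one thread so it does not compete with Julia's own threads
BLAS.set_num_threads(1)
# Note: 2022 - 1 - 3 is plain arithmetic, so the seed is 2018
Random.seed!(2022 - 1 - 3)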

MersenneTwister(2018)

In [40]:
using LinearAlgebra
# CPU-bound task: invert the i-th 3000×3000 column block of X and return the trace of the inverse
function do_cpu_task(X, i)
    println("thread $(Threads.threadid()):task $(i)")
    return tr(inv(X[:, (3000 * (i - 1) + 1):(3000 * i)]))
end

do_cpu_task (generic function with 1 method)

In [50]:
# Serial baseline: no parallelism at all
@time begin
    seed = Random.seed!(123)
    result = zeros(Float64, 6)
    X = rand(seed, 3000, 6 * 3000)
    for i in 1:6
        result[i] = do_cpu_task(X, i)
    end    
end

thread 1:task 1
thread 1:task 2
thread 1:task 3
thread 1:task 4
thread 1:task 5
thread 1:task 6
  7.115125 seconds (781 allocations: 1.216 GiB, 0.53% gc time)


In [51]:
result

6-element Vector{Float64}:
 -0.1809489819845882
 -0.3010668102843521
 10.24367858367015
  7.652336853869846
 -0.8934574835666981
 -9.38857467749887

In [54]:
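# Multithreaded version: each task runs on its own thread. The run recorded here took
# 6.62 s versus 7.12 s for the serial loop, well short of the ideal 6× speedup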
@time begin
    Random.seed!(123)
    result_thread = zeros(Float64, 6)
    X = rand(3000, 3000 * 6)
    @sync for i in 1:6
        Threads.@spawn result_thread[i] = do_cpu_task(X, i)
    end
end

thread 2:task 1
thread 6:task 2
thread 1:task 6
thread 3:task 4
thread 4:task 5
thread 5:task 3
  6.622171 seconds (27.71 k allocations: 1.218 GiB, 0.62% gc time, 0.02% compilation time)


In [53]:
result_thread

6-element Vector{Float64}:
 -0.1809489819845882
 -0.3010668102843521
 10.24367858367015
  7.652336853869846
 -0.8934574835666981
 -9.38857467749887
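Instead of writing into a preallocated array, each `Threads.@spawn` task can return its value and be collected with `fetch`. A small sketch of that pattern, using a cheap stand-in computation rather than the matrix inversion above (the inputs are arbitrary):

```julia
inputs = (10, 100, 1000)

# Each task computes a sum of squares and may run on any available thread.
tasks = [Threads.@spawn sum(abs2, 1:n) for n in inputs]

# fetch waits for a task and returns its value; results keep the input order.
results = fetch.(tasks)
println(results)
```

This avoids shared mutable state entirely, at the cost of one small allocation per task. With `Threads.nthreads() == 1` the code still runs, just without parallelism.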