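## Chapter 3 The DSP Synchronous Parallel Model
Parallel iterative computation is often used to solve problems that involve millions of variables or that have no closed-form solution. Implementations are generally built on some parallel computation model. Existing models, such as BSP, AiA, PiT, and LogP, each have their own strengths and weaknesses and therefore suit different application scenarios. The Bulk Synchronous Parallel (BSP) model typically introduces a large number of global synchronizations. However, we find that a considerable fraction of these global synchronizations can be avoided, especially when the data set being processed is relatively sparse.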

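Accordingly, this chapter proposes a new parallel computation model, the multi-step-forward synchronous parallel model (DSP), together with a new formal representation of parallel iterative computation. Through the formal representation and a derivation of the iteration process, we find that DSP is a more general parallel computation model than BSP. The only change relative to BSP is that the local computation, which BSP executes once, is executed multiple times. We call the added local computation steps speculative computation steps (Scstep). Theoretical analysis and validation experiments show that speculative computation steps can further exploit the locality hidden in the data partitions or accelerate value diffusion. Because speculative computation reduces communication overhead by trading computation for communication, more speculative steps are not always better. The case studies at the end of the chapter show that DSP effectively reduces the number of iteration rounds and shortens convergence time, with speedups over BSP ranging from several times to several tens of times.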

### 3.1 The DSP Model

#### 3.1.1 Related Work

##### 3.1.1.1 The BSP Model
```python
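def dsp_algo(x, computing, data_exchange, is_convergent, delta):
    """DSP loop: compute locally every step, synchronize every `delta` steps.

    `computing`, `data_exchange` and `is_convergent` are caller-supplied
    callables; passing the state `x` through them is an assumption here.
    """
    iter_count = 0  # assignment (the original used `==`, a comparison)
    while True:
        x = computing(x)  # local computation step
        # With iter_count starting at 0, the first exchange follows the
        # first local step; afterwards one exchange every `delta` steps.
        if iter_count % delta == 0:
            x = data_exchange(x)  # global communication and synchronization
            if is_convergent(x):
                break
        iter_count += 1
    return x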
```
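
To make the computation-for-communication trade-off concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes a toy workload (unit-weight single-source shortest paths on a directed chain), contiguous equal-sized partitions, and a superstep made of `delta` local steps followed by one global exchange. The name `run_dsp` and its parameters are hypothetical.

```python
def run_dsp(n, num_workers, delta):
    """Toy DSP run: shortest paths from node 0 on the chain 0 -> 1 -> ... -> n-1.

    Assumes n is divisible by num_workers. Returns (distances, exchanges),
    where `exchanges` counts global synchronizations.
    """
    inf = float("inf")
    size = n // num_workers
    global_x = [0] + [inf] * (n - 1)
    exchanges = 0
    while True:
        pieces = []
        for w in range(num_workers):
            lo, hi = w * size, (w + 1) * size
            local = list(global_x)  # the worker's private copy of all values
            for _ in range(delta):  # delta local (speculative) steps
                prev = list(local)
                for j in range(max(lo, 1), hi):  # update only owned entries
                    local[j] = min(prev[j], prev[j - 1] + 1)
            pieces.extend(local[lo:hi])  # each worker contributes its own slice
        exchanges += 1  # one global exchange per superstep
        if pieces == global_x:  # nothing changed: converged
            return global_x, exchanges
        global_x = pieces


for delta in (1, 4, 8):
    dist, exchanges = run_dsp(n=8, num_workers=2, delta=delta)
    print(delta, exchanges, exchanges * delta)
```

In this toy setting, `delta=1` behaves like BSP and needs 8 exchanges, while `delta=4` needs only 3 exchanges at the cost of 12 local steps per worker instead of 8. Raising `delta` to 8 still needs 3 exchanges but spends 24 local steps, which illustrates why more speculative steps are not always better.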

##### 3.1.1.2 Parameter Server

##### 3.1.1.3 KLA

##### 3.1.1.4 Multi-step Parallel Shortest-Path Algorithms

##### 3.1.1.5 Other Work

### 3.2 Formal Representation of the DSP Model

\begin{align*}
% \vspace*{-\baselineskip}\setlength\belowdisplayshortskip{0pt}
X_{t0} &= (x_{t0,0}, x_{t0,1}, \dots, x_{t0,n})  \\
X_{t1}^{(p, q)} &= X_{t0}\otimes F^{(p, q)} = (x_{t0,0}, x_{t0,1}, \dots, x_{t0,n})\otimes
        \begin{pmatrix}
          1 & 0 & \dots & 0 & F_{0,p} & \dots & F_{0,q} & 0 & \dots & 0 \\
          0 & 1 & \dots & 0 & F_{1,p} & \dots & F_{1,q} & 0 & \dots & 0 \\
          \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
          0 & 0 & \dots & 1 & F_{p-1,p} & \dots & F_{p-1,q} & 0 & \dots & 0 \\
          0 & 0 & \dots & 0 & F_{p,p} & \dots & F_{p,q} & 0 & \dots & 0 \\
          \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
          0 & 0 & \dots & 0 & F_{q,p} & \dots & F_{q,q} & 0 & \dots & 0 \\
          0 & 0 & \dots & 0 & F_{q+1,p} & \dots & F_{q+1,q} & 1 & \dots & 0 \\
          \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots\\
          0 & 0 & \dots & 0 & F_{n,p} & \dots & F_{n,q} & 0 & \dots & 1 \\
        \end{pmatrix} \\
% &\quad\quad \textboxwithoutcallout{Using "matrix multiplication"-like transformation $\otimes$, we get:} \footnotemark \\
&\quad\quad \highlight{Applying~the~``matrix~multiplication''-like~transformation~\otimes~gives~the~following;~only~x_p~to~x_q~are~updated.} \\
&= (x_{t0,0},\dots,~x_{t0,p-1},~\biguplus_{i=0}^{n}F_{i, p}(x_{t0,i}),~\biguplus_{i=0}^{n}F_{i, p+1}(x_{t0,i}),~\dots,~\biguplus_{i=0}^{n}F_{i, q}(x_{t0,i}),~x_{t0,q+1},~\dots,~x_{t0,n}) \\
 X_{t2}^{(p, q)}  &= X_{t1}^{(p, q)}\otimes F^{(p, q)} \\
 %%%%%%%%%%%%%%%%%%%% Add the arrow and tex explain
 &= (x_{t0,0},\dots,~x_{t0,p-1},~\biguplus_{i=0}^{n}F_{i, p}(x_{t1,i}),~\biguplus_{i=0}^{n}F_{i, p+1}(x_{t1,i}),~\dots,~\biguplus_{i=0}^{n}F_{i, q}(x_{t1,i}),~x_{t0,q+1},~\dots,~x_{t0,n}) \\
&=(x_{t0,0},\dots,~x_{t0,p-1},~ \biguplus_{}^{}(\underset{i\in(p,q)}{\biguplus}F_{i,p}(\biguplus_{i=0}^{n}F_{i,p}(x_{t0,i})),~\underset{i\notin(p,q)}{\biguplus}F_{i,p}(x_{t0,i})), \\
 &\quad\quad\quad\quad\quad~\dots,~\biguplus_{}^{}(\underset{i\in(p,q)}{\biguplus}F_{i,q}(\biguplus_{i=0}^{n}F_{i,q}(x_{t0,i})),~\underset{i\notin(p,q)}{\biguplus}F_{i,q}(x_{t0,i})),~x_{t0,q+1},~\dots,x_{t0,n}) \\
&~\highlight{Let~\alpha_p=\biguplus_{i=0}^{n}F_{i,p}(x_{t0,i}),~\beta_p=\underset{i\notin(p,q)}{\biguplus}F_{i,p}(x_{t0,i})} \\
&=(x_{t0,0},\dots,~x_{t0,p-1},~\biguplus(\biguplus_{i\in(p,q)}F_{i,p}(\alpha_p),\beta_p),~\dots,~\biguplus(\biguplus_{i\in(p,q)}F_{i,q}(\alpha_q),\beta_q),~x_{t0,q+1},~\dots,x_{t0,n}) \\
&~\highlight{Let~g(x, y)=\biguplus(\biguplus_{i\in(p,q)}F_{i,p}(x),y)} \\
&=(x_{t0,0},\dots,~x_{t0,p-1},~g(\alpha_p, \beta_p),~\dots,~g(\alpha_q, \beta_q),~x_{t0,q+1},~\dots,x_{t0,n}) \\
X_{t3}^{(p, q)} &= X_{t2}^{(p, q)}\otimes F^{(p, q)} \\
&= (x_{t0,0},\dots,~x_{t0,p-1},~\biguplus(\biguplus_{i\in(p,q)}^{}F_{i,p}(\biguplus(\biguplus_{i\in(p,q)}F_{i,p}(\alpha_p),~\beta_p)),~\beta_p), \\
&\quad\quad\quad\quad\quad~\dots,~\biguplus(\biguplus_{i\in(p,q)}^{}F_{i,q}(\biguplus(\biguplus_{i\in(p,q)}F_{i,q}(\alpha_q),~\beta_q)),~\beta_q),~x_{t0,q+1},~\dots,x_{t0,n}) \\
&= (x_{t0,0},\dots,~x_{t0,p-1},~\biguplus(\biguplus_{i\in(p,q)}^{}F_{i,p}(g(\alpha_p,\beta_p)),~\beta_p),~\dots,~\biguplus(\biguplus_{i\in(p,q)}^{}F_{i,q}(g(\alpha_q,\beta_q)),~\beta_q),~x_{t0,q+1},~\dots,x_{t0,n}) \\
      & \vdots \\
X_{t\Delta}^{(p, q)} &= X_{t(\Delta-1)}^{(p, q)}\otimes F^{(p, q)} \\
&= (x_{t0,0},\dots,~x_{t0,p-1},~\biguplus(\biguplus_{i\in(p,q)}F_{i,p}(\dots\biguplus(\biguplus_{i\in(p,q)}F_{i,p}(\alpha_p),~\beta_p),\dots,\beta_p),\beta_p), \\
&\quad\quad\quad\quad~\dots,~\biguplus(\biguplus_{i\in(p,q)}F_{i,q}(\dots\biguplus(\biguplus_{i\in(p,q)}F_{i,q}(\alpha_q),~\beta_q),\dots,\beta_q),\beta_q),~x_{t0,q+1},~\dots,~x_{t0,n}) \\
&= (x_{t0,0},\dots,~x_{t0,p-1},~g(g(\dots g(\alpha_p,\beta_p),\dots,\beta_p),\beta_p),\dots,~g(g(\dots g(\alpha_q,\beta_q),\dots,\beta_q),\beta_q),~x_{t0,q+1},\dots,x_{t0,n}) \\
&= (x_{t0,0},\dots,~x_{t0,p-1},g^{\Delta-1}(\alpha_p,\beta_p),\dots,~g^{\Delta-1}(\alpha_q,\beta_q),~x_{t0,q+1},\dots,x_{t0,n})
\end{align*}
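
The transformation $\otimes$ used in the derivation can be illustrated with a small sketch. It is only one possible instantiation, under assumptions that are not part of the formal model: $\biguplus$ is taken to be `min`, and each $F_{i,j}$ adds an edge weight, as in a shortest-path relaxation on a five-node chain. The helper names `apply_block`, `local_steps` and `shift` are hypothetical.

```python
INF = float("inf")


def apply_block(x, F, combine, p, q):
    """One application of X (x) F^(p,q): only entries p..q are recomputed.

    Entry j becomes combine(F[i][j](x[i]) for all i); the rest are copied.
    """
    n = len(x)
    new_x = list(x)
    for j in range(p, q + 1):
        new_x[j] = combine(F[i][j](x[i]) for i in range(n))
    return new_x


def local_steps(x, F, combine, p, q, delta):
    """Apply the block transformation `delta` times without any exchange."""
    for _ in range(delta):
        x = apply_block(x, F, combine, p, q)
    return x


def shift(w):
    """Assumed form of F_{i,j}: add the weight of edge i -> j."""
    return lambda v: v + w


n = 5
# Chain 0 -> 1 -> 2 -> 3 -> 4 with unit weights; weight 0 on the diagonal
# keeps a node's own value as a candidate, INF means "no edge".
weights = [[0 if i == j else 1 if j == i + 1 else INF for j in range(n)]
           for i in range(n)]
F = [[shift(weights[i][j]) for j in range(n)] for i in range(n)]

x_t0 = [0, INF, INF, INF, INF]
x_t3 = local_steps(x_t0, F, min, p=1, q=3, delta=3)
print(x_t3)  # [0, 1, 2, 3, inf]
```

After three local steps the entries in the block $p=1,\dots,q=3$ have absorbed each other's updates, while the entries outside the block, $x_0$ and $x_4$, keep their values from $t0$. This matches the structure of the derivation above, in which only $x_p$ to $x_q$ change between exchanges.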


### 3.3 Convergence Proof of the DSP Model

### 3.4 Speedup Analysis of the DSP Model

### 3.5 Summary