### The Parallel Perceptron

A parallel perceptron is a single-layer structure that consists of a finite number of perceptrons.

Assume that there are $n$ perceptrons and that $f_{1}, f_{2}, ..., f_{n}$ are the functions computed by them, where each $f_{i}$ returns either $-1$ or $1$.

For an input $x$, the sum of the individual perceptron outputs is
\begin{equation*}
    \sum_{i=1}^{n} f_{i}(x) \in \{-n, ..., n\}
\end{equation*}

More precisely, the output of the parallel perceptron is $s(\sum_{i=1}^{n} f_{i}(x))$, where $s$ is a *squashing function* that scales the sum to the desired range.

Let $\rho$ be the parameter of the squashing function; it determines the resolution of $s_{\rho}$. We can describe the function $s_{\rho}$ as

\begin{equation*}
    s_{\rho}(p) = 
    \begin{cases}
    -1       & \text{if} \quad p < -\rho \\
    p/{\rho} & \text{if} \quad -\rho \leq p \leq \rho\\
    1        & \text{if} \quad p > \rho \\
    \end{cases}
\end{equation*}
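
The forward pass described above can be sketched in a few lines of Python. This is only a minimal illustration, not a reference implementation: the function names `squash`, `perceptron` and `parallel_output` are hypothetical, each perceptron is assumed to output $+1$ when $w_{i}x \geq 0$ and $-1$ otherwise, and the lower branch of $s_{\rho}$ is assumed to apply when $p < -\rho$ so that the function is symmetric.

```python
def squash(p, rho):
    """Piecewise-linear squashing function s_rho, clipped to [-1, 1]."""
    if p < -rho:
        return -1.0
    if p > rho:
        return 1.0
    return p / rho


def perceptron(w, x):
    """Single perceptron f_i (assumed form): +1 if w . x >= 0, otherwise -1."""
    activation = sum(wj * xj for wj, xj in zip(w, x))
    return 1 if activation >= 0 else -1


def parallel_output(weights, x, rho):
    """Squashed sum of the outputs of all perceptrons."""
    total = sum(perceptron(w, x) for w in weights)
    return squash(total, rho)


# Three perceptrons on a two-dimensional input (illustrative values only).
weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [1.0, 2.0]
print(parallel_output(weights, x, rho=2))  # two outputs of +1, one of -1: sum 1, squashed to 0.5
```

With $\rho = 2$ the sum of $1$ falls in the linear region, so the output is $1/2$; a sum beyond $\pm\rho$ would be clipped to $\pm 1$.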

### The p-delta rule

The parallel delta rule (p-delta rule) is a simple learning algorithm for parallel perceptrons.

The approximation error of the parallel perceptron can be as small as half of the quantization step size. Therefore, the algorithm can set the desired accuracy $\epsilon$ to this value, which is given by

\begin{equation*}
    \epsilon = \frac{\phi_{max} - \phi_{min}}{2N}
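\end{equation*} where $\phi(x) = s_{\rho}(\sum_{i=1}^{n} f_{i}(x))$ is the output of the parallel perceptron, $\phi_{max}$ and $\phi_{min}$ are its largest and smallest values, and $N$ is taken to be the number of quantization steps.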

However, reaching this level of error is not guaranteed, because the algorithm may get stuck in a local minimum of the error and never find the global minimum.

If the difference between the desired output and the actual output is smaller than $\epsilon$, then the output of the parallel perceptron is within the desired accuracy.

In this situation, the system doesn't have to modify the weights.

We therefore consider the case in which the output of the parallel perceptron $\hat{y}$ differs from the desired output $y$ by more than $\epsilon$. Suppose first that the output is too large:

\begin{equation*}
    \hat{y} > y + \epsilon
\end{equation*}

To reduce the output of the parallel perceptron, we need to reduce the number of weight vectors with $w_{i}x \geq 0$. The classic update rule is

\begin{equation*}
    w_{i} \longleftarrow w_{i} + \xi \Delta_{i}
\end{equation*} where $\xi$ is the learning rate and $\Delta_{i}$ is the update direction for the weight vector $w_{i}$.

There are several options for modifying the weights. The most advisable option is to update every weight vector whose perceptron output pushes the result in the wrong direction. So, we can define $\Delta_{i}$ as

\begin{equation*}
    \Delta_{i} = 
    \begin{cases}
        -x & \text{if} \quad \hat{y} > y + \epsilon \quad \text{and} \quad w_{i}x \geq 0\\
        +x & \text{if} \quad \hat{y} < y - \epsilon \quad \text{and} \quad w_{i}x < 0\\
        0  &  \text{otherwise}
    \end{cases}
\end{equation*}
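
As a rough illustration, one step of this rule for a single training example might look like the sketch below. The names `dot` and `p_delta_step` are hypothetical, and the actual output `y_hat` is assumed to have been computed beforehand by the caller.

```python
def dot(w, x):
    """Inner product w . x of two equally long lists."""
    return sum(wj * xj for wj, xj in zip(w, x))


def p_delta_step(weights, x, y, y_hat, eps, lr):
    """Apply one p-delta update in place: w_i <- w_i + lr * Delta_i."""
    for w in weights:
        a = dot(w, x)
        if y_hat > y + eps and a >= 0:
            sign = -1.0  # output too high: Delta_i = -x
        elif y_hat < y - eps and a < 0:
            sign = 1.0   # output too low: Delta_i = +x
        else:
            sign = 0.0   # within accuracy, or this perceptron is not at fault
        for j in range(len(w)):
            w[j] += lr * sign * x[j]
    return weights


# The output 0.0 is too high for the target -1.0, so only the first
# weight vector (the one with w . x >= 0) is moved against x.
weights = [[1.0, 0.0], [-1.0, 0.0]]
p_delta_step(weights, x=[1.0, 1.0], y=-1.0, y_hat=0.0, eps=0.1, lr=0.5)
print(weights)  # [[0.5, -0.5], [-1.0, 0.0]]
```

Note that the weight vectors are modified in place; a caller that needs the old values would have to copy them first.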

#### Approach of clear margin

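The clear margin approach is used to stabilize the output of the parallel perceptron. It extends the delta rule so that, when the output is already acceptable, weight vectors with $w_{i}x$ close to zero are pushed away from zero. Here $\gamma > 0$ is the margin and $\mu$ scales the strength of the margin updates.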

\begin{equation*}
    \Delta_{i} = 
    \begin{cases}
        -x & \text{if} \quad \hat{y} > y + \epsilon \quad \text{and} \quad w_{i}x \geq 0\\
        +x & \text{if} \quad \hat{y} < y - \epsilon \quad \text{and} \quad w_{i}x < 0\\
        +\mu x & \text{if} \quad \hat{y} \leq y + \epsilon \quad \text{and} \quad 0 \leq w_{i}x < \gamma\\
        -\mu x & \text{if} \quad \hat{y} \geq y - \epsilon \quad \text{and} \quad -\gamma < w_{i}x < 0\\
        0  &  \text{otherwise}
    \end{cases}
\end{equation*}

\begin{equation*}
    w_{i} \longleftarrow w_{i} + \xi \Delta_{i}
\end{equation*}
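
The extended rule can be sketched in the same style. This is a minimal illustration under stated assumptions: the names `dot` and `p_delta_margin_step` are hypothetical, `y_hat` is assumed to be computed by the caller, and the margin terms are assumed to be $\pm\mu x$, that is, the input scaled by $\mu$.

```python
def dot(w, x):
    """Inner product w . x of two equally long lists."""
    return sum(wj * xj for wj, xj in zip(w, x))


def p_delta_margin_step(weights, x, y, y_hat, eps, lr, gamma, mu):
    """One in-place p-delta update with the clear margin terms."""
    for w in weights:
        a = dot(w, x)
        if y_hat > y + eps and a >= 0:
            scale = -1.0   # output too high: Delta_i = -x
        elif y_hat < y - eps and a < 0:
            scale = 1.0    # output too low: Delta_i = +x
        elif y_hat <= y + eps and 0 <= a < gamma:
            scale = mu     # small positive activation: push further above 0
        elif y_hat >= y - eps and -gamma < a < 0:
            scale = -mu    # small negative activation: push further below 0
        else:
            scale = 0.0
        for j in range(len(w)):
            w[j] += lr * scale * x[j]
    return weights


# The output already matches the target, so only the two weight vectors
# whose activation lies inside the margin (-0.5, 0.5) are moved.
weights = [[0.25, 0.0], [-0.25, 0.0], [2.0, 0.0]]
p_delta_margin_step(weights, x=[1.0, 0.0], y=0.5, y_hat=0.5,
                    eps=0.1, lr=0.5, gamma=0.5, mu=1.0)
print(weights)  # [[0.75, 0.0], [-0.75, 0.0], [2.0, 0.0]]
```

The third weight vector already has an activation well outside the margin, so it is left untouched.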