### Sigmoid neurons simulating perceptrons, part I

Suppose we take all the weights and biases in a network of perceptrons, and multiply them by a positive constant, c>0. Show that the behaviour of the network doesn't change.

$$
\begin{equation}
    Perceptron(x; w) =
    \begin{cases}
    0 &\text{if } {\sum_j w_j x_j \le threshold}
    \\1 &\text{if } \sum_j w_j x_j > threshold
    \end{cases}
\end{equation}
$$

Writing the threshold as a bias, $b = -\text{threshold}$:

$$
\begin{equation}
    Perceptron(x; w) =
    \begin{cases}
    0 &\text{if } \sum_j w_j x_j \le -b
    \\1 &\text{if } \sum_j w_j x_j > -b
    \end{cases}
\end{equation}
$$

Multiplying every weight and the bias by $c$:

$$
\begin{equation}
    Perceptron(x; cw, cb) =
    \begin{cases}
    0 &\text{if } \sum_j c w_j x_j \le -c b
    \\1 &\text{if } \sum_j c w_j x_j > -c b
    \end{cases}
\end{equation}
$$

$$
\begin{equation}
    Perceptron(x; cw, cb) =
    \begin{cases}
    0 &\text{if } c \sum_j w_j x_j \le -c b
    \\1 &\text{if } c \sum_j w_j x_j > -c b
    \end{cases}
\end{equation}
$$

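In the last step the constant $c$ was factored out of the sum. Because $c > 0$, dividing both sides of each inequality by $c$ preserves its direction and recovers the original conditions: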

$$
\begin{equation}
    Perceptron(x; w) =
    \begin{cases}
    0 &\text{if } \sum_j w_j x_j \le -b
    \\1 &\text{if } \sum_j w_j x_j > -b
    \end{cases}
\end{equation}
$$

Every perceptron with scaled weights and bias therefore outputs the same value as the original perceptron for every input. Since no individual perceptron changes, the network as a whole behaves the same way.

A simpler way to see this: $c (w \cdot x + b)$ is positive exactly when $w \cdot x + b$ is positive for any $c > 0$, and a perceptron's output depends only on whether that quantity is positive.
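The scaling argument can also be checked numerically. The following is a minimal sketch, not part of the original exercise: the function names `perceptron` and `scaled_outputs_match` are made up for illustration, and exact `Fraction` arithmetic is used so that floating-point rounding cannot flip the sign of $w \cdot x + b$.

```python
import random
from fractions import Fraction


def perceptron(weights, bias, inputs):
    """Return 1 if w.x + b > 0, else 0 (the bias form used above)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0


def random_fraction(rng):
    """Random rational in [-10, 10], exact so no rounding error occurs."""
    return Fraction(rng.randint(-10, 10), rng.randint(1, 10))


def scaled_outputs_match(trials=1000, seed=0):
    """Compare random perceptrons with copies whose parameters are scaled by c > 0."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 5)
        weights = [random_fraction(rng) for _ in range(n)]
        bias = random_fraction(rng)
        inputs = [rng.choice([0, 1]) for _ in range(n)]
        c = Fraction(rng.randint(1, 1000), rng.randint(1, 1000))  # c > 0
        original = perceptron(weights, bias, inputs)
        scaled = perceptron([c * w for w in weights], c * bias, inputs)
        if original != scaled:
            return False
    return True


print(scaled_outputs_match())
```

Since the scaled weighted input is exactly $c$ times the original one, the check should report `True` for any seed. A negative $c$ would flip the inequalities, which is why the exercise requires $c > 0$.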

### Sigmoid neurons simulating perceptrons, part II  

Suppose we have the same setup as the last problem - a network of perceptrons. Suppose also that the overall input to the network of perceptrons has been chosen. We won't need the actual input value, we just need the input to have been fixed. Suppose the weights and biases are such that w⋅x+b≠0 for the input x to any particular perceptron in the network. Now replace all the perceptrons in the network by sigmoid neurons, and multiply the weights and biases by a positive constant c>0. Show that in the limit as c→∞ the behaviour of this network of sigmoid neurons is exactly the same as the network of perceptrons. How can this fail when w⋅x+b=0 for one of the perceptrons?

$\sigma(x; w,b) = \frac{1}{1 + \exp(-(\sum_j w_j x_j + b))}$

For $c > 0$:

$\sigma(x; c \cdot w, c \cdot b) = \frac{1}{1 + \exp(-(\sum_j c w_j x_j + c b))}$

$= \frac{1}{1 + \exp(-c (\sum_j w_j x_j + b))}$

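As $c \to \infty$, the exponent $-c (\sum_j w_j x_j + b)$ diverges to $-\infty$ or $+\infty$ according to the sign of $w \cdot x + b$.

For $w \cdot x + b > 0$:

$\sigma \to \frac{1}{1 + \exp(-\infty)} = 1$

For $w \cdot x + b < 0$:

$\sigma \to \frac{1}{1 + \exp(\infty)} = 0$

These are exactly the perceptron outputs. Applying the argument layer by layer (each neuron's inputs converge to the perceptron's inputs, for which $w \cdot x + b \ne 0$), the sigmoid network matches the perceptron network in the limit.

For $w \cdot x + b = 0$:

$\sigma = \frac{1}{1 + \exp(-c \cdot 0)} = \frac{1}{2}$ for every $c$

The exponent is $0$ no matter how large $c$ is, so the output stays at $\frac{1}{2}$, while the perceptron outputs $0$. That neuron never matches its perceptron, and its output of $\frac{1}{2}$ can change the results of the layers that follow.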
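The three cases can be illustrated numerically. This is a small sketch under stated assumptions: `z` stands for the fixed value $w \cdot x + b$ of a single neuron, and `scaled_sigmoid` is a hypothetical helper name, not something from the exercise. The sigmoid is written in a numerically stable form so that large values of `c` do not overflow.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scaled_sigmoid(z, c):
    """Output of a sigmoid neuron whose weights and bias are multiplied by c.

    z is the unscaled weighted input w.x + b, so the scaled input is c * z.
    """
    return sigmoid(c * z)


# One positive, one negative, and one zero weighted input.
for z in (0.5, -0.5, 0.0):
    outputs = [scaled_sigmoid(z, c) for c in (1, 10, 100, 1000)]
    print(z, outputs)
```

For the positive and negative inputs the outputs move towards 1 and 0 as `c` grows, while for `z = 0.0` the output is 0.5 for every `c`, which is where the sigmoid network and the perceptron network disagree.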

### Bitwise representation of digit classifier

There is a way of determining the bitwise representation of a digit by adding an extra layer to the three-layer network above. The extra layer converts the output from the previous layer into a binary representation, as illustrated in the figure below. Find a set of weights and biases for the new output layer. Assume that the first 3 layers of neurons are such that the correct output in the third layer (i.e., the old output layer) has activation at least 0.99, and incorrect outputs have activation less than 0.01.

Connectivity from the 3rd layer to the extra output layer is parameterized by $W \in \mathbb{R}^{10 \times 4}$ and $b \in \mathbb{R}^{4}$.

$W_{i,j}$ represents the weight between the $i$th neuron of the 3rd layer (the neuron for digit $i$) and the $j$th output neuron.

$b_k$ represents the bias for the $k$th output neuron.

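Because the 3rd layer is already (approximately) a one-hot encoding, the weights alone can map each digit to its binary output. With $b = 0$, the weighted input to an output neuron is at least $0.99$ when its bit should be set and less than $0.05$ otherwise, since at most five digits share any one bit. Reading each bit by thresholding that weighted input at $0.5$ therefore needs no bias. If the output neurons are sigmoid neurons, $\sigma$ of both values exceeds $0.5$, so the threshold has to be moved into the parameters, for example by scaling the weights by some $s > 0$ and setting $b_k = -s/2$.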

Output for 1: $W_{1,:} = [1\ 0\ 0\ 0]^T$

Output for 2: $W_{2,:} = [0\ 1\ 0\ 0]^T$

And so on: row $W_{i,:}$ is the binary representation of digit $i$ itself, written least significant bit first (so $W_{9,:} = [1\ 0\ 0\ 1]^T$).
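The weight pattern can be checked against the worst-case activations from the exercise. This is a sketch under assumptions that are not in the original notes: the output neurons are taken to be sigmoid neurons, and the `scale` parameter with a bias of `-scale / 2` is a hypothetical choice that places the decision point at a weighted input of $0.5$. The names `W`, `decode`, and `worst_case_activations` are illustrative only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# W[i][j] is the weight from the neuron for digit i to output bit j,
# with bit 0 the least significant (so digit 1 -> [1, 0, 0, 0]).
W = [[(digit >> bit) & 1 for bit in range(4)] for digit in range(10)]


def weighted_inputs(activations):
    """Weighted input to each of the 4 output neurons, before any bias."""
    return [sum(a * W[i][j] for i, a in enumerate(activations)) for j in range(4)]


def decode(activations, scale=20.0):
    """Assumed sigmoid output layer: weights scale * W, bias -scale / 2."""
    return [sigmoid(scale * (z - 0.5)) for z in weighted_inputs(activations)]


def worst_case_activations(digit):
    """Correct neuron at 0.99, every other neuron at the 0.01 upper bound."""
    return [0.99 if i == digit else 0.01 for i in range(10)]


for digit in range(10):
    bits = [round(a) for a in decode(worst_case_activations(digit))]
    value = sum(b << j for j, b in enumerate(bits))
    print(digit, bits, value)
```

Each printed `value` should equal its `digit`. With a weighted input of at least $0.99$ for set bits and at most $0.05$ for unset bits, any decision point between those two values separates them; `scale` only controls how close to 0 and 1 the sigmoid outputs get.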