# PS3-1 A Simple Neural Network

### (a) Update Rule of $w_{1,2}^{[1]}$

First, write out the part of the forward pass that involves $w_{1,2}^{[1]}$. For sample $x^{(i)}$,
\begin{align*}
&z_2^{[1],(i)}=w_{0,2}^{[1]}+w_{1,2}^{[1]}x_1^{(i)}+w_{2,2}^{[1]}x_2^{(i)}\\
&h_2^{(i)}=\sigma\left(z_2^{[1],(i)}\right)\\
&z^{[2],(i)}=w_0^{[2]}+w_1^{[2]}h_1^{(i)}+w_2^{[2]}h_2^{(i)}+w_3^{[2]}h_3^{(i)}\\
&o^{(i)}=\sigma\left(z^{[2],(i)}\right)\\
&l^{(i)}=\left(o^{(i)}-y^{(i)}\right)^2\\
&l=\frac{1}{m}\sum_{i=1}^ml^{(i)}
\end{align*}

Thus, by the chain rule and the identity $\sigma'(z)=\sigma(z)\left(1-\sigma(z)\right)$:

\begin{align*}
\frac{\partial l}{\partial{w_{1,2}^{[1]}}}&=\frac{1}{m}\sum_{i=1}^m\frac{\partial l^{(i)}}{\partial{w_{1,2}^{[1]}}}\\
&=\frac{1}{m}\sum_{i=1}^m\frac{\partial l^{(i)}}{\partial{o^{(i)}}}\frac{\partial o^{(i)}}{\partial{z^{[2],(i)}}}\frac{\partial z^{[2],(i)}}{\partial{h_2^{(i)}}}\frac{\partial h_2^{(i)}}{\partial{z_2^{[1],(i)}}}\frac{\partial z_2^{[1],(i)}}{\partial{w_{1,2}^{[1]}}}\\
&=\frac{2}{m}\sum_{i=1}^m\left(o^{(i)}-y^{(i)}\right)\sigma\left(z^{[2],(i)}\right)\left(1-\sigma\left(z^{[2],(i)}\right)\right)w_2^{[2]}\sigma\left(z_2^{[1],(i)}\right)\left(1-\sigma\left(z_2^{[1],(i)}\right)\right)x_1^{(i)}\\
&=\frac{2}{m}\sum_{i=1}^m\left(o^{(i)}-y^{(i)}\right)o^{(i)}\left(1-o^{(i)}\right)w_2^{[2]}h_2^{(i)}\left(1-h_2^{(i)}\right)x_1^{(i)}
\end{align*}

Therefore, the update rule of $w_{1,2}^{[1]}$ is
\begin{align*}
w_{1,2}^{[1]}&:=w_{1,2}^{[1]}-\alpha\frac{\partial l}{\partial{w_{1,2}^{[1]}}}\\
&=w_{1,2}^{[1]}-\frac{2\alpha}{m}\sum_{i=1}^m\left(o^{(i)}-y^{(i)}\right)o^{(i)}\left(1-o^{(i)}\right)w_2^{[2]}h_2^{(i)}\left(1-h_2^{(i)}\right)x_1^{(i)}
\end{align*}
where $h_2^{(i)}=\sigma\left(w_{0,2}^{[1]}+w_{1,2}^{[1]}x_1^{(i)}+w_{2,2}^{[1]}x_2^{(i)}\right)$.
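
The closed-form gradient above can be sanity-checked against a central finite difference. The following is a minimal sketch, not part of the assignment code: the list layout of `W1` and `W2`, the random data, and the learning rate `alpha = 0.1` are all assumptions made for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, W2, x):
    # Assumed layout: W1[j-1] = [w_{0,j}, w_{1,j}, w_{2,j}] for hidden unit j,
    # W2 = [w_0, w_1, w_2, w_3] for the output unit.
    h = [sigmoid(w[0] + w[1] * x[0] + w[2] * x[1]) for w in W1]
    o = sigmoid(W2[0] + sum(W2[j + 1] * h[j] for j in range(3)))
    return h, o

def loss(W1, W2, X, Y):
    # Mean squared error l = (1/m) * sum_i (o_i - y_i)^2.
    return sum((forward(W1, W2, x)[1] - y) ** 2 for x, y in zip(X, Y)) / len(X)

def analytic_grad(W1, W2, X, Y):
    # dl/dw_{1,2}^{[1]} from the derivation: uses w_2^{[2]}, h_2 and x_1.
    total = 0.0
    for x, y in zip(X, Y):
        h, o = forward(W1, W2, x)
        total += (o - y) * o * (1 - o) * W2[2] * h[1] * (1 - h[1]) * x[0]
    return 2.0 * total / len(X)

def numeric_grad(W1, W2, X, Y, eps=1e-6):
    # Central difference with respect to W1[1][1], i.e. w_{1,2}^{[1]}.
    plus = [row[:] for row in W1]
    minus = [row[:] for row in W1]
    plus[1][1] += eps
    minus[1][1] -= eps
    return (loss(plus, W2, X, Y) - loss(minus, W2, X, Y)) / (2 * eps)

# Hypothetical random weights and data, used only for the check.
random.seed(0)
W1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
W2 = [random.uniform(-1, 1) for _ in range(4)]
X = [(random.uniform(0, 4), random.uniform(0, 4)) for _ in range(20)]
Y = [random.randint(0, 1) for _ in range(20)]

g = analytic_grad(W1, W2, X, Y)
g_num = numeric_grad(W1, W2, X, Y)

# One gradient-descent step on w_{1,2}^{[1]} with an assumed learning rate.
alpha = 0.1
w12_new = W1[1][1] - alpha * g
print(g, g_num, w12_new)
```

If the two gradient values agree to several decimal places, the indices in the derivation ($w_2^{[2]}$, $h_2$, $x_1$) are most likely matched up correctly.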

### (b) Step Activation Function

A plot of the dataset shows that the decision boundary is a triangle formed by the three lines
\begin{align*}
&l_1:x_1=0.5\\
&l_2:x_2=0.5\\
&l_3:x_1+x_2-4=0
\end{align*}

Since there are three hidden units, we can view the neural network as a combination of three linear classifiers (logistic regression with the sigmoid replaced by a step activation function), with each hidden unit implementing one of them.

Therefore, by setting

\begin{align*}
&w_{0,1}^{[1]}=0.5,w_{1,1}^{[1]}=-1,w_{2,1}^{[1]}=0\\
&w_{0,2}^{[1]}=0.5,w_{1,2}^{[1]}=0,w_{2,2}^{[1]}=-1\\
&w_{0,3}^{[1]}=-4,w_{1,3}^{[1]}=1,w_{2,3}^{[1]}=1
\end{align*}

the decision boundaries of $h_1,h_2,h_3$ are set to $l_1,l_2,l_3$, respectively, and each hidden unit outputs 0 on the side of its line that faces the interior of the triangle. Note that $y=0$ if and only if $h_1=h_2=h_3=0$, so by setting

\begin{align*}
w_0^{[2]}=-0.1, w_1^{[2]}=w_2^{[2]}=w_3^{[2]}=1
\end{align*}

the output pre-activation is $-0.1$ when all hidden units are 0 and at least $0.9$ otherwise, so we should be able to achieve 100% accuracy.
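
The choice of weights above can be checked on synthetic points. This is a minimal sketch under stated assumptions: the step function is taken to be 1 for $z \ge 0$ and 0 otherwise, labels are generated from the triangle itself (0 inside, 1 outside) rather than read from the real dataset, and the helper names `step`, `predict`, and `label` are hypothetical. The dictionary keys follow the `hidden_layer_<input>_<unit>` convention of the assignment template.

```python
def step(z):
    # Assumed convention: 1 for z >= 0, 0 otherwise.
    return 1 if z >= 0 else 0

# Weights from the derivation above.
w = {
    'hidden_layer_0_1': 0.5, 'hidden_layer_1_1': -1, 'hidden_layer_2_1': 0,
    'hidden_layer_0_2': 0.5, 'hidden_layer_1_2': 0, 'hidden_layer_2_2': -1,
    'hidden_layer_0_3': -4, 'hidden_layer_1_3': 1, 'hidden_layer_2_3': 1,
    'output_layer_0': -0.1, 'output_layer_1': 1,
    'output_layer_2': 1, 'output_layer_3': 1,
}

def predict(w, x1, x2):
    # Hidden layer: one step-activated linear unit per boundary line.
    h = [step(w[f'hidden_layer_0_{j}']
              + w[f'hidden_layer_1_{j}'] * x1
              + w[f'hidden_layer_2_{j}'] * x2) for j in (1, 2, 3)]
    # Output layer: fires as soon as any hidden unit fires.
    z = w['output_layer_0'] + sum(w[f'output_layer_{j}'] * h[j - 1] for j in (1, 2, 3))
    return step(z)

def label(x1, x2):
    # Synthetic ground truth: class 0 strictly inside the triangle, class 1 outside.
    return 0 if (x1 > 0.5 and x2 > 0.5 and x1 + x2 < 4) else 1

# Grid offset by 1/16 so that no point lies exactly on a boundary line.
grid = [0.0625 + 0.25 * k for k in range(20)]
points = [(a, b) for a in grid for b in grid]
accuracy = sum(predict(w, a, b) == label(a, b) for a, b in points) / len(points)
print(accuracy)
```

Because the labels here are derived from the assumed triangle, this only confirms that the weights reproduce that region; it says nothing about noise or boundary points in the actual data.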

import json

def example_weights():
    """This is an example function that returns weights.
    Use this function as a template for optimal_step_weights and optimal_sigmoid_weights.
    You do not need to modify this function for this assignment.
    """
    w = {}

    w['hidden_layer_0_1'] = 0
    w['hidden_layer_1_1'] = 0
    w['hidden_layer_2_1'] = 0
    w['hidden_layer_0_2'] = 0
    w['hidden_layer_1_2'] = 0
    w['hidden_layer_2_2'] = 0
    w['hidden_layer_0_3'] = 0
    w['hidden_layer_1_3'] = 0
    w['hidden_layer_2_3'] = 0

    w['output_layer_0'] = 0
    w['output_layer_1'] = 0
    w['output_layer_2'] = 0
    w['output_layer_3'] = 0

    return w


def optimal_step_weights():
    """Return the optimal weights for the neural network with a step activation function.
    
    This function will not be graded if there are no optimal weights.
    See the PDF for instructions on what each weight represents.
    
    The hidden layer weights are notated by [1] on the problem set and 
    the output layer weights are notated by [2].

    This function should return a dict with elements for each weight, see example_weights above.

    """
    w = example_weights()

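    # *** START CODE HERE ***
    # Hidden unit 1: boundary l_1 (x1 = 0.5); outputs 0 on the interior side x1 > 0.5.
    w['hidden_layer_0_1'] = 0.5
    w['hidden_layer_1_1'] = -1
    w['hidden_layer_2_1'] = 0
    # Hidden unit 2: boundary l_2 (x2 = 0.5); outputs 0 on the interior side x2 > 0.5.
    w['hidden_layer_0_2'] = 0.5
    w['hidden_layer_1_2'] = 0
    w['hidden_layer_2_2'] = -1
    # Hidden unit 3: boundary l_3 (x1 + x2 = 4); outputs 0 on the interior side x1 + x2 < 4.
    w['hidden_layer_0_3'] = -4
    w['hidden_layer_1_3'] = 1
    w['hidden_layer_2_3'] = 1

    # Output unit: pre-activation is -0.1 only when all hidden units are 0,
    # so the output is 0 inside the triangle and 1 everywhere else.
    w['output_layer_0'] = -0.1
    w['output_layer_1'] = 1
    w['output_layer_2'] = 1
    w['output_layer_3'] = 1
    # *** END CODE HERE ***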

    return w

def optimal_linear_weights():
    """Return the optimal weights for the neural network with a linear activation function for the hidden units.
    
    This function will not be graded if there are no optimal weights.
    See the PDF for instructions on what each weight represents.
    
    The hidden layer weights are notated by [1] on the problem set and 
    the output layer weights are notated by [2].

    This function should return a dict with elements for each weight, see example_weights above.

    """
    w = example_weights()

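    # *** START CODE HERE ***
    # Intentionally left at the default weights: with linear hidden units the
    # network reduces to a single linear classifier, which cannot fit the
    # triangular decision boundary (see part (c)).
    # *** END CODE HERE ***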

    return w

if __name__ == "__main__":
    step_weights = optimal_step_weights()

    with open('output/step_weights', 'w') as f:
        json.dump(step_weights, f)

    linear_weights = optimal_linear_weights()

    with open('output/linear_weights', 'w') as f:
        json.dump(linear_weights, f)

### (c) Linear Activation Function

When the hidden activation function is linear, each hidden unit computes an affine function of the input, so $z^{[2]}$ is itself an affine function of $x_1$ and $x_2$. The output unit $o$, which still uses the step activation function, therefore acts as a single linear classifier. Thus, the neural network can only produce a linear decision boundary and cannot fit the triangular decision boundary of the dataset, so no set of weights achieves 100% accuracy.
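
The collapse to a single linear classifier can be illustrated numerically. The sketch below assumes the identity function as the linear hidden activation and uses a hypothetical list layout for the weights (`W1[j-1] = [w_{0,j}, w_{1,j}, w_{2,j}]`, `W2 = [w_0, w_1, w_2, w_3]`); the random weights are for illustration only.

```python
import random

def pre_activation(W1, W2, x1, x2):
    # Two-layer computation of z^{[2]} with identity hidden activation.
    h = [w[0] + w[1] * x1 + w[2] * x2 for w in W1]
    return W2[0] + sum(W2[j + 1] * h[j] for j in range(3))

def collapse(W1, W2):
    # Equivalent single affine map z^{[2]} = b + a1 * x1 + a2 * x2.
    b = W2[0] + sum(W2[j + 1] * W1[j][0] for j in range(3))
    a1 = sum(W2[j + 1] * W1[j][1] for j in range(3))
    a2 = sum(W2[j + 1] * W1[j][2] for j in range(3))
    return b, a1, a2

# Hypothetical random weights and test points.
random.seed(1)
W1 = [[random.uniform(-2, 2) for _ in range(3)] for _ in range(3)]
W2 = [random.uniform(-2, 2) for _ in range(4)]
b, a1, a2 = collapse(W1, W2)

max_diff = 0.0
for _ in range(100):
    x1, x2 = random.uniform(0, 5), random.uniform(0, 5)
    diff = abs(pre_activation(W1, W2, x1, x2) - (b + a1 * x1 + a2 * x2))
    max_diff = max(max_diff, diff)
print(max_diff)
```

Whatever weights are chosen, the output unit thresholds $b + a_1x_1 + a_2x_2$, so its decision boundary is the single line $b + a_1x_1 + a_2x_2 = 0$.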