In this notebook we will look at another type of single-layer neural network called the Adaptive Linear Neuron, or Adaline for short. Adaline differs from the perceptron we built before in that the weights are updated based on a linear activation function rather than a unit step function. From the book I can tell that this is going to be an important concept to grasp, as it is the gateway to a lot of other more advanced algorithms that support more than just binary classification.

For supervised learning, an <i>objective function</i> is extremely important, as it is the function that we want the program to optimize during training; it is usually a <i>cost function</i> that we want to minimize. For Adaline, the cost function $J(w)$ is the sum of squared errors between the true labels and the outputs: $$J(w)=\frac{1}{2}\sum_{i}(y^{i}-\phi(z^{i}))^{2}$$The term $\frac{1}{2}$ will be explained later on. The main advantage of the linear activation over the step function is that it makes this cost function differentiable. The cost is also quadratic in the weights and therefore convex, so we can use <i>gradient descent</i> to find the weights at the lowest point of its graph.
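To make the formula a bit more concrete, here is a minimal sketch of how $J(w)$ could be computed with NumPy. The toy data, the weight values, and the helper names `net_input` and `cost` are made up for illustration, and I am assuming the activation is simply the identity, $\phi(z)=z$, with `w[0]` acting as the bias term.

```python
import numpy as np

# Toy data, made up purely for illustration: 3 samples, 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0]])
y = np.array([1.0, -1.0, 1.0])

# Hypothetical weights; w[0] plays the role of the bias term.
w = np.array([0.0, 0.5, -0.5])

def net_input(X, w):
    """Compute z = w_0 + w_1*x_1 + ... + w_m*x_m for every sample."""
    return X @ w[1:] + w[0]

def cost(X, y, w):
    """Sum of squared errors: J(w) = 1/2 * sum_i (y_i - phi(z_i))^2."""
    output = net_input(X, w)   # assumption: phi is the identity, phi(z) = z
    errors = y - output
    return 0.5 * np.sum(errors ** 2)

print(cost(X, y, w))
```

With these numbers the net inputs are $-0.5$, $0.5$ and $0$, so the errors are $1.5$, $-1.5$ and $1$, and the cost comes out to $\frac{1}{2}(2.25+2.25+1)=2.75$.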

To update our weights, we first take the partial derivative of $J(w)$ with respect to each weight $w_j$:$$\frac{\partial J}{\partial w_j}=-\sum_i(y^i-\phi(z^i))x^i_j$$To be quite honest, this is where I am a bit lost, as there are too many concepts for me to grasp at once. I know that we are minimizing our cost function, and that somewhere along the line we are doing gradient descent to find the lowest point on its graph. How do they relate to each other? Is it that gradient descent determines the weight $w_j$ for each corresponding $x_j$, and those weights are then what minimize the function?
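One way to picture how these pieces might fit together: the cost $J(w)$ depends only on the weights, the partial derivatives tell us which way the cost increases for each $w_j$, and gradient descent repeatedly nudges every weight a small step in the opposite direction, $w_j := w_j + \eta\sum_i(y^i-\phi(z^i))x^i_j$. The sketch below tries this on made-up data. The learning rate `eta`, the number of passes `n_iter`, and the identity activation $\phi(z)=z$ are assumptions chosen for the illustration, not a definitive implementation.

```python
import numpy as np

# Toy data, made up purely for illustration: 4 samples, 2 features.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0],
              [0.5, 0.2]])
y = np.array([1.0, -1.0, 1.0, -1.0])

eta = 0.01                      # learning rate (assumed small value)
n_iter = 20                     # passes over the data (assumed)
w = np.zeros(X.shape[1] + 1)    # w[0] is the bias, w[1:] the feature weights
costs = []

for _ in range(n_iter):
    output = X @ w[1:] + w[0]   # assumption: phi(z) = z
    errors = y - output
    # Record J(w) for the weights used in this pass.
    costs.append(0.5 * np.sum(errors ** 2))
    # dJ/dw_j = -sum_i errors_i * x_ij, so stepping against the gradient
    # means adding eta * sum_i errors_i * x_ij to each weight.
    w[1:] += eta * X.T @ errors
    w[0] += eta * errors.sum()

print(costs[0], costs[-1])
```

With a learning rate this small, each entry in `costs` is lower than the one before, which is the "walking downhill" behaviour gradient descent is meant to produce. A learning rate that is too large could overshoot the minimum and make the cost grow instead.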

So without fully grasping the concept of an Adaline neuron, I still wrote out the code for one, and here it is below:

In [None]:
class AdalineGD(object):
    """ADAptive LInear NEuron classifier.
    
    Parameters
    -----------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.
        
    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting
    errors_ : list
        Numbers of misclassifications in every epoch
        
    """
    def __init__(self, eta=0.01, n_iter=50):
        self.eta = eta
        self.n_iter = n_iter
        
    def fit(self, X, y):
        """ Fit training data.
        
        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors,
            where n_samples is the number of samples and
            n_features is the number of features
        y : array-like, shape = [n_samples]
            Target values
            
        Returns
        -------
        self : object
        
        """
        