<a href="https://colab.research.google.com/github/vicente-gonzalez-ruiz/neural_network_interpolation/blob/master/1D.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Neuronal-Network-(NN)" data-toc-modified-id="Neuronal-Network-(NN)-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Neuronal Network (NN)</a></span></li><li><span><a href="#Structure-description" data-toc-modified-id="Structure-description-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Structure description</a></span></li><li><span><a href="#Algorithms" data-toc-modified-id="Algorithms-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Algorithms</a></span><ul class="toc-item"><li><span><a href="#Forward-propagation-of-the-input-activation:-the-Feed-Forward-Algorithm-(FFA)" data-toc-modified-id="Forward-propagation-of-the-input-activation:-the-Feed-Forward-Algorithm-(FFA)-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>Forward propagation of the input activation: the Feed-Forward Algorithm (FFA)</a></span></li><li><span><a href="#Error-retropropagation:-the-Back-Propagation-Algorithm-(BPA)" data-toc-modified-id="Error-retropropagation:-the-Back-Propagation-Algorithm-(BPA)-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>Error retropropagation: the Back-Propagation Algorithm (BPA)</a></span></li><li><span><a href="#Computation-of-the-gradient-using-Back-propagation-(BPA)-of-the-prediction-error" data-toc-modified-id="Computation-of-the-gradient-using-Back-propagation-(BPA)-of-the-prediction-error-3.3"><span class="toc-item-num">3.3&nbsp;&nbsp;</span>Computation of the gradient using Back-propagation (BPA) of the prediction error</a></span></li></ul></li><li><span><a href="#Implementation" data-toc-modified-id="Implementation-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Implementation</a></span></li><li><span><a href="#Hyperparameters" data-toc-modified-id="Hyperparameters-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Hyperparameters</a></span></li></ul></div>

# What's a Neural Network (NN)
Basically, a NN is a collection of *neurons* that is able to learn *dependencies* between the inputs and the outputs of a system. Such dependencies are expressed as collections of activations (excitation levels) of the outputs of the neurons. From a purely mathematical point of view, a NN tries to approximate an unknown function $F(X)$, for all $X$.

In NNs the neurons are organized in *layers*. In this case we are working with *feed-forward NNs*, in which the information propagates from the input to the output of the network. In the direction of propagation, the first layer is called the *input layer* and the last one the *output layer*. The remaining layers are called *hidden layers*.

Between consecutive layers, the neurons are fully interconnected by means of a collection of weights. The "knowledge" learned by the NN is stored in these interconnections. Moreover, each neuron also has a special unconnected input (the bias) that, depending on its value, inhibits or excites the neuron.

The number of neurons in the input and output layers can be arbitrary and is defined by the problem we want to address. For example, in classification problems, the number of output neurons usually equals the number of classes or, at least, the output is encoded as a binary combination. However, in prediction problems, it can be enough to use only one neuron per dimension and quantify the output of each (output) neuron (notice that, typically, in the classification case, the output is simply thresholded). On the other hand, the number of hidden layers and the number of neurons per layer depend on the complexity of the dependencies to learn.

In order to avoid overflow and/or underflow, inputs and targets must be numbers in the open interval $(0.0, 1.0)$. The output of the NN will also lie in this interval.
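As a minimal sketch of how raw data could be mapped into this open interval, assume 8-bit samples in $[0, 255]$ (an assumption made only for illustration). The helper names `to_unit` and `from_unit` and the safety margin `eps` are hypothetical and do not appear in the implementation below.

```python
def to_unit(value, lo=0.0, hi=255.0, eps=0.01):
    """Map a value in [lo, hi] linearly into [eps, 1 - eps], which lies inside (0, 1)."""
    return eps + (1.0 - 2.0 * eps) * (value - lo) / (hi - lo)

def from_unit(u, lo=0.0, hi=255.0, eps=0.01):
    """Inverse of to_unit: map a network value back to the original range."""
    return lo + (hi - lo) * (u - eps) / (1.0 - 2.0 * eps)

for raw in (0, 130, 255):
    u = to_unit(raw)
    print(raw, u, from_unit(u))
```

The margin keeps the extreme samples strictly away from $0$ and $1$, values that a sigmoid output can only approach asymptotically.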



## Topology
Let:
* $l=1,\cdots,l_\text{out}$, the layer number. $1$ is the number of the input layer and $l_\text{out}$ the number of the output layer.

* $n^l$, the number of neurons of the $l$-th layer.

Consecutive layers are fully interconnected by weighted links. Let:

* $W^{l=2,\cdots,l_\text{out}}_{i,j}$ the *weight* that goes from the $j$-th neuron of the $(l-1)$-th layer to the $i$-th neuron of the $l$-th layer. Similarly, $B^l_j$ is the bias that feeds the $j$-th neuron of the $l$-th layer. Notice that the input layer has no weights associated with it.

## Forward propagation

Let:

* $A^l_{i=1, \cdots, n^l}$ the *activation* (the excitation level) of the $i$-th neuron of the $l$-th layer. In consequence, following this representation, $A^{l_\text{out}}$ would be the activation (usually a vector) of the output layer of the NN, and $A^1$ the input of the NN (the activation of the neurons of the input layer).

The forward propagation propagates the input activation towards the output layer:
1. Set $A^1=X$, where $X$ is an input of the NN.
2. For $l=2, \cdots, l_\text{out}$:
    1. $Z^l = W^l\cdot A^{l-1} + B^l$  /* Notice: $Z^l_i = \sum_j^{n^{l-1}}W^l_{i,j}A^{l-1}_j + B^l_i$ */
    2. $A^l = \sigma(Z^l)$ 

Where $\sigma$ is the activation function and $Z^l$ are the *weighted inputs* of the neurons of the $l$-th layer.
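The two steps above can be sketched with NumPy as follows. This is only an illustration: it assumes the logistic sigmoid as $\sigma$ (the function used in the Implementation section), and the layer sizes, the random parameters and the name `forward` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, assumed here as the activation function."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, B):
    """Return the activations [A^1, ..., A^{l_out}] for a column vector X."""
    A = [X]                                   # step 1: A^1 = X
    for W_l, B_l in zip(W, B):                # step 2: l = 2, ..., l_out
        Z_l = np.dot(W_l, A[-1]) + B_l        # Z^l = W^l . A^{l-1} + B^l
        A.append(sigmoid(Z_l))                # A^l = sigma(Z^l)
    return A

rng = np.random.default_rng(0)
sizes = [2, 3, 1]                             # n^1, n^2, n^3 (example sizes)
W = [rng.normal(size=(n_l, n_prev)) for n_prev, n_l in zip(sizes[:-1], sizes[1:])]
B = [rng.normal(size=(n_l, 1)) for n_l in sizes[1:]]
X = np.array([[0.2], [0.7]])
activations = forward(X, W, B)
print([a.shape for a in activations])
```

Each $W^l$ has shape $(n^l, n^{l-1})$, so the matrix product directly implements the sum over the neurons of the previous layer.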

## Training
A NN learns by minimizing the cost function
\begin{equation}
 c(W,B) := \frac{1}{n}\sum_{\{X\}}c_X(W,B)
\end{equation}
as a function of the weights and biases (the acquired "knowledge") of the NN, where $n$ is the number of training examples, and
\begin{equation}
 c_X(W,B) = \frac{1}{2}||F(X)-A^{l_\text{out}}||^2,
\end{equation}
where $||v||$ denotes the [magnitude](https://en.wikipedia.org/wiki/Magnitude_(mathematics)) (or length) of the vector $v$ in Euclidean space, and $\{X\}$ is the set of training inputs.

As we can see, $c$ is a scaled version of the [MSE](https://en.wikipedia.org/wiki/Mean_squared_error), which allows us to deal with positive and negative prediction errors alike. Notice that $c$ must be differentiable (and therefore continuous) with respect to the weights and biases in order to be minimized by following its gradient.
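As a small numerical illustration of this cost, the following sketch evaluates $c$ for two made-up training examples. The function name `cost` and the numbers are hypothetical; targets and outputs are stored as column vectors, as in the rest of this notebook.

```python
import numpy as np

def cost(targets, outputs):
    """c = (1/n) * sum over the examples of 0.5 * ||F(X) - A^{l_out}||^2."""
    per_example = [0.5 * np.linalg.norm(t - a) ** 2 for t, a in zip(targets, outputs)]
    return sum(per_example) / len(per_example)

targets = [np.array([[0.5]]), np.array([[0.9]])]   # F(X) for two inputs
outputs = [np.array([[0.4]]), np.array([[0.7]])]   # A^{l_out} for the same inputs
print(cost(targets, outputs))
```

Here the per-example costs are $0.5\cdot 0.1^2$ and $0.5\cdot 0.2^2$, so their average is $0.0125$ (up to floating-point rounding).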

Due to the high number of weights and biases, Gradient Descent is used for minimizing $c$. This means that we must compute
\begin{equation}
 \nabla c = \frac{1}{\text{size}\{X\}}\sum_{\{X\}}\nabla c_X,
\end{equation}
where
\begin{equation}
  \nabla c_X =
  \begin{bmatrix}
  \begin{bmatrix}
    \frac{\partial c_X}{\partial W_{1,1}^2} & \frac{\partial c_X}{\partial W_{1,2}^2} & \cdots & \frac{\partial c_X}{\partial W_{1,n^1}^2} \\
    \frac{\partial c_X}{\partial W_{2,1}^2} & \frac{\partial c_X}{\partial W_{2,2}^2} & \cdots & \frac{\partial c_X}{\partial W_{2,n^1}^2} \\
    \vdots & \vdots & & \vdots \\
    \frac{\partial c_X}{\partial W_{n^2,1}^2} & \frac{\partial c_X}{\partial W_{n^2,2}^2} & \cdots & \frac{\partial c_X}{\partial W_{n^2,n^1}^2}
  \end{bmatrix},\cdots,
  \begin{bmatrix}
    \frac{\partial c_X}{\partial W_{1,1}^{l_\text{out}}} & \frac{\partial c_X}{\partial W_{1,2}^{l_\text{out}}} & \cdots & \frac{\partial c_X}{\partial W_{1,n^{l_\text{out}-1}}^{l_\text{out}}} \\
    \frac{\partial c_X}{\partial W_{2,1}^{l_\text{out}}} & \frac{\partial c_X}{\partial W_{2,2}^{l_\text{out}}} & \cdots & \frac{\partial c_X}{\partial W_{2,n^{l_\text{out}-1}}^{l_\text{out}}} \\
    \vdots & \vdots & & \vdots \\
    \frac{\partial c_X}{\partial W_{n^{l_\text{out}},1}^{l_\text{out}}} & \frac{\partial c_X}{\partial W_{n^{l_\text{out}},2}^{l_\text{out}}} & \cdots & \frac{\partial c_X}{\partial W_{n^{l_\text{out}},n^{l_\text{out}-1}}^{l_\text{out}}}
  \end{bmatrix},\\
    \begin{bmatrix}
     \frac{\partial c_X}{\partial B_1^2} \\
     \frac{\partial c_X}{\partial B_2^2} \\
     \vdots \\
    \frac{\partial c_X}{\partial B_{n^2}^2} 
  \end{bmatrix},\cdots,
  \begin{bmatrix}
    \frac{\partial c_X}{\partial B_1^{l_\text{out}}} \\
    \frac{\partial c_X}{\partial B_2^{l_\text{out}}} \\
    \vdots \\
    \frac{\partial c_X}{\partial B_{n^{l_\text{out}}}^{l_\text{out}}}
  \end{bmatrix}
  \end{bmatrix}.
\end{equation}

$\nabla c_X$ is a list of matrices (weights) and column vectors (biases) that expresses how quickly the cost changes when we modify the weights and the biases.

[Note that we have supposed that $n^2=n^3=\cdots=n^{l_\text{out}}$. If this is not true, there will be $0$-entries in the gradient matrix.]



## Back-Propagation
To compute $\nabla c_X$, we need to know how $c_X$ changes with respect to $W$ and $B$ for a given input $X$. Notice that, to know how $c$ changes for the complete training set $\{X\}$ (which, of course, can be a [minibatch](https://machinelearningmastery.com/gentle-introduction-mini-batch-gradient-descent-configure-batch-size/)), we will compute the average of the gradients.

Let's start with the weights (the computation of the gradient for the biases can be derived from the expression that we are going to obtain, considering that the activations connected to such biases are $[1\ 1\ \cdots\ 1]^T$).

\begin{equation}
  \frac{\partial c_X}{\partial W^{l_\text{out}}_{k,j}} =
  \frac{\partial}{\partial W^{l_\text{out}}_{k,j}} \sum_i^{n^{l_\text{out}}}
    \frac{1}{2}(F(X)_i - A_i^{l_\text{out}})^2 =
  -(F(X)_k - A_k^{l_\text{out}}) \frac{\partial A_k^{l_\text{out}}}{\partial W^{l_\text{out}}_{k,j}}
\end{equation}

\begin{equation}
  \frac{\partial A_k^{l_\text{out}}}{\partial W^{l_\text{out}}_{k,j}} =
  \frac{\partial}{\partial W^{l_\text{out}}_{k,j}}\sigma(Z_k^{l_\text{out}}) =
  \sigma(Z_k^{l_\text{out}})(1-\sigma(Z_k^{l_\text{out}}))\frac{\partial Z_k^{l_\text{out}}}{\partial W^{l_\text{out}}_{k,j}},
\end{equation}
where, in general,
\begin{equation}
  Z_k^l = \sum_j^{n^{l-1}}W_{k,j}^lA_j^{l-1} + B_k^l.
\end{equation}

Finally,
\begin{equation}
  \frac{\partial Z_k^{l_\text{out}}}{\partial W^{l_\text{out}}_{k,j}} = A_j^{l_\text{out}-1}.
\end{equation}

Therefore,
\begin{equation}
  \frac{\partial c_X}{\partial W^{l_\text{out}}_{k,j}} =
  -(F(X)_k-A_k^{l_\text{out}})\sigma(Z_k^{l_\text{out}})(1-\sigma(Z_k^{l_\text{out}}))A_j^{l_\text{out}-1}.
\end{equation}
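One way to gain confidence in this expression is to compare it with a finite-difference approximation of the derivative. The sketch below does this for a single output layer, assuming a sigmoid activation; the sizes, the random values and all variable names are hypothetical. The analytic gradient multiplies the error term of the $k$-th output neuron by the activation of the $j$-th neuron of the previous layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
A_prev = rng.uniform(0.1, 0.9, size=(3, 1))   # A^{l_out - 1}
W = rng.normal(size=(2, 3))                   # W^{l_out}
B = rng.normal(size=(2, 1))                   # B^{l_out}
F = rng.uniform(0.1, 0.9, size=(2, 1))        # target F(X)

def c_X(W_out):
    """Per-example cost as a function of the output-layer weights."""
    A_out = sigmoid(np.dot(W_out, A_prev) + B)
    return 0.5 * np.sum((F - A_out) ** 2)

# Analytic gradient: -(F_k - A_k) * sigma(Z_k) * (1 - sigma(Z_k)) * A_prev_j
A_out = sigmoid(np.dot(W, A_prev) + B)
analytic = np.dot(-(F - A_out) * A_out * (1.0 - A_out), A_prev.T)

# Central finite differences, one weight at a time
h = 1e-6
numeric = np.zeros_like(W)
for k in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus = W.copy()
        W_plus[k, j] += h
        W_minus = W.copy()
        W_minus[k, j] -= h
        numeric[k, j] = (c_X(W_plus) - c_X(W_minus)) / (2.0 * h)

print(np.max(np.abs(analytic - numeric)))     # should be close to zero
```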

Now, let's call
\begin{equation}
  -(F(X)_k-A_k^{l_\text{out}})\sigma(Z_k^{l_\text{out}})(1-\sigma(Z_k^{l_\text{out}})) =
  E_k^{l_\text{out}}.
\end{equation}

As we can see, it also holds that
\begin{equation}
  E_k^{l_\text{out}} = \frac{\partial c_X}{\partial A_k^{l_\text{out}}}\sigma'(Z_k^{l_\text{out}}),
\end{equation}
which for a hidden layer $l$ becomes
\begin{equation}
  E_k^l = \frac{\partial c_X}{\partial A_k^l}\sigma'(Z_k^l),
\end{equation}
where
\begin{equation}
  \frac{\partial c_X}{\partial A_k^l} =
  \sum_i^{n^{l+1}} E_i^{l+1}W_{i,k}^{l+1}
\end{equation}
expresses how the cost changes with respect to a change in the activation of the $k$-th neuron of the layer $l$. Substituting, we get that
\begin{equation}
  E_k^l = \sigma'(Z_k^l)\sum_i^{n^{l+1}} E_i^{l+1}W_{i,k}^{l+1}.
\end{equation}

So, in general, for the hidden layers (as for the output layer) we have that
\begin{equation}
  \frac{\partial c_X}{\partial W_{k,j}^l} = E_k^l A_j^{l-1}.
\end{equation}

To get the gradient with respect to the biases, we simply compute
\begin{equation}
  \frac{\partial c_X}{\partial B_k^l} = E_k^l.
\end{equation}

<--------->

Let's define the (prediction) error at the $j$-th neuron of the output layer as
\begin{equation}
 \delta^{l_\text{out}}_j := \frac{\partial c_X}{\partial Z^{l_\text{out}}_j}.
\end{equation}
Substituting, we get
\begin{equation}
\delta^{l_\text{out}}_j =
\frac{\partial(\frac{1}{2}||F(X)-A^{l_\text{out}}||^2)}{\partial Z^{l_\text{out}}_j} =
\frac{\partial(\frac{1}{2}||F(X)-\sigma(Z^{l_\text{out}})||^2)}{\partial Z^{l_\text{out}}_j} =
-(F(X)_j-\sigma(Z^{l_\text{out}}_j))\sigma'(Z^{l_\text{out}}_j) =
\frac{\partial c_X}{\partial A^{l_\text{out}}_j}\sigma'(Z^{l_\text{out}}_j),
\end{equation}
which is the error at the output layer. Now, we back-propagate the error.

For a given layer $l$, the previous expression can be rewritten as
\begin{equation}
\delta^l_j = \frac{\partial c_X}{\partial A^l_j}\sigma'(Z^l_j),
\end{equation}
where
\begin{equation}
\frac{\partial c_X}{\partial A^l_j} = \sum_k^{n^{l+1}} W^{l+1}_{k,j}\delta^{l+1}_k
\end{equation}
expresses how the cost changes with respect to a change in the activation of the $j$-th neuron of the layer $l$. Substituting this in the previous equation, we get
\begin{equation}
\delta^l_j = \sigma'(Z^l_j)\sum_k^{n^{l+1}} W^{l+1}_{k,j}\delta^{l+1}_k.
\end{equation}

Therefore, BP retro-propagates the error towards the input layer:
1. Set $\delta^{l_\text{out}} = \nabla_{A^{l_\text{out}}}c_X\odot\sigma'(Z^{l_\text{out}})$
2. For $l=l_\text{out},\cdots, 3$:
    1. $\delta^{l-1} = ((W^l)^T\delta^l)\odot\sigma'(Z^{l-1})$
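The two steps above can be sketched as follows, assuming the sigmoid activation and the quadratic cost used in this notebook (for which $\nabla_{A^{l_\text{out}}}c_X = A^{l_\text{out}} - F(X)$). The function name `backpropagate_errors`, the layer sizes and the random parameters are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backpropagate_errors(X, target, W, B):
    """Return [delta^2, ..., delta^{l_out}] for one training pair (X, target)."""
    A, Zs = X, []
    for W_l, B_l in zip(W, B):                # forward pass, storing every Z^l
        Zs.append(np.dot(W_l, A) + B_l)
        A = sigmoid(Zs[-1])
    # Step 1: error at the output layer (elementwise product)
    delta = (A - target) * sigmoid_prime(Zs[-1])
    deltas = [delta]
    # Step 2: delta^{l-1} = ((W^l)^T delta^l) * sigma'(Z^{l-1}), for l = l_out, ..., 3
    for W_l, Z_prev in zip(reversed(W[1:]), reversed(Zs[:-1])):
        delta = np.dot(W_l.T, delta) * sigmoid_prime(Z_prev)
        deltas.insert(0, delta)
    return deltas

rng = np.random.default_rng(2)
sizes = [2, 3, 1]
W = [rng.normal(size=(n_l, n_prev)) for n_prev, n_l in zip(sizes[:-1], sizes[1:])]
B = [rng.normal(size=(n_l, 1)) for n_l in sizes[1:]]
X = np.array([[0.2], [0.7]])
target = np.array([[0.5]])
deltas = backpropagate_errors(X, target, W, B)
print([d.shape for d in deltas])              # one error vector per non-input layer
```

The list `W` holds $W^2,\cdots,W^{l_\text{out}}$, so `W[1:]` reversed visits $W^{l_\text{out}},\cdots,W^3$, matching the loop bounds of step 2.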

## Gradients as a function of the errors
$\delta^l_j$ represents the error at the neuron $j$ of the layer $l$.
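Following the expressions obtained in the Back-Propagation section (the gradient with respect to a weight is the error of the destination neuron times the activation of the source neuron, and the gradient with respect to a bias is the error itself), the gradients could be assembled from the errors as sketched below. The function name `gradients_from_errors` and the numeric values are hypothetical.

```python
import numpy as np

def gradients_from_errors(deltas, activations):
    """deltas = [delta^2, ..., delta^{l_out}], activations = [A^1, ..., A^{l_out}].

    Returns (nabla_W, nabla_B), where nabla_W[i] = delta^l (A^{l-1})^T and
    nabla_B[i] = delta^l, with l = i + 2.
    """
    nabla_B = [d.copy() for d in deltas]
    nabla_W = [np.dot(d, a_prev.T) for d, a_prev in zip(deltas, activations[:-1])]
    return nabla_W, nabla_B

# Made-up values for a 2-3-1 network
activations = [np.full((2, 1), 0.5), np.full((3, 1), 0.25), np.full((1, 1), 0.75)]
deltas = [np.full((3, 1), 0.1), np.full((1, 1), -0.2)]
nabla_W, nabla_B = gradients_from_errors(deltas, activations)
print([g.shape for g in nabla_W])
```

The outer product gives each `nabla_W[i]` the same shape as the corresponding weight matrix, which is what a layer-by-layer update needs.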

# Objective
The objective of the NN is to minimize
$$
c(W,B) := \frac{1}{2t}\sum_{i=1}^t||E^L(A^1[i])||^2,
$$
an *objective* function (also called *cost* or *loss* function), as a function of the weights and biases of the NN. As we can see, $c$ is basically the [MSE](https://en.wikipedia.org/wiki/Mean_squared_error), where $t$ is the number of training examples and $E^L(A^1[i])$ is the prediction error at the output layer $L$ for the $i$-th training input $A^1[i]$.

Therefore, writing $c[i] := \frac{1}{2}||E^L(A^1[i])||^2$ for the cost of the $i$-th training example,
\begin{equation}
\nabla c = \frac{1}{t}\sum_{i=1}^t\nabla c[i],
\end{equation}
that is, to compute $\nabla c$ we need to compute $\nabla c[i]$ for every training example.




# Algorithms



## Prediction error retropropagation: the Back-Propagation Algorithm (BPA)

The BPA estimates the prediction error in the internal neurons by back-propagating $E^L$ from the output to the input of the NN (in the rest of this section we ignore that this error changes with the feature vector, because the algorithm does not depend on it):
1. For $l=L,\cdots, 3$:
    1. $E^{l-1}_i = \sum_j^{n^l}W^l_{ji}E^l_j$ (recall that $W^l_{ji}$ goes from the $i$-th neuron of the layer $l-1$ to the $j$-th neuron of the layer $l$)

## Weights and bias optimization using a Gradient Descent Algorithm (GDA)

Given an input feature vector, to minimize the prediction errors $E$ we use a GDA, which is based on computing the gradient of the prediction error with respect to each weight and bias of the NN. For that, we must find the partial derivative
$$
 \frac{\partial E^L}{\partial W^l_{ij}},
$$
and move $W^l_{ij}$ in the opposite direction, iteratively. Notice that this minimization should be performed considering the total cost for all the training inputs.
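The idea of repeatedly moving a parameter against its derivative can be illustrated with a one-dimensional toy problem. This is only a sketch: the function $f(w) = (w-3)^2$, the learning rate and the name `gradient_descent` are hypothetical and unrelated to the NN itself.

```python
def gradient_descent(gradient, w0, learning_rate=0.1, steps=100):
    """Iteratively move w in the direction opposite to the gradient."""
    w = w0
    for _ in range(steps):
        w -= learning_rate * gradient(w)
    return w

# f(w) = (w - 3)^2 has derivative 2 (w - 3) and its minimum at w = 3
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```

In the NN the same update is applied to every weight and bias, with the derivative supplied by back-propagation.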

BPA computes the derivative of the prediction error at each neuron of the NN. For the output layer, with $E^L=\sum_k^{n^L}(T_k-A^L_k)^2$, and recalling from the Topology section that $W^L_{ij}$ only feeds the $i$-th output neuron, we get

$$
\frac{\partial E^L}{\partial W^L_{ij}} =
\frac{\partial}{\partial W^L_{ij}} \sum_k^{n^L}(T_k-A^L_k)^2 = [\text{by the topology of the NN}] =
\frac{\partial}{\partial W^L_{ij}}(T_i-A^L_i)^2 =
2(A^L_i-T_i)\frac{\partial A^L_i}{\partial W^L_{ij}}
$$


## Computation of the gradient using Back-propagation (BPA) of the prediction error

* $E^{l=2,\cdots,l_\text{out}}_i$ the prediction error of the $i$-th neuron of the $l$-th layer (logically, the input layer cannot be erroneous). In order to deal with negative and positive errors, we work with a squared error.

The prediction error at the output layer is
\begin{equation}
  E^L[A^1] := \frac{1}{2}(T[A^1]-A^L[A^1])^2,
\end{equation}
where the pair $(A^1, T[A^1])$ constitutes a training example. $T[A^1]$ is the target (ideal output) of the NN associated with the input $A^1$ (also called *feature vector*). The errors in (the activation of) the internal neurons $E^{l=2,\cdots,L-1}_i$ will be estimated with the Back-Propagation Algorithm (BPA).

The BPA determines the computation of the error at the internal nodes of the NN:

1. Set $E^L$ to the prediction error at the output layer.
2. For $l=L-1,\cdots,2$:
    1. $E^l_i = \sum_j^{n^{l+1}} E^{l+1}_j W^{l+1}_{ji}$

# Definitions





* $e_i^{L-1}=t_i-a_i^{L-1}$, the error of the $i$-th neuron of the output layer (layer $L-1$), where $t_i$ is the target value for the $i$-th component of the desired output $y$. Notice that $e^{L-1}$ is the error of the network.
* For convenience, $e^{L-1}$ is not minimized directly; the cost function $c=\sum_i e_i^2$ is minimized instead.

Activation function!

# Implementation
Based on http://neuralnetworksanddeeplearning.com

In [0]:
import numpy as np
#import ipdb

def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_derivative(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))

class Network:

    def __init__(self, sizes=[2, 3, 1], learning_rate=1, initial_biases=-1):
        self.num_layers = len(sizes)
        self.sizes = sizes
        
        if initial_biases == -1:
            if __debug__:
                print("Randomizing biases using uniform [0.01, 0.99]")
            self.B = [np.random.uniform(low=0.01, high=0.99, size=(y, 1)) for y in sizes[1:]]
        else:
            if __debug__:
                print(f"All biases initialized to {initial_biases}")
            assert initial_biases >= 0.0
            self.B = [np.full((n_l, 1), initial_biases) for n_l in sizes[1:]]
            
        self.W = [np.random.normal(loc=0.0, scale=pow(n_l, -0.5), size=(n_l, n_l_1))
                  for n_l_1, n_l in zip(sizes[:-1], sizes[1:])]
        # The weights are used as drawn above (zero-mean Gaussian); they are not rescaled.

        self.LR = learning_rate

    def feed_forward(self, a):
        '''Propagates an activation ``a`` from the input to the output of the network.'''
        for b, w in zip(self.B, self.W):
            print(w.shape, a.shape)
            a = sigmoid(np.dot(w, a) + b)
        return a
    
    def learn(self, x, t):
        '''A NN learns when its weights and biases are modified to minimize the cost function.
        To teach a NN, at least one tuple ``(x, t)`` must be presented
        to the NN, where ``x`` is an input training example (also called a feature vector)
        and ``t`` is the associated (ideal) output (target) that the NN should learn.
        
        To modify the weights and the biases, Gradient Descent Optimization (GDO) is used.
        '''
        
        # Find the gradient for each weight and bias of the NN
        nabla_b, nabla_w = self.get_gradients(x, t)
        
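        # len(x) is the number of rows of x (the number of input neurons), so it
        # acts as a constant scaling of the learning rate.
        # w_ij^l -= \alpha/len(x)\nabla_w_ij^l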
        self.W = [w - (self.LR/len(x))*nw for w, nw in zip(self.W, nabla_w)]
        
        # b_i^l -= \alpha/len(x)\nabla_b_i^l
        self.B = [b - (self.LR/len(x))*nb for b, nb in zip(self.B, nabla_b)]

    def cost_derivative(self, A_at_L, target):
        """Derivative of the per-example cost 1/2*(target - A_at_L)^2 with
        respect to the output activation ``A_at_L``, i.e., (A_at_L - target)."""
        return (A_at_L - target)

    def get_gradients(self, _in, ideal_out):
        """Given an input ``_in`` and an ideal output ``ideal_out``,
        back-propagate the errors from the output to the first hidden layer
        and compute the gradient of the cost function C_x.

        Return a tuple ``(nabla_b, nabla_w)``. ``nabla_b`` and
        ``nabla_w`` are layer-by-layer lists of numpy arrays, similar
        to ``self.B`` and ``self.W``. The weights and biases are not
        modified here (see ``learn``)."""
        #ipdb.set_trace() # <-------------------------- breakpoint
        
        # Returns the gradient of the cost function C(x) respect to the
        # biases (nabla_b) and weights (nabla_w).
        nabla_b = [np.zeros(b.shape) for b in self.B]
        nabla_w = [np.zeros(w.shape) for w in self.W]
        #print(len(nabla_b), len(nabla_w))

        # Forward pass. We compute two lists: activations and zs,
        # with the activations and the z's of the neurons of the network.
        A = _in
        As = [_in] # list to store all the activations, layer by layer
        #print(_in)
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.B, self.W):
            #for i in activations:
            #    print(i.shape)
            z = np.dot(w, A) + b
            zs.append(z)
            A = sigmoid(z)
            As.append(A)

        # Backward pass. This is the backpropagation algorithm.
        
        # Derivative of the error of the cost function at the output (L-1) layer.
        delta = self.cost_derivative(As[-1], ideal_out) * sigmoid_derivative(zs[-1])
        
        # The gradient of the cost function respect to the biases at the output layer
        # is the calculus performed in the last sentence.
        nabla_b[-1] = delta
        
        # The gradient of the cost function respect to the weights at the output
        # is the previous derivative multiplied by the activations of the L-2 layer.
        #print(delta.shape, As[-2].transpose().shape)
        nabla_w[-1] = np.dot(delta, As[-2].transpose())

        # Now, retropropagate the error of the cost function to the rest of the 
        # layers (starting at L-2) up to the first one (layer 0), computing the gradient.
        for l in range(2, self.num_layers):
            z = zs[-l] # Negative indexes go backwards in the list
            sp = sigmoid_derivative(z)
            delta = np.dot(self.W[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, As[-l-1].transpose())

        return (nabla_b, nabla_w)

# Hyperparameters

* Input layer: $s_0$ and $s_2$, samples of $s_0, s_1, s_2, \cdots$. 
* Output layer: $\hat{s}_1$, a prediction.
* Initial prediction:
$$
\hat{s}_1 = \frac{s_0 + s_2}{2}.
$$
* $L$ layers and the number of neurons per layer.
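As an illustration of this setup, the sketch below builds training pairs from a sampled signal and evaluates the initial (averaging) prediction. It rests on an assumption that the text does not state: every sample $s_{i+1}$ is predicted from its two neighbors $s_i$ and $s_{i+2}$. The function names `make_training_pairs` and `linear_prediction` and the signal values are hypothetical; only the first triplet $(10, 130, 50)$ matches the example trained in the cells below.

```python
import numpy as np

def make_training_pairs(s):
    """Build (input, target) pairs: input is the column (s[i], s[i+2]), target is s[i+1]."""
    s = np.asarray(s, dtype=float)
    inputs = [np.array([[s[i]], [s[i + 2]]]) for i in range(len(s) - 2)]
    targets = [np.array([[s[i + 1]]]) for i in range(len(s) - 2)]
    return inputs, targets

def linear_prediction(x):
    """Initial prediction: the average of the two neighboring samples."""
    return 0.5 * (x[0, 0] + x[1, 0])

signal = np.array([10, 130, 50, 90, 70]) / 255    # made-up samples, scaled by 255
inputs, targets = make_training_pairs(signal)
print(len(inputs), linear_prediction(inputs[0]) * 255, targets[0][0, 0] * 255)
```

For the first pair the averaging prediction gives about $30$ while the target is $130$, which is the kind of gap the NN is expected to reduce.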

In [2]:
net = Network(sizes=[2, 16, 16, 1], initial_biases=-1)
for i in range(2000):
    net.learn(np.array([[10/255],[50/255]]), np.array([[130/255]]))
print(net.feed_forward(np.array([[10/255],[50/255]])) * 255)

Randomizing biases using uniform [0.01, 0.99]
(16, 2) (2, 1)
(16, 16) (16, 1)
(1, 16) (16, 1)
[[130.]]


In [3]:
net = Network(sizes=[2, 16, 16, 1], initial_biases=0)
for i in range(2000):
    net.learn(np.array([[10/255],[50/255]]), np.array([[130/255]]))
print(net.feed_forward(np.array([[10/255],[50/255]])) * 255)

All biases initialized to 0
(16, 2) (2, 1)
(16, 16) (16, 1)
(1, 16) (16, 1)
[[130.]]


In [4]:
x = np.random.randint(low=0, high=100, size=(2,3))
x

array([[37, 44, 42],
       [38, 22,  3]])

In [0]:
x[1,0]

29