# **3 Logistic Regression and Kernels (Ed Tam)**
**3.1 RKHS**

The representer theorem says that, for this problem, a minimizer $f_{\theta}$ of the regularized risk can
be written as

$$f_{\theta}=\sum_{i=1}^n\alpha_ik(\cdot,x_i).$$

In our case the regularizer is the plain inner product $\theta^T\theta$, so the kernel is linear, $k(x,x')=x^Tx'$,
and the parameter vector of our classifier can be written as

$$\theta=\sum_{i=1}^n\alpha_ix_i,$$

where the $\alpha_i$ are real weights and the $x_i$ are the training points.
Equivalently, as a function of $x$:

$$f_{\theta}(x)=\sum_{i=1}^n\alpha_ix^Tx_i.$$
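
The equivalence of the two representations can be checked numerically. The following is a minimal sketch with hypothetical toy data (the points `X`, the weights `alpha`, and the helper names are illustrative assumptions, not part of the notes). It evaluates the classifier once through the explicit vector $\sum_i\alpha_ix_i$ and once through the inner products $x^Tx_i$.

```python
# Hypothetical toy data: three training points in R^2 and arbitrary real weights.
X = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]]
alpha = [0.3, -0.7, 1.1]


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def theta_from_alpha(alpha, X):
    """Explicit parameter vector theta = sum_i alpha_i x_i."""
    d = len(X[0])
    return [sum(a * x[j] for a, x in zip(alpha, X)) for j in range(d)]


def f_primal(x, theta):
    """Evaluate the classifier as theta^T x."""
    return dot(theta, x)


def f_kernel(x, alpha, X):
    """Evaluate the classifier as sum_i alpha_i x^T x_i (linear kernel)."""
    return sum(a * dot(x, xi) for a, xi in zip(alpha, X))


theta = theta_from_alpha(alpha, X)
x_new = [0.4, 1.5]
print(f_primal(x_new, theta), f_kernel(x_new, alpha, X))
```

The kernel form never builds $\theta$ explicitly, which is what allows $x^Tx_i$ to be replaced by a general kernel $k(x,x_i)$.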

**3.2 L2 Regularization**

We use the logistic loss

$$g(\zeta)=\ln(1+\exp(-\zeta)).$$

Primal problem:

$$\min_{\mathbf{w}, \zeta}\left[ \frac{1}{2}\|w \|^2 + C\sum_{i=1}^n g(\zeta_i)\right]$$

subject to

$$y_i(w^T\mathbf{x}_i) \geq \zeta_i, \quad \forall i.$$

Introducing Lagrange multipliers $\alpha_i\ge0$ for the constraints $\zeta_i-y_i(w^Tx_i)\le0$, we can rewrite the primal problem as follows:

$$\min_{\mathbf{w}, \zeta}\left(\max_{\mathbf{\alpha}; \alpha_i\ge0,\forall i}\left[ \frac{1}{2}\|w \|^2 + C\sum_{i=1}^n g(\zeta_i)+\sum_{i=1}^n\alpha_i(\zeta_i-y_i(w^Tx_i))\right]\right).$$

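The corresponding dual problem swaps the order of the minimization and the maximization:

$$\max_{\mathbf{\alpha}; \alpha_i\ge0,\forall i}\left(\min_{\mathbf{w}, \zeta}\left[ \frac{1}{2}\|w \|^2 + C\sum_{i=1}^n g(\zeta_i)+\sum_{i=1}^n\alpha_i(\zeta_i-y_i(w^Tx_i))\right]\right).$$

The bracketed expression separates into a part that depends only on $w$ and a part that depends only on $\zeta$, so the inner minimization splits into two independent problems: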

\begin{eqnarray}
\max_{\mathbf{\alpha}; \alpha_i\ge0,\forall i}\left(\min_{\mathbf{w}, \zeta}\left[\frac{1}{2}\|w \|^2 + C\sum_{i=1}^n g(\zeta_i)+\sum_{i=1}^n\alpha_i(\zeta_i-y_i(w^Tx_i))\right]\right)=\\
=\max_{\mathbf{\alpha}; \alpha_i\ge0,\forall i}\left(\min_{\mathbf{w}}\left[\frac{1}{2}\|w \|^2-\sum_{i=1}^n\alpha_iy_i(w^Tx_i)\right]+\min_{\zeta}\left[\sum_{i=1}^n\left(Cg(\zeta_i)+\alpha_i\zeta_i\right)\right]\right).\\
\end{eqnarray}

Consider first the part that depends on $w$:

$$\min_{\mathbf{w}}\left[\frac{1}{2}\|w \|^2-\sum_{i=1}^n\alpha_iy_i(w^Tx_i)\right].$$

The objective is convex in $w$, so the minimum is attained where the gradient vanishes:

$$\nabla_w\left[\frac{1}{2}\|w \|^2-\sum_{i=1}^n\alpha_iy_i(w^Tx_i)\right]=w-\sum_{i=1}^n\alpha_iy_ix_i=0,$$

$$w=\sum_{i=1}^n\alpha_iy_ix_i.$$

Substituting it back, we have

\begin{eqnarray}
\min_{\mathbf{w}}\left[\frac{1}{2}\|w \|^2-\sum_{i=1}^n\alpha_iy_i(w^Tx_i)\right]=\\
\frac{1}{2}\left\|\sum_{i=1}^n\alpha_iy_ix_i\right\|^2-\left\|\sum_{i=1}^n\alpha_iy_ix_i\right\|^2=-\frac{1}{2}\left\|\sum_{i=1}^n\alpha_iy_ix_i\right\|^2.
\end{eqnarray}
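
As a sanity check on this step, here is a small sketch with hypothetical data (the points, labels, and multipliers below are made up for illustration). It evaluates the $w$-dependent part at $w=\sum_i\alpha_iy_ix_i$, compares the value with $-\frac{1}{2}\|\sum_i\alpha_iy_ix_i\|^2$, and confirms that a few perturbations of $w$ only increase the objective.

```python
# Hypothetical toy data: three points in R^2, labels in {-1, +1},
# and fixed multipliers alpha_i >= 0.
X = [[1.0, 2.0], [0.5, -1.0], [-2.0, 0.0]]
y = [1.0, -1.0, -1.0]
alpha = [0.2, 0.5, 0.3]


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def objective(w):
    """(1/2)||w||^2 - sum_i alpha_i y_i w^T x_i for the fixed alpha above."""
    return 0.5 * dot(w, w) - sum(a * yi * dot(w, xi) for a, yi, xi in zip(alpha, y, X))


# Stationary point w* = sum_i alpha_i y_i x_i.
w_star = [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(2)]

min_value = objective(w_star)
closed_form = -0.5 * dot(w_star, w_star)

# Objective values at a few small perturbations of w*.
offsets = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1), (0.05, -0.05)]
perturbed = [objective([w_star[0] + dx, w_star[1] + dy]) for dx, dy in offsets]

print(min_value, closed_form, min(perturbed))
```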

Now we can do the same with the second part of the problem:

$$\min_{\zeta}\left[\sum_{i=1}^n\left(Cg(\zeta_i)+\alpha_i\zeta_i\right)\right].$$

The sum separates over $i$, so we take the derivative of each term with respect to $\zeta_i$ and set it to zero:

\begin{eqnarray}
\left[Cg(\zeta_i)+\alpha_i\zeta_i\right]'=Cg'(\zeta_i)+\alpha_i=C\left[\ln(1+\exp(-\zeta_i))\right]'+\alpha_i=\\
=-\dfrac{C}{\exp(\zeta_i)+1}+\alpha_i=0,
\end{eqnarray}

$$\exp(\zeta_i)\alpha_i+\alpha_i=C,$$

$$\exp(\zeta_i)=\dfrac{C-\alpha_i}{\alpha_i},$$

$$\zeta_i=\ln\dfrac{C-\alpha_i}{\alpha_i}.$$
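
The stationary point can be verified numerically. The sketch below assumes a hypothetical value $C=2$ and a few multipliers in $(0,C)$; the function names are illustrative. It checks that the derivative of $Cg(\zeta)+\alpha\zeta$ vanishes at $\zeta=\ln\frac{C-\alpha}{\alpha}$ and that a brute-force grid search does not find a smaller value.

```python
import math

C = 2.0  # hypothetical regularization constant


def g(z):
    """Logistic loss g(z) = ln(1 + exp(-z))."""
    return math.log1p(math.exp(-z))


def h(z, alpha):
    """One term of the zeta-dependent part: C g(z) + alpha z."""
    return C * g(z) + alpha * z


def h_prime(z, alpha):
    """Analytic derivative -C / (exp(z) + 1) + alpha."""
    return -C / (math.exp(z) + 1.0) + alpha


def zeta_star(alpha):
    """Stationary point ln((C - alpha) / alpha), defined for 0 < alpha < C."""
    return math.log((C - alpha) / alpha)


alphas = [0.1, 0.5, 1.0, 1.7]
derivs = [h_prime(zeta_star(a), a) for a in alphas]

# Grid search over zeta in [-10, 10] with spacing 0.01.
grid = [i / 100.0 for i in range(-1000, 1001)]
grid_gaps = [min(h(z, a) for z in grid) - h(zeta_star(a), a) for a in alphas]

print(derivs)
print(grid_gaps)
```

Each entry of `grid_gaps` is the best grid value minus the value at the stationary point, so it should be nonnegative up to rounding.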

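This stationary point exists only for $0<\alpha_i<C$; for $\alpha_i>C$ the expression $Cg(\zeta_i)+\alpha_i\zeta_i$ is unbounded below as $\zeta_i\to-\infty$, so the dual effectively carries the box constraint $0\le\alpha_i\le C$. Since $g$ is convex, the stationary point is the minimizer, and substituting it gives: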

\begin{eqnarray}
\min_{\zeta}\left[\sum_{i=1}^n\left(Cg(\zeta_i)+\alpha_i\zeta_i\right)\right]=\sum_{i=1}^n\left(Cg(\ln\dfrac{C-\alpha_i}{\alpha_i})+\alpha_i\ln\dfrac{C-\alpha_i}{\alpha_i}\right)=\\
\sum_{i=1}^n\left(C\ln\dfrac{C}{C-\alpha_i}+\alpha_i\ln\dfrac{C-\alpha_i}{\alpha_i}\right)=\\
=\sum_{i=1}^n\ln\dfrac{C^C(C-\alpha_i)^{\alpha_i}}{(C-\alpha_i)^{C}\alpha_i^{\alpha_i}}=\sum_{i=1}^n\ln\left(\dfrac{C^C}{\alpha_i^{\alpha_i}}(C-\alpha_i)^{\alpha_i-C}\right)
\end{eqnarray}
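
The algebra in this simplification is easy to get wrong in the signs of the exponents, so here is a short numerical cross-check. It is a sketch under the assumption $C=2$ with a few hypothetical values of $\alpha_i$; it compares direct substitution of $\zeta_i=\ln\frac{C-\alpha_i}{\alpha_i}$ with the single-logarithm form and with its expansion $C\ln C-\alpha_i\ln\alpha_i-(C-\alpha_i)\ln(C-\alpha_i)$.

```python
import math

C = 2.0  # hypothetical regularization constant


def g(z):
    """Logistic loss g(z) = ln(1 + exp(-z))."""
    return math.log1p(math.exp(-z))


def min_by_substitution(alpha):
    """Plug zeta = ln((C - alpha) / alpha) into C g(zeta) + alpha zeta."""
    z = math.log((C - alpha) / alpha)
    return C * g(z) + alpha * z


def min_single_log(alpha):
    """ln( C^C / alpha^alpha * (C - alpha)^(alpha - C) )."""
    return math.log(C**C / alpha**alpha * (C - alpha) ** (alpha - C))


def min_expanded(alpha):
    """C ln C - alpha ln alpha - (C - alpha) ln(C - alpha)."""
    return C * math.log(C) - alpha * math.log(alpha) - (C - alpha) * math.log(C - alpha)


alphas = [0.1, 0.5, 1.0, 1.7]
rows = [(min_by_substitution(a), min_single_log(a), min_expanded(a)) for a in alphas]
for a, row in zip(alphas, rows):
    print(a, row)
```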

The final form of the dual problem is

$$\max_{\mathbf{\alpha}; 0\le\alpha_i\le C,\forall i}\left(-\frac{1}{2}\left\|\sum_{i=1}^n\alpha_iy_ix_i\right\|^2+\sum_{i=1}^n\left(Cg(\ln\dfrac{C-\alpha_i}{\alpha_i})+\alpha_i\ln\dfrac{C-\alpha_i}{\alpha_i}\right)\right),$$

where the logarithms require $0\le\alpha_i\le C$ (with the convention $0\ln0=0$). Using the closed form of the second sum, this is

$$\max_{\mathbf{\alpha}; 0\le\alpha_i\le C,\forall i}\left(-\frac{1}{2}\left\|\sum_{i=1}^n\alpha_iy_ix_i\right\|^2-\sum_{i=1}^n\ln\left(\dfrac{\alpha_i^{\alpha_i}(C-\alpha_i)^{C-\alpha_i}}{C^C}\right)\right),$$

or, written as a minimization,

$$-\min_{\mathbf{\alpha}; 0\le\alpha_i\le C,\forall i}\left(\frac{1}{2}\left\|\sum_{i=1}^n\alpha_iy_ix_i\right\|^2+\sum_{i=1}^n\ln\left(\dfrac{\alpha_i^{\alpha_i}(C-\alpha_i)^{C-\alpha_i}}{C^C}\right)\right).$$
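
To see the dual in action, the following sketch solves it on a tiny hypothetical data set and compares the result with a direct solution of the primal. Everything here is an illustrative assumption rather than a method from the notes: the data, $C=1$, the step sizes, and the choice of plain projected gradient ascent. The dual objective is written with the $\zeta$-minimum expanded as $\sum_i\left[C\ln C-\alpha_i\ln\alpha_i-(C-\alpha_i)\ln(C-\alpha_i)\right]$, whose derivative with respect to $\alpha_i$ is $\ln\frac{C-\alpha_i}{\alpha_i}$.

```python
import math

# Hypothetical toy data (not from the notes): four points in R^2, labels in {-1, +1}.
X = [[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1.0, 1.0, -1.0, -1.0]
C = 1.0


def dot(u, v):
    """Euclidean inner product of two equally long lists."""
    return sum(a * b for a, b in zip(u, v))


def g(z):
    """Logistic loss g(z) = ln(1 + exp(-z))."""
    return math.log1p(math.exp(-z))


def w_from_alpha(alpha):
    """w = sum_i alpha_i y_i x_i, the minimizer over w for fixed alpha."""
    d = len(X[0])
    return [sum(a * yi * xi[j] for a, yi, xi in zip(alpha, y, X)) for j in range(d)]


def dual_objective(alpha):
    """-(1/2)||w(alpha)||^2 + sum_i [C ln C - a ln a - (C - a) ln(C - a)]."""
    w = w_from_alpha(alpha)
    log_term = sum(
        C * math.log(C) - a * math.log(a) - (C - a) * math.log(C - a) for a in alpha
    )
    return -0.5 * dot(w, w) + log_term


def primal_objective(w):
    """(1/2)||w||^2 + C sum_i g(y_i w^T x_i), i.e. the constraint taken as active."""
    return 0.5 * dot(w, w) + C * sum(g(yi * dot(w, xi)) for xi, yi in zip(X, y))


def solve_dual(steps=5000, lr=0.01, eps=1e-9):
    """Projected gradient ascent on the dual over the box [eps, C - eps]^n."""
    alpha = [C / 2.0] * len(X)
    for _ in range(steps):
        w = w_from_alpha(alpha)
        grad = [
            -yi * dot(w, xi) + math.log((C - a) / a) for a, xi, yi in zip(alpha, X, y)
        ]
        alpha = [min(max(a + lr * gi, eps), C - eps) for a, gi in zip(alpha, grad)]
    return alpha


def solve_primal(steps=2000, lr=0.1):
    """Plain gradient descent on the primal objective above."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        # Gradient of C g(y_i w^T x_i) in w is -C y_i x_i / (1 + exp(y_i w^T x_i)).
        coef = [C * yi / (1.0 + math.exp(yi * dot(w, xi))) for xi, yi in zip(X, y)]
        grad = [w[j] - sum(c * xi[j] for c, xi in zip(coef, X)) for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


alpha_star = solve_dual()
w_dual = w_from_alpha(alpha_star)
w_primal = solve_primal()

print("w from dual:  ", w_dual)
print("w from primal:", w_primal)
print("objectives:   ", dual_objective(alpha_star), primal_objective(w_primal))
```

If the derivation is right, the two weight vectors and the two optimal objective values should agree up to the solver tolerance, since the primal is convex with affine constraints and strong duality holds. For real problems a dedicated solver would be a better choice than fixed-step gradient methods.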
