# Deriving efficient LOOCV for the non-quadratic case
In this section, we extend our approach to scenarios where $l$ or $r$ is not quadratic. In this case, solving equation @eq-theta-def no longer reduces to solving a linear system, but we can resort to the following approximation:
$$
H^{(j)} (\theta^{(j)} - \theta) \approx -g^{(j)}
$$ {#eq-newton-approx}
where $H^{(j)}$ and $g^{(j)}$ represent the Hessian and gradient of $f^{(j)}$ at $\theta$, respectively.
The rationale here is that $\theta$ and $\theta^{(j)}$ should be relatively close (and closer as $n$ increases), so a single Newton step on $f^{(j)}$, initialized at $\theta$, should land close to $\theta^{(j)}$.

Similar to the quadratic case, we can relate $H^{(j)}$ and $g^{(j)}$ to $H$ and $g$, the Hessian and gradient of $f$ at $\theta$ (recall that $g = 0$ because $\theta$ minimizes $f$):
\begin{align*}
H^{(j)} &= H - x_j l''(\hat{y}_j ; y_j) x_j^T
\\
g^{(j)} &= g - x_j l'(\hat{y}_j ; y_j) = - x_j l'(\hat{y}_j ; y_j)
\end{align*}
allowing us to rewrite @eq-newton-approx as:
$$
\left(
    H - x_j l''\left(\hat{y}_j ; y_j\right) x_j^T
\right) 
\left( \theta^{(j)} - \theta \right) 
\approx  x_j l'(\hat{y}_j ; y_j).
$$
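
The relations above can be checked numerically. The following is a minimal sketch, assuming a logistic loss on log-odds and a ridge penalty $r(\theta) = \frac{\lambda}{2} \|\theta\|^2$ on synthetic data; the helper names `grad_hess` and `fit` are hypothetical and the derivatives are written by hand rather than obtained by automatic differentiation. It compares the rank-one expressions for $H^{(j)}$ and $g^{(j)}$ with a direct computation on the data without sample $j$, and then takes a single Newton step from $\theta$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_hess(theta, X, y, lam):
    """Gradient and Hessian of sum_i l(x_i^T theta; y_i) + lam/2 * ||theta||^2
    for the logistic loss l(yhat; y) = log(1 + exp(yhat)) - y * yhat."""
    p = sigmoid(X @ theta)
    g = X.T @ (p - y) + lam * theta
    H = X.T @ ((p * (1 - p))[:, None] * X) + lam * np.eye(X.shape[1])
    return g, H

def fit(X, y, lam, iters=30):
    """Plain Newton iterations started from zero (no line search)."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = grad_hess(theta, X, y, lam)
        theta -= np.linalg.solve(H, g)
    return theta

# Synthetic data (an assumption made for illustration only).
rng = np.random.default_rng(1)
n, d, lam, j = 100, 2, 1.0, 0
X = rng.normal(size=(n, d))
y = (rng.uniform(size=n) < sigmoid(X @ np.array([1.0, -1.0]))).astype(float)

theta = fit(X, y, lam)
g, H = grad_hess(theta, X, y, lam)

# l' and l'' of the logistic loss at (yhat_j, y_j).
x_j = X[j]
p_j = sigmoid(x_j @ theta)
dl_j, d2l_j = p_j - y[j], p_j * (1 - p_j)

# Rank-one relations from the text (using g = 0 at the optimum).
H_j = H - d2l_j * np.outer(x_j, x_j)
g_j = -dl_j * x_j

# Direct computation on the data set without sample j.
X_loo, y_loo = np.delete(X, j, axis=0), np.delete(y, j)
g_direct, H_direct = grad_hess(theta, X_loo, y_loo, lam)

# A single Newton step on f^(j) from theta, versus a full refit.
theta_newton = theta - np.linalg.solve(H_j, g_j)
theta_exact = fit(X_loo, y_loo, lam)
print(np.linalg.norm(theta - theta_exact), np.linalg.norm(theta_newton - theta_exact))
```

The two printed distances show how far $\theta$ and the one-step estimate are from the refitted $\theta^{(j)}$; the second should be noticeably smaller.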
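Next, we expand the left-hand side and introduce $\tilde{y}_j$, the prediction for $x_j$ under the leave-one-out solution $\theta^{(j)}$ (recall that $\hat{y}_j = x_j^T \theta$):
\begin{align*}
H \theta^{(j)} 
    - x_j l''(\hat{y}_j ; y_j) \tilde{y}_j
    - H \theta
    + x_j l''(\hat{y}_j ; y_j) \hat{y}_j
    &\approx
    x_j l'(\hat{y}_j ; y_j)
    \\
    \tilde{y}_j &= x_j ^T \theta^{(j)}.
\end{align*}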
Now, we can eliminate $\theta^{(j)}$ and solve for $\tilde{y}_j$:
\begin{align*}
\theta^{(j)} &\approx \theta + t_j (l'(\hat{y}_j ; y_j) +  l''(\hat{y}_j ; y_j) (\tilde{y}_j - \hat{y}_j))
\\
\tilde{y}_j &\approx x_j ^T \left(
    \theta + t_j (l'(\hat{y}_j ; y_j) +  l''(\hat{y}_j ; y_j) (\tilde{y}_j - \hat{y}_j))
    \right)
\\
\tilde{y}_j &\approx 
     \hat{y}_j 
    + \frac{h_j}{1 - h_j l''(\hat{y}_j ; y_j)}  l'(\hat{y}_j ; y_j) 
\end{align*}
where $t_j := H^{-1} x_j$ and $h_j := x_j^T t_j$.
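
To illustrate the final formula, here is a minimal NumPy sketch. It again assumes a logistic loss on log-odds with a ridge penalty, synthetic data, and hand-written derivatives; the names `dl`, `d2l` and `fit` are hypothetical. It computes $\tilde{y}_j$ for all samples from a single fit and compares the result with exact leave-one-out refits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic loss l(yhat; y) = log(1 + exp(yhat)) - y * yhat, with yhat the log-odds.
def dl(yhat, y):    # l'(yhat; y)
    return sigmoid(yhat) - y

def d2l(yhat, y):   # l''(yhat; y), which does not depend on y for this loss
    p = sigmoid(yhat)
    return p * (1.0 - p)

def fit(X, y, lam, iters=25):
    """Minimise sum_i l(x_i^T theta; y_i) + lam/2 * ||theta||^2 by Newton's method."""
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        yhat = X @ theta
        g = X.T @ dl(yhat, y) + lam * theta
        H = X.T @ (d2l(yhat, y)[:, None] * X) + lam * np.eye(X.shape[1])
        theta = theta - np.linalg.solve(H, g)
    return theta

# Synthetic data (an assumption made for illustration only).
rng = np.random.default_rng(0)
n, d, lam = 200, 3, 1.0
X = rng.normal(size=(n, d))
y = (rng.uniform(size=n) < sigmoid(X @ np.array([1.0, -0.5, 0.25]))).astype(float)

# One fit on the full data set.
theta = fit(X, y, lam)
yhat = X @ theta
H = X.T @ (d2l(yhat, y)[:, None] * X) + lam * np.eye(d)

T = np.linalg.solve(H, X.T)        # column j is t_j = H^{-1} x_j
h = np.einsum("ij,ji->i", X, T)    # h_j = x_j^T t_j

# Approximate leave-one-out predictions for every sample at once.
y_tilde = yhat + h * dl(yhat, y) / (1.0 - h * d2l(yhat, y))

# Exact leave-one-out predictions via n refits, for comparison only.
y_exact = np.array([
    X[j] @ fit(np.delete(X, j, axis=0), np.delete(y, j), lam) for j in range(n)
])

err_approx = np.max(np.abs(y_tilde - y_exact))
err_naive = np.max(np.abs(yhat - y_exact))
print(f"max |y_tilde - exact| = {err_approx:.2e}, max |y_hat - exact| = {err_naive:.2e}")
```

The approximation needs one linear solve with $H$ for all samples, whereas the exact comparison refits the model $n$ times.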

It's worth noting the resemblance between the expression for $\tilde{y}_j$ here and the expression obtained for the quadratic case.
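
As a quick check, assume the squared loss $l(\hat{y} ; y) = \frac{1}{2} (\hat{y} - y)^2$ (the $\frac{1}{2}$ normalization is an assumption made here for illustration). Then $l'(\hat{y}_j ; y_j) = \hat{y}_j - y_j$ and $l''(\hat{y}_j ; y_j) = 1$, so the approximation becomes
$$
\tilde{y}_j \approx \hat{y}_j + \frac{h_j}{1 - h_j} \left( \hat{y}_j - y_j \right)
\quad \Longleftrightarrow \quad
y_j - \tilde{y}_j \approx \frac{y_j - \hat{y}_j}{1 - h_j},
$$
which is the familiar leave-one-out residual formula for least squares. If $r$ is quadratic as well, $f^{(j)}$ is quadratic, the Newton step is exact, and the approximation holds with equality.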

## Python implementation
Once more, we'll turn to JAX, leveraging its automatic differentiation capabilities.
Our estimator will take as inputs the loss and regularization functions, along with an optional "inverse link" function that transforms the predicted labels (e.g. a sigmoid to convert log-odds to probabilities in logistic regression).