# LMA: Levenberg-Marquardt Algorithm

The Levenberg-Marquardt algorithm is an iterative procedure widely used for solving non-linear least squares problems or for finding roots of non-linear systems of equations. This implementation is designed to be robust and offer maximum flexibility, but at the same time provides sensible defaults to facilitate its usage.

## Introduction

The LMA method interpolates between the more aggressive Gauss-Newton algorithm and the more conservative method of gradient descent to find the set of *solution parameters* $p$ that minimize a cost function $F(p)$, defined as:

$$F(p) = \sum_i \rho(y_i(p), c_i)$$

where $y_i(p)$ are the components of the residuals vector $y$, and $\rho$ is a loss function which depends on the residuals and a scaling factor $c_i$. In the standard case ($L_2$ loss function), $F(p)$ is the sum of squared residuals (and $c_i = 1$).

At every iteration of the algorithm, the residuals and the Jacobian are calculated for a given set of parameters $p$. Then, a new guess $p-\Delta p$ is calculated such that:

$$(J^T W J + \lambda D)\Delta p = J^T W y$$

where $J$ is the Jacobian of the residuals ($J_{ij}=\partial y_i / \partial p_j$), $W$ is a weight matrix that depends on the choice of loss function (the identity matrix for $L_2$), $y$ is the residuals vector, and $\lambda$ is the damping factor. $D$ is a diagonal matrix with $D_{kk} = \max(\epsilon(\lambda), (J^T W J)_{kk})$, where $\epsilon(\lambda)$ is the damping floor (dependent on the damping factor). The damping factor determines how closely the next guess follows the prediction of the Gauss-Newton algorithm (lower damping factors) or of gradient descent (higher damping factors), in effect defining a trust region.
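
As a sketch of the two limits of the update equation (derived here for illustration, assuming $J^T W J$ is invertible in the first case):

$$\lambda \to 0: \quad \Delta p \to (J^T W J)^{-1} J^T W y \quad \text{(Gauss-Newton step)}$$

$$\lambda \to \infty: \quad \Delta p \approx \frac{1}{\lambda} D^{-1} J^T W y \quad \text{(short step along the scaled gradient)}$$

For the $L_2$ case, $J^T W y$ is the gradient of the cost up to a constant factor, so large damping yields short steps in the descent direction, each component scaled by the corresponding $1/D_{kk}$.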

For this new guess, the predicted error reduction is calculated as:

$$\Delta s_p = \frac{1}{2}(\Delta p^T J^T W y + \lambda \Delta p^T D \Delta p)$$

If the ratio of the actual error reduction (obtained by evaluating the cost at the new guess) to this prediction is above a defined limit, the new guess is accepted and the damping factor is decreased. Otherwise, the new guess is rejected and the damping factor is increased.
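
Written out, the acceptance test reads roughly as follows. The symbols $g$ and $g_{tol}$ are notation introduced here for illustration; $g_{tol}$ corresponds to the `tolg` option described under Usage:

$$g = \frac{F(p) - F(p - \Delta p)}{\Delta s_p}, \qquad \text{accept } p - \Delta p \text{ if } g > g_{tol}$$

Judging from the `LM` listing in the Implementation section, an accepted step multiplies $\lambda$ by `ddec` (bounded below by $\lambda_{min}$) and a rejected step multiplies it by `dinc` (bounded above by $\lambda_{max}$).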

The algorithm finishes once one of these conditions is met, returning the last accepted guess:

* Maximum number of iterations reached
* Cost (sum of losses) below specified tolerance
* Relative change in solution parameters or cost below specified tolerance
* Stagnation at maximum damping factor

### Choice of loss function

The standard $L_2$ loss function calculates the loss as the square of the residual. Its corresponding weight matrix is the identity. Moreover, if an individual scaling factor is defined for each residual, this loss function can be used for the solution of *weighted least-squares* problems.

Robust loss functions provide mechanisms to mitigate the effect of outliers in fitting data:

* **Huber** Equivalent to $L_2$ for small residuals and linear for larger residuals
* **Cauchy** Strongly down-weights large outliers
* **Soft-L1** Smooth approximation of $L_1$ loss that behaves like $L_2$ for small residuals
* **Tukey** Redescending M-estimator that completely rejects extreme outliers (but it may lead to convergence issues if scaling is not chosen well)
* **Welsh** Another redescending M-estimator, smoother than Tukey's in its rejection
* **Fair** Less sensitive to large errors than $L_2$, but not redescending
* **Arctan** Limits maximum loss of single residuals
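
For illustration, here are two of these losses as read from the `Loss` namespace in the Implementation section. The notation is introduced here: $r = (y/c)^2$, with $c$ the scale passed as left argument, $\rho$ the loss value and $w$ the weight:

$$\text{Cauchy:} \quad \rho = \frac{c^2}{2}\ln(1 + r), \qquad w = \frac{1}{1 + r}$$

$$\text{Welsh:} \quad \rho = \frac{c^2}{2}\left(1 - e^{-r}\right), \qquad w = e^{-r}$$

In both cases the weight tends to 1 for small residuals and to 0 for large ones. The Cauchy loss keeps growing logarithmically, whereas the Welsh loss saturates at $c^2/2$, which is what makes it redescending.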

### Normalized damping factor

A particularity of this LMA implementation is the use of a *normalized damping factor*. The damping factor $\lambda$ will oscillate between the specified limits $\lambda_{min}$ and $\lambda_{max}$, starting with an initial value $\lambda_0$. The normalized damping factor is calculated as:

$$\lambda_{norm} = \frac{(\lambda_{max}-\lambda_0)(\lambda-\lambda_{min})}{(\lambda_0-\lambda_{min})(\lambda_{max}-\lambda)}$$

If $\lambda_{norm}$ is 1, the initial damping factor is being used. If it is larger, more damping than initially specified was necessary, with $\lambda_{norm} \to \infty$ as $\lambda$ approaches $\lambda_{max}$; if it is lower, the damping could be decreased for faster convergence, with $\lambda_{norm} = 0$ corresponding to $\lambda_{min}$.

The use of normalized damping factors makes it possible to monitor and adjust the effective size of the trust region during successive function calls, independently of the actual damping parameters in use.
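
The reference points can be checked by direct substitution into the definition, and the definition can be inverted to recover $\lambda$ from $\lambda_{norm}$ (a short derivation added here for illustration):

$$\lambda = \lambda_{min} \Rightarrow \lambda_{norm} = 0, \qquad \lambda = \lambda_0 \Rightarrow \lambda_{norm} = 1, \qquad \lambda \to \lambda_{max} \Rightarrow \lambda_{norm} \to \infty$$

$$\lambda = \frac{\lambda_{norm}\,\lambda_{max}(\lambda_0-\lambda_{min}) + \lambda_{min}(\lambda_{max}-\lambda_0)}{\lambda_{norm}(\lambda_0-\lambda_{min}) + (\lambda_{max}-\lambda_0)}$$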

### Adaptive damping floor

To enhance numerical stability, particularly when dealing with Jacobian matrices $J$ where $J^T W J$ may have very small or zero diagonal elements, this LMA implementation employs an adaptive floor for the diagonal elements of the damping scaling matrix $D$. The floor $\epsilon(\lambda)$ takes the value $\epsilon_0$ (an arbitrarily small number) when $\lambda$ is $\lambda_0$ or lower (so $\lambda_{norm} \le 1$), and it approaches 1 as $\lambda$ approaches $\lambda_{max}$.

This adaptive mechanism ensures that while a very low floor is used during optimistic (low damping) phases, a more substantial floor (approaching 1) is automatically applied to weak components when high overall damping is required.
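
One way to write the floor explicitly, as a reading of the `LM` listing in the Implementation section (where $\epsilon_0$ is `⎕CT`; the symbol $b$ is introduced here for illustration):

$$\epsilon(\lambda) = \epsilon_0 + (1 - \epsilon_0)\, b, \qquad b = 1 - \frac{1}{\max(1, \lambda_{norm})}$$

For $\lambda_{norm} \le 1$ this gives $b = 0$ and $\epsilon = \epsilon_0$; for $\lambda_{norm} = 2$ the floor is already about $0.5$, and for $\lambda_{norm} = 100$ about $0.99$.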

## Usage

### `Jacobian` Operator

    R←{X}f Jacobian Y
    
`Jacobian` is a monadic operator that takes a monadic function `f` as left operand and returns an ambivalent function. This derived function returns an estimation of the Jacobian matrix of `f`, using the method of finite differences. The right argument `Y` is the value at which the Jacobian is calculated, and the optional left argument `X` is the relative perturbation to apply to `Y` in the finite differences method. If `X` is not given, `⎕CT*÷2` is used.
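
In formula form, the estimate is a forward difference. Here $h_j$ denotes the perturbation applied to the $j$-th parameter and $e_j$ the $j$-th unit vector (both symbols are notation introduced here):

$$J_{ij} \approx \frac{f_i(Y + h_j e_j) - f_i(Y)}{h_j}$$

This costs one evaluation of `f` at `Y` plus one additional evaluation per parameter.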

### `LMA` Operator

    R←{X}f LMA Y

`LMA` is a monadic operator that takes a left operand and returns an ambivalent derived function. This derived function minimizes a residual function using the Levenberg-Marquardt algorithm, given an initial set of parameters. Several configuration options are available, all with sensible defaults.

The left operand `f` must be a configuration namespace or a function. Configuration namespaces may define the following configuration options:

* `toli`: Maximum number of iterations (default `1E3`)
* `tolc`: Tolerance for the cost (sum of squared residuals or loss values) (default `⎕CT`)
* `tolr`: Tolerance for relative change, either in the solution or the residual (default `⎕CT`)
* `tolg`: Tolerance for the gain ratio to accept or reject a step (default `1E¯2`)
* `dini`: Initial damping factor for `dnorm=1` (default `1E¯2`)
* `dinc`: Increment of damping factor after rejected solution (default `5`)
* `ddec`: Decrement of damping factor after accepted solution (default `÷dinc`)
* `dmax`: Maximum damping factor (default `÷⎕CT`)
* `dmin`: Minimum damping factor (default `÷dmax`)
* `pert`: Relative perturbation applied to parameters for numerical estimation of the Jacobian (default `⎕CT*÷2`)
* `loss`: Choice of loss function: `L2` `Huber` `Cauchy` `SoftL1` `Tukey` `Welsh` `Fair` `Arctan` or dyadic function (default `L2`)
* `scale`: Scale factor passed as left argument to loss function (default for 95% efficiency in robust loss functions)
* `verbose`: If `1`, print `iter cost rel dnorm p` each iteration (default `0`)

Configuration namespaces may also contain the functions:

* `CallBack`: Callback function (default `⊢`)
* `Eval`: Evaluation function

The evaluation function `Eval` must return either the residuals and the Jacobian for the given set of solution parameters, or only the residuals. Whenever the residuals and Jacobian need to be evaluated, `Eval` will be called with the trial parameters as right argument and with `X` as left argument, if given (`Eval` will be called monadically if the derived function `f LMA` is called monadically). `Eval` must return either a two-element vector with the residuals in the first element and the Jacobian in the second one, or a vector of residuals, enclosed if they are not simple scalars. If a Jacobian is not returned, a numerical estimation is calculated by evaluating the residual function after applying small perturbations to the parameters (as defined by `pert`).

The function selected by the option `loss` is used to calculate the loss from the residuals and scaling factor. If a function is provided by the user, it must be a dyadic function which returns the loss values and weights when given the residuals as right argument and scaling factor as left argument.

The `CallBack` function will be called every iteration before checking convergence, with the current solution namespace as right argument and `X` as left argument, if given (`CallBack` will be called monadically if the derived function `f LMA` is called monadically). Its return value is discarded.

If `f` is a function, the result is equivalent to using as `f` a namespace with an `Eval` function `f` (with default values for the rest of the parameters).

`Y` must be a vector.
The first element of `Y`, or `⊂Y` if `1=≡Y`, contains the initial guess for the solution parameters.
If the next element of `Y` is a scalar numeric value, it is interpreted as the initial normalized damping factor.
Additional elements of `Y` must be configuration namespaces. The final configuration parameters are obtained by overwriting the parameters in the namespace given as left operand with those given as right argument, from right to left. Default values will be used for non-defined parameters and the `CallBack` function, but the `Eval` function must be defined by the user either as left operand `f` or as member of a configuration namespace.

The returned value `R` is a solution namespace corresponding to the last accepted solution.
A solution namespace is a configuration namespace including all the configuration options used to run the algorithm and the additional elements:

* `iter`: Number of iterations
* `cost`: Sum of loss values (squared residuals for L2)
* `rel`: Relative change metric
* `dnorm`: Normalized damping factor
* `p0`: Initial guess
* `p`: Accepted guess

#### Notes

* With the exception of `toli` and `tolc`, configuration parameters should be modified only by expert users or in case of convergence problems

* The relative change metric `rel` is the minimum relative change between successive accepted solutions, either in the cost or in the solution parameters

* In addition to being used for the definition of default values, `⎕CT` is also the baseline for adaptive floor damping

* The perturbation to estimate the Jacobian `pert` and the scaling factor for loss functions `scale` can be either scalar values or vectors of the same length as the parameters and the residuals, respectively

* Loss functions and their respective weights, as well as their default values for the scaling parameter are defined in the namespace `Loss`
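
As a reading of the `LM` listing in the Implementation section, the relative change metric `rel` can be written as follows. Here $p_k$, $p_{k+1}$ and $F_k$, $F_{k+1}$ denote the parameters and cost before and after a trial step (notation introduced here):

$$\text{rel} = \min\left(\frac{\lVert p_k - p_{k+1}\rVert^2}{\lVert p_k\rVert^2},\ \left|\frac{F_k - F_{k+1}}{F_k}\right|\right)$$

Note that the parameter term uses squared Euclidean norms, so it compares against `tolr` on a squared scale.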


### `LM` Operator

`LM` is a simplified version of `LMA`. It is a dyadic operator from which a dyadic function is derived. Usage:

    R←X f LM g Y

where `f` is a monadic evaluation function, `g` is a monadic function which takes as argument an `iter cost rel dnorm p` vector and gets called before every convergence check, `Y` is a two-element vector with the initial guess of parameters and normalized damping factor, and `X` is a vector with the configuration parameters `toli tolc tolr tolg dini dinc ddec dmax dmin`. The function `f` must return either the residuals (as a simple or nested vector), the residuals and the Jacobian, or the residuals, Jacobian, loss values and weights, for a set of parameters. If no Jacobian is provided, it is estimated numerically using `Jacobian` (with no left argument). If no loss values and weights are provided, squared residuals are used.

The return value `R` is an `iter cost rel dnorm p` vector.

## Implementation

In [1]:
)clear

In [2]:
]dinput
Jacobian←{⍺←⎕CT*÷2 ⋄ a←⍺×1@(0∘=)|⍵                     ⍝ Jacobian matrix of ⍺⍺ at ⍵ applying perturbation ⍺
    ⍉↑⍺÷⍨(⍺⍺¨(⊂⍵)+↓↑(-⍳≢⍵)↑¨⍺)-⊆⍺⍺ ⍵                   ⍝     finite-differences method
}

In [3]:
]dinput
LM←{ti tc tr tg d0 di dd dx dn←⍺ ⋄ p d←⍵               ⍝ Levenberg–Marquardt algorithm

    D←((dx-d0)×dn-⍨⊢)÷(d0-dn)×dx-⊢ ⋄ L←{⍵,(×⍨⊃⍵)2÷2}   ⍝ damping factor normalization(λ) and standard loss(y)
    J←{(1<|≡g)∧1<≢g←⍺⍺ ⍵:g ⋄ (⊃⊆g)(⍺⍺ Jacobian ⍵)}     ⍝ residual and (estimated if not given) jacobian(p)
    E←⍺⍺{y j l w←L⍣(2=≢g)⊢g←⍺⍺ J ⍵ ⋄ (+/l)⍵ y j w}     ⍝ eval(p): cost, parameters, residual, jacobian
    A←{c p y j w←⍵ ⋄ c p(t+.×j)(y+.×⍨t←w×⍤1⍉j)}        ⍝ accept(EG output): sum(error²), parameters, JtJ, Jty
    
    T←{                                                ⍝ try guess(λ)
        r⊢←0 ⋄ 11::dx⌊⍵×di ⋄ b←1-÷1⌈d⊢←D ⍵             ⍝     bad guess if domain error
        ∆p←jy⌹jj+⍺×⍤1⊢⍵×dj←(⎕CT+b-⎕CT×b)⌈1 1⍉jj        ⍝     change of parameters with adaptive floor
        c0←2÷⍨∆p+.×jy+∆p×⍵×dj                          ⍝     predicted error decrement
        c1←c-⊃g←E⊢q←p-∆p                               ⍝     actual error decrement
        r⊢←(p(-÷⍥(+.×⍨)⊣)q)⌊|c1÷c                      ⍝     relative change in parameters or residuals
        (⎕CT>c0)∧⎕CT<c1:dx⌊⍵×di                        ⍝     if no changing, increase damping
        (⎕CT≤c0)∧tg≥c1÷c0:dx⌊⍵×di                      ⍝     if diverging, increase damping
        dn⌈⍵×dd⊣c p jj jy⊢←A g                         ⍝     accept change, decrease damping
    }
    C←⍵⍵{                                              ⍝ convergence check(λ_prev, λ)
        _←⍺⍺(i⊢←i+1)c r d p                            ⍝     call user function
        (ti<i)∨(dx∧.=⍺ ⍵)∨(tc>c)∨(r>0)∧tr>r            ⍝     iterations, max damping, residual, not changing
    }
    i r←0 ⋄ 0≥ti⊣⍵⍵ g←i c r d p⊣c p jj jy←A E p:g      ⍝ init
    i c r d p⊣(∘.=⍨⍳≢p)T⍣C{11::dx ⋄ D⍣¯1⊢⍵}d           ⍝ iterations, cost, change, norm damping, parameters
}

In [4]:
:Namespace Loss
    l2←1                                               ⍝ standard squared-residual
    L2←{(0.5×⍺××⍨⍵)⍺}
    
    huber←1.345                                        ⍝ ~95% efficiency for normal errors
    Huber←{y←⍺>|⍵ ⋄ Y←y⍨ ⋄ N←~Y                        ⍝ L2 for small residuals, linear for large ones
        (((⍺÷2)-⍨⍺×|)@N×⍨@Y ⍵)((⍺÷⍨|)@N 1@Y ⍵)
    }
    cauchy←2.385                                       ⍝ ~95% efficiency for normal errors
    Cauchy←{(⍺×(⍺÷2)×⍟1+r)(÷1+r←×⍨⍵÷⍺)}                ⍝ strongly downweights large outliers
    
    softl1←1
    SoftL1←{(⍺×⍺×r-1)(÷r←(1+×⍨⍵÷⍺)*÷2)}                ⍝ L2 for small residuals, L1 for large ones
    
    tukey←4.685                                        ⍝ ~95% efficiency for normal errors
    Tukey←{k←(×⍨⍺)÷6 ⋄ y←⍺>|⍵ ⋄ Y←y⍨ ⋄ N←~Y            ⍝ re-descending M-estimator
        (k@Y(k×1-3*⍨1-⊢)@N⊢r)(0@Y(×⍨1-⊢)@N⊢r←×⍨⍵÷⍺)
    }
    welsh←2.985                                        ⍝ ~95% efficiency for normal errors
    Welsh←{(⍺×(⍺÷2)×1-e)(e←*-×⍨⍵÷⍺)}                   ⍝ smoother re-descending M-estimator
    
    fair←1
    Fair←{(⍺×⍺×r-⍟1+r)(÷1+r←(|⍵)÷⍺)}                   ⍝ no re-descending
    
    arctan←1
    Arctan←{(⍺×(⍺÷2)×¯3○r)(÷1+×⍨r←×⍨⍵÷⍺)}              ⍝ limits maximum loss
:EndNamespace

In [5]:
]dinput
LMA←{⍺←⊢ ⋄ p←⊃w←⊆⍵ ⋄ 0::⎕SIGNAL ⎕EN                    ⍝ pass signals

    n←'d',¨'ini' 'inc' 'dec' 'max' 'min'               ⍝ damping
    n←('icrg',⍨¨⊂'tol'),n                              ⍝ tolerances
    nc←'ddec' 'dmin' ⋄ ns←'scale'                      ⍝ computed defaults
    nr←'iter' 'cost' 'rel' 'dnorm' 'p'                 ⍝ results
    c←(                                                ⍝ default config
        toli:1000 ⋄ tolc:⎕CT ⋄ tolr:⎕CT ⋄ tolg:0.01    ⍝     tolerances: iterations cost relative gain
        dini:0.01 ⋄ dinc:5 ⋄ dmax:÷⎕CT ⋄ loss:'L2'     ⍝     damping and loss function
        p0:p ⋄ pert:⎕CT*÷2 ⋄ verbose:0                 ⍝     init guess, perturbation, logging
    )
    F←{1((⊂6 0⍕↑),12 ¯5∘⍕¨⍤↓)⍵} ⋄ P←{⎕←F ⍵ ⋄ ⍵}        ⍝ format and print
    D←{⍕'sp',¨':',¨(⊂2 5)⌷F ⍵}                         ⍝ display form
    J←{⍉↑⍺÷⍨(⍺⍺¨(⊂⍵)+↓↑(-⍳≢⍵)↑¨⍺)-⊆⍺⍺ ⍵}               ⍝ estimate Jacobian
    E←{⍺←⊢ ⋄ (1<≡e)∧2=≢e←⍺⍺ ⍵:e ⋄ (⊃⊆e)(⍺ ⍺⍺ J ⍵)}     ⍝ evaluate, and estimate J if needed
    M←(2÷⍨1⊥⊢⌷⍨∘⊂⍋⌷⍨∘⊂∘⌈2÷⍨0 1+≢){1@(0∘=)⍺⍺|⍵-⍺⍺ ⍵}    ⍝ median absolute deviation
    L←{0=c.⎕NC'sigma':∇⍵⊣c.sigma←(M ⍵)÷0.6745          ⍝ loss function (needs stddev estimation)
        3=⎕NC'⍺⍺':c.(sigma×⎕VGET⊂ns 1)⍺⍺ ⍵             ⍝     user defined
        ~(⊂⍺⍺)∊Loss.⎕NL¯3:⎕SIGNAL 6                    ⍝     if no user defined it must be in Loss
        c.(sigma×⍣(⍺⍺≢'L2')⊢scale)(Loss.⍎⍺⍺)⍵          ⍝     scaled loss function
    }
    3=⎕NC'⍺⍺':⍺((⍺⍺{⍵.Eval←⍺⍺ ⋄ ⍵}c)∇∇)w               ⍝ ⍺⍺ is Eval function
    (1<≢w)∧~2|⎕DR⊃⌽w:⍺((⎕NS ⍺⍺(⊃⌽w))∇∇)¯1↓w            ⍝ non numeric extra argument is config
    2<≢w:⎕SIGNAL 11                                    ⍝ wrong argument
    
    c.CallBack←⊢ ⋄ c←c ⎕NS ⍺⍺ ⋄ c.dnorm←1⊣⍣(1=≢w)⊃⌽w   ⍝ default callback and actual config
    c.scale←c ⎕VGET⊂ns(Loss ⎕VGET ⎕C c.loss)           ⍝ set scale factor for loss function
    _←c ⎕VSET(↑nc)(c ⎕VGET(↑nc)c.(÷dinc dmax))         ⍝ other computed defaults
    CB←⍺{⍺⍺ c.CallBack c ⎕VSET(↑nr)⍵}P⍣(c.verbose≢0)   ⍝ callback function
    EV←⍺∘c.Eval{y j←⍺ ⍺⍺ E ⍵ ⋄ y j,(c.loss L)y}        ⍝ eval function
    c⊣c.⎕DF D(c ⎕VGET↑n)(c.pert∘EV)LM CB p c.dnorm     ⍝ return namespace
}

### Tests

In [6]:
]dinput
Test←{0::⎕EM ⎕SIGNAL ⎕EN
    t←()
    ⍝ Tests ability to navigate sharp, narrow valleys and handle situations where the Jacobian
    ⍝ may lead to a locally rank-deficient J^T J matrix (eg. zero diagonal elements), requiring
    ⍝ robust damping
    t.Beale←{
        B←{
            x y←⍵ ⋄ (1.5 2.25 2.625-x×1-y*⍳3)[(¯1+y)x ⋄ (¯1+y*2)(2×x×y) ⋄ (¯1+y*3)(3×x×y*2)]
        }
        r←3 0.5 CMP(R B #.LMA 1 0.8).p                                ⍝ easy
        r,←3 0.5 CMP(R B #.LMA 1 1).p                                 ⍝ singular Jt J
        r,←3 0.5 CMP(R B #.LMA 0 0).p                                 ⍝ another tricky point
        r,←3 0.5 CMP(R B #.LMA 1 ¯2).p                                ⍝ different quadrant
        r,←(B #.LMA(2 2)(toli:1)).cost>(R B #.LMA 2 2).cost           ⍝ closer to another local minimum
        r,←(B #.LMA(¯1 1)(toli:1)).cost>(R B #.LMA ¯1 1).cost         ⍝ closer to another local minimum
        r
    }
    ⍝ numerical approximation of the Jacobian
    t.BealeNum←{
        B←{
            x y←⍵ ⋄ (1.5 2.25 2.625-x×1-y*⍳3)
        }
        r←3 0.5 CMP(R B #.LMA 1 0.8).p                                ⍝ easy
        r,←3 0.5 CMP(R B #.LMA 1 1).p                                 ⍝ singular Jt J
        r,←3 0.5 CMP(R B #.LMA 0 0).p                                 ⍝ another tricky point
        r,←3 0.5 CMP(R B #.LMA 1 ¯2).p                                ⍝ different quadrant
        r,←(B #.LMA(2 2)(toli:1)).cost>(R B #.LMA 2 2).cost           ⍝ closer to another local minimum
        r,←(B #.LMA(¯1 1)(toli:1)).cost>(R B #.LMA ¯1 1).cost         ⍝ closer to another local minimum
        r
    }
    ⍝ Jacobian and fitting problem with outliers to test loss functions and bare LM
    ExpDec←{A k C←⍺ ⋄ C+A×*-k×⍵}
    ExpDecEv←{x y←⍺ ⋄ A k C←⍵ ⋄ xe←A×x×e←*-k×x ⋄ (y-⍵ ExpDec x)(⍉(-e)⍪xe⍪⍉⍪-=⍨e)}
    y←100×@(?≢x)⊢0.1{⍵+⍺×0.5-⍨?0⍴⍨≢⍵}y0←(p←10 0.5 1)ExpDec⊢x←(⍳100)-1
    t.ExpDecFit←{p x y y0←#.(p x y y0) ⋄ CMP←{0.1>|⍺-⍵}
        r←p CMP(R x y0∘#.ExpDecEv #.LMA(10 0.5 1)(loss:'L2')).p                    ⍝ exact
        r,←p CMP(R x y0∘#.ExpDecEv #.LMA(10 0.5 1)(loss:'L2' ⋄ scale:?(≢x)⍴0)).p   ⍝ exact (wls)
        r,←p CMP(R x y0∘#.ExpDecEv #.LMA(10 0.5 1)(loss:'Huber')).p                ⍝ exact
        r,←p CMP(R x y∘#.ExpDecEv #.LMA(5 0.1 0.5)(loss:'Cauchy')).p
        r,←p CMP(R x y∘#.ExpDecEv #.LMA(5 0.1 0.5)(loss:'SoftL1')).p
        r,←p CMP(R x y0∘#.ExpDecEv #.LMA(5 0.1 0.5)(loss:'Tukey' ⋄ scale:0.1)).p   ⍝ exact with scaling
        r,←p CMP(R x y∘#.ExpDecEv #.LMA(5 0.1 0.5)(loss:'Welsh')).p
        r,←p CMP(R x y∘#.ExpDecEv #.LMA(5 0.1 0.5)(loss:'Fair')).p
        r,←p CMP(R x y∘#.ExpDecEv #.LMA(5 0.1 0.5)(loss:'Arctan')).p
        r
    }
    t.ExpDecJac←{p x y←#.(p x y0) ⋄ tol←1e¯10
        R←{⎕←(16↑''),12 ¯5⍕⍵ ⋄ tol>⍵}
        r←R+.×⍨,((y-#.ExpDec∘x)#.Jacobian p)-⊃⌽x y #.ExpDecEv p
        r,←R+.×⍨,(1e¯9(y-#.ExpDec∘x)#.Jacobian p)-⊃⌽x y #.ExpDecEv p
        r
    }
    t.ExpDecLM←{p x y←#.(p x y0)
        R←{⎕←(10↑''),((6 0⍕⊃),(12 ¯5⍕2∘⊃),4∘↓)⍵ ⋄ ⍵}
        cfg←1e6 ⎕CT ⎕CT 1e¯2 1e¯2 5(÷5)(÷⎕CT)⎕CT                         ⍝ ti tc tr tg d0 di dd dx dn
        r←p CMP⊃⌽R cfg(y-#.ExpDec∘x)#.LM⊢(5 0.1 0.5)1                    ⍝ only residuals
        r,←p CMP⊃⌽R cfg(x y∘#.ExpDecEv)#.LM⊢(5 0.1 0.5)1                 ⍝ residual and jacobian
        r,←p CMP⊃⌽R cfg((1,⍨⊢,(×⍨1∘↑))x y∘#.ExpDecEv)#.LM⊢(5 0.1 0.5)1   ⍝ everything
        r
    }
    ⍝ Assesses capability to follow a long, narrow, curving multi-dimensional valley, testing
    ⍝ the interplay between step length and direction adjustments controlled by the damping factor
    t.Helical←{
        T←{1|1+(12○⍺+0j1×⍵)÷○2} ⋄ M←{|⍺+0j1×⍵}
        H←{
            a b c←⍵ ⋄ m←a M b ⋄ y←(10×c-10×a T b)(10×m-1)c
            y[((50×b)÷○+.×⍨a b)(-(50×a)÷○+.×⍨a b)10 ⋄ (10×a÷m)(10×b÷m)0 ⋄ 0 0 1]
        }
        r←1 0 0 CMP(R H #.LMA¯1 0 0).p                                ⍝ standard starting point
        r,←1 0 0 CMP(R H #.LMA¯1.2 0.1 0.1).p                         ⍝ slightly perturbed
        r,←1 0 0 CMP(R H #.LMA¯0.9 ¯0.05 ¯0.05).p                     ⍝ in other direction
        r,←1 0 0 CMP(R H #.LMA 0.5 ¯0.5 0.5).p                        ⍝ qualitatively different
        r,←1 0 0 CMP(R H #.LMA¯0.5 0.5 ¯0.5).p                        ⍝ another one
        r,←1 0 0 CMP(R H #.LMA¯1 0 10).p                              ⍝ far off in 3rd dimension
        r,←1 0 0 CMP(R H #.LMA¯1 0 ¯10).p                             ⍝ far off in 3rd dimension
        r,←1 0 0 CMP(R H #.LMA 3 4 5).p                               ⍝ away in all components
        r
    }
    ⍝ Verifies efficiency and correctness for a linear system. Convergence should be very fast
    ⍝ (1 or 2 iterations) with minimal damping, demonstrating Gauss-Newton like behavior
    t.Linear←{
        y←(A←?100 10⍴0)+.×x←⍳10 ⋄ s←R{(y-⍨A+.×⍵)A}#.LMA(?10⍴0)0 ⋄ (1=s.iter),(⍳10)CMP s.p
    }
    ⍝ numerical approximation of the Jacobian
    t.LinearNum←{
        y←(A←?100 5⍴0)+.×x←⍳5 ⋄ s←R{y-⍨A+.×⍵}#.LMA(?5⍴0)0 ⋄ (⍳5)CMP s.p
    }
    ⍝ Evaluates LMA's robustness and ability to converge to a solution when the Jacobian
    ⍝ becomes singular (rank-deficient) at or near the optimum
    t.Powell←{
        P←{
            a b c d←⍵ ⋄ r5 r10←5 10*÷2 ⋄ y←(a+10×b)(r5×c-d)(×⍨b-2×c)(r10××⍨a-d)
            y[1 10 0 0 ⋄ 0 0 r5(-r5) ⋄ 0(2×b-2×c)(-2×b-2×c)0 ⋄ (2×r10×a-d)0 0(-2×r10×a-d)]
        }
        r←(4⍴0)CMP(R P #.LMA(3 ¯1 0 1)(tolc:1e¯30)).p                 ⍝ standard starting point
        r,←(4⍴0)CMP(R P #.LMA(0 0 0 0)(tolc:1e¯30)).p                 ⍝ solution (convergence at zero step)
        r,←(4⍴0)CMP(R P #.LMA(1 1 1 1)(tolc:1e¯30)).p                 ⍝ far from solution
        r
    }
    ⍝ Tests LMA's performance in minimizing a classic non-linear function characterized by
    ⍝ a deep, narrow, banana-shaped valley, requiring effective adaptation of search
    ⍝ direction and step size
    Rosenbrock←{p q←⍵ ⋄ ((10×q-×⍨p),1-p)[(-20×p)10 ⋄ ¯1 0]}
    t.Rosenbrock←{
        r←1 1 CMP(R #.Rosenbrock #.LMA 1.5 1.5).p                     ⍝ close to the solution
        r,←1 1 CMP(R #.Rosenbrock #.LMA 2 1).p                        ⍝ not so close
        r,←1 1 CMP(R #.Rosenbrock #.LMA 0 0).p                        ⍝ outside of parabolic valley
        r,←1 1 CMP(R #.Rosenbrock #.LMA ¯1.2 1).p                     ⍝ further
        r,←1 1 CMP(R #.Rosenbrock #.LMA ¯2 ¯2).p                      ⍝ far and wrongly pointed gradient
        r,←1 1 CMP(R #.Rosenbrock #.LMA 2 2).p                        ⍝ far and wrongly pointed gradient
        r
    }
    ⍝ numerical approximation of the Jacobian
    t.RosenbrockNum←{
        r←1 1 CMP(R⊃⍤#.Rosenbrock #.LMA 1.5 1.5).p                    ⍝ close to the solution
        r,←1 1 CMP(R⊃⍤#.Rosenbrock #.LMA 2 1).p                       ⍝ not so close
        r,←1 1 CMP(R⊃⍤#.Rosenbrock #.LMA 0 0).p                       ⍝ outside of parabolic valley
        r,←1 1 CMP(R⊃⍤#.Rosenbrock #.LMA ¯1.2 1).p                    ⍝ further
        r,←1 1 CMP(R⊃⍤#.Rosenbrock #.LMA ¯2 ¯2).p                     ⍝ far and wrongly pointed gradient
        r,←1 1 CMP(R⊃⍤#.Rosenbrock #.LMA 2 2).p                       ⍝ far and wrongly pointed gradient
        r
    }
    ⍝ Check that algorithm finishes on termination conditions
    t.Terminate←{
        r←(R #.Rosenbrock #.LMA(0 0)(toli:10)).(iter>toli)                      ⍝ number of iterations
        r,←(R #.Rosenbrock #.LMA(0 0)(tolc:0)).(rel<tolr)                       ⍝ relative change
        r,←(R #.Rosenbrock #.LMA(0 0)(tolc:0 ⋄ dmax:1e6)).((dnorm>1e6)∧rel=0)   ⍝ maximum damping
        r
    }
    ⍺←1e¯6 ⋄ tol←⍺ ⋄ CMP←{tol>|⍺-⍵} ⋄ R←{⎕←(10↑''),⍵.((6 0⍕iter),(12 ¯5⍕cost),p) ⋄ ⍵}
    (t⎕NS'tol' 'CMP' 'R'){⎕←⍵ ⋄ 0∊⍺⍎⍵,'⍬':('TEST FAILED: ',⍵)⎕SIGNAL 8 ⋄ _←0}¨(↓t.⎕NL 3)⊣⍣(⍵≡'*')⊆⍵
}

In [7]:
Test'*'

### Export

In [8]:
] _←link.export -overwrite # APLSource

## Tutorial

This section provides practical examples to guide the application of the Levenberg-Marquardt algorithm and the `LMA` operator to the solution of optimization problems. The examples cover scenarios from simple equation solving to robust non-linear least squares fitting.

### 1. Solving a simple linear system

This example demonstrates the most basic use of `LMA`: solving a known linear system $A p = b$ by minimizing $||A p - b||$. For linear problems the algorithm should behave like Gauss-Newton, converging very quickly to the analytical solution $p = A^{-1} b$.

To begin, we define a random matrix `A` and random vector `b`. The solution parameters `p` can then be found using matrix division with `b⌹A`.

In [9]:
⎕RL←1 ⋄ A←?5 5⍴9 ⋄ b←?5⍴0 ⋄ p←b⌹A
'bAp'⍪⍉⍪⍪¨b A p

Next, we define an evaluation function. This function must return the residuals and Jacobian for a set of solution parameters. The residual is $A p - b$, because the algorithm will minimize the sum of squares of the residual. Therefore, the Jacobian is the matrix $A$.

In [10]:
LinEval←{A b←⍺ ⋄ (b-⍨A+.×⍵)A}     ⍝ residual and jacobian for solution parameters ⍵
'b-A+.×p' 'A'⍪⍉⍪⍪¨A b LinEval p   ⍝ eg: at the solution point the residual should be zero

To specify configuration options, we define a configuration namespace `cfg`. We must provide an evaluation function (the `LinEval` function that we just defined), and we will also set a very tight cost tolerance, since we know that an exact solution exists.

In [11]:
cfg←(tolc:1e¯20)       ⍝ cost tolerance
cfg.Eval←A b∘LinEval   ⍝ evaluation function

All that remains is an initial guess. We use a vector of ones.

In [12]:
r←cfg LMA 5⍴1                    ⍝ run algorithm
'p' 'r.p' '∆'⍪⍉⍪⍪¨p r.p(p-r.p)   ⍝ compare result and known solution

The parameters found are very close to the true parameters! We can also inspect other output variables. The sum of squared errors (the cost value) should be low, and we should have achieved convergence very quickly.

In [13]:
RES←{⍺←'iter' 'cost' 'rel' 'dnorm' ⋄ ⍕⍉↑⍺(⍵⎕VGET↑⍺)} ⋄ P←{⍕(¯5↑'p')⍵.p}
RES r ⋄ P r

We needed a few iterations to solve the problem. Let's use the `verbose` option to see the progress of the algorithm during each iteration. To pass an additional option without modifying our `cfg` namespace, we will use a namespace in the right argument.

In [14]:
cfg LMA(5⍴1)(verbose:1)

The iteration number is in the first column; cost, relative change and normalized damping factor are in the second, third and fourth columns; and the accepted solution parameters follow.

We observe that the cost is reduced during each iteration without problems. This indicates that we can reduce the damping factor. Actually, we can reduce it to the minimum using `0` as normalized damping factor.

In [15]:
cfg LMA(5⍴1)0(verbose:1)

Convergence is achieved in a single iteration now!
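
The single-step convergence can be understood from the update equation. This is a sketch assuming $W = I$, an invertible square $A$, and negligible damping (with a normalized damping factor of `0`, $\lambda$ equals $\lambda_{min}$, which is small but not exactly zero). With residual $y = A p_0 - b$ and Jacobian $J = A$:

$$\Delta p = (A^T A)^{-1} A^T (A p_0 - b) = p_0 - A^{-1} b \quad \Longrightarrow \quad p_0 - \Delta p = A^{-1} b$$

The first undamped step therefore lands on the analytical solution regardless of the initial guess $p_0$.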

Notice the flexibility of the different ways in which options can be passed to `LMA`. Alternatively, we can also pass the evaluation function directly, avoiding having to predeclare a namespace. Additional options can be specified in the right argument:

In [16]:
A b LinEval LMA(5⍴1)0(tolc:1e¯20 ⋄ verbose:1)

### 2. Fundamentals of non-linear least squares (NLLS) fitting

In the next example, we will fit the parameters of a non-linear model. We will consider the exponential decay function:

$$y = C + A e^{-k x}$$

Let's generate some data:

In [17]:
⎕RL←1
ExpDec←{A k C←⍺ ⋄ C+A×*-k×⍵}                ⍝ exponential decay
Noise←{⍵×1+⍺×0.5-?≠⍨⍵}                      ⍝ add random relative noise within ±⍺÷2 to ⍵
y0←(p←0.1 5 2)ExpDec x←(⍳100)÷10            ⍝ true data (100 data points)
y←5e¯3 Noise y0                             ⍝ add some random noise (±0.25%)
'x' 'y0' 'y' 'y-y0'⍪⍉⍪(10↑⍪)¨x y0 y(y-y0)   ⍝ first 10 data points

The `ExpDec` function evaluates the exponential decay function at the selected points for the given parameters. We use it to generate true values `y0` using known parameters `p`. Then, using the function `Noise`, we apply a random relative variation to all the points (at most ±0.25%, because the uniform random term lies between ¯0.5 and 0.5 before being scaled by `5e¯3`) to get the more realistic data in `y`. This imperfect data is a better representation of the kind of data usually found in fitting problems.

To fit the parameters using `LMA`, we need an evaluation function. As in the previous example, we are going to define a function that returns the residual and the Jacobian. For the exponential decay function defined above, the Jacobian of the residuals $y_i - (C + A e^{-k x_i})$ with respect to the parameters $(A, k, C)$ is given by the matrix:

$$J = \begin{bmatrix}
-e^{-kx_1} & A x_1 e^{-kx_1} & -1 \\
-e^{-kx_2} & A x_2 e^{-kx_2} & -1 \\
\vdots & \vdots & \vdots \\
-e^{-kx_m} & A x_m e^{-kx_m} & -1
\end{bmatrix}
$$
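
As a check on the matrix above, each column follows from differentiating the residual of a single data point with respect to one parameter. The symbol $r_i$ is introduced here only for this derivation, with $r_i = y_i - (C + A e^{-k x_i})$:

$$\frac{\partial r_i}{\partial A} = -e^{-k x_i}, \qquad \frac{\partial r_i}{\partial k} = A x_i e^{-k x_i}, \qquad \frac{\partial r_i}{\partial C} = -1$$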

Our evaluation function therefore takes the form:

In [18]:
ExpDecEval←{x y←⍺ ⋄ A k C←⍵ ⋄ xe←A×x×e←*-k×x ⋄ (y-⍵ ExpDec x)(⍉(-e)⍪xe⍪⍉⍪-=⍨e)}   ⍝ residual and jacobian

With the evaluation function defined, we can now run the LMA solver. We will again use a vector of ones as initial guess. Let's first fit the parameters using the raw output of the `ExpDec` function stored in `y0`.

In [19]:
RP←{⎕←RES ⍵ ⋄ ⎕←P ⍵}
RP x y0 ExpDecEval LMA 3⍴1

The algorithm converges in very few iterations, and the solution parameters are very close to the true parameters (`0.1 5 2`) used to generate `y0`. The cost is extremely low, as expected for a fit to clean data.

Now we attempt to find the parameters using the noisy data in `y`:

In [20]:
RP x y ExpDecEval LMA 3⍴1

When fitting the noisy data `y`, the algorithm still converges. The solution parameters are reasonably close to the true values, but the final cost is higher, reflecting the noise. The algorithm stopped because the relative change (`rel`) fell below `tolr`, but the cost could not be reduced below tolerance. We can try to refine the solution by reducing the `tolr` value.

In [21]:
RP x y ExpDecEval LMA(3⍴1)(tolr:1e¯20)

Even with a much stricter `tolr`, the solution parameters and cost do not change significantly, indicating that we have found a good local minimum for this noisy dataset.

In [22]:
(x y ExpDecEval LMA p(toli:0 ⋄ verbose:1)).cost

By running `LMA` with the true parameters `p` and `toli:0` (0 iterations, so just one evaluation), we obtain the sum of losses of the true model on the noisy data. This is a natural reference value: a fit that reaches a lower cost is adjusting to the noise rather than to the underlying model. The algorithm actually yielded a slightly lower cost, indicating that we are *overfitting* (adjusting the parameters to the noise instead of real data).

#### Numerical estimation of Jacobian matrix

What if we do not know the Jacobian and only have a residual function available? In this case, the algorithm can still find a solution, using a numerical estimation of the Jacobian.

Let's consider the exponential decay function again. But, this time, we will not use the `ExpDecEval` function. Instead, we define an `ExpDecRes` function that only returns the residuals:

In [23]:
ExpDecRes←{x y←⍺ ⋄ y-⍵ ExpDec x}
RP x y ExpDecRes LMA(3⍴1)

Although it took a few more iterations, the results are very similar to those obtained with the analytical Jacobian. `LMA` automatically detected that `ExpDecRes` only returned residuals and used the `Jacobian` operator (with the default `pert` value `⎕CT*÷2`) to approximate the Jacobian.
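For intuition, a forward-difference Jacobian can be sketched in one line. The `FwdJac` operator below is a hypothetical helper written for illustration, not the library's `Jacobian` operator, whose actual scheme (forward or central differences, absolute or relative perturbation) is not shown in this section. It assumes a monadic residual function that returns a vector with at least two elements.

```apl
⍝ Hypothetical forward-difference Jacobian (illustration only, not the library's Jacobian).
⍝ ⍺⍺: monadic residual function   ⍺: perturbation h   ⍵: parameter vector
⍝ Column j of the result is (f(p + h×e_j) - f(p)) ÷ h, with e_j the j-th unit vector.
FwdJac←{f←⍺⍺ ⋄ h←⍺ ⋄ p←⍵ ⋄ r0←f p ⋄ ⍉↑{((f p+h×⍵)-r0)÷h}¨↓∘.=⍨⍳≢p}

Quad←{(×⍨⍵)-1 4}          ⍝ toy residuals (p*2)-1 4; the analytical Jacobian is diagonal with entries 2×p
1e¯7(Quad FwdJac)1 2      ⍝ approximately 2 2⍴2 0 0 4
```

With the names defined in the cells above, `1e¯7(x y∘ExpDecRes FwdJac)3⍴1` would play the same role as `x y∘ExpDecRes Jacobian 3⍴1`, although the two need not agree to the last digit.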

Modifying the configuration parameter `pert`, it is possible to adjust the applied perturbation:

In [24]:
RP x y ExpDecRes LMA(3⍴1)(pert:⎕CT*÷2)   ⍝ default perturbation
RP x y ExpDecRes LMA(3⍴1)(pert:1e0)      ⍝ large perturbation, worse prediction makes convergence slower
RP x y ExpDecRes LMA(3⍴1)(pert:1e¯15)    ⍝ perturbation too small, inaccurate result

Very large perturbations degrade the Jacobian approximation and slow down convergence, while extremely small ones (like `1e¯15`) lead to numerical inaccuracies and poor solutions because of the limited floating-point precision of finite differences. The default perturbation provides a good balance in this case.
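The trade-off can be made explicit with the standard error estimate for a forward difference of a smooth scalar function $f$ with step $h$, where $\epsilon$ denotes the relative floating-point precision. This is the textbook one-dimensional argument, not a description of how the `Jacobian` operator chooses its step:

$$\left|\frac{f(p+h)-f(p)}{h} - f'(p)\right| \lesssim \underbrace{\frac{h}{2}\,|f''(p)|}_{\text{truncation}} + \underbrace{\frac{2\epsilon\,|f(p)|}{h}}_{\text{rounding}}$$

The first term grows with $h$ and the second grows as $h$ shrinks, so the bound is smallest for $h$ of the order of $\sqrt{\epsilon}$. With the default `⎕CT` of `1e¯14`, the default perturbation `⎕CT*÷2` is `1e¯7`, which lies in that range, whereas `1e¯15` is close to the level of double-precision rounding itself.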

For an accurate result, the Jacobian calculated numerically should be close to the analytical Jacobian. However, the quality of the approximation can vary depending on the actual solution parameters being used, as we can see by comparing the Jacobian calculated numerically with the analytical Jacobian defined earlier. We calculate a relative error for the initial and final solution parameters:

In [25]:
(⊃⌽x y ExpDecEval 3⍴1)(-÷⍥(+.×⍨,)⊣)x y∘ExpDecRes Jacobian 3⍴1  ⍝ initial relative error
(⊃⌽x y ExpDecEval p)(-÷⍥(+.×⍨,)⊣)x y∘ExpDecRes Jacobian p      ⍝ final relative error

The relative error between the analytical Jacobian and our numerical approximation (using the default perturbation) is very small, especially for the final solution parameters, confirming the reliability of the numerical estimation.
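The tacit expression `(-÷⍥(+.×⍨,)⊣)` used in this cell is compact, so it may help to spell it out. Read as a fork, it divides the sum of squares of the difference of its arguments by the sum of squares of its left argument. The explicit dfn `RelErr` below is a hypothetical name introduced only for this illustration:

```apl
⍝ Explicit equivalent of the train (-÷⍥(+.×⍨,)⊣); RelErr is an illustrative name.
⍝ ⍺: reference matrix (analytical Jacobian)   ⍵: approximation (numerical Jacobian)
RelErr←{(+/,×⍨⍺-⍵)÷+/,×⍨⍺}

A←2 2⍴1 2 3 4 ⋄ B←2 2⍴1 2 3 5
A RelErr B                          ⍝ 1÷30 (about 0.0333): squared error 1 over 1+4+9+16
(A RelErr B)≡A(-÷⍥(+.×⍨,)⊣)B        ⍝ 1: both forms agree
```

Because both sums are sums of squares, the value is the square of the relative error in the Frobenius norm; taking its square root gives the relative error itself.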

#### Callback function

We may be interested in monitoring some variable during the optimization process. For example, in the previous example, we might want to know how accurate the numerical approximation of the Jacobian is at each iteration. This can be achieved with a callback function. The `CallBack` function, defined in the configuration namespace, is run at every iteration with a namespace argument like the one returned by `LMA` as its result, allowing the user to perform a detailed diagnosis.

Here, we define a callback function that displays the relative error with respect to the analytical Jacobian at every iteration, together with the cost, the relative change (`rel`), and the last accepted guess for the solution parameters.

In [26]:
cfg←(tolr:1e¯6) ⋄ cfg.Eval←ExpDecRes ⋄ cfg.ExpDec←ExpDec
cfg.CallBack←{⎕←12 ¯5⍕⍵.(cost rel,p),⍨(⊃⌽#.(x y ExpDecEval p))(-÷⍥(+.×⍨,)⊣)#.(x 0∘ExpDecRes Jacobian)⍵.p}
⎕←∊⍕10↑¨'errJ' 'cost' 'rel' 'A' 'k' 'C' ⋄ x y(cfg LMA)3⍴1

### 3. Fitting of a physical model with experimental data

Next, we are going to tackle a problem closer to a real-life example. We will see how to fit the parameters of a Gaussian function:

$$f(x) = c + a \cdot \exp\left(-\frac{(x-m)^2}{2s^2}\right)$$

Gaussian peaks are arguably one of the most frequently encountered shapes in scientific and engineering data (spectroscopy, chromatography, signal processing, statistical distributions, etc.). Their symmetric, peak-like structure behaves differently from the monotonic exponential decay function in the previous example.

For the fitting, we are going to produce virtual experimental data. First, we generate clean output data using a predefined set of parameters. Next, some small noise is added to the data, using the same `Noise` function from the previous example. Finally, we create a few outliers by applying a much larger perturbation, again with the `Noise` function, but with a larger value as left argument and only at a few random points.

In [27]:
⎕RL←1 ⋄ X←{m s←⍺ ⋄ *-(×⍨⍵-m)÷(2××⍨s)}
Gauss←{a s m c←⍺ ⋄ c+a×m s X ⍵}                                ⍝ gauss function
y0←(p←5 2 1.5 0.1)Gauss x←0.25×(⍳20)-1                         ⍝ true data
y←(rl←1 Noise@(5?≢x)=⍨x)×y1←(rs←5e¯2 Noise=⍨x)×y0              ⍝ random noise (±5%) and 5 outliers (±100%)
'x' 'y0' '∆small' 'y1' '∆large' 'y'⍪⍉⍪⍪¨x y0(1-rs)y1(1-rl)y

We will fit the parameters using a numerical approximation of the Jacobian (we will also use the analytical one later):

In [28]:
GaussRes←{x y←⍺ ⋄ y-⍵ Gauss x}
RP x y1 GaussRes LMA(4⍴1)(verbose:0)
RP x y GaussRes LMA(4⍴1)(verbose:0)

The first run on the data with small noise (`y1`) converges well, with the solution parameters reasonably close to the true values `5 2 1.5 0.1`. However, the second run on the data with significant outliers (`y`) shows that the L2 loss function is heavily influenced by these outliers, and the resulting parameters are poor, with a high final cost.

#### Weighted least-squares fitting

If we know the quality of each of our data points beforehand (for example, because the experimental device reports the quality of each measurement), we can use this information to guide the fitting process. By assigning a different weight to each data point, we can perform what is called a *weighted least-squares* (WLS) fit.

In [29]:
RP x y1 GaussRes LMA(4⍴1)(scale:s1←÷1+|y1-y0)
RP x y GaussRes LMA(4⍴1)(scale:s←÷1+|y-y0)

By applying weights (here, decreasing with the deviation from the true data `y0`, which is known only because the data is synthetic), the WLS fit gives more importance to data points believed to be more reliable. For the `y1` data, the parameters are similar to the unweighted L2 fit. However, for the `y` data with outliers, WLS can improve the fit if the weights correctly down-weight the outlier points.
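In the textbook formulation of weighted least squares, each squared residual is multiplied by a non-negative weight $w_i$, which gives the cost function and the weight matrix shown below. How the `scale` parameter maps onto $w_i$ internally is not detailed in this section, so this should be read as the general idea rather than the exact formula used by `LMA`:

$$F(p) = \sum_i w_i\, y_i(p)^2, \qquad W = \operatorname{diag}(w_1, \dots, w_n)$$

A point with a small weight contributes little to both $J^T W J$ and $J^T W y$ in the step equation, so it has little influence on $\Delta p$.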

#### Robust loss functions

Data about the quality of a measurement is not always available. It is often the case that experimental data present noise and outliers, but we do not know in advance how to weight our data points to get an accurate result. In this case, the use of *robust loss functions* can help to find a better solution.
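As an illustration of the idea, the Huber loss in its textbook form is quadratic for small residuals and linear for large ones, so an outlier is penalized far less than under $L_2$. The dfn below is a minimal sketch with a hypothetical name (`HuberRho`); the normalization used by the library's own `'Huber'` loss may differ.

```apl
⍝ Textbook Huber loss (illustration only; not the library's implementation).
⍝ ⍺: threshold c   ⍵: residuals
⍝ rho(r) is 0.5×r*2 where |r|≤c, and c×(|r|-0.5×c) elsewhere
HuberRho←{a←|⍵ ⋄ ((a≤⍺)×0.5××⍨a)+(a>⍺)×⍺×a-0.5×⍺}

1 HuberRho 0.5 2 10       ⍝ 0.125 1.5 9.5
0.5××⍨0.5 2 10            ⍝ 0.125 2 50   (half the squared residual, for comparison)
```

The residual of 10 costs 9.5 instead of 50, which is why a few large outliers pull the fit much less.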

We are going to repeat the fitting of the Gaussian function parameters using different robust functions:

In [30]:
⊢r←x y1 GaussRes LMA(4⍴1)
⊢r,←x y1 GaussRes LMA(4⍴1)(scale:s1)        ⍝ wls
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'Huber')
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'Cauchy')
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'SoftL1')
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'Tukey')
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'Welsh')
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'Fair')
⊢r,←x y1 GaussRes LMA(4⍴1)(loss:'Arctan')

With the exception of Tukey's loss function, all of them converge with a relatively low cost. Tukey's function usually requires a careful adjustment of the scaling factor.
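The sensitivity of Tukey's function follows from its shape. In its textbook form (the library's normalization, and the way `scale` relates to the threshold $c$, may differ), the biweight loss is:

$$\rho(y_i, c) = \begin{cases} \dfrac{c^2}{6}\left[1 - \left(1 - \dfrac{y_i^2}{c^2}\right)^3\right] & |y_i| \le c \\[2ex] \dfrac{c^2}{6} & |y_i| > c \end{cases}$$

Beyond $|y_i| = c$ the loss is constant, so those residuals contribute nothing to the gradient. If the residuals at the initial guess are large relative to the threshold, most points fall in this flat region and the algorithm has little information to work with.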

In [31]:
⊢r[⎕IO+5]←x y1 GaussRes LMA(4⍴1)(loss:'Tukey' ⋄ scale:0.1)   ⍝ replace the Tukey entry (sixth in r) for any ⎕IO

To compare all loss functions, we can calculate the sum of squared residuals corresponding to the solution parameters obtained using each of the loss functions.

In [32]:
rl2←(x y1(GaussRes LMA)(toli:0),⍨⊂)¨r.p
'Robust' 'L2'⍪⍉⍪(12 ¯5⍕⍪)¨r.cost rl2.cost

The `Robust` column shows the sum of the respective loss values, which are not directly comparable across different loss types. However, in the `L2` column, we see that when fitting the data with only small noise, most robust loss functions yield a final sum of squared residuals very similar to that obtained by the direct L2 fit. This demonstrates that robust loss functions do not significantly penalize performance on clean data.

In [33]:
r←x y GaussRes LMA(4⍴1)
r,←x y GaussRes LMA(4⍴1)(scale:s)         ⍝ wls
r,←x y GaussRes LMA(4⍴1)(loss:'Huber')
r,←x y GaussRes LMA(4⍴1)(loss:'Cauchy')
r,←x y GaussRes LMA(4⍴1)(loss:'SoftL1')
r,←x y GaussRes LMA(4⍴1)(loss:'Tukey')
r,←x y GaussRes LMA(4⍴1)(loss:'Welsh')
r,←x y GaussRes LMA(4⍴1)(loss:'Fair')
r,←x y GaussRes LMA(4⍴1)(loss:'Arctan')
rl2←(x y(GaussRes LMA)(toli:0),⍨⊂)¨r.p
'Robust' 'L2'⍪⍉⍪(12 ¯5⍕⍪)¨r.cost rl2.cost

When fitting the data containing significant outliers, the benefit of robust loss functions becomes clear. While the L2 fit is heavily skewed (high L2 cost), functions like Huber, Cauchy, SoftL1, Fair, and Arctan achieve a much lower sum of squared residuals by down-weighting the outliers. Their own cost is also minimized effectively. Tukey and Welsh, being redescending, can be very effective but may require careful tuning of their scale parameter, or more iterations, to converge optimally, especially if the initial scale (auto-estimated or default) is not ideal for the outlier distribution.

#### `LM` operator

The `LMA` operator provides a convenient interface for the user. However, the core engine can also be directly accessed using the `LM` operator (which `LMA` uses internally).

Although the interface of `LM` is more arcane, it offers the same flexibility as `LMA`. The evaluation function must be the left operand. The callback function (which in this case takes a vector instead of a namespace) is the right operand. All configuration parameters must be passed as the left argument, and the right argument must be a two-element vector containing the initial parameters and the normalized damping factor.

The evaluation function may return either only the residual, the residual and the Jacobian, or also loss and weight values if it is desired to use a non-standard loss function. Let's see these three cases for the Gaussian function. We will need the analytical Jacobian given by:

$$
J_i = \begin{bmatrix}
-e^{-\frac{(x_i-m)^2}{2s^2}} & 
-a \cdot e^{-\frac{(x_i-m)^2}{2s^2}} \cdot \frac{(x_i-m)^2}{s^3} & 
-a \cdot e^{-\frac{(x_i-m)^2}{2s^2}} \cdot \frac{(x_i-m)}{s^2} & 
-1
\end{bmatrix}
$$
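These entries follow from the chain rule. With the residual defined as in `GaussRes`, $r_i = y_i - f(x_i)$, and writing $u_i = -\frac{(x_i-m)^2}{2s^2}$ and $e_i = \exp(u_i)$, the derivatives of the exponent are $\partial u_i/\partial s = (x_i-m)^2/s^3$ and $\partial u_i/\partial m = (x_i-m)/s^2$, so that, in the parameter order $a, s, m, c$:

$$\frac{\partial r_i}{\partial a} = -e_i, \qquad \frac{\partial r_i}{\partial s} = -a\,e_i\,\frac{(x_i-m)^2}{s^3}, \qquad \frac{\partial r_i}{\partial m} = -a\,e_i\,\frac{x_i-m}{s^2}, \qquad \frac{\partial r_i}{\partial c} = -1$$

The minus signs come from the residual being data minus model. Note also that $\partial r_i/\partial s = \frac{x_i-m}{s}\,\partial r_i/\partial m$, so the $s$ column can be obtained from the $m$ column with one extra multiplication.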

In [34]:
GaussJac←{a s m c←⍺ ⋄ x2←(x1←(xe←-m s X ⍵)×a×(⍵-m)÷×⍨s)×(⍵-m)÷s ⋄ ⍉↑(xe)x2(x1)(-=⍨xe)}
(p GaussJac x)(-÷⍥(+.×⍨,)⊣)x y∘GaussRes Jacobian p   ⍝ check error with respect to numerical estimation

We can now define three different evaluation functions (notice that they must be monadic functions, so we bind the left argument with `∘`):

In [35]:
GaussEval1←x y∘GaussRes                         ⍝ residual only
GaussEval2←x y∘{(⍺ GaussRes ⍵)(⍵ GaussJac⊃⍺)}   ⍝ residual and jacobian
GaussEval3←(⊢,Loss.(fair∘Fair)⍤⊃)GaussEval2     ⍝ residual, jacobian, and fair loss function

Finally, we run the algorithm using the default configuration parameters of `LMA`, and a `CallBack` function with a functionality similar to `verbose:1`.

In [36]:
cfg←1e3 ⎕CT ⎕CT 1e¯2 1e¯2 5(÷5)(÷⎕CT)⎕CT   ⍝ ti tc tr tg d0 di dd dx dn
Verbose←{⎕←(3 0⍕⊃⍵),12 9∘⍕¨1↓⍵}
cfg GaussEval1 LM Verbose(4⍴1)1
cfg GaussEval2 LM Verbose(4⍴1)1
cfg GaussEval3 LM Verbose(4⍴1)1