# Linear Programming and Resource Allocation
In this module, we introduce the core concepts of linear programming and apply them to a classic class of problems: resource allocation challenges such as production planning, consumer choice, and network flow.

By the end of this module, you will be able to define and demonstrate mastery of the following key concepts:

* __Primal and Dual Linear Programming__: In the _primal problem_, we directly choose how much of each activity (e.g., production levels) to undertake in order to optimize a cost, profit, or satisfaction measure while respecting resource limits. In the _dual problem_, we instead assign a _shadow price_ to each resource constraint, choosing those prices so that, if every activity were valued at its resource cost, the total value of the resources is maximized or minimized.
* __Karush–Kuhn–Tucker (KKT) Conditions__: The KKT conditions for a linear program say, intuitively, that at the optimum you must satisfy all your original constraints, assign _shadow prices_ that respect the cost structure, and have any slack in a constraint paired with a zero price (and vice versa). We care about these because they give a foolproof certificate of optimality and directly drive the pivot rules in simplex and the search directions in interior-point methods.


By mastering these methods, you’ll be equipped to tackle a wide variety of real-world optimization challenges efficiently and effectively. Let’s dive in!

___

## Primal Linear Programming Problems
Suppose you have a _linear_ objective function $O:\mathbb{R}^{n}\to\mathbb{R}$ of the continuous decision variable vector $\mathbf{x}\in\mathbb{R}^{n}$ whose values are constrained by a system of $m$ linear inequalities. To calculate an optimal value of the decision variable vector $\mathbf{x}$, we can formulate the problem as a _primal linear programming_ problem:
$$
\begin{align*}
\text{maximize} &\, \sum_{i=1}^{n} c_{i}\cdot{x}_{i}\\
\text{subject to}~\mathbf{A}\cdot\mathbf{x} &\leq \mathbf{b}\quad\mathbf{A}\in\mathbb{R}^{m\times{n}}\,\text{and}\,\mathbf{b}\in\mathbb{R}^{m}\\
~x_{i}&\geq {0}\qquad{i=1,2,\dots,n}
\end{align*}
$$
where $c_{i}\in\mathbb{R}$ are the coefficients of the objective function, $x_{i}\in\mathbb{R}$ are the decision variables, and $\mathbf{A}$ and $\mathbf{b}$ are the constraint matrix and right-hand side vector, respectively. The goal is to maximize the objective function while satisfying the constraints.
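
To make the notation concrete, here is a minimal Python sketch that evaluates the objective and checks the constraints $\mathbf{A}\cdot\mathbf{x}\leq\mathbf{b}$, $\mathbf{x}\geq 0$ for a candidate $\mathbf{x}$. The helper names `objective` and `is_feasible` and the numerical data are illustrative assumptions, not part of the module.

```python
def objective(c, x):
    """Return the linear objective sum_i c_i * x_i."""
    return sum(ci * xi for ci, xi in zip(c, x))

def is_feasible(A, b, x, tol=1e-9):
    """Check A x <= b (row by row) and x >= 0, up to a small tolerance."""
    rows_ok = all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bj + tol
        for row, bj in zip(A, b)
    )
    return rows_ok and all(xi >= -tol for xi in x)

# Hypothetical data: n = 2 decision variables, m = 2 constraints.
c = [3.0, 2.0]
A = [[1.0, 1.0],
     [1.0, 3.0]]
b = [4.0, 6.0]

for candidate in ([3.0, 1.0], [4.0, 0.0], [5.0, 0.0]):
    print(candidate, is_feasible(A, b, candidate), objective(c, candidate))
```

The last candidate violates the first constraint, so it is rejected even though its objective value is the largest of the three.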

Let's look at a few simple example problems to illustrate the primal linear programming formulation.

### Consumer Choice Problems
Suppose you are a consumer with a set of products you can purchase, and you want to maximize your utility (satisfaction) from these products while staying within your budget. This is a classic example of a resource allocation task (here, the resource is money) that can be formulated as a primal linear programming problem.
* __Formulation__: Consider a set of $n$ products, indexed by $i=1,2,\dots,n$, where each unit of product $i$ has a utility score $u_{i}$ and a cost $c_{i}$. The consumer has a budget $I$ that they can allocate among these products. Let $x_{i}\geq 0$ be the number of units of product $i$ that the consumer purchases. Then the total cost of the purchase is given by the expression $\sum_{i=1}^{n} c_{i}\cdot{x}_{i}$.
Finally, the consumer's utility function is defined as a linear combination of the utility scores of the products they purchase: $U\left(x_{1},\dots,x_{n}\right) = \sum_{i=1}^{n} u_{i}\cdot{x}_{i}$.

Putting all this together, we can formulate the consumer choice problem as the _primal_ linear program:
$$
\begin{align*}
\text{maximize} &\, \sum_{i=1}^{n} u_{i}\cdot{x}_{i} \\
\text{subject to}~\sum_{i=1}^{n} c_{i}\cdot{x}_{i}& \leq I\quad\text{shortcut notation}:\,\left<\mathbf{c},\mathbf{x}\right>\leq{I}\\
~x_{i}&\geq{0}\qquad{i=1,2,\dots,n}
\end{align*}
$$
The optimal solution to this problem (if it exists) gives the consumer the optimal number of units of each product to purchase in order to maximize their utility while staying within their budget. In a similar way, we can formulate other resource allocation problems such as production planning, transportation, and network flow as primal linear programming problems.
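
Because this linear program has a single budget constraint, its optimum can be found by inspection: assuming every cost is strictly positive and at least one utility is positive, the whole budget goes to the product with the highest utility per unit cost. The sketch below illustrates this special case only. The function name `solve_consumer_choice` and the data are hypothetical, and a general LP needs a solver such as the simplex method.

```python
def solve_consumer_choice(u, c, budget):
    """Solve max sum(u_i x_i) s.t. sum(c_i x_i) <= budget, x >= 0.

    Assumes every cost c_i is strictly positive. With one linear constraint,
    an optimal solution spends everything on the best utility-per-cost ratio.
    """
    best = max(range(len(u)), key=lambda i: u[i] / c[i])
    x = [0.0] * len(u)
    if u[best] > 0:                      # otherwise buying nothing is optimal
        x[best] = budget / c[best]
    utility = sum(ui * xi for ui, xi in zip(u, x))
    return x, utility

# Hypothetical data: three products with utilities u and unit costs c.
u = [4.0, 3.0, 5.0]
c = [2.0, 1.0, 4.0]
x, total_utility = solve_consumer_choice(u, c, budget=10.0)
print(x, total_utility)
```

Here the ratios $u_i/c_i$ are $2$, $3$, and $1.25$, so the entire budget is spent on the second product.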

## Dual Linear Programming Problems
Having defined primal linear programs, we now turn to their duals—alternative formulations that offer a different viewpoint on the same optimization. You can think of it as viewing the primal through a different lens.

If the _primal problem_ has the form:
$$
\begin{align*}
\text{maximize} &\, \sum_{i=1}^{n} c_{i}\cdot{x}_{i}\\
\text{subject to}~\sum_{i=1}^{n} A_{j,i}\cdot{x}_{i} &\leq b_{j}\quad j=1,2,\dots,m\\
~x_{i}&\geq {0}\qquad{i=1,2,\dots,n}
\end{align*}
$$
then the _dual problem_ has the form:
$$
\begin{aligned}
\text{minimize}\quad & \sum_{j=1}^{m} b_{j}\,y_{j}\\
\text{subject to}\quad & \sum_{j=1}^{m} A_{j,i}\,y_{j}\;\ge\;c_{i}
\quad&&i=1,2,\dots,n,\\
&y_{j}\;\ge\;0
\quad&&j=1,2,\dots,m.
\end{aligned}
$$
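
The primal-to-dual mapping is purely mechanical, which the following sketch tries to show: the data of the dual is obtained by transposing the constraint matrix and swapping the roles of the objective coefficients and the right-hand side. The function name `build_dual` and the example numbers are assumptions made for illustration.

```python
def build_dual(c, A, b):
    """Given the primal  max{c^T x : A x <= b, x >= 0},
    return the data (objective, matrix, right-hand side) of the dual
    min{b^T y : A^T y >= c, y >= 0}."""
    A_transpose = [list(column) for column in zip(*A)]
    return list(b), A_transpose, list(c)

# Hypothetical primal with n = 3 variables and m = 2 constraints.
c = [5.0, 4.0, 3.0]
A = [[2.0, 3.0, 1.0],
     [4.0, 1.0, 2.0]]
b = [5.0, 11.0]

dual_objective, dual_matrix, dual_rhs = build_dual(c, A, b)
print(dual_objective)  # one coefficient per dual variable (m of them)
print(dual_matrix)     # one row per primal variable (n dual constraints)
print(dual_rhs)
```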

### What has changed?
There are several key differences between the primal and dual linear programming problems:
1. The objective function flips (maximum ⇒ minimum or minimum ⇒ maximum).  
2. Primal objective coefficients $c_i$ become the dual right-hand side constants.  
3. Primal right-hand side constants $b_j$ become the dual objective coefficients.  
4. The $m\times n$ constraint matrix $A$ is transposed in the dual (so $A^\top$ appears).  
5. The number of variables and constraints swap:  the primal has $n$ variables, $m$ constraints, and the dual has $m$ variables and $n$ constraints.
6. Each primal constraint $a_j^\top x \le b_j$ gives a dual variable $y_j$.  Each primal variable $x_i$ gives a dual constraint $(A^\top y)_i \ge c_i$.  
7. Constraint directions pair with sign restrictions: in this symmetric form, each primal $\le$ constraint pairs with a nonnegative dual variable $y_j \ge 0$, and each nonnegative primal variable $x_i \ge 0$ pairs with a $\ge$ dual constraint, so the constraints flip from $\le$ in the primal to $\ge$ in the dual (and vice versa).
8. Equality constraints in the primal become free variables in the dual, i.e., $a_j^\top x = b_j$ gives rise to a dual variable $y_j$ that is free (no sign restriction), while a free primal variable $x_i$ gives rise to a dual equality constraint $(A^\top y)_i = c_i$.

Finally, the solutions of the primal and dual problems are related by the concept of __duality__. For a primal problem $\max\{\,c^T x : A x \le b,\;x\ge0\}$ and its corresponding dual problem $\min\{\,b^T y : A^T y \ge c,\;y\ge0\}$, the solutions are related:
* __Weak duality__:  For any primal feasible $x$ and dual feasible $y$, we have $c^T x \le b^T y$. Thus, the primal optimum is always bounded above by the dual optimum. The difference between the two is called the _duality gap_.
* __Strong duality__:  If both primal and dual are feasible and have finite optimal values, then $\max\{\,c^T x \} = \min\{\,b^T y\}$, i.e., the _duality gap is zero_. This means that the optimal values of the primal and dual problems are equal.
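
Both statements can be checked numerically on a small instance. The sketch below uses hypothetical data ($c$, $A$, $b$) and hand-picked feasible points. It verifies weak duality for one primal/dual pair, and then shows a pair with equal objective values, which certifies that both points are optimal.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def primal_feasible(A, b, x, tol=1e-9):
    """x >= 0 and A x <= b."""
    return (all(xi >= -tol for xi in x)
            and all(dot(row, x) <= bj + tol for row, bj in zip(A, b)))

def dual_feasible(A, c, y, tol=1e-9):
    """y >= 0 and A^T y >= c."""
    columns = list(zip(*A))
    return (all(yj >= -tol for yj in y)
            and all(dot(col, y) >= ci - tol for col, ci in zip(columns, c)))

# Hypothetical primal: max 3 x1 + 2 x2  s.t.  x1 + x2 <= 4,  x1 + 3 x2 <= 6.
c = [3.0, 2.0]
A = [[1.0, 1.0],
     [1.0, 3.0]]
b = [4.0, 6.0]

# Weak duality: any feasible pair satisfies c^T x <= b^T y.
x, y = [3.0, 1.0], [2.0, 1.0]
print(primal_feasible(A, b, x), dual_feasible(A, c, y), dot(c, x), dot(b, y))

# Strong duality: equal objective values certify optimality of both points.
x_star, y_star = [4.0, 0.0], [3.0, 0.0]
print(primal_feasible(A, b, x_star), dual_feasible(A, c, y_star),
      dot(c, x_star), dot(b, y_star))
```

In the first pair the gap is $14 - 11 = 3$. In the second pair both objectives equal $12$, so the duality gap is zero.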

___

## Lagrange Multipliers?
Normally, when faced with a constrained optimization problem, our first thought is to use the Lagrange multiplier method. We formulate the Lagrangian function by incorporating the constraints into the objective function using Lagrange multipliers. Then we compute the gradient of the Lagrangian function and set it to zero to find the optimal solution. Let's apply this method to our linear programming problem. Suppose we have a linear program in the form:
$$
\begin{align*}
\text{maximize} &\, \sum_{i=1}^{n} c_{i}\cdot{x}_{i}\\
\text{subject to}~ \mathbf{A}\cdot\mathbf{x} + \mathbf{s} &= \mathbf{b}\quad\text{where}\,\mathbf{A}\in\mathbb{R}^{m\times n}\,\text{and}\,\mathbf{b}\in\mathbb{R}^{m}\\
x_{i} &\geq 0\quad{i=1,2,\dots,n}\\
s_{j} &\geq 0\quad{j=1,2,\dots,m}
\end{align*}
$$
where $\mathbf{s} = (s_1, s_2, \ldots, s_m) \in \mathbb{R}^{m}$ are the _slack variables_ that convert the inequalities into equalities.
* _What?_ Working with inequality constraints can be tricky, especially when applying the Lagrange multiplier method, which is typically designed for equality constraints. However, we can convert the inequality constraints into equality constraints by introducing _slack variables_ $s_{j}$ for each of the $m$ constraints. Cool! What are slack variables?
* _Slack variables?_: The slack variables $s_{j} \equiv b_{j} - \mathbf{a}^{\top}_{j}\cdot \mathbf{x}\geq 0$ are non-negative variables that measure the gap between the right-hand side and the left-hand side of each inequality constraint, where $\mathbf{a}^{\top}_{j}$ is row $j$ of $\mathbf{A}$. They allow us to convert the inequalities into equalities, i.e., $\mathbf{a}^{\top}_{j}\cdot \mathbf{x} + s_{j} = b_{j}$ for each constraint $j=1,2,\dots,m$. This transformation is crucial because the Lagrange multiplier method requires equality constraints to define the Lagrangian function.
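
As a small illustration, the slack of each constraint is just the unused part of its right-hand side. The helper `slack` and the data below are hypothetical.

```python
def slack(A, b, x):
    """Return s = b - A x, one entry per constraint row."""
    return [bj - sum(aji * xi for aji, xi in zip(row, x))
            for row, bj in zip(A, b)]

# Hypothetical constraints: x1 + x2 <= 4 and x1 + 3 x2 <= 6.
A = [[1.0, 1.0],
     [1.0, 3.0]]
b = [4.0, 6.0]

print(slack(A, b, [1.0, 1.0]))  # both constraints have room to spare
print(slack(A, b, [4.0, 0.0]))  # first constraint is tight (zero slack)
print(slack(A, b, [5.0, 0.0]))  # a negative slack flags a violated constraint
```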

Now, we can write the Lagrangian function $\mathcal{L}(\mathbf{x}, \mathbf{s}, \boldsymbol{\lambda})$ as:
$$
\begin{align*}
\mathcal{L}(\mathbf{x}, \mathbf{s}, \boldsymbol{\lambda}) &= \mathbf{c}^{\top}\mathbf{x} - \boldsymbol{\lambda}^{\top}(\mathbf{A}\cdot\mathbf{x} + \mathbf{s} - \mathbf{b})\\
\end{align*}
$$
where $\boldsymbol{\lambda} \in \mathbb{R}^{m}$ are the Lagrange multipliers associated with the equality constraints. To compute the first-order optimality conditions, we take the gradient of the Lagrangian with respect to $\mathbf{x}$, $\mathbf{s}$ and $\boldsymbol{\lambda}$ and set it to zero:
$$
\begin{align*}
\nabla_{\mathbf{s}}\mathcal{L}(\mathbf{x}, \mathbf{s}, \boldsymbol{\lambda}) &= -\boldsymbol{\lambda}^{\top} = 0\quad\implies\boldsymbol{\lambda} = 0\quad\text{This is a problem!}\\
\nabla_{\mathbf{x}}\mathcal{L}(\mathbf{x}, \mathbf{s}, \boldsymbol{\lambda}) &= \mathbf{c} - \mathbf{A}^{\top}\boldsymbol{\lambda} = 0\implies\boldsymbol{c} = 0\quad\text{This is an even bigger problem!}\\
\nabla_{\boldsymbol{\lambda}}\mathcal{L}(\mathbf{x}, \mathbf{s}, \boldsymbol{\lambda}) &= \mathbf{A}\cdot\mathbf{x} + \mathbf{s} - \mathbf{b} = 0
\end{align*}
$$
From the first equation, the Lagrange multipliers are zero for all constraints, which (by the second equation) then requires $\mathbf{c}= 0$. This is not generally true!

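__Hmmmm__. The plain Lagrange multiplier method, which only handles equality constraints, breaks down here because it ignores the non-negativity constraints on $\mathbf{x}$ and $\mathbf{s}$. We need the _Karush-Kuhn-Tucker (KKT) conditions_ to handle the inequality constraints properly!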

___

## Karush–Kuhn–Tucker (KKT) Conditions
The Karush–Kuhn–Tucker (KKT) conditions extend the Lagrange multiplier framework to problems with inequality and non-negativity constraints. In convex settings—LPs in particular—these conditions are necessary and sufficient for optimality.

To construct the KKT conditions for a linear program, we need to understand four key concepts:

* __Stationarity__ At the optimum, the competing influences of improving the objective and enforcing the constraints balance out for each decision variable, so that no infinitesimal change in any variable can increase the Lagrangian.
* __Primal feasibility__ The candidate solution must satisfy every original model requirement—every equality condition must hold exactly, and every inequality or non-negativity restriction must be respected.
* __Dual feasibility__ All multipliers that penalize inequality constraints must be non-negative, ensuring that they only oppose constraint violations rather than “reward” them.
* __Complementary slackness__ Every inequality constraint is either exactly tight (active)—in which case its multiplier may be positive—or else it is slack (not binding), in which case its multiplier is forced to zero, so that only active constraints influence the solution.

Let's start by rewriting the linear program in standard form, where we introduce slack variables $s\ge0$ for $A\,x\le b$, the multipliers $\lambda,\mu,\nu$ (with $\mu$ and $\nu$ constrained to be nonnegative, and $\lambda$ unrestricted in sign because it multiplies an equality), and flip the maximization problem to a minimization problem:
$$
\begin{aligned}
&\text{Primal LP:}\quad\min_{x,s}\;c^\top x\quad\text{s.t.}\quad
A\,x + s = b,\;x \ge 0,\;s \ge 0.\\
&\text{Lagrange multipliers:}\quad
\lambda\in\mathbb R^m,\;\mu\in\mathbb R^n,\;\nu\in\mathbb R^m,
\quad\mu,\nu\ge0.\\
&\boxed{\displaystyle
\mathcal{L}(x,s,\lambda,\mu,\nu)
= c^\top x 
\;-\;\lambda^\top\bigl(A\,x + s - b\bigr)
\;-\;\mu^\top x
\;-\;\nu^\top s}
\end{aligned}
$$
where:
* $\lambda_j$ enforces the equality $A_j x + s_j=b_j$ (originally $A_j x\le b_j$),
* $\mu_i$ enforces $x_i\ge0$,
* $\nu_j$ enforces $s_j\ge0$.

Now, we compute the gradient of the Lagrangian function with respect to $(x,s,\lambda)$ and set it to zero, then enforce the complementary slackness and sign conditions for the inequality multipliers $(\mu,\nu)$:
$$
\begin{aligned}
&\nabla_x\mathcal{L}(x,s,\lambda,\mu,\nu) = c - A^\top\lambda - \mu = 0,\\
&\nabla_s\mathcal{L}(x,s,\lambda,\mu,\nu) = -\lambda - \nu = 0,\\
&\nabla_\lambda\mathcal{L}(x,s,\lambda,\mu,\nu) = A\,x + s - b = 0,\\
&\mu_i x_i = 0,\quad\forall i=1,2,\dots,n,\\
&\nu_j s_j = 0,\quad\forall j=1,2,\dots,m,\\
&x_i \ge 0,\;\mu_i \ge 0,\quad\forall i=1,2,\dots,n,\\
&s_j \ge 0,\;\nu_j \ge 0,\quad\forall j=1,2,\dots,m.
\end{aligned}
$$
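
One way to read these conditions is as a checklist that any claimed optimum, together with its multipliers, must pass. The sketch below implements that checklist for the minimization form $\min\{c^\top x : Ax + s = b,\;x\ge0,\;s\ge0\}$. The function name `kkt_satisfied` and the test data are assumptions. Note that, with this sign convention, stationarity in $s$ forces $\lambda = -\nu \le 0$.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def kkt_satisfied(c, A, b, x, s, lam, mu, nu, tol=1e-9):
    """Check the KKT conditions of min{c^T x : A x + s = b, x >= 0, s >= 0}."""
    m, n = len(b), len(c)
    columns = list(zip(*A))
    # Stationarity: c - A^T lam - mu = 0  and  -lam - nu = 0.
    stationary = (all(abs(c[i] - dot(columns[i], lam) - mu[i]) <= tol for i in range(n))
                  and all(abs(-lam[j] - nu[j]) <= tol for j in range(m)))
    # Primal feasibility: A x + s = b, x >= 0, s >= 0.
    primal = (all(abs(dot(A[j], x) + s[j] - b[j]) <= tol for j in range(m))
              and min(x) >= -tol and min(s) >= -tol)
    # Dual feasibility: mu >= 0, nu >= 0.
    dual = min(mu) >= -tol and min(nu) >= -tol
    # Complementary slackness: mu_i x_i = 0 and nu_j s_j = 0.
    complementary = (all(abs(mu[i] * x[i]) <= tol for i in range(n))
                     and all(abs(nu[j] * s[j]) <= tol for j in range(m)))
    return stationary and primal and dual and complementary

# Hypothetical LP: min -3 x1 - 2 x2  s.t.  x1 + x2 <= 4,  x1 + 3 x2 <= 6.
c = [-3.0, -2.0]
A = [[1.0, 1.0],
     [1.0, 3.0]]
b = [4.0, 6.0]

# Candidate at the corner x = (4, 0): passes every condition.
print(kkt_satisfied(c, A, b, x=[4.0, 0.0], s=[0.0, 2.0],
                    lam=[-3.0, 0.0], mu=[0.0, 1.0], nu=[3.0, 0.0]))

# Candidate at the corner x = (3, 1): stationarity forces nu_2 < 0, so it fails.
print(kkt_satisfied(c, A, b, x=[3.0, 1.0], s=[0.0, 0.0],
                    lam=[-3.5, 0.5], mu=[0.0, 0.0], nu=[3.5, -0.5]))
```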

Great! We have the KKT conditions for the primal linear program. These conditions are necessary and sufficient for optimality in convex optimization problems, including linear programming. However, what is the actionable algorithm that we develop from these conditions?

Let's consider two classes of algorithms based on the KKT conditions: the Simplex algorithm and the Interior Point methods.

___

## Revised Simplex Algorithm
The original Simplex Method, invented by George Dantzig in 1947, is an iterative algorithm that solves a linear program by moving around the collection of corner points of the feasible _polytope_, improving the objective at each step until no further gain is possible. 
* _Myth or fact?_ As a graduate student in 1939, Dantzig once arrived late to a statistics lecture, mistook two well-known unsolved problems on the blackboard for homework, and solved them over the next few days, before realizing they were open research questions! That story is true, but it happened years before he developed the simplex method and did not directly inspire the algorithm. Still, it is a fun story: being late is not always bad!

__Simplex is a big deal__: Before simplex, LPs were mostly a theoretical curiosity; afterwards, they became essential tools, radically improving resource allocation and strategic planning in the second half of the twentieth century. Over seventy years later, the simplex method remains widely used in commercial optimization. This is despite some not-so-great worst-case performance bounds!

### Algorithm
We are going to focus on the _revised simplex algorithm_, developed by Dantzig and coworkers in the 1950s at the RAND Corporation, which is a more efficient version of the original method. 

The __key idea__ behind the revised simplex algorithm is to partition the decision variables into a _basic set_ and a _non-basic set_. Then, we iteratively add and subtract variables from these sets and estimate their values, iteratively improving the objective function until we reach an optimal solution.
* A _basic variable_ is one you’re allowing to _turn on_ i.e., take a non-zero value. In the simplex method, you swap which variables are on or off to move from one corner of the feasible region to the next.
* A _nonbasic variable_ is one you keep _off_ (held at zero) — think of non-basic variables as benchwarmers not in play. When it looks promising, you swap it in (make it basic) to move to a potentially better solution.
* _How are the basic (and non-basic) variables related to corners?_ At any corner of the feasible polytope, $m$ variables whose constraint-matrix columns are linearly independent are _turned on_ (basic set), so their values are determined by solving the $m$ equality constraints, and all other variables (non-basic set) are zero. Choosing which $m$ variables are basic (and solving for them) picks out one specific corner of the polytope, provided the resulting values are non-negative, while each non-basic variable held at zero identifies a constraint boundary that passes through that corner.
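
The link between basic sets and corners can be made concrete by brute force: pick every set of $m$ columns of the augmented matrix $[A\;\;I]$, solve for the basic variables, and keep the non-negative solutions. The sketch below does this for a tiny hypothetical problem. The helper names are assumptions, and enumerating all bases is only practical for very small instances.

```python
from itertools import combinations

def solve_linear(M, rhs, tol=1e-12):
    """Gaussian elimination with partial pivoting; returns None if M is singular."""
    n = len(rhs)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        if abs(aug[p][k]) < tol:
            return None
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= f * aug[k][col]
    sol = [0.0] * n
    for k in reversed(range(n)):
        sol[k] = (aug[k][n] - sum(aug[k][j] * sol[j] for j in range(k + 1, n))) / aug[k][k]
    return sol

def basic_feasible_solutions(A, b, tol=1e-9):
    """Enumerate corners of {A x + s = b, x >= 0, s >= 0} by trying every basis."""
    m, n = len(A), len(A[0])
    # Columns of the augmented matrix [A I]: first the x columns, then the slacks.
    cols = [[A[r][j] for r in range(m)] for j in range(n)]
    cols += [[1.0 if r == j else 0.0 for r in range(m)] for j in range(m)]
    corners = []
    for basis in combinations(range(n + m), m):
        A_B = [[cols[j][r] for j in basis] for r in range(m)]
        z_B = solve_linear(A_B, b)
        if z_B is None or min(z_B) < -tol:
            continue                      # singular basis or infeasible point
        z = [0.0] * (n + m)               # non-basic variables stay at zero
        for k, j in enumerate(basis):
            z[j] = z_B[k]
        corners.append((basis, z[:n]))
    return corners

# Hypothetical constraints: x1 + x2 <= 4 and x1 + 3 x2 <= 6.
corners = basic_feasible_solutions([[1.0, 1.0], [1.0, 3.0]], [4.0, 6.0])
for basis, x in corners:
    print(basis, x)
```

Of the six possible bases in this example, four give non-negative solutions, and these are exactly the four corners of the feasible region.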

Let's sketch out the revised simplex algorithm:

__Initialization__: Given a linear program of the form: $\min\left\{c^\top x\mid Ax+s = {b},\;x\ge0,\; s\ge0,\;b\ge0\right\}$ where $A\in\mathbb{R}^{m\times{n}}$, $b\in\mathbb{R}^{m}$, and $c\in\mathbb{R}^{n}$, we want to find the optimal solution $x^{\star}$ that minimizes the objective function $c^\top x$ subject to the constraints defined by $Ax + s = b$ and the non-negativity conditions on $x$ and $s$.

Let $z = \left(x,s\right)\in\mathbb{R}^{n+m}$. Define the initial Basic set $B=\left\{s_{1},s_{2},\dots,s_{m}\right\}$, and the Non-Basic set $N=\left\{x_{1},x_{2},\dots,x_{n}\right\}$, where $x^{(0)}=0$ and $s^{(0)}=b$. Set $\lambda\gets\mathbf{0}$ and $\mu\gets\mathbf{c}$. Set $\texttt{converged}\gets\texttt{false}$, the iteration counter $t\gets{0}$, and the maximum number of iterations $T$.

While not $\texttt{converged}$ __do__:
1. __Optimality test__. Compute the reduced cost $\mu_{i} = c_{i} - \lambda^{\top}A_{i}$ for each non-basic variable $i \in N$. This condition comes directly from the $\nabla_x\mathcal{L}(x,s,\lambda,\mu,\nu) = 0$ expression in the KKT conditions.
    - If $\mu_{i} \geq 0$ for all $i \in N$ then set $\texttt{converged}\gets\texttt{true}$ and return the current solution $z^{(t)}$.
    - If $\mu_{i} < 0$ for any $i\in{N}$, the current solution is __not optimal__; there is a non-basic variable that, if _turned on_, will strictly decrease the objective.
2. __Direction and ratio test__. Select $e \gets \arg\min_{i \in N} \mu_{i}$. This variable $e$ will enter the basis (i.e., be turned on) to improve the objective. Compute the direction $\mathbf{d} = A^{-1}_{B}A_{e}$, where $A_{e}$ is the column of the augmented constraint matrix $[A\;\;I]$ corresponding to the variable $e$ (slack variables have identity columns and zero cost), and $A_{B}$ is the square submatrix formed by the columns of the basic variables.
    - If $\left\{j \mid d_{j} > 0\right\} = \emptyset$: the problem is __unbounded__ (the entering variable can grow without limit while the objective keeps decreasing). Exit the algorithm with an __error__.
    - Compute the step size $\alpha = \min\left\{\frac{(z_B)_{j}}{d_{j}}\mid d_{j} > 0\right\}$, where $(z_B)_j$ is the jth element of the basic variable vector.
    - Compute the index of the variable that will _leave_ the basic set: $l = \arg\min\left\{\frac{(z_B)_{j}}{d_{j}}\mid d_{j} > 0\right\}$.
3. __Pivot and update__: Update the basic and non-basic sets: $B \gets \left(B \setminus \{B_{l}\}\right)\cup \{e\}$ and $N \gets \left(N \setminus \{e\}\right) \cup \{B_l\}$. This means we swap the entering variable $e$ into the basic set and the leaving variable $B_l$ into the non-basic set.
    - Set $z_{e} \gets \alpha$ and update the solution vector $z_{B}^{(t+1)} \gets z_{B}^{(t)} - \alpha \cdot\mathbf{d}$.
    - Update the multipliers $\lambda\gets\left(A^{\top}_{B}\right)^{-1}c_{B}$, and $\mu_{i}\gets{c}_{i} - A^{\top}_{i}\lambda$ for all $i\in{N}$, and the iteration counter $t \gets t + 1$.
4. __Check convergence__: If $t \geq T$, the maximum number of iterations has been reached without passing the optimality test: exit the algorithm with an __error__ (optionally reporting the current solution $z^{(t)}$). Otherwise, loop back to step 1.
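
The steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a production implementation: it re-solves the basis systems from scratch at every iteration instead of updating a factorization, it has no anti-cycling rule for degenerate problems, and the names `solve_linear` and `revised_simplex` are hypothetical.

```python
def solve_linear(M, rhs):
    """Solve M v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= f * aug[k][col]
    v = [0.0] * n
    for k in reversed(range(n)):
        v[k] = (aug[k][n] - sum(aug[k][j] * v[j] for j in range(k + 1, n))) / aug[k][k]
    return v

def revised_simplex(c, A, b, max_iter=100, tol=1e-9):
    """Minimize c^T x subject to A x + s = b, x >= 0, s >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    # Columns of [A I] and costs (c, 0): slack variables cost nothing.
    cols = [[float(A[r][j]) for r in range(m)] for j in range(n)]
    cols += [[1.0 if r == j else 0.0 for r in range(m)] for j in range(m)]
    cost = [float(v) for v in c] + [0.0] * m
    basis = list(range(n, n + m))          # start from the all-slack corner
    z_B = [float(v) for v in b]
    for _ in range(max_iter):
        # Multipliers: solve A_B^T lam = c_B (row k of A_B^T is column basis[k]).
        lam = solve_linear([cols[j] for j in basis], [cost[j] for j in basis])
        nonbasic = [j for j in range(n + m) if j not in basis]
        mu = {j: cost[j] - sum(l * a for l, a in zip(lam, cols[j])) for j in nonbasic}
        e = min(nonbasic, key=lambda j: mu[j])
        if mu[e] >= -tol:                  # optimality test: all reduced costs >= 0
            z = [0.0] * (n + m)
            for k, j in enumerate(basis):
                z[j] = z_B[k]
            return z[:n], z[n:]
        # Direction: solve A_B d = A_e.
        A_B = [[cols[j][r] for j in basis] for r in range(m)]
        d = solve_linear(A_B, cols[e])
        ratios = [(z_B[k] / d[k], k) for k in range(m) if d[k] > tol]
        if not ratios:
            raise ValueError("the linear program is unbounded")
        alpha, leave = min(ratios)         # ratio test picks the leaving position
        z_B = [z_B[k] - alpha * d[k] for k in range(m)]
        z_B[leave] = alpha                 # entering variable takes the step size
        basis[leave] = e
    raise RuntimeError("maximum number of iterations reached")

# Hypothetical LP: min -3 x1 - 2 x2  s.t.  x1 + x2 <= 4,  x1 + 3 x2 <= 6.
x_opt, s_opt = revised_simplex([-3.0, -2.0], [[1.0, 1.0], [1.0, 3.0]], [4.0, 6.0])
print(x_opt, s_opt)
```

On this example the method needs a single pivot: $x_1$ enters the basis, the first slack leaves, and the next optimality test succeeds at the corner $x = (4, 0)$.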

Wow! That seems intense. How efficient is the simplex algorithm? 
* In the __worst case__, the simplex method can take _exponential time_ in the number of variables—Klee and Minty’s 1972 example shows it may visit all $2^n$ vertices of an $n$-dimensional cube, forcing on the order of $2^n$ pivots. Thus, it has $O(2^n)$ worst-case complexity.
* However, __in practice__, the simplex method is often very efficient. It performs well on most real-world problems, and its average-case performance is polynomial time for many practical instances. The worst-case exponential bound is rarely encountered in practice, as most LPs have a structure that allows the simplex method to converge quickly.

Next, let's examine the second class of algorithms based on the KKT conditions: Interior Point methods.

___

## Interior-Point Methods
Interior-point methods solve a linear program by traversing the _interior_ of the feasible region—avoiding the corners until the very end—using a _barrier function_ to enforce the inequalities. They trace a path defined by modified KKT conditions (with logarithmic penalties) and converge to the optimum in a small, predictable number of steps.
* _Barrier function?_ An invisible wall that shoots to infinity as you approach any constraint boundary, repelling your iterates and keeping them safely in the interior while still guiding you toward the optimum. Gradually lowering its weight lets the solution creep closer to the boundary without crossing it.
* _Logarithmic penalties?_ An example barrier function. Logarithmic penalties are terms of the form $-\mu\ln(s)$ added to your objective. Because $-\mu\ln(s)\to+\infty$ as $s\to0^+$, they impose a _huge cost_ for getting too close to a constraint boundary. As you decrease the weight $\mu$, you gradually lessen the penalty's influence, letting the solution drift closer to the edge of the feasible region.
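
A one-dimensional toy problem (an assumption chosen for illustration) shows the effect of the weight $\mu$. Consider minimizing $x$ subject to $x\geq 1$, with slack $s = x - 1$. The barrier objective $x - \mu\ln(x-1)$ has derivative $1 - \mu/(x-1)$, so its minimizer is $x(\mu) = 1 + \mu$, which approaches the true optimum $x^{\star} = 1$ from the interior as $\mu\to 0$.

```python
import math

def barrier_objective(x, mu):
    """x - mu * ln(s) with slack s = x - 1; only defined in the interior x > 1."""
    return x - mu * math.log(x - 1.0)

for mu in (1.0, 0.1, 0.01, 0.001):
    x_mu = 1.0 + mu                        # stationary point of the barrier objective
    left = barrier_objective(1.0 + 0.5 * mu, mu)
    right = barrier_objective(1.0 + 1.5 * mu, mu)
    center = barrier_objective(x_mu, mu)
    # The barrier minimizer beats nearby interior points on both sides.
    print(mu, x_mu, center < left, center < right)
```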

__The big idea__: Interior-point methods navigate the feasible region's interior using a smooth barrier that dominates at edges, taking Newton-style steps along a central path to the optimum. Unlike the simplex method, which moves along the boundary and zig-zags across facets, these algorithms follow a direct, well-conditioned route through the interior, offering more predictable performance on large or ill-conditioned problems.


### Algorithm
Let's sketch out an interior point algorithm to solve an LP.

__Initialization__: Given a linear program of the form: $\min\left\{c^\top x\mid Ax+s = {b},\;x\ge0,\; s\ge0,\;b\ge0\right\}$ where $A\in\mathbb{R}^{m\times{n}}$. Specify an _initial strictly feasible guess_ $(x^{(0)}, s^{(0)}, \lambda^{(0)}, \nu^{(0)})$ with all components strictly positive (more on this later). Specify a tolerance $\epsilon>0$, a maximum number of iterations $T$, an iteration counter $t\gets{0}$, set $\texttt{converged}\gets\texttt{false}$ and choose a _reduction factor_ $\sigma\in(0,1)$. Finally, compute $\mu^{(0)}$:
$$
\begin{align*}
\mu^{(0)} \gets \frac{x^{(0)^{\top}}\nu^{(0)} + s^{(0)^{\top}}\lambda^{(0)}}{n+m}
\end{align*}
$$

While not $\texttt{converged}$ __do__:
1. Compute the four residuals ($r_{P}, r_{D}, r^{x}_{C}, r^{s}_{C}$):
     - _Primal residual_: $r_{P}\gets{Ax^{(t)} +s^{(t)} - b}$. The primal residual $r_{P}$ shows how much your guess $(x,s)$ violates constraints. Each entry of $r_P$ is the gap between your decision, slack variables, and the right-hand side $b$; zero indicates your solution is feasible.
     - _Dual residual_: $r_D \gets c + A^{\top}\lambda^{(t)} - \nu^{(t)}$. The dual residual $r_{D}$ indicates how much your dual variables $(\lambda,\nu)$ deviate from the stationarity condition of the Lagrangian. Here $\nu\geq 0$ is the multiplier on $x\geq 0$ and $\lambda\geq 0$ is the multiplier on $s\geq 0$ (the negative of the equality multiplier used in the KKT section above), so stationarity reads $c + A^{\top}\lambda - \nu = 0$. A nonzero $r_D$ shows you need to adjust the multipliers to move toward optimality.
     - _Complementarity residual_: $r^{x}_{C}\gets X^{(t)}\nu^{(t)} - \mu^{(t)}\mathbf{1}$, where $X = \text{diag}(x)$. The complementarity residual $r^{x}_{C}$ measures deviation from the ideal central product $x_j\nu_j = \mu$ for each coordinate. When $r_C^x=0$, each $x_j$ and its dual partner $\nu_j$ satisfy the perturbed complementary-slackness condition, placing you on the central path.
     - _Slack-complementarity residual_: $r^{s}_{C} \gets S^{(t)}\lambda^{(t)} - \mu^{(t)}\mathbf{1}$, where $S=\text{diag}(s)$. The slack-complementarity residual $r^{s}_{C}$ measures how far each slack variable $s_i$ and its dual multiplier $\lambda_i$ are from satisfying the central-path condition. When $r_C^s = 0$, each slack $s_{i}$ and multiplier $\lambda_{i}$ satisfy $s_{i}\lambda_{i} = \mu$.

2. Compute the Jacobian matrix $J$ of the stacked residuals with respect to $(x, s, \lambda, \nu)$:
$$
     J =
     \begin{pmatrix}
       A & I & 0 & 0\\
       0 & 0 & A^\top & -I\\
       \text{diag}(\nu^{(t)}) & 0 & 0 & \text{diag}(x^{(t)})\\
       0 & \text{diag}(\lambda^{(t)}) & \text{diag}(s^{(t)}) & 0
     \end{pmatrix}.
   $$
3. Take a _Newton step_: solve the linear system for the updates $\Delta x, \Delta s, \Delta \lambda, \Delta \nu$:
$$
     J\,
     \begin{pmatrix}
       \Delta x\\[3pt]\Delta s\\[3pt]\Delta\lambda\\[3pt]\Delta\nu
     \end{pmatrix}
     = -\,
     \begin{pmatrix}
       r_P\\r_D\\r_C^x\\r_C^s
     \end{pmatrix}.
   $$

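4. Choose the step size $\alpha$: Choose $\alpha\in(0,1]$ as large as possible while keeping every component strictly positive (in practice, a fixed fraction slightly below one of the largest step that reaches the boundary):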
   $$
     x^{(t)} + \alpha\,\Delta x > 0,\quad
     s^{(t)} + \alpha\,\Delta s > 0,\quad
     \lambda^{(t)} + \alpha\,\Delta\lambda > 0,\quad
     \nu^{(t)} + \alpha\,\Delta\nu > 0.
   $$
5. Update the system solution ($x,s,\lambda,\nu,\mu$):
   - Set $x^{(t+1)} \gets x^{(t)} + \alpha\,\Delta x$.
   - Set $s^{(t+1)} \gets s^{(t)} + \alpha\,\Delta s$.
   - Set $\lambda^{(t+1)} \gets \lambda^{(t)} + \alpha\,\Delta\lambda$.
   - Set $\nu^{(t+1)} \gets \nu^{(t)} + \alpha\,\Delta\nu$.
   - Set $\mu^{(t+1)} \gets \sigma\,\mu^{(t)}$.

6. Check for convergence. Update the iteration counter $t\gets{t+1}$.
   - If $\mu^{(t)} \leq \epsilon$ (or $\|F\| \leq \epsilon$), then $(x^{(t)},s^{(t)})$ approximates the true optimum. Set $\texttt{converged}\gets\texttt{true}$, return $(x^{(t)},s^{(t)})$. Here, $F$ is the stacked residual vector $(r_P, r_D, r_C^x, r_C^s)$.
   - If $t>T$, we've run out of iterations. Set $\texttt{converged}\gets\texttt{true}$ to exit the loop and return $(x^{(t)},s^{(t)})$, which is the solution we have so far (not a certified optimum).

**Relation to KKT**: At each $\mu>0$, the algorithm takes a Newton step toward the solution of the **perturbed** KKT system
$\;r_P=0,\;r_D=0,\;r_C^x=0,\;r_C^s=0$
which approaches the **exact** (unperturbed) KKT conditions as $\mu\to0$.
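
The loop above can be sketched in plain Python as follows. This is a bare-bones illustration under several assumptions: the starting point is simply all ones (strictly positive but not necessarily feasible), the step is a fixed fraction $0.9$ of the distance to the boundary, the function names are hypothetical, and nothing is tuned for robustness on larger problems. The dual residual is written as $c + A^{\top}\lambda - \nu$, the sign convention under which both $\lambda$ and $\nu$ are nonnegative at the optimum of this minimization form.

```python
def solve_linear(M, rhs):
    """Solve M v = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in M[i]] + [float(rhs[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= f * aug[k][col]
    v = [0.0] * n
    for k in reversed(range(n)):
        v[k] = (aug[k][n] - sum(aug[k][j] * v[j] for j in range(k + 1, n))) / aug[k][k]
    return v

def interior_point(c, A, b, sigma=0.5, eps=1e-8, max_iter=200):
    """Primal-dual sketch for min{c^T x : A x + s = b, x >= 0, s >= 0}."""
    m, n = len(A), len(c)
    x, s, lam, nu = [1.0] * n, [1.0] * m, [1.0] * m, [1.0] * n
    mu = (sum(xj * vj for xj, vj in zip(x, nu))
          + sum(si * li for si, li in zip(s, lam))) / (n + m)
    size = 2 * (n + m)
    for _ in range(max_iter):
        r_P = [sum(A[i][j] * x[j] for j in range(n)) + s[i] - b[i] for i in range(m)]
        r_D = [c[j] + sum(A[i][j] * lam[i] for i in range(m)) - nu[j] for j in range(n)]
        r_Cx = [x[j] * nu[j] - mu for j in range(n)]
        r_Cs = [s[i] * lam[i] - mu for i in range(m)]
        F = r_P + r_D + r_Cx + r_Cs
        if mu <= eps and max(abs(v) for v in F) <= eps:
            return x, s, lam, nu
        # Jacobian with unknowns ordered (dx, ds, dlam, dnu).
        J = [[0.0] * size for _ in range(size)]
        for i in range(m):
            for j in range(n):
                J[i][j] = A[i][j]                  # A dx
                J[m + j][n + m + i] = A[i][j]      # A^T dlam
            J[i][n + i] = 1.0                      # + ds
            J[m + 2 * n + i][n + i] = lam[i]       # diag(lam) ds
            J[m + 2 * n + i][n + m + i] = s[i]     # diag(s) dlam
        for j in range(n):
            J[m + j][n + 2 * m + j] = -1.0         # - dnu
            J[m + n + j][j] = nu[j]                # diag(nu) dx
            J[m + n + j][n + 2 * m + j] = x[j]     # diag(x) dnu
        step = solve_linear(J, [-v for v in F])
        current = x + s + lam + nu
        alpha = 1.0                                # damped step keeps everything > 0
        for value, delta in zip(current, step):
            if delta < 0:
                alpha = min(alpha, 0.9 * value / (-delta))
        current = [value + alpha * delta for value, delta in zip(current, step)]
        x, s = current[:n], current[n:n + m]
        lam, nu = current[n + m:n + 2 * m], current[n + 2 * m:]
        mu = sigma * mu
    raise RuntimeError("maximum number of iterations reached")

# Hypothetical one-variable LP: min -x  s.t.  x <= 1, x >= 0 (optimum at x = 1).
x_opt, s_opt, lam_opt, nu_opt = interior_point([-1.0], [[1.0]], [1.0])
print(x_opt, s_opt, lam_opt, nu_opt)
```

On this toy problem the iterates approach $x = 1$, $s = 0$ from the interior, with the multiplier on the tight constraint approaching $\lambda = 1$ and $\nu$ approaching zero, in agreement with complementary slackness.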

___