# Optimization - Duality

> LP Duality, Convex Duality, Weak and Strong Duality, Complementary Slackness, Farkas' Lemma, Separation Arguments and Theorems of the Alternative

- hide: false
- toc: true
- badges: true
- comments: true
- categories: ['Optimization','Applied Mathematics','Proofs']
- image: images/lp-duality.png

# Introduction

Every linear program, and every optimization problem in general, has a closely related problem called its dual which can be colloquially thought of as its evil twin. The primal and the dual represent two different perspectives on the same problem. 

In the most general case, if the primal is a minimization problem, its  dual is a maximization problem. In the case of constrained optimization, if the primal is minimization in $n$ variables and $m$ constraints, its dual is a maximization in $m$ variables and $n$ constraints. 

Furthermore, the objective value of any feasible solution of the dual problem is a lower bound on the objective value of every feasible solution of the primal and, in particular, on the primal optimal value. This property is called *Weak Duality*, and it's true by construction, since one of the ideas behind duality is to obtain, at the very least, a useful lower bound on the primal optimal. 

In the case of linear programs and all other problems which exhibit *Strong Duality*, the primal and the dual optimal values are strictly equal. That is, solving the dual guarantees that we've also solved the primal. Furthermore, since taking the dual of the dual gives back the primal, this relationship is true in the converse — if we've solved the primal then we've also solved its dual.

This is what makes *Duality Theory* useful in practice. To have a related, possibly easier optimization problem gives us a huge computational advantage. However, even if the dual does not turn out to be easier to solve and/or its optimal value disagrees with the primal optimal, by the various theoretical ties between the two (such as Weak Duality) we still stand to learn something about the structure of the primal problem.

In this post, we will examine the nature of the relationship between the primal and its dual, and we will list the possible primal-dual outcomes. In doing so, we will look at duality in linear programs (*LP Duality*), in convex optimization problems, and in optimization problems in general. 

# Deriving the Dual of a Constrained Problem

At first, we focus our attention on deriving the dual problem of a constrained optimization problem. For problems of this sort, duality comes from the [Lagrangian relaxation](https://en.wikipedia.org/wiki/Lagrangian_relaxation) which augments the constrained problem by turning it into an unconstrained problem that, nevertheless, respects the constraints of the original. So, in a sense, it is the constraints of a problem that give rise to its dual... 

However, as we shall see later, certain types of unconstrained problems also have duals which arise from the [Fenchel-Legendre Transform](https://en.wikipedia.org/wiki/Convex_conjugate). 

Take the most general form of constrained optimization problem with $m$ inequality and $p$ equality constraints and assume nothing, as of yet, about its convexity. To make the discussion interesting, assume the problem is non-trivial (i.e. its constraint set is non-empty and contains more than one feasible point), and also that it's bounded with a finite optimum $p^*$.

$
\begin{cases}
\min_x: f_0(x)
\\
s.t.: \begin{aligned} &f_i(x) \leq 0 \ \ i = 1, ...,m
\\ 
&h_i(x) = 0 \ \ i = 1, ... ,p
\end{aligned}
\end{cases}
$

The idea is to penalize infeasible choices of $x$ using functions of $x$ that express our *displeasure* for certain choices. 

At first we use an *infinitely hard* penalty by introducing indicator functions $\mathbb{1}_-$ and $\mathbb{1}_0$. 

These are defined as follows: 

$\mathbb{1}_-(u) = 
\begin{cases}
\begin{align} 
&0  &\textrm{if} \  &u \leq 0
\\
&\infty  &\textrm{if} \  &u > 0
\end{align}
\end{cases}$

$\mathbb{1}_0(u) = 
\begin{cases}
\begin{align} 
&0  &\textrm{if} \  &u = 0
\\
&\infty  &\textrm{if} \  &u \ne 0
\end{align}
\end{cases}$

Then the equivalent unconstrained problem is: 

$\min_x: \mathcal{J}(x)$ 

Where $\mathcal{J}(x) = f_0(x) + \sum_{i=1}^m \mathbb{1}_-(f_i(x)) + \sum_{i=1}^p \mathbb{1}_0(h_i(x))$

We can also express $\mathcal{J}$ as 

$\mathcal{J}(x) = \begin{cases}\begin{align} 
&f_0(x) \ \ \textrm{if $x$ is feasible}
\\
&\infty \ \ \textrm{otherwise}
\end{align}\end{cases}$

We can easily see that the two minimization problems are equivalent. If $x$ is chosen s.t $f_i(x) > 0$ and/or $h_i(x) \ne 0$ for any $i$, the minimization incurs an infinitely positive penalty. Therefore, such an $x$ will never be selected over an optimal solution $x^*$ which gives the finite optimum $p^*$. 

Since the two are equivalent, it suffices to optimize this unconstrained problem instead of the original.

Unconstrained problems can be locally optimized by simply finding their stationary points using a first-order necessary condition, and evaluating the objective at each stationary point. The first-order necessary condition concerns the directional derivative: it asserts that the latter should be zero at a local optimizer in all feasible directions. Since all points of an unconstrained problem are, effectively, interior points, this amounts to simply taking the gradient and setting it to zero. However, both $\mathbb{1}_-$ and $\mathbb{1}_0$ are discontinuous, and therefore also non-differentiable. To sidestep this difficulty, we use a linear relaxation instead of the infinitely hard penalty functions $\mathbb{1}_-$ and $\mathbb{1}_0$. 

## The Lagrangian, The Dual Variables, and The Dual Function

The Lagrangian linear relaxation, sometimes simply referred to as the *Lagrangian*, is: 

$\mathcal{L}(x,\lambda,\mu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \mu_i h_i(x)$

We call the $\lambda_i$'s the *Lagrange multipliers* corresponding to the inequality constraints, and the $\mu_i$'s the *Lagrange multipliers* corresponding to the equality constraints. The vectors $\lambda$ and $\mu$ are called the *Lagrange multiplier vectors* or, for reasons that will soon become apparent, the *dual variables*. 

Furthermore, for the same reasons we call $\lambda$ and $\mu$ the dual variables, the function $g(\lambda, \mu) = \min_x \mathcal{L}(x, \lambda, \mu)$ is called the *dual function*.

## Weak Duality - A Lagrangian Lower Bound on the Optimal

The first thing to note about the Lagrangian is that $\lambda \geq 0$ is necessary. This is because, in the event that $f_i(x) > 0$ for some $i$, the corresponding $\lambda_i$ must be non-negative in order to apply a positive penalty for violating the $i$-th constraint as needed. This extends to all the entries of $\lambda \geq 0$, where the inequality is meant to be read coordinate-wise. 

On the other hand, $\mu$ is free to assume any value since the equality constraints can be violated in either direction, and both events must be penalized.

The second thing to note about the Lagrangian is that, even though we do apply a positive penalty that's linear in the severity of the violation, this penalty is, nevertheless, not as severe as the infinite penalty we were applying before. Also, in the Lagrangian, we actually *reward* feasible choices of $x$ that have margin. That is, in the event that $f_i(x) < 0$ strictly for some $i$, the term $\lambda_if_i(x)$ is non-positive, which acts as a reward for the minimization problem. 

So, for every $x$ (and, in particular, for every feasible $x$), every $\lambda \geq 0$ and every $\mu$, $\mathcal{L}(x,\lambda,\mu) \leq f_0(x) + \sum_{i=1}^m \mathbb{1}_-(f_i(x)) + \sum_{i=1}^p \mathbb{1}_0(h_i(x))$.

This is also evident from the fact that, for any fixed $i$, $\lambda_i f_i(x)$ is a lower bound of $\mathbb{1}_-(f_i(x))$ and $\mu_i h_i(x)$ is, likewise, a lower bound of $\mathbb{1}_0(h_i(x))$. 

But, in particular this means 

$g(\lambda, \mu) = \min_x \mathcal{L}(x,\lambda,\mu) \leq \min_x \left\{ f_0(x) + \sum_{i=1}^m \mathbb{1}_-(f_i(x)) + \sum_{i=1}^p \mathbb{1}_0(h_i(x)) \right\}$

But recall that the expression on the right-hand side is the optimal value of the original problem. So the dual function value for any $\mu$ and $\lambda \geq 0$ is, as desired, a lower bound on the primal optimal value. 

That is $g(\lambda, \mu) \leq p^*$, which establishes Weak Duality and explains why we chose to call $g$ the *dual function*.
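
The lower-bound argument above can be checked numerically on a toy problem. The sketch below is only an illustration: the problem (minimize $x^2$ subject to $1 - x \leq 0$, with $p^* = 1$), the grids, and all function names are assumptions introduced here, not part of the derivation.

```python
import math

# Toy problem (an assumption for illustration): minimize x^2 subject to 1 - x <= 0.
# Its optimum is p* = 1, attained at x = 1.
def f0(x):
    return x * x

def f1(x):
    return 1.0 - x

def hard_penalty(x):
    """J(x): the objective if x is feasible, +infinity otherwise."""
    return f0(x) if f1(x) <= 0 else math.inf

def lagrangian(x, lam):
    """L(x, lambda) = f0(x) + lambda * f1(x)."""
    return f0(x) + lam * f1(x)

def dual_function(lam):
    """g(lambda) = min_x L(x, lambda); for this toy problem the minimizer is x = lambda / 2."""
    return lagrangian(lam / 2.0, lam)

p_star = 1.0
xs = [i / 10.0 for i in range(-30, 31)]   # sample points in [-3, 3]
lams = [i / 4.0 for i in range(0, 21)]    # sample multipliers in [0, 5]

# The linear penalty never exceeds the hard penalty when lambda >= 0 ...
lower_bound_holds = all(lagrangian(x, lam) <= hard_penalty(x) for x in xs for lam in lams)
# ... so the dual function never exceeds the primal optimum (tolerance for rounding).
weak_duality_holds = all(dual_function(lam) <= p_star + 1e-12 for lam in lams)
```

Both flags come out true on these grids, which is consistent with Weak Duality but, being a finite check, is of course not a proof of it.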

## The Lagrange Dual Problem

In our quest to find the primal optimal $p^*$ it's natural, at this point, to ask what the tightest lower bound $d^*$ on $p^*$ is. This amounts to finding the values $\lambda^*$, and $\mu^*$ which maximize $g(\lambda, \mu)$, giving rise to the *dual problem*.

$
\begin{cases}
\max_{\lambda, \mu}: g(\lambda, \mu)
\\
s.t.: \lambda \geq  0
\end{cases}
$

Now we see why the Lagrange multipliers are also referred to as the *dual variables*.

Furthermore, we know $g(\lambda, \mu)$ to be $\min_{x} \mathcal{L}(x, \lambda, \mu)$. So, the dual optimal $d^*$ can be expressed as: 

$$d^* = \max_{\lambda \geq 0, \mu} \left\{ \min_x \mathcal{L} (x, \lambda, \mu) \right\}$$ 
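
As a hedged worked example, take a toy problem that is not part of the derivation above: minimize $x^2$ subject to $1 - x \leq 0$, whose primal optimum is $p^* = 1$ at $x = 1$. With a single multiplier $\lambda \geq 0$, the Lagrangian and the dual function work out to

$$\mathcal{L}(x, \lambda) = x^2 + \lambda(1 - x), \qquad g(\lambda) = \min_x \mathcal{L}(x, \lambda) = \lambda - \frac{\lambda^2}{4}$$

where the inner minimum is attained at $x = \lambda / 2$. Maximizing over $\lambda \geq 0$ gives

$$d^* = \max_{\lambda \geq 0} \left\{ \lambda - \frac{\lambda^2}{4} \right\} = 1 \quad \textrm{at} \quad \lambda^* = 2$$

so, for this particular problem, the tightest lower bound coincides with $p^*$.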

# Weak Duality - The Max-Min Characterization

As we saw above, the dual optimal can be expressed as

$d^* = \max_{\lambda \geq 0, \mu} \left\{ \min_x \mathcal{L} (x, \lambda, \mu) \right\} \tag{1}$

We will now see that the primal optimal can be similarly expressed as 

$p^* = \min_x \left\{ \max_{\lambda \geq 0, \mu} \mathcal{L} (x, \lambda, \mu) \right\} \tag{2}$

To see this, note that maximizing the Lagrangian over $\lambda \geq 0$ and $\mu$ recovers $\mathcal{J}(x)$ which, as we recall, was the associated unconstrained problem with an infinitely hard penalty. 

That is $\max_{\lambda \geq 0, \mu} \mathcal{L}(x, \lambda, \mu) = \mathcal{J}(x)$.

Consider, for a fixed $x$, two possibilities. If all inequality constraints are respected, that is $f_i(x) \leq 0$ $\forall i$, then, in our attempt to maximize $\mathcal{L}$, the best we can do is set $\lambda_i = 0$ $\forall i$ which results in the optimal value $f_0(x)$. In the case when *any* inequality constraint is violated, that is $f_i(x) > 0$ for some $i$, the result of maximizing $\mathcal{L}$ is $\infty$ by choosing $\lambda_i \rightarrow \infty$ and $\lambda_j = 0$ $\forall j \ne i$. 

Using similar logic, if all equality constraints are respected, then $h_i(x) = 0$ $\forall i$. In this case, any choice of $\mu$ results in the optimal value of $f_0(x)$. If, on the other hand, some equality constraint is violated then $h_i(x) \ne 0$ for some $i$. In this case, by choosing $\mu_i \rightarrow \pm \infty$, depending on the direction of violation, the result can be made $\infty$.

In summary

$\max_{\lambda \geq 0, \mu} \mathcal{L} (x, \lambda, \mu) = \mathcal{J}(x) = \begin{cases}\begin{align} 
&f_0(x) \ \ \textrm{if $x$ is feasible}
\\
&\infty \ \ \textrm{otherwise}
\end{align}\end{cases}$

But $\min_x: \mathcal{J}(x)$ is equivalent to the original, constrained problem, so we have that $p^* = \min_x \mathcal{J}(x) = \min_x \left\{ \max_{\lambda \geq 0, \mu} \mathcal{L}(x, \lambda, \mu) \right\}$.

So, we can also express Weak Duality in the following, symmetric form
$$\max_{\lambda \geq 0, \mu} \left\{ \min_x \mathcal{L} (x, \lambda, \mu) \right\} \leq \min_x \left\{ \max_{\lambda \geq 0, \mu} \mathcal{L} (x, \lambda, \mu) \right\}$$

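Weak Duality can also be derived from the fact that the max-min inequality holds for any function of two arguments, as proved in the Max-Min Theorem section below.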

### The Saddle-Point Interpretation


### Game-Theoretic Interpretation

# Weak Duality

To show that a feasible and bounded primal and its feasible and bounded dual agree on the optimal value, we start small by first showing that the dual's optimal value is upper bounded by that of the primal. We do that by proving the theorem of *Weak Duality*.

Henceforth, without loss of generality, we will assume the primal is a minimization problem and, consequently, the dual is a maximization problem. 

That is, the primal is an LP of the form

$
\begin{cases}
\min_x: c^Tx
\\
s.t.: \begin{aligned} &Ax \geq b
\\ 
&x \geq 0
\end{aligned}
\end{cases}
$

which means the dual is of the form

$
\begin{cases}
\max_p: b^Tp
\\
s.t.: \begin{aligned} &A^Tp \leq c
\\ 
&p \geq 0
\end{aligned}
\end{cases}
$

> **Weak Duality:** &nbsp; For any primal feasible $x$ and for all dual feasible $p$, $c^Tx \geq b^Tp$.
<br>

That is, any dual feasible solution $b^Tp$ is a *lower bound* for all primal feasible solutions $c^Tx$. Conversely, any primal feasible solution $c^Tx$ is an *upper bound* for all dual feasible solutions $b^Tp$. 

##  Proof of Weak Duality

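Let $(p, x)$ be respectively dual-primal feasible. Then $c^Tx = x^Tc \geq x^TA^Tp \geq b^Tp$. The first inequality holds because $x \geq 0$ and $A^Tp \leq c$; the second because $p \geq 0$ and $Ax \geq b$, so that $x^TA^Tp = p^T(Ax) \geq p^Tb$.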
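
To make the inequality concrete, here is a minimal sketch that samples feasible points of a small primal-dual pair and compares their objective values. The data `A`, `b`, `c` and the sampling grid are assumptions chosen for illustration; exact rational arithmetic is used so that feasibility checks are not affected by rounding.

```python
from fractions import Fraction as F
from itertools import product

# Illustrative data (an assumption, not from the text):
# primal: min c^T x  s.t. A x >= b, x >= 0
# dual:   max b^T p  s.t. A^T p <= c, p >= 0
A = [[F(1), F(2)],
     [F(3), F(1)]]
b = [F(4), F(6)]
c = [F(2), F(3)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def transpose(M):
    return [list(col) for col in zip(*M)]

def primal_feasible(x):
    return all(xj >= 0 for xj in x) and all(dot(row, x) >= bi for row, bi in zip(A, b))

def dual_feasible(p):
    return all(pi >= 0 for pi in p) and all(dot(col, p) <= cj for col, cj in zip(transpose(A), c))

# Coarse grid of candidate points with step 1/2 on [0, 6]^2.
grid = [F(k, 2) for k in range(13)]
primal_points = [x for x in product(grid, repeat=2) if primal_feasible(x)]
dual_points = [p for p in product(grid, repeat=2) if dual_feasible(p)]

# Weak duality: every sampled primal value dominates every sampled dual value.
weak_duality_holds = all(dot(c, x) >= dot(b, p) for x in primal_points for p in dual_points)
smallest_primal_value = min(dot(c, x) for x in primal_points)
largest_dual_value = max(dot(b, p) for p in dual_points)
```

On any such sample, `smallest_primal_value` can never fall below `largest_dual_value`; the grid only approximates the feasible sets, so the two numbers need not meet.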

##  Max-Min Theorem

In the case of primal-dual optimal points $(x^*, p^*)$, Weak Duality states $c^Tx^* \geq b^Tp^*$. This is simply a restatement of the [max-min inequality](https://en.wikipedia.org/wiki/Max%E2%80%93min_inequality) within the context of LP's. 

As we may recall, the max-min inequality makes no assumptions about the function. It's simply true for all functions of the form $f: X \times Y \rightarrow \mathbb{R}$, and it states that:
$$
\inf_{y\in Y} \left\{ \sup_{x\in X} f(x,y) \right\} \geq \sup_{x\in X} \left\{ \inf_{y\in Y} f(x,y) \right\}
$$
Since no assumption on $f$ is made, the max-min inequality certainly also applies to the special case of linear objective functions of LP's. And since we're in the special case where the optima are assumed to exist, the functions attain their optima. That is, we can replace $\sup$ and $\inf$ in the max-min inequality with $\max$ and $\min$.

## Proof of Max-Min Theorem

For any $f$, and $x \in X$, $y \in Y$ we have:
$$f(x,y) \geq \min_x f(x,y)$$
The right hand side is now only a function of $y$, so maximizing both sides w.r.t. $y$ yields: 
$$ \max_y f(x,y) \geq \max_y \left\{ \min_x f(x,y) \right\}$$
The right hand side is now a constant, so minimizing both sides w.r.t. $x$ results in the desired conclusion.
$$\min_x \left\{ \max_y f(x,y) \right\} \geq \max_y \left\{ \min_x f(x,y) \right\}$$

## Game-Theoretic Intuition of Max-Min Theorem

An intuitive way to see the validity of the max-min theorem comes from [game theory](https://en.wikipedia.org/wiki/Game_theory).

Suppose two players, $A$ and $B$, are playing a game in which player $A$'s goal is to minimize the score $s$ whereas player $B$'s goal is to maximize it. Suppose, per the rules of the game, player $A$ has the first turn. Then player $A$'s choice is final, and player $B$ gets to respond with full knowledge of it. So player $B$ has an advantage in this game. 

However, if a second game is played such that player $B$ must go first, then the advantage lies with player $A$. 


Formally, suppose the game is described by $f(x,y)$ where $x \in X$ and $y \in Y$ represent player $A$'s and player $B$'s choices respectively. 

In the first game player $A$ is restricted to choosing $x$ that will minimize $s_1(x) = \max_y f(x,y)$. Consequently, the payoff in the first game will be $s_1 = \min_x \left\{ \max_y f(x,y) \right\}$. 

Similarly, in the second game the payoff will be $s_2 = \max_y \left\{ \min_x f(x,y) \right\}$.

Since player $B$, whose goal is to maximize the score, has an advantage in the first game but not in the second, $s_1 \geq s_2$ which concludes the game-theoretic proof of the max-min inequality.
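
The two games can be played out on a small finite table. The payoff matrix below is a hypothetical example, assumed only for illustration, with rows as player $A$'s choices $x$ and columns as player $B$'s choices $y$.

```python
# Hypothetical payoff table f(x, y): rows are player A's choices x,
# columns are player B's choices y. A minimizes the score, B maximizes it.
payoff = [
    [1, 4],
    [3, 2],
]

rows = range(len(payoff))
cols = range(len(payoff[0]))

# Game 1: A commits first, then B replies with the best column for that row.
s1 = min(max(payoff[x][y] for y in cols) for x in rows)

# Game 2: B commits first, then A replies with the best row for that column.
s2 = max(min(payoff[x][y] for x in rows) for y in cols)

print(s1, s2)  # prints "3 2"
```

Here the gap is strict ($3 > 2$): this table has no saddle point, so the order of play genuinely matters.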

# Strong Duality

While Weak Duality is a useful result, the real strength of duality theory lies in *Strong Duality*. Strong duality is a re-statement of von Neumann's [Minimax Theorem](https://en.wikipedia.org/wiki/Minimax_theorem) which lays out the conditions under which the max-min inequality holds with equality. Roughly speaking, it holds for functions that are saddle-shaped: convex in one variable and concave in the other.

Instead of proving the Minimax Theorem in the general case, we will stay topical and prove Strong Duality for LP's. That is, the Minimax Theorem as it pertains to the special case of linear programs... 

> **Strong Duality:** &nbsp; If the primal is feasible and bounded with optimal $x^*$ then the dual is also feasible and bounded. Furthermore, if the dual has optimum $p^*$ then $c^Tx^* = b^Tp^*$.
<br>

To prove Strong Duality, we require *Farkas' Lemma*.

## Farkas' Lemma

*Farkas' Lemma* belongs to the class of theorems called *Theorems of the Alternative*. These are theorems stating that exactly one of two statements holds true.

The lemma simply states that a given vector $c$ is either a [conic combination](https://v-poghosyan.github.io/blog/optimization/applied%20mathematics/proofs/2022/01/23/Optimization-Review-of-Linear-Algebra-and-Geometry.html#Conic-Combinations-of-$n$-Points) of the vectors $a_i$, $i \in I$, or it's separated from their cone by some hyperplane. 

We state Farkas' Lemma without offering proof since it has such an obvious geometric interpretation.

> **Farkas' Lemma:** &nbsp; For any vector $c$ and $a_i \ \ (i \in I)$ exactly one of the following two statements holds:  
&nbsp;
> * $\exists p \geq 0$ s.t. $c = \sum_{i \in I} a_ip_i$
> * $\exists$ vector $d$ s.t. $d^Ta_i \geq 0 \ \ \forall i \in I$ but $d^Tc < 0$
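
The two alternatives can be illustrated with certificates that are easy to verify by hand. In the sketch below, the vectors and the helper names `is_conic_certificate` and `is_separating_certificate` are assumptions introduced for illustration; the code only *checks* a proposed $p$ or $d$, it does not search for one.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def is_conic_certificate(c, vectors, p):
    """First alternative: p >= 0 and c = sum_i p_i * a_i."""
    combo = [sum(pi * a[k] for pi, a in zip(p, vectors)) for k in range(len(c))]
    return all(pi >= 0 for pi in p) and combo == list(c)

def is_separating_certificate(c, vectors, d):
    """Second alternative: d^T a_i >= 0 for all i, but d^T c < 0."""
    return all(dot(d, a) >= 0 for a in vectors) and dot(d, c) < 0

# Illustrative vectors (assumed for this sketch): their cone is the first quadrant.
a_vectors = [(1, 0), (0, 1)]

inside = (2, 3)    # equals 2*a_1 + 3*a_2
outside = (-1, 2)  # has a negative first coordinate

inside_ok = is_conic_certificate(inside, a_vectors, p=(2, 3))
outside_ok = is_separating_certificate(outside, a_vectors, d=(1, 0))
```

For `inside` the first statement is certified by $p = (2, 3)$, while for `outside` the direction $d = (1, 0)$ certifies the second.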

## Proof of Strong Duality in LP's

The proof is by construction. 

Suppose $x^*$ is a primal optimal solution. Let the set $I_{x^*} = \{ i : a_i^Tx^* = b_i\}$ be the set of the indices of the active constraints at $x^*$. Our goal is to construct a dual optimal solution $p^*$ s.t. $c^Tx^* = b^Tp^*$. 

Let $d$ be any vector that satisfies $d^Ta_i \geq 0 \ \ \forall i \in I_{x^*}$. That is, $d$ is a feasible direction w.r.t. all the active constraints.

A small, positive $\epsilon$-step in the direction of $d$ results in point $x^* + \epsilon d$ that's still feasible. The fact that the step is small is what guarantees no inactive constraints are violated.

Let's compare the value of the objective at $x^* + \epsilon d$ to the value of the objective at $x^*$.

By the assumption that $x^*$ is optimal, we have $c^Tx^* \leq c^T(x^* + \epsilon d) = c^Tx^* + \epsilon c^Td$. Thus, since $\epsilon > 0$, $c^Td = d^Tc \geq 0$.

> Note: $d^Tc$ is nothing but the *directional derivative* at the minimizer $x^*$. It is a *first-order necessary-condition* that the *directional derivative* in any feasible direction $d$ be non-negative at any minimizer $x^*$. This is analogous to the first-derivative test for scalar-valued functions. So, this result should have been expected...
<br>

But since $d$ is a vector s.t. $d^Ta_i \geq 0 \ \ \forall i \in I_{x^*}$ and $d^Tc \geq 0$, then $d$ does *not* separate $c$ from the cone of the $a_i$'s. And since $d$ was arbitrary, this puts us in the setting of Farkas' Lemma. Namely, there exist *no* vectors $d$ that separate $c$ from the cone. This means the second statement in Farkas' Lemma fails, so the first must be true: $c$ must be a conic combination of the $a_i$'s that are active at the minimizer. In other words, $\exists p \geq 0$ s.t. $c = \sum_{i \in I_{x^*}} p_ia_i$. 

> Note: $c = \sum_{i \in I_{x^*}} p_ia_i$ should remind us of the Lagrange optimality condition in the general case of convex optimization. Recall that the Lagrange condition states that a point $x^*$ is optimal for a convex problem with objective $f(x)$ and constraints $g_i(x)=0$ if and only if $\exists  \lambda_i$ for each active constraint s.t. $\nabla f(x^*) = \sum_i \lambda_i \nabla g_i(x^*)$. In fact, Farkas' Lemma is what underpins the Lagrange condition through the assumption that the non-linear objective $f$ and non-linear constraints $g_i$ behave linearly in a small neighborhood of $x^*$. 
<br>

But $p$ has dimension equal to only the number of active constraints at $x^*$. To be a dual variable at all, it must have dimension equal to the number of all primal constraints. We extend $p$ to $p^*$ by setting all the entries that do not correspond to the active constraints at $x^*$ to be zero. 

That is $p^*_i = \begin{cases} p_i \ \ \textrm{if} \ \  i \in I_{x^*} \\ 0   \ \ \textrm{if} \ \  i \notin I_{x^*} \end{cases}$. 

Now $A^Tp^*  = \sum_{i} p^*_ia_i = c$, so any feasibility condition in the dual, whether it be $A^Tp \leq c$, $A^Tp \geq c$, or $A^Tp = c$, is satisfied by $p^*$. 

Furthermore, the dual objective at $p^*$ agrees with the primal objective at $x^*$.

$$b^Tp^* = \sum_{i} b_ip_i^* = \sum_{i \in I_{x^*}} b_ip_i^* + \sum_{i \notin I_{x^*}} b_ip_i^* = \sum_{i \in I_{x^*}} a_i^Tx^*p_i^* = (\sum_{i \in I_{x^*}} p_ia_i^T)x^* = c^Tx^* $$

However, it still remains to be shown that $p^*$ is dual optimal. 

Whenever the primal objective and the dual objective agree on a value, the respective solutions must be primal-dual optimal. This is simply true by Weak Duality, which states that $b^Tp \leq c^Tx^*$ for every dual feasible $p$. So, $c^Tx^*$ is an upper bound on the value of any dual feasible solution. But the dual is a maximization problem and $p^*$ attains this bound, $b^Tp^* = c^Tx^*$, so $p^*$ must be dual optimal.
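
The construction in the proof can be traced on a small instance. The LP data, the candidate optimum `x_star`, and the multipliers `p_star` below are assumptions picked for illustration (they were worked out by hand, not produced by a solver); the sketch only verifies that they fit together the way the proof says they should.

```python
from fractions import Fraction as F

# Illustrative LP (assumed data): min c^T x  s.t.  a_i^T x >= b_i for each row a_i of A.
A = [[F(1), F(2)],
     [F(3), F(1)]]
b = [F(4), F(6)]
c = [F(2), F(3)]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Candidate primal optimum: both constraints hold with equality here.
x_star = [F(8, 5), F(6, 5)]
active = [i for i, row in enumerate(A) if dot(row, x_star) == b[i]]

# Multipliers expressing c as a conic combination of the active rows;
# entries for inactive rows (none in this instance) would be zero.
p_star = [F(7, 5), F(1, 5)]
combo = [sum(p_star[i] * A[i][j] for i in active) for j in range(len(c))]

c_is_conic_combination = all(pi >= 0 for pi in p_star) and combo == c
primal_value = dot(c, x_star)
dual_value = dot(b, p_star)
```

Both objective values equal $34/5$, so by the Weak Duality argument above the pair is primal-dual optimal for this instance.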

# Theorems of the Alternative

As mentioned earlier, these are theorems that describe mutually exclusive scenarios that together comprise the entire outcome space. Formally, these are theorems of the form $(A \implies \neg B) \land (\neg A \implies B)$, where $A$ and $B$ are logical statements.

Note that theorems of equivalence (i.e. theorems of the form *'the following are equivalent - TFAE'*) can also be formulated as theorems of the alternative. To say that $A$ and $B$ are equivalent means $A \iff B$. But this breaks down as $(A \implies B) \land (B \implies A)$. Letting $\hat B = \neg B$ we can rewrite the above as $(A \implies \neg \hat B) \land (B \implies A)$. But, by taking the contrapositive, $B \implies A$ becomes $\neg A \implies \neg B$, which is to say $\neg A \implies \hat B$. In summary, we have shown that $A \iff B$ is equivalent to $(A \implies \neg \hat B) \land (\neg A \implies \hat B)$.
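
Since only two propositional variables are involved, the rewriting above can be sanity-checked by brute force over the truth table. The helper `implies` and the variable names are introduced here for illustration only.

```python
from itertools import product

def implies(a, b):
    """Material implication: a => b."""
    return (not a) or b

# Check, over all truth assignments, that A <=> B matches the
# "alternative" form (A => not B_hat) and (not A => B_hat) with B_hat = not B.
rows = []
for A, B in product([False, True], repeat=2):
    B_hat = not B
    equivalence = (A == B)
    alternative = implies(A, not B_hat) and implies(not A, B_hat)
    rows.append((A, B, equivalence, alternative))

forms_agree = all(eq == alt for _, _, eq, alt in rows)
```

The two columns agree on all four assignments, which is exactly the claimed equivalence.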

So, the class of theorems of the alternative is much broader than it appears and includes theorems of equivalence.

## Example of a Theorem of the Alternative

To see how we can prove a theorem of the alternative, it helps to state one. 

> **Theorem:** &nbsp; Exactly one of the following two statements must hold for a given matrix $A$.
&nbsp;
> 1. $\exists x \ne 0$ s.t. $Ax = 0$ and $x \geq 0$
> 2. $\exists p$ s.t. $p^TA > 0$
<br>

### Using a Separation Argument

At the heart of separation arguments lies this simple fact. 

> **Separating Hyperplane Theorem:** For any convex set $C$, if a point $\omega \notin C$ then there exists a hyperplane separating $\omega$ and $C$.
<br>

Farkas' Lemma, for instance, is proved by a separation argument that uses, as its convex set, the cone generated by the $a_i$'s (the set of all their conic combinations). The conclusion is immediate since in Farkas' Lemma the first statement plainly says that a vector belongs to the convex set, and the second statement plainly says there exists a separating hyperplane between the two. 

This is the pattern all separation arguments must follow. However, in general, it may take a bit of work to define the problem-specific convex set and also to show that the two statements are *really* talking about belonging to this set, and separation from it. However, once these three things are accomplished the proof is complete. 

Using this idea, let's give a proof of the above theorem of the alternative using a separation argument.

#### Proof

First order of business is to come up with a convex set. 

Let's take $C = \{ z : z = Ay, \sum_i y_i = 1, y \geq 0 \}$ to be the convex hull of the columns of $A$.

The first statement in the theorem was that $\exists x \ne 0$ s.t. $Ax = 0$ and $x \geq 0$.

Since $x \ne 0$ and $x \geq 0$ we can scale $x$ to $y = \alpha x$, with $\alpha = 1 / \sum_i x_i > 0$, so that $\sum_i y_i = 1$ while still $Ay = 0$.

So, the first statement is equivalent to saying the origin belongs to the convex hull $C$ (i.e. $0 \in C$).

The second statement was that $\exists p$ s.t. $p^TA > 0$. This is equivalent to saying that all the columns of $A$ lie to one side of the separating hyperplane introduced by $p$.

But all $z \in C$ are convex combinations of $A$'s columns. In particular since they're a convex combination they're also a conic combination, so all $z \in C$ also lie on the same side of the hyperplane. That is $p^Tz > 0 \ \ \forall z \in C$. 

But, of course, $p^T0 = 0$ (not $> 0$). So, according to the second statement, the origin is separated from $C$. 

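This shows the two statements are mutually exclusive. Conversely, if the first statement fails then $0 \notin C$, and the Separating Hyperplane Theorem, applied to the closed and bounded convex set $C$, supplies a $p$ with $p^Tz > 0 \ \ \forall z \in C$. Since every column of $A$ belongs to $C$, this is the second statement. So exactly one of the two statements holds, which concludes the proof. 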

### Using Strong Duality

Strong duality isn't just a tool for applied science, it has important theoretical uses. For instance, now that we've proven it we can use Strong Duality, instead of a separation argument, to prove theorems of the alternative. 

Since it gives us feasibility of two different constraint sets, it makes sense to use duality to prove theorems of existence. 

Let's take the aforementioned theorem of the alternative for example...

#### Proof

To prove the theorem we need to show two things. First, we need to show $1 \implies \neg 2$, then we need to show $\neg 1 \implies 2$.

The $1 \implies \neg 2$ direction is simple. 

Suppose $\exists x \ne 0$ s.t. $Ax = 0$ and $x \geq 0$. 

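Then $\forall p$, $(p^TA)x = p^T(Ax) = p^T0 = 0$. But if some $p$ satisfied $p^TA > 0$, then $x \geq 0$ and $x \ne 0$ would force $(p^TA)x > 0$, a contradiction. So no such $p$ exists.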

We tackle the $\neg 1 \implies 2$ direction using duality.

The strategy is to construct a linear program based on $\neg 1$ such that the feasibility of its dual implies $2$.

We can express $\neg 1$ as '$\forall x \ne 0$, either $Ax \ne 0$ or $x \not\geq 0$', where $x \not\geq 0$ means that at least one entry of $x$ is negative. Equivalently, '$x \ne 0 \implies Ax \ne 0$ or $x \not\geq 0$.' Taking the contrapositive, $\neg 1$ becomes '$Ax = 0$ and  $x \geq 0 \implies x = 0$.' 

So, let's form the LP 

$
\begin{cases}
\max_x: \textbf{1}^Tx
\\
s.t.: \begin{aligned} &Ax = 0
\\ 
&x \geq 0
\end{aligned}
\end{cases}
$

Note that $x = 0$ is a feasible solution to the LP. Furthermore, assuming $\neg 1$ guarantees that $x = 0$ is the only feasible solution. Thus, the LP is feasible and bounded, with optimal value $0$. 

By Strong Duality, its dual must also be feasible and bounded. 

The dual is...

$
\begin{cases}
\min_p: \textbf{0}^Tp
\\
s.t.: p^TA \geq \textbf{1}
\end{cases}
$

... and since it's feasible, $\exists p$ s.t. $p^TA \geq \textbf{1} > 0$, which shows the truth of statement $2$.  
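
To see the two alternatives side by side, the sketch below checks hand-picked certificates for two small matrices. The matrices `A1`, `A2` and the helper names are assumptions made for illustration; the code verifies a proposed $x$ or $p$ and does not solve either LP.

```python
def mat_vec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def vec_mat(p, A):
    return [sum(p[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]

def certifies_statement_1(A, x):
    """x != 0, x >= 0 and A x = 0."""
    nonzero = any(xj != 0 for xj in x)
    nonnegative = all(xj >= 0 for xj in x)
    in_null_space = all(v == 0 for v in mat_vec(A, x))
    return nonzero and nonnegative and in_null_space

def certifies_statement_2(A, p):
    """Every entry of p^T A is strictly positive."""
    return all(v > 0 for v in vec_mat(p, A))

# Two illustrative matrices (assumed for this sketch).
A1 = [[1, -1],
      [0,  0]]   # columns point in opposite directions: statement 1 holds
A2 = [[1, 2],
      [0, 1]]    # columns lie in a common open half-plane: statement 2 holds

s1_for_A1 = certifies_statement_1(A1, x=[1, 1])
s2_for_A2 = certifies_statement_2(A2, p=[1, 1])
```

For `A1` no $p$ can work, since $p^TA_1 = (p_1, -p_1)$ always has a non-positive entry, in line with the mutual exclusivity of the two statements.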

# Complementary Slackness

*Complementary Slackness* is a fundamental property that exists between any primal optimal solution and any dual optimal solution. 

In the preceding section on Strong Duality we constructed a dual optimal by setting those of its variables that corresponded to the inactive constraints of the primal optimal to be zero. 

This is true in general, for all primal-dual optimal pairs. 

If a primal constraint is loose at some primal optimal, then the corresponding variable in the dual optimal is zero, and vice versa. 

Formally, this can be stated as

> **Complementary Slackness:** if $x$ is primal feasible and $p$ is dual feasible, then $x$ and $p$ are respectively optimal iff:
&nbsp;
> 1. $(b_i - \sum_{j} a_{ij}x_j)p_i = 0 \ \ \forall i$
> 2. $(\sum_{i} a_{ij}p_i - c_j)x_j = 0  \ \ \forall j$
<br>

If we recall, in the proof of Strong Duality we constructed a dual optimal by setting those of its variables that corresponded to the primal's slack constraints to be zero. In other words, we constructed a dual optimal in such a way as to satisfy the Complementary Slackness theorem. So, the fact that this generalizes to all primal-dual optima shouldn't surprise us. 

However, the above does not constitute a proof of Complementary Slackness, so let's offer one.

Take as a starting point the primal-dual pair

$
\textrm{P} \ \ 
\begin{cases}
\min_x: c^Tx
\\
s.t.: \begin{aligned} &Ax \geq b
\\ 
&x \geq 0
\end{aligned}
\end{cases}
$

$
\textrm{D} \ \ 
\begin{cases}
\max_p: b^Tp
\\
s.t.: \begin{aligned} &A^Tp \leq c
\\ 
&p \geq 0
\end{aligned}
\end{cases}
$

## Proof of Complementary Slackness

**Sufficiency $\impliedby$:**

Suppose both equalities hold.

Summing each over all $i$'s and $j$'s respectively and adding the results we get

$$\sum_i \left(b_i - \sum_j a_{ij}x_j \right)p_i + \sum_j \left( \sum_i a_{ij}p_i - c_j \right)x_j = 0$$

Which simplifies to 

$$\sum_i b_ip_i - \sum_i \sum_j a_{ij}x_jp_i + \sum_j \sum_i a_{ij}p_ix_j - \sum_j c_jx_j = 0$$

Or, in matrix-vector form

$$b^Tp - p^TAx + p^TAx - c^Tx = 0$$

The middle two terms cancel, and we get $b^Tp = c^Tx$. 

By Weak Duality, $x$ and $p$ are primal-dual optimal.

**Necessity $\implies$:**

Suppose $x$ and $p$ are primal-dual optimal. 

By Strong Duality $b^Tp = c^Tx$. 

In other words, $b^Tp - c^Tx = 0$. Adding and subtracting the terms canceled in the first part, we can bring the sum to the form

$$b^Tp - p^TAx + p^TAx - c^Tx = 0$$

Which is, once again, the same as

$$\sum_i \left(b_i - \sum_j a_{ij}x_j \right)p_i + \sum_j \left( \sum_i a_{ij}p_i - c_j \right)x_j = 0$$

But $p$ is dual feasible, so $p_i \geq 0 \ \ \forall i$. And since $x$ is primal feasible, $Ax \geq b$ implies $(b_i - \sum_j a_{ij}x_j) \leq 0 \ \ \forall i$. 

Similarly, $x_j \geq 0 \ \ \forall j$ and, since $A^Tp \leq c$, $( \sum_i a_{ij}p_i - c_j) \leq 0 \ \ \forall j$. 

So the above expression is a sum of all non-positive terms that adds up to zero. This can only happen if each term is equal to zero. 
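
As a small numerical companion to the proof, the sketch below evaluates the products from conditions 1 and 2 for an optimal pair and for a feasible but suboptimal pair of the same LP. The data and both pairs are assumptions chosen for illustration and were worked out by hand.

```python
from fractions import Fraction as F

# Illustrative primal-dual pair (assumed data):
# P: min c^T x  s.t. A x >= b, x >= 0      D: max b^T p  s.t. A^T p <= c, p >= 0
A = [[F(1), F(2)],
     [F(3), F(1)]]
b = [F(4), F(6)]
c = [F(2), F(3)]

def slackness_products(x, p):
    """Return the products appearing in conditions 1 and 2 of the theorem."""
    m, n = len(A), len(A[0])
    cond1 = [(b[i] - sum(A[i][j] * x[j] for j in range(n))) * p[i] for i in range(m)]
    cond2 = [(sum(A[i][j] * p[i] for i in range(m)) - c[j]) * x[j] for j in range(n)]
    return cond1, cond2

def objective_gap(x, p):
    """c^T x - b^T p, which Weak Duality says is non-negative for feasible pairs."""
    return sum(cj * xj for cj, xj in zip(c, x)) - sum(bi * pi for bi, pi in zip(b, p))

# An optimal pair: every product vanishes and the objective values agree.
x_opt, p_opt = [F(8, 5), F(6, 5)], [F(7, 5), F(1, 5)]
# A feasible but suboptimal pair: some product is nonzero and a gap remains.
x_sub, p_sub = [F(4), F(0)], [F(0), F(0)]

opt_products = slackness_products(x_opt, p_opt)
sub_products = slackness_products(x_sub, p_sub)
```

For the optimal pair all four products are zero and the gap is zero; for the suboptimal pair the first product in condition 2 is $-8$, matching the gap of $8$ between the two objective values.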