## _Last Updated: 06/10/2021_

## General MIP Original Problem for $ L \ge 2 $

__Parameters:__

$ x_{n,d}: \textrm{binary vector inputs of size n} \times \textrm{d, where n is the number of data points, d is the number of dimensions/features} $ 

$ y_{n}: \textrm{ binary  vector labeled outputs of size n} \times \textrm{1, where n is the number of data points} $

__Decision Variables:__

$ \alpha_{d,k,0}: \textrm{Weight for feature d in unit k in the first hidden layer,} \; \forall\;d\in D,\;k \in K $

$ \alpha_{k\prime,k,l}: \textrm{Weight from the } k\prime^{th} \textrm{ unit in layer l-1 to unit k in layer l,} \; \forall\;k\prime,\;k \in K,\;l \in \{1,2,3,...,L-1\} $

$ \alpha_{k\prime,0,L}: \textrm{Weight from the } k\prime^{th} \textrm{ unit in hidden layer L to the output unit in layer L+1,} \; \forall\;\;k\prime \in K $

$ \beta_{k,l}: \textrm{Bias for unit k in layer l,} \; \forall\;k \in K,\;l\in \{0,1,...,L-1\} $

$ \beta_{L}: \textrm{Bias in the final layer} $

$ h_{n,k,l}: \textrm{Binary output of unit k in layer l,} \; \forall\;n \in N, \;k \in K,\;l \in \{0,1,...,L-1\} $

$ z_{n,k\prime,k,l}: \textrm{Auxiliary variable that represents } \alpha_{k\prime,k,l}h_{n,k\prime,(l-1)} \; \forall\;n \in N, \;k\prime,\;k \in K,\;l \in \{1,2,...,L-1\} $

$ z_{n,k\prime,0,L}: \textrm{Auxiliary variable that represents } \alpha_{k\prime,0,L}h_{n,k\prime,(L-1)} \; \forall\;n \in N, \;k\prime \in K $

$ \hat{y}_{n}: \textrm{Output of final layer,} \; \forall\;n \in N $

$ \ell_{n}: \textrm{Absolute Misclassification of data point n,} \; \forall\;n \in N $

__Objective:__

$\displaystyle \min_{\alpha,\beta,h,z,\hat{y},\ell} \; \displaystyle  \sum_{n=1}^{N} \ell_{n} $

__Constraints:__

$ \textrm{subject to} \quad \displaystyle \sum_{k\prime=0}^{K} (z_{n,k\prime,0,L}) + \beta_{L} \le -\epsilon + (M+\epsilon)\hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{k\prime=0}^{K} (z_{n,k\prime,0,L}) + \beta_{L} \ge \epsilon + (m-\epsilon)(1-\hat{y}_{n}), \; \forall \; n $

$ \quad\quad\quad\quad\;\; \ell_{n} \ge y_{n} - \hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \ell_{n} \ge -y_{n} + \hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{d=0}^{D} (\alpha_{d,k,0}x_{n,d}) + \beta_{k,0} \le -\epsilon + (M+\epsilon)h_{n,k,0}, \; \forall \;n,k $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{d=0}^{D} (\alpha_{d,k,0}x_{n,d}) + \beta_{k,0} \ge \epsilon + (m-\epsilon)(1-h_{n,k,0}), \; \forall \; n,k $

$ \quad\quad\quad\quad\ \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,l}) + \beta_{k,l} \le -\epsilon + (M+\epsilon)h_{n,k,l}, \; \forall \;n,k,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\ \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,l}) + \beta_{k,l}  \ge \epsilon + (m-\epsilon)(1-h_{n,k,l}), \; \forall \; n,k,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,k,l} \le \alpha_{k\prime,k,l}, \; \forall\;n,k\prime,k,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,k,l} - \alpha_{k\prime,k,l} \ge m(1-h_{n,k\prime,(l-1)}), \; \forall\;n,k,k\prime,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\;  mh_{n,k\prime,(l-1)} \le z_{n,k\prime,k,l} \le Mh_{n,k\prime,(l-1)}, \; \forall\;n,k,k\prime,l\in\;\{1,2,...,L-1\}$

$ \quad\quad\quad\quad\;\; z_{n,k\prime,0,L} \le \alpha_{k\prime,0,L}, \; \forall\;n,k\prime $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,0,L} - \alpha_{k\prime,0,L} \ge m(1-h_{n,k\prime,L-1}), \; \forall\;n,k\prime $

$ \quad\quad\quad\quad\;\;  mh_{n,k\prime,L-1} \le z_{n,k\prime,0,L} \le Mh_{n,k\prime,L-1}, \; \forall\;n,k\prime $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \alpha_{d,k,0},\;\alpha_{k\prime,k,l},\;\alpha_{k\prime,0,L} \le Upper\;Bound, \; \forall \; d,k,k\prime,l\in\;\{1,2,...,L-1\} $ 

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \;z_{n,k\prime,k,l},\;z_{n,k\prime,0,L} \le Upper\;Bound, \; \forall \; n,k,k\prime,l\in\;\{1,2,...,L-1\} $ 

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \beta_{k,l},\; \beta_{L} \le Upper\;Bound, \; \forall \; k,l\in\;\{0,1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; \hat{y}_{n},h_{n,k,l}  \in \{0,1\}, \; \forall \; n,k,l\in\;\{0,1,...,L-1\} $

$ \quad\quad\quad\quad\;\; 0 \le \ell_{n} \le 1, \; \forall \; n \in N $
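As a sanity check on the big-M activation constraints and the loss linearization above, here is a minimal Python sketch. The function names and the numeric values of $ m $, $ M $ and $ \epsilon $ are illustrative assumptions, not part of the formulation. It enumerates $ h \in \{0,1\} $ for a given pre-activation $ a = \sum_{d} \alpha_{d,k,0}x_{n,d} + \beta_{k,0} $ and reports which values satisfy both inequalities.

```python
def feasible_activations(a, m, M, eps):
    """Return the h in {0, 1} satisfying both big-M constraints for pre-activation a.

    Upper constraint: a <= -eps + (M + eps) * h
    Lower constraint: a >=  eps + (m - eps) * (1 - h)
    """
    feasible = []
    for h in (0, 1):
        upper_ok = a <= -eps + (M + eps) * h
        lower_ok = a >= eps + (m - eps) * (1 - h)
        if upper_ok and lower_ok:
            feasible.append(h)
    return feasible


def minimal_loss(y, y_hat):
    """Smallest ell with ell >= 0, ell >= y - y_hat and ell >= -y + y_hat.

    Because the objective minimizes ell, this equals |y - y_hat|.
    """
    return max(0, y - y_hat, y_hat - y)


if __name__ == "__main__":
    # Hypothetical bounds chosen only for illustration.
    m, M, eps = -10.0, 10.0, 0.1
    for a in (0.5, -0.5, 0.0):
        print(a, feasible_activations(a, m, M, eps))
    print(minimal_loss(1, 0), minimal_loss(1, 1))
```

With these assumed bounds, $ a = 0.5 $ admits only $ h = 1 $, $ a = -0.5 $ admits only $ h = 0 $, and $ a = 0 $ admits neither, since pre-activations strictly between $ -\epsilon $ and $ \epsilon $ are excluded by construction.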


## Lagrangian Relaxation and Dual

$ \zeta_{LR}(\lambda) = \displaystyle \min_{\alpha,\beta,z,h,\hat{y},\ell} \; 
\sum_{n=0}^{N} (\ell_{n} + ( \sum_{k\prime=0}^{K} ( \sum_{k=0}^{K} ( \sum_{l=1}^{L-1}
( \lambda_{2,n,k\prime,k,l}(-z_{n,k\prime,k,l} + \alpha_{k\prime,k,l} + m(1-h_{n,k\prime,l-1}))
+ \lambda_{3,n,k\prime,k,l}(mh_{n,k\prime,l-1} - z_{n,k\prime,k,l})
+ \lambda_{4,n,k\prime,k,l}(z_{n,k\prime,k,l} - Mh_{n,k\prime,l-1})))
+ \lambda_{2,n,k\prime,0,L}(-z_{n,k\prime,0,L} + \alpha_{k\prime,0,L} + m(1-h_{n,k\prime,L-1}))
+ \lambda_{3,n,k\prime,0,L}(mh_{n,k\prime,L-1} - z_{n,k\prime,0,L})
+ \lambda_{4,n,k\prime,0,L}(z_{n,k\prime,0,L} - Mh_{n,k\prime,L-1}))))
+ \mu_{1}\sum_{d=0}^{D}(\sum_{k=0}^{K}(|\alpha_{d,k,0}|))
+ \mu_{2}\sum_{l=1}^{L-1}(\sum_{k\prime=0}^{K}(\sum_{k=0}^{K}(|\alpha_{k\prime,k,l}|)))
+ \mu_{3}\sum_{k\prime=0}^{K}(|\alpha_{k\prime,0,L}|) $

$ \textrm{subject to} \quad \displaystyle \sum_{k\prime=0}^{K} (z_{n,k\prime,0,L}) + \beta_{L} \le -\epsilon + (M+\epsilon)\hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{k\prime=0}^{K} (z_{n,k\prime,0,L}) + \beta_{L} \ge \epsilon + (m-\epsilon)(1-\hat{y}_{n}), \; \forall \; n $

$ \quad\quad\quad\quad\;\; \ell_{n} \ge y_{n} - \hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \ell_{n} \ge -y_{n} + \hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{d=0}^{D} (\alpha_{d,k,0}x_{n,d}) + \beta_{k,0} \le -\epsilon + (M+\epsilon)h_{n,k,0}, \; \forall \;n,k $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{d=0}^{D} (\alpha_{d,k,0}x_{n,d}) + \beta_{k,0} \ge \epsilon + (m-\epsilon)(1-h_{n,k,0}), \; \forall \; n,k $

$ \quad\quad\quad\quad\;\; \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,l}) + \beta_{k,l} \le -\epsilon + (M+\epsilon)h_{n,k,l}, \; \forall \;n,k,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,l}) + \beta_{k,l}  \ge \epsilon + (m-\epsilon)(1-h_{n,k,l}), \; \forall \; n,k,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,k,l} \le \alpha_{k\prime,k,l}, \; \forall\;n,k\prime,k,l\in\;\{1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,0,L} \le \alpha_{k\prime,0,L}, \; \forall\;n,k\prime $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \alpha_{d,k,0},\;\alpha_{k\prime,k,l},\;\alpha_{k\prime,0,L} \le Upper\;Bound, \; \forall \; d,k,k\prime,l\in\;\{1,2,...,L-1\} $ 

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \;z_{n,k\prime,k,l},\;z_{n,k\prime,0,L} \le Upper\;Bound, \; \forall \; n,k,k\prime,l\in\;\{1,2,...,L-1\} $ 

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \beta_{k,l},\; \beta_{L} \le Upper\;Bound, \; \forall \; k,l\in\;\{0,1,2,...,L-1\} $

$ \quad\quad\quad\quad\;\; \hat{y}_{n},\;h_{n,k,l}  \in \{0,1\}, \; \forall \; n,k,l\in\;\{0,1,...,L-1\} $

$ \quad\quad\quad\quad\;\; 0 \le \ell_{n} \le 1, \; \forall \; n \in N $
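
As a worked regrouping, the three dualized terms for a single index tuple can be collected by variable. This is a sketch with shorthand that is not used elsewhere in these notes: the indices $ n,k\prime,k,l $ are dropped from $ \lambda $, $ z $ and $ \alpha $, and $ h $ stands for $ h_{n,k\prime,l-1} $.

$ \lambda_{2}(-z + \alpha + m(1-h)) + \lambda_{3}(mh - z) + \lambda_{4}(z - Mh) = z(-\lambda_{2} - \lambda_{3} + \lambda_{4}) + \alpha\lambda_{2} + h(-m\lambda_{2} + m\lambda_{3} - M\lambda_{4}) + m\lambda_{2} $

The $ z $ and $ \alpha $ coefficients stay with layer $ l $, the $ h $ coefficient belongs to layer $ l-1 $, and the constant $ m\lambda_{2} $ is collected in the master problem. This is why the relaxation separates by layer.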


### Sub-problems

$ \zeta_{0}(\lambda) = \displaystyle \min_{\alpha,\beta,h} \; 
\sum_{n=0}^{N} (\sum_{k\prime=0}^{K} (\sum_{k=0}^{K}
h_{n,k\prime,0}(- m\lambda_{2,n,k\prime,k,1}
                + m\lambda_{3,n,k\prime,k,1}
                - M\lambda_{4,n,k\prime,k,1})))
+ \mu_{1}\sum_{d=0}^{D}(\sum_{k=0}^{K}(|\alpha_{d,k,0}|)) $
                  
$ \textrm{subject to} \quad \displaystyle \sum_{d=0}^{D} (\alpha_{d,k,0}x_{n,d}) + \beta_{k,0} \le -\epsilon + (M+\epsilon)h_{n,k,0}, \; \forall \;n,k $ 

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{d=0}^{D} (\alpha_{d,k,0}x_{n,d}) + \beta_{k,0} \ge \epsilon + (m-\epsilon)(1-h_{n,k,0}), \; \forall \; n,k $


$ \quad\quad\quad\quad\;\; Lower\;Bound \le \alpha_{d,k,0} \le Upper\;Bound, \; \forall \; d,k $


$ \quad\quad\quad\quad\;\; Lower\;Bound \le \beta_{k,0} \le Upper\;Bound, \; \forall \; k $

$ \quad\quad\quad\quad\;\; h_{n,k,0}  \in \{0,1\}, \; \forall \; n,k $



$ \zeta_{l}(\lambda) = \displaystyle \min_{z,\beta,h,\alpha} \; 
\sum_{n=0}^{N} (\sum_{k\prime=0}^{K} (\sum_{k=0}^{K}
(z_{n,k\prime,k,l}(- \lambda_{2,n,k\prime,k,l}
                   - \lambda_{3,n,k\prime,k,l}
                   + \lambda_{4,n,k\prime,k,l})
+ h_{n,k\prime,l}(- m\lambda_{2,n,k\prime,k,l+1}
                  + m\lambda_{3,n,k\prime,k,l+1}
                  - M\lambda_{4,n,k\prime,k,l+1})
+\alpha_{k\prime,k,l}(\lambda_{2,n,k\prime,k,l})))) 
+ \mu_{2}\sum_{k\prime=0}^{K}(\sum_{k=0}^{K}(|\alpha_{k\prime,k,l}|)) $
                  
$ \textrm{subject to} \quad \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,l}) + \beta_{k,l} \le -\epsilon + (M+\epsilon)h_{n,k,l}, \; \forall \;n,k $

$ \quad\quad\quad\quad\ \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,l}) + \beta_{k,l}  \ge \epsilon + (m-\epsilon)(1-h_{n,k,l}), \; \forall \; n,k $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,k,l} \le \alpha_{k\prime,k,l}, \; \forall\;n,k\prime,k $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \alpha_{k\prime,k,l} \le Upper\;Bound, \; \forall \; k\prime, k $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le z_{n,k\prime,k,l} \le Upper\;Bound, \; \forall \; n,k,k\prime $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \beta_{k,l} \le Upper\;Bound, \; \forall \; k $

$ \quad\quad\quad\quad\;\; h_{n,k,l} \in \{0,1\}, \; \forall \; n,k $


$ \zeta_{L-1}(\lambda) = \displaystyle \min_{z,\beta,h,\alpha} \; 
\sum_{n=0}^{N} (\sum_{k\prime=0}^{K} (\sum_{k=0}^{K} 
(z_{n,k\prime,k,L-1}(- \lambda_{2,n,k\prime,k,L-1}
                     - \lambda_{3,n,k\prime,k,L-1}
                     + \lambda_{4,n,k\prime,k,L-1})
+ \alpha_{k\prime,k,L-1}(\lambda_{2,n,k\prime,k,L-1}))
+ h_{n,k\prime,L-1}(- m\lambda_{2,n,k\prime,0,L}
                    + m\lambda_{3,n,k\prime,0,L}
                    - M\lambda_{4,n,k\prime,0,L})))
+ \mu_{2}\sum_{k\prime=0}^{K}(\sum_{k=0}^{K}(|\alpha_{k\prime,k,L-1}|)) $
                  
$ \textrm{subject to} \quad \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,L-1}) + \beta_{k,L-1} \le -\epsilon + (M+\epsilon)h_{n,k,L-1}, \; \forall \;n,k $

$ \quad\quad\quad\quad\ \displaystyle  \sum_{k\prime=0}^{K} (z_{n,k\prime,k,L-1}) + \beta_{k,L-1}  \ge \epsilon + (m-\epsilon)(1-h_{n,k,L-1}), \; \forall \; n,k $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,k,L-1} \le \alpha_{k\prime,k,L-1}, \; \forall\;n,k\prime,k $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \alpha_{k\prime,k,L-1} \le Upper\;Bound, \; \forall \; k\prime, k $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le z_{n,k\prime,k,L-1} \le Upper\;Bound, \; \forall \; n,k,k\prime $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \beta_{k,L-1} \le Upper\;Bound, \; \forall \; k $

$ \quad\quad\quad\quad\;\; h_{n,k,L-1} \in \{0,1\}, \; \forall \; n,k $


$ \zeta_{L}(\lambda) = \displaystyle \min_{z,\alpha,\beta,\hat{y},\ell} \; 
\sum_{n=0}^{N} (\ell_{n} + (\sum_{k\prime=0}^{K}
z_{n,k\prime,0,L}(- \lambda_{2,n,k\prime,0,L}
                  - \lambda_{3,n,k\prime,0,L}
                  + \lambda_{4,n,k\prime,0,L})
+ \alpha_{k\prime,0,L}(\lambda_{2,n,k\prime,0,L})))
+ \mu_{3}\sum_{k\prime=0}^{K}(|\alpha_{k\prime,0,L}|) $
                  
$ \textrm{subject to} \quad \displaystyle \sum_{k\prime=0}^{K} (z_{n,k\prime,0,L}) + \beta_{L} \le -\epsilon + (M+\epsilon)\hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \displaystyle \sum_{k\prime=0}^{K} (z_{n,k\prime,0,L}) + \beta_{L} \ge \epsilon + (m-\epsilon)(1-\hat{y}_{n}), \; \forall \; n $

$ \quad\quad\quad\quad\;\; \ell_{n} \ge y_{n} - \hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; \ell_{n} \ge -y_{n} + \hat{y}_{n}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; z_{n,k\prime,0,L} \le \alpha_{k\prime,0,L}, \; \forall\;n,k\prime $

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \alpha_{k\prime,0,L} \le Upper\;Bound, \; \forall \; k\prime $ 

$ \quad\quad\quad\quad\;\; Lower\;Bound \le z_{n,k\prime,0,L} \le Upper\;Bound, \; \forall \; n,k\prime $ 

$ \quad\quad\quad\quad\;\; Lower\;Bound \le \beta_{L} \le Upper\;Bound $

$ \quad\quad\quad\quad\;\; \hat{y}_{n} \in \{0,1\}, \; \forall \; n $

$ \quad\quad\quad\quad\;\; 0 \le \ell_{n} \le 1, \; \forall \; n $

***

### Master Problem

$ z_{\mathcal{L}} = \displaystyle \max_{\lambda \in \mathbb{R}^{m_{1}}_{+}} 
\sum_{l=1}^{L-2}(\zeta_{l}(\lambda))
+ \zeta_{L-1}(\lambda)
+ \zeta_{0}(\lambda)
+ \zeta_{L}(\lambda) 
+ \sum_{n=0}^{N} (\sum_{k\prime=0}^{K} (\sum_{k=0}^{K} (\sum_{l=1}^{L-1}
    m\lambda_{2,n,k\prime,k,l})
    + m\lambda_{2,n,k\prime,0,L})) $


***

__Notes and Questions for 05/03/2021__

* Lagrange Dual
    * Clean up the formulation to merge the $ z_{n,k\prime,k,l} $ and respective $ h_{n,k\prime,l} $
    * Specifically, generalize the formulation so that the first layer, the middle layers, and the output layer each have their own sub-problem.
    
* Read up on MIP and NN papers
    * Laurent El-Ghaoui (UC Berkeley)
    

__Notes and Questions for 05/11/2021__

* Lagrange Dual
    * Min in the sub problems over the decision variables
    * Decompose the first subproblem
    
* Read up on MIP and NN papers/textbooks
    * Specifically, subgradient descent in Lagrangian decomposition problems.<br><br>
    _Notes:_
    * Subgradient vector of $ \zeta_{LR}(\lambda)$, $ s^{t} $, consists of the dualized constraints at $ \lambda^{t} $
    * 1. Choose a starting point for all $ \lambda $
      2. Let $ s^{t} = b - Ax^{t} $ be the subgradient of $ \zeta(\lambda^{t}) $, where $ x^{t} $ solves the relaxed problem at $ \lambda^{t} $. If $ s^{t} = 0 $, stop
      3. $ \lambda^{t+1} = \max(0, \lambda^{t}+\gamma^{t}s^{t}) $ where $ \gamma^{t} $ is the step size
      4. Increment t and go to 2.
    
      Referenced from: _Lagrangian Relaxation: An overview_; Discrete Mathematics for Bioinformatics WS 07/08, G. W. Klau, 18 December 2007, 14:21 <br><br>
      
<img src="Held-Karp Stepsize.png" width = 750>
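
The four subgradient steps above can be sketched in Python on a toy problem. Everything here is an illustrative assumption rather than the neural-network model from these notes: the problem is $ \min c^{T}x $ over binary $ x $ with the constraint $ Ax \ge b $ dualized, the inner minimization is solved in closed form because it separates by coordinate, and the step size is the simple diminishing rule $ \gamma^{t} = 1/(t+1) $ rather than the Held-Karp rule from the figure.

```python
def solve_relaxation(c, A, b, lam):
    """Minimize c.x + lam.(b - A x) over x in {0,1}^n; separable, so closed form."""
    reduced = [c[j] - sum(lam[i] * A[i][j] for i in range(len(b)))
               for j in range(len(c))]
    # Set x_j = 1 only when its reduced cost is strictly negative.
    x = [1 if r < 0 else 0 for r in reduced]
    value = sum(r * xj for r, xj in zip(reduced, x))
    value += sum(li * bi for li, bi in zip(lam, b))
    return x, value


def subgradient_ascent(c, A, b, iterations=50):
    """Projected subgradient ascent on the Lagrangian dual; returns (best_lam, best_value)."""
    lam = [0.0] * len(b)                      # step 1: starting point
    best_lam, best_value = list(lam), float("-inf")
    for t in range(iterations):
        x, value = solve_relaxation(c, A, b, lam)
        if value > best_value:
            best_lam, best_value = list(lam), value
        # Step 2: subgradient s = b - A x at the current multipliers.
        s = [b[i] - sum(A[i][j] * x[j] for j in range(len(c)))
             for i in range(len(b))]
        if all(abs(si) < 1e-12 for si in s):
            break
        gamma = 1.0 / (t + 1)                 # assumed diminishing step size
        # Step 3: move along s and project onto lam >= 0.
        lam = [max(0.0, li + gamma * si) for li, si in zip(lam, s)]
    return best_lam, best_value


if __name__ == "__main__":
    # Toy instance: min x1 + 2*x2  s.t.  x1 + x2 >= 1,  x binary.
    print(subgradient_ascent(c=[1.0, 2.0], A=[[1.0, 1.0]], b=[1.0]))
```

On this toy instance the dual bound reaches 1, which matches the primal optimum $ x = (1, 0) $. In general the dual value is only a lower bound, and a gap can remain when the relaxed problem is still a MIP.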
    
* _General Notes:_
    * Subgradient vs Cutting Plane? (Prof. Luedtke's Slides)
    * Bundle Method? (Prof. Luedtke's Slides)
    * Verify if our problem is convex (since the relaxation is still a MIP, I'm guessing it's nonconvex)
    * [Ballstep Subgradient Method](https://www.researchgate.net/publication/220442497_Lagrangian_Relaxation_via_Ballstep_Subgradient_Methods) "can provide a solution to the problem at no extra cost". Scope for not just giving us the bounds? Example was for "Lagrangian relaxation of convex programs"

__Notes and Questions for 06/01/2021__

* Global Dependence (Introduction to Linear Optimization - Bertsimas, Tsitsiklis)

* Code up the problems (Lagrangian Dual Method-Implementation)