# Lab4: 06/11/2023
## Equality Constraints
### Dafni Tziakouri
### Adriana Álvaro


## 2 Sequential Quadratic Programming


### **1.** One simple way to proceed is to take $α^k = 1$ and iteratively update the current point to obtain the next one. We propose to try this simple approach first. The stopping condition should be based on $∇_xL$. Test this approach and check whether it works using the starting point proposed in the example.

We will start by defining the function $f$ and the equality constraint $h$:

In [7]:
from math import exp as e
import numpy as np

def f(x_1,x_2):
  return e(3*x_1) + e(-4*x_2)

def h(x_1,x_2):
  return x_1**2 + x_2**2 - 1

Let's also compute the gradients and Hessians of $f$ and $h$:

In [8]:
def grad_f(x_1, x_2):
    return np.array([3*e(3*x_1), -4*e(-4*x_2)])

def grad_h(x_1, x_2):
    return np.array([2*x_1, 2*x_2])

def hess_f(x_1, x_2):
    return np.array([[9*e(3*x_1), 0], [0, 16*e(-4*x_2)]])

def hess_h(x_1, x_2):
    return np.array([[2, 0], [0, 2]])

We compute the Lagrangian as $L(x, λ) = f(x) − λh(x)$, together with its gradient and Hessian.

In [9]:
def lag(x_1, x_2, lamda):
    return f(x_1, x_2) - lamda*h(x_1, x_2)

def grad_lag(x_1, x_2, lamda):
    return grad_f(x_1, x_2) - lamda*grad_h(x_1, x_2)

def hess_lag(x_1, x_2, lamda):
    return hess_f(x_1, x_2) - lamda*hess_h(x_1, x_2)

We now define the Newton-based iterative method to solve the problem.
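
For reference, the linear system solved at each iteration can be written out as follows. This is a sketch obtained by applying Newton's method to the conditions $∇_xL(x, λ) = 0$ and $h(x) = 0$ with the sign convention $L = f − λh$ used above; the names $d^k$ and $ν^k$ for the steps in $x$ and $λ$ are notation introduced here, not taken from the lab statement.

$$
\begin{pmatrix} \nabla^2_{xx} L(x^k, \lambda^k) & -\nabla h(x^k) \\ -\nabla h(x^k)^T & 0 \end{pmatrix}
\begin{pmatrix} d^k \\ \nu^k \end{pmatrix}
=
\begin{pmatrix} -\nabla_x L(x^k, \lambda^k) \\ h(x^k) \end{pmatrix},
\qquad
x^{k+1} = x^k + \alpha^k d^k, \quad \lambda^{k+1} = \lambda^k + \alpha^k \nu^k
$$

The first block row linearizes $∇_xL = 0$ and the second linearizes $h = 0$; these correspond to the matrix `A` and the vector `b` built in the code.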

In [10]:
def solveNewtonBased(x_1, x_2, lamda, alpha=1, eps=1e-5, MAX_ITER=100, verbose=False):
    print(f'Initial point x0 = [{x_1}, {x_2}]')
    for i in range(MAX_ITER):
        # Compute necessary gradients and Hessian matrices
        gh = grad_h(x_1, x_2)
        grad_lag_value = grad_lag(x_1, x_2, lamda)
        hess_lag_value = hess_lag(x_1, x_2, lamda)

        # Build the matrix A and vector b
        A = np.block([[hess_lag_value, -gh.reshape(-1, 1)], [-gh, np.array([[0]])]])
        b = np.concatenate([-grad_lag_value, [h(x_1, x_2)]])

        # Solve the linear system A * delta = b
        delta = np.linalg.solve(A, b)

        # Update variables
        x_1 += alpha * delta[0]
        x_2 += alpha * delta[1]
        lamda += alpha * delta[2]

        if np.linalg.norm(grad_lag(x_1, x_2, lamda)) < eps:
            if verbose:
                print('Break by Lagrangian gradient')
            break

        if verbose:
            print('Iterations: {}'.format(i))
            print('x = (x_1, x_2) = ({0:.5f}, {1:.5f}), lamda = {2:.5f}'.format(x_1, x_2, lamda))

    return x_1, x_2, lamda


In [11]:
x_1, x_2, lamda = -1, 1, -1
result = solveNewtonBased(x_1, x_2, lamda, alpha=1, eps=1e-5, MAX_ITER=100, verbose=True)

print("Final result:", result)

Initial point x0 = [-1, 1]
Iterations: 0
x = (x_1, x_2) = (-0.77423, 0.72577), lamda = -0.35104
Iterations: 1
x = (x_1, x_2) = (-0.74865, 0.66614), lamda = -0.21606
Break by Lagrangian gradient
Final result: (-0.7483381762503777, 0.663323446868971, -0.21232390186241443)


We observe that the method converges on the third Newton step (the log shows iterations 0 and 1, and the stopping test fires after the next update), and the result matches the solution given in the PDF file.
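
As a sanity check, the final result printed above can be plugged back into the optimality conditions. The snippet below is a minimal sketch that redefines the gradients with `np.exp` so it runs on its own; the variable names `stationarity` and `feasibility` are introduced here for illustration. Since the stopping test only monitors $∇_xL$, the constraint residual is evaluated separately.

```python
import numpy as np

# Redefined here so the snippet is self-contained
def grad_f(x_1, x_2):
    return np.array([3 * np.exp(3 * x_1), -4 * np.exp(-4 * x_2)])

def grad_h(x_1, x_2):
    return np.array([2 * x_1, 2 * x_2])

def h(x_1, x_2):
    return x_1**2 + x_2**2 - 1

# Final result printed by solveNewtonBased above
x_1, x_2, lamda = -0.7483381762503777, 0.663323446868971, -0.21232390186241443

# Norm of the gradient of L = f - lamda*h with respect to x
stationarity = np.linalg.norm(grad_f(x_1, x_2) - lamda * grad_h(x_1, x_2))
# Violation of the equality constraint h(x) = 0
feasibility = abs(h(x_1, x_2))

print("||grad_x L|| =", stationarity)
print("|h(x)|       =", feasibility)
```

Both quantities should be small if the point is an approximate stationary point of the Lagrangian that also lies close to the unit circle.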

### **2.** This basic iteration also has drawbacks, leading to a number of vital questions. It is a Newton-like iteration, and thus may diverge from poor starting points. In our example we started from a point that is near the optimal solution. Try to perform some experiments with starting points that are farther away from the optimal solution.

We define some starting points that are farther away from the optimal solution and run the previous method again.

In [12]:
# Each range produces one random starting point (x_1, x_2, lamda),
# with all three components drawn uniformly from that range
farther_ranges = [
    (-10, -5),  # starting point 1
    (-4, -1),   # starting point 2
    (3, 6),     # starting point 3
]

far_away_points = np.array([
    np.random.uniform(low, high, 3) for low, high in farther_ranges
])

print(far_away_points)

[[-8.2721592  -5.03411667 -8.7863411 ]
 [-2.58883276 -2.3796944  -1.56814779]
 [ 4.07181224  5.1065274   4.79671312]]


In [13]:
for point in far_away_points:
    x_1, x_2, lamda = point
    print('Starting point:\nx = (x_1, x_2) = ({0:.5f}, {1:.5f}), lamda = {2:.5f}'.format(x_1, x_2, lamda))
    solveNewtonBased(x_1, x_2, lamda, verbose=True)

Starting point:
x = (x_1, x_2) = (-8.27216, -5.03412), lamda = -8.78634
Initial point x0 = [-8.272159202617534, -5.034116665029588]
Iterations: 0
x = (x_1, x_2) = (-2.81688, -4.78412), lamda = -5.79437
Iterations: 1
x = (x_1, x_2) = (2.05208, -4.53412), lamda = -10.01730
Iterations: 2
x = (x_1, x_2) = (-3.18696, -4.28408), lamda = -5099.42339
Iterations: 3
x = (x_1, x_2) = (0.79285, -4.03396), lamda = -6368.04871
Iterations: 4
x = (x_1, x_2) = (-7.94552, -3.78048), lamda = -70700.19405
Iterations: 5
x = (x_1, x_2) = (-3.25751, -3.52575), lamda = -41714.45614
Iterations: 6
x = (x_1, x_2) = (-0.15789, -3.26365), lamda = -39692.58779
Iterations: 7
x = (x_1, x_2) = (4.49251, -2.00619), lamda = -1169175.90962
Iterations: 8
x = (x_1, x_2) = (2.89892, 0.20921), lamda = -1315048.14894
Iterations: 9
x = (x_1, x_2) = (1.62114, 0.11560), lamda = -588414.28842
Iterations: 10
x = (x_1, x_2) = (1.11744, 0.07967), lamda = -182888.13108
Iterations: 11
x = (x_1, x_2) = (1.00391, 0.07160), lamda = -1855

We notice that with starting points that are farther away from the optimal solution, the iterates jump around erratically and the multiplier reaches magnitudes of order $10^6$ in the run shown above; the method does not find the optimal solution.
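
One way to see how different the Newton step is at a far-away point is to compare the conditioning of the linear system at the two starting points. The sketch below rebuilds the same block matrix as `solveNewtonBased` in a helper called `kkt_matrix` (a name introduced here) and evaluates `np.linalg.cond` at the original starting point and at the first far-away point listed above. It does not prove why the iteration fails; it only illustrates that the system solved far from the solution is far worse conditioned.

```python
import numpy as np

def kkt_matrix(x_1, x_2, lamda):
    # Hessian of L = f - lamda*h for f = exp(3 x_1) + exp(-4 x_2), h = x_1^2 + x_2^2 - 1
    hess = np.diag([9 * np.exp(3 * x_1) - 2 * lamda,
                    16 * np.exp(-4 * x_2) - 2 * lamda])
    gh = np.array([2 * x_1, 2 * x_2])
    # Same block structure as the matrix A in solveNewtonBased
    return np.block([[hess, -gh.reshape(-1, 1)],
                     [-gh.reshape(1, -1), np.zeros((1, 1))]])

# Starting point from question 1
cond_near = np.linalg.cond(kkt_matrix(-1.0, 1.0, -1.0))
# First far-away starting point printed above
cond_far = np.linalg.cond(kkt_matrix(-8.2721592, -5.03411667, -8.7863411))

print("condition number near the solution:", cond_near)
print("condition number far from it:      ", cond_far)
```

The large entry $16e^{-4x_2}$ of the Hessian at the far-away point dominates the matrix there, which is what drives the condition number up.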

### **3.** One way to find the optimal solution from points that are far from it is to start the optimization with another function that allows us to find an approximation to the solution we are looking for. Once an approximate solution is found, we can apply the Newton-based technique presented previously to find the optimal solution.

### The function that allows us to find an approximation to the solution we are looking for is called, in this context, the merit function. Usually, a merit function is the sum of terms that include the objective function and the amount of infeasibility of the constraints. One example of a merit function for the problem we are treating is the quadratic penalty function $M(x_1, x_2) = f(x_1, x_2) + ρh(x_1, x_2)^2$ where $ρ$ is some positive number. The greater the value of ρ, the greater the penalty for infeasibility. The difficulty arises in defining a proper merit function for a particular equality constrained problem.

### Here we propose that you take $ρ = 10$ and perform a classical gradient descent (with backtracking if you want) to find an approximation to the solution we are looking for. Observe whether you arrive near the optimal solution of the problem. Take into account that you may have numerical problems with the gradient. A simple way to deal with this is to normalize the gradient at each iteration, $∇M(x)/ ||∇M(x)||$, and use this normalized gradient as the search direction.


We define the merit function and its gradient:

In [14]:
def merit(x_1, x_2, ro=10):
    return f(x_1, x_2) + ro*h(x_1, x_2)**2

def grad_merit(x_1, x_2, ro=10):
    return grad_f(x_1, x_2) + 2 * ro * h(x_1, x_2) * grad_h(x_1, x_2)
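
Before running the descent, the analytic gradient $∇M = ∇f + 2ρ\,h\,∇h$ can be compared against central finite differences. The snippet below is a minimal, self-contained sketch: `merit` and `grad_merit` are rewritten to take a single vector argument, and `numerical_grad`, `step` and `x_test` are hypothetical names introduced for this check only.

```python
import numpy as np

def merit(x, ro=10):
    x_1, x_2 = x
    return np.exp(3 * x_1) + np.exp(-4 * x_2) + ro * (x_1**2 + x_2**2 - 1) ** 2

def grad_merit(x, ro=10):
    x_1, x_2 = x
    h_val = x_1**2 + x_2**2 - 1
    grad_f = np.array([3 * np.exp(3 * x_1), -4 * np.exp(-4 * x_2)])
    # Chain rule on ro * h(x)^2 gives 2 * ro * h(x) * grad_h(x)
    return grad_f + 2 * ro * h_val * np.array([2 * x_1, 2 * x_2])

def numerical_grad(func, x, step=1e-6):
    # Central differences, one coordinate at a time
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e_i = np.zeros_like(x)
        e_i[i] = step
        grad[i] = (func(x + e_i) - func(x - e_i)) / (2 * step)
    return grad

x_test = np.array([-1.0, 1.0])
analytic = grad_merit(x_test)
numeric = numerical_grad(merit, x_test)
print("analytic:", analytic)
print("numeric: ", numeric)
```

If the two vectors agree to several digits, the hand-derived gradient can be trusted as a search direction.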

Now we use gradient descent with backtracking, normalizing the gradient as suggested.

In [15]:
def gradient_descent(f, grad_f, w0, f_tol=1e-3, grad_tol=1e-5):
    x = [w0]
    # Iterate until one of the stopping criteria is fulfilled
    while True:
        gradient_of_f = grad_f(w0[0], w0[1])
        grad_normalized = gradient_of_f / np.linalg.norm(gradient_of_f)
        alpha = 1
        # Backtracking: halve alpha until the step decreases f
        while f(*(w0 - alpha * grad_normalized)) >= f(*w0):
            alpha /= 2

        # Step along the normalized negative gradient
        w0 = w0 - alpha * grad_normalized
        x.append(w0)

        # Stop when f barely changes or the (unnormalized) gradient is small;
        # the normalized gradient always has norm 1, so it cannot be used here
        gradient_of_f = grad_f(w0[0], w0[1])
        if np.abs(f(*x[-1]) - f(*x[-2])) < f_tol or np.linalg.norm(gradient_of_f) < grad_tol:
            return np.array(x)

Let's test the method starting from a point far from the optimal solution.

In [16]:
point = far_away_points[0]
solution_approx = gradient_descent(merit, grad_merit, point[:2])
print("With a gradient descent method (and gradient normalization) on Merit function we get the approximation:", solution_approx[-1])

With a gradient descent method (and gradient normalization) on Merit function we get the approximation: [-0.82151982  0.56654271]


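We notice that the approximation $(-0.82152, 0.56654)$ is much closer to the optimal solution $(-0.74834, 0.66332)$ than the starting point, but the two already differ in the first decimal place. This is expected: the minimizers of the merit function do not coincide with the minimizers of the constrained problem.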
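
To put a number on "not that close", the snippet below measures the Euclidean distance between the merit-function approximation and the Newton solution from question 1, as well as how much the approximation violates the constraint. It is a small sketch using only the values printed in the outputs above; the names `x_merit`, `x_opt`, `distance` and `infeasibility` are introduced here.

```python
import numpy as np

# Output of gradient descent on the merit function (printed above)
x_merit = np.array([-0.82151982, 0.56654271])
# Solution found by the Newton-based method in question 1
x_opt = np.array([-0.7483381762503777, 0.663323446868971])

# Euclidean distance between the approximation and the optimum
distance = np.linalg.norm(x_merit - x_opt)
# Value of the constraint h(x) = x_1^2 + x_2^2 - 1 at the approximation
infeasibility = x_merit[0] ** 2 + x_merit[1] ** 2 - 1

print("distance to the optimum:", distance)
print("h at the approximation: ", infeasibility)
```

The distance comes out at roughly 0.12, while the constraint is violated only slightly: with $ρ = 10$ the penalty keeps the point near the circle, but not at the constrained minimizer.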

### **4.** As previously commented, the minimizers of the merit function $M(x_1,x_2)$ do not necessarily have to coincide with the minimizers of the constrained problem. Thus, once we “sufficiently” approach the optimal solution we may use the Newton method (with $α = 1$) to find the solution to the problem.

### Therefore the algorithm consists in starting with the merit function to obtain an approximation to the optimal point we are looking for. Once an approximation to the solution is found, use the Newton-based method to find the optimal solution. Check whether you are able to find the optimal solution to the problem.

We take the approximation obtained above by minimizing the merit function with gradient descent and use it as the starting point for the Newton-based method, with $λ = -1$ as the initial multiplier.

This way we exploit the fast local convergence of the Newton-based algorithm while avoiding its main drawback, namely that it performs poorly when the starting point is not already close to the minimum.

In [17]:
solution_x = solveNewtonBased(solution_approx[-1,0], solution_approx[-1,1], lamda=-1, alpha=1, verbose=True)

Initial point x0 = [-0.8215198215944418, 0.5665427128173111]
Iterations: 0
x = (x_1, x_2) = (-0.79025, 0.61553), lamda = -0.20791
Iterations: 1
x = (x_1, x_2) = (-0.75115, 0.66299), lamda = -0.20839
Iterations: 2
x = (x_1, x_2) = (-0.74833, 0.66334), lamda = -0.21231
Break by Lagrangian gradient


The iterates converge to $(x_1, x_2) ≈ (-0.7483, 0.6633)$ with $λ ≈ -0.2123$, which is the optimal solution given in the PDF file.