##  Non-Linear Conjugate Gradient Method:

***Minimize***


$\min_x \ f(x),\quad f(x) = \sum\limits_{i=1}^{n-1} (x_i - 2x_{i+1}^2)^2$


$\ x_0 = [1,1,1,1]^T$

In [2]:
import math
import numpy as np
from sympy import *

**Initializing the vectors**

In [3]:
x0 = np.array([1,1,1,1])

In [4]:
# Function definitions: the objective f and its partial derivatives (symbolic, via SymPy)

def func(a,b,c,d):
    x,y,z,w=symbols('x y z w')
    f = (x - 2*y**2)**2 + (y - 2*z**2)**2 + (z - 2*w**2)**2
    return f.subs({x:a,y:b,z:c,w:d})

def derv_x(a,b,c,d):
    x,y,z,w=symbols('x y z w')
    f = (x - 2*y**2)**2 + (y - 2*z**2)**2 + (z - 2*w**2)**2
    return diff(f,x).subs({x:a,y:b,z:c,w:d})
    
def derv_y(a,b,c,d):
    x,y,z,w=symbols('x y z w')
    f = (x - 2*y**2)**2 + (y - 2*z**2)**2 + (z - 2*w**2)**2
    return diff(f,y).subs({x:a,y:b,z:c,w:d})    
    
def derv_z(a,b,c,d):
    x,y,z,w=symbols('x y z w')
    f = (x - 2*y**2)**2 + (y - 2*z**2)**2 + (z - 2*w**2)**2
    return diff(f,z).subs({x:a,y:b,z:c,w:d})
    
def derv_w(a,b,c,d):
    x,y,z,w=symbols('x y z w')
    f = (x - 2*y**2)**2 + (y - 2*z**2)**2 + (z - 2*w**2)**2
    return diff(f,w).subs({x:a,y:b,z:c,w:d})

def grad_vector(x0):
    return np.matrix([derv_x(x0[0],x0[1],x0[2],x0[3]),derv_y(x0[0],x0[1],x0[2],x0[3]),
                         derv_z(x0[0],x0[1],x0[2],x0[3]),derv_w(x0[0],x0[1],x0[2],x0[3])])
def grad_vector1(x0):
    return np.array([derv_x(x0[0],x0[1],x0[2],x0[3]),derv_y(x0[0],x0[1],x0[2],x0[3]),
                         derv_z(x0[0],x0[1],x0[2],x0[3]),derv_w(x0[0],x0[1],x0[2],x0[3])])

In [5]:
grad_vector(x0) # gradient vector at x0 - guess vector

matrix([[-2, 6, 6, 8]], dtype=object)
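As a cross-check on the symbolic gradient above, the same quantities can be written in closed form with plain NumPy. This is only a sketch and not part of the original notebook: the names `f_numpy` and `grad_numpy` are hypothetical, and the formula assumes the objective $f(x) = \sum_{i=1}^{n-1} (x_i - 2x_{i+1}^2)^2$ stated at the top, so that each residual $r_i = x_i - 2x_{i+1}^2$ contributes $2r_i$ to component $i$ and $-8x_{i+1}r_i$ to component $i+1$.

```python
import numpy as np

def f_numpy(x):
    """Objective f(x) = sum_i (x_i - 2*x_{i+1}**2)**2 for a vector of any length n >= 2."""
    x = np.asarray(x, dtype=float)
    r = x[:-1] - 2.0 * x[1:] ** 2      # residuals r_i
    return float(np.sum(r ** 2))

def grad_numpy(x):
    """Closed-form gradient of f_numpy."""
    x = np.asarray(x, dtype=float)
    r = x[:-1] - 2.0 * x[1:] ** 2
    g = np.zeros_like(x)
    g[:-1] += 2.0 * r                  # d(r_i**2)/d(x_i)
    g[1:] -= 8.0 * x[1:] * r           # d(r_i**2)/d(x_{i+1})
    return g

print(f_numpy([1, 1, 1, 1]))           # 3.0
print(grad_numpy([1, 1, 1, 1]))        # same entries as the symbolic result: -2, 6, 6, 8
```

Evaluating the gradient this way avoids rebuilding and differentiating the symbolic expression on every call, which may matter if the iteration count grows.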

In [47]:
def non_linear_conjugate_gradient(x0, max_iter, tol = 1.0e-8):
    """
    A function to minimize the non-linear function func with the
    non-linear conjugate gradient method (Fletcher-Reeves update).
        
    :param x0 : vector
        The starting guess for the solution.
        
    :param max_iter : integer
        Maximum number of iterations. Iteration will stop after max_iter 
        steps even if the specified tolerance has not been achieved.
        
    :param tol : float
        Tolerance to achieve. The algorithm will terminate when the
        convergence measure computed in the loop falls below tol.
        
    :var    r0 : vector
                 Negative gradient at x0, used as the initial search direction
    
    :var    d  : vector
                 Current search direction
    
    :var    a  : vector
                 The updated iterate x stored as a flat array

    :var    b  : float
                 Fletcher-Reeves coefficient (ratio of squared gradient norms)

    :var    r(i) : float
                 Norm of the gradient at the updated iterate, printed every iteration
    
    :var    x  : vector 
                 Stores the solution for the next iteration iteratively
    
    :sym    lmda  : symbol 
                 Step length, solved for from the Goldstein-Armijo criterion
                 
    """
    x = np.matrix(x0)
    r0 = grad_vector(x0) * (-1)
    d = r0
    alpha = 0.5

    # Iterations:
    for i in range(max_iter):

        # compute the lambda value by line search
        lmda = Symbol('lmda', real=True)
        # function value at x0
        f_x0 = func(x0[0], x0[1], x0[2], x0[3])
        # the Goldstein-Armijo criterion for the lambda selection
        rhs = f_x0 + np.dot(grad_vector(x0), d.T).item(0) * alpha * lmda
        lhs = func(x0[0] + lmda*d.item(0), x0[1] + lmda*d.item(1), x0[2] + lmda*d.item(2), x0[3] + lmda*d.item(3))
        # take the largest root of lhs = rhs as the step length
        try:
            lmda_value = max(solve(lhs - rhs, lmda))
        except ValueError:
            # solve() returned no roots: stop and return the current iterate
            break

        # line search ends here
        x = x + np.multiply(d, lmda_value)
        a = np.array([x.item(0), x.item(1), x.item(2), x.item(3)])
        g_new = grad_vector(a)
        # r(i): Euclidean norm of the gradient at the new iterate
        r_i = math.sqrt((g_new * g_new.T).item(0))
        print("iteration: ", i, "r(i): ", round(r_i, 5))
        # converged when the gradient norm drops below tol
        if r_i < tol:
            print("\nConverged Successfully in iterations :", i)
            print("The result of vector x:")
            return x
        # Fletcher-Reeves coefficient: ratio of new to old squared gradient norms
        g_old = grad_vector(x0)
        b = float((g_new * g_new.T).item(0) / (g_old * g_old.T).item(0))
        d = g_new * (-1) + d * b
        x0 = a
    return x

**10 Iterations**

In [48]:
non_linear_conjugate_gradient(x0, tol = 1.0e-8, max_iter = 10)

iteration:  0 r(i):  2.27121
iteration:  1 r(i):  0.8224
iteration:  2 r(i):  0.44142
iteration:  3 r(i):  0.18847
iteration:  4 r(i):  0.09744
iteration:  5 r(i):  0.0449
iteration:  6 r(i):  0.02238
iteration:  7 r(i):  0.01097
iteration:  8 r(i):  0.00506
iteration:  9 r(i):  0.00261


matrix([[1.10646976973941, 0.743843588792184, 0.610066169803682,
         0.552504690335135]], dtype=object)

**2 Iterations**

In [50]:
non_linear_conjugate_gradient(x0, tol = 1.0e-8, max_iter = 2)

iteration:  0 r(i):  2.27121
iteration:  1 r(i):  0.8224


matrix([[1.08908941845879, 0.764107786237180, 0.667004995370713,
         0.631226288110453]], dtype=object)

**Comments on Results**:
- No. of iterations: **10 iterations**
- Using the initial guess **[1,1,1,1]**, the approximated value of x is **[1.106, 0.744, 0.610, 0.553]** with **r(i) = 0.00261**, so the tolerance of 1.0e-8 was not reached within 10 iterations.

- After **2 iterations**, the value of **r(i) = 0.8224** and the value of x is **[1.089, 0.764, 0.667, 0.631]**
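The reported iterates can be checked independently by evaluating the gradient norm at each returned vector. The snippet below is a minimal sketch under the assumption that r(i) is the Euclidean norm of the gradient of the stated objective; `grad_norm` is a hypothetical helper name, and the two vectors are copied from the outputs above. The printed values should agree with the r(i) column up to rounding.

```python
import numpy as np

def grad_norm(x):
    """Euclidean norm of the gradient of f(x) = sum_i (x_i - 2*x_{i+1}**2)**2."""
    x = np.asarray(x, dtype=float)
    r = x[:-1] - 2.0 * x[1:] ** 2      # residuals r_i
    g = np.zeros_like(x)
    g[:-1] += 2.0 * r
    g[1:] -= 8.0 * x[1:] * r
    return float(np.linalg.norm(g))

# Vectors returned by the two runs shown above
x_after_10 = [1.10646976973941, 0.743843588792184, 0.610066169803682, 0.552504690335135]
x_after_2 = [1.08908941845879, 0.764107786237180, 0.667004995370713, 0.631226288110453]

print(round(grad_norm(x_after_10), 5))  # compare with r(i) at iteration 9
print(round(grad_norm(x_after_2), 5))   # compare with r(i) at iteration 1
```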


**Comments on Method**:
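- The non-linear function is evaluated with n = 4.
- The alpha value is chosen to be **0.5**.
- The lambda value is taken as the largest real root of the Goldstein-Armijo criterion treated as an equality, $f(x + \lambda d) = f(x) + \alpha \lambda \nabla f(x)^T d$. This equation is a polynomial in lambda, and lambda = 0 is always one of its roots.
- Please note that lambda does not have to equal a root: because the criterion is an inequality, $f(x + \lambda d) \le f(x) + \alpha \lambda \nabla f(x)^T d$, any value between the roots that satisfies it is acceptable.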
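Since any step satisfying the inequality is acceptable, the symbolic root solve can be swapped for a backtracking search that halves a trial step until the Armijo condition holds. The sketch below shows that alternative with the same Fletcher-Reeves update and the same alpha = 0.5. It is not the method used in the notebook, and its iterates will differ from the outputs above. All names (`backtracking_armijo`, `fletcher_reeves`, `shrink`, `max_halvings`) and the restart to steepest descent when the direction is not a descent direction are assumptions made for illustration.

```python
import numpy as np

def f(x):
    r = x[:-1] - 2.0 * x[1:] ** 2
    return float(np.sum(r ** 2))

def grad(x):
    r = x[:-1] - 2.0 * x[1:] ** 2
    g = np.zeros_like(x)
    g[:-1] += 2.0 * r
    g[1:] -= 8.0 * x[1:] * r
    return g

def backtracking_armijo(x, d, g, alpha=0.5, shrink=0.5, max_halvings=60):
    """Return a step t with f(x + t*d) <= f(x) + alpha*t*(g.d), or 0.0 if none is found."""
    fx = f(x)
    slope = float(g @ d)               # directional derivative, negative for a descent direction
    t = 1.0
    for _ in range(max_halvings):
        if f(x + t * d) <= fx + alpha * t * slope:
            return t
        t *= shrink
    return 0.0

def fletcher_reeves(x0, max_iter=50, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:    # converged: gradient norm below tol
            break
        if g @ d >= 0.0:               # not a descent direction: restart with steepest descent
            d = -g
        t = backtracking_armijo(x, d, g)
        if t == 0.0:                   # line search failed: return the current iterate
            break
        x = x + t * d
        g_new = grad(x)
        beta = (g_new @ g_new) / (g @ g)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d
        g = g_new
    return x

x_min = fletcher_reeves([1, 1, 1, 1])
print(x_min, f(x_min))
```

Backtracking trades the exactness of the root solve for cheap function evaluations: every accepted step satisfies the same inequality, so the objective decreases monotonically, but the step is generally shorter than the largest admissible root.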
