<a id='back_to_top'></a>

<img src='img/_logo.JPG' alt='Drawing' style='width:2000px;'/>

# <font color=blue>3. Libraries</font>
## <font color=blue>3.2. SciPy</font>
| | |
|-|-|
| | |
| <img src='https://www.nixp.ru/uploads/news/fullsize_image/8eef272e9391957a9f14b9a9974e51b8333d6e41.png' alt='Drawing' style='height:100px;'/> |
| | |
The `scipy` framework builds on top of the low-level `numpy` framework for multidimensional arrays, and provides a large number of higher-level scientific algorithms. Some of the topics that `scipy` covers are:

- [Integration](http://docs.scipy.org/doc/scipy/reference/integrate.html)
- [Interpolation](http://docs.scipy.org/doc/scipy/reference/interpolate.html)
- [Optimization](http://docs.scipy.org/doc/scipy/reference/optimize.html)
- [Statistics](http://docs.scipy.org/doc/scipy/reference/stats.html)

Each of these submodules provides a number of functions and classes that can be used to solve problems in their respective topics. In this notebook only a limited number of these subpackages will be covered.

To use `scipy` you need to import the module, using for example:

In [None]:
import scipy

<font color=red><div style="text-align: right"> **Documentation for**  
[**`scipy`**](https://docs.scipy.org/doc/scipy/reference/)</div></font>

### <font color=blue>3.2.1. Integration</font>
#### <font color=blue>3.2.1.1. Numerical integration: quadrature</font> 
Numerical evaluation of a definite integral of the form $\displaystyle \int_a^b f(x)\, dx$ is called *numerical quadrature*, or simply *quadrature*. `scipy` provides a series of functions for different kinds of quadrature, for example `quad`, `dblquad` and `tplquad` for single, double and triple integrals, respectively. Focusing on the `quad` function:

In [None]:
from scipy.integrate import quad

The `quad` function takes a large number of optional arguments, which can be used to fine-tune its behaviour. It returns a tuple containing the value of the integral and an estimate of the absolute error. The basic usage is as follows:

In [None]:
# define a simple function for the integrand
def f(x):
    return x**2

x_lower = 2 # the lower limit of x
x_upper = 6 # the upper limit of x
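val, abs_err = quad(f, x_lower, x_upper)  # quad returns a tuple: (integral value, estimated absolute error)
print('Integral value = ', val)
print('Estimated absolute error = ', abs_err)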

# This plots both the function and the integral for you to see
# This plotting framework will be discussed in detail in the next module
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0, 10 + 0.1, 0.1)
y = f(x)
plt.plot(x, 
         y, 
         'b', 
         label = 'Continuous function')
plt.xlim(0, 10)
plt.ylim(0, 100)
x_int = x[np.where((x_lower <= x) * (x <= x_upper))]
plt.fill_between(x_int, 
                 f(x_int), 
                 color = 'r', 
                 label = 'Integral')
plt.legend(frameon = False)
plt.grid()
plt.show()
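As a sketch of two of those optional features, assuming a Gaussian integrand chosen only for illustration: `args` forwards extra parameters to the integrand, and the limits may be infinite (`np.inf`). The second element of the returned tuple is `quad`'s own estimate of the absolute error.

```python
import numpy as np
from scipy.integrate import quad

# Gaussian integrand with a width parameter sigma (illustrative choice)
def gaussian(x, sigma):
    return np.exp(-x**2 / (2 * sigma**2))

sigma = 2.0
# args=(sigma,) is passed to gaussian as its second argument;
# infinite integration limits are allowed
val, abs_err = quad(gaussian, -np.inf, np.inf, args=(sigma,))

exact = sigma * np.sqrt(2 * np.pi)  # known closed-form value of this integral
print('quad:', val, 'exact:', exact, 'error estimate:', abs_err)
```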

<font color=red><div style="text-align: right"> **Documentation for**  
[**`scipy.integrate.quad`**](https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.integrate.quad.html)</div></font>
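`dblquad` follows the same pattern for double integrals, with one detail worth checking in its documentation: the integrand is called as `func(y, x)`, with the inner variable first, and the inner limits are given as functions of x. A small sketch with illustrative integrands, $\int_0^2\int_0^1 xy\,dy\,dx = 1$ and a triangular region of area 1/2:

```python
from scipy.integrate import dblquad

# integrand f(x, y) = x*y, written with y (the inner variable) first
val, abs_err = dblquad(lambda y, x: x * y,
                       0, 2,             # outer limits: x from 0 to 2
                       lambda x: 0.0,    # inner lower limit: y = 0
                       lambda x: 1.0)    # inner upper limit: y = 1
print('Rectangle:', val)

# the inner limits may depend on x: integrate 1 over 0 <= y <= x, 0 <= x <= 1
area, _ = dblquad(lambda y, x: 1.0, 0, 1, lambda x: 0.0, lambda x: x)
print('Triangle area:', area)
```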

#### <font color=blue>3.2.1.2. Trapezoidal-rule integration</font>  

For cases in which the function is only known at discrete points, `scipy` offers at least one alternative, the `trapezoid` function (named `trapz` in older SciPy releases; that name has since been removed):

In [None]:
from scipy.integrate import trapezoid

This function integrates the discrete function along the given axis using the composite trapezoidal rule. It takes an array of y values and, optionally, an array of the corresponding x values (if the x values are not provided, the spacing between points is assumed to be 1).

Example:

In [None]:
import numpy as np
x = np.arange(0, 5 + 0.5, 0.5)
y = np.array([8, 18, 9, 10, 13, 7, 7, 15, 6, 6, 5])

x_lower = 1 # the lower limit of x
x_upper = 4 # the upper limit of x
in_range = (x_lower <= x) & (x <= x_upper)  # boolean mask selecting the integration interval
x_int = x[in_range]
y_int = y[in_range]
val = trapezoid(y_int, x_int)
print('Integral value = ', val)

# This plots both the function and the integral for you to see
# This plotting framework will be discussed in detail in the next module
import matplotlib.pyplot as plt
plt.plot(x, 
         y, 
         color = 'b', 
         ls = '--',
         marker = 'o',  
         label = 'Discrete function')
plt.xlim(0, 5)
plt.ylim(0, 20)
plt.fill_between(x_int, 
                 y_int, 
                 color = 'r',
                 label = 'Integral')
plt.legend(frameon = False)
plt.grid()
plt.show()
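To make the composite trapezoidal rule concrete, the sketch below re-implements it with plain NumPy (`trapezoid_by_hand` is a hypothetical helper written only for illustration) and applies it to the same points between x = 1 and x = 4 as above. Each pair of neighbouring points contributes the area of one trapezoid, $(x_{i+1}-x_i)\,(y_i+y_{i+1})/2$.

```python
import numpy as np

def trapezoid_by_hand(y, x):
    """Composite trapezoidal rule: sum of the trapezoid areas between neighbouring points."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    widths = np.diff(x)                   # x[i+1] - x[i]
    mean_heights = (y[:-1] + y[1:]) / 2   # (y[i] + y[i+1]) / 2
    return np.sum(widths * mean_heights)

# the data points between x = 1 and x = 4 from the example above
x_int = np.arange(1, 4 + 0.5, 0.5)
y_int = np.array([9, 10, 13, 7, 7, 15, 6])
print(trapezoid_by_hand(y_int, x_int))  # → 29.75
```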

<font color=red><div style="text-align: right"> **Documentation for**  
[**`scipy.integrate.trapz`**](https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.integrate.trapz.html)</div></font>

### <font color=blue>3.2.2. Interpolation</font> 
Interpolation (often performed with splines) is simple and convenient in `scipy`, and it is provided by the `scipy.interpolate` module:

In [None]:
from scipy.interpolate import interp1d, Akima1DInterpolator

A number of interpolation schemes are available, ranging in complexity (e.g. linear, cubic, Akima). Given arrays describing the X and Y data, each of them returns an object that behaves like a function: it can be called for an arbitrary value of x (within the range covered by X) and returns the corresponding interpolated y value. Note that some interpolators, such as `Akima1DInterpolator`, require the X values to be strictly increasing and raise an error otherwise, whereas `interp1d` sorts the data itself by default.
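A quick check of the ordering behaviour, on a small illustrative dataset: `interp1d` accepts unsorted X values (it sorts them first unless `assume_sorted=True` is passed), while `Akima1DInterpolator` raises a `ValueError`.

```python
import numpy as np
from scipy.interpolate import interp1d, Akima1DInterpolator

x_unsorted = np.array([2.0, 0.0, 1.0, 3.0, 4.0])
y_unsorted = 2 * x_unsorted            # data lying on the line y = 2x

f = interp1d(x_unsorted, y_unsorted)   # sorted internally (assume_sorted=False by default)
print(f(1.5))                          # → 3.0

try:
    Akima1DInterpolator(x_unsorted, y_unsorted)
    akima_raised = False
except ValueError as err:
    akima_raised = True
    print('Akima1DInterpolator:', err)
```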

Example:

In [None]:
import numpy as np
x = np.linspace(-1, 1, 9)
y = np.array([0, 0.5, 0.7, 3, 1, 0.9, 0.75, 0.5, 0])

linear_interpolation = interp1d(x, y)
cubic_interpolation = interp1d(x, y, kind = 'cubic')
akima_interpolation = Akima1DInterpolator(x, y)
x_int = -0.9
val_li = linear_interpolation(x_int)
val_cubic = cubic_interpolation(x_int)
val_ak = akima_interpolation(x_int)
print('Value at x = -0.9 for linear spline: ',  val_li, '(basic interpolation, not smooth at the data points)')
print('Value at x = -0.9 for cubic spline: ',  val_cubic, '(smooth, but can overshoot near outliers in the data)')
print('Value at x = -0.9 for Akima spline: ',  val_ak, '(smooth, and less prone to overshooting near outliers)')

# This plots both the original data and the interpolation functions for you to see
# This plotting framework will be discussed in detail in the next module
import matplotlib.pyplot as plt
plt.scatter(x, 
            y, 
            color = 'b',
            marker = 'D',
            label = 'Original',)
x_plot_int = np.linspace(-1, 1, 100)
y_plot_int_li = linear_interpolation(x_plot_int)
plt.plot(x_plot_int, 
         y_plot_int_li, 
         color = 'k',
         label = 'Linear',)
y_plot_int_cubic = cubic_interpolation(x_plot_int)
plt.plot(x_plot_int, 
         y_plot_int_cubic, 
         color = 'g',
         label = 'Cubic',)
y_plot_int_ak = akima_interpolation(x_plot_int)
plt.plot(x_plot_int, 
         y_plot_int_ak, 
         color = 'r',
         label = 'Akima',)
plt.xlim(-1.5, 1.5)
plt.ylim(-0.5, 3.5)
plt.legend(frameon = False)
plt.grid()
plt.show()

Notice that the interpolation functions are only defined over the range of the data, from x = -1 to x = 1. Trying to compute y for an x outside this range raises an error with `interp1d`, or returns `NaN` (Not a Number) with the Akima interpolator:

In [None]:
akima_interpolation(10)

In [None]:
linear_interpolation(10)
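If out-of-range values should not raise an error, `interp1d` offers two options, sketched below on illustrative data: `bounds_error=False` returns `fill_value` (NaN by default) outside the data range, and `fill_value='extrapolate'` extends the interpolant beyond it. Extrapolated values should be used with care, since nothing in the data supports them.

```python
import numpy as np
from scipy.interpolate import interp1d

x_data = np.array([0.0, 1.0, 2.0])
y_data = np.array([0.0, 2.0, 4.0])

f_nan = interp1d(x_data, y_data, bounds_error=False)        # NaN outside [0, 2]
f_ext = interp1d(x_data, y_data, fill_value='extrapolate')  # linear extrapolation

print(f_nan(3.0))  # → nan
print(f_ext(3.0))  # → 6.0
```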

<font color=red><div style="text-align: right"> **Documentation for**  
[**`scipy.interpolate.interp1d`**](https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.interpolate.interp1d.html)  
[**`scipy.interpolate.Akima1DInterpolator`**](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.interpolate.Akima1DInterpolator.html)</div></font>

### <font color=blue>3.2.3. Optimization</font>  
Optimization in `scipy` is packed within the `scipy.optimize` module:

In [None]:
from scipy.optimize import minimize, curve_fit

Several optimization tools are available in `scipy.optimize`. They cover a wide range of applications (e.g. unconstrained and constrained minimization of multivariate scalar functions, brute-force optimization, least-squares minimization and curve fitting).

#### <font color=blue>3.2.3.1. Minimization</font>   
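The `minimize` function performs the minimization of scalar functions of one or more variables. It takes several arguments, one of which selects the solver to use. In general, the optimization problems it solves are of the form:

$\displaystyle \min_{x} f(x) \quad \text{subject to} \quad g_i(x) \geq 0 \quad \text{and} \quad h_j(x) = 0$,

where the $g_i$ are inequality constraints and the $h_j$ are equality constraints (the $\geq 0$ convention matches the `'ineq'` constraints used by `minimize`).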

It can be applied to simple problems, such as finding the minimum of a well-defined mathematical function.

Example:

Function: $\displaystyle f(x) = (x_1 - 1)^2 + (x_2 - 2.5)^2$

In [None]:
fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2

The function looks like this:

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (only needed for Matplotlib < 3.2)
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize = (4, 4), dpi = 100)
ax = fig.add_subplot(projection = '3d')  # Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib
X = np.linspace(-6, 6, 100)
Y = np.linspace(-6, 6, 100)
X, Y = np.meshgrid(X, Y)
Z = fun(x = [X, Y])
ax.plot_surface(X, Y, Z, color = 'r', antialiased = False)
ax.set_xlim(-8, 8)
ax.set_xticks(np.arange(-8, 8 + 2, 2))
ax.set_ylim(-8, 8)
ax.set_yticks(np.arange(-8, 8 + 2, 2))
ax.set_zlim(0, 150)
ax.set_zticks(np.arange(0, 150 + 25, 25))
ax.set_xlabel(r'$\mathrm{x_1}$')
ax.set_ylabel(r'$\mathrm{x_2}$')
ax.set_zlabel(r'$\mathrm{f(x)}$')
plt.show()

To apply `minimize`, an initial guess `x0` is needed. Starting from it, the solver computes both the minimum value of the function and the values of the variables at which that minimum occurs.

In [None]:
fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2  # The function to minimize
initial_guess = [0, 0]                           # x = [x[0], x[1]] = [x1, x2]
res = minimize(fun, x0 = initial_guess)          # This runs the optimization, taking (mandatory) an initial guess
x1_x2_minimum = res['x']                         # These are the obtained values of x1 and x2 for the minimum
f_minimum = res['fun']                           # This is the obtained f(x1, x2) at the minimum
print(x1_x2_minimum, f_minimum)
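The object returned by `minimize` (an `OptimizeResult`) also reports whether the solver converged (`success`, `message`) and how much work it did (e.g. `nit`, `nfev`). When the gradient of the function is known, it can be passed through the `jac` argument, which spares the solver from estimating it by finite differences. A sketch for the same function, with its gradient derived by hand:

```python
import numpy as np
from scipy.optimize import minimize

fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2

def grad(x):
    # analytic gradient of fun: [df/dx1, df/dx2]
    return np.array([2 * (x[0] - 1), 2 * (x[1] - 2.5)])

res_jac = minimize(fun, x0=[0, 0], jac=grad)
print(res_jac.success, res_jac.message)
print('x =', res_jac.x, 'f =', res_jac.fun,
      'iterations:', res_jac.nit, 'function evaluations:', res_jac.nfev)
```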

The location of the minimum looks like this:

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (only needed for Matplotlib < 3.2)
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize = (4, 4), dpi = 100)
ax = fig.add_subplot(projection = '3d')  # Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib
X = np.linspace(-6, 6, 100)
Y = np.linspace(-6, 6, 100)
X, Y = np.meshgrid(X, Y)
Z = fun(x = [X, Y])
ax.plot_surface(X, Y, Z, color = 'r', antialiased = False)
ax.set_xlim(-8, 8)
ax.set_xticks(np.arange(-8, 8 + 2, 2))
ax.set_ylim(-8, 8)
ax.set_yticks(np.arange(-8, 8 + 2, 2))
ax.set_zlim(0, 150)
ax.set_zticks(np.arange(0, 150 + 25, 25))
ax.set_xlabel(r'$\mathrm{x_1}$')
ax.set_ylabel(r'$\mathrm{x_2}$')
ax.set_zlabel(r'$\mathrm{f(x)}$')
ax.plot([x1_x2_minimum[0], x1_x2_minimum[0]], [-8, 8], [0, 0], color = 'b', alpha = 0.25)
ax.plot([-8, 8], [x1_x2_minimum[1], x1_x2_minimum[1]], [0, 0], color = 'b', alpha = 0.25)
ax.plot([x1_x2_minimum[0], x1_x2_minimum[0]], [x1_x2_minimum[1], x1_x2_minimum[1]], [0, 150], color = 'b', alpha = 0.25)
plt.show()

Constraints can also be considered with `minimize` by passing the `constraints` argument. This does not change the objective function itself; it restricts the search to the feasible region, so the result is no longer the global minimum but the minimum that satisfies the constraints provided. The type of each constraint must be given: `'eq'` for equality (the constraint function must equal zero) or `'ineq'` for inequality (the constraint function must be non-negative). The variables themselves can also be bounded: bounds act as constraints, but are passed directly through the `bounds` argument of `minimize`.

Example:

Function: $\displaystyle f(x) = (x_1 - 1)^2 + (x_2 - 2.5)^2$

Constraints: $\displaystyle x_1 - 2x_2 + 2 \geq 0$, $\displaystyle -x_1 - 2x_2 + 6 \geq 0$ and $\displaystyle -x_1 + 2x_2 + 2 \geq 0$

Bounds: $\displaystyle x_1 \geq 0$ and $\displaystyle x_2 \geq 0$

In [None]:
import numpy as np
pos_infinite = np.inf
fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2
initial_guess = [0, 0]
cons = ({'type': 'ineq', 'fun': lambda x:  x[0] - 2 * x[1] + 2},
        {'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},
        {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})
bnds = ((0, pos_infinite), (0, pos_infinite))
res = minimize(fun, x0 = initial_guess, bounds = bnds, constraints = cons)
x1_x2_minimum = res['x']
f_minimum = res['fun']
print(x1_x2_minimum, f_minimum)

The location of the new minimum looks like this:

In [None]:
from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection (only needed for Matplotlib < 3.2)
import matplotlib.pyplot as plt
import numpy as np
fig = plt.figure(figsize = (4, 4), dpi = 100)
ax = fig.add_subplot(projection = '3d')  # Axes3D(fig) no longer attaches itself to the figure in recent Matplotlib
X = np.linspace(-6, 6, 100)
Y = np.linspace(-6, 6, 100)
X, Y = np.meshgrid(X, Y)
Z = fun(x = [X, Y])
ax.plot_surface(X, Y, Z, color = 'r', antialiased = False)
ax.set_xlim(-8, 8)
ax.set_xticks(np.arange(-8, 8 + 2, 2))
ax.set_ylim(-8, 8)
ax.set_yticks(np.arange(-8, 8 + 2, 2))
ax.set_zlim(0, 150)
ax.set_zticks(np.arange(0, 150 + 25, 25))
ax.set_xlabel(r'$\mathrm{x_1}$')
ax.set_ylabel(r'$\mathrm{x_2}$')
ax.set_zlabel(r'$\mathrm{f(x)}$')
ax.plot([x1_x2_minimum[0], x1_x2_minimum[0]], [-8, 8], [0, 0], color = 'b', alpha = 0.25)
ax.plot([-8, 8], [x1_x2_minimum[1], x1_x2_minimum[1]], [0, 0], color = 'b', alpha = 0.25)
ax.plot([x1_x2_minimum[0], x1_x2_minimum[0]], [x1_x2_minimum[1], x1_x2_minimum[1]], [0, 150], color = 'b', alpha = 0.25)
plt.show()
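The same dictionary format handles equality constraints. As an illustrative case (not part of the example above), requiring $x_1 + x_2 = 1$ forces the minimum onto that line; geometrically it is the point of the line closest to (1, 2.5), i.e. (-0.25, 1.25), where f = 3.125:

```python
import numpy as np
from scipy.optimize import minimize

fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2
# 'eq' constraint: the function must evaluate to zero, i.e. x1 + x2 - 1 = 0
cons_eq = ({'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1},)
# with constraints and no method given, minimize falls back to SLSQP
res_eq = minimize(fun, x0=[0, 0], constraints=cons_eq)
print(res_eq.x, res_eq.fun)
```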

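`minimize` also lets the user choose the optimization algorithm (solver) through the `method` argument, from a set of implemented methods (e.g. `'Nelder-Mead'`, `'BFGS'`, `'L-BFGS-B'`, `'SLSQP'`). If no method is given, a default (BFGS, L-BFGS-B or SLSQP) is chosen depending on whether the problem has constraints or bounds.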

<font color=red><div style="text-align: right"> **Documentation for**  
[**`scipy.optimize.minimize`**](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#scipy.optimize.minimize)  
</div></font>
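For instance, the unconstrained problem from above can be solved with two different solvers through the `method` argument. The derivative-free Nelder-Mead simplex and the gradient-based BFGS should both land close to (1, 2.5), although the number of function evaluations they need differs:

```python
import numpy as np
from scipy.optimize import minimize

fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2

results = {}
for method in ['Nelder-Mead', 'BFGS']:
    results[method] = minimize(fun, x0=[0, 0], method=method)
    r = results[method]
    print(f'{method:12s} x = {r.x}, f = {r.fun:.2e}, function evaluations = {r.nfev}')
```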

#### <font color=blue>3.2.3.2. Brute force and particle swarm optimization (some general notes)</font> 
For optimization problems whose functions have no closed-form definition (e.g. complex structural design problems), the `minimize` approach might not suffice. In these cases, the `brute` function can be used instead, provided that the dimension of the search space is reasonable (a judgement left to the user). Also, when the variables are restricted to a specific set of values (e.g. one variable can only be an integer between 1 and 10), `brute` might come in handy.
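A sketch of the integer case, using an illustrative function whose continuous minimum lies at (3.3, 7.8): each `slice(1, 11, 1)` defines the grid 1, 2, ..., 10 for one variable, and `finish=None` switches off the default local "polishing" step, which would otherwise move the result off the integer grid.

```python
import numpy as np
from scipy.optimize import brute

def cost(x):
    return (x[0] - 3.3)**2 + (x[1] - 7.8)**2

# one slice per variable; slice(1, 11, 1) evaluates the integers 1..10
ranges = (slice(1, 11, 1), slice(1, 11, 1))
x_best = brute(cost, ranges, finish=None)  # evaluates all 10 x 10 = 100 grid points
print(x_best, cost(x_best))
```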

By definition, brute-force search is a very general problem-solving technique that consists of systematically checking every candidate solution and choosing the best one. Hence, tackling, for example, the optimization of 40 variables, each of which may take any value within a continuous range, would be highly computationally demanding (and extremely inefficient), because the number of grid points grows exponentially with the number of variables. For such cases, a metaheuristic such as particle swarm optimization is a common choice: a population of candidate solutions (particles) moves through the search space, each guided by its own best position so far and by the best position found by the whole swarm, so the optimum is approached without testing every single scenario (although, being a heuristic, it is not guaranteed to find the global optimum). The animation below illustrates this search process with a particle swarm algorithm:
<img src='https://i.makeagif.com/media/11-29-2015/3fnpZV.gif' alt='Drawing' style='width:200px;'/>
If you're tackling a problem with such a large search space, have a look at the package
[`SwarmPackagePy`](https://github.com/SISDevelop/SwarmPackagePy).
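To make the idea concrete, here is a minimal global-best particle swarm sketch in plain NumPy. It is not `SwarmPackagePy`'s API: the function name `pso_minimize` and the coefficient values (inertia `w`, attraction weights `c1`, `c2`) are illustrative assumptions. It is applied to the two-variable function used in the `minimize` examples above, within the box -6 ≤ x ≤ 6:

```python
import numpy as np

def pso_minimize(func, lower, upper, n_particles=30, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best particle swarm optimization (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # random initial positions inside the box, zero initial velocities
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    # each particle's best position so far, and the swarm's best
    best_pos = pos.copy()
    best_val = np.array([func(p) for p in pos])
    g = np.argmin(best_val)
    swarm_pos, swarm_val = best_pos[g].copy(), best_val[g]

    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # inertia + pull towards own best + pull towards swarm best
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (swarm_pos - pos)
        pos = np.clip(pos + vel, lower, upper)  # keep particles inside the box

        vals = np.array([func(p) for p in pos])
        improved = vals < best_val
        best_pos[improved] = pos[improved]
        best_val[improved] = vals[improved]

        g = np.argmin(best_val)
        if best_val[g] < swarm_val:
            swarm_pos, swarm_val = best_pos[g].copy(), best_val[g]

    return swarm_pos, swarm_val

fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2
best_x, best_f = pso_minimize(fun, lower=[-6, -6], upper=[6, 6])
print(best_x, best_f)
```

Only function values are used (no gradients), which is what makes this family of methods applicable to black-box problems; the price is many more function evaluations than a gradient-based solver needs on a smooth function like this one.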

<font color=red><div style="text-align: right"> **Documentation for**  
[**`scipy.optimize.brute`**](https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brute.html#scipy.optimize.brute)  
</div></font>

#### <font color=blue>3.2.3.3. Curve fitting</font>
`scipy` also allows for fitting of functions to data, with the `curve_fit` function. It uses a non-linear least squares approach to fit a function to the data, returning the optimal values for the parameters so that the sum of the squared residuals is minimized.

Example:

Let's try to fit a linear function to this data:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import random
data_x = np.linspace(2, 8, 100)
nums = np.array([x for x in range(100)])
random.shuffle(nums)
data_y = [5*data_x[idx] + 0.15*nums[idx] for idx in range(len(data_x))]
plt.scatter(data_x, data_y, color = 'r', alpha = 0.5, marker = '.')
plt.xlim(0, 10)
plt.ylim(0, 60)
plt.xlabel('x', fontsize = 14)
plt.ylabel('y', fontsize = 14)
plt.grid()
plt.show()

In [None]:
# First, we create the general function (in this case, a linear equation)
lin_func = lambda x, a, b : a*x + b

# The data: reuse the points generated (and plotted) in the previous cell,
# converted to NumPy arrays so they can be used in vectorized arithmetic
data_x = np.asarray(data_x)
data_y = np.asarray(data_y)

# Now we run the curve fitting procedure
popt, pcov = curve_fit(lin_func, data_x, data_y)

# The R2 parameter needs to be calculated manually
residuals = data_y - lin_func(data_x, popt[0], popt[1])
ss_res = np.sum(residuals**2)
ss_tot = np.sum((data_y - np.mean(data_y))**2)
r2 = 1 - (ss_res/ss_tot)

# Plot again to see what the fitted curve looks like
import matplotlib.pyplot as plt
import numpy as np
import random
plt.scatter(data_x, data_y, color = 'r', alpha = 0.5, marker = '.')
fit_x = [min(data_x)-2, max(data_x)+2]
fit_y = [lin_func(x, popt[0], popt[1]) for x in fit_x]
plt.plot(fit_x, fit_y, color = 'b', label = r'$\mathrm{y = %.2fx + %.2f\ (R^2 = %.0f\%%)}$' % (popt[0], popt[1],r2*100))
plt.xlim(0, 10)
plt.ylim(0, 60)
plt.xlabel('x', fontsize = 14)
plt.ylabel('y', fontsize = 14)
plt.grid()
plt.legend(frameon = False, loc = 'upper left', fontsize = 12)
plt.show()
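`curve_fit` is not limited to linear models, and the covariance matrix `pcov` it returns gives an estimate of the parameter uncertainties (the square roots of its diagonal). A sketch with an illustrative exponential-decay model and synthetic, seeded noisy data; for non-linear models, a reasonable initial guess `p0` often matters for convergence:

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(x, a, b, c):
    # exponential decay model y = a * exp(-b x) + c
    return a * np.exp(-b * x) + c

rng = np.random.default_rng(0)          # seeded so the synthetic data are reproducible
x_dec = np.linspace(0, 4, 50)
y_dec = decay(x_dec, 2.5, 1.3, 0.5) + rng.normal(0, 0.02, x_dec.size)

popt_dec, pcov_dec = curve_fit(decay, x_dec, y_dec, p0=(1.0, 1.0, 0.0))
perr_dec = np.sqrt(np.diag(pcov_dec))   # one-standard-deviation uncertainties
for name, value, err in zip(['a', 'b', 'c'], popt_dec, perr_dec):
    print(f'{name} = {value:.3f} +/- {err:.3f}')
```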

[Back to top](#back_to_top)