# MS 141 Lecture 3
# Scientific Computing Packages for Python

In [None]:
# embed plots within the notebook
%matplotlib inline
import matplotlib.pyplot as plt

## Introduction

[SciPy](https://scipy.org/) is a Python-based suite of open-source software for scientific computing, widely used in science and engineering.
Its core packages include: 
* [NumPy](http://www.numpy.org/), the core Python package for scientific computing. It provides efficient N-dimensional arrays and matrices, along with fast mathematical functions that operate on them.
* [Matplotlib](https://matplotlib.org/), a Python library for plotting, which produces publication quality figures. It can be used both in Python scripts and Jupyter notebooks.
* [SciPy](https://docs.scipy.org/doc/scipy/reference/tutorial/index.html) library for higher-level scientific computing algorithms. It builds on top of the lower-level NumPy for multidimensional arrays, and can carry out tasks such as integration, optimization, interpolation, linear algebra, differential equations, and data analysis.

Other interesting and widely used packages are `SymPy` for symbolic manipulation and `pandas` for data analysis.

# NumPy
The main features of NumPy are N-dimensional arrays and vectorized operations and functions.<br>
NumPy is designed for fast handling of numerical arrays. A NumPy array is described by metadata (dimensions, shape, data type, ...) and the actual data. The data is stored in a homogeneous and contiguous block of memory. This is the main difference between an array and a pure Python container such as a list, which stores references to Python objects that can be scattered across system memory.<br> 

Contiguous memory storage is the critical feature that makes NumPy arrays so efficient. Memory locality results in significant performance gains, because the data is loaded efficiently into the CPU cache and processed with vectorized operations. Additionally, NumPy is linked to highly optimized implementations of the BLAS and LAPACK linear algebra libraries, such as OpenBLAS, ATLAS, or the Intel Math Kernel Library (MKL). Some matrix computations can also be multithreaded, taking advantage of modern multicore processors. This is why it's essential to use NumPy for scientific computing with Python.<br>
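To see the contiguous layout concretely, here is a minimal sketch (the array and its values are arbitrary choices for illustration) that inspects the `itemsize`, `nbytes`, `strides`, and `flags` attributes of an array. The strides give the number of bytes to step in memory to move by one position along each axis.

```python
import numpy as np

# a 2 x 3 array of 64-bit integers (8 bytes per element)
a = np.arange(6, dtype=np.int64).reshape(2, 3)

print(a.itemsize)               # 8: bytes per element
print(a.nbytes)                 # 48: total bytes in the data block = size * itemsize
print(a.strides)                # (24, 8): bytes to jump to the next row, next column
print(a.flags['C_CONTIGUOUS'])  # True: one contiguous block in row-major order

# the transpose is a view of the same memory with the strides swapped,
# so it is no longer contiguous in row-major order
print(a.T.strides)               # (8, 24)
print(a.T.flags['C_CONTIGUOUS']) # False
```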

To start using NumPy, we import it under the standard alias `np`:

In [None]:
import numpy as np

## Generating arrays

The basic object in NumPy is the `ndarray`. A one-dimensional (1D) `ndarray` is similar to a vector and a 2D array to a matrix.<br> 
Higher-dimensional arrays, which can be thought of as tensors, can also be generated and manipulated.<br>
The function `numpy.array` creates a NumPy array from a sequence type such as a list or tuple.

In [None]:
x = np.array([0,2,1])
print (x)

A NumPy array is different from a Python list. When printed, a list separates its elements with commas, while an array separates them with whitespace:

In [None]:
x = np.array([0,2,1])
print (x)
print (type(x))
print()

x_list = [0,2,1]
print (x_list)
print (type(x_list))

NumPy 1D arrays behave more like mathematical vectors than lists do:

In [None]:
# try adding together two arrays vs. two lists
a = np.array([1,2,3])  
b = np.array([2,3,4])

a_list = [1,2,3]
b_list = [2,3,4]

c = a + b #adds numpy arrays like vectors
print ('c = ',c)

c_list = a_list + b_list # does not add - it concatenates lists
print ('c_list =', c_list)

We can also create a 2D array (basically, a matrix) or higher-dimensional arrays using nested lists:

In [None]:
M = np.array([[1,0,1],[0,1,0]])
print(M)

The most commonly used functions for creating NumPy arrays are `numpy.linspace` and `numpy.arange`.<br>
In general, `numpy.linspace` works best when we know the number of points we want in the array,<br> 
and `numpy.arange` when we know the step size between values in the array. Let's create a few arrays using these two methods.

A 1D NumPy array with 11 equally spaced values from 0 to 10

In [None]:
a = np.linspace(0,10,11) # initial, final, number of values (= num. intervals + 1)
print (a)

A 1D NumPy array with values from 0 to 10 (excluded) with increments of 2.5

In [None]:
a = np.arange(0,10,2.5) #initial, final, increment
print (a)

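Since `arange` excludes the stop value, to include 10 (the right limit) we can extend the stop value slightly past it. Note that with non-integer steps, floating-point round-off can make the number of elements hard to predict; the NumPy documentation recommends `numpy.linspace` in such cases.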

In [None]:
a = np.arange(0,10.1,2.5) #initial, final, increment
print (a)
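A minimal sketch of the `linspace` alternative, assuming the step divides the interval evenly: compute the number of points from the step, and let `numpy.linspace` place the endpoint exactly. Padding the `arange` stop by half a step, rather than by an arbitrary 0.1, is one common workaround when `arange` must be used.

```python
import numpy as np

start, stop, step = 0.0, 10.0, 2.5

# number of points needed to reach `stop` exactly, endpoint included
n = int(round((stop - start) / step)) + 1

a_lin = np.linspace(start, stop, n)             # endpoint included by construction
a_ar = np.arange(start, stop + step / 2, step)  # arange with a padded stop value

print(n)      # 5
print(a_lin)
print(a_ar)
```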

In [None]:
# compare the time to add 1 to every element
# of a NumPy array vs. a Python list (using %timeit)
a = np.arange(10000)
%timeit a+1

l = list(range(10000))  # an actual list; range() alone returns a lazy range object
%timeit [i+1 for i in l]

Other commonly used functions can create arrays of zeros, ones, random numbers, or the identity matrix.

In [None]:
# 1D array of zeros
z1 = np.zeros(10) # 10 zeros
print (z1,'\n')

# 2D array of zeros
z2 = np.zeros((3,2)) # 3 rows, 2 columns
print (z2)

In [None]:
# 1D array of ones
o1 = np.ones(12)  #12 ones
print (o1,'\n')

# 2D array of ones
o2 = np.ones((3,4)) # 3 rows, 4 columns
print (o2)

In [None]:
# 1D array of random numbers
# from a uniform distribution in [0,1]
ru = np.random.rand(10)
print ('ru =', ru, '\n')

# 1D array of random numbers
# from the standard normal distribution
rn = np.random.randn(10)
print ('rn =', rn, '\n')

# 2D array of random numbers
r2 = np.random.rand(10,10)
print ('r2 =', r2)
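The functions `np.random.rand` and `np.random.randn` use NumPy's legacy global random state. Since NumPy 1.17, the recommended interface is a `Generator` object created with `numpy.random.default_rng`. A brief sketch (the seed 42 is an arbitrary choice):

```python
import numpy as np

# a Generator with a fixed seed gives reproducible random numbers
rng = np.random.default_rng(seed=42)

ru = rng.random(10)            # uniform on [0, 1), like np.random.rand(10)
rn = rng.standard_normal(10)   # standard normal, like np.random.randn(10)
r2 = rng.random((10, 10))      # note: the shape is passed as a tuple

print(ru)
print(r2.shape)                # (10, 10)
```

Creating a new generator with the same seed reproduces the same sequence, which is convenient for debugging.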

In [None]:
# 10 x 10 identity matrix
i = np.eye(10) # np.identity(10) is equivalent
print (i)

## Array attributes 

We can access the attributes of a NumPy array (such as `shape`, `ndim`, and `size`) using dot syntax; the same syntax is used to call methods on the array object.

In [None]:
# generate a 1D random array with 8 elements
A = np.random.randn(8)

In [None]:
# print its shape, number of dimensions, and size
print (A.shape)
print (A.ndim)
print (A.size)

In [None]:
# repeat for a 2D array
B = np.random.randn(10,10)

# print its shape, number of dimensions, and size
print (B.shape)
print (B.ndim)
print (B.size)
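These three attributes are related: `size` is the product of the entries of `shape`, and `ndim` is the number of entries. A short sketch with a non-square array (chosen so that the axes are easy to tell apart), also showing that `len()` returns only the length of the first axis:

```python
import numpy as np

C = np.zeros((4, 5))   # 4 rows, 5 columns

print(C.shape)                      # (4, 5)
print(np.prod(C.shape) == C.size)   # True: 4 * 5 = 20
print(len(C.shape) == C.ndim)       # True: two axes
print(len(C))                       # 4: length of the first axis only
```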

We can also check the data type the array contains using `dtype`

In [None]:
# an array of integers
M1 = np.array([[1,2,3],[4,5,6]])
print (M1.dtype)

In [None]:
# an array of floats: a single float entry makes the whole array float64
M2 = np.array([[1.,2,3],[4,5,6]])
print (M2.dtype)

We can specify the data type of the array when generating it:

In [None]:
# generate an array of complex numbers
c = np.array([[1,2],[3,4]],dtype=np.complex128) # the old alias np.complex no longer exists in recent NumPy
print (c,'\n')
print (c.dtype)

Many NumPy array-creation functions, such as `linspace`, `zeros`, and `ones`, produce arrays of dtype float64 (double-precision floats) by default:

In [None]:
a = np.linspace(0,10,101)
print (a.dtype)
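An existing array can also be converted to another data type with the `astype` method, which returns a new array. A short sketch (the values are arbitrary) showing that float-to-integer conversion truncates toward zero instead of rounding:

```python
import numpy as np

a = np.array([1.7, -1.7, 2.5])
print(a.dtype)             # float64

# float -> int conversion truncates toward zero (it does not round)
b = a.astype(np.int64)
print(b)                   # [ 1 -1  2]

# to round first, call np.round before converting;
# np.round sends exact halves to the nearest even value, so 2.5 -> 2
c = np.round(a).astype(np.int64)
print(c)                   # [ 2 -2  2]
```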

## Manipulating arrays

Indexing and slicing NumPy arrays works like indexing Python lists, with one index (or slice) per dimension:

In [None]:
A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])  # 3 x 4 matrix
print (A)

In [None]:
# Access the entry at row index 1 and column index 2 
A[1,2] # from now on, print() is often omitted: Jupyter displays the value of a cell's last expression

In [None]:
# Access the row at index 2
A[2,:] # equivalent to A[2, 0:4]

In [None]:
# Access the column at index 1
A[:,1] # equivalent to A[0:3, 1]

Arithmetic operations are performed elementwise on NumPy arrays:

In [None]:
A * 2

In [None]:
A / 2

In [None]:
A ** 2  # does not mean matrix product A^2 

In [None]:
# np.ones_like generates an array of ones with the same shape and dtype as A
print (np.ones_like(A)) 
print ('')
print (A + np.ones_like(A))
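The same result can be obtained by adding the scalar `1` directly: as with `A * 2` above, NumPy applies a scalar operand to every element (a simple case of what NumPy calls broadcasting). A quick check:

```python
import numpy as np

A = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

# adding a scalar applies it to every element,
# so an explicit array of ones is not needed
B1 = A + np.ones_like(A)
B2 = A + 1
print(np.array_equal(B1, B2))   # True
```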

1D arrays behave exactly the same way

In [None]:
a = np.array([1,2,3,4,5])
print (a,'\n')
print (a[1],'\n')
print (a[2:4])

In [None]:
print (a*2)
print (a**2)
print (a/2)

We can stack arrays vertically or horizontally to make larger arrays

In [None]:
a = np.array([1,2,3])
b = np.array([4,5,6])
c = np.array([7,8,9])

# stacking vertically (each array becomes a row)
v = np.vstack((a,b,c)) # np.row_stack is an older alias, deprecated in NumPy 2.0
print (v)

In [None]:
# stacking horizontally (by column)
h = np.column_stack((a,b,c))
print (h)
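For 1D inputs, `np.column_stack` is not the same as `np.hstack`: `hstack` joins 1D arrays end to end, while `column_stack` turns each one into a column. A short sketch with the same three arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.array([7, 8, 9])

# np.hstack joins 1D arrays end to end into one longer 1D array
print(np.hstack((a, b, c)))          # [1 2 3 4 5 6 7 8 9]

# np.column_stack makes each 1D array a column of a 2D array,
# which is the transpose of stacking them as rows
h = np.column_stack((a, b, c))
print(np.array_equal(h, np.vstack((a, b, c)).T))   # True
```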

We can *reshape* arrays (change their shape) using a [row-major or column-major](https://en.wikipedia.org/wiki/Row-_and_column-major_order) ordering. You can think of reshaping as first raveling the array (using the given index order) and then inserting the elements from the raveled array into the new array using the same kind of index ordering as was used for the raveling.

In [None]:
M = np.array([[1,1,1,1],[2,2,2,2]])
print (M)

In [None]:
# reshape the array
np.reshape(M, (4, 2)) # C-like row-major index ordering

In [None]:
np.reshape(M, (4,2), order='F') # Fortran-like, column-major index ordering
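Because every row of `M` holds a single repeated value, the difference between the two orderings is easier to see with distinct values. A small sketch using `np.arange(6)`:

```python
import numpy as np

x = np.arange(6)          # [0 1 2 3 4 5]

# row-major (C): fill the new array row by row
print(x.reshape((2, 3), order='C'))
# [[0 1 2]
#  [3 4 5]]

# column-major (F): fill the new array column by column
print(x.reshape((2, 3), order='F'))
# [[0 2 4]
#  [1 3 5]]
```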

We can also *flatten* arrays, that is, reduce multi-dimensional arrays to 1D arrays

In [None]:
print (M.flatten()) # np.ravel(M) gives the same values; pass order='F' for column-major

Arrays can be copied using `copy`, or one can instead create a `view` of the array. A view shares its data buffer with the original array, while a copy (also known as a "deep copy") is a separate array with its own memory.<br> 
A copy takes longer to create than a view. Modifying a view modifies the original array, whereas modifying a copy does not. For example, basic slices of NumPy arrays are views (unlike slices of Python lists, which are copies); `M.flatten()` always returns a copy, while `np.ravel(M)` returns a view whenever possible.<br>

A good article explaining views vs. copies can be found [here](https://www.jessicayung.com/numpy-views-vs-copies-avoiding-costly-mistakes/). The takeaway message is that whenever you want to edit a copy of the data but not the original, use np.copy(). This is the safest way to ensure you actually make a copy. Otherwise a view is fine and saves time and memory.

In [None]:
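# Creating a copy of an array:
a = np.arange(15)
b = np.copy(a) # an actual copy with its own memory
print (b is a)                 # False: b is a different object
print (np.shares_memory(a, b)) # False: the data is not shared

b[2] = 7  # modify the copy
print (a) # a is unchanged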

In [None]:
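# Creating a view of an array
a = np.arange(15)
b = a.view() # share the same data
print (b is a)                 # False: b is a new array object...
print (np.shares_memory(a, b)) # True: ...but it shares a's data
print (b.base is a)            # True: b's data belongs to a

b[2] = 7 # assign 7 to the element at index 2 of b
print (a) # the element at index 2 in a also changes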
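Slicing is where views most often cause surprises. A short sketch contrasting a NumPy slice (a view), an explicit copy of the slice, and a Python list slice (a copy):

```python
import numpy as np

a = np.arange(10)

s = a[2:5]             # basic slicing returns a view
s[0] = 100             # this also modifies a[2]
print(a[2])            # 100

t = a[2:5].copy()      # an explicit copy is independent
t[0] = -1
print(a[2])            # still 100

# contrast: slicing a Python list makes a copy
lst = list(range(10))
sub = lst[2:5]
sub[0] = 100
print(lst[2])          # 2
```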

Lastly, we can compute the sum, min / max, mean, standard deviation, etc. of the values in a NumPy array:

In [None]:
a = np.array([[1,2,3],[4,5,6]])
print (a)

print ('sum =', np.sum(a)) # a.sum() is the same
print ('min =', np.min(a))
print ('max =', np.max(a))
print ('mean =', np.mean(a))
print ('std =', np.std(a))
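These functions also accept an `axis` argument, which computes the result along one dimension instead of over the whole array. A sketch with the same 2 x 3 array: `axis=0` collapses the rows (one result per column), and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])

print(np.sum(a, axis=0))    # column sums: [5 7 9]
print(np.sum(a, axis=1))    # row sums: [ 6 15]
print(np.mean(a, axis=0))   # column means: [2.5 3.5 4.5]
print(a.max(axis=1))        # row maxima: [3 6]
```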

Another useful concept, which you may want to explore on your own, is [creating masks](https://docs.scipy.org/doc/numpy-1.15.0/reference/maskedarray.generic.html) on arrays using NumPy.
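As a starting point, here is a minimal sketch of two related tools: a plain boolean mask used for indexing, and a masked array from `numpy.ma` (the subject of the linked page). The data and the sentinel value -999 marking a "missing" measurement are hypothetical choices for illustration.

```python
import numpy as np

# hypothetical data in which -999 marks a missing measurement
x = np.array([1.0, -999.0, 3.0, 4.0])

# boolean mask: True where the value is valid; indexing keeps only those entries
valid = x != -999.0
print(x[valid])                 # [1. 3. 4.]
print(x[valid].mean())          # mean of the valid values only

# masked array: keeps the original shape but ignores masked entries in computations
mx = np.ma.masked_where(x == -999.0, x)
print(mx)                       # the masked entry is shown as --
print(mx.mean())                # same mean as above
```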

## Matrix operations

NumPy can compute products of vectors or matrices following the usual rules of linear algebra.<br>
We will only discuss a few examples; see [numpy.linalg](https://docs.scipy.org/doc/numpy/reference/routines.linalg.html) for additional information.<br>
The `@` operator carries out matrix multiplication with NumPy arrays:

In [None]:
M = np.array([[0,1],[1,0]])
print (M)

In [None]:
# matrix product
M @ M 

The equivalent function for multiplying two matrices `A` and `B` is `np.matmul(A, B)`:

In [None]:
np.matmul(M,M)

In [None]:
# this is different from M * M, which multiplies elementwise
print (M*M)
print ('')
print (2*M)

We can also compute the transpose and trace (for the inverse and determinant, see `numpy.linalg` or SciPy's `scipy.linalg`):

In [None]:
A = np.array([[1,2,3],[0,0,0],[3,2,1]])
print (A)

In [None]:
# transpose
print (A.T)

In [None]:
# trace = sum of diagonal elements
np.trace(A)
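NumPy itself provides the determinant and inverse as `np.linalg.det` and `np.linalg.inv`. A short sketch: the matrix `A` above has a row of zeros, so it is singular (its determinant is 0 and it has no inverse), and a second, nonsingular 2 x 2 matrix (an arbitrary choice) illustrates the inverse.

```python
import numpy as np

A = np.array([[1, 2, 3], [0, 0, 0], [3, 2, 1]])
# A has a row of zeros, so det(A) = 0; np.linalg.inv(A) would raise LinAlgError
print(np.linalg.det(A))

# a nonsingular example
B = np.array([[2.0, 1.0], [1.0, 3.0]])
print(np.linalg.det(B))                    # 5.0, up to round-off
B_inv = np.linalg.inv(B)
print(B_inv)
print(np.allclose(B @ B_inv, np.eye(2)))   # True: B times its inverse is the identity
```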

Similar operations are available for 1D arrays (vectors)

In [None]:
a = np.array([1,2,2])
b = np.array([0,1,0])

In [None]:
# dot product a.b
np.dot(a,b)

In [None]:
# norm of a vector
np.linalg.norm(a)
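For 1D arrays, the `@` operator also computes the dot product, and the Euclidean norm is the square root of the dot product of a vector with itself. A quick check with the same vectors:

```python
import numpy as np

a = np.array([1, 2, 2])
b = np.array([0, 1, 0])

# for 1D arrays, a @ b is the same dot product as np.dot(a, b)
print(a @ b)                  # 2

# the Euclidean norm is sqrt(a . a) = sqrt(1 + 4 + 4)
print(np.sqrt(a @ a))         # 3.0
print(np.linalg.norm(a))      # 3.0
```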

It is important to take advantage of NumPy's fast matrix / vector operations when developing algorithms.

## Functions

All commonly used mathematical functions are available in NumPy. 

In [None]:
np.cos(0)

In [None]:
np.exp(1)

In [None]:
np.log(2)

The constants $\pi$ and $e$ are also available:

In [None]:
np.pi

In [None]:
np.e

Because NumPy's mathematical functions act elementwise on arrays, they provide a convenient way to **vectorize** computations.<br>
A function applied to an array is applied to every element, without an explicit loop:

In [None]:
x = np.array([0,np.pi/2,np.pi]) # x_n an array of length 3
y = np.cos(x) # an array of length 3, cos(x_n)
print(y) # cos(pi/2) prints as ~6e-17 rather than 0 because of floating-point round-off
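User-defined functions written with NumPy operations are vectorized in the same way. A short sketch (the function `f` is an arbitrary example) comparing one call on a whole array with an explicit Python loop over the elements using the scalar `math` module:

```python
import math
import numpy as np

def f(x):
    """Works on scalars and on arrays, because np.exp and np.cos act elementwise."""
    return np.exp(-x**2) * np.cos(x)

x = np.linspace(0, 2, 5)

y_vec = f(x)                                               # one call on the whole array
y_loop = [math.exp(-xi**2) * math.cos(xi) for xi in x]     # explicit Python loop

print(np.allclose(y_vec, y_loop))   # True
```

The vectorized version is shorter and, for large arrays, much faster, for the reasons discussed at the start of this section.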

## File I/O

NumPy provides the functions [`savetxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html) to write an array to a text file and [`loadtxt`](https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) to read one back.<br> 
Both work with plain-text data arranged in columns, a very common format in scientific computing; with `unpack=True`, `loadtxt` returns each column as a separate array.

In [None]:
import numpy as np

# files for storing the data
file1 = 'sine.dat'
file2 = 'cosine.dat'

# number of points, including the end points
npts = 101 

t1 = np.linspace(0,2*np.pi,npts)  # 101 points = 100 equal intervals on [0, 2pi]
y1 = np.sin(t1)
y2 = np.cos(t1)

# write data into two columns, format data as floating point values
np.savetxt(file1, np.transpose([t1,y1]), fmt='%9.6f') 
np.savetxt(file2, np.transpose([t1,y2]), fmt='%9.6f')

# you can check these two files in the directory in which you are running this notebook
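`savetxt` can also write a header line that describes the columns; it is prefixed with `# `, and `loadtxt` skips lines starting with `#` by default. A minimal round-trip sketch, assuming a temporary directory is acceptable (the file name `sine_with_header.dat` is hypothetical), that also checks the values survive the `%9.6f` formatting to within about 1e-6:

```python
import os
import tempfile
import numpy as np

t = np.linspace(0, 2*np.pi, 11)
y = np.sin(t)

# write to a temporary directory so the notebook folder is not cluttered
path = os.path.join(tempfile.mkdtemp(), 'sine_with_header.dat')

# np.column_stack builds the same two-column layout as np.transpose([t, y])
np.savetxt(path, np.column_stack((t, y)), fmt='%9.6f', header='t sin(t)')

# the header line starts with '#', so loadtxt skips it
t_read, y_read = np.loadtxt(path, unpack=True)

print(np.allclose(t, t_read, atol=1e-6), np.allclose(y, y_read, atol=1e-6))
```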

In [None]:
# Read the files we wrote in the previous cell and plot the two functions (more on this next time)
import numpy as np
import matplotlib.pyplot as plt

# files we want to read
file1 = 'sine.dat'
file2 = 'cosine.dat'

# read data from files
x1, y1 = np.loadtxt(fname=file1, usecols=(0,1), dtype=np.float64, unpack=True)
x2, y2 = np.loadtxt(fname=file2, usecols=(0,1), dtype=np.float64, unpack=True)

# an alternative to the above is:
# a1 = np.loadtxt(file1) #the file split into columns
# x1 = a1[:,0]  # the first column
# y1 = a1[:,1]  # the second column
#
# a2 = np.loadtxt(file2)
# x2 = a2[:,0]
# y2 = a2[:,1]

# plot data (more on this later)
plt.plot(x1,y1,'r+-',label='sine')
plt.plot(x2,y2,'bx-',label='cosine')

plt.legend()
plt.show()