# NumPy
#### *Basic Features and Advantages over Python Lists*


## Introduction

NumPy is the fundamental library for scientific computing in the Python programming language. It provides efficient operations on multi-dimensional arrays of homogeneous data. Python was not originally designed for numeric computing, but it attracted the attention of the scientific and numeric community early on, which led to the implementation of *Numeric* around 1995. *Numarray* was later written as a replacement for *Numeric*; it was faster than *Numeric* for large arrays but slower for small ones. In 2005, Travis Oliphant created *NumPy* by incorporating features of both (now deprecated) packages, so that a single package offered the features of *Numeric* and *Numarray*.

This spotlight is divided into the following sections: 
1. **Basic features of NumPy** (built-in functions) and 
2. A discussion of the **advantages of NumPy over Python's built-in list data structure**

## 1. Basic Features of NumPy

Following are some of the basic features provided by NumPy:

    i. a powerful N-dimensional array object
    ii. sophisticated (broadcasting) functions
    iii. tools for integrating C, C++ and Fortran code
    iv. useful linear algebra, Fourier transform, and random number capabilities, etc.

### 1 (i) : The N-dimensional array (ndarray)

An ndarray is a multidimensional container of items of the same type and size. The number of dimensions and the number of items along each dimension are defined by its shape. The contents of an ndarray can be accessed and modified by indexing or slicing the array and by using the methods and attributes of the ndarray.

An ndarray object has many methods which operate on or with the array in some fashion, typically returning an array result. 
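
The attributes and indexing described above can be sketched as follows. The array `grid` and its values are made up purely for illustration.

```python
import numpy as np

# Hypothetical 2 x 3 array used only for illustration
grid = np.array([[10, 20, 30],
                 [40, 50, 60]])

print(grid.shape)   # (2, 3): two rows, three columns
print(grid.ndim)    # 2
print(grid.size)    # 6

# Access a single item by index
print(grid[1, 2])   # 60

# Modify items in place by index and by slice
grid[0, 0] = -1
grid[:, 1] = 0      # set the whole second column to 0
print(grid)
```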

#### 1 (i) a : Array Conversion

Two ndarray methods that aid in array conversion are explained here:

    -> tolist(): this method takes an n-dimensional array and returns a nested list, with each row as a list
    -> fill(value): this method replaces all the elements of the array with the given scalar value
    
You may find the examples below:

In [26]:
import numpy as np
# tolist()
a = np.array([[1,2],[3,4],[5,6]]) # a 2-dimensional array with 3 rows and 2 columns
print('input')
print(a)
print('tolist')
print(a.tolist()) # nested list containing 3 lists

print()

#fill (value)
a = np.array([[1,2],[3,4],[5,6]])
print('input')
print(a)
a.fill(1) # replaces all the elements with 1
print('fill')
print(a)

input
[[1 2]
 [3 4]
 [5 6]]
tolist
[[1, 2], [3, 4], [5, 6]]

input
[[1 2]
 [3 4]
 [5 6]]
fill
[[1 1]
 [1 1]
 [1 1]]
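
As a small aside on `tolist()`, the sketch below contrasts it with the built-in `list()` call. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

nested = a.tolist()   # nested Python lists holding plain Python ints
outer = list(a)       # a Python list whose items are still NumPy arrays

print(type(nested[0]), type(nested[0][0]))
print(type(outer[0]))
```

If plain Python objects are needed (for example, before serializing the data), `tolist()` is usually the more suitable of the two.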


#### 1 (i) b : Shape Manipulation

For reshape, resize, and transpose, the single tuple argument may be replaced with n integers, which will be interpreted as an n-tuple:

    -> reshape(shape[, order]): this method returns an array containing the same data as the input array with a new shape
    -> transpose(*axes): this method returns a view of the input array with axes transposed.

In [46]:
# reshape(shape[, order])
a = np.array([[1,2],[3,4],[5,6]])
print('input')
print(a)
b = a.reshape(2,3) # reshape returns a new 2x3 array; a itself keeps its 3x2 shape
print('reshape')
print(b)

print()

# transpose(*axes)
a = np.array([[1,2],[3,4],[5,6]])
print('input')
print(a)
print('transpose')
print(a.transpose(1,0))

input
[[1 2]
 [3 4]
 [5 6]]
reshape
[[1 2 3]
 [4 5 6]]

input
[[1 2]
 [3 4]
 [5 6]]
transpose
[[1 3 5]
 [2 4 6]]
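
Because `transpose` returns a view, changes made through the result are visible in the original array. The sketch below illustrates this with `np.shares_memory`. It also assumes a freshly created, contiguous input array, for which `reshape` returns a view as well.

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

t = a.transpose()      # view with the axes swapped, shape (2, 3)
r = a.reshape(2, 3)    # new array object; for this contiguous input it is also a view

print(np.shares_memory(a, t))  # True
print(np.shares_memory(a, r))  # True

# Writing through the view changes the original array
t[0, 2] = 99
print(a[2, 0])   # 99

# An explicit copy is independent of the original
c = a.copy()
c[0, 0] = -1
print(a[0, 0])   # 1
```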


#### 1 (i) c : Calculation

NumPy provides several functions for fast calculations on arrays, such as *matmul* for matrix multiplication, which returns the matrix product of two arrays.

In [28]:
# matmul
a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])
np.matmul(a,b) # product of a and b

array([[19, 22],
       [43, 50]])
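
As a brief sketch using the same two matrices, the `@` operator gives the same result as `np.matmul`, whereas `*` multiplies element by element:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(a @ b)   # matrix product, same result as np.matmul(a, b)
print(a * b)   # element-wise product, a different operation
```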

Only a few of the ndarray methods are covered here. A detailed list is available at https://docs.scipy.org/doc/numpy/reference/arrays.ndarray.html

### 1 (ii) : Sophisticated Broadcasting Functions

The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations. Subject to certain constraints, the smaller array is “broadcast” across the larger array so that they have compatible shapes. Broadcasting provides a means of vectorizing array operations so that looping occurs in C instead of Python. It does this without making needless copies of data and usually leads to efficient algorithm implementations. There are, however, cases where broadcasting is a bad idea because it leads to inefficient use of memory that slows computation.

NumPy operations are usually done on pairs of arrays on an element-by-element basis. Two simple cases are shown below:

    i. The two arrays have exactly the same shape
    ii. An array and a scalar are combined

In [86]:
# Case I
a = np.array([[1,2],[3,4],[5,6]])
b = np.array([[7,8],[9,10],[11,12]])
print('Case I (arrays of same size):')
print(a*b)

print()

# Case II
a = np.array([[1,2],[3,4],[5,6]])
print('Case II (array and scalar combined):')
print(a*2.0)

Case I (arrays of same size):
[[ 7 16]
 [27 40]
 [55 72]]

Case II (array and scalar combined):
[[ 2.  4.]
 [ 6.  8.]
 [10. 12.]]


**General Broadcasting Rules:** When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing dimensions, and works its way forward. Two dimensions are compatible when

    i. they are equal, or
    ii. one of them is 1

If these conditions are not met, a *ValueError: operands could not be broadcast together* exception is raised, indicating that the arrays have incompatible shapes. The size of the resulting array is the maximum size along each dimension of the input arrays.
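
A minimal sketch of the compatibility check, with shapes chosen only for illustration. It assumes NumPy 1.20 or later for `np.broadcast_shapes`.

```python
import numpy as np

a = np.ones((3, 4))
b = np.ones((3,))     # trailing dimensions 4 and 3 differ, and neither is 1

error_message = None
try:
    a + b
except ValueError as err:
    error_message = str(err)
print('incompatible shapes:', error_message)

# np.broadcast_shapes reports the result shape without building any arrays
print(np.broadcast_shapes((4, 1), (5,)))   # (4, 5)
```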

Arrays do not need to have the same number of dimensions. For example, if we have a 256x256x3 array of RGB values, and we want to scale each color in the image by a different value, we can multiply the image by a one-dimensional array with 3 values.

When either of the dimensions compared is one, the other is used. In other words, dimensions with size 1 are stretched or “copied” to match the other.
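
The RGB case described above might look like the sketch below. The `image` array is a stand-in filled with ones and the scale factors are arbitrary, so both are assumptions made for illustration.

```python
import numpy as np

# Hypothetical stand-in for an image: 256 x 256 pixels, 3 color channels, all ones
image = np.ones((256, 256, 3))

# One scale factor per channel (values chosen arbitrarily)
scale = np.array([0.5, 1.0, 2.0])

# Shapes (256, 256, 3) and (3,) are compared from the trailing dimension:
# 3 matches 3, and the missing leading dimensions of `scale` are treated as 1
scaled = image * scale

print(scaled.shape)   # (256, 256, 3)
print(scaled[0, 0])   # the three scale factors applied to the first pixel
```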

An example of broadcasting in practice:

In [34]:
x = np.arange(4)
xx = x.reshape(4,1)
y = np.ones(5)
z = np.ones((3,4))

x.shape # returns (4,)
y.shape # returns (5,)
#x + y  raises ValueError: operands could not be broadcast together with shapes (4,) (5,)

xx.shape # returns (4,1)
y.shape # returns (5,)
print((xx+y).shape) # returns (4,5)
print(xx + y) # does not return error

print()

x.shape # returns (4,)
z.shape # returns (3,4)
print((x+z).shape) # returns (3,4)
print(x + z) # does not return error

(4, 5)
[[1. 1. 1. 1. 1.]
 [2. 2. 2. 2. 2.]
 [3. 3. 3. 3. 3.]
 [4. 4. 4. 4. 4.]]

(3, 4)
[[1. 2. 3. 4.]
 [1. 2. 3. 4.]
 [1. 2. 3. 4.]]


### 1 (iii) Tools for integrating C/C++ and Fortran Code

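NumPy works with several tools for integrating Fortran, C and C++ code in Python: f2py (shipped with NumPy) wraps Fortran routines, Cython (a separate project) compiles Python-like code into C extensions, and ctypes (part of the Python standard library, complemented by `numpy.ctypeslib`) loads compiled shared libraries. For more details, see the "Using Python as glue" chapter of the NumPy user guide.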

### 1 (iv) Useful linear algebra, Fourier transform, and random number capabilities

The **linalg** module provided by the NumPy package offers fast computation of several linear algebra quantities, such as:

    i. Calculating the magnitude of a vector
    ii. Determining the inverse of a matrix
    iii. Determining the eigenvalues of a matrix, etc.

In [111]:
# calculating magnitude of vectors
import math
v = np.array([4,6])
vMag = math.sqrt(v[0]**2 + v[1]**2)
print ('Math',vMag)

vMag = np.linalg.norm(v)
print ('NumPy',vMag)

print()

# determining the inverse of a matrix
B = np.array([[6,5],
              [1,10]])
print('inverse')
print(np.linalg.inv(B))

print()

# determining the eigenvalues and the matrix of eigenvectors
A = np.array([[2,1],
              [5,3]])
eVals, eVecs = np.linalg.eig(A)
print('eigen values',eVals)
print('matrix comprising of eigen vectors')
print(eVecs)

Math 7.211102550927978
NumPy 7.211102550927978

inverse
[[ 0.18181818 -0.09090909]
 [-0.01818182  0.10909091]]

eigen values [0.20871215 4.79128785]
matrix comprising of eigen vectors
[[-0.48744474 -0.33726692]
 [ 0.87315384 -0.94140906]]
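
The results above can be sanity-checked with a short sketch like the one below, which reuses the same matrices and relies on `np.allclose` to absorb floating-point rounding.

```python
import numpy as np

A = np.array([[2, 1],
              [5, 3]])
B = np.array([[6, 5],
              [1, 10]])

# A matrix times its inverse should give the identity (up to rounding)
inverse_ok = np.allclose(B @ np.linalg.inv(B), np.eye(2))
print(inverse_ok)   # True

# Each column of e_vecs is an eigenvector: A @ v equals the eigenvalue times v
e_vals, e_vecs = np.linalg.eig(A)
eigen_ok = [np.allclose(A @ e_vecs[:, k], e_vals[k] * e_vecs[:, k])
            for k in range(len(e_vals))]
print(eigen_ok)     # [True, True]
```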


The **fft** module in NumPy computes the discrete Fourier transform using the fast Fourier transform (FFT) algorithm, so the transform does not have to be implemented by hand. For more details: https://docs.scipy.org/doc/numpy/reference/routines.fft.html
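
A minimal sketch of the fft module in use. The test signal (a 5 Hz sine wave sampled at 100 Hz for one second) is an assumption chosen so that the expected peak is easy to verify.

```python
import numpy as np

# Hypothetical test signal: a 5 Hz sine wave sampled at 100 Hz for one second
sample_rate = 100
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 5 * t)

spectrum = np.fft.rfft(signal)                           # FFT of a real-valued signal
freqs = np.fft.rfftfreq(len(signal), d=1 / sample_rate)  # frequency of each bin in Hz

# The strongest frequency component should be the 5 Hz sine
peak = freqs[np.argmax(np.abs(spectrum))]
print(peak)   # 5.0
```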

## 2. Advantages of NumPy over Python Lists

NumPy has the following advantages over Python's built-in list data structure:

    i. Efficient Memory Utilization
    ii. Faster Computations
    iii. Convenient to use
    iv. Element-wise operation
    v. Better slicing functionality
        
 Let us go through them in detail. 

### 2 (i). Efficient Memory Utilization

NumPy arrays consume significantly less memory than lists. In addition, NumPy provides a mechanism for specifying the data type of the elements, which allows further optimisation of the code. 

For example, consider the following case, which shows the difference in the memory consumed by a list and a NumPy array, each consisting of six elements. The exact byte counts depend on the platform; in the output below, each array element is a 4-byte integer.

In [109]:
import sys

py_list = [1,2,3,4,5,6]
np_array = np.array([1,2,3,4,5,6])

size_py_list = sys.getsizeof(1) * len(py_list) # memory space utilized by Python List
size_np_array = np_array.itemsize * np_array.size # memory space utilized by NumPy array

print(size_py_list) 
print(size_np_array)

168
24


Efficient memory utilization is not restricted to integers; the same can be observed when storing floating point numbers and string literals.

In [64]:
py_list = [1.1,2.2,3.3,4.4]
np_array = np.array([1.1,2.2,3.3,4.4])

size_py_list = sys.getsizeof(1.1) * len(py_list)
size_np_array = np_array.itemsize * np_array.size

print('Floating point list memory consumed',size_py_list)
print('Floating point numpy array memory consumed',size_np_array)

py_string_list = ['CSCE', '670']
np_string = np.array(['CSCE','670'])

# sys.getsizeof results depend on the Python version and build
size_py_list = sys.getsizeof('CSCE') * len(py_string_list)
size_np_string = np_string.itemsize * np_string.size

print('String list memory consumed',size_py_list)
print('String numpy array memory consumed',size_np_string)

Floating point list memory consumed 96
Floating point numpy array memory consumed 32
String list memory consumed 106
String numpy array memory consumed 32


**Therefore, NumPy arrays consume much less memory than Lists.**
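
The estimates above multiply the size of one element by the number of elements. A slightly fuller accounting, sketched below as a rough estimate, also counts the list object itself, which stores a pointer to each element. The sizes `n = 1000` and the explicit `dtype` values are assumptions chosen for illustration, and the list figure varies with the Python version.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)   # dtype given explicitly so the result is platform-independent

# A list stores pointers to separate int objects, so count both parts
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# nbytes is the size of the array's data buffer (itemsize * size)
array_bytes = np_array.nbytes
print(list_bytes, array_bytes)

# A narrower dtype shrinks the buffer further when the values fit
small = np.arange(100, dtype=np.int8)
print(small.nbytes)   # 100
```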

### 2 (ii) Faster Computation

Operations on NumPy arrays are typically much faster than the equivalent operations on Python lists. The two examples below illustrate this. The first measures the time required to create a list and a NumPy array, and the second measures the time taken by the `+` operator on two lists and on two NumPy arrays. Note that `+` concatenates lists but adds arrays element-wise, so the second comparison is only indicative.

In [83]:
import time

Size = 10000000
start = time.time()
list1 = list(range(Size))
print('Time required to create a Python List of size 10000000 is',(time.time() - start)*1000,'milliseconds')

start = time.time()
arr1 = np.arange(Size)
print('Time required to create a NumPy array of size 10000000 is',(time.time() - start)*1000,'milliseconds')

Time required to create a Python List of size 10000000 is 190.95945358276367 milliseconds
Time required to create a NumPy array of size 10000000 is 18.0814266204834 milliseconds


In [95]:
Size = 10000000 
list1 = list(range(Size)) # creating a list of size 10000000
list2 = list(range(Size)) # creating a list of size 10000000
start = time.time()
list3 = list1 + list2 # note: + concatenates the lists; it does not add them element-wise
print('Time required to add two Python Lists of sizes 10000000 is',(time.time() - start)*1000,'milliseconds')

arr1 = np.arange(Size) # creating an NumPy array of size 10000000
arr2 = np.arange(Size) # creating an NumPy array of size 10000000
start = time.time() 
arr3 = arr1 + arr2 
print('Time required to add two NumPy Arrays of sizes 10000000 is',(time.time() - start)*1000,'milliseconds')

Time required to add two Python Lists of sizes 10000000 is 342.52452850341797 milliseconds
Time required to add two NumPy Arrays of sizes 10000000 is 19.79660987854004 milliseconds


Therefore, in these measurements, computations on NumPy arrays are much faster than on lists.
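
For a like-for-like comparison of element-wise addition, a sketch along the following lines could be used. It relies on the standard-library `timeit` module and a smaller size than the cells above (an arbitrary choice to keep the run short). Timings depend on the machine, so no figures are quoted here.

```python
import timeit
import numpy as np

size = 1_000_000   # smaller than the cells above to keep the run short
list1 = list(range(size))
list2 = list(range(size))
arr1 = np.arange(size)
arr2 = np.arange(size)

runs = 5

# Element-wise addition of two lists needs an explicit Python-level loop
list_time = timeit.timeit(lambda: [x + y for x, y in zip(list1, list2)], number=runs)

# The same operation on arrays runs in compiled code
array_time = timeit.timeit(lambda: arr1 + arr2, number=runs)

print('list comprehension:', list_time / runs, 'seconds per run')
print('NumPy addition:', array_time / runs, 'seconds per run')
```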

### 2 (iii) Convenient to use

NumPy arrays are more convenient to use, given the numerous methods and attributes offered by the ndarray object. The following examples illustrate this.

In [94]:
arr = np.array([[1,2],[3,4],[5,6]]) # 3 x 2 matrix

column_sums = arr.sum(axis = 0) # sums down each column, giving one value per column (9 and 12)
print("Dimensions of the array: ", arr.ndim)
print("Size occupied by each element: ", arr.itemsize)
print("Datatype of the array: ", arr.dtype)
print("Number of elements in array: ", arr.size)
print("Shape of the array: ", arr.shape)

print("Maximum value: ", arr.max())
print("Minimum value: ", arr.min())
print("Median of the array elements: ", np.median(arr))
print("Sum of the elements: ", arr.sum())

Dimensions of the array:  2
Size occupied by each element:  4
Datatype of the array:  int32
Number of elements in array:  6
Shape of the array:  (3, 2)
Maximum value:  6
Minimum value:  1
Median of the array elements:  3.5
Sum of the elements:  21


Therefore, NumPy arrays are much more convenient to use.
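
Many of these methods also accept an `axis` argument. A brief sketch using the same 3 x 2 matrix:

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]])   # 3 x 2 matrix

col_sums = arr.sum(axis=0)     # one sum per column: 9 and 12
row_sums = arr.sum(axis=1)     # one sum per row: 3, 7 and 11
col_means = arr.mean(axis=0)   # one mean per column: 3.0 and 4.0

print(col_sums)
print(row_sums)
print(col_means)
```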

### 2 (iv) Element-wise operation

NumPy arrays support many operations that are applied element by element, whereas the corresponding operators on lists either act on the list as a whole or are not defined. The following examples illustrate this:

In [102]:
list_1 = [1,2,3,4]
list_2 = [1,2,3,4]

print(list_1+list_2) # concatenates the lists instead of adding element-wise
# print(list_1-list_2) raises TypeError
# print(list_1 * list_2) raises TypeError
# print(list_1 / list_2) raises TypeError

np_array_1 = np.array([1,2,3,4])
np_array_2 = np.array([1,2,3,4])

print(np_array_1 + np_array_2)
print(np_array_1 - np_array_2)
print(np_array_1 * np_array_2)
print(np_array_1 / np_array_2)

[1, 2, 3, 4, 1, 2, 3, 4]
[2 4 6 8]
[0 0 0 0]
[ 1  4  9 16]
[1. 1. 1. 1.]


From the above examples, we can clearly observe that element-wise operations can be done directly on NumPy arrays, which is a huge advantage over lists.
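
For comparison, here is a sketch of what the same element-wise operations require with plain lists, namely an explicit loop or comprehension:

```python
import numpy as np

list_1 = [1, 2, 3, 4]
list_2 = [1, 2, 3, 4]

# With plain lists, each element-wise operation needs an explicit loop
list_diff = [x - y for x, y in zip(list_1, list_2)]
list_prod = [x * y for x, y in zip(list_1, list_2)]
print(list_diff)   # [0, 0, 0, 0]
print(list_prod)   # [1, 4, 9, 16]

# The array expression gives the same values without the loop
same = list_prod == (np.array(list_1) * np.array(list_2)).tolist()
print(same)        # True
```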

### 2 (v) Better slicing functionality

NumPy slicing offers more functionality than regular list slicing. Consider how NumPy handles the assignment of a single value to an extended slice:

In [105]:
list_1 = list(range(10))
# list_1[::2] = 999 raises TypeError: must assign iterable to extended slice

np_array = np.arange(10)
np_array[::2] = 999

print(np_array)

[999   1 999   3 999   5 999   7 999   9]


In the above example, the intention was to assign 999 to every second element. NumPy slicing performs this assignment, whereas regular list slicing raises an error.

Additionally, NumPy can perform multi-dimensional slicing.

In [108]:
a = np.arange(16)
a = a.reshape((4,4))
print(a)
print()
print(a[:, 1]) # returns the second column
print()
print(a[1, :]) # returns the second row

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

[ 1  5  9 13]

[4 5 6 7]
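
Row and column slices can also be combined in a single expression. The sketch below selects a sub-block of the same 4 x 4 array and shows one further difference from lists: a NumPy slice is a view of the original data, while a list slice is a copy.

```python
import numpy as np

a = np.arange(16).reshape((4, 4))

# Rows 1-2 and every second column, selected in one expression
block = a[1:3, ::2]
print(block)        # rows [4, 6] and [8, 10]

# A NumPy slice is a view: writing to it changes the original array
block[0, 0] = -1
print(a[1, 0])      # -1

# A list slice is a copy: the original list is left unchanged
lst = list(range(5))
part = lst[1:3]
part[0] = -1
print(lst)          # [0, 1, 2, 3, 4]
```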


## Summary:

The following conclusions can be drawn:

    1. NumPy offers numerous resources (methods and functions) which can greatly reduce the time spent writing code
    2. NumPy offers great flexibility when handling arrays
    3. NumPy offers better functionality than Python's built-in list

## References:

    1. https://docs.scipy.org/doc/numpy-1.16.0/numpy-user-1.16.0.pdf
    2. https://towardsdatascience.com/a-hitchhiker-guide-to-python-numpy-arrays-9358de570121
    3. https://numpy.org/