## What is NumPy?

In Python, we usually use the <i>list</i> data type to store a collection of items. The Python list is similar to the concept of arrays in high-level programming languages like Java and C#. The following code snippet creates a Python list:

In [3]:
list1 = [0,1,2,3,4]

Unlike arrays, a Python list does not need to contain elements of the same type. The following example is a perfectly legal list in Python:

In [4]:
list2 = [6, 'Hello', 3.16, False, 0]

While this flexibility is convenient when a list has to hold values of several types, it becomes a disadvantage when processing large amounts of data (as is typical in machine learning and data science projects). The key problem with Python's list data type is its efficiency. To allow a list to hold items of non-uniform types, each item is stored as a separate object in its own memory location, and the list itself holds an "array" of pointers to these locations. Assuming a 32-bit build of Python (on a 64-bit build each pointer takes 8 bytes, and the objects are correspondingly larger), a Python list requires the following:

- At least 4 bytes per pointer
- At least 16 bytes for the smallest Python object: 4 bytes for a pointer, 4 bytes for the reference count, and 4 bytes for the value, which together round up to 16 bytes


Because of the way a Python list is implemented, processing the items in a large list is computationally expensive: every element is a separate object that has to be reached through a pointer. To work around this limitation, Python programmers turn to NumPy, an extension to the Python programming language that adds support for large, multidimensional arrays and matrices, along with a large library of high-level mathematical functions to operate on these arrays.

In NumPy, an array is of type ndarray (n‐dimensional array), and all elements must be of the same type. An ndarray object represents a multidimensional, homogeneous array of fixed‐size items, and it is much more efficient than Python's list. The ndarray object also provides functions that operate on an entire array at once.
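The difference in storage can be inspected directly. The following is a minimal sketch, not a benchmark: the names n, py_list, arr, and mixed are illustrative, and the dtype is set explicitly to int64 so that the numbers do not depend on the platform.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)  # dtype fixed explicitly for this sketch

# The list object stores only pointers; every int is a separate Python object
list_pointer_bytes = sys.getsizeof(py_list)
print(list_pointer_bytes)           # size varies by Python version and platform

# The ndarray stores the raw values back to back, 8 bytes per int64 element
print(arr.itemsize)                 # 8
print(arr.nbytes)                   # 8000

# All elements share one type: mixing ints and floats upcasts everything to float
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)                  # float64
```

Note that sys.getsizeof() reports only the list's own pointer storage, not the integer objects it points to, so the real footprint of the list is larger than the printed figure.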

## Creating NumPy Arrays

Before using NumPy, you first need to import the NumPy package. You may refer to the package using the alias <i>np</i> if you prefer:

In [5]:
import numpy as np

The first way to make NumPy arrays is to create them intrinsically, using the functions built right into NumPy. First, you can use the arange() function to create an evenly spaced array with a given interval:

In [6]:
a = np.arange(20) # Creates a range from 0 to 19
print(a)
print(a.shape)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
(20,)


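The preceding code creates a rank 1 (one-dimensional) array of 20 elements. To get the shape of the array, use the shape property. The shape (20,) means that a has a single axis of length 20; this is not the same as a two-dimensional 20×1 matrix, whose shape would be (20, 1).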
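The shape property is what distinguishes a rank 1 array from a two-dimensional one. A minimal sketch, using the reshape() function (covered later in this document) and the illustrative names v, col, and row, which are not part of the original example:

```python
import numpy as np

v = np.arange(20)        # rank 1: a single axis of length 20
col = v.reshape(20, 1)   # rank 2: 20 rows, 1 column
row = v.reshape(1, 20)   # rank 2: 1 row, 20 columns

print(v.ndim, v.shape)       # 1 (20,)
print(col.ndim, col.shape)   # 2 (20, 1)
print(row.ndim, row.shape)   # 2 (1, 20)
```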

You can also specify a step in the arange() function. The following code snippet inserts a step value of 2:

In [7]:
a1 = np.arange(0, 20, 2) # creates a range from 0 up to (but not including) 20, step 2
print(a1)
print(a1.shape)

[ 0  2  4  6  8 10 12 14 16 18]
(10,)


To create an array of a specific size filled with 0s, use the zeros() function:

In [8]:
a2 = np.zeros(10) # creates a one-dim array with 10 zeros
print(a2)
print(a2.shape)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
(10,)


You can also create two‐dimensional arrays using the zeros() function:

In [9]:
a3 = np.zeros((2,3)) # array of zeros with 2-rows and 3-columns
print(a3)

[[0. 0. 0.]
 [0. 0. 0.]]


In [10]:
print(a3.shape)

(2, 3)


If you want an array filled with a specific number instead of 0, use the full() function:

In [11]:
a4 = np.full((2,3), 10) # an array of rank 2 with all 10s
print(a4)
print(a4.shape)

[[10 10 10]
 [10 10 10]]
(2, 3)


Sometimes, you need to create an array that mirrors an identity matrix. In NumPy, you can do so using the eye() function, which returns a 2-D array with ones on the diagonal and zeros elsewhere:

In [12]:
a5 = np.eye(4) # 4 x 4 matrix
print(a5)
print(a5.shape)

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
(4, 4)


To create an array filled with random numbers, you can use the random() function from the numpy.random module:

In [14]:
# create a 2 x 4 array with random values in the half-open interval [0.0, 1.0)
a6 = np.random.random((2,4))
print(a6)
print(a6.shape)

[[0.94598426 0.88947641 0.26357243 0.66539915]
 [0.98211413 0.23647742 0.42673729 0.05495735]]
(2, 4)
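Random arrays differ on every run, which makes results hard to reproduce. One possible alternative, shown here only as a sketch, is NumPy's Generator interface created with np.random.default_rng(). The seed value 42 and the names rng, rng2, sample, and again are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)    # 42 is an arbitrary seed for this sketch
sample = rng.random((2, 4))        # 2 x 4 array of floats in [0.0, 1.0)
print(sample.shape)                # (2, 4)

# A second generator created with the same seed produces the same values
rng2 = np.random.default_rng(42)
again = rng2.random((2, 4))
print(np.array_equal(sample, again))   # True
```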


Another way to create a NumPy array is to create it from a Python list as follows:

In [15]:
list1 = [1,2,3,4,5,6,4]
r1 = np.array(list1) # this is a rank 1 array
print(r1)
print(r1.shape)

[1 2 3 4 5 6 4]
(7,)


## Array Indexing

Accessing elements in the array is similar to accessing elements in a Python list:

In [16]:
# to print the first element from the array r1
print(r1[0])

1


In [17]:
# to print the 2nd element from the array r1
print(r1[1])

2


The following code snippet creates another array named r2, which is two‐dimensional:

In [20]:
list2 = [6,13,14,8,9, 8, 10]
r2 = np.array([list1, list2])
print(r2)
print(r2.shape)

[[ 1  2  3  4  5  6  4]
 [ 6 13 14  8  9  8 10]]
(2, 7)
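Elements of a two-dimensional array are addressed with a row index and a column index separated by a comma. The sketch below rebuilds r2 with the same values so that it runs on its own:

```python
import numpy as np

r2 = np.array([[1, 2, 3, 4, 5, 6, 4],
               [6, 13, 14, 8, 9, 8, 10]])

print(r2[0, 0])    # row 0, column 0 -> 1
print(r2[1, 2])    # row 1, column 2 -> 14
print(r2[1])       # a single index selects a whole row -> [ 6 13 14  8  9  8 10]
print(r2[-1, -1])  # negative indices count from the end -> 10
```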


## Boolean Indexing

In addition to using indexing to access elements in an array, there is another very cool way to access elements in a NumPy array. Consider the following:

In [23]:
# r1 > 2 compares each element of r1 with 2 and returns a new array of
# Boolean values: True where the element is greater than 2, False otherwise
print(r1 > 2)

[False False  True  True  True  True  True]


In [27]:
# You can feed the Boolean array back into r1 as the index.
# This is called Boolean indexing.
print(r1[r1 > 2])

[3 4 5 6 4]


In [28]:
nums = np.arange(20)
print(nums)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


If you want to retrieve all odd numbers from the array nums, you can simply use Boolean indexing as follows:

In [29]:
odd_nums = nums[nums % 2 == 1]
print(odd_nums)

[ 1  3  5  7  9 11 13 15 17 19]
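Boolean conditions can also be combined. A short sketch, assuming the same nums array: use & for "and", | for "or", and ~ for "not", and wrap each condition in parentheses because these operators bind more tightly than the comparisons. The result names are illustrative.

```python
import numpy as np

nums = np.arange(20)

# even AND greater than 10
even_gt_10 = nums[(nums % 2 == 0) & (nums > 10)]
print(even_gt_10)      # [12 14 16 18]

# less than 3 OR greater than 16
edges = nums[(nums < 3) | (nums > 16)]
print(edges)           # [ 0  1  2 17 18 19]

# NOT odd, which selects the even numbers
evens = nums[~(nums % 2 == 1)]
print(evens)           # [ 0  2  4  6  8 10 12 14 16 18]
```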


## Slicing Arrays

In [32]:
a = np.array([[1,2,3,4,5], # rank 2 array
            [4,5,6,7,8],
            [9,8,7,6,5]])
print(a)
print(a.shape)

[[1 2 3 4 5]
 [4 5 6 7 8]
 [9 8 7 6 5]]
(3, 5)


To extract the last two rows and the first three columns: 

In [39]:
b1 = a[1:3, :3] # rows 1 to 3 (not inclusive) and first 3 columns
print(b1)

[[4 5 6]
 [9 8 7]]


Slicing has the following syntax: [start:stop]. For two-dimensional arrays, the slicing syntax becomes [start:stop, start:stop]. The start:stop before the comma (,) refers to the rows, and the start:stop after the comma (,) refers to the columns. Hence [1:3, :3] means that you want to extract the rows from index 1 up to (but not including) index 3, and the columns from the first column up to (but not including) index 3. A common source of confusion in slicing is the stop index: remember that the element at the stop index is not included in the result.
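A few more slices of the same 3×5 array illustrate the start:stop rule. This is a minimal sketch; the variable names are chosen for illustration only.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [4, 5, 6, 7, 8],
              [9, 8, 7, 6, 5]])

top_middle = a[:2, 1:4]   # rows 0-1, columns 1-3 (stop indices excluded)
print(top_middle)         # [[2 3 4]
                          #  [5 6 7]]

last_col = a[:, -1]       # every row, last column
print(last_col)           # [5 8 5]

# A slice keeps the dimension; a plain integer index drops it
print(a[0:1, :].shape)    # (1, 5)
print(a[0, :].shape)      # (5,)
```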

## NumPy Slice is a Reference

Note that the result of a NumPy slice is a reference to (a view of) the original array, not a copy of it. Consider the following:

In [40]:
b3 = a[1:, 2:] # row 1 onwards and column 2 onwards.
print(b3) # b3 is pointing to a subset of a

[[6 7 8]
 [7 6 5]]


Here, b3 is actually a reference to the original array a, as the figure below shows:

<p style="text-align: center">
  <img  src="Media\NumPy-Slice.PNG" width="600" alt="NumPy-Slice">
</p>

Hence, if you change one of the elements in b3, the change also shows up in a:

In [42]:
b3[0,2] = 88 # b3[0,2] is pointing to a[1,4]; modifying it will modify the original array a

In [43]:
print(a) # the assignment through b3 has changed a[1,4]

[[ 1  2  3  4  5]
 [ 4  5  6  7 88]
 [ 9  8  7  6  5]]
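If you need a slice that can be modified without touching the original array, one option is to call copy() on it. The sketch below uses np.shares_memory() to check whether two arrays overlap in memory; the names view and independent are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [4, 5, 6, 7, 8],
              [9, 8, 7, 6, 5]])

view = a[1:, 2:]                 # a reference into a
independent = a[1:, 2:].copy()   # a separate copy of the same elements

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, independent))  # False

independent[0, 2] = 88           # modifies only the copy
print(a[1, 4])                   # 8, the original is unchanged
```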


## Reshaping Arrays

In [45]:
b5 = a[2, :] # row 2 and all columns
print(b5) # b5 is in rank 1
print (b5.shape)

[9 8 7 6 5]
(5,)


You can reshape an array to another dimension using the reshape() function. Using b5 (which is a rank 1 array) as an example, you can reshape it to a rank 2 array as follows:

In [46]:
b5 = b5.reshape(1, -1)
print(b5)

[[9 8 7 6 5]]


In this example, you call the reshape() function with two arguments. The first argument, 1, indicates that you want to convert the array into a rank 2 array with 1 row, and the -1 indicates that you leave it to the reshape() function to work out the correct number of columns. Of course, in this example, it is clear that after reshaping there will be five columns, so you could also call reshape(1,5). In more complex cases, however, it is convenient to use -1 and let the function decide on the number of rows or columns to create.
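The -1 placeholder works for either dimension, as long as the total number of elements divides evenly. A small sketch with a hypothetical 12-element array named values:

```python
import numpy as np

values = np.arange(12)                # 12 elements, rank 1

print(values.reshape(3, -1).shape)    # (3, 4): 3 rows, columns inferred
print(values.reshape(-1, 6).shape)    # (2, 6): 6 columns, rows inferred
print(values.reshape(-1, 1).shape)    # (12, 1): a single column

# 12 elements cannot be split into 5 equal rows, so reshape() raises an error
try:
    values.reshape(5, -1)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)                 # True
```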

## Array Math

You can perform array math very easily on NumPy arrays. Consider the following rank 2 arrays:

In [47]:
x1 = np.array([[1,2,3], [4,5,6]])
x2 = np.array([[7,8,9],[2,3,4]])

# To add these arrays together, you use the + operator as follows:

print(x1 + x2)

[[ 8 10 12]
 [ 6  8 10]]


Array math is important, as it can be used to perform vector calculations. A good example is as follows:

In [49]:
x = np.array([2,3])
y = np.array([4,2])
z = x + y
print(z)

[6 5]


In [51]:
# besides using the + operator, you can also use the np.add() function to add two arrays
np.add(x1, x1)

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [52]:
# you can also perform subtraction, multiplication, as well as division with NumPy arrays:

print (x1 - x2)
print (x1 * x2)
print (x1 / x2)

[[-6 -6 -6]
 [ 2  2  2]]
[[ 7 16 27]
 [ 8 15 24]]
[[0.14285714 0.25       0.33333333]
 [2.         1.66666667 1.5       ]]
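Just as + has the function form np.add(), the other operators have function forms too: np.subtract(), np.multiply(), and np.divide(). The sketch below checks that each pair gives the same result, and also shows floor division and an operation with a single number, which is applied to every element.

```python
import numpy as np

x1 = np.array([[1, 2, 3], [4, 5, 6]])
x2 = np.array([[7, 8, 9], [2, 3, 4]])

# Operator and function forms give the same element-wise results
print(np.array_equal(x1 - x2, np.subtract(x1, x2)))   # True
print(np.array_equal(x1 * x2, np.multiply(x1, x2)))   # True
print(np.array_equal(x1 / x2, np.divide(x1, x2)))     # True

# / always produces floats; // performs floor division and keeps integers
floor_div = x1 // x2
print(floor_div)      # [[0 0 0]
                      #  [2 1 1]]

# A single number is applied to every element of the array
doubled = x1 * 2
print(doubled)        # [[ 2  4  6]
                      #  [ 8 10 12]]
```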


#### Example

Suppose you have three arrays: one containing the names of three individuals, another the corresponding heights of these individuals, and the last one the corresponding weights of the individuals in the group:

In [53]:
names = np.array(['Ann', 'Joe', 'Mark'])
heights = np.array([1.5, 1.78, 1.6])
weights = np.array([65, 46, 59])

Now say that you want to calculate the Body Mass Index (BMI) of this group of people. The formula to calculate BMI is as follows:
- Divide the weight in kilograms (kg) by the height in meters (m)
- Divide the answer by the height again

Using the BMI, you can classify a person as healthy, overweight, or underweight using the following categories:
- Underweight if BMI < 18.5
- Overweight if BMI > 25
- Normal weight if 18.5 <= BMI <= 25

Using array division, you could simply calculate BMI using the following statement:

In [54]:
bmi = weights / heights ** 2 # Calculate the BMI
print(bmi)

[28.88888889 14.51836889 23.046875  ]


In [55]:
# Finding out who is overweight, underweight, or otherwise is now very easy:

print("Overweight: " , names[bmi>25])
print("Underweight: " , names[bmi<18.5])
print("Healthy: " , names[(bmi>=18.5) & (bmi<=25)])


Overweight:  ['Ann']
Underweight:  ['Joe']
Healthy:  ['Mark']
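The same classification can also be written as a single array of labels, one per person. This is one possible sketch using nested np.where() calls; the label strings and the name labels are illustrative choices, not part of the original example.

```python
import numpy as np

names = np.array(['Ann', 'Joe', 'Mark'])
heights = np.array([1.5, 1.78, 1.6])
weights = np.array([65, 46, 59])

bmi = weights / heights ** 2

# np.where(condition, value_if_true, value_if_false), applied element by element
labels = np.where(bmi < 18.5, 'Underweight',
                  np.where(bmi > 25, 'Overweight', 'Normal'))

for name, label in zip(names, labels):
    print(name, label)
# Ann Overweight
# Joe Underweight
# Mark Normal
```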


## Dot Product

Note that when you multiply two arrays with the * operator, you are actually multiplying each of the corresponding elements in the two arrays. Very often, you want to perform a scalar product (also commonly known as the dot product) instead. The dot product is an algebraic operation that takes two coordinate vectors of equal size and returns a single number. It is calculated by multiplying corresponding entries in each vector and adding up all of those products.

In NumPy, the dot product is computed using the dot() function:

In [57]:
x = np.array([2,3])
y = np.array([4,2])
np.dot(x,y) # 2 * 4 + 3 * 2 = 14

14
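To make the difference between element-wise multiplication and the dot product concrete, the following sketch computes both for the same two vectors. It also uses the @ operator, which gives the same result as np.dot() for rank 1 arrays.

```python
import numpy as np

x = np.array([2, 3])
y = np.array([4, 2])

elementwise = x * y        # multiplies corresponding entries
print(elementwise)         # [8 6]

# Summing the element-wise products gives the dot product
print(elementwise.sum())   # 14
print(np.dot(x, y))        # 14
print(x @ y)               # 14
```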

## Matrix

NumPy provides another class in addition to arrays (ndarray): matrix. The matrix class is a subclass of the ndarray, and it is basically identical to the ndarray with one notable exception—a matrix is strictly two‐dimensional, while an ndarray can be multidimensional. Creating a matrix object is similar to creating a NumPy array:

In [58]:
x2 = np.matrix([[1,2],[4,5]])
y2 = np.matrix([[7,8],[2,3]])

You can also convert a NumPy array to a matrix using the asmatrix() function:

In [60]:
x1 = np.array([[1,2],[4,5]])
y1 = np.array([[7,8],[2,3]])
x1 = np.asmatrix(x1)
y1 = np.asmatrix(y1)
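One practical difference between the two classes is the meaning of the * operator: on ndarray objects it multiplies element by element, while on matrix objects it performs matrix multiplication. A minimal sketch, using the illustrative names xa, ya, xm, and ym so as not to overwrite the variables above:

```python
import numpy as np

xa = np.array([[1, 2], [4, 5]])
ya = np.array([[7, 8], [2, 3]])
xm = np.asmatrix(xa)
ym = np.asmatrix(ya)

print(xa * ya)     # element-wise product on arrays
                   # [[ 7 16]
                   #  [ 8 15]]

print(xm * ym)     # matrix product on matrix objects
                   # [[11 14]
                   #  [38 47]]

# For plain arrays, the @ operator gives the matrix product
print(np.array_equal(np.asarray(xm * ym), xa @ ya))   # True
```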