# NumPy

NumPy is a powerful third-party library for numerical computing in Python.

In short, NumPy provides the `ndarray` data type, which can process large amounts of numerical data faster and more efficiently than a Python `list`.

## Table of Contents

### [The Basics](#basics)
 - [Array Creation](#create)
 - [Basic Operations](#op)
 - [Indexing, Slicing and Iterating](#index)
 - [Array Manipulation](#man)
 - [Ordering](#order)
 - [Basic Statistics](#stat)
 
### [Simple Comparison](#compare)
 - [Code Complexity](#code)
 - [Speed](#speed)

### [Further Resources](#resources)

<a id=basics></a>
## The Basics

<a id=create></a>
### Array Creation

First, we have to import the `numpy` library. In Python, this is done with:
```python
import numpy as np
```
Here, `np` is the conventional alias for **numpy**, so later in the program we only need to type `np` to access the library.

There are many ways to create a NumPy array.
First, let's create an array from a list.
```python
array = np.array(list)
```

In [1]:
# import numpy library
import numpy as np

list_object = [[1, 2, 3, 4, 5],
               [6, 7, 8, 9, 10]]

# Let's Create an array
array = np.array(list_object)

# Print Out the Array
print(list_object)
print(array)
print(type(array))

[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
<class 'numpy.ndarray'>


When printed, a **list** separates its elements with *commas*, whereas an array separates them with *spaces*.

The attributes of `ndarray`:<br><br>
![](animations/attributes.gif)

In [2]:
# The attributes of `ndarray`.
# Check the dtype of the array
print(array.dtype)

# Check the item size (number of bytes per element)
print(array.itemsize)

# Check the array size (number of elements in the array)
print(array.size)

# Check the number of axes (dimensions)
print(array.ndim)

# Check the shape of the array
print(array.shape)

# Check the total number of bytes used by all elements
print(array.nbytes)

int64
8
10
2
(2, 5)
80
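
As a quick check of how these attributes relate to each other, here is a minimal sketch. The variable name `total_elements` is illustrative, and the `dtype` is set explicitly (an assumption made here) so that the item size does not depend on the platform default.

```python
import numpy as np

# Fix the dtype explicitly so each element takes 8 bytes on every platform.
array = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10]], dtype=np.int64)

# size is the product of the axis lengths stored in shape.
total_elements = int(np.prod(array.shape))
print(total_elements == array.size)

# nbytes is the total memory used by the elements: size * itemsize.
print(array.nbytes == array.size * array.itemsize)
```

Both checks should print `True` for any array, whatever its shape or data type.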


We can also create arrays from a range of numbers. The `np.arange` function works like Python's built-in `range()`, but it returns an array.
```python
np.arange([start,] stop[, step,], dtype=None)
```
The square brackets `[ ]` mark optional arguments, which have default values. If we do not specify a value, the default is used.

For example, the default value of the *start* argument is **0**.

In [2]:
# Let's create another array with np.arange() function

array = np.arange(20)
print(array)

array = np.arange(2, 20, 2)
print(array)

# start and stop are the same, so the result is empty
array = np.arange(2, 2, 2)
print(array)

# Reverse
array = np.arange(40, 20, -2)
print(array)

# Floating-point step
array = np.arange(0, 2, 0.3)
print(array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
[ 2  4  6  8 10 12 14 16 18]
[]
[40 38 36 34 32 30 28 26 24 22]
[0.  0.3 0.6 0.9 1.2 1.5 1.8]


With `np.arange`, however, we cannot directly control the number of elements in the array. `np.linspace` lets us specify it:
```python
array = np.linspace(start, stop, num=50)
```

Here, *num* defaults to **50**, but we can also specify the number of elements that we want in the range. Unlike `np.arange`, `np.linspace` includes the *stop* value by default.

In [21]:
# Create an array with the desired number of elements
array = np.linspace(0, 2, 5)
print(array)

# reverse
array = np.linspace(20, 2, 20)
print(array.shape)
print(array)

# start and end same
array = np.linspace(20, 20, 5)
print(array)

[0.  0.5 1.  1.5 2. ]
(20,)
[20.         19.05263158 18.10526316 17.15789474 16.21052632 15.26315789
 14.31578947 13.36842105 12.42105263 11.47368421 10.52631579  9.57894737
  8.63157895  7.68421053  6.73684211  5.78947368  4.84210526  3.89473684
  2.94736842  2.        ]
[20. 20. 20. 20. 20.]
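
Two optional arguments of `np.linspace` are often useful alongside `num`. The sketch below (variable names are illustrative) uses `retstep=True` to get the spacing back and `endpoint=False` to leave out the *stop* value.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples.
samples, step = np.linspace(0, 2, 5, retstep=True)
print(samples, step)

# endpoint=False leaves `stop` out, similar to how np.arange behaves.
half_open = np.linspace(0, 2, 4, endpoint=False)
print(half_open)
```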



We can also initialize arrays by just describing the shape of the array that we want.


```python
zeros = np.zeros(shape, dtype, order='C')
ones = np.ones(shape, dtype, order='C')
empty = np.empty(shape, dtype, order='C')
```

In [3]:
# Initialize an array of zeros as a placeholder
array = np.zeros((5,5))
print(array)
print(array.dtype)

[[0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0.]]
float64


The default data type when creating such an array is `float64`. To use a different data type, pass the `dtype` argument:

In [23]:
# Set data type to `uint8`
array = np.zeros((3,3), dtype=np.uint8)
print(array)
print(array.dtype)

[[0 0 0]
 [0 0 0]
 [0 0 0]]
uint8


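`np.ones()` creates an array in the same way as `np.zeros()`, but filled with ones. `np.empty()` only allocates the memory without initializing it, so its contents are arbitrary and must be assigned before use. In the run below it happens to print ones, but that is not guaranteed.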

In [29]:
array = np.ones((5,5))
print(array)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]


In [30]:
array = np.empty((5,5))
print(array)

[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
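
Because the contents of `np.empty()` are arbitrary, a typical pattern is to fill the array before reading it. The following is a small sketch of that pattern; `np.full` is shown as an alternative that allocates and fills in one step, and the fill value `7.0` is chosen only for illustration.

```python
import numpy as np

# np.empty only allocates memory; its contents are arbitrary until assigned.
buffer = np.empty((2, 3))
buffer[:] = 7.0  # assign every element before reading the array
print(buffer)

# np.full allocates and fills in a single call.
filled = np.full((2, 3), 7.0)
print(np.array_equal(buffer, filled))
```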


We can also create arrays from existing list and tuple objects by using the `np.array()` function. NumPy infers the data type from the elements: for example, Python integers typically become `int64` and Python floats become `float64`.

In [6]:
# Create from an existing variable

list_var = [1, 2, 3, 4]
array = np.array(list_var)
print(list_var, array)

[1, 2, 3, 4] [1 2 3 4]


In [27]:
nested_list = [[1, 2, 3], [4, 5, 6]]
nested_array = np.array(nested_list)

print(nested_list, '\n', nested_array)
print(nested_array.dtype)


nested_tuple = ((1, 2, 3), (4, 5, 6))
nested_array = np.array(nested_tuple)
print(nested_tuple, '\n', nested_array)
print(nested_array.dtype)

[[1, 2, 3], [4, 5, 6]] 
 [[1 2 3]
 [4 5 6]]
int64
((1, 2, 3), (4, 5, 6)) 
 [[1 2 3]
 [4 5 6]]
int64


In [29]:
list_var = [1., 2., 3.]
array = np.array(list_var)

print(type(list_var[0]), array.dtype)
print(list_var, array)

<class 'float'> float64
[1.0, 2.0, 3.0] [1. 2. 3.]


In [29]:
# But we can also specify the data type in `dtype` argument.
list_var = [1., 2., 3.]
array = np.array(list_var, dtype=np.uint8)

print(type(list_var[0]), array.dtype)
print(list_var, array)

<class 'float'> uint8
[1.0, 2.0, 3.0] [1 2 3]
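
To illustrate how the data type is inferred and converted, here is a short sketch (variable names are illustrative). Mixing integers and floats produces a float array, and `astype` converts an existing array to another type.

```python
import numpy as np

# Mixing ints and floats promotes the whole array to a float dtype.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)

# astype returns a converted copy; float -> int drops the fractional part
# (truncation toward zero), it does not round.
truncated = np.array([1.9, -1.9, 3.5]).astype(np.int64)
print(truncated)
```

Note that the conversion to an integer type truncates, so `1.9` becomes `1` and `-1.9` becomes `-1`.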


<a id=op></a>
### Basic Operations

In [1]:
import numpy as np

# Simple Arithmetic Operations
array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

print("Addition of two arrays : \t", array_a + array_b)
print("Subtraction of two arrays : \t", array_a - array_b)
print("Multiplication of two arrays : \t", array_a * array_b)
print("Division of two arrays : \t", array_a / array_b)

Addition of two arrays : 	 [5 7 9]
Subtraction of two arrays : 	 [-3 -3 -3]
Multiplication of two arrays : 	 [ 4 10 18]
Division of two arrays : 	 [0.25 0.4  0.5 ]
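
The arithmetic operators work element by element. As a hedged sketch of two related points: `*` is not a matrix product (the `@` operator is), and operands of different shapes are combined through broadcasting. The `matrix` variable is introduced here only for illustration.

```python
import numpy as np

array_a = np.array([1, 2, 3])
array_b = np.array([4, 5, 6])

# `*` multiplies element by element; `@` computes the dot product.
elementwise = array_a * array_b
dot_product = array_a @ array_b
print(elementwise, dot_product)

# Broadcasting: the scalar and the 1D row are stretched to match the 2D shape.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
scaled = matrix * 10
shifted = matrix + array_a
print(scaled)
print(shifted)
```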


NumPy also has many built-in mathematical functions that operate element by element on arrays. These are called [universal functions](https://numpy.org/doc/stable/reference/ufuncs.html).

In [32]:
x = np.linspace(1, 5, 20)

sine_array = np.sin(x)
cos_array = np.cos(x)
exp_array = np.exp(x)
log_array = np.log(x)

print(sine_array)
print(cos_array)
print(exp_array)
print(log_array)

[ 0.84147098  0.93580167  0.98880935  0.99815331  0.96342094  0.88614595
  0.76974064  0.61934522  0.44160084  0.2443563   0.0363215  -0.17331717
 -0.37530253 -0.56071532 -0.72136812 -0.85016684 -0.94142399 -0.99110987
 -0.99703045 -0.95892427]
[ 0.54030231  0.35252692  0.1491847  -0.0607452  -0.26799272 -0.46340626
 -0.63835676 -0.78511878 -0.89721163 -0.96968552 -0.99934016 -0.98486606
 -0.92690238 -0.82800865 -0.69255183 -0.52651339 -0.33722524 -0.13304594
  0.07700839  0.28366219]
[  2.71828183   3.35525011   4.1414776    5.11193983   6.30980809
   7.78836987   9.61339939  11.86608357  14.64663368  18.07874325
  22.31509059  27.54413077  33.99847904  41.96525883  51.79887449
  63.93677707  78.91892444  97.41180148 120.23806881 148.4131591 ]
[0.         0.19105524 0.35139789 0.48954823 0.61090908 0.71912267
 0.81676114 0.90570862 0.98738665 1.06289421 1.13309846 1.19869575
 1.26025364 1.3182409  1.37304913 1.42500887 1.47440163 1.52146914
 1.56642053 1.60943791]


NumPy's `random` module is also powerful. It provides many built-in random distributions, including the **uniform distribution**, the **standard normal distribution** (mean 0, standard deviation 1), and the general **Gaussian (normal) distribution** with a chosen mean and standard deviation.

In [82]:
# Uniform Distribution
uniform_dist = np.random.rand(100)
print(uniform_dist.shape)

# Standard Normal Distribution
standard_normal_dist = np.random.randn(100)
print(standard_normal_dist.shape)

# Gaussian Distribution with mean 1 and standard deviation 2
gaussian_dist = np.random.normal(1, 2, 100)
print(gaussian_dist.shape)

(100,)
(100,)
(100,)
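
Random draws differ on every run unless a seed is fixed. The sketch below uses NumPy's `Generator` interface (`np.random.default_rng`, available since NumPy 1.17) as an alternative to the functions above; the seed value `42` is arbitrary and chosen only for illustration.

```python
import numpy as np

# A seeded Generator makes the random draws repeatable.
rng = np.random.default_rng(seed=42)

uniform_dist = rng.random(100)                         # uniform on [0, 1)
standard_normal_dist = rng.standard_normal(100)        # mean 0, std 1
gaussian_dist = rng.normal(loc=1, scale=2, size=100)   # mean 1, std 2

# Re-creating the generator with the same seed reproduces the same numbers.
rng_again = np.random.default_rng(seed=42)
print(np.array_equal(uniform_dist, rng_again.random(100)))
```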


<a id=index></a>
### Indexing, Slicing and Iterating

![](animations/indexing.gif)

1D arrays can be indexed, sliced, and iterated over much like lists. For arrays with more dimensions, we specify an index or range for each axis.

#### 1D Array

In [60]:
a = np.arange(0, 20, 2)
print(a)

[ 0  2  4  6  8 10 12 14 16 18]


In [61]:
# Index : select a specific element
element = a[1]
print(element)

2


In [62]:
# Slicing : select a range of elements
range_element = a[1:5]
print(range_element)

[2 4 6 8]


In [65]:
# Reverse Slicing
reverse_element = a[8:2:-1]
print(reverse_element)

[16 14 12 10  8  6]


In [67]:
# Iteration
for i in a:
    print(i)

0
2
4
6
8
10
12
14
16
18
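
One difference from lists is worth a small sketch: slicing a list makes a copy, while slicing an array returns a view onto the same data. The names `view` and `independent` are illustrative.

```python
import numpy as np

a = np.arange(0, 20, 2)

# A slice is a view: writing through it changes the original array.
view = a[1:4]
view[0] = -1
print(a[1])  # -1

# .copy() gives independent data that can be modified safely.
independent = a[1:4].copy()
independent[0] = 99
print(a[1])  # still -1
```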


#### 2D and Higher-Dimensional Arrays

In [10]:
# 2D array
array = np.array([[1, 2, 3], [4, 5, 6]])
print(array.shape)
print(array)

# Indexing
first_element = array[0, 0]
last_element = array[-1, -1]
print()
print(first_element, last_element)

# Range
first_row = array[0, :]
first_column = array[:, 0]
print()
print(first_row, first_column)

(2, 3)
[[1 2 3]
 [4 5 6]]

1 6

[1 2 3] [1 4]


In [11]:
# 3D array
array = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]],
                  [[13, 14, 15], [16, 17, 18]]])

print(array.shape)
print(array)


# Indexing
first_element = array[0, 0, 0] # First_element
last_element = array[-1, -1, -1] # Last_element
print()
print(first_element, last_element)

# Range
print(array)
r_channel = array[:, :, 0]
g_channel = array[:, :, 1]
b_channel = array[:, :, 2]
print()
print(r_channel, g_channel, b_channel)

(3, 2, 3)
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]

 [[13 14 15]
  [16 17 18]]]

1 18
[[[ 1  2  3]
  [ 4  5  6]]

 [[ 7  8  9]
  [10 11 12]]

 [[13 14 15]
  [16 17 18]]]

[[ 1  4]
 [ 7 10]
 [13 16]] [[ 2  5]
 [ 8 11]
 [14 17]] [[ 3  6]
 [ 9 12]
 [15 18]]


<a id=man></a>
### Array Manipulation

NumPy has built-in functions for manipulating the shape of an array. The arrays they return have a different `shape` attribute from the original.

![](animations/shape_manipulation.gif)

The `array.reshape` method is often used to reshape an array into a desired shape.

> Note: Reshaping cannot change the total number of elements in the array.

For example, an array with shape `(3,3)` has 9 elements and cannot be reshaped into `(4,3)`, which needs 12. The attempt raises `ValueError: cannot reshape array of size 9 into shape (4,3)`.

> Tip: When you know every dimension except one, you can pass `-1` for the unknown dimension and NumPy will infer it. Only one dimension can be `-1`.

In [1]:
array = np.zeros((2, 4))
print("Original Array shape : ", array.shape)

reshaped = array.reshape((4, 2))
print("Reshaped Array shape : ", reshaped.shape)

# Here we specify only one dimension and let NumPy infer the other
unknown_shape = array.reshape((2, -1))
print(unknown_shape.shape)

Original Array shape :  (2, 4)
Reshaped Array shape :  (4, 2)
(2, 4)


NumPy automatically infers the remaining dimension from the total number of elements.
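
The two reshape rules can be seen together in a minimal sketch. The 12-element array and the target shapes are chosen here only for illustration.

```python
import numpy as np

array = np.arange(12)

# -1 asks NumPy to infer that axis: 12 elements / 3 rows = 4 columns.
inferred = array.reshape((3, -1))
print(inferred.shape)  # (3, 4)

# A target shape with a different total size (5 * 3 = 15) is rejected.
try:
    array.reshape((5, 3))
except ValueError as error:
    print("ValueError:", error)
```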

You can use NumPy's built-in functions to stack arrays. Three common stacking functions are `dstack`, `hstack`, and `vstack`. Please refer to the animation above for a visual explanation of stacking.

In [73]:
# This function stacks arrays along the third axis
dstack = np.dstack((array, array))
print(dstack.shape)

(2, 4, 2)


In [74]:
# This function stacks arrays along the second axis
hstack = np.hstack((array, array))
print(hstack.shape)

(2, 8)


In [75]:
# This function stacks arrays along the first axis
vstack = np.vstack((array, array))
print(vstack.shape)

(4, 4)
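
For 2D inputs, the three stacking functions can also be written with an explicit `axis` argument. The following sketch shows the equivalent `np.concatenate` and `np.stack` calls; it is meant as an illustration of the relationship, not a replacement for the functions above.

```python
import numpy as np

array = np.zeros((2, 4))

# For 2D inputs, vstack and hstack match concatenate along axis 0 and axis 1.
rows = np.concatenate((array, array), axis=0)
cols = np.concatenate((array, array), axis=1)
print(rows.shape, cols.shape)  # (4, 4) (2, 8)

# np.stack adds a new axis; axis=2 reproduces dstack for 2D inputs.
depth = np.stack((array, array), axis=2)
print(depth.shape)  # (2, 4, 2)
```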


<a id=order></a>
### Ordering

In NumPy, we can sort an array with `np.sort(array)` and find its minimum and maximum with the built-in functions `np.min(array)` and `np.max(array)`. The indices of these values can be found with `np.argmin(array)` and `np.argmax(array)`.

![](animations/sorting.gif)

In [85]:
array = np.array([2, 1, 2, 5, 2, 100, 2, 99, 12])

# Sort the array (np.sort returns a sorted copy)
sorted_array = np.sort(array)

# Find the min and max values and their indices (argmin, argmax)
min_value = np.min(array)
argmin = np.argmin(array)
max_value = np.max(array)
argmax = np.argmax(array)

print("Original Array : \t\t\t", array)
print("Sorted Array : \t\t\t\t", sorted_array)
print("Minimum value in Array : \t\t", min_value)
print("Index where minimum value exists : \t", argmin)
print("Maximum value in Array : \t\t", max_value)
print("Index where maximum value exists : \t", argmax)

Original Array : 			 [  2   1   2   5   2 100   2  99  12]
Sorted Array : 				 [  1   2   2   2   2   5  12  99 100]
Minimum value in Array : 		 1
Index where minimum value exists : 	 1
Maximum value in Array : 		 100
Index where maximum value exists : 	 5
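
Two related tools are sketched below: `np.argsort`, which returns the indices that would sort the array, and the `axis` argument, which applies these functions along one direction of a multi-dimensional array. The small `grid` array is made up for illustration.

```python
import numpy as np

array = np.array([2, 1, 2, 5, 2, 100, 2, 99, 12])

# argsort returns the indices that would sort the array.
order = np.argsort(array)
print(array[order])

# On a 2D array, `axis` selects the direction of the reduction.
grid = np.array([[3, 7, 1],
                 [9, 2, 8]])
column_max = np.max(grid, axis=0)     # maximum of each column: 9, 7, 8
row_argmin = np.argmin(grid, axis=1)  # index of the minimum in each row: 2, 1
print(column_max, row_argmin)
```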


<a id=stat></a>
### Basic Statistics

We can compute basic statistics such as the mean, standard deviation, and variance of an array with NumPy's built-in functions.

In [83]:
array = np.arange(20)
print(array)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [84]:
mean = np.mean(array)
std = np.std(array)
var = np.var(array)

print(f"Mean : \t\t\t{mean}\nStandard Deviation : \t{std:.2f}\nVariance : \t\t{var}")

Mean : 			9.5
Standard Deviation : 	5.77
Variance : 		33.25


Although these functions are built into NumPy, we can also implement them directly from their definitions:

In [94]:
mean = np.sum(array)/array.size
print(mean)

std = np.sqrt(np.sum((array-mean)**2)/array.size)
print(std)

# Population variance (divide by N), matching the default of np.var
var = np.sum((array-mean)**2)/array.size
print(var)

9.5
5.766281297335398
33.25
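
By default, `np.var` and `np.std` divide by N (population statistics). Dividing by N - 1 instead gives the sample statistics, which NumPy exposes through the `ddof` argument. A short sketch of the difference, using the same 20-element array:

```python
import numpy as np

array = np.arange(20)

# ddof=0 (the default) divides by N: population variance.
population_var = np.var(array)
# ddof=1 divides by N - 1: sample variance.
sample_var = np.var(array, ddof=1)
print(population_var, sample_var)  # 33.25 35.0

# The standard deviation is the square root of the matching variance.
print(np.isclose(np.std(array, ddof=1), np.sqrt(sample_var)))
```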


<a id=compare></a>
## Simple Comparison

Let's compare the computation time needed by a list and an ndarray.

In [102]:
# Let's compare using a large number of elements.
array_obj = np.arange(1000000)

# The array holds 1,000,000 elements in a single dimension
print(array_obj.shape)
print(type(array_obj))

(1000000,)
<class 'numpy.ndarray'>


In [103]:
# Build an equivalent list object with a list comprehension
list_obj = [i for i in range(1000000)]
print(len(list_obj))
print(type(list_obj))

1000000
<class 'list'>


<a id=code></a>
#### Code Complexity

Let's add 1 to each element of the array and the list. The list needs an explicit loop (here, a list comprehension) to visit each element, whereas NumPy applies the operation to the whole array at once through broadcasting.

In [104]:
# add 1 to list
list_result = [i+1 for i in list_obj]

# add 1 to array
array_obj = array_obj + 1

The example above is the simplest case, with only one dimension. Let's try higher dimensions.

In [2]:
# Let's create a 3D array
array_obj = np.arange(100000).reshape(10,10,-1)
print(type(array_obj))

# We could convert array to list by:
list_obj = array_obj.tolist()
print(type(list_obj))

<class 'numpy.ndarray'>
<class 'list'>


In [3]:
# Add 1: to list
for i in range(len(list_obj)):
    for j in range(len(list_obj[0])):
        for k in range(len(list_obj[0][0])):
            list_obj[i][j][k] += 1

In [4]:
# Add 1: to Array
array_obj += 1

<a id=speed></a>
#### Speed

In [112]:
%%timeit
array_obj = np.arange(1000000)
array_obj = array_obj + 1

3.24 ms ± 101 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [113]:
%%timeit
list_obj = [i for i in range(1000000)]
list_obj = [i+1 for i in range(1000000)]

204 ms ± 58.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In the comparison above, creating an array and performing one operation on it is around 60 times faster than doing the same with a list. The exact ratio depends on the machine, and the gap typically grows as the number of elements and operations increases.
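
The `%%timeit` magic only works inside IPython or Jupyter. Outside a notebook, a similar measurement can be sketched with the standard-library `timeit` module. The helper names `add_one_list` and `add_one_array` and the smaller size `N` are assumptions made here to keep the run short; the timings will differ from machine to machine.

```python
import timeit

import numpy as np

N = 100_000  # smaller than the 1,000,000 used above, to keep the run short


def add_one_list():
    """Build a list of N integers, each increased by 1."""
    return [i + 1 for i in range(N)]


def add_one_array():
    """Build an array of N integers and add 1 to all of them at once."""
    return np.arange(N) + 1


# Both approaches must produce the same values before timing them.
assert add_one_list() == add_one_array().tolist()

# timeit.timeit returns the total number of seconds for `number` calls.
list_time = timeit.timeit(add_one_list, number=10)
array_time = timeit.timeit(add_one_array, number=10)
print(f"list: {list_time:.4f} s, array: {array_time:.4f} s")
```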

<a id=resources></a>
## Further Resources

To learn more about NumPy arrays, see the official NumPy [documentation](https://numpy.org/doc/1.19/user/quickstart.html). You can also learn more in this [blog post](https://realpython.com/numpy-array-programming/).