# NUMPY

**Attribution**:

- Some parts of this notebook, especially illustrations, come from the following [Google Colab file](https://colab.research.google.com/github/ffund/ml-notebooks/blob/master/notebooks/1-python-numpy-tutorial.ipynb), which is based on the following sources:
    -   Parts of this notebook are adapted from a [tutorial from CS231N at Stanford University](https://cs231n.github.io/python-numpy-tutorial/), which is shared under the [MIT license](https://opensource.org/licenses/MIT).
    -   Parts of this notebook are adapted from Jake VanderPlas’s [Whirlwind Tour of Python](https://colab.research.google.com/github/jakevdp/WhirlwindTourOfPython/blob/master/Index.ipynb), which is shared under the [Creative Commons CC0 Public Domain Dedication license](https://github.com/jakevdp/WhirlwindTourOfPython/blob/master/LICENSE).
    -   The visualizations in this notebook are from [A Visual Intro to NumPy](http://jalammar.github.io/visual-numpy/) by Jay Alammar, which is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
    -   Parts of this notebook (and some images) about numpy broadcasting are adapted from Sebastian Raschka’s [STATS451](https://github.com/rasbt/stat451-machine-learning-fs20) materials.

- Some content also comes from Wikipedia, such as the [NumPy page](https://en.wikipedia.org/wiki/NumPy).
- Exercises have been created by [me](https://github.com/virgilus/) or come from various websites, including [Kaggle](https://www.kaggle.com).


## Introduction

Using Python for scientific activities or data science is made very easy thanks to three very popular libraries: "numpy", "pandas" and "matplotlib" / "seaborn". They are all designed to work well together and, most of the time, objects from one library can be reused in the others.

<img src="files/pandas_numpy_matplotlib_seaborn.png" width="100%" align="center">


## What is Numpy?

Numpy is the core library for scientific computing in Python. It provides a high-performance multidimensional array object, and tools for working with these arrays.

From 1995 to 2006, two libraries were mainly used to deal with arrays and matrices: Numeric and Numarray. In early 2005, Travis Oliphant, wanting to unify the community around a single array package, ported Numarray's features to Numeric and released the result as **NumPy 1.0 in 2006** ([Wikipedia](https://en.wikipedia.org/wiki/NumPy)).

To use Numpy, we first need to import the `numpy` package. By convention, we import it using the alias `np`. Then, when we want to use modules or functions in this library, we preface them with `np.`

In [None]:
import numpy as np

## Arrays and array construction

A numpy array is a grid of values, all of the same type, and is indexed by a tuple of non-negative integers.
- The number of dimensions is the rank of the array (not to be confused with the rank of a matrix in linear algebra).
- The shape of an array is a tuple of integers giving the size of the array along each dimension.

### 1-D Array

We can create a `numpy` array by passing a Python list to `np.array()`.

In [None]:
a = np.array([1, 2, 3]) # Create a rank 1 array
print("Type:", type(a))
print("Elements:", a[0], a[1], a[2]) # We get the elements just like in a list
print("Shape:", a.shape) # returns a tuple
print("Number of dimensions:", len(a.shape)) # that's the length of the previous tuple
print("Number of dimensions with .ndim:", a.ndim) # .ndim gives the number of dimensions directly, i.e. len(a.shape)
print(a)

If we picture the first dimension as vertical, we can think of this array as a column: each row holds a new value (or object) we're dealing with.
![](files/create-numpy-array-1.png)

### Arrays with multiple dimensions

To create a numpy array with more dimensions, we can pass nested lists. Let's picture the second dimension as horizontal:

In [None]:
b = np.array([[1,2],[3,4]]) # Create a rank 2 array
print("Shape:", b.shape)
print("Number of dimensions:", len(b.shape))
print(b)

![](files/numpy-array-create-2d.png)

In [None]:
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # Create a rank 3 array
print("Shape:", c.shape)
print("Number of dimensions:", len(c.shape))
print(c)

![](files/numpy-3d-array.png)

## Creating one-dimensional arrays with preset values

There are often cases when we want numpy to initialize the values of the array for us. NumPy provides functions like `np.ones()`, `np.zeros()`, and `np.random.random()` for these cases. We just pass them the number of elements we want them to generate:

In [None]:
print("np.ones(3):", np.ones(3))
print("np.zeros(3):", np.zeros(3))
print("np.random.random(3):", np.random.random(3))

![](files/create-numpy-array-ones-zeros-random.png)

## Creating multi-dimensional arrays with preset values

We can also use these methods to produce multi-dimensional arrays, as long as we pass them a list or a tuple describing the dimensions of the matrix we want to create.

Sometimes we need an array of a specific shape filled with "placeholder" values that we plan to overwrite with the result of a computation; `np.zeros()` and `np.ones()` are handy for this. Random matrices are useful, among other things, for testing functions.

In [None]:
print("np.ones((3, 2)):\n", np.ones((3, 2))) # you can use either a tuple or list : np.ones([3, 2])
print("np.zeros((3, 2)):\n", np.zeros((3, 2)))
print("np.random.random((3, 2)):\n", np.random.random((3, 2)))

![](files/numpy-matrix-ones-zeros-random.png)

In [None]:
print("np.ones((4, 3, 2)):\n", np.ones((4, 3, 2)))
print("np.zeros((4, 3, 2)):\n", np.zeros((4, 3, 2)))
print("np.random.random((4, 3, 2)):\n", np.random.random((4, 3, 2)))

![](files/numpy-3d-array-creation.png)

### Full and identity matrices
You can also create a matrix filled with a constant using `np.full()`, or an identity matrix with `np.eye()`.

In [None]:
d = np.full((2,2), 7) # Create a constant array
print(d)

In [None]:
e = np.eye(5) # Create a 5x5 identity matrix
print(e)

## Exercise

Using `np.array()`, create an array that looks like this:

```python
      [[[ 78,  87],
        [ 17,  71]],

       [[ 98,  89],
        [ 92,  28]],

       [[112, 211],
        [ 45,  54]]]
```


In [None]:
# Code here!



## Exercise

Use a function to create an array that looks like this (do **not** use `np.array()` or copy/paste the data).

```python
[[[12, 12, 12, 12, 12],
  [12, 12, 12, 12, 12],
  [12, 12, 12, 12, 12]],

 [[12, 12, 12, 12, 12],
  [12, 12, 12, 12, 12],
  [12, 12, 12, 12, 12]]]
```

In [None]:
# Code here!


## Exercise

Use a function, and an operation, to create an array that looks like this (do **not** use `np.array()` or copy/paste the data).

```python
      [[10.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9., 10.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9., 10.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9., 10.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9., 10.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9., 10.,  9.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9., 10.,  9.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  9.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  9.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  9.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  9.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9., 10.,  9.],
       [ 9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9.,  9., 10.]]
```

In [None]:
# Code here!


## Exercise

Use a function to create an array that looks like this (do **not** use `np.array()` or copy/paste the data).

```python
      [[[[99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99]],

        [[99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99]],

        [[99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99]]],


       [[[99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99]],

        [[99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99]],

        [[99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99],
         [99, 99, 99, 99, 99]]]]
```

In [None]:
# Code here!


## Creating sequences of numbers


Numpy also has two useful functions for creating sequences of numbers: `np.arange()` and `np.linspace()`.

- The `np.arange()` function accepts three arguments, which define the start value, stop value of a half-open interval, and step size. (The default step size, if not explicitly specified, is 1; the default start value, if not explicitly specified, is 0.)

In [None]:
f = np.arange(10,50,5) # Create an array of values starting at 10 in increments of 5
print(f)

The resulting array stops at 45, not 50: the upper bound of the interval is excluded.
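
A short sketch of the default values described above (the variable names are illustrative only): when only `stop` is given, `start` is 0 and `step` is 1, and a negative step counts down while still excluding the stop value.

```python
import numpy as np

# Only the stop value is given: start defaults to 0 and step defaults to 1
only_stop = np.arange(5)
print(only_stop)   # [0 1 2 3 4]

# Start and stop are given: step still defaults to 1
start_stop = np.arange(3, 8)
print(start_stop)  # [3 4 5 6 7]

# A negative step counts down; the stop value (0) is still excluded
countdown = np.arange(10, 0, -3)
print(countdown)   # [10  7  4  1]
```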

- The `np.linspace()` function is similar, but we specify the number of values instead of the step size, and it creates a sequence of evenly spaced values. By default, the stop value is included as the last number (pass `endpoint=False` to exclude it).

In [None]:
g = np.linspace(0, 1, num=5)
print(g)
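
To see how `np.linspace()` chooses the spacing, here is a small sketch comparing the default `endpoint=True` with `endpoint=False`; `retstep=True` additionally returns the step it computed. The values `0`, `1` and `5` simply mirror the cell above.

```python
import numpy as np

start, stop, num = 0, 1, 5

# endpoint=True (default): stop is the last value, spacing = (stop - start) / (num - 1)
with_end = np.linspace(start, stop, num=num)
print(with_end)     # [0.   0.25 0.5  0.75 1.  ]

# endpoint=False: stop is excluded, spacing = (stop - start) / num
without_end = np.linspace(start, stop, num=num, endpoint=False)
print(without_end)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True returns the samples together with the computed spacing
values, step = np.linspace(start, stop, num=num, retstep=True)
print(step)         # 0.25
```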

## Exercise

Use `np.arange()` to generate a sequence of numbers from 0 to 100 (both included) with a step of 2, i.e. 0, 2, 4, ..., 100.

In [None]:
# Code here!


## Exercise

Use `np.arange()` to create this array:

```python
[104, 121, 138, 155, 172, 189, 206]
```

In [None]:
# Code here!


## Exercise

Use `np.arange()` to create this array:

```python
[10213, 11196, 12179, 13162, 14145, 15128, 16111, 17094, 18077,
       19060, 20043, 21026]
```

In [None]:
# Code here!


## Exercise

Use `np.linspace()` to create this array:

```python
[112.  , 112.75, 113.5 , 114.25, 115.  , 115.75, 116.5 , 117.25,
       118.  ]
```

In [None]:
# Code here!


## Exercise

Use `np.linspace()` to create this array:

```python
      [-45892.35647564, -45078.30814793, -44264.25982021, -43450.2114925 ,
       -42636.16316478, -41822.11483707, -41008.06650935, -40194.01818164,
       -39379.96985393, -38565.92152621, -37751.8731985 , -36937.82487078,
       -36123.77654307, -35309.72821535, -34495.67988764, -33681.63155993,
       -32867.58323221, -32053.5349045 , -31239.48657678, -30425.43824907,
       -29611.38992135, -28797.34159364, -27983.29326593, -27169.24493821,
       -26355.1966105 , -25541.14828278, -24727.09995507, -23913.05162735,
       -23099.00329964, -22284.95497193, -21470.90664421, -20656.8583165 ,
       -19842.80998878, -19028.76166107, -18214.71333335, -17400.66500564,
       -16586.61667793, -15772.56835021, -14958.5200225 , -14144.47169478,
       -13330.42336707, -12516.37503935, -11702.32671164, -10888.27838393,
       -10074.23005621,  -9260.1817285 ,  -8446.13340078,  -7632.08507307,
        -6818.03674535,  -6003.98841764,  -5189.94008993,  -4375.89176221,
        -3561.8434345 ,  -2747.79510678,  -1933.74677907,  -1119.69845135,
         -305.65012364,    508.39820407,   1322.44653179,   2136.4948595 ,
         2950.54318722,   3764.59151493,   4578.63984265,   5392.68817036,
         6206.73649807,   7020.78482579,   7834.8331535 ,   8648.88148122,
         9462.92980893,  10276.97813665,  11091.02646436,  11905.07479207,
        12719.12311979,  13533.1714475 ,  14347.21977522,  15161.26810293,
        15975.31643065,  16789.36475836]
```

In [None]:
# Code here!


# Array indexing

Numpy offers several ways to index into arrays.

We can index and slice numpy arrays in all the ways we can slice Python lists.

## One dimension

In [None]:
data = np.array([1, 2, 3])
print(data)
print(data[0])
print(data[1])
print(data[0:2])
print(data[1:])

![](files/numpy-array-slice.png)

## Multiple dimensions

You can index and slice numpy arrays in multiple dimensions; in that case, specify an index or a slice for each dimension:

In [None]:
sep = "\n------------" # Using this sep variable to better visualize
data = np.array([[1, 2], [3, 4], [5, 6]])
print("data \n", data, sep)
print("data.shape \n", data.shape, sep)
print("data[0, 1] \n", data[0,1], sep)
print("data[1:3] \n", data[1:3], sep)
print("data[0:2, 0] \n", data[0:2, 0], sep)

![](files/numpy-matrix-indexing.png)

## Integer indexing and slice indexing

So far we've used both integer indexing and slice indexing, but these two methods do not always behave the same way. Let's look at the details.

### Integer indexing
#### Integer indexing with 1-D array

Integer indexing means using the following syntax:

```python
object_to_slice[integer] # For one element
object_to_slice[[integer1, integer2, integer3, ...]] # For several elements
```

- Integer indexing lets you access specific elements of the array by **providing the exact indices** you want to retrieve.
- To select several elements, provide a **list (or array) of indices**.
- With a list or array of indices, it returns a **new array** (a copy of the data); a single integer returns a single element.

Example:

In [None]:
sep = "\n-------------------"
a = np.array([50, 51, 52, 53])

# Access individual elements
print("a[2] :\n", a[2], sep)  # Retrieves the element at index 2

# Access several elements by giving an array of indices
print("a[[3, 0, 2]] :\n", a[[3, 0, 2]], sep)  # Retrieves elements at indices 3, 0, and 2

#### Integer indexing with n-D arrays

In that case you provide one index per dimension for each element you want to retrieve. If you leave a dimension out, all the elements along that dimension are selected.

In [None]:
sep = "\n-------------------"
a = np.array([[[50,  51,  52,  53 ], [54,  55,  56,  57 ], [58,  59,  60,  61]],
                [[500, 510, 520, 530], [540, 550, 560, 570], [580, 590, 600, 610]]])
print("a: \n", a, sep)
print("a.shape :\n", a.shape, sep)

# Access one element
print("a[1,0,3] :\n", a[1,0,3], sep)  # retrieves the element located inside:
# - The second element in the first dimension
# - The first element in the second dimension,
# - The fourth element in the third dimension


# Access several elements by giving an array of indices
print("a[[1,0,1], [0, 1, 2], [3, 2, 1]] :\n", a[[1, 0, 1], [0, 1, 2], [3, 2, 1]], sep)  # Retrieves several elements
# - "530" located at (1, 0, 3)
# - "56" located at (0, 1, 2)
# - "590" located at (1, 2, 1)

print("Another way to write this: np.array([a[1, 0, 3], a[0, 1, 2], a[1, 2, 1]]) \n", np.array([a[1, 0, 3], a[0, 1, 2], a[1, 2, 1]]), sep)

# If you give fewer index arrays than dimensions, every element of the missing dimensions is selected
print("a[[1,0,1], [0, 1, 2]] :\n",  a[[1,0,1], [0, 1, 2]], sep)
# - [500 510 520 530] located at (1, 0)
# - [ 54  55  56  57] located at (0, 1)
# - [580 590 600 610] located at (1, 2)

### Slice Indexing

#### Slice indexing with 1-D array

Slicing in Python means using the following syntax:

```python
object_to_slice[start:stop:step]
```

- Slice indexing lets you access a range of elements in an array using **start**, **stop**, and **step** values.
- Using a slice returns **an array** and not a value (it can be an array of one element).
- It returns a **view** on the original array, which means it **does not** create a new copy of the data.

Example:

In [None]:
sep = "\n-------------------"
a = np.array([50, 51, 52, 53])

# Access individual elements
print("a[2:3] :\n", a[2:3], sep) # returns an array containing only the element at index 2 (the 3rd element)

# Access several elements using start and stop
print("a[1:3] :\n", a[1:3], sep)

# Access several elements using start, stop and step
print("a[::2] :\n", a[::2], sep)

#### Slice indexing with n-D arrays

You can use slicing to select only some parts (some "pieces") of the array. A bare `:` selects every element along that dimension.

In [None]:
sep = "\n-------------------"
a = np.array([[[50,  51,  52,  53 ], [54,  55,  56,  57 ], [58,  59,  60,  61]],
                [[500, 510, 520, 530], [540, 550, 560, 570], [580, 590, 600, 610]]])
print("a: \n", a, sep)
print("a.shape :\n", a.shape, sep)

# Access one element
print("a[1:2, 0:1, 3:4] :\n", a[1:2, 0:1, 3:4], sep)  # returns a nested array with the element located inside:
# - The second element in the first dimension
# - The first element in the second dimension,
# - The fourth element in the third dimension

# Access several elements using slices
print("a[:, 0:2, ::2] :\n",  a[:, 0:2, ::2], sep)  # Retrieves several elements
# - All elements in the first dimension
# - First and second elements in the second dimension
# - Every other element in the last dimension (step of 2)


# If you give fewer slices than dimensions, every element of the missing dimensions is selected
print("a[:, 0:2] :\n",  a[:, 0:2], sep)
# - All elements in first dimension
# - First and second element in second dimension

### Array modification

- An array created with **integer indexing** is a **new array**.
- An array created with **slice indexing** is a **view** of the original array. If you modify the slice, you modify the array.

#### With integer indexing

In [None]:
sep = "\n-------------------"
a = np.array([50, 51, 52, 53])

# Let's store the elements at indices 1 and 2 of a in a variable
my_new_array = a[[1,2]] # Using an array of indices

print("my_new_array", my_new_array, sep)

# Let's modify all the values of our new array
my_new_array[:] = -999 

print("my_new_array:", my_new_array, sep) # my_new_array has been modified, as intended.
print("a :", a) #... But the original array stays the same.

#### With slice indexing

In [None]:
sep = "\n-------------------"
a = np.array([50, 51, 52, 53])

# Let's store a slice of an array in a variable
my_slice = a[1:3]

print("my_slice", my_slice, sep) # Retrieves elements from index 1 (inclusive) to 3 (exclusive)

# Let's modify all the values of our slice
my_slice[:] = -999

print("my_slice:", my_slice, sep) # my_slice has been modified, as intended.
print("a :", a) #... But so was the original array!
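
If you need a slice that does not affect the original array, you can call `.copy()` on it. The sketch below uses `np.shares_memory()` to check which results share data with `a`; the variable names are only illustrative.

```python
import numpy as np

a = np.array([50, 51, 52, 53])

view = a[1:3]               # slice: a view that shares memory with a
sliced_copy = a[1:3].copy() # explicit copy: owns its own data
fancy = a[[1, 2]]           # integer (fancy) indexing: always a copy

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, sliced_copy))  # False
print(np.shares_memory(a, fancy))        # False

# Modifying the copy leaves the original untouched
sliced_copy[:] = -999
print(a)  # [50 51 52 53]
```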

## Exercise

Use integer indexing on the "a" array so it outputs these values (you are **not** allowed to use slices or to copy/paste the data!):

```python
[[54, 55, 56, 57],
 [50, 51, 52, 53],
 [58, 59, 60, 61]]
```


In [None]:
a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])
print(a)

# Code here!


## Exercise

Use integer array indexing to output:
    
```python
[57, 51, 58]
```

In [None]:
a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])
print(a)

# Code here!


## Exercise

With only a single line of code, modify the "x" array using integer indexing so it contains those values (you are **not** allowed to use slices or to copy/paste the data!):
```python
[[1050   51   52   53]
 [  54   55 1056   57]
 [1058   59   60   61]]
```

In [None]:
x = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])
print(x)

# Code here!


## Exercise
Reverse the rows of the 2D array "a".

**hint**: use slices and... the little trick to reverse elements in a list.

Expected output:

```python
[[6, 7, 8],
 [3, 4, 5],
 [0, 1, 2]]
```

In [None]:
a = np.array([[0,1,2], [3,4,5], [6,7,8]])
print(a)

# Code here!

## Exercise

Slice the "z" array using only slices so it outputs:
```python
[[[[117, 116, 115],
   [120, 119, 118]]],

 [[[141, 140, 139],
   [144, 143, 142]]]]
```

In [None]:
z = np.array([[[[100, 101, 102],
         [103, 104, 105],
         [106, 107, 108],
         [109, 110, 111]],

        [[112, 113, 114],
         [115, 116, 117],
         [118, 119, 120],
         [121, 122, 123]]],


       [[[124, 125, 126],
         [127, 128, 129],
         [130, 131, 132],
         [133, 134, 135]],

        [[136, 137, 138],
         [139, 140, 141],
         [142, 143, 144],
         [145, 146, 147]]]])
print("z array: \n", z)
# Code here


## Exercise

Use the appropriate numpy function to generate this matrix, and then use slicing to modify its content. Result should look like this:
```python
[[1., 1., 1., 1., 1., 1., 1., 1., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 2., 2., 2., 2., 2., 2., 2., 1.],
 [1., 1., 1., 1., 1., 1., 1., 1., 1.]]
```

In [None]:
# Code here!


### Mixing integer indexing and slice indexing

- You can mix integer indexing with slice indexing.
- However, every plain integer index removes one dimension, so the result has a **lower rank** than the original array.

Let's compare two different ways of accessing the last element from the 'a' array:

In [None]:
a = np.array([1,2,3])
print(a[2]) # returns only 3
print(type(a[2]))
print(a[2:]) # returns a 1-D array containing the value '3'
print(type(a[2:]))

And now let's access the second row of this 2-D array.

In [None]:
sep = "\n-------------------"
a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])
print(a, sep)

row_r1 = a[1, :]    # Rank 1 view of the second row of a
row_r2 = a[1:2, :]  # Rank 2 view of the second row of a
print(row_r1, "  shape:", row_r1.shape)
print(row_r2, "shape:", row_r2.shape)

The same applies when we work with columns:

In [None]:
sep = "\n-------------------"
a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])
print(a, sep)

# We can make the same distinction when accessing columns (2nd dimension) of an array:
col_r1 = a[:, 1]
col_r2 = a[:, 1:2]
print("shape:", col_r1.shape)
print(col_r1, sep)
print("shape:", col_r2.shape)
print(col_r2, sep)
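
The rule can be summed up by looking at shapes only. In this sketch, a plain integer drops the axis it indexes, while a slice or a list holding a single index keeps it.

```python
import numpy as np

a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])

# A plain integer removes the axis it indexes...
print(a[:, 1].shape)    # (3,)
# ...but a slice or a one-element list of indices keeps it
print(a[:, 1:2].shape)  # (3, 1)
print(a[:, [1]].shape)  # (3, 1)

# Two plain integers remove both axes: the result is a single element
print(a[1, 1].shape)    # ()
```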

## Exercise
Swap the first two columns (indices 0 and 1) of the array "a".

**hint**: use slices and integer indexing!

Expected output:

```python
[[1, 0, 2],
 [4, 3, 5],
 [7, 6, 8]]
```

In [None]:
a = np.arange(9).reshape(3,3)
print(a)

# Code here!


## Exercise

Use a mix of slice indexing and integer indexing on the y array so it outputs:
```python
[[19, 23],
 [ 7, 11]]
```

In [None]:
y = np.array([[[1,2,3,4], [5,6,7,8], [9,10,11,12]], [[13,14,15,16], [17,18,19,20], [21,22,23,24]]])
print("y array: \n", y)
# Code here


### Boolean array indexing

Boolean (`True` or `False`) array indexing lets you pick out arbitrary elements of an array. You can give an array with boolean values to specify which elements you want to keep.

In [None]:
sep = "\n-------------------"
a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])

a[[[True, False, False, False], [False, False, True, True], [True, True, False, True]]]

But frequently this type of indexing is used to select the elements of an array that satisfy some condition. Here is an example:

In [None]:
sep = "\n-------------------"
a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])

# Find the elements of a that are bigger than 55;
# a > 55 returns a numpy array of Booleans of the same
# shape as a, where each slot tells whether the
# corresponding element of a is > 55.

print(a > 55, sep)

print(a[a > 55], sep)
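
Conditions can be combined with the element-wise operators `&` (and), `|` (or) and `~` (not); Python's `and` / `or` keywords do not work element-wise on arrays and raise a `ValueError`. Each condition needs its own parentheses because `&` and `|` bind more tightly than comparisons. A boolean mask can also appear on the left-hand side of an assignment, as in this sketch.

```python
import numpy as np

a = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])

# Elements strictly between 52 and 58
between = a[(a > 52) & (a < 58)]
print(between)  # [53 54 55 56 57]

# Elements below 52 or above 59
outside = a[(a < 52) | (a > 59)]
print(outside)  # [50 51 60 61]

# Set every even element to 0, in place
a[a % 2 == 0] = 0
print(a)
```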

### Useful functions

When working with numpy arrays, it’s often helpful to get the indices (not only the values) of array elements that meet certain conditions. There are a few numpy functions that you’ll definitely want to remember:

-   [`np.argmax()`](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html) (get index of maximum element in array)
-   [`np.argmin()`](https://numpy.org/doc/stable/reference/generated/numpy.argmin.html) (get index of minimum element in array)
-   [`np.argsort()`](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) (get sorted list of indices, by element value, in ascending order)
-   [`np.where()`](https://numpy.org/doc/stable/reference/generated/numpy.where.html) (get indices of elements that meet some condition)

In [None]:
sep = "\n-------------------"
a = np.array([1, 8, 9, -3, 2, 4, 7, 9])

print("a:\n", a, sep)

# Get the index of the maximum value in variable 'a'
# (this array has two elements with the maximum value but 
# only one index is returned)
print("np.argmax(a) ->", np.argmax(a), sep)

# Get the index of the minimum element in 'a'
print("np.argmin(a)->", np.argmin(a), sep)

# Get sorted list of indices in ascending order
print("np.argsort(a) ->", np.argsort(a), sep)

# Get sorted list of indices in descending order
# [::-1] is a special slicing index that returns the reversed list
print("np.argsort(a)[::-1] ->", np.argsort(a)[::-1], sep)

# Get indices of elements that meet some condition
# this returns a tuple, the list of indices is the first entry
# so we use [0] to get it
print("np.where(a > 5) ->", np.where(a > 5), sep)
print("np.where(a > 5)[0] ->", np.where(a > 5)[0], sep)

# Get indices of elements that meet some condition
# this example shows how to get the index of *all* the max values
print(np.where(a >= a[np.argmax(a)])[0])
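
The reason `np.where()` returns a tuple becomes clearer with a 2-D array: there is one index array per dimension. The tuple can be unpacked, or passed straight back as an integer index. A minimal sketch:

```python
import numpy as np

m = np.array([[50, 51, 52, 53], [54, 55, 56, 57], [58, 59, 60, 61]])

# One index array per dimension: row indices, then column indices
rows, cols = np.where(m > 58)
print(rows)  # [2 2 2]
print(cols)  # [1 2 3]

# Using the indices for integer indexing retrieves the matching values
print(m[rows, cols])  # [59 60 61]
```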

For brevity we have left out a lot of details about numpy array indexing; if you want to know more you should read the documentation.

## Exercise

Get all items greater than 7 from the array "a".

In [None]:
a = np.array([2, 6, 1, 9, 10, 3, 27])

# Code here!

## Exercise

From these 2 numpy arrays, extract the indices in which the elements in the 2 arrays match. What function are you going to use?

In [None]:
a = np.array([1,2,3,4,5])
b = np.array([1,3,2,4,5])

# Code here!


### Reshaping and transposing arrays
#### Transposing

Apart from computing mathematical functions using arrays, we frequently need to reshape or otherwise manipulate data in arrays. The simplest example of this type of operation is transposing a matrix; to transpose a matrix, simply use the T attribute of an array object.

![](files/numpy-transpose.png)

In [None]:
x = np.array([[1, 2], [3, 4], [5, 6]])

print(x)
print("Transposed array:\n", x.T)

In [None]:
v = np.array([[1,2,3]])
print(v)
print("Transposed array:\n", v.T)
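
This is why the previous cell builds `v` with double brackets: transposing swaps axes, and a 1-D array has only one axis, so `.T` leaves it unchanged. The sketch below compares shapes and turns a 1-D array into a column with `reshape(-1, 1)` instead.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])
print(x.shape, x.T.shape)    # (3, 2) (2, 3)

# Transposing a 1-D array has no effect: there is only one axis
v1 = np.array([1, 2, 3])
print(v1.shape, v1.T.shape)  # (3,) (3,)

# Add the second axis explicitly to get a column
column = v1.reshape(-1, 1)
print(column.shape)          # (3, 1)
```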

#### Reshaping

In more advanced use cases, you may need to change the dimensions of a matrix. This is often the case in machine learning, where a model expects its inputs in a shape that differs from the one your dataset has. NumPy's `reshape()` method is useful in these cases.

In [None]:
sep = "\n-------------------"
data = np.array([1,2,3,4,5,6])
print("data :\n", data, sep)
print("data.reshape(2,3) :\n", data.reshape(2,3), sep)
print("data.reshape(3,2) :\n", data.reshape(3,2), sep)

![](files/numpy-reshape.png)

For example, suppose we had this 2-D array, but we need to pass it to a function that expects a 1-D array.

In [None]:
w = np.array([[1],[2],[3]])
print(w)
w.shape

When using the `.reshape()` method, we can pass `-1` for one of the dimensions to tell NumPy that its size is unknown (we don't care which size it ends up being). NumPy infers it from the total number of elements and the other dimensions. This is handy for removing the “unnecessary” extra dimension:

In [None]:
w = np.array([[1],[2],[3]])
y = w.reshape(-1,)
print(y)
y.shape

In [None]:
sep = "\n-------------------"
w = np.array([[[1,2]],[[3,4]],[[5,6]]])
print("w : \n", w, sep) # 3-D array of shape (3, 1, 2)
print("w.reshape(-1): \n", w.reshape(-1), sep)
print("w.reshape(1,-1): \n", w.reshape(1,-1), sep)
print("w.reshape(2,-1): \n", w.reshape(2,-1), sep)
print("w.reshape(3,-1): \n", w.reshape(3,-1), sep)
print("w.reshape(-1,1): \n", w.reshape(-1, 1), sep)
print("w.reshape(-1,2): \n", w.reshape(-1, 2), sep)
print("w.reshape(-1,3): \n", w.reshape(-1, 3), sep)
print("w.reshape(1, 3, -1, 2): \n", w.reshape(1, 3, -1, 2), sep)
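
A reshape only works when the product of the new dimensions equals the number of elements, and only one dimension may be `-1`. The sketch below tries a few shapes on a 6-element array (the list of shapes is arbitrary) and prints the `ValueError` raised for the invalid ones.

```python
import numpy as np

data = np.arange(6)  # 6 elements: [0 1 2 3 4 5]

# A -1 entry is inferred from the others, e.g. (3, -1) becomes (3, 2) since 6 / 3 = 2
for shape in [(2, 3), (3, -1), (1, 2, -1), (4, -1), (-1, -1)]:
    try:
        print(shape, "->", data.reshape(shape).shape)
    except ValueError as err:
        # (4, -1): 6 is not divisible by 4; (-1, -1): only one -1 is allowed
        print(shape, "-> ValueError:", err)
```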

There’s also a `squeeze()` function that removes *all* of the “unnecessary” dimensions (dimensions that have size 1) from an array:

In [None]:
w = np.array([[1],[2],[3]])
print(w.squeeze())
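
`squeeze()` removes every length-1 axis, wherever it sits in the shape; its `axis` argument restricts it to one specific length-1 axis. A short sketch with shapes only:

```python
import numpy as np

w = np.array([[1], [2], [3]])
print(w.shape, "->", w.squeeze().shape)  # (3, 1) -> (3,)

# Every axis of length 1 is removed
u = np.zeros((1, 3, 1, 2))
print(u.shape, "->", u.squeeze().shape)  # (1, 3, 1, 2) -> (3, 2)

# Only the first axis is removed here
print(u.squeeze(axis=0).shape)           # (3, 1, 2)
```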

## Exercise

Use the method `.reshape()` and convert the variable "a", which is a 1-D array, to a 3-D array.

In [None]:
a = np.array([x for x in range(27)])

# Code here!

### Datatypes

Every numpy array is a grid of elements of the same type. NumPy provides a large set of numeric datatypes that you can use to construct arrays. NumPy tries to guess a datatype when you create an array, but most functions that construct arrays also accept an optional `dtype` argument to explicitly specify the datatype. Here is an example:

In [None]:
x = np.array([1, 2])  # Let numpy choose the datatype
y = np.array([1.0, 2.0])  # Let numpy choose the datatype
z = np.array([1, 2], dtype=np.int64)  # Force a particular datatype

print(x.dtype, y.dtype, z.dtype)
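
The same `dtype` argument works with the construction functions seen earlier, such as `np.zeros()`, `np.ones()` and `np.arange()`. This sketch also prints `itemsize`, the number of bytes used by each element.

```python
import numpy as np

zeros_int = np.zeros(3, dtype=np.int32)
ones_bool = np.ones(3, dtype=bool)
steps = np.arange(0, 1, 0.25, dtype=np.float32)

print(zeros_int, zeros_int.dtype)  # [0 0 0] int32
print(ones_bool, ones_bool.dtype)  # [ True  True  True] bool
print(steps.dtype)                 # float32

# Bytes per element: 4 for both int32 and float32
print(zeros_int.itemsize, steps.itemsize)  # 4 4
```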

You can also cast an array to a different datatype using the `.astype()` method.

In [None]:
x = np.array([1, 2])
print(x)
x = x.astype(np.float64)
print(x)
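
Be aware that casting floats to integers truncates toward zero instead of rounding. If rounding is what you want, call `np.round()` first. Note that it rounds halves to the nearest even value, so `0.5` becomes `0`.

```python
import numpy as np

floats = np.array([2.7, -2.7, 0.5])

# The fractional part is simply dropped
truncated = floats.astype(np.int64)
print(truncated)  # [ 2 -2  0]

# Round first, then cast
rounded = np.round(floats).astype(np.int64)
print(rounded)    # [ 3 -3  0]
```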

You can read all about numpy datatypes in the [documentation](https://numpy.org/doc/stable/reference/arrays.dtypes.html).

## Exercise (easy)

Convert all the elements of a numpy array from float to integer datatype.

In [None]:
a = np.array([[2.5, 3.8, 1.5],
              [4.7, 2.9, 1.56]])

# Code here!


## Exercise

Convert a binary numpy array (containing only 0s and 1s) to a boolean numpy array.

In [None]:
a = np.array([[1, 0, 0],
              [1, 1, 1],
              [0, 0, 0]])

# Code here!

## Exercise

Generate a sequence of 12 evenly spaced numbers from 0 to 1342. Then convert it to integers.

In [None]:
# Code here!


### Array math

What makes working with `numpy` so powerful and convenient is that it comes with many *vectorized* math functions for computation over elements of an array. These functions are highly optimized and are *very* fast - much, much faster than using an explicit `for` loop.

For example, let’s create a large array of random values and then sum it both ways. We’ll use a `%%time` *cell magic* to time them.

In [None]:
a = np.random.random(100_000_000)

In [None]:
%%time
x = np.sum(a)
print(x)

In [None]:
%%time
x = 0
for element in a:
    x = x + element
print(x)

Look at the “Wall Time” in the output - note how much faster the vectorized version of the operation is! This type of fast computation is a major enabler of machine learning, which requires a *lot* of computation.

Whenever possible, we will try to use these vectorized operations.
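
Vectorization is not limited to reductions like `np.sum()`. Whole element-wise expressions can replace explicit loops. This sketch computes the same result both ways on small random arrays and checks that they agree. It uses `np.random.default_rng()`, which requires NumPy 1.17 or later. The seed and array size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000)
y = rng.random(1_000)

# Loop version: one Python-level operation per element
loop_result = np.empty_like(x)
for i in range(len(x)):
    loop_result[i] = (x[i] ** 2 + y[i] ** 2) ** 0.5

# Vectorized version: the whole expression is applied element-wise at once
vectorized_result = np.sqrt(x ** 2 + y ** 2)

print(np.allclose(loop_result, vectorized_result))  # True
```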

## Exercise

- What is the average value of the second column?
- What is the average value of the first 5 rows of the third and fourth columns?

In [None]:
np.random.seed(21) # This guarantees the code will generate the same set of random numbers whenever executed
random_integers = np.random.randint(1, high=500000, size=(20, 5))
random_integers

In [None]:
# Code here!


In [None]:
# Code here!


# Let's go deeper

### Stacking arrays

Sometimes we may want to construct an array from existing arrays by “stacking” them, either vertically or horizontally. We can use `np.vstack()` (or its alias `np.row_stack()`, deprecated since NumPy 2.0) for vertical stacking, and `np.hstack()` or `np.column_stack()` for horizontal stacking.

#### Vertical arrays
##### 1-D arrays

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.vstack((a,b))

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.row_stack((a,b)) # does the same thing as np.vstack(), it's an alias!

##### 2-D arrays (and more)

In [None]:
# Create two 2-D arrays
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

np.vstack((a, b))

#### Horizontal arrays
##### 1-D arrays

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.hstack((a, b)) # Result is a 1-D array

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.column_stack((a, b)) # not an alias of hstack()!
# Stacks 1-D arrays as columns into a 2-D array.

In [None]:
# To achieve the same result as the previous cell with np.hstack(),
# we would need to write the input arrays as follows:
a = np.array([[1], [2], [3]])
b = np.array([[4], [5], [6]])
np.hstack((a,b))

##### 2-D arrays (and more)

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

np.hstack((a, b))

In [None]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

np.column_stack((a, b)) # Behaves the same way as hstack() for arrays of rank 2 or more

In summary, ``np.hstack()`` is typically used for horizontal stacking of 2-D arrays along their existing columns, while ``np.column_stack()`` is often used for creating a new 2-D array by stacking 1-D arrays as columns. However, when you pass 2-D arrays as input to ``np.column_stack()``, it behaves similarly to ``np.hstack()``.
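
One way to see how these three functions relate is to rebuild their results with `np.concatenate()`, which joins arrays along an existing axis. The sketch below assumes 1-D inputs and mirrors the shape promotion each function performs; it illustrates the behavior rather than reproducing NumPy's internal code.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# vstack: each 1-D input becomes a row of shape (1, 3), then rows are joined along axis 0
v = np.vstack((a, b))
v_manual = np.concatenate((a.reshape(1, -1), b.reshape(1, -1)), axis=0)

# hstack on 1-D inputs: join along the only axis (axis 0)
h = np.hstack((a, b))
h_manual = np.concatenate((a, b), axis=0)

# column_stack: each 1-D input becomes a column of shape (3, 1), then columns are joined along axis 1
c = np.column_stack((a, b))
c_manual = np.concatenate((a.reshape(-1, 1), b.reshape(-1, 1)), axis=1)

print(v.shape, h.shape, c.shape)  # (2, 3) (6,) (3, 2)
print(np.array_equal(v, v_manual), np.array_equal(h, h_manual), np.array_equal(c, c_manual))  # True True True
```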

## Exercise

Use the correct function to stack the 'y' and 'z' arrays so that it outputs this array:
```python
[[[ 1,  2],
  [ 3,  4],
  [ 5,  6],
  [ 7,  8]],

 [[ 9, 10],
  [11, 12],
  [13, 14],
  [15, 16]]]
```



In [None]:
y = np.array([[[1, 2], [3, 4]], [[9, 10],  [11, 12]]])
z = np.array([[[5, 6], [7, 8]], [[13, 14], [15, 16]]])

# Code here!


## Functions and operator overloads

Some mathematical functions are available both as operator overloads and as functions in the numpy module.

For example, you can perform an elementwise sum on two arrays using either the `+` operator or the `add()` function.

![](files/numpy-arrays-adding-1.png)

![](files/numpy-matrix-arithmetic.png)

In [None]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array [[ 6.  8.] [10. 12.]]
print(x + y)
print(np.add(x, y))

And this works for other operations as well, not only addition:

![](files/numpy-array-subtract-multiply-divide.png)

In [None]:
# Elementwise difference; both behave the same way
print(x - y)
print(np.subtract(x, y))

In [None]:
# Elementwise product; both behave the same way
print(x * y)
print(np.multiply(x, y))

In [None]:
# Elementwise division; both behave the same way
print(x / y)
print(np.divide(x, y))

In [None]:
# Elementwise square root
print(np.sqrt(x))
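
Most Python arithmetic operators map one-to-one onto a NumPy function when applied to arrays. The sketch below checks a few pairs using Python's standard `operator` module; the list includes `**`, `//` and `%`, which go beyond the four operations shown above.

```python
import operator
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Each Python operator paired with the NumPy function it corresponds to for arrays
pairs = [
    (operator.add, np.add),                # x + y
    (operator.sub, np.subtract),           # x - y
    (operator.mul, np.multiply),           # x * y
    (operator.truediv, np.divide),         # x / y
    (operator.pow, np.power),              # x ** y
    (operator.floordiv, np.floor_divide),  # x // y
    (operator.mod, np.mod),                # x % y
]

for op, func in pairs:
    # Same values whether we use the operator or the function
    assert np.array_equal(op(x, y), func(x, y)), func.__name__
print("all operator/function pairs agree")
```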

The operator `*` is elementwise multiplication, not matrix multiplication. Instead, we use `dot()` to compute inner products of vectors, to multiply a vector by a matrix, and to multiply matrices. `dot()` is available both as a function in the numpy module and as an instance method of array objects. For 1-D and 2-D arrays, the `@` operator does the same thing.
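
To make the difference concrete, the sketch below multiplies the same two 2-D arrays both ways. The `@` operator calls `np.matmul()`; for 1-D and 2-D inputs it gives the same result as `np.dot()`, although the two functions behave differently for arrays with more than two dimensions.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
y = np.array([[5, 6], [7, 8]])

elementwise = x * y   # multiplies matching entries: values [[5, 12], [21, 32]]
matrix = x @ y        # row-by-column sums of products: values [[19, 22], [43, 50]]

print(elementwise)
print(matrix)

# For 2-D arrays, @, np.matmul() and np.dot() agree
print(np.array_equal(matrix, np.matmul(x, y)), np.array_equal(matrix, np.dot(x, y)))  # True True
```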

![](files/numpy-matrix-dot-product-1.png)

In [None]:
a = np.array([1, 2, 3])
b = np.array([[1, 10], [100, 1_000], [10_000, 100_000]])
print(a.dot(b))
print(a @ b)
# print(a * b) would raise a ValueError: shapes (3,) and (3, 2) cannot be broadcast together

In [None]:
v = np.array([9, 10])
w = np.array([11, 12])

# Inner product of vectors; all three produce 219
print(v.dot(w)) # Method 1
print(v @ w) # Method 2
print(np.dot(v, w)) # Method 3

In [None]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9, 10])
w = np.array([11, 12])

# Matrix / vector product; all three produce the rank 1 array [29 67]
print(x.dot(v)) # Method 1
print(np.dot(x, v)) # Method 2
print(x @ v) # Method 3

In [None]:
x = np.array([[1,2],[3,4]])
y = np.array([[5,6],[7,8]])

v = np.array([9, 10])
w = np.array([11, 12])

# Matrix / matrix product; all three produce the rank 2 array [[19 22] [43 50]]

print(x.dot(y)) # Method 1
print(np.dot(x, y)) # Method 2
print(x @ y) # Method 3

Besides the functions that overload operators, Numpy also provides many useful functions for performing computations on arrays, such as `min()`, `max()`, `sum()`, and others:
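
These aggregations are also available as methods on the array itself, so `x.max()` is equivalent to `np.max(x)`. The sketch adds `mean()`, which is not shown above, as one more example of the same pattern.

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]])

# Each aggregation exists both as a numpy function and as an array method
print(np.max(x), x.max())    # 6 6
print(np.min(x), x.min())    # 1 1
print(np.sum(x), x.sum())    # 21 21
print(np.mean(x), x.mean())  # 3.5 3.5
```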

![](files/numpy-matrix-aggregation-1.png)

In [None]:
x = np.array([[1, 2], [3, 4], [5, 6]])

print(np.max(x))
print(np.min(x))
print(np.sum(x))

Not only can we aggregate all the values in a matrix using these functions, but we can also aggregate across the rows or columns by using the `axis` parameter:

![](files/numpy-matrix-aggregation-4.png)

In [None]:
x = np.array([[1, 2], [5, 3], [4, 6]])

print(np.max(x, axis=0))  # Compute max of each column; prints "[5 6]"
print(np.max(x, axis=1))  # Compute max of each row; prints "[2 5 6]"
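
A useful way to remember `axis`: it names the dimension that gets collapsed. The sketch below reuses the same `x` with `np.sum()` and prints the resulting shapes. It also uses the `keepdims` parameter, which keeps the reduced dimension with size 1 instead of dropping it; that is handy when the result is later combined with the original array.

```python
import numpy as np

x = np.array([[1, 2], [5, 3], [4, 6]])  # shape (3, 2)

col_sums = np.sum(x, axis=0)  # collapses axis 0 (down the rows): shape (2,)
row_sums = np.sum(x, axis=1)  # collapses axis 1 (across the columns): shape (3,)

# keepdims=True keeps the reduced axis with size 1
row_sums_2d = np.sum(x, axis=1, keepdims=True)  # shape (3, 1)

print(col_sums, col_sums.shape)  # [10 11] (2,)
print(row_sums, row_sums.shape)  # [ 3  8 10] (3,)
print(row_sums_2d.shape)         # (3, 1)
```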

You can find the full list of mathematical functions provided by numpy in the [documentation](http://docs.scipy.org/doc/numpy/reference/routines.math.html).

### Broadcasting

Broadcasting is a powerful mechanism that allows numpy to work with arrays of different shapes when performing arithmetic operations.

For example, in basic linear algebra we can only add two matrices (or perform similar element-wise operations on them) if they have the same shape. In numpy, if we add two arrays that have different but compatible shapes, numpy implicitly “stretches” one of them to match the other so that the operation can be performed.

So these operations work instead of raising an error:

![](files/broadcasting-1.png)

![](files/broadcasting-2.png)

Broadcasting two arrays together follows these rules:

**Rule 1**: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

For example, in the following cell, "a" will be implicitly extended to shape (1,3):

In [None]:
a = np.array([1,2,3])         # has shape (3,): one dimension
b = np.array([[4], [5], [6]]) # has shape (3,1): two dimensions
c = a + b                     # will have shape (3,3) (two dimensions)
print(c)
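
Rule 1 can be made explicit by doing the padding ourselves: reshaping `a` to shape `(1, 3)` before the addition gives exactly the same result, which shows the implicit step numpy performs.

```python
import numpy as np

a = np.array([1, 2, 3])        # shape (3,)
b = np.array([[4], [5], [6]])  # shape (3, 1)

# Rule 1 pads a's shape on the left: (3,) -> (1, 3)
a_padded = a.reshape(1, 3)     # equivalently a[np.newaxis, :]

print(a_padded.shape)                        # (1, 3)
print(np.array_equal(a + b, a_padded + b))  # True
```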

**Rule 2**: If the shapes of the two arrays do not match in some dimension, the array whose size is 1 in that dimension is stretched to match the size of the other array.

For example, in the following cell "a" will be implicitly extended to shape (3,2):

In [None]:
a = np.array([[1],[2],[3]])         # has shape (3,1)
b = np.array([[4,5], [6,7], [8,9]]) # has shape (3,2)
c = a + b                           # will have shape (3,2) 
print(c)
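
`np.broadcast_to()` lets us look at the stretched array that Rule 2 describes. The sketch shows that adding this explicit version gives the same result as letting numpy broadcast. The stretched array returned by `broadcast_to` is a read-only view rather than a copy, which reflects the fact that broadcasting does not need to duplicate data.

```python
import numpy as np

a = np.array([[1], [2], [3]])           # shape (3, 1)
b = np.array([[4, 5], [6, 7], [8, 9]])  # shape (3, 2)

# The stretched version of a implied by Rule 2
a_stretched = np.broadcast_to(a, b.shape)
print(a_stretched)
# [[1 1]
#  [2 2]
#  [3 3]]

print(np.array_equal(a + b, a_stretched + b))  # True

# broadcast_to returns a read-only view, not a copy
print(a_stretched.flags.writeable)  # False
```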

**Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
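
For instance, adding an array of shape `(3,)` to one of shape `(4,)` fails: the sizes 3 and 4 disagree and neither is 1. The sketch catches the resulting `ValueError` so the notebook keeps running; the exact wording of the error message may vary between numpy versions.

```python
import numpy as np

a = np.array([1, 2, 3])     # shape (3,)
b = np.array([1, 2, 3, 4])  # shape (4,)

raised = False
try:
    a + b  # sizes 3 and 4 disagree and neither is 1
except ValueError as err:
    raised = True
    print("ValueError:", err)
```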

For more detail, you can read the explanation from the [documentation](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).

Functions that operate element by element and support broadcasting are known as universal functions (ufuncs). You can find the list of all universal functions in the [documentation](http://docs.scipy.org/doc/numpy/reference/ufuncs.html#available-ufuncs).
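
In code, a universal function is an instance of `np.ufunc`, which makes it easy to check whether a given function is one. The sketch below also reads the `nin` and `nout` attributes, which give the number of inputs and outputs of a ufunc.

```python
import numpy as np

# Universal functions are instances of np.ufunc
print(isinstance(np.add, np.ufunc))   # True
print(isinstance(np.sqrt, np.ufunc))  # True
print(isinstance(np.dot, np.ufunc))   # False: dot is not an element-wise function

# nin / nout: number of inputs and outputs
print(np.add.nin, np.add.nout)    # 2 1
print(np.sqrt.nin, np.sqrt.nout)  # 1 1
```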

Here are a few visual examples involving broadcasting.

![](files/numpy-array-broadcast.png)

Note that two arrays are compatible in a given dimension if they have the same size in that dimension, or if one of them has size 1 in that dimension.
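
The compatibility check and the three broadcasting rules are simple enough to write out by hand. The sketch below defines a hypothetical helper, `broadcast_shape`, that computes the shape of the result of broadcasting two shapes; it only illustrates the rules and is not part of numpy. numpy's own `np.broadcast_shapes()` (available in NumPy 1.20 or later) is used to cross-check it.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: broadcast shape of two shapes, following Rules 1-3."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)  # same size, or b is stretched (Rule 2)
        elif size_a == 1:
            result.append(size_b)  # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"shapes {shape_a} and {shape_b} are not compatible")
    return tuple(result)

print(broadcast_shape((3,), (3, 1)))    # (3, 3)
print(broadcast_shape((3, 1), (3, 2)))  # (3, 2)
print(np.broadcast_shapes((3,), (3, 1)), np.broadcast_shapes((3, 1), (3, 2)))  # (3, 3) (3, 2)
```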

![](files/numpy-matrix-broadcast.png)

And here are some more practical applications:

In [None]:
# Compute outer product of vectors
v = np.array([1,2,3])  # v has shape (3,)
w = np.array([4,5])    # w has shape (2,)
# To compute an outer product, we first reshape v to be a column
# vector of shape (3, 1); we can then broadcast it against w to yield
# an output of shape (3, 2), which is the outer product of v and w:

print(np.reshape(v, (3, 1)) * w)
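
The same outer product can be written in a few other ways. `v[:, np.newaxis]` is a common shorthand for the reshape to `(3, 1)`; numpy also provides `np.outer()` and the `outer()` method available on ufuncs such as `np.multiply`. The sketch checks that all three agree.

```python
import numpy as np

v = np.array([1, 2, 3])  # shape (3,)
w = np.array([4, 5])     # shape (2,)

via_broadcast = v[:, np.newaxis] * w  # (3, 1) * (2,) -> (3, 2)
via_outer = np.outer(v, w)            # dedicated outer-product function
via_ufunc = np.multiply.outer(v, w)   # outer() method of the multiply ufunc

print(via_broadcast)
# [[ 4  5]
#  [ 8 10]
#  [12 15]]
print(np.array_equal(via_broadcast, via_outer), np.array_equal(via_broadcast, via_ufunc))  # True True
```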

In [None]:
# Add a vector to each row of a matrix
x = np.array([[1,2,3], [4,5,6]])
# x has shape (2, 3) and v has shape (3,) so they broadcast to (2, 3),
# giving the following matrix:

print(x + v)

In [None]:
# Add a vector to each column of a matrix
# x has shape (2, 3) and w has shape (2,).
# If we transpose x then it has shape (3, 2) and can be broadcasted
# against w to yield a result of shape (3, 2); transposing this result
# yields the final result of shape (2, 3) which is the matrix x with
# the vector w added to each column. Gives the following matrix:

print((x.T + w).T)

In [None]:
# Another solution is to reshape w to be a column vector of shape (2, 1);
# we can then broadcast it directly against x to produce the same
# output.
print(x + np.reshape(w, (2, 1)))
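
It may be tempting to write `x + w` directly. The sketch below shows why that fails under the broadcasting rules, and uses `w[:, np.newaxis]`, an indexing shorthand equivalent to `np.reshape(w, (2, 1))`.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
w = np.array([4, 5])                  # shape (2,)

# Adding w directly fails: w is padded to (1, 2), and sizes 3 and 2 are incompatible
try:
    x + w
    direct_ok = True
except ValueError:
    direct_ok = False
print(direct_ok)  # False

# As a column vector of shape (2, 1), w lines up with the rows of x
print(x + w[:, np.newaxis])
# [[ 5  6  7]
#  [ 9 10 11]]
```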

In [None]:
# Multiply a matrix by a constant:
# x has shape (2, 3). Numpy treats scalars as arrays of shape ();
# these can be broadcast together to shape (2, 3), producing the
# following array:
print(x * 2)

Broadcasting typically makes your code more concise and faster, so you should strive to use it where possible.
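
As a more realistic example, the sketch below standardizes each column of a data matrix (subtract the column mean, then divide by the column standard deviation), first with an explicit Python loop and then with broadcasting. The random data and its parameters are arbitrary and only there for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=3.0, size=(100, 4))  # 100 samples, 4 features

# Loop version: standardize one column at a time
looped = np.empty_like(data)
for j in range(data.shape[1]):
    column = data[:, j]
    looped[:, j] = (column - column.mean()) / column.std()

# Broadcast version: shape (100, 4) combined with shape (4,) column statistics
broadcast = (data - data.mean(axis=0)) / data.std(axis=0)

print(np.allclose(looped, broadcast))            # True
print(np.allclose(broadcast.mean(axis=0), 0.0))  # True
print(np.allclose(broadcast.std(axis=0), 1.0))   # True
```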

This brief overview has touched on many of the important things that you need to know about numpy, but is far from complete. Check out the [numpy reference](http://docs.scipy.org/doc/numpy/reference/) to find out much more about numpy.