# NumPy Introduction

- **NumPy** (short for **Numerical Python**) is a widely used package for scientific computing in Python.

- It provides a multidimensional array object called `ndarray`, along with functions and methods for computations such as linear algebra, statistics, and more.

- Many other Python packages, such as pandas, matplotlib, and scikit-learn, are built on top of NumPy.

- To get started, install NumPy by running `pip install numpy` from the command line.

- In your code, import NumPy as: 

```python
import numpy as np
```

- Here's the link to the documentation: [NumPy documentation](https://numpy.org/doc/stable/index.html)

In [2]:
import numpy as np

## NumPy's `ndarray`

- NumPy provides the `ndarray` object, a multidimensional array that stores homogeneous data (all elements share one type).

- Most array operations run in precompiled C code rather than in the Python interpreter, which makes them much faster than equivalent Python loops.

- But still, why NumPy when we already have `lists` in Python?

### Why NumPy when we have lists?

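- NumPy arrays are generally much faster than lists. But why?

- One reason is that NumPy arrays store homogeneous data: every element has the same type and size, so no per-element type information is needed.
    
- Consider storing the element `5` in a list. The list holds a pointer to Python's built-in `int` object, which stores the following information about the item: reference count, object type, size, and the value itself.

    - On a typical 64-bit CPython build, the reference count, type pointer, and size field take `8B` each, and a small value takes `4B`, so `sys.getsizeof(5)` reports `28B` (plus the `8B` pointer kept in the list).

- Storing the same `5` in a NumPy array with the `int32` datatype takes only `4B`. (`int32` is the default integer type on Windows with NumPy 1.x; most other setups default to `int64`, which takes `8B`.)

- Because each element uses less memory, less data has to move through memory and more of it fits in the CPU cache, so computations are faster.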

- Another noteworthy point is that lists are dynamic (they can grow and shrink), whereas the size of a NumPy array is fixed when it is created.
    
    - NumPy arrays store elements in **contiguous** memory locations, so accessing them is simple and fast.
    
    - Lists store their items in non-contiguous memory locations; the list itself holds only pointers to those locations. So, data access is comparatively slower.
    
- Vectorized operations on NumPy arrays can use **SIMD** (single instruction, multiple data) instructions, which process several elements at once.

- NumPy also uses the CPU **cache** effectively, because contiguous data can be read in sequential chunks.

- Finally, many mathematical operations are much easier to express with NumPy than with lists.
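The memory and layout points above can be checked with a small sketch. It assumes a 64-bit CPython build; exact byte counts for Python objects vary between interpreter versions, so only relative sizes are compared. The names `values`, `arr` and `grid` are illustrative.

```python
import sys

import numpy as np

# 1000 values outside CPython's small-int cache (-5..256)
values = list(range(1000, 2000))

# List: one pointer per item plus a separate int object for every item
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# int32 array: the raw 4-byte values stored back to back
arr = np.array(values, dtype=np.int32)
print(arr.itemsize)              # bytes per element: 4
print(arr.nbytes)                # bytes for all elements: 4000
print(list_bytes > arr.nbytes)   # True

# Contiguous, row-major (C-order) layout; strides = bytes to skip per axis
grid = np.arange(6, dtype=np.int64).reshape(2, 3)
print(grid.flags['C_CONTIGUOUS'])  # True
print(grid.strides)                # (24, 8): next row 24 bytes away, next column 8

# One vectorized expression replaces an explicit Python loop
doubled_loop = [v * 2 for v in values]
doubled_vec = arr * 2
print(doubled_vec.tolist() == doubled_loop)  # True
```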

## NumPy arrays

- The array is the central data structure of the NumPy library. 

- It is a grid of elements that can be indexed in various ways. 

- The elements are all of the same type, referred to as the array `dtype`.

- To construct an array, we use the `np.array()` function. One way to construct an array is from a list.

In [3]:
# Create a simple NumPy array
a = np.array([10, 20, 30])
print(a)
print(type(a))

[10 20 30]
<class 'numpy.ndarray'>


- In general, we refer to the array as an **`ndarray`** or an N-dimensional array.

- In NumPy, the dimensions are called **axes**.

- The number of items across each dimension is specified by the **`shape`**.

    - The `shape` is a tuple of `N` non-negative integers, each giving the size of the array along one dimension.
    
- The number of dimensions is the **`rank`** of the array (available as `ndim`).
    
- The type of the elements is specified by **`dtype`**.
    
- Consider this 2D array:

*(Figure: a 2D `int32` array with 3 rows and 4 columns.)*

1. It has 2 axes, so its `rank` is 2.

2. Its `shape` is `(3, 4)`: 3 is the size along the first axis (rows), and 4 along the second axis (columns).

3. Its `dtype` is `int32`, a 32-bit integer. This is the default integer type on Windows with NumPy 1.x; most other platforms default to `int64`.
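A quick way to check these properties is to build a comparable array. This sketch assumes the values `0` to `11`, since the figure's exact contents are not reproduced here, and passes `dtype` explicitly so the result does not depend on the platform default.

```python
import numpy as np

# A 3 x 4 array of 32-bit integers (values 0..11 chosen for illustration)
grid = np.arange(12, dtype=np.int32).reshape(3, 4)

print(grid.ndim)    # 2 -> two axes, so the rank is 2
print(grid.shape)   # (3, 4) -> 3 rows, 4 columns
print(grid.dtype)   # int32

# Without an explicit dtype, the integer type depends on platform and NumPy version
print(np.array([1, 2, 3]).dtype)  # int32 or int64
```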

- The following properties are helpful:

    - `ndarray.shape` returns the shape of the array.
    
    - `ndarray.ndim` returns the rank of the array.
    
    - `ndarray.dtype` returns the datatype of the elements of the array.
    
    - `ndarray.size` returns the number of elements in the array.

In [13]:
# Utility function: prints an array followed by its basic attributes
def print_nd_array_info(arr):
    print(arr)
    print('Rank: ', arr.ndim)
    print('Shape:', arr.shape)
    print('Dtype:', arr.dtype)
    print('Size: ', arr.size)

In [14]:
# Creating a 1D array
a = np.array([1, 2, 3])
print_nd_array_info(a)

[1 2 3]
Rank:  1
Shape: (3,)
Dtype: int32
Size:  3


In [15]:
# Creating a 2D array
b = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print_nd_array_info(b)

[[1 2 3]
 [4 5 6]]
Rank:  2
Shape: (2, 3)
Dtype: int32
Size:  6


In [16]:
# Creating a 3D array
c = np.array([
    [
        [1, 2],
        [3, 4]
    ],
    [
        [5, 6],
        [7, 8]
    ],
    [
        [9, 10],
        [11, 12]
    ]
])
print_nd_array_info(c)

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]
Rank:  3
Shape: (3, 2, 2)
Dtype: int32
Size:  12


- We can specify the datatype explicitly, by using the `dtype` keyword argument.

In [17]:
# using the dtype keyword argument
a = np.array([1, 2, 3], dtype='int16')
print_nd_array_info(a)

[1 2 3]
Rank:  1
Shape: (3,)
Dtype: int16
Size:  3
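The chosen `dtype` directly controls memory use, and `astype()` converts an existing array to another type by returning a new array. A small sketch; the variable names are illustrative.

```python
import numpy as np

small = np.array([1, 2, 3], dtype='int16')
large = np.array([1, 2, 3], dtype='int64')

# nbytes = size * itemsize
print(small.nbytes)  # 6  (3 elements x 2 bytes)
print(large.nbytes)  # 24 (3 elements x 8 bytes)

# astype() returns a converted copy; the original is unchanged
as_float = small.astype('float32')
print(as_float)        # [1. 2. 3.]
print(as_float.dtype)  # float32
print(small.dtype)     # int16
```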


## `np.zeros()`

- **`np.zeros()`** returns an array filled with zeros.

- Note that the default datatype is `float64`; pass `dtype` to change it.

In [20]:
# Get a 1D array of all zeros
a = np.zeros(5)
print_nd_array_info(a)

[0. 0. 0. 0. 0.]
Rank:  1
Shape: (5,)
Dtype: float64
Size:  5


In [21]:
# 2D array of all zeros
a = np.zeros((2, 3), dtype='int32')
print_nd_array_info(a)

[[0 0 0]
 [0 0 0]]
Rank:  2
Shape: (2, 3)
Dtype: int32
Size:  6
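The `shape` argument can be any tuple, and `np.zeros_like()` builds a zero array that copies the shape and dtype of an existing array. A brief sketch with an illustrative `template` array:

```python
import numpy as np

# A 3D array of zeros: 2 blocks, each 3 rows x 4 columns
cube = np.zeros((2, 3, 4))
print(cube.shape)  # (2, 3, 4)
print(cube.dtype)  # float64 (the default)

# zeros_like copies shape and dtype from another array
template = np.array([[1, 2], [3, 4]], dtype='int16')
z = np.zeros_like(template)
print(z)
# [[0 0]
#  [0 0]]
print(z.dtype)  # int16
```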


## `np.ones()`

- **`np.ones()`** returns an array filled with ones.

- Note that the default datatype is `float64`; pass `dtype` to change it.

In [22]:
# Get a 1D array of all ones
a = np.ones(5)
print_nd_array_info(a)

[1. 1. 1. 1. 1.]
Rank:  1
Shape: (5,)
Dtype: float64
Size:  5


In [23]:
# 2D array of all ones
a = np.ones((2, 3), dtype='int32')
print_nd_array_info(a)

[[1 1 1]
 [1 1 1]]
Rank:  2
Shape: (2, 3)
Dtype: int32
Size:  6
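Two common uses of `np.ones()` are sketched below: building a boolean mask that starts out all `True`, and scaling a ones array to get a constant array (the same result `np.full()` produces directly).

```python
import numpy as np

# dtype=bool gives an array of True values
mask = np.ones(4, dtype=bool)
print(mask)  # [ True  True  True  True]

# Multiplying a ones array by a scalar gives a constant array
sevens = np.ones((2, 2)) * 7
print(sevens)
# [[7. 7.]
#  [7. 7.]]
print(np.array_equal(sevens, np.full((2, 2), 7.0)))  # True
```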


## `np.full()`

- **`np.full()`** lets you fill an entire array with a given value.

- Syntax is `np.full(shape, fill_value)`. Unless you pass `dtype`, the datatype is inferred from `fill_value`.

In [42]:
a = np.full((4, 3), 100)
print_nd_array_info(a)

[[100 100 100]
 [100 100 100]
 [100 100 100]
 [100 100 100]]
Rank:  2
Shape: (4, 3)
Dtype: int32
Size:  12
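Because `np.full()` infers the datatype from the fill value when `dtype` is omitted, a float fill value gives a float array. `np.full_like()` copies shape and dtype from an existing array. A short sketch with illustrative names:

```python
import numpy as np

halves = np.full((2, 3), 0.5)
print(halves.dtype)  # float64, inferred from the fill value 0.5

small = np.full((2, 3), 7, dtype='int8')
print(small.dtype)   # int8, set explicitly

# full_like: same shape and dtype as 'small', filled with -1
neg = np.full_like(small, -1)
print(neg)
# [[-1 -1 -1]
#  [-1 -1 -1]]
```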


## `np.arange()`

- Returns an array of evenly spaced values within a given range.

- Syntax is: `np.arange(start, stop, step)`

- The output excludes `stop`.

- It can be used like:

    - `np.arange(stop)` where `start` is taken as `0` and `step` is taken as `1`.
    
    - `np.arange(start, stop)` where `step` is taken as `1`.
    
    - `np.arange(start, stop, step)`
    
> **WARNING**
>
> The length of the output might not be numerically stable.
>
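> The actual step used to fill the array is `dtype(start + step) - dtype(start)` rather than `step` itself, which can cause **precision loss**, for example with floating-point values or when `start` is much larger than `step`.
>
> For non-integer steps, `np.linspace()` is usually the better choice.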

In [25]:
# examples of np.arange
a = np.arange(15)
b = np.arange(2, 15)
c = np.arange(2, 15, 3)
print_nd_array_info(a)
print(b)
print(c)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]
Rank:  1
Shape: (15,)
Dtype: int32
Size:  15
[ 2  3  4  5  6  7  8  9 10 11 12 13 14]
[ 2  5  8 11 14]
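The warning above is easy to reproduce with a floating-point step. The length of the result is `ceil((stop - start) / step)`, and rounding makes `(1.3 - 1) / 0.1` slightly larger than 3, so `np.arange(1, 1.3, 0.1)` returns four values even though `stop` is supposed to be excluded. `np.linspace()` avoids this because you specify the number of points directly. A minimal sketch:

```python
import numpy as np

# Float step: rounding error adds an extra value of about 1.3
floats = np.arange(1, 1.3, 0.1)
print(len(floats))        # 4, not the 3 you might expect
print(floats[-1] > 1.29)  # True: the last value is about 1.3

# linspace: ask for exactly 3 points from 1.0 to 1.2 (endpoint included)
points = np.linspace(1, 1.2, num=3)
print(len(points))        # 3
```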


## `np.linspace()`

- Returns an array of evenly spaced numbers over a given interval.

- Syntax is `np.linspace(start, stop, num=50, endpoint=True)`.

- Returns `num` evenly spaced samples over the interval; whether `stop` is included depends on the value of `endpoint`.

- By default (`endpoint=True`), `stop` is included in the result.

In [27]:
# examples of np.linspace()
a = np.linspace(2, 15, num=3)
b = np.linspace(-10, 10, num = 5, endpoint=False)
print_nd_array_info(a)
print(b)

[ 2.   8.5 15. ]
Rank:  1
Shape: (3,)
Dtype: float64
Size:  3
[-10.  -6.  -2.   2.   6.]
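`np.linspace()` can also report the spacing it used: with `retstep=True` it returns a `(samples, step)` tuple. A minimal sketch:

```python
import numpy as np

samples, step = np.linspace(0, 1, num=5, retstep=True)
print(samples)  # [0.   0.25 0.5  0.75 1.  ]
print(step)     # 0.25

# With endpoint=False, the interval is split into num equal parts and stop is dropped
samples_open, step_open = np.linspace(0, 1, num=4, endpoint=False, retstep=True)
print(samples_open)  # [0.   0.25 0.5  0.75]
print(step_open)     # 0.25
```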


## Indexing & Slicing

- We use a syntax similar to the syntax used in lists.

- For the demonstrations below, let's use a 2D array:

*(Figure: the 2D array `a` created in the next cell.)*

- The most common syntax separates the index for each dimension with a comma (`,`).

- For example:

    - 1D array: `[x]`
    
    - 2D array: `[x, y]`
    
    - 3D array: `[x, y, z]`

In [34]:
a = np.linspace(1, 12, num=12, dtype='int32').reshape((3, 4))
print(a)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [35]:
# indexing on a 1D array
b = a.reshape(12)
print(b[0], b[-1])
print(b[2:9])
print(b[2:9:2])
print(b[::-1])

1 12
[3 4 5 6 7 8 9]
[3 5 7 9]
[12 11 10  9  8  7  6  5  4  3  2  1]


In [36]:
# Now let's move to the 2D array `a` from above
# grabbing the first row
a[0, :]

array([1, 2, 3, 4])

In [37]:
# grabbing the last row
a[-1, :]

array([ 9, 10, 11, 12])

In [38]:
# grabbing the 2nd column
a[:, 1]

array([ 2,  6, 10])

In [39]:
# grabbing the cell with value 9
a[2, 0]

9

In [40]:
# slicing the middle 2 columns
a[:, 1:-1]

array([[ 2,  3],
       [ 6,  7],
       [10, 11]])

In [41]:
# slicing the middle row, middle 2 columns
a[1:-1, 1:-1]

array([[6, 7]])
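One difference from lists is worth knowing: basic slices of a NumPy array are **views** that share memory with the original, whereas slicing a list creates a copy. The sketch below rebuilds the same 3 × 4 array so it runs on its own; `.copy()` gives an independent array when you need one.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)

middle = a[1:-1, 1:-1]     # a view of [[6, 7]]
middle[0, 0] = 100         # writes through to 'a'
print(a[1, 1])             # 100

independent = a[:, 1].copy()  # a real copy of the 2nd column
independent[0] = -1
print(a[0, 1])             # 2, unchanged

# Lists behave differently: slicing copies
lst = [1, 2, 3]
part = lst[0:2]
part[0] = 99
print(lst[0])              # 1
```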

- NumPy also supports advanced indexing (for example, integer-array and Boolean indexing); please refer to [this article](https://www.tutorialspoint.com/numpy/numpy_advanced_indexing.htm) and [the docs](https://numpy.org/doc/stable/user/quickstart.html#advanced-indexing-and-index-tricks).
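As a brief taste of advanced indexing, the sketch below uses a Boolean mask and integer arrays as indices on the same 3 × 4 array; unlike basic slices, both return copies.

```python
import numpy as np

a = np.arange(1, 13).reshape(3, 4)

# Boolean mask: select elements where the condition holds (result is 1D)
evens = a[a % 2 == 0]
print(evens)  # [ 2  4  6  8 10 12]

# Integer-array indexing: pick rows 0 and 2
rows = a[[0, 2]]
print(rows)
# [[ 1  2  3  4]
#  [ 9 10 11 12]]

# Paired row and column index arrays pick single cells: (0, 3) and (2, 0)
cells = a[[0, 2], [3, 0]]
print(cells)  # [4 9]
```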

## Sorting a NumPy array

- A commonly used function is **`np.sort()`**, which returns a sorted copy of the array.

- Syntax is `np.sort(array_variable, axis=-1, kind=None, order=None)`.

- By default, sorting is done along the last axis (`-1`).

    - If you pass `axis=None`, the array is flattened before sorting.
    
- The `kind` argument specifies the sorting algorithm to use.

- The `order` argument specifies which field(s) to compare first, in the case of structured arrays.

- Refer to [the documentation](https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort) for more details.

In [44]:
# sorting a 1D array
a = np.array([100, 70, 99, 123, 6181, 123])
b = np.sort(a)
print(a)
print(b)

[ 100   70   99  123 6181  123]
[  70   99  100  123  123 6181]


In [45]:
# sorting a 2D array
a = np.array([
    [3, 2, 1],
    [9, 7, 8],
    [5, 6, 4]
])
print(a)

[[3 2 1]
 [9 7 8]
 [5 6 4]]


In [46]:
# axis=None -> flattens the array before sorting
b = np.sort(a, axis=None)
print(b)

[1 2 3 4 5 6 7 8 9]


In [47]:
# by default, sorting is along the last axis
# for a 2D array, the last axis (axis=1) runs along each row, so each row is sorted independently
b = np.sort(a)
print(b)

[[1 2 3]
 [7 8 9]
 [4 5 6]]


In [49]:
# custom axis
# axis=0 runs down the columns, so each column is sorted independently
b = np.sort(a, axis=0)
print(b)

[[3 2 1]
 [5 6 4]
 [9 7 8]]
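`np.sort()` leaves its input untouched; the `ndarray.sort()` method sorts in place instead and returns `None`. For descending order, a common idiom is to reverse the sorted result along the same axis. A short sketch on the same 3 × 3 array:

```python
import numpy as np

a = np.array([[3, 2, 1],
              [9, 7, 8],
              [5, 6, 4]])

# Descending order: sort ascending, then reverse along the same axis
desc_rows = np.sort(a, axis=1)[:, ::-1]
print(desc_rows)
# [[3 2 1]
#  [9 8 7]
#  [6 5 4]]

# In-place sort: modifies 'a' itself and returns None
result = a.sort(axis=0)
print(result)  # None
print(a)
# [[3 2 1]
#  [5 6 4]
#  [9 7 8]]
```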


In [60]:
# creating a structured array
# define the values
data = [
    ('Hyderabad', 100, 200),
    ('Mumbai', 150, 150),
    ('Delhi', 90, 175)
]
# define the datatype
dtype_ = [('city', 'S20'), ('temparature', 'int32'), ('rainfall', 'float64')]
a = np.array(data, dtype=dtype_)
print_nd_array_info(a)

[(b'Hyderabad', 100, 200.) (b'Mumbai', 150, 150.) (b'Delhi',  90, 175.)]
Rank:  1
Shape: (3,)
Dtype: [('city', 'S20'), ('temparature', '<i4'), ('rainfall', '<f8')]
Size:  3


In [61]:
# default sorting (order=None): records are compared field by field, starting with 'city'
b = np.sort(a)
print(b)

[(b'Delhi',  90, 175.) (b'Hyderabad', 100, 200.) (b'Mumbai', 150, 150.)]


In [63]:
# sorting based on rainfall
b = np.sort(a, order='rainfall')
print(b)

[(b'Mumbai', 150, 150.) (b'Delhi',  90, 175.) (b'Hyderabad', 100, 200.)]


In [62]:
# sorting based on temperature
b = np.sort(a, order='temparature')
print(b)

[(b'Delhi',  90, 175.) (b'Hyderabad', 100, 200.) (b'Mumbai', 150, 150.)]
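`order` also accepts a list of field names, applied left to right as tie-breakers, and a single field can be pulled out by name like a dictionary key. The records below are made-up sample values; the dtype uses `'U10'` (Unicode strings) instead of `'S20'`, so city names print as `str` rather than `b'...'` bytes.

```python
import numpy as np

# Hypothetical sample data, for illustration only
records = np.array(
    [('Pune', 30, 120.0), ('Chennai', 30, 90.0), ('Delhi', 25, 175.0)],
    dtype=[('city', 'U10'), ('temperature', 'int32'), ('rainfall', 'float64')],
)

# Sort by temperature; break ties using rainfall
ordered = np.sort(records, order=['temperature', 'rainfall'])
print(ordered['city'])  # ['Delhi' 'Chennai' 'Pune']

# Access one field as a regular array
print(records['rainfall'])  # [120.  90. 175.]
```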


## Reshaping an array - `np.reshape()` and `arr.reshape()`

- These methods allow you to reshape an existing array.

- The reshaped array **must** have the **same** number of elements as the original array.

- **`arr.reshape(shape, order='C')`** returns an array containing the same data with a new shape.

- It's the same as **`np.reshape(arr, shape, order='C')`**.

In [64]:
a = np.arange(1, 13)
print(a)

[ 1  2  3  4  5  6  7  8  9 10 11 12]


In [65]:
b = np.reshape(a, (4, 3))
print(b)

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]


In [66]:
c = a.reshape((3, 4))
print(c)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [67]:
d = np.reshape(c, (2, 6))
print(d)

[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
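Two details make reshaping easier: one dimension can be given as `-1` so NumPy infers it from the element count, and a shape with the wrong number of elements raises a `ValueError`. A short sketch:

```python
import numpy as np

a = np.arange(1, 13)

# -1 lets NumPy work out the missing dimension: 12 elements / 3 rows = 4 columns
print(a.reshape(3, -1).shape)  # (3, 4)
print(a.reshape(-1, 6).shape)  # (2, 6)

# 12 elements cannot fill a 5 x 2 array
try:
    a.reshape(5, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print('ValueError:', err)
```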


## Further reading

- The **`linalg`** module: [docs](https://numpy.org/devdocs/reference/routines.linalg.html#module-numpy.linalg)

- [freeCodeCamp's article](https://www.freecodecamp.org/news/the-ultimate-guide-to-the-numpy-scientific-computing-library-for-python/)