# Machine Learning Zoomcamp


## 1.7 Introduction to NumPy


Plan:

* Creating arrays
* Multi-dimensional arrays
* Randomly generated arrays
* Element-wise operations
    * Comparison operations
    * Logical operations
* Summarizing operations

## Understanding NumPy: A Simple Introduction

NumPy, short for Numerical Python, is a Python library that provides large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions that operate on them. Importing the module under the alias `np` gives us access to all of its functionality.

One of NumPy's key features is vectorized operations. Instead of looping over the individual elements of an array in Python, we apply an operation to the entire array at once, and NumPy runs the loop in compiled code. This simplifies our code and is typically much faster than an equivalent Python loop.
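To make the contrast concrete, here is a minimal sketch that squares the same values with an explicit loop and with a single vectorized expression. The variable names (`values`, `squared_loop`, `squared_vec`) are illustrative only and do not appear elsewhere in this article.

```python
import numpy as np

values = np.arange(5)  # the integers 0, 1, 2, 3, 4

# Loop version: visit each element one at a time.
squared_loop = []
for v in values:
    squared_loop.append(int(v) ** 2)

# Vectorized version: one expression applied to the whole array.
squared_vec = values ** 2

print(squared_loop)          # [0, 1, 4, 9, 16]
print(squared_vec.tolist())  # [0, 1, 4, 9, 16]
```

Both versions produce the same numbers; the vectorized one is shorter and avoids the per-element Python overhead.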

NumPy also provides numerous functions for creating arrays of different shapes and sizes. For example, `np.array()` creates a new array from a list or a tuple, and its `dtype` parameter specifies the data type of the array elements.
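As a small illustration of the `dtype` parameter, the sketch below builds the same values as integers and as floats. The names `ints`, `floats`, and `from_tuple` are made up for this example.

```python
import numpy as np

# Integers in the input produce an integer array by default.
ints = np.array([1, 2, 3])

# Passing dtype asks NumPy to store the same values as 64-bit floats.
floats = np.array([1, 2, 3], dtype=np.float64)

# A tuple works the same way as a list.
from_tuple = np.array((1, 2, 3))

print(ints.dtype.kind)  # i  (a signed integer; the exact width depends on the platform)
print(floats.dtype)     # float64
print(floats.tolist())  # [1.0, 2.0, 3.0]
```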

In addition to creating arrays, NumPy offers functions for a wide range of mathematical operations. Basic arithmetic, such as addition, subtraction, multiplication, and division, works directly on arrays. NumPy also provides functions for calculating statistics, finding maximum and minimum values, and performing linear algebra operations.

In this article, we'll provide a straightforward explanation of NumPy concepts and how to use them.

### Importing NumPy

Before diving into NumPy's capabilities, we need to import it. By convention, we import NumPy with the alias `np`, which makes its functions easier to reference:

In [1]:
# Import the numpy package under its conventional alias, "np".
import numpy as np

In [2]:
np

<module 'numpy' from 'C:\\Users\\ASUS\\Documents\\machine-learning-zoomcamp-2024\\.venv\\Lib\\site-packages\\numpy\\__init__.py'>

### Creating arrays

Arrays are the building blocks of NumPy. They can be thought of as Python lists with enhanced features, such as a single element type and element-wise arithmetic.

#### Creating Arrays with Zeros, Ones, or Constants

You can create arrays filled with zeros, ones, or any constant using `np.zeros()`, `np.ones()`, and `np.full()`:

In [3]:
# Create an array of zeros whose size is given by the argument.
# np.zeros(size)

np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [4]:
# Fill the array with ones instead of zeros.
# np.ones(size)

np.ones(10)

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [5]:
# Fill the array with some arbitrary value.
# np.full(size, value)

np.full(10, 2.5)

array([2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5, 2.5])
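The outputs above show floats (`0.`, `1.`) because `np.zeros()` and `np.ones()` default to a floating-point type, while `np.full()` takes its type from the fill value. A minimal sketch of controlling this, using illustrative variable names:

```python
import numpy as np

zeros_float = np.zeros(3)            # default dtype is float64
zeros_int = np.zeros(3, dtype=int)   # request integers instead
sevens = np.full(3, 7)               # integer fill value gives an integer array

print(zeros_float.dtype)   # float64
print(zeros_int.tolist())  # [0, 0, 0]
print(sevens.tolist())     # [7, 7, 7]
```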

#### Converting Lists to Arrays

To convert a Python list into a NumPy array, you can use `np.array()`:

In [6]:
# Create an array in numpy by passing in a list.
a = np.array([1, 2, 3, 5, 7, 12])
a

array([ 1,  2,  3,  5,  7, 12])

In [7]:
# Array indexing (zero-based).
# Replace the 3rd element of the array (index 2) with 10.

a[2] = 10

In [8]:
a

array([ 1,  2, 10,  5,  7, 12])
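Reading elements works with the same bracket syntax as assignment. The sketch below recreates the array from the cell above and shows positive indices, negative indices, and a slice; the names `first`, `last`, and `middle` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 10, 5, 7, 12])

first = a[0]      # first element
last = a[-1]      # negative indices count from the end
middle = a[1:4]   # slice: positions 1, 2, and 3 (the end index is excluded)

print(int(first), int(last))  # 1 12
print(middle.tolist())        # [2, 10, 5]
```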

#### Generating Ranges of Numbers

NumPy provides functions for generating arrays of sequential numbers. For example:

In [9]:
# Create an array with values ranging from 0 to 9.
# arange() is similar to Python's range(), except that it returns
# a NumPy array instead of a lazy range object.
np.arange(10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [10]:
# Create an array with values ranging from 3 to 9.
np.arange(3, 10)

array([3, 4, 5, 6, 7, 8, 9])
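Like `range()`, `np.arange()` also accepts a third argument for the step. A short sketch, with illustrative names:

```python
import numpy as np

evens = np.arange(0, 10, 2)       # start at 0, stop before 10, step by 2
countdown = np.arange(5, 0, -1)   # a negative step counts downwards

print(evens.tolist())      # [0, 2, 4, 6, 8]
print(countdown.tolist())  # [5, 4, 3, 2, 1]
```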

#### Creating Arrays with Linear Spacing

`np.linspace()` creates arrays with evenly spaced numbers within a specified range:

In [11]:
# Create an array whose size is given by the third parameter, filled with
# evenly spaced values from the first parameter to the second (both included).
# np.linspace(start_val, end_val, size)

np.linspace(0, 100, 11)

array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.,  80.,  90., 100.])

In [12]:
np.linspace(0, 1, 10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])
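The spacing of 0.111... in the last output follows from dividing the range into `size - 1` intervals: (1 - 0) / (10 - 1) = 1/9. The sketch below checks this with the `retstep` option and shows `endpoint=False` as a way to get steps of exactly 0.1; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values.
points, step = np.linspace(0, 1, 10, retstep=True)
print(len(points))  # 10

# Excluding the end value splits the range into 10 intervals instead of 9.
half_open = np.linspace(0, 1, 10, endpoint=False)
```

Here `step` equals 1/9 up to floating-point rounding, and `half_open` runs from 0.0 to 0.9 in steps of 0.1.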

## Multi-dimensional arrays

NumPy can handle multi-dimensional arrays; two-dimensional arrays are often referred to as matrices. Here are some examples:

In [13]:
# Create a two-dimensional array with 5 rows and 2 columns.
# np.zeros((row, col))

np.zeros((5, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.],
       [0., 0.]])

In [14]:
# Create a 2D array from a python list.
n = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])
n

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
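To check the dimensions of an array, we can inspect its `shape`, `ndim`, and `size` attributes. A minimal sketch that recreates the array from the cell above:

```python
import numpy as np

n = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
])

print(n.shape)  # (3, 3)  -> 3 rows, 3 columns
print(n.ndim)   # 2       -> number of dimensions
print(n.size)   # 9       -> total number of elements
```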

#### Indexing and Slicing Arrays

Like Python lists, NumPy arrays support indexing and slicing. For two-dimensional arrays:

In [15]:
# Access a particular element in the 2D array and replace it with another value.
# n[row_idx, col_idx] = new_value

n[0, 1] = 20

In [16]:
n

array([[ 1, 20,  3],
       [ 4,  5,  6],
       [ 7,  8,  9]])

In [17]:
# Access the entire row in the 2D array and replace it with another row.
# n[row_idx] = [col_0, col_1, col_2]

n[2] = [1, 1, 1]

In [18]:
n

array([[ 1, 20,  3],
       [ 4,  5,  6],
       [ 1,  1,  1]])

In [19]:
# Retrieve a specific column of a 2D array.
# The colon (:) selects all the rows; we need to pass it explicitly
# because the row position can't be left empty.
# n[:, col_idx]

n[:, 2]

array([3, 6, 1])

In [20]:
# Access the third column of a 2D array and replace it with another column.
n[:, 2] = [0, 1, 2]

In [21]:
n

array([[ 1, 20,  0],
       [ 4,  5,  1],
       [ 1,  1,  2]])
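The colon also accepts start and stop positions, so we can slice out a block of rows and columns. The sketch below recreates the final state of `n` from the cells above; `top_rows` and `block` are illustrative names.

```python
import numpy as np

n = np.array([
    [1, 20, 0],
    [4, 5, 1],
    [1, 1, 2]
])

top_rows = n[:2]     # the first two rows, all columns
block = n[:2, 1:]    # rows 0 and 1, columns 1 and 2

print(top_rows.tolist())  # [[1, 20, 0], [4, 5, 1]]
print(block.tolist())     # [[20, 0], [5, 1]]
```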

## Randomly generated arrays

NumPy can create arrays filled with random numbers. To ensure reproducibility, you can set a seed using `np.random.seed()`:

In [22]:
# Using the random module in numpy.
# Generate a 2D array with 5 rows and 2 columns, filled with random values
# drawn uniformly from the interval [0, 1).
# np.random.rand(row, col)

np.random.rand(5, 2)

array([[0.99208673, 0.24555124],
       [0.81069173, 0.08591352],
       [0.36629364, 0.03725779],
       [0.99736131, 0.35428299],
       [0.67444845, 0.82024659]])

In [23]:
# Set the seed of this random generator so that the sequence of random numbers
# produced on my computer and on your computer will be the same. 
# The numbers are random but every time we execute this cell, the results are the 
# same because we fixed the random seed. 

np.random.seed(2)
# Multiply by 100 to get numbers from 0 to 100 instead of 0 to 1. 
100 * np.random.rand(5, 2)

array([[43.59949021,  2.59262318],
       [54.96624779, 43.53223926],
       [42.03678021, 33.0334821 ],
       [20.4648634 , 61.92709664],
       [29.96546737, 26.68272751]])
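The effect of the seed can be checked directly: resetting it replays the same sequence, while continuing without a reset produces new values. A minimal sketch with illustrative variable names:

```python
import numpy as np

np.random.seed(2)
first = np.random.rand(5, 2)

np.random.seed(2)              # reset the seed
second = np.random.rand(5, 2)  # same values as `first`

third = np.random.rand(5, 2)   # no reset: the sequence simply continues

print(np.array_equal(first, second))  # True
print(np.array_equal(second, third))  # False
```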

For random numbers from a normal distribution or integers within a range:

In [24]:
# Draw numbers from the standard normal distribution (mean 0, standard deviation 1)
# using randn(). Unlike rand(), the values are not limited to [0, 1) and can be negative.

np.random.seed(2)
# Multiply by 100 to scale the standard deviation from 1 to 100.
100 * np.random.randn(5, 2)

array([[ -41.67578474,   -5.62668272],
       [-213.61960957,  164.02708084],
       [-179.34355852,  -84.17473657],
       [  50.28814172, -124.52880866],
       [-105.79522189,  -90.90076149]])
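Multiplying `randn()` output changes the spread, and adding a constant shifts the center. As a sketch, the example below aims for a mean of 50 and a standard deviation of 10; these target values and the sample size are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(2)

# mean + std * (standard normal sample)
samples = 50 + 10 * np.random.randn(100_000)

# With this many samples, the empirical statistics land close to the targets.
sample_mean = samples.mean()
sample_std = samples.std()
```

Printing `sample_mean` and `sample_std` should show values close to 50 and 10 respectively, though not exactly equal to them.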

In [25]:
# Generate random integers using randint() by specifying the lowest number (inclusive),
# the highest number (exclusive), and the output shape.
# np.random.randint(low=min, high=max, size=(row, col))

np.random.seed(2)
# Generate a 2D array with 5 rows and 2 columns, filled with random integers
# ranging from 0 to 99.
np.random.randint(low=0, high=100, size=(5, 2))

array([[40, 15],
       [72, 22],
       [43, 82],
       [75,  7],
       [34, 49]], dtype=int32)
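NumPy also offers a newer interface for random numbers, `np.random.default_rng()`, which keeps the seed inside a generator object instead of in global state. This is an alternative to the functions used above, not what this article's cells use, and a generator seeded with 2 will not reproduce the numbers printed above. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)  # generator with its own seed

uniform = rng.random((5, 2))                            # uniform values in [0, 1)
normal = rng.standard_normal((5, 2))                    # standard normal values
integers = rng.integers(low=0, high=100, size=(5, 2))   # integers from 0 to 99

print(uniform.shape, normal.shape, integers.shape)  # (5, 2) (5, 2) (5, 2)
```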

## Array Operations

NumPy excels at performing mathematical operations on arrays efficiently.

### Element-wise Operations

You can perform operations on entire arrays element by element: 

In [26]:
# Create an array with values from 0 to 4.
a = np.arange(5)
a

array([0, 1, 2, 3, 4])

In [27]:
# Add 1 to every element in the array.
# NumPy applies the operation to all elements at once, so no explicit loop is needed.
a + 1

array([1, 2, 3, 4, 5])

In [28]:
# Multiply every element in the array by 2.
a * 2

array([0, 2, 4, 6, 8])

In [29]:
# We can chain arithmetic operations and save the result
# as a new numpy array.
b = (10 + (a * 2)) ** 2 / 100

In [30]:
b

array([1.  , 1.44, 1.96, 2.56, 3.24])

### Element-wise Operations with Two Arrays

You can also perform operations between two arrays of the same shape:

In [31]:
# Combine the two arrays by summing up the values (element-wise operation).
a + b

array([1.  , 2.44, 3.96, 5.56, 7.24])

In [32]:
# We can chain the operations.
a / b + 10

array([10.        , 10.69444444, 11.02040816, 11.171875  , 11.2345679 ])
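The "same shape" requirement matters: adding two one-dimensional arrays of different lengths fails. The sketch below demonstrates this with a hypothetical second array `c` of length 4, which is not part of the article's examples.

```python
import numpy as np

a = np.arange(5)   # 5 elements
c = np.arange(4)   # 4 elements (hypothetical array for this example)

try:
    a + c
    mismatch_raised = False
except ValueError:
    # NumPy cannot pair up the elements of a length-5 and a length-4 array.
    mismatch_raised = True

print(mismatch_raised)  # True
```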

### Comparison Operations

You can perform element-wise comparisons and create boolean arrays:

In [33]:
a

array([0, 1, 2, 3, 4])

In [34]:
# Check if elements in the array are greater than or equal to 2.
a >= 2

array([False, False,  True,  True,  True])

In [35]:
b

array([1.  , 1.44, 1.96, 2.56, 3.24])

In [36]:
# Compare the elements of one array with the elements of another array to see
# if elements in a are greater than the corresponding elements in b.
a > b

array([False, False,  True,  True,  True])

### Selecting Elements Based on Conditions

You can create subarrays based on certain conditions:

In [37]:
# Return all the elements of a for which the condition is true,
# here the elements of a that are greater than the corresponding elements of b.

a[a > b]

array([2, 3, 4])
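Boolean arrays can be combined with the logical operators `&` (and), `|` (or), and `~` (not), which covers the "Logical operations" item from the plan. The sketch below reuses `a` and `b` as defined earlier; the result names are illustrative. Each comparison needs its own parentheses because `&` and `|` bind more tightly than comparison operators.

```python
import numpy as np

a = np.arange(5)
b = (10 + (a * 2)) ** 2 / 100

both = (a > b) & (a % 2 == 0)   # greater than b AND even
either = (a > b) | (a == 0)     # greater than b OR equal to zero
not_greater = ~(a > b)          # NOT greater than b

print(both.tolist())         # [False, False, True, False, True]
print(either.tolist())       # [True, False, True, True, True]
print(not_greater.tolist())  # [True, True, False, False, False]
print(a[both].tolist())      # [2, 4]
```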

### Summarizing Operations

NumPy provides functions for summarizing array data:

In [38]:
a

array([0, 1, 2, 3, 4])

In [39]:
# Return a single number (summarizing operations) instead of 
# an array (element-wise operations and comparison operations)

# Return the smallest number in the array.
a.min()

np.int64(0)

In [40]:
# Return the largest number in the array.
a.max()

np.int64(4)

In [41]:
# Compute and return the sum of all elements in the array.
a.sum()

np.int64(10)

In [42]:
# Compute the average/mean of all elements in the array.
a.mean()

np.float64(2.0)

In [43]:
# Return the (population) standard deviation of all elements in the array.
a.std()

np.float64(1.4142135623730951)

In [44]:
# The summarizing operations also work on 2D arrays.
n

array([[ 1, 20,  0],
       [ 4,  5,  1],
       [ 1,  1,  2]])

In [45]:
n.min()

np.int64(0)

In [46]:
n.max()

np.int64(20)

In [47]:
n.sum()

np.int64(35)
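For 2D arrays, the summarizing methods also accept an `axis` argument to summarize per column or per row instead of over the whole array. A minimal sketch that recreates `n` from above, with illustrative result names:

```python
import numpy as np

n = np.array([
    [1, 20, 0],
    [4, 5, 1],
    [1, 1, 2]
])

col_sums = n.sum(axis=0)   # collapse the rows: one sum per column
row_sums = n.sum(axis=1)   # collapse the columns: one sum per row
row_max = n.max(axis=1)    # largest value in each row

print(col_sums.tolist())  # [6, 26, 3]
print(row_sums.tolist())  # [21, 10, 4]
print(row_max.tolist())   # [20, 5, 2]
```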

## Conclusion

Overall, NumPy is an essential library for any data scientist or programmer working with numerical data in Python. Its efficient array operations and wide range of mathematical functions make it a powerful tool for scientific computing and data analysis.

With the basics covered in this article, you're well on your way to harnessing NumPy's capabilities.

### Further Reading
- [Introduction to NumPy](https://mlbookcamp.com/article/numpy)
- [Machine Learning Bookcamp Numpy Notebook](https://github.com/alexeygrigorev/mlbookcamp-code/blob/master/appendix-c-numpy.ipynb)
- [Numpy Cheatsheet](https://www.datacamp.com/community/blog/python-numpy-cheat-sheet)

### Next

Linear algebra refresher