<a href="https://colab.research.google.com/github/vyavasthita/dsml_learning/blob/master/numpy/3_Vectorization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vectorization

In [None]:
import numpy as np

In [None]:
m1 = np.arange(1, 11)
m1 ** 2

Functions from outside NumPy cannot be applied to a whole array the way NumPy functions can.
For example, the factorial function from the math library accepts a single integer, so it cannot be called directly on an array.
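
As a minimal sketch of this limitation, the snippet below passes a whole array to math.factorial and catches the resulting error. The flag name `failed` is only illustrative.

```python
import numpy as np
from math import factorial

m1 = np.arange(1, 11)

# NumPy operators work element-wise on the whole array.
squares = m1 ** 2

# math.factorial expects a single integer, so passing an array fails.
try:
    factorial(m1)
    failed = False
except TypeError as exc:
    failed = True
    print("factorial(m1) raised TypeError:", exc)
```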

**Vectorising** functions that are not built into NumPy

In [None]:
from math import factorial

# np.vectorize is a convenience wrapper around a Python for loop.
# It does not make the function faster; it only writes the loop for us.
vectorized_factorial = np.vectorize(factorial)
vectorized_factorial(m1)

In [None]:
np.vectorize(factorial)(m1)

In [None]:
def plus_one(value: int):
  return value + 1

In [None]:
np.vectorize(plus_one)(m1)

In [None]:
np.vectorize(plus_one)(np.arange(10, 31, 2))

In [None]:
plus_ones = np.vectorize(plus_one)
plus_ones(np.arange(1, 51))
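
For simple arithmetic such as plus_one, NumPy's own operators already work element-wise, so np.vectorize is not needed. The sketch below compares the two approaches on the same input; the names `looped` and `native` are illustrative.

```python
import numpy as np

def plus_one(value: int):
    return value + 1

m1 = np.arange(1, 11)

# np.vectorize calls plus_one once per element from Python.
looped = np.vectorize(plus_one)(m1)

# The built-in operator does the same work inside NumPy's compiled loop.
native = m1 + 1

print(np.array_equal(looped, native))
```

np.vectorize is mainly useful when the function has no NumPy equivalent, as with factorial above.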

Multiplying a 1D array by another 1D array of the same shape (element-wise)

In [None]:
a = np.arange(5, 11)
b = np.arange(10, 16)
a*b

Multiplying 2D arrays of the same shape (element-wise)

In [None]:
c = np.arange(1, 16).reshape(5, 3)
d = np.arange(16, 31).reshape(5, 3)
c*d

If the shapes differ and cannot be broadcast to a common shape, these operations raise a ValueError.
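
A minimal sketch of what happens with mismatched shapes. The arrays `e` and `row` are hypothetical and added only for illustration. The second half shows one caveat: NumPy broadcasting lets some pairs of different shapes work, for example a (5, 3) array and a (3,) array.

```python
import numpy as np

a = np.arange(5, 11)   # shape (6,)
e = np.arange(1, 6)    # shape (5,), hypothetical array with a different length

# Shapes (6,) and (5,) cannot be broadcast together.
try:
    a * e
    mismatch_failed = False
except ValueError as exc:
    mismatch_failed = True
    print("a * e raised ValueError:", exc)

# Shapes (5, 3) and (3,) are compatible: row is applied to each of the 5 rows.
c = np.arange(1, 16).reshape(5, 3)
row = np.array([10, 100, 1000])
scaled = c * row
print(scaled.shape)
```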

# Matrix multiplication
It works when the number of columns in the first matrix is the same as the number of rows in the second matrix.
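
The shape requirement can be checked directly: the inner dimensions of the two operands must agree. The sketch below uses two hypothetical arrays, `x` and `y`, filled with ones.

```python
import numpy as np

x = np.ones((2, 3))   # 2 rows, 3 columns
y = np.ones((3, 4))   # 3 rows, 4 columns

# Inner dimensions match (3 and 3), so the product has shape (2, 4).
product = np.matmul(x, y)
print(product.shape)

# Swapping the order gives inner dimensions 4 and 2, which do not match.
try:
    np.matmul(y, x)
    order_failed = False
except ValueError as exc:
    order_failed = True
    print("matmul(y, x) raised ValueError:", exc)
```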

There are 3 ways to do this.

1. np.matmul(a, b)
2. np.dot(a, b) - this is more general, as it works with 1D & 1D, array & scalar, array & array
3. a @ b
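
The sketch below illustrates how the three options relate. For 2D arrays they agree, while np.dot additionally accepts a scalar operand and np.matmul does not. The variable names are illustrative.

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 6)
b = np.arange(13, 25).reshape(6, 2)

# For 2D arrays the three spellings give the same result.
same = np.array_equal(np.matmul(a, b), np.dot(a, b)) and np.array_equal(a @ b, np.dot(a, b))
print(same)

# np.dot also accepts a scalar and then behaves like element-wise scaling.
doubled = np.dot(a, 2)

# np.matmul rejects scalar operands.
try:
    np.matmul(a, 2)
    scalar_failed = False
except ValueError as exc:
    scalar_failed = True
    print("matmul(a, 2) raised ValueError:", exc)

# For two 1D arrays np.dot returns the inner product as a single number.
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))
print(inner)
```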

Matrix multiplication is used in image processing, ML models, and LLMs

In [None]:
a = np.arange(1, 13).reshape(2, 6)
a

In [None]:
b = np.arange(13, 25).reshape(6, 2)
b

In [None]:
np.matmul(a, b)

In [None]:
a = np.arange(1, 13).reshape(2, 6)
b = np.arange(13, 25).reshape(6, 2)
np.matmul(a, b)

In [None]:
np.dot(a, b)

In [None]:
a @ b

# Shallow vs Deep Copy

Any element-wise operation creates a new array with its own data, so the result is a deep copy.

For example, given a = np.arange(1, 10), the expression a ** 2 is computed element-wise, so its result is a new array that shares no memory with a.

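A shallow copy (a view) saves memory because it shares the underlying data with the original array instead of duplicating it.

When we want changes to an image to apply to the original data, we work on a shallow copy.

If the same data is to be used by multiple ML models, a shallow copy lets them share it without duplicating it in memory.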

In [None]:
a = np.arange(1, 10)
np.shares_memory(a, a ** 2)  # False: the result of a ** 2 owns its own data

In [None]:
m1 = np.arange(1, 7)
m2 = m1.reshape(2, 3)
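# Comparing id() of indexed elements is unreliable, because each index
# expression creates a temporary scalar object. Check the shared buffer instead.
print(np.shares_memory(m1, m2))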

Use np.copy(arr) or arr.copy() for a deep copy.

Use arr.view() for a shallow copy (there is no np.view() function; view is a method of the array).

In [None]:
m1 = np.arange(1, 7)
m2 = m1.copy()

np.shares_memory(m1, m2)

In [None]:
m3 = m1.view()
np.shares_memory(m1, m3)
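
To make the practical difference concrete, the sketch below writes to a view and to a copy and then inspects the original array. The names `original`, `shallow` and `deep` are illustrative.

```python
import numpy as np

original = np.arange(1, 7)
shallow = original.view()   # shares the data buffer with original
deep = original.copy()      # owns a separate data buffer

# Writing through the view changes the original array as well.
shallow[0] = 100
print(original[0])

# Writing to the copy leaves the original untouched.
deep[1] = -1
print(original[1])
```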

# Array Splitting

Suppose we have our LLMs and a data set containing 1000 rows.

We need to train our models as well as test them.

So we split the data into two parts: the first part is used for training the LLMs, and the second part is used for testing them.
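
A minimal sketch of such a split with np.split. The 4 feature columns and the 80/20 ratio are assumptions chosen for illustration; the text above only fixes the 1000 rows.

```python
import numpy as np

# Hypothetical data set: 1000 rows with 4 feature columns each.
data = np.arange(4000).reshape(1000, 4)

# Assumed 80/20 ratio: rows before index 800 go to training, the rest to testing.
train, test = np.split(data, [800], axis=0)

print(train.shape, test.shape)
```

In practice the rows are usually shuffled before splitting so that both parts are representative of the whole data set.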

In [None]:
arr = np.arange(10, 201, 10)
np.split(arr, 10)

In [None]:
m1 = np.arange(1, 11)
np.split(m1, 5)

Unequal size split

In [None]:
m1 = np.arange(1, 11)
np.split(m1, (2, 5, 8))  # a tuple of indices: the array is cut before positions 2, 5 and 8
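
The sketch below shows the sizes produced by the index-based split. It also shows a related option: np.array_split accepts a section count that does not divide the array evenly, whereas np.split raises an error in that case.

```python
import numpy as np

m1 = np.arange(1, 11)

# Cutting before indices 2, 5 and 8 yields four pieces.
pieces = np.split(m1, (2, 5, 8))
print([len(p) for p in pieces])

# np.split requires equal parts, so 10 elements cannot be split into 3 sections.
try:
    np.split(m1, 3)
    split_failed = False
except ValueError as exc:
    split_failed = True
    print("np.split(m1, 3) raised ValueError:", exc)

# np.array_split allows sections of slightly different sizes instead.
parts = np.array_split(m1, 3)
print([len(p) for p in parts])
```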

2D Array split

In [None]:
m1 = np.arange(1, 17)
m2 = m1.reshape(4, 4)
m2

In [None]:
np.split(m2, 2, axis=0)

In [None]:
np.hsplit(m2, 2)

In [None]:
np.vsplit(m2, 2)
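
As a quick check of how these functions relate, the sketch below compares np.vsplit and np.hsplit with np.split along axis 0 and axis 1. The variable names are illustrative.

```python
import numpy as np

m2 = np.arange(1, 17).reshape(4, 4)

# vsplit cuts between rows, hsplit cuts between columns.
top, bottom = np.vsplit(m2, 2)
left, right = np.hsplit(m2, 2)

# The same pieces via np.split with an explicit axis.
via_axis0 = np.split(m2, 2, axis=0)
via_axis1 = np.split(m2, 2, axis=1)

print(top.shape, left.shape)
```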

# Stacking arrays

In [None]:
a = np.arange(1, 11)
b = np.arange(11, 21)

In [None]:
np.vstack((a, b))

In [None]:
np.vstack((b, a))

In [None]:
np.hstack((a, b))
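
The resulting shapes make the difference between the two functions easier to see. The sketch below is illustrative; the comparison with np.concatenate is an addition, not part of the cells above.

```python
import numpy as np

a = np.arange(1, 11)
b = np.arange(11, 21)

v = np.vstack((a, b))   # each 1D array becomes one row
h = np.hstack((a, b))   # 1D arrays are joined end to end

print(v.shape)
print(h.shape)

# For 1D inputs, hstack gives the same result as np.concatenate.
print(np.array_equal(h, np.concatenate((a, b))))
```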

In [None]:
m1 = np.array(
    [
        [1, 2],
        [3, 4],
        [5, 6],
    ]
)
m1