
In [43]:
import numpy as np

In [44]:
m1 = np.arange(1, 11)
m1 ** 2

array([  1,   4,   9,  16,  25,  36,  49,  64,  81, 100])

Functions from outside NumPy cannot be applied to an array directly.
For example, math.factorial accepts a single integer, so passing it a multi-element array raises a TypeError.

Vectorizing functions that are not NumPy built-ins

In [45]:
from math import factorial

# np.vectorize is a convenience wrapper around a Python for loop;
# it writes the loop for us but does not make the function faster.
vectorized_arr = np.vectorize(factorial)
vectorized_arr(m1)

array([      1,       2,       6,      24,     120,     720,    5040,
         40320,  362880, 3628800])
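
As a rough check of the claim that np.vectorize is only a loop wrapper, the sketch below compares it with an explicit Python loop over the same array. The names vectorized, looped, and same are illustrative and not part of the original notebook.

```python
from math import factorial

import numpy as np

m1 = np.arange(1, 11)

# np.vectorize calls factorial once per element behind the scenes.
vectorized = np.vectorize(factorial)(m1)

# The same result written as an explicit Python loop.
looped = np.array([factorial(int(x)) for x in m1])

# Both approaches should produce identical values.
same = np.array_equal(vectorized, looped)
print(same)
```

Because both versions call the Python function once per element, neither gains the speed of a true NumPy built-in.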

In [46]:
def plus_one(value: int):
  return value + 1

In [47]:
plus_ones = np.vectorize(plus_one)
plus_ones(np.arange(1, 51))

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
       36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51])

Multiplying a 1D array with another 1D array of the same shape

In [48]:
a = np.arange(5, 11)
b = np.arange(10, 16)
a*b

array([ 50,  66,  84, 104, 126, 150])

Multiplying 2d arrays

In [49]:
c = np.arange(1, 16).reshape(5, 3)
d = np.arange(16, 31).reshape(5, 3)
c*d

array([[ 16,  34,  54],
       [ 76, 100, 126],
       [154, 184, 216],
       [250, 286, 324],
       [364, 406, 450]])

If the shapes are different, element-wise operations work only when the shapes are broadcast-compatible; otherwise NumPy raises a ValueError.
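
A minimal sketch of how NumPy handles mismatched shapes in element-wise operations. The arrays row and scaled and the flag mismatch_failed are illustrative names, not from the original notebook.

```python
import numpy as np

c = np.arange(1, 16).reshape(5, 3)

# Shape (3,) is broadcast across each of the 5 rows of c.
row = np.array([10, 20, 30])
scaled = c * row
print(scaled[0])  # first row: [10 40 90]

# Shape (4,) cannot be broadcast against (5, 3), so NumPy raises ValueError.
try:
    c * np.arange(4)
    mismatch_failed = False
except ValueError:
    mismatch_failed = True
print(mismatch_failed)
```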

Matrix multiplication.
It works when the number of columns in the first matrix is the same as the number of rows in the second matrix.

There are 3 ways to do this.

1. np.matmul(a, b)
2. np.dot(a, b) - this is more generic, as it works with 1d & 1d, array & scalar, array & array
3. a @ b

Matrix multiplication is used in image processing, ML models, and LLMs
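
The shape rule and the extra cases that np.dot accepts can be sketched as follows. The variable names product, inner, scaled, and mismatch_failed are illustrative assumptions.

```python
import numpy as np

a = np.arange(1, 13).reshape(2, 6)
b = np.arange(13, 25).reshape(6, 2)

# (2, 6) @ (6, 2): inner dimensions match (6 == 6), result has shape (2, 2).
product = a @ b

# np.dot on two 1D arrays gives their inner product.
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))

# np.dot with a scalar multiplies every element by that scalar.
scaled = np.dot(a, 2)

# (2, 6) @ (2, 6): inner dimensions differ (6 != 2), so this raises ValueError.
try:
    a @ a
    mismatch_failed = False
except ValueError:
    mismatch_failed = True

print(product.shape, inner, mismatch_failed)
```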

In [50]:
a = np.arange(1, 13).reshape(2, 6)
b = np.arange(13, 25).reshape(6, 2)
np.matmul(a, b)

array([[ 413,  434],
       [1061, 1118]])

In [51]:
np.dot(a, b)

array([[ 413,  434],
       [1061, 1118]])

In [52]:
a @ b

array([[ 413,  434],
       [1061, 1118]])

Shallow Vs Deep Copy.

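Any element-wise operation creates a new array with its own data, so the result behaves like a deep copy.

a = np.arange(1, 10)

a ** 2 is an element-wise operation, so its result is a new array that shares no memory with a.

A shallow copy (a view) saves memory because it reuses the original array's data instead of duplicating it.

When we want to change an image in place, we need a shallow copy, because writes through a view modify the original data.

If several ML models need to read the same data, a shallow copy lets them share it without duplicating it in memory.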

In [53]:
np.shares_memory(a, b)

False

In [54]:
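m1 = np.arange(1, 7)
m2 = m1.reshape(2, 3)  # reshape returns a view of m1 here, not a copy
# Caution: m1[0] and m2[0][0] create temporary scalar objects, so matching ids
# do not prove shared memory; np.shares_memory(m1, m2) is the reliable check.
print(id(m1[0]), id(m2[0][0]))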

135852717976272 135852717976272


Use np.copy(arr) or arr.copy() for a deep copy
Use arr.view() for a shallow copy (view is an array method; there is no np.view function)

In [55]:
m1 = np.arange(1, 7)
m2 = m1.copy()

np.shares_memory(m1, m2)

False

In [56]:
m3 = m1.view()
np.shares_memory(m1, m3)

True
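
The practical difference shows up when one of the arrays is modified. This is a minimal sketch; the names deep, shallow, and squared are illustrative.

```python
import numpy as np

m1 = np.arange(1, 7)
deep = m1.copy()     # independent data
shallow = m1.view()  # same underlying data as m1

# Writing through the view changes m1 as well.
shallow[0] = 100

# Writing to the copy leaves m1 untouched.
deep[1] = -1

print(m1)    # [100   2   3   4   5   6]
print(deep)  # [ 1 -1  3  4  5  6]

# An element-wise operation allocates a new array, so no memory is shared.
squared = m1 ** 2
print(np.shares_memory(m1, squared))  # False
```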

Array Splitting

Suppose we have a model (for example, an LLM) and a data set containing 1000 rows.

We need to train the model and also test it.

So we split the data into two parts: the first part is used for training the model, and the second part is used for testing it.
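
A minimal sketch of such a train/test split with np.split. The 1000 x 3 data set, the 80/20 ratio, and the seed are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical data set: 1000 rows, 3 feature columns.
data = np.arange(3000).reshape(1000, 3)

# Shuffle the row order with a seeded generator so the split is reproducible.
rng = np.random.default_rng(0)
shuffled = data[rng.permutation(len(data))]

# Split at row 800: the first 800 rows for training, the remaining 200 for testing.
train, test = np.split(shuffled, [800])
print(train.shape, test.shape)  # (800, 3) (200, 3)
```

Shuffling before splitting is a common precaution so that any ordering in the original rows does not leak into the split.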

In [57]:
m1 = np.arange(1, 11)
np.split(m1, 5)

[array([1, 2]), array([3, 4]), array([5, 6]), array([7, 8]), array([ 9, 10])]

Unequal size split

In [58]:
m1 = np.arange(1, 11)
np.split(m1, (2, 5, 8)) # tuple of indices at which to split

[array([1, 2]), array([3, 4, 5]), array([6, 7, 8]), array([ 9, 10])]
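
When only the number of parts is known and it does not divide the array evenly, np.split raises an error. One alternative, shown here as a sketch, is np.array_split, which allows parts of different sizes. The names parts, sizes, and equal_split_failed are illustrative.

```python
import numpy as np

m1 = np.arange(1, 11)

# 10 elements cannot be divided into 3 equal parts, so np.split raises ValueError.
try:
    np.split(m1, 3)
    equal_split_failed = False
except ValueError:
    equal_split_failed = True

# np.array_split tolerates the remainder and makes the leading parts larger.
parts = np.array_split(m1, 3)
sizes = [len(p) for p in parts]
print(equal_split_failed, sizes)  # True [4, 3, 3]
```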

2D Array split

In [59]:
m1 = np.arange(1, 17)
m2 = m1.reshape(4, 4)
m2
np.split(m2, 2, axis=0)

[array([[1, 2, 3, 4],
        [5, 6, 7, 8]]),
 array([[ 9, 10, 11, 12],
        [13, 14, 15, 16]])]

In [60]:
np.hsplit(m2, 2)

[array([[ 1,  2],
        [ 5,  6],
        [ 9, 10],
        [13, 14]]),
 array([[ 3,  4],
        [ 7,  8],
        [11, 12],
        [15, 16]])]

In [61]:
np.vsplit(m2, 2)

[array([[1, 2, 3, 4],
        [5, 6, 7, 8]]),
 array([[ 9, 10, 11, 12],
        [13, 14, 15, 16]])]
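
For a 2D array, np.hsplit and np.vsplit behave like np.split along axis=1 and axis=0 respectively. The sketch below checks this; the variable names are illustrative.

```python
import numpy as np

m2 = np.arange(1, 17).reshape(4, 4)

# Column-wise split: np.hsplit matches np.split with axis=1.
left, right = np.split(m2, 2, axis=1)
h_left, h_right = np.hsplit(m2, 2)

# Row-wise split: np.vsplit matches np.split with axis=0.
top, bottom = np.split(m2, 2, axis=0)
v_top, v_bottom = np.vsplit(m2, 2)

print(np.array_equal(left, h_left), np.array_equal(top, v_top))  # True True

# The pieces are views, so they share memory with the original array.
shares = np.shares_memory(m2, left)
print(shares)  # True
```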

Stacking arrays

In [62]:
a = np.arange(1, 11)
b = np.arange(11, 21)

In [63]:
np.vstack((a, b))

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

In [64]:
np.hstack((a, b))

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])
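
For 1D inputs like these, the two stacking calls can be related to np.stack and np.concatenate, and np.hsplit reverses np.hstack. This is a small sketch with illustrative variable names.

```python
import numpy as np

a = np.arange(1, 11)
b = np.arange(11, 21)

v = np.vstack((a, b))  # shape (2, 10): each input becomes a row
h = np.hstack((a, b))  # shape (20,): inputs are joined end to end

# For 1D inputs, vstack matches np.stack along axis 0.
same_as_stack = np.array_equal(v, np.stack((a, b), axis=0))

# For 1D inputs, hstack matches np.concatenate.
same_as_concat = np.array_equal(h, np.concatenate((a, b)))

# Splitting the stacked array in half recovers the original inputs.
first, second = np.hsplit(h, 2)
print(v.shape, h.shape, same_as_stack, same_as_concat)
```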

In [65]:
m1 = np.array([
    [1, 2],
    [3, 4],
    [5, 6],
])
m1

array([[1, 2],
       [3, 4],
       [5, 6]])