Given a pair of vectors, we first compare a loop-based and a vectorized (NumPy) implementation of the squared Euclidean distance between them.

In [17]:
import numpy as np

In [2]:
a = [i + 1 for i in range(0, 500)]
b = [i for i in range(0, 500)]
dist_squared = sum([(a_i - b_i)**2 for a_i, b_i in zip(a, b)])
dist_squared

500

In [3]:
a_numpy = np.array(a)
b_numpy = np.array(b)
dist_squared = np.sum(np.square(a_numpy - b_numpy))
dist_squared

np.int64(500)

In [4]:
# using pure python
%timeit dist_squared = sum([(a_i - b_i)**2 for a_i, b_i in zip(a, b)])

39.2 μs ± 1.18 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)


In [5]:
# using numpy
%timeit dist_squared = np.sum(np.square(a_numpy - b_numpy))

5.5 μs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)


Now we want to calculate the Euclidean distance not just between two vectors, but between two sets of vectors, each represented as a matrix. Your implementations should use only NumPy (do not use SciPy, scikit-learn, pandas, etc.).

1. Implement the loop-based code.
2. Implement the vector-based code, without broadcasting.
3. Implement the vector-based code, using broadcasting.
4. Compare the run time of the loop-based and vector-based implementations (questions 1 and 2).
5. Calculate the L2 norm of a matrix with row-based vectors. You cannot use np.linalg.norm.
6. Calculate the L1 norm of a matrix with row-based vectors. You cannot use np.linalg.norm.
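
For reference, the quantities asked for above can be written as follows. The notation is an assumption introduced here, not taken from the exercise: $A \in \mathbb{R}^{m \times d}$ and $B \in \mathbb{R}^{n \times d}$ hold one vector per row, $a_i$ and $b_j$ denote their rows, and $D$ is the resulting $m \times n$ distance matrix. Items 5 and 6 are read here as per-row norms.

$$
D_{ij} = \lVert a_i - b_j \rVert_2 = \sqrt{\sum_{k=1}^{d} (a_{ik} - b_{jk})^2}, \qquad
\lVert a_i \rVert_2 = \sqrt{\sum_{k=1}^{d} a_{ik}^2}, \qquad
\lVert a_i \rVert_1 = \sum_{k=1}^{d} \lvert a_{ik} \rvert
$$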

## 1. Loop-based code

In [20]:
import random
import math

NUM_VECTORS = 500

random.seed(42)

# Euclidean distance between two 2-D points (only the first two components are used)
distance = lambda a, b : math.sqrt((b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2)

a = [[random.random(), random.random()] for _ in range(NUM_VECTORS)]
b = [[random.random(), random.random()] for _ in range(NUM_VECTORS)]

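# %timeit does not keep variables assigned inside the timed statement,
# so the matrix is built once more before inspecting its shape.
%timeit [[distance(a_i, b_j) for b_j in b] for a_i in a]
matrix = [[distance(a_i, b_j) for b_j in b] for a_i in a]
np.array(matrix).shape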

85.5 ms ± 354 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


(500, 500)
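
The `distance` helper above only looks at the first two components of each vector. A minimal sketch of the same loop-based idea for vectors of any length could look like this; the names `euclidean_distance`, `pairwise_distances_loop`, `small_a`, and `small_b` are illustrative assumptions, not part of the original notebook.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length sequences of numbers."""
    return math.sqrt(sum((v_k - u_k) ** 2 for u_k, v_k in zip(u, v)))


def pairwise_distances_loop(a, b):
    """Nested list of shape len(a) x len(b); entry [i][j] is the distance from a[i] to b[j]."""
    return [[euclidean_distance(a_i, b_j) for b_j in b] for a_i in a]


# Tiny hand-checkable input (hypothetical data, chosen so the distances are whole numbers).
small_a = [[0.0, 0.0], [3.0, 4.0]]
small_b = [[0.0, 0.0], [6.0, 8.0]]
small_matrix = pairwise_distances_loop(small_a, small_b)
print(small_matrix)  # [[0.0, 10.0], [5.0, 5.0]]
```

Checking against a small input like this is a cheap way to confirm that rows index the first set and columns index the second before timing the full 500 x 500 case.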

## 2. Vector-based code without broadcast
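
One possible reading of "without broadcast" is to materialise every pair of rows explicitly, so that both operands of the subtraction already have the same shape. The following is a hedged sketch under that assumption, using `np.repeat` and `np.tile`; the function name `pairwise_distances_tiled` and the small inputs are illustrative and not from the original notebook.

```python
import numpy as np


def pairwise_distances_tiled(a, b):
    """Pairwise Euclidean distances using explicit repetition instead of broadcasting.

    a has shape (m, d), b has shape (n, d); the result has shape (m, n).
    """
    m, n = a.shape[0], b.shape[0]
    # Repeat each row of a n times: a_0, a_0, ..., a_1, a_1, ...  -> shape (m * n, d)
    a_rep = np.repeat(a, n, axis=0)
    # Stack m copies of b: b_0, ..., b_{n-1}, b_0, ...            -> shape (m * n, d)
    b_rep = np.tile(b, (m, 1))
    # Both arrays now have identical shapes, so the subtraction needs no broadcasting.
    squared = np.sum((a_rep - b_rep) ** 2, axis=1)
    return np.sqrt(squared).reshape(m, n)


# Tiny hand-checkable input (hypothetical data).
small_a = np.array([[0.0, 0.0], [3.0, 4.0]])
small_b = np.array([[0.0, 0.0], [6.0, 8.0]])
small_dists = pairwise_distances_tiled(small_a, small_b)
print(small_dists)
```

The trade-off of this approach is memory: the repeated arrays each hold m * n rows, which is fine for 500 vectors per set but grows quickly for larger inputs.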

In [34]:
# Euclidean distance between two NumPy vectors of equal length
distance = lambda a_i, b_j: np.sqrt(np.sum((b_j - a_i)**2))

a = [[random.random(), random.random()] for _ in range(NUM_VECTORS)]
b = [[random.random(), random.random()] for _ in range(NUM_VECTORS)]

a_numpy = np.array(a)
b_numpy = np.array(b)

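# Element-wise squared differences between corresponding rows (a_i vs. b_i).
# This covers only the row-aligned case, not yet all pairs (a_i, b_j).
(b_numpy - a_numpy) ** 2

# Pairwise version via Python loops over NumPy rows (left commented out):
# %timeit matrix = [[distance(a_i, b_j) for b_j in b_numpy] for a_i in a_numpy]
# np.array(matrix).shape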

array([[7.42078252e-03, 2.36864032e-02],
       [5.38182553e-01, 3.84742016e-02],
       [1.22109026e-01, 2.96104677e-02],
       [3.14567738e-03, 4.82798779e-03],
       [2.79977994e-03, 3.86046986e-01],
       [1.10033784e-02, 2.69564215e-01],
       [2.86939078e-01, 8.45992879e-04],
       [6.23347152e-02, 1.81210292e-04],
       [5.32949909e-01, 1.78351877e-01],
       [4.15598849e-05, 3.94759335e-01],
       [4.99643772e-02, 3.47323199e-02],
       [8.75826678e-03, 6.63982552e-01],
       [2.79704160e-01, 4.13888411e-01],
       [1.45827333e-01, 2.11493888e-04],
       [7.05657301e-01, 3.58124147e-01],
       [3.04938020e-01, 1.52552356e-01],
       [2.22236688e-01, 6.94702458e-02],
       [6.53203510e-02, 1.39248499e-01],
       [3.18602753e-02, 6.07241447e-02],
       [1.04620417e-03, 4.07455382e-01],
       [5.49324474e-02, 1.18509561e-01],
       [2.95022160e-01, 1.40744285e-02],
       [4.14119404e-01, 4.23503353e-02],
       [7.27195547e-01, 3.93364033e-01],
       [6.462605