# Today's Coding Topics
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xiangshiyin/data-programming-with-python/blob/main/2023-summmer/2023-06-21/notebook/concept_and_code_demo.ipynb)

* Recap of previous lecture
  * NumPy
  * Nearest neighbor search example review
  * Special topic: namespaces in Python
* `Pandas`


# Recap of previous lecture

## NumPy and NumPy Arrays

## Import `numpy`

In [2]:
import numpy as np

## Create NumPy arrays

In [3]:
aList = [1,2,3,4]
aNumpyArray = np.array(aList)

In [4]:
aNumpyArray

array([1, 2, 3, 4])

In [5]:
aNumpyArray.ndim

1

In [6]:
aNumpyArray.shape

(4,)

In [7]:
aNumpyArray = np.array(aList).reshape(2,2)

In [8]:
aNumpyArray.ndim

2

In [9]:
aNumpyArray.shape

(2, 2)

## Operations on NumPy arrays

In [10]:
aNumpyArray.T

array([[1, 3],
       [2, 4]])

In [11]:
a = np.array(aList).reshape(2,2)
b = np.eye(2)

In [12]:
b

array([[1., 0.],
       [0., 1.]])

In [13]:
a.dot(b)

array([[1., 2.],
       [3., 4.]])
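
Because `b` is the identity matrix, `a.dot(b)` returns the entries of `a` unchanged, stored as floats since `np.eye` produces a float array. As a small sketch, the `@` operator and `np.matmul` give the same matrix product for 2-D arrays, while `*` multiplies element by element:

```python
import numpy as np

a = np.array([1, 2, 3, 4]).reshape(2, 2)
b = np.eye(2)  # 2x2 identity matrix (float dtype)

# Three equivalent ways to write the matrix product of 2-D arrays
prod_dot = a.dot(b)
prod_at = a @ b
prod_matmul = np.matmul(a, b)

# Element-wise multiplication is a different operation
elementwise = a * b

print(prod_at)      # same values as a, stored as floats
print(elementwise)  # only the diagonal entries of a remain nonzero
```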

## Generate random numbers with `numpy`

In [14]:
np.random.rand(3)

array([0.32256308, 0.09448646, 0.98135526])

In [15]:
np.random.randn(2,2)

array([[-0.99073533, -1.62638306],
       [ 0.62458928,  0.49296837]])

In [16]:
np.random.randint(low=0, high=10, size=100)

array([0, 8, 0, 5, 5, 5, 5, 6, 0, 8, 1, 4, 9, 5, 7, 3, 5, 8, 3, 7, 6, 1,
       0, 3, 4, 9, 9, 3, 0, 8, 6, 0, 6, 6, 7, 0, 6, 1, 3, 1, 0, 6, 3, 8,
       9, 9, 4, 3, 9, 9, 8, 0, 7, 8, 3, 5, 3, 9, 1, 3, 0, 9, 5, 1, 5, 2,
       0, 9, 8, 4, 3, 0, 1, 9, 1, 9, 5, 2, 0, 3, 1, 6, 2, 8, 7, 6, 5, 1,
       3, 2, 2, 2, 0, 4, 2, 8, 4, 2, 9, 3])
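
The random outputs above change on every run. A minimal sketch of one way to make them repeatable, assuming NumPy 1.17 or later, where `np.random.default_rng` is available. The seed value `42` and the variable names are arbitrary choices for illustration, not part of the lecture code:

```python
import numpy as np

rng = np.random.default_rng(42)       # seed chosen arbitrarily for this sketch
uniform = rng.random(3)               # counterpart of np.random.rand(3): uniform on [0, 1)
normal = rng.standard_normal((2, 2))  # counterpart of np.random.randn(2, 2)
integers = rng.integers(low=0, high=10, size=100)  # counterpart of np.random.randint; high is exclusive

# Re-creating the generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(42)
assert np.array_equal(uniform, rng_again.random(3))
```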

## Example: Nearest neighbor search

The Euclidean distance between two points $(x_1,y_1,z_1)$ and $(x_2,y_2,z_2)$ is:
$$\sqrt{(x_2-x_1)^2+(y_2-y_1)^2+(z_2-z_1)^2}$$
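
As a quick worked instance of the formula, here is a sketch that evaluates it for the point $(4,7,2)$ and the query point $(4,5,3)$, both taken from the example that follows. The names `p` and `q` are used only for this illustration:

```python
import math

p = (4, 7, 2)
q = (4, 5, 3)

# Sum the squared coordinate differences, then take the square root
dist = math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
print(dist)  # sqrt(0 + 4 + 1) = sqrt(5), about 2.236

# math.dist (Python 3.8+) computes the same quantity
assert math.isclose(dist, math.dist(p, q))
```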

In [None]:

# # # Pure iterative Python # # #
points = [[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]]
qPoint = [4,5,3]

minIdx = -1
minDist = float('inf')  # start at infinity so the first real distance always replaces it
for idx, point in enumerate(points):  # iterate over all points
    print('index is {}, point is {}'.format(idx, point))
    dist = sum((dp - dq)**2 for dp, dq in zip(point, qPoint))**0.5  # Euclidean distance from this point to q
    if dist < minDist:  # update the minimum distance and the index of the corresponding point
        minDist = dist
        minIdx = idx

print('Nearest point to q: ', points[minIdx])

In [None]:
# # # Equivalent NumPy vectorization # # #
import numpy as np
points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
qPoint = np.array([4,5,3]).reshape(1,3)
minIdx = np.argmin(np.linalg.norm(points - qPoint, axis=1))  # compute all Euclidean distances at once and return the index of the smallest one
print('Nearest point to q: ', points[minIdx])
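
The one-line vectorized version packs three steps together. The sketch below splits them apart to show the intermediate shapes. The names `diff` and `dists` are introduced here for illustration only:

```python
import numpy as np

points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],
                   [8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
qPoint = np.array([4,5,3]).reshape(1,3)

diff = points - qPoint                # (10, 3) - (1, 3): qPoint is broadcast across all 10 rows
dists = np.linalg.norm(diff, axis=1)  # Euclidean norm of each row, shape (10,)
minIdx = np.argmin(dists)             # index of the smallest distance

print(diff.shape, dists.shape)
print('Nearest point to q: ', points[minIdx], 'at distance', dists[minIdx])
```

Broadcasting is what lets the `(1, 3)` query array be subtracted from every row of the `(10, 3)` array without an explicit loop.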