### Why Useful
- `predict` other attributes of people from their `proximity` data.
    - `example`: people who live in a particular neighborhood may be more likely to be a certain age or share similar interests. Using proximity, we might even be able to estimate whether their likelihood of purchasing a product approximates that of their neighbors.
    

### Hyperparameters - K | Model Performance (error metric, bias, variance)

Scenario: predictions of cupcake purchases change dramatically with the value of `k`, so choose the `k` that gives the most accurate predictions on new data.


#### Train, Test
- Train: for tuning the algorithm, `minimize error metric`
- Test: for `generalization`, measure model performance on `unseen` data
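
A minimal sketch of the train/test idea, assuming the labelled examples sit in a plain Python list. The helper name `split_train_test`, the 25% test fraction, and the fixed seed are illustrative choices, not part of the original notes.

```python
import random

def split_train_test(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of `records` and split it into (train, test) lists."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(8))                   # stand-in for 8 labelled examples
train, test = split_train_test(records)
print(len(train), len(test))
```

The model only ever sees `train` while it is being tuned; `test` is held back until the end so that the reported error reflects unseen data.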

#### High Bias, High Variance
`Both lead to poor prediction accuracy / large error on unseen data (test data)`

##### Underfitting
- model fails to capture the relevant signals/patterns in the data (high bias)
- e.g., `k = large number, with large training error`

##### Overfitting
- model fits the noise in the training data, so it `fails to generalize` to unseen data (high variance)
- e.g., `k = 1, with minimized training error`

#### Error Metric
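- simple self-defined, e.g.: `error = y - y_pred`
- Goal: `choose k with minimized(aggregate(abs(y - y_pred)) of all test data)`; training error alone is minimized at `k = 1`, and signed errors would cancel out without `abs`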
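
As a rough sketch of the scenario above (pick the `k` that predicts new data best), the snippet below compares a few values of `k` on held-out data. The one-dimensional data, the helper names `knn_predict` and `mean_abs_error`, and the choice of mean absolute error as the aggregate are all assumptions made for illustration.

```python
def knn_predict(train, x, k):
    """Average the y values of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_abs_error(train, data, k):
    """Aggregate |y - y_pred| over every (x, y) pair in data."""
    return sum(abs(y - knn_predict(train, x, k)) for x, y in data) / len(data)

# Made-up (x, y) pairs: a rising trend with alternating noise.
train = [(1, 0.5), (2, 3.0), (3, 1.5), (4, 4.0),
         (5, 2.5), (6, 5.0), (7, 3.5), (8, 6.0)]
# Made-up held-out pairs that follow the trend without noise.
test = [(2.4, 2.2), (4.6, 3.3), (6.4, 4.2)]

test_errors = {k: mean_abs_error(train, test, k) for k in (1, 2, 8)}
best_k = min(test_errors, key=test_errors.get)

print(mean_abs_error(train, train, 1))  # training error at k = 1
print(test_errors)                      # held-out error for each candidate k
print(best_k)
```

With this made-up data, `k = 1` has zero training error because every training point is its own nearest neighbor, yet it does worse on the held-out pairs than `k = 2`; `k = 8` averages over everything and misses the trend.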

### Assumptions

`Euclidean distance`
- shortest distance between 2 points
    - straight line: `hypotenuse`
    - `Pythagorean theorem` 
- x, y values: must be on the same scale
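
To see why the same-scale assumption matters, here is a small sketch. The distance between `(x1, y1)` and `(x2, y2)` is `sqrt((x2 - x1)**2 + (y2 - y1)**2)`, so whichever axis has the larger numbers dominates the result. The ages and incomes below are made up, and min-max scaling is just one common way to rescale; neither comes from the original notes.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.sqrt((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def min_max_scale(values):
    """Rescale a list of numbers to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

# Hypothetical people: age in years, income in dollars.
ages = [25, 30, 60]
incomes = [50_000, 51_000, 50_500]

raw = list(zip(ages, incomes))
scaled = list(zip(min_max_scale(ages), min_max_scale(incomes)))

# Who is closer to person 0: person 1 or person 2?
print(euclidean(raw[0], raw[1]), euclidean(raw[0], raw[2]))
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```

On the raw values the income column decides everything, so person 2 looks closer despite the 35-year age gap; after rescaling both columns contribute comparably and person 1 becomes the nearer neighbor.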

### Code Structure

To write a `nearest_neighbors` function, we break this into steps:

1. Write a function to calculate the distance of one neighbor from another
2. Write a function that returns the distance between one neighbor and all others (using `map`)
3. Return a selected number of nearest neighbors

`check intermediate outputs` from time to time to ensure each function is correct
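
One way to check intermediate outputs is to feed each step an input whose answer is known in advance. The sketch below does this with plain `assert` statements; `straight_line_distance` and the sample points are hypothetical stand-ins, not the functions defined later in these notes.

```python
import math

def straight_line_distance(x1, y1, x2, y2):
    """Hypothetical stand-in for step 1: distance between two points."""
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)

# Step 1 check: a 3-4-5 right triangle has a hypotenuse of exactly 5.
assert straight_line_distance(0, 0, 3, 4) == 5.0

# Step 2 check: one distance per other point, in the same order as the input.
others = [(3, 4), (6, 8), (0, 1)]
distances = list(map(lambda p: straight_line_distance(0, 0, *p), others))
assert distances == [5.0, 10.0, 1.0]

# Step 3 check: sorting and slicing keeps only the closest ones.
closest_two = sorted(distances)[:2]
assert closest_two == [1.0, 5.0]
```

If any step is wrong, the failing `assert` points at the stage where the data first stops looking the way it should.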

### Flow of Data
Watch the `flow of data` by checking:
1. inputs of functions, `keep in mind what the original data looks like`
2. outputs of functions, `have in mind what the output should look like`
3. relationship between inputs & outputs of each nested function

In [13]:
neighbors = [{'name': 'Bob', 'x': 4, 'y': 8}, {'name': 'Suzie', 'x': 1, 'y': 11}, 
             {'name': 'Fred', 'x': 5, 'y': 8}, {'name': 'Edgar', 'x': 6, 'y': 13},
             {'name': 'Steven', 'x': 3, 'y': 6}, {'name': 'Natalie', 'x': 5, 'y': 4}]

In [14]:
bob = neighbors[0] # selected_individual
suzie = neighbors[1] # neighbor to test

`import math`, `math.sqrt()`

In [15]:
import math
def distance(selected_individual, neighbor):
    # Euclidean distance: square root of the summed squared x and y differences
    distance_squared = (neighbor['x'] - selected_individual['x'])**2 + (neighbor['y'] - selected_individual['y'])**2
    return math.sqrt(distance_squared)

`new_dict = old_dict.copy()` (shallow copy, leaves the original dict unchanged)

In [16]:
def distance_between_neighbors(selected_individual, neighbor):
    # Copy first so the original neighbor dict is not mutated
    neighbor_with_distance = neighbor.copy()
    neighbor_with_distance['distance'] = distance(selected_individual, neighbor)
    return neighbor_with_distance

In [17]:
# Check 
distance_between_neighbors(bob, suzie)

{'distance': 4.242640687119285, 'name': 'Suzie', 'x': 1, 'y': 11}
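
As a quick hand check of this output, using Bob at (4, 8) and Suzie at (1, 11) from `neighbors`:

$$
d = \sqrt{(1 - 4)^2 + (11 - 8)^2} = \sqrt{9 + 9} = \sqrt{18} = 3\sqrt{2} \approx 4.2426
$$

This agrees with the `distance` value returned by the function.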

`filter()`, `map()`

In [18]:
def distance_all(selected_individual, neighbors):
    # Exclude the selected individual, then attach a distance to everyone else
    remaining_neighbors = filter(lambda neighbor: neighbor != selected_individual, neighbors)
    return list(map(lambda neighbor: distance_between_neighbors(selected_individual, neighbor), remaining_neighbors))

`number or len(neighbors)` (default to all neighbors), `sorted()`

In [19]:
def nearest_neighbors(selected_individual, neighbors, number=None):
    # Default to all neighbors when no number is given (0 also falls back to all)
    number = number or len(neighbors)
    neighbor_distances = distance_all(selected_individual, neighbors)
    # Sort ascending by distance and keep the closest `number` entries
    sorted_neighbors = sorted(neighbor_distances, key=lambda neighbor: neighbor['distance'])
    return sorted_neighbors[:number]

In [20]:
nearest_neighbors(bob, neighbors)

[{'distance': 1.0, 'name': 'Fred', 'x': 5, 'y': 8},
 {'distance': 2.23606797749979, 'name': 'Steven', 'x': 3, 'y': 6},
 {'distance': 4.123105625617661, 'name': 'Natalie', 'x': 5, 'y': 4},
 {'distance': 4.242640687119285, 'name': 'Suzie', 'x': 1, 'y': 11},
 {'distance': 5.385164807134504, 'name': 'Edgar', 'x': 6, 'y': 13}]

In [21]:
nearest_neighbors(bob, neighbors, 2)

[{'distance': 1.0, 'name': 'Fred', 'x': 5, 'y': 8},
 {'distance': 2.23606797749979, 'name': 'Steven', 'x': 3, 'y': 6}]
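
To connect the neighbor lookup back to prediction, here is a minimal sketch that averages an attribute over the `k` closest people. The coordinates match `neighbors` above, but the `cupcakes` values and the helper `predict_attribute` are made up for illustration, and a plain average is only one possible way to combine the neighbors' values.

```python
import math

# Same coordinates as `neighbors`; the 'cupcakes' values are hypothetical.
people = [
    {'name': 'Bob', 'x': 4, 'y': 8},                      # value to predict
    {'name': 'Suzie', 'x': 1, 'y': 11, 'cupcakes': 2},
    {'name': 'Fred', 'x': 5, 'y': 8, 'cupcakes': 6},
    {'name': 'Edgar', 'x': 6, 'y': 13, 'cupcakes': 1},
    {'name': 'Steven', 'x': 3, 'y': 6, 'cupcakes': 4},
    {'name': 'Natalie', 'x': 5, 'y': 4, 'cupcakes': 3},
]

def predict_attribute(selected, people, attribute, k):
    """Average `attribute` over the k people closest to `selected`."""
    others = [p for p in people if p is not selected]
    closest = sorted(
        others,
        key=lambda p: math.hypot(p['x'] - selected['x'], p['y'] - selected['y']),
    )[:k]
    return sum(p[attribute] for p in closest) / len(closest)

bob = people[0]
prediction_k2 = predict_attribute(bob, people, 'cupcakes', 2)  # Fred and Steven
prediction_k5 = predict_attribute(bob, people, 'cupcakes', 5)  # everyone else
print(prediction_k2, prediction_k5)
```

The two predictions differ, which is the same sensitivity to `k` described in the cupcake scenario earlier.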