# K-Nearest Neighbors

## Using NearestNeighbors

In [1]:
using NearestNeighbors
using RDatasets

In [2]:
data = rand(3, 10^4)
k = 3
point = rand(3)

3-element Array{Float64,1}:
 0.18410323414446528
 0.7459431641981347
 0.550775980816407

In [3]:
kdtree = KDTree(data)

KDTree{StaticArrays.SArray{Tuple{3},Float64,1,3},Euclidean,Float64}
  Number of points: 10000
  Dimensions: 3
  Metric: Euclidean(0.0)
  Reordered: true

In [4]:
idxs, dists = knn(kdtree, point, k, true)

([4786, 2931, 5220], [0.01662805959377539, 0.020823085204417193, 0.02968720655791071])
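
To make the query above less of a black box, here is a minimal brute-force sketch of the same search in plain Julia. It assumes, as in `data` above, that each column is one point. The helper name `brute_knn` is hypothetical and not part of NearestNeighbors.jl. For the same `data`, `point`, and `k` it should return the same neighbors as the sorted `knn` call (ties aside), only more slowly.

```julia
using LinearAlgebra  # for norm

# Hypothetical helper (not from NearestNeighbors.jl):
# brute-force k-nearest-neighbor search over the columns of `data`.
function brute_knn(data::AbstractMatrix, point::AbstractVector, k::Integer)
    # Euclidean distance from `point` to every column (one column per point)
    dists = [norm(view(data, :, j) .- point) for j in 1:size(data, 2)]
    # Indices of the k smallest distances, ordered from nearest to farthest
    idxs = collect(partialsortperm(dists, 1:k))
    return idxs, dists[idxs]
end

# Toy data: four 2-D points stored as columns
toy = [0.0 1.0 0.0 2.0;
       0.0 0.0 1.5 2.0]
bidxs, bdists = brute_knn(toy, [0.1, 0.0], 2)
```

The tree-based search avoids computing all distances, which is why `KDTree` is preferred once the number of points grows.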

## Using MLJ

In [5]:
using MLJ

### Load data

In [6]:
smarket = dataset("ISLR", "Smarket")
first(smarket, 6)

| Row | Year | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume | Today | Direction |
|---|---|---|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Categorical… |
| 1 | 2001.0 | 0.381 | -0.192 | -2.624 | -1.055 | 5.01 | 1.1913 | 0.959 | Up |
| 2 | 2001.0 | 0.959 | 0.381 | -0.192 | -2.624 | -1.055 | 1.2965 | 1.032 | Up |
| 3 | 2001.0 | 1.032 | 0.959 | 0.381 | -0.192 | -2.624 | 1.4112 | -0.623 | Down |
| 4 | 2001.0 | -0.623 | 1.032 | 0.959 | 0.381 | -0.192 | 1.276 | 0.614 | Up |
| 5 | 2001.0 | 0.614 | -0.623 | 1.032 | 0.959 | 0.381 | 1.2057 | 0.213 | Up |
| 6 | 2001.0 | 0.213 | 0.614 | -0.623 | 1.032 | 0.959 | 1.3491 | 1.392 | Up |


### Casting scientific types

In [7]:
y, X = unpack(smarket, ==(:Direction), colname -> true);
X = select(X, Not([:Year, :Today]))
first(X, 6)

| Row | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Volume |
|---|---|---|---|---|---|---|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 0.381 | -0.192 | -2.624 | -1.055 | 5.01 | 1.1913 |
| 2 | 0.959 | 0.381 | -0.192 | -2.624 | -1.055 | 1.2965 |
| 3 | 1.032 | 0.959 | 0.381 | -0.192 | -2.624 | 1.4112 |
| 4 | -0.623 | 1.032 | 0.959 | 0.381 | -0.192 | 1.276 |
| 5 | 0.614 | -0.623 | 1.032 | 0.959 | 0.381 | 1.2057 |
| 6 | 0.213 | 0.614 | -0.623 | 1.032 | 0.959 | 1.3491 |


In [8]:
y = coerce(y, OrderedFactor)
classes(y[1])

2-element CategoricalArray{String,1,UInt8}:
 "Down"
 "Up"

### Training/testing set

In [9]:
train, test = partition(eachindex(y), 0.7, shuffle=true)

([611, 1215, 1026, 660, 727, 484, 207, 176, 355, 951  …  375, 867, 149, 880, 204, 312, 464, 704, 404, 61], [897, 824, 543, 684, 309, 441, 525, 905, 891, 911  …  574, 968, 519, 395, 33, 459, 66, 549, 237, 664])
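
As a rough picture of what a shuffled 70/30 split does, the sketch below permutes the row indices and cuts the permutation at the requested fraction. The helper `split_indices` is hypothetical, and the rounding rule (`round(Int, fraction * n)`) is an assumption that may differ from what MLJ's `partition` does internally.

```julia
using Random  # for randperm and MersenneTwister

# Hypothetical helper: shuffle 1:n and split it into two disjoint index sets.
function split_indices(n::Integer, fraction::Real, rng::AbstractRNG)
    perm = randperm(rng, n)            # random permutation of 1:n
    ntrain = round(Int, fraction * n)  # assumed rounding rule
    return perm[1:ntrain], perm[ntrain+1:end]
end

# Seeded generator so the split is reproducible
tr, te = split_indices(10, 0.7, MersenneTwister(1))
```

Every index lands in exactly one of the two sets, so no test row is seen during training.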

### Model

In [10]:
model = @load KNNClassifier pkg=NearestNeighbors

KNNClassifier(
    K = 5,
    algorithm = :kdtree,
    metric = Euclidean(0.0),
    leafsize = 10,
    reorder = true,
    weights = :uniform) @ 1…16

In [11]:
model.K = 3
match = machine(model, X, y)

Machine{KNNClassifier} @ 1…06


### Training

In [12]:
fit!(match, rows=train)

┌ Info: Training Machine{KNNClassifier} @ 1…06.
└ @ MLJBase /home/yuehhua/.julia/packages/MLJBase/O5b6j/src/machines.jl:187


Machine{KNNClassifier} @ 1…06


### Predict

In [13]:
ŷ = predict_mode(match, rows=test)

375-element CategoricalArray{String,1,UInt8}:
 "Down"
 "Up"
 "Up"
 "Up"
 "Up"
 "Up"
 "Down"
 "Down"
 "Down"
 "Up"
 "Down"
 "Down"
 "Down"
 ⋮
 "Down"
 "Up"
 "Up"
 "Up"
 "Up"
 "Down"
 "Down"
 "Down"
 "Up"
 "Down"
 "Up"
 "Up"
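
With `weights = :uniform`, the predicted class for a query point can be thought of as the most common label among its `K` nearest training points. The sketch below shows that vote in plain Julia. The helper `majority_vote` is hypothetical and is not how MLJ implements `predict_mode`. It also does not define a tie-breaking rule, which is not an issue for two classes and an odd `K` such as 3.

```julia
# Hypothetical helper: return the most frequent label among the neighbors.
function majority_vote(labels::AbstractVector)
    counts = Dict{eltype(labels),Int}()
    for label in labels
        counts[label] = get(counts, label, 0) + 1  # tally each label
    end
    best, bestcount = first(labels), 0
    for (label, c) in counts
        if c > bestcount  # keep the label with the highest tally
            best, bestcount = label, c
        end
    end
    return best
end

# Labels of the 3 nearest neighbors of some query point (toy values)
vote = majority_vote(["Up", "Down", "Up"])
```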

### Evaluation

In [14]:
accuracy(ŷ, y[test])

0.504
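
Accuracy here is the fraction of test rows whose predicted label equals the true label. A minimal sketch of that computation, together with a majority-class baseline to compare against, is shown below. Both helper names are hypothetical and the toy labels are made up for illustration. With two classes, a score near 0.5 is worth checking against such a baseline.

```julia
using Statistics  # for mean

# Hypothetical helper: fraction of positions where prediction and truth agree.
manual_accuracy(predicted, actual) = mean(predicted .== actual)

# Hypothetical helper: accuracy obtained by always predicting the most common class.
majority_share(actual) =
    maximum(count(==(c), actual) for c in unique(actual)) / length(actual)

# Toy labels, not taken from the Smarket data
pred  = ["Up", "Down", "Up", "Up"]
truth = ["Up", "Up", "Up", "Down"]

acc = manual_accuracy(pred, truth)
base = majority_share(truth)
```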