# Applying Machine Learning Process: Model Selection

## Outline

* [Loading Data](#Loading-Data)
* [Finding Best Set of Parameters](#Finding-Best-Set-of-Parameters)
* [Finding Best Model](#Finding-Best-Model)
* [Challenge](#Challenge)

## Loading Data

In [None]:
import pandas as pd

In [None]:
iris_data_url = 'https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv'
df = pd.read_csv(iris_data_url)

In [None]:
X = df.drop(['species'], axis='columns')
y = df['species']

---

## Finding Best Set of Parameters

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

In [None]:
from sklearn import metrics
from sklearn.neighbors import KNeighborsClassifier

In [None]:
# Try K = 1..25 and record the test-set accuracy for each value
k_range = range(1, 26)
scores = []
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

In [None]:
scores

In [None]:
%matplotlib inline

import matplotlib.pyplot as plt

In [None]:
plt.plot(k_range, scores)
plt.xlabel('Value of K for KNN')
plt.ylabel('Testing Accuracy')

In [None]:
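# instantiate with the chosen K
knn = KNeighborsClassifier(n_neighbors=11)

# fit on the full dataset
knn.fit(X, y)

# predict for a new observation; reuse the training column names so that
# scikit-learn does not warn about missing feature names
X_new = pd.DataFrame([[3, 5, 4, 2]], columns=X.columns)
knn.predict(X_new)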

---

## Finding Best Model

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

In [None]:
model_set = [
    KNeighborsClassifier,
    LogisticRegression,
    SVC
]

In [None]:
from sklearn import metrics

In [None]:
# Fit each model with its default hyperparameters and record its test-set accuracy
scores = []
for model in model_set:
    m = model()
    m.fit(X_train, y_train)
    y_pred = m.predict(X_test)
    scores.append(metrics.accuracy_score(y_test, y_pred))

In [None]:
scores

---

## Challenge

Try finding both the best set of parameters and the best model at the same time.
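
One possible way to approach the challenge, as a minimal sketch rather than the intended solution: wrap the classifier in a `Pipeline` and let `GridSearchCV` search over both the model and its parameters in one pass. Several things here are assumptions that do not come from the notebook: the data is loaded with `load_iris` so the example runs offline, the candidate values for `C` and `kernel` are arbitrary, `max_iter=1000` is set only to help `LogisticRegression` converge, and 5-fold cross-validation on the training set replaces the single train/test comparison used above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumption: use scikit-learn's bundled copy of iris instead of the CSV URL
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4
)

# A one-step pipeline; the 'clf' step is a placeholder that the grid replaces
pipe = Pipeline([('clf', KNeighborsClassifier())])

# One dictionary per model: the estimator itself plus its own parameter grid
# (the value lists below are illustrative, not tuned)
param_grid = [
    {'clf': [KNeighborsClassifier()], 'clf__n_neighbors': list(range(1, 26))},
    {'clf': [LogisticRegression(max_iter=1000)], 'clf__C': [0.1, 1, 10]},
    {'clf': [SVC()], 'clf__C': [0.1, 1, 10], 'clf__kernel': ['linear', 'rbf']},
]

# Cross-validate every model/parameter combination on the training set only
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_train, y_train)

best_params = search.best_params_             # winning model and its parameters
test_accuracy = search.score(X_test, y_test)  # accuracy of the refit best model

print(best_params)
print(test_accuracy)
```

Because the search only sees the training data, the held-out test set is used once at the end, which gives a less optimistic accuracy estimate than picking the model by its test score.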