# Intro to Machine Learning: Scikit-Learn

## Outline:

* [Intro to Scikit-Learn](#Intro-to-Scikit-Learn)
* [Creating a Model](#Creating-a-Model)
* [Scikit-Learn Algorithm Cheat Sheet](#Scikit-Learn-Algorithm-Cheat-Sheet)

## Intro to Scikit-Learn

In [1]:
from IPython.display import IFrame
IFrame('http://scikit-learn.org', width=800, height=350)

---

## Creating a Model

In [2]:
import pandas as pd

In [3]:
iris_data_url = 'https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv'
df = pd.read_csv(iris_data_url)

In [4]:
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [5]:
X = df.drop(['species'], axis=1)
y = df['species']

### 4-Step Modeling Pattern

1. Import the model (Import)
1. Instantiate an estimator (Instantiate)
1. Fit the model (Fit)
1. Make a prediction (Predict)

**Step 1:** Import the model (import)

In [6]:
from sklearn.neighbors import KNeighborsClassifier

**Step 2:** Instantiate an estimator (instantiate)

In [7]:
knn = KNeighborsClassifier(n_neighbors=1)

**Step 3:** Fit the model (fit)

In [8]:
knn.fit(X, y)

KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=None, n_neighbors=1, p=2,
           weights='uniform')

**Step 4:** Make a prediction (predict)

In [9]:
X_new = [[3, 5, 4, 2]]
knn.predict(X_new)

array(['virginica'], dtype=object)

In [10]:
X_new = [
    [3, 5, 4, 2], 
    [5, 4, 3, 2]
]
knn.predict(X_new)

array(['virginica', 'versicolor'], dtype=object)

#### Try a different model

In [11]:
# import
from sklearn.linear_model import LogisticRegression

# instantiate
logreg = LogisticRegression()

# fit
logreg.fit(X, y)

# predict
logreg.predict(X_new)



array(['virginica', 'setosa'], dtype=object)

#### What about SVM?

In [12]:
# import
from sklearn import svm

# instantiate
clf = svm.SVC()

# fit
clf.fit(X, y)

# predict
clf.predict(X_new)



array(['virginica', 'versicolor'], dtype=object)

---

## Scikit-Learn Algorithm Cheat Sheet

![](images/scikit-learn-algorithm-cheat-sheet.png)
<div style="text-align: center;">
<strong>Credit:</strong> http://peekaboo-vision.blogspot.de/2013/01/machine-learning-cheat-sheet-for-scikit.html
</div>