## About

Working with the famous Iris dataset.

### Links

* [Iris Data Set](https://archive.ics.uci.edu/ml/datasets/Iris)
* Original Articles
    - [Notebook](http://nbviewer.ipython.org/url/astroml.github.com/sklearn_tutorial/_downloads/02_iris_classification.ipynb)
    - [Detailed Description](http://www.astroml.org/sklearn_tutorial/general_concepts.html)

### Dependencies

```
pip install scikit-learn
```

## Creating First Model

### Loading Data

The first step is always loading the data. Fortunately, the iris dataset is already bundled with scikit-learn.

```python
from sklearn.datasets import load_iris
iris = load_iris()
```
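To see what `load_iris()` actually returns, you can inspect a few attributes of the returned object. This is a minimal sketch assuming a recent scikit-learn release (1.x); the attribute names used here (`data`, `feature_names`, `target_names`) are part of the documented return value.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The returned Bunch object bundles the measurements, the labels and their names.
print(iris.data.shape)     # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # sepal/petal length and width, in cm
print(iris.target_names)   # the species that the labels 0, 1, 2 refer to
```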

### Data Understanding

In the iris dataset, suppose we are given the task of guessing the class (species) of an individual flower from the measurements of its petals and sepals. This is a classification task.

```python
# X: feature matrix (one row per flower); y: integer class labels
X, y = iris.data, iris.target
```
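Before choosing a classifier it helps to confirm the shape of `X` and `y` and check that the classes are balanced. A minimal sketch, assuming NumPy is installed alongside scikit-learn; `return_X_y=True` simply returns the same `(data, target)` pair without the wrapper object.

```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

print(X.shape, y.shape)  # (150, 4) (150,)
print(np.bincount(y))    # samples per class: [50 50 50]
```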

### Selecting Classifier

Once the data is in the proper format, training a classifier is straightforward. For instance, we can use a support vector machine with a linear kernel.


```python
from sklearn.svm import LinearSVC
```

### Getting Help On Classifier

`LinearSVC` is an example of a scikit-learn classifier. If you're curious about how it is used, you can use IPython's `?` magic to see its documentation:

```python
# IPython/Jupyter syntax only; not valid in a plain Python script
LinearSVC?
```
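Outside IPython the `?` magic is not available. A rough plain-Python equivalent is to read the docstring directly and list the constructor's parameters with the standard `inspect` module. This is a sketch; the exact parameter list and defaults depend on your scikit-learn version.

```python
import inspect

from sklearn.svm import LinearSVC

# Summary line of the docstring (the "?" magic displays the full text).
print(LinearSVC.__doc__.strip().splitlines()[0])

# Constructor hyperparameters and their default values.
for name, param in inspect.signature(LinearSVC).parameters.items():
    print(name, "=", param.default)
```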

### Creating Classifier Instance

The first thing to do is to create an instance of the classifier. This can be done simply by calling the class name, with any arguments that the object accepts.


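```python
# 'squared_hinge' is the current name of the loss that older scikit-learn releases called 'l2'
clf = LinearSVC(loss='squared_hinge')
```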
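Constructor arguments are stored on the instance and can be read back or changed later through the standard estimator methods `get_params` and `set_params`. A minimal sketch; the values `C=0.5` and `max_iter=5000` are arbitrary choices for illustration.

```python
from sklearn.svm import LinearSVC

# Hyperparameters are passed as keyword arguments to the constructor.
clf = LinearSVC(C=0.5, max_iter=5000)

params = clf.get_params()          # dict of all hyperparameters
print(params["C"], params["max_iter"])

clf.set_params(C=2.0)              # change a hyperparameter in place
print(clf.C)
```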

### About Created Instance

**clf** is a statistical model whose parameters control the learning algorithm (these parameters are sometimes called hyperparameters). Hyperparameters can be supplied by the user in the model's constructor. We will explain later how to choose a good combination using either simple empirical rules or data-driven selection:


```python
clf
```

```
LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
     intercept_scaling=1, loss='l2', multi_class='ovr', penalty='l2',
     random_state=None, tol=0.0001, verbose=0)
```
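One common data-driven way to pick hyperparameters is a cross-validated grid search. The sketch below tries a handful of values for `C` with `GridSearchCV` and 5-fold cross-validation; the grid values and `max_iter=10000` are illustrative assumptions, not recommended settings.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# Candidate values for the regularization hyperparameter C (illustrative grid).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}

# Each candidate is scored by mean accuracy over 5 cross-validation folds.
search = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a model refit on all the data with the best `C`.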

### Fitting (Creating) Model

By default, the model parameters are not initialized. They are learned automatically from the data by calling the `fit` method with the data `X` and labels `y`.


```python
clf = clf.fit(X, y)
print(clf.coef_)
print(clf.intercept_)
```

```
[[ 0.18423951  0.4512281  -0.80794191 -0.45071502]
 [ 0.05172211 -0.8948879   0.40570078 -0.93876744]
 [-0.85072188 -0.98665753  1.38092202  1.86530784]]
[ 0.10956103  1.67278605 -1.70970053]
```
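The learned parameters only exist after `fit`, and their shapes reflect the one-vs-rest scheme `LinearSVC` uses for multiclass data: one weight vector and one intercept per class. A minimal sketch; `max_iter=10000` is an assumption added to avoid convergence warnings on the unscaled data.

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

clf = LinearSVC(max_iter=10000)
print(hasattr(clf, "coef_"))  # False: nothing has been learned yet

clf.fit(X, y)
print(clf.coef_.shape)       # (3, 4): one weight vector per class, 4 features each
print(clf.intercept_.shape)  # (3,): one intercept per class
```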


### Using Model

Once the model is trained, it can be used to predict the most likely outcome on unseen data. For instance, let us define a list containing a single sample that resembles the first sample of the iris dataset.

```python
X_new = [[5.0, 3.6, 1.3, 0.25]]

print(clf.predict(X_new))
```

```
[0]
```
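For a linear multiclass model like this one, `predict` computes one score per class, `w_k · x + b_k` (the `decision_function`), and returns the class with the highest score. The plain-Python sketch below reproduces that step using the `coef_` and `intercept_` values printed earlier; a model retrained with a different scikit-learn version may have slightly different numbers.

```python
# Coefficients and intercepts copied from the fitted LinearSVC output shown earlier.
coef = [
    [0.18423951, 0.4512281, -0.80794191, -0.45071502],
    [0.05172211, -0.8948879, 0.40570078, -0.93876744],
    [-0.85072188, -0.98665753, 1.38092202, 1.86530784],
]
intercept = [0.10956103, 1.67278605, -1.70970053]


def decision_scores(x):
    """One score per class: dot(w_k, x) + b_k."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b for w, b in zip(coef, intercept)]


def predict(x):
    """Index of the class with the largest decision score."""
    scores = decision_scores(x)
    return max(range(len(scores)), key=scores.__getitem__)


x_new = [5.0, 3.6, 1.3, 0.25]
print([round(s, 3) for s in decision_scores(x_new)])  # [1.492, -0.997, -7.254]
print(predict(x_new))                                 # 0
```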


## Using Other Classifiers


There are many classifiers to choose from; you could try any of the methods discussed at http://scikit-learn.org/stable/supervised_learning.html. Alternatively, you can explore what's available in scikit-learn using just the tab-completion feature. For example, import the `linear_model` submodule with `from sklearn import linear_model`.

Then use tab completion to find what's available: type `linear_model.` and press the Tab key to see an interactive list of the functions and classes within this submodule. The ones that begin with capital letters are the models that are available.
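Without an interactive shell, roughly the same list can be produced with `dir()`, filtering for names that start with a capital letter. Because every scikit-learn classifier shares the `fit`/`predict` interface, swapping one in is a one-line change; `LogisticRegression` and `max_iter=1000` below are arbitrary choices for illustration.

```python
from sklearn import linear_model
from sklearn.datasets import load_iris

# Capitalized names in the submodule are (mostly) the available model classes.
model_names = [name for name in dir(linear_model) if name[0].isupper()]
print(model_names)

# Any of the classifiers follows the same fit/predict API.
X, y = load_iris(return_X_y=True)
clf = linear_model.LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict([[5.0, 3.6, 1.3, 0.25]]))
```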


## Creating a Model with SVC

```python
from sklearn.svm import SVC
SVC?  # IPython/Jupyter syntax only
```

```python
clf = SVC()
clf = clf.fit(X, y)
print(clf)
print(clf.intercept_)
```

```
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0, degree=3, gamma=0.0,
  kernel='rbf', max_iter=-1, probability=False, random_state=None,
  shrinking=True, tol=0.001, verbose=False)
[-0.03985691 -0.16777453 -0.14370469]
```
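Unlike `LinearSVC`, `SVC` handles multiclass data with a one-vs-one scheme: it trains one binary SVM per pair of classes, so `intercept_` has `n_classes * (n_classes - 1) / 2` entries. For the three iris classes that also happens to be 3, as the sketch below checks; the per-class support-vector counts it prints depend on the data and version.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC().fit(X, y)  # default kernel is 'rbf'

n_classes = len(clf.classes_)
n_pairs = n_classes * (n_classes - 1) // 2  # one binary SVM per pair of classes
print(n_pairs, clf.intercept_.shape)        # 3 (3,)
print(clf.n_support_)                       # number of support vectors per class
```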


```python
X_new = [[5.0, 3.6, 1.3, 0.25]]
print(clf.predict(X_new))
```

```
[0]
```
