### Simple Machine Learning Example - Apples and Oranges

This is our first look at using machine learning techniques to classify items. In this example, we have information about pieces of fruit, and we want to classify each one as an apple or an orange. The general idea is to use an existing (and powerful) machine learning data structure for prediction. In other words, given some information about a piece of fruit, can we accurately determine whether it's an apple or an orange?

There are two major strategies for making predictions: supervised learning and unsupervised learning. The majority of practical techniques are supervised, so we'll focus on those. There are many supervised learning models that are used for classifying information. Examples include:
+ Decision Trees
+ k-Nearest Neighbors (kNN)
+ Support Vector Machines (SVM)
+ Neural Networks
+ Naive Bayes Classifier
+ Discriminant Analysis, etc.

In this example we use a machine learning data structure called a Decision Tree.

In supervised learning, you start with a dataset describing the objects you want to classify. For each object it holds a range of information (which we call features) PLUS an accurate classification (which we call the label).

+ Typically you divide this features/label dataset into two parts: a training dataset and a testing dataset.
+ You first use the training dataset, with both features and labels, to 'train' your machine learning data structure.
+ Then you give the trained data structure just the features from the testing dataset and see what labels it predicts.
+ Finally, you compare the predicted labels with the actual labels to see how accurate your machine learning data structure was.
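
The four steps above can be sketched with scikit-learn's train_test_split and accuracy_score helpers. This is only a minimal sketch: the fruit measurements are made up for illustration (skin type 0 = bumpy, 1 = smooth; label 0 = apple, 1 = orange), and the 25% test size and the random_state values are arbitrary choices.

```python
from sklearn import tree
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Made-up data for illustration: [weight in grams, skin type (0 = bumpy, 1 = smooth)]
features = [[140, 1], [130, 1], [150, 0], [170, 0],
            [135, 1], [160, 0], [145, 1], [165, 0]]
labels = [0, 0, 1, 1, 0, 1, 0, 1]  # 0 = apple, 1 = orange

# Step 1: divide the features/label dataset into training and testing parts
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels)

# Step 2: train on the training features AND labels
model = tree.DecisionTreeClassifier(random_state=0)
model.fit(train_features, train_labels)

# Step 3: predict labels from the testing features only
predicted = model.predict(test_features)

# Step 4: compare predicted labels with the actual labels
accuracy = accuracy_score(test_labels, predicted)
print("Actual labels:   ", test_labels)
print("Predicted labels:", list(predicted))
print("Accuracy:", accuracy)
```

Accuracy here is simply the fraction of test items whose predicted label matches the actual label.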

One final thought: there are two categories of prediction problems. The value being predicted can be either:
+ one of a list of discrete values: think type of fruit, next move on a chess board, spam/not spam. These are called **classification problems**.
+ a value from a continuous range: think weight of a fish, dollar value of a stock, height of a tree. These are called **regression problems**.
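
To make the distinction concrete, here is a small sketch that uses a decision tree for each kind of problem. All of the numbers are invented for illustration. DecisionTreeRegressor is scikit-learn's regression counterpart to the classifier used in this notebook; it does not appear in the examples that follow.

```python
from sklearn import tree

# Classification: predict a discrete label (0 = apple, 1 = orange).
# Made-up features: [weight in grams, skin type (0 = bumpy, 1 = smooth)]
fruit_features = [[140, 1], [130, 1], [150, 0], [170, 0]]
fruit_labels = [0, 0, 1, 1]
fruit_clf = tree.DecisionTreeClassifier(random_state=0)
fruit_clf.fit(fruit_features, fruit_labels)
fruit_prediction = fruit_clf.predict([[160, 0]])[0]
print("Predicted class:", fruit_prediction)  # always one of the known labels

# Regression: predict a continuous value (weight of a fish in grams).
# Made-up feature: [length in cm]
fish_lengths = [[10], [20], [30], [40]]
fish_weights = [100.0, 220.0, 390.0, 610.0]
fish_reg = tree.DecisionTreeRegressor(random_state=0)
fish_reg.fit(fish_lengths, fish_weights)
weight_prediction = fish_reg.predict([[25]])[0]
print("Predicted weight:", weight_prediction)  # a number, not a category
```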

In [16]:
import sklearn
from sklearn import tree

#skin type
bumpy = 0
smooth = 1

#labels
apple = 0
orange = 1

# training dataset with two features: weight and skin type, and accurate labels
# note that they are in different Python lists
features = [[140, smooth], [130, smooth], [150, bumpy], [170, bumpy]]
labels = [apple, apple, orange, orange]

#choose a decision tree for our classifier
classifier = tree.DecisionTreeClassifier()

#train the classifier using training dataset
classifier = classifier.fit(features, labels)

#take it for a test spin - can it predict the type of fruit?
# prediction should be 1 (orange)
print("Predict fruit weighing 160 grams and bumpy (apple =", apple, "orange =", orange, ") Prediction is", classifier.predict([[160, bumpy]]))


Predict fruit weighing 160 grams and bumpy (apple = 0 orange = 1 ) Prediction is [1]
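
To see what the classifier actually learned, scikit-learn can print a fitted tree as text rules with tree.export_text. The sketch below refits the same four training examples so that it runs on its own. The feature names "weight" and "skin" are labels chosen here for readability. With data this small, either feature separates the two classes perfectly, so the tree may split on weight or on skin type.

```python
from sklearn import tree

# Same encoding and training data as the example above
bumpy, smooth = 0, 1
apple, orange = 0, 1
features = [[140, smooth], [130, smooth], [150, bumpy], [170, bumpy]]
labels = [apple, apple, orange, orange]

classifier = tree.DecisionTreeClassifier()
classifier.fit(features, labels)

# Print the learned decision rules as indented text
rules = tree.export_text(classifier, feature_names=["weight", "skin"])
print(rules)
```

Each indented line is a test on one feature, and each "class" line is the label predicted for fruit that reaches that branch.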


### Another Machine Learning Example - Iris Classification

Now we are going to look at another common machine learning example: classifying Iris flowers using measurements of their sepals and petals. You can find out more about this data at: https://en.wikipedia.org/wiki/Iris_flower_data_set

scikit-learn has a function, load_iris, that loads this dataset as NumPy arrays of features and labels:

In [17]:
from sklearn import tree
from sklearn.datasets import load_iris
import numpy as np

# load the Iris dataset. Features are in iris.data and labels are in iris.target (both NumPy arrays).
# There are four features (sepal length, sepal width, petal length, petal width)
# used to classify a particular flower into one of 3 labels (setosa, versicolor, virginica)
iris = load_iris()

# Use 3 items from the dataset for our test: the ones at indexes 0, 50, and 100.
# We picked those because we happen to know the dataset has exactly 50 examples of each label,
# and the labeled data is in order (50 setosa followed by 50 versicolor followed by 50 virginica)
test_idx = [0, 50, 100]

# create a training dataset by deleting the 3 test items. This gives two arrays: one with features and one with labels
train_target = np.delete(iris.target, test_idx)
train_data = np.delete(iris.data, test_idx, axis=0)

# create a test dataset with just the 3 test items. Again two arrays: one with features and one with labels
test_target = iris.target[test_idx]
test_data = iris.data[test_idx]

# choose a Decision Tree as the classifier
irclf = tree.DecisionTreeClassifier()

# train the classifier
irclf.fit(train_data,train_target)

# Make predictions and print the expected and predicted labels. Hopefully they match!
print("Expected labels: ", test_target)
print("Predicted labels: ", irclf.predict(test_data))

Expected labels:  [0 1 2]
Predicted labels:  [0 1 2]
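
The numeric labels are easier to read once they are mapped back to species names. As a small sketch, the object returned by load_iris also carries feature_names and target_names arrays, and indexing target_names with an array of labels translates them. The label array below is just the expected output shown above, written out by hand.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()

# 150 flowers, 4 features each
print(iris.data.shape)
print(iris.feature_names)
print(iris.target_names)

# Translate numeric labels (such as the output of predict) into species names
numeric_labels = np.array([0, 1, 2])
species = iris.target_names[numeric_labels]
print(species)
```

The same indexing works on the array returned by irclf.predict(test_data) in the cell above.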
