<a href="https://colab.research.google.com/github/tproffen/ORCSGirlsPython/blob/master/MachineLearning/OtherExamples/FunWithMachineLearning-Iris.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://github.com/tproffen/ORCSGirlsPython/blob/master/Images/Logo.png?raw=1" width="10%" align="right" hspace="50">

# Machine Learning

## Extra example - Fun with Machine Learning - *Iris classifier*

We're going to use K-nearest neighbors and neural networks on the iris data set!  

### Loading Extensions ##

A great thing about Python is that there are extensions for lots of different things you want to do, including machine learning! We're going to use several Python extensions to help us read in the data and do machine learning.  First, we'll load those libraries in.  Make sure you run the cell below by pressing `shift-enter`.

In [None]:
import sys
# scipy
import scipy
# numpy
import numpy
# matplotlib -- the extension that plots the data
import matplotlib
# pandas -- the extension that reads in the data
import pandas
# scikit-learn -- where the machine learning code comes from!
import sklearn

import matplotlib.pyplot as plt
from sklearn import model_selection
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

### Step 1: Loading the data ##

Now we're going to load the iris data into our program using pandas.  Since the data on the website doesn't have column labels telling us what each number means, we specify the name of each column ourselves.  In this case, "class" will be one of Iris-setosa, Iris-versicolor or Iris-virginica.

In [None]:
# Website where the data is located -- You can visit this website to see what the raw data looks like.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# Read the data set from the website.
dataset = pandas.read_csv(url, names=names)
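
To see what `names` does, here is a small sketch that reads four iris rows from a piece of text instead of from the website. The text is a hand-typed stand-in for the real file, and `sample_text`, `column_names`, and `sample` are names made up for this example.

```python
import io
import pandas

# A tiny stand-in for the file on the website: four rows, no header line.
sample_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

column_names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']

# Without names, pandas would use the first iris as the column labels.
# With names, every row is read as data.
sample = pandas.read_csv(io.StringIO(sample_text), names=column_names)

print(sample.shape)
print(list(sample.columns))
```

Try removing `names=column_names` to see how the first row gets used up as the header.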


### Step 2: Look at the data ##

Let's inspect the data to see what it looks like. The pandas extension has some handy methods for us to use to do that.

In [None]:
# This will tell us how many irises there are (the first number) and how many columns
# each iris has (the second number: four measurements plus the class label)
print(dataset.shape)

In [None]:
# This will show us the first 20 irises in the data set.
print(dataset.head(20))

Now let's find out how many there are per class.

In [None]:
# This command groups the irises by class and prints out how many there are in each class.
print(dataset.groupby('class').size())

### Step 3: Separate the data into input data and labels. ##

Right now, `dataset` has all of the data in it, including the labels.  The labels are what we're trying to get the machine learning methods to learn based on the inputs we give them, so we need to separate the two.  The inputs are the first four columns (sepal and petal length and width) and the labels are the last column.

In [None]:
# The values of the data are inside the variable dataset, so let's pull them out.
values = dataset.values

# Let's get the inputs by pulling out all of the rows and columns 0 through 4 (not including 4)
inputs = values[:,0:4]

# Let's get the labels by pulling out all of the rows and column 4
labels = values[:,4]
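
If the `[:,0:4]` notation looks strange, this small sketch shows it on a made-up table with only two rows. The part before the comma picks rows (`:` means all of them) and the part after the comma picks columns. The names `toy_values`, `toy_inputs`, and `toy_labels` are only for this example.

```python
import numpy

# A made-up table with two rows: four measurements, then a label.
# dtype=object lets numbers and text share one array.
toy_values = numpy.array([
    [5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
    [7.0, 3.2, 4.7, 1.4, 'Iris-versicolor'],
], dtype=object)

toy_inputs = toy_values[:, 0:4]   # all rows, columns 0, 1, 2 and 3
toy_labels = toy_values[:, 4]     # all rows, column 4 only

print(toy_inputs.shape)   # (2, 4)
print(toy_labels)
```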

### Step 4: Separate the data into training and testing. ##

Remember that for machine learning, we always use training data to teach the machine learning method to do something, and testing data to see how well it has learned.  The `model_selection` part of the sklearn extension will do this split for us, but first we have to specify how much of the data we want to use for testing.

In [None]:
# This says that we want 20 percent of our data to be the testing set.
testing_size = 0.2

# This is a number that is used to determine how the training/testing set is randomly split.
# It's best to treat it as a "magic" number.
seed = 7

# Let's split the data into training and testing!
inputs_train, inputs_test, labels_train, labels_test = model_selection.train_test_split(inputs, labels, test_size=testing_size, random_state=seed)
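
Here is a small sketch of what the split does, using ten made-up flowers instead of the real data. It shows two things: a `test_size` of 0.2 sets aside 2 of the 10 rows for testing, and using the same seed twice gives the same split. The names `toy_inputs`, `toy_labels`, and `split` are invented for this example.

```python
from sklearn import model_selection

# Ten made-up flowers, numbered 0 to 9, each with a made-up label.
toy_inputs = [[i] for i in range(10)]
toy_labels = ['even' if i % 2 == 0 else 'odd' for i in range(10)]

def split(seed):
    """Split the toy data, keeping 20 percent for testing."""
    return model_selection.train_test_split(
        toy_inputs, toy_labels, test_size=0.2, random_state=seed)

first = split(7)
again = split(7)

print(len(first[0]), len(first[1]))  # 8 training rows, 2 testing rows
print(first[1] == again[1])          # same seed, same testing rows
```

Try calling `split` with a different seed and compare which rows end up in the testing set.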


### Step 5: Let's train K-Nearest Neighbors! ##

We'll start with K-Nearest Neighbors (KNN).  Up where we loaded the extensions, we loaded in a KNeighborsClassifier. Now we're going to create one, and then call `fit` on it, which is where the training or learning happens based on the training set.

In [None]:
# Create the KNN
knn = KNeighborsClassifier()

# Train the KNN!
knn.fit(inputs_train, labels_train)
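
What is the KNN doing when it predicts? The idea is to find the training flowers that are closest to the new flower and let them vote. The sketch below is a simplified, hand-written version of that idea, not the code sklearn actually runs. It uses 3 neighbors on made-up petal measurements, while `KNeighborsClassifier()` uses 5 neighbors unless you tell it otherwise. The function `knn_predict` and the toy data are invented for this example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    """Guess a label by letting the k closest training points vote."""
    # Measure the straight-line distance from new_point to every training point.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels.
    nearest = sorted(distances)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up petal measurements (length, width) for two kinds of flower.
toy_points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (4.7, 1.4), (4.5, 1.5), (4.9, 1.5)]
toy_labels = ['small', 'small', 'small', 'large', 'large', 'large']

print(knn_predict(toy_points, toy_labels, (1.6, 0.4)))
print(knn_predict(toy_points, toy_labels, (4.4, 1.3)))
```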

### Step 6: Let's see if the KNN learned anything... ##

First, we'll see if it learned the training data.  We use a method called `predict` to run data through the KNN classifier.  Then we're going to print the accuracy score, multiplied by 100 to turn it into a percentage.  You can think of the score like a score on a test at school.  The higher the score, the better the classifier!
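
The accuracy score is just the fraction of guesses that were right. This hand-written sketch works it out for four made-up predictions, so you can see where the number comes from. The function `accuracy_percent` and the toy lists are invented for this example.

```python
def accuracy_percent(true_labels, predicted_labels):
    """Return the percentage of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return matches / len(true_labels) * 100

toy_truth = ['setosa', 'setosa', 'versicolor', 'virginica']
toy_guess = ['setosa', 'setosa', 'virginica', 'virginica']

print(accuracy_percent(toy_truth, toy_guess))  # 3 of 4 correct -> 75.0
```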

In [None]:
# Predict what the classes are based on the training data
predictions = knn.predict(inputs_train)

# Print the score on the training data
print("KNN Training Set Score:")
print(accuracy_score(labels_train, predictions)*100)

It should do pretty well on the training data, since that's what we used to teach it.  But machine learning is all about applying your machine learning method to NEW data.  So now let's see how it does on the testing set.  We're going to do the same thing, but with the testing data instead of the training data.

In [None]:
# Predict what the classes are based on the testing data
predictions = knn.predict(inputs_test)

# Print the score on the testing data
print("KNN Testing Set Score:")
print(accuracy_score(labels_test, predictions)*100)

It still does okay, but maybe not as well as it did on the training set.  But it clearly learned something!

### Step 7: Let's train a neural network! ##

Okay, now we're going to train a neural network.  The great thing about sklearn is that all of the machine learning methods use the same structure. In this step we're going to create a neural network, and **then YOU are going to add in the training step** by writing the code to train it.  You can use the exact same structure from the knn example above for training the neural network, just change knn to nnet!

In [None]:
# MLP stands for multi-layer perceptron -- it's just a fancy name for the neural networks we've been working with

# The hidden layer sizes specify how many layers and how many neurons per layer.  
# Right now, there's only one hidden layer with 10 neurons, but for example, we could change that to (10,5,)
# to make two hidden layers, one with 10 neurons and one with 5.

# max_iter sets the maximum number of training steps to use

nnet = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000)

# Train the NNET!
# Here's where you add the training step for nnet!  
# Just do the same thing we did for knn when training, but replace knn with nnet
# YOU ADD CODE HERE
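
To picture what `hidden_layer_sizes` means, this sketch lists how the layers connect to each other. It assumes four inputs (the four measurements) and three outputs (one per iris class). The function `layer_shapes` is invented for this example and is not part of sklearn.

```python
def layer_shapes(n_inputs, hidden_layer_sizes, n_outputs):
    """List the (from, to) size of each set of connections in the network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    return list(zip(sizes[:-1], sizes[1:]))

# Four measurements in, three iris classes out.
print(layer_shapes(4, (10,), 3))     # [(4, 10), (10, 3)]
print(layer_shapes(4, (10, 5), 3))   # [(4, 10), (10, 5), (5, 3)]
```

Each pair is one set of connections: with `(10,)` the 4 inputs connect to 10 hidden neurons, which connect to 3 outputs.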


### Step 8: Let's test it! ##

Now we're going to do the exact same thing we did to test knn, but for nnet! **You're going to fill in the code below, just like we did it with knn!**

In [None]:
# Predict what the classes are based on the training data
# YOU ADD CODE HERE!


# Print the score on the training data
print("Neural Network Training Set Score:")
print(accuracy_score(labels_train, predictions)*100)

# Predict what the classes are based on the testing data

######## YOU ADD CODE HERE!


# Print the score on the testing data
print("Neural Network Testing Set Score:")
print(accuracy_score(labels_test, predictions)*100)

How did the neural network do? Did it get a better or worse score than k-nearest neighbors?

## That's it! ##

You've trained TWO different types of machine learning algorithms to recognize different types of irises!

Want to learn more? 

## BONUS STEPS ##

1. We set testing_size early on to be 20% of the data (by setting the value to 0.2). What happens when we change that number to 0.4? 0.6? 

2. What happens when you increase the number of neurons in the hidden layer in the neural network? What about decreasing the number of neurons in the hidden layer? What happens when you add more layers?

3. What happens when you change the number of training iterations or steps for the neural network (change `max_iter`)? 
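
One way to explore the first bonus step is to try several testing sizes in a loop. The sketch below does that for KNN. It makes two assumptions that differ from the steps above: it uses the copy of the iris data that comes with sklearn (`load_iris`) so nothing has to be downloaded, and it uses new variable names (`x_train`, `y_train`, and so on) so it doesn't overwrite anything you made earlier.

```python
from sklearn import model_selection
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

# Assumption: use the iris data bundled with scikit-learn instead of the website.
iris = load_iris()

scores = {}
for testing_size in (0.2, 0.4, 0.6):
    # Split, train, and score once for each testing size.
    x_train, x_test, y_train, y_test = model_selection.train_test_split(
        iris.data, iris.target, test_size=testing_size, random_state=7)
    model = KNeighborsClassifier()
    model.fit(x_train, y_train)
    scores[testing_size] = accuracy_score(y_test, model.predict(x_test)) * 100

for testing_size, score in scores.items():
    print(testing_size, score)
```

You could swap `KNeighborsClassifier()` for an `MLPClassifier` to try the same experiment with a neural network.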