# Machine Learning

## Overview
*Derived from a Lesson by Allison Earnhart*

While **artificial intelligence** and machines that learn may sound like topics from science fiction, our modern world is full of “intelligent” machines that process data and make decisions not unlike the way humans do. Traditional computers can only perform the specific tasks for which they have been programmed. When a computer has the ability to independently make decisions without direct instruction, that is Artificial Intelligence (AI). AI can employ many layers of algorithms (sets of problem-solving rules) to respond to new situations in a way that traditional computers can’t. From self-driving vehicles to spell check and predictive text, AI is assisting the world with all kinds of work.

How is your music streaming service able to predict which new songs you’ll probably enjoy hearing? How does your social media service recognize your face even when you haven’t been tagged in the photo? These systems work with huge amounts of data and find ways to sort it into meaningful identifications for decision making -- a process called **Machine Learning** (ML). There are three methods by which computer systems learn: **Supervised, Unsupervised, and Reinforcement**. In **supervised** learning, the system is given pre-identified examples to use as a reference. When the system encounters new data, it is compared to the examples to help determine the new data’s classification. With **unsupervised** learning, the system finds its own patterns when analyzing data and sorts it into logical categories. New data is compared to the patterns and categorized accordingly. Both of these methods can be supplemented by **reinforcement**, when a system can hone its accuracy by receiving feedback on whether a decision is correct or incorrect, and modifying its algorithm accordingly. Just like with humans, practice makes perfect. The more data a system learns with, the more accurate its future decisions can be.

It is also important to acknowledge the ethical hazards that exist with machine learning, especially on the topic of **bias**. Machines that learn from biased data sets can perpetuate that bias, causing unfair or inaccurate outcomes. Biased data sets can be anything from scientifically incomplete or statistically skewed information, to socially or morally questionable content created by people with conscious or unconscious prejudice or favoritism. AI systems are rapidly being integrated into society everywhere in fields as diverse as medical diagnoses, job applicant reviews, plagiarism detection, and criminal justice profiling. AI is an automated decision making process, which can be inequitable or even harmful when left to itself without the contextual awareness that human minds possess.

# Supervised Simple Classifier - Morse Code Machine

In ML, **classification** is when a computer is expected to take some amount of data (an **observation**) and, from that, determine what is happening by assigning that data a **category**. Categories are discrete words, descriptions, or groups used to describe the data. For example, categories of vehicles could be "trucks", "buses", "sedans", "coupes", etc. There are many different examples of vehicles within each category, but they all have commonalities.

In Classification, we are showing a computer program many examples and asking the program to find the commonalities.

When we use supervised ML, the system goes through 3 main stages:

1. **Training** - The system collects many **observations** which are matched with human-provided **labels** (i.e. the correct categories).
2. **Modelling** - The system uses the training data to construct a model (i.e. it finds the commonalities).
3. **Prediction** - Provided a new unlabelled **observation**, the program uses the model to determine what category or label should be applied to that observation.
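
The three stages can be sketched in a few lines. This is only an illustration under stated assumptions: the vehicle lengths and labels are made up, and the "model" used here (one average per category) is just one simple choice among many.

```python
import numpy as np

# Stage 1 - Training: made-up observations (vehicle lengths in metres)
# matched with human-provided labels.
training_lengths = np.array([4.2, 4.5, 4.8, 11.5, 12.0, 12.4])
training_labels = np.array(["sedan", "sedan", "sedan", "bus", "bus", "bus"])

# Stage 2 - Modelling: summarise each category by the mean of its examples.
model = {}
for label in np.unique(training_labels):
    model[str(label)] = float(training_lengths[training_labels == label].mean())

# Stage 3 - Prediction: give a new, unlabelled observation the label
# of the category whose mean is closest to it.
def predict(length, model):
    return min(model, key=lambda label: abs(length - model[label]))

print(model)
print(predict(5.0, model))
print(predict(10.0, model))
```

The same train, model, predict pattern applies whatever the observation is; only the way the model summarises the training data changes.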

Let's create a machine that takes a simple observation (namely, how long a button is pressed) and classifies the press types to recognize Morse code.

[Morse code is a method used in telecommunication to encode text characters as standardized sequences of two different signal durations, called dots and dashes.](https://en.wikipedia.org/wiki/Morse_code)

## Exercise 1

Take a look at the provided code below.

We want this classifier to use the length of a button press to determine whether it is a dot or a dash. We will collect some number of training samples of each and average them to find the typical length of a dot and of a dash.

Then, when the button is pressed, the length of that press is compared to the average length of a dot and of a dash. The `classify` function decides which of the two (dot or dash) the press is more similar to.

You need to add in the *model*. Look for the TO DOs:

1. Calculate the Average Times for Dashes and Dots

2. Fix the `classify` function. To decide whether the given `time` is a dot or a dash, you need to work out whether that number is more similar to the dot average or to the dash average. There are multiple ways to do this.
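
As a warm-up for the TO DOs, here is a minimal sketch of the NumPy helpers mentioned in the code comments. The durations in `sample_times` are made-up values, not real button presses, and the variable names are only for illustration.

```python
import numpy as np

# Made-up press durations in seconds (not real measurements).
sample_times = np.array([0.10, 0.14, 0.12, 0.16])

# The average in a single call.
mean_direct = np.mean(sample_times)

# The same average built from the sum and the number of items.
mean_by_hand = np.sum(sample_times) / np.size(sample_times)

print(mean_direct, mean_by_hand)

# abs() gives the size of a difference regardless of its sign, which is
# handy for asking "how far is this value from that one?"
print(abs(0.30 - mean_direct))
```

Either way of computing the average is fine; `np.mean` is simply shorter.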

In [None]:
"""
===================================================================================
Importing and Initializing
===================================================================================
"""
import easygopigo3
import gopigo3
import easysensors
egpg = easygopigo3.EasyGoPiGo3()
gpg = gopigo3.GoPiGo3()
egpg.reset_all()

# IMPORT TOOLS FOR MAKING JUPYTER EASIER TO READ
from IPython.display import clear_output
from EDL_Jupyter_resources import HiddenPrints

# IMPORT USEFUL TOOLS
import time
import math
import numpy as np
import matplotlib.pyplot as plt
hiddenprints=HiddenPrints()


# INITIALIZE A BUTTON (TELL THE ROBOT THERE IS A BUTTON)
my_button = egpg.init_button_sensor("AD2")
PRESSED = 1

"""
===================================================================================
SOME CUSTOM FUNCTIONS
We define them here so that we can use them later.
===================================================================================
"""

def record_button_press_time():
    # This function measures how long a button is pressed for.
    while not my_button.read() == PRESSED:
        # Wait for press
        time.sleep(.01)
    down_time = time.time()
    while my_button.read() == PRESSED:
        # Wait for release
        time.sleep(.01)
    up_time = time.time()     
    return up_time - down_time


def collect_training_presses(number, type_name):
    # This function collects "number" samples of "type_name" and records the durations in an array "times"
    times = np.zeros(number)
    for press_num in range(number):
        print("Waiting for ", type_name, " PRESS ", press_num+1, " of ", number)
        times[press_num] = record_button_press_time()
    return times

def classify(time, dot_average, dash_average):
    # This function should:
    # RETURN "-" if the time is for a dash, or "." if the time is for a dot.
    dot='.'
    dash='-'
    """
    --------------------------------------------------------------------------------
    TO DO:
    In the space below:
        - Decide if the variable "time" is closer to dot_average or dash_average.
        ***Hint: absolute value and the "-" operator might be the best way to do this.***
        - If "time" is closer to dot_average, have the function return the variable dot
        - If "time" is closer to the dash_average, have the function return the variable dash
    --------------------------------------------------------------------------------
    """
    if (  ): #find a conditional that determines if time is closer to dot
        return dot
    else:
        return dash
 
    
"""
===================================================================================
CODE THAT RUNS
===================================================================================
""" 

"""
Collect LABELED TRAINING samples
"""
try:
    if not dot_average is None and not dash_average is None:
        # If it is already trained, allow the user to decide whether to train again.
        train = input("Do you want to retrain? 1 = Yes, 0 = No ")
        try:
            train = int(train)
        except:
            print("*** Please provide an integer ***")
except:
    train = 1 # This is a 1 if the system needs to be trained
        
if train == 1:
    # If training or retraining:
    # Ask how many samples to collect.
    collect_number = input("How many training samples do you want to give? ")
    try:
        collect_number = int(collect_number)
    except:
        print("*** Please provide an integer ***")
    
    #Collect the samples
    print("DOTS")
    dot_times = collect_training_presses(collect_number, "Dot")

    print("DASHES")
    dash_times = collect_training_presses(collect_number, "Dash")

    print("Dot Times: ", dot_times)
    print("Dash Times: ", dash_times)


"""
--------------------------------------------------------------------------------
TO DO: 
In the space below:
 - Calculate the Average Times for:
       --  Dashes (from an array called dash_times) and 
       --  Dots (from an array called dot_times)
 
 - These functions might be helpful:
       --  np.mean(ARRAY) will return the mean (average) of an array
       --  np.sum(ARRAY) will return the sum of an array
       --  np.size(ARRAY) will return the number of items in an array
"""
dot_average =  0 # calculate the average dot time here
dash_average =  1 # calculate the average dash time here

'''
--------------------------------------------------------------------------------'''

print("Dot Average: ", dot_average)
print("Dash Average: ", dash_average)

print("===================")
print("STARTING IN")
time.sleep(1)
print("3...")
time.sleep(1)
print("2...")
time.sleep(1)
print("1...")
time.sleep(1)
print("GO")
print("===================")

last_press_time = time.time()
space_time = False
message = [] # Initialize a blank message
letter = "" # Initialize a blank letter


"""
ONCE TRAINED THE FOLLOWING LOOP WILL TEST THE CLASSIFIER
"""
while True: 
    
    # If there has not been a press in a while, stop the program
    if time.time() - last_press_time > 8*dash_average:
        print("===================")     
        break
            
    # If the button is pressed, decide if it is a dot or a dash.
    elif my_button.read() == PRESSED:
        down_time = time.time()
        while my_button.read() == PRESSED:
            time.sleep(.01)
        up_time = time.time()
        press_time = up_time - down_time
        print("Saw: ", classify(press_time, dot_average, dash_average))
        last_press_time = up_time
    
print("DONE")


# Reading Morse Code

Now that your DOT - DASH classifier is working, let's try to write some MORSE CODE MESSAGES.

Here are some letters and numbers in Morse code.

![](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/International_Morse_Code.svg/500px-International_Morse_Code.svg.png)

## Exercise 2

Update the following code with your classifier from Exercise 1 to start reading messages.

Test out some messages. Can you send:

1. **SOS** (popularly read as "Save Our Ship"; an intentionally easy-to-remember Morse code signal for emergencies)

2. **HELLO** 

3. **Your Name**

### Optional Challenges
* Can you program the robot to flash the LEDs when the button is pressed?
* After the message has been decoded into text to confirm accuracy, can you program the robot to flash the LEDs in the Morse code pattern (using the trained dot and dash average lengths, of course)?
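
Before tapping out a message, it can help to know which dots and dashes to expect. The sketch below is a hypothetical helper (the names `LETTER_TO_MORSE` and `encode` are not part of the exercise code) that uses a small subset of the Morse table to print the pattern for a word.

```python
# A small subset of the International Morse Code table,
# just enough for the words in this example.
LETTER_TO_MORSE = {"E": ".", "H": "....", "L": ".-..", "O": "---", "S": "..."}

def encode(text):
    # Translate each letter to its dot/dash pattern.
    # Letters are separated by a single space in the output.
    return " ".join(LETTER_TO_MORSE[ch] for ch in text.upper())

print(encode("SOS"))
print(encode("hello"))
```

To handle your own name, the table would need the remaining letters added.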

In [None]:
"""
===================================================================================
Importing and Initializing
===================================================================================
"""
import easygopigo3
import gopigo3
import easysensors
egpg = easygopigo3.EasyGoPiGo3()
gpg = gopigo3.GoPiGo3()
egpg.reset_all()

# IMPORT TOOLS FOR MAKING JUPYTER EASIER TO READ
from IPython.display import clear_output
from EDL_Jupyter_resources import HiddenPrints

# IMPORT USEFUL TOOLS
import time
import math
import numpy as np
import matplotlib.pyplot as plt
hiddenprints=HiddenPrints()


# INITIALIZE A BUTTON (TELL THE ROBOT THERE IS A BUTTON)
my_button = egpg.init_button_sensor("AD2")
PRESSED = 1


# SETUP A DICTIONARY FOR MORSE CODE TRANSLATION
MORSE_CODE = {     '.-':'A',  '-...':'B', 
                 '-.-.':'C',   '-..':'D',        '.':'E', 
                 '..-.':'F',   '--.':'G',     '....':'H', 
                   '..':'I',  '.---':'J',      '-.-':'K', 
                 '.-..':'L',    '--':'M',       '-.':'N', 
                  '---':'O',  '.--.':'P',     '--.-':'Q', 
                  '.-.':'R',   '...':'S',        '-':'T', 
                  '..-':'U',  '...-':'V',      '.--':'W', 
                 '-..-':'X',  '-.--':'Y',     '--..':'Z', 
                '.----':'1', '..---':'2',    '...--':'3', 
                '....-':'4', '.....':'5',    '-....':'6', 
                '--...':'7', '---..':'8',    '----.':'9', 
                '-----':'0', '--..--':',',  '.-.-.-':'.', 
               '..--..':'?',  '-..-.':'/',  '-....-':'-', 
                '-.--.':'(', '-.--.-':')',       ' ':' ',
                'END':""} 

"""
===================================================================================
SOME CUSTOM FUNCTIONS
We define them here so that we can use them later.
===================================================================================
"""

def record_button_press_time():
    # This function measures how long a button is pressed for.
    while not my_button.read() == PRESSED:
        # Wait for press
        time.sleep(.01)
    down_time = time.time()
    while my_button.read() == PRESSED:
        # Wait for release
        time.sleep(.01)
    up_time = time.time()     
    return up_time - down_time


def collect_training_presses(number, type_name):
    # This function collects "number" samples of "type_name" and records the durations in an array "times"
    times = np.zeros(number)
    for press_num in range(number):
        print("Waiting for ", type_name, " PRESS ", press_num+1, " of ", number)
        times[press_num] = record_button_press_time()
    return times

def classify(time, dot_average, dash_average):
    # This function should:
    # RETURN "-" if the time is for a dash, or "." if the time is for a dot.
    """
    ---------------------------------------------------------------------
    TO DO:
    In the space below:
        -PASTE YOUR SOLUTION FROM THE PREVIOUS EXERCISE BELOW.
    """
    if True:  # REPLACE TRUE WITH YOUR OWN CONDITIONAL TO DETERMINE DOT FROM DASH FROM EX1
        return "."
    else:
        return "-"

    '''
    ---------------------------------------------------------------------'''
    
"""
===================================================================================
CODE THAT RUNS
===================================================================================
""" 

"""
Collect LABELED TRAINING samples
"""
try:
    if not dot_average is None and not dash_average is None:
        # If it is already trained, allow the user to decide whether to train again.
        train = input("Do you want to retrain? 1 = Yes, 0 = No ")
        try:
            train = int(train)
        except:
            print("*** Please provide an integer ***")
except:
    train = 1 # This is a 1 if the system needs to be trained
        
if train == 1:
    # If training or retraining:
    # Ask how many samples to collect.
    collect_number = input("How many training samples do you want to give? ")
    try:
        collect_number = int(collect_number)
    except:
        print("*** Please provide an integer ***")
    
    #Collect the samples
    print("DOTS")
    dot_times = collect_training_presses(collect_number, "Dot")

    print("DASHES")
    dash_times = collect_training_presses(collect_number, "Dash")

    print("Dot Times: ", dot_times)
    print("Dash Times: ", dash_times)


"""
---------------------------------------------------------------------
    TO DO:
    In the space below:
        -PASTE YOUR SOLUTION FROM THE PREVIOUS EXERCISE BELOW.
"""
dot_average = 0 # THIS IS A PLACEHOLDER TO REPLACE FROM EX1
dash_average = 1 # THIS IS A PLACEHOLDER TO REPLACE FROM EX1

'''
---------------------------------------------------------------------'''

print("Dot Average: ", dot_average)
print("Dash Average: ", dash_average)

print("===================")
print("STARTING IN")
time.sleep(1)
print("3...")
time.sleep(1)
print("2...")
time.sleep(1)
print("1...")
time.sleep(1)
print("GO")
print("===================")

last_press_time = time.time()
space_time = False
message = [] # Initialize a blank message
letter = "" # Initialize a blank letter

"""
ONCE TRAINED THE FOLLOWING LOOP WILL COLLECT A MESSAGE
"""
while True: 
    
    # If there is a long pause, calculate the message.
    if not my_button.read() == PRESSED and time.time() - last_press_time > 8*dash_average:
        # Only end once something has been entered (an empty message has no last item to check).
        if message and message[-1] == " ":
            print("END MESSAGE")
            print("===================")

            message[-1] = "END"          
            break
    
    # If there is a medium pause, add a space.
    elif not my_button.read() == PRESSED and time.time() - last_press_time > 4*dash_average:
        # Skip the space while the message is still empty (nothing to separate yet).
        if letter == "" and message and not message[-1] == " ":
            print("SPACE")
            print("NEXT LETTER ===========")
            message.append(" ")
          
        
    # If there is a short pause, go to next letter.
    elif not my_button.read() == PRESSED and time.time() - last_press_time > 2*dash_average:
        if not letter == "":
            print("NEXT LETTER ===========")
            message.append(letter)
            letter = "" 
            
    # If the button is pressed, decide if it is a dot or a dash.
    elif my_button.read() == PRESSED:
        down_time = time.time()
        while my_button.read() == PRESSED:
            time.sleep(.01)
        up_time = time.time()
        press_time = up_time - down_time
        print("Saw: ", classify(press_time, dot_average, dash_average))
        letter = letter + classify(press_time, dot_average, dash_average)
        last_press_time = up_time
    
print(message)

"""Translate the Morse Code into Letters"""
message_as_letters = ""
for code in message:
    try:
        message_as_letters = message_as_letters + MORSE_CODE[code]
    except:
        print("DID NOT RECOGNIZE: ", code)
print("===================")
print("Message: ", message_as_letters)

# Supervised -- Nearest Neighbor Algorithm 

The Nearest Neighbor Algorithm (and its close cousin, the K-Nearest Neighbor Algorithm) is a simple but powerful supervised classifier.

In the Nearest Neighbor Algorithm, we collect a large number of labelled training observations. Then, when we take a new observation, we compare it to all the training examples and find the one that is most similar (the "nearest neighbor"). The new observation receives the same label as that nearest neighbor.
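
The idea can be sketched on a toy problem. Everything below is an assumption made for illustration: the 2D points, the labels "A" and "B", and the helper name `nearest_neighbor_label` are made up, and "most similar" is taken to mean the smallest straight-line distance.

```python
import numpy as np

# Made-up labelled training observations in 2D.
train_points = np.array([[1.0, 1.2], [0.8, 1.0], [4.0, 4.2], [4.3, 3.9]])
train_labels = ["A", "A", "B", "B"]

def nearest_neighbor_label(new_point, points, labels):
    # Straight-line distance from the new observation to every training observation.
    distances = np.linalg.norm(points - np.asarray(new_point), axis=1)
    # The new observation takes the label of the closest training observation.
    return labels[int(np.argmin(distances))]

print(nearest_neighbor_label([1.5, 1.0], train_points, train_labels))
print(nearest_neighbor_label([3.5, 4.0], train_points, train_labels))
```

There is no separate modelling step here: the stored training examples are the model, so prediction gets slower as more examples are collected.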

Let's use this to make our color sensor way more powerful.

The color sensor documentation is [HERE](https://di-sensors.readthedocs.io/en/master/api-basic.html#easylightcolorsensor). In the documentation, you can see that the color sensor is able to classify 8 colors total:

| Color   | R, G, B       |
|---------|---------------|
| black   | 0, 0, 0       |
| blue    | 0, 0, 255     |
| cyan    | 0, 255, 255   |
| fuchsia | 255, 0, 255   |
| green   | 0, 255, 0     |
| red     | 255, 0, 0     |
| white   | 255, 255, 255 |
| yellow  | 255, 255, 0   |

That is great and all, but how many of those super bright and specific colors do you have around your house?

Even more important, those colors are sensitive to the lighting conditions in your house.

## Exercise 3

Let's try that built-in behavior with the example code below.

Try the sensor on a bunch of objects around the house. Try:

1. A bunch of colors --- what happens if the object is orange or pink or purple?

2. What else affects the color? Distance from the object? Outside lighting? Angle of surface?

In [None]:
from time import sleep
from di_sensors.easy_light_color_sensor import EasyLightColorSensor

my_lcs=EasyLightColorSensor(led_state=True) #initialize sensor

print('Sensor initialized, reading color for 10 seconds')

for i in range(10): #read for 10 seconds
    in_color = my_lcs.safe_raw_colors() #get raw data
    candidate = my_lcs.guess_color_hsv(in_color)[0] #get the computer's guess
    print("I think the color is: %s"%candidate)
    sleep(1)
    
print("Done!")

Let's do better with a Nearest Neighbor classifier.

We will collect training values for a variety of colors, label them, take a new measurement, and use the earlier labels to describe the new color.


## Exercise 4

In order to get our nearest neighbor algorithm working, **look for the TO DO.** 

Calculate the distance between the new value and the trained values (we recommend a 3D distance formula). Don't know what a 3D distance formula is? [Click here.](https://www.varsitytutors.com/hotmath/hotmath_help/topics/distance-formula-in-3d)
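
For reference, the usual straight-line (Euclidean) distance between two points $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ in 3D can be written as follows. The worked numbers are made up for illustration and are not sensor readings.

$$
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2}
$$

For example, for the points $(10, 20, 30)$ and $(13, 24, 42)$:

$$
d = \sqrt{3^2 + 4^2 + 12^2} = \sqrt{169} = 13
$$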

This will create an array of distances and then we can find which neighbor is actually the closest.

Once you see how this is working, feel free to customize it as you see fit.


In [None]:
## """Some import statements"""
from time import sleep
from di_sensors.easy_light_color_sensor import EasyLightColorSensor
import easygopigo3
import gopigo3
import easysensors
egpg = easygopigo3.EasyGoPiGo3()
gpg = gopigo3.GoPiGo3()
egpg.reset_all()

# IMPORT TOOLS FOR MAKING JUPYTER EASIER TO READ
from IPython.display import clear_output
from EDL_Jupyter_resources import HiddenPrints

# IMPORT USEFUL TOOLS
import time
import math
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
hiddenprints=HiddenPrints()

"""
========================================
Custom Functions
We define them here so that we can use them later.
========================================
"""
def take_color_with_button():
    while not my_button.read() == PRESSED:
        # Wait for press
        time.sleep(.01)
    rgb = my_color_sensor.safe_rgb()
    while my_button.read() == PRESSED:
        # Wait for release
        time.sleep(.01)    
    return rgb

def find_nearest_neighbor(color, trained_colors):
    distances = np.zeros((trained_colors.shape[0]))
    new_R, new_G, new_B = color
    for c in range(trained_colors.shape[0]):
        # Loop through all the trained values and calculate their distance to the newest color
        trained_R = trained_colors[c, 1]
        trained_G = trained_colors[c, 2]
        trained_B = trained_colors[c, 3]
        '''========================================'''
        """
        TO DO:
        In the space below:
            - Calculate the distance between the new values (new_R,...) and the trained values (trained_R,...)
            ***Hint: we recommend a 3D distance formula***
            - To help you, points (x1,y1,z1) and (x2,y2,z2) have already been defined
            - To get our nearest neighbor, we need to calculate the distance
            between these two points.
            
        """
        x1,y1,z1 = new_R, new_G, new_B #this is our first point in 3D
        x2,y2,z2 = trained_R, trained_G, trained_B #this is our second point in 3D
        
        ###FIND THE DISTANCE BETWEEN (x1,y1,z1) and (x2,y2,z2)
        
        distance = 1 # Replace 1 with the distance formula --- it is a placeholder
        
        '''========================================'''
        distances[c] = distance # Save the distance in the distance array
    
    nearest_neighbor = np.argmin(distances) # Find the index of the smallest distance.
    return nearest_neighbor
    
def classify_colors(color, trained_colors, labels):
    nearest_neighbor = find_nearest_neighbor(color, trained_colors)
    nearest_class = int(trained_colors[nearest_neighbor, 0])
    nearest_label = labels[nearest_class]
    return nearest_label 

def plot_colors(trained_colors, labels):
    color_count = np.max(trained_colors[:,0])
    
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    for i in range(int(color_count)+1):
        matches = trained_colors[:,0] == i
        color = np.random.random([3])
        ax.plot(trained_colors[matches,1], trained_colors[matches,2], trained_colors[matches,3],
                'o', markerfacecolor=color, markeredgecolor='k', label = labels[i])
    plt.legend()
    ax.set_xlim(0, 255)
    ax.set_ylim(0, 255)
    ax.set_zlim(0, 255)
    ax.set_xlabel('RED')
    ax.set_ylabel('GREEN')
    ax.set_zlabel('BLUE')
    plt.show()


"""
========================================
Code that runs
========================================
"""

# Initialize the color sensor
my_color_sensor=EasyLightColorSensor(led_state=True) #initialize sensor

my_button = egpg.init_button_sensor("AD2")
PRESSED = 1

try:
    trained_colors[0,0]
    # If it is already trained, allow the user to decide whether to train again.
    train = input("Do you want to retrain? 1 = Yes, 0 = No ")
    try:
        train = int(train)
    except:
        print("*** Please provide an integer ***")
except:
    train = 1 # This is a 1 if the system needs to be trained

egpg.open_eyes()
if train == 1:
    """TRAINING"""
    # If training or retraining:
    # Ask how many samples to collect.
    num_colors = int(input("How many colors will you train? (at least 2) "))
    num_samples= int(input("How many training samples will you take of each color? (at least 3) "))
    labels = {} 
    trained_colors = np.zeros((num_samples*num_colors, 4)) #Make a table of samples with labels and RGB values

    for c in range(num_colors):
        color_name = input("What will you call color number " +str(c+1)+ "? ")
        labels.update({c:color_name})
        for i in range(num_samples):
            print("Press button to take sample " + str(i+1) +" of " + str(num_samples) + " of " + color_name)
            R, G, B = take_color_with_button()
            trained_colors[c*num_samples + i, 0] = c # label this color with name
            trained_colors[c*num_samples + i, 1] = R # record red value
            trained_colors[c*num_samples + i, 2] = G # record green value
            trained_colors[c*num_samples + i, 3] = B # record blue value
            egpg.set_eye_color((R//3, G//3, B//3))
            egpg.open_eyes()

            
plot_colors(trained_colors, labels)
#print(trained_colors)
#print(labels)

ready=input('Type "y" when you are ready to begin 10 seconds of testing: ')
while ready!='y':
    ready=input('Type "y" when you are ready to begin 10 seconds of testing: ')

print("====== 10 Seconds of Testing ======= ")

#Classifying for 10 seconds
start_time = time.time()
while time.time() - start_time < 10:
    color = my_color_sensor.safe_rgb()
    label = classify_colors(color, trained_colors, labels)
    print("I think that is: ", label)
    time.sleep(1)

print("======= DONE ======")
egpg.close_eyes()

When we use K-Nearest Neighbor Classification instead of Nearest Neighbor Classification, we decide on the class of the new data based on multiple nearby data points (instead of just one!).

We will not have you write that code today, but know that it is an option if Nearest Neighbor classification is overly influenced by irregular training data.
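
For the curious, here is a minimal sketch of the K-Nearest Neighbor idea on made-up 2D data. The helper name `k_nearest_label` and the majority-vote rule are assumptions for illustration; the last "A" point is deliberately placed among the "B" points to play the role of irregular training data.

```python
import numpy as np
from collections import Counter

# Made-up labelled training observations in 2D.
# The last point is an irregular "A" sitting inside the "B" group.
train_points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.3],
                         [4.0, 4.0], [4.2, 3.8], [3.9, 4.3],
                         [3.6, 3.7]])
train_labels = ["A", "A", "A", "B", "B", "B", "A"]

def k_nearest_label(new_point, points, labels, k):
    # Distance from the new observation to every training observation.
    distances = np.linalg.norm(points - np.asarray(new_point), axis=1)
    # Indices of the k closest training observations.
    nearest = np.argsort(distances)[:k]
    # Each of the k neighbors votes with its label; the most common label wins.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

new_point = [3.7, 3.8]
print(k_nearest_label(new_point, train_points, train_labels, k=1))
print(k_nearest_label(new_point, train_points, train_labels, k=3))
```

With `k=1` the single irregular point decides the answer, while with `k=3` its two "B" neighbors outvote it. Using an odd `k` with two classes avoids tied votes.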

# Unsupervised -- K-Means Clustering

When using Nearest Neighbor and K-Nearest Neighbor, humans must completely supervise the process providing labels for each observation - essentially saying "this is orange" and "this is blue" etc.

There is another type of learning, called **unsupervised** learning, in which the computer collects training observations without labels and then, after training, looks at the data and identifies the categories on its own. This is called **clustering** because it looks for clusters or concentrations of data.

One popular method for clustering is called K-Means Clustering. The K-Means Clustering Algorithm has 4 primary steps:

1. Select - Choose K (where K is the number of clusters you want) centers and randomly assign some starting guesses.
2. Assign - Look at all the training observations and *assign* each observation to its closest center.
3. Update - Now take each center and look at all the observations assigned to it. Move that center to be in the middle of the cluster of assigned observations (the mean of their "positions").
4. Iterate - Repeat steps 2 and 3 until step 3 results in no change.
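
The four steps above can be sketched on a toy data set. This is a standalone illustration, not the notebook's implementation: the function name `k_means`, the 2D data, and the starting centers are all made up, and the starting centers are fixed rather than random so that the result is repeatable.

```python
import numpy as np

def k_means(data, centers, max_iterations=100):
    # Step 1 (Select) is done by the caller in this sketch: "centers" holds the starting guesses.
    centers = np.array(centers, dtype=float)
    for _ in range(max_iterations):
        # Step 2 (Assign): distance from every observation to every center,
        # then pick the closest center for each observation.
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        assignments = np.argmin(distances, axis=1)
        # Step 3 (Update): move each center to the mean of its assigned observations.
        new_centers = centers.copy()
        for k in range(len(centers)):
            if np.any(assignments == k):  # leave a center in place if nothing was assigned to it
                new_centers[k] = data[assignments == k].mean(axis=0)
        # Step 4 (Iterate): stop once the update no longer moves any center.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, assignments

data = np.array([[1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
                 [8.0, 8.0], [9.0, 8.0], [8.0, 9.0]])
centers, assignments = k_means(data, centers=[[0.0, 0.0], [10.0, 10.0]])
print(centers)
print(assignments)
```

Notice that no labels are supplied anywhere: the cluster numbers in `assignments` come entirely from where the data points sit.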

In order to make this code work, we will need some arrays to hold our data:

`centers` --- an array with 3 columns (for the R, G, and B values of the current centers) and K rows (one for each of the centers). We will refer to the centers by row index, so Centers 0 to K-1. So, if the second row holds `[100, 0, 50]`, that means Center 1 (the second one) has R=100, G=0, B=50.

`training_data` ---- an array with 3 columns (for the R, G, and B values) and N rows where N is the number of training observations we have.

`distances` ---- an array with K columns and N rows, each cell holding the distance between a training data point (for that row #, 0 to N-1) and a center (for that column #, 0 to K-1). So, for example, the cell in the third row and the second column will contain the distance between the third training data point and the second center.

`assignments` ---- an array with 1 column and N rows, each cell holding the number of the closest center for each training observation (row).
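
To make the shapes concrete, here is a small sketch with made-up values for K=2 centers and N=4 observations. The numbers are illustrative only, and `assignments` is stored here as a flat array with N entries, which may differ from how the exercise code lays it out.

```python
import numpy as np

K, N = 2, 4  # made-up sizes for illustration

# K rows x 3 columns: one (R, G, B) row per center.
centers = np.array([[200.0, 30.0, 40.0],
                    [20.0, 40.0, 210.0]])

# N rows x 3 columns: one (R, G, B) row per training observation.
training_data = np.array([[190.0, 45.0, 35.0],
                          [35.0, 30.0, 190.0],
                          [210.0, 20.0, 50.0],
                          [25.0, 50.0, 200.0]])

# N rows x K columns: distances[n, k] is the distance from observation n to center k.
distances = np.zeros((N, K))
for k in range(K):
    distances[:, k] = np.linalg.norm(training_data - centers[k], axis=1)

# N entries: the index of the closest center for each observation.
assignments = np.argmin(distances, axis=1)

print(centers.shape, training_data.shape, distances.shape, assignments.shape)
print(assignments)
```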

## Exercise 5

We have coded up all of the supporting custom functions you need for the KMC steps. However, the Iterate step is not finished. Look for the TO DO in the `k_means_clustering` custom function and fill in the missing steps of the K-Means Clustering Algorithm (2 and 3).

**Step 2. Assign** labels to data (and store those labels as "assignments"). The function you need for this is called: 

`        assign_labels_to_data(K, centers, training_data)`

**Step 3. Update** the centers with the assignments (and store those centers as "centers"). The function you need for this is called: 

`      update_centers(K, assignments, training_data, centers)`

Once you have those added, test the algorithm for different numbers of colors and samples.
* What is the smallest number of samples that you can provide for each color?
* Does the order of the samples matter?

### Challenge

Modify the code so that the robot does something interesting based on the color it sees.

For example, make a line of post-its or contrasting colored paper along your floor. Can you get the robot to follow the trail of colored paper? (e.g. go straight when it sees color A and turn when it sees color B.)

In [None]:
## """Some import statements"""
from time import sleep
from di_sensors.easy_light_color_sensor import EasyLightColorSensor
import easygopigo3
import gopigo3
import easysensors
egpg = easygopigo3.EasyGoPiGo3()
gpg = gopigo3.GoPiGo3()
egpg.reset_all()

# IMPORT TOOLS FOR MAKING JUPYTER EASIER TO READ
from IPython.display import clear_output
from EDL_Jupyter_resources import HiddenPrints

# IMPORT USEFUL TOOLS
import time
import math
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
hiddenprints=HiddenPrints()

"""
========================================
Custom Functions
We define them here so that we can use them later.
========================================
"""
def take_color_with_button():
    while not my_button.read() == PRESSED:
        # Wait for press
        time.sleep(.01)
    rgb = my_color_sensor.safe_rgb()
    while my_button.read() == PRESSED:
        # Wait for release
        time.sleep(.01)    
    return rgb

def show_color_swatch(color1):              
    # SHOW COLOR SAMPLES           
    color1_float =  (color1[0]/255, color1[1]/255, color1[2]/255)
    square = np.full((10, 10, 3), color1_float)
    plt.subplot(1, 1, 1)
    plt.imshow(square)
    plt.title(color1)
    plt.show()

def plot_colors(K, trained_colors, assignments, centers): #, labels):
    
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    for i in range(K):
        matches = assignments == i
        color = np.random.random([3])
        ax.plot(trained_colors[matches,0], trained_colors[matches,1], trained_colors[matches,2],
                'o', markerfacecolor=color, markeredgecolor='k', label = ("Class " + str(i)))
    ax.plot(centers[:, 0], centers[:, 1], centers[:, 2],
                's', markerfacecolor='k', markeredgecolor='k', markersize = 5)
    plt.legend()
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.set_zlim(0, 1)
    ax.set_xlabel('RED')
    ax.set_ylabel('GREEN')
    ax.set_zlabel('BLUE')
    plt.show()
    
def k_means_clustering(K, training_data):
    old_centers = np.zeros((K,3)) # create an empty array for comparison to centers
    centers = create_random_centers(K) # STEP 1: create random starting centers
    while np.max(np.absolute(centers - old_centers)) > .03:
        old_centers = centers # Save the old center values
        
        """
        TO DO
        Fill in the two steps of the K Means Clustering Algorithm
        1. Assign Labels to data (and store those labels as "assignments")
        2. Update the centers with the assignments (and store those centers as "centers")
        
        Look below to see the custom functions that we have already created for those steps
        """
        # STEP 2. Assign Labels to data (and store those labels as "assignments")
        1+1 #placeholder
        
        # STEP 3. Update the centers with the assignments (and store those centers as "centers")
        1+1 #placeholder

    # STEP 4. Repeat (with the above while loop) until the centers stop changing.

    assignments = assign_labels_to_data(K, centers, training_data) # assign the labels one last time    
    return centers, assignments

def assign_labels_to_data(K, centers, training_data):
    distance_array = np.zeros((training_data.shape[0], K)) # create empty distance array
    for i in range(K):
        distance_array[:, i] = create_distance_array(centers[i], training_data) # For each center, 
                                                                                # calculate the distance to each observation  
    new_assignments = np.argmin(distance_array, axis=1) # For each training observation, 
                                                        # find the center with the minimum distance
    return new_assignments


def update_centers(K, assignments, training_data, centers):
    new_centers = np.zeros((K,3))
    for i in range(K):
        if not np.sum(assignments==i) == 0:
            new_red = np.mean(training_data[(assignments==i),0])
            new_green = np.mean(training_data[(assignments==i),1])
            new_blue = np.mean(training_data[(assignments==i),2])
            new_centers[i,0] = new_red
            new_centers[i,1] = new_green
            new_centers[i,2] = new_blue
        else: #if there are no values assigned, do not update location
            new_centers[i,0] = centers[i,0]
            new_centers[i,1] = centers[i,1]
            new_centers[i,2] = centers[i,2]
    return new_centers

def distance_colors(new, old):
    distance = math.sqrt((new[0]-old[0])**2 + (new[1]-old[1])**2 + (new[2]-old[2])**2)
    return distance
    
def create_distance_array(center, training_data):
    # Calculate the distance from each training data point to the given center
    distances = np.sqrt((training_data[:,0] - center[0])**2 + 
                        (training_data[:,1] - center[1])**2 + 
                        (training_data[:,2] - center[2])**2)
    return distances
    
    
def create_random_centers(K):
    random_centers = np.random.random((K, 3)) # pick K random centers   
    #print("Random Centers: ", random_centers)
    return random_centers
    


def classify_colors(K, color, centers):
    distance_to_center = np.zeros((K))
    
    for i in range(K):
        distance_to_center[i] = distance_colors(color, centers[i,:])

    return np.argmin(distance_to_center) 

"""
========================================
Code that runs
========================================
"""
#centers
#training_data
#distances
#assignment

# Initialize the color sensor
my_color_sensor=EasyLightColorSensor(led_state=True) #initialize sensor

my_button = egpg.init_button_sensor("AD2")
PRESSED = 1
train = 0

try:
    training_data[0,0]
    # If it is already trained, allow the user to decide whether to train again.
    train = input("Do you want to retrain? 1 = Yes, 0 = No ")
    try:
        train = int(train)
    except ValueError:
        print("*** Please provide an integer ***")
except:
    train = 1 # This is a 1 if the system needs to be trained

if not train == 1 and not train == 0:
    print("*** Please provide an integer ***")
    #exit()

elif train == 1:
    """
    TRAINING ===========================================================================
    """
    # If training or retraining:
    # Ask how many samples to collect.
    num_colors = int(input("How many clusters should be trained (K)? (at least 2) "))
    num_samples= int(input("How many training samples will you collect (N)? (at least 5 x K) "))
    
    """
    Set up the training data structure described in the notes
    """
    training_data = np.zeros((num_samples, 3))
    ntraining_data = np.zeros((num_samples, 3)) # for storing normalized training data
    
    print("Provide examples of your colors in any order")
    
    for i in range(num_samples):
        print("Press button to take sample " + str(i+1) +" of " + str(num_samples))
        R, G, B = take_color_with_button()
        training_data[i, 0] = R # record red value
        training_data[i, 1] = G # record green value
        training_data[i, 2] = B # record blue value

print("DATA: ", training_data)

#Find values
max_red = np.max(training_data[:, 0])
min_red = np.min(training_data[:, 0])
max_green = np.max(training_data[:, 1])
min_green = np.min(training_data[:, 1])
max_blue = np.max(training_data[:, 2])
min_blue = np.min(training_data[:, 2])

#Normalize Training Data with Min-Max Normalization
ntraining_data[:, 0] = (training_data[:, 0] - min_red)/(max_red-min_red)
ntraining_data[:, 1] = (training_data[:, 1] - min_green)/(max_green-min_green)
ntraining_data[:, 2] = (training_data[:, 2] - min_blue)/(max_blue-min_blue)

centers, assignments = k_means_clustering(num_colors, ntraining_data) # This is a custom function above!
print("Final Assignments: ", assignments)
print("Final Centers: ", centers)


plot_colors(num_colors, ntraining_data, assignments, centers)
#print(trained_colors)
#print(labels)

ready=input('Type "y" when you are ready to begin 10 seconds of testing: ')
while ready!='y':
    ready=input('Type "y" when you are ready to begin 10 seconds of testing: ')

start_time = time.time()
print("====== 10 Seconds of Testing ======= ")

#Classifying for 10 seconds
while time.time() - start_time < 10:
    color = my_color_sensor.safe_rgb()
    norm_red = (color[0] - min_red)/(max_red-min_red)
    norm_green = (color[1] - min_green)/(max_green-min_green)
    norm_blue = (color[2] - min_blue)/(max_blue-min_blue)
    
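    # Classify the normalized reading, since the centers were learned from normalized data
    label = classify_colors(num_colors, [norm_red, norm_green, norm_blue], centers)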
    print("Class " + str(label))
    time.sleep(1)

print("======= DONE ======")

Great work! Keep thinking about the types of things you could teach your robot to classify. 

If you are looking to keep exploring Machine Learning, another method is called regression (a learned relationship between two or more numbers). Do some research on regression and think about how regression could be used to teach a robot something interesting.
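
As a small taste of regression, the sketch below fits a straight line to a handful of numbers with NumPy's `np.polyfit` and uses it to make a prediction. The data is made up for illustration (seconds driven versus centimeters traveled); a real robot's measurements would be noisier.

```python
import numpy as np

# Made-up measurements: how far the robot traveled after driving for some time
seconds = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
distance_cm = np.array([12.0, 22.0, 32.0, 42.0, 52.0])

# Fit a line: distance = slope * seconds + intercept
slope, intercept = np.polyfit(seconds, distance_cm, 1)

# Use the learned relationship to predict a time we never measured
predicted = slope * 6.0 + intercept
print("Predicted distance after 6 seconds:", predicted)
```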