# Clustering

Clustering is a machine learning technique that groups data points. In theory, data points in the same group should have similar properties and/or features, while data points in different groups should have highly dissimilar properties and/or features.

## Basic Algorithm for K-Means
- Step 1: Randomly pick K points to serve as the initial positions of the K centroids.
- Step 2: Assign all the data points to the centroids by distance. The closest centroid to a point is the one it is assigned to.
- Step 3: Average all the points belonging to each centroid to find the middle of those clusters (center of mass). Place the corresponding centroids into that position.
- Step 4: Reassign every point once again to the closest centroid.
- Step 5: Repeat steps 3-4 until no point changes which centroid it belongs to.
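
The steps above can be sketched in plain Python as follows. This is a minimal illustration rather than a production implementation: the function name `k_means`, the `max_iters` and `seed` parameters, and the sample points are all assumptions made for the example, and initial centroids are drawn from the data points themselves.

```python
import math
import random


def k_means(points, k, max_iters=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `k` groups."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    assignments = None

    for _ in range(max_iters):
        # Steps 2 and 4: assign every point to its closest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 5: stop once no point changes which centroid it belongs to.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Step 3: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )

    return centroids, assignments


# Hypothetical sample data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, k=2)
print(centroids)
print(assignments)
```

Which cluster receives label 0 and which receives label 1 depends on the random starting centroids, so only the grouping itself is meaningful.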

## Hidden Markov Models

"The Hidden Markov Model is a finite set of states, each of which is associated with a (generally multidimensional) probability distribution. Transitions among the states are governed by a set of probabilities called transition probabilities." (http://jedlik.phy.bme.hu/~gerjanos/HMM/node4.html)

A hidden Markov model works with probabilities to predict future events or states. In this section we will learn how to create a hidden Markov model that can predict the weather.

### Data
Let's start by discussing the type of data we use when we work with a hidden Markov model.

In the previous sections we worked with large datasets containing hundreds of entries. For a Markov model we are only interested in the probability distributions associated with states.

We can estimate these probabilities from large datasets, or we may already know them. We'll run through an example shortly that should clear things up, but first let's discuss the components of a Markov model.

**States:** In each Markov model we have a finite set of states. These states could be something like "warm" and "cold" or "high" and "low" or even "red", "green" and "blue". These states are "hidden" within the model, which means we do not directly observe them.

**Observations:** Each state has a particular outcome or observation associated with it based on a probability distribution. An example of this is the following: *On a hot day Tim has an 80% chance of being happy and a 20% chance of being sad.*

**Transitions:** Each state has a set of probabilities defining the likelihood of transitioning to each other state (or staying in the same one). An example is the following: *a cold day has a 30% chance of being followed by a hot day and a 70% chance of being followed by another cold day.*

To create a hidden Markov model we need:
- States
- Observation Distribution
- Transition Distribution
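
To make these three components concrete, here is a small standard-library sketch that stores them as dictionaries and samples a sequence of hidden states and observations. The cold-day transition row and the hot-day mood probabilities come from the examples above, and the hot-day transition row matches the weather model used later in this section. The cold-day mood probabilities, the 50/50 initial distribution, and the helper names `draw` and `sample_sequence` are assumptions made purely for illustration.

```python
import random

states = ["cold", "hot"]

# Transition distribution: P(next state | current state).
transition = {
    "cold": {"cold": 0.7, "hot": 0.3},  # from the example in the text
    "hot": {"cold": 0.2, "hot": 0.8},   # same values as the weather model below
}

# Observation distribution: P(observation | state).
observation = {
    "hot": {"happy": 0.8, "sad": 0.2},   # from the example in the text
    "cold": {"happy": 0.4, "sad": 0.6},  # assumed values, for illustration only
}

# Probability of each state on the first day (assumed values).
initial = {"cold": 0.5, "hot": 0.5}


def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def sample_sequence(num_steps, seed=0):
    """Sample hidden states and the observations they emit."""
    rng = random.Random(seed)
    state = draw(initial, rng)
    hidden, observed = [], []
    for _ in range(num_steps):
        hidden.append(state)
        observed.append(draw(observation[state], rng))  # emit an observation
        state = draw(transition[state], rng)            # move to the next state
    return hidden, observed


hidden, observed = sample_sequence(7)
print(hidden)    # the states we would not see in practice
print(observed)  # the moods we would actually observe
```

In a real application only the `observed` list would be available; the model is "hidden" because the `hidden` list has to be inferred from it.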

In [2]:
import tensorflow_probability as tfp  # We are using a different module from tensorflow this time
import tensorflow as tf

### Weather Model

We will model a simple weather system and try to predict the temperature on each day given the following information.
1. Cold days are encoded by a 0 and hot days are encoded by a 1.
2. The first day in our sequence has an 80% chance of being cold.
3. A cold day has a 30% chance of being followed by a hot day.
4. A hot day has a 20% chance of being followed by a cold day.
5. On each day the temperature is normally distributed with mean and standard deviation 0 and 5 on a cold day and mean and standard deviation 15 and 10 on a hot day.

In [4]:
tfd = tfp.distributions  # making a shortcut for later on
# Refer to point 2 above
initial_distribution = tfd.Categorical(probs=[0.8, 0.2]) # First day is 80% cold, or 20% hot
# refer to points 3 and 4 above
transition_distribution = tfd.Categorical(probs=[[0.7, 0.3],  # Cold->Cold is 70%, or Cold->Hot is 30%
                                                 [0.2, 0.8]]) # Hot->Cold is 20%, or Hot->Hot is 80%
# refer to point 5 above
observation_distribution = tfd.Normal(loc=[0., 15.], # Mean is 0 for cold, 15 for hot
                                      scale=[5., 10.]) # Std is 5 for cold, 10 for hot

# the loc argument is the mean and the scale argument is the standard deviation

### Create Model

The number of steps represents the number of days that we would like to predict information for. In this case we've chosen 7, an entire week.

In [5]:
model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=transition_distribution,
    observation_distribution=observation_distribution,
    num_steps=7)

### Getting Prediction

To get the **expected temperatures** on each day we can do the following.

In [9]:
mean = model.mean()

# In TensorFlow 2.x eager execution is enabled by default, so .numpy() returns the value directly.
# (In TensorFlow 1.x the tensor would instead have to be evaluated inside a tf.compat.v1.Session().)
print(mean.numpy())

[3.        5.9999995 7.4999995 8.25      8.625     8.812501  8.90625  ]
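
The values printed above can be reproduced by hand, which helps show what `model.mean()` is reporting for this model. The sketch below uses no TensorFlow: it tracks the probability of each state day by day and weights the two state means (0 and 15) by those probabilities. The function name `expected_temperatures` is an assumption made for this example, not part of TensorFlow Probability.

```python
initial = [0.8, 0.2]        # P(cold), P(hot) on the first day
transition = [[0.7, 0.3],   # cold -> cold, cold -> hot
              [0.2, 0.8]]   # hot -> cold, hot -> hot
state_means = [0.0, 15.0]   # mean temperature on a cold day and on a hot day


def expected_temperatures(initial, transition, state_means, num_steps):
    """Expected temperature on each day, computed from the state probabilities."""
    probs = list(initial)
    result = []
    for _ in range(num_steps):
        # Expected temperature today: state means weighted by state probabilities.
        result.append(sum(p * m for p, m in zip(probs, state_means)))
        # State probabilities tomorrow: today's probabilities times the transitions.
        probs = [
            sum(probs[i] * transition[i][j] for i in range(len(probs)))
            for j in range(len(probs))
        ]
    return result


temps = expected_temperatures(initial, transition, state_means, 7)
print([round(t, 4) for t in temps])
```

The small differences from the TensorFlow output (for example 5.9999995 instead of 6.0) are floating-point rounding in 32-bit arithmetic. Note that the standard deviations do not affect the expected temperature.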


### Variation

Let's change the distribution for the transition from cold to hot.
- A cold day now has a 50% chance of being followed by a hot day.

In [10]:
tfd = tfp.distributions  # making a shortcut for later on
# Refer to point 2 above
initial_distribution = tfd.Categorical(probs=[0.8, 0.2]) # First day is 80% cold, or 20% hot
# Point 3 is modified here; point 4 is unchanged
transition_distribution = tfd.Categorical(probs=[[0.5, 0.5],  # Cold->Cold is 50%, or Cold->Hot is 50%
                                                 [0.2, 0.8]]) # Hot->Cold is 20%, or Hot->Hot is 80%
# refer to point 5 above
observation_distribution = tfd.Normal(loc=[0., 15.],
                                      scale=[5., 10.])

model = tfd.HiddenMarkovModel(
    initial_distribution=initial_distribution,
    transition_distribution=transition_distribution,
    observation_distribution=observation_distribution,
    num_steps=7)

mean = model.mean()

# .numpy() returns the value directly under eager execution (TensorFlow 2.x)
print(mean.numpy())

[ 3.        8.4      10.02     10.506    10.651799 10.69554  10.708661]


### Comparison

Some key points to consider when comparing the initial model with the modified model:
1. Since the initial distribution is unchanged, the expected temperature on the first day is the same (3.0) in both models.
2. Since we increased the chance that a cold day is followed by a hot day from 0.3 to 0.5, the expected temperature on every subsequent day is higher in the modified model than in the original.
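
Both sequences of expected temperatures also level off after a few days. One way to see why, going slightly beyond the notebook itself: in a two-state chain the long-run probability of a hot day is P(cold to hot) / (P(cold to hot) + P(hot to cold)), and the expected temperature settles at the corresponding weighted average of the two state means. The helper name `long_run_mean` is an assumption made for this sketch.

```python
def long_run_mean(p_cold_to_hot, p_hot_to_cold, cold_mean=0.0, hot_mean=15.0):
    """Expected temperature once the two-state chain has settled down."""
    # Long-run probability of being in the hot state.
    p_hot = p_cold_to_hot / (p_cold_to_hot + p_hot_to_cold)
    return (1 - p_hot) * cold_mean + p_hot * hot_mean


original_limit = long_run_mean(0.3, 0.2)  # original model
modified_limit = long_run_mean(0.5, 0.2)  # modified model
print(round(original_limit, 3), round(modified_limit, 3))
```

The two limits are about 9.0 and 10.714, which is where the day-7 values of 8.90625 and 10.708661 are heading.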