# General Remarks

- Please complete all tasks (text and code) directly in this notebook
- Save the notebook with your first name and surname in the filename, e.g. **Klausur_ThomasManke.ipynb**
- Submit it as a GitHub repository (preferred); alternatively, upload the notebook to the CQ portal and share it with me
- This test covers three parts: Markov chains, Hidden Markov Models, Artificial Neural Networks
- Each part needs its own, sometimes overlapping, packages (e.g. numpy). Even if it is redundant, import the relevant packages explicitly at the beginning of each part.
- Complete the code cells in their respective sections and add (concise) text, where more verbal explanations are required. Comments in the code cells are also welcome.
- Feel free to add multiple code cells if you prefer, but make sure that they stay in their respective sections
- All tasks have been tested with mybinder.org and should run on any modern laptop (with 2 GB free RAM).
Make sure to close other resource-hungry programs.
If you encounter any technical problems, please inform me immediately!
- Deadline for submission: **21.06.2022 15:30**


---



# Markov Chains

##  The story: A ball game

Alice, Bob and young Clemens are playing a new ball game - here are the rules:
- If Alice has the ball, she will throw a (fair 6-sided) die and keep the ball if she throws a 6, otherwise she'll pass the ball to Bob
- If Bob has the ball, he'll pass it to Alice or Clemens, based on the throw of a fair coin
- If Clemens has the ball he'll return it to the child from whom he got it 

At the beginning of the game, their father throws the ball to Alice or Bob.
However, he is three times more likely to throw it to Alice, and he never throws it to Clemens.

## The Tasks

Translate the story into a Markov Model. 
Optionally: add a scanned drawing of the Markov graph as jpeg file to this notebook.

- What are the states and how many states are there?
- What is the initial state distribution? Write it down as a numpy.array below.
- Write down the transition matrix as a numpy.array.
- Does the Markov Model have a stationary distribution - and does your answer depend on whether Alice has a fair die? 
- Validate your answer numerically with the "matrix power method". 
- For each child, give their long-term probabilities that they hold the ball. 
- Bonus: In the fair die scenario, for how many steps can Alice expect to hold the ball before having to pass it on?
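
As a reminder of what the "matrix power method" refers to, here is a minimal sketch on a hypothetical two-state chain. The names `P_demo` and `pi0_demo` and all numbers are illustrative assumptions and are not the ball game. The idea is to raise the transition matrix to a high power and check that the resulting distribution no longer changes under a further step.

```python
import numpy as np

# Hypothetical two-state chain (NOT the ball game); each row sums to 1.
P_demo = np.array([[0.9, 0.1],
                   [0.5, 0.5]])
pi0_demo = np.array([1.0, 0.0])   # assumed initial state distribution

# Raise the transition matrix to a high power.
P_n = np.linalg.matrix_power(P_demo, 100)

# Distribution after 100 steps, starting from pi0_demo.
pi_inf = pi0_demo @ P_n

print("P^100:\n", np.round(P_n, 4))
print("distribution after 100 steps:", np.round(pi_inf, 4))

# A stationary distribution is unchanged by one more step.
print("unchanged by P:", np.allclose(pi_inf @ P_demo, pi_inf))
```

For this toy chain all rows of `P_n` agree, so the result does not depend on `pi0_demo`. Whether the same holds for the ball game is part of the task.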

## Your solutions

In [None]:
# import the necessary modules


print('long-term probabilities:    ', ...)
print('expected stretch for Alice: ', ...)

Your verbal answers here:

# Hidden Markov Models

## A story

The DNA of a (hypothetical) organism exists in 3 different configurations (0,1,2) that cannot be observed directly. They are, however, characterized by a specific distribution of observable nucleotides (A,C,G,T) that are emitted from each state. The state transition rates and emission rates are shown in the figure below.
<div>
   <img src="https://github.com/thomasmanke/ABS/raw/main/figures/HMM_DNA.jpg" width="1000">
</div>


This problem can be modelled as a hidden Markov Model.










## Tasks

1. Write down the HMM parameters as numpy arrays. The initial state probability $\pi$ is not given, but you may assume that it is the stationary distribution of state transitions - calculate it and report it.

2. Using MultinomialHMM() from the hmmlearn package, set up a probabilistic model with the parameters $(\pi, P, E)$.

3. Sample a sequence of 2000 hidden states $Z$ and the corresponding observations $X$ from the model. Use a random seed of 42 for reproducibility.
Report the first 20 pairs of hidden states and observations.


4. Calculate the logarithm of the probability $\log Pr(X)$ given the model from which you generated $X$ - why is it so low (1-2 sentences)?

5. Name two algorithms to decode the "best" possible path of hidden states $Z$ from observations $X$ and a given model. Briefly describe their different goals (2 sentences).
Run the respective function from hmmlearn to calculate 
$Z$ for both methods, given the $X$ and the current model.
Save the result as $Z_1$ and $Z_2$.
Report the number of differences between $Z_1$ and $Z_2$.

6. Use the hmmlearn implementation of the Baum-Welch algorithm to determine the best parameters for the HMM model, if only $X$ is given. 
  - You will have to define a new model that does not yet know any parameters (e.g. model_fit). 
  - You may assume that the number of hidden states is known to be 3.
  - Set "np.random.seed(1)" and run 500 iterations. 
  - Compare the results with your knowledge of the parameters of the model that generated $X$. You might want to round the fitted parameters to two digits: np.round(...,2)
  - Comment on possible differences and name two ways in which you might improve the parameter fit.
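
For orientation, the quantity $\log Pr(X)$ from task 4 can also be computed by hand with the forward algorithm. The sketch below uses plain numpy on a hypothetical 2-state, 2-symbol HMM. The function `forward_loglik` and the arrays `pi_demo`, `P_demo`, `E_demo` are illustrative assumptions; they are neither the model from the figure nor the hmmlearn API.

```python
import numpy as np

# Hypothetical HMM (NOT the DNA model): 2 hidden states, 2 observable symbols.
pi_demo = np.array([0.5, 0.5])            # initial state distribution
P_demo = np.array([[0.8, 0.2],
                   [0.3, 0.7]])           # state transitions
E_demo = np.array([[0.9, 0.1],
                   [0.2, 0.8]])           # emissions (rows: states, cols: symbols)

def forward_loglik(obs, pi, P, E):
    """Scaled forward algorithm: returns log Pr(obs) under the HMM (pi, P, E)."""
    alpha = pi * E[:, obs[0]]             # joint prob. of first symbol and each state
    log_prob = 0.0
    for t in range(len(obs)):
        if t > 0:
            # propagate one step, then weight by the emission of symbol obs[t]
            alpha = (alpha @ P) * E[:, obs[t]]
        scale = alpha.sum()               # Pr(obs[t] | obs[:t])
        log_prob += np.log(scale)
        alpha = alpha / scale             # rescale to avoid numerical underflow
    return log_prob

obs_demo = [0, 1, 1, 0]
print("log Pr(obs_demo) =", forward_loglik(obs_demo, pi_demo, P_demo, E_demo))
```

The rescaling in each step keeps the computation stable for long sequences; library implementations typically use a comparable scaled or log-space recursion.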

## Load the Software

In [None]:
# install hmmlearn (if necessary)
!pip install hmmlearn

In [None]:
# import modules you need



## Your solution

In [None]:
P = np.array(...) # transitions
E = np.array(...) # emission
pi= ....          # stationary distribution

# define model
...

# sample from model
...

print('Z=', *Z[:20])
print('X=', *X.flatten()[:20])

# log P(X)
print('log P(X) = ', )

# two ways to predict best path
... Z1 ...
... Z2 ...

# differences between two paths Z1 and Z2


# new model for fit
...

# print results
print('fit score:    ', ... )

print('fitted P: \n', np.round(...,2))
print('known P: \n', P)
print('\n')
print('fitted E: \n'  , np.round(...,2))
print('known E: \n', E)


# Artificial Neural Networks




## The Data

The Fashion-MNIST dataset contains a large number of (small and coarse-grained) images of fashion items. The set has been annotated with labels for both the training and test data sets.

Link: https://www.tensorflow.org/datasets/catalog/fashion_mnist

The goal is to construct a neural network that can predict the fashion label for a given image.

The sections below will describe the individual tasks.

## Load Packages

In [None]:
# import required packages

# a convenience function
def plot_cm(mat):
  # one tick per class, derived from the matrix passed in
  classes = np.arange(mat.shape[0])
  plt.imshow(mat, cmap=plt.cm.Blues)
  for (j,i),label in np.ndenumerate(mat):
    plt.text(i,j,np.round(label,2),ha='center',va='center')

  plt.colorbar()
  plt.title('Confusion Matrix')
  plt.xlabel('True label')
  plt.ylabel('Pred label')
  plt.xticks(classes)
  plt.yticks(classes)
  plt.show()

## Load Data

This section is provided on purpose. Simply run it to get the train and test data together with the respective labels.

Also keep the normalization as is.

In [None]:
mnist = tf.keras.datasets.fashion_mnist
(X_train, y_train), (X_test, y_test) = mnist.load_data()

# normalization
X_train, X_test = X_train / 255.0, X_test / 255.0

## Data Exploration and Preprocessing

- How many images (=samples) are included in the training data? 
- What is the shape of these images?
- How many distinct labels does it have? 
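
The kind of inspection asked for here can be sketched on a hypothetical toy array. `X_demo` and `y_demo` are made-up stand-ins and not the Fashion-MNIST data; the same attribute and function calls apply to any numpy array.

```python
import numpy as np

# Hypothetical toy data (NOT Fashion-MNIST): 5 "images" of 4x4 pixels.
X_demo = np.zeros((5, 4, 4))
y_demo = np.array([0, 1, 1, 2, 0])

n_samples = X_demo.shape[0]         # first axis counts the samples
image_shape = X_demo.shape[1:]      # remaining axes give the image shape
n_labels = np.unique(y_demo).size   # number of distinct label values

print(n_samples, image_shape, n_labels)
```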

## Define Model and Learning Strategy

Construct an artificial neural network with

- an input layer that takes the proper shape of images
- a dense layer with 128 nodes and a 'ReLU' activation function for non-linear mapping
- an output layer corresponding to the number of classes in the problem and a softmax activation function

Use the Adam optimizer and define a suitable loss function.
Make sure that during the learning process you will track both loss and 'sparse_categorical_accuracy' as metrics.

Summarize the model. How many parameters does it have?
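
To cross-check the number reported by the model summary, the parameters of a fully connected layer can be counted by hand: one weight per input-unit pair plus one bias per unit. The sketch below uses a hypothetical helper `dense_param_count` and made-up layer sizes that differ from the network requested above.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# Hypothetical network (NOT the one from the task): 100 inputs -> 32 hidden -> 5 outputs.
hidden_params = dense_param_count(100, 32)
output_params = dense_param_count(32, 5)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```

A layer that only flattens the input has no trainable parameters and therefore does not contribute to the total.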

In [None]:
# Define Model and Learning strategy here


...

## Fit the Model

Fit the model to the training data for 10 epochs and
use 10% of the training data for validation.

Once the fit is finished you may save the model.

In [None]:
....

## Evaluate the Model

Plot the history of loss and accuracy for the training and validation sets and compare them with the same metrics obtained (after fitting) for the test data.

Are there any indications of overfitting? Explain briefly (1-2 sentences).

In [None]:
# Evaluation & Learning history

...
..

## Inspect predictions

Inspect the test image with index 43 and compare the predicted label with the true label.

Compare all predicted labels from the test set with all true labels - you may want to use the plot_cm() function defined above.
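
A confusion matrix simply counts how often each (predicted, true) label pair occurs. The following is a minimal numpy sketch on made-up labels. The function `confusion_counts` and the arrays `y_true_demo` and `y_pred_demo` are illustrative assumptions. Rows index the predicted label and columns the true label, which is the orientation implied by the axis labels in plot_cm(); library functions may use the transposed convention.

```python
import numpy as np

def confusion_counts(y_true, y_pred, n_classes):
    """Count label pairs: entry [p, t] = number of samples with prediction p and truth t."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    # np.add.at accumulates correctly even when index pairs repeat
    np.add.at(cm, (y_pred, y_true), 1)
    return cm

# Hypothetical labels (NOT model output)
y_true_demo = np.array([0, 1, 2, 2, 1, 0])
y_pred_demo = np.array([0, 2, 2, 2, 1, 1])

cm_demo = confusion_counts(y_true_demo, y_pred_demo, 3)
print(cm_demo)

# Correct predictions sit on the diagonal.
accuracy_demo = np.trace(cm_demo) / cm_demo.sum()
print("accuracy:", accuracy_demo)
```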

In [None]:
id=43

... prediction and inspection for one test image


... predictions for all test images

... obtain confusion matrix ...


## Suggestions for improvements

Make suggestions for possible improvements to the model and the fitting process:

- 
-
-
