<font color='darkred'> Unless otherwise noted, **this notebook will not be reviewed or autograded.**</font> You are welcome to use it for scratchwork, but **only the files listed in the exercises will be checked.**

---

# Exercises

For these exercises, add your functions to the *apputil.py* file. You may also adjust the *app.py* file, but this is not required.

## Markov Chain Simple Example

Markov chains are a way of representing how systems change over time. The main concept behind Markov chains is that they are memoryless: the next state of a process depends only on its current state, not on how the process got there.

![image](https://upload.wikimedia.org/wikipedia/commons/7/7a/Markov_Chain_weather_model_matrix_as_a_graph.png)

The way to read the Markov chain above from [Wikipedia](https://commons.wikimedia.org/w/index.php?curid=25300524) is:
* If I am currently in the sunny state, there is a 10% chance I will go to the rainy state and a 90% chance I will remain in the sunny state.
* If I am currently in the rainy state, there is a 50% chance I will go to the sunny state and a 50% chance I will remain in the rainy state.
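
To make the memoryless idea concrete, here is a minimal sketch that samples a week of weather from the probabilities in the diagram. The names `transitions` and `simulate_weather` are hypothetical and not part of the exercises; the point is only that each step looks at the current state and nothing earlier.

```python
import numpy as np

# Probabilities read off the diagram: current state -> {next state: probability}
transitions = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def simulate_weather(start, n_days, seed=0):
    """Sample a path of states; each step uses only the current state."""
    rng = np.random.default_rng(seed)
    state = start
    path = [state]
    for _ in range(n_days):
        options = list(transitions[state].keys())
        probs = list(transitions[state].values())
        # Draw the next state from the row belonging to the current state
        state = str(rng.choice(options, p=probs))
        path.append(state)
    return path

path = simulate_weather("sunny", n_days=7)
print(path)
```

The exact path depends on the seed, so no particular output is shown here.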

## Transition Matrices

This is what our **transition matrix** will look like for the Markov chain diagram above. Take a minute to interpret the rows and columns of this matrix.

In [None]:
import numpy as np
import pandas as pd

P = np.asarray([.9, .1, .5, .5]).reshape(2,2)
states = ['sunny', 'rainy']

pd.DataFrame(P, index=states, columns=states)

|       | sunny | rainy |
|-------|-------|-------|
| sunny | 0.9   | 0.1   |
| rainy | 0.5   | 0.5   |
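
One way to check your reading of the matrix is sketched below, assuming rows are the current state and columns are the next state (which matches the diagram). The variable names `p_rainy_to_sunny` and `row_sums` are introduced here only for illustration.

```python
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)
states = ['sunny', 'rainy']

# Row index = current state, column index = next state
i = states.index('rainy')
j = states.index('sunny')
p_rainy_to_sunny = P[i, j]
print(p_rainy_to_sunny)

# Each row is a complete probability distribution over next states,
# so every row should sum to 1
row_sums = P.sum(axis=1)
print(np.allclose(row_sums, 1.0))
```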


## Predict Tomorrow's Weather

Let's say it's sunny today. We can represent that as:

`today = [1, 0]`

**Predict tomorrow's weather using what you know about today and the transition matrix.**

In [None]:
today = [1, 0]

tomorrow = np.dot(today, P)
tomorrow

array([0.9, 0.1])

In this example, there is a 90% chance it will remain sunny tomorrow, and a 10% chance it'll be rainy.
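
To see what the dot product is doing, the sketch below writes out the same calculation term by term. It redefines `P` and `today` so it can run on its own; `manual` is a name used only in this example.

```python
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)
today = np.array([1, 0])  # [P(sunny today), P(rainy today)]

# P(sunny tomorrow) = P(sunny today) * P(sunny -> sunny)
#                   + P(rainy today) * P(rainy -> sunny)
p_sunny = today[0] * P[0, 0] + today[1] * P[1, 0]

# P(rainy tomorrow) = P(sunny today) * P(sunny -> rainy)
#                   + P(rainy today) * P(rainy -> rainy)
p_rainy = today[0] * P[0, 1] + today[1] * P[1, 1]

manual = np.array([p_sunny, p_rainy])
print(np.allclose(manual, np.dot(today, P)))
```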

**Predict the day after tomorrow's weather.**

In [None]:
# Method 1: Multiply tomorrow's weather by the transition matrix
day_after = np.dot(tomorrow, P)
day_after

array([0.86, 0.14])

In [None]:
# Method 2: Multiply today's weather by the squared transition matrix (P^2)
day_after = np.dot(today, np.linalg.matrix_power(P, 2))
day_after

array([0.86, 0.14])
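
Method 2 generalizes to any number of days ahead. The helper below is a hypothetical sketch (the name `forecast` is not part of the exercises) that raises the transition matrix to the `n`-th power.

```python
import numpy as np

P = np.asarray([.9, .1, .5, .5]).reshape(2, 2)
today = np.array([1, 0])

def forecast(start, transition_matrix, n):
    """Distribution over states n steps after `start`."""
    return np.dot(start, np.linalg.matrix_power(transition_matrix, n))

two_days = forecast(today, P, 2)
print(np.allclose(two_days, [0.86, 0.14]))  # matches both methods above
```

With `n=0` the matrix power is the identity, so the function returns the starting distribution unchanged.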

# Text Generation

Markov chains can also be used for very basic text generation. <font color='darkblue'>**Think about every word in a corpus as a state.**</font> We can make the simplifying assumption that the next word depends only on the previous word, which is the basic assumption of a Markov chain. In this exercise, you'll create a text generator that uses only this concept.

## Read in some text to imitate

We are going to generate some text in the style of inspirational quotes, so let's first read in the data.

In [None]:
import requests

url = 'https://raw.githubusercontent.com/leontoddjohnson/datasets/main/text/inspiration_quotes.txt'

content = requests.get(url)
quotes_raw = content.text

print(quotes_raw[:1000])

“Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.” —Peter Shepherd

“Life is a journey and if you fall in love with the journey you will be in love forever.” —Peter Hagerty

“When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.” —Earl Wilson

“As we grow old, the beauty steals inward.” —Ralph Waldo Emerson

“Life begins as a quest of the child for the man, and ends as a journey by the man to rediscover the child.” —Sam Ewing

Happiness
“Ultimately your greatest teacher is to live with an open heart.” —Emmanuel (Pat Rodegast)

“Doing what you like is freedom. Liking what you do is happiness.” —Frank Tyger

“We forge the chains we wear in life.” —Charles Dickens

happiness quote
“If you look to others for fulfillment, you will never be fulfilled. If your happiness depends on money, you will never be happy with yourself. Be content with what you 

## Clean up the text data

There are many ways to clean up data before building a text generator. In this case, we'll do the minimum and extract only the quotes themselves.

*After you complete the exercises, feel free to adjust this section of the process ...*

In [None]:
import re

quotes = quotes_raw.replace('\n', ' ')
quotes = re.split("[“”]", quotes)
quotes[:3]

['',
 'Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.',
 ' —Peter Shepherd  ']

In [None]:
quotes = quotes[1::2]
quotes[:3]

['Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions.',
 'Life is a journey and if you fall in love with the journey you will be in love forever.',
 'When you return to your old hometown, you find it wasn’t the town you missed, but your childhood.']

In [None]:
corpus = ' '.join(quotes)
corpus[:200]

'Healing comes from taking responsibility: to realize that it is you - and no one else - that creates your thoughts, your feelings, and your actions. Life is a journey and if you fall in love with the '

In general, this version of `corpus` should work just fine!

## EXERCISES

### Exercise 1: Build a Transition Dictionary

Build a dictionary with the following traits:

* The keys should be all of the (unique) tokens in the corpus
* For each key, the value should be a `list` of all the tokens that follow that key
    - E.g., if my total corpus is "Astrid is very kind, is she not?", then my dictionary might include `{... "is": ["very", "she"] ...}`.
    - Decide whether or not to include duplicates (i.e., *every* occurrence) in these lists. Then, explain why or why not.

*Hint: You'll likely want to use [`defaultdict(list)`](https://realpython.com/python-defaultdict/#understanding-the-python-defaultdict-type) here.*
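
As a rough illustration of the data structure (not a required solution), the sketch below builds the dictionary for the toy sentence above. It assumes simple whitespace tokenization and keeps duplicates; both are choices you may want to revisit. The name `build_transitions` is hypothetical.

```python
from collections import defaultdict

def build_transitions(text):
    """Map each token to the list of tokens that follow it."""
    tokens = text.split()  # assumption: split on whitespace only
    followers = defaultdict(list)
    # Pair each token with the one immediately after it
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return dict(followers)

toy = "Astrid is very kind, is she not?"
toy_dict = build_transitions(toy)
print(toy_dict["is"])  # ['very', 'she']
```

Note that punctuation stays attached to tokens (`'kind,'`, `'not?'`), and the final token never becomes a key because nothing follows it.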

In [None]:
from collections import defaultdict

def get_quote_dict(corpus):

    # your code here ...

    return m_dict

Apply the function to the quotes. Your final output should look something like this:
    
```
{'Healing': ['comes'],
 'comes': ['from', 'the', 'the', ...],
 'from': ['taking', 'aesthetic', 'a'],
```

In [None]:
# quote_dict = get_quote_dict(corpus)
# quote_dict

### Exercise 2: Create a text generator

Create a function that generates sentences. It should take two things as inputs:

* The dictionary you just created
* The number of words you want generated

Then, generate a few sentences!

Your function can accept a user-defined "seed" word that starts the generator, or you can have the function select a random first token. After that, think about how to pick the "next word(s)" *at random* given the "current" word, using the quote dictionary from the first exercise. *Hint: Consider [`numpy.random.choice`](https://numpy.org/doc/stable/reference/random/generated/numpy.random.choice.html).*
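
The sampling loop might look something like the sketch below, shown on a small hand-written dictionary rather than the real quotes. The name `sample_words`, its parameters, and the restart-on-dead-end rule are all assumptions made for illustration. It uses NumPy's `Generator.choice` (via `np.random.default_rng`) in place of the legacy `numpy.random.choice` linked above.

```python
import numpy as np

def sample_words(transitions, count=15, seed_word=None, rng=None):
    """Generate `count` words by repeatedly sampling a follower of the current word."""
    if rng is None:
        rng = np.random.default_rng()
    keys = list(transitions)
    # Start from the seed word if given, otherwise from a random key
    word = seed_word if seed_word is not None else str(rng.choice(keys))
    words = [word]
    while len(words) < count:
        followers = transitions.get(word)
        if followers:
            word = str(rng.choice(followers))
        else:
            # Dead end (no known followers): restart from a random key
            word = str(rng.choice(keys))
        words.append(word)
    return " ".join(words)

# Hand-written toy dictionary, for illustration only
toy_dict = {
    "Astrid": ["is"],
    "is": ["very", "she"],
    "very": ["kind,"],
    "kind,": ["is"],
    "she": ["not?"],
}
sentence = sample_words(toy_dict, count=8, seed_word="Astrid",
                        rng=np.random.default_rng(0))
print(sentence)
```

Because followers are drawn uniformly from each list, a token that appears several times in a list is proportionally more likely to be chosen.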

In [None]:
def generate_sentence(quote_dict, count=15):

    # your code here ...

    return sentence

In [None]:
# generate_sentence(quote_dict)

*Notice how this exercise illustrates both a Markov chain (with constant transition probabilities) **and** a Monte Carlo simulation (repeated random sampling from a fixed probability distribution over words)!*
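
To connect the dictionary back to the transition matrix idea, the sketch below turns one list of followers into empirical transition probabilities. The list `followers` is made up for illustration and is not taken from the quotes data.

```python
from collections import Counter

# Hypothetical list of tokens that followed some word, duplicates kept
followers = ["the", "from", "the", "the"]

counts = Counter(followers)
# Relative frequency of each follower = estimated transition probability
probs = {word: n / len(followers) for word, n in counts.items()}
print(probs)  # {'the': 0.75, 'from': 0.25}
```

Sampling uniformly from the list with duplicates is equivalent to sampling from `probs`, which plays the role of one row of a transition matrix.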