# Text Generation

## Introduction

Markov chains can be used for very basic text generation. Treat every word in a corpus as a state. We then make the simplifying assumption that the next word depends only on the current word, which is the defining assumption of a (first-order) Markov chain.
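
Stated as a formula, with $w_t$ denoting the word at position $t$ (notation introduced here for illustration), the first-order Markov assumption is:

$$
P(w_{t+1} \mid w_1, w_2, \dots, w_t) = P(w_{t+1} \mid w_t)
$$

In other words, once the current word is known, the earlier words add no further information about what comes next.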

Markov chains don't generate text as well as deep learning models, but they're a good (and fun!) start.

## Select Text to Imitate

In this notebook, we're specifically going to generate review text for rr (Rivulet Resort), so as a first step, let's extract the text of its reviews.

In [16]:
# Read in the corpus, including punctuation!
import pandas as pd

data = pd.read_pickle('../pickle/corpus.pkl')
data

| | review | full_name |
|---|---|---|
| ktc | Amazing and beautiful place to stay : We went ... | KTDC Tea County |
| mm | Okayish : Hotel is just fine but view is great... | Misty Mountain |
| mtc | Well maintained with clean room, with great vi... | Munnar Tea Country |
| rr | Rivelute, Munnar : Location was extremely good... | Rivulet Resort |
| sc | Excellent Stay : Awesome experience in Swiss C... | Swiss County |
| tv | Excellent service and staff : The location is ... | Tea Valley |


In [17]:
# Extract only rr's text
rr_text = data.review.loc['rr']
rr_text[:200]

'Rivelute, Munnar : Location was extremely good. Variety and good quality of food. All staff was polite, cooperative and always ready. Excellent Stay : All things are perfect like food stay and service'

## Build a Markov Chain Function

We are going to build a simple Markov chain function that creates a dictionary:
* The keys are the words in the corpus
* Each value is the list of words that follow that key in the text, repeats included

In [18]:
from collections import defaultdict

def markov_chain(text):
    '''The input is a string of text and the output will be a dictionary with each word as
       a key and each value as the list of words that come after the key in the text.'''
    
    # Tokenize by splitting on spaces, so punctuation stays attached to the words
    words = text.split(' ')
    
    # Initialize a default dictionary to hold all of the words and next words
    m_dict = defaultdict(list)
    
    # Pair each word with the word that follows it and append the follower to that word's list
    for current_word, next_word in zip(words[:-1], words[1:]):
        m_dict[current_word].append(next_word)

    # Convert the default dict back into a regular dictionary
    return dict(m_dict)
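
To make the shape of this dictionary concrete, here is a small sketch that runs a compact copy of the same logic on a made-up sentence. The toy sentence and the name `toy_dict` are only for illustration and are not part of the review corpus.

```python
from collections import defaultdict

def markov_chain(text):
    '''Compact copy of the function above: map each word to the list of words that follow it.'''
    words = text.split(' ')
    m_dict = defaultdict(list)
    for current_word, next_word in zip(words[:-1], words[1:]):
        m_dict[current_word].append(next_word)
    return dict(m_dict)

# Hypothetical toy input, chosen so that some words repeat
toy_dict = markov_chain('the food was good and the staff was polite')
print(toy_dict)
# {'the': ['food', 'staff'], 'food': ['was'], 'was': ['good', 'polite'],
#  'good': ['and'], 'and': ['the'], 'staff': ['was']}
```

Note that `'polite'` is not a key: it is the last word of the text and nothing follows it.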

In [19]:
# Create the dictionary for rr's reviews and take a look at it
rr_dict = markov_chain(rr_text)
rr_dict

{'Rivelute,': ['Munnar'],
 'Munnar': [':',
  ':',
  ':',
  'and',
  ':',
  'Excellent',
  'Its',
  ':',
  ':',
  'town',
  ':',
  ':',
  'City',
  'and',
  ':',
  'but',
  'it',
  'town',
  'is',
  'is',
  'town',
  'town',
  'with',
  'valley,',
  'city)',
  'is',
  'but',
  'town.',
  'town.',
  'means',
  'city,',
  'and',
  'but'],
 ':': ['Location',
  'All',
  'The',
  'Location',
  'We',
  'Best',
  'Rivulet',
  'It',
  'Amazing',
  'I',
  'A',
  'Nice',
  'Amazing',
  'Most',
  'This',
  'Nice',
  'Food',
  "It's",
  'Pleasant',
  "It's",
  'It',
  'Overall',
  'It',
  'We',
  'Everything',
  'The',
  'Perfect',
  'Enjoyed',
  'Heavenly',
  'Excellent',
  'Excellent',
  'Calm,',
  'Good',
  'Nice',
  'It',
  'Great',
  'Good',
  'The',
  'Without',
  'Highly',
  'The',
  'View',
  'Very',
  'The',
  "Don't",
  'Hotel',
  'Location',
  'Rivulet',
  'Road',
  'Overall',
  'We',
  'Eco',
  'Nice',
  'Would',
  'Lack',
  'Restaurant',
  'Location',
  'Amazing',
  'Just',
  'Excellen
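
Because repeated followers stay in each list, drawing uniformly from a value list (for example with `random.choice`) picks each follower in proportion to how often it occurred in the text. The sketch below counts the followers of `'Munnar'`, copied from the output above, with `collections.Counter`; the variable names are only illustrative.

```python
from collections import Counter

# Followers of 'Munnar', copied from the dictionary output above
munnar_followers = [
    ':', ':', ':', 'and', ':', 'Excellent', 'Its', ':', ':', 'town', ':',
    ':', 'City', 'and', ':', 'but', 'it', 'town', 'is', 'is', 'town', 'town',
    'with', 'valley,', 'city)', 'is', 'but', 'town.', 'town.', 'means',
    'city,', 'and', 'but',
]

counts = Counter(munnar_followers)
total = len(munnar_followers)

# Show the two most frequent followers and their share of the list
for word, n in counts.most_common(2):
    print(f'{word!r}: {n}/{total}')
# ':': 9/33
# 'town': 4/33
```

So after `'Munnar'`, a uniform draw from this list yields `':'` about 27% of the time.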

## Create a Text Generator

We're going to create a function that generates sentences. It will take two things as inputs:
* The dictionary you just created
* The number of words you want generated

Here are some examples of generated sentences:

>'Shape right turn– I also takes so that she’s got women all know that snail-trail.'

>'Optimum level of early retirement, and be sure all the following Tuesday… because it’s too.'

In [20]:
import random

def generate_sentence(chain, count=15):
    '''Input a dictionary in the format of key = current word, value = list of next words
       along with the number of words you would like to see in your generated sentence.'''

    # Pick a random starting word and capitalize it
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()

    # Pick the next word from the current word's value list, make it the current word, and repeat
    for _ in range(count - 1):
        # The last word of the corpus may have no followers (and so no key);
        # in that case, restart from a random key instead of raising a KeyError
        next_words = chain.get(word1) or list(chain.keys())
        word2 = random.choice(next_words)
        word1 = word2
        sentence += ' ' + word2

    # End it with a period
    sentence += '.'
    return sentence

In [21]:
generate_sentence(rr_dict)

'Behind the city, nothing extraordinary. Has nice guesthouse with a great place to remember :.'
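
Since generation is random, every call returns a different sentence. If you need repeatable output, one option is to seed Python's random number generator before each call. The sketch below uses compact copies of the two functions and a made-up toy corpus. The seed value and the fallback to a random key for words without followers are assumptions of this sketch.

```python
import random
from collections import defaultdict

def markov_chain(text):
    '''Compact copy: map each word to the list of words that follow it.'''
    words = text.split(' ')
    m_dict = defaultdict(list)
    for current_word, next_word in zip(words[:-1], words[1:]):
        m_dict[current_word].append(next_word)
    return dict(m_dict)

def generate_sentence(chain, count=15):
    '''Compact copy: random walk over the chain for `count` words.'''
    word1 = random.choice(list(chain.keys()))
    sentence = word1.capitalize()
    for _ in range(count - 1):
        # Assumption: fall back to a random key when the current word has no followers
        word2 = random.choice(chain.get(word1) or list(chain.keys()))
        word1 = word2
        sentence += ' ' + word2
    return sentence + '.'

# Hypothetical toy corpus, for illustration only
toy_chain = markov_chain('the food was good and the staff was polite and the view was great')

random.seed(42)  # arbitrary seed value
first = generate_sentence(toy_chain, count=8)

random.seed(42)  # same seed, so the same sequence of random choices
second = generate_sentence(toy_chain, count=8)

print(first)
print(first == second)  # True
```

Seeding is handy for debugging or for sharing a notebook whose outputs should match on re-run; leave it out when you want fresh sentences each time.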