# Week 4. Sequence models and literature

* Universe / CogΨ : RL & NLP - NLP Basics [1]
* 김무성

------------------

# Contents
1. A conversation with Andrew Ng
2. Introduction
3. Looking into the code
4. Training the data
5. More on training the data
6. Notebook for lesson 1
7. Finding what the next word should be
8. Example
9. Predicting a word
10. Poetry!
11. Looking into the code
12. Laurence the poet!
13. Your next task
* Week 4 Quiz
* Exercise 4- Using LSTMs, see if you can write Shakespeare!

-----------------------------

* We thought it might be a bit of fun to turn the tables away from classification and use your knowledge for prediction. 
* Given a body of words, you could conceivably predict the word most likely to follow a given word or phrase, and once you've done that, to do it again, and again. 
* With that in mind, this week you'll build a poetry generator. 
* It's trained with the lyrics from traditional Irish songs, and can be used to produce beautiful-sounding verse of its own!

#### Reference [2]
* Instructor's Colab - https://colab.research.google.com/github/lmoroney

------------------

## 1. A conversation with Andrew Ng

* One of the most fun applications of sequence models is that they can read a body of text, train on it, and then generate or synthesize new text that sounds like it was written by a similar author or set of authors. 
* In this course, we're going to take a body of work from Shakespeare. Shakespeare wrote in Early Modern English, a different style of English than we're used to reading, and that makes for a really interesting exercise in text generation,
  - because if you're not familiar with Shakespearean language, the text generated by the neural network will probably look a lot like the original. If you had lived in the 1600s, when Shakespeare was around, you'd probably be able to identify it as generated by a neural network, but for us, used to a slightly different version of English, it makes for a really fun scenario. 
* This is a really fun application of neural networks, and one of my favorite teachers in high school was actually my English teacher, who made me memorize a lot of Shakespeare. I really wonder what she would think of this. 

------------------

## 2. Introduction

* We've seen classification of text over the last few lessons. 
* But what if we want to generate new text? 
* Instead of thinking of it as generating new text, how about thinking of it as a prediction problem? 
    - Remember when, for example, you had a bunch of pixels for a picture, and you trained a neural network to classify what those pixels were, so that it would predict the contents of the image, 
    - like maybe a fashion item, or a piece of handwriting. 
    - Well, text prediction is very similar. 
* We can get a body of text, 
    - extract the full vocabulary from it, 
    - and then create datasets from that, 
        - where 
            - we make the phrases the Xs 
            - and the next word in each phrase the Ys. 
* For example, consider the phrase "Twinkle, Twinkle, Little Star". 
    - What if we were to create training data 
        - where 
            - the Xs are "Twinkle, Twinkle, Little", 
            - and the Y is "Star"? 
    - Then, whenever the neural network 
        - sees the words "Twinkle, Twinkle, Little", 
        - the predicted next word would be "Star". 
* Thus, given enough words in a corpus, and a neural network trained on each of the phrases in that corpus together with its next word, we can come up with some pretty sophisticated text generation.
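
The framing above can be sketched in a few lines of plain Python. This is only an illustration of how a phrase turns into (X, Y) pairs; the variable names are made up for this sketch and no neural network is involved yet.

```python
# Build (X, Y) training pairs from one phrase:
# every prefix of the phrase is an X, and the word right after it is the Y.
phrase = "twinkle twinkle little star"
words = phrase.split()

pairs = []
for i in range(1, len(words)):
    x = words[:i]   # the words seen so far
    y = words[i]    # the word that follows them
    pairs.append((x, y))

for x, y in pairs:
    print(" ".join(x), "->", y)
```

The last pair is the one described in the notes: the Xs are "twinkle twinkle little" and the Y is "star".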

------------------

## 3. Looking into the code

* So let's start with a simple example. 
* I've taken a traditional Irish song, and here are the first few words of it.
* In this case, to keep things simple, I put the entire song into a single string. 
* You can see that string here, and I've denoted line breaks with `\n`. 
* I can create a Python list of sentences from the data by splitting it on those line breaks. 
* Using the tokenizer, 
    - it will create the dictionary of words 
    - for the overall corpus. 

<img src="figures/cap03_1.png" />
<img src="figures/cap03_2.png" />
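
As a rough illustration of what this step produces, here is a plain-Python stand-in for the tokenizer. It is a sketch under assumptions, not the lesson's code: `build_word_index` is a hypothetical helper, the second line of the sample string is made up, and ids are assigned in order of first appearance, whereas the Keras `Tokenizer` orders them by word frequency and also strips punctuation.

```python
# One string holding the whole "song", with "\n" marking line breaks.
# The first line is quoted in these notes; the second is a made-up placeholder.
data = "In the town of Athy one Jeremy Lanigan\nJeremy Lanigan went to the town"

def build_word_index(lines):
    """Map each distinct word to an integer id, starting at 1."""
    word_index = {}
    for line in lines:
        for word in line.split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index

# Lowercase the text and split it into a list of sentences (the corpus).
corpus = data.lower().split("\n")
word_index = build_word_index(corpus)

# Id 0 is reserved for padding, so the vocabulary size is one larger.
total_words = len(word_index) + 1
```

With this toy input the dictionary has 10 distinct words, so `total_words` is 11.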

------------------

## 4. Training the data

* So now, let's look at the code 
    - to take this corpus and 
    - turn it into training data. 
* Here's the beginning, I will unpack this line by line. 
    - First of all, our training x's will be called input sequences,
    - and this will be a Python list. 
* Then, for each line in the corpus, we'll generate a token list using the tokenizer.
    - This will convert 
        - a line of text like 
            - "In the town of Athy one Jeremy Lanigan" 
        - into 
            - a list of the tokens representing the words. 
* Then we'll iterate over this list of tokens and create a number of n-gram sequences,
    - namely 
        - the first two words in the sentence are one sequence, 
        - then the first three are another sequence, etc. 
    - The result of this, for the first line in the song, will be the following input sequences. 
    - The same process will happen for each line, 
        - but as you can see, the input sequences are 
            - simply the sentences being broken down into 
                - phrases, 
                - the first two words, 
                - the first three words, etc. 
* We next need to find the length of the longest sentence in the corpus. 
* Once we have our longest sequence length, the next thing to do is pad all of the sequences so that they are the same length. 
    - We will pre-pad with zeros to make it easier to extract the label, you'll see that in a few moments. 
* Now that we have our sequences, the next thing we need to do is turn them into x's and y's, our input values and their labels. 
    - When you think about it, now that the sentences are represented in this way, all we have to do is take all but the last token as the x and then use the last token as the y, our label. 
* By this point, it should be clear why we did pre-padding, 
    - because it makes it much easier for us to get the label simply by grabbing the last token.

<img src="figures/cap04_1.png" />
<img src="figures/cap04_2.png" />
<img src="figures/cap04_3.png" />
<img src="figures/cap04_4.png" />
<img src="figures/cap04_5.png" />
<img src="figures/cap04_6.png" />
<img src="figures/cap04_7.png" />
<img src="figures/cap04_8.png" />
<img src="figures/cap04_9.png" />
<img src="figures/cap04_10.png" />
<img src="figures/cap04_11.png" />
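
The n-gram and pre-padding steps can be sketched without any library. The token ids below are illustrative values, not real tokenizer output, and `pre_pad` is a hypothetical helper that mimics what a pre-padding utility would do.

```python
# Token ids for two lines of a song (illustrative values only).
token_lists = [[4, 2, 66, 8, 67], [9, 10, 11]]

# Every prefix of at least two tokens becomes one training sequence.
input_sequences = []
for token_list in token_lists:
    for i in range(1, len(token_list)):
        input_sequences.append(token_list[: i + 1])

# Length of the longest sequence; all others are padded up to it.
max_sequence_len = max(len(seq) for seq in input_sequences)

def pre_pad(seq, length):
    """Left-pad with zeros so the final token stays in the last position."""
    return [0] * (length - len(seq)) + seq

padded = [pre_pad(seq, max_sequence_len) for seq in input_sequences]

for row in padded:
    print(row)
```

Because the zeros go on the left, the label for every row is simply its last element, which is the reason given above for pre-padding.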

------------------

## 5. More on training the data

* So now, we have to split our sequences into our x's and our y's. 
* So to get my x's, I just get all of the input sequences sliced to remove the last token. 
* To get the labels, I get all of the input sequences sliced to keep only the last token. 
* Now, I should one-hot encode my labels, as this really is a classification problem 
    - where, given a sequence of words, I can classify from the corpus what the next word would likely be. 
    - So to one-hot encode, I can use the Keras utility that converts a list to a categorical (`to_categorical`).
    - I simply give it the list of labels and the number of classes, which is my number of words, and it will create a one-hot encoding of the labels. 

<img src="figures/cap05_1.png" />
<img src="figures/cap05_2.png" />
<img src="figures/cap05_3.png" />
<img src="figures/cap05_4.png" />
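
A minimal NumPy sketch of the slicing and one-hot encoding follows. The padded array and vocabulary size are made-up values, and the one-hot step is written out by hand to show what the result looks like; in the lesson this is done by the Keras utility rather than by this code.

```python
import numpy as np

# Pre-padded sequences (illustrative values) and a hypothetical vocabulary size.
padded = np.array([
    [0, 0, 0, 4, 2],
    [0, 0, 4, 2, 6],
    [0, 4, 2, 6, 5],
])
total_words = 7

# x's: every token except the last. Labels: only the last token.
xs = padded[:, :-1]
labels = padded[:, -1]

# One-hot encode: one row per label, with a single 1.0 at the label's index.
ys = np.zeros((len(labels), total_words), dtype="float32")
ys[np.arange(len(labels)), labels] = 1.0

print(xs)
print(labels)
print(ys)
```

Each row of `ys` has exactly one non-zero entry, so the model's output layer needs `total_words` units to match it.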

------------------

## 6. Notebook for lesson 1

* Lab link - [lesson 1 notebook](https://colab.research.google.com/github/lmoroney/dlaicourse/blob/master/TensorFlow%20In%20Practice/Course%203%20-%20NLP/Course%203%20-%20Week%204%20-%20Lesson%201%20-%20Notebook.ipynb)

------------------

## 7. Finding what the next word should be

* In the previous video we looked at the data, a string containing a single song, and saw how to prepare that for generating new text. 
    - We saw how to tokenize the data and then create sub-sentence n-grams that were labelled with the next word in the sentence. 
    - We then one-hot encoded the labels to get us into a position where we can build a neural network that can, given a sentence, predict the next word. 
* Now that we have our data as xs and ys, it's relatively simple for us to create a neural network to classify what the next word should be, given a set of words. 
* Here's the code. 
    - We'll start with an embedding layer. 
        - We'll want it to handle all of our words, so we set that in the first parameter. 
        - The second parameter is the number of dimensions to use to plot the vector for a word. 
            - Feel free to tweak this to see what its impact would be on results, 
            - but I'm going to keep it at 64 for now. 
            - Finally, the size of the input dimensions will be fed in, and this is the length of the longest sequence minus 1. 
                - We subtract one because we cropped off the last word of each sequence to get the label, 
                - so our sequences will be one less than the maximum sequence length. 
    - Next we'll add an LSTM. As we saw with LSTMs earlier in the course, 
        - their cell state means that they carry context along with them, 
            - so it's not just next door neighbor words that have an impact. 
        - I'll specify 20 units here, but again, you should feel free to experiment. 
    - Finally there's a dense layer sized as the total words, 
        - which is the same size that we used for the one-hot encoding. 
        - Thus this layer will have one neuron per word, and 
            - that neuron should light up when we predict a given word. 
    - We're doing a categorical classification, so we'll set the loss to be categorical cross entropy. 
    - And we'll use the Adam optimizer, which seems to work particularly well for tasks like this one. 
    - Finally, we'll train for a lot of epochs, say about 500, as it takes a while for a model like this to converge, particularly as it has very little data. 
        - So if we train the model for 500 epochs, it will look like this.

<img src="figures/cap07_1.png" />
<img src="figures/cap07_2.png" />
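
The layer stack described above might look roughly like this in Keras. Treat it as a sketch: `total_words` and `max_sequence_len` are placeholder values (in practice they come from the tokenizer and the padded sequences), the input length is declared with an `Input` layer, and the softmax activation on the output is an assumption that is standard for categorical cross entropy.

```python
import tensorflow as tf

total_words = 263       # hypothetical vocabulary size (including padding id 0)
max_sequence_len = 11   # hypothetical length of the longest n-gram sequence

model = tf.keras.Sequential([
    # Inputs are one token shorter than the longest sequence,
    # because the last token of each sequence was cropped off as the label.
    tf.keras.Input(shape=(max_sequence_len - 1,)),
    # One 64-dimensional vector per word in the vocabulary.
    tf.keras.layers.Embedding(total_words, 64),
    # 20 LSTM units carry context along the sequence.
    tf.keras.layers.LSTM(20),
    # One output neuron per word, matching the one-hot labels.
    tf.keras.layers.Dense(total_words, activation="softmax"),
])

model.compile(
    loss="categorical_crossentropy",
    optimizer="adam",
    metrics=["accuracy"],
)

# Training would then be along the lines of:
# history = model.fit(xs, ys, epochs=500, verbose=1)
```

The `fit` call is left commented out because `xs` and `ys` are not defined in this sketch.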

------------------

## 8. Example

* Here are a few phrases that were generated 
    - when I gave the neural network the sentence 
        - "Laurence went to Dublin", 
        - and I asked it to predict 
            - the next 10 words. 
* Notice that there's a lot of repetition of words. 
* This is because our LSTM was only carrying context forward. 
* Let's take a look at what happens if we change the code to be bidirectional. 
    - The model converges a bit quicker, as you'll see in this chart. 
* The generated phrases make a little bit more sense, but there's still some repetition. 
    - That being said, remember this is a song where words rhyme, such as ball, all and wall, et cetera, and as such many of them are going to show up.

<img src="figures/cap08_1.png" />
<img src="figures/cap08_2.png" />
<img src="figures/cap08_3.png" />
<img src="figures/cap08_4.png" />
<img src="figures/cap08_5.png" />
<img src="figures/cap08_6.png" />
<img src="figures/cap08_7.png" />

------------------

## 9. Predicting a word

* So now, let's take a look at how to get a prediction for a word and how to generate new text based on those predictions. 
* So let's start with a single sentence. 
    - For example, 'Laurence went to Dublin.' 
        - I'm calling this sentence the seed. 
* If I want to predict the next 10 words in the sentence to follow this, 
    - then this code will tokenize that for me using the `texts_to_sequences` method on the tokenizer. 
    - As we don't have an out-of-vocabulary token, it will ignore 'Laurence,' which isn't in the corpus, and we'll get the following sequence. 
* This code will then pad the sequence so it matches the ones in the training set. 
* So we end up with something like this which we can pass to the model to get a prediction back. 
* This will give us the token of the word most likely to be the next one in the sequence. 
* So now, we can do a reverse lookup on the word index items to turn the token back into a word and add that to our seed text, and that's it. Here's the complete code to do that 10 times, and you can tweak it for more. 
* But do note that the more words you predict, the more likely you are to get gibberish. 
    - Because each word is predicted, it's not 100 per cent certain, and then the next one is less certain, and the next one, etc. 
    - So for example, if you try the same seed and predict 100 words, you'll end up with something like this. 
* Using a larger corpus will help, and in the next video you'll see the impact of that, as well as some tweaks to the neural network that will help you create poetry.

<img src="figures/cap09_1.png" />
<img src="figures/cap09_2.png" />
<img src="figures/cap09_3.png" />
<img src="figures/cap09_4.png" />
<img src="figures/cap09_5.png" />
<img src="figures/cap09_6.png" />
<img src="figures/cap09_7.png" />
<img src="figures/cap09_8.png" />
<img src="figures/cap09_10.png" />
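
The generation loop can be sketched in plain Python. Everything here is a stand-in: the tiny vocabulary is made up, and `predict_next_id` is a hypothetical placeholder for the trained model, so the output text is meaningless. With a real model one would take the index of the highest probability in its prediction instead. The point is the mechanics: tokenize the seed, pre-pad it, predict a token, look the word up, append it, and repeat.

```python
# Hypothetical vocabulary; note that "laurence" is deliberately absent.
word_index = {"went": 1, "to": 2, "dublin": 3, "town": 4}
index_word = {i: w for w, i in word_index.items()}  # reverse lookup
max_sequence_len = 5

def texts_to_ids(text):
    """Convert text to ids, silently dropping out-of-vocabulary words."""
    return [word_index[w] for w in text.lower().split() if w in word_index]

def pre_pad(seq, length):
    """Keep the last `length` tokens and left-pad with zeros."""
    seq = seq[-length:]
    return [0] * (length - len(seq)) + seq

def predict_next_id(padded):
    """Stand-in for the model: just cycles to the id after the last token."""
    return padded[-1] % len(word_index) + 1

seed_text = "Laurence went to Dublin"
next_words = 3

for _ in range(next_words):
    token_list = texts_to_ids(seed_text)
    padded = pre_pad(token_list, max_sequence_len - 1)
    next_id = predict_next_id(padded)
    seed_text += " " + index_word[next_id]

print(seed_text)
```

Each predicted word is fed back in as part of the next input, which is why errors compound as more words are generated.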

------------------

## 10. Poetry!

* In the previous video, 
    - we used a single song to generate text. 
    - We got some text from it but once we tried to predict beyond a few words, 
    - it rapidly became gibberish. 
* So in this video, 
    - we'll take a look at adapting that work for a larger body of words to see the impact.
* The good news is that it will require very few code changes, so you'll be able to get it working quite quickly. 
* I've prepared a file 
    - with a lot of songs that has 1,692 sentences in all to see what the impact would be on the poetry that a neural network would create. 
* To download these lyrics, you can use this code.

* Download - [irish-lyrics](https://storage.googleapis.com/laurencemoroney-blog.appspot.com/irish-lyrics-eof.txt)

<img src="figures/cap10_1.png" />
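
One possible way to fetch the file and turn it into a corpus, using only the standard library, is sketched below. The URL is the one linked above; `download_lyrics` and `load_corpus` are hypothetical helper names, and dropping blank lines is a choice made for this sketch rather than something the lesson specifies.

```python
import urllib.request

LYRICS_URL = (
    "https://storage.googleapis.com/"
    "laurencemoroney-blog.appspot.com/irish-lyrics-eof.txt"
)

def download_lyrics(path="irish-lyrics-eof.txt", url=LYRICS_URL):
    """Download the lyrics file to `path` and return that path."""
    urllib.request.urlretrieve(url, path)
    return path

def load_corpus(path):
    """Read the file, lowercase it, and split it into non-empty lines."""
    with open(path, encoding="utf-8") as f:
        data = f.read()
    return [line for line in data.lower().split("\n") if line.strip()]

# Typical use (requires network access):
# corpus = load_corpus(download_lyrics())
```

The resulting list of lines can be handed to the same tokenizing and n-gram steps used for the single song.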

------------------

## 11. Looking into the code

* Now instead of hard-coding the song into a string called data, I can read it from the file like this. 
* I've updated the model a little bit to make it work better with a larger corpus of work 
    - but please feel free to experiment with these hyper-parameters. 
        - Three things that you can experiment with. 
            - First, is the dimensionality of the embedding, 
                - 100 is purely arbitrary and I'd love to hear what type of results you will get with different values.
            - Similarly, I increased the number of LSTM units to 150. 
                - Again, you can try different values 
                - or you can see how it behaves if you remove the bidirectional wrapper. 
                    - Perhaps you want words only to have forward meaning, where big dog makes sense but dog big doesn't make so much sense. 
            - Perhaps the biggest impact is on the optimizer. 
                - Instead of just hard coding Adam as my optimizer this time and getting the defaults, 
                    - I've now created my own Adam optimizer and set the learning rate on it.
                    - Try experimenting with different values here and see the impact that they have on convergence. 
                    - In particular, see how different convergences can create different poetry. 
* And of course, training for different numbers of epochs will always have an impact, with more generally being better, but eventually you'll hit the law of diminishing returns.

<img src="figures/cap11_1.png" />
<img src="figures/cap11_2.png" />
<img src="figures/cap11_3.png" />
<img src="figures/cap11_4.png" />
<img src="figures/cap11_5.png" />
<img src="figures/cap11_6.png" />
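
To make the three knobs easy to experiment with, the model can be wrapped in a small function, as sketched below. `build_model` is a hypothetical helper, the vocabulary size and sequence length in the usage line are placeholders, and the default learning rate of 0.01 is an assumption, since these notes do not state the value used.

```python
import tensorflow as tf

def build_model(total_words, max_sequence_len, embedding_dim=100,
                lstm_units=150, bidirectional=True, learning_rate=0.01):
    """Build and compile a next-word classifier with tunable hyper-parameters."""
    lstm = tf.keras.layers.LSTM(lstm_units)
    if bidirectional:
        # Read the sequence in both directions; drop this to keep forward context only.
        lstm = tf.keras.layers.Bidirectional(lstm)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_sequence_len - 1,)),
        tf.keras.layers.Embedding(total_words, embedding_dim),
        lstm,
        tf.keras.layers.Dense(total_words, activation="softmax"),
    ])

    # An explicit Adam instance lets us set the learning rate
    # instead of relying on the defaults.
    optimizer = tf.keras.optimizers.Adam(learning_rate=learning_rate)
    model.compile(loss="categorical_crossentropy",
                  optimizer=optimizer,
                  metrics=["accuracy"])
    return model

# Placeholder sizes; in practice they come from the tokenizer and padded data.
model = build_model(total_words=2690, max_sequence_len=16)
```

Calling `build_model(..., bidirectional=False)` or passing a different `learning_rate` gives a quick way to compare convergence and the resulting poetry.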

------------------

## 12. Laurence the poet!

* In a Colab with this data and these parameters, using a GPU, it typically takes about 20 minutes to train a model. 
* Once it's done, try a seed sentence and get it to give you 100 words. 
* Note that there are no line breaks in the prediction, so you'll have to add them manually to turn the word stream into poetry. 
* Here's a simple example. 
    - I used a famous quote from a very famous movie and let's see if you can recognize it.
    - And I tried that to see what type of poem it would give me and I got this.

<img src="figures/cap12_1.png" />

------------------

## 13. Your next task

* Now, this approach works very well until you have very large bodies of text with many many words. 
* So for example, 
    - you could try the complete works of Shakespeare 
    - and you'll likely hit memory errors, 
        - as each one-hot encoded label is a vector with 31,477 elements, 
            - which is the number of unique words in the collection, 
        - and there are over 15 million sequences generated using the algorithm that we showed here. 
        - So the labels alone would require on the order of terabytes of RAM. 
* So for your next task, you'll go through a workbook by yourself that uses character-based prediction. 
    - The full number of unique characters in a corpus is far less than the full number of unique words, at least in English. 
    - So the same principles that you used to predict words can be applied here. 
    - The workbook is at this URL, so try it out, and once you've done that, you'll be ready for this week's final exercise.

* https://www.tensorflow.org/tutorials/sequences/text_generation
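
The memory estimate for the Shakespeare labels can be checked with a few lines of arithmetic. The vocabulary size and sequence count are the figures quoted above; storing each element as a 4-byte float32 is an assumption of this sketch.

```python
vocab_size = 31_477          # unique words in the collection
num_sequences = 15_000_000   # "over 15 million" sequences, taken as a lower bound
bytes_per_value = 4          # assumption: float32 one-hot entries

# One one-hot vector of vocab_size elements per sequence.
label_bytes = vocab_size * num_sequences * bytes_per_value
label_terabytes = label_bytes / 1e12

print(f"{label_bytes:,} bytes is about {label_terabytes:.2f} TB")
```

Under these assumptions the one-hot labels alone come to roughly 1.9 TB, which is far more RAM than a typical machine offers.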

------------------

## Week 4 Quiz

* [Quiz link](https://www.coursera.org/learn/natural-language-processing-tensorflow/exam/Vxr03/week-4-quiz)

------------------

## Week 4 Outro

* Over the last four weeks you've gotten a grounding in how to do natural language processing with TensorFlow and Keras. 
    - You started from first principles: basic tokenization and padding of text to produce data structures that could be used in a neural network.
* You then learned about embeddings, 
* and how words could be mapped to vectors, 
* and words of similar semantics given vectors pointing in a similar direction, 
    - giving you a mathematical model for their meaning, which could then be fed into a deep neural network for classification.
* From there you started learning about sequence models, 
* and how they help deepen your understanding of sentiment in text 
    - by not just looking at words in isolation, 
    - but also how their meanings change when they qualify one another.
* You wrapped up by taking everything you learned and using it to build a poetry generator!
* This is just a beginning in using TensorFlow for natural language processing. 
* I hope it was a good start for you, and you feel equipped to go to the next level!

------------------

## Exercise 4- Using LSTMs, see if you can write Shakespeare!

* In this course you’ve done a lot of NLP and text processing. 
* This week you trained with a dataset of Irish songs to create traditional-sounding poetry.
* For this week’s exercise, you’ll take a corpus of Shakespeare sonnets, 
* and use them to train a model. 
* Then, see if that model can create poetry!

* [Lab Colab notebook](https://colab.research.google.com/github/lmoroney/dlaicourse/blob/master/TensorFlow%20In%20Practice/Course%203%20-%20NLP/NLP_Week4_Exercise_Shakespeare_Question.ipynb)

* [Lab answer Colab notebook](https://colab.research.google.com/github/lmoroney/dlaicourse/blob/master/TensorFlow%20In%20Practice/Course%203%20-%20NLP/NLP_Week4_Exercise_Shakespeare_Answer.ipynb)

------------------

# References

* [1] Natural Language Processing in TensorFlow - https://www.coursera.org/learn/natural-language-processing-tensorflow
* [2] Instructor's Colab - https://colab.research.google.com/github/lmoroney