# Text generation with an RNN

Adapted from the [TensorFlow example](https://www.tensorflow.org/tutorials/sequences/text_generation) to run on [datahub.ucsd.edu](https://datahub.ucsd.edu) with TF 1.14.

Robert Twomey, rtwomey@ucsd.edu.

### Import TensorFlow and other libraries

In [None]:
from __future__ import absolute_import, division, print_function

import tensorflow as tf
# tf.enable_eager_execution()

import numpy as np
import os
import time
from IPython.display import Image
import bs4
from bs4 import BeautifulSoup
import requests

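### Download the dataset

The original example trains on Shakespeare. This version scrapes the Friends episode scripts linked from https://fangj.github.io/friends/ and collects them in `scripts.txt`. Change the following cell to run this code on your own data.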

In [None]:
# Default example training on Shakespeare:
# path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')

# Use your own file:
# path_to_file = "try_with_your_own_file.txt"

# Robert's file
# path_to_file = "script.txt"

path = 'https://fangj.github.io/friends/'
page = requests.get(path)
soup = BeautifulSoup(page.text, "html.parser")
links = soup.find_all("a")

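# Append the text of every linked episode page to scripts.txt.
# Note: 'a+' appends, so re-running this cell duplicates the data.
with open('scripts.txt', 'a+') as out:
    for tag in links:
        href = tag.get('href')
        # Skip anchors without an href, and the outtakes page.
        if href is None or href == 'season/07outtakes.html':
            continue
        epage = requests.get(path + href)
        esoup = BeautifulSoup(epage.text, "html.parser")
        esoup.find('html').decompose()
        out.write(esoup.text.strip())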
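
The link handling in the scraping cell can be isolated into a small helper, which makes it easier to check before downloading anything. The following is a minimal sketch, not part of the original notebook: `episode_urls` is a hypothetical name, the `season/0101.html` style hrefs are made-up examples, and the de-duplication step is an added assumption (the cell above does not de-duplicate).

```python
from urllib.parse import urljoin


def episode_urls(base_url, hrefs, skip=("season/07outtakes.html",)):
    """Turn raw href values from the index page into absolute episode URLs.

    Hypothetical helper: skips missing hrefs and the pages listed in `skip`,
    and keeps only the first occurrence of each URL (an added assumption).
    """
    urls = []
    seen = set()
    for href in hrefs:
        # tag.get('href') returns None for <a> tags without an href.
        if not href or href in skip:
            continue
        url = urljoin(base_url, href)
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Made-up href values for illustration only.
example_hrefs = [
    "season/0101.html",
    None,
    "season/07outtakes.html",
    "season/0101.html",
    "season/0102.html",
]
for url in episode_urls("https://fangj.github.io/friends/", example_hrefs):
    print(url)
```

In the notebook, the list of hrefs would come from `[tag.get('href') for tag in links]`.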
        

### Install gpt-2-simple

First, install the `gpt-2-simple` package, which the following cells use to fine-tune GPT-2 on the scraped text.

In [1]:
!pip install gpt-2-simple --user



Next, download the 124M GPT-2 model and fine-tune it on `scripts.txt`. During training, the output shows the loss along with sample text generated by the model.

In [1]:
import gpt_2_simple as gpt2

model_name = "124M"
gpt2.download_gpt2(model_name=model_name)   # model is saved into current directory under /models/124M/

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'scripts.txt',
              model_name=model_name,
              print_every=100,
              steps=1000)



Fetching checkpoint: 1.05Mit [00:00, 303Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:00, 40.0Mit/s]                                                   
Fetching hparams.json: 1.05Mit [00:00, 492Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:06, 81.3Mit/s]                                  
Fetching model.ckpt.index: 1.05Mit [00:00, 343Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:00, 73.2Mit/s]                                                
Fetching vocab.bpe: 1.05Mit [00:00, 73.9Mit/s]                                                      
W1019 13:49:06.322689 140568921237312 deprecation.py:323] From /home/cgh003/.local/lib/python3.7/site-packages/gpt_2_simple/src/sample.py:17: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for up

Loading checkpoint checkpoint/run1/model-2100


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:03<00:00,  3.56s/it]


dataset has 665164 tokens
Training...
 okay I am not a big fan. Okay? I mean you’ve only been in there a couple of hours and you’ve only had it a couple of
hours so you’ve got this…smell really real?
Joey: Uh yeah y’know it’s coming from you. And I-I-I gotta change y’know.
Now is that why you care so much?
Chandler: No no, I-I-I’m gonna care a lot about you. And besides y’know what
if we were both like your sisters? Well you really would want to get married…Also
you’re great! But that’s a big mistake!
Joey: Dude I don’t want to get married! I’m already gotten
married! Whew! The actor I’m seeing gets their head blown off in Wayne.
HBO

Ending Credits
[Scene: Monica and Rachel's, time lapse.]
Chandler: What was that?
Ross: Get out of the chair!
Monica: Hang on a second! (She pinches him and he retreats.)
Chandler: Okay, I didn’t pry.
Rachel: Oh! You priped! 
Chandler: What?! (Monica tackles him and he retreats.) Monica Pross Poggle
You Prossed!
Rachel: I was taking off! (Monica pokes him

[2400 | 442.36] loss=0.35 avg=0.30
.
Monica: You know how you say you want the best guy to get the worst guy? Well,
that's true everywhere I go! Okay, but it's not true in this city!
Phoebe: I want the best guy. I want the best girl. I want the guy I love!
I want a girlfriend! I want a boyfriend!
Monica: I can't wait for you to grow up!
Phoebe: I'll give you five! You get to spend Thanksgiving with her and whatever
you do, do as the father urges you!
[Time lapse, Ross is entering the casino to find a crowd of people. Suddenly,
heaps pressure on the table and the dog collar falls off. He frantically pushes the table to
access the table and the dog collar slips. He yanks the table back, trying to get the dog
 collar to stay attached, and everyone gasps and runs away! He continues his trek back to the
table as the door opens! He stops, tries toggles between two different housings, and ends up
at the table.]
Rachel: Hey!
Ross: Ho-hum!
Rachel: Where are they?
Ross: Yeah, they left us their 

[2700 | 878.44] loss=0.17 avg=0.23
 Okay! No, 'cause! You don't know me, you don't know me!
Joey: Hey, hey!! (He puts a stop to that by putting on a bunch of balloons and
singing.) New Year's! Two joyous years of laughter!! (To everyone) Happy New Year!
EndJoey: (still trapped under Ross) Rach!
Chandler: Oh my God, this sequence isn't even in the episode I ordered from the
box. Is this the juice?
Ross: Whoa!
Opening Credits
[Scene: Monica, Chandler, and Phoebe's. Ross is sitting in the beanbag chair. Chandler
is on the couch.]
Chandler: And that guy from the box office is gonna be my big boy next. Okay,
I’m gonna have to see you grow back right now.
Ross: Ehh?
Chandler: Yeah. There’s just too many of you, I mean many women, and I
don’t want you to be the guy who’s disappointed you’re dead.
Ross: Well, I guess there’s someone I can talk to about that.
Phoebe: offers comfort and relief, as we enter
Chandler: Hey.
Ross: Hey.
Joey: Hey.
Chandler: Look, I understand if you came to this one,

[3000 | 1312.19] loss=0.13 avg=0.19
Saving checkpoint/run1/model-3000
 work.

MRS. GELLER: Really. But I was just thinking you wanted to call me.

MR. GELLER: I couldn’t get the message out.

MRS. GELLER: Jack. He’s gone.

MR. GELLER: He wanted to be long. So, uh, what would you do?

MRS. GELLER: Well, I would arrest the copyboy. Or at least I would try to find a
guy to take him down like the piggy bank guy. But I’m not really a piggy guy.

MR. GELLER: You’re piggy guy, you’re not a cartoonish figure.

MRS. GELLER: I am not a cartoonish figure.

MR. GELLER: Do you want a rugrats over here?

MRS. GELLER: Yes. Richard’s back.

BOTH: Yeah, I can’t believe this is happening.

JOEY ON TV: This is serious. She said that she moved to a condo west of
the line. (Ross points out the street.) She said that Richard’s back has been moved
to a condo west of the line.

ROSS, MONICA, and CHANDLER: That’s a rather interesting move, that’s
a rather interesting move. If that’s the same guy who just happe

W1019 14:13:44.423732 140568921237312 deprecation.py:323] From /opt/conda/lib/python3.7/site-packages/tensorflow/python/training/saver.py:960: remove_checkpoint (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to delete files with this prefix.


In [36]:
# Generate five samples that start with "[Scene" and append them to scenes.txt.
with open('scenes.txt', 'a+') as g:
    for i in range(5):
        g.write('\n===========Result ' + str(i) + '===========\n')
        result = gpt2.generate(sess,
                               length=500,
                               temperature=0.75,
                               prefix="[Scene",
                               top_p=0.8,
                               return_as_list=True)
        g.write(result[0])
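
The `temperature` and `top_p` arguments control how each next token is sampled. The sketch below illustrates the general technique (temperature scaling followed by nucleus, or top-p, filtering) on a toy list of logits in plain Python. It is a simplified illustration and not the code that `gpt-2-simple` runs: `top_p_filter`, `sample`, and `toy_logits` are hypothetical names, and real implementations can differ in details such as whether the token that crosses the threshold is kept.

```python
import math
import random


def top_p_filter(logits, temperature=0.75, top_p=0.8):
    """Return {token_index: probability} for the tokens kept by top-p filtering."""
    # Temperature scaling: values below 1 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


def sample(logits, temperature=0.75, top_p=0.8, rng=random):
    """Draw one token index from the filtered distribution."""
    nucleus = top_p_filter(logits, temperature, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Toy logits for a four-token vocabulary (made-up values).
toy_logits = [2.0, 1.0, 0.5, -1.0]
print(top_p_filter(toy_logits))
print(sample(toy_logits, rng=random.Random(0)))
```

Lower `temperature` and lower `top_p` both concentrate sampling on the most likely tokens, which tends to give more repetitive but more coherent text.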