Sources:

*   https://pypi.org/project/gpt-2-simple/#description
*   https://medium.com/@stasinopoulos.dimitrios/a-beginners-guide-to-training-and-generating-text-using-gpt2-c2f2e1fbd10a
*   https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce#scrollTo=VHdTL8NDbAh3
*   https://github.com/ak9250/gpt-2-colab
*   https://www.aiweirdness.com/d-and-d-character-bios-now-making-19-03-15/
*   https://minimaxir.com/2019/09/howto-gpt2/





#Let's teach AI to write like Shakespeare 🎓

##Installing the model

In [None]:
#install the library we'll use today
!pip install gpt-2-simple

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


##Generating text with the basic model

###Importing and loading necessary components

In [None]:
#import what we need
import gpt_2_simple as gpt2 #for gpt-2 (our AI model)
import os #lets us work with files and folders
import requests #this one helps us download from the internet

In [None]:
#and let's download our AI model
gpt2.download_gpt2()   # model is saved into current directory under /models/124M/

Fetching checkpoint: 1.05Mit [00:00, 398Mit/s]                                                      
Fetching encoder.json: 1.05Mit [00:01, 561kit/s]
Fetching hparams.json: 1.05Mit [00:00, 435Mit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [01:45, 4.70Mit/s]
Fetching model.ckpt.index: 1.05Mit [00:00, 409Mit/s]                                                
Fetching model.ckpt.meta: 1.05Mit [00:01, 844kit/s]
Fetching vocab.bpe: 1.05Mit [00:01, 848kit/s]


In [None]:
#starting the session so we can play with the gpt-2 model
sess = gpt2.start_tf_sess()

In [None]:
#we load the model from file to use it
gpt2.load_gpt2(sess, run_name='124M', checkpoint_dir='models')

Loading checkpoint models/124M/model.ckpt


###Text generation

In [None]:
#this is the opening text (the prompt) that the model will continue
prefix = "Is there a second Earth?"

In [None]:
#the model is generating text
gpt2.generate(sess, run_name='124M', checkpoint_dir='models', prefix=prefix, length=50)

Is there a second Earth? — J.J. Abrams (@jimawars) August 22, 2012

This post originally appeared on The Verge.

Photo: Star Wars: The Force Awakens<|endoftext|>A Democratic candidate is preparing to run for Texas Senate because she is
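
The sample above runs past `<|endoftext|>`, the marker GPT-2 uses to separate documents, so everything after it belongs to an unrelated new text. Below is a minimal sketch of trimming a generated string at that marker. `cut_at_end_token` is a hypothetical helper written for this notebook, not part of gpt-2-simple.

```python
END_TOKEN = "<|endoftext|>"  # document separator seen in the output above

def cut_at_end_token(text, token=END_TOKEN):
    """Return the part of `text` before the first `token` (all of it if the token is absent)."""
    head, _, _ = text.partition(token)
    return head.rstrip()

# a shortened copy of the last line of the sample above
sample = "Photo: Star Wars: The Force Awakens<|endoftext|>A Democratic candidate is preparing"
trimmed = cut_at_end_token(sample)
print(trimmed)
```

To apply it to real output you need the text as a string rather than printed; `gpt2.generate` accepts `return_as_list=True` for that, and it also has a `truncate` argument that cuts at a given sequence for you.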


##Generating text with the improved (fine-tuned) model

**IMPORTANT**
<br>Restart the runtime (Runtime -> Restart runtime)

###Importing and loading necessary components

In [None]:
#import what we need
import gpt_2_simple as gpt2 #for gpt-2 (our AI model)
import os #lets us work with files and folders
import requests #this one helps us download from the internet

In [None]:
#let's download a file with a selection of Shakespeare's plays
file_name = "shakespeare.txt"
if not os.path.isfile(file_name):
	url = "https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt"
	data = requests.get(url)
	data.raise_for_status() #stop here if the download failed

	with open(file_name, 'w', encoding='utf-8') as f:
		f.write(data.text)
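
Before fine-tuning, it can help to confirm that the download produced a real text file rather than an empty file or an error page. The sketch below counts characters, lines, and words in a string. `describe_corpus` is a hypothetical helper, and the short `demo` text stands in for the downloaded file so the cell runs on its own.

```python
def describe_corpus(text):
    """Return basic size statistics for a training text."""
    lines = text.splitlines()
    return {
        "characters": len(text),
        "lines": len(lines),
        "non_empty_lines": sum(1 for line in lines if line.strip()),
        "words": len(text.split()),  # whitespace-separated words, not GPT-2 tokens
    }

# stand-in for the real file contents
demo = "First Citizen:\nBefore we proceed any further, hear me speak.\n\nAll:\nSpeak, speak.\n"
stats = describe_corpus(demo)
print(stats)
```

In the notebook you would pass the contents of `shakespeare.txt` instead of `demo`. Note that the word count is only a rough guide: the token count reported during training is computed differently.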

In [None]:
#starting the session so we can play with the gpt-2 model
sess = gpt2.start_tf_sess()

###Teaching our model

In [None]:
#fine-tuning with shakespeare.txt (in other words, we are teaching the model to write like Shakespeare)
#it takes a while (~15 min)...
gpt2.finetune(sess, 'shakespeare.txt', steps=500)   # steps is the maximum number of training steps

Loading checkpoint models/124M/model.ckpt
Loading dataset...


100%|██████████| 1/1 [00:01<00:00,  1.50s/it]


dataset has 338025 tokens
Training...
[1 | 7.44] loss=4.08 avg=4.08
[2 | 9.65] loss=3.68 avg=3.88
[3 | 11.86] loss=3.84 avg=3.87
[4 | 14.09] loss=3.91 avg=3.88
[5 | 16.33] loss=3.65 avg=3.83
[6 | 18.57] loss=3.42 avg=3.76
[7 | 20.82] loss=3.72 avg=3.76
[8 | 23.08] loss=3.46 avg=3.72
[9 | 25.33] loss=3.55 avg=3.70
[10 | 27.61] loss=3.49 avg=3.68
Saving checkpoint/run1/model-10
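
Each training line above reads as `[step | elapsed seconds] loss=... avg=...`. The sketch below parses those lines and recomputes the `avg` column as a decay-weighted running mean of the loss. The decay factor of 0.99 is an assumption inferred from the numbers in this log, and `parse_log` and `running_average` are hypothetical helpers, not gpt-2-simple functions.

```python
import re

# matches lines such as "[10 | 27.61] loss=3.49 avg=3.68"
LOG_LINE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")

def parse_log(log_text):
    """Extract (step, seconds, loss, avg) tuples from the training output."""
    rows = []
    for match in LOG_LINE.finditer(log_text):
        step, seconds, loss, avg = match.groups()
        rows.append((int(step), float(seconds), float(loss), float(avg)))
    return rows

def running_average(losses, decay=0.99):
    """Decay-weighted running mean; decay=0.99 is an assumption inferred from the log."""
    total, weight, averages = 0.0, 0.0, []
    for loss in losses:
        total = total * decay + loss
        weight = weight * decay + 1.0
        averages.append(total / weight)
    return averages

log = """
[1 | 7.44] loss=4.08 avg=4.08
[2 | 9.65] loss=3.68 avg=3.88
[3 | 11.86] loss=3.84 avg=3.87
[4 | 14.09] loss=3.91 avg=3.88
[5 | 16.33] loss=3.65 avg=3.83
[6 | 18.57] loss=3.42 avg=3.76
[7 | 20.82] loss=3.72 avg=3.76
[8 | 23.08] loss=3.46 avg=3.72
[9 | 25.33] loss=3.55 avg=3.70
[10 | 27.61] loss=3.49 avg=3.68
"""

rows = parse_log(log)
recomputed = running_average([loss for _, _, loss, _ in rows])
for (step, _, loss, avg), mine in zip(rows, recomputed):
    print(f"step {step}: loss={loss:.2f} logged avg={avg:.2f} recomputed={mine:.3f}")

# average wall-clock time per step between the first and last logged step
seconds_per_step = (rows[-1][1] - rows[0][1]) / (rows[-1][0] - rows[0][0])
print(f"{seconds_per_step:.2f} seconds per step")
```

The recomputed values agree with the logged `avg` column to within rounding, because the log prints losses with only two decimals. The timing works out to a little over 2 seconds per step for this run, which is in line with the estimate given above for 500 steps.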


###Text generation

In [None]:
prefix = "Is there a second Earth?"

In [None]:
gpt2.generate(sess, prefix=prefix, length=50)

Is there a second Earth?

No.

A follow-up question: What is the answer to all that?

Yes, the answer is to the last of the Doomsday Clock.

I will make it obvious.

The last of the Doomsday


###Saving the model to Google Drive (optional)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
gpt2.copy_checkpoint_to_gdrive(run_name='run1')

You can find more texts, for example, at:
https://www.gutenberg.org/cache/epub/1597/pg1597.txt
<br><br>
You can download them to Colab using code similar to the cells below (remove the leading `#` to run a command).

In [None]:
#!wget https://www.gutenberg.org/cache/epub/1597/pg1597.txt

--2023-02-22 13:24:56--  https://www.gutenberg.org/cache/epub/1597/pg1597.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 329071 (321K) [text/plain]
Saving to: ‘pg1597.txt’


2023-02-22 13:24:58 (367 KB/s) - ‘pg1597.txt’ saved [329071/329071]



In [None]:
#!wget https://www.gutenberg.org/files/98/98-0.txt

--2023-02-22 13:25:10--  https://www.gutenberg.org/files/98/98-0.txt
Resolving www.gutenberg.org (www.gutenberg.org)... 152.19.134.47, 2610:28:3090:3000:0:bad:cafe:47
Connecting to www.gutenberg.org (www.gutenberg.org)|152.19.134.47|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 807231 (788K) [text/plain]
Saving to: ‘98-0.txt’


2023-02-22 13:25:12 (718 KB/s) - ‘98-0.txt’ saved [807231/807231]
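
Plain-text files from Project Gutenberg usually wrap the book in a licence header and footer, which the model would otherwise learn along with the book. The sketch below keeps only the text between the start and end markers. The marker wording in the patterns is an assumption based on the typical layout of these files, so check the top and bottom of the file you downloaded, and `strip_gutenberg_boilerplate` is a hypothetical helper.

```python
import re

# assumed marker layout: "*** START OF THE PROJECT GUTENBERG EBOOK <TITLE> ***"
START = re.compile(r"\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.S)
END = re.compile(r"\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*?\*\*\*", re.S)

def strip_gutenberg_boilerplate(text):
    """Return the text between the start and end markers, or the whole text if none are found."""
    start = START.search(text)
    begin = start.end() if start else 0
    end = END.search(text, begin)
    finish = end.start() if end else len(text)
    return text[begin:finish].strip()

# tiny made-up example in the assumed layout
raw = (
    "The Project Gutenberg eBook of Example\n\n"
    "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n\n"
    "It was the best of times.\n\n"
    "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***\n\n"
    "License text."
)
book = strip_gutenberg_boilerplate(raw)
print(book)
```

In the notebook you would read a downloaded file such as `pg1597.txt`, pass its contents through the function, and write the result to a new file to use for fine-tuning.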

