# Sample Steam Store Descriptions with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-descriptions

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple



Import the packages

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download

In [4]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 694kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 35.4Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 810kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:26, 52.6Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 3.15Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 46.5Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 37.7Mit/s]                                                       


## Uploading a Training Text File to Colaboratory

#### Either get the data yourself

This is currently not practical on Colab, because you need either:
-   the app details (slow to download),
-   or `aggregate.json` (stored with Git LFS, which is not installed on Google Colab).

#### Or get a data snapshot from me

In [5]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-descriptions/master/data/with_delimiters/concatenated_store_descriptions.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.2M  100 44.2M    0     0  41.6M      0  0:00:01  0:00:01 --:--:-- 41.6M
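
Before finetuning, it can be useful to check that the snapshot contains the delimiters that the generation cells rely on. The sketch below is an illustration rather than part of the original notebook: it assumes each store description is wrapped between `<|startoftext|>` and `<|endoftext|>`, as the `with_delimiters` folder name and the sampled outputs suggest, and the helper name `count_descriptions` is hypothetical.

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def count_descriptions(text, start_token=START_TOKEN, end_token=END_TOKEN):
    """Return (number of start tokens, number of end tokens) found in text."""
    return text.count(start_token), text.count(end_token)


# Tiny in-memory stand-in for the downloaded file (made-up content).
sample_text = (
    '<|startoftext|>A relaxing puzzle game.<|endoftext|>\n'
    '<|startoftext|>A fast-paced shooter.<|endoftext|>\n'
)
num_starts, num_ends = count_descriptions(sample_text)
print(num_starts, num_ends)
```

On Colab, the same helper could be applied to the contents of `concatenated_store_descriptions.txt` after reading the file with `open(..., encoding='utf-8')`. If the two counts differ, some descriptions are probably missing a delimiter.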


## Finetune GPT-2

In [0]:
file_name = 'concatenated_store_descriptions.txt'

run_name = model_name + '_descriptions'

In [7]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # number of steps between progress printouts
              sample_every=200,   # number of steps between demo samples
              save_every=500   # number of steps between checkpoint saves
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint models/345M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/345M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [01:08<00:00, 68.42s/it]


dataset has 12183978 tokens
Training...
[10 | 23.06] loss=2.83 avg=2.83
[20 | 38.24] loss=3.91 avg=3.38
[30 | 53.60] loss=1.65 avg=2.79
[40 | 69.06] loss=2.60 avg=2.75
[50 | 84.66] loss=2.70 avg=2.74
[60 | 100.42] loss=3.04 avg=2.79
[70 | 116.31] loss=2.18 avg=2.70
[80 | 132.31] loss=2.52 avg=2.67
[90 | 148.45] loss=2.51 avg=2.65
[100 | 164.59] loss=2.47 avg=2.64
[110 | 180.77] loss=2.37 avg=2.61
[120 | 197.03] loss=2.40 avg=2.59
[130 | 213.37] loss=2.61 avg=2.59
[140 | 229.78] loss=2.69 avg=2.60
[150 | 246.23] loss=2.89 avg=2.62
[160 | 262.73] loss=2.63 avg=2.62
[170 | 279.29] loss=1.81 avg=2.57
[180 | 295.91] loss=2.45 avg=2.56
[190 | 312.58] loss=2.20 avg=2.54
[200 | 329.29] loss=2.78 avg=2.55
60% 0x0050\u0049\u60f3\u5871\u8f8a\u6a74\u88f1\u9566\u65b0\u88f1\u9566\u65b0\u88f1\u96f5\u96f5\u9dbe\u9dbe\u9dbe\u9dbe\u9dbe\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef\u92ef \\\1\u30d2\u30d9<br><br><strong>Kriven<br><br>A f
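
The `avg` column in the log is smoother than the raw `loss`. The printed values look consistent with a bias-corrected exponential moving average with decay 0.99, updated once per printed line. This is an inference from the numbers in the log, not something the notebook states, and the helper name `running_average` is hypothetical. A minimal sketch under that assumption:

```python
def running_average(losses, decay=0.99):
    """Bias-corrected exponential moving average of a sequence of losses.

    The decay of 0.99 is an assumption chosen to match the training log.
    """
    weighted_sum, weight = 0.0, 0.0
    averages = []
    for loss in losses:
        weighted_sum = weighted_sum * decay + loss
        weight = weight * decay + 1.0
        averages.append(weighted_sum / weight)
    return averages


# Raw losses printed at steps 10 to 50 in the training log.
logged_losses = [2.83, 3.91, 1.65, 2.60, 2.70]
for step, avg in zip(range(10, 60, 10), running_average(logged_losses)):
    print(f'[{step}] avg={avg:.2f}')
```

Because the logged losses are rounded to two decimals, the recomputed averages can differ from the log in the last digit.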

## Save a Trained Model Checkpoint

In [0]:
checkpoint_folder = 'checkpoint/' + run_name

In [0]:
# gpt2.mount_gdrive()

In [0]:
# gpt2.copy_checkpoint_to_gdrive(checkpoint_folder=checkpoint_folder)

## Load a Trained Model Checkpoint

In [0]:
# gpt2.copy_checkpoint_from_gdrive(checkpoint_folder=checkpoint_folder)

In [0]:
# sess = gpt2.start_tf_sess()

# gpt2.load_gpt2(sess,
#                run_name=run_name)

## Generate Text From The Trained Model

In [0]:
temperature = 0.7  # Default is 0.7. A higher temperature gives more varied text, which helps to avoid copying the training data, especially if your dataset is small.

num_samples = 3
num_batches = 3  # Passed as batch_size below: the number of samples generated in parallel, which gives a large speedup. nsamples must be a multiple of it.
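
To see why a higher temperature reduces verbatim copying, recall that sampling temperature is commonly applied by dividing the logits by the temperature before the softmax. The sketch below illustrates that general technique on made-up logits. It is not taken from the `gpt_2_simple` source, and the function name `softmax_with_temperature` is hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Turn logits into probabilities after scaling them by 1/temperature."""
    scaled = [logit / temperature for logit in logits]
    max_scaled = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - max_scaled) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for temperature in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

The probability of the top-scoring token shrinks as the temperature grows, so less likely tokens are sampled more often and the output drifts further from memorized text.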

In [14]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,             
              batch_size=num_batches,
              temperature=temperature)              

<|startoftext|>In the Sky 4, the most visually stunning RPG ever made, and the most affecting game of the year.  <br>        And now, the most ambitious game of the year.

In [15]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,              
              prefix='<|startoftext|>Half-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve',
              truncate='<|endoftext|>')

<|startoftext|>Half-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve. Two years into the future, humanity has nearly conquered the planet and is seeking peace in the wake of the collapse of the human civilization. Humanity is divided in two warring factions and two interstellar conflicts are playing out. In the wake of these events, the Half-Life 3 is the first in a series of Half-Life games. The main focus of the game is a new hub in the Half-Life universe, Half-Life 3. The Half-Life 3 universe is loosely based on the Half-Life universe of Valve and Half-Life 2, and contains Half-Life, the Half-Life 2: Episode One, Half-Life 2 (glossary), Half-Life 3, Half-Life 3+ and Half-Life 2+. The Half-Life 3 game takes place in the near future after the events of Half-Life 2. The crew of the Enterprise is stranded on the moon after being assaulted by unknown assailants. The crew is forced to escape from the moon and, after several attempts, the crew is rescued by 

In [16]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,              
              prefix='<|startoftext|>Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time',
              truncate='<|endoftext|>')              

<|startoftext|>Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time, Spelunky, and features improvements and refinements to the engine, new levels and high-quality features.<br></li><li>Enhanced graphics, smooth scrolling and physics are all core features of the game.<br></li><li>New level design and game-modes to suit the needs of the widest range of players:</li></ul>

<|startoftext|>Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time, Spelunky. We've taken the classic design of the original and improved it with new features and ideas. Check out our new Spelunky 2 preview for the latest gameplay and features. If you liked Spelunky 1, then this is a game you will love. It's a game with a whole lot of polish to it and a lot of love for its characters. <br> <br><br><br><img src="https://steamcdn-a.akamaihd.net/steam/apps/867420/extras/Spelunky-2D.png?t=1539480623" ><h2 class="bb_tag">Key Features</h2><br><ul class="bb_ul"><li> <st
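
The generated samples contain the `<|startoftext|>` token and raw HTML markup (`<br>`, `<li>`, `<h2>`), because the training descriptions did. A possible post-processing step, sketched here with the standard library only, strips the tokens and tags to get plain text. The helper name `clean_sample` and the choice of rendering `<br>` and closing list or heading tags as line breaks are assumptions for illustration, not part of the original notebook.

```python
import html
import re

START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def clean_sample(sample):
    """Strip delimiter tokens and HTML tags from one generated description."""
    text = sample.replace(START_TOKEN, '').replace(END_TOKEN, '')
    # Render line-breaking tags as newlines before dropping the other tags.
    text = re.sub(r'<\s*(br|/li|/h2|/ul)\s*/?>', '\n', text, flags=re.IGNORECASE)
    text = re.sub(r'<[^>]*>', '', text)  # drop the remaining complete tags
    text = html.unescape(text)  # decode entities such as &amp;
    # Collapse runs of spaces and drop empty lines.
    text = re.sub(r'[ \t]+', ' ', text)
    lines = [line.strip() for line in text.split('\n')]
    return '\n'.join(line for line in lines if line)


# Made-up sample in the style of the generated descriptions.
sample = ('<|startoftext|>Spelunky 2 features new levels.<br></li>'
          '<li>Enhanced graphics &amp; physics.<br></li>')
print(clean_sample(sample))
```

To apply this to real samples, `gpt2.generate` can be called with `return_as_list=True`, which should return the samples as a list of strings instead of printing them. A tag that is cut off at the end of a sample, such as a trailing `<st`, is left untouched by this sketch.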