# Sample Steam Store Descriptions with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-descriptions

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [0]:
!pip install gpt_2_simple



Import the required packages

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between the `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download the pre-trained model

In [0]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 741kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 47.6Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 393kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:20, 68.9Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 2.77Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 47.0Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 39.2Mit/s]                                                       


## Uploading a Training Text File to Colaboratory

### Either get the data yourself

This is currently not practical on Google Colab, because you need one of the following:
-   the app details, which are slow to download,
-   or `aggregate.json`, which is stored with Git LFS, and Git LFS is not installed on Google Colab.
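
If the descriptions were available locally, the training file could be assembled along these lines. This is a minimal sketch, not the original repository's code: the function name `concatenate_descriptions` and the sample input are hypothetical, and the only detail taken from this notebook is the pair of delimiters `<|startoftext|>` and `<|endoftext|>` that appears in the generation cells further down.

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def concatenate_descriptions(descriptions):
    """Wrap each non-empty description in delimiters, one entry per line."""
    entries = []
    for text in descriptions:
        text = text.strip()
        if text:  # skip apps without a description
            entries.append(START_TOKEN + text + END_TOKEN)
    return '\n'.join(entries)


# Hypothetical input: in practice the strings would come from the app details.
sample = ['A roguelike platformer.', '', '  A physics puzzle game.  ']
corpus = concatenate_descriptions(sample)
print(corpus)
```

The resulting string could then be written to a text file and passed to the fine-tuning step as the dataset.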

### Or get a data snapshot from me

In [0]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-descriptions/master/data/with_delimiters/concatenated_store_descriptions.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.2M  100 44.2M    0     0   151M      0 --:--:-- --:--:-- --:--:--  150M
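
Before fine-tuning, it can be worth checking that the snapshot is delimited as expected. The sketch below assumes that each description sits between `<|startoftext|>` and `<|endoftext|>`; `split_descriptions` is a hypothetical helper, and the demo string stands in for the downloaded file.

```python
import re

# Assumed delimiters, matching the prefix and truncate arguments used later on.
ENTRY_PATTERN = re.compile(r'<\|startoftext\|>(.*?)<\|endoftext\|>', re.DOTALL)


def split_descriptions(corpus):
    """Return the list of texts found between the two delimiters."""
    return [entry.strip() for entry in ENTRY_PATTERN.findall(corpus)]


demo = ('<|startoftext|>First game.<|endoftext|>\n'
        '<|startoftext|>Second game,\nwith two lines.<|endoftext|>\n')
entries = split_descriptions(demo)
print(len(entries), 'descriptions; longest has', max(map(len, entries)), 'characters')
```

On Colab, the same function could be applied to the content of `concatenated_store_descriptions.txt`, read with `open(..., encoding='utf-8').read()`.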


## Finetune GPT-2

In [0]:
file_name = 'concatenated_store_descriptions.txt'

run_name = model_name + '_descriptions'

In [0]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh', # change to 'latest' to resume training
              print_every=10,       # number of steps between progress printouts
              sample_every=200,     # number of steps between demo samples
              save_every=500        # number of steps between checkpoint saves
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint models/345M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/345M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [01:09<00:00, 69.26s/it]


dataset has 12183978 tokens
Training...
[10 | 24.55] loss=3.09 avg=3.09
[20 | 41.46] loss=2.91 avg=3.00
[30 | 58.72] loss=2.73 avg=2.91
[40 | 75.46] loss=2.02 avg=2.68
[50 | 91.90] loss=2.78 avg=2.70
[60 | 108.36] loss=2.26 avg=2.63
[70 | 124.95] loss=2.71 avg=2.64
[80 | 141.65] loss=2.50 avg=2.62
[90 | 158.27] loss=2.57 avg=2.62
[100 | 174.87] loss=2.86 avg=2.64
[110 | 191.54] loss=2.61 avg=2.64
[120 | 208.18] loss=2.01 avg=2.58
[130 | 224.85] loss=3.32 avg=2.64
[140 | 241.53] loss=3.03 avg=2.67
[150 | 258.24] loss=2.02 avg=2.63
[160 | 274.95] loss=2.28 avg=2.60
[170 | 291.66] loss=1.52 avg=2.53
[180 | 308.34] loss=1.90 avg=2.50
[190 | 325.00] loss=2.44 avg=2.49
[200 | 341.63] loss=2.74 avg=2.51
70 | Lair:2 | Learned a level 3 spell: Portal Projectile | HP: 45/62 MP: 0/15 10881 | Lair:3 | Learned a level 5 spell: Lightning Bolt | HP: 50/73 MP: 0/15 10887 | Lair:4 | Reached skill level 14 in Conjurations | HP: 46/82 MP: 0/16 11145 | Lair:4 | Learned a level 1 spell: Apportation | HP: 5
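
The progress lines above follow the pattern `[step | time] loss=... avg=...`. As a rough sketch, they can be parsed to estimate the training speed. The helper `parse_progress` is hypothetical, and reading the second field as elapsed seconds is an assumption based on how the numbers grow.

```python
import re

LINE_PATTERN = re.compile(r'\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)')


def parse_progress(log_text):
    """Extract (step, elapsed, loss, avg) tuples from the progress lines."""
    rows = []
    for match in LINE_PATTERN.finditer(log_text):
        step, elapsed, loss, avg = match.groups()
        rows.append((int(step), float(elapsed), float(loss), float(avg)))
    return rows


# First and last progress lines copied from the output above.
log_text = '''[10 | 24.55] loss=3.09 avg=3.09
[200 | 341.63] loss=2.74 avg=2.51'''

rows = parse_progress(log_text)
(first_step, first_time, _, _), (last_step, last_time, _, _) = rows[0], rows[-1]
seconds_per_step = (last_time - first_time) / (last_step - first_step)
print(f'{seconds_per_step:.2f} seconds per step')
```

With these two lines, the estimate is about 1.67 seconds per step, so the 1000 requested steps would take roughly 28 minutes if the speed stays constant.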

## Save a Trained Model Checkpoint

In [0]:
# gpt2.mount_gdrive()

In [0]:
# gpt2.copy_checkpoint_to_gdrive(run_name=run_name)

## Load a Trained Model Checkpoint

In [0]:
# gpt2.mount_gdrive()

In [0]:
# gpt2.copy_checkpoint_from_gdrive(run_name=run_name)
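
After copying a checkpoint back, training can continue by passing `restore_from='latest'` instead of `'fresh'`, as noted in the fine-tuning cell. The sketch below picks the mode automatically. It assumes that gpt-2-simple keeps checkpoints under `checkpoint/<run_name>` (its default, to the best of my knowledge), and `choose_restore_mode` is a hypothetical helper rather than part of the library.

```python
import os


def choose_restore_mode(run_name, checkpoint_dir='checkpoint'):
    """Return 'latest' if a non-empty checkpoint folder exists, else 'fresh'."""
    run_dir = os.path.join(checkpoint_dir, run_name)  # assumed folder layout
    if os.path.isdir(run_dir) and os.listdir(run_dir):
        return 'latest'
    return 'fresh'


print(choose_restore_mode('345M_descriptions'))
```

The returned value could then be passed as `restore_from` to `gpt2.finetune`.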

## Generate Text From The Trained Model

In [0]:
temperature = 1.0 # Default: 0.7 ; a higher temperature helps to avoid copying the training text, especially if your dataset is small
top_k = 40        # Default: 0   ; Recommended: 40  ; ignored if top_p > 0.0
top_p = 0.9       # Default: 0.0 ; Recommended: 0.9 ; no need for top_k if top_p > 0.0
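
To make the three parameters concrete, here is a small pure-Python sketch of temperature scaling followed by top-k or top-p (nucleus) truncation. It is an illustration under stated assumptions, not gpt-2-simple's implementation: `filter_distribution` is a hypothetical function, and it follows the comments above in letting `top_p` take precedence over `top_k`.

```python
import math


def softmax(logits):
    """Convert logits to probabilities in a numerically stable way."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def filter_distribution(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Return renormalized probabilities after temperature scaling and truncation."""
    probs = softmax([x / temperature for x in logits])
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_p > 0.0:
        # Nucleus: keep the most likely tokens until their mass reaches top_p.
        keep, cumulative = [], 0.0
        for i in order:
            keep.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
    elif top_k > 0:
        keep = order[:top_k]  # keep only the k most likely tokens
    else:
        keep = order  # no truncation
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


# Toy vocabulary of four tokens with probabilities 0.5, 0.3, 0.15 and 0.05.
logits = [math.log(p) for p in (0.5, 0.3, 0.15, 0.05)]
print(filter_distribution(logits, temperature=1.0, top_k=40, top_p=0.9))
```

In this toy case, the first three tokens (cumulative probability 0.95) survive the `top_p=0.9` cut, and the least likely token is dropped before sampling.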

In [0]:
num_samples = 3
num_batches = 3 # Passed as batch_size below: the number of samples generated in parallel, which gives a large speedup. nsamples should be a multiple of it.
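
Despite its name, `num_batches` is passed as `batch_size`, so it is the number of samples generated in parallel rather than a number of batches. As far as I know, gpt-2-simple expects `nsamples` to be a multiple of `batch_size`. The hypothetical helper below sketches that constraint.

```python
def count_batches(nsamples, batch_size):
    """Return the number of batches, requiring nsamples to be a multiple of batch_size."""
    if nsamples % batch_size != 0:
        raise ValueError('nsamples should be a multiple of batch_size')
    return nsamples // batch_size


print(count_batches(nsamples=3, batch_size=3))  # one batch of three samples
```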

In [0]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,
              truncate='<|endoftext|>')

A U-turn, maybe? Probably no!<br />
<br />
Maybe! Maybe?!<br />
<br />
Whatever it is, it is uncharacteristic of the big campus for a departure time. What with all the events, unknown events, that are unlikely to get resolved.<br />
<br />
Uptown campus is being shaken and won't take its mind off the excitement. So it is virtually closed until the process is over.<br />
<br />
In the unlikely event that everything goes well, the Time travel factor will simply teleport the atmosphere in multiple directions. Be careful, whatever you choose, what you choose will be a unique mix of events!

RS2·ROUGHT</h2>RISK OR EASY TO ATTACK. Proven Battlegrounds - More than 100 important buildings on the battlefield in your hands! Grinding your skills will test your skills and bring you far from total failure! <br><br><img src="https://steamcdn-a.akamaihd.net/steam/apps/788100/extras/ad_numbers.png?t=15494925994" ><h2 class="bb_tag">P1 WITH A RUNDOWN</h2><h2 class="bb_tag">HARD ACCESSORIES</h2>Get a sm

In [0]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,
              prefix='<|startoftext|>Half-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve',
              truncate='<|endoftext|>')

<|startoftext|>Half-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve Studios’Rise 3.  It's the natural evolution of the most acclaimed Half-Life games, all based on original design concepts.  Featuring a number of new features, fans and players alike will be intrigued by the game’s intriguing narrative and massive variety of gameplay with eight routes to complete the many scenarios set.  Return to the Half-Life 3 foundation is an immersive experience and a lot of fun for all who participated in the Half-Life 2/3 campaigns.</h2><br>Mark your description with the unique  badge, unleash a powerful power-up and enjoy the breathtaking environment, driving, rocket-firing, and grappling gizmo Gungy flying in your game pocket.<h2 class="bb_tag">Skateport Multiplayer (Team Deathmatch)</h2>Team Deathmatch is the easiest way to experience multiplayer shooter multiplayer.  In the traditional Half-Life scenario, make enemies cry as you defeat them.<h2 class="bb_tag">
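
The sample above still starts with the `<|startoftext|>` token from the prefix. Assuming the generated texts are available as Python strings (gpt-2-simple's `generate` has a `return_as_list` option for this, if I recall correctly), a hypothetical helper such as `strip_delimiters` could remove the start token and cut at the end token:

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def strip_delimiters(text):
    """Drop a leading start token and everything from the first end token on."""
    if text.startswith(START_TOKEN):
        text = text[len(START_TOKEN):]
    end = text.find(END_TOKEN)
    if end != -1:
        text = text[:end]
    return text.strip()


# Made-up string that mimics a raw sample containing both delimiters.
raw = ('<|startoftext|>Half-Life 3 is the long-awaited sequel.'
       '<|endoftext|><|startoftext|>Another game')
print(strip_delimiters(raw))
```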

In [0]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,
              prefix='<|startoftext|>Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time',
              truncate='<|endoftext|>')

<|startoftext|>Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time. Steam trading cards are now included as content for your games. Beat a number of absurdly difficult levels to expand your collection and unlock new ones!<h2 class="bb_tag">Touched by Rogue One</h2>Dark Shadows will bring you new strategies, plus you will be able to participate in online tournaments where you can battle your friends! When you need to find all the coins in a treasure chest, use Dark Shadows to collect them from each of the targets! <br><ul class="bb_ul"><li>Each level is made up of 15 randomly generated tiles that are visually and functionally identical to tilesets from Spyro the Dragon Age. <br></li><li>Eight exciting skill sections, each one featuring over 50 skill combinations that will test your skills against opponents.<br></li><li>Fast-paced, intense combat.<br></li><li>Solve puzzles, collect treasure chests, and help Dark Shadows fulfill its most important purpose.<br>
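
The generated descriptions contain the same kind of HTML markup as the samples above (`<br>`, `<h2 class="bb_tag">`, `<li>`). For reading samples as plain text, a rough conversion can be sketched with the standard library's `html.parser`. The class `DescriptionTextExtractor`, the function `html_to_text` and their formatting choices are illustrative assumptions.

```python
from html.parser import HTMLParser


class DescriptionTextExtractor(HTMLParser):
    """Collect text, turning a few tags into line breaks and bullets."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ('br', 'h2'):
            self.parts.append('\n')
        elif tag == 'li':
            self.parts.append('\n- ')

    def handle_endtag(self, tag):
        if tag == 'h2':
            self.parts.append('\n')

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the text content of a description, one non-empty line per block."""
    parser = DescriptionTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    lines = [line.strip() for line in ''.join(parser.parts).splitlines()]
    return '\n'.join(line for line in lines if line)


# Made-up snippet in the style of the generated samples.
sample = ('Intro<h2 class="bb_tag">Features</h2><ul class="bb_ul">'
          '<li>Fast combat.<br></li><li>Puzzles.<br></li></ul>')
print(html_to_text(sample))
```

Truncated samples may end in the middle of a tag. The parser is lenient with such input, but the last line of the result may then be incomplete.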