# Sample Steam Store Descriptions with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-descriptions

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple

Collecting gpt_2_simple
  Downloading https://files.pythonhosted.org/packages/b6/cf/4003c7d85425af353e15d938bc0d87a0bdedd6b00229e1f7808c2524b518/gpt_2_simple-0.2.tar.gz
Building wheels for collected packages: gpt-2-simple
  Building wheel for gpt-2-simple (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/51/d0/bd/293c80200f60bcd75a0f4028684e55e959da3a2727858d98a0
Successfully built gpt-2-simple
Installing collected packages: gpt-2-simple
Successfully installed gpt-2-simple-0.2


Download the pre-trained model

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

In [3]:
gpt2.download_gpt2()

Fetching checkpoint: 1.00kit [00:00, 330kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 49.3Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 323kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:09, 54.4Mit/s]                                  
Fetching model.ckpt.index: 6.00kit [00:00, 2.95Mit/s]                                               
Fetching model.ckpt.meta: 472kit [00:00, 35.9Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 37.1Mit/s]                                                       
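The download log above lists seven files. Before fine-tuning, it can be worth confirming that they all arrived and are non-empty. The sketch below assumes the files land in `models/117M`, the folder the training log later loads from; `missing_model_files` is a hypothetical helper, not part of gpt-2-simple.

```python
from pathlib import Path

# File names as they appear in the download log above.
EXPECTED_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]


def missing_model_files(model_dir="models/117M"):
    """Return the expected GPT-2 files that are absent or empty in model_dir."""
    folder = Path(model_dir)
    missing = []
    for name in EXPECTED_FILES:
        path = folder / name
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


# Expected to print an empty list after a successful download.
print(missing_model_files())
```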


## Mounting Google Drive

In [0]:
gpt2.mount_gdrive()

## Uploading a Text File to Colaboratory for Training

#### Either get the data by yourself

This is currently not possible, because you would need either:
-   the app details, which are slow to download,
-   or `aggregate.json`, which is stored with Git LFS, and Git LFS is not installed on Google Colab.

#### Or get a data snapshot from me

In [5]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-descriptions/master/data/concatenated_store_descriptions.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43.1M  100 43.1M    0     0  36.8M      0  0:00:01  0:00:01 --:--:-- 36.8M


## Finetune GPT-2

In [0]:
file_name='concatenated_store_descriptions.txt'

In [7]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name='descriptions',
              dataset=file_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # how many steps between printing progress
              sample_every=200,   # how many steps between printing a demo sample
              save_every=500   # how many steps between saving a checkpoint
              )

Loading checkpoint models/117M/model.ckpt
INFO:tensorflow:Restoring parameters from models/117M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [01:08<00:00, 68.25s/it]


dataset has 11714797 tokens
Training...
[10 | 29.33] loss=2.80 avg=2.80
[20 | 53.64] loss=2.79 avg=2.79
[30 | 78.10] loss=2.63 avg=2.74
[40 | 101.76] loss=2.56 avg=2.69
[50 | 125.52] loss=2.72 avg=2.70
[60 | 149.66] loss=2.70 avg=2.70
[70 | 173.65] loss=3.05 avg=2.75
[80 | 197.55] loss=2.86 avg=2.76
[90 | 221.43] loss=2.81 avg=2.77
Saving checkpoint/descriptions/model-100
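Each progress line has the form `[step | elapsed seconds] loss=X avg=Y`, where `avg` appears to be a smoothed running loss. Parsing these lines gives a rough time budget for the run. Note that the log above stops at step 90 and the checkpoint is saved at step 100, well short of `steps=1000`. The sketch below is a minimal illustration; it assumes that each training step consumes one 1024-token window (gpt-2-simple's default `batch_size=1` combined with GPT-2's 1024-token context), an assumption worth checking against your installed version.

```python
import re

# Progress lines copied from the fine-tuning output above.
LOG = """\
[10 | 29.33] loss=2.80 avg=2.80
[20 | 53.64] loss=2.79 avg=2.79
[30 | 78.10] loss=2.63 avg=2.74
[40 | 101.76] loss=2.56 avg=2.69
[50 | 125.52] loss=2.72 avg=2.70
[60 | 149.66] loss=2.70 avg=2.70
[70 | 173.65] loss=3.05 avg=2.75
[80 | 197.55] loss=2.86 avg=2.76
[90 | 221.43] loss=2.81 avg=2.77
"""

LINE_RE = re.compile(r"\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)")


def parse_log(text):
    """Return (step, elapsed_seconds, loss, avg) tuples, one per progress line."""
    return [
        (int(m[1]), float(m[2]), float(m[3]), float(m[4]))
        for m in LINE_RE.finditer(text)
    ]


records = parse_log(LOG)
(first_step, first_t, _, _), (last_step, last_t, _, _) = records[0], records[-1]
sec_per_step = (last_t - first_t) / (last_step - first_step)

STEPS = 1000                  # value passed to gpt2.finetune above
DATASET_TOKENS = 11_714_797   # "dataset has 11714797 tokens"
TOKENS_PER_STEP = 1024        # assumption: batch_size=1, one 1024-token window per step

print(f"{sec_per_step:.2f} s/step, ~{sec_per_step * STEPS / 60:.0f} min for {STEPS} steps")
print(f"{STEPS * TOKENS_PER_STEP / DATASET_TOKENS:.1%} of the dataset seen once")
```

With these numbers, the script reports about 2.40 s/step, roughly 40 minutes for 1000 steps, and about 8.7% of the dataset, so even a complete 1000-step run would cover well under one epoch of the store descriptions.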


In [0]:
# Pass the run name used for fine-tuning (the default run is 'run1').
gpt2.copy_checkpoint_to_gdrive(run_name='descriptions')
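As far as we understand gpt-2-simple, `copy_checkpoint_to_gdrive` packs the run folder under `checkpoint/` into a single `.tar` archive and copies that archive to the mounted Drive. The sketch below mirrors that idea with the standard library only. `archive_and_copy` is a hypothetical helper, and both the archive name (`checkpoint_descriptions.tar`) and the Drive path (`/content/drive/My Drive`) are assumptions about the library's behavior rather than its documented API.

```python
import os
import shutil
import tarfile


def archive_and_copy(run_name="descriptions", dest_dir="/content/drive/My Drive"):
    """Tar checkpoint/<run_name> and copy the archive into dest_dir.

    A sketch of what copy_checkpoint_to_gdrive appears to do, not the library code.
    """
    checkpoint_folder = os.path.join("checkpoint", run_name)
    # e.g. checkpoint/descriptions -> checkpoint_descriptions.tar
    tar_name = checkpoint_folder.replace(os.path.sep, "_") + ".tar"
    with tarfile.open(tar_name, "w") as tar:
        tar.add(checkpoint_folder)  # stored under its relative path, recursively
    # shutil.copyfile returns the destination path.
    return shutil.copyfile(tar_name, os.path.join(dest_dir, tar_name))
```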

## Load a Trained Model Checkpoint

In [0]:
# Pass the run name used for fine-tuning (the default run is 'run1').
gpt2.copy_checkpoint_from_gdrive(run_name='descriptions')
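Restoring works in the opposite direction: copy the archive back from Drive and unpack it so that `checkpoint/descriptions` exists again before loading. The sketch below assumes an archive named `checkpoint_<run_name>.tar` in the Drive root; `copy_and_extract` is a hypothetical helper, not part of gpt-2-simple.

```python
import os
import shutil
import tarfile


def copy_and_extract(run_name="descriptions", src_dir="/content/drive/My Drive"):
    """Copy checkpoint_<run_name>.tar from src_dir and unpack it in the current folder."""
    tar_name = "checkpoint_" + run_name + ".tar"
    shutil.copyfile(os.path.join(src_dir, tar_name), tar_name)
    # Only extract archives you created yourself: tar members may contain unsafe paths.
    with tarfile.open(tar_name, "r") as tar:
        tar.extractall()  # recreates checkpoint/<run_name>/...
    return os.path.join("checkpoint", run_name)
```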

In [0]:
sess = gpt2.start_tf_sess()

gpt2.load_gpt2(sess,
               run_name='descriptions')
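`load_gpt2` appears to restore the most recent checkpoint saved for the run. TensorFlow records that checkpoint in a small text file named `checkpoint` inside the run folder, so reading it tells you which step will be loaded (for the run above, presumably `model-100` rather than step 1000). `latest_checkpoint_name` is a hypothetical helper; within TensorFlow, `tf.train.latest_checkpoint` serves the same purpose.

```python
import os
import re


def latest_checkpoint_name(run_dir="checkpoint/descriptions"):
    """Return the model_checkpoint_path entry of TensorFlow's `checkpoint` file, or None."""
    with open(os.path.join(run_dir, "checkpoint")) as f:
        match = re.search(r'^model_checkpoint_path:\s*"([^"]+)"', f.read(), re.MULTILINE)
    return match.group(1) if match else None
```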

## Generate Text From The Trained Model

In [10]:
num_samples = 3
num_batches = 3  # passed as batch_size: gpt-2-simple can generate several samples in parallel, which is much faster than one at a time

gpt2.generate(sess,
              run_name='descriptions',
              nsamples=num_samples,
              batch_size=num_batches)

You can easily find the selection of your favorite songs by searching for them on our website, or by pressing and holding 'A' key on your keyboard.</li></ul>
<img src="https://steamcdn-a.akamaihd.net/steam/apps/525620/extras/download.png?t=1451802498" ><br><strong>Features:</strong><br><ul class="bb_ul"><li>Classic songs with very catchy melodies<br></li><li>Extras to play all songs in the game<br></li><li>Keyboard shortcuts to select songs from the list<br></li><li>1) Easy to use game mode<br></li><li>2) Highly customizable soundtrack with numerous options<br></li><li>3) Support for both music and music apps</li></ul>
<img src="https://steamcdn-a.akamaihd.net/steam/apps/525620/extras/Diede.png?t=1553820391" ><br>Diede is a new game in the genre of RPGs.<br><br>Diede is an action-based, supernatural role-playing game where you play as the protagonist of the game. You'll have to master the game mechanic of the game, and try to outsmart your enemies.<br>You'll need to defeat your enemies
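In the cell above, `num_batches` is passed as `batch_size`, i.e. the number of samples produced per forward pass, not a count of batches. As far as we know, gpt-2-simple expects `nsamples` to be a multiple of `batch_size`. The hypothetical helper below makes that arithmetic explicit.

```python
def generation_passes(nsamples, batch_size):
    """Number of forward passes needed to produce nsamples, batch_size at a time."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if nsamples % batch_size != 0:
        raise ValueError("nsamples must be a multiple of batch_size")
    return nsamples // batch_size


# nsamples=3, batch_size=3 as above: all three samples come from a single pass.
print(generation_passes(3, 3))
```

Doubling `nsamples` to 6 with the same `batch_size` would take two passes, while `nsamples=4` would be rejected.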

In [11]:
gpt2.generate(sess,
              run_name='descriptions',
              nsamples=num_samples,
              batch_size=num_batches,
              prefix='Half-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve')

H-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve. Over the past year, we've been working hard to bring you the best Half-Life. From original Half-Life 2, to the latest, we are ready to do whatever we can to bring you the best Half-Life experience ever.
<strong>PLAY as a single player</strong> <br>You will be able to play as a single player or as a co-op with other players on the team. <br><br>In order to play as a co-op, you will have to complete the 'Keys to a Free Ride' game mode, and the 'Necessary Combat' game mode. <br><br>In order to complete the 'Keys to a Free Ride' game mode, you will have to copy the keys from the game and place them on the map. <br><br>Once you have played the game, you will be able to select which of the game modes to play in a co-op. <br><br>Each co-op has its own rules. You will have to make sure that you are careful to not get lost. <br><br>You will need to pick up the game keys from the game that you want to play and pl
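The samples reproduce the HTML markup of Steam store pages, with tags such as `<br>`, `<strong>`, `<li>` and `<img>`. To read a sample as plain text, one option is a small tag stripper built on Python's standard `html.parser`. `html_to_text` is a hypothetical helper, not part of gpt-2-simple; it turns `<br>` and `<li>` into line breaks, drops every other tag, and unescapes entities such as `&amp;`.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes; turn <br> and <li> tags into line breaks."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities are unescaped
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("br", "li"):
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(sample):
    """Strip HTML tags from a generated sample and drop blank lines."""
    parser = _TextExtractor()
    parser.feed(sample)
    parser.close()  # flush any buffered text
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```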

In [12]:
gpt2.generate(sess,
              run_name='descriptions',
              nsamples=num_samples,
              batch_size=num_batches,
              prefix='Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time')

Selunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time. Witness the terrifying battles and engage in brutal battles against your enemies, while exploring a vast array of levels you can explore.
When you die, your spirit is gone, and you are a new prisoner in a new prison. You have been imprisoned, for a long time, and you are not going to be free forever.
A soldier in a private army has sent his loyal friend to an enemy base. The soldiers defend the base against attacks from both conventional and unconventional forces. But there are no soldiers left, and they are trapped in the base. The soldier's spirit is gone, and he is a prisoner of the base. But he has to wait until the next day to be released. So when he is released, he is free again. But it is only through the power of the spirits that he can escape, and find a new life in this terrible wasteland.
A boy at his high school reacts to the fact that his school is attacking his school. He is forced to make hi
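Generation stops after a fixed number of tokens, so samples tend to end mid-word, as the last one above does ("make hi"). A simple post-processing step is to cut each sample back to its last complete sentence. `trim_to_last_sentence` is a hypothetical helper, and its regex treats `.`, `!` or `?` followed by whitespace (or the end of the text) as a sentence boundary, which is only a rough heuristic.

```python
import re

# A sentence end: '.', '!' or '?' followed by whitespace or the end of the text.
SENTENCE_END = re.compile(r"[.!?](?=\s|$)")


def trim_to_last_sentence(sample):
    """Drop any trailing fragment after the last sentence end; keep text without one."""
    ends = [m.end() for m in SENTENCE_END.finditer(sample)]
    return sample[: ends[-1]] if ends else sample
```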