# Sample Steam Store Descriptions with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-descriptions

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple

Collecting gpt_2_simple
  Downloading https://files.pythonhosted.org/packages/b6/cf/4003c7d85425af353e15d938bc0d87a0bdedd6b00229e1f7808c2524b518/gpt_2_simple-0.2.tar.gz
Building wheels for collected packages: gpt-2-simple
  Building wheel for gpt-2-simple (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/51/d0/bd/293c80200f60bcd75a0f4028684e55e959da3a2727858d98a0
Successfully built gpt-2-simple
Installing collected packages: gpt-2-simple
Successfully installed gpt-2-simple-0.2


Download the pre-trained model

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

In [3]:
gpt2.download_gpt2()

Fetching checkpoint: 1.00kit [00:00, 735kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 45.6Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 544kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 498Mit [00:07, 70.9Mit/s]                                  
Fetching model.ckpt.index: 6.00kit [00:00, 1.55Mit/s]                                               
Fetching model.ckpt.meta: 472kit [00:00, 47.6Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 40.3Mit/s]                                                       
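The log above lists the seven files that make up the 117M checkpoint. As a quick sanity check before fine-tuning, a minimal sketch like the following can confirm they are all present. It assumes they were saved under `models/117M`, the path printed later when the checkpoint is loaded; `missing_model_files` is a hypothetical helper, not part of gpt-2-simple.

```python
import os

# File names copied from the download log above. The directory is an assumption
# based on the "models/117M/model.ckpt" path printed during fine-tuning.
EXPECTED_FILES = [
    "checkpoint",
    "encoder.json",
    "hparams.json",
    "model.ckpt.data-00000-of-00001",
    "model.ckpt.index",
    "model.ckpt.meta",
    "vocab.bpe",
]
DEFAULT_MODEL_DIR = os.path.join("models", "117M")


def missing_model_files(model_dir=DEFAULT_MODEL_DIR):
    """Return the expected checkpoint files that are absent from model_dir."""
    return [
        name
        for name in EXPECTED_FILES
        if not os.path.isfile(os.path.join(model_dir, name))
    ]


missing = missing_model_files()
print("All model files present." if not missing else f"Missing: {missing}")
```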


## Mounting Google Drive

In [4]:
gpt2.mount_gdrive()

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=email%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdocs.test%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.photos.readonly%20https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fpeopleapi.readonly&response_type=code

Enter your authorization code:
··········
Mounted at /content/drive


## Uploading a Text File to Colaboratory for Training

#### Either get the data yourself

This is currently not possible, because you would need either:
-   the app details (slow to download),
-   or aggregate.json (stored with Git LFS, which is not installed on Google Colab).
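If you do have the raw descriptions, building a training file mostly means writing them one after another into a single plain-text file, which is the kind of file passed to `gpt2.finetune` in this notebook. A minimal sketch, assuming the descriptions are already loaded as a dict mapping app ids to description strings (a hypothetical layout, not necessarily how aggregate.json is structured) and separating descriptions with a blank line (also an assumption; this notebook does not document the separator used in the data snapshot). `concatenate_descriptions` is a hypothetical helper.

```python
def concatenate_descriptions(descriptions, output_path, separator="\n\n"):
    """Write all non-empty descriptions to one UTF-8 text file, ordered by app id.

    `descriptions` is assumed to be a dict {app_id (str): description (str)}.
    App ids are sorted as strings. Returns the number of descriptions written.
    """
    texts = [text.strip() for _, text in sorted(descriptions.items())]
    texts = [text for text in texts if text]  # skip empty descriptions
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(separator.join(texts))
        f.write("\n")
    return len(texts)
```

For example, `concatenate_descriptions(my_descriptions, 'concatenated_store_descriptions.txt')` would produce a file usable as `file_name` below, where `my_descriptions` stands for whatever dict you built.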

#### Or get a data snapshot from me

In [5]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-descriptions/master/data/concatenated_store_descriptions.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 43.1M  100 43.1M    0     0  51.5M      0 --:--:-- --:--:-- --:--:-- 51.4M
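Before fine-tuning, it can be worth checking that the download is complete and decodes as text. A minimal standard-library sketch; `describe_text_file` is a hypothetical helper, and the figures it reports are a file size and line and character counts, not GPT-2 tokens (the token count is only reported once gpt-2-simple encodes the dataset).

```python
import os


def describe_text_file(path):
    """Return (size in MiB, line count, character count) for a UTF-8 text file.

    Reading the whole file also fails early with UnicodeDecodeError if the
    download is not valid UTF-8.
    """
    size_mib = os.path.getsize(path) / 1024 ** 2
    n_lines = n_chars = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            n_lines += 1
            n_chars += len(line)
    return size_mib, n_lines, n_chars
```

For example, `describe_text_file('concatenated_store_descriptions.txt')` should report a size close to the 43.1M shown by curl.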


## Finetune GPT-2

In [0]:
file_name = 'concatenated_store_descriptions.txt'

run_name = 'descriptions'

In [7]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # how many steps between printing progress
              sample_every=200,   # how many steps to print a demo sample
              save_every=500   # how many steps between saving checkpoint              
              )

Loading checkpoint models/117M/model.ckpt
INFO:tensorflow:Restoring parameters from models/117M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [01:06<00:00, 66.02s/it]


dataset has 11714797 tokens
Training...
[10 | 29.31] loss=2.94 avg=2.94
[20 | 54.04] loss=2.89 avg=2.92
[30 | 79.60] loss=2.66 avg=2.83
[40 | 104.15] loss=2.59 avg=2.77
[50 | 128.73] loss=2.62 avg=2.74
[60 | 153.66] loss=2.73 avg=2.74
[70 | 178.41] loss=2.80 avg=2.75
[80 | 203.07] loss=2.67 avg=2.74
[90 | 227.79] loss=2.94 avg=2.76
[100 | 252.73] loss=2.40 avg=2.72
[110 | 277.61] loss=2.72 avg=2.72
[120 | 302.44] loss=2.47 avg=2.70
[130 | 327.10] loss=2.50 avg=2.68
[140 | 351.78] loss=2.60 avg=2.68
[150 | 376.48] loss=2.60 avg=2.67
[160 | 401.14] loss=2.75 avg=2.68
[170 | 425.83] loss=2.56 avg=2.67
[180 | 450.52] loss=2.88 avg=2.68
[190 | 475.21] loss=2.78 avg=2.69
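Each progress line has the form `[step | elapsed seconds] loss=X avg=Y`. The sketch below parses such lines to estimate the wall-clock cost per step, lists when periodic events would fire under the simplified assumption that an event fires whenever the step counter is a multiple of its interval (`sample_every=200`, `save_every=500`), and estimates what fraction of the 11,714,797-token dataset 1000 steps would read, assuming each step consumes one 1024-token context window (batch size 1, no gradient accumulation; this is an assumption about gpt-2-simple's defaults). All helper names are hypothetical.

```python
import re

# Matches progress lines such as "[10 | 29.31] loss=2.94 avg=2.94".
LOG_LINE = re.compile(
    r"\[(?P<step>\d+) \| (?P<elapsed>[\d.]+)\] loss=(?P<loss>[\d.]+) avg=(?P<avg>[\d.]+)"
)


def parse_log(text):
    """Return (step, elapsed_seconds, loss, avg) tuples for every progress line in text."""
    return [
        (int(m["step"]), float(m["elapsed"]), float(m["loss"]), float(m["avg"]))
        for m in LOG_LINE.finditer(text)
    ]


def seconds_per_step(records):
    """Mean wall-clock seconds per step between the first and last parsed records."""
    first_step, first_time = records[0][:2]
    last_step, last_time = records[-1][:2]
    return (last_time - first_time) / (last_step - first_step)


def event_steps(steps, every):
    """Steps at which an event with this interval fires (simplified assumption)."""
    return list(range(every, steps + 1, every))


def dataset_fraction(steps, tokens_per_step, dataset_tokens):
    """Rough fraction of the dataset's tokens read, ignoring overlap between windows."""
    return steps * tokens_per_step / dataset_tokens


log = """[10 | 29.31] loss=2.94 avg=2.94
[190 | 475.21] loss=2.78 avg=2.69"""
rate = seconds_per_step(parse_log(log))
print(f"{rate:.2f} s/step, about {1000 * rate / 60:.0f} min for 1000 steps")
print("samples at", event_steps(1000, 200), "checkpoints at", event_steps(1000, 500))
print(f"{dataset_fraction(1000, 1024, 11_714_797):.1%} of the dataset")
```

Under these assumptions, the run above takes roughly 2.5 seconds per step and 1000 steps cover well under one pass over the data.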

In [0]:
# gpt2.copy_checkpoint_to_gdrive()

## Load a Trained Model Checkpoint

In [0]:
# gpt2.copy_checkpoint_from_gdrive()
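Copying a checkpoint to and from Google Drive boils down to packing the run folder into one archive, storing it on the mounted drive, and unpacking it later. A minimal standard-library sketch of that round trip, assuming the run lives in a folder such as `checkpoint/descriptions` and that a `.tar.gz` archive is used; this illustrates the idea and is not gpt-2-simple's own implementation. `archive_run` and `restore_run` are hypothetical helpers.

```python
import os
import shutil


def archive_run(run_dir, destination_dir):
    """Pack the contents of run_dir into <destination_dir>/<folder name>.tar.gz."""
    run_name = os.path.basename(os.path.normpath(run_dir))
    base_name = os.path.join(destination_dir, run_name)
    # make_archive appends the ".tar.gz" extension and returns the full path.
    return shutil.make_archive(base_name, "gztar", root_dir=run_dir)


def restore_run(archive_path, target_dir):
    """Unpack an archive created by archive_run into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    shutil.unpack_archive(archive_path, target_dir)
```

On Colab, `destination_dir` would be a folder under the `/content/drive` mount point.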

In [0]:
# sess = gpt2.start_tf_sess()

# gpt2.load_gpt2(sess,
#                run_name=run_name)

## Generate Text From The Trained Model

In [11]:
num_samples = 3
num_batches = 3  # passed as batch_size: gpt-2-simple can generate several samples in parallel, which gives a large speedup

gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches)

<strong>The term “The Lost World””” means something very different than what it means in the games. In “The Lost World”, the player is a renegade explorer, and the story is told through a series of flashbacks. In “The Lost World”, the player is always searching for answers, and the story is told through a series of flashbacks. <br><br>The game is an exploration game, and the player has to find his way through a series of flashbacks to uncover the truth behind the quest. <br><br>Several different endings are provided, and the player will have to face many obstacles and obstacles to overcome. <br><br>Each level is different, and new levels are added every time the game is played. <br><br>The game features a rich world, and new levels will be added as the game progresses. <br><br>The game has a grid system, and you can move freely between them. <br><br>The difficulty of the game is simple, and the challenge is complex.
<img src="https://steamcdn-a.akamaihd.net/steam/apps/912300/extras/1.p
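`nsamples` is the total number of texts to generate, while `batch_size` (the `num_batches` variable here) is how many are generated together in one pass. Assuming that gpt-2-simple requires `nsamples` to be a multiple of `batch_size`, the number of generation passes can be checked up front with a small hypothetical helper.

```python
def generation_passes(nsamples, batch_size):
    """Number of batched generation passes needed for nsamples texts.

    Assumption: the generator rejects an nsamples that is not a multiple of
    batch_size and produces batch_size texts per pass.
    """
    if nsamples < 1 or batch_size < 1:
        raise ValueError("nsamples and batch_size must be positive")
    if nsamples % batch_size != 0:
        raise ValueError("nsamples must be a multiple of batch_size")
    return nsamples // batch_size


print(generation_passes(nsamples=3, batch_size=3))
```

With the values used here (3 and 3), everything is generated in a single pass; `nsamples=10` with `batch_size=4` would be rejected under this assumption.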

In [12]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              prefix='Half-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve')

H-Life 3 is the long-awaited sequel in the Half-Life franchise developped by Valve Studios.<br><br>This is a first-person experience, fully playable with a first-person perspective. You will experience a new game environment every time you play.
Not a game, a game. It's about the series of games, with the goal of making sure the players are satisfied. <br />
<br />
The game is set in a city, surrounded by a mysterious war and the war has already begun. You are the pilot of a spaceship that is sent to the war area of a foreign planet. You are sent, along with dozens of other people, to investigate a mysterious crisis. However, you are not the only one who is at the war zone. And you are not alone.<br />
<br />
You will be involved in the investigation to learn more about the war and its aftermath. You will be able to go to different places and places, where you will have to interact with people, give them advice and interact with different personalities.<br />
<br />
The game features a
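The generated descriptions reproduce the HTML markup of Steam store pages (`<br>`, `<br />`, `<strong>`, `<img>` tags), and a sample may stop in the middle of a tag. A minimal sketch for turning such output into plain text with Python's built-in `html.parser`: line breaks become newlines, other tags are dropped, and a trailing tag that was cut off is discarded. `html_to_text` is a hypothetical helper; empty lines are removed, so paragraph spacing is not preserved.

```python
from html.parser import HTMLParser


class _DescriptionText(HTMLParser):
    """Collect text content, turning <br> tags into newlines and ignoring other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <br /> also ends up here via the default handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Strip Steam-style HTML markup from a generated description."""
    # Drop a final tag that was cut off before its closing '>'.
    last_open = html.rfind("<")
    if last_open > html.rfind(">"):
        html = html[:last_open]
    parser = _DescriptionText()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    # Drop the empty lines left by consecutive <br> tags.
    return "\n".join(line for line in lines if line)
```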

In [13]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              prefix='Spelunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time')

Selunky 2 is the sequel of the most acclaimed rogue-like platformer of all-time, designed by David Beckworth and produced and co-written by David Beckworth.
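The samples above stop mid-sentence, and sometimes mid-URL. A minimal sketch that cuts a text right after its last complete sentence, treating `.`, `!` or `?` (optionally followed by closing quotes or brackets) as a sentence end only when it is followed by whitespace, a `<`, or the end of the text, so that dots inside URLs such as `steamcdn-a.akamaihd.net` are ignored. `trim_to_last_sentence` is a hypothetical helper, and this heuristic will misfire on abbreviations such as "e.g.".

```python
import re

# '.', '!' or '?', optionally followed by closing quotes/brackets, and then
# whitespace, the start of an HTML tag, or the end of the text.
_SENTENCE_END = re.compile(r"[.!?][\"')\]]*(?=\s|<|$)")


def trim_to_last_sentence(text):
    """Cut text right after its last complete sentence; return it unchanged if none is found."""
    matches = list(_SENTENCE_END.finditer(text))
    if not matches:
        return text
    return text[: matches[-1].end()]
```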
<img src="https://steamcdn-a.akamaihd.net/steam/apps/855660/extras/steam_banner.png?t=1539102902" ><br><br><strong>The old school arcade shooter!</strong><br><br><img src="https://steamcdn-a.akamaihd.net/steam/apps/855660/extras/steam_banner_date_updates.png?t=1539102902" ><br><br>Wonderful retro styled visuals, and some very special effects, will keep you on your toes, and you'll have to be a bit lazy to explore the levels as they are being destroyed by deadly monsters. <br><br><img src="https://steamcdn-a.akamaihd.net/steam/apps/855660/extras/steam_banner_gameplay.png?t=1539102902" ><br><br>The original roguelike game, with a unique take on the classic arcade genre. The game is in its infancy, but its stable gameplay that makes it a perfect companion to all games of this genre.<br><br><img src="https://steamcdn-a.akamaihd.net/ste