# Sample Steam Reviews with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-reviews-with-gpt-2

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple



Import the required packages

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between the `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download

In [4]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 292kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 40.5Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 609kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:23, 59.2Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 3.06Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 42.2Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 38.6Mit/s]                                                       


## Uploading a Training Text File to Colaboratory

#### Either download the data yourself

In [5]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/export_review_data.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  7198  100  7198    0     0  43361      0 --:--:-- --:--:-- --:--:-- 43361


In [6]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/requirements.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100    37  100    37    0     0    253      0 --:--:-- --:--:-- --:--:--   253


In [7]:
!pip install -r requirements.txt



In [0]:
# app_id = 583950 # Artifact
app_id = 369990

# num_days = 28*3 # slightly less than 3 months
num_days = -1 # no time limit if negative

In [9]:
from export_review_data import apply_workflow_for_app_id

apply_workflow_for_app_id(app_id,
                          num_days=num_days)

[appID = 369990] expected #reviews = 856
#reviews = 856
Filtering out reviews which were not written in english.
#reviews = 856
Filtering out reviews with strictly fewer than 150 characters.
#reviews = 504
Filtering out reviews which were not detected as written in en.
#reviews = 501
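
The log above shows that reviews with strictly fewer than 150 characters are dropped. The following is a minimal sketch of such a length filter. It is not the code from `export_review_data.py`: the function name and the sample data are made up for illustration, and only the threshold of 150 comes from the log.

```python
# Hypothetical helper, NOT the implementation used by export_review_data.py.
MIN_NUM_CHARACTERS = 150  # threshold reported in the log above


def filter_short_reviews(reviews, min_num_characters=MIN_NUM_CHARACTERS):
    """Keep the reviews which have at least `min_num_characters` characters."""
    return [r for r in reviews if len(r) >= min_num_characters]


sample_reviews = [
    'Good game.',   # far too short: dropped
    'x' * 149,      # strictly fewer than 150 characters: dropped
    'y' * 150,      # exactly 150 characters: kept
]

kept = filter_short_reviews(sample_reviews)
print('#reviews = {}'.format(len(kept)))
```

A length threshold like this presumably removes one-line reactions that carry little text for the model to learn from.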


#### Or get a data snapshot from me

Snapshots are only provided for a couple of games (Artifact and Crusader Kings II), as examples. The recommended way is to run the code above for the game of your choice instead.

In [0]:
!mkdir -p data/

## Either Artifact (recent reviews):
# !curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/data/with_delimiters/583950.txt
# !mv 583950.txt data/

## Or Crusader Kings II (all the English reviews):
# !curl -O https://raw.githubusercontent.com/wiki/woctezuma/sample-steam-reviews-with-gpt-2/data/with_delimiters/203770.txt
# !mv 203770.txt data/
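
The snapshot URLs above live in a `with_delimiters` folder, and the `<|startoftext|>` and `<|endoftext|>` tokens show up later in this notebook, both in the training samples and in the `prefix` and `truncate` arguments. The sketch below shows one way to wrap reviews with these tokens. The exact layout of the snapshot files is not shown here, so the one-review-per-line layout and the helper name are assumptions.

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def wrap_reviews(reviews):
    """Join reviews into one training text, with one delimited review per line.

    Assumption: the one-review-per-line layout is a guess, for illustration.
    """
    lines = [START_TOKEN + review.strip() + END_TOKEN for review in reviews]
    return '\n'.join(lines) + '\n'


text = wrap_reviews(['Great board game.', 'Too much luck involved.'])
print(text)
```

With delimiters in the training text, the model can learn where a review starts and stops, which is what makes `prefix='<|startoftext|>...'` and `truncate='<|endoftext|>'` useful at generation time.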

## Finetune GPT-2

In [0]:
file_name = 'data/' + str(app_id) + '.txt'

run_name = model_name + '_reviews_' + str(app_id)

In [12]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # how many steps between progress printouts
              sample_every=200,   # how many steps between demo samples
              save_every=500   # how many steps between checkpoint saves
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint models/345M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/345M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  1.43it/s]


dataset has 92284 tokens
Training...
[10 | 22.46] loss=2.77 avg=2.77
[20 | 37.61] loss=2.97 avg=2.87
[30 | 52.90] loss=2.85 avg=2.86
[40 | 68.37] loss=2.29 avg=2.72
[50 | 83.96] loss=2.05 avg=2.58
[60 | 99.61] loss=2.24 avg=2.52
[70 | 115.40] loss=2.76 avg=2.56
[80 | 131.28] loss=2.00 avg=2.49
[90 | 147.19] loss=2.42 avg=2.48
[100 | 163.11] loss=2.55 avg=2.49
[110 | 179.09] loss=3.25 avg=2.56
[120 | 195.12] loss=2.45 avg=2.55
[130 | 211.21] loss=1.63 avg=2.47
[140 | 227.35] loss=1.86 avg=2.43
[150 | 243.55] loss=1.49 avg=2.36
[160 | 259.81] loss=2.78 avg=2.39
[170 | 276.12] loss=2.77 avg=2.41
[180 | 292.48] loss=2.32 avg=2.41
[190 | 308.87] loss=2.99 avg=2.44
[200 | 325.27] loss=1.83 avg=2.41
comb] <|endoftext|>
<|startoftext|>I was never one to get along, but when two friends decided to make a game together I was hooked and completely smitten.
The gameplay is complex, dynamic, and a pleasure to play. There's a ton of options that all influence the course of the game (which may affect 
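
Each progress line in the log above has the form `[step | elapsed] loss=... avg=...`. The sketch below parses such lines with a regular expression, for instance to estimate the training speed. It assumes that the second number is the elapsed time in seconds, which is what the values suggest. The helper name is made up.

```python
import re

# Matches lines such as: [10 | 22.46] loss=2.77 avg=2.77
LOG_PATTERN = re.compile(
    r'\[(?P<step>\d+) \| (?P<elapsed>[\d.]+)\] '
    r'loss=(?P<loss>[\d.]+) avg=(?P<avg>[\d.]+)'
)


def parse_training_log(log_text):
    """Return a list of (step, elapsed, loss, avg) tuples found in the log."""
    rows = []
    for match in LOG_PATTERN.finditer(log_text):
        rows.append((int(match.group('step')),
                     float(match.group('elapsed')),
                     float(match.group('loss')),
                     float(match.group('avg'))))
    return rows


# Three lines copied from the log above
log_text = '''
[10 | 22.46] loss=2.77 avg=2.77
[100 | 163.11] loss=2.55 avg=2.49
[200 | 325.27] loss=1.83 avg=2.41
'''

rows = parse_training_log(log_text)
first, last = rows[0], rows[-1]

# Assumption: the second column is the elapsed time in seconds.
seconds_per_step = (last[1] - first[1]) / (last[0] - first[0])
print('{:.2f} seconds per step'.format(seconds_per_step))
```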

## Save a Trained Model Checkpoint

In [0]:
# gpt2.mount_gdrive()

In [0]:
# !tar -cvf review-model-checkpoint.tar checkpoint/{run_name}/   # IPython expands {run_name}

In [0]:
# !scp review-model-checkpoint.tar '/content/drive/My Drive/'

## Load a Trained Model Checkpoint

In [0]:
# gpt2.mount_gdrive()

In [0]:
# !scp '/content/drive/My Drive/review-model-checkpoint.tar' .

In [0]:
# !tar -xvf review-model-checkpoint.tar
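
The commented-out `tar` commands above can also be written with Python's standard `tarfile` module. The sketch below packs a directory and unpacks it again. It runs on a throwaway directory which stands in for the checkpoint folder. The helper names and the demo paths are made up for illustration.

```python
import os
import tarfile
import tempfile


def archive_directory(directory, archive_name):
    """Pack `directory` into an uncompressed tar archive (like `tar -cvf`)."""
    with tarfile.open(archive_name, 'w') as tar:
        tar.add(directory, arcname=os.path.basename(directory))
    return archive_name


def extract_archive(archive_name, destination):
    """Unpack the archive into `destination` (like `tar -xvf`)."""
    os.makedirs(destination, exist_ok=True)
    with tarfile.open(archive_name, 'r') as tar:
        tar.extractall(destination)


# Demo on a temporary directory standing in for the checkpoint folder
workspace = tempfile.mkdtemp()
run_dir = os.path.join(workspace, 'demo_run')
os.makedirs(run_dir)
with open(os.path.join(run_dir, 'checkpoint'), 'w') as f:
    f.write('placeholder\n')

archive = archive_directory(run_dir, os.path.join(workspace, 'demo.tar'))
restored = os.path.join(workspace, 'restored')
extract_archive(archive, restored)

restored_files = os.listdir(os.path.join(restored, 'demo_run'))
print(restored_files)
```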

## Generate Text From The Trained Model

In [0]:
temperature = 0.7 # Default is 0.7. You may want to increase the temperature to avoid copying training text, especially if your dataset is small.

num_samples = 3
num_batches = 3 # Passed as batch_size below: GPT-2 can generate several samples in parallel, which gives a large speedup.
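
To give an intuition for the temperature, here is a minimal sketch of the usual formulation: the scores (logits) of the candidate tokens are divided by the temperature before the softmax. This is a toy illustration with made-up logits, not the internals of `gpt_2_simple`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens

for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature concentrates the probability on the top-scoring token, which favors safe, repetitive text. A high temperature flattens the distribution, which makes the samples more varied and also more error-prone.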

In [20]:
gen_texts_A = gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,              
              temperature=temperature,
              return_as_list=True)

print('\n\n--- SEPARATOR ---\n\n'.join(gen_texts_A))

Warden of the broken world
- 1/3 luck
- 1/3 skill
- 1/3 cards to be used
- 1/3 victory points
- 1/3 money
- 1/3 votes
- 1/3 captures
- 1/3 assassinations
- 1/3 captures and detentions
- 1/3 executions
- 1/3 gambling
- 1/3 court appearances
- 1/3 illegal gambling dens
- 1/3 mine cart accidents
- 1 temple
- 1 swamp
- 1 desert
- 1 mountains
- 1 secret
- 2 casinos
- 1 secret temple
- 1 court appearance
- 1 stolen car
- 1 stolen letter
- 1 court victory
- 1 court defeat
- 1 jail break
- 1 jail break and 2 jail breaks
- 1 jail time
- 3 votes
- 1 judge overturned ruling
- 1 jail time and 2x votes
- 1 jail time and 2x votes
- 1 stolen car and 1x vote
- 2x misfortunes
- 1x bankruptcy
- 10x illegal
- 1x bribe
- 1x illegal and 10x unethical
- 10x misfortunes and 1x betrayal
- 5x illegal and 5x unethical
- 10x illegal and 10x unethical
- 1x bribe and 10x misfortunes
- 2x misfortunes
- 1x betrayal
- 10X illegal and 10X unethical
- 100x illegal and 100x unethical
- 1x bribe and 10x misfortunes
- 2x 
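
In the call above, `nsamples` and `batch_size` are both equal to 3, so all the samples are generated in a single batch. As far as I know, `gpt_2_simple` expects `nsamples` to be a multiple of `batch_size`. The hypothetical helper below picks the largest batch size which satisfies that constraint under a given cap.

```python
def largest_valid_batch_size(num_samples, max_batch_size):
    """Return the largest batch size <= max_batch_size which divides num_samples.

    Hypothetical helper: it is not part of gpt_2_simple.
    """
    for batch_size in range(min(num_samples, max_batch_size), 0, -1):
        if num_samples % batch_size == 0:
            return batch_size
    return 1


print(largest_valid_batch_size(3, 3))
print(largest_valid_batch_size(10, 4))
```

The cap would typically be set by the GPU memory available, since a larger batch needs more memory.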

In [21]:
gen_texts_B = gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I love',
              truncate='<|endoftext|>',
              return_as_list=True)

print('\n\n--- SEPARATOR ---\n\n'.join(gen_texts_B))

<|startoftext|>I love board games and this one is legend!
Great art style, great sounds and loads of fun.
You can customize the game to your liking, different game modes, different board designs, and even have some characters that win based on dice rolls... so long as they are not the same as yours.
You can also recruit other players to your team and play team mode, which is very welcomed news as other than in-game sound effects you can't get anywhere else.
The game also has a nice community, and even though it is in Chinese, it is still very active.
So if you like board games and im sure you do, this one is a must play!
Buy it and come back for more!

--- SEPARATOR ---

<|startoftext|>I love this game when I just want to chill and listen to music. When I need a break from playing the computer game I just head to the music shop and there  I pay full price and wait 20 minutes for it to load...nevermind waiting for 15 minutes for a normal game...still recommend this game to my friends.
S
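
The samples above still start with the `<|startoftext|>` token, because it is part of the prefix. The following is a small post-processing sketch which removes the start token and cuts the text at the end token, in case it is still present. The helper name is made up.

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def clean_sample(text):
    """Drop the start token and everything from the end token onwards."""
    text = text.split(END_TOKEN, 1)[0]
    if text.startswith(START_TOKEN):
        text = text[len(START_TOKEN):]
    return text.strip()


raw = ('<|startoftext|>I love board games and this one is legend!'
       '<|endoftext|>\n<|startoftext|>Another review')
print(clean_sample(raw))
```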

In [22]:
gen_texts_C = gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I hate',
              truncate='<|endoftext|>',
              return_as_list=True)

print('\n\n--- SEPARATOR ---\n\n'.join(gen_texts_C))

<|startoftext|>I hate this game. Not because it's poorly designed, more that it's poorly implemented. The interface is strange, the cards are inconsistent, and most of all, the gameplay revolves around scheming, stealing, and essentially being a bad guy to the bitter end. While I understand that the motivation for playing the game is to lose, and while some of that motivation may be good, it's very rare that you can actually achieve that by playing the game and winning versus the AI. If you can't, you may as well be playing chess.

--- SEPARATOR ---

<|startoftext|>I hate board games. I really, really hate them. And I'm not even the best player on the board. But I'm writing this so 1. that at least some of my fellow players/audience members can learn about it, and 2. that at least some of my fellow players/audience members can be put in jail by it. And believe me, I've been put in jail multiple times my entire life. But I always end up winning anyway, because luck's like that. Anyway, 

In [23]:
gen_texts_D = gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>Please',
              truncate='<|endoftext|>',
              return_as_list=True)

print('\n\n--- SEPARATOR ---\n\n'.join(gen_texts_D))

<|startoftext|>Please do not redistribute without my express written permission.
This project was made possible because of the generous support from the following people:
Aaron Yudkowsky - Creator of 'Gremlins, Inc.'
Aaron Powell - Co-creator of 'Pillars of Eternity'
Andy Gavin - Co-creator of 'Gremlins, Inc.'
Brian Crecente - Co-creator of 'Gremlins, Inc.'
Brian Greene - Co-creator of 'Gremlins, Inc.'
Brian Lang - Co-creator of 'Gremlins, Inc.'
Chris Avellone - Creator of 'Super Smash Bros. Melee'
Chris Pizzello - Co-creator of 'Super Smash Bros. Melee'
Dan Greenawalt - Co-creator of 'Gremlins, Inc.'
Dave Gross - Co-creator of 'Gremlins, Inc.'
Eduard Klimov - Co-creator of 'Gremlins, Inc.'
Fatal1ty - Co-creator of 'StarCraft II'
Gremlin - Co-creator of 'StarCraft II'
Havoc - Co-creator of 'StarCraft II'
Iris - Co-creator of 'StarCraft II'
Jaedong - Co-creator of 'StarCraft II'
KuroKy - Co-creator of 'StarCraft II'
Luminosity - Co-creator of 'StarCraft II'
Mvp - Co-creator of 'StarCraf

In [24]:
gen_texts_E = gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>This game has near infinite replay value',
              truncate='<|endoftext|>',
              return_as_list=True)

print('\n\n--- SEPARATOR ---\n\n'.join(gen_texts_E))

<|startoftext|>This game has near infinite replay value if you like board games. I personally don't have much time to play multiplayer but if you do, I can recommend this game as a really fun and possibly challenging game. 10/10.

--- SEPARATOR ---

<|startoftext|>This game has near infinite replay value if you like this genre of game: teaming, cards,islands, money,islands, court, casinos, factories, and so on. The ability to team with a friend to try to win the game makes this a very good multiplayer strategy! The gameplay itself is rather deep and it's not always obvious which routes to take, but you have to pay attention when you pass certain spots. The cost/benefit analysis is rather complex, but bear with me. It's not impossible, but it's not that hard. There are some really solid mechanics, and the game always seems to be in good spirit. Though I would like to have more variety in my team mateships, so I can play with more women, so I can have a good time overall.
This game has a

## Copy the Generated Text to Google Drive

In [0]:
output_file_name = 'output_' + str(app_id) + '.md'

In [0]:
with open(output_file_name, 'w') as f:

  f.write('## Game\n\n')
  f.write('[<img alt="game name" src="https://steamcdn-a.akamaihd.net/steam/apps/{}/header.jpg" width="150">](https://store.steampowered.com/app/{})\n\n'.format(app_id, app_id))

  # One section per prompt: (heading, generated samples)
  sections = [
    ('Reviews generated unconditionally', gen_texts_A),
    ('Reviews starting with I love', gen_texts_B),
    ('Reviews starting with I hate', gen_texts_C),
    ('Reviews starting with Please', gen_texts_D),
    ('Reviews starting with This game has near infinite replay value', gen_texts_E),
  ]

  for (heading, gen_texts) in sections:
    f.write('## {}\n\n'.format(heading))
    for (i, gen_text) in enumerate(gen_texts):
      f.write('{}.\n\n'.format(i+1))
      f.write('> {}\n\n'.format(gen_text))
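
The cell above writes each sample after a single `> ` marker. In Markdown, a blank line inside a sample ends the blockquote, so the rest of a multi-line sample may be rendered outside the quote. The sketch below prefixes every line instead. It is an optional variant with a made-up helper name, not what the cell above does.

```python
def as_blockquote(text):
    """Prefix every line with '>' so that the whole sample stays in one quote."""
    return '\n'.join(('> ' + line) if line else '>'
                     for line in text.splitlines())


sample = 'Great art style.\n\nBuy it and come back for more!'
print(as_blockquote(sample))
```

To use it, one could write `as_blockquote(gen_text)` followed by a blank line instead of formatting the text into `'> {}\n\n'`.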
   

In [27]:
gpt2.mount_gdrive()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [28]:
import shutil

shutil.copyfile(output_file_name, '/content/drive/My Drive/' + output_file_name)

'/content/drive/My Drive/output_369990.md'
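
If Google Drive is not mounted, the copy above fails because the destination folder does not exist. The following sketch adds an explicit check before copying. It is demonstrated on temporary folders, and the helper name is made up.

```python
import os
import shutil
import tempfile


def copy_to_directory(source_path, destination_dir):
    """Copy a file into `destination_dir`, failing early if the folder is missing."""
    if not os.path.isdir(destination_dir):
        raise FileNotFoundError('Destination not found: ' + destination_dir)
    destination_path = os.path.join(destination_dir,
                                    os.path.basename(source_path))
    return shutil.copyfile(source_path, destination_path)


# Demo with temporary folders standing in for the notebook and for Drive
source_dir = tempfile.mkdtemp()
drive_dir = tempfile.mkdtemp()
source_path = os.path.join(source_dir, 'output_demo.md')
with open(source_path, 'w') as f:
    f.write('## Game\n')

copied_path = copy_to_directory(source_path, drive_dir)
print(os.path.basename(copied_path))
```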