# Sample Steam Reviews with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-reviews-with-gpt-2

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple

Collecting gpt_2_simple
  Downloading https://files.pythonhosted.org/packages/10/fa/9ff4ce16abea04d2069d9065da862990b4036d85a5d061ea21ca5e441120/gpt_2_simple-0.4.2.tar.gz
Collecting toposort (from gpt_2_simple)
  Downloading https://files.pythonhosted.org/packages/e9/8a/321cd8ea5f4a22a06e3ba30ef31ec33bea11a3443eeb1d89807640ee6ed4/toposort-1.5-py2.py3-none-any.whl
Building wheels for collected packages: gpt-2-simple
  Building wheel for gpt-2-simple (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/4a/86/24/ff73926776a4da5522f7eabb4bc6d87c9932989f9df61b8c84
Successfully built gpt-2-simple
Installing collected packages: toposort, gpt-2-simple
Successfully installed gpt-2-simple-0.4.2 toposort-1.5


Import the package

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download

In [4]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 307kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 53.9Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 614kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:28, 49.1Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 2.57Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 42.9Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 38.2Mit/s]                                                       


## Uploading a Text File to Colaboratory for Training

#### Either get the data yourself

In [6]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/export_review_data.py

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  7198  100  7198    0     0  49986      0 --:--:-- --:--:-- --:--:-- 49986


In [7]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/requirements.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    37  100    37    0     0    262      0 --:--:-- --:--:-- --:--:--   262


In [8]:
!pip install -r requirements.txt

Collecting steamreviews==0.8.0 (from -r requirements.txt (line 1))
  Downloading https://files.pythonhosted.org/packages/c9/2c/556162233faa4c854f66d5f3e4a4495dc294c72e897711aa83c6fa742a86/steamreviews-0.8.0-py3-none-any.whl
Collecting langdetect==1.0.7 (from -r requirements.txt (line 2))
  Downloading https://files.pythonhosted.org/packages/59/59/4bc44158a767a6d66de18c4136c8aa90491d56cc951c10b74dd1e13213c9/langdetect-1.0.7.zip (998kB)
     |████████████████████████████████| 1.0MB 3.4MB/s
Building wheels for collected packages: langdetect
  Building wheel for langdetect (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/ec/0c/a9/1647275e7ef5014e7b83ff30105180e332867d65e7617ddafe
Successfully built langdetect
Installing collected packages: steamreviews, langdetect
Successfully installed langdetect-1.0.7 steamreviews-0.8.0


In [0]:
app_id = 583950

num_days = 28*3 # slightly less than 3 months

In [10]:
from export_review_data import apply_workflow_for_app_id

apply_workflow_for_app_id(app_id,
                          num_days=num_days)

[appID = 583950] expected #reviews = 8712
#reviews = 341
Filtering out reviews which were not written in english.
#reviews = 341
Filtering out reviews with strictly fewer than 150 characters.
#reviews = 129
Filtering out reviews which were not detected as written in en.
#reviews = 127
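
The log above shows reviews being dropped when they have fewer than 150 characters, then when they are not detected as English. A minimal sketch of the length filter, combined with the delimiter wrapping that the generated samples further down suggest (`<|startoftext|>` and `<|endoftext|>`). The function name `format_reviews` is hypothetical, this is not the code of `export_review_data.py`, and the language-detection step is left out:

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def format_reviews(reviews, min_length=150):
    """Keep reviews with at least `min_length` characters and wrap each in delimiters."""
    # Hypothetical helper: mirrors the "strictly fewer than 150 characters" filter in the log.
    kept = [r.strip() for r in reviews if len(r.strip()) >= min_length]
    return '\n'.join(START_TOKEN + r + END_TOKEN for r in kept)


reviews = ['Too short.', 'A' * 150, 'B' * 200]
dataset_text = format_reviews(reviews)
print(dataset_text.count(START_TOKEN))  # number of reviews kept
```

With delimiters in the training file, the fine-tuned model can learn where a review starts and ends, which is what makes the `prefix` and `truncate` arguments used later meaningful.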


#### Or get a data snapshot from me

Snapshots are provided only as examples (Artifact and Crusader Kings II below). The recommended way is to run the code above for the game of your choice instead.

In [0]:
# !mkdir -p data/

## Either Artifact (recent reviews):
# !curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/data/with_delimiters/583950.txt
# !mv 583950.txt data/

## Or Crusader Kings II (all the English reviews):
# !curl -O https://raw.githubusercontent.com/wiki/woctezuma/sample-steam-reviews-with-gpt-2/data/with_delimiters/203770.txt
# !mv 203770.txt data/

## Finetune GPT-2

In [0]:
file_name = 'data/' + str(app_id) + '.txt'

run_name = model_name + '_reviews_' + str(app_id)

In [13]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # how many steps between printing progress
              sample_every=200,   # how many steps to print a demo sample
              save_every=500   # how many steps between saving checkpoint              
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint models/345M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/345M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:00<00:00,  4.97it/s]


dataset has 18070 tokens
Training...
[10 | 24.08] loss=3.13 avg=3.13
[20 | 39.45] loss=2.91 avg=3.02
[30 | 54.94] loss=1.83 avg=2.62
[40 | 70.59] loss=1.98 avg=2.46
[50 | 86.41] loss=1.15 avg=2.19
[60 | 102.39] loss=1.08 avg=2.00
[70 | 118.53] loss=0.56 avg=1.79
[80 | 134.65] loss=0.65 avg=1.64
[90 | 150.91] loss=0.73 avg=1.54
[100 | 167.28] loss=0.13 avg=1.39
[110 | 183.72] loss=0.85 avg=1.34
[120 | 200.25] loss=0.10 avg=1.23
[130 | 216.82] loss=0.22 avg=1.15
[140 | 233.46] loss=0.12 avg=1.07
[150 | 250.16] loss=0.18 avg=1.00
[160 | 266.92] loss=0.07 avg=0.94
[170 | 283.73] loss=0.04 avg=0.88
[180 | 300.60] loss=0.21 avg=0.84
[190 | 317.45] loss=0.11 avg=0.80
[200 | 334.24] loss=0.04 avg=0.76
 that then-candidate Trump promised "major league" crowds of fans for the convention (the last time I checked, that was probably more of a joke than anything else).
Here we are, almost there. The hard work is behind us.
Thank you to everyone who supported us from the beginning -- we couldn't have
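
The progress lines above can be parsed to estimate training speed. A small sketch, assuming the second field in each bracket is the elapsed time in seconds; the helper names `parse_training_log` and `seconds_per_step` are hypothetical:

```python
import re

# Matches lines such as "[10 | 24.08] loss=3.13 avg=3.13"
LOG_PATTERN = re.compile(r'\[(\d+) \| ([\d.]+)\] loss=([\d.]+) avg=([\d.]+)')


def parse_training_log(log_text):
    """Return a list of (step, elapsed_seconds, loss, avg_loss) tuples."""
    return [(int(step), float(elapsed), float(loss), float(avg))
            for step, elapsed, loss, avg in LOG_PATTERN.findall(log_text)]


def seconds_per_step(records):
    """Average wall-clock time per step between the first and last record."""
    first_step, first_time = records[0][0], records[0][1]
    last_step, last_time = records[-1][0], records[-1][1]
    return (last_time - first_time) / (last_step - first_step)


log_text = """
[10 | 24.08] loss=3.13 avg=3.13
[100 | 167.28] loss=0.13 avg=1.39
[200 | 334.24] loss=0.04 avg=0.76
"""
records = parse_training_log(log_text)
print(round(seconds_per_step(records), 2))  # 1.63
```

At roughly 1.6 seconds per step, the 1000 steps requested above would take about 27 minutes. The loss falling to 0.04 within 200 steps on a dataset of 18070 tokens may indicate that the model is memorizing the reviews, so fewer steps could be worth trying.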

## Save a Trained Model Checkpoint

In [0]:
checkpoint_folder = 'checkpoint/' + run_name

In [0]:
# gpt2.mount_gdrive()

In [0]:
# gpt2.copy_checkpoint_to_gdrive(checkpoint_folder=checkpoint_folder)

## Load a Trained Model Checkpoint

In [0]:
# gpt2.copy_checkpoint_from_gdrive(checkpoint_folder=checkpoint_folder)

In [0]:
# sess = gpt2.start_tf_sess()

# gpt2.load_gpt2(sess,
#                run_name=run_name)

## Generate Text From The Trained Model

In [0]:
# Default is 0.7. A higher temperature may help avoid copying the training text, especially if the dataset is small.
temperature = 0.7

num_samples = 3
# Passed as batch_size below: samples are generated in parallel, which gives a large speedup.
# nsamples must be a multiple of batch_size.
num_batches = 3
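
To see why a higher temperature discourages verbatim copying, here is a minimal sketch of temperature scaling applied to a softmax. It illustrates the general technique, not the internals of `gpt_2_simple`; the function name and the logits are made up:

```python
import math


def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, temperature=t)
    print(t, [round(p, 3) for p in probs])
```

A low temperature concentrates probability on the top-scoring token, so sampling tends to follow the most likely (often memorized) continuation. A higher temperature flattens the distribution and gives less likely tokens more of a chance.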

In [21]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature)              

It's hard to believe that this game had such a successful beta period, but it did. It's been almost two months and it hasn't changed a bit. The game is dead and Valve killed it since start.
There is rumors out there that it will become free to play "in a few months", i will not talk about this rumor because is a rumor, i don't want rumors from a 4Chan user, i want an official statement from Valve.
The game is dead and Valve killed it.
(ok, ok, there is other 99 wrong things but who cares, the game is dead and Valve killed it.)<|endoftext|>
<|startoftext|>This game is dead and Valve killed it.
(ok, ok, there is other 99 wrong things but who cares, the game is dead and Valve killed it.)<|endoftext|>
<|startoftext|>Dead on Arrival game. Don't buy it. It's a dead game. Do yourself a favor and pick up Slay the Spire to fill the last vestiges of nostalgia.<|endoftext|>
<|startoftext|>This game is dead and Valve killed it.
(ok, ok, there is other 99 wrong things but who cares, the game is dea
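
Without `truncate`, a single sample contains several reviews separated by the delimiters, some of them repeated, and the last one is cut off. A minimal sketch for splitting such text into complete, unique reviews; the helper name `split_reviews` is hypothetical:

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def split_reviews(generated_text):
    """Split generated text on the end token; drop start tokens and the unfinished tail."""
    chunks = generated_text.split(END_TOKEN)
    complete = chunks[:-1]  # the last chunk has no closing token, so it is cut off
    reviews = [c.replace(START_TOKEN, '').strip() for c in complete]
    return [r for r in reviews if r]


sample = ('First review.<|endoftext|>\n'
          '<|startoftext|>Second review.<|endoftext|>\n'
          '<|startoftext|>Second review.<|endoftext|>\n'
          '<|startoftext|>Cut off mid-sen')
reviews = split_reviews(sample)
unique_reviews = list(dict.fromkeys(reviews))  # drop repeats, keep order
print(unique_reviews)  # ['First review.', 'Second review.']
```

To apply this to real samples, the text has to be captured rather than printed. As far as I know, `gpt2.generate` accepts `return_as_list=True` for that purpose, but check this against the installed version.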

In [22]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I love',
              truncate='<|endoftext|>')

<|startoftext|>I love Artifact. Yes, even now I'm still playing and enjoying it. The issues with the game are all (in my opinion). They are not systemic and they do not affect the gameplay. They are, however, pretty boring. I'm looking forward to whatever they do with it from here. As to the financial cost, I played Magic the Gathering for years, and I've played Hearthstone for years as well, sinking more than reasonable amounts of money into both of those games than I ever intended. Rng card packs, Three $70 expansions a year, or grind, grind, grind, dailies, grind, grind, grind, dailies etc etc. At the time of writing this, it cost less than $10 to get all of the Black cards in existence, and less or around $20 for any other colour. The market took a bit to settle, but friends and free modes were there till they did. I know several friends who made money on this system. I would hope that some kind of regional version of the game could get put out though....
In it for the long Haul!  

In [23]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I hate',
              truncate='<|endoftext|>')              

<|startoftext|>I hate to break it to you, but this is going to be a late game match for sure. I am hoping that Gwent will evolve and do something better with it, but I am not holding my breath. Dont buy it 

<|startoftext|>I hate to break it to you, but this is going to be a late game match for sure. I do love the fact that creeps can be purchased in two different prices: normal for 1cents each, or more if you want to buy a "rare" unit.
What is worse, instead of just letting the game die and the economy die, they actually changed the base unit price of the game!  How can they do this is beyond me.  I do love the fact that creeps can be purchased in two different prices: normal for 1cents each, or more if you want to buy a "rare" unit.What is worse, instead of just letting the game die and the economy die, they actually changed the base unit price of the game!  How can they do this is beyond me.
They should've stuck to the basic creeps and creep patches. They could've done something bet

In [24]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>Please',
              truncate='<|endoftext|>')

<|startoftext|>Please save the artifact valve. I will not modify my review until Valve make a big update.
Personally to say, it is not a bad game. Since we cannot get any information about what are the fxxking valve doing to save this game. People are less and less confidential about the game. So they just left, they could back if Valve change a lot.
Please save it Valve.

<|startoftext|>Please save the artifact valve. I will not modify my review until Valve make a big update.
Personally to say, it is not a bad game. Since we cannot get any information about what are the fxxking valve doing to save this game. People are less and less confidential about the game. So they just left, they could back if Valve change a lot.
Please save it Valve.

<|startoftext|>Please update as soon as possible!
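
Two of the three samples above are identical, which hints at text being copied from the training data. One rough way to quantify this is the share of word n-grams in a sample that also appear in the training file. A sketch, with hypothetical helper names and `n=5` chosen arbitrarily:

```python
def ngram_set(text, n=5):
    """Set of all runs of `n` consecutive whitespace-separated words in `text`."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def copied_fraction(sample, training_text, n=5):
    """Fraction of the sample's word n-grams that also occur in the training text."""
    sample_ngrams = ngram_set(sample, n)
    if not sample_ngrams:
        return 0.0  # sample shorter than n words
    shared = sample_ngrams & ngram_set(training_text, n)
    return len(shared) / len(sample_ngrams)


# Toy strings for illustration; in the notebook, training_text would be read from file_name.
training_text = 'the game is dead and Valve killed it since start'
sample = 'I think the game is dead and Valve killed it'
print(round(copied_fraction(sample, training_text), 2))  # 0.67
```

A value close to 1 suggests that the sample is mostly lifted from the training reviews. In that case, raising `temperature` or training for fewer steps may help.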



In [25]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>This game has near infinite replay value',
              truncate='<|endoftext|>')              

<|startoftext|>This game has near infinite replay value, and I've even started playing constructed right after I finish playing constructed. This is definitely a game for beginners, and I've gotten a lot out of it so far. But there are serious issues with the game that makes me very angry.
First, the market.  Valve sold the game on the premise of competitive support and tournaments and events and regular content patches, and then when people purchased their product, they wordlessly abandoned it and their audience along with it.
Second, the microtransactions.  Normally I wouldn't buy a game that cost money to play, but this is a bit steep for a card game.  But let me say that if you don't want to spend extra money in card boosters, you can buy any of the regular boosters now.
And last, the game is basically dead.  There is no communication to the player base about MAJOR overhauls Artifact desperately needs to be successful and fun.
But most importantly, the game is not pay2win.  Valve k