# Sample Steam Reviews with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-reviews


**Caveat:** a more recent version is being developed at https://github.com/woctezuma/sample-steam-reviews-with-gpt-2

As of May 20th, 2019, the main difference from the version shown here is that the more recent version uses delimiters (`<|startoftext|>` and `<|endoftext|>`).
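To make the idea of delimiters concrete, here is a minimal sketch that wraps each review between the two tokens. The exact layout used in the more recent repository is not shown in this notebook, so putting one review per delimited block is an assumption, and `wrap_reviews` is a hypothetical helper name.

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def wrap_reviews(reviews):
    """Surround each non-empty review with the start and end delimiters.

    Hypothetical helper: one review per delimited block is an assumption.
    """
    wrapped = []
    for review in reviews:
        text = review.strip()
        if text:  # skip reviews that are empty after stripping
            wrapped.append(START_TOKEN + text + END_TOKEN)
    return wrapped


sample = wrap_reviews(['Great card game.', '   ', 'Too expensive.\n'])
print('\n'.join(sample))
```

With delimiters in the training file, the model can learn where a review starts and stops, instead of seeing one continuous stream of text.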

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple

Collecting gpt_2_simple
  Downloading https://files.pythonhosted.org/packages/bc/7d/1ea4c2a54ecdda5e57e45686e5cdf1ccc45809841ab50c89bc63638c5553/gpt_2_simple-0.5.tar.gz
Collecting toposort (from gpt_2_simple)
  Downloading https://files.pythonhosted.org/packages/e9/8a/321cd8ea5f4a22a06e3ba30ef31ec33bea11a3443eeb1d89807640ee6ed4/toposort-1.5-py2.py3-none-any.whl
Building wheels for collected packages: gpt-2-simple
  Building wheel for gpt-2-simple (setup.py) ... done
  Stored in directory: /root/.cache/pip/wheels/0a/0d/50/166d4caecc4bb1820ce1b7d8e68ce12f9839c919a5c530cc60
Successfully built gpt-2-simple
Installing collected packages: toposort, gpt-2-simple
Successfully installed gpt-2-simple-0.5 toposort-1.5


Import the required packages

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between the `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download

In [4]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 628kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 54.7Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 582kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:22, 63.4Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 2.04Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 43.3Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 38.2Mit/s]                                                       


## Uploading a Text File to Colaboratory for Training

### Get a data snapshot from me

In [5]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews/master/output/583950.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3579k  100 3579k    0     0  8969k      0 --:--:-- --:--:-- --:--:-- 8969k


### AppID

Store page: https://store.steampowered.com/app/583950/Artifact/

In [0]:
app_id = 583950 # Artifact: 583950

### Pre-processing

#### Input

In [0]:
artifact_file_name = str(app_id) + '.txt'

#### Strip lines

In [8]:
with open(artifact_file_name, 'r', encoding='utf8') as f:
  lines = [line.strip() for line in f.readlines()]
  
print('#lines = {}'.format(len(lines)))

#lines = 25728


#### Remove empty lines

In [9]:
texts = [line for line in lines if len(line)>0]

print('#lines = {}'.format(len(texts)))

#lines = 17575


#### Output

In [0]:
artifact_trimmed_file_name = 'artifact.txt'

#### Save output

In [0]:
line_separator = '\n'

with open(artifact_trimmed_file_name, 'w', encoding='utf8') as f:
  print(line_separator.join(texts), file=f)
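The pre-processing cells above (strip, remove empty lines, save) can be folded into one function. This is only a sketch: `trim_file` is a hypothetical name, and the demonstration runs on a small temporary file rather than on `583950.txt`.

```python
import os
import tempfile


def trim_file(input_path, output_path, line_separator='\n'):
    """Strip every line, drop the empty ones, and write the rest out.

    Returns (number of lines read, number of lines kept).
    """
    with open(input_path, 'r', encoding='utf8') as f:
        stripped = [line.strip() for line in f]
    kept = [line for line in stripped if line]
    with open(output_path, 'w', encoding='utf8') as f:
        # Trailing newline mirrors what print(..., file=f) writes.
        f.write(line_separator.join(kept) + '\n')
    return len(stripped), len(kept)


# Tiny demonstration on a temporary file instead of the real data snapshot.
with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, 'raw.txt')
    trimmed_path = os.path.join(tmp_dir, 'trimmed.txt')
    with open(raw_path, 'w', encoding='utf8') as f:
        f.write('  good game \n\n\n bad economy\n')
    num_read, num_kept = trim_file(raw_path, trimmed_path)
    with open(trimmed_path, 'r', encoding='utf8') as f:
        trimmed_content = f.read()

print('#lines = {} -> {}'.format(num_read, num_kept))
```

In the notebook, the equivalent call would be `trim_file(artifact_file_name, artifact_trimmed_file_name)`.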

## Fine-tune GPT-2

Reference: https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce

In [0]:
file_name = artifact_trimmed_file_name

run_name = model_name + '_reviews_' + str(app_id)

In [13]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,              
              steps=1000,
              restore_from='fresh', # change to 'latest' to resume training
              print_every=10,       # how many steps between printing progress
              sample_every=200,     # how many steps to print a demo sample
              save_every=500        # how many steps between saving checkpoint              
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint models/345M/model.ckpt
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from models/345M/model.ckpt


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:05<00:00,  5.97s/it]


dataset has 834870 tokens
Training...
[10 | 24.58] loss=3.08 avg=3.08
[20 | 39.51] loss=3.25 avg=3.17
[30 | 54.62] loss=3.18 avg=3.17
[40 | 69.86] loss=3.66 avg=3.29
[50 | 85.19] loss=3.34 avg=3.30
[60 | 100.63] loss=3.31 avg=3.30
[70 | 116.17] loss=3.08 avg=3.27
[80 | 131.85] loss=2.92 avg=3.23
[90 | 147.61] loss=3.28 avg=3.23
[100 | 163.42] loss=3.22 avg=3.23
[110 | 179.30] loss=3.20 avg=3.23
[120 | 195.29] loss=3.37 avg=3.24
[130 | 211.39] loss=3.18 avg=3.24
[140 | 227.52] loss=3.32 avg=3.24
interrupted
Saving checkpoint/345M_reviews_583950/model-147
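The run above was interrupted and a checkpoint was saved at step 147. Following the comment in the fine-tuning cell, training can be resumed by switching `restore_from` to `'latest'`. The sketch below only builds the keyword arguments, so that it runs without TensorFlow; `build_finetune_kwargs` is a hypothetical helper, and whether `steps` counts from zero or from the restored step is not something the log above settles.

```python
def build_finetune_kwargs(run_name, dataset, model_name, resume=False):
    """Collect the keyword arguments used in the fine-tuning cell.

    Hypothetical helper: only restore_from changes between a fresh run
    and a resumed one; every other value is copied from the cell above.
    """
    return dict(
        run_name=run_name,
        dataset=dataset,
        model_name=model_name,
        steps=1000,
        restore_from='latest' if resume else 'fresh',
        print_every=10,    # how many steps between printing progress
        sample_every=200,  # how many steps to print a demo sample
        save_every=500,    # how many steps between saving checkpoint
    )


kwargs = build_finetune_kwargs('345M_reviews_583950', 'artifact.txt', '345M',
                               resume=True)
print(kwargs['restore_from'])

# In the notebook, the resumed run would then be started with:
# sess = gpt2.start_tf_sess()
# gpt2.finetune(sess, **kwargs)
```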


## Generate Text From The Trained Model

In [0]:
temperature=1.0 # Default is 0.7, but you may want to increase the temperature, especially if your dataset is small, to avoid copying text.
top_k = 40      # Default: 0   ; Recommended: 40  ; useless parameter if top_p > 0.0
top_p = 0.9     # Default: 0.0 ; Recommended: 0.9 ; no need for top_k if top_p > 0.0
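To give an intuition for these three parameters, here is a generic sketch of temperature scaling followed by top-k or top-p (nucleus) filtering on a toy list of logits. It is not gpt-2-simple's internal code, which is not shown in this notebook: `next_token_distribution` is a hypothetical function, and details such as tie-breaking or the exact cut-off at the `top_p` boundary are assumptions. As in the comments above, `top_k` is ignored when `top_p > 0.0`.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Return {token_id: probability} after temperature and filtering."""
    # Temperature rescales the logits before the softmax:
    # values below 1.0 sharpen the distribution, values above flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracted for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    if top_p > 0.0:
        # Nucleus: keep the smallest prefix whose mass reaches top_p.
        kept, mass = [], 0.0
        for i in ranked:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
    elif top_k > 0:
        # Keep only the top_k most likely tokens.
        kept = ranked[:top_k]
    else:
        kept = ranked

    # Renormalize so the kept probabilities sum to 1.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]
nucleus = next_token_distribution(toy_logits, top_k=40, top_p=0.9)
top_two = next_token_distribution(toy_logits, top_k=2)
print(sorted(nucleus), sorted(top_two))
```

On this toy example, the three most likely tokens are needed to reach 90% of the probability mass, whereas `top_k=2` keeps two tokens regardless of how much mass they cover.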

In [0]:
num_samples = 3
num_batches = 3 # Passed as batch_size below: the number of samples generated in parallel, which gives a massive speedup.
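Note that `num_batches` is passed as `batch_size` in the cells below, so it is the size of a batch rather than a count of batches. The sketch below assumes, as I understand gpt-2-simple's behavior, that `nsamples` has to be a multiple of `batch_size`; `count_generation_passes` is a hypothetical helper to check this before calling `gpt2.generate`.

```python
def count_generation_passes(nsamples, batch_size):
    """Return how many parallel batches are needed to produce nsamples texts.

    Assumption: nsamples must be a positive multiple of batch_size.
    """
    if nsamples <= 0 or batch_size <= 0 or nsamples % batch_size != 0:
        raise ValueError('nsamples must be a positive multiple of batch_size')
    return nsamples // batch_size


print(count_generation_passes(3, 3))
```

With `num_samples = 3` and `num_batches = 3`, all three samples come out of a single pass.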

In [16]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,                            
              prefix='I love Artifact')

I love Artifact so much. But I find it really boring to play with a deck because you have 3 cards and you have to do the best to get all cards you need (it's pretty luck and the weaker one is allowed to go first). I can see the game being fun, the gameplay is nice. There is always "RNG" in the game.
I dont want to buy in at all until I'm satisfied with it. Its so expensive right now and I've just been really impatient. It's very common for me to find my round three full of RNG, and now I'm finding that really boring and unfun.
I hate RNG in card games.
Its too bad that there is no free way to win cards. There was a "free 3 packs" like deck for free in HS but now i would need to pay 5 bucks for every 7-8 heroes. If a card had to cost $20 to have a "free" deck then thats just not possible.
If you really want to play the game then get in the game now because there are so many awesome new features in it that you will love it.
If you want to buy the starter deck then get in the game now bec

In [17]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,                            
              prefix='I hate Artifact')

I hate Artifact (especially the promo card sets) but I also get that Artifact does not have any free way to earn cards.
I would definitely recommend getting this game without playing the paid expansions (although you can still enjoy constructed at a reasonable price if you want to!). I would say a $20 price point for Artifact is a bit high, but you do not need to pay for a full set, you can get most of the basic cards at that price.
The game plays very well, and there is a lot of depth. You can purchase cards to create constructed decks, build your own competitive decks, and that is not a chore. Constructed is a $20 game, while those competitive decks are actually pretty fun to play. If you are looking for an epic card game and the pricing model is something you like, you might be better off spending the $20. It is still super fun, I have just bought all the cards I need (including the one I want to get for $5).
I have had great fun playing this game, I think you can easily spend $20 o

In [18]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,                            
              prefix='Please, Valve, ')

P, Valve,  if this game is called some bullshit and give the game a break, people. Screw your wankers.
Oh wait, again...people, you don't even have to pay to buy the game, everyone who are buying this game are paying for the game because you made a game and now you think that the game is profitable? Then play now or you will be paying for it.
The worst thing you can say about this game is that there are several games I want in my collection. The worst thing about this game is that it cost me one more$ to get the game (one time), than I wanted it in every game that I've been playing in the past.
This game is a rip off. I have spent so many hours on this game for so many reasons.
The reviews are really saying that the game is not well designed but I'll tell you why it's a rip off. The gameplay is really well designed but this game has issues with the game balance right now.
So much the obvious: - There is no progression. When you buy the game, you have to spend 20$.
- If you get 3 wins i