# Sample Steam Reviews with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-reviews-with-gpt-2

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple



Import the required modules

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between the `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download

In [4]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 253kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 46.8Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 507kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:25, 56.7Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 2.54Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 48.6Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 39.4Mit/s]                                                       


## Uploading a Text File to Colaboratory for Training

#### Either get the data yourself

In [0]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/export_review_data.py

In [0]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/requirements.txt

In [0]:
!pip install -r requirements.txt

In [0]:
app_id = 583950
# app_id = 203770

num_days = 28*3 # slightly less than 3 months

In [0]:
from export_review_data import apply_workflow_for_app_id

apply_workflow_for_app_id(app_id,
                          num_days=num_days)

#### Or get a data snapshot from me

Snapshots are provided only as examples, for Artifact and Crusader Kings II. The recommended way is to run the code above for the game of your choice instead.

In [10]:
!mkdir -p data/

## Either Artifact (recent reviews):
# !curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/data/with_delimiters/583950.txt
# !mv 583950.txt data/

## Or Crusader Kings II (all the English reviews):
# !curl -O https://raw.githubusercontent.com/wiki/woctezuma/sample-steam-reviews-with-gpt-2/data/with_delimiters/203770.txt
# !mv 203770.txt data/

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0 7151k    0  1894    0     0   4466      0  0:27:19 --:--:--  0:27:19  4456
100 7151k  100 7151k    0     0  13.1M      0 --:--:-- --:--:-- --:--:-- 13.1M
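
The snapshot files sit in a `with_delimiters` folder, and the samples printed during training show each review wrapped in `<|startoftext|>` and `<|endoftext|>`. The sketch below shows one way such a file could be assembled from raw review texts. It is an illustration only: `wrap_reviews` is a hypothetical helper, not the function used in `export_review_data.py`, and the one-review-per-block layout is an assumption.

```python
START_TOKEN = '<|startoftext|>'
END_TOKEN = '<|endoftext|>'


def wrap_reviews(reviews):
    """Wrap each non-empty review in start/end delimiters and join them with newlines.

    Hypothetical helper for illustration; the real export script may differ.
    """
    wrapped = []
    for review in reviews:
        text = review.strip()
        if text:  # skip reviews that are empty or whitespace-only
            wrapped.append(START_TOKEN + text + END_TOKEN)
    return '\n'.join(wrapped)


# Made-up review texts, not real Steam data
sample = wrap_reviews(['Great game.', '   ', 'Steep learning curve.\nWorth it.'])
print(sample)
```

With delimiters in place, the `prefix` and `truncate` arguments used later for generation have clear boundaries to work with.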


## Finetune GPT-2

In [0]:
file_name = 'data/' + str(app_id) + '.txt'

run_name = model_name + '_reviews_' + str(app_id)
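
The run name determines where checkpoints end up: the training log further down reads and writes `checkpoint/345M_reviews_203770/`. As a minimal sketch of that naming scheme, the hypothetical helper `build_names` below reproduces the two strings from the cell above and adds the checkpoint folder, assuming the default `checkpoint` directory seen in the log.

```python
def build_names(model_name, app_id, data_dir='data', checkpoint_dir='checkpoint'):
    """Return (dataset path, run name, checkpoint folder) for a model and a Steam app id.

    Hypothetical helper; the directory defaults are assumptions based on the logs.
    """
    dataset_path = data_dir + '/' + str(app_id) + '.txt'
    name = model_name + '_reviews_' + str(app_id)
    run_dir = checkpoint_dir + '/' + name
    return dataset_path, name, run_dir


# Example with the Crusader Kings II app id that appears in the training log
names = build_names('345M', 203770)
print(names)
```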

In [12]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # number of steps between progress printouts
              sample_every=200,   # number of steps between demo samples
              save_every=500   # number of steps between checkpoint saves
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint checkpoint/345M_reviews_203770/model-2000
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoint/345M_reviews_203770/model-2000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:09<00:00,  9.96s/it]


dataset has 1767161 tokens
Training...
Saving checkpoint/345M_reviews_203770/model-2000
 embodies the same feeling as the first Crusader Kings II.
The learning curve is massive, with almost 200 hours sunk into this. If you've ever wanted to build an empire from nothing and then become a ruler of the most populous region in all of Europe, it's the perfect game.<|endoftext|>
<|startoftext|>This game is my favourite game on Steam. The sheer depth that you can put in to it; the fact that this game keeps repeating itself, makes it a real gem to play. I highly recommend purchasing the Collection to complete your collection, and there are many expansions still in development that will take the game even further.
There are only a handful of people that have completed at least one full game with all DLCs. Thats when I'll take it as my challenge.<|endoftext|>
<|startoftext|>So basically I have put my heart and soul into this game for over 1000 hours, but every now and then I feel like dying, so 
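
To get a feel for how `steps`, `print_every`, `sample_every` and `save_every` interact, the sketch below lists the steps at which each action would fire. It is a simplified model, assuming each action triggers whenever the step counter is a multiple of its interval; the library's actual training loop may differ in details. `schedule` is a hypothetical helper, not part of `gpt_2_simple`.

```python
def schedule(steps, print_every, sample_every, save_every, start=0):
    """List the step counters at which each action fires (simplified model).

    `start` is the counter restored from a previous run, 0 for a fresh run.
    """
    events = {'print': [], 'sample': [], 'save': []}
    for counter in range(start + 1, start + steps + 1):
        if counter % print_every == 0:
            events['print'].append(counter)
        if counter % sample_every == 0:
            events['sample'].append(counter)
        if counter % save_every == 0:
            events['save'].append(counter)
    return events


# Same values as in the finetune call of this notebook, for a fresh run
events = schedule(steps=1000, print_every=10, sample_every=200, save_every=500)
print('samples at:', events['sample'])
print('saves at:', events['save'])
```

Under this model, a Colab session that disconnects between two saves loses at most `save_every` steps of training.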

## Save a Trained Model Checkpoint

In [26]:
# gpt2.mount_gdrive()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [27]:
# !tar -cvf review-model-checkpoint.tar checkpoint/345M_reviews_203770/

checkpoint/345M_reviews_203770/
checkpoint/345M_reviews_203770/events.out.tfevents.1557293346.a0969606a7f4
checkpoint/345M_reviews_203770/model-5000.index
checkpoint/345M_reviews_203770/hparams.json
checkpoint/345M_reviews_203770/counter
checkpoint/345M_reviews_203770/encoder.json
checkpoint/345M_reviews_203770/checkpoint
checkpoint/345M_reviews_203770/model-5000.data-00000-of-00001
checkpoint/345M_reviews_203770/model-5000.meta
checkpoint/345M_reviews_203770/events.out.tfevents.1557298585.a0969606a7f4
checkpoint/345M_reviews_203770/vocab.bpe


In [0]:
# !scp review-model-checkpoint.tar '/content/drive/My Drive/'
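
The same archive can be built from Python instead of the shell. The sketch below uses the standard `tarfile` module on a throwaway directory with placeholder files; `archive_checkpoint` and `list_archive` are hypothetical helper names, and the `demo_run` folder merely stands in for a real checkpoint folder.

```python
import os
import tarfile
import tempfile


def archive_checkpoint(checkpoint_dir, tar_path, arcname):
    """Write checkpoint_dir into an uncompressed tar archive, similar to `tar -cf`."""
    with tarfile.open(tar_path, 'w') as tar:
        tar.add(checkpoint_dir, arcname=arcname)
    return tar_path


def list_archive(tar_path):
    """Return the sorted member names of a tar archive."""
    with tarfile.open(tar_path, 'r') as tar:
        return sorted(tar.getnames())


# Demo with placeholder files, not a real model checkpoint
with tempfile.TemporaryDirectory() as tmp:
    run_dir = os.path.join(tmp, 'checkpoint', 'demo_run')
    os.makedirs(run_dir)
    for name in ('hparams.json', 'counter'):
        with open(os.path.join(run_dir, name), 'w') as f:
            f.write('placeholder')
    tar_path = os.path.join(tmp, 'review-model-checkpoint.tar')
    archive_checkpoint(run_dir, tar_path, arcname='checkpoint/demo_run')
    archived_names = list_archive(tar_path)

print(archived_names)
```

Listing the archive before copying it to Google Drive is a cheap way to confirm that the model files were actually included.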

## Load a Trained Model Checkpoint

In [0]:
# gpt2.mount_gdrive()

In [0]:
# !scp '/content/drive/My Drive/review-model-checkpoint.tar' .

In [0]:
# !tar -xvf review-model-checkpoint.tar

## Generate Text From The Trained Model

In [0]:
temperature = 0.7 # Default is 0.7. Consider increasing the temperature, especially if your dataset is small, to avoid copying the training text.

num_samples = 3
num_batches = 3 # Passed as batch_size below: samples are generated in parallel, giving a large speedup. nsamples must be divisible by batch_size.
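
To see why a higher temperature reduces verbatim copying, the sketch below applies temperature scaling to a softmax over made-up token scores. It illustrates the general technique only and is not the sampling code inside `gpt_2_simple`; `softmax_with_temperature` and the logits are assumptions chosen for the example.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    highest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

A lower temperature concentrates probability on the top-scoring token, which favors repeating memorized text. A higher temperature flattens the distribution, giving more varied but less coherent samples.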

In [21]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature)              

- I had my good friend teaching me how to play, he was really helpful
- I killed my brother for a mistress
- I seduced my sister
- I seduced my mother
- I seduced my sister- I seduced my mother- I seduced my brother
- I seduced my brother- The Pope
- I seduced my mother- I seduced my sister- I seduced my mother- I seduced my sister- My sister- My sister- My sister- I seduced my sister- My sister- I seduced my sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sister- My sist

In [22]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I love',
              truncate='<|endoftext|>')

<|startoftext|>I love this game, I love the amount of development they put into it, and I do love the amount of fun it has. But I have to warn you: This game has a steep learning curve. Take it slow, and don't let the learning process overwhelm you.
8/10

<|startoftext|>I love this game. and i'm just so confused about all the little things that happen. i'm like "i'm kinda good at this game but......something just doesnt feel right" and it wont let me do something simple like move an army. i have to restart the game all over again.

<|startoftext|>I love this game. It is so much fun to play.  There are so many different ways to play the game and the devs are still coming up with new stuff.  I have played the Game of Thrones mod and it is amazing.  It is a great game to play with your friends.     You can make your own character, or you can import one from the Game of Thrones mod.  If you are into strategy games and want more detail then this is the game for you.  It has a large learning
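
The `truncate` argument is what keeps each sample to a single review: generation continues up to the length limit, and the text is then cut at the end token. A minimal sketch of that idea follows, assuming a simple cut at the first occurrence; `truncate_sample` is a hypothetical helper and the library's own implementation may differ.

```python
def truncate_sample(text, end_token='<|endoftext|>'):
    """Keep only the text before the first end token, or all of it if none is found."""
    index = text.find(end_token)
    return text if index == -1 else text[:index]


# Made-up raw output containing one full review and the start of another
raw = '<|startoftext|>I love it.<|endoftext|>\n<|startoftext|>Next review'
print(truncate_sample(raw))
```

A sample that never produces the end token, like the third one above, is returned whole and simply stops at the length limit.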

In [23]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I hate',
              truncate='<|endoftext|>')              

<|startoftext|>I hate this game. I've wasted 2,000+ hours of my life on it. I'm completely useless. I'm bored. I don't know what my doing. I don't know how to do anything. I started it because my buddy wanted me to play as a Irish lord. I've spent every waking moment trying to make this happen. I've turned it off. I'm sure I'll come back and conquer the British Isles as the Spanish. I don't care.
I love this game so much. I wish I had started it earlier. I'm sure I can teach you how to play. But I can't force you to start as a legendary lord of a foreign country and struggle your way to the top. It would be pointless. I'm sure I'm capable of starting a small independent count in Iceland and trying to carve out a new country. But I can't force you to start out Iceland as the King of Iceland and then go to war with a nearby country to claim its land.
I'm not a very good player. I feel like I'm wasting my money. I hope I never play this again.
(Visited 2,736 times, 1 visit today)
Most vis

In [24]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>Please',
              truncate='<|endoftext|>')

<|startoftext|>Please note that this game is not for the faint of heart. It is very steep and very complicated with very little guidance.  You will spend most of the game staring at tiny buttons and waiting for something to happen.  Even after spending several hours playing this game you will still learn very little.  I recommend starting with a free trial and trying the game out for a while before committing to a buying a dlc.  Try it out and decide whether you like it or not.  At the end of the day, I would say that a dlc is not needed unless you really like the game and want to support the developer.  If you do not like the game, just disable the dlc and continue playing your game normally.

<|startoftext|>Please help! I'm in my late teens and I'm barely an Irish Lord. My wife is pregnant with two sons. Please help! I'm depressed. Please help! I'm insane. Please help! I'm dying. Please help! I'm suffering from insomnia. Please help! I'm stressed. Please help! I'm insane. Please help

In [25]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>This game has near infinite replay value',
              truncate='<|endoftext|>')              

<|startoftext|>This game has near infinite replay value. If you love any of the other grand strategy games, you'll love this one. There are no real guides to follow, so you can play however you like. If you're looking for a good strategy game, look elsewhere.

<|startoftext|>This game has near infinite replay value.   If you're a fan of grand strategy, this is a must-buy.    The learning curve is steep, but there are many good YouTube tutorials to help you along.  10/10

<|startoftext|>This game has near infinite replay value. You can play as a lowly count and rise to become King of England or you can play as a powerful Emperor and wage a variety of wars to expand your realm. You can even convert your save to EU4 and continue your gameplay in CK3.
Pros:
-Good mix of RNG and Probability based features.
-Complexity (in a good way) and replayability.
-Interesting and diverse feature set.
-Fun random playstyle.
-Random and unpredictable events and traits.
Cons:
-Expensive DLC.
-Casus Beli 