# Sample Steam Reviews with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-reviews-with-gpt-2

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [1]:
!pip install gpt_2_simple



Import the packages

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between the `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download

In [4]:
gpt2.download_gpt2(model_name=model_name)

Fetching checkpoint: 1.00kit [00:00, 731kit/s]                                                      
Fetching encoder.json: 1.04Mit [00:00, 40.7Mit/s]                                                   
Fetching hparams.json: 1.00kit [00:00, 581kit/s]                                                    
Fetching model.ckpt.data-00000-of-00001: 1.42Git [00:21, 64.8Mit/s]                                 
Fetching model.ckpt.index: 11.0kit [00:00, 2.49Mit/s]                                               
Fetching model.ckpt.meta: 927kit [00:00, 32.7Mit/s]                                                 
Fetching vocab.bpe: 457kit [00:00, 38.9Mit/s]                                                       


## Uploading a Training Text File to Colaboratory

#### Either get the data yourself

In [5]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/export_review_data.py



In [6]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/requirements.txt



In [7]:
!pip install -r requirements.txt



In [0]:
app_id = 583950
# app_id = 203770

num_days = 28*3 # slightly less than 3 months

In [0]:
from export_review_data import apply_workflow_for_app_id

apply_workflow_for_app_id(app_id,
                          num_days=num_days)
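
To make the `num_days` setting concrete, here is a minimal sketch of the look-back window it describes. How `export_review_data.py` actually applies `num_days` is not shown in this notebook, so the helper `review_window` and the reference date are hypothetical and purely illustrative.

```python
from datetime import datetime, timedelta


def review_window(num_days, now):
    """Return (start, end) datetimes for a look-back window of num_days days."""
    return now - timedelta(days=num_days), now


num_days = 28 * 3  # 84 days, slightly less than 3 months

# Arbitrary reference date, chosen only for the example.
start, end = review_window(num_days, datetime(2019, 5, 8))
print(start.date(), end.date())
```

With 84 days, the window covers exactly twelve weeks before the reference date.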

#### Or get a data snapshot from me

Snapshots are only provided as examples (Artifact and Crusader Kings II, see below). The recommended way is to run the code above for the game of your choice instead.

In [0]:
!mkdir -p data/

## Either Artifact (recent reviews):
# !curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews-with-gpt-2/master/data/with_delimiters/583950.txt
# !mv 583950.txt data/

## Or Crusader Kings II (all the English reviews):
# !curl -O https://raw.githubusercontent.com/wiki/woctezuma/sample-steam-reviews-with-gpt-2/data/with_delimiters/203770.txt
# !mv 203770.txt data/
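
The `with_delimiters` folder name and the samples printed later in this notebook suggest that each review is wrapped in `<|startoftext|>` and `<|endoftext|>` tokens. The sketch below shows one way such a file could be assembled from a list of reviews. The helper `wrap_reviews` is hypothetical, and the exact layout written by `export_review_data.py` (for instance, how reviews are separated) is an assumption.

```python
START_TOKEN = "<|startoftext|>"
END_TOKEN = "<|endoftext|>"


def wrap_reviews(reviews):
    """Wrap each non-empty review in delimiters and join them, one review per entry."""
    cleaned = [review.strip() for review in reviews if review.strip()]
    return "\n".join(START_TOKEN + review + END_TOKEN for review in cleaned)


# Blank reviews are skipped, so only two entries end up in the training text.
sample = wrap_reviews(["Great game.", "   ", "Too many DLCs."])
print(sample)
```

The delimiters are what later allows generation to start from a `prefix` such as `<|startoftext|>I love` and to stop at `<|endoftext|>`.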

## Finetune GPT-2

In [0]:
file_name = 'data/' + str(app_id) + '.txt'

run_name = model_name + '_reviews_' + str(app_id)

In [12]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh',   # change to 'latest' to resume training
              print_every=10,   # how many steps between printing progress
              sample_every=200,   # how many steps between printing demo samples
              save_every=500   # how many steps between saving checkpoints
              )

Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Use tf.random.categorical instead.
Instructions for updating:
Use tf.cast instead.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
Loading checkpoint checkpoint/345M_reviews_203770/model-5000
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from checkpoint/345M_reviews_203770/model-5000


  0%|          | 0/1 [00:00<?, ?it/s]

Loading dataset...


100%|██████████| 1/1 [00:09<00:00,  9.95s/it]


dataset has 1767161 tokens
Training...
Saving checkpoint/345M_reviews_203770/model-5000
of tumblr and twitter sooo much stuff :3<|endoftext|>
<|startoftext|>Great Game. Great Game Review.  But beware that there is a high learning curve and you MUST watch some YouTube let's plays of the game before you do anything.
Best game I never played. Would have to kill my wife and marry my cousin again.<|endoftext|>
<|startoftext|>I would definitely recommend the game if you are into strategy games of that era. You can play as anyone from a minor vassal to the ruler of an empire. Not only can you command armies and control kingdoms, but you also have to look after your family, who will make the more interesting characters, and so on.
A lot of the game is scripted, but that's part of the fun, and there are some great mods available, such as the AGOT and After the End mod, as well as fan made mods, such as the game of thrones mod.
There's also a Game of Thrones fan made mod available that will recr
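
To see what the `steps`, `print_every`, `sample_every` and `save_every` values above imply, the following sketch lists the steps at which each event would fire. It assumes an event fires whenever the step count is a multiple of its interval. The library's exact counting (for example when resuming from a checkpoint) may differ by a step, and the helper `event_steps` is hypothetical.

```python
def event_steps(total_steps, every):
    """Steps in 1..total_steps at which an event with the given interval fires."""
    return list(range(every, total_steps + 1, every))


steps = 1000
sample_steps = event_steps(steps, 200)      # demo samples
save_steps = event_steps(steps, 500)        # checkpoints
print_count = len(event_steps(steps, 10))   # progress lines

print(sample_steps, save_steps, print_count)
```

Under this assumption, a 1000-step run prints five demo samples and writes two checkpoints.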

## Save a Trained Model Checkpoint

In [25]:
# gpt2.mount_gdrive()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [26]:
# !tar -cvf review-model-checkpoint.tar checkpoint/345M_reviews_203770/

checkpoint/345M_reviews_203770/
checkpoint/345M_reviews_203770/events.out.tfevents.1557293346.a0969606a7f4
checkpoint/345M_reviews_203770/hparams.json
checkpoint/345M_reviews_203770/counter
checkpoint/345M_reviews_203770/model-6000.index
checkpoint/345M_reviews_203770/encoder.json
checkpoint/345M_reviews_203770/checkpoint
checkpoint/345M_reviews_203770/model-6000.meta
checkpoint/345M_reviews_203770/model-6000.data-00000-of-00001
checkpoint/345M_reviews_203770/events.out.tfevents.1557298585.a0969606a7f4
checkpoint/345M_reviews_203770/vocab.bpe
checkpoint/345M_reviews_203770/events.out.tfevents.1557307383.a0969606a7f4


In [0]:
# !cp review-model-checkpoint.tar '/content/drive/My Drive/'

## Load a Trained Model Checkpoint

In [0]:
# gpt2.mount_gdrive()

In [0]:
# !cp '/content/drive/My Drive/review-model-checkpoint.tar' .

In [0]:
# !tar -xvf review-model-checkpoint.tar
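
The save and load steps can also be done from Python with the standard `tarfile` module instead of shell commands. The sketch below is a round trip on a throwaway directory: the run name `demo_run` and the stand-in `counter` file are hypothetical, and a real checkpoint folder would contain the model files listed in the `tar` output above.

```python
import os
import tarfile
import tempfile


def archive_checkpoint(root, run_dir, archive_path):
    """Pack root/run_dir into a tar file, storing entries under run_dir."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(os.path.join(root, run_dir), arcname=run_dir)


def extract_checkpoint(archive_path, destination):
    """Unpack the archive so that destination/run_dir is recreated."""
    with tarfile.open(archive_path, "r") as tar:
        tar.extractall(destination)


# Demo on a temporary directory with a stand-in checkpoint file.
work = tempfile.mkdtemp()
run_dir = "checkpoint/demo_run"  # hypothetical run name
os.makedirs(os.path.join(work, run_dir))
with open(os.path.join(work, run_dir, "counter"), "w") as handle:
    handle.write("1000\n")

archive_path = os.path.join(work, "review-model-checkpoint.tar")
archive_checkpoint(work, run_dir, archive_path)

restored = tempfile.mkdtemp()
extract_checkpoint(archive_path, restored)
with open(os.path.join(restored, run_dir, "counter")) as handle:
    restored_counter = handle.read()
print(restored_counter.strip())
```

Keeping `run_dir` as the archive prefix means the extracted files land back under `checkpoint/`, which is where the notebook's `run_name` expects them.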

## Generate Text From The Trained Model

In [0]:
temperature = 0.7  # Default is 0.7. A higher temperature may help to avoid copying training text, especially if your dataset is small.

num_samples = 3
num_batches = 3  # Passed as batch_size: samples are generated in parallel, which is much faster. nsamples should be divisible by batch_size.
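
The effect of the temperature can be illustrated with a small softmax sketch. This is not the library's sampling code, only the standard idea of dividing the logits by the temperature before normalizing. The logits are made up for the example and `softmax_with_temperature` is a hypothetical helper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after scaling them by 1 / temperature."""
    scaled = [value / temperature for value in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - largest) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.7)
high = softmax_with_temperature(logits, 1.5)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

A higher temperature flattens the distribution, so less likely tokens are picked more often. This is why raising it can reduce verbatim copying from a small dataset, at the cost of less coherent text.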

In [20]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature)

In [21]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I love',
              truncate='<|endoftext|>')

<|startoftext|>I love this game, I have put over 300 hours into it. It is one of my most played games, but I do not play it enough to recommend it.
This game is, in my opinion, one of the least developed games out there. The whole game is a spreadsheet with pretty much just a map and stats.  You move your army around the world trying to keep up with the rapid changes in technology.  The game doesn't really have a story or a character development, it is more like a giant sim where the player simulates the rise and fall of a dynasty over hundreds of years.  The only story that really happens is the one you create for yourself.  If you want to try creating your own story, and expand your empire, and lose everything because your dynasty is fucking screwed up, that is the story.
I do not reccomend this game to anyone.  There are too many DLCs and each of them costs too much.  I have bought some DLC, but I only have about 30 hours into the game so I cannot say how the full experience is like
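
The `truncate` argument stops each sample at the first end-of-review token, which is why the output above contains a single review rather than running on into the next one. A minimal sketch of that trimming on a plain string follows. The helper `truncate_at` is hypothetical and only mimics the visible behavior; it is not the library's implementation.

```python
END_TOKEN = "<|endoftext|>"


def truncate_at(text, end_token=END_TOKEN):
    """Return the text before the first end_token, or all of it if the token is absent."""
    return text.split(end_token, 1)[0]


raw = "<|startoftext|>I love it.<|endoftext|>\n<|startoftext|>Next review"
print(truncate_at(raw))
```

Note that the prefix, including `<|startoftext|>`, is kept in the result, which matches the samples printed in this notebook.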

In [22]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>I hate',
              truncate='<|endoftext|>')              

<|startoftext|>I hate how some of the game comments refer to you as just "CK2", insinuating that you are anything but a humble king. This game is nothing like any strategy games you have played, and even playing as a lowly count is a challenge. There are so many facets to this game, and it is one of the most complex games I have ever played. Even if you somehow manage to become the Emperor of Byzantium, it will still be a game of strategy.
10/10

<|startoftext|>I hate to give bad reviews for a game that is really good, but the DLC policy for all of paradox's games really makes this a bad game. All of the DLC for this game are really overpriced and really change the game (of which I only example) in a bad way. I can not recommend this game at all, I would rather get one for free.

<|startoftext|>I hate to give negative reviews for a game that I, personally, love, but it has cost me a ton of money and time to enjoy it.  I have to admit, after the base game was released, I was a bit under

In [23]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>Please',
              truncate='<|endoftext|>')

<|startoftext|>Please note that the game has a HUGE learning curve. If you want to play this game, you have to put in many hours and DO a LOT of reading. If you want a game that lets you kill a family member, to take their place as king of a huge empire, to run a kingdom that covers all of Europe and northern Africa, to design your own character and history, it's not for you. If you do buy it, you will get an addiction like no other game.

<|startoftext|>Please make a sequel to this gem. The constant updates and DLCs for it is what makes it great. I'm not a fan of DLC, but I know many people who think it's worth it. I think it's more than worth it, though I think I'd wait for a sale before getting it.

<|startoftext|>Please make a better medieval politics game. I spent a couple hundred hours in EU4 and I'm not bored. Please make a CK3.



In [24]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              prefix='<|startoftext|>This game has near infinite replay value',
              truncate='<|endoftext|>')              

<|startoftext|>This game has near infinite replay value. It is more about the story you create than it is about military conquest.
I have played for hundreds of hours and I still feel like I have a lot to learn. If you are a history buff, this game is essential.
The mechanics are hard to master, but once you get over the learning curb, it will be a ton of fun.
The graphics are nothing spectacular, but they are functional.
9/10
OVERALL REVIEW: 8.5/10
GRAPHICS: 8/10
CREATION MECHANISM: 7/10
INTEGRATION: 8/10
LENGTH: 9/10
CREATION MECHANISM: 9/10
PUNISHMENT: 9/10
REASON: 8/10
REASONABLE: 9/10
RATING: 8.5/10
[b]OVERALL SCORE: 8.5/10[/b]

<|startoftext|>This game has near infinite replay value. I have hundreds of hours invested and still finding new ways to play. If you are a fan of grand strategy this is definitely a must have. I rate it a 8/10

<|startoftext|>This game has near infinite replay value, and with the great game of thrones mod this game can be even better. It can be hard to le