# Sample Steam Reviews with GPT-2
Code inspired by https://github.com/woctezuma/sample-steam-reviews

**Caveat:** a more recent version is being developed at https://github.com/woctezuma/sample-steam-reviews-with-gpt-2

## Setting the GPT-2 model

Install the Python package

Reference: https://github.com/minimaxir/gpt-2-simple

In [0]:
!pip install gpt_2_simple

Import the package and helper modules

In [0]:
import gpt_2_simple as gpt2
from datetime import datetime
from google.colab import files

## Downloading GPT-2

Choose between the `117M` and `345M` models

In [0]:
# model_name = '117M'
model_name = '345M'

Download the pre-trained model

In [0]:
gpt2.download_gpt2(model_name=model_name)

## Uploading a Training Text File to Colaboratory

### Get a data snapshot from my repository

In [0]:
!curl -O https://raw.githubusercontent.com/woctezuma/sample-steam-reviews/master/output/583950.txt

### AppID

Store page: https://store.steampowered.com/app/583950/Artifact/

In [0]:
app_id = 583950 # Artifact: 583950

### Pre-processing

#### Input

In [0]:
artifact_file_name = str(app_id) + '.txt'

#### Strip lines

In [0]:
with open(artifact_file_name, 'r', encoding='utf8') as f:
  lines = [line.strip() for line in f]

print('#lines = {}'.format(len(lines)))

#### Remove empty lines

In [0]:
texts = [line for line in lines if len(line) > 0]

print('#lines = {}'.format(len(texts)))
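The two pre-processing steps above (strip each line, then drop empty ones) can be sketched as a single function. This is only an illustration: the name `clean_review_lines` and the sample string are hypothetical and do not come from the original notebook.

```python
def clean_review_lines(raw_text):
    """Strip every line, then drop the lines that end up empty."""
    stripped = [line.strip() for line in raw_text.splitlines()]
    return [line for line in stripped if len(line) > 0]


# Made-up sample standing in for the content of the downloaded text file.
sample = "  Great game!  \n\n\tToo expensive.\n   \n"
cleaned = clean_review_lines(sample)
print('#lines = {}'.format(len(cleaned)))
```

Wrapping the steps in a function makes it easy to reuse them for another AppID without repeating the cells.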

#### Output

In [0]:
artifact_trimmed_file_name = 'artifact.txt'

#### Save output

In [0]:
line_separator = '\n'

with open(artifact_trimmed_file_name, 'w', encoding='utf8') as f:
  print(line_separator.join(texts), file=f)

## Fine-tune GPT-2

Reference: https://colab.research.google.com/drive/1VLG8e7YSEwypxU-noRNhsv5dW4NfTGce

In [0]:
file_name = artifact_trimmed_file_name

run_name = model_name + '_reviews_' + str(app_id)

In [0]:
sess = gpt2.start_tf_sess()

gpt2.finetune(sess,
              run_name=run_name,
              dataset=file_name,
              model_name=model_name,
              steps=1000,
              restore_from='fresh', # change to 'latest' to resume training
              print_every=10,       # how many steps between printing progress
              sample_every=200,     # how many steps between printing demo samples
              save_every=500        # how many steps between saving checkpoints
              )
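Switching `restore_from` between `'fresh'` and `'latest'` by hand is easy to forget when a Colaboratory session restarts. A minimal sketch of a helper that picks the value automatically is shown below. It assumes that checkpoints are written to a `checkpoint/<run_name>` folder, which matches the default layout of gpt-2-simple as far as I know; the helper name `choose_restore_mode` is hypothetical.

```python
import os


def choose_restore_mode(run_name, checkpoint_dir='checkpoint'):
    """Return 'latest' if a non-empty run folder exists, else 'fresh'."""
    # Assumption: checkpoints live in <checkpoint_dir>/<run_name>.
    run_dir = os.path.join(checkpoint_dir, run_name)
    has_checkpoint = os.path.isdir(run_dir) and len(os.listdir(run_dir)) > 0
    return 'latest' if has_checkpoint else 'fresh'


# The run name below mirrors the one built in this notebook.
print(choose_restore_mode('345M_reviews_583950'))
```

The returned string could then be passed as `restore_from` in the call to `gpt2.finetune`.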

## Generate Text From The Trained Model

In [0]:
# Default: 0.7. A higher temperature helps avoid copying the training text, especially if your dataset is small.
temperature = 1.0
top_k = 40      # Default: 0   ; Recommended: 40  ; ignored if top_p > 0.0
top_p = 0.9     # Default: 0.0 ; Recommended: 0.9 ; no need for top_k if top_p > 0.0
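To give an intuition for these three parameters, here is a toy sketch in pure Python of temperature scaling, top-k filtering and top-p (nucleus) filtering over a tiny made-up distribution. It is not the code used by gpt-2-simple, which works on TensorFlow logits over the full vocabulary; the function names and the toy probabilities are assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = order[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


toy_probs = [0.5, 0.3, 0.15, 0.05]  # made-up next-token probabilities
print(sorted(top_k_filter(toy_probs, k=2)))    # indices kept by top-k
print(sorted(top_p_filter(toy_probs, p=0.9)))  # indices kept by top-p
```

With top-k, the number of candidate tokens is fixed. With top-p, it adapts to how peaked the distribution is, which is why a single one of the two filters is usually enough.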

In [0]:
num_samples = 3
num_batches = 3 # Passed as batch_size: samples in a batch are generated in parallel, giving a massive speedup. num_samples should be a multiple of it.
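Since the value above is passed as `batch_size`, it is worth checking that the number of samples splits evenly into batches before calling `gpt2.generate`. The sketch below assumes that the library expects `nsamples` to be a multiple of `batch_size`; the helper name `count_batches` is hypothetical.

```python
def count_batches(nsamples, batch_size):
    """Return how many batches are needed, or raise if the split is uneven."""
    if batch_size <= 0 or nsamples % batch_size != 0:
        raise ValueError('nsamples must be a positive multiple of batch_size')
    return nsamples // batch_size


# Same values as in the cell above: 3 samples in batches of 3.
print(count_batches(3, 3))
```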

In [0]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,
              prefix='I love Artifact')

In [0]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,
              prefix='I hate Artifact')

In [0]:
gpt2.generate(sess,
              run_name=run_name,
              nsamples=num_samples,
              batch_size=num_batches,
              temperature=temperature,
              top_k=top_k,
              top_p=top_p,
              prefix='Please, Valve, ')
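The three cells above differ only by their prefix, so they could be collapsed into a loop. Here is a minimal sketch that builds the keyword arguments once per prefix. The helper `build_generate_kwargs` is hypothetical, and the actual call to `gpt2.generate` is left as a comment because it needs the fine-tuned session from this notebook.

```python
def build_generate_kwargs(prefix, run_name, nsamples=3, batch_size=3,
                          temperature=1.0, top_k=40, top_p=0.9):
    """Bundle the sampling parameters used in this notebook with one prefix."""
    return dict(run_name=run_name,
                nsamples=nsamples,
                batch_size=batch_size,
                temperature=temperature,
                top_k=top_k,
                top_p=top_p,
                prefix=prefix)


prefixes = ['I love Artifact', 'I hate Artifact', 'Please, Valve, ']
all_kwargs = [build_generate_kwargs(prefix, run_name='345M_reviews_583950')
              for prefix in prefixes]

for kwargs in all_kwargs:
    print(kwargs['prefix'])
    # In the notebook, with the session available:
    # gpt2.generate(sess, **kwargs)
```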