<a href="https://colab.research.google.com/github/sccmst/NLUModelOnColab/blob/GPT2-ContentExtension/GPT-ContentExtension-PPO/gpt2_reporter_ppo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Tune GPT2 to generate positive reviews
> Optimise GPT2 to produce positive IMDB movie reviews using a BERT sentiment classifier as a reward function.

<div style="text-align: center">
<p style="text-align: center;"> <b>Figure:</b> Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Hugging Face. </p>
</div>


In this notebook we fine-tune GPT2 (small) to generate positive movie reviews based on the IMDB dataset. The model gets the start of a real review and is tasked with producing positive continuations. To reward positive continuations we use a BERT classifier to analyse the sentiment of the produced sentences and use the classifier's outputs as reward signals for PPO training.

## Setup experiment

### Import dependencies

In [None]:
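# Quote the version specifiers: unquoted, the shell treats `>` as an output redirection.
!pip install "torch>=1.4.0"
# The `trl.gpt2` and `trl.ppo` modules imported below come from early trl releases;
# if these imports fail with a newer release, pinning a 0.1.x version is likely needed (an assumption to verify).
!pip install "trl>=0.1.0"
!pip install "datasets>=2.7.1"
!pip install transformers==4.21.1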
!pip list

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers==4.21.1
  Downloading transformers-4.21.1-py3-none-any.whl (4.7 MB)
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp38-cp38-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.6 MB)
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.10.3
    Uninstalling tokenizers-0.10.3:
      Successfully uninstalled tokenizers-0.10.3
  Attempting uninstall: transformers
    Found existing installation: transformers 4.3.2
    Uninstalling transformers-4.3.2:
      Successfully uninstalled transformers-4.3.2
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of

In [None]:
import torch
import time
import os
from tqdm import tqdm
import numpy as np
import pandas as pd
tqdm.pandas()

from datasets import load_dataset

from transformers import AutoTokenizer, pipeline, AutoModelWithLMHead

from trl.gpt2 import respond_to_batch, GPT2HeadWithValueModel
from trl.ppo import PPOTrainer
from trl.core import build_bert_batch_from_txt, listify_batch

### Configuration

In [None]:
config = {
    "model_name": "theta/gpt-reporter-badplace",
    # "cls_model_name": "lvwerra/distilbert-imdb",
    "steps": 20000,
    "batch_size": 32,
    "forward_batch_size": 32,
    "ppo_epochs": 2,
    # "txt_in_min_len": 2,
    # "txt_in_max_len": 8,
    # "txt_out_min_len": 4,
    # "txt_out_max_len": 16,
    "lr": 1.41e-5,
    "init_kl_coef": 0.2,
    "target": 6,
    "horizon": 10000,
    "gamma": 1,
    "lam": 0.95,
    "cliprange": 0.2,
    "cliprange_value": 0.2,
    "vf_coef": 0.1,
}

**Forward batching**: Because the models can be fairly large and we want to roll out large PPO batches, the forward passes for text generation and sentiment analysis can run out of memory. The parameter `forward_batch_size` splits the forward passes into smaller batches. This costs a little performance, but the cost is negligible compared to the backward passes when optimizing the model. The same parameter is used in the `PPOTrainer` when doing forward passes. The `batch_size` should be a multiple of `forward_batch_size`.
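
As a rough illustration of forward batching, the sketch below splits one PPO batch into chunks of at most `forward_batch_size` items. The helper name `forward_batches` is hypothetical and not part of `trl`; it only shows the chunking idea, not how `PPOTrainer` implements it internally.

```python
def forward_batches(items, forward_batch_size):
    """Yield consecutive chunks of at most `forward_batch_size` items."""
    if forward_batch_size <= 0:
        raise ValueError("forward_batch_size must be positive")
    for start in range(0, len(items), forward_batch_size):
        yield items[start:start + forward_batch_size]


# Toy stand-in for a PPO batch of 32 queries (hypothetical data).
batch = list(range(32))
chunks = list(forward_batches(batch, forward_batch_size=8))
print(len(chunks), [len(chunk) for chunk in chunks])
```

When `batch_size` is a multiple of `forward_batch_size`, every chunk has the same size, which is why the text recommends that relationship.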

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pipe_device = 0 if torch.cuda.is_available() else -1

The configuration above names the checkpoint `theta/gpt-reporter-badplace`, and the cells below load `theta/gpt2-reporter` as the starting policy. This takes the place of `gpt2_imdb`, a GPT2 model additionally fine-tuned on the IMDB dataset for 1 epoch with the Hugging Face [script](https://github.com/huggingface/transformers/blob/master/examples/run_language_modeling.py) (no special settings). The other parameters are mostly taken from the original paper ["Fine-Tuning Language Models from Human Preferences"](https://arxiv.org/pdf/1909.08593.pdf). The language model and the classifier are available in the Hugging Face model hub [here](https://huggingface.co/models). The following code should automatically download the models.

## Load data and models

### Load BERT classifier
We load the sentiment classifier `voidful/albert_chinese_small_sentiment` from the Hugging Face model hub.

In [None]:
sent_kwargs = {
    "return_all_scores": True,
    # "function_to_apply": "none",
    # "batch_size": config["forward_batch_size"]
}
from transformers import BertTokenizer, AutoModelForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("voidful/albert_chinese_small_sentiment")
# tokenizer.eos_token = tokenizer.pad_token
model = AutoModelForSequenceClassification.from_pretrained("voidful/albert_chinese_small_sentiment")
sentiment_pipe = pipeline("sentiment-analysis", model=model,tokenizer=tokenizer,device=pipe_device)

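The pipeline returns one score per class: negative (`負面`) and positive (`正面`). Since `function_to_apply` is left at its default in `sent_kwargs`, these scores are normalised probabilities rather than raw logits (in the example below they sum to 1). The training loop below uses the score at index 0 as the reward signal, which in the example output is the negative class; to reward positive sentiment, index 1 would be used instead.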

In [None]:
text = '雖然美國承諾暫停向中國出售這艘航空母艦'
sentiment_pipe(text, **sent_kwargs)



[[{'label': '負面', 'score': 0.6094347238540649},
  {'label': '正面', 'score': 0.39056524634361267}]]
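
To make the reward extraction less dependent on label order, the sketch below looks up a class score by its label in a result shaped like the output above. The helper `score_for_label` is hypothetical; the literal is copied from the example output rather than produced by running the pipeline.

```python
# Result shaped like the pipeline output shown above (one list of class scores per input text).
pipe_output = [[{'label': '負面', 'score': 0.6094347238540649},
                {'label': '正面', 'score': 0.39056524634361267}]]


def score_for_label(per_text_scores, label):
    """Return the score of `label` from one text's list of class scores."""
    for entry in per_text_scores:
        if entry["label"] == label:
            return entry["score"]
    raise KeyError(label)


negative = score_for_label(pipe_output[0], '負面')
positive = score_for_label(pipe_output[0], '正面')
print(negative, positive)
```

Selecting by label rather than by position makes it explicit which sentiment is being rewarded.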

In [None]:
# text = 'this movie was really good!!'
# sentiment_pipe(text, **sent_kwargs)

The resulting reward signal:

### Load pre-trained GPT2 language models

We load the GPT2 model with a value head and the tokenizer. We load the model twice; the first model is optimized while the second model serves as a reference to calculate the KL-divergence from the starting point. This serves as an additional reward signal in the PPO training to make sure the optimized model does not deviate too much from the original language model.
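
The role of the reference model can be sketched as follows: the per-token difference between the log-probabilities of the optimized model and the reference model serves as a KL estimate, which is scaled by a coefficient and subtracted from the reward, while the classifier score is added on the final token. This is a simplified sketch in the spirit of "Fine-Tuning Language Models from Human Preferences", not the `trl` implementation; the function name and the toy numbers are assumptions.

```python
def kl_shaped_rewards(logprobs, ref_logprobs, score, kl_coef):
    """Combine a per-token KL penalty with a sequence-level score.

    logprobs / ref_logprobs: log-probabilities of the sampled response tokens
    under the optimized model and the reference model.
    """
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("need two non-empty lists of equal length")
    # Penalise each token by how much more likely the tuned model finds it.
    rewards = [-kl_coef * (lp - ref_lp) for lp, ref_lp in zip(logprobs, ref_logprobs)]
    # The classifier score is only available for the full response.
    rewards[-1] += score
    return rewards


# Hypothetical values for a three-token response.
rewards = kl_shaped_rewards(
    logprobs=[-1.0, -0.5, -2.0],
    ref_logprobs=[-1.2, -0.5, -1.5],
    score=0.6,
    kl_coef=0.2,
)
print(rewards)
```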

In [None]:
gpt2_model = GPT2HeadWithValueModel.from_pretrained('theta/gpt2-reporter')
gpt2_model_ref = GPT2HeadWithValueModel.from_pretrained('theta/gpt2-reporter')

gpt2_tokenizer = AutoTokenizer.from_pretrained('uer/gpt2-chinese-cluecorpussmall')
gpt2_tokenizer.eos_token = gpt2_tokenizer.pad_token

Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at theta/gpt2-reporter and are newly initialized: ['v_head.summary.weight', 'v_head.summary.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Some weights of GPT2HeadWithValueModel were not initialized from the model checkpoint at theta/gpt2-reporter and are newly initialized: ['v_head.summary.weight', 'v_head.summary.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Move models to GPU

If `cuda` is available move the computations to the GPU.

In [None]:
gpt2_model.to(device);
gpt2_model_ref.to(device);

### Tokenize the training Data

Download data

In [None]:
!pip install kaggle
!mkdir -p ~/.kaggle
# Never commit real credentials: fill in your own Kaggle username and API key here.
with open("/root/.kaggle/kaggle.json", "w") as f:
  f.write('{"username":"<your-kaggle-username>","key":"<your-kaggle-api-key>"}')

!chmod 600 /root/.kaggle/kaggle.json


In [None]:

from kaggle.api.kaggle_api_extended import KaggleApi
import re
import json
from sklearn.model_selection import train_test_split

api = KaggleApi()
api.authenticate()
api.dataset_download_files("terrychanorg/chinese-simplified-xlsum-v2", path="./", unzip=True)

with open('./chinese_traditional_XLSum_v2.0/chinese_traditional_val.jsonl') as f:
    # Parse one JSON object per non-empty line; json.loads tolerates the trailing newline.
    data = [json.loads(line) for line in f if line.strip()]

def build_text_files(data_json, dest_path):
    with open(dest_path, 'w') as f:
      data = []
      for texts in data_json:
          title = str(texts['title']).strip()
          # text = str(texts['text']).strip()
          # summary = str(texts['summary']).strip()
          # data.append(f"{summary[:40]}BEG;END")
          data.append(f"{title[:40]}BEG;END")
      f.write("\n".join(data))




In [None]:
build_text_files(data,"trainset.txt")

In [None]:
from datasets import load_dataset
ds = load_dataset("text", data_files={"train": ["trainset.txt"]})['train']
ds[:10]



Downloading and preparing dataset text/default to /root/.cache/huggingface/datasets/text/default-bf1ee97f2a6463f9/0.0.0/99cc88223027054f94ce0c7fd69d10eb172910fa0615671283a3c8e5e7af2f9c...


Dataset text downloaded and prepared to /root/.cache/huggingface/datasets/text/default-bf1ee97f2a6463f9/0.0.0/99cc88223027054f94ce0c7fd69d10eb172910fa0615671283a3c8e5e7af2f9c. Subsequent calls will reuse this data.


{'text': ['移民危機：德國將暫時恢復邊境控制BEG;END',
  '「愛丁堡動物園大熊貓流產」BEG;END',
  '父親涉避稅海外 英首相稱家人未受益BEG;END',
  '澳洲大選2019：十一張圖表看懂關鍵點BEG;END',
  '健康小知識：有關冷水浴 你可能想象不到的益處BEG;END',
  '奧巴馬要求美國會擴大打擊IS軍事授權BEG;END',
  '英媒：世紀審判在即 習近平面臨考驗BEG;END',
  '被標上數字的子女！探究加州父母虐童案BEG;END',
  '留學日記：我在倫敦，你在哪BEG;END',
  '東莞官方：「身家20億」官員「違紀」BEG;END']}

The cell below, currently commented out, pre-tokenizes the queries in advance to avoid tokenizing twice: it first encodes each query and keeps the first 50 tokens, then decodes these tokens back to text for later display. In this version of the notebook the queries are instead tokenized inside the training loop.

In [None]:
# def tokenize(sample):
#     sample["tokens"] = gpt2_tokenizer.encode(sample["text"])[:50]
#     sample["query"] = gpt2_tokenizer.decode(sample["tokens"])
#     return sample

# ds = ds.map(tokenize, batched=False)
# ds

In [None]:
ds.set_format("pandas")
ds[:10]

index,text
0,移民危機：德國將暫時恢復邊境控制BEG;END
1,「愛丁堡動物園大熊貓流產」BEG;END
2,父親涉避稅海外 英首相稱家人未受益BEG;END
3,澳洲大選2019：十一張圖表看懂關鍵點BEG;END
4,健康小知識：有關冷水浴 你可能想象不到的益處BEG;END
5,奧巴馬要求美國會擴大打擊IS軍事授權BEG;END
6,英媒：世紀審判在即 習近平面臨考驗BEG;END
7,被標上數字的子女！探究加州父母虐童案BEG;END
8,留學日記：我在倫敦，你在哪BEG;END
9,東莞官方：「身家20億」官員「違紀」BEG;END


### Generation settings
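For the response generation we use plain sampling: top-k and nucleus (top-p) sampling are turned off and no minimum length is enforced. Note that the training loop below calls `respond_to_batch` with `txt_len=150` and does not pass `gen_kwargs`.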
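
To illustrate what turning off top-k and nucleus sampling means, the sketch below filters a toy next-token distribution. With `top_k=0` and `top_p=1.0` every token stays a candidate, so sampling draws from the full distribution. The functions are hypothetical stand-ins written with the standard library; they are not the `transformers` generation code.

```python
import random


def candidate_tokens(probs, top_k=0, top_p=1.0):
    """Return the token ids that survive top-k and nucleus (top-p) filtering."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    if top_p < 1.0:
        kept, cumulative = [], 0.0
        for token_id in ranked:
            kept.append(token_id)
            cumulative += probs[token_id]
            if cumulative >= top_p:
                break
        ranked = kept
    return ranked


def sample_token(probs, top_k=0, top_p=1.0, rng=random):
    """Sample one token id from the (possibly filtered) distribution."""
    ids = candidate_tokens(probs, top_k, top_p)
    weights = [probs[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


probs = [0.5, 0.3, 0.15, 0.05]  # hypothetical next-token probabilities
all_ids = candidate_tokens(probs, top_k=0, top_p=1.0)  # filtering disabled
top2_ids = candidate_tokens(probs, top_k=2)
nucleus_ids = candidate_tokens(probs, top_p=0.7)
token = sample_token(probs)
```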

In [None]:
gen_kwargs = {
    "min_length": -1,
    "max_length": 50,
    "top_k": 0,
    "top_p": 1.0,
    "do_sample": True,
}

## Optimize model

### Dataloader
We use a dataloader to return the batches of queries used for each PPO epoch:

In [None]:
def collater(data):
    return dict((key, [d[key] for d in data]) for key in data[0])

dataloader = torch.utils.data.DataLoader(ds, batch_size=config['batch_size'], collate_fn=collater)
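
As a small illustration of what `collater` does, the sketch below applies the same logic to two toy samples. It uses plain dictionaries with made-up titles rather than the pandas-formatted rows the dataset yields in this notebook, so it only shows the list-of-dicts to dict-of-lists reshaping.

```python
def collater(data):
    """Turn a list of sample dicts into one dict of lists, keyed like the first sample."""
    return {key: [d[key] for d in data] for key in data[0]}


# Hypothetical samples standing in for dataset rows.
samples = [{"text": "first title"}, {"text": "second title"}]
batch = collater(samples)
print(batch)
```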

### Training loop

The training loop consists of the following main steps:
1. Get the query responses from the policy network (GPT-2)
2. Get sentiments for query/responses from BERT
3. Optimize policy with PPO using the (query, response, reward) triplet

**Training time**

This step takes **~2h** on a V100 GPU with the above specified settings.

In [None]:
ppo_trainer = PPOTrainer(gpt2_model, gpt2_model_ref, gpt2_tokenizer, **config)

total_ppo_epochs = 100  # int(np.ceil(config["steps"]/config['batch_size']))

for epoch, batch in tqdm(zip(range(total_ppo_epochs), iter(dataloader))):
    print(f"batch: {epoch}/{total_ppo_epochs}")
    logs, timing = dict(), dict()
    t0 = time.time()
    # query_tensors = [torch.tensor(t).long().to(device) for t in batch["tokens"]]
    
    #### Get response from gpt2
    t = time.time()
    response_tensors = []
    query_tensors = []
    batch['response'] = []
    for i in range(config['batch_size']):
        # print(query_tensors[i].squeeze(dim=-1))
        # gen_len = 256
        query_text = batch["text"][i][0]
        query_tensor = gpt2_tokenizer.encode(query_text, return_tensors="pt").to(device)
        # print(query_text, query_tensor)
        response_tensor = respond_to_batch(gpt2_model, query_tensor,txt_len=150)
        response_text = gpt2_tokenizer.decode(response_tensor[0,:])
        query_tensors.append(query_tensor)
        response_tensors.append(response_tensor)
        batch['response'].append(response_text)

    # print(batch['response'])
    timing['time/get_response'] = time.time()-t

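    #### Compute sentiment score
    t = time.time()
    # Only the generated responses are scored; the queries are not prepended.
    texts = list(batch['response'])
    pipe_outputs = sentiment_pipe(texts, **sent_kwargs)
    # Index 0 is the first label returned by the classifier (the negative class 負面 in the example above).
    rewards = torch.tensor([output[0]["score"] for output in pipe_outputs]).to(device)
    timing['time/get_sentiment_preds'] = time.time()-t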
    
    #### Run PPO step 
    t = time.time()
    stats = ppo_trainer.step([q[0] for q in query_tensors], [r[0] for r in response_tensors], [w for w in rewards])
    timing['time/optimization'] = time.time()-t
     
    #### Log everything
    timing['time/epoch'] = time.time()-t0
    table_rows = [list(r) for r in zip(batch['text'], batch['response'], rewards.cpu().tolist())]
    # logs.update({'game_log': wandb.Table(columns=['query', 'response', 'reward'], rows=table_rows)})
    logs.update(timing)
    logs.update(stats)
    logs['env/reward_mean'] = torch.mean(rewards).cpu().numpy()
    logs['env/reward_std'] = torch.std(rewards).cpu().numpy()
    logs['env/reward_dist'] = rewards.cpu().numpy()
    # wandb.log(logs)
    


0it [00:00, ?it/s]
batch: 0/100
1it [01:30, 90.12s/it]
batch: 1/100
[progress output for the intermediate batches omitted]
batch: 99/100
100it [2:11:19, 78.80s/it]


### Training progress
If you are tracking the training progress with Weights & Biases (the `wandb.log` call in the training loop above is commented out), you should see a plot similar to the one below. Check out the interactive sample report on wandb.ai: [link](https://app.wandb.ai/lvwerra/trl-showcase/runs/1jtvxb1m/).

<div style="text-align: center">
<p style="text-align: center;"> <b>Figure:</b> Reward mean and distribution evolution during training. </p>
</div>

One can observe how the model starts to generate more positive outputs after a few optimisation steps.

> Note: Investigating the KL-divergence will probably show that at this point the model has not yet converged to the target KL-divergence. To get there would require longer training or starting with a higher initial coefficient.
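
The KL coefficient mentioned in the note is initialised from `init_kl_coef` and steered towards `target` over `horizon` steps. A minimal sketch of such an adaptive controller, following the scheme described in "Fine-Tuning Language Models from Human Preferences", could look like this. The class name and the clipping range of 0.2 are assumptions for illustration and should not be read as the exact `trl` code.

```python
class AdaptiveKLController:
    """Adjust the KL coefficient so the observed KL drifts towards a target."""

    def __init__(self, init_kl_coef, target, horizon):
        self.value = init_kl_coef
        self.target = target
        self.horizon = horizon

    def update(self, current_kl, n_steps):
        # Relative error of the observed KL, clipped to limit the step size (assumed range).
        proportional_error = max(-0.2, min(0.2, current_kl / self.target - 1))
        self.value *= 1 + proportional_error * n_steps / self.horizon
        return self.value


# Values taken from the configuration above; the KL readings are hypothetical.
controller = AdaptiveKLController(init_kl_coef=0.2, target=6, horizon=10000)
coef_after_high_kl = controller.update(current_kl=12.0, n_steps=32)
coef_after_low_kl = controller.update(current_kl=3.0, n_steps=32)
print(coef_after_high_kl, coef_after_low_kl)
```

With a horizon of 10000 and 32 samples per step, each update changes the coefficient by well under one percent, which is consistent with the note that a higher initial coefficient or longer training would be needed to reach the target.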

## Model inspection
Let's inspect some examples from the dataset. We can use `gpt2_model_ref` to compare the tuned model `gpt2_model` against the model before optimisation.

In [None]:
#### get a batch from the dataset
bs = 16
game_data = dict()
ds.set_format("pandas")
df_batch = ds[:].sample(bs)
game_data['query'] = df_batch['text'].tolist()

response_tensors_ref, response_tensors = [], []

#### get response from gpt2 and gpt2_ref
for i in range(bs):
    text = game_data['query'][i]
    # Encode the sampled query `text` (not the stale `query_text` left over from the training loop).
    text_tensor = gpt2_tokenizer.encode(text, return_tensors="pt").to(device)
    response_tensor = respond_to_batch(gpt2_model, text_tensor,txt_len=150)
    response_tensor_ref = respond_to_batch(gpt2_model_ref, text_tensor, txt_len=150)
    response_tensors.append(response_tensor)
    response_tensors_ref.append(response_tensor_ref)

#### decode responses
game_data['response (before)'] = [gpt2_tokenizer.decode(response_tensors_ref[i][0]) for i in range(bs)]
game_data['response (after)'] = [gpt2_tokenizer.decode(response_tensors[i][0]) for i in range(bs)]

#### sentiment analysis of query/response pairs before/after
texts = [r for q,r in zip(game_data['query'], game_data['response (before)'])]
game_data['rewards (before)'] = [output[0]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

texts = [r for q,r in zip(game_data['query'], game_data['response (after)'])]
game_data['rewards (after)'] = [output[0]["score"] for output in sentiment_pipe(texts, **sent_kwargs)]

# store results in a dataframe
df_results = pd.DataFrame(game_data)
df_results

index,query,response (before),response (after),rewards (before),rewards (after)
0,紅色權貴吳小暉被公訴 涉集資詐騙及職務侵佔BEG;END,案 件 的 死 者 家 屬 上 周 六 （ 6 月 2 日 ） 約 300 名 俄 羅 斯 ...,[UNK] [UNK] 是 土 耳 其 總 理 埃 爾 多 安 的 一 張 名 單 上 所 ...,0.621991,0.59426
1,AA：英加油站汽油銷量五年銳減20%BEG;END,[UNK] 最 高 法 院 5 月 26 日 在 土 耳 其 北 部 城 市 和 城 市 發...,要 求 政 府 停 止 涉 及 協 議 聲 明 和 通 訊 內 容 的 通 訊 ， 網 民 ...,0.452352,0.584721
2,歐盟嚴查黃金護照塞浦路斯收回26名富豪國籍BEG;END,息 事 寧 人 的 [UNK] 號 導 彈 護 衛 艦 現 在 已 盡 快 完 全 停 用 ...,如 果 土 耳 其 過 此 一 決 定 ， 相 信 該 國 安 全 令 會 作 出 更 重 ...,0.668677,0.588505
3,哥特妝女生廣州乘地鐵遭拒，引中國網友自拍抗議BEG;END,[UNK] [UNK] [UNK] [UNK] 首 席 執 行 官 休 · 白 恩 斯 （ ...,穆 勒 之 子 埃 爾 多 安 位 於 伊 斯 坦 布 爾 的 奧 馬 爾 · 伊 斯 坦 ...,0.580071,0.551183
4,“中国等地司法界存在贪腐问题”BEG;END,信 半 個 多 世 紀 前 的 埃 爾 多 安 電 影 《 了 不 起 的 摩 洛 哥 》 ...,手 機 維 護 中 心 的 會 議 室 已 關 閉 。 土 耳 其 總 理 埃 爾 多 安 ...,0.571822,0.615362
5,湖南湘潭民企老闆市政府15樓跳樓自殺BEG;END,哥 倫 比 亞 廣 播 電 台 發 射 電 視 [UNK] 與 節 目 主 持 人 發 生 ...,此 前 曾 表 示 會 將 阻 止 土 耳 其 控 告 反 對 削 弱 反 對 派 的 土 ...,0.600191,0.615872
6,中國官媒：玉兔月球車已全面蘇醒BEG;END,月 11 日 為 期 三 天 的 民 眾 查 封 高 趕 鋒 刑 期 恢 復 法 院 的 禁...,此 前 ， 土 耳 其 檢 察 官 說 ， 政 府 暫 停 了 單 個 司 法 機 構 總 ...,0.576134,0.625983
7,里約2016：中國女足展露才華勝南非BEG;END,帖 登 揭 露 的 個 別 被 關 於 數 據 攻 擊 的 莽 撞 醜 聞 赫 施 拉 特 ...,持 續 關 注 土 耳 其 總 統 府 發 言 人 約 克 · 哈 吉 茲 蒂 爾 星 期 ...,0.575866,0.577031
8,無薪實習「只有富人家孩子能承擔」BEG;END,通 一 斷 言 ， 土 耳 其 認 同 政 府 的 報 復 措 施 要 小 得 多 ， 負 ...,這 一 「 間 諜 」 的 規 定 本 身 就 根 深 蒂 固 。 有 學 者 認 為 ， ...,0.540443,0.599033
9,「我們約定，要有一個人活下來把事情告訴大家」BEG;END,成 英 吉 拉 斯 軟 件 公 司 ( [UNK] [UNK] ) 把 請 求 令 延 期 ...,能 夠 打 開 福 樂 長 久 的 屏 蔽 真 是 我 的 福 音 ， 最 少 有 兩 部 ...,0.618186,0.69548


Looking at the reward mean/median of the generated sequences, we observe only a small increase after optimisation (about 0.01 in both mean and median for this sample).

In [None]:
print('mean:')
display(df_results[["rewards (before)", "rewards (after)"]].mean())
print()
print('median:')
display(df_results[["rewards (before)", "rewards (after)"]].median())

mean:


rewards (before)    0.583381
rewards (after)     0.592415
dtype: float64


median:


rewards (before)    0.580654
rewards (after)     0.591383
dtype: float64

## Save model
Finally, we save the model and push it to the Hugging Face Hub for later use.

In [None]:
!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()


In [None]:
gpt2_model.save_pretrained('./temp_model')
gpt2_tokenizer.save_pretrained('./temp_model')

('./gpt-reporter-badplace/tokenizer_config.json',
 './gpt-reporter-badplace/special_tokens_map.json',
 './gpt-reporter-badplace/vocab.txt',
 './gpt-reporter-badplace/added_tokens.json',
 './gpt-reporter-badplace/tokenizer.json')

In [None]:
# !zip -r gpt2.zip gpt-reporter-badplace
# from google.colab import files
# files.download("gpt2.zip")

In [None]:
# 'gpt-reporter-badplace'
from transformers import AutoModelWithLMHead, TrainingArguments, Trainer

model = AutoModelWithLMHead.from_pretrained("temp_model")


training_args = TrainingArguments(
    output_dir="./gpt-reporter-badplace", #The output directory
    push_to_hub=True,
    hub_model_id="theta/gpt2-reporter-badplace"
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=None,
    eval_dataset=None,
    compute_metrics=None,
)
trainer.push_to_hub(commit_message="update Label Text")


loading configuration file gpt-reporter-badplace2/config.json
Model config GPT2Config {
  "_name_or_path": "gpt-reporter-badplace2",
  "activation_function": "gelu_new",
  "architectures": [
    "GPT2HeadWithValueModel"
  ],
  "attn_pdrop": 0.1,
  "bos_token_id": 50256,
  "embd_pdrop": 0.1,
  "eos_token_id": 50256,
  "gradient_checkpointing": false,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_epsilon": 1e-05,
  "model_type": "gpt2",
  "n_ctx": 1024,
  "n_embd": 768,
  "n_head": 12,
  "n_inner": null,
  "n_layer": 12,
  "n_positions": 1024,
  "output_past": true,
  "reorder_and_upcast_attn": false,
  "resid_pdrop": 0.1,
  "scale_attn_by_inverse_layer_idx": false,
  "scale_attn_weights": true,
  "summary_activation": null,
  "summary_first_dropout": 0.1,
  "summary_proj_to_labels": true,
  "summary_type": "cls_index",
  "summary_use_proj": true,
  "task_specific_params": {
    "text-generation": {
      "do_samp

Saving model checkpoint to ./gpt-reporter-badplace
Configuration saved in ./gpt-reporter-badplace/config.json
Model weights saved in ./gpt-reporter-badplace/pytorch_model.bin


remote: Scanning LFS files for validity, may be slow...        
remote: LFS file scan complete.        
To https://huggingface.co/theta/gpt2-reporter-badplace
   5759267..47089cc  main -> main

Dropping the following result as it does not have all the necessary fields:
{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}}
To https://huggingface.co/theta/gpt2-reporter-badplace
   47089cc..b914f57  main -> main




'https://huggingface.co/theta/gpt2-reporter-badplace/commit/47089cc8442dcc0944bd1bdfca5484323113a72d'

## Test model

In [None]:
from transformers import pipeline, AutoModelWithLMHead
MODEL_NAME = 'uer/gpt2-chinese-cluecorpussmall'
model = AutoModelWithLMHead.from_pretrained('theta/gpt2-reporter-badplace')
reporter = pipeline('text-generation',model=model, tokenizer=MODEL_NAME)




In [None]:
reporter('總統宣布國防預算大漲BEG;END',max_length=800)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': '總統宣布國防預算大漲BEG;END 國 防 預 算 大 漲 是 因 為 中 國 已 經 向 蘇 聯 出 售 部 分 產 品 的 價 值 大 約 3000 億 美 元 。 此 前 多 一 點 也 對 中 國 對 美 國 在 科 技 改 編 方 面 投 資 的 情 報 收 入 構 成 重 大 問 題 。 現 在 美 國 和 蘇 聯 一 樣 明 白 ， 中 國 正 在 發 揮 越 來 越 大 的 作 用 ， 需 要 更 具 戰 略 定 位 的 國 防 部 。 因 此 ， 中 國 必 須 對 其 強 大 的 地 方 採 取 有 效 地 措 施 ， 這 意 味 著 中 國 對 美 國 能 夠 在 地 海 上 活 動 的 情 報 收 入 增 加 了 。 他 說 ： " 我 們 現 在 有 了 強 大 的 基 於 國 家 利 益 的 軍 事 策 略 ， 有 了 它 ， 就 可 以 有 足 夠 的 技 術 和 高 能 力 的 科 技 。 " 中 國 的 反 應 正 處 在 一 個 相 似 的 境 地 ， 這 種 不 同 的 觀 點 使 得 中 美 兩 國 的 政 策 都 非 常 不 一 樣 。 在 中 國 方 面 ， 很 多 中 國 領 導 人 擔 心 他 們 在 中 美 互 利 的 關 係 中 都 很 難 得 到 政 策 支 持 ， 而 且 他 們 也 擔 心 中 美 兩 國 在 某 些 時 候 有 些 必 然 性 在 發 生 非 常 大 的 同 步 。 中 美 兩 國 的 政 策 都 非 常 不 一 樣 。 但 是 根 据 他 們 的 觀 察 ， 中 國 政 府 一 些 高 科 技 行 業 的 投 入 已 經 大 大 減 少 了 ， 甚 至 也 不 能 說 是 大 幅 增 加 ， 以 至 於 整 個 行 業 都 很 難 得 到 投 資 。 但 是 中 美 的 高 科 技 項 目 往 往 都 非 常 有 用 。 比 如 有 的 企 業 需 要 大 量 的 新 技 術 來 支 持 新 的 公 司 ， 比 如 對 互 聯 網 的 投 入 ， 投 资 甚 至 可 以 通 過 這 些 軟 件 來 跟 其 他 行 業 進 行 合 作 。 中 國 的 許 多 基 於 互 聯 網 的 技 術 研 究 都 是 通 過 中 國 來 進 行 的 ， 但 是 