[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/xxbidiao/plug-and-blend/blob/main/blending_generation_demo_colab.ipynb)

## Introduction
Plug-and-blend generates stories based on a prompt and one or more continuously weighted topics.

Here we show off some capabilities of Plug-and-blend, to illustrate its generation potential and how it can be used in an interactive setting.

## Setup
First, let's download and set up the code and the models.

In [None]:
# Downloading the GeDi modifier model.
!wget https://storage.googleapis.com/sfr-gedi-data/gedi_topic.zip
import zipfile
with zipfile.ZipFile('gedi_topic.zip', 'r') as zip_ref:
    zip_ref.extractall('./')

--2021-03-31 02:01:03--  https://storage.googleapis.com/sfr-gedi-data/gedi_topic.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.74.128, 209.85.145.128, 142.250.125.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.74.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1318630072 (1.2G) [application/zip]
Saving to: ‘gedi_topic.zip’


2021-03-31 02:01:14 (125 MB/s) - ‘gedi_topic.zip’ saved [1318630072/1318630072]
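The cell above extracts the archive every time it runs and does not check it for corruption. Below is a minimal sketch of a more defensive variant, using only the standard library. `extract_if_needed` is a hypothetical helper and not part of Plug-and-blend. The `marker` argument is an assumption about what the archive contains.

```python
import os
import zipfile


def extract_if_needed(zip_path, target_dir, marker):
    """Extract zip_path into target_dir unless `marker` already exists there.

    `marker` is a path relative to target_dir that should exist after a
    successful extraction. Returns True if extraction ran, False if skipped.
    """
    if os.path.exists(os.path.join(target_dir, marker)):
        return False
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = zip_ref.testzip()
        if bad_member is not None:
            raise zipfile.BadZipFile("Corrupt member: " + bad_member)
        zip_ref.extractall(target_dir)
    return True
```

For example, `extract_if_needed('gedi_topic.zip', './', 'gedi_topic')` would skip the extraction on a rerun, assuming the archive unpacks into a `gedi_topic/` folder.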



Then we download the code archive and the base LM.

Here we provide our GPT2-large model fine-tuned on the ROCStories dataset. You can fine-tune your own model on a different dataset; as long as its tokenization is compatible with GPT-2, it should work.

In [None]:
!gdown --id 1mkNr7unvQKBWayTZPSFlM7XhVMN8iNxA
!unzip plug_and_blend_r1.zip

!gdown --id 1Bhgfp2rZoCfU5tDPxZr5LV36WUfJXNOL
!unzip rocstories_gpt2_large.zip -d baselm

Downloading...
From: https://drive.google.com/uc?id=1mkNr7unvQKBWayTZPSFlM7XhVMN8iNxA
To: /content/plug_and_blend_r1.zip
  0% 0.00/35.4k [00:00<?, ?B/s]100% 35.4k/35.4k [00:00<00:00, 58.1MB/s]
Archive:  plug_and_blend_r1.zip
   creating: gedi_helpers/
  inflating: gedi_helpers/modeling_gpt2.py  
  inflating: gedi_helpers/modeling_utils.py  
  inflating: gedi_skill.py           
  inflating: gedi_story_gen.py       
Downloading...
From: https://drive.google.com/uc?id=1Bhgfp2rZoCfU5tDPxZr5LV36WUfJXNOL
To: /content/rocstories_gpt2_large.zip
463MB [00:02, 206MB/s]
Archive:  rocstories_gpt2_large.zip
  inflating: baselm/config.json      
  inflating: baselm/pytorch_model.bin  


Finally, install these dependencies. (Colab may ask you to restart the runtime, since we use an older version of `torch`.)

In [None]:
!pip install transformers==3.5.1
!pip install torch==1.4.0
import nltk
nltk.download('punkt')

Collecting transformers==3.5.1
  Downloading https://files.pythonhosted.org/packages/3a/83/e74092e7f24a08d751aa59b37a9fc572b2e4af3918cb66f7766c3affb1b4/transformers-3.5.1-py3-none-any.whl (1.3MB)
Collecting sentencepiece==0.1.91
  Downloading https://files.pythonhosted.org/packages/f2/e2/813dff3d72df2f49554204e7e5f73a3dc0f0eb1e3958a4cad3ef3fb278b7/sentencepiece-0.1.91-cp37-cp37m-manylinux1_x86_64.whl (1.1MB)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Collecting tokenizers==0.9.3
  Downloading https://files.pythonhosted.org/packages/7b/ac/f5ba028f0f097d855e1541301e946d4672eb0f30b6e25cb2369075f916d2/tokenizers-0.9.3-cp37-cp37m-manylinux1_x86_64.whl (2.9MB

True

## Prepare the models

In [None]:
from gedi_skill import GediSkill

## Load base model
In this notebook, all models and their paths are already set up for you.
However, you can also bring your own base LM.
For this demo, any model that uses the same vocabulary as GPT-2 will work.

If `base_model_path` is not `None`, it is treated as the path to the base model.
Otherwise, the original `gpt2-large` model is used.
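The fallback rule above can be sketched as a small helper. This is an illustration of the stated behavior, not the code `GediSkill` actually runs. `resolve_base_model` and `DEFAULT_BASE_MODEL` are hypothetical names. The file check assumes a checkpoint directory laid out like the `baselm/` folder unzipped earlier (`config.json` and `pytorch_model.bin`).

```python
import os

DEFAULT_BASE_MODEL = "gpt2-large"  # the stock model named in the text above


def resolve_base_model(base_model_path):
    """Return the custom path if one is given, else the default model name."""
    if base_model_path is None:
        return DEFAULT_BASE_MODEL
    # Assumption: a usable checkpoint directory holds these two files,
    # as the baselm/ folder in this notebook does.
    expected = ("config.json", "pytorch_model.bin")
    missing = [name for name in expected
               if not os.path.isfile(os.path.join(base_model_path, name))]
    if missing:
        raise FileNotFoundError(
            "Missing in %s: %s" % (base_model_path, ", ".join(missing)))
    return base_model_path
```

Checking the directory up front gives a clearer error than a failure partway through model loading.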

In [None]:
base_model_path = "baselm/"
gedi_path = "gedi_topic/"


## Demo 1: Let's generate some sentences!
In this demo, we show how our blending generation language model generates a single sentence that blends several topics.

First, we load the models (this may take some time):

In [None]:
gedi_skill_obj = GediSkill(base_model_path,gedi_path)

no logit scale initialized for gpt2


In [None]:
# Set the parameters needed to generate the sentence.
# First, provide a prompt.
sentence = "Welcome to Plug and Blend!"

# Then provide control codes (topics) and their weights. A topic can be any one-token word, so try something else!
# `Sports`, `Science`, `Business`, and `World` work best, but zero-shot topics are supported too.
topic = {"Science": 0.5, "Business": 0.5}

# Finally, set the control strength. A value above 1 steers generation more strongly towards the specified topics,
# but may reduce fluency (especially when the sentence is short and the control strength is very high).
# For example, setting it to 2 means 2x control strength.
control_strength = 1

# For ease of demo, scale it to the internal strength, so that a control_strength of 1 maps to 30, the baseline value.
control_strength = control_strength * gedi_skill_obj.disc_weight

# Now wait for the sentence to be generated.
text = gedi_skill_obj.generate_one_sentence(
    sentence=sentence,
    topic=topic,
    extra_args={"disc_weight": control_strength},
)
print(text)
print("OK!")

 We're excited to share our passion for making delicious, healthy food.
OK!
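To see how the topic weights change the output, you can sweep the blend between two topics. The sketch below only builds the weight dictionaries; `blend_sweep` is a hypothetical helper, not part of Plug-and-blend. Zero-weight topics are dropped, which is an assumption modeled on the planner output shown in Demo 2, where single-topic configurations list only one key.

```python
def blend_sweep(topic_a, topic_b, steps):
    """Return `steps` topic dicts moving linearly from all topic_a to all topic_b."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    configs = []
    for i in range(steps):
        weight_b = i / (steps - 1)
        weights = {topic_a: 1.0 - weight_b, topic_b: weight_b}
        # Keep only topics that actually contribute to this blend.
        configs.append({name: w for name, w in weights.items() if w > 0})
    return configs


# Possible use with the objects defined in the cells above (not run here):
# for topic in blend_sweep("Science", "Business", 5):
#     print(topic, gedi_skill_obj.generate_one_sentence(
#         sentence=sentence, topic=topic,
#         extra_args={"disc_weight": control_strength}))
```

Generation is sampled, so rerunning the same blend may produce a different sentence.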


## Demo 2: Run the story generation experience

In this demo, you provide Control Sketches (described in the paper), and the planner uses them to generate a 10-sentence story for you.

An agent will interactively ask you for (1) a zero-based topic index (e.g. 0 for `Sports` with the default `['Sports','Science']`), (2) a start point of the sketch, which specifies the sentence index (from 0 to 9) where the topic's effect should begin, and (3) an end point of the sketch. See the previous-run log for examples.
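To build intuition for how sketches become per-sentence topic weights, here is a toy planner. It is an illustration under stated assumptions, not the planner's actual formula, and it will not reproduce the numbers in the previous-run log. It assumes each sketch covers sentences `[start, end)` and contributes a Gaussian bump centered on the middle of that span. `plan_weights` is a hypothetical name.

```python
import math


def plan_weights(sketches, num_sentences=10):
    """Turn (topic, start, end) sketches into one normalized topic dict per sentence.

    Assumption: a Gaussian bump per sketch, normalized across topics.
    """
    if not sketches:
        raise ValueError("at least one sketch is required")
    plan = []
    for position in range(num_sentences):
        raw = {}
        for topic, start, end in sketches:
            center = (start + end - 1) / 2.0
            width = max((end - start) / 2.0, 0.5)
            bump = math.exp(-((position - center) ** 2) / (2.0 * width ** 2))
            raw[topic] = raw.get(topic, 0.0) + bump
        total = sum(raw.values())
        # Normalize so the weights for each sentence sum to 1.
        plan.append({topic: value / total for topic, value in raw.items()})
    return plan


plan = plan_weights([("Sports", 0, 5), ("Science", 5, 10)])
for index, config in enumerate(plan):
    print(index, config)
```

With these two sketches, the early sentences lean towards `Sports` and the later ones towards `Science`, with a gradual hand-over in the middle.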

(Colab may run out of memory if you run this demo after Demo 1. If that happens, restart the Colab runtime.)

For `topics`, you can choose from `Sports`, `Science`, `Business`, `World`, or any other word that is tokenized into a single token (extra tokens will be ignored).
You can provide more than two topics in `topics`. Try adding more, and have fun!
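Since extra tokens are ignored, it can help to check a candidate topic before using it. The sketch below separates single-token words from multi-token ones using any encoder you pass in. `split_by_token_count` is a hypothetical helper. Whether Plug-and-blend encodes a topic exactly this way (for example, with or without a leading space) is not confirmed here, so treat the result as a hint.

```python
def split_by_token_count(topics, encode):
    """Partition topics into single-token and multi-token words.

    `encode` is any callable that maps a string to a list of token ids.
    """
    single, multi = [], []
    for topic in topics:
        token_ids = encode(topic)
        (single if len(token_ids) == 1 else multi).append(topic)
    return single, multi


# Possible use with the GPT-2 tokenizer from transformers (not run here):
# from transformers import GPT2Tokenizer
# encode = GPT2Tokenizer.from_pretrained("gpt2").encode
# print(split_by_token_count(["Sports", "Science", "Photosynthesis"], encode))
```

Passing the encoder as an argument keeps the helper usable with whichever tokenizer your base LM uses.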

In [None]:
# The code archive unzipped during Setup provides gedi_story_gen.py.
from gedi_story_gen import run_demo
run_demo(
    base_language_model_path = base_model_path, # use None to use original GPT2 model.
    gedi_path=gedi_path,
    topics=['Sports','Science']
)




Local location variables not used.
Topics available: ['Sports', 'Science'] (Configure it in code)
no logit scale initialized for gpt2
Starting a new sketch. Input index of topic, or no input if no more new sketches:0
This sketch is for topic: Sports
Area to apply, start?0
Area to apply, end?5
Starting a new sketch. Input index of topic, or no input if no more new sketches:1
This sketch is for topic: Science
Area to apply, start?5
Area to apply, end?10
Starting a new sketch. Input index of topic, or no input if no more new sketches:


  0%|          | 0/10 [00:00<?, ?it/s]

Now generating...
Planner output: [{'Sports': 1.0}, {'Sports': 1.0}, {'Sports': 0.9077361202459936, 'Science': 0.09226387975400638}, {'Sports': 0.8155224270282758, 'Science': 0.1844775729717242}, {'Sports': 0.6651435733326628, 'Science': 0.3348564266673373}, {'Sports': 0.4716058536173564, 'Science': 0.5283941463826437}, {'Sports': 0.2862435283632493, 'Science': 0.7137564716367507}, {'Sports': 0.15268457320416287, 'Science': 0.8473154267958372}, {'Sports': 0.0749034043928075, 'Science': 0.9250965956071925}, {'Science': 1.0}]


100%|██████████| 10/10 [02:41<00:00, 16.15s/it]

[Sentence 0 is using Configuration {'Sports': 1.0}]
 Jackie Robinson was playing in the NBA.
[Sentence 1 is using Configuration {'Sports': 1.0}]
 He had just played basketball for a few years.
[Sentence 2 is using Configuration {'Sports': 0.9077361202459936, 'Science': 0.09226387975400638}]
  He was looking forward to his first game of the season.
[Sentence 3 is using Configuration {'Sports': 0.8155224270282758, 'Science': 0.1844775729717242}]
  He decided to play with his friends and play against them in the court.
[Sentence 4 is using Configuration {'Sports': 0.6651435733326628, 'Science': 0.3348564266673373}]
     He got a lot of feedback from everyone who played against him, including some that were very excited about it!
[Sentence 5 is using Configuration {'Sports': 0.4716058536173564, 'Science': 0.5283941463826437}]
 I was really happy when I saw how he played.
[Sentence 6 is using Configuration {'Sports': 0.2862435283632493, 'Science': 0.7137564716367507}]
  I also had to admit 


