In [3]:
from transformers import GPTNeoForCausalLM, GPT2Tokenizer

model = GPTNeoForCausalLM.from_pretrained('EleutherAI/gpt-neo-1.3B')
tokenizer = GPT2Tokenizer.from_pretrained('EleutherAI/gpt-neo-1.3B')

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development


In [4]:
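import torch  # backend needed for return_tensors='pt'

# Encode one sentence into a (1, sequence_length) tensor of token ids.
# The name input_ids avoids shadowing Python's built-in input().
input_ids = tokenizer.encode("I evaluated the performance of GPT-Neo developed by OpenAI.", return_tensors='pt')

print(input_ids[0])

# Decoding the ids recovers the original text.
print(tokenizer.decode(input_ids[0]))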

tensor([   40, 16726,   262,  2854,   286,   402, 11571,    12,  8199,    78,
         4166,   416,  4946, 20185,    13])
I evaluated the performance of GPT-Neo developed by OpenAI.


In [5]:
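# The GPT-2 tokenizer ships without a pad token, so register one for batching.
# '[PAD]' gets a new id (50257); call model.resize_token_embeddings(len(tokenizer))
# before feeding padded batches to the model.
tokenizer.add_special_tokens({'pad_token': '[PAD]'})

batch = tokenizer.batch_encode_plus(
    ["I evaluated the performance of GPT-Neo developed by OpenAI.",
     "I evaluated the performance of GPT developed by OpenAI."],
    padding=True, truncation=True, return_tensors="pt")

print(batch['input_ids'])

# Special tokens are kept when decoding, so the padding stays visible.
print(tokenizer.batch_decode(batch['input_ids']))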

tensor([[   40, 16726,   262,  2854,   286,   402, 11571,    12,  8199,    78,
          4166,   416,  4946, 20185,    13],
        [   40, 16726,   262,  2854,   286,   402, 11571,  4166,   416,  4946,
         20185,    13, 50257, 50257, 50257]])
['I evaluated the performance of GPT-Neo developed by OpenAI.', 'I evaluated the performance of GPT developed by OpenAI.[PAD][PAD][PAD]']
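
The padded tensor above can be reproduced by hand, which also shows what the accompanying attention mask looks like. The following is a minimal pure-Python sketch, not the tokenizer's actual implementation; `pad_batch` and `PAD_ID` are hypothetical names, and the token ids are copied from the output above.

```python
# Hypothetical constant mirroring the '[PAD]' id shown in the output above.
PAD_ID = 50257

def pad_batch(sequences, pad_id=PAD_ID):
    """Right-pad every sequence to the longest length and build a matching mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask

first = [40, 16726, 262, 2854, 286, 402, 11571, 12, 8199, 78, 4166, 416, 4946, 20185, 13]
second = [40, 16726, 262, 2854, 286, 402, 11571, 4166, 416, 4946, 20185, 13]

ids, mask = pad_batch([first, second])
print(ids[1][-3:])  # padding ids appended to the shorter sequence
print(mask[1])      # attention mask for the shorter sequence
```

The tokenizer returns the same kind of mask under the `attention_mask` key of the batch, next to `input_ids`.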


In [6]:
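# max_length=5 with truncation=True cuts every sequence to 5 tokens,
# so padding=True has nothing left to pad in this batch.
input = tokenizer.batch_encode_plus(
    ["I evaluated the performance of GPT2 developed by OpenAI.",
     "Vaccine for new coronavirus in the UK",
     "3.1415926535"],
    max_length=5, truncation=True, padding=True, return_tensors='pt')

input['input_ids']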

tensor([[   40, 16726,   262,  2854,   286],
        [   53,  4134,   500,   329,   649],
        [   18,    13,  1415, 19707, 22980]])
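
Every row above has exactly five ids because truncation keeps only the first `max_length` tokens of each sequence. A minimal sketch of that step, assuming the default right-side truncation; `truncate_batch` is a hypothetical helper, and the ids are copied from an earlier cell's output.

```python
def truncate_batch(sequences, max_length):
    """Keep only the first max_length ids of each sequence."""
    return [list(seq)[:max_length] for seq in sequences]

# Ids of "I evaluated the performance of GPT developed by OpenAI."
full = [40, 16726, 262, 2854, 286, 402, 11571, 4166, 416, 4946, 20185, 13]

truncated = truncate_batch([full], max_length=5)
print(truncated[0])  # [40, 16726, 262, 2854, 286]
```

The model therefore only sees "I evaluated the performance of" as the prompt, which is why the generated text further down continues from that point.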

In [7]:
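# No attention_mask or pad_token_id is passed, which triggers the warnings below.
# With no length argument, generate() falls back to its default limit,
# which keeps the continuations short.
generated = model.generate(input['input_ids'])

# One generated sequence per prompt
len(generated)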

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


3
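
The warnings above ask for an explicit `attention_mask` and `pad_token_id`. One way to supply them is sketched below, under stated assumptions: `generate_with_mask` is a hypothetical helper, it reuses the EOS token for padding (so no new embedding row is needed, unlike the `[PAD]` token registered earlier), and it pads on the left, which is generally recommended for decoder-only models in batched generation.

```python
def generate_with_mask(model, tokenizer, texts, max_length=5, max_new_tokens=20):
    """Hypothetical helper: batch-generate with an explicit attention mask."""
    # Assumption: reuse EOS as the pad token so its id is already in the model's vocabulary.
    tokenizer.pad_token = tokenizer.eos_token
    # Pad on the left so each prompt ends right where generation starts.
    tokenizer.padding_side = "left"
    enc = tokenizer(texts, padding=True, truncation=True,
                    max_length=max_length, return_tensors="pt")
    output_ids = model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],  # marks padding positions to ignore
        pad_token_id=tokenizer.pad_token_id,   # set explicitly instead of relying on the fallback
        max_new_tokens=max_new_tokens,         # caps only the newly generated tokens
    )
    # Drop padding/EOS markers from the decoded text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With the `model` and `tokenizer` loaded in the first cell, a call such as `generate_with_mask(model, tokenizer, ["Vaccine for new coronavirus in the UK", "3.1415926535"])` would return one decoded string per prompt.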

In [8]:
generated_text = tokenizer.batch_decode(generated)

for i, sentence in enumerate(generated_text):
    print(f'No.{i+1}')
    print(f'{sentence}\n')

No.1
I evaluated the performance of the proposed method on the real-world dataset. The results are shown in Table \[tab:

No.2
Vaccine for new-borns

The vaccine for new-borns is a vaccine for the prevention of diseases

No.3
3.141592653589793238462643383279502884197169399375105820

