<a href="https://colab.research.google.com/github/thamarai1177/colabgemini/blob/main/TextGenerationPipeline_Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Transformers Generation Pipeline Tutorial
Generate text in 2 to 5 lines of code with any `transformers` language model!

Check the Hugging Face [website](https://huggingface.co/models?filter=lm-head) for a list of the language models you can use.

## Setup

In [None]:
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/22/97/7db72a0beef1825f82188a4b923e62a146271ac2ced7928baa4d47ef2467/transformers-2.9.1-py3-none-any.whl (641kB)

In [None]:
from transformers import pipeline
from transformers import AutoTokenizer, AutoModelWithLMHead

## Generation with GPT2

In [None]:
gpt2 = pipeline('text-generation')


In [None]:
gpt2('Natural language processing is amazing!')

Setting `pad_token_id` to 50256 (first `eos_token_id`) to generate sequence


[{'generated_text': "Natural language processing is amazing!\n\nI'm not sure if I can say this enough. I"}]
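
The output above is a list of dictionaries whose `generated_text` value begins with the prompt itself. The following is a minimal sketch of asking for several sampled continuations and stripping the echoed prompt. The names `sample_continuations` and `fake_generator` are hypothetical, and the sketch assumes the pipeline forwards `max_length`, `do_sample`, and `num_return_sequences` to the model's `generate` method.

```python
def sample_continuations(generator, prompt, n=3, max_length=40):
    """Return n sampled continuations of prompt with the echoed prompt removed.

    `generator` is assumed to behave like the text-generation pipeline above:
    a callable that returns [{'generated_text': ...}, ...].
    """
    outputs = generator(
        prompt,
        max_length=max_length,   # upper bound on length in tokens, prompt included
        do_sample=True,          # sample instead of always taking the top token
        num_return_sequences=n,  # ask for n candidates
    )
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # The generated text starts with the prompt; keep only what follows it.
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


# Hypothetical stand-in so the sketch runs without downloading a model.
def fake_generator(prompt, **kwargs):
    n = kwargs.get("num_return_sequences", 1)
    return [{"generated_text": f"{prompt} sample {i}"} for i in range(n)]


print(sample_continuations(fake_generator, "Natural language processing is amazing!", n=2))
```

With a loaded model, `sample_continuations(gpt2, 'Natural language processing is amazing!')` would call the real pipeline in place of the stub.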

## Generation with OpenAI GPT

In [None]:
tokenizer = AutoTokenizer.from_pretrained('openai-gpt')
model = AutoModelWithLMHead.from_pretrained('openai-gpt')
# device=0 places the pipeline on the first GPU; use device=-1 (the default) on a CPU-only runtime
opengpt = pipeline('text-generation', model=model, tokenizer=tokenizer, device=0)

ftfy or spacy is not installed using BERT BasicTokenizer instead of SpaCy & ftfy.

In [None]:
opengpt('Natural language processing is amazing!')

[{'generated_text': 'Natural language processing is amazing! " \n " i \'m sure it is , " said the professor .'}]
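
The `device=0` argument in this section selects the first CUDA device, so the cell fails on a runtime without a GPU. One possible way to choose the value at run time is sketched below. It assumes the PyTorch backend, and `pick_device` is a hypothetical helper name, not part of `transformers`.

```python
def pick_device():
    """Return 0 (first GPU) when CUDA is usable, otherwise -1 (CPU)."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch there is no CUDA device to select.
        return -1
    return 0 if torch.cuda.is_available() else -1


device = pick_device()
print(device)
```

The result can then be passed as `device=pick_device()` when building a pipeline, so the same notebook runs on both CPU and GPU runtimes.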

## Generation with Transformer-XL

In [None]:
tokenizer = AutoTokenizer.from_pretrained('transfo-xl-wt103')
model = AutoModelWithLMHead.from_pretrained('transfo-xl-wt103')
xl = pipeline('text-generation', model=model, tokenizer=tokenizer)

In [None]:
xl('Natural language processing is amazing! Im very very excited with what is in store for the coming year. Also,', max_length=300)

You might want to consider setting `add_space_before_punct_symbol=True` as an argument to the `tokenizer.encode()` to avoid tokenizing words with punctuation symbols to the `<unk>` token
Setting `pad_token_id` to 0 (first `eos_token_id`) to generate sequence


[{'generated_text': 'Natural language processing is amazing! Im very very excited with what is in store for the coming year. Also, is'}]
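
The warning above says that words with attached punctuation, such as `amazing!`, may be mapped to `<unk>`, and it suggests passing `add_space_before_punct_symbol=True` to `tokenizer.encode()`. As a rough illustration of what that kind of preprocessing does, the sketch below inserts a space before a few punctuation marks. It is not the library's implementation; `space_before_punct` and the punctuation set are assumptions made for this example.

```python
import re

# Assumption: a small, illustrative set of punctuation marks.
# The lookbehind only matches punctuation that directly follows a non-space character.
_PUNCT = re.compile(r"(?<=\S)([!?.,;:])")


def space_before_punct(text):
    """Insert a space before punctuation that is attached to the preceding word."""
    return _PUNCT.sub(r" \1", text)


print(space_before_punct("Natural language processing is amazing! Also,"))
```

A simple pattern like this also splits decimals such as `3.5`, so it is only a starting point for prompts that contain numbers.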

## Generation with XLNet

In [None]:
tokenizer = AutoTokenizer.from_pretrained('xlnet-base-cased')
model = AutoModelWithLMHead.from_pretrained('xlnet-base-cased')
xlnet = pipeline('text-generation', model=model, tokenizer=tokenizer)

In [None]:
xlnet('Natural language processing is amazing!', max_length=200)

[{'generated_text': 'Natural language processing is amazing!, the """ is a word that sounds like a """. It sounds like a'}]

## Generation with T5

In [None]:
tokenizer = AutoTokenizer.from_pretrained("t5-base")
model = AutoModelWithLMHead.from_pretrained("t5-base")
t5 = pipeline('summarization', model=model, tokenizer=tokenizer)

In [None]:
INPUT = """
question: What does increased oxygen concentrations in the patient’s
lungs displace? context: Hyperbaric (high-pressure) medicine uses special oxygen
chambers to increase the partial pressure of O 2 around the patient and, when needed,
the medical staff. Carbon monoxide poisoning, gas gangrene, and decompression sickness
(the ’bends’) are sometimes treated using these devices. Increased O 2 concentration
in the lungs helps to displace carbon monoxide from the heme group of hemoglobin.
Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene, so increasing
its partial pressure helps kill them. Decompression sickness occurs in divers who
decompress too quickly after a dive, resulting in bubbles of inert gas, mostly nitrogen
and helium, forming in their blood. Increasing the pressure of O 2 as soon as possible
is part of the treatment.
"""
t5(INPUT, max_length=1000)

Your max_length is set to 1000, but you input_length is only 209. You might consider decreasing max_length manually, e.g. summarizer('...', max_length=50)


[{'summary_text': 'carbon monoxide from the heme group of hemoglobin . Oxygen gas is poisonous to the anaerobic bacteria that cause gas gangrene'}]
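
The `INPUT` string above follows a `question: ... context: ...` layout. A small helper can assemble prompts in that layout from separate pieces. This is a sketch only: `build_qa_prompt` is a hypothetical name, and collapsing line breaks into single spaces is a choice made here, not something the model requires.

```python
def build_qa_prompt(question, context):
    """Format a question and its context as 'question: ... context: ...'."""
    # Collapse line breaks and repeated spaces so the prompt is a single line.
    question = " ".join(question.split())
    context = " ".join(context.split())
    return f"question: {question} context: {context}"


prompt = build_qa_prompt(
    "What does increased oxygen concentrations in the patient's lungs displace?",
    "Increased O 2 concentration in the lungs helps to displace\n"
    "carbon monoxide from the heme group of hemoglobin.",
)
print(prompt)
```

The resulting string can be passed to `t5(...)` in the same way as `INPUT`.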

## Generation with CTRL

In [None]:
tokenizer = AutoTokenizer.from_pretrained('ctrl')
model = AutoModelWithLMHead.from_pretrained('ctrl')


In [None]:
ctrl = pipeline('text-generation', model=model, tokenizer=tokenizer)

In [None]:
ctrl('Natural language processing is amazing!')
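
CTRL is a conditional model that expects the prompt to start with a control code naming the style of text to produce, and the prompt above has none. The sketch below prefixes one. The codes listed are an assumption drawn from a handful of the codes CTRL was trained with, not the full set, and `with_control_code` is a hypothetical helper.

```python
# Assumption: a few of the control codes CTRL was trained with. This is not
# the full list, so check the model documentation before relying on it.
EXAMPLE_CONTROL_CODES = ("Links", "Books", "Wikipedia", "Reviews")


def with_control_code(prompt, code="Links"):
    """Prefix prompt with a control code unless it already starts with one."""
    if code not in EXAMPLE_CONTROL_CODES:
        raise ValueError(f"unknown control code: {code!r}")
    words = prompt.split()
    first_word = words[0] if words else ""
    if first_word in EXAMPLE_CONTROL_CODES:
        return prompt
    return f"{code} {prompt}"


print(with_control_code("Natural language processing is amazing!"))
```

The prefixed prompt can then be passed to the pipeline, for example `ctrl(with_control_code('Natural language processing is amazing!'))`.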