https://huggingface.co/docs/transformers/generation_strategies

In [1]:
from transformers import pipeline
from transformers import set_seed

set_seed(42)

In [2]:
tg_pipeline = pipeline("text-generation", model="distilbert/distilgpt2", device=0)

tg_pipeline


Device set to use cuda:0


<transformers.pipelines.text_generation.TextGenerationPipeline at 0x7c4163609dd0>

Note that many input parameters are available to configure and control text generation. They are documented here:

https://huggingface.co/docs/transformers/main_classes/text_generation

In [3]:
text = "In a world where dreams become reality"

answer = tg_pipeline(text, pad_token_id=tg_pipeline.tokenizer.eos_token_id)

print(answer)

[{'generated_text': 'In a world where dreams become reality, the only way to escape the obstacles is to escape the obstacles, and to have the courage to challenge the obstacles.'}]


In [4]:
text = "In a universe where every idea takes shape"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    min_new_tokens=100,
    max_new_tokens=500
)

print(answer[0]['generated_text'])

In a universe where every idea takes shape, the Universe is going to be a lot different from our own. But the possibilities of a universe are limitless.


The universe is going to be very different. It will have a very different future than the one we have at present, but it could be more like the one we have in the past.
So, how will we be able to take that reality and create a universe that we think is a more like the one we have in the past?
We can use the technologies we have today to create a universe that we think is totally different from what we have today. It's not like the universe is not in the past.
And I think that's what we should be able to find, because we know that there is something about the Universe that we shouldn't be thinking about because it is something that the Universe is not in.
Do you think that if we were to have a universe filled with human beings, it would be a lot different than the one we have today.
I think it is what we should be able to find.
So, wh

In [5]:
text = "In a universe where every idea takes shape"

answer = tg_pipeline(
    text,
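    tokenizer=tg_pipeline.tokenizer,  # required so stop_strings can be matched against generated tokens
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    stop_strings=["evolve", "world"]  # generation stops as soon as either string appears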
)

print(answer[0]['generated_text'])

In a universe where every idea takes shape, the universe is shaped by an infinite number of galaxies, and the Earth is shaped by an infinite number of galaxies.




There are two basic principles for understanding the Universe. One of them is the notion that the universe is shaped by a set of galaxies. The other is the notion that the universe is shaped by a set of atoms, and that the universe is made of atoms.
This principle is called the "the Universe." This is what the universe is. The universe


In [9]:
text = "In a world sculpted by visionary ideas"

answers = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=80,
    num_return_sequences=3
)

for answer in answers:
    print(answer['generated_text'])
    print('-'*80)

In a world sculpted by visionary ideas, artists, and artists, this artworks exhibit at the Smithsonian Museum of Art focuses on a broad range of artworks by artists from the arts to the arts. The exhibition features a range of contemporary artworks from contemporary to modern, from classical to contemporary, to contemporary, to contemporary.
--------------------------------------------------------------------------------
In a world sculpted by visionary ideas, the world of a "Derek Carr sculpture" is a world of a world of people living in a world of people living in a world of people living in a world of people living in a world of people living in a world of people living in a world of people living in a world of people living in a world of people living in a world of people living in a world of people living
--------------------------------------------------------------------------------
In a world sculpted by visionary ideas, the world’s greatest sculptor is a world sculpted by visi

## Decoding Strategies in Text Generation

Decoding strategies in text generation models refer to the methods used to generate text from the output probabilities produced by models like GPT, BART, or other transformer-based architectures. These strategies determine how the next word (or token) is selected during the text generation process, which can significantly influence the quality, coherence, and creativity of the generated text.

### Greedy Search

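How It Works: In greedy search, the model selects the token with the highest probability at each step. This approach is simple and fast, but it often produces suboptimal or repetitive text because it never looks ahead: a token that is the best choice locally can lead to a poor continuation. Note that the pipeline may sample by default, depending on the checkpoint's configuration and the library version; passing `do_sample=False` (with `num_beams=1`) requests greedy decoding explicitly.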
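The selection rule can be sketched without a real model. The `NEXT_TOKEN_PROBS` table and the `greedy_decode` helper below are invented for illustration; the table stands in for the probability distribution a language model would return at each step.

```python
# Toy next-token table (hypothetical; stands in for a model's output distribution).
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.5, "a": 0.4, "<eos>": 0.1},
    "the": {"cat": 0.4, "dog": 0.35, "<eos>": 0.25},
    "a": {"dog": 0.9, "<eos>": 0.1},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def greedy_decode(start="<s>", max_new_tokens=10):
    """Repeatedly append the single most probable next token."""
    sequence = [start]
    for _ in range(max_new_tokens):
        probs = NEXT_TOKEN_PROBS[sequence[-1]]
        next_token = max(probs, key=probs.get)  # argmax over the toy vocabulary
        sequence.append(next_token)
        if next_token == "<eos>":
            break
    return sequence


greedy_tokens = greedy_decode()
print(greedy_tokens)
```

In this toy table, greedy decoding picks "the" and then "cat" (sequence probability 0.5 × 0.4 = 0.2), although "a dog" has a higher overall probability (0.4 × 0.9 = 0.36). This is the kind of locally optimal but globally suboptimal choice described above.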



In [10]:
text = "I went to the office one day"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100
)

print(answer[0]['generated_text'])

I went to the office one day, and I said, 'Oh, I know. Now you can't do this. I'm not going to do something. I know that this is not going to work.' The man who was looking for me, George. I said, 'No, this is not going to work.' He said, 'I'm going to do this, but I don't want to do it.' I said, 'I'm going to do this.' I said, 'Why, George? I'm


### Beam Search
How It Works: Beam Search keeps track of multiple possible sequences (beams) at each step, rather than just the single best one. It explores a fixed number of top candidates (beam width) and expands them simultaneously, eventually selecting the sequence with the highest overall score.
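A minimal sketch of the idea, assuming a toy next-token table instead of a real model. `NEXT_TOKEN_PROBS` and `beam_search` are hypothetical names, and the sketch omits refinements such as length penalties and early-stopping rules that library implementations typically add.

```python
import math

# Toy next-token table (hypothetical; stands in for a model's output distribution).
NEXT_TOKEN_PROBS = {
    "<s>": {"the": 0.5, "a": 0.4, "<eos>": 0.1},
    "the": {"cat": 0.4, "dog": 0.35, "<eos>": 0.25},
    "a": {"dog": 0.9, "<eos>": 0.1},
    "cat": {"<eos>": 1.0},
    "dog": {"<eos>": 1.0},
}


def beam_search(num_beams=2, max_new_tokens=10, start="<s>"):
    """Return (log-probability, tokens) of the best sequence found."""
    beams = [(0.0, [start])]  # each beam: (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_new_tokens):
        # Expand every open beam by every possible next token.
        candidates = []
        for score, tokens in beams:
            for token, prob in NEXT_TOKEN_PROBS[tokens[-1]].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the num_beams highest-scoring continuations.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:num_beams]:
            if tokens[-1] == "<eos>":
                finished.append((score, tokens))
            else:
                beams.append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # beams still open when the token budget ran out
    return max(finished, key=lambda c: c[0])


best_score, best_tokens = beam_search(num_beams=2)
print(best_tokens, round(math.exp(best_score), 2))
```

With two beams, the search keeps both "the" and "a" alive after the first step, so it can discover that "a dog" (probability 0.36) beats the continuation that always follows the locally best token, "the cat" (probability 0.2).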

In [11]:
text = "I went to the office one day"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    num_beams=4
)

print(answer[0]['generated_text'])

I went to the office one day and said, 'I don't know what I'm doing. I don't know what I'm doing.'"

In [12]:
text = "I went to the office one day"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    num_beams=5,
    no_repeat_ngram_size=2
)

print(answer[0]['generated_text'])

I went to the office one day and said, 'I don't know what I'm talking about, but I'll tell you what.' And I said to him, "You know, I've got a lot of work to do, and I want you to take care of it.' " And then he said "I'm going to give you a little bit of advice on how to deal with it." And he went on to say: "It's not like you have to worry about it. It's just a matter of


In [13]:
text = "I went to the office one day"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    num_beams=5,
    no_repeat_ngram_size=4
)

print(answer[0]['generated_text'])


I went to the office one day and said, 'I don't want to go to the office. I'm not going to go to my office.' "



He said: 'I'm going to the office, and I'm going to my office.


'I don't know what I'm doing.
'

'
'
He said, 'You're not going to do this.
'I'm not doing this.'

He added: 'I am not going to


In [15]:
text = "I went to the office one day"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=200,
    num_beams=5,
    repetition_penalty=2.0 # Default is 1.0 which means no penalty
)

print(answer[0]['generated_text'])

I went to the office one day and said, 'You're not going to be able to do this.'"

### Multinomial Sampling
Unlike greedy search, which always chooses the token with the highest probability, multinomial sampling (also called ancestral sampling) randomly selects the next token according to the probability distribution the model assigns over the entire vocabulary. Every token with a non-zero probability has a chance of being selected, which reduces the risk of repetition.
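The sampling step itself is a weighted random draw. The sketch below uses only the standard library; the three-token distribution and the `sample_next_token` helper are made up for illustration.

```python
import random


def sample_next_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(42)  # fixed seed so the run is reproducible
probs = {"cat": 0.6, "dog": 0.3, "bird": 0.1}  # hypothetical next-token distribution

# Draw many times to show that every token is selected roughly in
# proportion to its probability, including the unlikely ones.
counts = {token: 0 for token in probs}
for _ in range(10_000):
    counts[sample_next_token(probs, rng)] += 1
print(counts)
```

Greedy search would return "cat" on every draw; sampling returns it only about 60% of the time.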

In [16]:
text = "I went to the office one day"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    num_beams=1
)

print(answer[0]['generated_text'])

I went to the office one day and the other day. I‪m very glad.”

He had no idea I was there. He was a very nice guy. And I know that I'm not a bad person. But I‪m here. I'm not a bad person.
He was a very nice guy.
He was very nice to the public. He had a very nice demeanor.
He was very nice to the people who were there. He didn't take any personal threats.


#### Temperature:

A hyperparameter that controls the randomness of predictions. The logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution (more deterministic output) and values above 1 flatten it (more variable output).
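To make the effect concrete, here is a small sketch of a temperature-scaled softmax. The function name `softmax_with_temperature` and the example logits are assumptions chosen for illustration.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three tokens
cold = softmax_with_temperature(logits, temperature=0.1)
neutral = softmax_with_temperature(logits, temperature=1.0)
hot = softmax_with_temperature(logits, temperature=1.5)

for name, dist in [("0.1", cold), ("1.0", neutral), ("1.5", hot)]:
    print(name, [round(p, 3) for p in dist])
```

At a temperature of 0.1 almost all of the probability mass sits on the top token, while at 1.5 the mass is spread more evenly across the three tokens.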

#### Top-K Sampling:

A decoding strategy that restricts token selection to the top K most probable options, introducing controlled randomness by only considering a fixed number of high-probability tokens.
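As a rough sketch, top-K filtering can be written as a sort, a cut, and a renormalisation. The `top_k_filter` helper and the example distribution are hypothetical; treating `k <= 0` as "disabled" is a convention assumed here.

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalise their probabilities."""
    if k <= 0 or k >= len(probs):
        return dict(probs)  # filtering disabled or nothing to cut
    kept = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


probs = {"cat": 0.5, "dog": 0.3, "bird": 0.15, "fish": 0.05}  # made-up distribution
top2 = top_k_filter(probs, k=2)
print(top2)
```

The next token is then sampled from the filtered distribution, so the low-probability tail ("bird" and "fish" here) can never be selected.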

#### Top-p (Nucleus) Sampling:

A decoding strategy that selects tokens from the smallest set whose cumulative probability exceeds a threshold p, dynamically adjusting the number of considered tokens based on the context.
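The difference from top-K is that the number of surviving tokens depends on the shape of the distribution. The sketch below assumes a hypothetical `top_p_filter` helper and two made-up distributions, one peaked and one flat.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, prob in ranked:
        kept.append((token, prob))  # always keeps at least the top token
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


peaked = {"cat": 0.9, "dog": 0.05, "bird": 0.03, "fish": 0.02}
flat = {"cat": 0.3, "dog": 0.3, "bird": 0.2, "fish": 0.2}

peaked_nucleus = top_p_filter(peaked, p=0.7)
flat_nucleus = top_p_filter(flat, p=0.7)
degenerate = top_p_filter(flat, p=0.0)
print(list(peaked_nucleus), list(flat_nucleus), list(degenerate))
```

With the same threshold, the peaked distribution keeps a single token while the flat one keeps three. A threshold of 0 always keeps only the single most probable token, which makes sampling behave like greedy search.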

In [29]:
text = "In a world sculpted by visionary ideas"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=200,
    do_sample=True,
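    temperature=0.1,  # Default 1.0
    top_k=0,  # 0 disables top-k filtering
    # Caution: top_p=0 keeps only the single most probable token at each step,
    # so the output is effectively greedy whatever the temperature.
    # Use top_p=1.0 to disable nucleus filtering and observe the temperature effect.
    top_p=0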
)

print(answer[0]['generated_text'])

In a world sculpted by visionary ideas, the world of the artist is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art


In [30]:
text = "In a world sculpted by visionary ideas"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=200,
    do_sample=True,
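    temperature=1.0,
    top_k=0,  # 0 disables top-k filtering
    # Caution: top_p=0 keeps only the single most probable token at each step,
    # which is why this output matches the temperature=0.1 run.
    top_p=0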
)


print(answer[0]['generated_text'])

In a world sculpted by visionary ideas, the world of the artist is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art


In [31]:
text = "In a world sculpted by visionary ideas"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=200,
    do_sample=True,
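    temperature=1.5,
    top_k=0,  # 0 disables top-k filtering
    # Caution: top_p=0 keeps only the single most probable token at each step,
    # which is why a higher temperature does not change the output here.
    top_p=0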
)


print(answer[0]['generated_text'])

In a world sculpted by visionary ideas, the world of the artist is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art. It is a world of art


In [32]:
text = "In a world sculpted by visionary ideas"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    top_k=10 # Default value is 50
)

print(answer[0]['generated_text'])

In a world sculpted by visionary ideas, we’re now in the process of re-imposing the “treaty‡ to the human mind‡ to the physical reality of the world.

In the first place, a new approach to the world is being taken. In the second, we are working on a new approach to the world that aims to create a new, new and new way to create a world in which we can create a world in which to create a world in which to create a


In [33]:
text = "In a world sculpted by visionary ideas"

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    top_k=100
)

print(answer[0]['generated_text'])

In a world sculpted by visionary ideas, we have a pretty awesome concept art. We have a really good idea of how it will look that way.


Our goal is to create an amazing experience. It will be an amazing way to experience it. It will be an amazing way to experience it.
We have a lot of ideas for how we will look. We want to make them easy to get started.
We're really looking for people that want to help us, but we don't want to talk about the


In [35]:
text = "In a world sculpted by visionary ideas"

print(text)

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.5  # Default value is 1.0
)

print(answer[0]['generated_text'])

In a world sculpted by visionary ideas
In a world sculpted by visionary ideas, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted by a visionary idea, the first time I saw a sculpted


In [37]:
text = "In a world sculpted by visionary ideas"

print(text)

answer = tg_pipeline(
    text,
    pad_token_id=tg_pipeline.tokenizer.eos_token_id,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.80
)

print(answer[0]['generated_text'])

In a world sculpted by visionary ideas
In a world sculpted by visionary ideas, the British sculptor has been working on a series of films for the past two years, including "The Last Man on Earth", "The Last Man on Earth", "The Last Man on Earth", and "The Last Man on Earth."

### Streaming support

You can use the `TextStreamer` class to stream the output of `generate()` to your screen, one word at a time.

In [38]:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

In [39]:
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")


In [40]:
inputs = tokenizer(["An increasing sequence: one,"], return_tensors="pt")

streamer = TextStreamer(tokenizer)

In [41]:
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=30)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


An increasing sequence: one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen,
