# Text Generation

In [1]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

## 1. Introduction


## 2. Device Agnostic Code
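On macOS, PyTorch's `mps` backend (Metal Performance Shaders) runs models on the GPU, which is typically much faster than the CPU for deep learning tasks, especially with large models and batch sizes. On machines with an NVIDIA GPU (Windows or Linux), the `cuda` backend is used instead of `mps`.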

In [2]:
# Prefer CUDA (NVIDIA GPUs), then MPS (macOS), and fall back to the CPU.
if torch.cuda.is_available():
    DEVICE = torch.device("cuda")
elif torch.backends.mps.is_available():
    DEVICE = torch.device("mps")
else:
    DEVICE = torch.device("cpu")
DEVICE

device(type='mps')

## 3. Loading Pre-Trained Model
### GPT-2

In [None]:
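MODEL_NAME = "gpt2-large"
MAX_TOKENS = 50  # Upper bound on newly generated tokens in the longer examples
# device_map="auto" lets the library choose where to place the model weights.
MODEL = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")
# Decoder-only models should be padded on the left for batched generation.
TOKENISER = AutoTokenizer.from_pretrained(MODEL_NAME, padding_side="left")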

In [4]:
# GPT-2 defines no padding token, so reuse the end-of-sequence token for padding.
MODEL.generation_config.pad_token_id = MODEL.generation_config.eos_token_id

In [5]:
input_text = "Hello! Today I am"

model_inputs = TOKENISER(input_text, return_tensors="pt").to(DEVICE)

In [6]:
model_inputs

{'input_ids': tensor([[15496,     0,  6288,   314,   716]], device='mps:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1]], device='mps:0')}

## 4. Basic Generation Strategies
### Greedy Search
Greedy search is the default decoding strategy used by `.generate()`. At each step, it selects the token with the highest probability as the next token. The method is simple and fast, which makes it suitable for short outputs. For longer text, however, it tends to produce repetitive, less diverse sequences.

By default, greedy search generates up to 20 new tokens unless a different limit is set, for example with `max_new_tokens` in `GenerationConfig` or as an argument to `.generate()`.
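
To make the selection rule concrete, here is a minimal sketch of a single greedy step on a toy distribution. It does not use the model: the four-word vocabulary, the logits, and the helper names `softmax` and `greedy_pick` are all made up for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Return the index of the most probable token (one greedy step)."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical vocabulary and scores, not taken from GPT-2.
toy_vocab = ["going", "happy", "here", "tired"]
toy_logits = [2.0, 1.0, 0.5, -1.0]

next_token = toy_vocab[greedy_pick(toy_logits)]
print(next_token)  # prints "going", the highest-scoring token
```

Because the choice is a plain argmax, the same prompt always yields the same continuation, which is one reason greedy output can fall into loops.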

In [7]:
generated_ids = MODEL.generate(**model_inputs)
print(TOKENISER.batch_decode(generated_ids, skip_special_tokens=True)[0])

Hello! Today I am going to show you how to make a simple and easy to use, but very useful, tool for


### Sampling
Sampling draws the next token at random from the model's probability distribution over the entire vocabulary, so likely tokens are chosen more often but not every time. This reduces repetition and can produce more creative, diverse outputs than greedy search.

Sampling is enabled by setting `do_sample=True` and `num_beams=1`.
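
As a rough illustration of what "sampling from the distribution" means, the sketch below draws many tokens from a toy distribution with Python's standard `random` module. The vocabulary, logits, and seed are assumptions chosen for the example, not values from GPT-2.

```python
import math
import random
from collections import Counter


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores, not taken from GPT-2.
toy_vocab = ["going", "happy", "here", "tired"]
toy_probs = softmax([2.0, 1.0, 0.5, -1.0])

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = rng.choices(toy_vocab, weights=toy_probs, k=10_000)
counts = Counter(draws)

# The most probable token dominates, but the others still appear.
for token, count in counts.most_common():
    print(f"{token}: {count}")
```

Unlike the greedy rule, repeated calls can return different tokens, which is why two sampled generations from the same prompt usually differ.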

In [10]:
generated_ids = MODEL.generate(**model_inputs, do_sample=True, num_beams=1)
print(TOKENISER.batch_decode(generated_ids, skip_special_tokens=True)[0])

Hello! Today I am pleased to share with you a great article from World Chess that is a great read for anyone who is


### Beam Search
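Beam search keeps several candidate sequences (beams) alive at once. At each step, it extends every beam with its possible next tokens, then retains only the top $k$ candidates ranked by cumulative (overall) probability, where $k$ is the number of beams. This lets it find a sequence that starts with a less likely token but scores higher overall. The strategy is suited for input-grounded tasks such as image captioning or speech recognition.

Beam search is enabled by setting `num_beams > 1`. It can optionally be combined with `do_sample=True`, in which case tokens are sampled rather than chosen deterministically while the beams are expanded.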
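
The expand-then-prune loop can be sketched on a toy example. The lookup table `TOY_MODEL` below is a hypothetical stand-in for a language model (it maps a prefix to next-token probabilities), and `beam_search` is an illustrative helper, not the library's implementation.

```python
import math

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
TOY_MODEL = {
    (): {"nice": 0.5, "dog": 0.4, "car": 0.1},
    ("nice",): {"woman": 0.4, "house": 0.3, "guy": 0.3},
    ("dog",): {"has": 0.9, "runs": 0.05, "and": 0.05},
    ("car",): {"is": 0.3, "drives": 0.5, "turns": 0.2},
}


def beam_search(model, num_beams, steps):
    """Return the surviving beams as (sequence, cumulative log-probability)."""
    beams = [((), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            # Expand each beam with every possible next token.
            for token, prob in model[seq].items():
                candidates.append((seq + (token,), score + math.log(prob)))
        # Keep only the best `num_beams` candidates overall.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


greedy_seq, greedy_score = beam_search(TOY_MODEL, num_beams=1, steps=2)[0]
beam_seq, beam_score = beam_search(TOY_MODEL, num_beams=2, steps=2)[0]

print(greedy_seq, round(math.exp(greedy_score), 2))  # ('nice', 'woman') 0.2
print(beam_seq, round(math.exp(beam_score), 2))      # ('dog', 'has') 0.36
```

With a single beam the search reduces to greedy decoding and commits to "nice" at the first step. With two beams it keeps "dog" alive and finds the sequence with the higher overall probability.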

In [None]:
generated_ids = MODEL.generate(
    **model_inputs, max_new_tokens=MAX_TOKENS, num_beams=5, do_sample=False
)
print(TOKENISER.batch_decode(generated_ids, skip_special_tokens=True)[0])

Hello! Today I am going to show you how to create a simple web application using ASP.NET MVC 4 and ASP.NET Web API.

I am going to show you how to create a simple web application using ASP.NET MVC 4 and ASP.NET Web API.

I am going to show you how to create a simple web application using ASP.NET MVC 4 and ASP.NET Web API.

I am going to show you how to create a simple web application using ASP.NET MVC 4 and ASP.NET Web API.

I am going to show you how to create


### Nucleus Sampling (Top-p Sampling)
Instead of sampling from the entire vocabulary, nucleus sampling samples from the smallest set of tokens whose cumulative probability exceeds the threshold $p$, so the unlikely tail of the distribution is never selected. This introduces controlled randomness, resulting in more diverse and creative text generation. It is enabled by setting `do_sample=True` together with `top_p`.
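
The filtering step can be sketched in a few lines. This is a simplified illustration that works directly on a small dictionary of probabilities. The helper name `top_p_filter` and the toy distribution are assumptions for the example, and the library's own implementation operates on logits and may differ in detail.

```python
def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose total mass reaches p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept = []
    cumulative = 0.0
    for token, prob in ranked:
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break  # the nucleus is complete, drop the remaining tail
    # Renormalise so the kept probabilities sum to 1 again.
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


# Hypothetical next-token distribution, not taken from GPT-2.
toy_probs = {"going": 0.5, "happy": 0.3, "here": 0.15, "tired": 0.05}

nucleus = top_p_filter(toy_probs, p=0.9)
print(sorted(nucleus))  # ['going', 'happy', 'here']
```

Here the three most likely tokens cover 95% of the probability mass, so "tired" is removed before sampling. A smaller $p$ shrinks the nucleus further and makes the output more conservative.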

In [None]:
generated_ids = MODEL.generate(
    **model_inputs, max_new_tokens=MAX_TOKENS, do_sample=True, top_p=0.9
)
print(TOKENISER.batch_decode(generated_ids, skip_special_tokens=True)[0])

Hello! Today I am going to show you how to install a simple but very simple web application in just a few minutes, that's why I call it a simple but very simple application. In this tutorial, we will not get into server-side frameworks or anything like that. It is not that kind of a web application. It is an interactive one and a web framework. It is not that kind of a web application. It is an interactive one and a web framework.

Let's start the tutorial with the basics. It is really easy to start with a simple web application, because you don't need any other knowledge to
