# Chapter 4 Text Generation

## Improvements to invoking the model for text generation

* Greedy search (currently used)
* Beam search
* Temperature
* Top-k



Source: https://huggingface.co/blog/introducing-csearch

Deterministic methods, e.g. greedy search and beam search, generate text by selecting the text continuation with the highest likelihood measured by the language model. However, as widely discussed in previous studies [3][4], deterministic methods often lead to the problem of model degeneration, i.e., the generated text is unnatural and contains undesirable repetitions.
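
To make the difference between the two deterministic methods concrete, here is a minimal sketch of beam search over a hand-written toy distribution. `TOY_MODEL`, `next_log_probs` and `beam_search` are hypothetical names introduced only for this illustration; they are not part of the notebook's model code. With a beam width of 1 the procedure reduces to greedy search.

```python
import math

# Hypothetical toy next-token distribution, keyed by the previous token.
TOY_MODEL = {
    "<s>": {"a": 0.6, "b": 0.4},
    "a": {"x": 0.4, "y": 0.3, "z": 0.3},
    "b": {"x": 0.9, "y": 0.05, "z": 0.05},
}


def next_log_probs(seq):
    """Return {token: log-probability} for the token that follows seq."""
    dist = TOY_MODEL.get(seq[-1], {})
    return {tok: math.log(p) for tok, p in dist.items()}


def beam_search(start, steps, beam_width):
    """Keep the beam_width highest-scoring partial sequences at every step."""
    beams = [([start], 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            for tok, lp in next_log_probs(seq).items():
                candidates.append((seq + [tok], score + lp))
        if not candidates:
            break  # no beam can be extended any further
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


# beam_width=1 is greedy search: it commits to "a" and cannot recover.
greedy_seq, greedy_score = beam_search("<s>", steps=2, beam_width=1)[0]
# beam_width=2 also keeps "b" alive and finds the more likely sequence.
beam_seq, beam_score = beam_search("<s>", steps=2, beam_width=2)[0]

print(greedy_seq, round(math.exp(greedy_score), 2))  # ['<s>', 'a', 'x'] 0.24
print(beam_seq, round(math.exp(beam_score), 2))      # ['<s>', 'b', 'x'] 0.36
```

Both variants are deterministic: running them again on the same input always yields the same sequence, which is the property the paragraph above links to repetitive output.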

To address the issues posed by deterministic methods, stochastic methods generate text by introducing randomness during the decoding process. Two widely-used stochastic methods are (i) top-k sampling [3] and (ii) nucleus sampling (also called top-p sampling) [4].
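
As a rough sketch of these two stochastic methods, the functions below follow the same `search_fn(**kwargs)` convention used by the decoding functions in this chapter. The names `top_k_search` and `top_p_search`, and the extra keyword arguments `top_k` and `top_p`, are assumptions made for this example and are not defined elsewhere in the notebook.

```python
import torch


def top_k_search(**kwargs):
    """Sample the next token from the k most likely tokens only."""
    logits = kwargs["logits"]                      # shape: (batch, vocab_size)
    temperature = kwargs.get("temperature", 1.0)
    k = min(kwargs.get("top_k", 50), logits.size(-1))  # hypothetical kwarg

    top_logits, top_idx = torch.topk(logits / temperature, k, dim=-1)
    probas = torch.softmax(top_logits, dim=-1)     # renormalise over the k tokens
    choice = torch.multinomial(probas, num_samples=1)  # position in the top-k list
    return torch.gather(top_idx, -1, choice)       # map back to vocabulary ids


def top_p_search(**kwargs):
    """Sample from the smallest set of tokens whose total probability reaches top_p."""
    logits = kwargs["logits"]
    temperature = kwargs.get("temperature", 1.0)
    top_p = kwargs.get("top_p", 0.9)               # hypothetical kwarg

    probas = torch.softmax(logits / temperature, dim=-1)
    sorted_probas, sorted_idx = torch.sort(probas, dim=-1, descending=True)
    cumulative = torch.cumsum(sorted_probas, dim=-1)

    # Drop a token when the mass of the tokens ranked before it already reaches top_p.
    drop = (cumulative - sorted_probas) >= top_p
    drop[..., 0] = False                           # always keep the most likely token
    sorted_probas = sorted_probas.masked_fill(drop, 0.0)
    sorted_probas = sorted_probas / sorted_probas.sum(dim=-1, keepdim=True)

    choice = torch.multinomial(sorted_probas, num_samples=1)
    return torch.gather(sorted_idx, -1, choice)


logits = torch.tensor([[2.0, 1.0, 0.5, -1.0, -3.0]])
print(top_k_search(logits=logits, top_k=2))    # token id 0 or 1
print(top_p_search(logits=logits, top_p=0.8))  # token id 0 or 1
```

Because `generate_text` only passes `logits` and `temperature` to the search function, one way to set the cut-off is `functools.partial(top_k_search, top_k=5)`, which can then be supplied as `search_fn`.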

While nucleus sampling can generate text free of repetitions, the semantic coherence of the generated text is not well maintained. For instance, in the example from the source post, the generated phrase 'AI is not journalism' is incoherent with respect to the given prefix, i.e. 'DeepMind Company'.

We note that this semantic inconsistency can be partially remedied by lowering the temperature. However, reducing the temperature brings nucleus sampling closer to greedy search, so the temperature acts as a trade-off between the two. In general, it is challenging to find a prompt- and model-independent temperature that avoids the pitfalls of both greedy search and nucleus sampling.
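
The link between temperature and greedy search can be written out with the temperature-scaled softmax. The symbols $z_i$ (logit of token $i$) and $T$ (temperature) are notation introduced here for illustration:

$$
p_i(T) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}, \qquad T > 0
$$

For $T = 1$ this is the ordinary softmax. As $T \to 0^{+}$, and assuming the largest logit is unique, the distribution concentrates on that token:

$$
\lim_{T \to 0^{+}} p_i(T) =
\begin{cases}
1 & \text{if } i = \arg\max_j z_j \\
0 & \text{otherwise}
\end{cases}
$$

Sampling from $p(T)$ at a very low temperature therefore picks the same token as greedy search, while larger values of $T$ flatten the distribution towards uniform.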


In [None]:
import torch


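def greedy_search(**kwargs):
    """Pick the single most likely next token."""
    logits = kwargs['logits']
    # softmax keeps the ordering of the logits, so argmax gives the same token either way
    probas = torch.softmax(logits, dim=-1)
    idx_next = torch.argmax(probas, dim=-1, keepdim=True)
    return idx_next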


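def generate_text(model, idx, max_new_tokens, context_size,
                  search_fn=greedy_search, temperature=1.0):
    """
    Generate output tokens from a given model.

    Arguments:
        model:
            LLM used for text generation.
        idx:
            Input token tensor of shape (batch, n_tokens).
        max_new_tokens:
            Number of output tokens to be generated.
        context_size:
            Model context window.
        search_fn:
            Decoding function that picks the next token from the logits.
        temperature:
            Temperature passed through to search_fn.
    """
    for _ in range(max_new_tokens):
        # Keep only the last context_size tokens as model input
        idx_trim = idx[:, -context_size:]

        with torch.no_grad():
            logits = model(idx_trim)

        # Use only the logits of the last position
        logits = logits[:, -1, :]
        idx_next = search_fn(logits=logits, temperature=temperature)

        # Append the chosen token to the running sequence
        idx = torch.cat((idx, idx_next), dim=1)
    return idx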


In [None]:
def invoke_model(model, tokenizer, start_context,
                 search_fn=greedy_search, temperature=1.0):
    # Check for None first; len(None) would raise a TypeError
    assert start_context is not None and len(start_context) > 0

    print(f"Input context: '{start_context}'")
    encoded = tokenizer.encode(start_context)
    encoded_tensor = torch.tensor(encoded).unsqueeze(0)  # add batch dimension
    model.eval()
    with torch.no_grad():
        out = generate_text(model, encoded_tensor, 5,
                            context_size=50,
                            search_fn=search_fn,
                            temperature=temperature)

    decoded_text = tokenizer.decode(out.squeeze(0))
    print(f"Decoded text: '{decoded_text}'\n")

In [None]:
# model and get_tokenizer() are assumed to be defined in earlier cells
tokenizer = get_tokenizer()

for i in range(2):
    start_context = "It is a"
    invoke_model(model, tokenizer, start_context, search_fn=greedy_search)


In [None]:
def probabilistic_search(**kwargs):
    logits = kwargs['logits']
    probas = torch.softmax(logits, dim=-1)
    idx_next = torch.multinomial(probas, num_samples=1)
    return idx_next


In [None]:
for i in range(2):
    start_context = "It is a"
    invoke_model(model,tokenizer,start_context,search_fn=probabilistic_search)

In [None]:
import numpy as np

words = ["a", "tree", "space"]

logits = np.asarray([0.2, 0.11, 0.5])
temp_range = np.linspace(0, 1, 11)


def softmax(x):
    return np.exp(x) / np.sum(np.exp(x))


for temperature in temp_range:
    # Temperature 0 would divide by zero, so skip it
    if temperature > 0:
        probas = softmax(logits / temperature)
        print(f"@ Temperature {temperature:.2f} values {np.round(probas, 3)}")

        experiments = 50
        # Sample from the unrounded probabilities so that they sum to 1
        counts = np.random.multinomial(experiments, probas)

        for word, chosen_freq in zip(words, counts):
            print(f"\t{word} chosen {chosen_freq} times out of {experiments} trials")

In [None]:
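def temperature_scaling(**kwargs):
    logits = kwargs['logits']
    temperature = kwargs['temperature']
    probas = torch.softmax(logits / temperature, dim=-1)
    # Sample instead of taking argmax: dividing by a positive temperature
    # never changes the argmax, so argmax here would just be greedy search
    idx_next = torch.multinomial(probas, num_samples=1)
    return idx_next


In [None]:
for i in range(2):
    start_context = "It is a"
    temperature = 0.7
    # Use temperature_scaling; probabilistic_search ignores the temperature argument
    invoke_model(model, tokenizer, start_context,
                 search_fn=temperature_scaling, temperature=temperature)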