![DLI Header](images/DLI_Header.png)

# Random Personas

In this notebook you will learn how to control the **temperature** of the model's generation, giving you a measure of control over how random its responses are. Leveraging **temperature**, you'll be able to create a variety of AI personalities.

## Learning Objectives

By the time you complete this notebook you will be able to:
- Explain how random **sampling** can make for non-deterministic LLM responses.
- Control the degree of randomness in a response by adjusting its **temperature**.

## Video Walkthrough

Execute the cell below to load the video walkthrough of this notebook.

In [None]:
from IPython.display import HTML, display

video_url = "https://d36m44n9vdbmda.cloudfront.net/assets/s-fx-12-v1/v2/05-random.mp4"

video_html = f"""
<video controls width="640" height="360">
    <source src="{video_url}" type="video/mp4">
    Your browser does not support the video tag.
</video>
"""

display(HTML(video_html))

## Create LLaMA-2 Pipeline

In [None]:
from transformers import pipeline
model = "TheBloke/Llama-2-13B-chat-GPTQ"
# model = "TheBloke/Llama-2-7B-chat-GPTQ"

llama_pipe = pipeline("text-generation", model=model, device_map="auto")

## Helper Functions

In this notebook we will use the following functions to support our interaction with the LLM. Feel free to skim over them for now, as they are covered in greater detail when used below.

### Generate Model Responses

In [None]:
def generate(prompt, max_length=1024, pipe=llama_pipe, **kwargs):
    """
    Generates a response to the given prompt using a specified language model pipeline.

    This function takes a prompt and passes it to a language model pipeline, such as LLaMA, 
    to generate a text response. The function is designed to allow customization of the 
    generation process through various parameters and keyword arguments.

    Parameters:
    - prompt (str): The input text prompt to generate a response for.
    - max_length (int): The maximum total length in tokens, counting the prompt plus the generated text. Default is 1024.
    - pipe (callable): The language model pipeline function used for generation. Default is llama_pipe.
    - **kwargs: Additional keyword arguments that are passed to the pipeline function.

    Returns:
    - str: The generated text response from the model, trimmed of leading and trailing whitespace.

    Example usage:
    ```
    prompt_text = "Explain the theory of relativity."
    response = generate(prompt_text, max_length=512, pipe=my_custom_pipeline, temperature=0.7)
    print(response)
    ```
    """

    def_kwargs = dict(return_full_text=False, return_dict=False)
    response = pipe(prompt.strip(), max_length=max_length, **kwargs, **def_kwargs)
    return response[0]['generated_text'].strip()

### Construct Prompt, Optionally With System Context and/or Examples

In [None]:
def construct_prompt_with_context(main_prompt, system_context="", conversation_examples=[]):
    """
    Constructs a complete structured prompt for a language model, including optional system context and conversation examples.

    This function compiles a prompt that can be directly used for generating responses from a language model. 
    It creates a structured format that begins with an optional system context message, appends a series of conversational 
    examples as prior interactions, and ends with the main user prompt. If no system context or conversation examples are provided,
    it will return only the main prompt.

    Parameters:
    - main_prompt (str): The core question or statement for the language model to respond to.
    - system_context (str, optional): Additional context or information about the scenario or environment. Defaults to an empty string.
    - conversation_examples (list of tuples, optional): Prior exchanges provided as context, where each tuple contains a user message 
      and a corresponding agent response. Defaults to an empty list.

    Returns:
    - str: A string formatted as a complete prompt ready for language model input. If no system context or examples are provided, returns the main prompt.

    Example usage:
    ```
    main_prompt = "I'm looking to improve my dialogue writing skills for my next short story. Any suggestions?"
    system_context = "User is an aspiring author seeking to enhance dialogue writing techniques."
    conversation_examples = [
        ("How can dialogue contribute to character development?", "Dialogue should reveal character traits and show personal growth over the story arc."),
        ("What are some common pitfalls in writing dialogue?", "Avoid exposition dumps in dialogue and make sure each character's voice is distinct.")
    ]

    full_prompt = construct_prompt_with_context(main_prompt, system_context, conversation_examples)
    print(full_prompt)
    ```
    """
    
    # Return the main prompt if no system context or conversation examples are provided
    if not system_context and not conversation_examples:
        return main_prompt

    # Start with the initial part of the prompt including the system context, if provided
    full_prompt = f"<s>[INST] <<SYS>>{system_context}<</SYS>>\n" if system_context else "<s>[INST]\n"

    # Add each example from the conversation_examples to the prompt
    for user_msg, agent_response in conversation_examples:
        full_prompt += f"{user_msg} [/INST] {agent_response} </s><s>[INST]"

    # Add the main user prompt at the end
    full_prompt += f"{main_prompt} [/INST]"

    return full_prompt

## Default Non-random Responses

You may have noticed in previous notebooks that the responses we are getting back from our LLaMA-2 model, assuming we don't edit the prompt, are deterministic. Let's take an explicit look at this.

Here we are going to use our model for another generative task, this time to generate fictitious customer experiences with their Galaxy Rider mountain bike. Here we set the **system context** appropriately, and prompt the model to recall a memorable day on their bike.

In [None]:
system_context = """
You are a perennially satisfied customer who loves to reminisce about personal experiences with products. \
You never delve into technical specifics, as you believe it's the emotion and the joy that matter most. \
You're excited for others to feel the same euphoria and happiness you do. Your aim isn't to advertise, \
but to share a genuine, heartfelt story of joy and contentment.
"""

prompt = "Recall a memorable day out with your Galaxy Rider mountain bike in 50 words or less."

In [None]:
print(generate(construct_prompt_with_context(prompt, system_context)))

---

Let's generate a response a couple more times with the exact same prompt.

In [None]:
print(generate(construct_prompt_with_context(prompt, system_context)))

In [None]:
print(generate(construct_prompt_with_context(prompt, system_context)))

---

As stated, the model generates the exact same response every time. For many scenarios this is exactly the behavior we want, but for others we would like to introduce a degree of randomness into the model's responses. To accomplish this we will modify the **temperature** of the model's responses.

## Sampling and Temperature

In the realm of language models, **sampling** is the process by which a model generates text by sampling from the probability distribution of potential next words (tokens, actually, but for our purposes we can think of tokens as words). When we interact with language models without enabling **sampling**, the model operates deterministically, consistently selecting the most probable next word when generating text. This default behavior is useful when you need consistency and accuracy, but it can be limiting when you're aiming for creative and varied responses.

Within the context of the `transformers` pipeline we are using to interact with our LLaMA-2 model, **sampling** is *disabled* by default. To enable sampling we set `do_sample=True` when calling our `generate` function, and in doing so, instruct the model to pick words based on a probability distribution, allowing less likely words to be chosen, which can result in more diverse and interesting text.
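
To make the difference concrete, here is a minimal sketch that contrasts always picking the most probable word with sampling from the distribution. The words and probabilities are invented for illustration, and `pick_greedy` and `pick_sampled` are hypothetical helpers, not part of `transformers`.

```python
import random

# Toy next-word distribution; the words and probabilities are invented.
next_word_probs = {"trail": 0.6, "mountain": 0.25, "sunset": 0.1, "puddle": 0.05}

def pick_greedy(probs):
    """Sampling disabled: always return the single most probable word."""
    return max(probs, key=probs.get)

def pick_sampled(probs, rng):
    """Sampling enabled: draw a word in proportion to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)  # fixed seed so the demo is repeatable
greedy_picks = [pick_greedy(next_word_probs) for _ in range(200)]
sampled_picks = [pick_sampled(next_word_probs, rng) for _ in range(200)]

print(set(greedy_picks))   # only one distinct word: the most probable one
print(set(sampled_picks))  # more than one distinct word
```

A real model repeats this choice once per generated token, so small differences early in a response can lead to very different text later on.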

Once **sampling** is enabled, we can also pass in a specific value for **temperature**, which you can think of as the degree of randomness in the response. In this notebook we use values greater than `0.0` and up to `1.0`, with larger values indicating a larger degree of randomness. Note that `temperature` must be strictly positive; values above `1.0` are also accepted and make the output more random still.
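
As a rough illustration of what temperature does, a common formulation divides the model's raw scores (logits) by the temperature before turning them into probabilities. The sketch below assumes that formulation; the logits are invented and `softmax_with_temperature` is a hypothetical helper written for this example.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # invented scores for four candidate words

for t in (0.2, 0.7, 1.0):
    probs = softmax_with_temperature(logits, t)
    # Lower temperatures concentrate probability on the top-scoring word.
    print(t, [round(p, 3) for p in probs])
```

At low temperatures nearly all of the probability sits on the top-scoring word, so sampling behaves almost like the deterministic default. As the temperature rises, the less likely words receive a larger share.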

## Exercise: Random Responses

Reusing the same **system context** and prompt as above, enable **sampling** (`do_sample=True`) and set the **temperature** to `1.0`, the top of the range described above. Generate 3 responses and check that each one is unique.

If you get stuck, see the solution below.

## Your Work Here

## Solution

In [None]:
print(generate(construct_prompt_with_context(prompt, system_context), do_sample=True, temperature=1))

In [None]:
print(generate(construct_prompt_with_context(prompt, system_context), do_sample=True, temperature=1))

In [None]:
print(generate(construct_prompt_with_context(prompt, system_context), do_sample=True, temperature=1))
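
If you would rather check uniqueness programmatically than by eye, one option is to collect the responses in a list and compare them. This is a minimal sketch: `all_unique` is a hypothetical helper, and the strings below are stand-ins for the outputs of three `generate(...)` calls.

```python
def all_unique(responses):
    """Return True when no two responses are exactly the same string."""
    return len(set(responses)) == len(responses)

# Stand-in strings; in the notebook these would come from generate(...).
example_responses = [
    "We rode to the ridge at sunrise and I have never smiled so much.",
    "A muddy afternoon on the trails turned into my favorite memory.",
    "We rode to the ridge at sunrise and I have never smiled so much.",
]

print(all_unique(example_responses))      # False: the first and third match
print(all_unique(example_responses[:2]))  # True
```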

## Creativity and Accuracy

As a final thought on `temperature`, it's worth noting that random generation, while helpful for unique or creative outputs, works against precision and accuracy. In contexts where the model must generate precise or factually accurate responses, take extra care to check its outputs whenever you increase its temperature.

## Key Concept Review

The following key concepts were introduced in this notebook:

- **Sampling:** A process in text generation where the language model selects the next word (token) based on a probability distribution over the vocabulary.
- **Temperature:** A hyperparameter that controls the level of randomness in the model's predictions during sampling. A higher temperature increases diversity, leading to more varied and creative responses, while a lower temperature makes the model's output more predictable and conservative.

## Optional Advanced Exercises

If you'd like to go above and beyond the requirements of the course, below are some additional open-ended exercises for you to try.

### Use the 7B Model

At the top of the notebook, after restarting the kernel (see cell below), uncomment and use the 7B model instead of the 13B model we demoed. Try to get satisfying results in spite of using the smaller (weaker) model.

### Make Interacting Characters

Now that you are able to generate statements from fictitious people, try to create a small system that creates more than one distinct personality and makes them interact with each other.
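
As one possible starting point, and by no means the only design, the sketch below alternates between personas and feeds each one the previous speaker's line. Everything here is an assumption made for illustration: `run_dialogue`, `echo_reply`, and the persona texts are hypothetical, and `echo_reply` is a stand-in so the loop runs without a model.

```python
def run_dialogue(personas, opening_line, reply_fn, turns=4):
    """Alternate between personas, feeding each the previous speaker's line."""
    transcript = []
    last_line = opening_line
    names = list(personas)
    for turn in range(turns):
        speaker = names[turn % len(names)]
        # reply_fn receives the speaker's system context and the last message.
        last_line = reply_fn(personas[speaker], last_line)
        transcript.append((speaker, last_line))
    return transcript

def echo_reply(system_context, message):
    """Hypothetical stand-in for a model call; returns a canned string."""
    return f"({system_context}) responding to: {message[:30]}"

personas = {
    "Optimist": "You see the bright side of everything.",
    "Skeptic": "You question every claim you hear.",
}

for speaker, line in run_dialogue(personas, "Is the Galaxy Rider worth it?", echo_reply):
    print(f"{speaker}: {line}")
```

In the notebook, you could replace `echo_reply` with a function that calls `generate(construct_prompt_with_context(message, system_context), do_sample=True, temperature=...)`, choosing a temperature that suits each character.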

### Make Interacting Characters Play a Game

Extend the previous exercise to create more than one character, all working toward an end goal defined by some "game" you create. This could be trying to get the other character to say a certain word, give away the secret location of a treasure they've hidden, or even something collaborative whereby the characters have to work together to achieve some goal. If you're really up for a challenge, you might even consider creating more than 2 characters, or even teams of players interacting with each other.

## Restart the Kernel

In order to free up GPU memory for the next notebook, please run the following cell to restart the kernel.

In [None]:
from IPython import get_ipython

get_ipython().kernel.do_shutdown(restart=True)
