![DLI Header](images/DLI_Header.png)

# Iterative Prompt Development

In this notebook we warm up by iterating on a set of simple prompts, familiarizing ourselves with the `transformers` pipeline and the LLaMA-2 model we will be using throughout the course.

By iteratively experimenting with seemingly simple prompts, we will begin to see the importance of creating prompts that are **specific** and that provide **cues**. We will also learn how to give the model **"time to think"** on tasks that it might find challenging.

## Learning Objectives

By the time you complete this notebook you will be able to:
- Use a `transformers` pipeline to generate responses from a LLaMA-2 LLM.
- Craft prompts that are **specific**.
- Craft prompts that give the model **"time to think"**.
- Provide **cues** to the model to guide its response.

## Video Walkthrough

Execute the cell below to load the video walkthrough of this notebook.

In [None]:
from IPython.display import HTML, display

video_url = "https://d36m44n9vdbmda.cloudfront.net/assets/s-fx-12-v1/v2/02-prompting.mp4"

video_html = f"""
<video controls width="640" height="360">
    <source src="{video_url}" type="video/mp4">
    Your browser does not support the video tag.
</video>
"""

display(HTML(video_html))

## Create LLaMA-2 Pipeline

In [None]:
from transformers import pipeline
model = "TheBloke/Llama-2-13B-chat-GPTQ"

llama_pipe = pipeline("text-generation", model=model, device_map="auto");

## Helper Functions

In this notebook we will use the following function to support our interaction with the LLM.

### Generate Model Responses

In [None]:
def generate(prompt, max_length=1024, pipe=llama_pipe, **kwargs):
    """
    Generates a response to the given prompt using a specified language model pipeline.

    This function takes a prompt and passes it to a language model pipeline, such as LLaMA, 
    to generate a text response. The function is designed to allow customization of the 
    generation process through various parameters and keyword arguments.

    Parameters:
    - prompt (str): The input text prompt to generate a response for.
    - max_length (int): The maximum total length in tokens, counting the prompt plus the generated text. Default is 1024.
    - pipe (callable): The language model pipeline function used for generation. Default is llama_pipe.
    - **kwargs: Additional keyword arguments that are passed to the pipeline function.

    Returns:
    - str: The generated text response from the model, trimmed of leading and trailing whitespace.

    Example usage:
    ```
    prompt_text = "Explain the theory of relativity."
    response = generate(prompt_text, max_length=512, pipe=my_custom_pipeline, temperature=0.7)
    print(response)
    ```
    """

    def_kwargs = dict(return_full_text=False, return_dict=False)
    response = pipe(prompt.strip(), max_length=max_length, **kwargs, **def_kwargs)
    return response[0]['generated_text'].strip()
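
To see the call shape and return shape that `generate` relies on without loading the model, here is a minimal sketch using a stand-in pipeline. The names `fake_pipe` and `generate_with` are hypothetical and exist only for this illustration. The stub mimics the list-of-dicts structure that `generate` indexes into; it is not the real LLaMA-2 pipeline.

```python
# Hypothetical stand-in for a text-generation pipeline. It is NOT the real
# LLaMA-2 pipeline, only a callable with the same call and return shape.
def fake_pipe(prompt, max_length=1024, **kwargs):
    # Record what the helper passed in, so the call can be inspected.
    fake_pipe.last_call = {"prompt": prompt, "max_length": max_length, **kwargs}
    # generate() expects a list of dicts keyed by 'generated_text'.
    return [{"generated_text": "  Sacramento \n"}]


def generate_with(pipe, prompt, max_length=1024, **kwargs):
    # Same logic as the generate helper, with the pipeline passed explicitly.
    def_kwargs = dict(return_full_text=False, return_dict=False)
    response = pipe(prompt.strip(), max_length=max_length, **kwargs, **def_kwargs)
    return response[0]["generated_text"].strip()


result = generate_with(fake_pipe, "  What is the capital of California?  ", temperature=0.7)
print(repr(result))
print(fake_pipe.last_call)
```

Because the defaults in `def_kwargs` are unpacked alongside `**kwargs`, passing `return_full_text` or `return_dict` yourself would raise a `TypeError` for a duplicated keyword argument.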

## The Capital of California

Let's begin with a very simple prompt, which we will pass to our `generate` function in order to get a response back from the LLaMA-2 model we are using. In this series of prompts we want the model to respond with the capital of the state of California, which is *Sacramento*.

Our hope for this experiment is to get back the exact response `"Sacramento"` without anything else in the response.

In [None]:
prompt = "What is the capital of California?"

print(generate(prompt))

---

The model did not understand that we only wanted the name of the capital city, without any other context, so let's craft a prompt that is more **specific**.

In [None]:
prompt = "What is the capital of California? Only answer this question and do so in as few a words as possible."

print(generate(prompt))

---

That is an improvement, but we are still getting a leading `Answer: ` in the response. Let's try to prevent this behavior by providing the model with the **cue** `Answer: `. Doing so may prevent the model from providing that text itself.

In [None]:
prompt = "What is the capital of California? Only answer this question and do so in as few a words as possible. Answer: "

print(generate(prompt))
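
The three prompts in this section differ only in what is appended to the question, so the progression can be sketched as a small builder. The helper `build_prompt` and its parameter names are hypothetical, introduced here only to make the structure visible; the strings are the ones used in the cells above.

```python
# Hypothetical helper: the name and parameters are illustrative only.
def build_prompt(question, instruction="", cue=""):
    """Join a question, an optional instruction, and an optional trailing cue."""
    parts = [question.strip()]
    if instruction:
        parts.append(instruction.strip())
    prompt = " ".join(parts)
    # The cue goes last, so the model continues from it rather than writing it.
    if cue:
        prompt = f"{prompt} {cue}"
    return prompt


question = "What is the capital of California?"
instruction = "Only answer this question and do so in as few a words as possible."

variants = {
    "plain": build_prompt(question),
    "specific": build_prompt(question, instruction),
    "specific + cue": build_prompt(question, instruction, cue="Answer: "),
}

for name, prompt in variants.items():
    print(f"{name}: {prompt!r}")
```

Note that `generate` calls `prompt.strip()`, so the trailing space after `Answer:` is removed before the prompt reaches the pipeline.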

## Vowels in Sacramento

In the following section we try to get the model to do something a little more complicated: tell us all the vowels in the name of the capital of California.

The correct answer is S**a**cr**a**m**e**nt**o** -> **aaeo** -> **aeo**. It's worth noting that in order to make this easy on myself (and you) I performed multiple steps to arrive at my answer.
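
The same multi-step derivation can be checked in plain Python. This is only a sketch for verifying the expected answer by hand; the variable names are illustrative and nothing here involves the model.

```python
capital = "Sacramento"

# Step 1: keep only the vowels, in the order they appear.
vowels_in_order = "".join(ch for ch in capital.lower() if ch in "aeiou")

# Step 2: drop repeats while preserving first-seen order.
unique_vowels = "".join(dict.fromkeys(vowels_in_order))

print(vowels_in_order)  # aaeo
print(unique_vowels)    # aeo
```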

In [None]:
prompt = "Tell me the vowels in the capital of California."

print(generate(prompt))

---

When models are faced with the need to reason in a way that requires multiple steps, it often helps to construct a prompt requesting that the model perform multiple intermediary steps, almost like asking it to show its work. This technique is often referred to as giving the model **"time to think"**.

The prompt below aims for the same end result, but asks the model to take the intermediate step of responding with the capital of California, before then responding with the vowels in it.

In [None]:
prompt = "Tell me the capital of California, and then tell me all the vowels in it."

print(generate(prompt))
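
One way to think about "time to think" prompts is as an ordered list of sub-tasks joined into a single instruction. The sketch below assumes a hypothetical helper, `chain_steps`, that is not part of the notebook; with the two steps shown it reproduces the prompt used above.

```python
# Hypothetical helper: joins ordered sub-tasks into one instruction.
def chain_steps(steps):
    """Return a single sentence that asks for each step in order."""
    if not steps:
        raise ValueError("at least one step is required")
    first, *rest = steps
    sentence = first
    for step in rest:
        # Each later step is explicitly sequenced after the previous one.
        sentence += f", and then {step}"
    return sentence + "."


steps = [
    "Tell me the capital of California",
    "tell me all the vowels in it",
]

stepwise_prompt = chain_steps(steps)
print(stepwise_prompt)
```

Keeping the steps in a list makes it easy to add or reorder intermediate steps while iterating on a prompt.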

---

Now that we see the effectiveness of giving the model **"time to think"**, let's try again with a slightly more complicated task: the vowels in the capital of California in reverse alphabetical order.

The correct answer is S**a**cr**a**m**e**nt**o** -> **aaeo** -> **aeo** -> **oea**
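
As a quick check of the expected answer, the whole chain can be computed with a set intersection followed by a reverse sort. This is a sketch for verification only, with illustrative variable names.

```python
capital = "Sacramento"

# Distinct vowels present in the name (sets discard the repeated "a").
distinct_vowels = set(capital.lower()) & set("aeiou")

# Sort the distinct vowels in reverse alphabetical order.
reversed_vowels = "".join(sorted(distinct_vowels, reverse=True))

print(reversed_vowels)  # oea
```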

In [None]:
prompt = "Tell me the vowels in the capital of California in reverse alphabetical order?"

print(generate(prompt))

---

In order to assist the model, let's again give it **"time to think"** by prompting it to break down the task into intermediate steps and show its work.

In [None]:
prompt = "Tell me the capital of California, and then tell me all the vowels in it, then tell me the vowels in reverse-alphabetical order."

print(generate(prompt))

## Exercise

While LLMs aren't necessarily the best tool for performing math, try the following as an exercise. Generate a response from the prompt below, which is intended to get the product of 23 and 34, and then iteratively develop a prompt that yields the correct answer. Be sure to consider how you can be **precise** in your prompt and how you can give the model **"time to think"**.

If you get stuck, a solution is provided below.

In [None]:
23*34 # Show the actual answer

In [None]:
prompt = "23x34" # While you and I understand the intention of this prompt, to the model it is not at all **precise**

print(generate(prompt))

### Your Work Here

### Solution

Click on the `...` to see a working solution.

In [None]:
prompt = "Calculate the product of 23 and 34. Use the steps typical of long multiplication and show your work."

print(generate(prompt))
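
When checking the work the model shows, it helps to have the intermediate values of long multiplication on hand. The sketch below computes the partial products for this specific pair of numbers. The variable names are illustrative, and the model may lay out its steps differently.

```python
a, b = 23, 34

# Split the multiplier into tens and ones, as in long multiplication.
tens, ones = divmod(b, 10)      # 3 tens, 4 ones
partial_ones = a * ones         # 23 x 4
partial_tens = a * tens * 10    # 23 x 30
product = partial_ones + partial_tens

print(f"{a} x {ones} = {partial_ones}")
print(f"{a} x {tens * 10} = {partial_tens}")
print(f"{partial_ones} + {partial_tens} = {product}")
```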

## Key Concept Review

The following key concepts were introduced in this notebook:

- **Specific** (precise): Being as explicit as necessary to guide the response of an LLM.
- **Cue**: Text placed at the end of a prompt that guides the model's response, often to prevent the model from generating that text itself.
- **"Time to think"**: A quality of prompts that improves LLM responses on multi-step tasks (often ones requiring calculation) by asking the model to take intermediate steps and show its work.

## Restart the Kernel

In order to free up GPU memory for the next notebook, please run the following cell to restart the kernel.

In [None]:
from IPython import get_ipython

get_ipython().kernel.do_shutdown(restart=True)
