![DLI Header](images/DLI_Header.png)

# Star Bikes Product Review Analyst

In this notebook you'll build an AI-powered document analyst, capable of performing **sentiment analysis** and generating data for downstream tasks. You will learn how to employ **few-shot learning** with a language model by providing it with instructive examples.

## Learning Objectives

By the time you complete this notebook you will be able to:
- Perform **sentiment analysis** on unstructured text using LLaMA-2.
- Explain what the LLaMA-2 **prompt template** is, and how it was used during **instruction fine-tuning**.
- Guide and improve model performance using **few-shot learning**.
- Use LLaMA-2 to generate JSON data for potential use in downstream processing tasks.

## Video Walkthrough

Execute the cell below to load the video walkthrough of this notebook.

In [None]:
from IPython.display import HTML, display

video_url = "https://d36m44n9vdbmda.cloudfront.net/assets/s-fx-12-v1/v2/03-analyst.mp4"

video_html = f"""
<video controls width="640" height="360">
    <source src="{video_url}" type="video/mp4">
    Your browser does not support the video tag.
</video>
"""

display(HTML(video_html))

## Create LLaMA-2 Pipeline

In [None]:
from transformers import pipeline
model = "TheBloke/Llama-2-13B-chat-GPTQ"
# model = "TheBloke/Llama-2-7B-chat-GPTQ"

llama_pipe = pipeline("text-generation", model=model, device_map="auto");

## Helper Functions

In this notebook we will use the following functions to support our interaction with the LLM. Feel free to skim them for now; they are covered in greater detail where they are used below.

### Generate Model Responses

In [None]:
def generate(prompt, max_length=1024, pipe=llama_pipe, **kwargs):
    """
    Generates a response to the given prompt using a specified language model pipeline.

    This function takes a prompt and passes it to a language model pipeline, such as LLaMA, 
    to generate a text response. The function is designed to allow customization of the 
    generation process through various parameters and keyword arguments.

    Parameters:
    - prompt (str): The input text prompt to generate a response for.
    - max_length (int): The maximum total length in tokens, counting both the prompt and the generated response. Default is 1024.
    - pipe (callable): The language model pipeline function used for generation. Default is llama_pipe.
    - **kwargs: Additional keyword arguments that are passed to the pipeline function.

    Returns:
    - str: The generated text response from the model, trimmed of leading and trailing whitespace.

    Example usage:
    ```
    prompt_text = "Explain the theory of relativity."
    response = generate(prompt_text, max_length=512, pipe=my_custom_pipeline, temperature=0.7)
    print(response)
    ```
    """

    def_kwargs = dict(return_full_text=False, return_dict=False)
    response = pipe(prompt.strip(), max_length=max_length, **kwargs, **def_kwargs)
    return response[0]['generated_text'].strip()

### Create Prompt With Examples (Few-shot Learning)

In [None]:
def prompt_with_examples(prompt, examples=[]):
    """
    Constructs a structured prompt string for language models with instructional examples.

    This function takes an initial prompt and a list of example prompt-response pairs, then 
    formats them into a single string enclosed by special start and end tokens used for 
    instructing the model. Each example is included in the final prompt, which could be 
    beneficial for models that take into account the context provided by examples.

    Parameters:
    - prompt (str): The main prompt to be processed by the language model.
    - examples (list of tuples): A list where each tuple contains a pair of strings 
      (example_prompt, example_response). Default is an empty list.

    Returns:
    - str: A string with the structured prompt and examples formatted for a language model.
    
    Example usage:
    ```
    main_prompt = "Translate the following sentence into French:"
    example_pairs = [("Hello, how are you?", "Bonjour, comment ça va?"),
                     ("Thank you very much!", "Merci beaucoup!")]
    formatted_prompt = prompt_with_examples(main_prompt, example_pairs)
    print(formatted_prompt)
    ```
    """
    
    # Start with the initial part of the prompt
    full_prompt = "<s>[INST]\n"

    # Add each example to the prompt
    for example_prompt, example_response in examples:
        full_prompt += f"{example_prompt} [/INST] {example_response} </s><s>[INST]"

    # Add the main prompt and close the template
    full_prompt += f"{prompt} [/INST]"

    return full_prompt

## Data - Starlight Cruiser Reviews

The following are customer reviews for the Starlight Cruiser bicycle, a model offered by the fictitious bike company Star Bikes. We will provide these reviews to the LLM so that it can perform **sentiment analysis** and other analytical tasks.

**Neutral Review**

In [None]:
review = """
I recently purchased the Starlight Cruiser from Star Bikes, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
However, I did find the seat a bit uncomfortable for longer rides. \
Also, the color options could be better. Despite these minor drawbacks, \
the build quality and the performance of the bike are commendable. It's a good value for the money.
"""

**Negative Review**

In [None]:
review_negative = """
Got the Starlight Cruiser last week, and I'm a bit disappointed. \
The brakes are not as responsive as I'd like and the gears often get stuck. \
The design is good but performance-wise, it leaves much to be desired.
"""

**Positive Review**

In [None]:
review_positive = """
I recently purchased the Starlight Cruiser from Star Bikes, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
The seat was very comfortable for longer rides and the color options were great. \
The build quality and the performance of the bike are commendable. It's a good value for the money.
"""

## Sentiment Analysis

We will begin by asking the model to perform **sentiment analysis** by telling us the overall sentiment of one of our reviews. Ultimately we would like the model to provide us with a response of just a single word, either `positive`, `negative`, or `neutral`.
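Since the goal is a response that is exactly one of three labels, it can help to check responses programmatically as we iterate. Below is a minimal sketch; `SENTIMENT_LABELS` and `parse_sentiment` are hypothetical names that the rest of the notebook does not use. It tolerates differences in case and surrounding quotes or a trailing period, but returns `None` for anything chattier, such as a full sentence.

```python
# Hypothetical helper (not part of the original notebook): checks whether a
# model response is exactly one of the allowed sentiment labels.
SENTIMENT_LABELS = ("positive", "negative", "neutral")


def parse_sentiment(response):
    """Return the label if the response is a single allowed word, else None."""
    # Strip whitespace, then surrounding periods and quotes, then lowercase.
    cleaned = response.strip().strip('."\'').strip().lower()
    return cleaned if cleaned in SENTIMENT_LABELS else None


print(parse_sentiment("Positive"))  # positive
print(parse_sentiment("The overall sentiment of the review is positive."))  # None
```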

Just as a heads up, the following cell is going to result in a very strange output.

In [None]:
prompt = f"""
What is the overall sentiment of {review}
"""

print(generate(prompt))

---

It's not entirely clear why the model produced such junk for the prompt above, but the result underscores the importance of iterating on prompts and brings the idea of **precision** back to the forefront. Let's make a very small update to the prompt by adding a `?` to the end, which should make it clearer to the model that we are asking a question and expect an answer.

In [None]:
prompt = f"""
What is the overall sentiment of {review}?
"""

print(generate(prompt))

---

The `?` made a huge difference! This example serves to remind us that minor tweaks to prompts can sometimes lead to drastic changes in the model's responses.

Given that our goal here is to get a single word response back from the model, let's iterate on the prompt to be more **specific** about the response we would like.

In [None]:
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "positive", "negative", or "neutral".
"""

print(generate(prompt))

---

The model continues to be more helpful than we would like, providing us with more than just the single word response we are aiming for. Let's iterate on the prompt one more time by providing it a **cue** for its response.

In [None]:
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "positive", "negative", or "neutral".
Overall Sentiment: 
"""

print(generate(prompt))

---

That's much better. However, when I read the review myself, I have to say, I might categorize the review as "neutral" rather than "positive". Here's the review again for reference:

In [None]:
review = """
I recently purchased the Starlight Cruiser from Star Bikes, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
However, I did find the seat a bit uncomfortable for longer rides. \
Also, the color options could be better. Despite these minor drawbacks, \
the build quality and the performance of the bike are commendable. It's a good value for the money.
"""

Again, to emphasize how minor changes can drastically impact a model's behavior, let's slightly modify the prompt by changing the order of the options the model has to choose from.

In [None]:
# NOTE: 'Choose one of "neutral", "negative", or "positive".' is in a different order than the prompt immediately above
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "neutral", "negative", or "positive".
Overall Sentiment: 
"""

print(generate(prompt))

---

Just to conclude this minor experiment, let's try yet another order.

In [None]:
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment: 
"""

print(generate(prompt))

---
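Rather than editing the prompt by hand for each ordering, the experiment above can be sketched as a loop over all six orderings of the options. In this sketch, `LABEL_OPTIONS`, `ordered_sentiment_prompt`, and `compare_label_orders` are hypothetical names, and the generation function is passed in as `generate_fn` so the code does not depend on a loaded model.

```python
from itertools import permutations

LABEL_OPTIONS = ("negative", "neutral", "positive")


def ordered_sentiment_prompt(review, label_order):
    """Build the sentiment prompt used above, listing the options in the given order."""
    first, second, third = label_order
    options = f'"{first}", "{second}", or "{third}"'
    return (
        f"\nWhat is the overall sentiment of this review {review}?\n\n"
        f"Choose one of {options}.\n"
        "Overall Sentiment: \n"
    )


def compare_label_orders(review, generate_fn):
    """Call generate_fn once per ordering of the options; return {ordering: response}."""
    return {
        order: generate_fn(ordered_sentiment_prompt(review, order))
        for order in permutations(LABEL_OPTIONS)
    }
```

In the notebook, `compare_label_orders(review, generate)` would call the model six times and return a dict keyed by option order. If the labels differ across orderings, that is the same sensitivity the three cells above probe by hand; if generation uses sampling, responses may also vary from run to run.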

Our prompt does not currently give us confidence that we are going to get meaningful responses from the model. In order to get more reliable, trustworthy responses, let's turn our attention to an important technique, **few-shot learning**, which will allow us to provide instructive examples to the model about how it ought to behave.

## Provide Examples: Few-shot Learning

Depending on the number of examples provided, this technique is referred to as **one-shot learning**, **two-shot learning**, **three-shot learning**, and so on, or more generally as **few-shot learning**, **many-shot learning**, or **one-to-many shot learning**. In each case, a **shot** is an example prompt/response pair provided to the model to help guide its behavior.

The shots are typically prepended to the main prompt we want the model to respond to. Depending on the model being used, there are specific ways to format the shots that help the model recognize them as prompt/response examples.
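To make the idea of prepending shots concrete independently of any particular model, here is a minimal sketch using a plain `Review:`/`Sentiment:` layout. This layout is an illustrative choice, not a format LLaMA-2 was trained on (the LLaMA-2-specific format is covered in the sections that follow), and `few_shot_prompt` and `demo_shots` are hypothetical names.

```python
def few_shot_prompt(examples, query):
    """Prepend (input, output) example pairs ("shots") to a query.

    Uses a hypothetical plain-text layout; chat models such as LLaMA-2-chat
    expect their own prompt template instead.
    """
    shots = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    # The query uses the same layout but leaves the label for the model to fill in.
    return "\n\n".join(shots + [f"Review: {query}\nSentiment:"])


demo_shots = [
    ("The brakes are weak and the gears stick.", "negative"),
    ("Smooth ride, comfortable seat, great value.", "positive"),
]
print(few_shot_prompt(demo_shots, "Nice design, but the seat hurts on long rides."))
```

With two example pairs, this is a two-shot prompt; adding a third pair would make it three-shot.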

## Instruction Fine-tuned (Chat) Models

If you have ever browsed model repositories, say on Hugging Face, you may have noticed that [some models are `-chat` variants](https://huggingface.co/models?sort=trending&search=meta-llama%2FLlama-2-13b). These models (for example, `Llama-2-13b-chat`) have received additional training on top of a base pretrained model (for example, `Llama-2-13b`), typically to make them better at following instructions for use in chat applications. They have been **instruction fine-tuned**.

While pretrained base LLMs are typically well-suited to generating text by modeling the probability of what comes next given some input, simply producing the most likely continuation is not the same as responding to a question or instruction.

During **instruction fine-tuning**, models are trained on a large number of example interactions between a user and a model, so that the model learns to follow instructions and respond appropriately in a chat dialogue.

## The LLaMA-2 Prompt Template

Depending on the instruction fine-tuned model (that is, the chat variant), the example interactions that it was **instruction fine-tuned** on will be formatted in different ways. The template used for this training process is called the **prompt template**. You can typically find a given model's prompt template in its documentation.

Below is the LLaMA-2 prompt template. It is actually a slightly simplified version, excluding a component that we will be looking at later in the course.

```
<s>[INST] {{ user_msg_1 }} [/INST] {{ model_answer_1 }} </s>
```

Let's dissect this **prompt template**:
- A single user/model interaction is contained between the `<s>` and `</s>` tags.
- The user part of the user/model interaction is contained between the `[INST]` and `[/INST]` tags.
- The model part of the user/model interaction follows the `[/INST]` tag and ends at the interaction-concluding `</s>` tag.
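Filling in this simplified template is plain string formatting. Below is a minimal sketch for a single interaction, where `format_interaction` and `format_open_turn` are hypothetical helper names; the notebook's own `prompt_with_examples` handles the case of several examples followed by a main prompt.

```python
def format_interaction(user_msg, model_answer):
    """Fill the simplified LLaMA-2 template for one complete user/model interaction."""
    # <s> ... </s> wraps the whole interaction; [INST] ... [/INST] wraps the user part.
    return f"<s>[INST] {user_msg} [/INST] {model_answer} </s>"


def format_open_turn(user_msg):
    """Leave the model's part (and the closing </s>) for the model to generate."""
    return f"<s>[INST] {user_msg} [/INST]"


print(format_interaction("Give me a color that starts with 'O'.", "ORANGE"))
# <s>[INST] Give me a color that starts with 'O'. [/INST] ORANGE </s>
```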

During **instruction fine-tuning**, the model was given many user/model examples using this **prompt template**. With that in mind, we can leverage its training and use this same **prompt template** to provide our own instructive examples to the model for how it ought to behave.

## Providing Instructive Examples

The following `prompt_with_examples` function will help us construct prompts that include prepended instructive examples, using the LLaMA-2 **prompt template**.

In [None]:
def prompt_with_examples(prompt, examples=[]):
    """
    Constructs a structured prompt string for language models with instructional examples.

    This function takes an initial prompt and a list of example prompt-response pairs, then 
    formats them into a single string enclosed by special start and end tokens used for 
    instructing the model. Each example is included in the final prompt, which could be 
    beneficial for models that take into account the context provided by examples.

    Parameters:
    - prompt (str): The main prompt to be processed by the language model.
    - examples (list of tuples): A list where each tuple contains a pair of strings 
      (example_prompt, example_response). Default is an empty list.

    Returns:
    - str: A string with the structured prompt and examples formatted for a language model.
    
    Example usage:
    ```
    main_prompt = "Translate the following sentence into French:"
    example_pairs = [("Hello, how are you?", "Bonjour, comment ça va?"),
                     ("Thank you very much!", "Merci beaucoup!")]
    formatted_prompt = prompt_with_examples(main_prompt, example_pairs)
    print(formatted_prompt)
    ```
    """
    
    # Start with the initial part of the prompt
    full_prompt = "<s>[INST]\n"

    # Add each example to the prompt
    for example_prompt, example_response in examples:
        full_prompt += f"{example_prompt} [/INST] {example_response} </s><s>[INST]"

    # Add the main prompt and close the template
    full_prompt += f"{prompt} [/INST]"

    return full_prompt

## One-Shot Learning Example

Let's briefly step away from our sentiment analysis task, and use a simple text generation prompt to explore how we can use the `prompt_with_examples` function to provide an instructive example, or put another way, perform **one-shot learning**.

In [None]:
example_prompt = "Give me an all uppercase color that starts with the letter 'O'."
example_response = "ORANGE"

# The function expects prompt/response pairs to be 2-tuples
example_1 = (example_prompt, example_response)

# The function expects all prompt/response 2-tuples to be in a list
examples = [example_1]

# This is the "main" prompt we actually want the model to respond to
prompt = "Give me an all uppercase color that starts with the letter 'P'."

# Use `prompt_with_examples` to create a one-shot learning prompt, with a single example prepended to our main prompt
prompt_with_one_example = prompt_with_examples(prompt, examples)

In [None]:
print(prompt_with_one_example)

---

`prompt_with_one_example` above includes a single user/model interaction (`<s>...</s>`), using the LLaMA-2 **prompt template**, prepended to the main prompt. Note that the main prompt only includes the user part of the interaction (between the `[INST]` and `[/INST]` tags) and leaves the rest of the interaction (the model's response and the `</s>` tag) for the model to complete.
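The same interactions can also be represented as a list of role/content messages, a structure many chat tools use. The sketch below converts the notebook's `(example_prompt, example_response)` 2-tuples into such a list; `examples_to_messages` is a hypothetical helper. Hugging Face tokenizers provide an `apply_chat_template` method that can render a list like this into a model's prompt template when the tokenizer defines a chat template; whether the GPTQ checkpoint loaded above includes one is an assumption you would need to check.

```python
def examples_to_messages(prompt, examples=None):
    """Convert (example_prompt, example_response) pairs plus a main prompt into chat messages."""
    messages = []
    for example_prompt, example_response in examples or []:
        messages.append({"role": "user", "content": example_prompt})
        messages.append({"role": "assistant", "content": example_response})
    # The main prompt is the final user turn; the model supplies the next assistant turn.
    messages.append({"role": "user", "content": prompt})
    return messages


one_shot = examples_to_messages(
    "Give me an all uppercase color that starts with the letter 'P'.",
    [("Give me an all uppercase color that starts with the letter 'O'.", "ORANGE")],
)
for message in one_shot:
    print(message["role"], "->", message["content"])
```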

Before using `prompt_with_one_example`, let's see what kind of response we get back from the model when we pass in just the main prompt, without an instructive example.

In [None]:
print(generate(prompt))

---

While we got `PURPLE`, we also got an additional chat-like response before the output we want.

Now let's get a response using `prompt_with_one_example`, which contains an example of how the model should respond with only the word for the color we are interested in.

In [None]:
print(generate(prompt_with_one_example))

## Sentiment Analysis With Examples

When we left off with our sentiment analysis task, we lacked confidence that the model would correctly label a neutral review. Let's apply what we just learned about **one-shot learning** to provide the model with an instructive example of responding to what we, as humans, would consider a neutral review.

Here is an example review we would like classified as neutral: while clearly not negative, it contains both positive and negative sentiments about the bike.

In [None]:
example_neutral_review = """
I've had the chance to put several miles on my new Starlight Cruiser from Star Bikes. 
First off, the bike's design is sleek, and it provides an exceptionally stable ride, 
even when navigating the bustle of city streets. The gear shifting is fluid, 
and the bike feels robust, promising longevity. On the downside, the braking system, 
while reliable, lacks the responsiveness I've experienced with other bikes. 
I also noticed that the handlebar grips can become rather uncomfortable on prolonged journeys. 
Nevertheless, these issues aside, the bike offers impressive performance for its price range, 
making it a solid, middle-of-the-road choice for both commuting and leisure rides.
"""

We will use the example review to construct an `examples` list we can pass into the `prompt_with_examples` function.

In [None]:
example_prompt_neutral = f"""
What is the overall sentiment of this review {example_neutral_review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment: 
"""
example_response_neutral = "neutral"

example_neutral = (example_prompt_neutral, example_response_neutral)
examples = [example_neutral]

Now we construct the main prompt, which again uses the review from above that we hope will be classified as neutral.

In [None]:
prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment: 
"""

prompt_with_one_example = prompt_with_examples(prompt, examples)

In [None]:
print(generate(prompt_with_one_example))

---

Unfortunately, the model is still labeling the review as `"Positive"`.

## Two-shot Learning

Often, when **one-shot learning** is not sufficient to get the behavior we want, we can include additional examples to further guide the model.

In our case, in addition to the neutral example we have already given the model, let's also provide an example of a positive review, in the hope that the difference between the two will become clearer to the model.

In [None]:
example_review_positive = """
I've been absolutely delighted with my Starlight Cruiser purchase from Star Bikes. 
The bike exudes a charm with its sleek design that turns heads as I glide through city lanes. 
It's not just about looks though; the bike performs wonderfully. The gears shift like a dream, 
making for a ride that's as smooth as silk across various urban terrains. I was initially skeptical 
about the comfort of the seat, but it proved to be pleasantly supportive, even on my longer weekend adventures. 
While the color choices were limited, I found one that suited my style perfectly. 
Any minor imperfections pale in comparison to the bike's overall quality and the sheer joy it brings to my daily commutes. 
For the price, the Starlight Cruiser is an undeniable gem that I would happily recommend.
"""

In [None]:
example_prompt_positive = f"""
What is the overall sentiment of this review {example_review_positive}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment: 
"""
example_response_positive = "positive"

In [None]:
main_prompt = f"""
What is the overall sentiment of this review {review}?

Choose one of "negative", "neutral", or "positive".
Overall Sentiment: 
"""

### Exercise: Perform Two-shot Learning

Perform **two-shot learning**, providing the model with both a neutral and a positive example interaction before prompting it for a response to `review` which we are hoping it will classify as `neutral`.

- Use `example_neutral` (already defined above) as one example.
- Use `example_review_positive`, `example_prompt_positive` and `example_response_positive` above to construct a positive user/model interaction example.
- Use both examples (neutral and positive) along with `main_prompt` above, to construct a prompt with two examples (using the `prompt_with_examples` function).
- Generate and print a model response using your prompt with two examples.

If you get stuck, there is a working solution below.

### Your Work Below

### Solution

Click on the ... to see a working solution.

In [None]:
example_positive = (example_prompt_positive, example_response_positive)
examples = [example_neutral, example_positive]

prompt_with_two_examples = prompt_with_examples(main_prompt, examples)

print(generate(prompt_with_two_examples))
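Since the same sentiment prompt text is repeated for every shot, a small helper can build each shot from a `(review, label)` pair. This is a minimal sketch; `build_sentiment_prompt` and `sentiment_examples` are hypothetical names, and the prompt text mirrors the one used in the cells above.

```python
def build_sentiment_prompt(review):
    """Build the sentiment prompt used throughout this section for one review."""
    return (
        f"\nWhat is the overall sentiment of this review {review}?\n\n"
        'Choose one of "negative", "neutral", or "positive".\n'
        "Overall Sentiment: \n"
    )


def sentiment_examples(labeled_reviews):
    """Turn (review, label) pairs into the 2-tuples that prompt_with_examples expects."""
    return [(build_sentiment_prompt(review), label) for review, label in labeled_reviews]
```

For example, `prompt_with_examples(build_sentiment_prompt(review), sentiment_examples([(example_neutral_review, "neutral"), (example_review_positive, "positive")]))` should give an equivalent two-shot prompt, and a negative shot could be added by appending one more pair.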

## Generate Data for Downstream Consumption

Now that our model is able to perform **sentiment analysis** effectively, let's extend its analysis capabilities to be able to generate JSON objects for downstream consumption that contain a given review's positive and negative points.

We will begin iterating on a prompt by simply asking the model to separately list out the positive and negative points in a review.

In [None]:
prompt = f"""
From the review, list the positive points and negative points separately: {review}
"""

print(generate(prompt))

---

The model did quite well. Let's iterate now to try to get the model to produce a JSON object for us.

In [None]:
prompt = f"""
From the review, list down the positive points and negative points separately, in JSON: {review}
"""

print(generate(prompt))

---

That did not appear to make much of a difference. Let's be more **precise** in our prompt about how we want the data formatted. (Note: we must use double curly braces `{{` and `}}` rather than single curly braces `{` and `}` below because we are using a Python f-string, which interprets single curly braces as placeholders for Python values to be interpolated.)

In [None]:
prompt = f"""
From the review below, list down the positive points and negative points separately, in JSON. Use the following format:

{{"positive": [], "negative": []}}

Review: {review}
"""

print(generate(prompt))

---

This also did not seem to make much of a difference.
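As a side note on the double curly braces in the previous prompt: in an f-string, `{{` and `}}` produce literal `{` and `}`, while single braces interpolate values. An alternative, sketched below, is to build the format hint with `json.dumps` and interpolate the result, which avoids escaping altogether; `review_text`, `escaped`, and `format_hint` are hypothetical names.

```python
import json

review_text = "Smooth ride, but the seat is uncomfortable."

# Doubled braces are escapes: this f-string contains literal { and }.
escaped = f'{{"positive": [], "negative": []}}'

# Alternative: let json.dumps produce the braces, then interpolate the result.
format_hint = json.dumps({"positive": [], "negative": []})

print(escaped)      # {"positive": [], "negative": []}
print(format_hint)  # {"positive": [], "negative": []}
print(f"Use the following format:\n\n{format_hint}\n\nReview: {review_text}")
```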

## Exercise: Successful JSON Generation

Using what you have learned so far, complete our task of getting the model to produce JSON output. In case you'd like to provide the model with instructive examples, two example reviews, along with their corresponding JSON outputs, are provided below.

Also below is a `pretty_print_json` helper function, to which you can pass the response from the LLM. If the response is valid JSON, the function prints it with readable indentation.

If you get stuck, there are 2 solutions below. They include several cells which are currently hidden. You can display them by clicking on the `+ N cells hidden` buttons.

In [None]:
example_reviews = [
"""\
I recently purchased the Starlight Cruiser from Star Bikes, and I've been thoroughly impressed. \
The ride is smooth and it handles urban terrains with ease. \
However, I did find the seat a bit uncomfortable for longer rides. \
Also, the color options could be better. Despite these minor drawbacks, \
the build quality and the performance of the bike are commendable. It's a good value for the money.\
""",
"""\
Got the Starlight Cruiser last week, and I'm a bit disappointed. \
The brakes are not as responsive as I'd like and the gears often get stuck. \
The design is good but performance-wise, it leaves much to be desired.\
"""
]

In [None]:
example_outputs = [
    {
        "positive": ["smooth ride", "ease of handling urban terrains", "good value for the money"],
        "negative": ["seat uncomfortable for longer rides", "limited color options"]
    },
    {
        "positive": ["good design"],
        "negative": ["brakes not responsive", "gears often get stuck", "performance leaves much to be desired"]
    }
]

In [None]:
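import json


def pretty_print_json(json_string):
    """Parse a JSON string and print it with 4-space indentation (raises if the string is not valid JSON)."""
    print(json.dumps(json.loads(json_string), indent=4))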

### Solution 1

Create a list of prompt/response 2-tuple examples. Note that in this implementation we simply use the review as the prompt, without any additional instructions.

In [None]:
examples = [(example_review, json.dumps(example_output)) for example_review, example_output in zip(example_reviews, example_outputs)]

Check the formatting of our examples.

In [None]:
examples

---

Use `review` as our prompt.

In [None]:
prompt = review

Generate a response.

In [None]:
json_response = generate(prompt_with_examples(prompt, examples))

In [None]:
pretty_print_json(json_response)

### Solution 2

While *Solution 1* demonstrates an effective use of **two-shot learning**, in line with this notebook's objectives, it's worth mentioning that I was also able to get a working solution by adding a **cue** of `JSON:` to the prompt we were iterating on earlier.

In [None]:
prompt = f"""
From the review below, list down the positive points and negative points separately, in JSON. Use the following format:

{{"positive": [], "negative": []}}

Review: {review}
JSON:
"""

pretty_print_json(generate(prompt))
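Even with a cue, a chat model may occasionally wrap its JSON in extra prose, in which case `pretty_print_json` raises an error. Below is a minimal defensive sketch, assuming the response contains a single JSON object: `extract_json_object` (a hypothetical helper) parses the span from the first `{` to the last `}` and checks for the `positive` and `negative` keys our format asks for.

```python
import json


def extract_json_object(response):
    """Parse the span from the first '{' to the last '}' in a model response.

    Assumes the response holds a single JSON object, possibly surrounded by
    extra prose; raises ValueError if no valid object with the expected keys is found.
    """
    start = response.find("{")
    end = response.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in response")
    # json.JSONDecodeError is a subclass of ValueError.
    data = json.loads(response[start : end + 1])
    # Check that the keys downstream code relies on are present.
    missing = {"positive", "negative"} - set(data)
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    return data


chatty = 'Sure! Here is the JSON:\n{"positive": ["smooth ride"], "negative": ["seat"]}\nHope this helps.'
print(extract_json_object(chatty))  # {'positive': ['smooth ride'], 'negative': ['seat']}
```

In the notebook, `extract_json_object(generate(prompt))` could be used wherever the raw response is not guaranteed to be pure JSON.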

## Key Concept Review

The following key concepts were introduced in this notebook:

- **Sentiment Analysis:** Identifying the mood or sentiment of a piece of text.
- **Instruction Fine-Tuning:** Additional training of a pretrained base model on example user/model interactions, so that it follows instructions and responds well in chat contexts.
- **LLaMA-2 Prompt Template:** The format of the example interactions used during LLaMA-2's instruction fine-tuning, which we can reuse to format our own prompts and examples.
- **Few-shot Learning:** Prepending one or more instructive examples to a prompt to improve the model's responses.

## Optional Advanced Exercises

If you'd like to go above and beyond the requirements of the course, below are some additional open-ended exercises for you to try.

### Varied Output Data

We were successful in generating JSON. Try to generate different forms of data, like HTML tables.

### Use the 7B Model

At the top of the notebook, after restarting the kernel (see the cell below), uncomment the 7B model line and use it in place of the 13B model we demoed. Try to get satisfying results through prompt engineering despite using the smaller (weaker) model.

### Reuse the Model for Additional Analysis

One of the superpowers of LLMs like LLaMA-2 is that they are capable of performing many tasks for which we may historically have needed many different models.

Suppose you have gathered "positive" and "negative" points for many reviews. You might find many items that you recognize as similar but that the model recorded as different strings, for example "Nice tires" and "Awesome tires". Can you reuse the model to produce data in which items like these, which you and I recognize as the same, are recorded as the same?

What other kinds of analysis would you be interested to perform?

## Restart the Kernel

In order to free up GPU memory for the next notebook, please run the following cell to restart the kernel.

In [None]:
from IPython import get_ipython

get_ipython().kernel.do_shutdown(restart=True)
