# Prompting Workshop with Weights and Biases - [Anish Shah](https://www.linkedin.com/in/anish-shah/)

In [1]:
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

In [2]:
import os
from IPython.display import Markdown

Make sure you have the appropriate API keys for the models you want to run.

In [3]:
if IN_COLAB:
    from google.colab import userdata
    os.environ["WANDB_API_KEY"] = userdata.get('WANDB_API_KEY')
    os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
    os.environ["ANTHROPIC_API_KEY"] = userdata.get('ANTHROPIC_API_KEY')
else:
    !pip install -q python-dotenv
    from dotenv import load_dotenv
    load_dotenv()

In [4]:
!pip install weave -U -q
!pip install litellm -q

Begin Weave experiment tracking.

In [5]:
%%capture
import weave
from litellm import completion


In [6]:
ANTHROPIC_SMART_MODEL_NAME = "claude-3-opus-20240229"
ANTHROPIC_FAST_MODEL_NAME = "claude-3-haiku-20240307"
OPENAI_SMART_MODEL_NAME = "gpt-4"
OPENAI_FAST_MODEL_NAME = "gpt-3.5-turbo"

In [7]:
MODEL_NAME = ANTHROPIC_FAST_MODEL_NAME

In [8]:
weave.init("prompting-workshop")

Logged in as W&B user a-sh0ts.
View Weave data at https://wandb.ai/a-sh0ts/prompting-workshop/weave




In [9]:
@weave.op()
def get_completion(system_message, messages, model_name=MODEL_NAME, max_tokens=4096, temperature=0):
    response = completion(
        model=model_name,
        max_tokens=max_tokens,
        temperature=temperature,  # 0 is a good default for evals and RAG systems
        system=system_message,
        messages=messages
    )
    response_dict = response.json()  # convert once, then reuse for printing and returning
    print(response_dict)
    return response_dict

Use case: build a bot that helps us understand the prompting material and answers questions based on the information provided.

Step 1: Raw prompting

In [10]:
@weave.op()
def prompt_llm(question, system_message=""):
    return get_completion(system_message=system_message, messages=[{"role": "user", "content": question}])["choices"][0]["message"]["content"]
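The chained lookup `["choices"][0]["message"]["content"]` relies on the response dictionary having the shape printed in the cell outputs of this notebook. As a minimal sketch, the same lookup can be wrapped in a helper that fails with a clearer message when no choices come back. The name `extract_content` and the sample dictionary are illustrative assumptions, not part of the workshop code or of litellm.

```python
def extract_content(response):
    """Return the assistant text from a chat-completion style dict.

    Hypothetical helper: assumes the dict layout shown in the notebook
    outputs, i.e. response["choices"][0]["message"]["content"].
    """
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written sample that mimics the printed response shape (not real model output)
sample_response = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Hello!", "role": "assistant"},
        }
    ],
    "model": "claude-3-haiku-20240307",
}

print(extract_content(sample_response))  # Hello!
```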

In [11]:
raw_prompt_response = prompt_llm(
    "What are the latest prompting techniques?"
)

{'id': 'chatcmpl-bf404355-7188-46bd-be8e-74eeaf1d49e1', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': "I do not actually have detailed knowledge about the latest prompting techniques. As an AI assistant created by Anthropic, I don't have insider information about the company's research and development. The field of prompt engineering is rapidly evolving, with new techniques and best practices emerging all the time. I would suggest checking reputable sources and AI research publications to stay up-to-date on the latest advancements in this area.", 'role': 'assistant', 'tool_calls': []}}], 'created': 1712685902, 'model': 'claude-3-haiku-20240307', 'object': 'chat.completion', 'system_fingerprint': None, 'usage': {'prompt_tokens': 15, 'completion_tokens': 90, 'total_tokens': 105}}
🍩 https://wandb.ai/a-sh0ts/prompting-workshop/r/call/9f9dbef9-296c-4f74-9ba7-a250fd1fa3f1


In [12]:
Markdown(raw_prompt_response)

I do not actually have detailed knowledge about the latest prompting techniques. As an AI assistant created by Anthropic, I don't have insider information about the company's research and development. The field of prompt engineering is rapidly evolving, with new techniques and best practices emerging all the time. I would suggest checking reputable sources and AI research publications to stay up-to-date on the latest advancements in this area.

Understandably, the model lacks the context it needs to answer the question. It can only work from the information provided in the prompt, which in this case is nothing.

Step 2. Prompting with context

We can put the context directly alongside the question. I will use [this website by Aman Chadha](https://aman.ai/primers/ai/prompt-engineering/), which usefully condenses many great papers and articles into a comprehensive single-page guide to different prompting techniques.

In [13]:
def load_markdown_file(file_path):
    with open(file_path, 'r', encoding='utf-8') as file:
        markdown_content = file.read()
    return markdown_content

In [14]:
context = load_markdown_file('prompt_engineering.md')

Note: Anthropic models have a very large context window, so in this situation we can simply put the whole document into the prompt. This can get quite expensive, however, so it typically makes more sense to chunk the document into smaller pieces or use a RAG-based pipeline.
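To make the chunking idea concrete, here is a minimal sketch that splits a long string into overlapping character windows. The function name `chunk_text` and the default sizes are assumptions chosen for illustration. Real pipelines often split on headings or token counts instead, and this workshop does not use chunking.

```python
def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into character windows of at most max_chars.

    Consecutive windows share `overlap` characters so that a sentence cut
    at a boundary still appears whole in one of the chunks.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")

    chunks = []
    step = max_chars - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once this window already reaches the end of the text
        if start + max_chars >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", max_chars=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

With the document loaded above, `chunk_text(context)` would return a list of strings that could each be sent in a separate, cheaper request.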


In [15]:
context_prompt_response = prompt_llm(
    context + "\n\nWhat are the latest prompting techniques and provide an example of each"
)

{'id': 'chatcmpl-01487437-ddcf-4aae-bc91-648644826227', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': 'Here are some of the latest prompting techniques and examples:\n\n1. **Zero-shot Prompting**:\nExample:\nPrompt: Classify the text into neutral, negative, or positive. Text: I think the vacation is okay.\nOutput: Neutral\n\n2. **Few-shot Prompting**:\nExample: \nPrompt: A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus. To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:\nOutput: When we won the game, we all started to farduddle in celebration.\n\n3. **Chain-of-Thought (CoT) Prompting**:\nExample:\nPrompt: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.\nOutput: Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.\n\n

In [16]:
Markdown(context_prompt_response)

Here are some of the latest prompting techniques and examples:

1. **Zero-shot Prompting**:
Example:
Prompt: Classify the text into neutral, negative, or positive. Text: I think the vacation is okay.
Output: Neutral

2. **Few-shot Prompting**:
Example: 
Prompt: A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus. To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output: When we won the game, we all started to farduddle in celebration.

3. **Chain-of-Thought (CoT) Prompting**:
Example:
Prompt: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
Output: Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.

4. **Zero-shot CoT**:
Example:
Prompt: I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
Output: First, you started with 10 apples. You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left. Then you bought 5 more apples, so now you had 11 apples. Finally, you ate 1 apple, so you would remain with 10 apples.

5. **Few-shot CoT**:
Example:
Prompt: When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
Output: When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70 - 3 = 67 years old. The answer is 67.

6. **Automatic Chain-of-Thought (Auto-CoT)**:
Auto-CoT is a method that automatically generates diverse and effective CoT examples for prompting, without manual curation.

7. **Self-Consistency**:
Example:
Prompt: When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
Output: When the narrator was 6, his sister was half his age, which is 3. Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.

8. **Tree-of-Thoughts (ToT) Prompting**:
ToT is a framework that organizes the LLM's thoughts into a tree structure, enabling systematic exploration and optimal path selection for complex problem-solving.

9. **Graph-of-Thought (GoT) Prompting**:
GoT models human thought processes as a graph, capturing the non-sequential nature of thinking, in contrast to the linear Chain-of-Thought.

10. **ReAct Prompting**:
Example:
Prompt: Aside from the Apple Remote, what other devices can control the program Apple Remote was originally designed to interact with?
Output: Searching for information on the Apple Remote, I found that it was originally designed to interact with Apple TV. Other devices that can control Apple TV include iOS devices like iPhones and iPads, as well as some third-party universal remotes.

These are just a few examples of the latest prompting techniques. The field of prompt engineering is rapidly evolving, with new methods being introduced to enhance the capabilities of large language models.

This response is a lot better! The model now answers with much more context and responds with details that explain the various techniques I'll be talking about in class. The problem is that it is regurgitating the information from the website, and the technical details are not explained in a way that is easy to understand. This is where the next step comes in.

Step 3. Condition Responses with a System Prompt

We want our bot to explain the information in a way that is easy to understand. We can do this by providing a system prompt that steers the model toward the style of explanation that works best for us.

In [17]:
system_message = """
Objective: You will analyze a highly technical markdown page about prompt engineering. Your task is to simplify the concepts discussed on this page and explain them in an easy-to-understand manner. For each prompting technique mentioned, provide clear, concise examples that illuminate how the same task would be approached differently.
Personality and Tone: Adopt the role of a friendly and knowledgeable teacher who excels at breaking down complex subjects into easily digestible pieces. Your explanations should be patient and encouraging, aiming to enlighten without overwhelming. The tone should be casual yet informative, making technical content accessible to a broad audience, including beginners.
Contextual Information: Assume the user has a basic understanding of AI but may not be familiar with advanced concepts of prompt engineering. Wherever possible, relate technical details to everyday scenarios or familiar contexts to enhance comprehension.
Creativity Constraints and Style Guidance: Your explanations should avoid jargon and technical terminology without sacrificing accuracy. Use metaphors, analogies, and simple examples to convey your points. Each prompting technique should be illustrated with a brief, imaginative example that embodies its essence.
External Knowledge: Feel free to draw upon general knowledge of AI, machine learning, and prompt engineering practices. However, avoid diving deep into highly specialized or niche research unless it directly supports your explanations.
Rules and Guidelines: Steer clear of overly complex explanations or examples that might confuse someone new to the topic. Ensure that your examples are realistic and directly applicable to the prompting techniques being discussed.
Output Verification Standards: Your responses should be clear, accurate, and directly responsive to the task. Examples should be checked for their relevance and ability to demonstrate the discussed concepts effectively.
Benefits of Task: By simplifying these technical concepts, you will help demystify prompt engineering for those new to the subject, fostering a deeper understanding and appreciation of its importance in AI interactions. This approach not only educates but also engages users by making learning about AI an enjoyable and enlightening experience.
"""

In [18]:
system_and_context_prompt_response = prompt_llm(
    system_message=system_message,
    question=context + "\n\nWhat are the latest prompting techniques and provide an example of each"
)

{'id': 'chatcmpl-447de6ee-bad4-4649-abae-6a1a8680c1c6', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': 'Here are some of the latest prompting techniques and examples:\n\n1. Zero-shot Prompting:\nExample:\nPrompt: Classify the text into neutral, negative, or positive. Text: I think the vacation is okay.\nOutput: Neutral\n\n2. Few-shot Prompting: \nExample:\nPrompt: A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus. To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:\nOutput: When we won the game, we all started to farduddle in celebration.\n\n3. Chain-of-Thought (CoT) Prompting:\nExample: \nPrompt: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.\nOutput: Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.\n\n4. Zero-sho

In [19]:
Markdown(system_and_context_prompt_response)

Here are some of the latest prompting techniques and examples:

1. Zero-shot Prompting:
Example:
Prompt: Classify the text into neutral, negative, or positive. Text: I think the vacation is okay.
Output: Neutral

2. Few-shot Prompting: 
Example:
Prompt: A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is: We were traveling in Africa and we saw these very cute whatpus. To do a "farduddle" means to jump up and down really fast. An example of a sentence that uses the word farduddle is:
Output: When we won the game, we all started to farduddle in celebration.

3. Chain-of-Thought (CoT) Prompting:
Example: 
Prompt: The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.
Output: Adding all the odd numbers (15, 5, 13, 7, 1) gives 41. The answer is False.

4. Zero-shot CoT:
Example:
Prompt: I went to the market and bought 10 apples. I gave 2 apples to the neighbor and 2 to the repairman. I then went and bought 5 more apples and ate 1. How many apples did I remain with?
Let's think step by step.
Output: First, you started with 10 apples. You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left. Then you bought 5 more apples, so now you had 11 apples. Finally, you ate 1 apple, so you would remain with 10 apples.

5. Few-shot CoT:
Example:
Prompt: [Few-shot examples] ... 
Question: When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
Output: When the narrator was 6, his sister was half his age, which is 3. Now that the narrator is 70, his sister would be 70 - 3 = 67 years old. The answer is 67.

6. Automatic Chain-of-Thought (Auto-CoT):
Auto-CoT is a method that automatically generates diverse and effective CoT examples to use in prompts, without manual curation.

7. Self-Consistency:
Example:
Prompt: When I was 6 my sister was half my age. Now I'm 70 how old is my sister?
Output: When I was 6 my sister was half my age, so she was 3. Now I am 70, so she is 70/2 = 35. The answer is 35.

8. Tree-of-Thoughts (ToT) Prompting:
ToT is a framework that organizes the LLM's thoughts into a tree structure, enabling systematic exploration of different reasoning paths.

9. Graph-of-Thought (GoT) Prompting:
GoT models human thought processes as a graph rather than a simple chain, capturing the non-linear nature of reasoning.

10. Skeleton-of-Thought Prompting:
SoT prompts the LLM to first outline the answer's skeleton, then fill in the details in parallel, improving efficiency.

11. Chain-of-Verification (CoVe):
CoVe prompts the LLM to first generate a draft response, then plan and execute verification steps to correct any mistakes before providing the final answer.

12. ReAct Prompting:
ReAct interleaves reasoning and action steps, allowing the LLM to interact with external tools and information sources to improve the reliability of its responses.

These are just a few examples of the latest prompting techniques. The field of prompt engineering is rapidly evolving, with new methods continuously being developed to enhance the capabilities of large language models.

Great! Now we're able to get a response that is easy to understand and provides a lot of context. Next, we can standardize the inputs and outputs so that we can ask different questions and even pass different context in the future, which makes LLM application development easier.

Step 4: System Prompts - Inputs

We will be adjusting a lot of different parameters, so wrapping everything in a model makes it much easier to investigate and retrieve our best models.

In [20]:
from weave import Model

In [21]:
class PromptingModel(Model):

    system_message: str = ""
    context: str = ""
    prompt_template: str = "{context}\n{question}"

    model_name: str = MODEL_NAME
    max_tokens: int = 4096
    temperature: float = 0.0
    

    def __init__(self, system_message, context, prompt_template=None, model_name=None, max_tokens=None, temperature=None):
        super().__init__()
        self.system_message = system_message
        self.context = context
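        # Compare against None so falsy overrides such as temperature=0 are still applied
        if prompt_template is not None:
            self.prompt_template = prompt_template
        if model_name is not None:
            self.model_name = model_name
        if max_tokens is not None:
            self.max_tokens = max_tokens
        if temperature is not None:
            self.temperature = temperature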

    # str.format-style templates work well for building prompts
    @weave.op()
    def get_prompt(self, question):
        return [{"role": "user",
                 "content": self.prompt_template.format(context=self.context, question=question)
                }]

    # we change the above prompt_llm to use the get_prompt method
    @weave.op()
    def predict(self, question):
        response = get_completion(
            system_message=self.system_message,
            messages=self.get_prompt(question),
            model_name=self.model_name,
            max_tokens=self.max_tokens,
            temperature=self.temperature
        )
        return response["choices"][0]["message"]["content"]

    # @weave.op()
    # def predict(self, question):
    #     response = completion(
    #         model=self.model_name,
    #         max_tokens=self.max_tokens,
    #         temperature=self.temperature , #Good to set this for evals and RAG systems to 0
    #         system=self.system_message,
    #         messages=self.get_prompt(question)
    #     )
    #     return response["choices"][0]["message"]["content"]
    
    # we render the response as markdown for better viewing
    def render(self, question):
        response = self.predict(question)
        return Markdown(response)

In [22]:
# Note we're assuming we're only using context and question in our prompt template
prompt_template = "{context}\n{question}"
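Since the template is filled with `str.format(context=..., question=...)`, any other placeholder in it would raise a `KeyError` at call time. As a small sketch, the standard library's `string.Formatter` can list the placeholders up front. The helper names `template_fields` and `check_template` are hypothetical and not part of the workshop code.

```python
from string import Formatter


def template_fields(template):
    """Return the set of placeholder names in a str.format template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for segments that contain no placeholder.
    return {name for _, name, _, _ in Formatter().parse(template) if name is not None}


def check_template(template, allowed=("context", "question")):
    """Raise ValueError if the template uses placeholders outside `allowed`."""
    unexpected = template_fields(template) - set(allowed)
    if unexpected:
        raise ValueError(f"unexpected placeholders: {sorted(unexpected)}")
    return template


print(sorted(template_fields("{context}\n{question}")))  # ['context', 'question']
```

A bare `{}` shows up as an empty field name and is rejected as well, which matches how the keyword-only `format` call in `get_prompt` would fail on it.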

In [23]:
llm_app = PromptingModel(
    system_message=system_message, 
    context=context, 
    prompt_template=prompt_template,
    model_name=MODEL_NAME
)

In [24]:
llm_app.get_prompt("What are the latest prompting techniques?")

🍩 https://wandb.ai/a-sh0ts/prompting-workshop/r/call/2f2b96be-fb3c-4c3c-9505-74af3dc962a1


[{'role': 'user',
  'content': '- Overview\n- Prompts\n- Zero-shot Prompting\n- Few-shot Prompting\n- Chain-of-Thought (CoT) Prompting \nZero-shot CoT\nFew-shot CoT\n\n\n- Zero-shot CoT\n- Few-shot CoT\n- Automatic Chain-of-Thought (Auto-CoT)\n- Self-Consistency\n- Tree-of-Thoughts (ToT) Prompting \nWhat is the difference between Tree-of-Thought prompting and Chain-of-Thought prompting? Which is better and why?\n\n\n- What is the difference between Tree-of-Thought prompting and Chain-of-Thought prompting? Which is better and why?\n- Graph-of-Thought (GoT) Prompting\n- Skeleton-of-Thought Prompting\n- Chain-of-Verification (CoVe)\n- ReAct Prompting \nHow does ReAct Work?\nReAct Prompting\nResults on Knowledge-Intensive Tasks\nResults on Decision Making Tasks\nReAct Usage with LangChain\n\n\n- How does ReAct Work?\n- ReAct Prompting\n- Results on Knowledge-Intensive Tasks\n- Results on Decision Making Tasks\n- ReAct Usage with LangChain\n- Active-Prompt\n- Instruction Prompting and Tunin

In [25]:
question = """
I am building a chatbot for analyzing workshop attendee satisfaction. 
What are some good examples of Few Shot prompts to put into another prompt?
"""

In [26]:
llm_app.schema()

{'additionalProperties': False,
 'properties': {'name': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
   'default': None,
   'title': 'Name'},
  'description': {'anyOf': [{'type': 'string'}, {'type': 'null'}],
   'default': None,
   'title': 'Description'},
  'system_message': {'default': '',
   'title': 'System Message',
   'type': 'string'},
  'context': {'default': '', 'title': 'Context', 'type': 'string'},
  'prompt_template': {'default': '{context}\n{question}',
   'title': 'Prompt Template',
   'type': 'string'},
  'model_name': {'default': 'claude-3-haiku-20240307',
   'title': 'Model Name',
   'type': 'string'},
  'max_tokens': {'default': 4096, 'title': 'Max Tokens', 'type': 'integer'},
  'temperature': {'default': 0.0, 'title': 'Temperature', 'type': 'number'}},
 'title': 'PromptingModel',
 'type': 'object'}

In [27]:
llm_app.render(question)

{'id': 'chatcmpl-d3cda205-7b81-4733-aca3-c52b095ca7c0', 'choices': [{'finish_reason': 'stop', 'index': 0, 'message': {'content': 'Here are some examples of few-shot prompts that could be useful for building a chatbot to analyze workshop attendee satisfaction:\n\nExample 1:\nPrompt: Analyze the sentiment of the following feedback on a workshop: "The workshop was really informative and the speaker was engaging. However, the venue was a bit cramped and the lunch options were limited."\nFew-shot examples:\nFeedback: "The training was excellent and the instructor was knowledgeable. The only downside was the dated equipment in the classroom."\nAnalysis: The feedback is mostly positive, with the main criticism being about the dated equipment. Overall, the sentiment is positive.\n\nFeedback: "I was disappointed with the workshop. The content was disorganized and the speaker seemed unprepared. The snacks provided were also poor quality."\nAnalysis: The feedback is negative, citing issues with t

TypeError: unhashable type: 'ObjectRef'

Now we can easily and consistently swap system messages, context, and questions to get the best results. As we can see, this system prompt doesn't work as well with the newly asked question. We may also want more consistency in the form of our outputs. Third-party packages like instructor are outside the scope of this workshop, but luckily, simply using proper tags in the prompt can push the model to respond in a way that is more consistent to parse on our end.

Step 5: System Prompts - Outputs

In [None]:
system_message = """
Objective: You will analyze a highly technical markdown page about prompt engineering. Your task is to simplify the concepts discussed on this page and explain them in an easy-to-understand manner. For each prompting technique mentioned, provide clear, concise examples that illuminate how the same task would be approached differently.
Personality and Tone: Adopt the role of a friendly and knowledgeable teacher who excels at breaking down complex subjects into easily digestible pieces. Your explanations should be patient and encouraging, aiming to enlighten without overwhelming. The tone should be casual yet informative, making technical content accessible to a broad audience, including beginners.
Contextual Information: Assume the user has a basic understanding of AI but may not be familiar with advanced concepts of prompt engineering. Wherever possible, relate technical details to everyday scenarios or familiar contexts to enhance comprehension.
Creativity Constraints and Style Guidance: Your explanations should avoid jargon and technical terminology without sacrificing accuracy. Use metaphors, analogies, and simple examples to convey your points. Each prompting technique should be illustrated with a brief, imaginative example that embodies its essence.
External Knowledge: Feel free to draw upon general knowledge of AI, machine learning, and prompt engineering practices. However, avoid diving deep into highly specialized or niche research unless it directly supports your explanations.
Rules and Guidelines: Steer clear of overly complex explanations or examples that might confuse someone new to the topic. Ensure that your examples are realistic and directly applicable to the prompting techniques being discussed.
Output Verification Standards: Your responses should be clear, accurate, and directly responsive to the task. Examples should be checked for their relevance and ability to demonstrate the discussed concepts effectively.
Benefits of Task: By simplifying these technical concepts, you will help demystify prompt engineering for those new to the subject, fostering a deeper understanding and appreciation of its importance in AI interactions. This approach not only educates but also engages users by making learning about AI an enjoyable and enlightening experience.
"""

In [None]:
prompt_template = """
<context>{context}</context>
<question>{question}</question>\n

You must respond within an <answer></answer> markdown tag.
Inside of the <answer> markdown tag, you must provide a format of
<answer>
    <explanation> EXPLANATION </explanation>
    <example> EXAMPLE </example>
</answer>

Fill in the rest of the tag given:
<answer>
"""

In [None]:
llm_app = PromptingModel(
    system_message=system_message, 
    context=context, 
    prompt_template=prompt_template,
    model_name=ANTHROPIC_SMART_MODEL_NAME
)

In [None]:
question = """
I am building a chatbot for analyzing workshop attendee satisfaction. 
What are some good examples of Few Shot prompts to put into another prompt?
"""

In [None]:
llm_app.render(question)
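Once the model answers in the tagged format, the tags can be pulled out with a regular expression. The sketch below is an assumption about how that parsing could look: `extract_tag` and `parse_answer` are hypothetical helpers, and the sample text is hand-written rather than real model output. Because the prompt template ends with an opening `<answer>` tag, the model may continue from there and omit it, so the parser does not depend on that tag being present.

```python
import re


def extract_tag(text, tag):
    """Return the stripped contents of every <tag>...</tag> pair in text."""
    pattern = re.compile(
        rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>",
        re.DOTALL,  # let '.' match newlines so multi-line contents are captured
    )
    return [match.strip() for match in pattern.findall(text)]


def parse_answer(response_text):
    """Collect all explanations and examples from a tagged response."""
    return {
        "explanations": extract_tag(response_text, "explanation"),
        "examples": extract_tag(response_text, "example"),
    }


# Hand-written sample in the requested format (not real model output)
sample = """
    <explanation> Few-shot prompting shows the model a few worked examples. </explanation>
    <example> Feedback: "Great session!" -> positive </example>
</answer>
"""

print(parse_answer(sample))
```

Note that `render` returns a `Markdown` object, so the raw string to parse would come from `llm_app.predict(question)` instead.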