<div class='bar_title'></div>

*Practical Data Science*

# Prompt Engineering

Gunther Gust & Viet Nguyen<br>
Chair of Enterprise AI

<img src="https://github.com/GuntherGust/tds2_data/blob/main/images/d3.png?raw=true" style="width:20%; float:left;" />

<img src="https://github.com/GuntherGust/practical_data_science/blob/main/notebooks/images/CAIDASlogo.png?raw=1" style="width:20%; float:left;" />

In this lecture, we introduce prompt engineering for large language models (LLMs). All examples are demonstrated using the [Gemini APIs](https://ai.google.dev/gemini-api/docs), and the lecture mainly follows the teaching materials of [DAIR.AI](https://github.com/dair-ai/Prompt-Engineering-Guide/tree/main).

## Table of Contents
1. Basics of prompt engineering
2. Several advanced techniques for more complex prompts
3. General tips for designing prompts
4. Tools for playing around with prompt engineering

## 1. Basics of Prompt Engineering

Prompt engineering is the practice of creating and optimizing prompts to use large language models (LLMs) effectively across many applications. Developing skills in prompt engineering gives practitioners deeper insight into the strengths and limitations of LLMs. Researchers use these techniques to improve the safety and performance of LLMs on a variety of tasks, from straightforward question answering to complex arithmetic reasoning. Meanwhile, developers use prompt engineering to craft robust and effective prompting strategies for interacting with LLMs and other tools.


Before using the `Gemini APIs`, you need to create a secret key [here](https://aistudio.google.com/app/apikey). Keep the secret key somewhere safe because you cannot retrieve it from the website again. Hardcoding the key in a notebook is not good practice, but we do so here for simplicity:

In [18]:
import google.generativeai as genai
from IPython.display import display, Markdown

# INSERT YOUR KEY HERE (never commit or share a real key)
key = "YOUR_API_KEY"

# configure the key for calling GenAI model
genai.configure(api_key=key)

# load model
model = genai.GenerativeModel("gemini-1.5-flash")
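
Since hardcoding the key is discouraged, one common alternative is to read it from an environment variable. The following is a minimal sketch: the variable name `GEMINI_API_KEY` and the helper `load_api_key` are assumptions made for illustration and are not part of the Gemini SDK.

```python
import os


def load_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    `env_var` is a hypothetical name chosen for this example; use whatever
    name you exported the key under.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (assumes the key was exported before starting the notebook):
# genai.configure(api_key=load_api_key())
```

With this approach the key never appears in the notebook itself, so sharing the file does not expose it.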

Due to a new policy limiting data usage from `Open APIs`, we run the examples of [DAIR.AI](https://github.com/dair-ai/Prompt-Engineering-Guide/tree/main) with Google's `Gemini` model instead. You can look at all variants [here](https://ai.google.dev/gemini-api/docs/models/gemini) (a Google Account is required). In this lecture, we use the standard `Gemini 1.5 Flash`, which performs well on most tasks involving images, video, and text.

### 1.1 Basic Prompting
With simple prompts, you can achieve reasonable results, but the outcome largely depends on how well you structure your request and how much detail you include. A well-thought-out prompt goes beyond a basic question or instruction: it incorporates essential information such as context, examples, or specifics. This helps the model understand your request and improves the quality of the response.

Below is a simple prompt example:

In [12]:
# prompt
prompt = "The sky is"

#response
response = model.generate_content(prompt)
print(response.text)

The sky is blue.  (But it can also be many other colors depending on the time of day and weather conditions.)
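
To make the role of context, examples, and specifics more concrete, the sketch below assembles a prompt from optional parts. The helper `build_prompt` and its field labels (`Context:`, `Example:`) are assumptions chosen for illustration, not a fixed convention of the Gemini API.

```python
def build_prompt(instruction, context="", examples=(), question=""):
    """Assemble a prompt from optional parts, skipping the empty ones."""
    parts = [instruction.strip()]
    if context:
        parts.append(f"Context: {context.strip()}")
    for example in examples:
        parts.append(f"Example: {example.strip()}")
    if question:
        parts.append(question.strip())
    # Separate the parts with blank lines so each one stands out.
    return "\n\n".join(parts)


# A bare prompt versus one with an explicit constraint and context.
bare = build_prompt("Complete the sentence:", question="The sky is")
detailed = build_prompt(
    "Complete the sentence in at most five words.",
    context="It is a clear day at noon.",
    question="The sky is",
)
print(detailed)
```

Either string can be passed to `model.generate_content(...)`; comparing the two responses shows how much the added detail steers the answer.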



### 1.2 Text Summarization

In [13]:
# prompt
prompt = """Antibiotics are a type of medication used to treat bacterial infections.
They work by either killing the bacteria or preventing them from reproducing,
allowing the body's immune system to fight off the infection.
Antibiotics are usually taken orally in the form of pills, capsules,
or liquid solutions, or sometimes administered intravenously.
They are not effective against viral infections,
and using them inappropriately can lead to antibiotic resistance.

Explain the above in one sentence:"""

# response
response = model.generate_content(prompt)
print(response.text)

Antibiotics are medications that kill or stop the reproduction of bacteria, helping the body fight bacterial infections, but are ineffective against viruses and their overuse can cause resistance.



### 1.3 Question Answering


In [14]:
# prompt
prompt = """Answer the question based on the context below.
Keep the answer short and concise.
Respond "Unsure about answer" if not sure about the answer.

Context: Teplizumab traces its roots to a New Jersey drug company
called Ortho Pharmaceutical. There, scientists generated an early version
of the antibody, dubbed OKT3. Originally sourced from mice, the molecule
was able to bind to the surface of T cells and limit their cell-killing
potential. In 1986, it was approved to help prevent organ rejection
after kidney transplants, making it the first therapeutic antibody
allowed for human use.

Question: What was OKT3 originally sourced from?

Answer:"""

#response
response = model.generate_content(prompt)
print(response.text)

Mice.



Context is taken from [here](https://www.nature.com/articles/d41586-023-00400-x).

### 1.4 Text Classification

In [19]:
# prompt
prompt = """Classify the text into neutral, negative or positive.
Text: I think the food was okay.
Sentiment:"""

# response
response = model.generate_content(prompt)
display(Markdown(response.text))

Sentiment: **Neutral**


### Short exercise: Tweaking the prompt
Modify the text above so that the sentiment becomes "Negative". Note that the model sometimes outputs plain text without markdown formatting, which is fine. You can also instruct the model in your prompt to format the answer in a specific way.

In [24]:
# your prompt here
prompt1 = """Classify the text into neutral, negative or positive.
Text: I think the food was bad.
Sentiment:"""

prompt2 = """Classify the text into neutral, negative or positive. Make the answer bold.
Text: I think the food was bad.
Sentiment:"""

# response
response = model.generate_content(prompt1)
display(Markdown(response.text))

response = model.generate_content(prompt2)
display(Markdown(response.text))

Sentiment: Negative


Sentiment: **Negative**
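
Because the formatting of the answer can vary between runs, it can help to normalize the response before using it in downstream code. The sketch below is one possible approach; the helper `extract_sentiment` is hypothetical and simply searches the response for one of the three labels used in the prompt.

```python
import re

LABELS = ("neutral", "negative", "positive")


def extract_sentiment(response_text):
    """Return the first known label found in the text, or None."""
    # Drop markdown emphasis markers such as ** or _ before matching.
    plain = re.sub(r"[*_`]", "", response_text).lower()
    match = re.search(r"\b(" + "|".join(LABELS) + r")\b", plain)
    return match.group(1) if match else None


print(extract_sentiment("Sentiment: Negative"))      # negative
print(extract_sentiment("Sentiment: **Negative**"))  # negative
```

Both response variants shown above map to the same label, so the rest of a pipeline does not depend on whether the model used bold text.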


### 1.5 Role Playing

In [21]:
# prompt
prompt = """The following is a conversation with an AI research assistant.
The assistant tone is technical and scientific.

Human: Hello, who are you?
AI: Greeting! I am an AI research assistant. How can I help you today?
Human: Can you tell me about the creation of blackholes?
AI:"""

# response
response = model.generate_content(prompt)
display(Markdown(response.text))

The creation of black holes is a complex process governed by the principles of general relativity.  While various mechanisms exist, they all involve the gravitational collapse of a sufficiently massive object.  The most widely accepted pathways are:

* **Stellar Black Hole Formation:** This is the most common pathway.  Massive stars (generally those exceeding 20-30 solar masses, though the exact limit is dependent on stellar metallicity and rotation)  exhaust their nuclear fuel.  The outward pressure from nuclear fusion no longer counteracts the inward pull of gravity.  The core collapses catastrophically, leading to a supernova explosion. If the remnant core's mass exceeds the Tolman–Oppenheimer–Volkoff limit (approximately 2-3 solar masses),  further collapse ensues, resulting in a black hole.  The precise details depend on the star's initial mass, composition, and rotation.

* **Supermassive Black Hole Formation:** The formation of supermassive black holes (SMBHs), millions or even billions of solar masses, remains an area of active research.  Several theories exist, including:
    * **Direct Collapse:** In the early universe, under specific conditions of high density and low metallicity, gas clouds could have collapsed directly into SMBHs without forming intermediate-mass stars.
    * **Seed Black Holes and Accretion:**  Stellar-mass black holes or intermediate-mass black holes formed through stellar collapse could act as "seeds," gradually accumulating matter through accretion to become supermassive over cosmic timescales.  This process is likely facilitated by mergers of galaxies and their central black holes.
    * **Hierarchical Merging:** Multiple smaller black holes merging together over time, a process potentially accelerated by galaxy mergers.

* **Primordial Black Holes:** These hypothetical black holes formed in the very early universe, possibly from density fluctuations in the aftermath of the Big Bang. Their existence remains unproven, although some cosmological models predict their formation.


It's crucial to note that these formation processes are not mutually exclusive.  A given SMBH might have formed through a combination of direct collapse and subsequent accretion of matter.  Furthermore, our understanding of black hole formation is continuously being refined with new observations and theoretical advancements.  The specifics of each formation pathway involve complex astrophysical processes, including hydrodynamic simulations, magnetohydrodynamic effects, and general relativistic calculations that are beyond the scope of a brief summary.


### 1.6 Code Generation (SQL)

In [25]:
# prompt
prompt = '''"""
Table departments, columns = [DepartmentId, DepartmentName]
Table students, columns = [DepartmentId, StudentId, StudentName]
Create a MySQL query for all students in the Computer Science Department
"""'''

# response
response = model.generate_content(prompt)
display(Markdown(response.text))

```sql
SELECT
    StudentId,
    StudentName
FROM
    students
WHERE
    DepartmentId = (SELECT DepartmentId FROM departments WHERE DepartmentName = 'Computer Science');
```
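
Generated code should be checked before it is trusted. As a rough sanity check, the query above can be run against a small in-memory database. The sketch below uses Python's built-in `sqlite3` module rather than MySQL, which is an assumption that holds here only because the query uses standard SQL; the rows are hypothetical and invented for this test.

```python
import sqlite3

# Query from the model output above (whitespace condensed, trailing semicolon dropped).
QUERY = """
SELECT StudentId, StudentName
FROM students
WHERE DepartmentId = (
    SELECT DepartmentId FROM departments WHERE DepartmentName = 'Computer Science'
)
"""

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (DepartmentId INTEGER, DepartmentName TEXT);
    CREATE TABLE students (DepartmentId INTEGER, StudentId INTEGER, StudentName TEXT);
    -- Hypothetical rows, invented for this test only.
    INSERT INTO departments VALUES (1, 'Computer Science'), (2, 'Physics');
    INSERT INTO students VALUES (1, 10, 'Ada'), (2, 11, 'Marie'), (1, 12, 'Alan');
    """
)
rows = conn.execute(QUERY).fetchall()
print(rows)
conn.close()
```

Only the two students assigned to department 1 should be returned. A check like this catches wrong joins or filters early, although it does not replace testing against the real MySQL schema.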


### 1.7 Reasoning

In [27]:
# prompt
prompt = """The odd numbers in this group add up to an even number: 15, 32, 5, 13, 82, 7, 1.

Solve by breaking the problem into steps.
First, identify the odd numbers, add them, and indicate
whether the result is odd or even."""

# response
response = model.generate_content(prompt)
display(Markdown(response.text))

Here's how to solve the problem step-by-step:

**Step 1: Identify the odd numbers.**

The odd numbers in the group are: 15, 5, 13, 7, 1

**Step 2: Add the odd numbers.**

15 + 5 + 13 + 7 + 1 = 41

**Step 3: Indicate whether the result is odd or even.**

The result, 41, is an **odd** number.
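
The model's arithmetic is easy to verify directly. The short check below reproduces the three steps in plain Python.

```python
numbers = [15, 32, 5, 13, 82, 7, 1]

# Step 1: identify the odd numbers.
odd_numbers = [n for n in numbers if n % 2 == 1]

# Step 2: add them.
total = sum(odd_numbers)

# Step 3: decide whether the sum is odd or even.
parity = "even" if total % 2 == 0 else "odd"

print(odd_numbers)    # [15, 5, 13, 7, 1]
print(total, parity)  # 41 odd
```

The result matches the model's answer, so the claim in the prompt that the odd numbers add up to an even number is false.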


### Short Exercise: Caption an image

Besides the text-to-text format, you can also generate text for a given input image using `Gemini 1.5 Pro`.

Your task: create a prompt to make a caption for the image. The response MUST contain a list of bullet points.

In [31]:
import httpx
import os
import base64

model = genai.GenerativeModel(model_name = "gemini-1.5-pro")
image_path = "https://miro.medium.com/v2/resize:fit:720/format:webp/1*tjPh2MVUFSdqREruQCuurQ.jpeg"

image = httpx.get(image_path)

# Give it a prompt -- YOUR CODE HERE
prompt = "Caption this image, describe the characteristics with a list of bullet points."
response = model.generate_content([{'mime_type':'image/jpeg', 'data': base64.b64encode(image.content).decode('utf-8')}, prompt])

# Print the caption in the markdown format -- YOUR CODE HERE
display(Markdown(response.text))

This image appears to be fan art of a cartoon dog, likely inspired by the animation style of Genndy Tartakovsky (creator of Dexter's Laboratory, Samurai Jack, and Primal). It has an air of nonchalant coolness or swagger.

* **Animal:** A stylized dog or canine-like creature.
* **Expression:** The character has a sly, almost smug expression, emphasized by the half-closed eyes.
* **Attire:** It wears a simple, oversized gray sweater, light blue jeans, and red sneakers.  The jeans are rolled up slightly at the ankles.
* **Pose:** The character stands with its hands in its pockets, further contributing to the laid-back attitude.
* **Color Palette:** The image uses a limited color palette, mostly browns, blues, grays, and reds, against a muted green background.
* **Style:**  The drawing style is reminiscent of early 2000s cartoons, with thick lines and somewhat simplified shapes. The shading and coloring appear slightly rough, giving it a sketched or hand-drawn feel.

## 2. More Advanced Prompting Techniques


### 2.1 Zero-shot Prompting

Large language models such as GPT-3.5 Turbo, GPT-4, Claude 3, and Gemini are trained on large and diverse datasets. This large-scale training enables them to handle certain tasks in a "zero-shot" manner. In zero-shot prompting, the prompt contains no examples or demonstrations. Instead, it gives direct instructions for the task and relies solely on the model's inherent capabilities to understand and execute it. All of the examples above use `zero-shot` prompting. Here is another zero-shot `text classification` example:


In [32]:
# prompt
prompt = """Classify the text into neutral, negative or positive.
Text: I enjoyed the concert last night, although the technical issues took an hour to be resolved.
Sentiment:"""

# response
response = model.generate_content(prompt)
display(Markdown(response.text))

Sentiment: **Positive**

Despite the negative experience of the technical issues, the overall sentiment is positive because the user explicitly states they "enjoyed the concert". The negative aspect is presented as a mitigating factor, not the overriding experience.
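
Since a zero-shot prompt consists only of an instruction and the input, it is straightforward to generate from a template. The sketch below is one way to do this; the helper `zero_shot_classification_prompt` is hypothetical and mirrors the wording of the classification prompts used in this lecture.

```python
def zero_shot_classification_prompt(text, labels=("neutral", "negative", "positive")):
    """Build a classification prompt that contains no worked examples."""
    if len(labels) < 2:
        raise ValueError("Provide at least two labels.")
    # Join the labels as "a, b or c" to match the prompts used above.
    label_list = ", ".join(labels[:-1]) + " or " + labels[-1]
    return f"Classify the text into {label_list}.\nText: {text}\nSentiment:"


print(zero_shot_classification_prompt("I think the food was okay."))
```

The returned string can be passed to `model.generate_content(...)` in the same way as the handwritten prompts.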


### 2.2 Few-shot Prompting

### 2.3 Chain-of-Thought Prompting

### 2.4 Meta Prompting

### 2.5 Generate Knowledge Prompting

### 2.6 Automatic Prompt Engineer (Bonus)

## Outro


<img src="https://github.com/GuntherGust/practical_data_science/blob/main/notebooks/images/d3.png?raw=1" style="width:50%; float:center;" />
