### LangChain Essentials

# Prompt Templating for Ollama - LangChain #2

Until 2021, to use an AI model for a specific use-case we would need to fine-tune the model weights themselves. That would require huge amounts of training data and significant compute to fine-tune any reasonably performing model.

Instruction fine-tuned **L**arge **L**anguage **M**odels (LLMs) changed this fundamental rule of applying AI models to new use-cases. Rather than needing to either train a model from scratch or fine-tune an existing model, these new LLMs could adapt incredibly well to a new problem or use-case with nothing more than a prompt change.

Prompts allow us to completely change the functionality of an AI pipeline. Through natural language we simply _tell_ our LLM what it needs to do, and with the right AI pipeline and prompting, it often works.

LangChain naturally has many functionalities geared towards helping us build our prompts. We can build very dynamic prompting pipelines that modify the structure and content of what we feed into our LLM based on essentially any parameter we would like. In this example, we'll explore the essentials of prompting in LangChain and apply this in a demo **R**etrieval **A**ugmented **G**eneration (RAG) pipeline.

---

> !!! We will be using Ollama for this example allowing us to run everything locally. If you would like to use OpenAI instead, please see the [OpenAI version] TK of this example.

---

## Basic Prompting

We'll start by looking at the various parts of our prompt. For RAG use-cases we'll typically have four core components; however, this is _very_ use-case dependent and can vary significantly. Nonetheless, for RAG we will typically see:

* **Rules for our LLM**: this part of the prompt sets up the behavior of our LLM: how it should approach responding to user queries, along with as much information as possible about what we want it to do. We typically place this within the _system prompt_ of a chat LLM.

* **Context**: this part is RAG-specific. The context refers to some _external information_ that we may have retrieved from a web search, database query, or often a _vector database_. This external information is the **R**etrieval **A**ugmentation part of **RA**G. For chat LLMs we'll typically place this inside the chat messages between the assistant and user.

* **Question**: this is the input from our user. In the vast majority of cases the question/query/user input is provided to the LLM (typically through a _user message_). However, the format and location of this input often changes.

* **Answer**: this is the answer from our assistant. Again, this is _very_ typical and we'd expect it with every use-case.

The below is an example of how a RAG prompt may look:

```
Answer the question based on the context below,                 }
if you cannot answer the question using the                     }--->  (Rules) For Our Prompt
provided information answer with "I don't know"                 }

Context: Aurelio AI is an AI development studio                 }
focused on the fields of Natural Language Processing (NLP)      }
and information retrieval using modern tooling                  }--->   Context AI has
such as Large Language Models (LLMs),                           }
vector databases, and LangChain.                                }

Question: Does Aurelio AI do anything related to LangChain?     }--->   User Question

Answer:                                                         }--->   AI Answer
```

Here we can see how the AI will approach our question. The rules describe the expected response: if the context contains the answer, use the context to answer the question; if not, say "I don't know". The context and question are then passed into the prompt much like parameters into a function.
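To make the "parameters in a function" analogy concrete, here is a minimal sketch that uses only Python's built-in string formatting. The `RAG_TEMPLATE` constant and `build_rag_prompt` helper are names invented for this illustration; they are not part of LangChain, which we will use for the real thing shortly.

```python
# Plain-Python sketch: the rules are fixed text, while context and question
# are filled in like function arguments.
RAG_TEMPLATE = """Answer the question based on the context below,
if you cannot answer the question using the
provided information answer with "I don't know"

Context: {context}

Question: {question}

Answer:"""


def build_rag_prompt(context: str, question: str) -> str:
    """Insert the retrieved context and the user question into the template."""
    return RAG_TEMPLATE.format(context=context, question=question)


rag_prompt = build_rag_prompt(
    context=(
        "Aurelio AI is an AI development studio focused on NLP and "
        "information retrieval using LLMs, vector databases, and LangChain."
    ),
    question="Does Aurelio AI do anything related to LangChain?",
)
print(rag_prompt)
```

The rules never change between calls; only the two arguments do, which is exactly the part a prompt template automates.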

In [2]:
prompt = """
Answer the user's query based on the context below.                 
If you cannot answer the question using the
provided information answer with "I don't know".

Context: {context}
"""

LangChain uses a `ChatPromptTemplate` object to format the various prompt types into a single list which will be passed to our LLM:

In [5]:
from langchain.prompts import ChatPromptTemplate

# build a chat prompt template from (role, content) tuples
prompt_template = ChatPromptTemplate.from_messages([
    ("system", prompt),
    ("user", "{query}"),
])

When we call the template it will expect us to provide two variables, the `context` and the `query`. Both of these variables are pulled from the strings we wrote, as LangChain interprets curly-bracket syntax (i.e. `{context}` and `{query}`) as indicating a dynamic variable that we expect to be inserted at query time. We can see that these variables have been picked up by our template object by viewing its `input_variables` attribute:

In [7]:
prompt_template.input_variables

['context', 'query']
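For intuition about how curly-bracket placeholders can be discovered in a string, the sketch below uses Python's standard `string.Formatter`. This is only an illustration of the idea, not a description of LangChain's internals, and `find_placeholders` is a hypothetical helper name.

```python
from string import Formatter


def find_placeholders(template: str) -> list[str]:
    """Return the sorted, de-duplicated placeholder names found in a template."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    names = {field for _, field, _, _ in Formatter().parse(template) if field}
    return sorted(names)


system_template = "Answer the user's query based on the context below.\n\nContext: {context}\n"
user_template = "{query}"

# Combine the placeholders found across both message templates
variables = sorted(
    set(find_placeholders(system_template)) | set(find_placeholders(user_template))
)
print(variables)
```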

We can also view the structure of the messages (currently _prompt templates_) that the `ChatPromptTemplate` will construct by viewing the `messages` attribute:

In [8]:
prompt_template.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'query'], input_types={}, partial_variables={}, template='\nAnswer the question based on the context below,                 \nif you cannot answer the question using the                   \nprovided information answer with "I don\'t know"\n\ncontext: {context}\n\nquestion: {query}\n'), additional_kwargs={}),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='{query}'), additional_kwargs={})]

From this, we can see that each tuple provided when using `ChatPromptTemplate.from_messages` becomes an individual prompt template itself. Within each of these tuples, the first value defines the _role_ of the message, which is typically `system`, `human`, or `ai` (LangChain also accepts `user` as an alias for `human`, which is what we used above). Using these tuples is shorthand for the following, more explicit code:

In [10]:
from langchain.prompts import SystemMessagePromptTemplate, HumanMessagePromptTemplate

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt),
    HumanMessagePromptTemplate.from_template("{query}"),
])

We can see the structure of this new chat prompt template is identical to our previous one:

In [11]:
prompt_template

ChatPromptTemplate(input_variables=['context', 'query'], input_types={}, partial_variables={}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context', 'query'], input_types={}, partial_variables={}, template='\nAnswer the question based on the context below,                 \nif you cannot answer the question using the                   \nprovided information answer with "I don\'t know"\n\ncontext: {context}\n\nquestion: {query}\n'), additional_kwargs={}), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='{query}'), additional_kwargs={})])

### Invoking our LLM with Templates

We've defined our prompt template, now let's define our LLM and run it with our template and a user query.

First, we initialize our LLM. For this, we are using Ollama with the 1B parameter Llama 3.2 model, fine-tuned for instruction following. We pull the model from Ollama by switching to our terminal and executing:

```
ollama pull llama3.2:1b-instruct-fp16
```

Once the model has finished downloading, we initialize it in LangChain using the ChatOllama class:

In [12]:
from langchain_ollama.chat_models import ChatOllama

model_name = "llama3.2:1b-instruct-fp16"

# initialize one LLM with temperature 0.0, this makes the LLM more deterministic
llm = ChatOllama(temperature=0.0, model=model_name)

Here we define our LLM, and _because_ we're using it for a question-answer use-case we want its answer to be as grounded in reality as possible. To do that, we of course prompt it to not make up any information via the `If you cannot answer the question using the provided information answer with "I don't know"` line, but we _also_ use the model's `temperature` setting.

The `temperature` parameter controls the randomness of the LLM's output. A temperature of `0.0` makes an LLM's output more deterministic, which _in theory_ should lead to a lower likelihood of hallucination.

Now, the question here may be, _why would we ever not use `temperature=0.0`?_ The answer is that sometimes a little bit of randomness can be useful. Randomness tends to translate to text that feels more human and creative, so if we'd like an LLM to help us write an article or even a poem, that lack of determinism becomes a feature rather than a bug.
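To see why a lower temperature makes output more deterministic, the sketch below applies temperature scaling to a softmax over some made-up token scores. The `logits` values are hypothetical, and treating a temperature of zero as "always pick the top token" is an assumption made for this illustration; how a given runtime handles that case is an implementation detail.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0.0:
        # Assumption: treat temperature 0 as greedy decoding,
        # i.e. all probability mass goes to the highest-scoring token.
        top = logits.index(max(logits))
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens

for temperature in (0.0, 0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

As the temperature rises, the distribution flattens and lower-scoring tokens become more likely to be sampled, which is where the extra variety in the generated text comes from.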

For now, we'll stick with our more deterministic LLM. We'll set up the pipeline to consume two variables when it is called, `query` and `context`. We'll feed them into our chat prompt template and then invoke our LLM with the formatted messages.

Although that sounds complicated, all we're doing is connecting our `prompt_template` and `llm`. We do this with **L**ang**C**hain **E**xpression **L**anguage (LCEL), which uses the `|` operator to connect each component.

In [20]:
pipeline = prompt_template | llm
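If the `|` syntax looks like magic, it may help to see that Python lets any class define what `|` means via `__or__`. The toy `Step` class below is a hypothetical stand-in written for this explanation; it is not LangChain's implementation, and `fake_llm` simply echoes part of its input rather than calling a model.

```python
class Step:
    """A toy pipeline component that wraps a single function."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # `a | b` returns a new Step that runs a, then feeds its output to b
        return Step(lambda value: other.invoke(self.invoke(value)))


# First step: turn a dict of variables into a prompt string
format_prompt = Step(
    lambda v: f"Context: {v['context']}\nQuestion: {v['query']}"
)
# Second step: a stand-in "LLM" that echoes the last line of the prompt
fake_llm = Step(lambda prompt: f"(echo) {prompt.splitlines()[-1]}")

toy_pipeline = format_prompt | fake_llm
result = toy_pipeline.invoke(
    {"query": "what does Aurelio AI do?", "context": "Aurelio AI builds tooling for AI engineers."}
)
print(result)
```

The real `prompt_template | llm` pipeline follows the same shape: a dictionary goes in, the template formats it, and the formatted result is handed to the next component.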

Now let's define a `query` and some relevant `context` and invoke our pipeline.

In [38]:
context = """Aurelio AI is an AI company developing tooling for AI
engineers. Their focus is on language AI with the team having strong
expertise in building AI agents and a strong background in
information retrieval.

The company is behind several open source frameworks, most notably
Semantic Router and Semantic Chunkers. They also have an AI
Platform providing engineers with tooling to help them build with
AI. Finally, the team also provides development services to other
organizations to help them bring their AI tech to market.

Aurelio AI became LangChain Partners in September 2024 after a long
track record of delivering AI solutions built with the LangChain
ecosystem."""

query = "what does Aurelio AI do?"

In [32]:
pipeline.invoke({"query": query, "context": context})

AIMessage(content='According to the context, Aurelio AI is an AI company that develops tooling for AI engineers. They focus on language AI and have expertise in building AI agents and information retrieval. Additionally, they provide several tools and services, including:\n\n* Open source frameworks: Semantic Router and Semantic Chunkers\n* AI Platform: providing engineers with tooling to build with AI\n* Development services: helping other organizations bring their AI technology to market.\n\nAurelio AI became LangChain Partners in September 2024 after a long track record of delivering AI solutions built with the LangChain ecosystem.', additional_kwargs={}, response_metadata={'model': 'llama3.2:1b-instruct-fp16', 'created_at': '2024-12-04T18:00:32.646486Z', 'done': True, 'done_reason': 'stop', 'total_duration': 1003948000, 'load_duration': 26077708, 'prompt_eval_count': 209, 'prompt_eval_duration': 95000000, 'eval_count': 117, 'eval_duration': 881000000, 'message': Message(role='assis

Our LLM pipeline is able to consume the information from the `context` and use it to answer the user's `query`. Of course, we would never realistically feed both a question and the context containing its answer into an LLM manually. Typically, the `context` would be retrieved from a vector database, via web search, or from elsewhere. We will cover this use-case in full and build a functional RAG pipeline in a future chapter.

For now, we'll continue with the essentials of prompting.

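Next, we'll look at few-shot prompting: first by writing examples directly into the system prompt, which is the older way of doing it, and then with LangChain's `FewShotChatMessagePromptTemplate`, a newer and easier approach.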

## Few Shot Prompting

Many **S**tate-**o**f-**t**he-**A**rt (SotA) LLMs are incredible at instruction following, meaning that it requires much less effort to get the intended output or behavior from these models than is the case for older and smaller LLMs.

We're using a _one billion_ parameter LLM, that's where the `:1b-` part of `llama3.2:1b-instruct-fp16` comes from. Now, that may seem like a lot, but in the world of LLMs this is a _tiny_ model. Because of its size it can be run efficiently, even on a lot of consumer hardware. However, it is also less capable than other models like `gpt-4o` or `claude-3.5-sonnet`.

Using our tiny LLM does mean we need to put in a little extra work to get it to generate what we'd like. Let's try an example: we'll ask the LLM to summarize the key points about Aurelio AI using markdown and bullet points. Let's see what happens.

In [42]:
new_system_prompt = """
Answer the user's query based on the context below.                 
If you cannot answer the question using the
provided information answer with "I don't know".

Always answer in markdown format. When doing so please
provide headers, short summaries, follow with bullet
points, then conclude.

Context: {context}
"""

prompt_template.messages[0].prompt.template = new_system_prompt

out = pipeline.invoke({"query": query, "context": context}).content
print(out)

# What Does Aurelio AI Do?

Aurelio AI is an AI company that develops tooling for AI engineers, focusing on language AI. They have a strong background in building AI agents and expertise in information retrieval.

## Key Areas of Work

*   Developing open-source frameworks:
    *   Semantic Router
    *   Semantic Chunkers
*   Providing AI Platform:
    *   Tooling for building AI with the LangChain ecosystem
*   Offering Development Services:

    *   To other organizations to help them bring their AI technology to market


We can display our markdown nicely with `IPython` like so:

In [43]:
from IPython.display import display, Markdown

display(Markdown(out))

# What Does Aurelio AI Do?

Aurelio AI is an AI company that develops tooling for AI engineers, focusing on language AI. They have a strong background in building AI agents and expertise in information retrieval.

## Key Areas of Work

*   Developing open-source frameworks:
    *   Semantic Router
    *   Semantic Chunkers
*   Providing AI Platform:
    *   Tooling for building AI with the LangChain ecosystem
*   Offering Development Services:

    *   To other organizations to help them bring their AI technology to market

This is not bad, but also not quite the format we wanted. We can try refining and tweaking our prompt instructions further, or we can also provide some examples of what we'd like. Providing examples is what we'd refer to as _few-shot prompting_.

In [54]:
new_system_prompt = """
Answer the user's query based on the context below.                 
If you cannot answer the question using the
provided information answer with "I don't know".

Always answer in markdown format. When doing so please
provide headers, short summaries, follow with bullet
points, then conclude. Here are some examples:


User: Can you explain gravity?
AI: ## Gravity

Gravity is one of the fundamental forces in the universe.

### Discovery

* Gravity was first discovered by Sir Isaac Newton in the late 17th century.
* It was said that Newton theorized about gravity after seeing an apple fall from a tree.

### In General Relativity

* Gravity is described as the curvature of spacetime.
* The more massive an object is, the more it curves spacetime.
* This curvature is what causes objects to fall towards each other.

### Gravitons

* Gravitons are hypothetical particles that mediate the force of gravity.
* They have not yet been detected.

**To conclude**, Gravity is a fascinating topic and has been studied extensively since the time of Newton.


User: What is the capital of France?
AI: ## France

The capital of France is Paris.

### Origins

* The name Paris comes from the Latin word "Parisini" which referred to a Celtic people living in the area.
* The Romans named the city Lutetia, which means "the place where the river turns".
* The city was renamed Paris in the 3rd century BC by the Celtic-speaking Parisii tribe.

### Famous Landmarks

* The Eiffel Tower
* The Louvre
* Notre-Dame Cathedral

**To conclude**, Paris is highly regarded as one of the most beautiful cities in the world and is one of the world's
greatest cultural and economic centres.


Context: {context}
"""

prompt_template.messages[0].prompt.template = new_system_prompt
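A hand-written prompt like the one above can also be assembled from a list of examples. The sketch below is one way that might look in plain Python; `escape_braces`, `render_examples`, and the shortened example answers are all hypothetical names and data made up for illustration. The brace escaping matters because a literal `{` or `}` inside an example would otherwise be read as a template placeholder.

```python
def escape_braces(text: str) -> str:
    """Double any literal braces so they survive template formatting."""
    return text.replace("{", "{{").replace("}", "}}")


def render_examples(examples) -> str:
    """Render question/answer pairs in the User:/AI: layout used in the prompt."""
    blocks = [
        f"User: {escape_braces(ex['question'])}\nAI: {escape_braces(ex['answer'])}"
        for ex in examples
    ]
    return "\n\n".join(blocks)


# Hypothetical, shortened examples for illustration
few_shot_examples = [
    {
        "question": "What is the capital of France?",
        "answer": "## France\n\nThe capital of France is Paris.",
    },
    {
        "question": "What is a set?",
        "answer": "## Sets\n\nA set such as {1, 2} holds unique items.",
    },
]

system_template = (
    "Always answer in markdown format. Here are some examples:\n\n"
    + render_examples(few_shot_examples)
    + "\n\nContext: {context}\n"
)

# Only {context} remains a placeholder; the escaped braces come back as literals
rendered = system_template.format(context="Aurelio AI builds tooling for AI engineers.")
print(rendered)
```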

In [55]:
out = pipeline.invoke({"query": query, "context": context}).content

display(Markdown(out))

**Aurelio AI Overview**

Aurelio AI is an AI company that develops tooling for AI engineers, focusing on language AI and information retrieval.

### Key Areas of Expertise

* **Language AI**: Aurelio AI has strong expertise in building AI agents and provides tools to help engineers build with AI.
* **Information Retrieval**: The company also specializes in information retrieval, providing frameworks and platforms to aid in data analysis and search.
* **AI Platform**: Aurelio AI offers an AI platform that enables engineers to develop and deploy AI solutions.

### Tooling and Frameworks

Aurelio AI's tooling includes:

* **Semantic Router**: A framework for building semantic search systems
* **Semantic Chunkers**: A set of tools for chunking and disambiguating text data
* **LangChain Platform**: An AI platform providing a range of tools and services for language AI development

### Services

Aurelio AI provides development services to organizations, helping them bring their AI technology to market.

**To conclude**, Aurelio AI is a company that specializes in developing tooling and platforms for AI engineers, with a focus on language AI and information retrieval.

We can see that by adding a few examples to our prompt, i.e. _few-shot prompting_, we can get much more control over the exact structure of our LLM response. As the size of our LLMs increases, their ability to follow instructions improves and they tend to require less explicit prompting than we have shown here. However, even for SotA models like `gpt-4o`, few-shot prompting is still a valid technique that can be used if the LLM is struggling to follow our intended instructions.

Writing the examples directly into the system prompt works, but LangChain also provides `FewShotChatMessagePromptTemplate`, which lets us keep the examples as a list of input/output pairs and formats them into chat messages for us:

In [15]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate

In [16]:
# These are the input/output pairs we want the LLM to imitate.
examples = [
    {"input": "Clouds, Sky, Sun, Space, Planets", "output": "Galaxy"},
    {"input": "Eyes, Face, Human, Animal", "output": "Species"},
]
# This is a prompt template used to format each individual example.
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
# This is our few-shot template used to feed our examples into the LLM
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
# Here we can view the formatted examples
print(few_shot_prompt.format())

Human: Clouds, Sky, Sun, Space, Planets
AI: Galaxy
Human: Eyes, Face, Human, Animal
AI: Species


In [17]:
# Here we build the final prompt for the LLM. Note that the `{input}` placeholder must match the key we pass in when invoking the chain below.
final_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a poem writer."),
        few_shot_prompt,
        ("human", "{input}"),
    ]
)
print(final_prompt.messages)

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], input_types={}, partial_variables={}, template='You are a poem writer.'), additional_kwargs={}), FewShotChatMessagePromptTemplate(examples=[{'input': 'Clouds, Sky, Sun, Space, Planets', 'output': 'Galaxy'}, {'input': 'Eyes, Face, Human, Animal', 'output': 'Species'}], input_variables=[], input_types={}, partial_variables={}, example_prompt=ChatPromptTemplate(input_variables=['input', 'output'], input_types={}, partial_variables={}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={}), AIMessagePromptTemplate(prompt=PromptTemplate(input_variables=['output'], input_types={}, partial_variables={}, template='{output}'), additional_kwargs={})])), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], input_types={}, partial_variables={}, template='{input}'), additional_kwargs={})]
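The nested representation above is hard to read, so here is a plain-Python sketch of the flat message sequence we would expect it to expand into once formatted. The `expand_messages` helper is hypothetical and only mirrors the ordering (system message, example pairs, then the user's input); it does not call LangChain.

```python
examples = [
    {"input": "Clouds, Sky, Sun, Space, Planets", "output": "Galaxy"},
    {"input": "Eyes, Face, Human, Animal", "output": "Species"},
]


def expand_messages(system: str, examples: list[dict], user_input: str) -> list[tuple[str, str]]:
    """Flatten a system prompt, few-shot examples, and a user input into (role, content) pairs."""
    messages = [("system", system)]
    for example in examples:
        # Each example becomes a human turn followed by the AI reply we want imitated
        messages.append(("human", example["input"]))
        messages.append(("ai", example["output"]))
    messages.append(("human", user_input))
    return messages


flat_messages = expand_messages(
    "You are a poem writer.", examples, "Fitness, Health, Lifestyle"
)
for role, content in flat_messages:
    print(f"{role}: {content}")
```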


In [18]:
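from langchain.chains import LLMChain

# assumption: `creative_llm` is not defined earlier in this walkthrough, so we
# create a higher-temperature ChatOllama instance here (0.9 is an arbitrary choice)
creative_llm = ChatOllama(temperature=0.9, model=model_name)

# LLMChain is deprecated in recent LangChain releases; `final_prompt | creative_llm` is the LCEL equivalent
chain = LLMChain(prompt=final_prompt, llm=creative_llm)

chain.invoke({"input": "Fitness, Health, Lifestyle"})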


{'input': 'Fitness, Health, Lifestyle', 'text': 'Wellness'}

## Chain of Thought Prompting

Next we'll look at chain-of-thought (CoT) prompting. There is no dedicated prompt template for this. Instead, we write the template so that the LLM talks us through how it will solve the problem, working through each step rather than rushing straight to the answer.

In [19]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

Here we set up the step-by-step instructions that enable chain-of-thought prompting.

In [20]:
# Define the chain-of-thought prompt template
template = """
Question: {question}

First, list systematically and in detail all the problems in this question
that need to be solved before we can arrive at the correct answer.
Then, solve each sub problem using the answers of previous problems
and reach a final solution

Output: Well the right answer is... """

In [21]:
# Define a contrasting prompt template that discourages chain-of-thought
template2 = """
Question: {question}

Do not use any chain-of-thought processes to answer the question

Output: Well the wrong answer is... """

The rest is the same as for a simple prompt: we feed a basic question into the LLM and it gives us an answer.

In [22]:
prompt = PromptTemplate(template=template, input_variables=["question"])
prompt2 = PromptTemplate(template=template2, input_variables=["question"])

In [23]:
# Build a chain that uses the chain-of-thought prompt
chain = LLMChain(prompt=prompt, llm=llm)
# Build a second chain that uses the prompt without chain-of-thought
chain2 = LLMChain(prompt=prompt2, llm=llm)

In [24]:
# Ask a question
question = "James has 7 apples, he eats 4 and is given an additional 19 apples, James gives 15 apples to Josh, and Josh gives James 2 apples, how many apples does James have?" 

In [25]:
result = chain.run(question)

print(result)


Here are the problems that need to be solved systematically and in detail:

1. James has 7 apples initially.
2. James eats 4 apples.
3. James receives an additional 19 apples.
4. James gives 15 apples to Josh.
5. James gives 2 apples to Josh.

To solve this problem, we will follow these steps:

**Step 1: Calculate the number of apples James has after eating 4**

James starts with 7 apples and eats 4, so he is left with:
7 - 4 = 3

**Step 2: Add the additional 19 apples that James receives**

Now, James has 3 apples and receives an additional 19 apples. So, we add these two numbers together:
3 + 19 = 22

**Step 3: Subtract the number of apples James gives to Josh**

James gives 15 apples to Josh, so he is left with:
22 - 15 = 7

**Step 4: Add the number of apples James gives to Josh**

James gives 2 apples to Josh, so we add these two numbers together:
7 + 2 = 9

Therefore, the correct answer is: James has 9 apples.


In [26]:
result = chain2.run(question)

print(result)

7 - 4 + 19 + 0 (Josh's apples) + 15 + 2 = 23


As you can see, when a question involves this many steps and the model is not asked to work step by step, it can skip over important information that it needs to reach the correct result. It's worth noting that for shorter questions the LLM can often get to the right answer without chain-of-thought prompting; it is in multi-step cases like this one where chain-of-thought prompting can be very useful.
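As a sanity check on the two outputs above, the arithmetic in the question can be replayed directly. This is a small illustrative sketch and not part of the LangChain pipeline; the `steps` list is simply our own transcription of the question.

```python
# Each entry is (what happens to James, change in his apple count)
steps = [
    ("starts with", +7),
    ("eats", -4),
    ("is given", +19),
    ("gives to Josh", -15),
    ("receives from Josh", +2),
]

total = 0
for description, change in steps:
    total += change
    print(f"James {description} {abs(change)} -> running total: {total}")

print("Final:", total)
```

The total comes out at 9, which agrees with the final answer from the chain-of-thought run rather than the 23 produced by the direct prompt.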