# LangChain: The LLM Application Framework

LangChain is an open-source library that gives developers a standardized, structured interface for building and integrating the components of an LLM application. It is model-agnostic, so it works with models from multiple LLM providers, including OpenAI, HuggingFace, and others.

Using LangChain allows us to build reusable components and link them together ("like a chain") into complex, multi-step LLM-based applications clearly and succinctly.

You can learn about different [LangChain components here.](https://python.langchain.com/v0.2/docs/concepts/#components)

This tutorial focuses on a few LangChain components and introduces `chaining`, one of its powerful features.

## Prompt templates

[Prompt Templates](https://python.langchain.com/v0.2/docs/concepts/#prompt-templates) provide templates for designing the prompts fed as input to LLMs.
They help us design templates with multiple inputs that are parameterized and reusable.

```{note}
For this tutorial, we will only cover `String PromptTemplates`, as they give us finer control over the template string structure than the `ChatPromptTemplate`.
```

Below is an example of how to use a prompt template. We can import this class from the `langchain-core` package.

The `langchain-core` package contains base abstractions of different components and ways to compose them together.

In [1]:
from langchain_core.prompts import PromptTemplate

### String PromptTemplates

A [String PromptTemplate](https://python.langchain.com/v0.2/docs/concepts/#string-prompttemplates) is used to format a string input. By default, the template uses the Python `f-string` format. There are currently 2 choices of `template_format` available: `f-string` and `jinja2`. Later we will see the use of the `jinja2` format. In the example below, we will use the `f-string` format.

In [2]:
prompt_template = PromptTemplate.from_template(
    "{planet_name} in the solar system is the "
)

prompt_template.format(planet_name="Mars")

'Mars in the solar system is the '
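As a rough mental model (not LangChain's actual implementation), the `f-string` template format behaves much like Python's built-in `str.format`, and the placeholder names can be discovered with `string.Formatter`. The sketch below uses only the standard library; the helper name `find_placeholders` is hypothetical and exists only for this illustration.

```python
from string import Formatter

template = "{planet_name} in the solar system is the "


def find_placeholders(template: str) -> list:
    """Return the sorted, de-duplicated placeholder names in a format string."""
    names = {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skip literal-only segments, which have no field name
    }
    return sorted(names)


placeholders = find_placeholders(template)
formatted = template.format(planet_name="Mars")

print(placeholders)
print(formatted)
```

Filling in the placeholder yields the same kind of string that `prompt_template.format(planet_name="Mars")` returned above.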

Let's instantiate our OLMo model like in the [previous section](./2-llms-and-prompt-engineering-with-olmo.ipynb#introduction) of the tutorial with `llama-cpp-python`.

In [3]:
from llama_cpp import Llama
from ssec_tutorials import download_olmo_model
from ssec_tutorials.scipy_conf import parse_text_generation_response

In [4]:
OLMO_MODEL = (
    download_olmo_model()
)  # It won't actually download again if it's already there
olmo = Llama(model_path=str(OLMO_MODEL), verbose=False)

Model already exists at /Users/lsetiawan/.cache/ssec_tutorials/OLMo-7B-Instruct-Q4_K_M.gguf


Now that we have our model ready to go, let's try different prompt templates, starting with the previous prompt template, which takes a `planet_name` input.

In [5]:
model_response = olmo(
    prompt=prompt_template.format(planet_name="Mars"),
    temperature=0.2,
    max_tokens=8,
    echo=True,
)  # Generate a completion, can also call olmo.create_completion

In [6]:
print(parse_text_generation_response(model_response))

Mars in the solar system is the 
4th largest planet from the sun


In [7]:
# Another example
prompt_template = PromptTemplate.from_template(
    "{entity_1} of the planet {entity_2} is "
)
prompt_template.format(entity_1="Size", entity_2="Earth")

'Size of the planet Earth is '

In [8]:
model_response = olmo(
    prompt=prompt_template.format(entity_1="Size", entity_2="Earth"),
    temperature=0.2,
    echo=True,
)

In [9]:
print(parse_text_generation_response(model_response))

Size of the planet Earth is 
5,147 kilometers or 3,158 miles. The diameter of the planet


#### Your turn 😎

Create a string `PromptTemplate` that outputs a text generation prompt, for example, "Sun is part of galaxy ...".

Feel free to experiment with the built-in [Python `f-string`](https://docs.python.org/3.11/tutorial/inputoutput.html#formatted-string-literals) for the `prompt` input argument to the model.

In [10]:
# Write your prompt_template and model_response code here

## LLM Interface

LangChain implements a [`Runnable`](https://api.python.langchain.com/en/stable/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable) protocol that allows us to create custom "chains".
This protocol has a standard interface for defining and invoking various LLMs, PromptTemplates, and other components, enabling reusability.
For more details, go to LangChain's [Runnable documentation](https://python.langchain.com/v0.2/docs/concepts/#runnable-interface).

```{note}
In this tutorial, you will see the `.invoke` method used on various LangChain objects.
This is essentially using that standard interface for the `Runnable` protocol.
```
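To make the idea of a "standard interface" concrete, here is a minimal sketch using only the standard library. It is not LangChain's `Runnable` class: `SupportsInvoke`, `UppercaseStep`, `WordCountStep`, and `run` are hypothetical names invented for this example. The point is only that any object exposing the same `invoke(input, config=None)` method can be used interchangeably by the caller.

```python
from typing import Any, Optional, Protocol


class SupportsInvoke(Protocol):
    """Hypothetical stand-in for the idea behind a shared invoke interface."""

    def invoke(self, input: Any, config: Optional[dict] = None) -> Any: ...


class UppercaseStep:
    """A toy component: upper-cases its string input."""

    def invoke(self, input: str, config: Optional[dict] = None) -> str:
        return input.upper()


class WordCountStep:
    """A toy component: counts whitespace-separated words."""

    def invoke(self, input: str, config: Optional[dict] = None) -> int:
        return len(input.split())


def run(step: SupportsInvoke, value: Any) -> Any:
    # The caller does not need to know which component it has been given.
    return step.invoke(value)


shout = run(UppercaseStep(), "what is the meaning of life?")
count = run(WordCountStep(), "what is the meaning of life?")
print(shout, count)
```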

Loading the model via [LangChain's LlamaCpp](https://python.langchain.com/v0.2/docs/integrations/llms/llamacpp/) abstraction enables us to use the `chaining` feature. This class is part of the `langchain-community` package, which contains third-party integrations that are maintained by the LangChain community.

In [11]:
from langchain_community.llms import LlamaCpp

In [12]:
olmo = LlamaCpp(
    model_path=str(OLMO_MODEL),
    temperature=0.8,
    verbose=False,
)

As you can see below, we now have a `LlamaCpp` LangChain object rather than the `Llama` llama-cpp-python object from previous sections.

In [13]:
type(olmo)

langchain_community.llms.llamacpp.LlamaCpp

We learned above about the `Runnable` protocol. Let's see how we can invoke the model using the standard interface compared to how we originally invoked the model with `llama-cpp-python`.

In [14]:
answer = olmo.invoke("What is the meaning of life?")

In [15]:
print(answer)


In one of his famous essays, Samuel Johnson replied to this question with a question: “What is the purpose of life?” He answered his own question by saying that the purpose of life was a philosophical problem, which could not be answered by an essay or a book.
Johnson’s response reflects a common attitude towards the meaning of life. Many people assume that the meaning of life is an intellectual or philosophical question that can only be answered through the study of philosophy, religion, or science. However, this view overlooks the fact that the meaning of life is not just a theoretical concept but also a practical one.
The meaning of life has real-world implications for our lives and for society as a whole. It affects how we live, what we do, and why we do it. For example, if the meaning of life is simply to be happy or to achieve success, then many people will focus on those goals without considering the broader implications of their actions. This can lead to a shallow and meaningl

If you'd like to access the underlying `Llama` object from the `llama-cpp-python` package, you can do so via the `.client` attribute of the `LlamaCpp` object.

In [16]:
type(olmo.client)

llama_cpp.llama.Llama

With access to the underlying `Llama` object, you can directly retrieve any metadata information. In this example, we retrieve OLMo's tokenizer chat template, which we saw in the [previous notebook](./2-llms-and-prompt-engineering-with-olmo.ipynb#chat-completion), to set up a String PromptTemplate.

The model's built-in chat template uses [`jinja2`](https://jinja.palletsprojects.com/en/3.1.x/) templating syntax, which is a popular templating engine for Python.

In [17]:
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"], template_format="jinja2"
)

The `PromptTemplate` object has 2 main attributes that are very useful for exploring the built-in prompt template of the model:
- `input_variables`: This is a list of all the input variables that the prompt template expects.
- `template`: This is the actual template string that the model uses.

In [18]:
prompt_template.input_variables

['add_generation_prompt', 'eos_token', 'messages']

For this particular template, we can see that it expects `add_generation_prompt`, `eos_token` and `messages`. But what are the variable types for these inputs? What do they mean?

We can answer the questions above by looking at the template string itself. The template string uses the jinja2 templating engine syntax, so it may look confusing at first, but at the end of the day it's essentially just some Python-like logic in a template string.

In [19]:
print(prompt_template.template)

{{ eos_token }}{% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>
' + message['content'] }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>
'  + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}


As we can see above, the template reads as follows:
- `eos_token` is a string that is added at the top of the resulting string after the prompt is formatted.
You can also see that `eos_token` is appended to the `content` string of every message with an `assistant` `role`.
You can find this value by going to the model's [`tokenizer_config.json`](https://huggingface.co/allenai/OLMo-7B-Instruct-hf/blob/main/tokenizer_config.json#L233) file and looking for the `eos_token` key. *Unfortunately, this is currently the only way to get this information; see https://github.com/ggerganov/llama.cpp/issues/5040 for more details.* In our case, the `eos_token` is `<|endoftext|>`.
- `messages` is a list of dictionaries that is iterated over. Each dictionary should contain a `role` key and a `content` key.
- `add_generation_prompt` is a boolean that determines whether to add a generation prompt. When the current message is the last one and `add_generation_prompt` is `True`, the `<|assistant|>` string is added to the end of the prompt.
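If the jinja2 syntax is hard to read, the same control flow can be written as an ordinary Python function. The sketch below is an approximation for illustration only: `render_chat_prompt` is a hypothetical helper, and it joins the pieces with single newlines, so its whitespace will not match what jinja2 itself renders from the template.

```python
def render_chat_prompt(messages, eos_token, add_generation_prompt):
    """Plain-Python approximation of the chat template's control flow."""
    parts = [eos_token]  # the template starts with the end-of-sequence token
    for index, message in enumerate(messages):
        if message["role"] == "user":
            parts.append("<|user|>\n" + message["content"])
        elif message["role"] == "assistant":
            # assistant turns are closed with the end-of-sequence token
            parts.append("<|assistant|>\n" + message["content"] + eos_token)
        is_last = index == len(messages) - 1
        if is_last and add_generation_prompt:
            # cue the model to start writing its reply
            parts.append("<|assistant|>")
    # Join with single newlines; jinja2's own whitespace handling differs.
    return "\n".join(parts)


prompt = render_chat_prompt(
    messages=[{"role": "user", "content": "Tell me a joke about cats"}],
    eos_token="<|endoftext|>",
    add_generation_prompt=True,
)
print(prompt)
```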

Now that we know what the template expects, we can create the final prompt string by passing in the expected input variables. This time, instead of using the `.format` method, let's see what happens if we use the `.invoke` method on the `PromptTemplate` object.

In [20]:
prompt_template.invoke(
    messages=[
        {
            "role": "user",
            "content": "You are a helpful assistant. Tell me a joke about cats",
        }
    ],
    add_generation_prompt=True,
    eos_token="<|endoftext|>",
)

TypeError: BasePromptTemplate.invoke() got an unexpected keyword argument 'messages'

As you can see, passing the input variables directly as keyword arguments results in an error. This is because the `.invoke` method expects an argument called `input` that is a **dictionary** of the input variables, which will be passed into the runnable.
There is also a `config` argument that takes a `RunnableConfig` object. It is optional and can be omitted,
but we will use it later when invoking the model.
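The error is ordinary Python behavior rather than anything specific to LangChain. The toy function below (a hypothetical stand-in that only borrows the parameter names `input` and `config`) fails in the same way when the variables are passed as keyword arguments, and works when they are wrapped in one dictionary.

```python
from typing import Optional


def invoke(input: dict, config: Optional[dict] = None) -> str:
    """Toy function with the same parameter names as the `.invoke` method."""
    return f"received keys: {sorted(input)}"


# Passing template variables as keyword arguments fails, because the
# function only knows the parameters `input` and `config`.
try:
    invoke(messages=[], add_generation_prompt=True)
except TypeError as error:
    error_message = str(error)

# Wrapping the same variables in one dictionary works.
result = invoke({"messages": [], "add_generation_prompt": True})
print(error_message)
print(result)
```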

In [21]:
?prompt_template.invoke

[0;31mSignature:[0m
[0mprompt_template[0m[0;34m.[0m[0minvoke[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0minput[0m[0;34m:[0m [0;34m'Dict'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mconfig[0m[0;34m:[0m [0;34m'Optional[RunnableConfig]'[0m [0;34m=[0m [0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m[0;34m)[0m [0;34m->[0m [0;34m'PromptValue'[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Transform a single input into an output. Override to implement.

Args:
    input: The input to the runnable.
    config: A config to use when invoking the runnable.
       The config supports standard keys like 'tags', 'metadata' for tracing
       purposes, 'max_concurrency' for controlling how much work to do
       in parallel, and other keys. Please refer to the RunnableConfig
       for more details.

Returns:
    The output of the runnable.
[0;31mFile:[0m      ~/mambaforge/envs/ssec-scipy2024/lib/python3.11/site-packages/langchain_core/prompts/base.py
[0;31mType:[

Let's try again, this time with the correct input type.

In [22]:
prompt_value = prompt_template.invoke(
    input=dict(
        messages=[
            {
                "role": "user",
                "content": "You are a helpful assistant. Tell me a joke about cats",
            }
        ],
        add_generation_prompt=True,
        eos_token="<|endoftext|>",
    )
)

You can see below that this time we get a [`StringPromptValue`](https://api.python.langchain.com/en/latest/prompt_values/langchain_core.prompt_values.StringPromptValue.html) object as the output rather than a plain string. We can still get the string value by calling the `.to_string` method on the `StringPromptValue` object.

In [23]:
prompt_value.to_string()

'<|endoftext|>\n\n<|user|>\nYou are a helpful assistant. Tell me a joke about cats\n\n\n<|assistant|>\n\n'

The output string above contains the necessary signifier tokens for the OLMo Model to understand what the user input is and where the model should put generated responses. This whole string output will then become the full prompt for the model.

```{note}
For the rest of the tutorial, we won't be using `.invoke` method on the `PromptTemplate` object, but rather we will use the `.format` method to get the final prompt string. This is more straightforward and easier to understand. The walkthrough above is just to show you how to use the `.invoke` method.
```

## Chain in LangChain

Chaining allows us to combine multiple components, as described above, in series or parallel to develop a multi-step LLM pipeline.
As shown in the image below, any number of components can be linked together to form a chain.

![LangChain Chain](../../images/langchain-chain.webp)


Image Source: [www.analyticsvidhya.com](https://www.analyticsvidhya.com/blog/2023/10/a-comprehensive-guide-to-using-chains-in-langchain/)

Internally, the chain works as follows:

STEP 1: Dictionary is processed as an input to the prompt template.  
STEP 2: Prompt Template reads the variables to form the prompt text as output - "What are stars and moon?"  
STEP 3: The prompt is given as input to the LLM model.  
STEP 4: LLM Model produces output.  
STEP 5: The output goes through StrOutputParser that parses it into string and gives the result.  
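The five steps above can be sketched with plain Python functions. Everything here is a stand-in for illustration: `prompt_step`, `fake_llm_step`, and `parser_step` are hypothetical names, the "model" simply echoes its prompt, and the parser only mimics the role that `StrOutputParser` plays.

```python
def prompt_step(variables: dict) -> str:
    # STEP 1-2: read the variables and build the prompt text
    return f"What are {variables['topic_1']} and {variables['topic_2']}?"


def fake_llm_step(prompt: str) -> dict:
    # STEP 3-4: a stand-in "model" that just echoes the prompt in a dict
    return {"text": f"  (model answer to: {prompt})  "}


def parser_step(model_output: dict) -> str:
    # STEP 5: reduce the model output to a clean string
    return model_output["text"].strip()


variables = {"topic_1": "stars", "topic_2": "moon"}
result = parser_step(fake_llm_step(prompt_step(variables)))
print(result)
```

Each function consumes exactly what the previous one produced, which is the property that lets the components be linked into a chain.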

We can use the pipe operator ("|"), which is part of the [LangChain Expression Language (LCEL)](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language-lcel). The pipe operator arranges the components sequentially, similar to the above image.

In [24]:
llm_chain = prompt_template | olmo

When we check the type of the resulting chain, it's just a `RunnableSequence`! So, essentially, it's a series of runnables that are executed in sequence.

In [25]:
type(llm_chain)

langchain_core.runnables.base.RunnableSequence
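One way to picture how `|` can produce a sequence object is through Python's `__or__` operator overloading. The classes below are a toy sketch written for this tutorial, not LangChain's `RunnableSequence` implementation; `Step` and `Sequence` are hypothetical names.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, input):
        return self.func(input)

    def __or__(self, other):
        # `a | b` returns a Sequence that runs a, then feeds its output to b
        return Sequence([self, other])


class Sequence(Step):
    """Toy counterpart of a runnable sequence: runs its steps in order."""

    def __init__(self, steps):
        self.steps = list(steps)

    def invoke(self, input):
        value = input
        for step in self.steps:
            value = step.invoke(value)
        return value

    def __or__(self, other):
        # Extending an existing sequence keeps the chain flat
        return Sequence(self.steps + [other])


format_prompt = Step(lambda variables: "Tell me about {topic}".format(**variables))
fake_llm = Step(lambda prompt: f"[echo] {prompt}")
shout = Step(str.upper)

chain = format_prompt | fake_llm | shout
output = chain.invoke({"topic": "cats"})
print(type(chain).__name__, output)
```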

In [26]:
llm_chain

PromptTemplate(input_variables=['add_generation_prompt', 'eos_token', 'messages'], template="{{ eos_token }}{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n'  + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}", template_format='jinja2')
| LlamaCpp(verbose=False, client=<llama_cpp.llama.Llama object at 0x125cbb6d0>, model_path='/Users/lsetiawan/.cache/ssec_tutorials/OLMo-7B-Instruct-Q4_K_M.gguf')

Like other `Runnable` types, it has an `invoke` method that expects the same `input` and `config` arguments as we've seen before with the `LLM` and `PromptTemplate` objects.

In [27]:
?llm_chain.invoke

[0;31mSignature:[0m
[0mllm_chain[0m[0;34m.[0m[0minvoke[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0minput[0m[0;34m:[0m [0;34m'Input'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mconfig[0m[0;34m:[0m [0;34m'Optional[RunnableConfig]'[0m [0;34m=[0m [0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0;34m**[0m[0mkwargs[0m[0;34m:[0m [0;34m'Any'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m[0;34m)[0m [0;34m->[0m [0;34m'Output'[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Transform a single input into an output. Override to implement.

Args:
    input: The input to the runnable.
    config: A config to use when invoking the runnable.
       The config supports standard keys like 'tags', 'metadata' for tracing
       purposes, 'max_concurrency' for controlling how much work to do
       in parallel, and other keys. Please refer to the RunnableConfig
       for more details.

Returns:
    The output of the runnable.
[0;31mFile:[0m      ~/mambaforge/envs/ssec

Just like the example above, we'll need to pass in the input variables as a dictionary.

In [28]:
# Construct the prompt as expected by OLMo
llm_chain.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "You are a helpful assistant. Tell me a joke about cats",
            }
        ],
        "add_generation_prompt": True,
        "eos_token": "<|endoftext|>",
    }
)

" Why don't cats play poker in the jungle? There are too many predators!\n\n — Jim Benton ☕️🐱🌊 (May The Beans Be With You)"

Instead of having to invoke `llm_chain` repeatedly with `add_generation_prompt` and `eos_token`, we can update our `prompt_template`.

In [29]:
# Create a prompt template using OLMo's tokenizer chat template we saw in module 1, but this time use partial variables.
prompt_template = PromptTemplate.from_template(
    template=olmo.client.metadata["tokenizer.chat_template"],
    template_format="jinja2",
    partial_variables={"add_generation_prompt": True, "eos_token": "<|endoftext|>"},
)

In [30]:
llm_chain = prompt_template | olmo
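Conceptually, partial variables are values that are stored once and merged with whatever is supplied at call time. The sketch below shows that idea with a toy class; `PartialTemplate` is a hypothetical name, the template string is a simplified stand-in for OLMo's chat template, and the merge order shown is an assumption of this sketch.

```python
from typing import Optional


class PartialTemplate:
    """Toy template that remembers some variables ahead of time."""

    def __init__(self, template: str, partial_variables: Optional[dict] = None):
        self.template = template
        self.partial_variables = dict(partial_variables or {})

    def format(self, **variables) -> str:
        # Call-time values are merged on top of the pre-filled ones
        merged = {**self.partial_variables, **variables}
        return self.template.format(**merged)


template = PartialTemplate(
    "{eos_token}<|user|>\n{content}\n<|assistant|>",
    partial_variables={"eos_token": "<|endoftext|>"},
)

# Only the variable that changes between calls has to be supplied.
first = template.format(content="Tell me a joke about cats")
second = template.format(content="Tell me a joke about dogs")
print(first)
```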

Let's stream the output instead of waiting for OLMo to generate and display the text.
We can use [Callbacks](https://python.langchain.com/v0.2/docs/concepts/#callbacks) to subscribe to various events in your LLM application pipeline. 
Check [this out](https://api.python.langchain.com/en/latest/callbacks/langchain_core.callbacks.base.BaseCallbackHandler.html) for a list of events.

Below, we will use the [`StreamingStdOutCallbackHandler`](https://api.python.langchain.com/en/latest/callbacks/langchain_core.callbacks.streaming_stdout.StreamingStdOutCallbackHandler.html#langchain-core-callbacks-streaming-stdout-streamingstdoutcallbackhandler) to stream the output to the console.
To do this, we can pass a dictionary to the `config` argument of the `invoke` method, with a `callbacks` key that contains a list of callback handlers. To see all the options, check out the [`RunnableConfig`](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.config.RunnableConfig.html#langchain_core.runnables.config.RunnableConfig) documentation.
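Mechanically, a callback handler is just an object whose methods get called when certain events happen. The toy sketch below illustrates that pattern with a fake token generator. `PrintTokenHandler` and `fake_generate` are hypothetical names, and the method name only mirrors LangChain's `on_llm_new_token` hook; the real handler's signature and behavior may differ.

```python
class PrintTokenHandler:
    """Toy handler; the method name mirrors the `on_llm_new_token` hook."""

    def __init__(self):
        self.seen = []

    def on_llm_new_token(self, token: str) -> None:
        self.seen.append(token)
        print(token, end="", flush=True)  # show each token as soon as it arrives


def fake_generate(prompt: str, callbacks=()) -> str:
    """Stand-in for a model call that produces its answer token by token."""
    tokens = ["Why ", "don't ", "cats ", "like ", "Wi-Fi?"]
    for token in tokens:
        for handler in callbacks:
            handler.on_llm_new_token(token)  # notify every subscribed handler
    return "".join(tokens)


handler = PrintTokenHandler()
answer = fake_generate("Tell me a joke about cats", callbacks=[handler])
print()  # finish the streamed line
```

A handler like this prints the tokens as they arrive while the call still returns the full string, so in a notebook the same text can appear twice: once streamed and once as the cell's return value.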

In [31]:
from langchain_core.callbacks import StreamingStdOutCallbackHandler

In [32]:
llm_chain.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "You are a helpful assistant. Tell me a joke about cats",
            }
        ]
    },
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)

 Sure, here's a cute cat joke for you:


Why don't cats like Wi-Fi? Because they prefer the old-school method of tracking down live cables to play with! 😊🐱💬 Remember, sharing is caring, but not when it comes to your wireless connection. 📶🌍❓

" Sure, here's a cute cat joke for you:\n\n\nWhy don't cats like Wi-Fi? Because they prefer the old-school method of tracking down live cables to play with! 😊🐱💬 Remember, sharing is caring, but not when it comes to your wireless connection. 📶🌍❓"

We will cover more LangChain concepts in upcoming notebooks. 

#### Your turn 😎

Try different `messages` values and see how the output changes, but remember to follow the template structure.
Each dictionary must contain the keys `role` and `content`, and the only allowed `role` values are `user` and `assistant`.

In [None]:
# Write your llm_chain.invoke code here. Feel free to also create your own template and try partial_variables