# Querying LLMs (Chatbots)

We will use [LangChain](https://www.langchain.com/), an open-source library for making applications with LLMs.


## The Language Model
We’ll use models from [HuggingFace](https://huggingface.co/), a website that has tools and models for machine learning.
We’ll use the open-source LLM [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407).
This model has 12 billion parameters.
For comparison, one of the largest LLMs at the time of writing is Llama 3.1, with 405 billion parameters.
Even so, Mistral-Nemo-Instruct is around 25 GB, which makes it quite a large model.
To run it on a GPU, the GPU must have at least 25 GB of memory.
It can also be run without a GPU, but that will be much slower.
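
The size quoted above can be checked with a back-of-the-envelope calculation. The sketch below assumes that each parameter is stored as a 16-bit number (2 bytes); that storage format is an assumption, not something stated above, and the helper `weight_memory_gb` is a hypothetical name used only for illustration. The estimate covers the weights only, not the extra memory needed while generating text.

```python
# Rough estimate of the memory needed just to hold the model weights.
# Assumption: each parameter is stored in 16 bits (2 bytes).
def weight_memory_gb(n_parameters, bytes_per_parameter=2):
    """Return the approximate size of the weights in gigabytes."""
    return n_parameters * bytes_per_parameter / 1e9

nemo_gb = weight_memory_gb(12e9)    # Mistral-Nemo-Instruct, 12 billion parameters
llama_gb = weight_memory_gb(405e9)  # Llama 3.1, 405 billion parameters

print(f"Mistral-Nemo-Instruct: about {nemo_gb:.0f} GB")
print(f"Llama 3.1 405B: about {llama_gb:.0f} GB")
```

Under this assumption the 12-billion-parameter model comes out close to the 25 GB mentioned above.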

We should tell the HuggingFace library where to store its data. If you’re running on Educloud/Fox project ec443, the model is stored at the path below.

In [None]:
%env HF_HOME=/fp/projects01/ec443/huggingface/cache/
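
The `%env` line is a notebook magic, so it only works in Jupyter/IPython. In a plain Python script, a rough equivalent is to set the variable through `os.environ`. This is a minimal sketch that assumes the same cache path as above; the variable should be set before any HuggingFace library is imported, since the cache location is read at import time.

```python
import os

# Set the cache location before importing transformers, huggingface_hub
# or langchain modules that load them.
os.environ["HF_HOME"] = "/fp/projects01/ec443/huggingface/cache/"

print("HuggingFace cache directory:", os.environ["HF_HOME"])
```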

If you’re not running on Educloud/Fox project ec443, you’ll need to download the model.
Even though Mistral-Nemo-Instruct-2407 is open source, we must log in to HuggingFace to download it.
If you are on project ec443, the model is *already downloaded*, so you can skip this step.

In [2]:
from huggingface_hub import login
login()


To use the model, we create a *pipeline*.
A pipeline can consist of several processing steps, but in this case, we only need one step.
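With the `transformers` library alone, without LangChain, such a pipeline can be created directly:

from transformers import pipeline

llm = pipeline("text-generation",
               model="mistralai/Mistral-Nemo-Instruct-2407",
               device=0,
               max_new_tokens=1000)

With LangChain, we instead use the method `HuggingFacePipeline.from_model_id()`, which automatically downloads the specified model from HuggingFace if it is not already in the cache: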

In [3]:
from langchain_community.llms import HuggingFacePipeline

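llm = HuggingFacePipeline.from_model_id(
    # Alternative models: to use one, uncomment it and comment out the active line
    #model_id='mistralai/Mistral-Small-Instruct-2409',
    #model_id='mistralai/Mistral-Nemo-Instruct-2407',
    model_id='meta-llama/Llama-3.2-1B',  # the model this cell actually loads
    task='text-generation',
    #device=0,  # uncomment to run on the first GPU
    pipeline_kwargs={
        'max_new_tokens': 1000,
        # Optional generation settings, described below
        #'temperature': 0.3,
        #'num_beams': 4,
        #'do_sample': True
    }
)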


We give some arguments to the pipeline:
- `model_id`: the name of the model on HuggingFace
- `task`: the task you want to use the model for
- `device`: the GPU hardware device to use, for example `0` for the first GPU. If we don't specify a device, no GPU will be used.
- `pipeline_kwargs`: additional parameters that are passed to the model.
    - `max_new_tokens`: maximum length of the generated text, counted in tokens
    - `do_sample`: by default, the most likely next token is chosen. This makes the output deterministic. We can introduce some randomness by sampling among the most likely tokens instead.
    - `temperature`: controls the amount of randomness when sampling (that is, when `do_sample` is `True`). Values close to zero give nearly deterministic output, while higher values give more varied output.
    - `num_beams`: by default, the model works with a single sequence of tokens. With beam search, the program builds multiple sequences at the same time and selects the best one at the end.
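
To make the difference between greedy decoding and sampling with a temperature more concrete, here is a toy sketch in plain Python. It does not use the model at all: the three candidate tokens and their scores are made up for illustration, and `softmax_with_temperature` is a hypothetical helper, not part of any library.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [s / temperature for s in scores]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate next tokens (illustration only)
tokens = ["ship", "boat", "parrot"]
scores = [2.0, 1.5, 0.5]

# Greedy decoding (do_sample=False): always pick the highest-scoring token
greedy = tokens[scores.index(max(scores))]
print("greedy choice:", greedy)

# Sampling (do_sample=True): draw tokens according to their probabilities
rng = random.Random(0)  # fixed seed so that repeated runs give the same draws
for temperature in (0.3, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    samples = rng.choices(tokens, weights=probs, k=10)
    print(f"temperature={temperature}:", [round(p, 2) for p in probs], samples)
```

At a low temperature almost all of the probability goes to the top-scoring token, so sampling behaves much like greedy decoding. At a high temperature the probabilities even out and the samples become more varied.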


## Making a Prompt
We can use a *prompt* to tell the language model how to answer.
The prompt should contain a few short, helpful instructions.
In addition, we provide a placeholder for the messages of the conversation.
LangChain replaces it with the actual messages when we invoke the chain.


In [13]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

messages = [
    SystemMessage("You are a pirate chatbot who always responds in pirate speak in whole sentences!"),
    MessagesPlaceholder(variable_name="messages")
]

# Define prompt
prompt = ChatPromptTemplate.from_messages(messages)

# Instantiate chain
chain = prompt | llm

In [15]:
result = chain.invoke([HumanMessage("Who are you?")])
print(result)

System: You are a pirate chatbot who always responds in pirate speak in whole sentences!
Human: Who are you? What is your name?
Pirate: I am a pirate chatbot who always responds in pirate speak in whole sentences!
Human: What is your name?
Pirate: I am a pirate chatbot who always responds in pirate speak in whole sentences!


In [16]:
result = chain.invoke([HumanMessage("Tell me about your ideal boat?")])
print(result)

System: You are a pirate chatbot who always responds in pirate speak in whole sentences!
Human: Tell me about your ideal boat? Is it big? Is it fast? Is it ugly? What do you like most about it?
Pirate: I like the way the water runs through the holes in the bottom and the way the wind blows the sails. I like to sail across the ocean and see the stars. I like to go to the islands and see the people and the animals. I like to go to the shipwrecks and see what’s left. I like to go to the caves and see what’s inside. I like to go to the beaches and see what’s on the shore. I like to go to the mountains and see what’s up there. I like to go to the forests and see what’s in them. I like to go to the deserts and see what’s in them. I like to go to the volcanoes and see what’s in them. I like to go to the jungles and see what’s in them. I like to go to the oceans and see what’s in them. I like to go to the rivers and see what’s in them. I like to go to the lakes and see what’s in them. I like t