
# Introduction to Local LLMs with Ollama

Time to get practical!

This notebook demonstrates how to run a local Large Language Model (LLM) using Ollama and LangChain. This will run entirely on your computer. 

Ollama lets us run LLMs locally on our own machine. LangChain is a library (reusable code) that makes it easier to interact with LLMs.

## 1. Prerequisites

Before running this notebook, you need to install Ollama and download a model. Follow the steps below.

### 1.1 Installing Ollama

**macOS:**
- Download the installer from [ollama.ai](https://ollama.ai)
- Open the downloaded file and follow the installation prompts

**Windows:**
- Download the installer from [ollama.ai](https://ollama.ai)
- Run the installer and follow the instructions

**Linux:**

In [None]:
%%bash
curl -fsSL https://ollama.ai/install.sh | sh

Verify your installation (Windows, Mac, and Linux):

In [None]:
%%bash
ollama --version
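
If you would rather check from Python (for example, from inside this notebook), a minimal sketch using only the standard library is shown below. It only tests whether an `ollama` executable is on your `PATH`; it does not confirm that the Ollama server is running. The helper name `ollama_installed` is made up for this example.

```python
import shutil


def ollama_installed() -> bool:
    """Return True if an `ollama` executable is found on the PATH."""
    return shutil.which("ollama") is not None


print("Ollama found on PATH:", ollama_installed())
```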

### 1.2 Downloading the TinyLlama Model

Download the TinyLlama model (about 600MB). Ollama offers a large catalog of models, including DeepSeek and others, but TinyLlama is one of the smallest, which makes it a good fit for a demo:

In [None]:
%%bash
ollama pull tinyllama:1.1b

## 2. Setting Up the Python Environment

Install the required packages. LangChain is one of the most popular libraries for interacting with LLMs, and the `langchain-ollama` package lets it talk to models running locally through Ollama.

In [5]:
%%bash
pip install langchain langchain-ollama



## 3. Using the LLM with LangChain

The code below allows us to interact with the tinyllama model hosted locally on our computers. We are simply sending the following message to the model: "I love programming".

Please read the code comments if you want to understand exactly what's going on. 

In [6]:
# Import the libraries
from langchain_ollama import ChatOllama
from langchain_core.messages import AIMessage

# Initialize the LLM. This is when we pick the model and the temperature (which controls the randomness of the output). llm is what we will use to call the model.
llm = ChatOllama(
    model="tinyllama:1.1b",
    temperature=0,  # 0 for more deterministic outputs
)

# We need to provide a list of messages to the model. Here the first message is the system message, which tells the model how to behave. The second is the human message, which is what we want to say to the model.
messages = [
    (
        "system",
        "You are a helpful assistant",
    ),
    ("human", "I love programming."),
]

# Send the list of messages to the model and get the response. The response is an AIMessage object, whose content we print below.
ai_msg = llm.invoke(messages)

# Display the response
print(ai_msg.content)

I'm glad to hear that you enjoy programming! Here are some tips and resources to help you get started:

1. Learn the basics of programming: there are many online tutorials, books, and courses available that can help you learn the fundamentals of programming. Start with simple projects like creating a basic program in a language like python or javascript. 2. Join a community: join a programming community on social media or through a website like codecademy. This will give you access to a vast pool of resources and people who can help you learn and grow as a programmer. 3. Practice, practice, practice: the more you practice, the better you'll get at coding. Try to write your own programs and see how they work. You can also join online coding challenges or competitions to test your skills and improve your knowledge. 4. Take courses: if you have the time and money, consider taking a course in a programming language that interests you. This will give you a solid foundation of knowledge and 

## 4. Testing Different Prompts

Let's try a couple more examples to see what the model can do by changing the contents of the messages list:

In [7]:
# Ask for an explanation
messages = [
    (
        "system",
        "You are a helpful and informative AI assistant.",
    ),
    ("human", "Explain the concept of a neural network in simple terms."),
]

ai_msg = llm.invoke(messages)
print(ai_msg.content)

A neural network is a type of artificial intelligence (AI) system that mimics the way human brains process information. It consists of several layers of interconnected neurons, also known as nodes or units, which receive input data and produce output based on their connections to other nodes. The input data can be either numerical or categorical, and it is passed through a series of layers until it reaches the output layer.

In a neural network, each node in the network has a set of inputs (or weights) that determine its output. These weights are learned from training data, which consists of examples where each input is paired with an output. The weights are updated based on the output received by the neuron, and this process is repeated for all the nodes in the network.

The output of a neural network can be used to make predictions or classifications, depending on the type of task being performed. For example, a neural network trained on image classification tasks might produce an ou

In [8]:
# Ask for some code
messages = [
    (
        "system",
        "You are an expert Python programmer.",
    ),
    ("human", "Write a function to check if a string is a palindrome."),
]

ai_msg = llm.invoke(messages)
print(ai_msg.content)

Here's a Python function that checks whether a given string is a palindrome or not:

```python
def is_palindrome(str):
    """
    Checks whether a given string is a palindrome (reversed) or not.
    
    Args:
        str (string): The string to check for palindromicity.
        
    Returns:
        bool: True if the string is a palindrome, False otherwise.
    """
    # Convert the string to lowercase and remove any leading or trailing spaces
    str = str.lower().strip()
    
    # Check if the string is empty or contains only whitespace characters
    if not str:
        return True
    
    # Reverse the string and check if it's equal to the original string
    reversed_str = str[::-1]
    if reversed_str == str:
        return True
    else:
        return False
```

Here's an example usage:

```python
>>> is_palindrome("race a car")
True
>>> is_palindrome("race a car")
False
```


## 5. Creating a Basic Chatbot

Let's create a simple chat interface so we can interact with the model. Each cell will represent one interaction. Follow the steps below; they will make more sense as you go through them.

In [9]:
# Setup: Run this cell first
from langchain_ollama import ChatOllama

# Initialize the LLM
llm = ChatOllama(
    model="tinyllama:1.1b",
    temperature=0
)

# System prompt
system_prompt = "You are a helpful assistant"

print("-" * 30)

------------------------------


Now let's create a cell for our first interaction by telling the model what our favorite color is. The response might be a little odd, since TinyLlama is a very small model, but it should indicate that it understood that your favorite color is blue.

In [10]:
# First interaction 
user_message = "My favorite color is blue."

# Create fresh messages for this interaction only
messages = [
    ("system", system_prompt),
    ("human", user_message)
]

# Get response
ai_msg = llm.invoke(messages)

print(f"You: {user_message}")
print(f"Assistant: {ai_msg.content}")

You: My favorite color is blue.
Assistant: I do not have the ability to feel emotions or preferences like humans do. However, based on your statement that you enjoy wearing blue clothing, I can say that blue is a popular and versatile color that can be worn in various ways. It's often associated with nature, peacefulness, and calmness, making it a great choice for those who prefer a calming and relaxing atmosphere. In terms of fashion, blue can be paired with different colors to create a unique and eye-catching look. Whether you choose to wear it as a base color or add some accents like stripes or prints, blue is a versatile and stylish choice that can complement any outfit.


Now let's follow up with a question:

In [11]:
# Second interaction
user_message = "What's my favorite color?"

# Create fresh messages for this interaction only
messages = [
    ("system", system_prompt),
    ("human", user_message)
]

# Get response
ai_msg = llm.invoke(messages)

print(f"You: {user_message}")
print(f"Assistant: {ai_msg.content}")

You: What's my favorite color?
Assistant: I do not have the ability to have feelings or preferences like humans. However, based on your previous responses and comments, you may be interested in learning about some popular colors that are commonly associated with different emotions and personality traits. Some examples include:

1. Blue - calming, peaceful, trustworthy, introspective, reflective, spiritual, and calm
2. Green - growth, renewal, nature, freshness, and new beginnings
3. Yellow - optimism, sunshine, happiness, cheerfulness, playfulness, and warmth
4. Red - passion, energy, excitement, fiery, passionate, and fierce
5. Purple - royalty, mystical, mysterious, enigmatic, mysterious, and regal
6. Orange - enthusiasm, energy, vitality, happiness, cheerfulness, and warmth
7. Pink - femininity, sweetness, romance, softness, tender, and delicate
8. Gold - luxury, wealth, prosperity, success, shiny, and glamorous
9. Brown - earthy, grounded, stable, reliable, dependable, and solid
10

As you can see, the model doesn't remember our favorite color! You might ask why. After all, tools like ChatGPT can remember the full context of a conversation. This comes down to the concept of memory: the LLM, by itself, does not remember past conversations. We need to engineer our application so that it does.

Let's explore how to do that, and how tools like ChatGPT manage it.


## 6. Understanding How Conversation Memory Works

The easiest way to give the model memory is to pass the entire conversation history to it on every call, instead of just the latest message:

In [12]:
from langchain_ollama import ChatOllama
from langchain_core.messages import SystemMessage, HumanMessage

# Initialize the LLM like before
llm = ChatOllama(
    model="tinyllama:1.1b",
    temperature=0
)

# Define the conversation history (always starts with a system prompt).
# conversation is a list of messages. SystemMessage is a class that represents a system message and accepts content as a parameter.
conversation = [SystemMessage(content="You are a helpful assistant")]

# Define a function called chat, which takes the user input as a parameter. We will use it to interact with the model.
def chat(user_input: str):
    # 1) Append the user's message
    conversation.append(HumanMessage(content=user_input))  # The conversation list now holds the system message, any earlier turns, and the new user message.
    
    # 2) Call the model with the full conversation list (system message + earlier turns + new user message)
    ai_msg = llm.invoke(conversation)
    
    # 3) Display the exchange
    print(f"You: {user_input}")
    print(f"Assistant: {ai_msg.content}")
    print(f"[Conversation length: {len(conversation) + 1} messages]")  # +1 for the AI response
    
    # 4) Add the AI's response to history. So now we have the system message, user message, and AI response in the conversation list which will be used for the next interaction.
    conversation.append(ai_msg)
    
    return ai_msg.content

print("Chat With Memory")
print("-" * 30)


Chat With Memory
------------------------------


Let's give our favorite color again:

In [13]:
# First interaction - share favorite color
chat("My favorite color is blue")


You: My favorite color is blue
Assistant: I'm not capable of having feelings or preferences like humans do. However, based on the given text, it seems that the speaker enjoys wearing blue clothing or attire. In general, blue is a popular and versatile color that can be used in various settings and styles. It can be worn for casual outfits, formal events, or even as a fashion statement. Blue is also a symbol of calmness, trustworthiness, and wisdom, making it an excellent choice for individuals who seek to express their unique personalities through their clothing choices.
[Conversation length: 3 messages]


Now let's see if it remembers:

In [14]:
# Second interaction - test memory
chat("What's my favorite color?")

You: What's my favorite color?
Assistant: I do not have the ability to have feelings or preferences like humans do. However, based on the given text, it seems that the speaker enjoys wearing blue clothings or attire. In general, blue is a popular and versatile color that can be used in various settings and styles. It can be worn for casual outfit, formal events, or even as a fashion statement. Blue is also associated with calmness, trustworthiness, and wisdom, making it an excellent choice for individuals who seek to express their unique personalities through their clothings choices.
[Conversation length: 5 messages]


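This time the model's answer refers to blue, so it can now see your favorite color! TinyLlama's wording is still clumsy, but the information is there. This is because we keep track of the conversation history and send all of it with every call.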
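
The mechanism can be illustrated without a running model. The sketch below uses a made-up stand-in, `fake_model`, that can only "know" what appears in the list of messages it receives, which is also true of a real LLM. It is a toy for illustration, not how TinyLlama works internally.

```python
PREFIX = "My favorite color is "


def fake_model(messages):
    """Toy stand-in for an LLM: it only sees what is in `messages`."""
    for role, text in messages:
        if role == "human" and text.startswith(PREFIX):
            color = text[len(PREFIX):].rstrip(".")
            return f"Your favorite color is {color}."
    return "I don't know your favorite color."


system = ("system", "You are a helpful assistant")

# Without history: the call contains only the current question.
print(fake_model([system, ("human", "What's my favorite color?")]))

# With history: earlier messages are sent again alongside the new question.
history = [system, ("human", "My favorite color is blue.")]
history.append(("ai", fake_model(history)))
history.append(("human", "What's my favorite color?"))
print(fake_model(history))
```

The first call has nothing to work with, while the second finds the earlier message in the list it was given.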

## 7. The Context Window Challenge

Our simple memory solution works because we pass the full message history as input on every call, but it has a critical limitation: **the context window**.

### Understanding the Context Window

Every language model has a fixed "context window" - the maximum number of tokens (roughly words or word pieces) it can process at once:

- **TinyLlama**: ~2,048 tokens (about 1,500 words)
- **GPT-3.5**: ~4,096 tokens
- **GPT-4**: ~8,192 to 128,000 tokens depending on the version
- **Claude 3**: Up to 200,000 tokens

As the conversation grows, we eventually hit this limit. When that happens:

1. The model can't see messages beyond the window size
2. It effectively "forgets" the earliest parts of the conversation 
3. Processing becomes slower with larger contexts
4. You may receive errors if you exceed the context window
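
To get a feel for when a conversation approaches the limit, here is a rough sketch that estimates token usage from word counts. The ratio comes from the TinyLlama figures above (about 1,500 words for 2,048 tokens) and is only an approximation; real tokenizers give different counts. The function names and the `reserve_for_reply` parameter are assumptions made for this example.

```python
# Rough ratio implied by "2,048 tokens is about 1,500 words" (an approximation).
TOKENS_PER_WORD = 2048 / 1500


def estimate_tokens(text: str) -> int:
    """Very rough token estimate based on whitespace-separated words."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_in_context(messages, context_window=2048, reserve_for_reply=256):
    """Check whether (role, text) messages leave room for a reply."""
    used = sum(estimate_tokens(text) for _role, text in messages)
    return used + reserve_for_reply <= context_window


short_chat = [
    ("system", "You are a helpful assistant"),
    ("human", "My favorite color is blue."),
]
long_chat = short_chat + [("human", "word " * 2000)]

print(fits_in_context(short_chat))  # True
print(fits_in_context(long_chat))   # False
```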

## 8. Solutions to the Context Window Problem

In production systems like ChatGPT, several techniques address the context window limitation:

### 1. Sliding Context Window
This approach keeps only the most recent N messages, plus the system prompt.

**Example:** If you limit to 10 messages, and your conversation reaches 15 messages, you'd drop the 5 oldest messages (except the system prompt).

This is like having a conversation where you remember only what was said in the last few minutes. It's simple but loses all older information.
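
A minimal sketch of this idea is shown below, using the same `(role, text)` tuples as the earlier cells. The function name `sliding_window` and its `max_messages` parameter are made up for this example.

```python
def sliding_window(messages, max_messages=10):
    """Keep the system prompt (if it comes first) plus the newest messages."""
    if not messages:
        return []
    if messages[0][0] == "system":
        head, rest = messages[:1], messages[1:]
    else:
        head, rest = [], list(messages)
    if max_messages <= 0:
        return head
    return head + rest[-max_messages:]


history = [("system", "You are a helpful assistant")]
history += [("human", f"message {i}") for i in range(1, 16)]

trimmed = sliding_window(history, max_messages=10)
print(len(trimmed))  # 11: the system prompt plus the 10 newest messages
print(trimmed[1])    # ('human', 'message 6')
```

You would call something like this on the history just before sending it to the model, so the full list can keep growing elsewhere if you want to store it.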

### 2. Summarization
This technique condenses older parts of the conversation into summaries to save token space.

**Example:** After 10 back-and-forth exchanges, the system might replace those 20 messages with a single summary: "User introduced themselves as Alex and asked about neural networks. Assistant explained neural networks and provided a Python code example for palindrome detection."

This preserves the key points while reducing token usage significantly.
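
The sketch below shows the shape of this technique. The summarizer here, `naive_summarize`, is only a placeholder that keeps the first few words of each message; in a real application you would likely ask an LLM to write the summary. All names are assumptions for this example, and the code assumes the first message is the system prompt.

```python
def naive_summarize(messages):
    """Placeholder summarizer; a real one would call an LLM."""
    parts = [f"{role}: {' '.join(text.split()[:8])}" for role, text in messages]
    return "Summary of earlier conversation: " + " | ".join(parts)


def compact_history(messages, keep_recent=4, summarize=naive_summarize):
    """Replace older messages with one summary message.

    Keeps the system prompt and the last `keep_recent` messages as they are.
    """
    system, rest = messages[:1], messages[1:]
    if len(rest) <= keep_recent:
        return list(messages)
    split = len(rest) - keep_recent
    older, recent = rest[:split], rest[split:]
    return system + [("system", summarize(older))] + recent


history = [("system", "You are a helpful assistant")]
for i in range(1, 11):
    history.append(("human", f"question {i}"))
    history.append(("ai", f"answer {i}"))

compacted = compact_history(history, keep_recent=4)
print(len(history), "->", len(compacted))  # 21 -> 6
```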

### 3. Database-Backed Session Management

This approach uses a database to store complete conversation histories associated with unique user sessions. Here's how it works:

1. **Session Creation**: When a user starts chatting, the system creates a unique session ID (like "session_abc123")

2. **Message Storage**: Every message from the user and AI is stored in a database table linked to that session ID

3. **Context Window Management**: When preparing the prompt for the LLM, the system:
   - Retrieves all messages for the session
   - Applies strategies to fit within the context window (like sliding window or summarization)
   - Sends the optimized conversation to the model
   - Stores the new response back in the database

**Example in Practice:**
- A user has chatted for hours with ChatGPT
- Their session ID "user_789" now has 200 messages in the database
- When they send message #201, the system:
  - Retrieves all 200 previous messages from the database
  - Selects the most important ones to fit in the context window
  - Gets a response from the model
  - Adds message #201 and the response to the database
  - Even if the model only "sees" the recent portion, the full history remains in the database

This is how services like ChatGPT can maintain "memory" across very long conversations and even when you close your browser and come back later.
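
The storage side of this approach can be sketched with Python's built-in `sqlite3` module. This is a toy under stated assumptions, not how ChatGPT is implemented: the table layout and the helper names (`open_store`, `save_message`, `load_recent`) are invented for this example, and the assistant reply is a made-up string.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a message store. ':memory:' keeps it in RAM only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " session_id TEXT NOT NULL,"
        " role TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )
    return conn


def save_message(conn, session_id, role, content):
    """Store one message under the given session ID."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content),
    )
    conn.commit()


def load_recent(conn, session_id, limit=10):
    """Return the newest `limit` messages for a session, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ?"
        " ORDER BY id DESC LIMIT ?",
        (session_id, limit),
    ).fetchall()
    return rows[::-1]


conn = open_store()
save_message(conn, "session_abc123", "human", "My favorite color is blue.")
save_message(conn, "session_abc123", "ai", "Noted.")  # made-up reply
save_message(conn, "session_abc123", "human", "What's my favorite color?")

print(load_recent(conn, "session_abc123", limit=2))
```

Passing a file path instead of `":memory:"` would make the history survive a restart, and `load_recent` is where a sliding window or summarization step would plug in before calling the model.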

### What ChatGPT Typically Uses

OpenAI has not published the full details, but ChatGPT appears to use a hybrid approach that combines several of these techniques:

1. **Primary Technique:** It uses a very large context window (up to 128K tokens in GPT-4o) and database-backed session management, allowing it to "remember" remarkably long conversations.

2. **For Long Conversations:** When conversations exceed even these generous limits, it employs sliding window techniques that prioritize:
   - The system prompt/instructions
   - The most recent messages
   - Messages with high information density
   - Messages explicitly referenced by the user

3. **Dynamic Compression:** In some versions, ChatGPT also employs dynamic compression algorithms that selectively summarize or remove parts of the conversation that appear less relevant to the current exchange.

While these solutions help, there's still an absolute limit to what any model can "remember" in a single conversation. This is why even ChatGPT sometimes "forgets" things mentioned much earlier in very long conversations.

## 9. Benefits of Local LLMs

Using local LLMs with tools like Ollama has several advantages:

1. **Privacy**: Your data never leaves your computer
2. **No API costs**: Run the model as much as you want without paying per query
3. **Offline usage**: No internet connection required once the model is downloaded
4. **No rate limits**: Run as many queries as your hardware can handle

## 10. Resources for Further Learning

- [Ollama Documentation](https://github.com/ollama/ollama/blob/main/README.md)
- [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)
- [TinyLlama Model Information](https://github.com/jzhang38/TinyLlama)