# **1.0** ‎ Installation & Setup

This notebook serves as the foundational setup guide for using **Ollama** to support the development of HuaLaoWei's municipal chatbot assistant. \
It focuses on installing Ollama, pulling suitable Large Language Models (LLMs), and validating their capabilities through quick benchmarking tests. \
The selected models will later be used for tasks such as intent classification, query filtering, and response generation in subsequent notebooks.

### **Introduction:** What is Ollama?
Ollama is a local Large Language Model (LLM) runtime that simplifies the process of downloading, running, and interacting with open-source language models on your own machine. Unlike cloud-hosted services (such as OpenAI or Cohere), Ollama provides privacy, cost-efficiency, and lower latency, making it ideal for local experimentation and development.

### **Why Ollama?**
- **Offline Access:** No internet dependency after download.
- **Custom Model Support:** Easy to pull models, or to customise them with a Modelfile.
- **Reduced Latency:** No network round-trip; response time depends only on your local hardware.
- **Privacy-first:** Sensitive queries remain local.

We will use Ollama to run lightweight and domain-tuned LLMs for intent classification and other downstream tasks in our chatbot.

### **1.0.1** ‎ Installing Ollama

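To begin, if you're a Linux user, you can run the following command in your terminal or in this notebook. At the time of writing the install script targets Linux only, so macOS users should download the app from https://ollama.com/download instead: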

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

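Then restart your shell. In a notebook environment (Linux), add the install directory to `PATH` from Python instead, because a `!export` command only affects its own temporary subshell:

In [None]:
import os

# Make the ollama binary visible to later `!` commands in this notebook
os.environ["PATH"] += os.pathsep + "/usr/local/bin"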

On Windows, download the installer directly from: https://ollama.com/download \
More detailed installation instructions are available at: https://ollama.com \
To check that Ollama has been installed successfully, run the command below:

In [5]:
# Check Ollama installation version
!ollama --version

ollama version is 0.6.5


### **1.0.2** ‎ Serving Ollama

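`ollama serve` starts the local Ollama backend service that handles model requests. \
It opens a local HTTP API (usually on localhost:11434) that tools, scripts, or apps can connect to in order to run prompts, chat, etc. \
Because we'll be using LangChain later on, this server must be running so that LangChain can connect to Ollama. \
If the server is already running (for example, because the desktop app started it), the command fails with a "port already in use" error like the one in the output below, and no further action is needed.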

In [6]:
!ollama serve

Error: listen tcp 127.0.0.1:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.
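
Because `ollama serve` either blocks the cell or fails when a server already exists, it can be easier to probe the HTTP API directly. The snippet below is a minimal sketch, assuming the default address `localhost:11434` mentioned above; `is_ollama_up` is a hypothetical helper name, not part of Ollama.

```python
import urllib.error
import urllib.request


def is_ollama_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or HTTP error: treat as "not reachable"
        return False


print("Ollama reachable:", is_ollama_up())
```

If this prints `False`, start the server in a separate terminal with `ollama serve` and run the check again.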


### **1.0.3** ‎ Pull Pre-trained LLMs

As mentioned, Ollama gives us access to a large collection of open-source pre-trained LLMs. \
You can browse the full collection on the Ollama website: https://ollama.com. \
For initial testing and experimentation, we will pull several of them. \
The table below gives a general summary of the models we're pulling:

| Model          | Size          | Speed (on CPU)       | Accuracy / Quality             | Use Cases                                         |
|----------------|---------------|----------------------|--------------------------------|--------------------------------------------------------|
| llama3.1     | ~8B / ~70B    | Moderate             | Very High (on 70B)             | General-purpose chat, reasoning, classification, RAG     |
| mistral     | 7B            | Fast                 | Quite High (for its size)      | Lightweight tasks, few-shot classification, code help    |
| chatglm3    | 6B            | Moderate             | Moderate–High (Bilingual)      | Bilingual bots, Q&A, regional language applications       |
| deepseek-r1 | ~7B / ~70B    | Moderate             | Very High (GPT-3.5 class)      | Classification, summarisation, intent-to-code systems    |

Use the code below to download these models. \
**Note:** The initial download of each model can take several minutes or longer, depending on the model size and your network speed.


In [None]:
# Llama3.1 (Meta)
!ollama pull llama3.1:8b

# Mistral
!ollama pull mistral

# ChatGLM
!ollama pull EntropyYue/chatglm3

# DeepSeek-R1
!ollama pull deepseek-r1:7b

Conversely, to remove a model you have pulled, replace `<model_name>` with its name and run:

In [None]:
!ollama rm <model_name>

deleted 'deepseek-coder:6.7b-instruct'




To check that Ollama is running and that the models have been installed successfully, execute the following:

In [11]:
import subprocess

def check_ollama_status():
    try:
        output = subprocess.check_output(["ollama", "list"], stderr=subprocess.STDOUT, text=True)
        print("Ollama is running. Installed models:")
        print(output)
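    except FileNotFoundError:
        # The ollama binary could not be found on PATH
        print("Ollama does not appear to be installed or is not on PATH.")
    except subprocess.CalledProcessError as e:
        # `ollama list` exited with an error, e.g. the server is not running
        print("Error with Ollama:", e.output)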

check_ollama_status()

Ollama is running. Installed models:
NAME                          ID              SIZE      MODIFIED           
llama3.1:latest               46e0c10c039e    4.9 GB    About a minute ago    
mistral:latest                f974a74358d6    4.1 GB    13 hours ago          
EntropyYue/chatglm3:latest    8f6f34227356    3.6 GB    13 hours ago          
deepseek-r1:7b                0a8c26691023    4.7 GB    3 days ago            
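
The values in the `NAME` column are the identifiers used to refer to each model in later steps, so it can help to extract them programmatically rather than copy them by hand. The snippet below is a rough sketch that assumes the column layout shown in the output above (a header row, with the name as the first column); `parse_model_names` is a hypothetical helper, and the sample text is copied from that output.

```python
def parse_model_names(list_output: str) -> list[str]:
    """Extract the NAME column from the text printed by `ollama list`."""
    lines = [line for line in list_output.splitlines() if line.strip()]
    # Skip the header row; the model name is the first whitespace-delimited token
    return [line.split()[0] for line in lines if not line.startswith("NAME")]


# Sample text copied from the output above
sample_output = """\
NAME                          ID              SIZE      MODIFIED
llama3.1:latest               46e0c10c039e    4.9 GB    About a minute ago
mistral:latest                f974a74358d6    4.1 GB    13 hours ago
EntropyYue/chatglm3:latest    8f6f34227356    3.6 GB    13 hours ago
deepseek-r1:7b                0a8c26691023    4.7 GB    3 days ago
"""

installed_models = parse_model_names(sample_output)
print(installed_models)
```

In practice the same function could be applied to the string returned by `subprocess.check_output(["ollama", "list"], text=True)`.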



# **1.1** ‎ Testing Ollama

To better understand the performance and suitability of each pulled model for our municipal chatbot, we will run a small set of test prompts across all models. \
These tests help us evaluate each model in terms of:

- **Inference Speed**: Time taken to generate a full response.
- **Output Quality**: Fluency, relevance, and structure of the response.
- **Consistency**: How reliably the model answers similar queries.
- **Suitability**: Whether the model aligns with municipal use cases.

This benchmarking helps us identify the right model for different components of our chatbot architecture.
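
Of the criteria above, consistency is the hardest to judge from a single run. One crude way to put a number on it is to send the same prompt several times and measure how similar the answers are. The sketch below uses lexical similarity from Python's `difflib` as a stand-in; this metric is an assumption made for illustration, not something the notebook measures, and the example responses are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def consistency_score(responses: list[str]) -> float:
    """Mean pairwise similarity (0.0 to 1.0) between responses to one prompt."""
    pairs = list(combinations(responses, 2))
    if not pairs:
        return 1.0  # Zero or one response: nothing to disagree with
    return sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs) / len(pairs)


# Hypothetical responses from sending the same prompt three times
runs = [
    "Report the pothole to the town council.",
    "Report the pothole to the town council.",
    "Please report the pothole to your town council.",
]
print(round(consistency_score(runs), 2))
```

Lexical overlap only approximates agreement in meaning, so a low score is a prompt to read the responses rather than a verdict on the model.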

In [13]:
import time
import requests

# Define test models and a consistent prompt
models_to_test = [
    "llama3.1:latest", # Follow the "NAME" shown in the output in the previous step
    "mistral:latest",
    "EntropyYue/chatglm3:latest",
    "deepseek-r1:7b"
]

# Test prompt
prompt = "How can AI be used to enhance the municipal reporting service? Give me a clear and concise answer."

# Dictionary to store results
test_results = {}

# Loop through models
for model_name in models_to_test:
    print(f"\nTesting model: {model_name}")

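    # Measure wall-clock inference time (includes model load time on the first call)
    start_time = time.perf_counter()
    response = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model_name,
            "prompt": prompt,
            "stream": False
        },
        timeout=600  # Fail instead of hanging forever if the server stalls
    )
    end_time = time.perf_counter()

    # Extract response
    response.raise_for_status()  # Surface HTTP errors such as a missing model
    result = response.json()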
    output = result.get("response", "[No response returned]")
    elapsed = round(end_time - start_time, 2)

    # Store in results dictionary
    test_results[model_name] = {
        "response": output,
        "inference_time_sec": elapsed
    }

    print(f"Inference Time: {elapsed} sec")
    print(f"Response:\n{output}")


Testing model: llama3.1:latest
Inference Time: 18.26 sec
Response:
AI can enhance the municipal reporting service in several ways:

1. **Automated data collection**: AI can extract relevant information from various sources, such as social media, news articles, and government websites, reducing manual labor and increasing accuracy.
2. **Predictive analytics**: AI-powered algorithms can analyze historical data to predict future trends, enabling proactive decision-making and resource allocation.
3. **Improved incident classification**: AI can categorize and prioritize reports based on urgency and severity, ensuring timely attention from municipal authorities.
4. **Enhanced citizen engagement**: AI-driven chatbots or virtual assistants can provide citizens with information, updates, and guidance on reporting processes, fostering a more user-friendly experience.
5. **Real-time monitoring**: AI can analyze data streams in real-time to detect anomalies, enabling swift response to emerging is

Here are the initial observations from running the test prompt:

| Model            | Inference Time (s) | Output Quality (Subjective) | Notes                                                                      |
|------------------|--------------------|-----------------------------|----------------------------------------------------------------------------|
| llama3.1 (8B)    | 🟨🟨🟨⬜⬜ 18.26     | 🟩🟩🟩🟩🟩 Excellent          | Ideal for reasoning and chat replies, with decent speed.                   |
| mistral          | 🟩🟩🟩🟩⬜ 13.96     | 🟩🟩🟩🟩⬜ Very Good          | Very fast; could be a solid choice for classification tasks.               |
| chatglm3         | 🟩🟩🟩🟩🟩 8.51      | 🟨🟨🟨⬜⬜ Good               | Extremely fast, with good bilingual support, but the content lacked depth. |
| deepseek-r1 (7B) | 🟥⬜⬜⬜⬜ 53.25     | 🟩🟩🟩🟩🟩 Excellent          | Strong output quality but concerningly slow. Also shows its thought process. |
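
The wall-clock times above mix model loading, prompt processing, and generation, and longer answers naturally take longer. As far as the Ollama API documentation describes, a non-streaming `/api/generate` response also carries timing fields such as `eval_count`, `eval_duration`, and `load_duration` (durations in nanoseconds), which allow a tokens-per-second comparison. The sketch below assumes those field names; `generation_stats` is a hypothetical helper, and the example values are made up, not measured.

```python
def generation_stats(result: dict) -> dict:
    """Summarise timing fields from a non-streaming /api/generate response."""
    eval_count = result.get("eval_count", 0)   # Number of generated tokens
    eval_ns = result.get("eval_duration", 0)   # Generation time in nanoseconds
    load_ns = result.get("load_duration", 0)   # Model load time in nanoseconds
    return {
        "load_time_sec": round(load_ns / 1e9, 2),
        "generated_tokens": eval_count,
        "tokens_per_sec": round(eval_count / eval_ns * 1e9, 2) if eval_ns else None,
    }


# Made-up values for illustration only (not measured results)
example_result = {
    "eval_count": 200,
    "eval_duration": 10_000_000_000,
    "load_duration": 2_500_000_000,
}
print(generation_stats(example_result))
```

Calling `generation_stats(result)` inside the benchmark loop, where `result` is the parsed JSON response, would give a per-model figure that is less sensitive to answer length and cold starts.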
