# Q1. Running Ollama with Docker
What's the version of the Ollama client?

`ollama version is 0.1.48`

# Q2. Downloading an LLM

In [1]:
from pprint import pprint

pprint({"schemaVersion":2,"mediaType":"application/vnd.docker.distribution.manifest.v2+json","config":{"mediaType":"application/vnd.docker.container.image.v1+json","digest":"sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290","size":483},"layers":[{"mediaType":"application/vnd.ollama.image.model","digest":"sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12","size":1678447520},{"mediaType":"application/vnd.ollama.image.license","digest":"sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca","size":8433},{"mediaType":"application/vnd.ollama.image.template","digest":"sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871","size":136},{"mediaType":"application/vnd.ollama.image.params","digest":"sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0","size":84}]})

{'config': {'digest': 'sha256:887433b89a901c156f7e6944442f3c9e57f3c55d6ed52042cbb7303aea994290',
            'mediaType': 'application/vnd.docker.container.image.v1+json',
            'size': 483},
 'layers': [{'digest': 'sha256:c1864a5eb19305c40519da12cc543519e48a0697ecd30e15d5ac228644957d12',
             'mediaType': 'application/vnd.ollama.image.model',
             'size': 1678447520},
            {'digest': 'sha256:097a36493f718248845233af1d3fefe7a303f864fae13bc31a3a9704229378ca',
             'mediaType': 'application/vnd.ollama.image.license',
             'size': 8433},
            {'digest': 'sha256:109037bec39c0becc8221222ae23557559bc594290945a2c4221ab4f303b8871',
             'mediaType': 'application/vnd.ollama.image.template',
             'size': 136},
            {'digest': 'sha256:22a838ceb7fb22755a3b0ae9b4eadde629d19be1f651f73efb8c6b4e2cd0eea0',
             'mediaType': 'application/vnd.ollama.image.params',
             'size': 84}],
 'mediaType': 'application/vnd.d

# Q3. Running the LLM


In [2]:
!rm -f minsearch.py
!wget https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py

--2024-06-29 11:03:08--  https://raw.githubusercontent.com/alexeygrigorev/minsearch/main/minsearch.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3832 (3.7K) [text/plain]
Saving to: ‘minsearch.py’


2024-06-29 11:03:08 (11.6 MB/s) - ‘minsearch.py’ saved [3832/3832]



In [3]:
import requests 
import minsearch

docs_url = 'https://github.com/DataTalksClub/llm-zoomcamp/blob/main/01-intro/documents.json?raw=1'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

documents = []

for course in documents_raw:
    course_name = course['course']

    for doc in course['documents']:
        doc['course'] = course_name
        documents.append(doc)

index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)

index.fit(documents)

<minsearch.Index at 0x716b000be380>

In [4]:
def search(query):
    boost = {'question': 3.0, 'section': 0.5}

    results = index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

    return results

In [25]:
def build_prompt(query, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT: 
{context}
""".strip()

    context = ""
    
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"
    
    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

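def llm(prompt):
    # `client` is the OpenAI-compatible client created in cell In [21] below;
    # that cell must be run before this function is called.
    response = client.chat.completions.create(
        model='gemma:2b',
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0  # greedy decoding, for more reproducible answers
    )
    
    return response.choices[0].message.content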
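
To make the prompt format concrete, here is a minimal sketch that runs the same `build_prompt` logic on a single search result. The sample document is hypothetical, made up for illustration, and is not taken from the FAQ data.

```python
PROMPT_TEMPLATE = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()


def build_prompt(query, search_results):
    # Same logic as the notebook cell: one text block per retrieved document.
    context = ""
    for doc in search_results:
        context += (
            f"section: {doc['section']}\n"
            f"question: {doc['question']}\n"
            f"answer: {doc['text']}\n\n"
        )
    return PROMPT_TEMPLATE.format(question=query, context=context).strip()


# Hypothetical search result (not real FAQ content).
sample_results = [
    {
        "section": "General",
        "question": "Can I still join the course?",
        "text": "Yes, you can join at any time.",
    }
]

prompt = build_prompt("Can I join late?", sample_results)
print(prompt)
```

The final `.strip()` removes the trailing blank lines left by the last document block, so the prompt ends with the last answer.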


In [6]:
def rag(query):
    search_results = search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

In [21]:
from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',
)

In [12]:
llm('write that this is a test')

"Sure, here's a test:\n\n"

In [16]:
llm('10 * 10')

'The user has requested a calculation of 10 multiplied by 10.\n\nSure, here is the calculation:\n\n10 * 10 = 100\n\nTherefore, the answer is 100.'

# Q4. Downloading the weights
What's the size of the `ollama_files/models` folder?

`1.6G    ./ollama_files/models`
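
As a rough sanity check, the sizes listed in the Q2 manifest can be summed and compared with the folder size. This is only a sketch: it assumes the folder holds little beyond the blobs named in that manifest, and `du` reports disk usage rather than exact byte counts, so the two numbers should be close, not identical.

```python
# Sizes in bytes, copied from the manifest shown in Q2.
layer_sizes = {
    "model": 1678447520,
    "license": 8433,
    "template": 136,
    "params": 84,
}
config_size = 483

total_bytes = sum(layer_sizes.values()) + config_size
total_gib = total_bytes / 1024**3

print(f"{total_bytes} bytes = {total_gib:.2f} GiB")
```

The total is about 1.56 GiB, which is consistent with the `1.6G` reported for the folder.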

# Q5. Adding the weights
What do you put after `COPY`?

`COPY ollama_files .ollama`

# Q6. Serving it

In [26]:
prompt = "What's the formula for energy?"
llm(prompt)

"Sure, here's the formula for energy:\n\n**E = K + U**\n\nWhere:\n\n* **E** is the energy in joules (J)\n* **K** is the kinetic energy in joules (J)\n* **U** is the potential energy in joules (J)\n\n**Kinetic energy (K)** is the energy an object possesses when it moves or is in motion. It is calculated as half the product of an object's mass (m) and its velocity (v) squared:\n\n**K = 1/2 * m * v^2**\n\n**Potential energy (U)** is the energy an object possesses when it is in a position or has a specific configuration. It is calculated as the product of an object's mass and the gravitational constant (g) multiplied by the height or distance of the object from a reference point.\n\n**Gravitational potential energy (U)** is given by the formula:\n\n**U = mgh**\n\nWhere:\n\n* **m** is the mass of the object in kilograms (kg)\n* **g** is the acceleration due to gravity in meters per second squared (m/s^2)\n* **h** is the height or distance of the object in meters (m)\n\nThe formula for energ

In [33]:
import tiktoken

# Note: this is the GPT-4o tokenizer, not Gemma's own, so the count below is an approximation.
encoding = tiktoken.encoding_for_model("gpt-4o")

In [34]:
len(encoding.encode(llm(prompt)))

283

How many completion tokens did you get in the response?

`283`
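
An alternative to re-tokenizing the answer is to read the count the server itself reports. The sketch below assumes that the OpenAI-compatible endpoint fills in the `usage` field of the chat completion response; if it does not, the function returns `None` for the count. The name `llm_with_usage` and the stub client are hypothetical and exist only so the example runs without a server.

```python
from types import SimpleNamespace


def llm_with_usage(client, prompt, model="gemma:2b"):
    """Return the answer text and the completion token count, if reported."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    # `usage` may be missing on some servers, so fall back to None.
    usage = getattr(response, "usage", None)
    completion_tokens = usage.completion_tokens if usage is not None else None
    return response.choices[0].message.content, completion_tokens


# Stub standing in for a real client, so the sketch is runnable offline.
def _fake_create(**kwargs):
    return SimpleNamespace(
        choices=[SimpleNamespace(message=SimpleNamespace(content="stub answer"))],
        usage=SimpleNamespace(prompt_tokens=5, completion_tokens=3, total_tokens=8),
    )


fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

answer, n_tokens = llm_with_usage(fake_client, "What's the formula for energy?")
print(answer, n_tokens)
```

With the real `client` from the notebook, the same function would be called as `llm_with_usage(client, prompt)`. The reported number can differ from the tiktoken count, since it comes from the model's own tokenizer.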