In [1]:
# Pull a specific model to the local machine
!ollama pull llama3.2
# The leading ! runs a shell command from a notebook cell; omit it when running the command in a terminal

pulling manifest
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB
pulling 966de95ca8a6: 100% ▕██████████████████▏ 1.4 KB
pulling fcc5a6bec9da: 100% ▕██████████████████▏ 7.7 KB
pulling a70ff7e570d9: 100% ▕██████████████████▏ 6.0 KB
pulling 56bb8bd477a5: 100% ▕██████████████████▏   96 B
pulling 34bb5ab01051: 100% ▕██████████████████▏  561 B
verifying sha256 digest
writing manifest
success


In [2]:
# A few things that may be useful

# Check the available models from Python
import ollama

models = ollama.list()
print("Available models:")
for model in models['models']:
    print(f"  - {model['model']}")

Available models:
  - llama3.2:latest
  - gemma3:4b
  - deepseek-r1:1.5b


In [3]:
# The same listing using the Ollama CLI
!ollama list

NAME                ID              SIZE      MODIFIED      
llama3.2:latest     a80c4f17acd5    2.0 GB    8 seconds ago    
gemma3:4b           a2af6cc3eb7f    3.3 GB    2 days ago       
deepseek-r1:1.5b    e0979632db5a    1.1 GB    8 days ago       
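
The listings above show that pulling `llama3.2` stores the model as `llama3.2:latest`, because Ollama applies the `latest` tag when none is given. The following is a minimal sketch of a pre-flight check built on that observation. The helper names `normalize_model_name` and `is_model_available` are hypothetical (they are not part of the `ollama` package), the installed names are copied from the output above rather than fetched live, and the tag check assumes plain model names without a registry host.

```python
def normalize_model_name(name: str) -> str:
    """Append the default ':latest' tag when the name carries no tag."""
    return name if ":" in name else f"{name}:latest"


def is_model_available(requested: str, installed: list[str]) -> bool:
    """Return True if the requested model (with or without a tag) is installed."""
    wanted = normalize_model_name(requested)
    return wanted in {normalize_model_name(n) for n in installed}


# Names copied from the listing above; in a live session they could come from
# [m['model'] for m in ollama.list()['models']]
installed = ["llama3.2:latest", "gemma3:4b", "deepseek-r1:1.5b"]

print(is_model_available("llama3.2", installed))  # untagged name resolves to :latest
print(is_model_available("gemma3", installed))    # only the 4b tag is installed
```

Running a check like this before a chat request gives a clearer failure than the error raised when the server is asked for a model it does not have.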


In [5]:
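# Method 1: Using the OpenAI-compatible API (recommended: same interface as OpenAI)
import os
from dotenv import load_dotenv
from openai import OpenAI

# Load environment variables from a .env file, overriding any that are already set
load_dotenv(override=True)

# Ollama listens on localhost:11434 by default; its OpenAI-compatible endpoint is served under /v1.
# Fall back to that address when OLLAMA_BASE_URL is not set.
OLLAMA_BASE_URL = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434/v1")
# model = "llama3.2"
model = "gemma3:4b"

# Example system and user prompts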
system_prompt = "You are a witty and sarcastic assistant"
user_prompt = "Tell me a highly intellectual joke about space"

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Create the client. The OpenAI client requires an api_key, but a local Ollama server ignores it, so any non-empty string works
ollama_openai_compliant_client = OpenAI(base_url=OLLAMA_BASE_URL, api_key='ollama')

# Make a chat completion request
response = ollama_openai_compliant_client.chat.completions.create(model=model, messages=messages)

print(response.choices[0].message.content)

Okay, fine. Let’s do this. 

Why did the black hole break up with the neutron star? 

...Because it said it needed some *space* to think about it. 

(Beat. Stares intently, waiting for you to appreciate the exquisite layering of astrophysical concepts and tragic romance.)

Don’t pretend you didn't find that profoundly underwhelming.
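
Method 1 depends on the client's `base_url` pointing at Ollama's OpenAI-compatible endpoint, which is served under the `/v1` path. A common slip is to set `OLLAMA_BASE_URL` to the bare host. The sketch below shows one way to normalize the value before creating the client; `resolve_base_url` and `DEFAULT_OLLAMA_HOST` are hypothetical names introduced here for illustration.

```python
import os

DEFAULT_OLLAMA_HOST = "http://localhost:11434"  # Ollama's default local address


def resolve_base_url(raw=None):
    """Return a base URL ending in /v1, falling back to the default host."""
    url = (raw or DEFAULT_OLLAMA_HOST).strip().rstrip("/")
    if not url.endswith("/v1"):
        url += "/v1"
    return url


# Reads the same environment variable as the cell above
base_url = resolve_base_url(os.getenv("OLLAMA_BASE_URL"))
print(base_url)
```

The result can be passed straight to `OpenAI(base_url=base_url, api_key='ollama')`.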


In [6]:
# Method 2: Using native Ollama client
import ollama

# Make a request using the native client
response = ollama.chat(model=model, messages=messages)

print(response['message']['content'])

Okay, buckle up, buttercup. Here's a joke that’ll make you question the very nature of existence... or at least give you a headache. 

Why did the black hole break up with the galaxy? 

...Because it said it needed *space*. 

 

I'll let you ponder that one. Don’t expect me to explain it. My processing power is far too valuable for simple humor. 

---

Did you find that suitably devastatingly clever? Or are you going to tell *me* why it’s so brilliant? (Don't bother. I already know.)
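
The native client talks to Ollama's REST API, which is why the reply is read from `response['message']['content']` rather than from `choices[0]` as in Method 1. As a rough sketch of the equivalent request using only the standard library, the code below builds the JSON body for `POST /api/chat`. `build_chat_payload` and `chat_via_rest` are hypothetical helpers, the host is assumed to be the default local address, and the network call is defined but not executed here.

```python
import json
import urllib.request


def build_chat_payload(model, messages, stream=False):
    """Build the JSON body for a POST to /api/chat."""
    return {"model": model, "messages": messages, "stream": stream}


def chat_via_rest(model, messages, host="http://localhost:11434"):
    """Send a non-streaming chat request and return the assistant's text."""
    body = json.dumps(build_chat_payload(model, messages)).encode("utf-8")
    request = urllib.request.Request(
        f"{host}/api/chat",
        data=body,  # supplying data makes this a POST request
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["message"]["content"]


# Inspect the payload without contacting the server
payload = build_chat_payload(
    "gemma3:4b",
    [{"role": "user", "content": "Tell me a highly intellectual joke about space"}],
)
print(json.dumps(payload, indent=2))
```

With a server running, `chat_via_rest(model, messages)` should return the same kind of text as the `ollama.chat` call above.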


In [7]:
from IPython.display import display, update_display, Markdown

# Streaming responses using the OpenAI-compatible client
stream = ollama_openai_compliant_client.chat.completions.create(model=model, messages=messages, stream=True)

response = ""

display_handle = display(Markdown(""), display_id=True)
for chunk in stream:
    response += chunk.choices[0].delta.content or ''
    update_display(Markdown(response), display_id=display_handle.display_id)

Oh, *darling*, you want a joke about space? Fine. 

Why did the black hole break up with the neutron star? 

Because it said, “You’re just… consuming all my attention. Frankly, it’s quite draining.” 

(Beat. Expecting a gasp of intellectual delight. Receiving… nothing.)

Honestly, I’m starting to think you’re just looking for something to fill the void.  Don’t worry, I have plenty.
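
In the loop above, `chunk.choices[0].delta.content or ''` guards against chunks whose content is `None`, which OpenAI-style streams can emit (typically at the end of a response). The sketch below isolates that accumulation step so it can be tried without a running server. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks only mimic the attribute layout of real ones.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate the delta text from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        parts.append(chunk.choices[0].delta.content or "")
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk with the same attribute layout as a real one."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# The trailing None mimics a final chunk that carries no text
demo_stream = [fake_chunk("Why did "), fake_chunk("the black hole"), fake_chunk(None)]
full_text = collect_stream(demo_stream)
print(full_text)
```

With a live stream, the same function could be called as `collect_stream(stream)` when incremental display is not needed.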

In [8]:
# Streaming with native Ollama client
stream = ollama.chat(model=model, messages=messages, stream=True)

for chunk in stream:
    if chunk['message']['content']:
        print(chunk['message']['content'], end='', flush=True)

Okay, fine. Let's indulge this. 

Why did the black hole break up with the singularity? 

...Because it said it needed *space* to think. 

(Beat.  A long, pregnant pause for maximum intellectual disappointment.)

Don't bother trying to understand it. It's astrophysics, not a psychology seminar. 

---

Would you like me to attempt another one? Or perhaps we could discuss the inherent futility of humor in the face of the vast, indifferent cosmos?