# Lab 1: Text Generation with Local LLMs

In this lab, you'll learn how to generate text using **Ollama**, a tool that lets you run large language models locally on your machine, with no API fees.

## What You'll Learn
- How to interact with local LLMs using Ollama
- Basic text generation and completion
- Code generation
- Structured output generation
- Streaming responses

## Prerequisites
1. Install Ollama. On Linux: `curl -fsSL https://ollama.com/install.sh | sh`. On macOS or Windows, download the installer from https://ollama.com/download.
2. Pull a model: `ollama pull llama3.2`
3. Install Python package: `pip install ollama`

## 1. Setup and Basic Generation

In [None]:
# Install the ollama package into the notebook kernel's environment if not already installed
%pip install -q ollama

In [None]:
import ollama

# Check available models
models = ollama.list()
print("Available models:")
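for model in models['models']:
    # Newer versions of the ollama package expose the tag as 'model'; older ones used 'name'
    print(f"  - {model.get('model') or model.get('name')}")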

In [None]:
# Basic text generation
response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': 'What is machine learning? Explain in 2 sentences.'
    }]
)

print(response['message']['content'])

## 2. Different Generation Tasks

In [None]:
# Text Summarization
long_text = """
Artificial intelligence (AI) is intelligence demonstrated by machines, as opposed to natural 
intelligence displayed by animals including humans. AI research has been defined as the field 
of study of intelligent agents, which refers to any system that perceives its environment and 
takes actions that maximize its chance of achieving its goals. The term "artificial intelligence" 
had previously been used to describe machines that mimic and display "human" cognitive skills 
that are associated with the human mind, such as "learning" and "problem-solving". This 
definition has since been rejected by major AI researchers who now describe AI in terms of 
rationality and acting rationally, which does not limit how intelligence can be articulated.
"""

response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': f'Summarize this text in one sentence:\n\n{long_text}'
    }]
)

print("Summary:", response['message']['content'])

In [None]:
# Question Answering
context = """
The Python programming language was created by Guido van Rossum and first released in 1991.
Python's design philosophy emphasizes code readability with the use of significant indentation.
Python is dynamically typed and garbage-collected. It supports multiple programming paradigms,
including structured, object-oriented and functional programming.
"""

question = "Who created Python and when was it released?"

response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': f'Based on this context:\n{context}\n\nAnswer this question: {question}'
    }]
)

print("Answer:", response['message']['content'])
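
To ask several questions about the same context, the prompt template above can be wrapped in a small function. This is a minimal sketch: `build_qa_prompt` is a hypothetical helper, not part of the `ollama` package, and the context and questions are shortened for illustration.

```python
def build_qa_prompt(context: str, question: str) -> str:
    """Combine a context passage and a question into one prompt string."""
    return (
        f"Based on this context:\n{context.strip()}\n\n"
        f"Answer this question: {question.strip()}"
    )


# Shortened context and example questions (illustrative only).
demo_context = "Python was created by Guido van Rossum and first released in 1991."
questions = [
    "Who created Python?",
    "When was Python first released?",
]

prompts = [build_qa_prompt(demo_context, q) for q in questions]
for p in prompts:
    print(p)
    print("-" * 40)
```

Each prompt string can then be passed as the `content` of a user message, as in the cell above.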

## 3. Code Generation

In [None]:
# Generate Python code
response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': 'Write a Python function that checks if a number is prime. Include docstring and type hints.'
    }]
)

print(response['message']['content'])

In [None]:
# Code explanation
code_to_explain = """
def fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a
        a, b = b, a + b
"""

response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': f'Explain what this Python code does line by line:\n{code_to_explain}'
    }]
)

print(response['message']['content'])

## 4. Streaming Responses

In [None]:
# Streaming allows you to see the response as it's generated
print("Streaming response:")
stream = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': 'Write a short poem about programming.'
    }],
    stream=True
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()  # newline at the end
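
If you also need the complete text once streaming finishes, one option is to accumulate the chunks while printing them. The sketch below is a minimal example: `collect_stream` is a hypothetical helper (not part of the `ollama` package), and it assumes only that each chunk supports `chunk['message']['content']`, as in the cell above. Fake chunks stand in for a live stream so the example runs without a model.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Print each streamed piece as it arrives and return the full text."""
    parts = []
    for chunk in chunks:
        piece = chunk['message']['content']
        if echo:
            print(piece, end='', flush=True)
        parts.append(piece)
    if echo:
        print()  # newline at the end
    return ''.join(parts)


# Stand-in chunks shaped like the streamed responses above (illustrative data).
fake_stream = [
    {'message': {'content': 'Hello'}},
    {'message': {'content': ', '}},
    {'message': {'content': 'world'}},
]
full_text = collect_stream(fake_stream)
```

With a real model, the `stream` object returned by `ollama.chat(..., stream=True)` would take the place of `fake_stream`.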

## 5. System Prompts and Personas

In [None]:
# Using system prompts to control behavior
response = ollama.chat(
    model='llama3.2',
    messages=[
        {
            'role': 'system',
            'content': 'You are a helpful coding assistant. Always provide code examples and explain your reasoning step by step.'
        },
        {
            'role': 'user',
            'content': 'How do I read a CSV file in Python?'
        }
    ]
)

print(response['message']['content'])

In [None]:
# Creating a specialized assistant
response = ollama.chat(
    model='llama3.2',
    messages=[
        {
            'role': 'system',
            'content': 'You are a SQL expert. Respond only with SQL queries, no explanations unless asked.'
        },
        {
            'role': 'user',
            'content': 'Get all users who signed up in the last 30 days from a users table'
        }
    ]
)

print(response['message']['content'])

## 6. Conversation History (Multi-turn Chat)

In [None]:
# Multi-turn conversation
messages = [
    {'role': 'user', 'content': "My name is Alex and I'm learning Python."},
]

response = ollama.chat(model='llama3.2', messages=messages)
print("Assistant:", response['message']['content'])

# Add assistant's response to history
messages.append(response['message'])

# Continue the conversation
messages.append({'role': 'user', 'content': "What's my name and what am I learning?"})

response = ollama.chat(model='llama3.2', messages=messages)
print("\nAssistant:", response['message']['content'])
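
Because the whole `messages` list is sent on every call, a long conversation keeps growing until it no longer fits in the model's context window. The sketch below shows one simple way to cap the history, assuming a keep-the-most-recent policy. `trim_history` and `max_messages` are hypothetical names, not part of the `ollama` package.

```python
def trim_history(messages, max_messages=6):
    """Keep all system messages plus the most recent non-system messages."""
    system = [m for m in messages if m['role'] == 'system']
    rest = [m for m in messages if m['role'] != 'system']
    if max_messages <= 0:
        # Guard: rest[-0:] would return the whole list, not an empty one
        return system
    return system + rest[-max_messages:]


# Illustrative conversation: one system prompt and five later messages.
history = [
    {'role': 'system', 'content': 'You are a helpful tutor.'},
    {'role': 'user', 'content': 'My name is Alex.'},
    {'role': 'assistant', 'content': 'Nice to meet you, Alex.'},
    {'role': 'user', 'content': 'I am learning Python.'},
    {'role': 'assistant', 'content': 'Great choice.'},
    {'role': 'user', 'content': 'What should I learn first?'},
]

trimmed = trim_history(history, max_messages=3)
for m in trimmed:
    print(f"{m['role']}: {m['content']}")
```

Note the trade-off: anything trimmed is forgotten. In this example the message containing the name "Alex" is dropped, so the model could no longer answer a question about it.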

## 7. Model Parameters

In [None]:
# Adjusting temperature for creativity
prompt = "Write a creative name for a coffee shop"

# Low temperature = more deterministic
response_low = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': prompt}],
    options={'temperature': 0.1}
)
print("Low temperature (0.1):", response_low['message']['content'])

# High temperature = more creative/random
response_high = ollama.chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': prompt}],
    options={'temperature': 1.5}
)
print("High temperature (1.5):", response_high['message']['content'])
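
To see why temperature has this effect: during sampling, the model's token scores (logits) are divided by the temperature before softmax turns them into probabilities. The sketch below uses made-up logits for three candidate tokens. It illustrates the general technique, not Ollama's internal implementation.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after scaling by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.1, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}:", [round(p, 3) for p in probs])
```

At a low temperature nearly all of the probability lands on the highest-scoring token, so repeated runs tend to give the same answer. At a higher temperature the distribution flattens and less likely tokens are sampled more often.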

## 8. JSON Output

In [None]:
import json

# Request structured JSON output
response = ollama.chat(
    model='llama3.2',
    messages=[{
        'role': 'user',
        'content': '''Extract information from this text and return as JSON:
        "John Smith is a 35-year-old software engineer from San Francisco who works at TechCorp."
        
        Return JSON with fields: name, age, profession, city, company'''
    }],
    format='json'
)

result = json.loads(response['message']['content'])
print(json.dumps(result, indent=2))
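
Even with `format='json'`, the model may omit a requested field or use a different key name, so it is worth checking the parsed result before using it. The sketch below is one way to do that. `parse_record` and `REQUIRED_FIELDS` are hypothetical names introduced here, and the sample string stands in for `response['message']['content']`.

```python
import json

# Field names requested in the prompt above
REQUIRED_FIELDS = ('name', 'age', 'profession', 'city', 'company')


def parse_record(raw, required=REQUIRED_FIELDS):
    """Parse a JSON string and report which required fields are missing.

    Returns (data, missing). data is None if the text is not a JSON object.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None, list(required)
    if not isinstance(data, dict):
        return None, list(required)
    missing = [field for field in required if field not in data]
    return data, missing


# Illustrative string standing in for response['message']['content'].
sample = '{"name": "John Smith", "age": 35, "city": "San Francisco"}'
data, missing = parse_record(sample)
print("Parsed:", data)
print("Missing fields:", missing)
```

If `missing` is not empty, you could re-prompt the model or fall back to default values, depending on the application.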

## 9. Try Different Models

Ollama supports many open-source models. Here are some you can try:

| Model | Parameters | Best For |
|-------|------|----------|
| llama3.2 | 3B | General purpose, fast |
| llama3.1 | 8B | Better quality, slower |
| mistral | 7B | Good balance of speed/quality |
| codellama | 7B | Code generation |
| phi3 | 3.8B | Fast, good for simple tasks |

Pull a new model with: `ollama pull <model-name>`

In [None]:
# Compare different models on the same task
prompt = "Explain quantum computing in one sentence."

for model_name in ['llama3.2']:  # Add more models you've pulled
    try:
        response = ollama.chat(
            model=model_name,
            messages=[{'role': 'user', 'content': prompt}]
        )
        print(f"\n{model_name}:")
        print(response['message']['content'])
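    except Exception as e:
        # The model may not be pulled yet, or the Ollama server may not be running
        print(f"{model_name}: request failed ({e}). If the model is missing, run 'ollama pull {model_name}' first")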
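
When comparing models, response time is often as relevant as answer quality. The sketch below shows one way to time any call using `time.perf_counter`. `timed` is a hypothetical helper, and `fake_model` is a stand-in so the example runs without a model.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_model(prompt):
    """Stand-in for a model call (illustrative only)."""
    return prompt.upper()


result, seconds = timed(fake_model, "hello")
print(f"{seconds:.4f}s -> {result}")
```

In the comparison loop above, the same helper could wrap the real call, for example `timed(ollama.chat, model=model_name, messages=[...])`. The first request to a model typically includes the time needed to load it into memory, so timing a second request gives a fairer comparison.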

## Summary

In this lab, you learned how to:
- Use Ollama to run LLMs locally
- Generate text, summaries, and answers
- Generate and explain code
- Stream responses in real time
- Use system prompts to control behavior
- Maintain conversation history
- Adjust model parameters like temperature
- Get structured JSON output

**Next Lab:** Building a RAG system with ChromaDB