# Functions and Methods in Python

## Learning Objectives
By the end of this section, you will be able to:
- Define and execute custom functions to organize code
- Use the return statement to get values from functions
- Call functions with different argument types (positional, keyword, default)
- Understand and use objects and methods with dot notation
- Apply functions and methods in AI/RAG/Agentic AI workflows

## Why This Matters: Real-World AI/RAG/Agentic Applications

**In AI Systems:**
- Functions organize data preprocessing pipelines (cleaning, tokenization, normalization)
- Custom functions wrap model inference calls and handle responses
- Methods on model objects (`.predict()`, `.generate()`) perform AI operations

**In RAG Pipelines:**
- Functions encapsulate retrieval logic (search, rank, filter)
- Methods on vector database objects (`.query()`, `.insert()`) manage embeddings
- Custom functions combine retrieved context with prompts

**In Agentic AI:**
- Functions define tool capabilities that agents can call
- Methods on agent objects (`.run()`, `.step()`) execute agent loops
- Return values pass data between agent actions

## Prerequisites
- Basic Python variables and data types
- Understanding of strings and string methods
- Familiarity with print statements

---

## Instructor Activity 1
**Concept**: Defining and executing basic custom functions

Functions are reusable blocks of code that perform specific tasks. They help organize code, avoid repetition, and make programs easier to understand.

**Basic syntax:**
```python
def function_name():
    # code to execute
    pass
```

### Example 1: Simple Function Without Parameters

**Problem**: Create a function that prints a greeting message

**Expected Output**: `Hello! Welcome to Python programming.`

In [None]:
# Empty cell for live demonstration

<details>
<summary>Solution</summary>

```python
# Define the function using 'def' keyword
def greet():
    print("Hello! Welcome to Python programming.")

# Call the function to execute it
greet()
# Output: Hello! Welcome to Python programming.
```

**Why this works:**
- `def greet():` defines a function named `greet` with no parameters
- The indented code block is the function body
- `greet()` calls/executes the function
- Defining a function only creates it; the body does not run until you call the function

</details>

### Example 2: Function With Parameters

**Problem**: Create a function that greets a specific person by name

**Expected Output**: `Hello, Alice! Nice to meet you.`

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function with a parameter (name)
def greet_person(name):
    print(f"Hello, {name}! Nice to meet you.")

# Call with different arguments
greet_person("Alice")
# Output: Hello, Alice! Nice to meet you.

greet_person("Bob")
# Output: Hello, Bob! Nice to meet you.
```

**Why this works:**
- `name` is a parameter - a variable that receives a value when the function is called
- When we call `greet_person("Alice")`, `"Alice"` becomes the value of `name`
- The function can be called multiple times with different arguments

</details>

### Example 3: Function With Multiple Parameters

**Problem**: Create a function that formats a person's full information

**Expected Output**: `Name: Alice, Age: 25, City: New York`

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function with multiple parameters
def display_person_info(name, age, city):
    print(f"Name: {name}, Age: {age}, City: {city}")

# Call with positional arguments (order matters!)
display_person_info("Alice", 25, "New York")
# Output: Name: Alice, Age: 25, City: New York

display_person_info("Bob", 30, "London")
# Output: Name: Bob, Age: 30, City: London
```

**Why this works:**
- Multiple parameters are separated by commas
- When calling, arguments are matched to parameters by position
- First argument goes to first parameter, second to second, etc.
- This is called "positional arguments"

</details>
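Because positional arguments are matched purely by order, swapping two of them does not raise an error; the values simply land in the wrong parameters. The sketch below uses a hypothetical `describe_chunk` function (not part of the lesson's examples) to show the effect.

```python
# Hypothetical helper used only for illustration
def describe_chunk(source, page):
    print(f"Source: {source}, Page: {page}")

# Arguments in the intended order
describe_chunk("handbook.pdf", 12)
# Output: Source: handbook.pdf, Page: 12

# Same arguments, swapped: Python does not complain,
# but each value ends up in the wrong parameter
describe_chunk(12, "handbook.pdf")
# Output: Source: 12, Page: handbook.pdf
```

Mistakes like this are easy to miss because the code still runs, so it is worth double-checking argument order whenever a function takes several parameters.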

---

## Learner Activity 1
**Practice**: Defining and executing basic custom functions

### Exercise 1: Create a Simple Function

**Task**: Define a function called `introduce` that prints "I am learning Python!"

**Expected Output**: `I am learning Python!`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Define the function
def introduce():
    print("I am learning Python!")

# Call the function
introduce()
# Output: I am learning Python!
```

**Why this works:**
The function encapsulates the print statement and can be called whenever needed.

</details>

### Exercise 2: Function With One Parameter

**Task**: Create a function called `describe_ai_model` that takes a model name as a parameter and prints "The [model_name] is a powerful AI model."

**Expected Output** (when called with "GPT-4"): `The GPT-4 is a powerful AI model.`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Define function with one parameter
def describe_ai_model(model_name):
    print(f"The {model_name} is a powerful AI model.")

# Call with different model names
describe_ai_model("GPT-4")
# Output: The GPT-4 is a powerful AI model.

describe_ai_model("Claude")
# Output: The Claude is a powerful AI model.
```

**Why this works:**
The parameter `model_name` acts as a placeholder that gets replaced with the actual argument when the function is called.

</details>

### Exercise 3: Function With Multiple Parameters

**Task**: Create a function called `describe_vector` that takes three parameters (dimension, database, purpose) and prints "Vector: [dimension]D, Database: [database], Purpose: [purpose]"

**Expected Output** (when called with 1536, "Pinecone", "RAG"): `Vector: 1536D, Database: Pinecone, Purpose: RAG`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Define function with three parameters
def describe_vector(dimension, database, purpose):
    print(f"Vector: {dimension}D, Database: {database}, Purpose: {purpose}")

# Call with positional arguments
describe_vector(1536, "Pinecone", "RAG")
# Output: Vector: 1536D, Database: Pinecone, Purpose: RAG

describe_vector(768, "Weaviate", "Semantic Search")
# Output: Vector: 768D, Database: Weaviate, Purpose: Semantic Search
```

**Why this works:**
Multiple parameters allow functions to work with several pieces of data. The arguments are matched to parameters by position.

</details>

---

## Instructor Activity 2
**Concept**: Return statements - getting values from functions

So far, our functions have printed values. But often we want functions to **calculate and return** values that we can use in our program.

**Key difference:**
- `print()` displays output to the screen
- `return` sends a value back to the caller that can be stored in a variable

### Example 1: Basic Return Statement

**Problem**: Create a function that calculates and returns the square of a number

**Expected Output**: `25` (when called with 5)

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function that returns a value
def square(number):
    result = number ** 2
    return result  # Send the result back to the caller

# Call the function and store the returned value
squared_value = square(5)
print(squared_value)
# Output: 25

# Can use the returned value in calculations
doubled = square(3) * 2
print(doubled)
# Output: 18 (9 * 2)
```

**Why this works:**
- `return result` sends the calculated value back
- The returned value can be stored in a variable (`squared_value`)
- The returned value can be used in expressions
- When a function hits `return`, it immediately exits

</details>
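A function stops running as soon as it reaches `return`, so any code placed after it in the same block is never executed. The short sketch below illustrates this with a hypothetical `double` function; the extra `print` line is there only to show that it is skipped.

```python
# Hypothetical function used only to illustrate how return ends a function
def double(number):
    return number * 2
    print("This line never runs")  # unreachable: the function has already exited

value = double(4)
print(value)
# Output: 8
```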

### Example 2: Return vs Print Comparison

**Problem**: Compare a function that prints vs one that returns

**Expected Output**: See the difference in behavior

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function that prints (no return)
def add_and_print(a, b):
    result = a + b
    print(result)
    # No return - function returns None by default

# Function that returns
def add_and_return(a, b):
    result = a + b
    return result

# Test the difference
x = add_and_print(3, 5)  # Prints: 8
print("Stored value:", x)  # Prints: Stored value: None

y = add_and_return(3, 5)  # No output yet
print("Stored value:", y)  # Prints: Stored value: 8

# You can use the returned value in calculations
z = add_and_return(10, 20) + 5
print(z)  # Prints: 35
```

**Why this works:**
- `add_and_print` displays a value but returns `None` (nothing usable)
- `add_and_return` gives back a value that can be stored and used
- **In AI/RAG systems**, we need functions to return values so we can pass them to the next step

</details>
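Passing a returned value to the next step can be sketched as a tiny two-step pipeline. The function names `clean_question` and `build_prompt` are hypothetical and chosen only for this illustration.

```python
# Step 1 (hypothetical): tidy up the raw user input
def clean_question(question):
    return question.strip().lower()

# Step 2 (hypothetical): wrap the cleaned question in a simple prompt
def build_prompt(cleaned_question):
    return f"Answer briefly: {cleaned_question}"

# The value returned by step 1 becomes the argument for step 2
cleaned = clean_question("  What is RAG?  ")
prompt = build_prompt(cleaned)
print(prompt)
# Output: Answer briefly: what is rag?

# The same pipeline written as one nested call
print(build_prompt(clean_question("  What is RAG?  ")))
# Output: Answer briefly: what is rag?
```

If `clean_question` printed its result instead of returning it, `cleaned` would be `None` and the second step would have nothing useful to work with.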

### Example 3: Returning Computed Strings

**Problem**: Create a function that formats a prompt for an AI model and returns it

**Expected Output**: `System: You are a helpful assistant` followed on the next line by `User: What is Python?` (the `\n` in the string is a newline character)

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function that builds and returns a formatted prompt
def create_prompt(system_message, user_message):
    # Combine system and user messages in a standard format
    prompt = f"System: {system_message}\nUser: {user_message}"
    return prompt

# Use the function to create prompts
ai_prompt = create_prompt(
    "You are a helpful assistant",
    "What is Python?"
)

print(ai_prompt)
# Output:
# System: You are a helpful assistant
# User: What is Python?

# The returned value can be used later
another_prompt = create_prompt(
    "You are a code reviewer",
    "Review this function"
)
print("\n" + another_prompt)
```

**Why this works:**
- The function constructs a string and returns it
- The returned prompt can be stored and used later (e.g., sent to an AI API)
- This pattern is common in RAG systems for formatting queries

</details>

---

## Learner Activity 2
**Practice**: Return statements - getting values from functions

### Exercise 1: Simple Return

**Task**: Create a function called `multiply` that takes two numbers and returns their product

**Expected Output** (when called with 4, 7): `28`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Define function that returns a value
def multiply(a, b):
    result = a * b
    return result

# Test the function
product = multiply(4, 7)
print(product)
# Output: 28

# Can use in expressions
total = multiply(3, 5) + multiply(2, 4)
print(total)
# Output: 23 (15 + 8)
```

**Why this works:**
The function calculates the product and returns it, allowing us to use the result in further calculations.

</details>

### Exercise 2: Return Formatted String

**Task**: Create a function called `format_document_metadata` that takes a title and page count, and returns a formatted string: "Document: [title], Pages: [page_count]"

**Expected Output** (when called with "AI Guide", 42): `Document: AI Guide, Pages: 42`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Function that returns formatted string
def format_document_metadata(title, page_count):
    metadata = f"Document: {title}, Pages: {page_count}"
    return metadata

# Use the function
doc_info = format_document_metadata("AI Guide", 42)
print(doc_info)
# Output: Document: AI Guide, Pages: 42

# Store for later use
another_doc = format_document_metadata("Python Handbook", 156)
print(another_doc)
# Output: Document: Python Handbook, Pages: 156
```

**Why this works:**
Returning formatted strings allows us to build data that can be stored, processed, or passed to other functions - essential in RAG pipelines.

</details>

### Exercise 3: Calculate Embedding Cost

**Task**: Create a function called `calculate_embedding_cost` that takes the number of tokens and returns the cost (assume $0.0001 per token)

**Expected Output** (when called with 10000): `1.0`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Function that calculates and returns cost
def calculate_embedding_cost(tokens):
    cost_per_token = 0.0001
    total_cost = tokens * cost_per_token
    return total_cost

# Calculate costs for different token counts
cost1 = calculate_embedding_cost(10000)
print(f"Cost for 10,000 tokens: ${cost1}")
# Output: Cost for 10,000 tokens: $1.0

cost2 = calculate_embedding_cost(50000)
print(f"Cost for 50,000 tokens: ${cost2}")
# Output: Cost for 50,000 tokens: $5.0

# Calculate total cost for multiple documents
total = calculate_embedding_cost(10000) + calculate_embedding_cost(20000)
print(f"Total cost: ${total}")
# Output: Total cost: $3.0
```

**Why this works:**
Returning numeric values allows us to perform calculations and track costs - crucial for real-world AI applications.

</details>
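When a returned number represents money, it often reads better with a fixed number of decimal places. The sketch below uses a hypothetical `format_cost` helper and the f-string format specifier `:.2f`, which rounds the displayed value to two decimals without changing the number itself.

```python
# Hypothetical helper: turn a numeric cost into a dollar string
def format_cost(cost):
    return f"${cost:.2f}"

print(format_cost(1.0))
# Output: $1.00

print(format_cost(0.12345))
# Output: $0.12

print(format_cost(3))
# Output: $3.00
```

Keeping the calculation (which returns a number) separate from the formatting (which returns a string) means the numeric result can still be added or compared before it is displayed.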

---

## Instructor Activity 3
**Concept**: Different ways to call functions (positional, keyword, default arguments)

Python provides flexible ways to pass arguments to functions. Understanding these patterns helps write more maintainable code.

### Example 1: Keyword Arguments

**Problem**: Call a function using keyword arguments for clarity

**Expected Output**: Clear, readable function calls

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function with multiple parameters
def create_rag_query(question, max_results, similarity_threshold):
    return f"Query: '{question}', Max: {max_results}, Threshold: {similarity_threshold}"

# Method 1: Positional arguments (order matters)
result1 = create_rag_query("What is RAG?", 5, 0.8)
print(result1)
# Output: Query: 'What is RAG?', Max: 5, Threshold: 0.8

# Method 2: Keyword arguments (order doesn't matter, more readable)
result2 = create_rag_query(
    question="What is RAG?",
    similarity_threshold=0.8,
    max_results=5
)
print(result2)
# Output: Query: 'What is RAG?', Max: 5, Threshold: 0.8

# Method 3: Mix positional and keyword (positional must come first)
result3 = create_rag_query("What is RAG?", max_results=5, similarity_threshold=0.8)
print(result3)
# Output: Query: 'What is RAG?', Max: 5, Threshold: 0.8
```

**Why this works:**
- Keyword arguments make code self-documenting
- You can put keyword arguments in any order
- Positional arguments (if any) must come before keyword arguments
- This style is very common in AI/ML libraries, where calls often look like `model.generate(prompt, temperature=0.7)` (an illustrative call shape, not a specific library's API)

</details>
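One mistake to watch for when mixing the two styles is supplying the same parameter twice, once by position and once by keyword. Python rejects such a call with a `TypeError`. The sketch below uses a hypothetical `search` function and a `try`/`except` block (covered in more depth in later material) so the error message can be printed instead of stopping the program.

```python
# Hypothetical function used only for illustration
def search(query, limit):
    return f"Searching for '{query}' (limit {limit})"

# A valid mixed call: positional first, then keyword
print(search("What is RAG?", limit=5))
# Output: Searching for 'What is RAG?' (limit 5)

try:
    # "What is RAG?" already fills `query` by position,
    # so query=... tries to give that parameter a second value
    search("What is RAG?", 5, query="What is AI?")
except TypeError as error:
    error_message = str(error)
    print("TypeError:", error_message)
```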

### Example 2: Default Parameter Values

**Problem**: Create a function with default values that can be overridden

**Expected Output**: Function works with or without all arguments

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Function with default parameter values
def generate_text(prompt, model="gpt-3.5-turbo", temperature=0.7, max_tokens=100):
    return f"Prompt: '{prompt}' | Model: {model} | Temp: {temperature} | Tokens: {max_tokens}"

# Call with only required argument (others use defaults)
result1 = generate_text("Hello world")
print(result1)
# Output: Prompt: 'Hello world' | Model: gpt-3.5-turbo | Temp: 0.7 | Tokens: 100

# Override some defaults
result2 = generate_text("Hello world", temperature=0.9)
print(result2)
# Output: Prompt: 'Hello world' | Model: gpt-3.5-turbo | Temp: 0.9 | Tokens: 100

# Override all defaults
result3 = generate_text(
    "Hello world",
    model="gpt-4",
    temperature=0.5,
    max_tokens=200
)
print(result3)
# Output: Prompt: 'Hello world' | Model: gpt-4 | Temp: 0.5 | Tokens: 200
```

**Why this works:**
- Parameters written as `name=value` in the `def` line have default values
- If you don't provide a value, the default is used
- You can override any or all defaults
- This makes functions flexible and easy to use
- **In AI systems**, defaults provide sensible configurations that users can customize

</details>

### Example 3: Combining All Argument Types

**Problem**: Create a flexible function for document retrieval

**Expected Output**: Function handles various calling patterns

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Realistic RAG function with required and optional parameters
def retrieve_documents(query, collection_name, top_k=5, min_score=0.7, include_metadata=True):
    """
    Retrieve documents from a vector database.
    
    Required: query, collection_name
    Optional: top_k, min_score, include_metadata (have defaults)
    """
    result = f"Searching '{collection_name}' for: '{query}'"
    result += f" | Top {top_k} results | Min score: {min_score}"
    result += f" | Metadata: {include_metadata}"
    return result

# Minimal call (only required arguments)
print(retrieve_documents("What is AI?", "docs"))
# Output: Searching 'docs' for: 'What is AI?' | Top 5 results | Min score: 0.7 | Metadata: True

# Override some defaults with keyword arguments
print(retrieve_documents("What is AI?", "docs", top_k=10))
# Output: Searching 'docs' for: 'What is AI?' | Top 10 results | Min score: 0.7 | Metadata: True

# Override multiple defaults, any order for keyword args
print(retrieve_documents(
    query="What is AI?",
    collection_name="docs",
    include_metadata=False,
    min_score=0.85
))
# Output: Searching 'docs' for: 'What is AI?' | Top 5 results | Min score: 0.85 | Metadata: False
```

**Why this works:**
- Required parameters (no defaults) come first; Python does not allow a parameter without a default to follow one that has a default
- Optional parameters have defaults
- Users can customize behavior without remembering all parameters
- This pattern mirrors real AI/RAG libraries (LangChain, LlamaIndex, etc.)

</details>
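To see how optional parameters such as `top_k` and `min_score` might actually change a result, here is a minimal sketch that filters a small in-memory list instead of a real vector database. The `SAMPLE_DOCS` data, the scores, and the `filter_results` function are all assumptions made up for this illustration; real libraries differ in their signatures and behavior.

```python
# Hypothetical data: (similarity score, text) pairs standing in for search hits
SAMPLE_DOCS = [
    (0.92, "AI is the simulation of human intelligence."),
    (0.40, "Python is a programming language."),
    (0.81, "Machine learning is a subfield of AI."),
    (0.75, "RAG combines retrieval with generation."),
]

# Hypothetical function: one required parameter, three optional ones
def filter_results(scored_docs, top_k=5, min_score=0.7, include_metadata=True):
    # Keep only the hits at or above the score threshold
    matches = []
    for score, text in scored_docs:
        if score >= min_score:
            matches.append((score, text))

    matches.sort(reverse=True)  # highest score first
    matches = matches[:top_k]   # keep at most top_k hits

    if include_metadata:
        return matches          # (score, text) pairs

    # Without metadata, return only the text of each hit
    texts = []
    for score, text in matches:
        texts.append(text)
    return texts

default_results = filter_results(SAMPLE_DOCS)
print(len(default_results))
# Output: 3

top_text = filter_results(SAMPLE_DOCS, top_k=1, include_metadata=False)
print(top_text)
# Output: ['AI is the simulation of human intelligence.']

strict_results = filter_results(SAMPLE_DOCS, min_score=0.8)
print(len(strict_results))
# Output: 2
```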

---

## Learner Activity 3
**Practice**: Different ways to call functions (positional, keyword, default arguments)

### Exercise 1: Use Keyword Arguments

**Task**: Create a function `send_email` with parameters (recipient, subject, body). Call it using keyword arguments in a different order than defined.

**Expected Output**: Function works regardless of keyword argument order

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Define function
def send_email(recipient, subject, body):
    return f"To: {recipient}\nSubject: {subject}\nBody: {body}"

# Call with keyword arguments in different order
email1 = send_email(
    body="Let's discuss the project.",
    recipient="alice@example.com",
    subject="Project Update"
)
print(email1)
# Output:
# To: alice@example.com
# Subject: Project Update
# Body: Let's discuss the project.

# Another call with different order
email2 = send_email(
    subject="Meeting",
    body="See you tomorrow.",
    recipient="bob@example.com"
)
print("\n" + email2)
```

**Why this works:**
Keyword arguments allow flexible ordering, making code more readable and less error-prone.

</details>

### Exercise 2: Function With Defaults

**Task**: Create a function `configure_agent` with parameters (name, model="gpt-3.5-turbo", temperature=0.7). Call it multiple times with different combinations of arguments.

**Expected Output**: Function uses defaults when arguments not provided

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Function with default parameters
def configure_agent(name, model="gpt-3.5-turbo", temperature=0.7):
    return f"Agent: {name} | Model: {model} | Temperature: {temperature}"

# Call with only required argument
agent1 = configure_agent("Assistant")
print(agent1)
# Output: Agent: Assistant | Model: gpt-3.5-turbo | Temperature: 0.7

# Override one default
agent2 = configure_agent("Researcher", model="gpt-4")
print(agent2)
# Output: Agent: Researcher | Model: gpt-4 | Temperature: 0.7

# Override all defaults
agent3 = configure_agent("Creative", model="gpt-4", temperature=0.9)
print(agent3)
# Output: Agent: Creative | Model: gpt-4 | Temperature: 0.9

# Use keyword arguments to skip the middle parameter
agent4 = configure_agent("Precise", temperature=0.3)
print(agent4)
# Output: Agent: Precise | Model: gpt-3.5-turbo | Temperature: 0.3
```

**Why this works:**
Default parameters make functions flexible - you only specify what you want to change from the defaults.

</details>

### Exercise 3: Realistic RAG Function

**Task**: Create a function `search_knowledge_base` with:
- Required: query, database_name
- Optional with defaults: max_results=10, filter_by="relevance", include_sources=True

Call it 3 different ways to demonstrate flexibility.

**Expected Output**: Different configurations based on provided arguments

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Realistic RAG search function
def search_knowledge_base(query, database_name, max_results=10, filter_by="relevance", include_sources=True):
    result = f"Searching {database_name}: '{query}'"
    result += f" | Max results: {max_results}"
    result += f" | Filter: {filter_by}"
    result += f" | Sources: {include_sources}"
    return result

# Way 1: Minimal (only required arguments, use all defaults)
search1 = search_knowledge_base("What is machine learning?", "ml_docs")
print(search1)
# Output: Searching ml_docs: 'What is machine learning?' | Max results: 10 | Filter: relevance | Sources: True

# Way 2: Override some defaults
search2 = search_knowledge_base(
    "What is machine learning?",
    "ml_docs",
    max_results=5,
    include_sources=False
)
print(search2)
# Output: Searching ml_docs: 'What is machine learning?' | Max results: 5 | Filter: relevance | Sources: False

# Way 3: All keyword arguments (any order)
search3 = search_knowledge_base(
    database_name="ml_docs",
    query="What is machine learning?",
    filter_by="date",
    max_results=20,
    include_sources=True
)
print(search3)
# Output: Searching ml_docs: 'What is machine learning?' | Max results: 20 | Filter: date | Sources: True
```

**Why this works:**
This pattern (required parameters plus optional ones with defaults) appears throughout AI/ML APIs. It provides flexibility while keeping simple calls simple.

</details>

---

## Instructor Activity 4
**Concept**: Objects and methods - understanding dot notation

In Python, almost everything is an **object**. Objects have **methods** - functions that belong to the object and can act on its data.

**Syntax**: `object.method()`

You've already used methods! String methods like `.upper()`, `.lower()`, `.split()` are all methods.

### Example 1: String Object Methods

**Problem**: Use various string methods to understand dot notation

**Expected Output**: See how methods operate on the object

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# String is an object with methods
text = "hello world"

# Call methods on the string object using dot notation
print(text.upper())        # Convert to uppercase
# Output: HELLO WORLD

print(text.capitalize())   # Capitalize first letter
# Output: Hello world

print(text.replace("world", "Python"))  # Replace substring
# Output: hello Python

print(text.split())        # Split into list of words
# Output: ['hello', 'world']

# Methods can be chained
result = text.upper().replace("WORLD", "PYTHON")
print(result)
# Output: HELLO PYTHON
```

**Why this works:**
- `text` is a string object
- `.upper()`, `.capitalize()`, etc. are methods that belong to string objects
- The dot (`.`) connects the object to its method
- Methods can take parameters (like `.replace(old, new)`)
- You can chain methods: the result of one method becomes the object for the next

</details>
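One detail worth checking for yourself: strings are immutable, so a string method returns a new string and leaves the original untouched. A quick sketch:

```python
text = "hello world"

# .upper() returns a new string; it does not change `text`
shouted = text.upper()

print(shouted)
# Output: HELLO WORLD

print(text)
# Output: hello world

# To keep the change, assign the result back to a variable
text = text.upper()
print(text)
# Output: HELLO WORLD
```

This is also why chaining works: each call hands a fresh string to the next method in the chain.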

### Example 2: List Object Methods

**Problem**: Use list methods to modify and query lists

**Expected Output**: Lists modified using their methods

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# List is an object with methods
documents = ["doc1.txt", "doc2.txt"]

# Add items using methods
documents.append("doc3.txt")  # Add to end
print("After append:", documents)
# Output: After append: ['doc1.txt', 'doc2.txt', 'doc3.txt']

documents.insert(1, "doc_new.txt")  # Insert at position 1
print("After insert:", documents)
# Output: After insert: ['doc1.txt', 'doc_new.txt', 'doc2.txt', 'doc3.txt']

# Remove items
documents.remove("doc_new.txt")
print("After remove:", documents)
# Output: After remove: ['doc1.txt', 'doc2.txt', 'doc3.txt']

# Sort the list
documents.sort()
print("After sort:", documents)
# Output: After sort: ['doc1.txt', 'doc2.txt', 'doc3.txt']

# Count occurrences
documents.append("doc1.txt")
count = documents.count("doc1.txt")
print("Count of doc1.txt:", count)
# Output: Count of doc1.txt: 2
```

**Why this works:**
- Lists are objects with their own set of methods
- Some methods modify the list in place (`.append()`, `.sort()`)
- Some methods return information (`.count()`)
- Each data type has methods appropriate for working with that type

</details>
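A common surprise with in-place list methods is that they return `None` rather than the modified list. The sketch below contrasts the `.sort()` method with the built-in `sorted()` function, which leaves the original list alone and returns a new sorted list.

```python
documents = ["doc3.txt", "doc1.txt", "doc2.txt"]

# .sort() changes the list itself and returns None
result = documents.sort()
print(result)
# Output: None
print(documents)
# Output: ['doc1.txt', 'doc2.txt', 'doc3.txt']

# sorted() returns a new list and leaves the original unchanged
original = ["b.txt", "a.txt"]
sorted_copy = sorted(original)
print(sorted_copy)
# Output: ['a.txt', 'b.txt']
print(original)
# Output: ['b.txt', 'a.txt']
```

Writing `documents = documents.sort()` is therefore a bug: it replaces the list with `None`.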

### Example 3: Understanding Object-Method Relationship

**Problem**: Demonstrate that different object types have different methods

**Expected Output**: See how methods are type-specific

In [None]:
# Empty cell for demonstration

<details>
<summary>Solution</summary>

```python
# Different types have different methods

# String object
text = "Python for AI"
print("String methods:")
print(text.lower())        # String has .lower()
print(text.split())        # String has .split()

# List object
items = [3, 1, 2]
print("\nList methods:")
items.sort()               # List has .sort()
print(items)
items.append(4)            # List has .append()
print(items)

# Dictionary object (coming up in future lessons!)
config = {"model": "gpt-4", "temp": 0.7}
print("\nDictionary methods:")
print(config.keys())       # Dictionary has .keys()
print(config.values())     # Dictionary has .values()

# Note: You can't use string methods on lists or vice versa
# items.upper()  # AttributeError: lists have no .upper() method
# text.append("!")  # AttributeError: strings have no .append() method
```

**Why this works:**
- Each object type (string, list, dict, etc.) has its own set of methods
- Methods are designed for the specific type of data
- **In AI/RAG systems**: Method calls like `vectorstore.query()`, `model.generate()`, and `agent.run()` follow this pattern
- Understanding objects and methods is crucial for using AI libraries

</details>
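If you are unsure whether an object supports a given method, the built-in `hasattr()` function can check before you call it. The sketch below also shows the `AttributeError` Python raises when a method does not exist for a type; the `try`/`except` block is used only so the message can be printed.

```python
text = "Python for AI"
items = [3, 1, 2]

# hasattr(object, name) reports whether the object has that attribute or method
print(hasattr(text, "upper"))
# Output: True
print(hasattr(items, "upper"))
# Output: False

try:
    items.upper()  # lists do not have an .upper() method
except AttributeError as error:
    message = str(error)
    print("AttributeError:", message)
# Output: AttributeError: 'list' object has no attribute 'upper'
```

The built-in `dir(text)` lists every attribute and method a given object offers, which is handy when exploring an unfamiliar library object.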

---

## Learner Activity 4
**Practice**: Objects and methods - understanding dot notation

### Exercise 1: String Method Chain

**Task**: Take the string "  hello world  " and:
1. Remove leading/trailing spaces with `.strip()`
2. Convert to uppercase
3. Replace "WORLD" with "PYTHON"
4. Do all in one chained statement

**Expected Output**: `HELLO PYTHON`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Original string with spaces
text = "  hello world  "

# Method chaining: output of each method becomes input to next
result = text.strip().upper().replace("WORLD", "PYTHON")
print(result)
# Output: HELLO PYTHON

# Same thing, step by step to understand:
step1 = text.strip()              # "hello world"
step2 = step1.upper()             # "HELLO WORLD"
step3 = step2.replace("WORLD", "PYTHON")  # "HELLO PYTHON"
print(step3)
# Output: HELLO PYTHON
```

**Why this works:**
Method chaining works because each of these string methods returns a new string, which in turn has its own methods that can be called.

</details>

### Exercise 2: List Methods for Document Processing

**Task**: Create a list `processed_docs = []` and:
1. Add "intro.pdf" using `.append()`
2. Add "chapter1.pdf" using `.append()`
3. Add "appendix.pdf" at the beginning using `.insert(0, ...)`
4. Sort the list using `.sort()`
5. Print the final list

**Expected Output**: `['appendix.pdf', 'chapter1.pdf', 'intro.pdf']`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Start with empty list
processed_docs = []

# Add documents using list methods
processed_docs.append("intro.pdf")
print("After first append:", processed_docs)
# Output: After first append: ['intro.pdf']

processed_docs.append("chapter1.pdf")
print("After second append:", processed_docs)
# Output: After second append: ['intro.pdf', 'chapter1.pdf']

processed_docs.insert(0, "appendix.pdf")  # Insert at position 0 (beginning)
print("After insert:", processed_docs)
# Output: After insert: ['appendix.pdf', 'intro.pdf', 'chapter1.pdf']

processed_docs.sort()  # Sort alphabetically
print("After sort:", processed_docs)
# Output: After sort: ['appendix.pdf', 'chapter1.pdf', 'intro.pdf']
```

**Why this works:**
List methods like `.append()`, `.insert()`, and `.sort()` modify the list in place - they change the original list object.

</details>

### Exercise 3: Understanding Methods vs Functions

**Task**: Compare using a method vs a function:
1. Create a list `numbers = [3, 1, 4, 1, 5]`
2. Use the `.count()` method to count how many times 1 appears
3. Use the `len()` function to get the list length
4. Explain the difference in your code comments

**Expected Output**: `Count of 1: 2` and `Length: 5`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
# Create a list
numbers = [3, 1, 4, 1, 5]

# METHOD: .count() belongs to the list object
# Syntax: object.method()
count_of_ones = numbers.count(1)
print(f"Count of 1: {count_of_ones}")
# Output: Count of 1: 2

# FUNCTION: len() is a built-in function that takes an object as argument
# Syntax: function(object)
length = len(numbers)
print(f"Length: {length}")
# Output: Length: 5

# Key difference:
# - Methods are called ON objects using dot notation: object.method()
# - Functions are called WITH objects as arguments: function(object)

# More examples:
text = "hello"
print(text.upper())    # Method: text.upper()
print(len(text))       # Function: len(text)
```

**Why this works:**
- **Methods** are functions that belong to objects (use dot notation)
- **Functions** are standalone and take objects as arguments
- Both are useful in different contexts
- **In AI/RAG**: You'll use both - `model.generate()` (method) and `len(documents)` (function)

</details>
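Under the hood, the line between methods and functions is thinner than the syntax suggests. A method is a function stored on the object's type, and a built-in function such as `len()` asks the object for its answer through a special method. The sketch below shows both equivalences; it is meant as background, and the dot-notation and `len()` forms remain the idiomatic ones to write.

```python
text = "hello"

# Calling the method on the object...
via_method = text.upper()
# ...is equivalent to calling the function stored on the str type
via_type = str.upper(text)

print(via_method, via_type)
# Output: HELLO HELLO

# len(text) gets its answer from the object's __len__ special method
print(len(text), text.__len__())
# Output: 5 5
```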

---

## Optional Extra Practice
**Challenge yourself with these problems that integrate all the concepts**

### Challenge 1: Build a Text Processing Pipeline

**Task**: Create a function called `preprocess_document` that:
- Takes parameters: text, remove_punctuation=True, to_lowercase=True
- If remove_punctuation is True, replace all "." and "," with ""
- If to_lowercase is True, convert to lowercase
- Return the processed text

Test with: `"Hello, World. How are you?"`

**Expected Output** (with defaults): `hello world how are you?`

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
def preprocess_document(text, remove_punctuation=True, to_lowercase=True):
    """
    Preprocess text for NLP/RAG pipelines.
    """
    # Start with original text
    processed = text
    
    # Remove punctuation if requested
    if remove_punctuation:
        processed = processed.replace(".", "")
        processed = processed.replace(",", "")
    
    # Convert to lowercase if requested
    if to_lowercase:
        processed = processed.lower()
    
    return processed

# Test with different configurations
original = "Hello, World. How are you?"

# With defaults (all preprocessing)
result1 = preprocess_document(original)
print("Full preprocessing:", result1)
# Output: Full preprocessing: hello world how are you?

# Keep punctuation
result2 = preprocess_document(original, remove_punctuation=False)
print("Keep punctuation:", result2)
# Output: Keep punctuation: hello, world. how are you?

# Keep original case
result3 = preprocess_document(original, to_lowercase=False)
print("Keep case:", result3)
# Output: Keep case: Hello World How are you?

# No preprocessing
result4 = preprocess_document(original, remove_punctuation=False, to_lowercase=False)
print("No preprocessing:", result4)
# Output: No preprocessing: Hello, World. How are you?
```

**Why this works:**
This demonstrates a realistic text preprocessing function:
- Combines multiple concepts: functions, parameters, defaults, conditionals, string methods
- Default parameters provide sensible behavior
- Users can customize as needed
- Similar to preprocessing in real RAG pipelines

</details>

### Challenge 2: RAG Query Builder

**Task**: Create a function `build_rag_query` that:
- Takes: user_question, context_chunks (list), model="gpt-3.5-turbo"
- Combines context chunks into a single string using the `.join()` method
- Returns a formatted prompt: "Context: [combined_context]\n\nQuestion: [user_question]\n\nAnswer:"

Test with:
```python
chunks = ["Python is a programming language.", "It is used for AI and data science."]
question = "What is Python used for?"
```

**Expected Output**: Properly formatted RAG prompt

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
def build_rag_query(user_question, context_chunks, model="gpt-3.5-turbo"):
    """
    Build a RAG (Retrieval Augmented Generation) prompt.
    
    Args:
        user_question: The user's question
        context_chunks: List of retrieved context strings
        model: Name of the AI model that will receive the prompt
            (default: gpt-3.5-turbo); not used when building the prompt text
    
    Returns:
        Formatted prompt string ready for AI model
    """
    # Combine context chunks using join() method
    # join() connects list items with the separator (space in this case)
    combined_context = " ".join(context_chunks)
    
    # Build the prompt
    prompt = f"Context: {combined_context}\n\n"
    prompt += f"Question: {user_question}\n\n"
    prompt += "Answer:"
    
    return prompt

# Test the function
chunks = [
    "Python is a programming language.",
    "It is used for AI and data science.",
    "Python has a simple syntax."
]
question = "What is Python used for?"

rag_prompt = build_rag_query(question, chunks)
print(rag_prompt)
# Output:
# Context: Python is a programming language. It is used for AI and data science. Python has a simple syntax.
# 
# Question: What is Python used for?
# 
# Answer:

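# A different model can be specified. In this simplified version `model`
# is not used inside the function, so the prompt text is identical.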
rag_prompt_gpt4 = build_rag_query(question, chunks, model="gpt-4")
print("\n---Using GPT-4---")
print(rag_prompt_gpt4)
```

**Why this works:**
This is a realistic RAG implementation pattern:
- Takes user question and retrieved context
- Uses `.join()` method to combine list items
- Formats everything into a prompt for the AI model
- Returns the prompt string for use with an AI API
- Production RAG systems follow the same basic pattern when they assemble a prompt
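
The `model` parameter is accepted by `build_rag_query` but never used. One possible way to put it to work is to return the model name together with the prompt, as sketched below. The function name `build_rag_request` and the dictionary keys are assumptions made for illustration; a real AI API defines its own request format.

```python
def build_rag_request(user_question, context_chunks, model="gpt-3.5-turbo"):
    """Bundle the prompt and the model name together (hypothetical helper)."""
    combined_context = " ".join(context_chunks)
    prompt = (
        f"Context: {combined_context}\n\n"
        f"Question: {user_question}\n\n"
        "Answer:"
    )
    # The keys "model" and "prompt" are made up for this example;
    # they do not follow any particular API's schema.
    return {"model": model, "prompt": prompt}

request = build_rag_request(
    "What is Python used for?",
    ["Python is a programming language.", "It is used for AI and data science."],
    model="gpt-4",
)
print(request["model"])
# Output: gpt-4
print(request["prompt"])
```

Returning a dictionary keeps related values together, so the caller can pass one object to whatever code sends the request.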

</details>

### Challenge 3: Agent Tool Definition

**Task**: Create a function `calculate_similarity` that:
- Takes: query, documents (list), threshold=0.7
- For each document, "calculate" similarity (just use len() of document / 100 as fake similarity)
- Use list methods to build a results list with documents above threshold
- Return the filtered list as (document, similarity) tuples, sorted from highest to lowest similarity

Test with:
```python
docs = [
    "Short doc",
    "This is a much longer document with more words in it",
    "Medium length document here",
    "Another very long document that contains many words and should have high similarity"
]
```

**Expected Output**: Only documents with similarity >= 0.7 (length >= 70 characters)

In [None]:
# Your code here

<details>
<summary>Solution</summary>

```python
def calculate_similarity(query, documents, threshold=0.7):
    """
    Filter documents by similarity score.
    
    Args:
        query: Search query (not used in this simplified version)
        documents: List of document strings
        threshold: Minimum similarity to include (0.0 to 1.0)
    
    Returns:
        List of (document, similarity) tuples for docs at or above the
        threshold, sorted by similarity (highest first)
    """
    # Start with empty results list
    results = []
    
    # Process each document
    for doc in documents:
        # Calculate fake similarity (in real RAG: use embeddings)
        similarity = len(doc) / 100
        
        # If at or above threshold, add to results using the append() list method
        if similarity >= threshold:
            # Store as tuple: (document, score)
            results.append((doc, similarity))
    
    # Sort by similarity (highest first)
    results.sort(key=lambda x: x[1], reverse=True)
    
    return results

# Test data
docs = [
    "Short doc",
    "This is a much longer document with more words in it",
    "Medium length document here",
    "Another very long document that contains many words and should have high similarity"
]

# Test with default threshold (0.7)
results = calculate_similarity("python", docs)
print("Results with threshold 0.7:")
for doc, score in results:
    print(f"  Score {score:.2f}: {doc[:50]}...")  # First 50 chars
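# Output:
# Results with threshold 0.7:
#   Score 0.83: Another very long document that contains many word...
# Only the last document (83 characters) reaches the 0.7 threshold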

# Test with lower threshold
results2 = calculate_similarity("python", docs, threshold=0.3)
print("\nResults with threshold 0.3:")
for doc, score in results2:
    print(f"  Score {score:.2f}: {doc[:50]}...")  # First 50 chars
```

**Why this works:**
This demonstrates an agentic AI tool pattern:
- Function that an agent could call to retrieve relevant documents
- Takes a query and filters documents by a relevance score (faked here with document length; the query itself is not used)
- Uses list methods (`.append()`, `.sort()`) to build results
- Configurable threshold parameter
- Returns structured data (list of tuples)
- In real systems, similarity would use embeddings and vector math
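
To give a feel for the "vector math" mentioned in the last bullet, here is a minimal sketch of cosine similarity, one commonly used measure for comparing embeddings, written in plain Python. The two-number vectors are toy values chosen for illustration; real embeddings come from a model and have many more dimensions.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Return the cosine of the angle between two equal-length vectors."""
    # Dot product: multiply matching positions and add the results
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    # Length (norm) of each vector
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    # Avoid dividing by zero when a vector is all zeros
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

query_vec = [1.0, 0.0]  # toy "embedding", not produced by a real model
print(cosine_similarity(query_vec, [1.0, 0.0]))  # Output: 1.0
print(cosine_similarity(query_vec, [0.0, 1.0]))  # Output: 0.0
```

Vectors pointing in the same direction score 1.0 and unrelated (perpendicular) vectors score 0.0, so a threshold such as 0.7 works the same way as in the challenge.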

</details>
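
As a rough sketch of how an agent might choose among several tool functions, the example below stores functions in a dictionary and looks them up by name. The names `TOOLS`, `call_tool`, `word_count`, and `shout` are all hypothetical; real agent frameworks have their own ways of registering tools.

```python
def word_count(text):
    """Tool: count the words in a string."""
    return len(text.split())

def shout(text):
    """Tool: convert a string to uppercase."""
    return text.upper()

# Hypothetical registry: maps a tool name to the function that implements it
TOOLS = {"word_count": word_count, "shout": shout}

def call_tool(name, text):
    """Look up a tool by name and call it with the given text."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](text)

print(call_tool("word_count", "Python has a simple syntax."))
# Output: 5
print(call_tool("shout", "done"))
# Output: DONE
```

Because functions are objects in Python, they can be stored in a dictionary and called later, and each tool's return value is handed back to the caller.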