# Basic LLM Inference

This notebook covers the fundamentals of Large Language Model (LLM) inference using APIs. We'll explore different approaches to interact with LLMs and understand key concepts.

## What is LLM Inference?

LLM inference is the process of generating text using a pre-trained language model. During inference, the model takes an input prompt and generates a response based on patterns learned during training.

### Key Concepts:
- **Prompt**: The input text that guides the model's response
- **Completion**: The text generated by the model
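- **Temperature**: Controls randomness in generation (0 = near-deterministic, higher values = more varied; the OpenAI API accepts 0.0 to 2.0)
- **Max tokens**: Upper limit on the length of the generated response, counted in tokens rather than characters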
- **API endpoints**: The URLs a provider exposes for inference; each request names the model to use, and models differ in capability
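
To make the temperature concept concrete, here is a minimal offline sketch of the textbook formulation: logits are divided by the temperature before a softmax. The function name `softmax_with_temperature` and the example scores are illustrative assumptions, and hosted providers may implement sampling differently.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after scaling by temperature."""
    if temperature <= 0:
        # Temperature 0 is usually treated as greedy argmax rather than a division
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate next tokens
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Low temperatures concentrate almost all probability on the top-scoring token, while high temperatures flatten the distribution, which is why output becomes more varied.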

## Popular LLM API Providers

1. **OpenAI**: GPT-3.5, GPT-4, GPT-4 Turbo
2. **Anthropic**: Claude 3 (Haiku, Sonnet, Opus)
3. **Google**: Gemini Pro
4. **Cohere**: Command models
5. **Hugging Face**: Various open-source models

Let's start with practical examples!

In [None]:
import json
import os
from typing import Dict, List, Optional

from dotenv import load_dotenv
from openai import OpenAI

# Load variables from a local .env file, if one exists
load_dotenv("../.env")

# Set up OpenAI API key
# You can set this in your environment: export OPENAI_API_KEY="your-openai-key"
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

if not OPENAI_API_KEY:
    print("⚠️  OPENAI_API_KEY not found. Set it with: export OPENAI_API_KEY='your-key'")
    print("For this tutorial, you can also set it directly:")
    print("OPENAI_API_KEY = 'your-api-key-here'")
else:
    # Initialize OpenAI client
    client = OpenAI(api_key=OPENAI_API_KEY)
    print("✅ OpenAI client initialized successfully!")

⚠️  OPENAI_API_KEY not found. Set it with: export OPENAI_API_KEY='your-key'
For this tutorial, you can also set it directly:
OPENAI_API_KEY = 'your-api-key-here'


## 1. Basic OpenAI Chat Completions

OpenAI's Chat Completions API is the modern way to interact with GPT models. It supports conversation-style interactions with system, user, and assistant messages.
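
The message list itself is plain data, so its shape can be shown without calling the API. The helper below is a hypothetical sketch (`build_messages` and `VALID_ROLES` are not part of the OpenAI SDK) that assembles an optional system message, earlier turns, and the new user turn. It checks only the three roles named above.

```python
from typing import Dict, List, Optional

VALID_ROLES = {"system", "user", "assistant"}

def build_messages(user_prompt: str,
                   system_prompt: Optional[str] = None,
                   history: Optional[List[Dict[str, str]]] = None) -> List[Dict[str, str]]:
    """Assemble a chat message list: system message, prior turns, new user turn."""
    messages: List[Dict[str, str]] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    for msg in history or []:
        if msg.get("role") not in VALID_ROLES:
            raise ValueError(f"Unexpected role: {msg.get('role')!r}")
        messages.append({"role": msg["role"], "content": msg["content"]})
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages(
    "Can you give an example?",
    system_prompt="You are a concise tutor.",
    history=[
        {"role": "user", "content": "What is a token?"},
        {"role": "assistant", "content": "A token is a chunk of text the model reads."},
    ],
)
for m in msgs:
    print(m["role"], "->", m["content"])
```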

In [2]:
def simple_chat_completion(prompt: str, model: str = "gpt-3.5-turbo"):
    """
    Simple chat completion with OpenAI
    
    Args:
        prompt: User message
        model: OpenAI model to use
    
    Returns:
        Generated response
    """
    if not OPENAI_API_KEY:
        return "❌ OpenAI API key not set"
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "user", "content": prompt}
            ]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"❌ Error: {e}"

# Example 1: Simple question
prompt = "What is machine learning?"
response = simple_chat_completion(prompt)
print("🤖 GPT Response:")
print(response)
print("\n" + "="*60 + "\n")

🤖 GPT Response:
❌ OpenAI API key not set




## 2. Understanding Parameters

Let's explore how different parameters affect the model's output.
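
Before running live requests, it may help to see what `top_p` (nucleus sampling) does to a token distribution. This is a simplified offline sketch with made-up probabilities; `nucleus_filter` is a hypothetical helper, not an SDK function, and real implementations operate on the full vocabulary.

```python
def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches top_p, then renormalize so the kept probabilities sum to 1."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

# Hypothetical next-token probabilities
probs = {"cat": 0.5, "dog": 0.3, "fish": 0.15, "rock": 0.05}
print(nucleus_filter(probs, top_p=0.75))
```

With `top_p=0.75`, only "cat" and "dog" survive, so the unlikely tail can never be sampled. With `top_p=1.0`, every token stays eligible.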

In [3]:
def chat_with_parameters(prompt: str, temperature: float = 0.7, max_tokens: int = 150, 
                        top_p: float = 1.0, model: str = "gpt-3.5-turbo"):
    """
    Chat completion with customizable parameters
    
    Args:
        prompt: User message
        temperature: Controls randomness (0.0 to 2.0)
        max_tokens: Maximum tokens to generate
        top_p: Nucleus sampling parameter (0.0 to 1.0)
        model: OpenAI model to use
    
    Returns:
        Generated response
    """
    if not OPENAI_API_KEY:
        return "❌ OpenAI API key not set"
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=max_tokens,
            top_p=top_p
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"❌ Error: {e}"

# Test different temperatures
prompt = "Write a creative story about a robot learning to paint."

print("🎨 Testing Different Temperature Settings:")
print("\n1. Low Temperature (0.2) - More focused and deterministic:")
response_low = chat_with_parameters(prompt, temperature=0.2, max_tokens=100)
print(response_low)

print("\n2. Medium Temperature (0.7) - Balanced creativity:")
response_med = chat_with_parameters(prompt, temperature=0.7, max_tokens=100)
print(response_med)

print("\n3. High Temperature (1.5) - More creative and random:")
response_high = chat_with_parameters(prompt, temperature=1.5, max_tokens=100)
print(response_high)

print("\n" + "="*60 + "\n")

🎨 Testing Different Temperature Settings:

1. Low Temperature (0.2) - More focused and deterministic:
❌ OpenAI API key not set

2. Medium Temperature (0.7) - Balanced creativity:
❌ OpenAI API key not set

3. High Temperature (1.5) - More creative and random:
❌ OpenAI API key not set




## 3. System Messages and Role-Based Conversations

System messages help set the behavior and context for the AI assistant.

In [4]:
def chat_with_system_message(user_prompt: str, system_prompt: str, model: str = "gpt-3.5-turbo"):
    """
    Chat completion with system message
    
    Args:
        user_prompt: User's message
        system_prompt: System message to set AI behavior
        model: OpenAI model to use
    
    Returns:
        Generated response
    """
    if not OPENAI_API_KEY:
        return "❌ OpenAI API key not set"
    
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ]
        )
        return response.choices[0].message.content
    except Exception as e:
        return f"❌ Error: {e}"

# Example 1: Technical Expert
system_msg_1 = "You are a senior software engineer with expertise in Python and machine learning. Provide technical, detailed answers."
user_msg = "How do I optimize a neural network for better performance?"

response_1 = chat_with_system_message(user_msg, system_msg_1)
print("🔧 Technical Expert Response:")
print(response_1)
print("\n" + "-"*40 + "\n")

# Example 2: Simple Explainer
system_msg_2 = "You are a friendly teacher explaining complex topics to beginners. Use simple language and analogies."
response_2 = chat_with_system_message(user_msg, system_msg_2)
print("👩‍🏫 Beginner-Friendly Response:")
print(response_2)
print("\n" + "-"*40 + "\n")

# Example 3: Creative Writer
system_msg_3 = "You are a creative writer who explains everything through storytelling and metaphors."
response_3 = chat_with_system_message(user_msg, system_msg_3)
print("✨ Creative Writer Response:")
print(response_3)

print("\n" + "="*60 + "\n")

🔧 Technical Expert Response:
❌ OpenAI API key not set

----------------------------------------

👩‍🏫 Beginner-Friendly Response:
❌ OpenAI API key not set

----------------------------------------

✨ Creative Writer Response:
❌ OpenAI API key not set




## 4. Multi-turn Conversations

Build conversations with multiple exchanges between user and assistant.
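
Because the whole message list is sent with every request, long conversations keep growing. One simple way to bound them, sketched here as an assumption rather than a recommended policy, is to keep the system message and only the most recent turns. `trim_history` is a hypothetical helper that counts messages, not tokens.

```python
from typing import Dict, List

def trim_history(messages: List[Dict[str, str]], max_messages: int = 6) -> List[Dict[str, str]]:
    """Keep a leading system message (if any) plus the most recent max_messages turns."""
    has_system = bool(messages) and messages[0]["role"] == "system"
    system = messages[:1] if has_system else []
    rest = messages[1:] if has_system else list(messages)
    if max_messages <= 0:
        return system
    return system + rest[-max_messages:]

# Hypothetical history: one system message followed by five user/assistant pairs
history = [{"role": "system", "content": "You are a helpful Python programming tutor."}]
for i in range(5):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history, max_messages=4)
print(len(history), "->", len(trimmed))  # 11 -> 5
```

An even `max_messages` keeps user/assistant pairs together, so the trimmed history still starts with a user turn.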

In [5]:
class ChatSession:
    """Simple chat session manager"""
    
    def __init__(self, system_message: Optional[str] = None, model: str = "gpt-3.5-turbo"):
        self.messages = []
        self.model = model
        
        if system_message:
            self.messages.append({"role": "system", "content": system_message})
    
    def add_user_message(self, content: str):
        """Add user message to conversation"""
        self.messages.append({"role": "user", "content": content})
    
    def add_assistant_message(self, content: str):
        """Add assistant message to conversation"""
        self.messages.append({"role": "assistant", "content": content})
    
    def get_response(self):
        """Get response from OpenAI"""
        if not OPENAI_API_KEY:
            return "❌ OpenAI API key not set"
        
        try:
            response = client.chat.completions.create(
                model=self.model,
                messages=self.messages
            )
            
            assistant_response = response.choices[0].message.content
            self.add_assistant_message(assistant_response)
            return assistant_response
            
        except Exception as e:
            return f"❌ Error: {e}"
    
    def print_conversation(self):
        """Print the entire conversation"""
        for msg in self.messages:
            role = msg["role"].title()
            if role == "System":
                print(f"🔧 {role}: {msg['content']}")
            elif role == "User":
                print(f"👤 {role}: {msg['content']}")
            else:
                print(f"🤖 {role}: {msg['content']}")
            print()

# Example: Multi-turn conversation about Python
chat = ChatSession(system_message="You are a helpful Python programming tutor.")

print("🗣️ Multi-turn Conversation Example:")
print("="*50)

# Turn 1
chat.add_user_message("What is a list comprehension in Python?")
response1 = chat.get_response()
print("👤 User: What is a list comprehension in Python?")
print(f"🤖 Assistant: {response1}")
print("\n" + "-"*30 + "\n")

# Turn 2
chat.add_user_message("Can you show me an example with filtering?")
response2 = chat.get_response()
print("👤 User: Can you show me an example with filtering?")
print(f"🤖 Assistant: {response2}")
print("\n" + "-"*30 + "\n")

# Turn 3
chat.add_user_message("What's the performance difference compared to regular loops?")
response3 = chat.get_response()
print("👤 User: What's the performance difference compared to regular loops?")
print(f"🤖 Assistant: {response3}")

print("\n" + "="*60 + "\n")

🗣️ Multi-turn Conversation Example:
👤 User: What is a list comprehension in Python?
🤖 Assistant: ❌ OpenAI API key not set

------------------------------

👤 User: Can you show me an example with filtering?
🤖 Assistant: ❌ OpenAI API key not set

------------------------------

👤 User: What's the performance difference compared to regular loops?
🤖 Assistant: ❌ OpenAI API key not set




## 5. Function Calling (Tools)

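OpenAI models can request function calls to interact with external systems and APIs. The model returns a function name and JSON arguments, your code runs the function, and you send the result back so the model can compose its final answer.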

In [6]:
import math

# Define some example functions
def calculate_area_circle(radius: float) -> float:
    """Calculate the area of a circle"""
    return math.pi * radius ** 2

def get_weather_info(city: str) -> str:
    """Mock weather function (in real app, this would call a weather API)"""
    weather_data = {
        "New York": "Sunny, 22°C",
        "London": "Cloudy, 15°C", 
        "Tokyo": "Rainy, 18°C",
        "Paris": "Partly cloudy, 19°C"
    }
    return weather_data.get(city, f"Weather data not available for {city}")

# Define function schemas for OpenAI
functions = [
    {
        "type": "function",
        "function": {
            "name": "calculate_area_circle",
            "description": "Calculate the area of a circle given its radius",
            "parameters": {
                "type": "object",
                "properties": {
                    "radius": {
                        "type": "number",
                        "description": "The radius of the circle"
                    }
                },
                "required": ["radius"]
            }
        }
    },
    {
        "type": "function", 
        "function": {
            "name": "get_weather_info",
            "description": "Get weather information for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The name of the city"
                    }
                },
                "required": ["city"]
            }
        }
    }
]

def chat_with_functions(user_message: str):
    """Chat with function calling capability"""
    if not OPENAI_API_KEY:
        return "❌ OpenAI API key not set"
    
    try:
        # Initial request
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": user_message}],
            tools=functions,
            tool_choice="auto"
        )
        
        message = response.choices[0].message
        
        # Check if the model wants to call a function
        if message.tool_calls:
            print("🔧 Function calls detected:")
            
            # Process each function call
            messages = [{"role": "user", "content": user_message}, message]
            
            for tool_call in message.tool_calls:
                function_name = tool_call.function.name
                function_args = json.loads(tool_call.function.arguments)
                
                print(f"   Calling {function_name} with args: {function_args}")
                
                # Call the actual function
                if function_name == "calculate_area_circle":
                    result = calculate_area_circle(**function_args)
                elif function_name == "get_weather_info":
                    result = get_weather_info(**function_args)
                else:
                    result = "Unknown function"
                
                # Add function result to messages
                messages.append({
                    "tool_call_id": tool_call.id,
                    "role": "tool",
                    "name": function_name,
                    "content": str(result)
                })
            
            # Get final response with function results
            final_response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                messages=messages
            )
            
            return final_response.choices[0].message.content
        else:
            return message.content
            
    except Exception as e:
        return f"❌ Error: {e}"

# Test function calling
print("🛠️ Function Calling Examples:")
print("="*50)

# Example 1: Math calculation
query1 = "What's the area of a circle with radius 5?"
response1 = chat_with_functions(query1)
print(f"👤 User: {query1}")
print(f"🤖 Assistant: {response1}")
print("\n" + "-"*30 + "\n")

# Example 2: Weather query
query2 = "What's the weather like in Tokyo?"
response2 = chat_with_functions(query2)
print(f"👤 User: {query2}")
print(f"🤖 Assistant: {response2}")
print("\n" + "-"*30 + "\n")

# Example 3: Combined query
query3 = "Calculate the area of a circle with radius 3 and tell me the weather in Paris"
response3 = chat_with_functions(query3)
print(f"👤 User: {query3}")
print(f"🤖 Assistant: {response3}")

print("\n" + "="*60 + "\n")

🛠️ Function Calling Examples:
👤 User: What's the area of a circle with radius 5?
🤖 Assistant: ❌ OpenAI API key not set

------------------------------

👤 User: What's the weather like in Tokyo?
🤖 Assistant: ❌ OpenAI API key not set

------------------------------

👤 User: Calculate the area of a circle with radius 3 and tell me the weather in Paris
🤖 Assistant: ❌ OpenAI API key not set
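
The `if`/`elif` chain that selects a function in `chat_with_functions` grows with every new tool. One alternative, sketched here under the assumption that every tool accepts keyword arguments, is a lookup table keyed by function name. `TOOL_REGISTRY` and `dispatch_tool_call` are hypothetical names, and the function is redefined so the block runs on its own.

```python
import json
import math
from typing import Any, Callable, Dict

def calculate_area_circle(radius: float) -> float:
    """Calculate the area of a circle."""
    return math.pi * radius ** 2

# Map the tool names declared in the schema to Python callables
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {
    "calculate_area_circle": calculate_area_circle,
}

def dispatch_tool_call(name: str, arguments_json: str) -> str:
    """Run a registered tool and return its result as text; report failures as text too."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return f"Unknown function: {name}"
    try:
        args = json.loads(arguments_json)
        return str(func(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Covers malformed JSON and missing or unexpected arguments
        return f"Invalid arguments for {name}: {exc}"

print(dispatch_tool_call("calculate_area_circle", '{"radius": 2}'))
print(dispatch_tool_call("get_stock_price", '{"symbol": "X"}'))
```

Returning errors as text, instead of raising, lets the result go back to the model as tool output so it can recover or explain the failure.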




## 6. Streaming Responses

For longer responses, streaming allows you to see the output as it's generated.
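
A streamed response arrives as a sequence of chunks, each carrying a small text delta. The sketch below mimics that shape with stand-in objects so the accumulation logic can be run offline. `fake_stream` and `collect_stream` are hypothetical helpers, and the chunk layout (`choices[0].delta.content`) mirrors what the streaming cell in this section reads.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator

def fake_stream(pieces: Iterable) -> Iterator[SimpleNamespace]:
    """Yield objects shaped like streamed chat chunks (stand-in, no API call)."""
    for piece in pieces:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

def collect_stream(stream) -> str:
    """Concatenate text deltas, skipping chunks with no choices or no content."""
    parts = []
    for chunk in stream:
        if not chunk.choices:
            continue
        content = chunk.choices[0].delta.content
        if content is not None:
            parts.append(content)
    return "".join(parts)

text = collect_stream(fake_stream(["Neural ", "networks ", None, "learn."]))
print(text)
```

Collecting pieces in a list and joining once at the end avoids repeated string concatenation on long responses.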

In [7]:
import time

def stream_response(prompt: str, model: str = "gpt-3.5-turbo"):
    """Stream response from OpenAI"""
    if not OPENAI_API_KEY:
        print("❌ OpenAI API key not set")
        return
    
    try:
        print("🤖 Streaming response:")
        print("-" * 40)
        
        stream = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            stream=True
        )
        
        collected_content = ""
        for chunk in stream:
            # Some chunks can arrive with an empty choices list; skip them
            if not chunk.choices:
                continue
            content = chunk.choices[0].delta.content
            if content is not None:
                print(content, end="", flush=True)
                collected_content += content
                time.sleep(0.02)  # Small delay to simulate typing effect
        
        print("\n" + "-" * 40)
        return collected_content
        
    except Exception as e:
        print(f"❌ Error: {e}")

# Example: Stream a longer response
prompt = "Write a detailed explanation of how neural networks learn, including backpropagation."
response = stream_response(prompt)

print("\n" + "="*60 + "\n")

❌ OpenAI API key not set




## 7. Best Practices and Common Patterns

Key takeaways for effective LLM inference:

### 🎯 Parameter Guidelines:
- **Temperature 0.0-0.3**: Factual, consistent responses
- **Temperature 0.7-1.0**: Creative, varied responses  
- **Temperature 1.5+**: Highly creative but potentially inconsistent

### 🏗️ System Message Patterns:
- Define role and expertise clearly
- Set output format expectations
- Provide context and constraints

### 💡 Prompt Design Tips:
- Be specific and clear
- Use examples when possible
- Break complex tasks into steps
- Set clear expectations

### 🔧 Error Handling:
- Always handle a missing API key
- Catch and handle HTTP errors
- Implement retry logic for production
- Monitor token usage and costs
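
The retry advice above can be sketched as exponential backoff with a little jitter. This is a minimal illustration, not production code: `call_with_retries` is a hypothetical helper, and it retries on any exception, whereas a real client would likely retry only on transient errors such as timeouts or rate limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_retries(func: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call func, retrying failed attempts with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, delay / 10))
    raise RuntimeError("max_attempts must be at least 1")

# Stand-in for an API call that fails twice before succeeding
attempts = {"count": 0}
def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"

print(call_with_retries(flaky, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the example (and tests) can skip real waiting.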

### 🚀 Performance Tips:
- Use appropriate model for task complexity
- Implement streaming for long responses
- Cache responses when possible
- Batch requests when applicable
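
As one way to cache responses, identical requests can be keyed by a hash of the fields that determine the output. The sketch below is an assumption-laden illustration: `cached_completion`, `make_cache_key`, and the in-memory dictionary are hypothetical, and `fake_generate` stands in for a real API call so the block runs offline.

```python
import hashlib
import json
from typing import Callable, Dict

_response_cache: Dict[str, str] = {}

def make_cache_key(model: str, prompt: str, temperature: float) -> str:
    """Hash the request fields that determine the response."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "temperature": temperature},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def cached_completion(prompt: str, generate: Callable[[str], str],
                      model: str = "gpt-3.5-turbo", temperature: float = 0.0) -> str:
    """Return a cached response for identical requests; call generate only on a miss."""
    key = make_cache_key(model, prompt, temperature)
    if key not in _response_cache:
        _response_cache[key] = generate(prompt)
    return _response_cache[key]

# Stand-in for a real API call
call_log = []
def fake_generate(prompt: str) -> str:
    call_log.append(prompt)
    return f"echo: {prompt}"

cached_completion("What is machine learning?", fake_generate)
cached_completion("What is machine learning?", fake_generate)
print(len(call_log))  # 1, because the second request is served from the cache
```

Caching fits best with low-temperature, factual prompts. For creative prompts, returning the same stored text defeats the purpose of sampling.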

This completes the basic inference tutorial! You now understand the fundamentals of working with OpenAI's API for text generation.