# Getting Started with Azure LLM Toolkit

This notebook introduces the core features of the Azure LLM Toolkit:

1. Installation and configuration
2. Creating async and sync clients
3. Chat completions and streaming responses
4. Function calling / tools
5. Cost tracking
6. Embeddings
7. Caching
8. Health checks and cleanup

## Prerequisites

- Azure OpenAI API access
- Python 3.9+
- An API key, an endpoint, and a chat model deployment (plus an embedding deployment for section 9)

## 1. Installation

First, install the package:

In [None]:
# Install the package (run this once)
# !pip install azure-llm-toolkit

# Or for development:
# !pip install -e .

## 2. Configuration

Configure your Azure OpenAI credentials in one of two ways:

### Option A: Environment Variables (Recommended)

In [None]:
import os

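# Set environment variables. Hardcoding them here is for the demo only; in real
# projects, set them in your shell or a secrets store to keep the key out of source control.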
os.environ["AZURE_OPENAI_API_KEY"] = "your-api-key-here"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://your-resource.openai.azure.com"
os.environ["AZURE_OPENAI_DEPLOYMENT"] = "gpt-4"
os.environ["AZURE_OPENAI_API_VERSION"] = "2024-02-15-preview"
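When configuration comes from the environment, a missing variable often surfaces later as a confusing request error. The sketch below reads the four variable names used in the cell above and fails fast with one clear message. `AzureSettings` and `load_azure_settings` are hypothetical helpers for illustration, not part of the toolkit, and how `AzureLLMClient()` itself reads these variables is not shown here.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Variable names taken from the cell above; the helper and dataclass are hypothetical.
REQUIRED_VARS = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
)


@dataclass(frozen=True)
class AzureSettings:
    api_key: str
    endpoint: str
    deployment: str
    api_version: str


def load_azure_settings(env: Mapping[str, str] = os.environ) -> AzureSettings:
    """Read the four Azure OpenAI variables, failing fast if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return AzureSettings(
        api_key=env["AZURE_OPENAI_API_KEY"],
        endpoint=env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),  # normalize a trailing slash
        deployment=env["AZURE_OPENAI_DEPLOYMENT"],
        api_version=env["AZURE_OPENAI_API_VERSION"],
    )
```

Calling `load_azure_settings()` at the top of a notebook or script reports every missing variable at once instead of failing on the first request.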

### Option B: Configuration Object

In [None]:
from azure_llm_toolkit import AzureConfig

config = AzureConfig(
    api_key="your-api-key-here",
    endpoint="https://your-resource.openai.azure.com",
    deployment="gpt-4",
    api_version="2024-02-15-preview",
)

print(f"Configuration loaded for deployment: {config.deployment}")

## 3. Create the Client

Create an instance of the `AzureLLMClient`:

In [None]:
from azure_llm_toolkit import AzureLLMClient

# Using environment variables
client = AzureLLMClient()

# Or using config object
# client = AzureLLMClient(config=config)

print("✅ Client created successfully!")

## 4. Simple Chat Completion

Let's send a simple chat completion request:

In [None]:
# Async request: top-level await works directly in Jupyter cells
response = await client.chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    max_tokens=100,
)

print("Response:")
print(response.choices[0].message.content)

### Viewing Usage Information

In [None]:
print("\nUsage Statistics:")
print(f"  Prompt tokens: {response.usage.prompt_tokens}")
print(f"  Completion tokens: {response.usage.completion_tokens}")
print(f"  Total tokens: {response.usage.total_tokens}")

## 5. Streaming Responses

Stream the response so tokens are printed as they arrive, instead of waiting for the whole completion to finish:

In [None]:
print("Streaming response: ", end="", flush=True)

async for chunk in client.chat_completion_stream(
    messages=[
        {"role": "system", "content": "You are a creative writer."},
        {"role": "user", "content": "Write a haiku about coding."},
    ],
    max_tokens=100,
):
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

print("\n\n✅ Streaming complete!")
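The loop above prints each delta but does not keep the full text. A minimal sketch for collecting it, assuming chunks have the `chunk.choices[0].delta.content` shape read in the cell above: `collect_stream` is a hypothetical helper, and `fake_stream` is a stand-in generator so the example runs without an API call. Chunks with no text or an empty `choices` list are skipped, exactly as in the cell above.

```python
import asyncio
from types import SimpleNamespace


def fake_chunk(text):
    """Build an object shaped like the chunks read above: chunk.choices[0].delta.content."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def fake_stream():
    # Stand-in for client.chat_completion_stream(...): some chunks carry no text,
    # and the last one has an empty choices list.
    for piece in ["Bugs ", None, "hide ", "in ", "plain ", "sight"]:
        yield fake_chunk(piece)
    yield SimpleNamespace(choices=[])


async def collect_stream(stream) -> str:
    """Join the text deltas of an async stream into the full response."""
    parts = []
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            parts.append(chunk.choices[0].delta.content)
    return "".join(parts)
```

In the notebook you would pass the real stream, e.g. `text = await collect_stream(client.chat_completion_stream(messages=..., max_tokens=100))`; in a plain script, wrap the call in `asyncio.run(...)`.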

## 6. Using the Sync Client

If you're working in a non-async environment, use the sync client:

In [None]:
from azure_llm_toolkit import SyncAzureLLMClient

sync_client = SyncAzureLLMClient()

# No async/await needed
response = sync_client.chat_completion(messages=[{"role": "user", "content": "What is 2 + 2?"}], max_tokens=50)

print(f"Answer: {response.choices[0].message.content}")
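This notebook does not show how `SyncAzureLLMClient` is implemented. A common general pattern for putting a blocking interface over an async client is to run each coroutine to completion with `asyncio.run`, sketched below with hypothetical `EchoAsyncClient` and `SyncFacade` classes; whether the toolkit works this way is an assumption. One property of this pattern is that `asyncio.run` raises `RuntimeError` when called while an event loop is already running in the same thread, as it is inside a Jupyter cell, which is why the async client with `await` is the natural fit for notebooks.

```python
import asyncio


class EchoAsyncClient:
    """Hypothetical async client standing in for AzureLLMClient."""

    async def chat_completion(self, messages, max_tokens=50):
        await asyncio.sleep(0)  # pretend to do network I/O
        return messages[-1]["content"].upper()


class SyncFacade:
    """Expose blocking methods by running each coroutine to completion."""

    def __init__(self, async_client):
        self._client = async_client

    def chat_completion(self, **kwargs):
        # asyncio.run creates a fresh event loop for each call and closes it afterwards.
        # It raises RuntimeError if another loop is already running in this thread.
        return asyncio.run(self._client.chat_completion(**kwargs))
```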

## 7. Function Calling / Tools

Function calling lets the model request calls to tools you define; your code executes them and sends the results back to the model:

In [None]:
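import ast
import operator

from azure_llm_toolkit.tools import tool, ToolRegistry

# Create a tool registry
registry = ToolRegistry()


# Define a tool using the decorator
@tool(registry=registry)
def get_weather(location: str, unit: str = "celsius") -> dict:
    """Get the current weather for a location.

    Args:
        location: The city and state, e.g. 'San Francisco, CA'
        unit: Temperature unit, either 'celsius' or 'fahrenheit'

    Returns:
        Weather information dictionary
    """
    # Mock weather data
    return {"location": location, "temperature": 22, "unit": unit, "condition": "sunny"}


# Arithmetic operators the calculator accepts; any other syntax is rejected
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}


def _eval_node(node):
    """Recursively evaluate a parsed expression containing only numbers and _OPERATORS."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPERATORS:
        return _OPERATORS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")


@tool(registry=registry)
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression.

    Args:
        expression: A math expression like '2 + 2' or '10 * 5'

    Returns:
        The result of the calculation
    """
    # Never eval() model-generated text: parse it and allow only arithmetic nodes
    return float(_eval_node(ast.parse(expression, mode="eval").body))


print(f"✅ Registered {len(registry.list_tools())} tools")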
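`registry.to_azure_tools()`, used below, is assumed to return tool definitions in the OpenAI tools format (`{"type": "function", "function": {name, description, parameters}}`). The sketch shows one way a decorator like `@tool` could derive such a definition from a function's signature and docstring using `inspect`. `function_to_tool_schema` and `get_stock_price` are hypothetical; the toolkit may also parse the `Args:` section into per-parameter descriptions, which this simplified version does not.

```python
import inspect
from typing import get_type_hints

# Map a few Python annotations to JSON Schema types (simplified: no Optional/List[...] handling)
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean", dict: "object", list: "array"}


def function_to_tool_schema(func) -> dict:
    """Build an OpenAI-style tool definition from a function's signature and docstring."""
    hints = get_type_hints(func)
    properties, required = {}, []
    for name, param in inspect.signature(func).parameters.items():
        properties[name] = {"type": _JSON_TYPES.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # parameters without defaults are mandatory
    doc = inspect.getdoc(func) or ""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.split("\n\n")[0],  # first docstring paragraph only
            "parameters": {"type": "object", "properties": properties, "required": required},
        },
    }


def get_stock_price(symbol: str, currency: str = "USD") -> dict:
    """Look up the latest price for a ticker symbol.

    Args:
        symbol: Ticker symbol, e.g. 'MSFT'
    """
    return {"symbol": symbol, "currency": currency, "price": 0.0}
```

`function_to_tool_schema(get_stock_price)` marks `symbol` as required and `currency` as optional, because only `currency` has a default value.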

### Using Tools with the LLM

In [None]:
# Send a request with tools
messages = [{"role": "user", "content": "What's the weather in Paris and what is 15 times 7?"}]

response = await client.chat_completion(messages=messages, tools=registry.to_azure_tools(), tool_choice="auto")

# Check if the model wants to call tools
if response.choices[0].message.tool_calls:
    print("üîß Model wants to call tools:")
    for tool_call in response.choices[0].message.tool_calls:
        print(f"  - {tool_call.function.name}({tool_call.function.arguments})")

    # Execute the tool calls
    tool_results = registry.execute_tool_calls(response.choices[0].message.tool_calls)

    # Add assistant message with tool calls
    messages.append(response.choices[0].message)

    # Add tool results
    for result in tool_results:
        messages.append({"role": "tool", "tool_call_id": result.tool_call_id, "content": result.content})

    # Get final response
    final_response = await client.chat_completion(messages=messages, tools=registry.to_azure_tools())

    print("\nüìù Final Response:")
    print(final_response.choices[0].message.content)
else:
    print("üí¨ Direct response:")
    print(response.choices[0].message.content)
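`registry.execute_tool_calls` handles the dispatch step in the cell above, and its internals are not shown here. The hypothetical stand-alone `execute_tool_calls` below makes the steps explicit: JSON-decode the arguments string the model sends, look up the function by name, and return one `role="tool"` message per call, keyed by `tool_call_id` like the dicts appended above. Returning errors as content (rather than raising) is a design choice that lets the model see what went wrong.

```python
import json
from types import SimpleNamespace


def execute_tool_calls(tool_calls, functions: dict) -> list[dict]:
    """Run each requested tool and return one role="tool" message per call."""
    messages = []
    for call in tool_calls:
        func = functions.get(call.function.name)
        try:
            if func is None:
                raise KeyError(f"unknown tool {call.function.name!r}")
            # The model sends arguments as a JSON string, not a dict
            result = func(**json.loads(call.function.arguments))
            content = json.dumps(result)
        except Exception as exc:  # report failures back to the model instead of crashing
            content = json.dumps({"error": str(exc)})
        messages.append({"role": "tool", "tool_call_id": call.id, "content": content})
    return messages


def multiply(a: float, b: float) -> float:
    return a * b


# Object shaped like a tool call from the response above (id, function.name, function.arguments)
fake_call = SimpleNamespace(id="call_1", function=SimpleNamespace(name="multiply", arguments='{"a": 15, "b": 7}'))
```

`execute_tool_calls([fake_call], {"multiply": multiply})` returns a single tool message whose content is the JSON-encoded result, `"105"`.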

## 8. Cost Tracking

Monitor costs automatically with the built-in cost tracker:

In [None]:
# Get cost statistics
cost_stats = client.get_cost_stats()

print("üí∞ Cost Statistics:")
print(f"  Total requests: {cost_stats['total_requests']}")
print(f"  Total tokens: {cost_stats['total_tokens']:,}")
print(f"  Prompt tokens: {cost_stats['prompt_tokens']:,}")
print(f"  Completion tokens: {cost_stats['completion_tokens']:,}")
print(f"  Total cost: ${cost_stats['total_cost']:.4f}")
print(f"  Average cost per request: ${cost_stats['avg_cost_per_request']:.4f}")
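The toolkit's pricing table and exact formula are not shown here. Token-based cost is usually computed per request as prompt_tokens / 1000 × prompt price + completion_tokens / 1000 × completion price, and the statistics above are then sums and averages over requests. The hypothetical `CostTracker` below sketches that arithmetic with the same keys the cell above prints. The prices you pass in are placeholders; look up the actual rates for your model and region on the Azure OpenAI pricing page.

```python
from dataclasses import dataclass


@dataclass
class CostTracker:
    """Aggregate token usage into the keys printed above (hypothetical implementation)."""

    prompt_price_per_1k: float      # USD per 1,000 prompt tokens (placeholder, not a real rate)
    completion_price_per_1k: float  # USD per 1,000 completion tokens (placeholder, not a real rate)
    total_requests: int = 0
    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_cost: float = 0.0

    def record(self, prompt_tokens: int, completion_tokens: int) -> None:
        """Add one request's usage, e.g. from response.usage."""
        self.total_requests += 1
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.total_cost += (
            prompt_tokens / 1000 * self.prompt_price_per_1k
            + completion_tokens / 1000 * self.completion_price_per_1k
        )

    def stats(self) -> dict:
        return {
            "total_requests": self.total_requests,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "prompt_tokens": self.prompt_tokens,
            "completion_tokens": self.completion_tokens,
            "total_cost": self.total_cost,
            # Guard against division by zero before any request is recorded
            "avg_cost_per_request": self.total_cost / self.total_requests if self.total_requests else 0.0,
        }
```

For example, with placeholder prices of 0.01 and 0.03 per 1,000 tokens, a request using 1,000 prompt and 500 completion tokens costs 0.01 + 0.015 = 0.025.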

## 9. Embeddings

Generate embeddings for text:

In [None]:
# Single text embedding
embedding = await client.embed_text(
    text="Azure LLM Toolkit makes it easy to work with Azure OpenAI",
    deployment="text-embedding-ada-002",  # Use your embedding deployment
)

print(f"Embedding dimension: {len(embedding)}")
print(f"First 5 values: {embedding[:5]}")

In [None]:
# Batch embeddings
texts = [
    "The quick brown fox jumps over the lazy dog",
    "Machine learning is fascinating",
    "Python is a great programming language",
]

embeddings = await client.embed_texts(texts=texts, deployment="text-embedding-ada-002")

print(f"Generated {len(embeddings)} embeddings")
print(f"Each embedding has {len(embeddings[0])} dimensions")
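A typical next step with embeddings is comparing them, for example to find which stored text is closest to a query. The usual measure is cosine similarity, dot(a, b) / (|a| × |b|), which is 1 for vectors pointing the same way and 0 for orthogonal ones. The pure-Python sketch below implements it; `cosine_similarity` and `rank_by_similarity` are hypothetical helpers, not toolkit functions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], candidates: list[list[float]]) -> list[int]:
    """Return candidate indices sorted from most to least similar to query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
```

With the cells above, `rank_by_similarity(embedding, embeddings)` orders the three example texts by similarity to the single embedded sentence. For large collections, a vectorized library or a vector database is the more practical choice.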

## 10. Caching

Enable caching to avoid redundant API calls:

In [None]:
import time

from azure_llm_toolkit import AzureLLMClient
from azure_llm_toolkit.cache import InMemoryCache

# Create a client with caching enabled
cached_client = AzureLLMClient(
    cache=InMemoryCache(ttl_seconds=3600)  # Cache entries expire after 1 hour
)

messages = [{"role": "user", "content": "What is 42 + 58?"}]

# First call: cache miss, goes to the API
start = time.perf_counter()
response1 = await cached_client.chat_completion(messages=messages, max_tokens=50)
time1 = time.perf_counter() - start

# Second call with identical arguments: served from the cache
start = time.perf_counter()
response2 = await cached_client.chat_completion(messages=messages, max_tokens=50)
time2 = time.perf_counter() - start

print(f"First call: {time1:.3f}s")
print(f"Cached call: {time2:.3f}s")
print(f"Speedup: {time1 / time2:.1f}x")
print(f"\nResponse: {response2.choices[0].message.content}")
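The notebook does not describe how `InMemoryCache` decides that two requests are the same. A typical approach, sketched here with hypothetical names (`TTLCache`, `make_key`), hashes a canonical JSON serialization of the request parameters and stores each value with an expiry time. The injectable `clock` exists only to make expiry testable. One consequence of any response cache: with a nonzero sampling temperature, repeated identical requests return the first sampled answer rather than a fresh one.

```python
import hashlib
import json
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds (hypothetical)."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    @staticmethod
    def make_key(**request) -> str:
        # Identical requests must hash identically, so serialize with sorted keys
        payload = json.dumps(request, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: evict and report a miss
            return None
        return value

    def set(self, key, value) -> None:
        self._store[key] = (self._clock() + self.ttl_seconds, value)
```

Because keys are built with `sort_keys=True`, `make_key(messages=m, max_tokens=50)` and `make_key(max_tokens=50, messages=m)` produce the same hash, while changing any parameter value produces a different one.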

## 11. Health Checks

Check the health of your Azure OpenAI connection:

In [None]:
from azure_llm_toolkit.health import health_check

# Perform health check
health = await health_check(client)

print(f"Health Status: {'✅ Healthy' if health['healthy'] else '❌ Unhealthy'}")
print(f"Timestamp: {health['timestamp']}")

if health["checks"]:
    print("\nDetailed Checks:")
    for check_name, check_result in health["checks"].items():
        status = "✅" if check_result.get("healthy", False) else "❌"
        print(f"  {status} {check_name}: {check_result.get('message', 'OK')}")
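The cell above reads a report with `healthy`, `timestamp`, and `checks` keys, but what `health_check` probes internally is not shown. If you want to run your own probes (for example, against an embedding deployment), the hypothetical `run_health_checks` below sketches one approach that produces a report of the same shape: run named async checks concurrently, bound each with a timeout, and record failures instead of raising.

```python
import asyncio
from datetime import datetime, timezone


async def run_health_checks(checks: dict, timeout: float = 5.0) -> dict:
    """Run named async checks concurrently and build a report shaped like the one printed above.

    Each check is a zero-argument coroutine function that returns a message string
    (or None) on success and raises on failure.
    """

    async def run_one(check):
        try:
            message = await asyncio.wait_for(check(), timeout=timeout)
            return {"healthy": True, "message": message or "OK"}
        except asyncio.TimeoutError:
            return {"healthy": False, "message": f"timed out after {timeout}s"}
        except Exception as exc:
            return {"healthy": False, "message": f"{type(exc).__name__}: {exc}"}

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[name]) for name in names))
    return {
        "healthy": all(r["healthy"] for r in results),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "checks": dict(zip(names, results)),
    }
```

The overall status is healthy only if every individual check passes, and a slow dependency shows up as a timeout rather than hanging the whole report.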

## 12. Clean Up

Always close the client when done:

In [None]:
await client.close()
await cached_client.close()

print("✅ Clients closed successfully")
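In scripts, an exception between creating a client and the final `await client.close()` would skip the cleanup. A small async context manager guarantees that `close()` is awaited either way. The sketch relies only on the `close()` coroutine used in the cell above; `closing_client`, `DummyClient`, and `demo` are hypothetical, and whether `AzureLLMClient` supports `async with` natively is not confirmed here.

```python
import asyncio
from contextlib import asynccontextmanager


@asynccontextmanager
async def closing_client(client):
    """Yield client and always await client.close() on exit, even if the body raises."""
    try:
        yield client
    finally:
        await client.close()


class DummyClient:
    """Hypothetical stand-in that records whether close() was awaited."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


async def demo(fail: bool) -> bool:
    client = DummyClient()
    try:
        async with closing_client(client):
            if fail:
                raise RuntimeError("request failed")
    except RuntimeError:
        pass  # the error still propagates out of the with-block; swallowed here for the demo
    return client.closed
```

With the real client this would read `async with closing_client(AzureLLMClient()) as client: ...`, and `demo(True)` shows that the client is closed even when the body raises.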

## Next Steps

Now that you've learned the basics, check out these other notebooks:

1. **02_rate_limiting_strategies.ipynb** - Learn how to handle rate limits effectively
2. **03_cost_optimization.ipynb** - Techniques for reducing costs
3. **04_rag_implementation.ipynb** - Build a RAG system
4. **05_agent_patterns.ipynb** - Create intelligent agents
5. **06_production_deployment.ipynb** - Deploy to production

## Resources

- [GitHub Repository](https://github.com/tsoernes/azure-llm-toolkit)
- [Full Documentation](https://github.com/tsoernes/azure-llm-toolkit/blob/main/README.md)
- [Examples](https://github.com/tsoernes/azure-llm-toolkit/tree/main/examples)
- [Azure OpenAI Documentation](https://learn.microsoft.com/en-us/azure/ai-services/openai/)