# 01. Basic TensorZero Gateway

This notebook demonstrates basic TensorZero gateway functionality including:
- Setting up the client
- Making inference calls
- Using different providers
- Understanding the response structure

In [3]:
import os
from dotenv import load_dotenv
from tensorzero import TensorZeroGateway

# Load environment variables
load_dotenv()

# Verify API keys are set
api_keys = {
    "OpenAI": os.getenv("OPENAI_API_KEY"),
    "Anthropic": os.getenv("ANTHROPIC_API_KEY"),
    "xAI": os.getenv("XAI_API_KEY")
}

for provider, key in api_keys.items():
    if key:
        print(f"✅ {provider} API key is set")
    else:
        print(f"✗ {provider} API key is missing")

✅ OpenAI API key is set
✅ Anthropic API key is set
✅ xAI API key is set
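The loop above only prints a status line per provider. If later cells should stop early when a key is missing, a minimal sketch like the following returns the providers whose variables are unset or empty. The helper name `missing_api_keys` and the `REQUIRED_KEYS` table are hypothetical and are not part of TensorZero.

```python
import os
from typing import Mapping


def missing_api_keys(env: Mapping[str, str], required: Mapping[str, str]) -> list[str]:
    """Return provider names whose API key variable is unset or empty.

    `required` maps a display name (e.g. "OpenAI") to an environment variable name.
    """
    return [provider for provider, var in required.items() if not env.get(var)]


# Hypothetical table mirroring the variables checked in the cell above
REQUIRED_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "xAI": "XAI_API_KEY",
}

missing = missing_api_keys(os.environ, REQUIRED_KEYS)
if missing:
    # Swap this print for `raise RuntimeError(...)` to halt the notebook instead
    print(f"Missing API keys for: {', '.join(missing)}")
```

Passing the environment in as a mapping keeps the check easy to exercise with a plain dict.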


## 1. Initialize TensorZero Client

TensorZero can be used in two modes:
1. **Standalone Gateway**: Connect to a running gateway service
2. **Embedded Gateway**: Run gateway within your Python process

In [4]:
# Option 1: Connect to standalone gateway (requires docker compose up)
# Note: calling the TensorZeroGateway constructor directly is deprecated; use build_http instead
gateway_client = TensorZeroGateway.build_http(gateway_url="http://localhost:3000")

# Test connection by requesting the gateway's /health endpoint
import urllib.request

try:
    # urlopen raises on connection errors and on non-2xx responses
    with urllib.request.urlopen("http://localhost:3000/health", timeout=5):
        pass
    print("✅ Connected to TensorZero gateway")
    print("🌐 Gateway API: http://localhost:3000")
    print("🎨 TensorZero UI: http://localhost:4000")
    print("📊 ClickHouse: http://localhost:8123")
except Exception as e:
    print(f"✗ Failed to connect: {e}")
    print("Make sure to run 'poe up' or 'docker compose up' first!")

✅ Connected to TensorZero gateway
🌐 Gateway API: http://localhost:3000
🎨 TensorZero UI: http://localhost:4000
📊 ClickHouse: http://localhost:8123


In [5]:
# Option 2: Embedded gateway (runs in-process)
embedded_client = TensorZeroGateway.build_embedded(
    clickhouse_url="http://chuser:chpassword@localhost:8123/tensorzero",
    config_file="../config/tensorzero.toml",
)

# For this notebook, we'll use the standalone gateway
client = gateway_client



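## 2. Using Different Providers

Each variant of the `chat` function is backed by a different model; pass `variant_name` to pin a request to one of them.

In [8]:
# Run the same prompt through all 8 configured variants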
variants_to_test = [
    ("gpt4", "OpenAI GPT-4"),
    ("gpt4_mini", "OpenAI GPT-4o Mini"),
    ("claude3_opus", "Anthropic Claude 3 Opus"),
    ("claude3_sonnet", "Anthropic Claude 3 Sonnet"),
    ("claude3_haiku", "Anthropic Claude 3 Haiku"),
    ("grok3_mini", "xAI Grok-3 Mini"),
    ("grok_code_fast", "xAI Grok Code Fast"),
    ("grok4", "xAI Grok-4"),
]

test_prompt = "Write a haiku about machine learning"

for variant_name, display_name in variants_to_test:
    try:
        response = client.inference(
            function_name="chat",
            variant_name=variant_name,  # Specify which variant to use
            input={
                "messages": [
                    {"role": "user", "content": test_prompt}
                ]
            }
        )
        print(f"\n{'='*50}")
        print(f"{display_name} ({variant_name}):\n")
        print(response.content[0].text if response.content else "No content")
    except Exception as e:
        print(f"\n{'='*50}")
        print(f"{display_name} ({variant_name}): ❌ Failed - {str(e)[:100]}...")


OpenAI GPT-4 (gpt4):

Algorithms awake,
In data patterns they dive,
Knowledge from chaos thrives.

OpenAI GPT-4o Mini (gpt4_mini):

Patterns in the code,  
Machines learn as data flows,  
Thoughts in circuits bloom.  

Anthropic Claude 3 Opus (claude3_opus):

Patterns in data
Algorithms learn and grow
Intelligence blooms
2025-08-29T03:05:14.865077Z  WARN tensorzero_core::error: Request failed: HTTP status server error (502 Bad Gateway) for url (http://localhost:3000/inference)

Anthropic Claude 3 Sonnet (claude3_sonnet): ❌ Failed - TensorZeroError (status code 502): {"error":"All variants failed with errors: claude3_sonnet: All mo...

Anthropic Claude 3 Haiku (claude3_haiku):

Here is a haiku about machine learning:

Algorithms learn,
Patterns emerge from data,
Machines grow smarter.

xAI Grok-3 Mini (grok3_mini):

Neural nets awaken,  
From vast data oceans deep,  
Insights bloom anew.

xAI Grok Code Fast (grok_code_fast):

Data streams like rivers,  

## 3. Using Specific Variants

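We can pin a request to a specific variant with `variant_name`. Here the `analyze_sentiment` JSON function is called through its `claude_json` variant, and the structured result is read from `response.output.parsed`.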

In [17]:
# Test structured sentiment analysis (JSON function)
test_texts = [
    "I absolutely love using TensorZero! It makes LLM development so much easier.",
    "The service is down again. This is really frustrating and impacting our work.",
    "The documentation is okay, but could use more examples.",
    "Mixed feelings - great features but the setup was complicated."
]

# Test the analyze_sentiment JSON function (with system input for template)
print("Testing sentiment analysis with structured JSON output:")
print("="*60)

for text in test_texts:
    try:
        response = client.inference(
            function_name="analyze_sentiment",
            variant_name="claude_json",  # Using Anthropic for JSON function
            input={
                "system": {"text": text},  # Provide system input for the template
                "messages": [
                    {"role": "user", "content": text}
                ]
            }
        )
        
        # For JSON functions, use output.parsed to access structured data
        if hasattr(response.output, 'parsed') and response.output.parsed:
            result = response.output.parsed
            print(f"\nText: {text[:50]}...")
            print(f"Sentiment: {result.get('sentiment', 'unknown')} (confidence: {result.get('confidence', 0):.2f})")
            print(f"Explanation: {result.get('explanation', 'No explanation provided')}")
            print(f"Inference ID: {response.inference_id}")
        else:
            # Fallback to raw output if parsing failed
            print(f"\nText: {text[:50]}...")
            if hasattr(response.output, 'raw'):
                print(f"Raw response: {response.output.raw[:100]}...")
            else:
                print(f"Response: {str(response.output)[:100]}...")
            print(f"Inference ID: {response.inference_id}")
        
    except Exception as e:
        print(f"\n❌ Failed for '{text[:30]}...': {str(e)[:100]}")

print("\n💡 Note: JSON functions with system templates require both 'system' and 'messages' inputs")
print("💡 Structured data is available in response.output.parsed")

Testing sentiment analysis with structured JSON output:

Text: I absolutely love using TensorZero! It makes LLM d...
Sentiment: positive (confidence: 0.95)
Explanation: The text expresses a very positive sentiment towards TensorZero. Key phrases like 'absolutely love' and 'so much easier' indicate strong positive feelings and enthusiasm for the product, with no negative qualifiers. The high degree of positive language gives a high confidence in the positive sentiment classification.
Inference ID: 0198f3df-6346-7552-9bb5-6aa22fe2893e

Text: The service is down again. This is really frustrat...
Raw response: {"sentiment": "negative",
  "confidenceScore": 0.85,
  "explanation": "The text expresses frustratio...
Inference ID: 0198f3df-731f-7f31-85af-827c34dedfbe

Text: The documentation is okay, but could use more exam...
Raw response: {"sentiment": "mixed",
    "confidenceScore": 0.65,
    "explanation": "The text expresses a mixed s...
Inference ID: 0198f3df-852c-7371-ad29-8c4df76e1c98
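Two of the outputs above fell back to the raw text. In both, the model returned `confidenceScore` instead of `confidence`, which most likely failed the function's output schema and left `parsed` empty. A minimal sketch of a tolerant fallback parses `output.raw` and renames a few alternate keys; the alias table is an assumption based only on the keys visible in these outputs, and tightening the prompt or schema remains the more robust fix.

```python
import json
from typing import Any, Optional

# Alternate key names seen in raw outputs -> names the notebook reads
KEY_ALIASES = {"confidenceScore": "confidence", "confidence_score": "confidence"}


def parse_sentiment(raw: Optional[str]) -> Optional[dict[str, Any]]:
    """Parse a raw JSON string and normalize known key aliases.

    Returns None if `raw` is empty, not valid JSON, or not a JSON object.
    """
    if not raw:
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    return {KEY_ALIASES.get(key, key): value for key, value in data.items()}


raw = '{"sentiment": "negative", "confidenceScore": 0.85, "explanation": "Frustration."}'
result = parse_sentiment(raw)
print(result["sentiment"], result["confidence"])
```

In the sentiment loop, `parse_sentiment(response.output.raw)` could be tried in the `else` branch before printing the raw text.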



## 5. Multi-turn Conversations

TensorZero supports multi-turn conversations with message history.

In [18]:
# Build a conversation (using user/assistant roles only)
messages = [
    {"role": "user", "content": "You are a helpful AI assistant specializing in LLM infrastructure. What are the key components of TensorZero?"},
]

# First turn
response1 = client.inference(
    function_name="chat",
    variant_name="gpt4",
    input={"messages": messages}
)

print("Assistant:", response1.content[0].text[:200] + "...\n")

# Add response to conversation
messages.append({"role": "assistant", "content": response1.content[0].text})
messages.append({"role": "user", "content": "Tell me more about the observability features."})

# Second turn
response2 = client.inference(
    function_name="chat",
    variant_name="gpt4",
    input={"messages": messages}
)

print("Follow-up response:", response2.content[0].text[:200] + "...")

Assistant: I apologize, but as of my last training data in October 2021, there isn't enough information or any relevant references to TensorZero in context of legal education infrastructure or any other fields. ...

Follow-up response: In general, observability refers to how well internal states of a system can be inferred from knowledge of its external outputs. In the field of computing and IT infrastructure, observability means yo...
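The cell above grows `messages` in place with two `append` calls per turn. A sketch of a small helper returns a new history list instead, which keeps earlier turns reusable (for example, to replay the same prefix against another variant). `extend_conversation` is hypothetical and not part of the TensorZero client; the assistant text in the demo is a placeholder.

```python
from typing import Any

Message = dict[str, Any]


def extend_conversation(
    history: list[Message], assistant_text: str, next_user_text: str
) -> list[Message]:
    """Return a new message list with one assistant turn and one user turn appended."""
    return [
        *history,
        {"role": "assistant", "content": assistant_text},
        {"role": "user", "content": next_user_text},
    ]


first_turn = [{"role": "user", "content": "What are the key components of TensorZero?"}]
second_turn = extend_conversation(
    first_turn,
    "(first assistant reply)",  # placeholder for response1.content[0].text
    "Tell me more about the observability features.",
)
print(len(first_turn), len(second_turn))
```

The second request would then pass `input={"messages": second_turn}`, leaving `first_turn` untouched.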


## 6. Response Metadata and Observability

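Every response carries metadata for observability: a unique `inference_id` (printed in the cells above), along with fields such as the `episode_id`, the `variant_name` that served the request, and token `usage`. The gateway also stores each inference in ClickHouse, which is where the TensorZero UI reads it from.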
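A sketch that gathers these fields into a plain dict for logging follows. The attribute names (`inference_id`, `episode_id`, `variant_name`, `usage.input_tokens`, `usage.output_tokens`) are assumptions about the TensorZero Python client's response objects, so `getattr` with defaults keeps the helper from failing if a field is absent. A `SimpleNamespace` with made-up token counts stands in for a real response.

```python
from types import SimpleNamespace
from typing import Any


def response_metadata(response: Any) -> dict[str, Any]:
    """Collect observability fields from an inference response; missing fields become None."""
    usage = getattr(response, "usage", None)
    return {
        "inference_id": getattr(response, "inference_id", None),
        "episode_id": getattr(response, "episode_id", None),
        "variant_name": getattr(response, "variant_name", None),
        # getattr(None, ..., None) returns None, so a missing usage object is tolerated
        "input_tokens": getattr(usage, "input_tokens", None),
        "output_tokens": getattr(usage, "output_tokens", None),
    }


# Stand-in response object (token counts are illustrative, not real measurements)
fake = SimpleNamespace(
    inference_id="0198f3df-6346-7552-9bb5-6aa22fe2893e",
    variant_name="claude_json",
    usage=SimpleNamespace(input_tokens=120, output_tokens=85),
)
print(response_metadata(fake))
```

Applied to a real `response`, the dict can be logged next to the inference ID to cross-reference the same record in the UI.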

## Key Learnings

1. **Gateway Modes**: TensorZero supports both standalone and embedded gateway modes
2. **Multi-Provider**: 8 variants configured across OpenAI, Anthropic, and xAI
3. **Structured Output**: JSON schema validation for reliable outputs (all Grok models support this!)
4. **Observability**: Each inference has a unique ID for tracking
5. **Feedback Loop**: Built-in feedback collection for optimization
6. **Client API**: Use `TensorZeroGateway.build_http()` (constructor is deprecated)

## Advanced Capabilities
- **xAI Grok Models**: All support structured output, reasoning, and function calling
- **grok-4-0709**: Supports image input + text output
- **JSON Functions**: Configured with schema files in `config/functions/`
- **Services**: Gateway (3000), UI (4000), ClickHouse (8123)


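## 7. Error Handling

Requests that reference an unknown variant or function fail with a `TensorZeroError` carrying the gateway's HTTP status code.

In [19]:
# Test with invalid variant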
try:
    response = client.inference(
        function_name="chat",
        variant_name="non_existent_variant",
        input={
            "messages": [{"role": "user", "content": "Test"}]
        }
    )
except Exception as e:
    print(f"Expected error for invalid variant: {e}")

# Test with invalid function
try:
    response = client.inference(
        function_name="non_existent_function",
        input={
            "messages": [{"role": "user", "content": "Test"}]
        }
    )
except Exception as e:
    print(f"\nExpected error for invalid function: {e}")

2025-08-29T03:33:47.971779Z  WARN tensorzero_core::error: Request failed: HTTP status client error (404 Not Found) for url (http://localhost:3000/inference)
Expected error for invalid variant: TensorZeroError (status code 404): {"error":"Unknown variant: non_existent_variant"}
2025-08-29T03:33:48.127652Z  WARN tensorzero_core::error: Request failed: HTTP status client error (404 Not Found) for url (http://localhost:3000/inference)

Expected error for invalid function: TensorZeroError (status code 404): {"error":"Unknown function: non_existent_function"}
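Both failures surface as a `TensorZeroError` whose string form contains the HTTP status and the gateway's JSON error body. A sketch that recovers those two pieces by parsing the string follows; it relies only on the format printed above and does not assume any particular exception attributes. Separating the status lets a caller tell a 404 configuration mistake from a transient 5xx (like the earlier 502) that may be worth retrying. Both helper names are hypothetical.

```python
import json
import re
from typing import Optional

# Matches e.g. 'TensorZeroError (status code 404): {"error":"Unknown variant: x"}'
_ERROR_PATTERN = re.compile(r"status code (\d+)\):\s*(.*)", re.DOTALL)


def parse_gateway_error(message: str) -> tuple[Optional[int], Optional[str]]:
    """Extract (status_code, error_text) from an error message; (None, None) if unrecognized."""
    match = _ERROR_PATTERN.search(message)
    if not match:
        return None, None
    status = int(match.group(1))
    body_text = match.group(2).strip()
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        # Body missing or truncated (e.g. sliced with [:100]): keep the raw text
        return status, body_text or None
    if isinstance(body, dict) and "error" in body:
        return status, str(body["error"])
    return status, body_text


def is_retryable(status: Optional[int]) -> bool:
    """Treat 5xx as transient; 4xx means the request or config needs fixing."""
    return status is not None and 500 <= status < 600


msg = 'TensorZeroError (status code 404): {"error":"Unknown variant: non_existent_variant"}'
status, error = parse_gateway_error(msg)
print(status, error, is_retryable(status))
```

Inside an `except Exception as e:` block, `parse_gateway_error(str(e))` yields the same pair.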


## 8. Collecting Feedback

TensorZero records feedback against an inference (or an episode) by ID, as values of named metrics; this data can later drive optimization.

In [20]:
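# Make an inference
# Note: `creative_write` must be defined as a function in tensorzero.toml; otherwise the gateway returns 404
response = client.inference(
    function_name="creative_write",
    input={
        "messages": [
            {"role": "user", "content": "Write a creative tagline for TensorZero"}
        ]
    }
)

print(f"Tagline: {response.content[0].text if response.content else 'No content'}")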
print(f"\nInference ID: {response.inference_id}")

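# Collect feedback: TensorZero records one metric per feedback call.
# "score", "helpful", and "creative" must be declared as metrics in tensorzero.toml
# (a float metric and two boolean metrics); "comment" is a built-in free-text metric.
try:
    for metric_name, value in {
        "score": 0.9,
        "helpful": True,
        "creative": True,
        "comment": "Great tagline!",
    }.items():
        client.feedback(
            metric_name=metric_name,
            value=value,
            inference_id=response.inference_id,
        )
    print("\n✓ Feedback submitted successfully")
except Exception as e:
    print(f"\n✗ Failed to submit feedback: {e}")

2025-08-29T03:33:55.575925Z  WARN tensorzero_core::error: Request failed: HTTP status client error (404 Not Found) for url (http://localhost:3000/inference)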


TensorZeroError: TensorZeroError (status code 404): {"error":"Unknown function: creative_write"}
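Because each `client.feedback` call carries a single `metric_name` and `value` together with an `inference_id`, a dict of several feedback values has to be split into separate calls, and each value has to match its metric's type. The sketch below does that split and checks types up front. The metric names and types in `METRIC_TYPES` are hypothetical and would have to mirror the `[metrics.*]` entries in your `tensorzero.toml`; `comment` is treated here as free text.

```python
from typing import Any, Union

FeedbackValue = Union[bool, float, str]

# Hypothetical metric declarations; keep in sync with tensorzero.toml
METRIC_TYPES: dict[str, type] = {"score": float, "helpful": bool, "creative": bool, "comment": str}


def build_feedback_calls(inference_id: str, feedback: dict[str, FeedbackValue]) -> list[dict[str, Any]]:
    """Return one keyword-argument dict per metric, validating value types first."""
    calls = []
    for metric_name, value in feedback.items():
        expected = METRIC_TYPES.get(metric_name)
        if expected is None:
            raise ValueError(f"Unknown metric: {metric_name}")
        if expected is float:
            # bool is a subclass of int, so reject booleans for float metrics explicitly
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            raise TypeError(f"{metric_name} expects {expected.__name__}, got {type(value).__name__}")
        calls.append({"metric_name": metric_name, "value": value, "inference_id": inference_id})
    return calls


calls = build_feedback_calls(
    "0198f3df-6346-7552-9bb5-6aa22fe2893e",
    {"score": 0.9, "helpful": True, "creative": True, "comment": "Great tagline!"},
)
print([c["metric_name"] for c in calls])
```

Each dict can then be sent with `client.feedback(**call)`, so a type mistake is caught before any request reaches the gateway.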

Next notebook: We'll explore multi-provider testing and performance comparisons.