To run this notebook, press _"Run All"_ <span style="opacity:.8;">(in Google Colab: <b>Runtime ▸ Run all</b>)</span>

<p align="center">
  <a href="https://docs.fenic.ai">Read the Docs</a> •
  <a href="https://discord.com/invite/GdqF3J7huR">Join Discord</a> •
  <a href="https://github.com/typedef-ai/fenic">⭐️ Star fenic</a>
</p>

To install fenic locally, follow the instructions in the [GitHub repo](https://github.com/typedef-ai/fenic).

This is the fenic "Hello, World!" example.

You will learn the basics of how fenic works by building an error log analyzer. The analyzer uses fenic's semantic operators to parse and analyze application errors without regex patterns.

If this notebook helps, please give <a href="https://github.com/typedef-ai/fenic" target="_blank" rel="noopener noreferrer">fenic</a> a ⭐️ — it really helps!

## Installation

In [None]:
!pip uninstall -y sklearn-compat ibis-framework imbalanced-learn
!pip install fenic matplotlib seaborn polars==1.30.0

In [None]:
import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

## fenic session
Set up a fenic session that uses gpt-4o-mini as the language model for semantic operations.

In [None]:
import fenic as fc
from pydantic import BaseModel, Field

# Configure session with semantic capabilities
config = fc.SessionConfig(
    app_name="hello_debug",
    semantic=fc.SemanticConfig(
        language_models={
            "mini": fc.OpenAILanguageModel(
                model_name="gpt-4o-mini",  # Fast and effective for log analysis
                rpm=500,      # requests-per-minute limit
                tpm=200_000,  # tokens-per-minute limit
            )
        }
    ),
)

# Create session
session = fc.Session.get_or_create(config)
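The `rpm` and `tpm` arguments cap requests and tokens per minute for this model. The sketch below is a rough back-of-envelope check of this notebook's workload against those caps. It assumes, hypothetically, about one LLM request per row per semantic operator. fenic's actual request pattern may differ.

```python
# Back-of-envelope estimate only. Assumption (hypothetical): roughly one LLM request
# per row per semantic operator; fenic may batch or retry differently.
RPM = 500        # requests per minute, as configured in the session above
TPM = 200_000    # tokens per minute, as configured in the session above

num_rows = 9                # log entries in the sample dataset below
semantic_ops_per_row = 3    # classify + extract(ErrorAnalysis) + extract(ErrorPattern)

estimated_requests = num_rows * semantic_ops_per_row
# Average tokens each request could use if both limits were saturated at once
tokens_per_request_at_full_rpm = TPM // RPM

print(f"Estimated requests: {estimated_requests}")
print(f"Share of one minute's request budget: {estimated_requests / RPM:.1%}")
print(f"Avg tokens/request before hitting TPM at full RPM: {tokens_per_request_at_full_rpm}")
```

At this scale the limits are far from binding. They start to matter when you point the same pipeline at thousands of log lines.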

## About the Data

This dataset contains synthetic but realistic logs from several microservices in a modern application stack.

Each entry includes a timestamp, the service name, and the raw log text (`error_log`).

The logs cover a range of issues, such as exceptions, connection failures, timeouts, slow queries, and delayed email delivery. They also include a few routine entries that need no action (a cache miss and a successful batch run).

This mix makes the dataset well suited to demonstrating error triage, root cause extraction, and automated debugging workflows with semantic data tools.
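The varied formats are what make hand-written patterns brittle. To illustrate, here is a hypothetical regex baseline that is not part of fenic. It takes the first lines of six entries, copied from the dataset below, and looks for an explicit log-level prefix.

```python
import re

# First lines of six entries, copied from the sample dataset below
first_lines = {
    "api-gateway": "ERROR: NullPointerException in UserService.getProfile()",
    "auth-service": "node:internal/process/promises:288",
    "payment-processor": "Traceback (most recent call last):",
    "data-pipeline": "django.db.utils.OperationalError: could not connect to server: Connection refused",
    "frontend": "TypeError: Cannot read property 'map' of undefined",
    "cache-service": "INFO: Cache miss for key 'user_preferences_12345'",
}

# A naive hand-written rule: the level is an uppercase prefix at the start of the line
LEVEL_PATTERN = re.compile(r"^(ERROR|WARN|INFO|DEBUG):")

levels = {}
for service, line in first_lines.items():
    match = LEVEL_PATTERN.match(line)
    levels[service] = match.group(1) if match else None

for service, level in levels.items():
    print(f"{service:<18} {level or '(no level prefix found)'}")
```

Four of the six lines have no level prefix, even though they describe real failures. Covering them would take one rule per runtime and framework, and that is the kind of work the semantic operators in this notebook are meant to replace.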

In [None]:
# Create sample error logs: the kind developers see every day
error_logs = [
        {
            "timestamp": "2024-01-20 14:23:45",
            "service": "api-gateway",
            "error_log": """
ERROR: NullPointerException in UserService.getProfile()
    at com.app.UserService.getProfile(UserService.java:45)
    at com.app.ApiController.handleRequest(ApiController.java:123)
    at java.base/java.lang.Thread.run(Thread.java:834)

User ID: 12345 was not found in cache, attempted DB lookup returned null
            """
        },
        {
            "timestamp": "2024-01-20 14:24:12",
            "service": "auth-service",
            "error_log": """
node:internal/process/promises:288
            triggerUncaughtException(err, true /* fromPromise */);
            ^

Error: connect ECONNREFUSED 127.0.0.1:6379
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
    at Protocol._enqueue (/app/node_modules/redis/lib/redis.js:458:48)
    at Protocol._write (/app/node_modules/redis/lib/redis.js:326:10)

Redis connection failed during session validation
            """
        },
        {
            "timestamp": "2024-01-20 14:25:33",
            "service": "payment-processor",
            "error_log": """
Traceback (most recent call last):
  File "/app/payment/processor.py", line 89, in process_payment
    response = stripe.Charge.create(
  File "/usr/local/lib/python3.9/site-packages/stripe/api_resources/charge.py", line 45, in create
    return cls._static_request("post", cls.class_url(), params=params)
  File "/usr/local/lib/python3.9/site-packages/stripe/api_requestor.py", line 234, in request
    raise error.APIConnectionError(msg)
stripe.error.APIConnectionError: Connection error: timeout after 30s

Payment processing failed for order_id: ORD-789456
            """
        },
        {
            "timestamp": "2024-01-20 14:26:01",
            "service": "data-pipeline",
            "error_log": """
django.db.utils.OperationalError: could not connect to server: Connection refused
    Is the server running on host "db.prod.internal" (10.0.1.50) and accepting
    TCP/IP connections on port 5432?

FATAL: Batch job 'daily_analytics' failed after 3 retries
Table 'user_metrics' has 2.3M pending records
            """
        },
        {
            "timestamp": "2024-01-20 14:27:15",
            "service": "frontend",
            "error_log": """
TypeError: Cannot read property 'map' of undefined
    at ProfileList (ProfileList.jsx:34:19)
    at renderWithHooks (react-dom.development.js:14985:18)
    at updateFunctionComponent (react-dom.development.js:17356:20)

API response was: {"error": "rate_limit_exceeded", "retry_after": 60}
Component tried to render before data loaded
            """
        },
        {
            "timestamp": "2024-01-20 14:28:03",
            "service": "api-gateway",
            "error_log": """
WARN: Slow query detected in UserService.searchUsers()
Query took 2.3 seconds to complete
SELECT * FROM users WHERE name LIKE '%john%' ORDER BY created_at DESC
Consider adding an index on the name column for better performance
            """
        },
        {
            "timestamp": "2024-01-20 14:28:45",
            "service": "cache-service",
            "error_log": """
INFO: Cache miss for key 'user_preferences_12345'
Fetching data from primary database
Cache hit ratio: 87% (normal range: 85-95%)
No action required
            """
        },
        {
            "timestamp": "2024-01-20 14:29:12",
            "service": "notification-service",
            "error_log": """
WARN: Email delivery delayed for notification_id: notify_789
SMTP server response: 450 Requested mail action not taken: mailbox unavailable
Will retry in 5 minutes (attempt 2/3)
            """
        },
        {
            "timestamp": "2024-01-20 14:29:33",
            "service": "analytics",
            "error_log": """
DEBUG: Processing batch of 1,250 events
Memory usage: 45MB (limit: 512MB)
Processing time: 1.2s
All events processed successfully
            """
        }
    ]

## Load the Data

Next, we turn the data into a fenic DataFrame so we can start working with it.

In [None]:
# Create DataFrame from the error logs
df = session.create_dataframe(error_logs)

print("Hello World! Error Log Analyzer")
print("=" * 70)
print(f"Found {df.count()} log entries to analyze\n")

## Define extraction schemas

In this step, we define the schemas for the information we want to extract from the logs. Think of this as the data equivalent of "tool calling" for LLMs.

In practice, fenic ensures that whatever the LLM extracts conforms to the schema you provide. The field names and their `description` values also give the LLM context about what to extract.
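fenic enforces this contract for you. The stdlib sketch below is not how fenic implements it. It only illustrates what "conforms to the schema" means: a response must contain exactly the declared fields, each with the declared type. `ErrorAnalysisSketch` and `parse_analysis` are hypothetical names, and the sample responses are made up.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ErrorAnalysisSketch:
    """Plain-Python stand-in for the ErrorAnalysis Pydantic model (illustration only)."""
    root_cause: str
    fix_recommendation: str


def parse_analysis(raw: str) -> ErrorAnalysisSketch:
    """Parse a JSON response, rejecting it unless it has exactly the schema's string fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name for f in fields(ErrorAnalysisSketch)}
    if set(data) != expected:
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name in expected:
        if not isinstance(data[name], str):
            raise TypeError(f"field {name!r} must be a string")
    return ErrorAnalysisSketch(**data)


# A hypothetical well-formed response passes...
result = parse_analysis(
    '{"root_cause": "Redis refused the connection", "fix_recommendation": "Check that Redis is running"}'
)
print(result.root_cause)

# ...while a response missing a field is rejected.
try:
    parse_analysis('{"root_cause": "Redis refused the connection"}')
except ValueError as exc:
    print("Rejected:", exc)
```

The sketch leaves out the `description` text, which has no effect on validation. In fenic, that text is guidance for the model, not a rule the output is checked against.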

In [None]:
# Create Pydantic models for extracting error analysis information
class ErrorAnalysis(BaseModel):
    root_cause: str = Field(description="The root cause of this error")
    fix_recommendation: str = Field(description="How to fix this error")

class ErrorPattern(BaseModel):
    error_type: str = Field(description="Type of error (e.g., NullPointer, Timeout, ConnectionRefused)")
    component: str = Field(description="Affected component or system")

## Error Log Classification and Extraction

This cell performs semantic analysis on the error log dataset. It classifies each log's severity as low, medium, high, or critical. It also uses the language model to extract key debugging information: the root cause and a recommended fix.

The results appear in a readable table showing the timestamp, service, severity, root cause, and fix recommendation for each entry.
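In the analyzed output, each row holds the severity label as a string. The extraction is a nested `analysis` struct with the two `ErrorAnalysis` fields, and the second `select` lifts those fields into their own columns. Below is a plain-Python sketch of that reshaping on a single hypothetical row. It also checks that the severity is one of the four labels passed to `classify`. The helper `flatten_row` and the row's severity and analysis values are illustrative placeholders, not fenic API or model output.

```python
SEVERITY_LABELS = ("low", "medium", "high", "critical")


def flatten_row(row: dict) -> dict:
    """Lift the fields of the nested 'analysis' struct into top-level columns."""
    if row["severity"] not in SEVERITY_LABELS:
        raise ValueError(f"unexpected severity label: {row['severity']!r}")
    analysis = row["analysis"]
    return {
        "timestamp": row["timestamp"],
        "service": row["service"],
        "severity": row["severity"],
        "root_cause": analysis["root_cause"],
        "fix_recommendation": analysis["fix_recommendation"],
    }


# Hypothetical row: severity and analysis text are placeholders, not model output
row = {
    "timestamp": "2024-01-20 14:24:12",
    "service": "auth-service",
    "severity": "high",
    "analysis": {
        "root_cause": "<root cause text>",
        "fix_recommendation": "<fix text>",
    },
}
print(flatten_row(row))
```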

In [None]:
# Analyze errors using semantic operations
df_analyzed = df.select(
    "timestamp",
    "service",
    # Classify error severity into one of four labels
    fc.semantic.classify("error_log", ["low", "medium", "high", "critical"]).alias("severity"),
    # Extract key debugging information as a struct matching ErrorAnalysis
    fc.semantic.extract("error_log", ErrorAnalysis).alias("analysis"),
)

# Show analysis with extracted fields
df_analysis_readable = df_analyzed.select(
    "timestamp",
    "service",
    "severity",
    df_analyzed.analysis.root_cause.alias("root_cause"),
    df_analyzed.analysis.fix_recommendation.alias("fix_recommendation")
)

print("Error Analysis Results:")
print("-" * 70)
df_analysis_readable.show()

## Displaying Critical and High-Severity Errors

This cell filters the analyzed error logs to focus on entries classified as "critical" or "high" severity. 

It then displays a table with the timestamp, service, root cause, and recommended fix for each of these urgent errors, helping to quickly identify and prioritize issues that require immediate attention.
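The filter in the next cell combines two equality checks with `|`. Another way to express "high or worse" is an explicit severity ranking with a threshold, which stays correct if you later change the label set. The sketch below uses plain Python, not fenic's API. `SEVERITY_RANK` and `at_least` are hypothetical names, and the rows are placeholders.

```python
# Order the four labels used by classify, from least to most severe
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def at_least(rows, threshold="high"):
    """Keep rows whose severity ranks at or above the threshold."""
    cutoff = SEVERITY_RANK[threshold]
    return [r for r in rows if SEVERITY_RANK[r["severity"]] >= cutoff]


# Hypothetical classified rows (placeholder services and severities, not model output)
rows = [
    {"service": "svc-a", "severity": "critical"},
    {"service": "svc-b", "severity": "medium"},
    {"service": "svc-c", "severity": "high"},
    {"service": "svc-d", "severity": "low"},
]
print([r["service"] for r in at_least(rows)])  # ['svc-a', 'svc-c']
```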

In [None]:
# Keep only critical and high-severity errors
print("\nCritical Errors Requiring Immediate Attention:")
print("-" * 70)
critical_errors = df_analyzed.filter(
    (df_analyzed["severity"] == "critical") | (df_analyzed["severity"] == "high")
).select(
    "timestamp",
    "service",
    df_analyzed.analysis.root_cause.alias("root_cause"),
    df_analyzed.analysis.fix_recommendation.alias("fix_recommendation")
)

critical_errors.show()

## Extracting and Displaying Error Patterns

This cell extracts structured error patterns from each log entry, identifying the type of error (such as connection issues or exceptions) and the affected component or system. 

It then displays a table summarizing these patterns for each service, providing a clear overview of recurring error types and their sources across the application.
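Spotting *recurring* error types means counting them. Below is a stdlib sketch of that tally, assuming you have the `(service, error_type, component)` rows as plain Python tuples. The error types are borrowed from the examples in the `ErrorPattern` description, and the services and components are placeholders, not real model output.

```python
from collections import Counter

# Hypothetical rows shaped like df_pattern_details (placeholder values)
patterns = [
    ("svc-a", "ConnectionRefused", "component-x"),
    ("svc-b", "ConnectionRefused", "component-y"),
    ("svc-c", "Timeout", "component-z"),
    ("svc-d", "NullPointer", "component-w"),
]

# Tally how often each error type appears across services
type_counts = Counter(error_type for _, error_type, _ in patterns)

# An error type "recurs" if it shows up in more than one entry
recurring = [t for t, n in type_counts.most_common() if n > 1]

print(type_counts.most_common())
print("Recurring:", recurring)
```

Free-form extracted labels can vary in spelling ("Timeout" vs. "TimeoutError"). Normalizing them, or classifying into a fixed label list as the severity step does, would make such counts more reliable.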

In [None]:
# Extract specific error patterns
df_patterns = df.select(
    "service",
    fc.semantic.extract("error_log", ErrorPattern).alias("patterns"),
)

print("\nError Patterns Detected:")
print("-" * 70)

# Show pattern details
df_pattern_details = df_patterns.select(
    "service",
    df_patterns.patterns.error_type.alias("error_type"),
    df_patterns.patterns.component.alias("component"),
)
df_pattern_details.show()

# Clean up
session.stop()

print("\nAnalysis complete!")

## Next Steps
- Try adding your own error logs
- Extract specific fields like error codes or user IDs
- Build alerts for critical errors
- Create auto-generated runbooks

A few other examples to explore:

- Extract structured metadata from unstructured text [notebook](https://colab.research.google.com/github/typedef-ai/fenic/blob/main/examples/document_extraction/document_extraction.ipynb)
- Named Entity Recognition [notebook](https://colab.research.google.com/github/typedef-ai/fenic/blob/main/examples/named_entity_recognition/ner.ipynb)
- Classification of news articles for bias detection [notebook](https://colab.research.google.com/github/typedef-ai/fenic/blob/main/examples/news_analysis/news_analysis.ipynb)

---

<p align="center" style="margin:18px 0 6px; font-size:1.05em;">
  Enjoyed this? Help others find it.
</p>

<p align="center" style="margin:0 0 12px;">
<a href="https://github.com/typedef-ai/fenic">⭐️ Give fenic a Star ⭐️</a>
</p>