# Getting Started with the Trioexplorer API

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/triohealth/trioexplorer/blob/main/notebooks/getting_started.ipynb)

This notebook demonstrates how to use the Trioexplorer API to search clinical notes across patient cohorts. You will learn how to:

1. Configure API authentication
2. Discover available indexed cohorts and note types
3. Perform different types of searches (keyword, semantic, hybrid)
4. Apply date, entity, and advanced filters to search queries
5. Discover available filter fields and their values
6. Visualize search results and performance metrics

## Prerequisites

**You will need:**
- An API key with `read:global` or cohort-specific `read:<cohort-id>` entitlements (contact your administrator to obtain one)
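
As a minimal sketch of how the key can be kept out of the notebook source, the snippet below reads it from an environment variable and builds the `X-API-Key` request header. The helper name `build_auth_headers` is hypothetical and exists only for illustration.

```python
import os

def build_auth_headers(api_key: str) -> dict:
    """Return request headers carrying the API key, or raise if it is empty."""
    key = (api_key or "").strip()
    if not key:
        raise ValueError("TRIO_API_KEY is empty; add it to Colab secrets or the environment")
    return {"X-API-Key": key}

# Read the key from the environment; an empty string means "not configured".
api_key = os.environ.get("TRIO_API_KEY", "")
if api_key:
    headers = build_auth_headers(api_key)
    print("API key configured")  # never print the key itself
else:
    print("No API key found in the TRIO_API_KEY environment variable")
```

Failing early on an empty key tends to give a clearer error than a later `401 Unauthorized` response.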

In [None]:
# Install required packages
!pip install -q requests pandas matplotlib seaborn

# Import dependencies
import requests
import json
import os
from datetime import datetime, timedelta
from typing import Optional, Dict, Any, List
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Configure matplotlib for notebook display
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')

print("✓ All dependencies loaded successfully")

In [None]:
# Configuration

# Production API Endpoints
# TODO: Update these URLs with your production endpoints
TRIO_API_URL = "http://k8s-notesear-notesear-20ee5f12c9-4c4972a75575c8a7.elb.us-east-1.amazonaws.com:8001"

print(f"Search API: {TRIO_API_URL}")

# API Key Configuration
# Option 1 (Recommended): Use Colab secrets
try:
    from google.colab import userdata
    TRIO_API_KEY = userdata.get('TRIO_API_KEY')
    print("✓ API key loaded from Colab secrets")
except Exception:  # not running in Colab, or the secret could not be read
    # Option 2: Set directly (not recommended - use Colab secrets instead)
    TRIO_API_KEY = os.environ.get("TRIO_API_KEY", "")
    if not TRIO_API_KEY:
        print("⚠️  WARNING: No API key configured!")
        print("Add TRIO_API_KEY to Colab secrets, or set it manually:")
        print("TRIO_API_KEY = 'ts_your_api_key_here'")
    else:
        print(f"✓ API Key configured: {TRIO_API_KEY[:8]}...")

# Test connectivity
def check_service_health(url: str, name: str) -> bool:
    """Check if the API service is healthy and reachable."""
    try:
        response = requests.get(f"{url}/health", timeout=5)
        if response.status_code == 200:
            print(f"✓ {name} is healthy")
            return True
        else:
            print(f"✗ {name} returned status {response.status_code}")
            return False
    except requests.exceptions.ConnectionError:
        print(f"✗ Cannot connect to {name} at {url}")
        print("  Contact your administrator if this persists")
        return False
    except Exception as e:
        print(f"✗ Error checking {name}: {str(e)}")
        return False

print("\nChecking API connectivity...")
search_healthy = check_service_health(TRIO_API_URL, "Search API")

if search_healthy:
    print("\n✓ All systems ready!")
else:
    print("\n⚠️  Search API is not available. Check the errors above.")

## Step 1: Discover Available Cohorts

Before searching, you need to know which cohorts are indexed and available. The `/cohorts/indexed` endpoint lists all cohorts with searchable data in the search index.

Each indexed cohort includes:
- `cohort_id` - Unique identifier
- `cohort_name` - Human-readable name (if available)
- `namespace` - Namespace of the cohort in the search index
- `chunk_count` - Number of indexed chunks (vectors)
- `index_status` - Current indexing status
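
To make the shape of these records concrete, here is a small sketch that filters a hypothetical response body down to cohorts with at least one indexed chunk. The sample values, including the `index_status` strings, are invented for illustration, and the `items` / `total_count` envelope is an assumption about the response layout.

```python
from typing import Any, Dict, List

# Hypothetical response body, shaped like the fields listed above.
sample_response: Dict[str, Any] = {
    "items": [
        {"cohort_id": "c-001", "cohort_name": "Example A", "namespace": "ns-a",
         "chunk_count": 12000, "index_status": "ready"},
        {"cohort_id": "c-002", "cohort_name": None, "namespace": "ns-b",
         "chunk_count": 0, "index_status": "indexing"},
    ],
    "total_count": 2,
}

def searchable_cohorts(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep cohorts that report at least one indexed chunk."""
    return [c for c in payload.get("items", []) if (c.get("chunk_count") or 0) > 0]

for cohort in searchable_cohorts(sample_response):
    # cohort_name is optional, so fall back to the ID when it is missing.
    label = cohort.get("cohort_name") or cohort["cohort_id"]
    print(f"{label}: {cohort['chunk_count']:,} chunks in namespace {cohort['namespace']}")
```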

In [None]:
def get_indexed_cohorts(limit: int = 20) -> List[Dict[str, Any]]:
    """
    List all indexed cohorts available for search.

    Args:
        limit: Maximum number of cohorts to return (max 100)

    Returns:
        List of indexed cohort information
    """
    response = requests.get(
        f"{TRIO_API_URL}/cohorts/indexed",
        headers={"X-API-Key": TRIO_API_KEY},
        params={"limit": limit}
    )

    if response.status_code == 200:
        data = response.json()
        cohorts = data.get("items", [])
        total = data.get("total_count", len(cohorts))
        print(f"Found {total} indexed cohorts (showing {len(cohorts)})")
        return cohorts
    else:
        print(f"Failed to list cohorts: {response.status_code}")
        print(response.text)
        return []

# Fetch indexed cohorts
cohorts = get_indexed_cohorts()

In [None]:
# Display cohorts as a formatted table
if cohorts:
    cohorts_df = pd.DataFrame(cohorts)

    # Format the display
    display_cols = ["cohort_id", "cohort_name", "chunk_count", "index_status"]
    available_cols = [c for c in display_cols if c in cohorts_df.columns]

    print("\nIndexed Cohorts:")
    print("-" * 80)
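    # Only format chunk_count when the API actually returned that column
    fmt = {"chunk_count": "{:,}"} if "chunk_count" in available_cols else {}
    display(cohorts_df[available_cols].style.format(fmt, na_rep="N/A"))

    # Summary statistics
    if "chunk_count" in cohorts_df.columns:
        total_chunks = int(cohorts_df["chunk_count"].fillna(0).sum())
        print(f"\nTotal indexed chunks across all cohorts: {total_chunks:,}")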
else:
    print("No cohorts available. Ensure data has been indexed.")

## Step 2: Discover Note Types

Before filtering searches by note type, you can see what note types exist in the system. The `/note-types` endpoint provides a list of available note types.

This is useful for:
- Understanding what documentation types are indexed
- Filtering searches to specific note types (e.g., "Progress Notes", "Discharge Summaries")
- Building UI dropdowns for note type selection
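
When a dropdown needs the complete list, results usually have to be fetched page by page. The sketch below assumes the endpoint accepts `limit` and `offset` query parameters. `iter_pages` and `fake_fetch` are hypothetical names, and the in-memory list stands in for real HTTP calls.

```python
from typing import Any, Callable, Dict, Iterator, List

def iter_pages(
    fetch_page: Callable[[int, int], List[Dict[str, Any]]],
    page_size: int = 50,
) -> Iterator[Dict[str, Any]]:
    """Yield items page by page until a short or empty page is returned."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # a short page means there is nothing left to fetch
        offset += page_size

# Stand-in for a real request: serves slices of a hypothetical in-memory list.
_fake_note_types = [{"id": i, "name": f"Note Type {i}"} for i in range(120)]

def fake_fetch(limit: int, offset: int) -> List[Dict[str, Any]]:
    return _fake_note_types[offset:offset + limit]

all_note_types = list(iter_pages(fake_fetch, page_size=50))
print(len(all_note_types))
```

In real use, `fetch_page` would wrap a request to `/note-types` and return the `items` list from the response.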

In [None]:
def get_note_types(
    search: Optional[str] = None,
    limit: int = 50,
    offset: int = 0
) -> List[Dict[str, Any]]:
    """
    List available note types in the system.

    Args:
        search: Filter note types by name (case-insensitive substring match)
        limit: Maximum number of results to return (default 50)
        offset: Number of results to skip for pagination

    Returns:
        List of note type objects with id and name
    """
    params = {"limit": limit, "offset": offset}
    if search:
        params["search"] = search

    response = requests.get(
        f"{TRIO_API_URL}/note-types",
        headers={"X-API-Key": TRIO_API_KEY},
        params=params
    )

    if response.status_code == 200:
        data = response.json()
        note_types = data.get("items", [])
        total = data.get("total_count", len(note_types))
        print(f"Found {total} note types (showing {len(note_types)})")
        return note_types
    else:
        print(f"Failed to list note types: {response.status_code}")
        print(response.text)
        return []

# Fetch all note types
note_types = get_note_types()

# Display as DataFrame
if note_types:
    note_types_df = pd.DataFrame(note_types)
    print("\nAvailable Note Types:")
    print("-" * 60)
    display(note_types_df)
else:
    print("No note types found.")

In [None]:
# Filter note types by name
# Example: Find all note types containing "progress"
progress_notes = get_note_types(search="progress")

if progress_notes:
    print("\nNote types matching 'progress':")
    for nt in progress_notes:
        print(f"  - {nt.get('name', 'Unknown')} (id: {nt.get('id', 'N/A')})")

## Step 3: Search a Cohort

The Search API supports three search modes:

| Mode | Description | Best For |
|------|-------------|----------|
| `keyword` | BM25 full-text search | Exact terms, medical codes (e.g., "ICD-10 E11.9") |
| `semantic` | Vector similarity search | Conceptual queries (e.g., "patient struggling with blood sugar") |
| `hybrid` (default) | Combines both with reciprocal rank fusion | General-purpose queries |
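
To give an intuition for reciprocal rank fusion, here is a generic sketch of a weighted variant. It is not the service's actual implementation: the rank constant of 60 is a common convention, and the way `vector_weight` splits the contribution between the two lists is an assumption made for illustration.

```python
from typing import Dict, List

def weighted_rrf(
    vector_ranked: List[str],
    keyword_ranked: List[str],
    vector_weight: float = 0.5,
    rank_constant: int = 60,
) -> List[str]:
    """Fuse two ranked ID lists with weighted reciprocal rank fusion."""
    scores: Dict[str, float] = {}
    # Each list contributes weight / (rank_constant + rank) for every item it contains.
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + vector_weight / (rank_constant + rank)
    for rank, doc_id in enumerate(keyword_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1.0 - vector_weight) / (rank_constant + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical note IDs ranked by each retrieval method.
vector_hits = ["note-a", "note-b", "note-c"]
keyword_hits = ["note-b", "note-d", "note-a"]
print(weighted_rrf(vector_hits, keyword_hits))
```

Items that appear near the top of both lists accumulate the highest fused score, which is why hybrid search tends to favor results that both methods agree on.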

### Key Parameters

- `query` (required) - Search text
- `search-type` - Search mode (default: `hybrid`)
- `k` - Number of results (default: 10, max: 300)
- `cohort-ids` - Comma-separated cohort IDs to search
- `rerank` - Apply Cohere reranking (default: true)
- `distinct` - De-duplication mode: `encounter`, `patient`, `note`, or `none`
- `vector-weight` - Weight for vector search in hybrid fusion (0.0-1.0)
- `min-quality-score` - Minimum note quality score filter (0.0-1.0)
- `filters` - Search index attribute filters (JSON)
- `entity-filters` - Entity and assertion filters (JSON)

In [None]:
def search(
    query: str,
    search_type: str = "hybrid",
    k: int = 10,
    cohort_ids: Optional[List[str]] = None,
    rerank: bool = True,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    note_types: Optional[List[str]] = None,
    include_noise: bool = False,
    distinct: Optional[str] = None,
    vector_weight: Optional[float] = None,
    top_k_retrieval: Optional[int] = None,
    distance_threshold: Optional[float] = None,
    chunk_multiplier: Optional[int] = None,
    min_quality_score: Optional[float] = None,
    min_chunk_quality_score: Optional[float] = None,
    filters: Optional[Dict[str, Any]] = None,
    entity_filters: Optional[Dict[str, Any]] = None,
    **kwargs
) -> Dict[str, Any]:
    """
    Search indexed patient notes.

    Args:
        query: Search text
        search_type: 'keyword', 'semantic', or 'hybrid' (default)
        k: Number of results to return (max 300)
        cohort_ids: List of cohort IDs to search
        rerank: Apply Cohere reranking (default True, hybrid only)
        date_from: Filter from date (YYYY-MM-DD)
        date_to: Filter to date (YYYY-MM-DD)
        note_types: Filter by note types
        include_noise: Include noise notes (default False)
        distinct: De-duplication mode ('encounter', 'patient', 'note', or 'none')
        vector_weight: Weight for vector search in fusion (0.0-1.0, default 0.5)
        top_k_retrieval: Number of results to retrieve before reranking
        distance_threshold: Cosine distance cutoff for semantic results
        chunk_multiplier: Initial retrieval multiplier for semantic search
        min_quality_score: Minimum note quality score (0.0-1.0)
        min_chunk_quality_score: Minimum chunk quality score (0.0-1.0)
        filters: search index filters as dict (converted to JSON)
        entity_filters: Entity/assertion filters as dict (converted to JSON)

    Returns:
        Search response with results and metadata
    """
    params = {
        "query": query,
        "search-type": search_type,
        "k": k,
        "rerank": str(rerank).lower(),
        "include-noise": str(include_noise).lower(),
    }

    if cohort_ids:
        params["cohort-ids"] = ",".join(str(c) for c in cohort_ids)
    if date_from:
        params["date-from"] = date_from
    if date_to:
        params["date-to"] = date_to
    if note_types:
        params["note-types"] = ",".join(note_types)
    if distinct:
        params["distinct"] = distinct
    if vector_weight is not None:
        params["vector-weight"] = vector_weight
    if top_k_retrieval is not None:
        params["top-k-retrieval"] = top_k_retrieval
    if distance_threshold is not None:
        params["distance-threshold"] = distance_threshold
    if chunk_multiplier is not None:
        params["chunk-multiplier"] = chunk_multiplier
    if min_quality_score is not None:
        params["min-quality-score"] = min_quality_score
    if min_chunk_quality_score is not None:
        params["min-chunk-quality-score"] = min_chunk_quality_score
    if filters:
        params["filters"] = json.dumps(filters)
    if entity_filters:
        params["entity-filters"] = json.dumps(entity_filters)

    # Add any additional parameters
    for key, value in kwargs.items():
        params[key.replace("_", "-")] = value

    response = requests.get(
        f"{TRIO_API_URL}/search",
        headers={"X-API-Key": TRIO_API_KEY},
        params=params
    )

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Search failed: {response.status_code}")
        print(response.text)
        return {"results": [], "metadata": {}}

# Example search
results = search("diabetes management", k=10)
print(f"Found {len(results.get('results', []))} results")

In [None]:
def display_search_results(response: Dict[str, Any], show_text: bool = False):
    """Display search results in a formatted table."""
    results = response.get("results", [])
    metadata = response.get("metadata", {})

    print("=" * 80)
    print("SEARCH RESULTS")
    print("=" * 80)

    # Metadata summary
    print(f"\nQuery: '{metadata.get('query', 'N/A')}'")
    print(f"Search Type: {metadata.get('search_type', 'N/A')}")
    print(f"Total Results: {metadata.get('total_results', 0)}")
    print(f"Reranked: {metadata.get('reranked', False)}")
    print(f"Distance Threshold: {metadata.get('distance_threshold', 'N/A')}")

    # Match counts
    exact = metadata.get('exact_match_count')
    semantic = metadata.get('semantic_match_count')
    if exact is not None or semantic is not None:
        print(f"\nMatch Counts:")
        if exact is not None:
            print(f"  Keyword (BM25): {exact:,}")
        if semantic is not None:
            print(f"  Semantic (Vector): {semantic:,}")

    # Unique counts
    print(f"\nUnique Entities:")
    print(f"  Patients: {metadata.get('unique_patients', 'N/A')}")
    print(f"  Encounters: {metadata.get('unique_encounters', 'N/A')}")
    print(f"  Notes: {metadata.get('unique_notes', 'N/A')}")

    # Omitted results (noise filtering)
    omitted = metadata.get('omitted_results')
    if omitted:
        print(f"\nOmitted (Noise Filtered):")
        print(f"  Semantic: {omitted.get('semantic_omitted', 0)}")
        print(f"  Keyword: {omitted.get('keyword_omitted', 0)}")
        print(f"  Total: {omitted.get('total_omitted', 0)}")
        if omitted.get('is_minimum'):
            print("  (counts are minimum estimates)")

    # Results table
    if results:
        print("\n" + "-" * 80)
        print("TOP RESULTS")
        print("-" * 80)

        for i, result in enumerate(results[:5], 1):
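            print(f"\n{i}. Score: {result.get('score') or 0:.4f}")
            patient_id = str(result.get('patient_id') or 'N/A')
            # Truncate long identifiers; add an ellipsis only when something was cut
            suffix = "..." if len(patient_id) > 20 else ""
            print(f"   Patient: {patient_id[:20]}{suffix}")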
            print(f"   Note Type: {result.get('note_type', 'N/A')}")
            print(f"   Note Date: {result.get('note_date', 'N/A')}")

            if result.get('distance') is not None:
                print(f"   Distance: {result['distance']:.4f}")
            if result.get('keyword_score') is not None:
                print(f"   Keyword Score: {result['keyword_score']:.4f}")

            if show_text:
                text = result.get('text_chunk') or result.get('text_full', '')
                if text:
                    snippet = text[:200] + "..." if len(text) > 200 else text
                    print(f"   Snippet: {snippet}")

# Display results
display_search_results(results)

## Step 4: Adding Date Filters

You can filter search results by date range using:

- `date-from` - Include notes from this date onwards (YYYY-MM-DD, inclusive)
- `date-to` - Include notes up to this date (YYYY-MM-DD, inclusive)

This is useful for:
- Finding recent documentation for a condition
- Analyzing notes within a specific time period
- Tracking progression of a condition over time
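
The two parameters expect `YYYY-MM-DD` strings. A minimal sketch of producing them from a fixed end date follows. The helper name `date_window` is hypothetical, and a fixed date is used so the output is reproducible.

```python
from datetime import date, timedelta
from typing import Tuple

def date_window(end: date, days: int) -> Tuple[str, str]:
    """Return (date_from, date_to) as YYYY-MM-DD strings, starting `days` days before `end`."""
    if days < 0:
        raise ValueError("days must be non-negative")
    start = end - timedelta(days=days)
    # date.isoformat() always produces the YYYY-MM-DD form.
    return start.isoformat(), end.isoformat()

date_from, date_to = date_window(date(2024, 6, 30), days=90)
print(date_from, date_to)
```

Because both bounds are inclusive, a window built this way spans `days + 1` calendar days.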

In [None]:
# Example: Search with date range
# Adjust dates based on your indexed data

# Calculate date range (last 6 months)
end_date = datetime.now()
start_date = end_date - timedelta(days=180)

date_from = start_date.strftime("%Y-%m-%d")
date_to = end_date.strftime("%Y-%m-%d")

print(f"Searching from {date_from} to {date_to}")

# Perform date-filtered search
date_results = search(
    query="diabetes mellitus",
    k=20,
    date_from=date_from,
    date_to=date_to
)

print(f"\nFound {len(date_results.get('results', []))} results in date range")
display_search_results(date_results)

## Step 5: Entity and Assertion Filtering

The Search API can filter results based on extracted clinical entities and their assertion status. This lets you find specific clinical mentions in the right context, for example an active symptom rather than a negated one.

### Entity Types

Clinical entities are extracted and categorized into these types:

| Entity Type | Description | Examples |
|-------------|-------------|----------|
| `symptoms` | Patient-reported or observed symptoms | "chest pain", "shortness of breath" |
| `diagnoses` | Medical diagnoses and conditions | "diabetes mellitus", "hypertension" |
| `medications` | Drugs and medications | "metformin", "lisinopril" |
| `procedures` | Medical procedures | "colonoscopy", "cardiac catheterization" |
| `lab_tests` | Laboratory tests | "HbA1c", "CBC" |
| `vital_signs` | Vital sign measurements | "blood pressure", "heart rate" |

### Assertion Types

Each entity has an assertion status indicating clinical context:

| Assertion Type | Description | Example Context |
|----------------|-------------|-----------------|
| `present` | Currently present/active | "Patient has diabetes" |
| `negated` | Explicitly negated | "No chest pain" |
| `historical` | Past history | "History of MI in 2019" |
| `family` | Family history | "Mother had breast cancer" |
| `hypothetical` | Possible/uncertain | "Rule out PE" |
| `conditional` | Conditional mention | "If symptoms worsen" |

### Combining Entity and Assertion Types

Filter keys combine entity type and assertion type with an underscore: `{entity_type}_{assertion_type}`

**Examples:**
- `symptoms_present` - Active symptoms
- `diagnoses_negated` - Ruled-out diagnoses
- `medications_historical` - Previously prescribed medications
- `diagnoses_family` - Family history of conditions
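
The naming rule can be sketched as a small helper that validates both parts before joining them. The function `entity_filter_key` is hypothetical. The allowed values are taken from the two tables above, and the dict-of-lists payload is only one plausible way to pass the values.

```python
import json
from typing import Dict, List

ENTITY_TYPES = {"symptoms", "diagnoses", "medications", "procedures", "lab_tests", "vital_signs"}
ASSERTION_TYPES = {"present", "negated", "historical", "family", "hypothetical", "conditional"}

def entity_filter_key(entity_type: str, assertion_type: str) -> str:
    """Combine an entity type and an assertion type into one filter key."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"Unknown entity type: {entity_type!r}")
    if assertion_type not in ASSERTION_TYPES:
        raise ValueError(f"Unknown assertion type: {assertion_type!r}")
    return f"{entity_type}_{assertion_type}"

# Build a filter dict and serialize it the way a query parameter would carry it.
entity_filters: Dict[str, List[str]] = {
    entity_filter_key("diagnoses", "family"): ["breast cancer"],
    entity_filter_key("symptoms", "negated"): ["chest pain"],
}
print(json.dumps(entity_filters))
```

Validating the key up front catches typos such as `symptom_present` before a request is sent.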

In [None]:
# Example: Search for notes with specific entity filters
# Find notes mentioning active diabetes symptoms

entity_results = search(
    query="diabetes",
    k=10,
    entity_filters={
        "symptoms_present": ["fatigue", "polyuria", "polydipsia"]
    }
)

print(f"Found {len(entity_results.get('results', []))} results with diabetes-related symptoms")
display_search_results(entity_results)

## Step 6: Advanced Filters (Search Index Filters)

For complex filtering scenarios, you can pass raw search index filter expressions. These filters operate directly on indexed attributes and support various operators.

### Filter Syntax

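Filters are passed as a JSON object keyed by field name. The block below is schematic: each line shows one supported form, and the `//` comments are annotations rather than valid JSON. A real filter object lists each field name only once.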

```json
{
    "field_name": ["value1", "value2"],       // In list (OR)
    "field_name": {"$eq": "exact_value"},     // Exact match
    "field_name": {"$ne": "excluded_value"},  // Not equal
    "field_name": {"$in": ["a", "b", "c"]},   // In list
    "field_name": {"$nin": ["x", "y"]},       // Not in list
    "field_name": {"$gt": 0.5},               // Greater than
    "field_name": {"$gte": 0.5},              // Greater than or equal
    "field_name": {"$lt": 1.0},               // Less than
    "field_name": {"$lte": 1.0}               // Less than or equal
}
```

### Common Filterable Fields

- `note_type` - Type of clinical note
- `quality_score` - Note quality score (0.0-1.0)
- `chunk_quality_score` - Chunk quality score (0.0-1.0)
- `patient_id` - Patient identifier
- `encounter_id` - Encounter identifier
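
As a sketch of a concrete filter object, the snippet below combines two of the fields above, checks the operators against the documented set, and serializes the result. `validate_filters` is a hypothetical client-side helper, and how the service combines conditions on different fields (presumably a logical AND) is an assumption.

```python
import json
from typing import Any, Dict

ALLOWED_OPERATORS = {"$eq", "$ne", "$in", "$nin", "$gt", "$gte", "$lt", "$lte"}

def validate_filters(filters: Dict[str, Any]) -> None:
    """Raise ValueError if a condition uses an operator outside the documented set."""
    for field, condition in filters.items():
        if isinstance(condition, list):
            continue  # bare list form: match any of the listed values
        if not isinstance(condition, dict):
            raise ValueError(f"{field}: expected a list or an operator object")
        unknown = set(condition) - ALLOWED_OPERATORS
        if unknown:
            raise ValueError(f"{field}: unsupported operator(s) {sorted(unknown)}")

filters = {
    "note_type": {"$in": ["Progress Notes", "Discharge Summaries"]},
    "quality_score": {"$gte": 0.7},
}
validate_filters(filters)
encoded_filters = json.dumps(filters)  # string form suitable for the `filters` parameter
print(encoded_filters)
```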

In [None]:
# Example: Quality score filtering via the shortcut parameters
# Find high-quality notes only

high_quality_results = search(
    query="diabetes management",
    k=10,
    min_quality_score=0.7,  # Shortcut for quality score filter
    min_chunk_quality_score=0.6
)

print(f"Found {len(high_quality_results.get('results', []))} high-quality results")
display_search_results(high_quality_results)

## Step 7: Filter Field Discovery

Before building complex filters, you can discover what filter fields are available and their possible values. This is useful for building dynamic filter UIs or understanding what data is indexed.

### Endpoints

- `GET /namespaces/{namespace}/filter-fields` - List available filter fields
- `GET /namespaces/{namespace}/filter-values/{field}` - Get values for a specific field
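
Since the namespace and field names are placed in the URL path, it can be safer to percent-encode them first. The sketch below builds both URLs. The helper names and the `https://api.example.test` base URL are hypothetical.

```python
from urllib.parse import quote

def filter_fields_url(base_url: str, namespace: str) -> str:
    """URL that lists the filter fields of one namespace."""
    ns = quote(namespace, safe="")  # encode "/" and spaces as well
    return f"{base_url.rstrip('/')}/namespaces/{ns}/filter-fields"

def filter_values_url(base_url: str, namespace: str, field: str) -> str:
    """URL that lists the values of one filter field."""
    ns = quote(namespace, safe="")
    fld = quote(field, safe="")
    return f"{base_url.rstrip('/')}/namespaces/{ns}/filter-values/{fld}"

BASE_URL = "https://api.example.test"  # placeholder, not a real endpoint
print(filter_fields_url(BASE_URL, "cohort-123"))
print(filter_values_url(BASE_URL, "cohort-123", "note_type"))
```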

In [None]:
def get_filter_fields(
    namespace: str,
    category: Optional[str] = None
) -> List[Dict[str, Any]]:
    """
    List available filter fields for a namespace.

    Args:
        namespace: search index namespace (from cohort info)
        category: Filter by category (e.g., 'metadata', 'entity')

    Returns:
        List of filter field definitions
    """
    params = {}
    if category:
        params["category"] = category

    response = requests.get(
        f"{TRIO_API_URL}/namespaces/{namespace}/filter-fields",
        headers={"X-API-Key": TRIO_API_KEY},
        params=params
    )

    if response.status_code == 200:
        data = response.json()
        fields = data.get("fields", [])
        print(f"Found {len(fields)} filter fields for namespace '{namespace}'")
        return fields
    else:
        print(f"Failed to get filter fields: {response.status_code}")
        print(response.text)
        return []


def get_filter_values(
    namespace: str,
    field: str,
    limit: int = 50
) -> List[Any]:
    """
    Get possible values for a filter field.

    Args:
        namespace: search index namespace
        field: Field name to get values for
        limit: Maximum number of values to return

    Returns:
        List of possible values for the field
    """
    response = requests.get(
        f"{TRIO_API_URL}/namespaces/{namespace}/filter-values/{field}",
        headers={"X-API-Key": TRIO_API_KEY},
        params={"limit": limit}
    )

    if response.status_code == 200:
        data = response.json()
        values = data.get("values", [])
        total = data.get("total_count", len(values))
        print(f"Found {total} values for field '{field}' (showing {len(values)})")
        return values
    else:
        print(f"Failed to get filter values: {response.status_code}")
        print(response.text)
        return []

In [None]:
# Example: Discover filter fields for a cohort
# First, get a namespace from an indexed cohort

if cohorts:
    # Use the first cohort's namespace
    example_namespace = cohorts[0].get("namespace")
    
    if example_namespace:
        print(f"Exploring filter fields for namespace: {example_namespace}\n")
        
        # Get all available filter fields
        fields = get_filter_fields(example_namespace)
        
        if fields:
            fields_df = pd.DataFrame(fields)
            print("\nAvailable Filter Fields:")
            print("-" * 60)
            display(fields_df)
    else:
        print("No namespace found in cohort data")
else:
    print("No cohorts available. Run the cohorts cell first.")

In [None]:
# Example: Get possible values for a specific filter field
# This is useful for building filter dropdowns in UIs

if cohorts:
    example_namespace = cohorts[0].get("namespace")
    
    if example_namespace:
        # Get values for note_type field
        print("Getting values for 'note_type' field...\n")
        note_type_values = get_filter_values(example_namespace, "note_type", limit=20)
        
        if note_type_values:
            print("\nAvailable note_type values:")
            for i, value in enumerate(note_type_values, 1):
                print(f"  {i}. {value}")
else:
    print("No cohorts available. Run the cohorts cell first.")

## Step 8: Comparing Search Types

Different search types excel at different tasks:

| Search Type | Strengths | Weaknesses |
|-------------|-----------|------------|
| **Keyword** | Exact matches, medical codes, specific terminology | Misses synonyms, conceptual matches |
| **Semantic** | Conceptual similarity, synonyms, paraphrasing | May miss exact term matches |
| **Hybrid** | Covers both exact-term and conceptual matches | Slightly higher latency |

Let's compare results for the same query across all three modes.
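
One simple way to quantify how much two modes agree is the Jaccard overlap of the patients they return. The sketch below uses hypothetical response dicts with a `results` list of records carrying `patient_id`. The helper names are illustrative only.

```python
from typing import Any, Dict, Set

def patient_ids(response: Dict[str, Any]) -> Set[str]:
    """Collect the distinct patient IDs present in a search response."""
    return {
        str(r["patient_id"])
        for r in response.get("results", [])
        if r.get("patient_id") is not None
    }

def jaccard_overlap(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical responses from two search modes.
keyword_response = {"results": [{"patient_id": "p1"}, {"patient_id": "p2"}, {"patient_id": "p3"}]}
semantic_response = {"results": [{"patient_id": "p2"}, {"patient_id": "p3"}, {"patient_id": "p4"}]}

overlap = jaccard_overlap(patient_ids(keyword_response), patient_ids(semantic_response))
print(f"Patient overlap: {overlap:.2f}")
```

A low overlap suggests the two modes surface different patients, which is the situation where hybrid search is most likely to help.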

In [None]:
def compare_search_types(query: str, k: int = 10) -> Dict[str, Dict]:
    """Run the same query across all search types and compare."""
    search_types = ["keyword", "semantic", "hybrid"]
    results = {}

    for stype in search_types:
        print(f"Running {stype} search...")
        results[stype] = search(query, search_type=stype, k=k, rerank=(stype == "hybrid"))

    return results

# Compare search types
query = "patient with elevated blood glucose"
comparison = compare_search_types(query, k=20)

# Display comparison summary
print("\n" + "=" * 80)
print("SEARCH TYPE COMPARISON")
print("=" * 80)
print(f"Query: '{query}'\n")

for stype, result in comparison.items():
    meta = result.get("metadata", {})
    print(f"\n{stype.upper()} Search:")
    print(f"  Results returned: {meta.get('total_results', 0)}")
    # Counts are optional, so format them only when present
    if meta.get('exact_match_count') is not None:
        print(f"  Keyword matches: {meta['exact_match_count']:,}")
    if meta.get('semantic_match_count') is not None:
        print(f"  Semantic matches: {meta['semantic_match_count']:,}")
    print(f"  Unique patients: {meta.get('unique_patients', 'N/A')}")

## Step 9: Visualizing Search Results

This section provides visualizations to help analyze search results:

1. **Omission Analysis** - Impact of noise filtering
2. **Ranking Metrics** - Vector vs BM25 ranking contribution
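
Before plotting, the omission impact can be reduced to a single number. The sketch below computes the share of candidate results removed by noise filtering from a metadata dict that carries `total_results` and `omitted_results.total_omitted`. The function name `omission_rate` and the sample values are hypothetical.

```python
from typing import Any, Dict, Optional

def omission_rate(metadata: Dict[str, Any]) -> Optional[float]:
    """Fraction of candidate results omitted as noise, or None if there were no candidates."""
    omitted = metadata.get("omitted_results") or {}
    total_omitted = omitted.get("total_omitted") or 0
    included = metadata.get("total_results") or 0
    candidates = included + total_omitted
    return total_omitted / candidates if candidates else None

# Hypothetical metadata from a search response.
sample_metadata = {"total_results": 15, "omitted_results": {"total_omitted": 5}}
rate = omission_rate(sample_metadata)
print(f"Omitted share: {rate:.0%}")
```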

In [None]:
def visualize_omitted_results(response: Dict[str, Any]):
    """Visualize omitted (noise-filtered) results."""
    metadata = response.get("metadata", {})
    omitted = metadata.get("omitted_results")

    if not omitted:
        print("No omission data available (noise filtering may be disabled or skipped)")
        return

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Left: Omitted by type
    ax1 = axes[0]
    omit_types = []
    omit_counts = []

    if omitted.get("semantic_omitted") is not None:
        omit_types.append("Semantic\nOmitted")
        omit_counts.append(omitted["semantic_omitted"])
    if omitted.get("keyword_omitted") is not None:
        omit_types.append("Keyword\nOmitted")
        omit_counts.append(omitted["keyword_omitted"])

    if omit_types:
        colors = ['#c0392b', '#d35400']
        bars = ax1.bar(omit_types, omit_counts, color=colors, edgecolor='black')
        ax1.set_ylabel('Omitted Count')
        ax1.set_title('Results Filtered as Noise')

        for bar in bars:
            height = bar.get_height()
            ax1.annotate(f'{int(height):,}',
                        xy=(bar.get_x() + bar.get_width() / 2, height),
                        xytext=(0, 3), textcoords="offset points",
                        ha='center', va='bottom', fontsize=10)

    # Right: Included vs Omitted pie chart
    ax2 = axes[1]
    total_results = metadata.get("total_results", 0)
    total_omitted = omitted.get("total_omitted", 0)

    if total_results > 0 or total_omitted > 0:
        sizes = [total_results, total_omitted]
        labels = [f'Included\n({total_results:,})', f'Omitted\n({total_omitted:,})']
        colors = ['#27ae60', '#e74c3c']
        explode = (0, 0.05)

        ax2.pie(sizes, explode=explode, labels=labels, colors=colors,
               autopct='%1.1f%%', shadow=True, startangle=90)
        ax2.set_title('Results Distribution')

    plt.tight_layout()
    plt.show()

# Visualize omission data from earlier search
if results.get("results"):
    visualize_omitted_results(results)

In [None]:
def visualize_ranking_metrics(response: Dict[str, Any]):
    """Visualize ranking metrics (vector_rank, bm25_rank) when available."""
    results = response.get("results", [])
    metadata = response.get("metadata", {})

    # Check if fusion details are available (requires rerank=false)
    if metadata.get("reranked", True):
        print("Ranking details are only available when rerank=false")
        print("Run: search(query, rerank=False)")
        return

    # Extract ranking data
    ranking_data = []
    for i, r in enumerate(results):
        if r.get("vector_rank") is not None or r.get("bm25_rank") is not None:
            ranking_data.append({
                "result_position": i + 1,
                "vector_rank": r.get("vector_rank"),
                "bm25_rank": r.get("bm25_rank"),
                "fusion_score": r.get("fusion_score", 0)
            })

    if not ranking_data:
        print("No ranking data available in results")
        return

    df = pd.DataFrame(ranking_data)

    fig, axes = plt.subplots(1, 2, figsize=(12, 5))

    # Left: Vector vs BM25 rank scatter
    ax1 = axes[0]
    scatter = ax1.scatter(
        df["vector_rank"].fillna(df["vector_rank"].max() + 10),
        df["bm25_rank"].fillna(df["bm25_rank"].max() + 10),
        c=df["fusion_score"],
        cmap='viridis',
        s=100,
        alpha=0.7,
        edgecolor='black'
    )
    ax1.set_xlabel('Vector Rank')
    ax1.set_ylabel('BM25 Rank')
    ax1.set_title('Vector vs BM25 Ranking')
    plt.colorbar(scatter, ax=ax1, label='Fusion Score')

    # Right: Rank contribution to final position
    ax2 = axes[1]
    positions = df["result_position"]
    ax2.plot(positions, df["vector_rank"], 'o-', label='Vector Rank', color='teal')
    ax2.plot(positions, df["bm25_rank"], 's-', label='BM25 Rank', color='coral')
    ax2.set_xlabel('Final Result Position')
    ax2.set_ylabel('Original Rank')
    ax2.set_title('Rank Contribution to Final Position')
    ax2.legend()
    ax2.invert_yaxis()  # Lower rank = better

    plt.tight_layout()
    plt.show()

# Run a search without reranking to see fusion details
unranked_results = search("diabetes", k=15, rerank=False)
visualize_ranking_metrics(unranked_results)

## Summary

In this notebook, you learned how to:

1. **Configure Authentication** - Set up your API key for secure access
2. **Discover Cohorts** - List indexed cohorts available for search
3. **Discover Note Types** - Find available note types for filtering
4. **Search Notes** - Use keyword, semantic, and hybrid search modes
5. **Apply Date Filters** - Filter by date range
6. **Use Entity Filters** - Filter by clinical entities and assertion types
7. **Use Advanced Filters** - Apply search index attribute filters and quality thresholds
8. **Discover Filter Fields** - Explore available filter fields and their values
9. **Compare Search Types** - Understand when to use each search mode
10. **Visualize Results** - Analyze search results with charts and metrics

## Key API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/cohorts/indexed` | GET | List indexed cohorts |
| `/note-types` | GET | List available note types |
| `/search` | GET | Search notes with filters |
| `/namespaces/{ns}/filter-fields` | GET | List filter fields for a namespace |
| `/namespaces/{ns}/filter-values/{field}` | GET | Get values for a filter field |

## Search Parameters Quick Reference

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | string | Search text (required) |
| `search-type` | string | `keyword`, `semantic`, or `hybrid` |
| `k` | int | Number of results (max 300) |
| `cohort-ids` | string | Comma-separated cohort IDs |
| `rerank` | bool | Apply Cohere reranking |
| `distinct` | string | De-duplication: `encounter`, `patient`, `note`, `none` |
| `date-from` / `date-to` | string | Date range filter (YYYY-MM-DD) |
| `note-types` | string | Comma-separated note types |
| `vector-weight` | float | Vector weight in fusion (0.0-1.0) |
| `min-quality-score` | float | Minimum note quality (0.0-1.0) |
| `filters` | JSON | Search index attribute filters |
| `entity-filters` | JSON | Entity/assertion type filters |

## Next Steps

- Explore the Search API documentation
- Check out the CLI tool: `trioexplorer --help`
- Experiment with different search types and parameters
- Build custom filters using the filter discovery endpoints

## Troubleshooting

**Common Issues:**

1. **Connection refused** - Contact your administrator to verify the API endpoints

2. **401 Unauthorized** - Check that your API key is valid and has the correct entitlements

3. **No cohorts found** - Data may not be indexed yet or your API key may not have access

4. **Slow searches** - Try reducing `k` or disabling `rerank` for faster results

5. **Empty filter values** - The namespace may not have data indexed for that field