# Getting Started with Trioexplorer API

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/triohealth/trioexplorer/blob/main/notebooks/getting_started.ipynb)

This notebook demonstrates how to use the Trioexplorer API to search clinical notes across patient cohorts. You will learn a **broad-to-narrow** search workflow:

1. Start with a broad search to see what's available
2. Apply quality filters to reduce noise
3. Add date filters to focus on specific time periods
4. Use entity filters for clinical precision
5. Review results grouped by encounter

## Prerequisites

**You will need:**
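- An API key for the Trioexplorer API (contact your administrator to obtain one)
- A Jupyter-compatible environment such as Google Colab (the setup cell uses IPython's `!pip` and `%matplotlib` commands)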

In [None]:
# Install required packages
!pip install -q requests pandas matplotlib seaborn numpy

# Import dependencies
import requests
import json
import os
from datetime import datetime, timedelta
from typing import Optional, Dict, Any, List
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Configure matplotlib for notebook display
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')

print("✓ All dependencies loaded successfully")

In [None]:
# =============================================================================
# CONFIGURATION - Modify these values for your use case
# =============================================================================

# Search Configuration
SEARCH_TERM = "side effects"  # Change this to your search query
COHORT_IDS = None             # Set to a list of cohort IDs, e.g., ["cohort-1", "cohort-2"], or None for all
TOP_K = 10000                 # Number of results to retrieve (max 10,000)

# API Configuration
TRIO_API_URL = "https://search.trioexplorer.com"

print(f"Search API: {TRIO_API_URL}")
print(f"Search Term: '{SEARCH_TERM}'")
print(f"Cohort Filter: {COHORT_IDS if COHORT_IDS else 'All cohorts'}")
print(f"Top K: {TOP_K:,}")

In [None]:
# API Key Configuration
# Option 1 (Recommended): Use Colab secrets
try:
    from google.colab import userdata
    TRIO_API_KEY = userdata.get('TRIO_API_KEY')
    print("✓ API key loaded from Colab secrets")
except Exception:
    # Option 2: Outside Colab (or if the secret is missing), fall back to the
    # TRIO_API_KEY environment variable
    TRIO_API_KEY = os.environ.get("TRIO_API_KEY", "")
    if not TRIO_API_KEY:
        print("⚠️  WARNING: No API key configured!")
        print("Add TRIO_API_KEY to Colab secrets or set the TRIO_API_KEY environment variable.")
        print('As a last resort (not recommended for shared notebooks): TRIO_API_KEY = "your_api_key_here"')
    else:
        print(f"✓ API key loaded from environment: {TRIO_API_KEY[:8]}...")

In [None]:
# =============================================================================
# VALIDATION - Verify API connectivity before proceeding
# =============================================================================

def validate_api_connection():
    """Validate API key and connectivity."""
    print("Validating API connection...\n")
    
    # Check API health
    try:
        health_response = requests.get(f"{TRIO_API_URL}/health", timeout=5)
        if health_response.status_code == 200:
            print("✓ API server is reachable")
        else:
            print(f"✗ API server returned status {health_response.status_code}")
            return False
    except requests.exceptions.ConnectionError:
        print(f"✗ Cannot connect to API at {TRIO_API_URL}")
        return False
    except Exception as e:
        print(f"✗ Connection error: {str(e)}")
        return False
    
    # Test API key by fetching cohorts
    if not TRIO_API_KEY:
        print("✗ No API key configured")
        return False
    
    try:
        test_response = requests.get(
            f"{TRIO_API_URL}/cohorts/indexed",
            headers={"X-API-Key": TRIO_API_KEY},
            params={"limit": 1},
            timeout=10
        )
        if test_response.status_code == 200:
            print("✓ API key is valid")
            return True
        elif test_response.status_code == 401:
            print("✗ API key is invalid or expired")
            return False
        elif test_response.status_code == 403:
            print("✗ API key does not have access to this resource")
            return False
        else:
            print(f"✗ Unexpected response: {test_response.status_code}")
            return False
    except Exception as e:
        print(f"✗ Error testing API key: {str(e)}")
        return False

is_valid = validate_api_connection()

if is_valid:
    print("\n" + "=" * 50)
    print("✓ Ready to proceed!")
    print("=" * 50)
else:
    print("\n" + "=" * 50)
    print("✗ Please fix the issues above before continuing")
    print("=" * 50)

## Step 1: Discover Available Cohorts

Before searching, you need to know which cohorts are indexed and available. The `/cohorts/indexed` endpoint lists all cohorts with searchable data.

Each indexed cohort includes:
- `cohort_id` - Unique identifier
- `cohort_name` - Human-readable name (if available)
- `namespace` - The index partition for this cohort in our search system
- `chunk_count` - Number of indexed chunks (vectors)
- `index_status` - Current indexing status
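After you know which cohorts exist, you can copy their IDs into `COHORT_IDS` in the configuration cell. The sketch below shows one way to choose them in code from the `/cohorts/indexed` items. It assumes each item is a dict with the fields listed above. The `allowed_statuses` parameter and the `"ready"`/`"indexing"` status values in the example are hypothetical placeholders, so check which `index_status` values your deployment actually returns.

```python
from typing import Any, Dict, List, Optional


def select_cohort_ids(
    cohorts: List[Dict[str, Any]],
    allowed_statuses: Optional[List[str]] = None,
    min_chunks: int = 1,
) -> List[str]:
    """Return IDs of cohorts that have at least `min_chunks` chunks and an accepted status.

    `allowed_statuses` (hypothetical) holds the `index_status` values you treat
    as searchable; pass None to skip the status check.
    """
    selected = []
    for cohort in cohorts:
        # Missing or null chunk counts are treated as 0
        if (cohort.get("chunk_count") or 0) < min_chunks:
            continue
        # Optionally keep only cohorts whose index_status is in the allowed set
        if allowed_statuses is not None and cohort.get("index_status") not in allowed_statuses:
            continue
        selected.append(str(cohort["cohort_id"]))
    return selected


# Toy input shaped like /cohorts/indexed items; IDs and statuses are made up
example_cohorts = [
    {"cohort_id": "cohort-1", "chunk_count": 1200, "index_status": "ready"},
    {"cohort_id": "cohort-2", "chunk_count": 0, "index_status": "ready"},
    {"cohort_id": "cohort-3", "chunk_count": 500, "index_status": "indexing"},
]
print(select_cohort_ids(example_cohorts, allowed_statuses=["ready"]))  # ['cohort-1']
```

After you run the next cell, something like `COHORT_IDS = select_cohort_ids(cohorts, allowed_statuses=[...])` would limit later searches to the selected cohorts.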

In [None]:
def get_indexed_cohorts(limit: int = 20) -> List[Dict[str, Any]]:
    """
    List all indexed cohorts available for search.

    Args:
        limit: Maximum number of cohorts to return (max 100)

    Returns:
        List of indexed cohort information
    """
    response = requests.get(
        f"{TRIO_API_URL}/cohorts/indexed",
        headers={"X-API-Key": TRIO_API_KEY},
        params={"limit": limit}
    )

    if response.status_code == 200:
        data = response.json()
        cohorts = data.get("items", [])
        total = data.get("total_count", len(cohorts))
        print(f"Found {total} indexed cohorts (showing {len(cohorts)})")
        return cohorts
    else:
        print(f"Failed to list cohorts: {response.status_code}")
        print(response.text)
        return []

# Fetch indexed cohorts
cohorts = get_indexed_cohorts()

In [None]:
# Display cohorts as a formatted table
if cohorts:
    cohorts_df = pd.DataFrame(cohorts)

    # Show only the expected columns that are present in the response
    display_cols = ["cohort_id", "cohort_name", "chunk_count", "index_status"]
    available_cols = [c for c in display_cols if c in cohorts_df.columns]

    print("\nIndexed Cohorts:")
    print("-" * 80)
    if "chunk_count" in cohorts_df.columns:
        display(cohorts_df[available_cols].style.format({"chunk_count": "{:,}"}))

        # Summary statistics (covers only the cohorts returned above, up to `limit`)
        total_chunks = int(cohorts_df["chunk_count"].sum())
        print(f"\nTotal indexed chunks across the listed cohorts: {total_chunks:,}")
    else:
        display(cohorts_df[available_cols])
else:
    print("No cohorts available. Ensure data has been indexed.")

## Step 2: Broad Search

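Now let's run a broad search for your configured `SEARCH_TERM`. We pass `TOP_K` as `k`; it defaults to 10,000, which is also the maximum, so you see the full scope of matching results.

The Search API supports three search modes. You select one with the `search_type` argument of the `search()` helper (`search-type` in the API):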

| Mode | Description | Best For |
|------|-------------|----------|
| `keyword` | BM25 full-text search | Exact terms, medical codes |
| `semantic` | Vector similarity search | Conceptual queries |
| `hybrid` (default) | Combines both approaches | General-purpose queries |

**Goal:** Get a broad view of results, then narrow down with filters in subsequent steps.
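The `search()` helper defined in the next cell converts its Python arguments into hyphenated query parameters (`search-type`, `cohort-ids`, and so on) and sends them as a GET request to `/search`. The sketch below reproduces that mapping with only the standard library, so you can preview the request each mode produces without calling the API. It covers only a few of the parameters. It is meant for inspection, not as a replacement for `search()`.

```python
from typing import List, Optional
from urllib.parse import urlencode


def preview_search_url(
    base_url: str,
    query: str,
    search_type: str = "hybrid",
    k: int = 10,
    cohort_ids: Optional[List[str]] = None,
    rerank: bool = True,
) -> str:
    """Build the /search URL that the notebook's search() helper would request (subset of params)."""
    if search_type not in {"keyword", "semantic", "hybrid"}:
        raise ValueError(f"Unknown search type: {search_type}")
    params = {
        "query": query,
        "search-type": search_type,
        "k": k,
        # Booleans are sent as lowercase strings, as in search()
        "rerank": str(rerank).lower(),
    }
    if cohort_ids:
        # Cohort IDs are joined into one comma-separated value
        params["cohort-ids"] = ",".join(cohort_ids)
    return f"{base_url}/search?{urlencode(params)}"


for mode in ("keyword", "semantic", "hybrid"):
    print(preview_search_url("https://search.trioexplorer.com", "side effects", search_type=mode))
```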

In [None]:
def search(
    query: str,
    search_type: str = "hybrid",
    k: int = 10,
    cohort_ids: Optional[List[str]] = None,
    rerank: bool = True,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    note_types: Optional[List[str]] = None,
    include_noise: bool = False,
    distinct: Optional[str] = None,
    vector_weight: Optional[float] = None,
    top_k_retrieval: Optional[int] = None,
    distance_threshold: Optional[float] = None,
    chunk_multiplier: Optional[int] = None,
    min_quality_score: Optional[float] = None,
    min_chunk_quality_score: Optional[float] = None,
    filters: Optional[Dict[str, Any]] = None,
    entity_filters: Optional[Dict[str, Any]] = None,
    **kwargs
) -> Dict[str, Any]:
    """
    Search indexed patient notes.

    Args:
        query: Search text
        search_type: 'keyword', 'semantic', or 'hybrid' (default)
        k: Number of results to return (max 10,000)
        cohort_ids: List of cohort IDs to search
        rerank: Apply Cohere reranking (default True)
        date_from: Filter from date (YYYY-MM-DD)
        date_to: Filter to date (YYYY-MM-DD)
        note_types: Filter by note types
        include_noise: Include noise notes (default False)
        distinct: De-duplication mode ('encounter', 'patient', 'note', or 'none')
        vector_weight: Weight for vector search in fusion (0.0-1.0)
        top_k_retrieval: Number of results to retrieve before reranking
        distance_threshold: Cosine distance cutoff for semantic results
        chunk_multiplier: Initial retrieval multiplier for semantic search
        min_quality_score: Minimum note quality score (0.0-1.0)
        min_chunk_quality_score: Minimum chunk quality score (0.0-1.0)
        filters: Attribute filters as dict (converted to JSON)
        entity_filters: Entity/assertion filters as dict (converted to JSON)

    Returns:
        Search response with results and metadata
    """
    params = {
        "query": query,
        "search-type": search_type,
        "k": k,
        "rerank": str(rerank).lower(),
        "include-noise": str(include_noise).lower(),
    }

    if cohort_ids:
        params["cohort-ids"] = ",".join(str(c) for c in cohort_ids)
    if date_from:
        params["date-from"] = date_from
    if date_to:
        params["date-to"] = date_to
    if note_types:
        params["note-types"] = ",".join(note_types)
    if distinct:
        params["distinct"] = distinct
    if vector_weight is not None:
        params["vector-weight"] = vector_weight
    if top_k_retrieval is not None:
        params["top-k-retrieval"] = top_k_retrieval
    if distance_threshold is not None:
        params["distance-threshold"] = distance_threshold
    if chunk_multiplier is not None:
        params["chunk-multiplier"] = chunk_multiplier
    if min_quality_score is not None:
        params["min-quality-score"] = min_quality_score
    if min_chunk_quality_score is not None:
        params["min-chunk-quality-score"] = min_chunk_quality_score
    if filters:
        params["filters"] = json.dumps(filters)
    if entity_filters:
        params["entity-filters"] = json.dumps(entity_filters)

    for key, value in kwargs.items():
        params[key.replace("_", "-")] = value

    response = requests.get(
        f"{TRIO_API_URL}/search",
        headers={"X-API-Key": TRIO_API_KEY},
        params=params
    )

    if response.status_code == 200:
        return response.json()
    else:
        print(f"Search failed: {response.status_code}")
        print(response.text)
        return {"results": [], "metadata": {}}

# Perform broad search with configured SEARCH_TERM
print(f"Searching for: '{SEARCH_TERM}'")
print(f"Cohorts: {COHORT_IDS if COHORT_IDS else 'All'}")
print("-" * 50)

broad_results = search(
    query=SEARCH_TERM,
    k=TOP_K,
    cohort_ids=COHORT_IDS
)

# Store result count for funnel tracking
broad_count = broad_results.get("metadata", {}).get("total_results", len(broad_results.get("results", [])))
print(f"\n✓ Broad search returned {broad_count} results")
print(f"  Unique patients: {broad_results.get('metadata', {}).get('unique_patients', 'N/A')}")
print(f"  Unique encounters: {broad_results.get('metadata', {}).get('unique_encounters', 'N/A')}")

# Initialize funnel tracking
filter_funnel = {"Broad Search": broad_count}

## Step 3: Filtering by Quality Score

Quality scores let you filter out low-quality or noisy clinical text. This is often one of the most effective ways to improve result relevance.

**Key Parameters:**
- `min_quality_score` - Minimum note-level quality (0.0-1.0)
- `min_chunk_quality_score` - Minimum chunk-level quality (0.0-1.0)

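Higher scores indicate cleaner, more clinically relevant text. A note-level threshold of **0.7** is a reasonable starting point for removing most noise. The cell below pairs it with a chunk-level threshold of **0.6**. Adjust both values based on how many results remain and how relevant they look.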
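To choose thresholds for your data, it helps to see how many results pass at several values. The sketch below runs this sweep locally on results you already have in memory. It assumes each result dict carries `quality_score` and `chunk_quality_score` keys. Those names match the filterable fields listed in Step 6, but whether search results include them is an assumption. The sketch also assumes the `min_` thresholds are inclusive (score >= threshold). A missing score is treated as 0.0, so that result fails any positive threshold.

```python
from typing import Any, Dict, List, Sequence


def quality_sweep(
    results: List[Dict[str, Any]],
    note_thresholds: Sequence[float],
    chunk_threshold: float = 0.0,
) -> Dict[float, int]:
    """Count results passing each note-level threshold plus a fixed chunk-level threshold.

    Assumes 'quality_score' / 'chunk_quality_score' keys; missing scores count as 0.0.
    """
    counts = {}
    for threshold in note_thresholds:
        counts[threshold] = sum(
            1
            for r in results
            if (r.get("quality_score") or 0.0) >= threshold
            and (r.get("chunk_quality_score") or 0.0) >= chunk_threshold
        )
    return counts


# Toy results with made-up scores
toy_results = [
    {"quality_score": 0.9, "chunk_quality_score": 0.8},
    {"quality_score": 0.75, "chunk_quality_score": 0.5},
    {"quality_score": 0.4, "chunk_quality_score": 0.9},
    {"chunk_quality_score": 0.7},  # no note-level score
]
print(quality_sweep(toy_results, [0.0, 0.5, 0.7, 0.8], chunk_threshold=0.6))
# {0.0: 3, 0.5: 1, 0.7: 1, 0.8: 1}
```

The authoritative counts still come from rerunning `search()` with each threshold. The server may score or filter chunks differently from this local approximation.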

In [None]:
# Apply quality score filters
quality_results = search(
    query=SEARCH_TERM,
    k=TOP_K,
    cohort_ids=COHORT_IDS,
    min_quality_score=0.7,
    min_chunk_quality_score=0.6
)

quality_count = quality_results.get("metadata", {}).get("total_results", len(quality_results.get("results", [])))
filter_funnel["+ Quality ≥0.7"] = quality_count

print(f"Results with quality filters:")
print(f"  Before: {broad_count}")
print(f"  After:  {quality_count}")
print(f"  Reduction: {broad_count - quality_count} results filtered ({(1 - quality_count/max(broad_count,1))*100:.1f}%)")

## Step 4: Adding Date Filters

You can filter search results by date range with these API parameters (`date_from` and `date_to` in the `search()` helper):

- `date-from` - Include notes from this date onwards (YYYY-MM-DD, inclusive)
- `date-to` - Include notes up to this date (YYYY-MM-DD, inclusive)

This is useful for:
- Finding recent documentation for a condition
- Analyzing notes within a specific time period
- Tracking progression of a condition over time
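The date-filter cell below computes a rolling window that ends today, so its results change from day to day. For a fixed, reproducible window (for example, to rerun an analysis later and get identical results), a small helper that takes an explicit end date works well. This sketch only formats `YYYY-MM-DD` strings for `date_from`/`date_to`. It relies on the API treating both bounds as inclusive, as described above.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def date_window(days: int, end: Optional[date] = None) -> Tuple[str, str]:
    """Return (date_from, date_to) strings covering exactly `days` calendar days ending on `end`.

    Both bounds are inclusive, so a 1-day window has date_from == date_to.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    end = end or date.today()
    start = end - timedelta(days=days - 1)
    return start.isoformat(), end.isoformat()


# A fixed 90-day window ending 2024-03-31
print(date_window(90, end=date(2024, 3, 31)))  # ('2024-01-02', '2024-03-31')
```

With inclusive bounds, the notebook's own `timedelta(days=180)` window covers 181 calendar days. The helper subtracts `days - 1` so the window is exactly `days` long.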

In [None]:
# Add date filters to further narrow results
# Using last 6 months as an example - adjust based on your data

end_date = datetime.now()
start_date = end_date - timedelta(days=180)

date_from = start_date.strftime("%Y-%m-%d")
date_to = end_date.strftime("%Y-%m-%d")

print(f"Adding date filter: {date_from} to {date_to}")

date_results = search(
    query=SEARCH_TERM,
    k=TOP_K,
    cohort_ids=COHORT_IDS,
    min_quality_score=0.7,
    min_chunk_quality_score=0.6,
    date_from=date_from,
    date_to=date_to
)

date_count = date_results.get("metadata", {}).get("total_results", len(date_results.get("results", [])))
filter_funnel["+ Date Filter"] = date_count

print(f"\nResults with date filter added:")
print(f"  Before: {quality_count}")
print(f"  After:  {date_count}")
print(f"  Reduction: {quality_count - date_count} results filtered")

In [None]:
# Timeline visualization: Show results distribution over time
results_list = date_results.get("results", [])

if results_list:
    # Extract dates from results
    dates = []
    for r in results_list:
        note_date = r.get("note_date")
        if note_date:
            try:
                dates.append(pd.to_datetime(note_date))
            except (ValueError, TypeError):
                pass  # Skip dates that cannot be parsed
    
    if dates:
        fig, ax = plt.subplots(figsize=(12, 4))
        
        # Create histogram of results by month
        date_series = pd.Series(dates)
        date_series.groupby(date_series.dt.to_period("M")).count().plot(
            kind="bar", ax=ax, color="#3498db", edgecolor="black"
        )
        
        ax.set_xlabel("Month")
        ax.set_ylabel("Number of Results")
        ax.set_title(f"Results Timeline for '{SEARCH_TERM}'")
        plt.xticks(rotation=45)
        plt.tight_layout()
        plt.show()
    else:
        print("No dates available in results for timeline visualization")
else:
    print("No results to visualize")

## Step 5: Entity and Assertion Filtering

The Search API can filter results based on extracted clinical entities and their assertion status. For example, you can distinguish a symptom the patient currently has from one that was explicitly ruled out.

### Entity Types

Clinical entities are extracted and categorized into these types:

| Entity Type | Description | Examples |
|-------------|-------------|----------|
| `symptoms` | Patient-reported or observed symptoms | "chest pain", "shortness of breath" |
| `diagnoses` | Medical diagnoses and conditions | "diabetes mellitus", "hypertension" |
| `medications` | Drugs and medications | "metformin", "lisinopril" |
| `procedures` | Medical procedures | "colonoscopy", "cardiac catheterization" |
| `lab_tests` | Laboratory tests | "HbA1c", "CBC" |
| `vital_signs` | Vital sign measurements | "blood pressure", "heart rate" |

### Assertion Types

Each entity has an assertion status indicating clinical context:

| Assertion Type | Description | Example Context |
|----------------|-------------|-----------------|
| `present` | Currently present/active | "Patient has diabetes" |
| `negated` | Explicitly negated | "No chest pain" |
| `historical` | Past history | "History of MI in 2019" |
| `family` | Family history | "Mother had breast cancer" |
| `hypothetical` | Possible/uncertain | "Rule out PE" |
| `conditional` | Conditional mention | "If symptoms worsen" |

### Combining Entity and Assertion Types

Filter keys combine entity type and assertion type with an underscore: `{entity_type}_{assertion_type}`

**Examples:**
- `symptoms_present` - Active symptoms
- `diagnoses_negated` - Ruled-out diagnoses
- `medications_historical` - Previously prescribed medications
- `diagnoses_family` - Family history of conditions
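Because the keys are plain strings, a typo such as `symptom_present` is easy to make. The sketch below builds an `entity_filters` dict from (entity type, assertion type, terms) triples. It rejects any type not listed in the tables above. The validation sets are copied from those tables; extend them if your deployment supports additional types.

```python
from typing import Dict, Iterable, List, Tuple

# Allowed values, copied from the entity and assertion tables above
ENTITY_TYPES = {"symptoms", "diagnoses", "medications", "procedures", "lab_tests", "vital_signs"}
ASSERTION_TYPES = {"present", "negated", "historical", "family", "hypothetical", "conditional"}


def build_entity_filters(specs: Iterable[Tuple[str, str, List[str]]]) -> Dict[str, List[str]]:
    """Combine (entity_type, assertion_type, terms) triples into entity_filters keys.

    Terms for a repeated key are merged without duplicates, preserving first-seen order.
    """
    filters: Dict[str, List[str]] = {}
    for entity_type, assertion_type, terms in specs:
        if entity_type not in ENTITY_TYPES:
            raise ValueError(f"Unknown entity type: {entity_type}")
        if assertion_type not in ASSERTION_TYPES:
            raise ValueError(f"Unknown assertion type: {assertion_type}")
        key = f"{entity_type}_{assertion_type}"  # e.g. "symptoms_present"
        bucket = filters.setdefault(key, [])
        for term in terms:
            if term not in bucket:
                bucket.append(term)
    return filters


print(build_entity_filters([
    ("symptoms", "present", ["nausea", "fatigue"]),
    ("symptoms", "present", ["fatigue", "headache"]),
    ("diagnoses", "negated", ["pneumonia"]),
]))
# {'symptoms_present': ['nausea', 'fatigue', 'headache'], 'diagnoses_negated': ['pneumonia']}
```

You can pass the result directly as `entity_filters=` to the `search()` helper. This section does not state how the API combines several keys (whether all must match or any), so verify that with a small query first.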

In [None]:
# Apply entity filters for clinical precision
# Example: Filter for notes with present symptoms related to our search

entity_results = search(
    query=SEARCH_TERM,
    k=TOP_K,
    cohort_ids=COHORT_IDS,
    min_quality_score=0.7,
    min_chunk_quality_score=0.6,
    date_from=date_from,
    date_to=date_to,
    entity_filters={
        "symptoms_present": ["nausea", "fatigue", "headache"]  # Adjust based on your search term
    }
)

entity_count = entity_results.get("metadata", {}).get("total_results", len(entity_results.get("results", [])))
filter_funnel["+ Entity Filter"] = entity_count

print(f"Results with entity filter added:")
print(f"  Before: {date_count}")
print(f"  After:  {entity_count}")
print(f"  Reduction: {date_count - entity_count} results filtered")

In [None]:
# Funnel Chart: Visualize the broad-to-narrow workflow
fig, ax = plt.subplots(figsize=(10, 6))

steps = list(filter_funnel.keys())
counts = list(filter_funnel.values())

# Create horizontal bar chart (funnel style)
colors = plt.cm.Blues(np.linspace(0.3, 0.9, len(steps)))[::-1]
bars = ax.barh(range(len(steps)), counts, color=colors, edgecolor='black')

# Add count labels
for i, (bar, count) in enumerate(zip(bars, counts)):
    ax.text(bar.get_width() + max(counts)*0.02, bar.get_y() + bar.get_height()/2, 
            f'{count:,}', va='center', fontsize=11, fontweight='bold')

ax.set_yticks(range(len(steps)))
ax.set_yticklabels(steps)
ax.invert_yaxis()  # Top to bottom
ax.set_xlabel('Number of Results')
ax.set_title(f"Filter Funnel: '{SEARCH_TERM}'", fontsize=14, fontweight='bold')
ax.set_xlim(0, max(max(counts), 1) * 1.15)  # avoid a zero-width axis when every count is 0

plt.tight_layout()
plt.show()

# Print summary
print("\n" + "=" * 50)
print("FILTER FUNNEL SUMMARY")
print("=" * 50)
for step, count in filter_funnel.items():
    print(f"  {step}: {count:,} results")

## Step 6: Advanced Filters

For complex filtering scenarios, you can pass raw filter expressions through the `filters` parameter. These filters operate directly on indexed attributes and support the list and comparison operators listed below.

### Filter Syntax

Filters are passed as a JSON object that maps each field name to a condition. A condition takes one of these forms:

| Condition | Meaning |
|-----------|---------|
| `["value1", "value2"]` | In list (OR) |
| `{"$eq": "exact_value"}` | Exact match |
| `{"$ne": "excluded_value"}` | Not equal |
| `{"$in": ["a", "b", "c"]}` | In list |
| `{"$nin": ["x", "y"]}` | Not in list |
| `{"$gt": 0.5}` | Greater than |
| `{"$gte": 0.5}` | Greater than or equal |
| `{"$lt": 1.0}` | Less than |
| `{"$lte": 1.0}` | Less than or equal |

For example, a filter on two fields:

```json
{
    "note_type": ["Progress Note", "Discharge Summary"],
    "quality_score": {"$gte": 0.7}
}
```

### Common Filterable Fields

- `note_type` - Type of clinical note
- `quality_score` - Note quality score (0.0-1.0)
- `chunk_quality_score` - Chunk quality score (0.0-1.0)
- `patient_id` - Patient identifier
- `encounter_id` - Encounter identifier
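To make the operator semantics concrete, the sketch below evaluates a filter dict against a single record in plain Python, following the operators listed above. It is only an illustration, not the server's implementation. It rests on two assumptions you should confirm against the API: multiple fields in one filter are combined with AND, and range comparisons (`$gt`, `$gte`, `$lt`, `$lte`) on a missing field fail.

```python
from typing import Any, Callable, Dict

# Comparison for each operator: (record value, filter argument) -> bool
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda value, arg: value == arg,
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$nin": lambda value, arg: value not in arg,
    # Range comparisons fail when the field is missing (value is None)
    "$gt": lambda value, arg: value is not None and value > arg,
    "$gte": lambda value, arg: value is not None and value >= arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$lte": lambda value, arg: value is not None and value <= arg,
}


def matches(record: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """Return True if `record` satisfies every field condition in `filters` (AND across fields)."""
    for field, condition in filters.items():
        value = record.get(field)
        if isinstance(condition, list):
            # A bare list is shorthand for "$in" (match any listed value)
            condition = {"$in": condition}
        if not isinstance(condition, dict):
            raise TypeError(f"Condition for {field!r} must be a list or an operator dict")
        for op, arg in condition.items():
            if op not in OPERATORS:
                raise ValueError(f"Unsupported operator: {op}")
            if not OPERATORS[op](value, arg):
                return False
    return True


record = {"note_type": "Progress Note", "quality_score": 0.82}
print(matches(record, {"note_type": ["Progress Note", "Discharge Summary"], "quality_score": {"$gte": 0.7}}))  # True
print(matches(record, {"note_type": {"$nin": ["Progress Note"]}}))  # False
```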

In [None]:
# Example: Using raw filters for specific note types
# Keep only "Progress Note" and "Discharge Summary" notes. These values are
# examples; use filter discovery (Step 7) to see the note types in your data.

filtered_results = search(
    query=SEARCH_TERM,
    k=TOP_K,
    cohort_ids=COHORT_IDS,
    filters={
        "note_type": ["Progress Note", "Discharge Summary"]
    }
)

print(f"Results filtered by note type:")
print(f"  Found {len(filtered_results.get('results', []))} results")

# Show note type distribution in results
if filtered_results.get("results"):
    note_types = [r.get("note_type", "Unknown") for r in filtered_results["results"]]
    type_counts = pd.Series(note_types).value_counts()
    print("\nNote type distribution:")
    for nt, count in type_counts.items():
        print(f"  {nt}: {count}")

## Step 7: Filter Field Discovery

Before building complex filters, you can discover which filter fields are available and what values they can take.

Each cohort has a `namespace`: the index partition that holds that cohort's data in the search system. You can read it from the `namespace` field returned by `/cohorts/indexed` in Step 1.

### Endpoints

- `GET /namespaces/{namespace}/filter-fields` - List available filter fields
- `GET /namespaces/{namespace}/filter-values/{field}` - Get values for a specific field
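The helper functions below build these URLs with f-strings. If a namespace or field name could contain characters such as `/`, `?`, or spaces (whether yours do is an assumption to check), percent-encoding each path segment keeps the request well-formed. Some servers also treat an encoded slash specially, so test this against your deployment. Here is a minimal sketch using only the standard library; the namespace `"team a/prod"` in the example is hypothetical.

```python
from urllib.parse import quote


def filter_fields_url(base_url: str, namespace: str) -> str:
    """URL for GET /namespaces/{namespace}/filter-fields with the namespace percent-encoded."""
    return f"{base_url}/namespaces/{quote(namespace, safe='')}/filter-fields"


def filter_values_url(base_url: str, namespace: str, field: str) -> str:
    """URL for GET /namespaces/{namespace}/filter-values/{field} with both segments encoded."""
    return (
        f"{base_url}/namespaces/{quote(namespace, safe='')}"
        f"/filter-values/{quote(field, safe='')}"
    )


print(filter_values_url("https://search.trioexplorer.com", "team a/prod", "note_type"))
# https://search.trioexplorer.com/namespaces/team%20a%2Fprod/filter-values/note_type
```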

In [None]:
def get_filter_fields(
    namespace: str,
    category: Optional[str] = None
) -> List[Dict[str, Any]]:
    """
    List available filter fields for a namespace.

    Args:
        namespace: The namespace from cohort info (index partition identifier)
        category: Filter by category (e.g., 'metadata', 'entity')

    Returns:
        List of filter field definitions
    """
    params = {}
    if category:
        params["category"] = category

    response = requests.get(
        f"{TRIO_API_URL}/namespaces/{namespace}/filter-fields",
        headers={"X-API-Key": TRIO_API_KEY},
        params=params
    )

    if response.status_code == 200:
        data = response.json()
        fields = data.get("fields", [])
        print(f"Found {len(fields)} filter fields for namespace '{namespace}'")
        return fields
    else:
        print(f"Failed to get filter fields: {response.status_code}")
        print(response.text)
        return []


def get_filter_values(
    namespace: str,
    field: str,
    limit: int = 50
) -> List[Any]:
    """
    Get possible values for a filter field.

    Args:
        namespace: The namespace (index partition identifier)
        field: Field name to get values for
        limit: Maximum number of values to return

    Returns:
        List of possible values for the field
    """
    response = requests.get(
        f"{TRIO_API_URL}/namespaces/{namespace}/filter-values/{field}",
        headers={"X-API-Key": TRIO_API_KEY},
        params={"limit": limit}
    )

    if response.status_code == 200:
        data = response.json()
        values = data.get("values", [])
        total = data.get("total_count", len(values))
        print(f"Found {total} values for field '{field}' (showing {len(values)})")
        return values
    else:
        print(f"Failed to get filter values: {response.status_code}")
        print(response.text)
        return []

In [None]:
# Example: Discover filter fields for a cohort
# First, get a namespace from an indexed cohort

if cohorts:
    # Use the first cohort's namespace
    example_namespace = cohorts[0].get("namespace")
    
    if example_namespace:
        print(f"Exploring filter fields for namespace: {example_namespace}\n")
        
        # Get all available filter fields
        fields = get_filter_fields(example_namespace)
        
        if fields:
            fields_df = pd.DataFrame(fields)
            print("\nAvailable Filter Fields:")
            print("-" * 60)
            display(fields_df)
    else:
        print("No namespace found in cohort data")
else:
    print("No cohorts available. Run the cohorts cell first.")

In [None]:
# Example: Get possible values for a specific filter field
# This is useful for building filter dropdowns in UIs

if cohorts:
    example_namespace = cohorts[0].get("namespace")
    
    if example_namespace:
        # Get values for note_type field
        print("Getting values for 'note_type' field...\n")
        note_type_values = get_filter_values(example_namespace, "note_type", limit=20)
        
        if note_type_values:
            print("\nAvailable note_type values:")
            for i, value in enumerate(note_type_values, 1):
                print(f"  {i}. {value}")
else:
    print("No cohorts available. Run the cohorts cell first.")

## Step 8: Reviewing Results by Encounter

After narrowing down results with filters, the final step is to review the actual clinical notes. Grouping results by encounter helps you understand the context of each finding.

This section shows how to:
1. Group results by `encounter_id`
2. Display an encounter summary table
3. View actual note text for selected encounters
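The grouping cell below ranks encounters by how many matching notes they contain. Ranking by the best relevance `score` within each encounter is a useful alternative when one highly relevant note matters more than many weak ones. Here is a stdlib-only sketch, assuming each result has `encounter_id` and a numeric `score` (both fields are read by the cells below) and that higher scores mean more relevant.

```python
from typing import Any, Dict, List, Tuple


def rank_encounters_by_best_score(
    results: List[Dict[str, Any]], top_n: int = 10
) -> List[Tuple[str, float, int]]:
    """Return (encounter_id, best_score, note_count) tuples, highest best score first.

    Assumes higher scores mean more relevant; a missing score counts as 0.0.
    Ties on best score are broken by note count (more notes first).
    """
    best: Dict[str, float] = {}
    counts: Dict[str, int] = {}
    for r in results:
        enc_id = str(r.get("encounter_id", "Unknown"))
        score = float(r.get("score") or 0.0)
        # Track the highest score seen for each encounter
        if enc_id not in best or score > best[enc_id]:
            best[enc_id] = score
        counts[enc_id] = counts.get(enc_id, 0) + 1
    ranked = sorted(best, key=lambda e: (best[e], counts[e]), reverse=True)
    return [(e, best[e], counts[e]) for e in ranked[:top_n]]


# Toy results with made-up IDs and scores
toy_hits = [
    {"encounter_id": "enc-1", "score": 0.42},
    {"encounter_id": "enc-1", "score": 0.40},
    {"encounter_id": "enc-1", "score": 0.38},
    {"encounter_id": "enc-2", "score": 0.91},
    {"encounter_id": "enc-3"},
]
print(rank_encounters_by_best_score(toy_hits))
# [('enc-2', 0.91, 1), ('enc-1', 0.42, 3), ('enc-3', 0.0, 1)]
```

After running the grouping cell, you could call `rank_encounters_by_best_score(results_list)` to compare the two orderings.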

In [None]:
# Use the entity-filtered results if they are non-empty; otherwise fall back to the date-filtered results
final_results = entity_results if entity_results.get("results") else date_results

results_list = final_results.get("results", [])

if results_list:
    # Group results by encounter_id
    encounters = {}
    for r in results_list:
        enc_id = r.get("encounter_id", "Unknown")
        if enc_id not in encounters:
            encounters[enc_id] = {
                "patient_id": r.get("patient_id", "Unknown"),
                "notes": [],
                "dates": [],
                "note_types": set()
            }
        encounters[enc_id]["notes"].append(r)
        if r.get("note_date"):
            encounters[enc_id]["dates"].append(r.get("note_date"))
        if r.get("note_type"):
            encounters[enc_id]["note_types"].add(r.get("note_type"))
    
    # Create encounter summary table
    summary_data = []
    for enc_id, data in encounters.items():
        dates = sorted(data["dates"]) if data["dates"] else []
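        # Convert IDs to strings before truncating, in case the API returns numeric IDs
        enc_str, patient_str = str(enc_id), str(data["patient_id"])
        summary_data.append({
            "encounter_id": enc_str[:20] + "..." if len(enc_str) > 20 else enc_str,
            "patient_id": patient_str[:15] + "..." if len(patient_str) > 15 else patient_str,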
            "note_count": len(data["notes"]),
            "note_types": ", ".join(sorted(data["note_types"]))[:40],
            "date_range": f"{dates[0]} to {dates[-1]}" if len(dates) > 1 else (dates[0] if dates else "N/A")
        })
    
    summary_df = pd.DataFrame(summary_data)
    summary_df = summary_df.sort_values("note_count", ascending=False).head(10)
    
    print(f"Found {len(encounters)} unique encounters")
    print("\nTop 10 Encounters by Note Count:")
    print("-" * 80)
    display(summary_df)
else:
    print("No results available. Run the search cells above first.")

In [None]:
# Display actual note text for the first few encounters
if results_list and encounters:
    print("=" * 80)
    print("SAMPLE NOTE TEXT")
    print("=" * 80)
    
    # Get first 3 encounters with notes
    sample_encounters = list(encounters.items())[:3]
    
    for enc_id, data in sample_encounters:
        print(f"\n{'─' * 80}")
        print(f"ENCOUNTER: {enc_id}")
        print(f"Patient: {data['patient_id']}")
        print(f"Notes: {len(data['notes'])}")
        print(f"{'─' * 80}")
        
        # Show first note from this encounter
        note = data["notes"][0]
        print(f"\nNote Type: {note.get('note_type', 'N/A')}")
        print(f"Note Date: {note.get('note_date', 'N/A')}")
        print(f"Score: {(note.get('score') or 0):.4f}")
        
        # Display text snippet
        text = note.get('text_chunk') or note.get('text_full', '')
        if text:
            print(f"\nText Preview:")
            print("-" * 40)
            # Show first 500 characters
            preview = text[:500] + "..." if len(text) > 500 else text
            print(preview)
        else:
            print("\n(No text available in result)")
else:
    print("No encounters to display")

## Summary

This notebook demonstrated a **broad-to-narrow** search workflow:

```
Broad search (k=10000)   ████████████████████  → Start with all matches
+ quality ≥0.7           ████████████          → Remove noisy content
+ date filter            ██████                → Focus on time period
+ entity filter          ███                   → Clinical precision
→ Review by encounter    [actual notes]        → Examine results
```
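Each funnel stage reuses the previous stage's arguments and adds one more filter. Writing the stages as data makes that explicit and lets you add or reorder steps easily. The sketch below only builds the cumulative keyword-argument dicts. The thresholds and entity terms are the illustrative values used earlier in this notebook, and the dates are fixed placeholders. Each dict is shaped to be passed to the notebook's `search()` helper as `search(**kwargs)`.

```python
from typing import Any, Dict, List, Tuple


def cumulative_stages(
    base: Dict[str, Any], steps: List[Tuple[str, Dict[str, Any]]]
) -> List[Tuple[str, Dict[str, Any]]]:
    """Return (label, kwargs) pairs where each stage adds its filters to all earlier ones."""
    stages = [("Broad Search", dict(base))]
    current = dict(base)
    for label, extra in steps:
        current = {**current, **extra}  # new dict per stage; later steps add to (or override) earlier ones
        stages.append((label, current))
    return stages


stages = cumulative_stages(
    {"query": "side effects", "k": 10000},
    [
        ("+ Quality ≥0.7", {"min_quality_score": 0.7, "min_chunk_quality_score": 0.6}),
        ("+ Date Filter", {"date_from": "2024-01-01", "date_to": "2024-06-30"}),
        ("+ Entity Filter", {"entity_filters": {"symptoms_present": ["nausea", "fatigue", "headache"]}}),
    ],
)
for label, kwargs in stages:
    print(label, sorted(kwargs))
```

Looping over `stages` and calling `search(**kwargs)` for each one would rebuild the same `filter_funnel` counts used in the chart.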

### What You Learned

1. **Configure & Validate** - Set up API key and verify connectivity
2. **Discover Cohorts** - Find indexed cohorts available for search
3. **Broad Search** - Start with high `k` to see the full scope of results
4. **Quality Filtering** - Use `min_quality_score` to reduce noise
5. **Date Filtering** - Narrow by time period with `date_from`/`date_to`
6. **Entity Filtering** - Filter by clinical entities and assertions
7. **Advanced Filters** - Use raw filter expressions for complex queries
8. **Filter Discovery** - Explore available fields and values
9. **Review by Encounter** - Group and examine actual note text

### Key API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/cohorts/indexed` | GET | List indexed cohorts |
| `/search` | GET | Search notes with filters |
| `/namespaces/{ns}/filter-fields` | GET | List filter fields |
| `/namespaces/{ns}/filter-values/{field}` | GET | Get field values |

### Search Parameters Quick Reference

| Parameter | Type | Description |
|-----------|------|-------------|
| `query` | string | Search text (required) |
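| `search-type` | string | `keyword`, `semantic`, or `hybrid` (default) |
| `k` | int | Number of results (max 10,000) |
| `cohort-ids` | string | Comma-separated cohort IDs |
| `note-types` | string | Comma-separated note types |
| `min-quality-score` | float | Minimum note quality (0.0-1.0) |
| `min-chunk-quality-score` | float | Minimum chunk quality (0.0-1.0) |
| `date-from` / `date-to` | string | Date range filter (YYYY-MM-DD, inclusive) |
| `entity-filters` | JSON | Entity/assertion type filters |
| `filters` | JSON | Advanced attribute filters |
| `rerank` | boolean | Apply Cohere reranking (`true`/`false`) |
| `include-noise` | boolean | Include noise notes (`true`/`false`) |
| `distinct` | string | De-duplication mode: `encounter`, `patient`, `note`, or `none` |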

### Next Steps

- Try different `SEARCH_TERM` values in the configuration cell
- Adjust entity filters based on your clinical domain
- Explore the CLI tool: `trioexplorer --help`
- Build custom filters using the filter discovery endpoints

### Troubleshooting

| Issue | Solution |
|-------|----------|
| Connection refused | Verify API endpoints with your administrator |
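| 401 Unauthorized | Check that your API key is valid and has not expired |
| 403 Forbidden | Your API key lacks access to this resource; contact your administrator |
| No cohorts found | Data may not be indexed yet |
| Too many results | Raise quality score thresholds or add date/entity filters |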
| No results | Broaden search term or reduce filters |