# Notebook 1: Qdrant Fundamentals & Search Basics

## 🎯 Objectives

In this notebook, you'll learn:
- How to connect to Qdrant Cloud with your own credentials
- Create collections with vector configurations
- Ingest documents with metadata (payload)
- Perform basic vector searches
- Use filtering with payload indexes
- Understand core Qdrant concepts: collections, points, vectors, payload

## 📋 Prerequisites

- `qdrant-client==1.15.*`, `numpy`, `pandas`, `tqdm`, `matplotlib`
- Optional: `python-dotenv` (used below to load credentials from a `.env` file)
- Required environment variables: `QDRANT_URL`, `QDRANT_API_KEY`

In [1]:
import sys
import os
import numpy as np
import pandas as pd
from utils import (
    ensure_collection, create_sample_dataset,
    upsert_points_batch, search_dense, print_search_results,
    create_payload_index, print_system_info
)
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, Filter, FieldCondition, MatchValue

print_system_info()
print(f"\n📍 Working directory: {os.getcwd()}")



🔧 System Information:
   Python: 3.9.6
   ✅ Qdrant Client: unknown
   ✅ NumPy: 2.0.2
   ✅ Pandas: 2.3.1
   ✅ Matplotlib: 3.9.4
   ✅ Scikit-learn: 1.6.1

🔧 Optional Dependencies:
   ✅ FastEmbed: 0.7.1
   ✅ OpenAI: 1.100.1
   ✅ Anthropic: 0.64.0

🔬 Environment: JupyterLab/Notebook detected

📍 Working directory: /Users/thierrydamiba/dsdojo


In [2]:
# Load environment variables from .env for Jupyter
try:
    from dotenv import load_dotenv, find_dotenv
    load_dotenv(find_dotenv(), override=False)
    print("🔐 Loaded environment from .env")
except Exception as e:
    print(f"⚠️ Could not load .env via python-dotenv: {e}")


🔐 Loaded environment from .env


## 📦 Auto-Install Dependencies

The cell below installs any missing packages with `pip`. If the imports in the first cell failed, run this cell and then re-run the imports.

In [3]:
# Install dependencies (will skip if already installed)
import subprocess
import sys

def install_if_missing(package_name, import_name=None):
    """Install package if not already available"""
    if import_name is None:
        import_name = package_name.replace('-', '_')
    
    try:
        __import__(import_name)
        print(f"✅ {package_name} already available")
        return True
    except ImportError:
        print(f"📦 Installing {package_name}...")
        try:
            subprocess.check_call([sys.executable, "-m", "pip", "install", package_name, "-q"])
            print(f"✅ {package_name} installed successfully")
            return True
        except subprocess.CalledProcessError as e:
            print(f"❌ Failed to install {package_name}: {e}")
            return False

# Required packages for this notebook
required_packages = [
    ("qdrant-client", "qdrant_client"),
    ("pandas", "pandas"),
    ("numpy", "numpy"), 
    ("tqdm", "tqdm"),
    ("matplotlib", "matplotlib")
]

print("🔧 Checking and installing dependencies...")
all_installed = True
for package, import_name in required_packages:
    if not install_if_missing(package, import_name):
        all_installed = False

if all_installed:
    print("\n🎉 All dependencies ready!")
else:
    print("\n⚠️ Some packages failed to install. You may need to install them manually.")

🔧 Checking and installing dependencies...
✅ qdrant-client already available
✅ pandas already available
✅ numpy already available
✅ tqdm already available
✅ matplotlib already available

🎉 All dependencies ready!


## ⚙️ Qdrant Cloud Setup (Required)

For this notebook, you must use your own Qdrant Cloud cluster.

- Sign up and create a free cluster at: [cloud.qdrant.io](https://cloud.qdrant.io)
- Obtain your cluster URL and API key, then set these environment variables before connecting:

```python
import os
os.environ["QDRANT_URL"] = "https://your-cluster.qdrant.io:6333"
os.environ["QDRANT_API_KEY"] = "your-api-key"
```

No shared webinar cluster is provided in this notebook.

In [4]:
# Workshop Configuration
COLLECTION_NAME = "workshop_fundamentals"
VECTOR_SIZE = 384  # Compatible with many embedding models

print("🔐 Qdrant Cloud Setup (Your Own Credentials)")
print("=" * 40)

# Require user-provided cluster credentials
custom_url = os.getenv("QDRANT_URL")
custom_key = os.getenv("QDRANT_API_KEY")

if not custom_url or not custom_key:
    raise RuntimeError(
        "QDRANT_URL and QDRANT_API_KEY must be set. Create a free cluster at https://cloud.qdrant.io, "
        "then set the environment variables as shown in the previous cell."
    )

print("🌐 Using your Qdrant Cloud cluster")
print(f"   URL: {custom_url}")
print(f"   API Key: {'*' * (len(custom_key)-4) + custom_key[-4:]}")

print(f"\n📁 Collection: {COLLECTION_NAME}")
print(f"🎯 Vector size: {VECTOR_SIZE}")

🔐 Qdrant Cloud Setup (Your Own Credentials)
========================================
🌐 Using your Qdrant Cloud cluster
   URL: https://a025094c-936b-4e1b-b947-67d686d20306.eu-central-1-0.aws.development-cloud.qdrant.io:6333
   API Key: ************************************************************************************************ivvs

📁 Collection: workshop_fundamentals
🎯 Vector size: 384


## 🏗️ Dataset Creation

Let's create a small, portable dataset of FAQ and documentation entries.

In [5]:
# Create sample dataset
df = create_sample_dataset(size=150, seed=42)

print(f"📊 Created dataset with {len(df)} entries")
print(f"\n📂 Categories: {df['category'].value_counts().to_dict()}")
print(f"🌍 Languages: {df['lang'].value_counts().to_dict()}")

# Preview the data
print("\n🔍 Sample entries:")
df.head()

📊 Created dataset with 150 entries

📂 Categories: {np.str_('release'): 34, np.str_('product'): 33, np.str_('faq'): 31, np.str_('policy'): 26, np.str_('howto'): 26}
🌍 Languages: {np.str_('de'): 42, np.str_('en'): 38, np.str_('fr'): 36, np.str_('es'): 34}

🔍 Sample entries:

|   | id | text | category | lang | timestamp |
|---|----|------|----------|------|-----------|
| 0 | 1 | Learn about api documentation and examples | product | en | 1728716456 |
| 1 | 2 | Deprecated features announcement - Updated ver... | release | en | 1728839193 |
| 2 | 3 | FAQ: Third-party integrations policy? | policy | fr | 1745946504 |
| 3 | 4 | Security compliance standards - Updated version | policy | es | 1741621656 |
| 4 | 5 | Performance benchmarks and metrics - Updated v... | product | es | 1746394672 |


## 🎲 Generate Embedding Vectors

For this fundamentals notebook, we'll use random normalized vectors to focus on Qdrant concepts. In real applications, you'd use actual embedding models.

In [6]:
# Generate random normalized vectors for demonstration
# In production, you would use real embeddings from models like:
# - sentence-transformers
# - OpenAI embeddings
# - FastEmbed

np.random.seed(42)
vectors = np.random.randn(len(df), VECTOR_SIZE)
# Normalize vectors for cosine similarity
vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

print(f"✅ Generated {vectors.shape[0]} vectors of dimension {vectors.shape[1]}")
print(f"📏 Vector norm check (should be ~1.0): {np.linalg.norm(vectors[0]):.4f}")

✅ Generated 150 vectors of dimension 384
📏 Vector norm check (should be ~1.0): 1.0000
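
Because these vectors are random rather than real embeddings, the similarity scores later in the notebook carry no semantic meaning. The sketch below (an illustration, not part of the workshop's `utils` module; the seed and variable names are arbitrary) shows that for unit-length vectors cosine similarity is just a dot product, and that similarities between different random vectors in 384 dimensions cluster near zero with a spread of roughly `1 / sqrt(384)`.

```python
import numpy as np

rng = np.random.default_rng(42)   # seed chosen for illustration only
dim = 384                         # same dimension as VECTOR_SIZE above

vecs = rng.standard_normal((150, dim))
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)   # normalize to unit length

# For unit vectors, cosine similarity reduces to a plain dot product.
sims = vecs @ vecs.T
self_similarity = sims[0, 0]

# Similarities between *different* random vectors are small noise around zero.
off_diagonal = sims[~np.eye(len(vecs), dtype=bool)]

print(f"self similarity:   {self_similarity:.4f}")
print(f"off-diagonal mean: {off_diagonal.mean():+.4f}")
print(f"off-diagonal std:  {off_diagonal.std():.4f}")
print(f"1 / sqrt(dim):     {1 / np.sqrt(dim):.4f}")
```

This is why, in the searches that follow, the query's own point scores 1.0 while every other point scores close to zero.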


## 🔌 Connect to Qdrant

In [7]:
# Initialize Qdrant Cloud client (requires your own credentials)
try:
    qdrant_url = os.getenv("QDRANT_URL")
    qdrant_key = os.getenv("QDRANT_API_KEY")
    if not qdrant_url or not qdrant_key:
        raise RuntimeError(
            "QDRANT_URL and QDRANT_API_KEY must be set. Create a free cluster at https://cloud.qdrant.io, "
            "then set the environment variables as shown above."
        )

    client = QdrantClient(url=qdrant_url, api_key=qdrant_key)
    # Test connection
    health = client.get_collections()
    print("🌐 Connected to Qdrant Cloud successfully!")
    print(f"📦 Existing collections: {[c.name for c in health.collections]}")
except Exception as e:
    print(f"❌ Connection failed: {e}")
    print("\n🔧 Troubleshooting:")
    print("1. Check your QDRANT_URL (should start with https://)")
    print("2. Verify your QDRANT_API_KEY from cloud.qdrant.io")
    print("3. Make sure your cluster is running (check Qdrant Cloud dashboard)")
    print("4. Try setting variables directly in Python:")
    print('   os.environ["QDRANT_URL"] = "https://your-cluster.qdrant.io:6333"')
    print('   os.environ["QDRANT_API_KEY"] = "your-api-key"')
    raise

🌐 Connected to Qdrant Cloud successfully!
📦 Existing collections: ['workshop_multi_vector', 'workshop_mmr', 'workshop_fundamentals', 'cuad_legal_clauses', 'cuad_clauses', 'workshop_health', 'workshop_hybrid', 'minicoil-collection', 'midjourney', 'acme_zephyr', 'agentic_rag_demo']


## 📚 Create Collection

Collections in Qdrant are like tables in databases - they store points (vectors + metadata).

In [8]:
# Define vector configuration
# Using a single unnamed (default) vector with cosine distance
vector_config = VectorParams(
    size=VECTOR_SIZE,
    distance=Distance.COSINE  # Good for text embeddings
)

# Create collection
ensure_collection(
    client=client,
    collection_name=COLLECTION_NAME,
    vector_config=vector_config,
    force_recreate=False  # Set to True to start fresh
)

# Get collection info
info = client.get_collection(COLLECTION_NAME)
print(f"\n📋 Collection info:")
print(f"   Points count: {info.points_count}")
print(f"   Vector size: {info.config.params.vectors.size}")
print(f"   Distance: {info.config.params.vectors.distance}")

✓ Collection 'workshop_fundamentals' already exists

📋 Collection info:
   Points count: 150
   Vector size: 384
   Distance: Cosine


## 📥 Ingest Points

Points are the core data unit in Qdrant: ID + Vector + Payload (metadata).

In [9]:
# Define which DataFrame columns to include as payload
payload_columns = ["text", "category", "lang", "timestamp"]

# Upsert points in batches
print("📤 Uploading points...")
upsert_points_batch(
    client=client,
    collection_name=COLLECTION_NAME,
    df=df,
    vectors=vectors,
    payload_cols=payload_columns,
    batch_size=50
)

# Verify upload
info = client.get_collection(COLLECTION_NAME)
print(f"\n✅ Upload complete! Collection now has {info.points_count} points")

📤 Uploading points...

Uploading points: 100%|██████████| 3/3 [00:02<00:00,  1.42it/s]

✅ Upload complete! Collection now has 150 points
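
The `upsert_points_batch` helper is not shown in this notebook. A minimal sketch of the idea, assuming it pairs each row with its vector and sends the results in fixed-size groups, could look like the following. The functions `iter_batches` and `build_point` and the toy rows are hypothetical; each dictionary mirrors the `id`, `vector`, and `payload` fields of the client's `PointStruct`, and each batch would correspond to one `client.upsert(...)` call.

```python
from itertools import islice


def iter_batches(records, batch_size):
    """Yield consecutive lists of at most `batch_size` records."""
    iterator = iter(records)
    while batch := list(islice(iterator, batch_size)):
        yield batch


def build_point(row, vector, payload_cols):
    """Turn one table row into an id / vector / payload record."""
    return {
        "id": row["id"],
        "vector": list(vector),
        "payload": {col: row[col] for col in payload_cols},
    }


# Toy stand-ins for the DataFrame rows and the vectors array.
rows = [
    {"id": i, "text": f"doc {i}", "category": "faq", "lang": "en", "timestamp": 1_700_000_000 + i}
    for i in range(1, 151)
]
toy_vectors = [[float(i), 1.0, 0.0] for i in range(1, 151)]

payload_cols = ["text", "category", "lang", "timestamp"]
points = [build_point(row, vec, payload_cols) for row, vec in zip(rows, toy_vectors)]
batches = list(iter_batches(points, batch_size=50))

print(len(batches), [len(batch) for batch in batches])
```

With 150 rows and a batch size of 50 this yields three batches, which lines up with the `3/3` shown by the progress bar above.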





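## 🏷️ Create Payload Indexes

Payload indexes speed up filtering by letting Qdrant look up the points that match a field condition directly instead of checking every point's payload.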

In [10]:
# Create indexes for fields we'll filter on
create_payload_index(client, COLLECTION_NAME, "category", "keyword")
create_payload_index(client, COLLECTION_NAME, "lang", "keyword")
create_payload_index(client, COLLECTION_NAME, "timestamp", "integer")

print("\n📖 Payload indexes created for faster filtering!")

✓ Created payload index for 'category' (keyword)
✓ Created payload index for 'lang' (keyword)
✓ Created payload index for 'timestamp' (integer)

📖 Payload indexes created for faster filtering!


## 🔍 First Vector Search

Let's perform our first similarity search using one of our vectors as the query.

In [11]:
# Use the first document's vector as our query
query_idx = 0
query_vector = vectors[query_idx]
query_text = df.iloc[query_idx]["text"]

print(f"🔍 Query text: '{query_text}'")
print(f"📂 Query category: {df.iloc[query_idx]['category']}")

# Perform search
results = search_dense(
    client=client,
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=5,
    with_payload=True
)

print_search_results(results, "🎯 Most Similar Documents")

🔍 Query text: 'Learn about api documentation and examples'
📂 Query category: product

🎯 Most Similar Documents

1. Score: 1.0000
   ID: 1
   Category: product
   Language: en
   Text: Learn about api documentation and examples...

2. Score: 0.1541
   ID: 23
   Category: faq
   Language: fr
   Text: How do I reset my password?...

3. Score: 0.1120
   ID: 26
   Category: howto
   Language: en
   Text: Configuring database connections...

4. Score: 0.1001
   ID: 146
   Category: product
   Language: de
   Text: Integration with popular tools...

5. Score: 0.0947
   ID: 147
   Category: faq
   Language: fr
   Text: Guide: How do I cancel my subscription?...


## 🎛️ Filtered Search

Now let's add filters to search within specific categories or time ranges.

In [12]:
# Create a filter for the product category
category_filter = Filter(
    must=[
        FieldCondition(
            key="category",
            match=MatchValue(value="product")
        )
    ]
)

# Search with filter
filtered_results = search_dense(
    client=client,
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=5,
    filter_condition=category_filter,
    with_payload=True
)

print_search_results(filtered_results, "🎯 Product Category Results")

# Compare result counts (both searches are capped by limit=5)
print(f"\n📊 Results comparison:")
print(f"   Unfiltered: {len(results)} results")
print(f"   Product only: {len(filtered_results)} results")


🎯 Product Category Results

1. Score: 1.0000
   ID: 1
   Category: product
   Language: en
   Text: Learn about api documentation and examples...

2. Score: 0.1001
   ID: 146
   Category: product
   Language: de
   Text: Integration with popular tools...

3. Score: 0.0922
   ID: 111
   Category: product
   Language: de
   Text: Learn about user interface design principles...

4. Score: 0.0880
   ID: 64
   Category: product
   Language: es
   Text: System requirements and compatibility...

5. Score: 0.0569
   ID: 127
   Category: product
   Language: de
   Text: FAQ: New feature: Advanced search capabilities?...

📊 Results comparison:
   Unfiltered: 5 results
   Product only: 5 results


## ⏰ Time-based Filtering

Let's filter by timestamp to find recent documents.

In [13]:
import time

# Approximate "last 6 months" as 180 days, expressed as a Unix timestamp in seconds
six_months_ago = int(time.time()) - (6 * 30 * 24 * 60 * 60)

# Create time-based filter
time_filter = Filter(
    must=[
        FieldCondition(
            key="timestamp",
            range={"gt": six_months_ago}
        )
    ]
)

# Search recent documents
recent_results = search_dense(
    client=client,
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=5,
    filter_condition=time_filter,
    with_payload=True
)

print_search_results(recent_results, "🕒 Recent Documents (Last 6 months)")

print(f"\n📊 Time filtering:")
print(f"   All documents: {len(results)} results")
print(f"   Recent only: {len(recent_results)} results")


🕒 Recent Documents (Last 6 months)

1. Score: 0.1541
   ID: 23
   Category: faq
   Language: fr
   Text: How do I reset my password?...

2. Score: 0.1001
   ID: 146
   Category: product
   Language: de
   Text: Integration with popular tools...

3. Score: 0.0947
   ID: 147
   Category: faq
   Language: fr
   Text: Guide: How do I cancel my subscription?...

4. Score: 0.0930
   ID: 69
   Category: faq
   Language: de
   Text: Learn about what payment methods do you accept?...

5. Score: 0.0883
   ID: 12
   Category: faq
   Language: de
   Text: Is there a mobile app available? - Updated version...

📊 Time filtering:
   All documents: 5 results
   Recent only: 5 results
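
The cutoff in the cell above is "now minus 180 days" in Unix seconds, and the `gt` range keeps only payloads whose `timestamp` is strictly greater. The sketch below reproduces that arithmetic with `datetime` and applies it to three timestamps taken from the sample entries shown earlier. The helper `cutoff_timestamp` and the fixed reference date are assumptions used to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone


def cutoff_timestamp(days, now=None):
    """Return the Unix timestamp (seconds) for `days` before `now` (UTC)."""
    now = now or datetime.now(timezone.utc)
    return int((now - timedelta(days=days)).timestamp())


# Fixed reference date so the result does not depend on when this runs.
fixed_now = datetime(2025, 9, 1, tzinfo=timezone.utc)
cutoff = cutoff_timestamp(180, now=fixed_now)

# Timestamps of sample entries 1, 3 and 5 from the dataset preview.
payloads = [
    {"id": 1, "timestamp": 1728716456},
    {"id": 3, "timestamp": 1745946504},
    {"id": 5, "timestamp": 1746394672},
]

# Same condition as range={"gt": cutoff}: strictly newer than the cutoff.
recent_ids = [p["id"] for p in payloads if p["timestamp"] > cutoff]
print(cutoff, recent_ids)
```

Because the cutoff depends on the current time, re-running the notebook later can return a different set of "recent" documents.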


## 🎚️ Score Threshold

Use score thresholds to filter out low-quality matches.

In [14]:
# Search with score threshold
high_quality_results = search_dense(
    client=client,
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=10,
    score_threshold=0.3,  # Only results with score >= 0.3
    with_payload=True
)

print_search_results(high_quality_results, "🎯 High Quality Matches (score >= 0.3)")

print(f"\n📊 Quality filtering:")
print(f"   All results: {len(results)} results")
print(f"   High quality: {len(high_quality_results)} results")

# Show score distribution
scores = [r.score for r in results]
print(f"\n📈 Score statistics:")
print(f"   Max: {max(scores):.4f}")
print(f"   Min: {min(scores):.4f}")
print(f"   Mean: {np.mean(scores):.4f}")


🎯 High Quality Matches (score >= 0.3)

1. Score: 1.0000
   ID: 1
   Category: product
   Language: en
   Text: Learn about api documentation and examples...

📊 Quality filtering:
   All results: 5 results
   High quality: 1 results

📈 Score statistics:
   Max: 1.0000
   Min: 0.0947
   Mean: 0.2922
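
To make the effect of the threshold concrete, the sketch below applies the same cut to the five scores returned by the first search. It is a plain-Python illustration of the idea (rank, drop anything below the threshold, then apply the limit), not Qdrant's implementation; the function name `apply_score_threshold` is hypothetical.

```python
from statistics import mean


def apply_score_threshold(scored, threshold, limit):
    """Keep (id, score) pairs scoring at least `threshold`, best first, up to `limit`."""
    ranked = sorted(scored, key=lambda item: item[1], reverse=True)
    return [item for item in ranked if item[1] >= threshold][:limit]


# (id, score) pairs copied from the first search's output.
top5 = [(1, 1.0000), (23, 0.1541), (26, 0.1120), (146, 0.1001), (147, 0.0947)]

kept = apply_score_threshold(top5, threshold=0.3, limit=10)
top5_mean = mean(score for _, score in top5)

print(kept)
print(f"mean of top 5: {top5_mean:.4f}")
```

Only the self-match survives the 0.3 cut. The mean of 0.2922 is also pulled up almost entirely by that self-match, so it says little about the other results.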


## 🌐 Multi-Language Search

Filter by language to search within specific locales.

In [15]:
# Search within English documents only
english_filter = Filter(
    must=[
        FieldCondition(
            key="lang",
            match=MatchValue(value="en")
        )
    ]
)

english_results = search_dense(
    client=client,
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=5,
    filter_condition=english_filter,
    with_payload=True
)

print_search_results(english_results, "🇺🇸 English Documents Only")

# Language distribution in results
all_langs = [r.payload["lang"] for r in results]
en_langs = [r.payload["lang"] for r in english_results]

print(f"\n🌍 Language distribution:")
print(f"   All results: {pd.Series(all_langs).value_counts().to_dict()}")
print(f"   English only: {pd.Series(en_langs).value_counts().to_dict()}")


🇺🇸 English Documents Only

1. Score: 1.0000
   ID: 1
   Category: product
   Language: en
   Text: Learn about api documentation and examples...

2. Score: 0.1120
   ID: 26
   Category: howto
   Language: en
   Text: Configuring database connections...

3. Score: 0.0860
   ID: 76
   Category: release
   Language: en
   Text: FAQ: Security patches and updates?...

4. Score: 0.0736
   ID: 65
   Category: faq
   Language: en
   Text: What are your business hours? - Updated version...

5. Score: 0.0528
   ID: 134
   Category: release
   Language: en
   Text: Bug fixes and improvements - Updated version...

🌍 Language distribution:
   All results: {'en': 2, 'fr': 2, 'de': 1}
   English only: {'en': 5}


## 🔍 Complex Filtering

Combine multiple filters using boolean logic.

In [16]:
# Complex filter: (product OR policy) AND english AND recent
complex_filter = Filter(
    must=[
        # Language must be English
        FieldCondition(
            key="lang",
            match=MatchValue(value="en")
        ),
        # Timestamp must be recent
        FieldCondition(
            key="timestamp",
            range={"gt": six_months_ago}
        )
    ],
    should=[
        # Category should be product OR policy
        FieldCondition(
            key="category",
            match=MatchValue(value="product")
        ),
        FieldCondition(
            key="category",
            match=MatchValue(value="policy")
        )
    ]
)

complex_results = search_dense(
    client=client,
    collection_name=COLLECTION_NAME,
    query_vector=query_vector,
    limit=5,
    filter_condition=complex_filter,
    with_payload=True
)

print_search_results(complex_results, "🎯 Complex Filter: Recent English Product/Policy Docs")

if complex_results:
    categories = [r.payload["category"] for r in complex_results]
    languages = [r.payload["lang"] for r in complex_results]
    
    print(f"\n✅ Filter verification:")
    print(f"   Categories found: {set(categories)}")
    print(f"   Languages found: {set(languages)}")
    print(f"   All recent: {all(r.payload['timestamp'] > six_months_ago for r in complex_results)}")
else:
    print("\n⚠️  No results found matching the complex filter criteria")


🎯 Complex Filter: Recent English Product/Policy Docs

1. Score: 0.0329
   ID: 60
   Category: product
   Language: en
   Text: Learn about integration with popular tools...

2. Score: -0.0405
   ID: 121
   Category: product
   Language: en
   Text: FAQ: Integration with popular tools?...

3. Score: -0.0439
   ID: 6
   Category: product
   Language: en
   Text: Guide: New feature: Advanced search capabilities...

✅ Filter verification:
   Categories found: {'product'}
   Languages found: {'en'}
   All recent: True


## 🖥️ Web UI Checkpoint (Optional)

You can explore the collection in the Qdrant web UI: the Qdrant Cloud dashboard for a cloud cluster, or the built-in dashboard if you're running Qdrant locally.

In [17]:
# Qdrant Cloud Dashboard Access
qdrant_url = os.getenv("QDRANT_URL", "")

if "localhost" in qdrant_url:
    print("🌐 Local Qdrant Web UI:")
    print(f"   Open: {qdrant_url.replace(':6333', ':6333/dashboard')}")
    print(f"   Navigate to collection: {COLLECTION_NAME}")
else:
    print("🌐 Qdrant Cloud Dashboard:")
    print("   1. Go to https://cloud.qdrant.io")
    print("   2. Select your cluster")
    print("   3. Use the 'Console' tab to:")
    print("      • View collection schema and points")
    print("      • Run vector searches")
    print("      • Test payload filters")
    print("      • Monitor cluster performance")
    print(f"   4. Explore collection: {COLLECTION_NAME}")
    
print("\n🔍 Try these in the dashboard:")
print("   • Browse points and payload data")
print("   • Run similarity searches")
print("   • Test different filters")
print("   • View collection statistics")

🌐 Qdrant Cloud Dashboard:
   1. Go to https://cloud.qdrant.io
   2. Select your cluster
   3. Use the 'Console' tab to:
      • View collection schema and points
      • Run vector searches
      • Test payload filters
      • Monitor cluster performance
   4. Explore collection: workshop_fundamentals

🔍 Try these in the dashboard:
   • Browse points and payload data
   • Run similarity searches
   • Test different filters
   • View collection statistics


## 📊 Summary & Key Concepts

Let's summarize what we've learned about Qdrant fundamentals.

In [18]:
# Collection statistics
final_info = client.get_collection(COLLECTION_NAME)

print("🎉 Qdrant Fundamentals Summary")
print("=" * 40)
print(f"\n📚 Collection: {COLLECTION_NAME}")
print(f"   📊 Total points: {final_info.points_count}")
print(f"   📏 Vector dimension: {final_info.config.params.vectors.size}")
print(f"   📐 Distance metric: {final_info.config.params.vectors.distance}")

print(f"\n🏷️ Payload structure:")
sample_point = client.retrieve(COLLECTION_NAME, ids=[1])[0]
for key, value in sample_point.payload.items():
    print(f"   {key}: {type(value).__name__} - {value}")

print(f"\n🔍 Search capabilities demonstrated:")
print("   ✅ Basic vector similarity search")
print("   ✅ Payload filtering (category, language, time)")
print("   ✅ Complex boolean filters (AND, OR logic)")
print("   ✅ Score thresholding")
print("   ✅ Payload indexes for fast filtering")

print(f"\n🎯 Key takeaways:")
print("   • Collections store points (vectors + metadata)")
print("   • Payload enables rich filtering capabilities")
print("   • Indexes dramatically speed up filtered searches")
print("   • Cosine distance works well for text embeddings")
print("   • Score thresholds help filter low-quality matches")

print(f"\n🚀 Ready for Notebook 2: Hybrid Search!")

🎉 Qdrant Fundamentals Summary
========================================

📚 Collection: workshop_fundamentals
   📊 Total points: 150
   📏 Vector dimension: 384
   📐 Distance metric: Cosine

🏷️ Payload structure:
   text: str - Learn about api documentation and examples
   category: str - product
   lang: str - en
   timestamp: int - 1728716456

🔍 Search capabilities demonstrated:
   ✅ Basic vector similarity search
   ✅ Payload filtering (category, language, time)
   ✅ Complex boolean filters (AND, OR logic)
   ✅ Score thresholding
   ✅ Payload indexes for fast filtering

🎯 Key takeaways:
   • Collections store points (vectors + metadata)
   • Payload enables rich filtering capabilities
   • Indexes dramatically speed up filtered searches
   • Cosine distance works well for text embeddings
   • Score thresholds help filter low-quality matches

🚀 Ready for Notebook 2: Hybrid Search!


## 🎮 Stretch Goals (Optional)

Try these additional experiments to deepen your understanding:

### 🔍 Full-Text Search with Payload Index

Add a full-text index to search within document text.

In [19]:
# Create full-text index on the text field
try:
    create_payload_index(client, COLLECTION_NAME, "text", "text")
    print("✅ Full-text index created!")
    
    # Example: Search for documents containing specific terms
    # Note: This searches in payload, not vector similarity
    text_filter = Filter(
        must=[
            FieldCondition(
                key="text",
                match={"text": "password"}  # Find docs mentioning "password"
            )
        ]
    )
    
    text_results = client.scroll(
        collection_name=COLLECTION_NAME,
        scroll_filter=text_filter,
        limit=5,
        with_payload=True
    )[0]  # scroll returns (points, next_page_offset)
    
    print(f"\n🔍 Full-text search results for 'password':")
    for i, point in enumerate(text_results, 1):
        print(f"{i}. {point.payload['text'][:80]}...")
        
except Exception as e:
    print(f"Note: Full-text search might not be available: {e}")

✓ Created payload index for 'text' (text)
✅ Full-text index created!

🔍 Full-text search results for 'password':
1. How do I reset my password?...
2. Learn about how do i reset my password?...
3. How do I reset my password?...
4. How do I reset my password?...
5. FAQ: How do I reset my password??...


### 🎯 Second Named Vector Slot

Prepare for multi-vector scenarios by adding a second vector configuration.

In [20]:
# This would typically be done when creating the collection
# For demonstration, let's create a new collection with multiple named vectors

MULTI_VECTOR_COLLECTION = "workshop_multi_vector"

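# Define named vectors. Dense vectors are declared here by name.
# Note: a sparse vector is not a dense vector with size=0. In Qdrant, sparse
# vectors are declared separately (`sparse_vectors_config` with `SparseVectorParams`),
# so only the dense slot is configured in this cell.
multi_vector_config = {
    "text_dense": VectorParams(size=384, distance=Distance.COSINE),
}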

try:
    ensure_collection(
        client=client,
        collection_name=MULTI_VECTOR_COLLECTION,
        vector_config=multi_vector_config,
        force_recreate=True
    )
    
    print(f"✅ Created multi-vector collection: {MULTI_VECTOR_COLLECTION}")
    
    # Show collection info
    info = client.get_collection(MULTI_VECTOR_COLLECTION)
    print(f"   Vector configurations:")
    if hasattr(info.config.params, 'vectors') and isinstance(info.config.params.vectors, dict):
        for name, config in info.config.params.vectors.items():
            print(f"     {name}: size={config.size}, distance={config.distance}")
    
    print(f"\n🚀 Ready for hybrid search in Notebook 2!")
    
except Exception as e:
    print(f"Note: Multi-vector setup encountered an issue: {e}")

✓ Created collection 'workshop_multi_vector'
✅ Created multi-vector collection: workshop_multi_vector
   Vector configurations:
     text_dense: size=384, distance=Cosine

🚀 Ready for hybrid search in Notebook 2!


## 🧹 Cleanup (Optional)

Uncomment to clean up collections after the workshop.

In [21]:
# Uncomment to clean up collections
# PRESERVE_COLLECTIONS = True  # Set to False to delete collections

# if not PRESERVE_COLLECTIONS:
#     try:
#         client.delete_collection(COLLECTION_NAME)
#         print(f"🗑️ Deleted collection: {COLLECTION_NAME}")
#     except Exception as e:
#         print(f"Note: Could not delete collection: {e}")
        
#     try:
#         client.delete_collection(MULTI_VECTOR_COLLECTION)
#         print(f"🗑️ Deleted collection: {MULTI_VECTOR_COLLECTION}")
#     except Exception as e:
#         print(f"Note: Could not delete collection: {e}")
# else:
#     print(f"💾 Collections preserved for next notebooks")

print(f"\n✨ Notebook 1 complete! Move on to 02_hybrid_search.ipynb")


✨ Notebook 1 complete! Move on to 02_hybrid_search.ipynb
