## Q1. Running Mage

What's the version of Mage?

`v0.9.72`

## Q2. Reading the documents

How many FAQ documents did we process?

`1`

## Q3. Chunking

How many documents (chunks) do we have in the output?

`86`
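Going from 1 FAQ document to 86 chunks is consistent with splitting the FAQ into one chunk per question/answer entry. The pipeline's chunking block isn't shown here, so the following is only a sketch under an assumed input shape: a parsed FAQ document with a `course` name and a `documents` list of `section`/`question`/`text` entries. `chunk_faq` is a hypothetical helper, and the input is toy data.

```python
from typing import Any


def chunk_faq(faq: dict[str, Any]) -> list[dict[str, str]]:
    """Split one parsed FAQ document into one chunk per question/answer entry.

    Assumed (hypothetical) input shape:
    {"course": str, "documents": [{"section": str, "question": str, "text": str}, ...]}
    """
    chunks = []
    for entry in faq["documents"]:
        # Copy the course name onto every chunk so each one is self-describing in the index
        chunks.append({
            "course": faq["course"],
            "section": entry["section"],
            "question": entry["question"],
            "text": entry["text"],
        })
    return chunks


# Toy input with two entries (illustrative data, not the real FAQ)
faq = {
    "course": "llm-zoomcamp",
    "documents": [
        {
            "section": "General course-related questions",
            "question": "When will the course be offered next?",
            "text": "Summer 2025 (via Alexey).",
        },
        {
            "section": "Module 3: X",
            "question": "What is the cosine similarity?",
            "text": "(answer text)",
        },
    ],
}

chunks = chunk_faq(faq)
print(len(chunks))  # 2: one chunk per question
```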

## Q4. Export

What's the last document ID?

`6fc3236a`
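Document IDs such as `6fc3236a` are 8 hexadecimal characters. The source doesn't show how they are generated; one common approach, assumed here, is to hash a few stable fields of each chunk with MD5 and keep the first 8 hex digits. `generate_document_id` and the exact fields it combines are hypothetical; MD5 serves only as a fast fingerprint, not for security.

```python
import hashlib


def generate_document_id(doc: dict) -> str:
    """Derive a short, content-based ID (hypothetical scheme).

    Hashes the course, the question, and the first 10 characters of the
    answer, then keeps the first 8 hex digits of the MD5 digest.
    """
    combined = f"{doc['course']}-{doc['question']}-{doc['text'][:10]}"
    return hashlib.md5(combined.encode("utf-8")).hexdigest()[:8]


doc = {
    "course": "llm-zoomcamp",
    "question": "When will the course be offered next?",
    "text": "Summer 2025 (via Alexey).",
}
print(generate_document_id(doc))  # an 8-character hex string
```

Because an ID like this depends only on the chunk's content, re-running the pipeline would give the same question the same ID in every index, which would be consistent with both indices returning `bf024675` as the top hit in Q5 and Q6.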

## Q5. Testing the retrieval

Let's use the following query: "When is the next cohort?"  
What's the ID of the top matching result?

`bf024675`

```python
from elasticsearch import Elasticsearch
from IPython.display import display  # display() is a Jupyter builtin; the explicit import keeps this runnable as a script
import pandas as pd
import traceback

# Connect to the local Elasticsearch instance
es = Elasticsearch(["http://localhost:9200"])

# Stop here if the cluster is unreachable; in Jupyter, SystemExit ends the cell but keeps the kernel alive
if es.ping():
    print("Connected to Elasticsearch")
else:
    raise SystemExit("Could not connect to Elasticsearch")

# Full-text query across the answer text, the question, and the section title
query = {
    "query": {
        "multi_match": {
            "query": "When is the next cohort?",
            "fields": ["text", "question", "section"]
        }
    },
    "size": 5  # Limit to the top 5 results
}

try:
    response = es.search(index="documents_20240818_4441", body=query)

    # Flatten each hit into one row: relevance score plus the stored fields
    results = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        results.append({
            "score": hit["_score"],
            "text": source.get("text", ""),
            "question": source.get("question", ""),
            "section": source.get("section", ""),
            "course": source.get("course", ""),
            "document_id": source.get("document_id", "")
        })

    # Show the results as a DataFrame for easy viewing
    df = pd.DataFrame(results)
    display(df)

    # Total number of matching documents (not just the 5 returned)
    print(f"Total matching documents: {response['hits']['total']['value']}")

except Exception as e:
    print(f"An error occurred: {e}")
    traceback.print_exc()
```

```text
Connected to Elasticsearch
```

| # | score | text | question | section | course | document_id |
|---|---|---|---|---|---|---|
| 0 | 8.443945 | Summer 2025 (via Alexey). | When will the course be offered next? | General course-related questions | llm-zoomcamp | bf024675 |
| 1 | 5.754293 | No, you can only get a certificate if you fini... | Certificate - Can I follow the course in a sel... | General course-related questions | llm-zoomcamp | a705279d |
| 2 | 4.399607 | This is likely to be an error when indexing th... | Returning Empty list after filtering my query ... | Module 1: Introduction | llm-zoomcamp | 190fc999 |
| 3 | 4.220145 | Cosine similarity is a measure used to calcula... | What is the cosine similarity? | Module 3: X | llm-zoomcamp | ee355823 |
| 4 | 4.070828 | The error indicates that you have not changed ... | There is an error when opening the table using... | Workshops: dlthub | llm-zoomcamp | 6cf805ca |

```text
Total matching documents: 68
```
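The search cell relies on the shape of the response: matches live under `response["hits"]["hits"]`, each with a `_score` and the stored fields in `_source`, and with the default sort by relevance the top match is the first hit. A minimal offline sketch of that extraction, run against a hand-written mock response; `hits_to_rows` and `top_document_id` are hypothetical helper names.

```python
from typing import Optional

FIELDS = ("text", "question", "section", "course", "document_id")


def hits_to_rows(response: dict) -> list[dict]:
    """Flatten search hits into rows of score plus selected _source fields."""
    rows = []
    for hit in response["hits"]["hits"]:
        row = {"score": hit["_score"]}
        # Missing fields default to "" so every row has the same keys
        row.update({field: hit["_source"].get(field, "") for field in FIELDS})
        rows.append(row)
    return rows


def top_document_id(response: dict) -> Optional[str]:
    """Return the document_id of the first (highest-scoring) hit, or None if nothing matched."""
    hits = response["hits"]["hits"]
    return hits[0]["_source"].get("document_id") if hits else None


# Hand-written mock with the same structure as a real response (values are illustrative)
mock_response = {
    "hits": {
        "total": {"value": 2},
        "hits": [
            {"_score": 8.44, "_source": {"document_id": "bf024675", "course": "llm-zoomcamp"}},
            {"_score": 5.75, "_source": {"document_id": "a705279d", "course": "llm-zoomcamp"}},
        ],
    }
}

print(top_document_id(mock_response))  # bf024675
print(len(hits_to_rows(mock_response)))  # 2
```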


## Q6. Reindexing

```python
from elasticsearch import Elasticsearch
import traceback

# Connect to the local Elasticsearch instance
es = Elasticsearch(["http://localhost:9200"])

# Stop here if the cluster is unreachable; in Jupyter, SystemExit ends the cell but keeps the kernel alive
if es.ping():
    print("Connected to Elasticsearch")
else:
    raise SystemExit("Could not connect to Elasticsearch")

try:
    # get_alias maps every matching index name to its aliases; only the names are needed here
    indices = es.indices.get_alias(index="*")

    print("Available indices:")
    for index in indices:
        print(f"- {index}")

    print(f"\nTotal number of indices: {len(indices)}")
except Exception as e:
    print(f"An error occurred: {e}")
    traceback.print_exc()
```

```text
Connected to Elasticsearch
Available indices:
- documents_20240818_1717
- documents_20240818_4441

Total number of indices: 2
```
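The listing shows two `documents_*` indices, which suggests the re-run created a new index alongside the old one. One way to pick out the new name is to snapshot the index names before and after the re-run and take the set difference. A small sketch with a hypothetical `new_indices` helper; in practice each snapshot could come from `set(es.indices.get_alias(index="*"))`, as in the cell above. The "before" snapshot is an assumption based on Q5 having queried `documents_20240818_4441`.

```python
def new_indices(before: set[str], after: set[str], prefix: str = "documents_") -> list[str]:
    """Return index names present after a pipeline re-run but not before it."""
    # Set difference keeps only freshly created indices; the prefix filter skips unrelated ones
    return sorted(name for name in after - before if name.startswith(prefix))


# Assumed snapshots: documents_20240818_4441 existed before the re-run
before = {"documents_20240818_4441"}
after = {"documents_20240818_1717", "documents_20240818_4441"}

print(new_indices(before, after))  # ['documents_20240818_1717']
```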


```python
from elasticsearch import Elasticsearch
from IPython.display import display  # display() is a Jupyter builtin; the explicit import keeps this runnable as a script
import pandas as pd
import traceback

# Connect to the local Elasticsearch instance
es = Elasticsearch(["http://localhost:9200"])

# Stop here if the cluster is unreachable; in Jupyter, SystemExit ends the cell but keeps the kernel alive
if es.ping():
    print("Connected to Elasticsearch")
else:
    raise SystemExit("Could not connect to Elasticsearch")

# Same full-text query as in Q5
query = {
    "query": {
        "multi_match": {
            "query": "When is the next cohort?",
            "fields": ["text", "question", "section"]
        }
    },
    "size": 5  # Limit to the top 5 results
}

try:
    # Query the other index this time
    response = es.search(index="documents_20240818_1717", body=query)

    # Flatten each hit into one row: relevance score plus the stored fields
    results = []
    for hit in response["hits"]["hits"]:
        source = hit["_source"]
        results.append({
            "score": hit["_score"],
            "text": source.get("text", ""),
            "question": source.get("question", ""),
            "section": source.get("section", ""),
            "course": source.get("course", ""),
            "document_id": source.get("document_id", "")
        })

    # Show the results as a DataFrame for easy viewing
    df = pd.DataFrame(results)
    display(df)

    # Total number of matching documents (not just the 5 returned)
    print(f"Total matching documents: {response['hits']['total']['value']}")

except Exception as e:
    print(f"An error occurred: {e}")
    traceback.print_exc()
```

```text
Connected to Elasticsearch
```

| # | score | text | question | section | course | document_id |
|---|---|---|---|---|---|---|
| 0 | 8.443945 | Summer 2025 (via Alexey). | When will the course be offered next? | General course-related questions | llm-zoomcamp | bf024675 |
| 1 | 5.754293 | No, you can only get a certificate if you fini... | Certificate - Can I follow the course in a sel... | General course-related questions | llm-zoomcamp | a705279d |
| 2 | 4.399607 | This is likely to be an error when indexing th... | Returning Empty list after filtering my query ... | Module 1: Introduction | llm-zoomcamp | 190fc999 |
| 3 | 4.220145 | Cosine similarity is a measure used to calcula... | What is the cosine similarity? | Module 3: X | llm-zoomcamp | ee355823 |
| 4 | 4.070828 | The error indicates that you have not changed ... | There is an error when opening the table using... | Workshops: dlthub | llm-zoomcamp | 6cf805ca |

```text
Total matching documents: 68
```
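Both indices return the same five hits in the same order with identical scores, so reindexing did not change retrieval for this query. To check that programmatically rather than by eye, you can compare the ranked `document_id` lists of the two responses. A minimal sketch using mock responses; `ranked_ids`, `compare_rankings`, and `make_response` are hypothetical helpers.

```python
def ranked_ids(response: dict) -> list:
    """document_id of each hit, in the order Elasticsearch returned them."""
    return [hit["_source"].get("document_id") for hit in response["hits"]["hits"]]


def compare_rankings(old: dict, new: dict) -> dict:
    """Summarize whether two search responses agree on the top hit and on the full order."""
    old_ids, new_ids = ranked_ids(old), ranked_ids(new)
    return {
        "same_top_hit": bool(old_ids) and bool(new_ids) and old_ids[0] == new_ids[0],
        "same_order": old_ids == new_ids,
    }


def make_response(ids):
    # Build a minimal mock response containing only the fields used above
    return {"hits": {"hits": [{"_source": {"document_id": i}} for i in ids]}}


old = make_response(["bf024675", "a705279d", "190fc999"])
new = make_response(["bf024675", "a705279d", "190fc999"])
print(compare_rankings(old, new))  # {'same_top_hit': True, 'same_order': True}
```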


For the same query, "When is the next cohort?", what's the ID of the top matching result?

`bf024675`