# Multi-Agent Extraction and Neo4j Loading Demo

This notebook demonstrates how to use the `EntityExtractor` and `EntityLinker` classes to extract entities from text and load them into Neo4j. Running text through these classes ensures that your own prompts and linking logic are applied before any data reaches the graph.

In [None]:
import os
import sys
from omegaconf import OmegaConf

# Ensure we can import from the parent directory
sys.path.append("..")

from extractor import EntityExtractor
from linker import EntityLinker
from graph_loader import GraphLoader
from prompt_manager import PromptManager

# Configuration (Mocking what Hydra usually does)
conf = OmegaConf.create({
    "model": "gpt-3.5-turbo",
    "openai_api_key": os.getenv("OPENAI_API_KEY", "sk-placeholder"),
    "prompts": "default",
    "linking_prompt": "linking"
})

# Initialize Components
prompt_manager = PromptManager(conf)
extractor = EntityExtractor(prompt_manager, api_key=conf.openai_api_key, model=conf.model)
linker = EntityLinker(prompt_manager, api_key=conf.openai_api_key, model=conf.model)

# Graph Loader Connection
neo4j_uri = os.getenv("NEO4J_URI", "bolt://neo4j:7687")
neo4j_user = os.getenv("NEO4J_USER", "neo4j")
neo4j_password = os.getenv("NEO4J_PASSWORD", "password")

# Initialize Loader (Ensure Neo4j is running!)
# loader = GraphLoader(neo4j_uri, neo4j_user, neo4j_password)
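
Because the loader needs a running Neo4j instance, it can help to check that the Bolt port is reachable before uncommenting the `GraphLoader` line. The following is a minimal sketch using only the standard library; `parse_bolt_uri` and `neo4j_reachable` are hypothetical helper names, not part of this project, and the check only confirms that something is listening on the port, not that the credentials are valid.

```python
import socket
from urllib.parse import urlparse


def parse_bolt_uri(uri, default_port=7687):
    """Split a URI such as bolt://neo4j:7687 into (host, port)."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"No host found in URI: {uri!r}")
    # Fall back to the conventional Bolt port when none is given.
    return parsed.hostname, parsed.port or default_port


def neo4j_reachable(uri, timeout=2.0):
    """Return True if a TCP connection to the URI's host and port succeeds.

    This only checks that something is listening; it does not authenticate.
    """
    host, port = parse_bolt_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False
```

With these helpers, the loader could be created only when `neo4j_reachable(neo4j_uri)` returns `True`, so that a missing database shows up here rather than as a failure in the loading step.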

## 1. Define Sample Data
Here we simulate data that might come from your multi-instance setup or a Hugging Face dataset.

In [None]:
sample_text = """
Apple Inc. is likely to launch the new iPhone 16 in September 2024. 
Tim Cook mentioned that AI features will be a key selling point. 
Competitors like Samsung are also ramping up their Galaxy AI capabilities.
"""

doc_id = "demo_doc_001"
category = "Technology"

## 2. Extraction
We use the `EntityExtractor` to identify nodes and initial relationships.

In [None]:
print("Running Extraction...")
extracted_data = extractor.extract_entities(sample_text, category)

print(f"Found {len(extracted_data.get('nodes', []))} nodes.")
print(extracted_data.get('nodes', []))
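
To inspect an extraction result beyond the raw node list, a small summary helper can be useful. The sketch below assumes the result is a dict with a `nodes` list (as used above); the `relationships` key, the `type` field on each node, and the example data are assumptions made for illustration and may not match what `EntityExtractor` actually returns.

```python
from collections import Counter


def summarize_extraction(data):
    """Count nodes per type and relationships in an extraction result.

    Assumes `data` is a dict with a "nodes" list and, hypothetically,
    a "relationships" list; nodes are assumed to be dicts that may
    carry a "type" field.
    """
    nodes = data.get("nodes", [])
    relationships = data.get("relationships", [])
    type_counts = Counter(node.get("type", "UNKNOWN") for node in nodes)
    return {
        "node_count": len(nodes),
        "relationship_count": len(relationships),
        "types": dict(type_counts),
    }


# Hypothetical example of what an extraction result might look like.
example = {
    "nodes": [
        {"id": "apple_inc", "name": "Apple Inc.", "type": "Organization"},
        {"id": "iphone_16", "name": "iPhone 16", "type": "Product"},
        {"id": "samsung", "name": "Samsung", "type": "Organization"},
    ],
    "relationships": [
        {"source": "apple_inc", "target": "iphone_16", "type": "LAUNCHES"},
    ],
}
summary = summarize_extraction(example)
print(summary)
```

If the real output uses different keys, only the two `data.get(...)` calls and the `type` lookup need to change.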

## 3. Linking
Entity linking is crucial for resolving ambiguities, such as several mentions that refer to the same entity, and for connecting the graph effectively.

In [None]:
print("Running Linking...")
linked_data = linker.link_entities(extracted_data, category)

print(f"Nodes after linking: {len(linked_data.get('nodes', []))}")
print(linked_data.get('nodes', []))
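
To make the idea of linking concrete, here is a deliberately simple sketch that merges nodes whose names match after normalization. It is not the logic of `EntityLinker`, which is driven by prompts; `normalize_name`, `merge_duplicate_nodes`, and the `id` and `name` fields are assumptions introduced for illustration.

```python
def normalize_name(name):
    """Lowercase a name and collapse whitespace for comparison."""
    return " ".join(name.lower().split())


def merge_duplicate_nodes(nodes):
    """Keep the first node seen for each normalized name.

    Returns the deduplicated list and a mapping from every original
    node id to the id of the node that was kept.
    """
    kept = {}
    id_map = {}
    for node in nodes:
        key = normalize_name(node["name"])
        if key not in kept:
            kept[key] = node
        id_map[node["id"]] = kept[key]["id"]
    return list(kept.values()), id_map


# Hypothetical input with two spellings of the same organization.
nodes = [
    {"id": "n1", "name": "Apple Inc."},
    {"id": "n2", "name": "apple  inc."},
    {"id": "n3", "name": "Samsung"},
]
merged, id_map = merge_duplicate_nodes(nodes)
print(len(merged), id_map)
```

The returned `id_map` is what lets relationships that pointed at a dropped duplicate be redirected to the surviving node.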

## 4. Loading to Neo4j
Finally, we use the `GraphLoader` to push the structured data into the database.

In [None]:
# Uncomment to run if Neo4j is accessible
# print(f"Loading data for {doc_id}...")
# loader.load_graph(linked_data, doc_id)
# print("Data loaded successfully!")
# loader.close()
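
For readers curious about what a load step could look like at the Cypher level, the sketch below builds parameterized `MERGE` statements without connecting to a database. It is not the implementation of `GraphLoader`; the `Entity` label, the `RELATED_TO` relationship type, the property names, and both helper functions are assumptions. `MERGE` is used so that re-running the notebook does not create duplicate nodes, and the relationship type is stored as a property because Cypher does not allow relationship types to be passed as parameters.

```python
def build_node_merge(node, doc_id):
    """Return a (query, params) pair that upserts one node by id."""
    query = (
        "MERGE (n:Entity {id: $id}) "
        "SET n.name = $name, n.type = $type, n.doc_id = $doc_id"
    )
    params = {
        "id": node["id"],
        "name": node.get("name"),
        "type": node.get("type"),
        "doc_id": doc_id,
    }
    return query, params


def build_relationship_merge(rel):
    """Return a (query, params) pair that upserts one relationship."""
    query = (
        "MATCH (a:Entity {id: $source}), (b:Entity {id: $target}) "
        "MERGE (a)-[r:RELATED_TO {type: $type}]->(b)"
    )
    params = {
        "source": rel["source"],
        "target": rel["target"],
        "type": rel.get("type"),
    }
    return query, params


# Hypothetical node, paired with the doc_id used in this notebook.
query, params = build_node_merge(
    {"id": "apple_inc", "name": "Apple Inc.", "type": "Organization"},
    "demo_doc_001",
)
print(query)
print(params)
```

Each pair could then be passed to a driver session's `run(query, params)` call; passing values as parameters rather than formatting them into the query string avoids quoting problems with names such as "Apple Inc.".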