In [None]:
!pip install owlready2

# GraphRAG Ontology Integration Demo

This demonstration shows how GraphRAG works both with and without ontology integration, highlighting the differences, benefits, and practical applications of using ontological knowledge in a knowledge graph system.

## Overview

In this demo, we'll:
1. Process data without ontology integration
2. Process the same data with ontology integration
3. Compare search results between the two approaches
4. Visualize the differences in knowledge graphs

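GraphRAG takes advantage of the rich context in graph data structures to retrieve data: rather than matching isolated text chunks, it follows the relationships between entities to assemble the context for an answer.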


An ontology is a formal representation of knowledge. It defines the essential concepts, relationships, and attributes within a specific domain.
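
To make both ideas concrete, here is a minimal sketch in plain Python. It stores a few subject-relation-object triples (the kind of concepts and relationships an ontology defines) and retrieves the neighbourhood of an entity as context, which is the basic move behind graph-based retrieval. The triples and the function names `build_adjacency` and `retrieve_context` are hypothetical and are not part of the demo's pipeline.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object); not the demo dataset.
TRIPLES = [
    ("Audi", "produces", "Audi R8"),
    ("Audi R8", "is_a", "SportsCar"),
    ("SportsCar", "subclass_of", "Car"),
    ("BMW", "produces", "BMW i3"),
]


def build_adjacency(triples):
    """Index triples by subject so outgoing edges can be looked up quickly."""
    adjacency = defaultdict(list)
    for subject, relation, obj in triples:
        adjacency[subject].append((relation, obj))
    return adjacency


def retrieve_context(adjacency, entity, max_hops=2):
    """Collect every triple reachable from `entity` within `max_hops` edges."""
    context = []
    frontier = [entity]
    seen = {entity}
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for relation, obj in adjacency.get(node, []):
                context.append((node, relation, obj))
                if obj not in seen:
                    seen.add(obj)
                    next_frontier.append(obj)
        frontier = next_frontier
    return context


adjacency = build_adjacency(TRIPLES)
for triple in retrieve_context(adjacency, "Audi", max_hops=2):
    print(triple)
```

Raising `max_hops` pulls in more distant facts (here, that a sports car is a kind of car) at the cost of a larger context.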

## 1. Setup and Environment



First, let's set up our environment with the necessary imports:

In [17]:
import os
import asyncio
import pathlib
from typing import List

# Import Cognee utilities
from utils import (
    setup_logging,
    visualize_graph,
    get_datasets,
    get_dataset_data,
    prune_data,
    prune_system,
    add,
    search,
    SearchType,
    get_default_user,
    KnowledgeGraph,
    add_data_points,
    base_graph_rag,
)

# Import the ontology handling functions
from ontology_demo import (
owl_testing_pipeline,
owl_ontology_merging_layer
)

import logging

# Log errors only, to keep the notebook output readable
setup_logging(logging.ERROR)



## 2. Data Preparation

We'll use the same test data for both approaches:

In [18]:
async def prepare_data():
    # Clean previous data
    await prune_data()
    await prune_system(metadata=True)
    
    # Add test data from the "ontology_test_input" directory, which is expected
    # in the parent of the current working directory. os.getcwd() is used
    # because __file__ is not defined inside a notebook.
    current_dir = os.getcwd()
    parent_dir = os.path.dirname(current_dir)
    file_path = os.path.join(parent_dir, "ontology_test_input")
    await add(file_path)
    
    print("Data prepared successfully")


## 3. Standard Knowledge Graph Processing (Without Ontology)



Let's process our data using the standard GraphRAG pipeline without ontology integration:

In [19]:
async def process_without_ontology():
    # Run the standard GraphRAG pipeline (no ontology) over the prepared data
    await base_graph_rag()
    # Earlier, explicit version of the standard pipeline, kept for reference.
    # Note: extract_graph_from_data is used below but is not in the import list
    # (which has extract_content_graph), so this would need fixing before use.
    #
    # from utils import (
    #     run_tasks,
    #     Task,
    #     classify_documents,
    #     check_permissions_on_documents,
    #     extract_chunks_from_documents,
    #     extract_content_graph,
    #     get_max_chunk_tokens,
    # )
    #
    # user = await get_default_user()
    # datasets = await get_datasets(user.id)
    # if not datasets:
    #     print("No datasets found!")
    #     return
    #
    # for dataset in datasets:
    #     data_documents = await get_dataset_data(dataset_id=dataset.id)
    #     tasks = [
    #         Task(classify_documents),
    #         Task(check_permissions_on_documents, user=user, permissions=["write"]),
    #         # Extract text chunks based on the document type.
    #         Task(extract_chunks_from_documents, max_chunk_tokens=get_max_chunk_tokens()),
    #         # Generate knowledge graphs from the document chunks.
    #         Task(extract_graph_from_data, graph_model=KnowledgeGraph, task_config={"batch_size": 10}),
    #         Task(add_data_points, task_config={"batch_size": 10}),
    #     ]
    #     pipeline_run = run_tasks(tasks, dataset.id, data_documents, "standard_pipeline")
    #     async for run_status in pipeline_run:
    #         print(run_status)

    # Save graph visualization
    notebook_dir = pathlib.Path.cwd()
    output_dir = notebook_dir / ".artifacts"
    os.makedirs(output_dir, exist_ok=True)
    
    standard_graph_path = (output_dir / "standard_graph_visualization.html").resolve()
    await visualize_graph(str(standard_graph_path))
    
    print(f"Standard graph saved to: {standard_graph_path}")
    return standard_graph_path

## 4. Ontology-Enhanced Processing

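A quick comparison of RDF and OWL explains what the ontology layer adds:

- RDF is a framework for describing data; think of it as a graph-based model of subject-predicate-object triples.
- Plain RDF has no reasoning and no class restrictions (disjoint classes, etc.).
- Plain RDF has no inference: from A -> B and B -> C it cannot infer A -> C.
- OWL allows class inheritance, for example Car is a subclass of Vehicle (nice to have).
- OWL supports classes of instances (Airbnb is an instance of Company, which is closely related to Organization).
- OWL supports transitive properties (if A is an ancestor of B and B is an ancestor of C, then A is an ancestor of C). Deriving "A is the grandfather of C" from two father links needs a property chain rather than transitivity.
- OWL supports inverse properties (for example owns and ownedBy) and symmetric properties (for example marriedTo, which holds in both directions).
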
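
The inference rules in the list above can be illustrated with a small forward-chaining sketch in plain Python. It is not how an OWL reasoner or the demo's `owl_testing_pipeline` is implemented; the function `infer`, the relation names, and the toy facts are all assumptions made for illustration.

```python
def infer(triples, transitive=(), inverse_of=None):
    """Apply two OWL-style rules repeatedly until no new triple appears.

    transitive: relation names r where (a r b) and (b r c) imply (a r c).
    inverse_of: mapping r -> r_inv where (a r b) implies (b r_inv a).
    """
    inverse_of = inverse_of or {}
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new_facts = set()
        for a, r, b in facts:
            if r in inverse_of:
                new_facts.add((b, inverse_of[r], a))
            if r in transitive:
                for b2, r2, c in facts:
                    if r2 == r and b2 == b:
                        new_facts.add((a, r, c))
        if not new_facts <= facts:
            facts |= new_facts
            changed = True
    return facts


# Toy facts for illustration only.
asserted = {
    ("SportsCar", "subclass_of", "Car"),
    ("Car", "subclass_of", "Vehicle"),
    ("Audi", "owns", "Lamborghini"),
}
inferred = infer(
    asserted,
    transitive={"subclass_of"},
    inverse_of={"owns": "owned_by"},
)
print(sorted(inferred - asserted))
```

With no rules enabled, `infer` returns the asserted facts unchanged, which mirrors the plain RDF case: the graph holds only what was stated.
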
Now, let's process the same data with ontology integration:

In [20]:

async def process_with_ontology():
    # This uses the owl_testing_pipeline from ontology_demo.py
    await owl_testing_pipeline()
    
    
    # Save graph visualization
    notebook_dir = pathlib.Path.cwd()
    output_dir = notebook_dir / ".artifacts"
    os.makedirs(output_dir, exist_ok=True)
    
    ontology_graph_path = (output_dir / "ontology_graph_visualization.html").resolve()
    await visualize_graph(str(ontology_graph_path))
    
    print(f"Ontology-enhanced graph saved to: {ontology_graph_path}")
    return ontology_graph_path

## 5. Comparing Search Results


Let's execute some queries to compare the results:

In [21]:
queries = [
    "What are the exact cars produced by Audi and what are their types?",
    "What features do luxury cars have?",
    "Tell me about vehicle manufacturers and their relationships"
]

print("==== STANDARD KNOWLEDGE GRAPH SEARCH RESULTS ====")
# First, search using the standard graph
await prune_data()
await prune_system()
await prepare_data()
await process_without_ontology()

for query in queries:
    print(f"\nQuery: {query}")
    results = await search(query_type=SearchType.GRAPH_COMPLETION, query_text=query)
    print("Results:")
    for i, result in enumerate(results[:3]):
        print(f"{i+1}. {result}")

==== STANDARD KNOWLEDGE GRAPH SEARCH RESULTS ====
User 899b86ee-99a4-4c08-ac5c-bf3360c79ee1 has registered.
<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x3468f5f50>




<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x3465eef90>
Data prepared successfully
Standard graph saved to: /Users/vasilije/cognee/cognee/ontology_testing_SANDBOX/Ontology_demo/.artifacts/standard_graph_visualization.html

Query: What are the exact cars produced by Audi and what are their types?
Results:
1. The provided context does not specify the exact cars produced by Audi or their types. It only identifies Audi as a brand.

Query: What features do luxury cars have?
Results:
1. Luxury cars typically feature modern designs, advanced technology, innovative safety features, high-quality engineering, dynamic driving experiences, and a range of options such as sedans, SUVs, sports cars, and all-wheel-drive systems. Brands like Audi, BMW, Mercedes-Benz, Porsche, and Volkswagen exemplify these characteristics.

Query: Tell me about vehicle manufacturers and their relationships
Results:
1. The vehicle manufacturers mentioned are Volkswagen, Porsche, BMW, Mercedes-Ben

In [14]:
print("\n==== ONTOLOGY-ENHANCED KNOWLEDGE GRAPH SEARCH RESULTS ====")
# Now, search using the ontology-enhanced graph
await prune_data()
await prune_system()
await prepare_data()  # Prunes all previous state, then re-adds the test data
await process_with_ontology()

for query in queries:
    print(f"\nQuery: {query}")
    results = await search(query_type=SearchType.GRAPH_COMPLETION, query_text=query)
    print("Results:")
    for i, result in enumerate(results[:3]):
        print(f"{i+1}. {result}")


==== ONTOLOGY-ENHANCED KNOWLEDGE GRAPH SEARCH RESULTS ====
User 1054f57e-b701-40cc-a4c6-95f15e203229 has registered.
<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x344f93790>




<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x33ea4f650>
Data prepared successfully
Ontology loaded successfully.
<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x347ee8fd0>
<cognee.modules.pipelines.models.PipelineRun.PipelineRun object at 0x346279cd0>
The query is What are the exact cars produced by Audi and what are their types?:
Audi produces the following cars:
1. Audi R8 - Type: Sports Car
2. Audi e-tron - Type: Electric Car
3. Audi A8 - Type: Luxury Sedan
Ontology-enhanced graph saved to: /Users/vasilije/cognee/cognee/ontology_testing_SANDBOX/Ontology_demo/.artifacts/ontology_graph_visualization.html

Query: What are the exact cars produced by Audi and what are their types?
Results:
1. Audi produces the following cars:
1. Audi R8 - Type: Car
2. Audi e-tron - Type: Car
3. Audi A8 - Type: Car

Query: What features do luxury cars have?
Results:
1. Luxury cars typically feature: 1. Elegant designs, 2. Advanced safety features, 3. High-quali


## 6. Key Differences and Benefits

### Without Ontology:
- **Knowledge is limited to extracted information**: Only relationships and entities explicitly mentioned in the text are captured
- **No hierarchical understanding**: Lacks class/subclass relationships unless explicitly stated
- **Missing implicit connections**: Cannot infer relationships that weren't explicitly stated
- **Domain knowledge is limited**: No external domain knowledge beyond the processed content

### With Ontology:
- **Enhanced semantic understanding**: Integration with domain ontologies provides richer semantic context
- **Hierarchical relationships**: Class/subclass relationships from the ontology enrich the graph
- **Inference capabilities**: Can infer relationships based on ontological axioms
- **Domain knowledge enrichment**: External knowledge from the ontology supplements extracted information
- **Standardized terminology**: Entities are mapped to standardized ontology concepts
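- **Better query answering**: More comprehensive answers due to extended knowledge. In the run above, the standard graph could not list Audi's models, while the ontology-enhanced graph returned the Audi R8, e-tron, and A8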
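
The "standardized terminology" point can be sketched as a label-matching step: an extracted entity name is compared against the ontology's class names and mapped to the closest one. This is only an illustration using the standard library's `difflib`; the class list, the `normalize` and `map_to_ontology` helpers, and the 0.8 cutoff are assumptions, not the demo's actual matching logic.

```python
from difflib import get_close_matches

# Hypothetical ontology class labels; a real run would read them from the OWL file.
ONTOLOGY_CLASSES = ["Car", "SportsCar", "LuxurySedan", "ElectricCar", "Manufacturer"]


def normalize(label):
    """Lower-case and keep only letters and digits, so 'sports car' matches 'SportsCar'."""
    return "".join(ch for ch in label.lower() if ch.isalnum())


def map_to_ontology(entity, classes=ONTOLOGY_CLASSES, cutoff=0.8):
    """Return the best-matching ontology class for an extracted label, or None."""
    by_key = {normalize(name): name for name in classes}
    matches = get_close_matches(normalize(entity), list(by_key), n=1, cutoff=cutoff)
    return by_key[matches[0]] if matches else None


for extracted in ["sports car", "Luxury sedans", "banana"]:
    print(extracted, "->", map_to_ontology(extracted))
```

A stricter cutoff reduces false matches but leaves more extracted entities unmapped, so the threshold is a trade-off worth tuning per domain.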

## 7. Visualizations and Metrics

Here are some key metrics to observe in the visualizations:

1. **Node count**: The ontology-enhanced graph typically has more nodes
2. **Edge density**: More connections between nodes in the ontology version
3. **Clustering coefficient**: Often higher in the ontology version due to richer relationships
4. **Average path length**: May be shorter in the ontology version due to additional connections
5. **Connected components**: The ontology version usually has fewer isolated subgraphs
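
Some of these metrics are easy to compute directly from an edge list. The sketch below covers node count, edge density, and connected components using only the standard library; the two edge lists are hypothetical stand-ins, not exports from the demo's graphs, and `graph_metrics` is an illustrative helper. Clustering coefficient and average path length are left out for brevity; a graph library such as networkx provides them.

```python
def graph_metrics(edges):
    """Return node count, edge count, density, and connected-component count
    for an undirected simple graph given as (u, v) pairs."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set())
        neighbours.setdefault(v, set())
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)

    n = len(neighbours)
    m = sum(len(adj) for adj in neighbours.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0

    # Count components with an iterative depth-first search.
    seen = set()
    components = 0
    for start in neighbours:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
    return {"nodes": n, "edges": m, "density": density, "components": components}


# Hypothetical edge lists for illustration only.
standard_edges = [("Audi", "Brand"), ("BMW", "Brand"), ("Porsche", "SportsCar")]
ontology_edges = standard_edges + [("SportsCar", "Car"), ("Audi", "Car")]

print("standard:", graph_metrics(standard_edges))
print("ontology:", graph_metrics(ontology_edges))
```

In this toy case the extra class edges add one node and merge two isolated subgraphs into one, which is the pattern the list above describes.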

## 8. Running the Demo

Execute the following to run the complete demo:
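
A minimal sketch of such a driver, assuming the steps are passed in as coroutines. The name `run_demo` and the stand-in coroutines are hypothetical; they exist only so the sketch runs on its own.

```python
import asyncio


async def run_demo(prepare, process, search_fn, queries, top_k=3):
    """Load the data, build the graph, then answer each query.

    All steps are supplied by the caller; nothing here is cognee-specific.
    """
    await prepare()
    await process()
    answers = {}
    for query in queries:
        results = await search_fn(query)
        answers[query] = list(results[:top_k])
    return answers


# Stand-in coroutines so the sketch is runnable without the demo's dependencies.
async def fake_step():
    return None


async def fake_search(query):
    return [f"stub answer for: {query}"]


# asyncio.run works in a plain script; inside a notebook, use `await run_demo(...)`.
demo_answers = asyncio.run(
    run_demo(fake_step, fake_step, fake_search, ["What features do luxury cars have?"])
)
print(demo_answers)
```

In this notebook the call would look like `await run_demo(prepare_data, process_with_ontology, lambda q: search(query_type=SearchType.GRAPH_COMPLETION, query_text=q), queries)`, and the same call with `process_without_ontology` gives the baseline for comparison.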

