# Understanding Ontologies with Cognee

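This notebook demonstrates how to work with ontologies in scientific research using the Cognee framework. We'll build a knowledge graph from a set of scientific papers twice, once guided by a medical ontology (an OWL file) and once without it, and compare the answers Cognee gives to the same questions.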

## What is an Ontology?

An ontology is a formal representation of knowledge that defines:
- Concepts within a domain
- Relationships between concepts
- Properties and attributes
- Rules and constraints

Key terms:
- **Classes**: Categories or types (e.g., Disease, Symptom)
- **Instances**: Specific examples of classes (e.g., Type 2 Diabetes)
- **Properties**: Relationships between classes/instances (e.g., hasSymptom)
- **Axioms**: Logical statements defining relationships
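
To make these terms concrete, here is a minimal sketch in plain Python (not Cognee's API) that stores an ontology as subject-predicate-object triples. The class `TinyOntology`, the predicate names `type`, `subClassOf` and `hasSymptom`, and the `MetabolicDisease` and `IncreasedThirst` entries are illustrative assumptions. The `is_a` check shows the kind of inference an axiom enables: Type 2 Diabetes is never stated to be a `Disease`, but that follows from the subclass chain.

```python
from dataclasses import dataclass, field


@dataclass
class TinyOntology:
    """A toy ontology stored as (subject, predicate, object) triples."""

    triples: set = field(default_factory=set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked to subject by predicate."""
        return {o for (s, p, o) in self.triples if s == subject and p == predicate}

    def superclasses(self, cls):
        """Every ancestor of cls, following subClassOf links transitively."""
        seen = set()
        stack = [cls]
        while stack:
            for parent in self.objects(stack.pop(), "subClassOf"):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def is_a(self, individual, cls):
        """True if individual is typed as cls directly or via a subclass of cls."""
        return any(
            direct == cls or cls in self.superclasses(direct)
            for direct in self.objects(individual, "type")
        )


onto = TinyOntology()
onto.add("MetabolicDisease", "subClassOf", "Disease")       # class hierarchy (axiom)
onto.add("Type2Diabetes", "type", "MetabolicDisease")       # instance of a class
onto.add("IncreasedThirst", "type", "Symptom")
onto.add("Type2Diabetes", "hasSymptom", "IncreasedThirst")  # property linking instances

print(onto.is_a("Type2Diabetes", "Disease"))        # True, inferred rather than stated
print(onto.objects("Type2Diabetes", "hasSymptom"))  # {'IncreasedThirst'}
```

Real ontologies are usually written in OWL and processed by dedicated tooling; this toy version only follows `subClassOf` chains.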

## Setup

First, let's install the required packages and set up our environment:

In [17]:
# Install required package
# !pip install cognee

In [18]:
# Set the LLM API key before importing Cognee, since configuration may be read at import time.
# Cognee reads the key from LLM_API_KEY (required for its LLM functionality, e.g. OpenAI).
import os

os.environ["LLM_API_KEY"] = "your-api-key-here"  # Replace with your API key

# Import required libraries
import cognee
from cognee.shared.logging_utils import get_logger
from cognee.api.v1.search import SearchType
from cognee.api.v1.visualize.visualize import visualize_graph

logger = get_logger()
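
Hard-coding a key in a notebook makes it easy to commit by accident. As an alternative, you could set `LLM_API_KEY` in your shell and check it before running anything. The helper below is a hypothetical sketch, not part of Cognee; it only verifies that the variable exists and no longer holds the placeholder value used above.

```python
import os

PLACEHOLDER = "your-api-key-here"


def require_api_key(var_name="LLM_API_KEY"):
    """Return the value of var_name, raising if it is unset or still the placeholder."""
    key = os.environ.get(var_name, "").strip()
    if not key or key == PLACEHOLDER:
        raise RuntimeError(
            f"{var_name} is not set to a real key; set it in your environment "
            "before running the pipeline."
        )
    return key
```

Calling `require_api_key()` once before `run_pipeline` should make a missing key fail immediately rather than at the first LLM call.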

## Creating the Pipeline

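Let's define two helper coroutines. `run_pipeline` will:
1. Clean existing data (prune stored data and system metadata)
2. Add the scientific papers to Cognee
3. Build the knowledge graph with `cognify`, optionally guided by an ontology file

`query_pipeline` then asks each question against the resulting graph using `SearchType.GRAPH_COMPLETION` and collects the answers.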

In [26]:
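async def run_pipeline(ontology_path=None):
    """Rebuild the knowledge graph from the example papers, optionally using an ontology."""
    # Clean existing data so each run starts from an empty graph
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)

    # Set up path to scientific papers. This goes two levels up from the current
    # working directory and then into "cognee", so it assumes the notebook is started
    # from a direct subdirectory of a repository checkout named "cognee".
    scientific_papers_dir = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath("."))),
        "cognee",
        "examples",
        "data",
        "scientific_papers/",
    )

    # Add papers to the system
    await cognee.add(scientific_papers_dir)

    # Cognify (extract entities and relationships into the graph), with an optional ontology
    return await cognee.cognify(ontology_file_path=ontology_path)


async def query_pipeline(questions):
    """Ask each question against the current graph and collect the search results."""
    answers = []
    for question in questions:
        search_results = await cognee.search(
            query_type=SearchType.GRAPH_COMPLETION,
            query_text=question,
        )
        answers.append(search_results)
    return answers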
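
`run_pipeline` locates the papers by going two directories up from the working directory and then into `cognee`, which only works from one particular launch location. A minimal sketch of a more explicit lookup follows. `find_papers_dir` and the `COGNEE_REPO_ROOT` environment variable are hypothetical names, and the fallback (the parent of the working directory) matches the original only when the repository folder is named `cognee`.

```python
import os
from pathlib import Path
from typing import Optional


def find_papers_dir(repo_root: Optional[str] = None) -> Path:
    """Return <repo_root>/examples/data/scientific_papers, raising if it does not exist.

    repo_root falls back to the hypothetical COGNEE_REPO_ROOT environment variable,
    then to the parent of the current working directory.
    """
    root = Path(repo_root or os.environ.get("COGNEE_REPO_ROOT") or Path.cwd().parent)
    papers = root / "examples" / "data" / "scientific_papers"
    if not papers.is_dir():
        raise FileNotFoundError(f"No scientific_papers directory found at {papers}")
    return papers
```

To use it, `run_pipeline` could set `scientific_papers_dir = os.path.join(find_papers_dir(), "")`, which keeps the trailing separator the original path has.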

## Running the Demo

Let's test our system with some medical questions, comparing results with and without ontological knowledge:

In [27]:
# Test questions
questions = [
    "What are common risk factors for Type 2 Diabetes?",
    "What preventive measures reduce the risk of Hypertension?",
    "What symptoms indicate possible Cardiovascular Disease?",
    "What diseases are associated with Obesity?"
]

# Path to the medical ontology (.owl). Relative paths are resolved against the
# notebook's current working directory, so update this to match your checkout.
ontology_path = "examples/python/ontology_input_example/enriched_medical_ontology_with_classes.owl"

# Run with ontology
print("\n--- Results WITH ontology ---\n")
await run_pipeline(ontology_path=ontology_path)
answers_with = await query_pipeline(questions)
for q, a in zip(questions, answers_with):
    print(f"Q: {q}\nA: {a}\n")


--- Results WITH ontology ---

2025-04-09T17:13:45.031340Z [info     ] No close match found for 'study' in category 'classes' [OntologyAdapter]
2025-04-09T17:13:45.031582Z [info     ] No close match found for 'coffee consumption study' in category 'individuals' [OntologyAdapter]
2025-04-09T17:13:45.031815Z [info     ] No close match found for 'nutrient' in category 'classes' [OntologyAdapter]
2025-04-09T17:13:45.032000Z [info     ] No close match found for 'caffeinated coffee' in category 'individuals' [OntologyAdapter]
2025-04-09T17:13:45.032523Z [info     ] No close match found for 'decaffeinated coffee' in category 'individuals' [OntologyAdapter]

Q: What are common risk factors for Type 2 Diabetes?
A: ['Common risk factors for Type 2 Diabetes include:\n1. **Obesity** - particularly a high body mass index (BMI) (≥30 kg/m2).\n2. **Waist circumference** - increased risk is associated with larger waist sizes.\n3. **Sedentary lifestyle** - low levels of physical activity are linked to higher risk.\n4. **Age** - the risk increases with age, especially for those over 45 years.\n5. **Diet** - poor dietary choices, including low adherence to a Mediterranean diet, may contribute.\n6. **Smoking** - current smoking status is a risk factor.\n7. **Hypertension** - having high blood pressure increases risk.\n8. **Cholesterol levels** - high cholesterol levels may also be a factor.']

Q: What preventive measures reduce the risk of Hypertension?
A: ['Preventive measures to reduce the risk of hypertension include moderate coffee consumption, which has been associated with a decr

In [23]:
# Run without ontology
print("\n--- Results WITHOUT ontology ---\n")
await run_pipeline()
answers_without = await query_pipeline(questions)
for q, a in zip(questions, answers_without):
    print(f"Q: {q}\nA: {a}\n")


--- Results WITHOUT ontology ---

2025-04-09T14:31:53.728424Z [info     ] No close match found for 'person' in category 'classes' [OntologyAdapter]
2025-04-09T14:31:53.728765Z [info     ] No close match found for 'michael f. mendoza' in category 'individuals' [OntologyAdapter]
2025-04-09T14:31:53.729061Z [info     ] No close match found for 'ralf martz sulague' in category 'individuals' [OntologyAdapter]
2025-04-09T14:31:53.729298Z [info     ] No close match found for 'therese posas-mendoza' in category 'individuals' [OntologyAdapter]
2025-04-09T14:31:53.729489Z [info     ] No close match found for 'carl j. lavie' in category 'individuals' [OntologyAdapter]

Q: What are common risk factors for Type 2 Diabetes?
A: ['Common risk factors for Type 2 Diabetes include obesity, physical inactivity, and poor diet. Additionally, coffee consumption may be related to lower mortality rates, which could indirectly influence diabetes risk. Participants aged 20 years and above are also considered in health metrics related to diabetes.']

Q: What preventive measures reduce the risk of Hypertension?
A: ['Preventive measures that reduce the risk of hypertension include moderate coffee consumption, which is linked to reduced cardiovascular disease mortality and potentially lower risks for hypertension, cholesterol issues, heart failure, and atrial fibrillation. Additionally, antioxidants may also provide a preventative effect on cardiovascular health.']

Q: What symptoms indicate possible Cardiovascular Disease?
A: ['Symptoms that may indicate possible Cardiovascular Disease include hyperten
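
With both runs finished, `answers_with` and `answers_without` hold one search result per question (a list of strings in the output above). A small helper, hypothetical and not part of Cognee, can print them side by side and report word counts as a rough measure of how much detail each answer carries. Word count says nothing about correctness, so read the answers themselves before drawing conclusions.

```python
import textwrap


def compare_answers(questions, answers_with, answers_without, width=88):
    """Print both answers for each question and return (question, words_with, words_without) rows."""
    rows = []
    for question, with_result, without_result in zip(questions, answers_with, answers_without):
        # Each search result is assumed to be a list of strings, as in the runs above
        text_with = " ".join(str(part) for part in with_result)
        text_without = " ".join(str(part) for part in without_result)
        print(f"Q: {question}")
        print("  WITH ontology:")
        print(textwrap.indent(textwrap.fill(text_with, width), "    "))
        print("  WITHOUT ontology:")
        print(textwrap.indent(textwrap.fill(text_without, width), "    "))
        print()
        rows.append((question, len(text_with.split()), len(text_without.split())))
    return rows
```

In the notebook this would be called as `compare_answers(questions, answers_with, answers_without)`.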

## Visualizing the Knowledge Graph

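Let's visualize the knowledge graph produced by the most recent `run_pipeline` call. Because every run prunes the previous graph, running the cells from top to bottom leaves the graph built without the ontology; rerun `await run_pipeline(ontology_path=ontology_path)` first if you want to inspect the ontology-enriched version: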

In [25]:
from cognee.api.v1.visualize import visualize_graph
await visualize_graph()

2025-04-09T15:25:33.504468Z [info     ] Graph visualization saved as /Users/vasilije/graph_visualization.html [cognee.shared.logging_utils]
2025-04-09T15:25:33.505762Z [info     ] The HTML file has been stored on your home directory! Navigate there with cd ~ [cognee.shared.logging_utils]
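
The log above reports that `visualize_graph` saved an HTML file to the home directory. Rather than navigating there by hand, a short helper using the standard-library `webbrowser` module can open it. `open_graph_html` is a hypothetical name, and the default file name is taken from the log message, so adjust it if your Cognee version writes the file elsewhere.

```python
import webbrowser
from pathlib import Path


def open_graph_html(path=None, launch=True):
    """Return the file:// URI of the graph visualization and optionally open it.

    Defaults to ~/graph_visualization.html, the location reported in the log.
    """
    html_path = Path(path) if path else Path.home() / "graph_visualization.html"
    uri = html_path.resolve().as_uri()  # as_uri() requires an absolute path
    if launch:
        webbrowser.open(uri)
    return uri
```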

## Understanding the Results

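Comparing the two runs illustrates how an ontology can enhance the analysis. In the runs shown above, for example, the answer on Type 2 Diabetes risk factors was more specific with the ontology (BMI ≥30 kg/m², waist circumference, age over 45, smoking, hypertension, cholesterol) than without it (obesity, physical inactivity, poor diet). Answers are generated by an LLM and vary between runs, so treat a single comparison as illustrative rather than conclusive. In general, an ontology can help by: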

1. **Making Connections**: 
   - Linking related medical concepts even when not explicitly stated
   - Identifying relationships between symptoms, diseases, and risk factors

2. **Standardizing Terms**: 
   - Unifying different ways of referring to the same medical condition
   - Ensuring consistent terminology across documents

3. **Enabling Inference**: 
   - Drawing conclusions based on ontological relationships
   - Discovering implicit connections in the data
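
One way to picture the **Standardizing Terms** step is fuzzy matching of entity names extracted from the papers against the terms an ontology defines. The log lines in the runs above (`No close match found for ... in category 'classes'`) suggest Cognee performs a comparable lookup against ontology classes and individuals, but the sketch below is an illustrative stand-in built on the standard library's `difflib`, not Cognee's code. The normalization rule (lowercase, spaces to underscores), the 0.8 cutoff, and the example terms are assumptions.

```python
import difflib


def match_to_ontology(name, ontology_terms, cutoff=0.8):
    """Return the ontology term closest to an extracted entity name, or None.

    Names are normalized to lowercase with underscores so that
    "Type 2 Diabetes" can line up with a term such as "Type_2_Diabetes".
    """
    normalized = name.strip().lower().replace(" ", "_")
    # Compare lowercased terms, but return the term as it is spelled in the ontology
    by_lower = {term.lower(): term for term in ontology_terms}
    matches = difflib.get_close_matches(normalized, list(by_lower), n=1, cutoff=cutoff)
    return by_lower[matches[0]] if matches else None


terms = ["Type_2_Diabetes", "Hypertension", "Obesity", "Cardiovascular_Disease"]
print(match_to_ontology("Type 2 Diabetes", terms))           # Type_2_Diabetes
print(match_to_ontology("hypertensions", terms))             # Hypertension
print(match_to_ontology("coffee consumption study", terms))  # None
```

A higher cutoff avoids false merges at the cost of missing spelling variants; unmatched entities simply stay as extracted.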

## Next Steps

To learn more about Cognee and ontologies:
1. Check out the [Cognee documentation](https://docs.cognee.ai/)
2. Explore more examples in the `examples` directory
3. Try creating your own domain-specific ontology

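Remember to:
- Place your scientific papers in the directory that `run_pipeline` reads from (`scientific_papers_dir`), or update that path
- Update `ontology_path` to point to your .owl file
- Replace the placeholder in `LLM_API_KEY` with your own key (for example, an OpenAI key)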
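
For step 3, a domain ontology can start very small. The sketch below writes a minimal OWL file in RDF/XML using only Python's `xml.etree.ElementTree`. The namespace `http://example.org/my_medical_ontology#`, the helper names `build_owl` and `write_owl`, and the class and individual names are hypothetical. It has not been tested against Cognee's ontology loader, so compare its structure with the example `.owl` file in the repository before relying on it.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/my_medical_ontology#"  # hypothetical namespace for your terms

# Use the conventional prefixes (rdf:, rdfs:, owl:) when serializing
ET.register_namespace("rdf", RDF)
ET.register_namespace("rdfs", RDFS)
ET.register_namespace("owl", OWL)


def build_owl(classes, individuals):
    """Return RDF/XML text.

    classes: {class_name: parent_class_name or None}
    individuals: {individual_name: class_name}
    """
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})
    for name, parent in classes.items():
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + name})
        if parent:
            # rdfs:subClassOf encodes the class hierarchy
            ET.SubElement(cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent})
    for name, class_name in individuals.items():
        ind = ET.SubElement(root, f"{{{OWL}}}NamedIndividual", {f"{{{RDF}}}about": BASE + name})
        # rdf:type states which class the individual belongs to
        ET.SubElement(ind, f"{{{RDF}}}type", {f"{{{RDF}}}resource": BASE + class_name})
    return ET.tostring(root, encoding="unicode")


def write_owl(path, classes, individuals):
    """Write the ontology to path, preceded by an XML declaration."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<?xml version="1.0" encoding="utf-8"?>\n')
        f.write(build_owl(classes, individuals))
```

For example, `write_owl("my_medical_ontology.owl", {"Disease": None, "MetabolicDisease": "Disease"}, {"Type2Diabetes": "MetabolicDisease"})`, after which `ontology_path` can point to that file.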