# Understanding Ontologies with Cognee

This notebook demonstrates how to work with ontologies in scientific research using the Cognee framework. We'll explore how ontologies can enhance our understanding and querying of scientific papers.

## What is an Ontology?

An ontology is a formal representation of knowledge that defines:
- Concepts within a domain
- Relationships between concepts
- Properties and attributes
- Rules and constraints

Key terms:
- **Classes**: Categories or types (e.g., Disease, Symptom)
- **Instances**: Specific examples of classes (e.g., Type 2 Diabetes)
- **Properties**: Relationships between classes/instances (e.g., hasSymptom)
- **Axioms**: Logical statements defining relationships
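
To make these terms concrete, here is a minimal sketch that stores a toy ontology as plain (subject, predicate, object) triples. The names `MetabolicDisease`, `Fatigue`, and the helper functions are made up for illustration; they are not part of Cognee or of the ontology file used later in this notebook.

```python
# Toy ontology stored as (subject, predicate, object) triples.
# All names are illustrative and not taken from a real ontology file.
TRIPLES = [
    ("Disease", "type", "Class"),
    ("Symptom", "type", "Class"),
    ("MetabolicDisease", "subClassOf", "Disease"),
    ("Type2Diabetes", "type", "MetabolicDisease"),  # an instance of a class
    ("Fatigue", "type", "Symptom"),
    ("Type2Diabetes", "hasSymptom", "Fatigue"),     # a property linking instances
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def superclasses(cls):
    """Follow subClassOf links upward (a simple transitive rule)."""
    found = []
    for parent in objects(cls, "subClassOf"):
        found.append(parent)
        found.extend(superclasses(parent))
    return found


def classes_of(instance):
    """Direct classes of an instance plus every class they inherit from."""
    result = []
    for cls in objects(instance, "type"):
        result.append(cls)
        result.extend(superclasses(cls))
    return result


print(classes_of("Type2Diabetes"))
print(objects("Type2Diabetes", "hasSymptom"))
```

The `superclasses` helper plays the role of an axiom here: nothing states directly that Type 2 Diabetes is a Disease, but the subclass rule lets us derive it.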

## Setup

First, let's install the required packages and set up our environment:

In [None]:
# Install required package
# !pip install cognee

In [1]:
import os

# Set up OpenAI API key (required for Cognee's LLM functionality)
if "LLM_API_KEY" not in os.environ:
    os.environ["LLM_API_KEY"] = "your-api-key-here"  # Replace with your API key

In [2]:
# Import required libraries
import cognee
from cognee.api.v1.search import SearchType
from cognee.shared.logging_utils import get_logger

print(cognee.__version__)

# Configure the LLM model and provider used by Cognee
cognee.config.set_llm_model("gpt-4o-mini")
cognee.config.set_llm_provider("openai")

logger = get_logger()

0.3.5-local


## Creating the Pipeline

Let's create a pipeline that will:
1. Clean existing data
2. Process scientific papers
3. Apply ontological knowledge

In [3]:
async def run_pipeline(config=None):
    # Clean existing data
    await cognee.prune.prune_data()
    await cognee.prune.prune_system(metadata=True)
    
    # Set up path to scientific papers
    scientific_papers_dir = os.path.join(
        os.path.dirname(os.path.dirname(os.path.abspath("."))), 
        "cognee",
        "examples",
        "data", 
        "scientific_papers/"
    )
    
    # Add papers to the system
    await cognee.add(scientific_papers_dir)
    
    # Cognify with optional ontology
    return await cognee.cognify(config=config)

async def query_pipeline(questions):
    answers = []
    for question in questions:
        search_results = await cognee.search(
            query_type=SearchType.GRAPH_COMPLETION,
            query_text=question,
        )
        answers.append(search_results)
    return answers
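
`query_pipeline` awaits each search one after another. If the questions are independent, they could in principle be issued concurrently. The sketch below shows that pattern with `asyncio.gather` and a semaphore to cap parallelism. `fake_search` is a hypothetical stand-in for `cognee.search`, and whether Cognee's search is safe to call concurrently is an assumption you would need to verify.

```python
import asyncio


async def fake_search(query_text):
    """Hypothetical stand-in for `cognee.search`; returns a canned answer list."""
    await asyncio.sleep(0.01)  # simulate I/O latency
    return [f"answer to: {query_text}"]


async def query_concurrently(questions, search=fake_search, limit=2):
    """Run `search` for every question, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def ask(question):
        async with semaphore:
            return await search(question)

    # gather preserves input order, so answers line up with questions
    return await asyncio.gather(*(ask(q) for q in questions))


def demo():
    questions = ["What causes X?", "What prevents Y?", "What indicates Z?"]
    return asyncio.run(query_concurrently(questions))


if __name__ == "__main__":
    print(demo())
```

Inside a notebook an event loop is already running, so you would write `await query_concurrently(questions)` instead of calling `asyncio.run`.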

## Running the Demo

Let's test our system with some medical questions, comparing results with and without ontological knowledge:

In [4]:
from cognee.modules.ontology.rdf_xml.RDFLibOntologyResolver import RDFLibOntologyResolver
from cognee.modules.ontology.ontology_config import Config
# Test questions
questions = [
    "What are common risk factors for Type 2 Diabetes?",
    "What preventive measures reduce the risk of Hypertension?",
    "What symptoms indicate possible Cardiovascular Disease?",
    "What diseases are associated with Obesity?"
]

# Path to medical ontology
ontology_path = "../examples/python/ontology_input_example/enriched_medical_ontology_with_classes.owl"  # Update with your ontology path

# Ontology configuration passed to cognee.cognify
config: Config = {
    "ontology_config": {
        "ontology_resolver": RDFLibOntologyResolver(ontology_file=ontology_path)
    }
}

# Run with ontology
print("\n--- Results WITH ontology ---\n")
await run_pipeline(config=config)
answers_with = await query_pipeline(questions)
for q, a in zip(questions, answers_with):
    print(f"Q: {q}\nA: {a}\n")

2025-10-07T20:40:24.236015 [info     ] Ontology loaded successfully from file: ../examples/python/ontology_input_example/enriched_medical_ontology_with_classes.owl [OntologyAdapter]
2025-10-07T20:40:24.236931 [info     ] Lookup built: 4 classes, 50 individuals [OntologyAdapter]

--- Results WITH ontology ---

Q: What are common risk factors for Type 2 Diabetes?
A: ['Common risk factors for Type 2 Diabetes include:\n1. Obesity\n2. Hypertension (high blood pressure)\n3. High cholesterol\n4. Smoking\n5. Cardiovascular disease\n6. Heart failure']

Q: What preventive measures reduce the risk of Hypertension?
A: ['Preventive measures to reduce the risk of hypertension include:\n1. Low sodium diet\n2. Moderate coffee consumption\n3. Regular exercise\n4. Maintaining a healthy lifestyle']

Q: What symptoms indicate possible Cardiovascular Disease?
A: ['Symptoms indicating possible cardiovascular disease include:\n1. Chest pain\n2. Shortness of breath\n3. Fatigue']

Q: What diseases are associated with Obesity?
A: ['Diseases associated with obesity include cardiovascular disease, diabetes, hypertension, high cholesterol, and high blood pressure.']


In [5]:
# Run without ontology
print("\n--- Results WITHOUT ontology ---\n")
await run_pipeline()
answers_without = await query_pipeline(questions)
for q, a in zip(questions, answers_without):
    print(f"Q: {q}\nA: {a}\n")


--- Results WITHOUT ontology ---


Q: What are common risk factors for Type 2 Diabetes?
A: ['Common risk factors for Type 2 Diabetes include obesity (as indicated by a body mass index of 25 kg/m² or more), sedentary lifestyle, poor dietary habits, smoking habits, and the presence of chronic conditions such as hypertension and high blood cholesterol. Additionally, waist circumference measurement can indicate risk levels, with increased risk seen in men with a waist over 102 cm and women over 88 cm.']

Q: What preventive measures reduce the risk of Hypertension?
A: ['Preventive measures to reduce the risk of hypertension include:\n1. **Moderate Coffee Consumption**: Regular moderate coffee consumption (1-4 cups per day) is associated with a decreased risk of developing hypertension.\n2. **Dietary Antioxidants**: Coffee contains antioxidants, such as chlorogenic acid, which may inhibit inflammation and support cardiovascular health.\n3. **Healthy Preparation Methods**: Choosing filtered coffee over boiled coffee can reduce

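Reading the two sets of answers side by side is easier with a small helper. The sketch below computes a word-level Jaccard similarity for each question and lists the words unique to each answer. It assumes each search result is a list of strings, as in the printed output above. The sample data is toy input for illustration, not output from Cognee.

```python
import re


def tokens(text):
    """Lowercase set of alphanumeric words in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def compare(questions, answers_with, answers_without):
    """Build one comparison row per question."""
    rows = []
    for question, with_onto, without_onto in zip(questions, answers_with, answers_without):
        # Assumption: each answer is a list of strings
        text_with = " ".join(with_onto)
        text_without = " ".join(without_onto)
        rows.append({
            "question": question,
            "similarity": jaccard(text_with, text_without),
            "only_with": sorted(tokens(text_with) - tokens(text_without)),
            "only_without": sorted(tokens(text_without) - tokens(text_with)),
        })
    return rows


# Toy data for illustration only; not output from Cognee.
rows = compare(
    ["Which factors matter?"],
    [["Obesity, smoking"]],
    [["Obesity, diet"]],
)
for row in rows:
    print(row["question"], round(row["similarity"], 2), row["only_with"], row["only_without"])
```

In the notebook you would call `compare(questions, answers_with, answers_without)`. Word overlap is a crude measure, and LLM answers vary between runs, so treat the numbers as a reading aid rather than an evaluation metric.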
## Visualizing the Knowledge Graph

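Let's visualize the resulting knowledge graph. Because `run_pipeline` prunes all data before each run, the graph shown is the one built by the most recent run; to see how the ontology connects medical concepts, re-run the ontology cell (In [4]) before this one: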

In [6]:
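import os
import webbrowser

from cognee.api.v1.visualize.visualize import visualize_graph

# Render the current knowledge graph to HTML
html = await visualize_graph()

# The path below assumes the default output location in the home directory
home_dir = os.path.expanduser("~")
html_file = os.path.join(home_dir, "graph_visualization.html")
display(html_file)

# Open the rendered graph in the default browser
webbrowser.open(f"file://{html_file}")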

2025-10-07T20:42:22.959024 [info     ] Graph visualization saved as /Users/daulet/graph_visualization.html [cognee.shared.logging_utils]

'/Users/daulet/graph_visualization.html'

True

## Understanding the Results

The demonstration above compares answers generated with and without an ontology. Grounding the graph in an ontology is meant to enhance the analysis by:

1. **Making Connections**: 
   - Linking related medical concepts even when not explicitly stated
   - Identifying relationships between symptoms, diseases, and risk factors

2. **Standardizing Terms**: 
   - Unifying different ways of referring to the same medical condition
   - Ensuring consistent terminology across documents

3. **Enabling Inference**: 
   - Drawing conclusions based on ontological relationships
   - Discovering implicit connections in the data
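
As an illustration of term standardization, the sketch below maps free-text entity names to canonical terms using a synonym table plus fuzzy matching from the standard library's `difflib`. The term list, synonym table, and `normalize` function are hypothetical. This is one simple way to do it, not a description of how Cognee resolves entities internally.

```python
from difflib import get_close_matches

# Hypothetical canonical terms; a real ontology would supply these.
CANONICAL_TERMS = [
    "hypertension",
    "type 2 diabetes",
    "obesity",
    "cardiovascular disease",
]

# Hypothetical synonym table mapping alternative labels to canonical terms.
SYNONYMS = {
    "high blood pressure": "hypertension",
    "t2dm": "type 2 diabetes",
}


def normalize(term, cutoff=0.8):
    """Map a free-text term to a canonical term, or None if nothing is close."""
    key = term.strip().lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    matches = get_close_matches(key, CANONICAL_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


for raw in ["High Blood Pressure", "Hypertention", "person"]:
    print(raw, "->", normalize(raw))
```

The `cutoff` parameter trades recall for precision: a lower value tolerates more spelling variation but risks merging unrelated terms.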

## Next Steps

To learn more about Cognee and ontologies:
1. Check out the [Cognee documentation](https://docs.cognee.ai/)
2. Explore more examples in the `examples` directory
3. Try creating your own domain-specific ontology

Remember to:
- Place your scientific papers in the appropriate directory
- Update the ontology path to point to your .owl file
- Replace the API key with your own OpenAI key
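
The checklist above can be automated with a small pre-flight check before running the pipeline. This is a sketch using only the standard library. The `preflight` function is a hypothetical helper, and the placeholder string matches the one used in the setup cell.

```python
import os
from pathlib import Path

PLACEHOLDER_KEY = "your-api-key-here"  # placeholder used in the setup cell


def preflight(papers_dir, ontology_path, env=None):
    """Return a list of problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []

    papers = Path(papers_dir)
    if not papers.is_dir():
        problems.append(f"papers directory not found: {papers}")
    elif not any(papers.iterdir()):
        problems.append(f"papers directory is empty: {papers}")

    ontology = Path(ontology_path)
    if ontology.suffix.lower() != ".owl":
        problems.append(f"ontology file should end in .owl: {ontology}")
    elif not ontology.is_file():
        problems.append(f"ontology file not found: {ontology}")

    key = env.get("LLM_API_KEY", "")
    if not key or key == PLACEHOLDER_KEY:
        problems.append("LLM_API_KEY is unset or still the placeholder")

    return problems
```

Calling `preflight(scientific_papers_dir, ontology_path)` before `run_pipeline` and printing any returned problems surfaces configuration mistakes before any LLM calls are made.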

In [None]:
# Only exit in interactive mode, not during GitHub Actions
import os

# Skip exit if we're running in GitHub Actions
if not os.environ.get('GITHUB_ACTIONS'):
    print("Exiting kernel to clean up resources...")
    os._exit(0)
else:
    print("Skipping kernel exit - running in GitHub Actions")
