# Build and Visualize KGs

This notebook demonstrates how to build and visualize knowledge graphs from a text document under the guidance of different ontologies. The first cells set up the environment and import the build helpers; the remaining cells build each graph, save it, and open its visualization.

## The Knowledge Graphs

This notebook builds knowledge graphs from the same input text, guided by two different ontologies:

1. **RDB Ontology KG**: Knowledge graph guided by an ontology derived from a relational database (RIGOR methodology)
2. **Text Ontology KG**: Knowledge graph guided by an ontology derived from text documents

Each graph is built twice, once without and once with text-chunk nodes, and is saved as both an HTML visualization and a CSV file for further analysis.

In [1]:
import webbrowser
import os
import sys

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../..')))

from app_settings import PROJECT_ROOT
from src.build_kg import abuild_kg, save_graph_to_csv 

os.chdir(PROJECT_ROOT)
print(f"Changed working dir to {PROJECT_ROOT}")

Project root path: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI
046b1...
Environment variables loaded successfully


Changed working dir to C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI


In [2]:
def analyze_chunk_structure(graph_documents):
    """Print the number of entities linked to each text chunk, with a short sample."""
    graph = graph_documents[0]  # only the first graph document is inspected
    chunk_nodes = [n for n in graph.nodes if getattr(n, "type", None) == "TextChunk"]
    print(f"Number of chunk nodes: {len(chunk_nodes)}")

    for chunk in chunk_nodes:
        # Relationships that link this chunk to the entities extracted from it
        chunk_rels = [
            r for r in graph.relationships
            if r.source.id == chunk.id and r.type == "CONTAINS_ENTITY"
        ]
        print(f"{chunk.id} contains {len(chunk_rels)} entities")

        if chunk_rels:
            print("  Sample entities:")
            for rel in chunk_rels[:5]:  # show only the first 5
                print(f"  - {rel.target.id} ({getattr(rel.target, 'type', 'Unknown')})")
            if len(chunk_rels) > 5:
                print(f"  - ...and {len(chunk_rels) - 5} more entities")
        print("")
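The function above assumes that a graph document exposes `nodes` and `relationships`, that chunk nodes have the type `"TextChunk"`, and that chunks are linked to entities by `CONTAINS_ENTITY` relationships. A minimal sketch of that shape follows; `Node`, `Relationship`, `GraphDoc`, and the `DEVELOPS` relation are hypothetical stand-ins for illustration, not the classes returned by `abuild_kg`.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the graph document objects; illustrative only.
@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


@dataclass
class GraphDoc:
    nodes: list = field(default_factory=list)
    relationships: list = field(default_factory=list)


def entities_per_chunk(doc):
    """Count CONTAINS_ENTITY relationships leaving each TextChunk node."""
    counts = {n.id: 0 for n in doc.nodes if n.type == "TextChunk"}
    for rel in doc.relationships:
        if rel.type == "CONTAINS_ENTITY" and rel.source.id in counts:
            counts[rel.source.id] += 1
    return counts


chunk = Node("Chunk_0", "TextChunk")
company = Node("Granter.Ai", "Company")
agent = Node("Ai Agent", "Product")
doc = GraphDoc(
    nodes=[chunk, company, agent],
    relationships=[
        Relationship(chunk, company, "CONTAINS_ENTITY"),
        Relationship(chunk, agent, "CONTAINS_ENTITY"),
        Relationship(company, agent, "DEVELOPS"),  # entity-to-entity, not counted
    ],
)
chunk_counts = entities_per_chunk(doc)
print(chunk_counts)
```

Only the two `CONTAINS_ENTITY` edges are counted for `Chunk_0`; the entity-to-entity edge is ignored, which mirrors the filter used in `analyze_chunk_structure`.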

## RDB Ontology Knowledge Graph

The RDB ontology knowledge graph is guided by an ontology that models relational database concepts (such as tables, columns, and relationships) as classes and relations. The input to the cells below is a text document, not the database itself; the RDB ontology serves only as the schema that constrains which entity and relationship types are extracted from the text.
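One way to picture an ontology acting as a schema is as a filter over extracted triples: only triples whose classes and relation appear in the ontology are kept. The sketch below illustrates that idea in plain Python. It is not how `abuild_kg` works internally, and the class and relation names are hypothetical rather than taken from the RIGOR ontology.

```python
# Hypothetical schema: names are illustrative, not taken from the RIGOR ontology.
ALLOWED_CLASSES = {"DatabaseTable", "Column", "Person"}
ALLOWED_RELATIONS = {"HAS_COLUMN", "REFERENCES"}


def conforms(triple):
    """Return True if a (subject, relation, object) triple fits the schema.

    Subject and object are (name, class) pairs.
    """
    (_, subject_class), relation, (_, object_class) = triple
    return (
        subject_class in ALLOWED_CLASSES
        and object_class in ALLOWED_CLASSES
        and relation in ALLOWED_RELATIONS
    )


extracted = [
    (("Models", "DatabaseTable"), "HAS_COLUMN", ("Access Level", "Column")),
    (("Models", "DatabaseTable"), "POWERED_BY", ("GPU", "Hardware")),
]
kept = [t for t in extracted if conforms(t)]
print(f"Kept {len(kept)} of {len(extracted)} triples")
```

A smaller ontology therefore yields a coarser set of entity types, while a larger one allows more specific types.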

In [3]:
rdb_ont_kg_nochunks = await abuild_kg(
    input_path="data/texts/Voucher_Granter_Application.txt",
    ontology_path="results/ontologies/rdb/rigor_ontology_few_fixes.ttl",
    html_output="results/kgs/rdb_ontology_kg_nochunks.html",
    include_chunks=False
)

Input path: data/texts/Voucher_Granter_Application.txt
Ontology path exists
Using ontology constraints: 26 classes, 43 relations
Loaded content from data/texts/Voucher_Granter_Application.txt
Split into 14 chunks
Converting text to graph documents...
Merging documents...
Merged: 216 nodes, 249 relationships
Visualizing graph...
HTML visualization: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\rdb_ontology_kg_nochunks.html
Knowledge graph built successfully!


In [4]:
save_graph_to_csv(
    graph_documents=rdb_ont_kg_nochunks,
    output_file="results/kgs/rdb_ontology_kg_nochunks.csv"
)

Graph saved to: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\rdb_ontology_kg_nochunks.csv
Total nodes: 216
Total edges: 249
Chunk nodes: 0
Chunk relationships: 0


**Visualize the RDB ontology KG**

In [5]:
webbrowser.open(f"file://{os.path.abspath('results/kgs/rdb_ontology_kg_nochunks.html')}")

True
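Concatenating `file://` with an absolute Windows path produces a URL containing backslashes and a drive letter directly after the scheme, which not every browser accepts. As a hedged alternative, `pathlib.Path.as_uri()` builds a well-formed `file:///` URI on any platform. The helper name `html_file_uri` below is hypothetical.

```python
import os
from pathlib import Path


def html_file_uri(relative_path):
    """Return a well-formed file URI for a path relative to the working directory."""
    # as_uri() requires an absolute path, so resolve it first.
    return Path(os.path.abspath(relative_path)).as_uri()


uri = html_file_uri("results/kgs/rdb_ontology_kg_nochunks.html")
print(uri)
```

The resulting string can be passed to `webbrowser.open(uri)` in place of the f-string; characters such as spaces in the project path are percent-encoded automatically.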

## Text Ontology Knowledge Graph

The Text Ontology Knowledge Graph is constructed from an ontology derived from textual data sources. This graph captures entities and relationships identified in unstructured text, enabling semantic analysis and visualization of concepts extracted from documents. The ontology is created using natural language processing techniques that identify key domain concepts without requiring pre-existing database structures.

In [6]:
txt_ont_kg_nochunks = await abuild_kg(
    input_path="data/texts/Voucher_Granter_Application.txt",
    ontology_path="results/ontologies/text/text_ontology.ttl",
    html_output="results/kgs/txt_ontology_kg_nochunks.html",
    include_chunks=False
)

Input path: data/texts/Voucher_Granter_Application.txt
Ontology path exists
Using ontology constraints: 210 classes, 330 relations
Loaded content from data/texts/Voucher_Granter_Application.txt
Split into 14 chunks
Converting text to graph documents...
Merging documents...
Merged: 250 nodes, 244 relationships
Visualizing graph...
HTML visualization: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\txt_ontology_kg_nochunks.html
Knowledge graph built successfully!


In [7]:
save_graph_to_csv(
    graph_documents=txt_ont_kg_nochunks,
    output_file="results/kgs/txt_ontology_kg_nochunks.csv"
)

Graph saved to: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\txt_ontology_kg_nochunks.csv
Total nodes: 250
Total edges: 244
Chunk nodes: 0
Chunk relationships: 0


**Visualize the Text Ontology KG**

In [8]:
webbrowser.open(f"file://{os.path.abspath('results/kgs/txt_ontology_kg_nochunks.html')}")

True

# Comparing Graphs With and Without Chunk Information

We've implemented two ways to build and visualize knowledge graphs:

1. **Without chunk information** - Using `merge_graph_documents()` and `visualize_graph()`
   - Creates a standard knowledge graph with only entities and relationships
   - Does not preserve information about which text chunks entities came from

2. **With chunk information** - Using `merge_graph_documents_with_chunks()` and `visualize_graph_with_chunks()`
   - Creates a node for each text chunk (shown as a red diamond in the visualization)
   - Adds relationships between chunks and the entities they contain
   - Makes it possible to trace entities back to their source text

The parameter `include_chunks=True` enables the chunk-aware processing.
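The chunk-aware merge described above can be sketched in a few lines. This is an illustration of the idea only, not the implementation of `merge_graph_documents_with_chunks()`; the function name `merge_with_chunks` and its plain dict/tuple data model are assumptions made for the example.

```python
def merge_with_chunks(chunk_entities):
    """Merge per-chunk entity lists into one graph that keeps provenance.

    chunk_entities: list where item i holds the (entity_id, entity_type)
    pairs extracted from chunk i.
    Returns (nodes, relationships) as a dict and a list of tuples.
    """
    nodes = {}          # node id -> node type; shared entities are deduplicated
    relationships = []  # (source id, relation type, target id)
    for index, entities in enumerate(chunk_entities):
        chunk_id = f"Chunk_{index}"
        nodes[chunk_id] = "TextChunk"
        for entity_id, entity_type in entities:
            nodes.setdefault(entity_id, entity_type)
            relationships.append((chunk_id, "CONTAINS_ENTITY", entity_id))
    return nodes, relationships


nodes, relationships = merge_with_chunks([
    [("Granter.Ai", "Company"), ("Ai Agent", "Product")],
    [("Granter.Ai", "Company"), ("Specialists", "Person")],
])

# Trace an entity back to every chunk that mentions it.
sources = [src for src, _, dst in relationships if dst == "Granter.Ai"]
print(sources)
```

Because `Granter.Ai` is deduplicated into a single node but keeps one `CONTAINS_ENTITY` edge per chunk, it can be traced back to both source chunks.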

## RDB ontology-guided graph

In [9]:
rdb_ont_kg_chunks = await abuild_kg(
    input_path="data/texts/Voucher_Granter_Application.txt",
    ontology_path="results/ontologies/rdb/rigor_ontology_few_fixes.ttl",
    html_output="results/kgs/rdb_ontology_kg_chunks.html",
    include_chunks=True
)

Input path: data/texts/Voucher_Granter_Application.txt
Ontology path exists
Using ontology constraints: 26 classes, 43 relations
Loaded content from data/texts/Voucher_Granter_Application.txt
Split into 14 chunks
Converting text to graph documents...
Merging documents while preserving chunk information...
Merged: 241 nodes (14 chunk nodes), 554 relationships
Visualizing graph with chunks...
HTML visualization: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\rdb_ontology_kg_chunks.html
Knowledge graph built successfully!


In [10]:
save_graph_to_csv(
    graph_documents=rdb_ont_kg_chunks,
    output_file="results/kgs/rdb_ontology_kg_chunks.csv"
)

Graph saved to: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\rdb_ontology_kg_chunks.csv
Total nodes: 241
Total edges: 554
Chunk nodes: 14
Chunk relationships: 302


In [11]:
analyze_chunk_structure(rdb_ont_kg_chunks)

Number of chunk nodes: 14
Chunk_0:  contains 19 entities
  Sample entities:
  - Granter.Ai (Person)
  - Ai Solutions (Concept)
  - Public Funding Applications (Concept)
  - Artificial Intelligence (Concept)
  - Gans (Concept)
  - ...and 14 more entities

Chunk_1:  contains 34 entities
  Sample entities:
  - Granter.Ai (Person)
  - Ai Agent (Concept)
  - Specialists (Person)
  - Businesses (Person)
  - Public Funding (Concept)
  - ...and 29 more entities

Chunk_2:  contains 26 entities
  Sample entities:
  - Granter.Ai (Person)
  - Company (Person)
  - Ai Agent (Person)
  - Public Funds (Person)
  - Markets (Person)
  - ...and 21 more entities

Chunk_3:  contains 44 entities
  Sample entities:
  - Granter.Ai (Person)
  - Project (Concept)
  - Digital Product (Concept)
  - Ai Agent (Concept)
  - Public Funding Applications (Concept)
  - ...and 39 more entities

Chunk_4:  contains 4 entities
  Sample entities:
  - Models (Databasetable)
  - Access Levels (Databasetable)
  - Individual (Pe

**Visualize the RDB ontology KG (with chunks)**

In [12]:
webbrowser.open(f"file://{os.path.abspath('results/kgs/rdb_ontology_kg_chunks.html')}")

True

## Text ontology-guided graph

In [13]:
txt_ont_kg_chunks = await abuild_kg(
    input_path="data/texts/Voucher_Granter_Application.txt",
    ontology_path="results/ontologies/text/text_ontology.ttl",
    html_output="results/kgs/txt_ontology_kg_chunks.html",
    include_chunks=True
)

Input path: data/texts/Voucher_Granter_Application.txt
Ontology path exists
Using ontology constraints: 210 classes, 330 relations
Loaded content from data/texts/Voucher_Granter_Application.txt
Split into 14 chunks
Converting text to graph documents...
Merging documents while preserving chunk information...
Merged: 259 nodes (14 chunk nodes), 495 relationships
Visualizing graph with chunks...
HTML visualization: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\txt_ontology_kg_chunks.html
Knowledge graph built successfully!


In [14]:
save_graph_to_csv(
    graph_documents=txt_ont_kg_chunks,
    output_file="results/kgs/txt_ontology_kg_chunks.csv"
)

Graph saved to: C:\Users\tiago\Documents\Granter ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\txt_ontology_kg_chunks.csv
Total nodes: 259
Total edges: 495
Chunk nodes: 14
Chunk relationships: 300
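The totals reported for the two chunk-aware graphs mix entity content with chunk bookkeeping. The short calculation below separates them, assuming that the reported totals include the chunk nodes and the chunk relationships (the summaries do not state this explicitly).

```python
# Totals copied from the save_graph_to_csv summaries printed in this notebook.
summaries = {
    "rdb": {"nodes": 241, "edges": 554, "chunk_nodes": 14, "chunk_rels": 302},
    "text": {"nodes": 259, "edges": 495, "chunk_nodes": 14, "chunk_rels": 300},
}

breakdown = {}
for name, s in summaries.items():
    # Assumption: chunk nodes and chunk relationships are part of the totals.
    entity_nodes = s["nodes"] - s["chunk_nodes"]
    entity_rels = s["edges"] - s["chunk_rels"]
    breakdown[name] = (entity_nodes, entity_rels)
    print(f"{name}: {entity_nodes} entity nodes, {entity_rels} entity-to-entity relationships")
```

Under that assumption, the RDB-guided graph has 227 entity nodes and 252 entity-to-entity relationships, and the text-guided graph has 245 and 195. These differ slightly from the runs without chunks (216/249 and 250/244), since each call to `abuild_kg` extracts the graph again.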


In [15]:
analyze_chunk_structure(txt_ont_kg_chunks)

Number of chunk nodes: 14
Chunk_0:  contains 12 entities
  Sample entities:
  - Granter.Ai (Company)
  - Ai Solutions For Public Funding (Business)
  - Ai Agent For Funding Applications (Product)
  - Generative Adversarial Networks (Aicontentgenerationmethodology)
  - Adversarial In-Context Learning (Aicontentgenerationmethodology)
  - ...and 7 more entities

Chunk_1:  contains 29 entities
  Sample entities:
  - Granter.Ai (Organization)
  - Ai Agent (Agent)
  - Specialists (Person)
  - Businesses (Organization)
  - Funding Opportunities (Businessgoal)
  - ...and 24 more entities

Chunk_2:  contains 33 entities
  Sample entities:
  - Granter.Ai (Company)
  - Ai Agent (Digitalproduct)
  - Public Funds Application Sector (Sector)
  - Company'S Growth Support (Businessgoal)
  - Innovative Digital Product (Digitalproduct)
  - ...and 28 more entities

Chunk_3:  contains 29 entities
  Sample entities:
  - Granter.Ai (Company)
  - Project (Project)
  - Digital Product (Digitalproduct)
  - Tec

**Visualize the Text Ontology KG (with chunks)**

In [16]:
webbrowser.open(f"file://{os.path.abspath('results/kgs/txt_ontology_kg_chunks.html')}")

True