# Build and Visualize KGs

This notebook demonstrates the process of building and visualizing knowledge graphs from different ontologies and data sources. The initial section sets up the environment, loads required modules, and prepares ontology files for validation and further processing.

## The Knowledge Graphs

This notebook builds two knowledge graphs:

1. **RDB Ontology KG**: Knowledge graph generated from a relational database ontology (RIGOR methodology)
2. **Text Ontology KG**: Knowledge graph generated from text documents

The graphs are saved as both HTML visualizations and CSV files for further analysis.

In [1]:
import os
import sys

sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '../..')))

from app_settings import PROJECT_ROOT
from src.build_kg import abuild_kg, save_graph_to_csv, extract_local_names 
from src.txt_ontology_learning import validate_turtle_string

os.chdir(PROJECT_ROOT)
print(f"Changed working dir to {PROJECT_ROOT}")

Project root path: C:\Users\tiago\Documents\Granter Ai Internship\Implementation\Code\KGs_for_Vertical_AI
046b1...
Environment variables loaded successfully


  from .autonotebook import tqdm as notebook_tqdm


Changed working dir to C:\Users\tiago\Documents\Granter Ai Internship\Implementation\Code\KGs_for_Vertical_AI


[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\tiago\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [2]:

# Now we can open files using relative paths directly
with open('results/ontologies/text/text_ontology_few_fixes.txt', 'r', encoding='utf-8') as f:
    ontology = f.read()

# Add prefixes since gpt_results_to_ttl doesn't include them
standard_prefixes = [
    '@prefix : <http://example.org/> .',
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
    "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
    "@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> ."
]
print(ontology)
ontology = "\n".join(standard_prefixes) + "\n\n" + ontology
is_valid, error_message = validate_turtle_string(ontology)

if not is_valid:
    print("\n\nOntology is not valid Turtle syntax. Error:")
    print(error_message)

:VouchersParaStartupsNovosProdutosDigitaisTecnológicosGranterAi a rdfs:Class .
:AtividadeEconómicaPrincipalBeneficiário a rdfs:Class .
:ProjetoProposto a rdfs:Class .
:SoftwareProprietário a rdfs:Class .
:SetorTecnológico a rdfs:Class .
:AgenteDeIA a rdfs:Class .
:MetodologiasIA a rdfs:Class .
:RedesGenerativasAdversariais a rdfs:Class .
:AdversarialInContextLearning a rdfs:Class .
:FerramentaInteligente a rdfs:Class .
:TipologiaApoio a rdfs:Class .
:FaseInicialCrescimento a rdfs:Class .
:CrescimentoAcelerado a rdfs:Class .
:IndicadoresDeDesempenho a rdfs:Class .
:ReconhecimentoANI a rdfs:Class .
:ProdutoDigital a rdfs:Class .
:ModeloDeNegócio a rdfs:Class .
:EstratégiaDeInternacionalização a rdfs:Class .
:EquipeGestão a rdfs:Class .
:LiderançaProjeto a rdfs:Class .
:CEO a rdfs:Class .
:CTO a rdfs:Class .
:HeadOfOperations a rdfs:Class .
:TechLead a rdfs:Class .
:HeadOfSales a rdfs:Class .
:EquipeMultidisciplinar a rdfs:Class .
:PropriedadeIntelectual a rdfs:Class .
:ImpactoCompetitivi
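
The cell above prepends all five `@prefix` declarations unconditionally, which would duplicate any prefix the ontology file already declares. A minimal sketch of a more defensive variant follows; the helper name `add_missing_prefixes` is hypothetical and is not part of `src`.

```python
import re

# Matches the label of a Turtle prefix declaration, e.g. 'owl' in
# '@prefix owl: <...> .' (the default prefix ':' yields an empty label).
PREFIX_LABEL = re.compile(r"^\s*@prefix\s+([^\s:]*):", flags=re.MULTILINE)


def add_missing_prefixes(ttl_text, prefix_lines):
    """Prepend only the @prefix lines whose label is not declared yet."""
    declared = set(PREFIX_LABEL.findall(ttl_text))
    missing = [
        line for line in prefix_lines
        if PREFIX_LABEL.match(line).group(1) not in declared
    ]
    if not missing:
        return ttl_text
    return "\n".join(missing) + "\n\n" + ttl_text
```

With this helper, `ontology = add_missing_prefixes(ontology, standard_prefixes)` could stand in for the manual `"\n".join(...)` step before calling `validate_turtle_string`.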

## RDB Ontology Knowledge Graph

The RDB ontology knowledge graph is built with an ontology that models relational database concepts, such as tables, columns, and relationships, as classes and relations. In the next cell, the text in `data/texts/application_example.txt` is mapped onto this ontology; the RDB ontology serves only as the schema that constrains which node and relation types may be extracted.
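
To illustrate what "serving as the schema" can mean, here is a minimal sketch that keeps only extracted triples whose types appear in the ontology. It is an assumption about the general idea, not a description of how `abuild_kg` applies its constraints; the function name `filter_by_schema` and the sample types `Table`, `Column`, and `hasColumn` are hypothetical.

```python
def filter_by_schema(triples, allowed_classes, allowed_relations):
    """Keep triples whose node types and relation type are in the schema.

    triples: iterable of (source_type, relation, target_type) tuples.
    """
    return [
        (source, relation, target)
        for (source, relation, target) in triples
        if source in allowed_classes
        and target in allowed_classes
        and relation in allowed_relations
    ]


# Hypothetical schema and extraction result, for illustration only.
allowed_classes = {"Table", "Column"}
allowed_relations = {"hasColumn"}
extracted = [
    ("Table", "hasColumn", "Column"),
    ("Table", "mentions", "Person"),  # dropped: not covered by the schema
]
kept = filter_by_schema(extracted, allowed_classes, allowed_relations)
```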

In [2]:
rdb_ont_kg = await abuild_kg(
    input_path="data/texts/application_example.txt",
    ontology_path="results/ontologies/rdb/rigor_ontology_few_fixes.ttl",
    html_output="results/kgs/rdb_ontology_kg.html",
    include_chunks=True
)

file:///C:/Users/tiago/Documents/Granter Ai Internship/Implementation/Code/KGs_for_Vertical_AI/results/ontologies/rdb/rigor_ontology_few_fixes.ttl does not look like a valid URI, trying to serialize this will break.


Input path: data/texts/application_example.txt
Ontology path exists
Using ontology constraints: 26 classes, 43 relations
Loaded content from data/texts/application_example.txt
Split into 14 chunks
Graph transformer settings: {'llm': AzureChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001F3510D3C80>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001F3517CE300>, root_client=<openai.lib.azure.AzureOpenAI object at 0x000001F351217AA0>, root_async_client=<openai.lib.azure.AsyncAzureOpenAI object at 0x000001F3513C5C10>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'), disabled_params={'parallel_tool_calls': None}, azure_endpoint='https://granteraistudi5497094960.cognitiveservices.azure.com/openai/deployments/resarch-gpt-4.1-nano/chat/completions?api-version=2025-01-01-preview', deployment_name='gpt-4.1-nano', openai_api_version='2025-01-01-preview', openai_api_type='azure'), 'a

### Comparing Graphs With and Without Chunk Information

We've implemented two ways to build and visualize knowledge graphs:

1. **Without chunk information** - Using `merge_graph_documents()` and `visualize_graph()`
   - Creates a standard knowledge graph with only entities and relationships
   - Does not preserve information about which text chunks entities came from

2. **With chunk information** - Using `merge_graph_documents_with_chunks()` and `visualize_graph_with_chunks()`
   - Creates nodes for each text chunk (shown as red diamonds in visualization)
   - Adds relationships between chunks and the entities they contain
   - Makes it possible to trace entities back to their source text

The parameter `include_chunks=True` enables the chunk-aware processing.
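
To make the difference concrete, here is a minimal, library-free sketch of chunk-aware merging. The `Node` and `Relationship` dataclasses, the `chunk_<index>` id scheme, and the `merge_with_chunks` function are hypothetical stand-ins, not the implementation in `src.build_kg`; only the `TextChunk` node type and the `CONTAINS_ENTITY` relationship type are taken from this notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    source: Node
    target: Node
    type: str


def merge_with_chunks(per_chunk):
    """Merge per-chunk extractions into one graph and record provenance.

    per_chunk: list of (nodes, relationships) tuples, one per text chunk.
    Returns (nodes, relationships) with one TextChunk node per chunk and a
    CONTAINS_ENTITY relationship from each chunk to every entity found in it.
    """
    nodes = {}    # (id, type) -> Node, so repeated entities are merged
    rels = set()  # a set removes duplicate relationships
    for index, (chunk_nodes, chunk_rels) in enumerate(per_chunk):
        chunk = Node(id=f"chunk_{index}", type="TextChunk")
        nodes[(chunk.id, chunk.type)] = chunk
        for node in chunk_nodes:
            nodes.setdefault((node.id, node.type), node)
            rels.add(Relationship(chunk, node, "CONTAINS_ENTITY"))
        rels.update(chunk_rels)
    return list(nodes.values()), list(rels)
```

Dropping the `TextChunk` node and the `CONTAINS_ENTITY` lines from the loop gives the plain merge, which is why that variant cannot trace an entity back to its source text.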

In [3]:
save_graph_to_csv(
    graph_documents = rdb_ont_kg,
    output_file="results/kgs/rdb_ontology_kg.csv"
)

Graph saved to: C:\Users\tiago\Documents\Granter Ai Internship\Implementation\Code\KGs_for_Vertical_AI\results\kgs\rdb_ontology_kg.csv
Total nodes: 243
Total edges: 490
Chunk nodes: 14
Chunk relationships: 270
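
For readers without access to `src.build_kg`, the following is a minimal sketch of writing graph edges to CSV with the standard library. The function name `write_edges_csv` and the three-column layout are assumptions; the actual file produced by `save_graph_to_csv` may use different columns.

```python
import csv


def write_edges_csv(edges, output_file):
    """Write (source, relation, target) triples to CSV; return the row count."""
    with open(output_file, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "relation", "target"])  # assumed header
        writer.writerows(edges)
    return len(edges)
```

Opening the file with `newline=""` avoids blank rows on Windows, and `encoding="utf-8"` keeps accented identifiers such as `ModeloDeNegócio` intact.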


In [None]:
# Analyze the chunk structure in the graph
chunk_nodes = [n for n in rdb_ont_kg[0].nodes if hasattr(n, 'type') and n.type == "TextChunk"]
print(f"Number of chunk nodes: {len(chunk_nodes)}")

# Count relationships per chunk
for chunk in chunk_nodes:
    chunk_rels = [r for r in rdb_ont_kg[0].relationships 
                 if r.source.id == chunk.id and r.type == "CONTAINS_ENTITY"]
    print(f"Chunk {chunk.id} contains {len(chunk_rels)} entities")
    
    # Sample the first 5 entities in this chunk
    if len(chunk_rels) > 0:
        print("  Sample entities:")
        for rel in chunk_rels[:5]:  # Show only first 5
            print(f"  - {rel.target.id} ({rel.target.type if hasattr(rel.target, 'type') else 'Unknown'})")
        if len(chunk_rels) > 5:
            print(f"  - ...and {len(chunk_rels) - 5} more entities")
    print("")

**Visualize the RDB KG**

In [None]:
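import webbrowser
from pathlib import Path

# as_uri() builds a well-formed file:// URL (drive letters, spaces, backslashes)
webbrowser.open(Path('results/kgs/rdb_ontology_kg.html').resolve().as_uri())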

## Text Ontology Knowledge Graph

The Text Ontology Knowledge Graph is constructed from an ontology derived from textual data sources. This graph captures entities and relationships identified in unstructured text, enabling semantic analysis and visualization of concepts extracted from documents. The ontology is created using natural language processing techniques that identify key domain concepts without requiring pre-existing database structures.

In [3]:
# Extract class, relation, and attribute names from the ontology
# This helps with understanding what's in the ontology file
extract_local_names('results/ontologies/text/text_ontology_10sens.ttl')

(['AdversarialInContextLearning',
  'AgenteDeIA',
  'ApoioAoArranqueOuAoCrescimento',
  'AtividadeEconómicaPrincipalBeneficiário',
  'Certificacoes',
  'Equipamentos',
  'EquipeDeGestao',
  'Escalabilidade',
  'ExpansaoComercial',
  'FerramentaInteligente',
  'GestaoDeProduto',
  'ImpactoNaCompetitividade',
  'Investimentos',
  'LiderancaDoProjeto',
  'MetodologiasIA',
  'ModeloDeNegócio',
  'Projeto',
  'PropostaDeValor',
  'PropriedadeIntelectual',
  'ProtecaoDePropriedadeIntelectual',
  'RecursosHumanos',
  'RedesGenerativasAdversariais',
  'ServicosExternos',
  'SoftwareProprietário',
  'Tecnologia',
  'Tipologia',
  'VouchersParaStartupsNovosProdutosDigitaisTecnológicosGranterAi'],
 ['aplicaAdversarialInContextLearning',
  'aplicadoA',
  'apoia',
  'baseadoEm',
  'classificaTipologia',
  'contaComEquipe',
  'contrataServicosExternos',
  'dedicaAoDesenvolvimento',
  'desenvolveSoftwareProprietário',
  'destina',
  'disponibilizaRecursosHumanos',
  'elaboracaoDeCandidaturas',
  'emp
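
The output above lists the local names of the ontology terms, apparently sorted within each list. As a rough illustration of what "local name" means, here is a minimal sketch that strips the namespace from full URIs; the helpers `local_name` and `local_names` are hypothetical and are not the implementation of `extract_local_names`.

```python
def local_name(uri):
    """Return the part after '#', or else after the last '/'."""
    uri = uri.rstrip("/#")
    for separator in ("#", "/"):
        if separator in uri:
            return uri.rsplit(separator, 1)[1]
    return uri


def local_names(uris):
    """Sorted, de-duplicated local names for a collection of URIs."""
    return sorted({local_name(u) for u in uris})
```

For example, `local_name("http://example.org/AgenteDeIA")` yields `AgenteDeIA` under the `http://example.org/` namespace used for the default prefix earlier in this notebook.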

**Build the Text Ontology KG**

In [4]:
txt_ont_kg = await abuild_kg(
    input_path="data/texts/application_example.txt",
    ontology_path="results/ontologies/text/text_ontology_few_fixes.ttl",
    html_output="results/kgs/txt_ontology_kg.html",
    include_chunks=True
)

Input path: data/texts/application_example.txt
Loaded content from data/texts/application_example.txt
Split into 14 chunks
No ontology constraints provided or loaded!!!
Graph transformer settings: {'llm': AzureChatOpenAI(client=<openai.resources.chat.completions.completions.Completions object at 0x000001C126B55640>, async_client=<openai.resources.chat.completions.completions.AsyncCompletions object at 0x000001C126BB4470>, root_client=<openai.lib.azure.AzureOpenAI object at 0x000001C1264E91F0>, root_async_client=<openai.lib.azure.AsyncAzureOpenAI object at 0x000001C126733560>, temperature=0.0, model_kwargs={}, openai_api_key=SecretStr('**********'), disabled_params={'parallel_tool_calls': None}, azure_endpoint='https://granteraistudi5497094960.cognitiveservices.azure.com/openai/deployments/resarch-gpt-4.1-nano/chat/completions?api-version=2025-01-01-preview', deployment_name='gpt-4.1-nano', openai_api_version='2025-01-01-preview', openai_api_type='azure'), 'allowed_nodes': None, 'allowe

**Save the Text Ontology KG to CSV**

In [5]:
save_graph_to_csv(
    graph_documents = txt_ont_kg, 
    output_file="results/kgs/txt_ontology_kg.csv"
)

Graph saved to: C:\wamp\www\KGs_for_Vertical_AI\results\kgs\txt_ontology_kg.csv
Total nodes: 298
Total edges: 597
Chunk nodes: 14
Chunk relationships: 327


**Visualize the Text Ontology KG**

In [None]:
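import webbrowser
from pathlib import Path

# as_uri() builds a well-formed file:// URL (drive letters, spaces, backslashes)
webbrowser.open(Path('results/kgs/txt_ontology_kg.html').resolve().as_uri())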