<div class="alert alert-block alert-success">
    <h1>
        Example notebook - AI governance control mappings
    </h1>
    <p>
        Link to dataset : Michael Brock Li' dataset
    </p>
</div>

## Executive Summary

This notebook demonstrates a **knowledge graph-based approach** to managing regulatory control mappings across multiple standards (ISO42001, ISO27001, ISO27701, EU AI Act, NIST RMF, SOC2).

**Key capabilities:**
- **Semantic search**: Find relevant controls using natural language queries
- **Cross-standard mapping**: Identify overlapping requirements between different frameworks
- **Graph visualization**: Explore relationships between domains, topics, controls, and standards
- **Hybrid search**: Combine semantic understanding with keyword matching for optimal results

---

## Data Overview

**Dataset**: Michael Brock Li's AI governance control mappings
- **44 control statements** covering AI governance requirements
- **6 regulatory standards** with reference mappings
- **Hierarchical structure**: Domain → Topic → Control → Standards

### Standard Coverage
- ISO 42001 (AI Management System)
- ISO 27001 (Information Security)
- ISO 27701 (Privacy Information Management)
- EU AI Act
- NIST RMF (Risk Management Framework)
- SOC 2 (Service Organization Control)


# Import modules and functions

In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import warnings

warnings.filterwarnings("ignore")

In [3]:
import re
import pandas as pd
import networkx as nx
from sentence_transformers import SentenceTransformer
from IPython.display import display, Markdown

In [4]:
from turingdb_examples.utils import get_return_statements
from turingdb_examples.graph import build_create_command_from_networkx

In [5]:
from turingdb_kgsearch.embeddings import (
    build_node_only_embeddings,
    build_context_enriched_embeddings,
    build_smart_enriched_embeddings,
    build_sparse_embeddings,
    build_node2vec_embeddings,
)
from turingdb_kgsearch.search import (
    dense_search,
    sparse_search,
    print_results,
    hybrid_search,
    compare_search_methods,
)
from turingdb_kgsearch.subgraph import get_subgraph_around_query
from turingdb_kgsearch.visualization import (
    visualize_graph_with_pyvis,
    extract_and_visualize_subgraph,
)
from turingdb_kgsearch.workflow import (
    search_and_expand_hybrid_filtered,
    generate_report_hybrid_workflow_results,
)
from turingdb_kgsearch.statistics import get_subgraph_stats, print_subgraph_stats
from turingdb_kgsearch.ranking import (
    rank_nodes_by_importance,
    rank_nodes_by_importance_with_context,
    print_node_rankings,
    compare_node_importance,
    diagnose_rankings,
)
from turingdb_kgsearch.explain_results import (
    explain_retrieval,
    explain_top_results,
    print_explanation,
)
from turingdb_kgsearch.llm import (
    create_llm_prompt_with_graph,
    query_llm,
)

In [6]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

True

# Set path to data

In [7]:
example_name = "AI_governance_control_mapping"
path_data = f"{os.getcwd()}/data/{example_name}"
if not os.path.exists(path_data):
    raise ValueError(f"{path_data} does not exists")

# Create graph using `turingdb` python package

<div class="alert alert-block alert-info">
    <h2>
        See <a href="https://docs.turingdb.ai/quickstart">TuringDB Get started documentation</a> for the important steps to follow :
    </h2>
    <h3>
        <ul>
            <li>Create your TuringDB account</li>
            <li>Create your instance in the <a href="https://console.turingdb.ai/auth">TuringDB Cloud UI</a></li>
            <li>Copy your Instance ID from the Database Instances management page</li>
            <li>Get API Key from the Settings in UI</li>
        </ul>
        Remember to have your instance active while working in this notebook !
    </h3>
</div>

## Connect to instance and transfer data

In [19]:
from turingdb import TuringDB

# Create TuringDB client
client = TuringDB(
    host="http://localhost:6666"  # Remove this parameter and set the two parameters below
    # instance_id=os.getenv("INSTANCE_ID"),
    # auth_token=os.getenv("AUTH_TOKEN"),
)

In [20]:
%%time

client.s3_connect(
    bucket_name="turing-internal",
    region="eu-west-2",
    access_key=os.getenv("AWS_ACCESS_KEY"),
    secret_key=os.getenv("AWS_SECRET_KEY"),
)

CPU times: user 54.6 ms, sys: 33.1 ms, total: 87.7 ms
Wall time: 135 ms


In [21]:
%%time

client.transfer(
    src=f"data/{example_name}/ai_gov_control_mappings_full.csv",
    dst="turingdb://ai_gov_control_mappings_full.csv",  # to s3 bucket or TuringDB instance or local .turing
)

CPU times: user 78.2 ms, sys: 26 ms, total: 104 ms
Wall time: 461 ms


## Check data files are available

In [22]:
list_csv_files = sorted(os.listdir(path_data))
if "ai_gov_control_mappings_full.csv" not in list_csv_files:
    raise ValueError(f"csv file is not available in {path_data}")

## Import and format data

In [23]:
# Load CSV
path_turing_folder = f"{os.getenv('HOME')}/.turing"
df = pd.read_csv(f"{path_turing_folder}/data/ai_gov_control_mappings_full.csv")
df = df.replace({r"\s+$": "", r"^\s+": ""}, regex=True).replace(r"\n", " ", regex=True)
print(f"✓ Loaded {len(df)} rows from CSV")
df

✓ Loaded 44 rows from CSV


Unnamed: 0,Domain,Master,Topic,Control Statement,ISO42001,ISO27001,ISO27701,EU AI ACT,NIST RMF,SOC2
0,Governance & Leadership,GL-1,Executive Commitment and Accountability,The organisation's executive leadership shall ...,4.1 5.1 5.2 9.3 A.2.2 A.2.3 A.2.4,5.1 5.2 9.3 A.5.1 A.5.2,6.1.1 6.1.2,4.1,Govern 1.1 Govern 2.3 Govern 3.1,CC.1.1 CC.1.2 CC.1.3 CC.1.4 CC.1.5 CC.5.3
1,Governance & Leadership,GL-2,"Roles, Responsibilities & Resources","The organisation shall define, document, and m...",5.3 7.1-7.3 A.3.2 A.4.2,5.3 7.1-7.3 A.6.1 A.6.2 A.7.2,6.2.1 6.2.2 7.2.2 9.2.3,22.1 22.2 26.3,,CC.1.3 CC.1.4
2,Governance & Leadership,GL-3,Strategic Alignment & Objectives,The organisation shall document clear objectiv...,4.1-4.4 5.2 6.2-6.3 A.2.2-A.2.4 A.6.1.2 A.9.3 ...,4.1-4.4 6.2-6.3,A.7.2.1 A.7.2.2 B.8.2.2,,Map 1.3 Map 1.4 Govern 1.1 Govern 1.2 Govern 4...,
3,Risk Management,RM-1,Risk Management Framework and Governance,"The organisation shall establish, document, an...",6.1,6.1,12.2.1 A.7.2.5 A.7.2.8 B.8.2.6,9.1 9.2,Govern 1.3 Govern 1.4 Govern 1.5 Map 1.5,CC3.1
4,Risk Management,RM-2,Risk Identification and Impact Assessment,The organisation shall conduct and document co...,6.1.1-6.1.2 6.1.4 8.4 A.5.2 A.5.3 A.5.4 A.5.5,6.1.2,A.7.2.5 A.7.3.10 A.7.4.4,9.9 27.1,Map 1.1 Map 3.1 Map 3.2 Measure 2.6 Measure 2...,CC3.2
5,Risk Management,RM-3,Risk Treatment and Control Implementation,The organisation shall implement appropriate t...,6.1.3,6.1.3,A.7.4.1 A.7.4.2 A.7.4.4 A.7.4.5,8.1 8.2 17.1 9.3 9.4 9.5,Manage 1.2 Manage 1.3 Manage 1.4,CC5.1 CC9.1
6,Risk Management,RM-4,Risk Monitoring and Response,The organisation shall implement continuous mo...,6.1.3 8.1-8.3,6.1.3 8.1-8.3,A.7.4.3 A.7.4.9 B.8.2.4 B.8.2.5 B.8.4.3,9.6,Measure 3.1 Measure 3.2 Manage 2.1 Manage 2.2 ...,CC3.4 CC9.2
7,Regulatory Operations,RO-1,Regulatory Compliance Framework,"The organisation shall establish, document, an...",,A.18.1,18.2.1 A.7.2.1-A.7.2.4 B.8.2.1-B.8.2.2 B.8.2.4...,5.1 5.2 6.1-6.4 8.1-8.2 40.1 41.1 42.1 43.1-43...,Govern 1.1 Map 4.1,CC1.5
8,Regulatory Operations,RO-2,"Transparency, Disclosure and Reporting",The organisation shall implement mechanisms to...,A.8.3 A.8.5,A.6.3,6.2.3 A.7.3.2-A.7.3.3 A.7.3.8-A.7.3.9 A.7.5.3-...,50.1-50.5 86.1-86.3 20.1 20.2 60.7 60.8,Govern 6.1 Map 4.1,CC2.3 P1.1 P1.2 P1.3
9,Regulatory Operations,RO-3,Record-Keeping,The organisation shall maintain comprehensive ...,,A.7.2,8.2.3 A.7.2.8 A.7.3.1 A.7.4.3 A.7.4.6-A.7.4.8 ...,11.1 11.3 18.1 19.1 19.2 71.2 71.3,Map 4.1 Measure 2.12,P3.1 P3.2 P3.3


## Create graph + Graph Design - Regulatory Control Mappings

### Graph Structure

```
[Domain] --contains--> [Topic] --has_control--> [Control] --maps_to--> [Standard Reference]
```

**Example:**
```
Data Governance → Privacy → "Ensure data encryption" → ISO27001: A.8.24
                                                      → NIST RMF: SC-28
```

### Node Types

| Type | Count | Description | Searchable |
|------|-------|-------------|------------|
| **Control** | 44 | Actual control statements | ✅ Primary |
| **Topic** | ~15-20 | Sub-categories | ✅ Optional |
| **Domain** | ~5-10 | High-level groupings | ✅ Optional |
| **Standard** | ~50-100 | Standard references | ❌ Reference only |

### Design Benefits

1. **Flexible querying**: Search by meaning, not just keywords
2. **Many-to-many mappings**: One control can satisfy multiple standards
3. **Hierarchical navigation**: Browse from high-level domains down to specific requirements
4. **Cross-standard analysis**: Find overlaps and gaps between frameworks
5. **Supports Key Queries**:
   - "Find controls about data privacy" (semantic search)
   - "Which standards cover this topic?" (graph traversal)
   - "What's the overlap between ISO and NIST?" (cross-standard analysis)

In [24]:
# Create graph
G = nx.DiGraph()

# Build graph structure
for idx, row in df.iterrows():
    # Main control node (what we'll search)
    control_id = f"control_{idx}"
    G.add_node(
        control_id,
        type="control",
        statement=row["Control Statement"],
        domain=row["Domain"],
        master=row["Master"],
        topic=row["Topic"],
    )

    # Domain node
    domain_id = f"domain_{row['Domain']}"
    if domain_id not in G:
        G.add_node(domain_id, type="domain", name=row["Domain"])

    # Topic node
    topic_id = f"topic_{row['Topic']}"
    if topic_id not in G:
        G.add_node(topic_id, type="topic", name=row["Topic"])

    # Connections
    G.add_edge(domain_id, topic_id, rel="contains")
    G.add_edge(topic_id, control_id, rel="has_control")

    # Standard mappings
    standards = {
        "ISO42001": row["ISO42001"],
        "ISO27001": row["ISO27001"],
        "ISO27701": row["ISO27701"],
        "EU_AI_ACT": row["EU AI ACT"],
        "NIST_RMF": row["NIST RMF"],
        "SOC2": row["SOC2"],
    }

    for std_name, std_ref in standards.items():
        if pd.notna(std_ref) and str(std_ref).strip():
            std_id = f"std_{std_name}_{std_ref}"
            if std_id not in G:
                G.add_node(
                    std_id, type="standard", standard=std_name, reference=std_ref
                )
            G.add_edge(control_id, std_id, rel="maps_to")

print(f"✓ Graph built: {G.number_of_nodes()} nodes, {G.number_of_edges()} edges")

# Show breakdown
node_types = {}
for node, data in G.nodes(data=True):
    node_type = data.get("type", "unknown")
    node_types[node_type] = node_types.get(node_type, 0) + 1

print("\nNode breakdown:")
for ntype, count in node_types.items():
    print(f"  {ntype}: {count}")

✓ Graph built: 305 nodes, 304 edges

Node breakdown:
  control: 44
  domain: 12
  topic: 44
  standard: 205


## Create `turingdb` graph

In [25]:
# Get list of available graphs
list_graphs = client.query("LIST GRAPH").loc[:, 0].tolist()

In [26]:
# Set graph name
graph_name_prefix = example_name
graph_name_nb_suffix = str(
    max(
        [
            int(re.sub(graph_name_prefix, "", g))
            for g in list_graphs
            if g.startswith(graph_name_prefix)
            and re.sub(graph_name_prefix, "", g).isdigit()
        ]
        + [0]
    )
    + 1
)
graph_name = graph_name_prefix + graph_name_nb_suffix
graph_name = re.sub("-", "_", graph_name)
graph_name

'AI_governance_control_mapping1'

In [27]:
%%time

# Set graph
client.query(f"CREATE GRAPH {graph_name}")
client.set_graph(graph_name)

# Create a new change on the graph
change = client.query("CHANGE NEW").loc[0, 0]
# TODO change to : client.new_change()
print(f"Current change {change}")

# Checkout into the change
client.checkout(change=change)

Current change 0


In [28]:
# Build CREATE command from networkx object
create_command = build_create_command_from_networkx(G)
print(f"Cypher CREATE command :\n\n{100 * '*'}\n{create_command}\n{100 * '*'}")

Cypher CREATE command :

****************************************************************************************************
CREATE (n0:Control {"id":"control_0", "type":"control", "statement":"The organisation s executive leadership shall establish, document, and maintain formal accountability for AI governance through approved policies that align with organisational objectives and values. These policies shall be reviewed at planned intervals by executive leadership to ensure continued effectiveness and relevance. Executive leadership shall demonstrate active engagement in AI risk decisions and maintain ultimate accountability for the organisation s AI systems.", "domain":"Governance & Leadership", "master":"GL-1", "topic":"Executive Commitment and Accountability"}),
(n1:Domain {"id":"domain_Governance & Leadership", "type":"domain", "name":"Governance & Leadership"}),
(n2:Topic {"id":"topic_Executive Commitment and Accountability", "type":"topic", "name":"Executive Commitment and Ac

In [29]:
%%time

# Run CREATE command
client.query(create_command)

# Commit the change
client.query("COMMIT")
client.query("CHANGE SUBMIT")

# Checkout into main
client.checkout()

CPU times: user 3.23 ms, sys: 860 μs, total: 4.09 ms
Wall time: 157 ms


<div class="alert alert-block alert-info">
    <h2>
        Visualize your graph in TuringDB Graph Visualizer ! Now that your instance is running:
    </h2>
    <h3>
        <ul>
            <li>Go to <a href="https://console.turingdb.ai/databases">TuringDB Console - Database Instances</a></li>
            <li>In your current instance panel, click on "Open Visualizer" button</li>
            <li>Visualizer opens, now you can choose your graph in the dropdown menu at the top-right corner</li>
        </ul>
        You can then play with your graph and visualize the nodes you want !
    </h3>
</div>

# Query TuringDB

## Use metaqueries to have insight on graph overall structure

<h3>
    To learn more about 📮 Metaqueries, please check TuringDB documentation on this <a href="https://turingdb.mintlify.app/query/cypher_subset#%F0%9F%93%AE-metaqueries">link</a>
</h3>

In [30]:
%%time

# CALL PROPERTIES() - returns a column of all the different node and edge properties and their types in the database
command = """
CALL PROPERTIES()
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    df.columns = ["Property_ID", "Property_name", "Property_type"]
    display(df)

Unnamed: 0,Property_ID,Property_name,Property_type
0,0,id,String
1,1,type,String
2,2,statement,String
3,3,domain,String
4,4,master,String
5,5,topic,String
6,6,name,String
7,7,standard,String
8,8,reference,String
9,9,rel,String


CPU times: user 4.55 ms, sys: 101 μs, total: 4.65 ms
Wall time: 4.04 ms


In [31]:
%%time

# CALL LABELS () - returns a column of all the different node labels
command = """
CALL LABELS()
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    df.columns = ["Node_type_ID", "Node_label"]
    display(df)

Unnamed: 0,Node_type_ID,Node_label
0,0,Control
1,1,Domain
2,2,Topic
3,3,Standard


CPU times: user 3.63 ms, sys: 119 μs, total: 3.75 ms
Wall time: 3.25 ms


In [32]:
%%time

# CALL EDGETYPES() - returns a column of all the different edge types (edge equivalent of node labels)
command = """
CALL EDGETYPES()
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    df.columns = ["Edge_type_ID", "Edge_label"]
    display(df)

Unnamed: 0,Edge_type_ID,Edge_label
0,0,CONNECTED


CPU times: user 3.55 ms, sys: 0 ns, total: 3.55 ms
Wall time: 3.07 ms


In [33]:
%%time

# CALL LABELSETS() - returns a two columns describing combinations of node labels
command = """
CALL LABELSETS()
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    df.columns = ["Node_type_ID", "Node_label"]
    display(df)

Unnamed: 0,Node_type_ID,Node_label
0,0,Control
1,1,Domain
2,2,Topic
3,3,Standard


CPU times: user 2.79 ms, sys: 959 μs, total: 3.75 ms
Wall time: 3.26 ms


## Simple queries

In [34]:
%%time

# Match all edges and return them
command = """
MATCH (n)-[e]-(m)
RETURN n.id, e, m.id
"""
df = client.query(command)
if df.empty:
    print("No result found")
else:
    df.columns = get_return_statements(command)
    display(df)

Unnamed: 0,n.id,e,m.id
0,control_34,0,std_ISO27701_12.2.2 A.7.4.3
1,control_34,1,std_EU_AI_ACT_26.5 72.1 72.2 72.3 72.4
2,control_34,2,std_NIST_RMF_Measure 1.2 Measure 2.4 Manage 2....
3,control_34,3,std_SOC2_CC4.1 CC4.2 A1.1 A1.2
4,control_34,4,std_ISO27001_8.1-8.3 A.12.3 A.12.6 A.17.1 A.17.2
...,...,...,...
299,"topic_Transparency, Disclosure and Reporting",299,control_8
300,topic_Robustness,300,control_24
301,topic_Data Security,301,control_19
302,"topic_Security Governance, Architecture and En...",302,control_16


CPU times: user 7.29 ms, sys: 856 μs, total: 8.15 ms
Wall time: 7.41 ms


# Load the embedding model

In [35]:
%%time

# This will convert text to vectors
model = SentenceTransformer("paraphrase-MiniLM-L3-v2")
print(f"✓ Model loaded: {model.get_sentence_embedding_dimension()} dimensions")

✓ Model loaded: 384 dimensions
CPU times: user 174 ms, sys: 20.3 ms, total: 195 ms
Wall time: 1.54 s


# Build vector index on the graph

## Vector Search Implementation - Dense (semantic) search

#### How It Works

Each control is converted to a **384-dimensional vector** using a pre-trained language model (`paraphrase-MiniLM-L3-v2`).

**Search process:**
1. Convert user query to vector
2. Calculate cosine similarity with all control vectors
3. Rank by similarity score (0-1)
4. Return top-k most relevant controls

#### Why Vectors?

- **Semantic understanding**: "data protection" matches "privacy safeguards"
- **Handles synonyms**: "AI governance" finds "artificial intelligence oversight"
- **No keyword dependency**: Works even without exact term matches

In [36]:
from turingdb_kgsearch.save_embeddings import load_embeddings, save_embeddings

In [37]:
# Define embedding paths
embeddings_folder = "embeddings"

### Use the three different approaches

In [38]:
%%time

# Node-only embeddings
method = "node_only"
embeddings_file_path = f"{path_data}/{embeddings_folder}/embeddings_{method}.npz"

if os.path.exists(embeddings_file_path):
    print(f"Loading pre-computed embeddings (method {method})...")
    node_vectors_node_only, node_texts_node_only, metadata_node_only, _ = load_embeddings(embeddings_file_path)
else:
    print(f"Building (method {method}) embeddings (this may take a while)...")
    node_vectors_node_only, node_texts_node_only = build_node_only_embeddings(G, model)
    save_embeddings(
        node_vectors_node_only,
        node_texts_node_only,
        filepath=embeddings_file_path,
        embedding_type="dense"
    )

print("\n" + "=" * 80 + "\n")

Loading pre-computed embeddings (method node_only)...
✓ Embeddings loaded from: /home/dev/turingdb-kgsearch/notebooks/data/AI_governance_control_mapping/embeddings/embeddings_node_only.npz
  - Type: dense
  - Format: Dense
  - 305 vectors
  - Vector dimension: 384
  - Has texts: Yes
  - Has vectorizer: No


CPU times: user 19.6 ms, sys: 2.91 ms, total: 22.5 ms
Wall time: 21.4 ms


In [39]:
%%time

# Context-enriched embeddings
strategy = "heavy"  # "lightweight"
method = f"context_enriched_{strategy}"
embeddings_file_path = f"{path_data}/{embeddings_folder}/embeddings_{method}.npz"

if os.path.exists(embeddings_file_path):
    print(f"Loading pre-computed embeddings (method {method})...")
    node_vectors_context_heavy, node_texts_context_heavy, metadata, _ = load_embeddings(embeddings_file_path)
else:
    print(f"Building (method {method}) embeddings (this may take a while)...")
    node_vectors_context_heavy, node_texts_context_heavy = build_context_enriched_embeddings(
        G, model, strategy=strategy
    )
    save_embeddings(
        node_vectors_context_heavy,
        node_texts_context_heavy,
        filepath=embeddings_file_path,
        embedding_type="dense"
    )

print("\n" + "=" * 80 + "\n")

Loading pre-computed embeddings (method context_enriched_heavy)...
✓ Embeddings loaded from: /home/dev/turingdb-kgsearch/notebooks/data/AI_governance_control_mapping/embeddings/embeddings_context_enriched_heavy.npz
  - Type: dense
  - Format: Dense
  - 305 vectors
  - Vector dimension: 384
  - Has texts: Yes
  - Has vectorizer: No


CPU times: user 21.2 ms, sys: 1.81 ms, total: 23 ms
Wall time: 22 ms


In [40]:
%%time

# Type-specific context enrichment
method = f"smart_enriched"
embeddings_file_path = f"{path_data}/{embeddings_folder}/embeddings_{method}.npz"

if os.path.exists(embeddings_file_path):
    print(f"Loading pre-computed embeddings (method {method})...")
    node_vectors_context_heavy, node_texts_context_heavy, metadata, _ = load_embeddings(embeddings_file_path)
else:
    print(f"Building (method {method}) embeddings (this may take a while)...")
    node_vectors_smart, node_texts_smart = build_smart_enriched_embeddings(
        G, model
    )
    save_embeddings(
        node_vectors_context_heavy,
        node_texts_context_heavy,
        filepath=embeddings_file_path,
        embedding_type="dense"
    )

print("\n" + "=" * 80 + "\n")

Loading pre-computed embeddings (method smart_enriched)...
✓ Embeddings loaded from: /home/dev/turingdb-kgsearch/notebooks/data/AI_governance_control_mapping/embeddings/embeddings_smart_enriched.npz
  - Type: dense
  - Format: Dense
  - 305 vectors
  - Vector dimension: 384
  - Has texts: Yes
  - Has vectorizer: No


CPU times: user 12.4 ms, sys: 2.08 ms, total: 14.5 ms
Wall time: 13.4 ms


## Vector Search Implementation - Sparse (keyword) search

In [41]:
%%time

# Choose from which dense method take the created node_texts
node_texts = node_texts_context_heavy

# Build sparse embeddings
method = "sparse"
embeddings_file_path = f"{path_data}/{embeddings_folder}/embeddings_{method}.npz"

if os.path.exists(embeddings_file_path):
    print(f"Loading pre-computed embeddings (method {method})...")
    node_vectors_sparse, node_texts, metadata, vectorizer_sparse = load_embeddings(embeddings_file_path)
else:
    print(f"Building (method {method}) embeddings (this may take a while)...")
    node_vectors_sparse, _, vectorizer_sparse = build_sparse_embeddings(
        G=G, node_texts=node_texts  # max_features=500, ngram_range=(1, 2)
    )
    save_embeddings(
        node_vectors_sparse,
        node_texts,
        filepath=embeddings_file_path,
        embedding_type="sparse",
        vectorizer=vectorizer_sparse
    )

print("\n" + "=" * 80 + "\n")

Loading pre-computed embeddings (method sparse)...
✓ Embeddings loaded from: /home/dev/turingdb-kgsearch/notebooks/data/AI_governance_control_mapping/embeddings/embeddings_sparse.npz
  - Type: sparse
  - Format: Sparse
  - 305 vectors
  - Vector dimension: 500
  - Has texts: Yes
  - Has vectorizer: Yes


CPU times: user 20.5 ms, sys: 1.88 ms, total: 22.4 ms
Wall time: 21.3 ms


## Node2Vec

In [42]:
%%time

# Build Node2Vec embeddings
method = "node2vec"
embeddings_file_path = f"{path_data}/{embeddings_folder}/embeddings_{method}.npz"

if os.path.exists(embeddings_file_path):
    print(f"Loading pre-computed embeddings (method {method})...")
    node_vectors_node2vec, _, metadata, _ = load_embeddings(embeddings_file_path)
else:
    print(f"Building (method {method}) embeddings (this may take a while)...")
    node_vectors_node2vec = build_node2vec_embeddings(G, dimensions=384)
    save_embeddings(
        node_vectors_node2vec,
        None,
        embeddings_file_path,
        "node2vec"
    )

print("\n" + "=" * 80 + "\n")

Loading pre-computed embeddings (method node2vec)...
✓ Embeddings loaded from: /home/dev/turingdb-kgsearch/notebooks/data/AI_governance_control_mapping/embeddings/embeddings_node2vec.npz
  - Type: node2vec
  - Format: Dense
  - 305 vectors
  - Vector dimension: 384
  - Has texts: No
  - Has vectorizer: No


CPU times: user 7.93 ms, sys: 0 ns, total: 7.93 ms
Wall time: 6.87 ms


# Search capabilities

## Vector Search Implementation - Dense (semantic) search

Find controls relevant to any natural language query:

```python
results = search("data privacy protection", k=5)
```

**Use cases:**
- Exploratory research: "What controls cover AI model governance?"
- Concept-based lookup: "security monitoring requirements"
- Gap analysis: "What's missing in our risk management?"

### Query

In [43]:
%%time

# Try different queries
queries = [
    "Companies investing in AI",
    "Events related to AI regulation",
    "AI providers",
    "companies with risks of failure due to cloud providers",
    "AI model governance",
    "security monitoring requirements",
    "companies with risks linked to goods transportations/delivery",
]

for query in queries:
    print(f"\n{'=' * 80}")
    print(f"QUERY: '{query}'")
    print("=" * 80)
    results = dense_search(
        query=query,
        node_vectors=node_vectors_context_heavy,
        node_texts=node_texts,
        G=G,
        model=model,
        k=3
    )
    print_results(results)


QUERY: 'Companies investing in AI'

Found 3 results:

1. Similarity: 0.4420
   Node: topic_Explainability and Interpretability
   Type: topic
   Text: Topic: Explainability and Interpretability. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems' decisions and outputs can be appropriately explained and i...

2. Similarity: 0.4398
   Node: topic_Robustness
   Type: topic
   Text: Topic: Robustness. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems demonstrate consistent and reliable performance across thei...

3. Similarity: 0.4343
   Node: topic_Fairness and Bias Management
   Type: topic
   Text: Topic: Fairness and Bias Management. in domain Safe Responsible AI. includes: The organisation shall implement processes to identify, assess, and mitigate unfair bias in AI syste...

QUERY: 'Events related to AI regulation'

Found 3 results:

1. Similarity: 0.5923
   Node: control_22
   Type: control
   Text: The organisation sha

## Vector Search Implementation - Sparse (keyword) search

In [44]:
%%time

# Try different queries
for query in queries:
    print(f"\n{'=' * 80}")
    print(f"QUERY: '{query}'")
    print("=" * 80)
    results = sparse_search(
        query=query,
        node_vectors=node_vectors_sparse,
        node_texts=node_texts,  # same node texts used previously in dense search
        vectorizer=vectorizer_sparse,
        G=G,
        k=3
    )
    print_results(results)


QUERY: 'Companies investing in AI'

Found 3 results:

1. Similarity: 0.3420
   Node: control_39
   Type: control
   Text: The organisation shall ensure AI systems are designed and operated with appropriate transparency, enabling users and affected individuals to understand when they are interacting with AI, how the AI sy...
   Domain: Transparency & Communication
   Topic: AI System Transparency

2. Similarity: 0.2680
   Node: topic_Robustness
   Type: topic
   Text: Topic: Robustness. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems demonstrate consistent and reliable performance across thei...

3. Similarity: 0.2591
   Node: domain_Safe Responsible AI
   Type: domain
   Text: Domain: Safe Responsible AI...

QUERY: 'Events related to AI regulation'

Found 3 results:

1. Similarity: 0.3105
   Node: topic_Event Logging
   Type: topic
   Text: Topic: Event Logging. in domain Operational Monitoring. includes: The organisation shall maintain comprehensive 

## Hybrid search: Best of Both Worlds

### The Problem

- **Dense (semantic) search**: Great for concepts, misses exact terms
- **Sparse (keyword) search**: Finds exact matches, misses semantics

### The Solution

**Hybrid search** combines both approaches:

```
final_score = α × semantic_score + (1-α) × keyword_score
```

### Alpha Parameter Guide

| Alpha | Behavior | Best For |
|-------|----------|----------|
| 1.0 | Pure semantic | Conceptual queries |
| 0.7 | Favor semantics | General use (recommended) |
| 0.5 | Balanced | Mixed queries |
| 0.3 | Favor keywords | Technical lookups |
| 0.0 | Pure keywords | Exact term matching |

### When to Use What

**Dense (α=1.0)**
- Query: "What covers security?"
- Finds: Controls about protection, safeguards, defense

**Sparse (α=0.0)**
- Query: "ISO27001 A.8.24"
- Finds: Exact standard reference

**Hybrid (α=0.7)**
- Query: "NIST risk management frameworks"
- Finds: Both NIST references AND risk-related controls

### Query

In [45]:
%%time

# Try different queries
for query in queries:
    print(f"\n{'=' * 80}")
    print(f"QUERY: '{query}'")
    print("=" * 80)
    results = hybrid_search(
        query=query,
        G=G,
        dense_node_vectors=node_vectors_context_heavy,
        sparse_node_vectors=node_vectors_sparse,
        sparse_vectorizer=vectorizer_sparse,
        node_texts=node_texts,  # same node texts used for both dense and sparse
        model=model,
        k=3,
        alpha=0.7,  # 70% semantic, 30% keywords
    )
    print_results(results)


QUERY: 'Companies investing in AI'

Found 3 results:

1. Similarity: 0.7903
   Node: topic_Robustness
   Type: topic
   Text: Topic: Robustness. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems demonstrate consistent and reliable performance across thei...

2. Similarity: 0.7783
   Node: topic_Explainability and Interpretability
   Type: topic
   Text: Topic: Explainability and Interpretability. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems' decisions and outputs can be appropriately explained and i...

3. Similarity: 0.5863
   Node: topic_Safety
   Type: topic
   Text: Topic: Safety. in domain Safe Responsible AI. includes: The organisation shall establish and maintain processes to prevent AI systems from producing outputs...

QUERY: 'Events related to AI regulation'

Found 3 results:

1. Similarity: 0.7169
   Node: control_22
   Type: control
   Text: The organisation shall implement mechanisms for meaningful human

## Compare Dense vs Sparse vs Hybrid

In [46]:
%%time

# Limit results to specific node type (all node types by default)
k = 5
alpha = 0.7
node_type = None

for query in queries:
    compare_search_methods(
        query=query,
        G=G,
        dense_node_vectors=node_vectors_context_heavy,
        sparse_node_vectors=node_vectors_sparse,
        sparse_vectorizer=vectorizer_sparse,
        node_texts=node_texts,
        model=model,
        k=k,
        alpha=alpha,
        node_type=node_type
    )


QUERY: 'Companies investing in AI'

1. DENSE ONLY (Semantic):
--------------------------------------------------------------------------------
1. 0.442 | Topic: Explainability and Interpretability. in domain Safe Responsible AI. includes: The organisatio...
2. 0.440 | Topic: Robustness. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems...
3. 0.434 | Topic: Fairness and Bias Management. in domain Safe Responsible AI. includes: The organisation shall...

2. SPARSE ONLY (Keywords):
--------------------------------------------------------------------------------
1. 0.342 | The organisation shall ensure AI systems are designed and operated with appropriate transparency, en...
2. 0.268 | Topic: Robustness. in domain Safe Responsible AI. includes: The organisation shall ensure AI systems...
3. 0.259 | Domain: Safe Responsible AI...

3. HYBRID (alpha=0.7):
--------------------------------------------------------------------------------
1. 0.875 (raw: D:0.44/S:0

# Get subgraph around query results and visualise

### Graph-Based Context Retrieval

Get full context around relevant controls:

```python
subgraph = get_subgraph_around_query("risk management", k=3, hops=1)
```

**Returns:**
- Relevant controls
- Related topics and domains
- All mapped standard references
- Network connections

In [47]:
query = queries[1]

In [48]:
print(f"Query: '{query}'")

method_to_test = "hybrid"
possible_method_to_test = ["dense", "sparse", "hybrid"]
if method_to_test not in possible_method_to_test:
    raise ValueError(f"method_to_test has to be one of {possible_method_to_test}")

if method_to_test == "dense":
    # Dense search
    subgraph, results = get_subgraph_around_query(
        query=query,
        G=G,
        search_func=dense_search,
        search_params={
            "node_vectors": node_vectors_context_heavy,
            "node_texts": node_texts,
            "model": model
        },
        k=3,
        hops=1,
    )

elif method_to_test == "sparse":
    # Sparse search
    subgraph, results = get_subgraph_around_query(
        query=query,
        G=G,
        search_func=sparse_search,
        search_params={
            "sparse_vectors": node_vectors_sparse,
            "sparse_vectorizer": vectorizer_sparse,
            "node_texts": node_texts
        },
        k=3,
        hops=1,
    )

elif method_to_test == "hybrid":
    # Hybrid search
    subgraph, results = get_subgraph_around_query(
        query=query,
        G=G,
        search_func=hybrid_search,
        search_params={
            "dense_node_vectors": node_vectors_context_heavy,
            "sparse_node_vectors": node_vectors_sparse,
            "sparse_vectorizer": vectorizer_sparse,
            "node_texts": node_texts,  # same node texts used for both dense and sparse
            "model": model,
            "alpha": 0.7
        },
        k=3,
        hops=1,
    )

Query: 'Events related to AI regulation'


In [49]:
print(f"Query: '{query}'")
print(
    f"Subgraph: {subgraph.number_of_nodes()} nodes, {subgraph.number_of_edges()} edges"
)

# Show what's in the subgraph
types_in_subgraph = {}
for node, data in subgraph.nodes(data=True):
    ntype = data.get("type", "unknown")
    types_in_subgraph[ntype] = types_in_subgraph.get(ntype, 0) + 1

print("\nSubgraph composition:")
for ntype, count in types_in_subgraph.items():
    print(f"  {ntype}: {count}")

Query: 'Events related to AI regulation'
Subgraph: 11 nodes, 10 edges

Subgraph composition:
  standard: 6
  topic: 2
  control: 2
  domain: 1


# Visualization Capabilities

## Interactive Visualization (PyVis)

- Hover to see full control text
- Click to explore connections
- Physics-based layout
- Filterable and zoomable

**Features:**
- Color legend for node types
- Relevance-based sizing
- Relationship labels on edges
- Responsive browser-based interface

In [50]:
print(f"Query: '{query}'")

# With hybrid search
extract_and_visualize_subgraph(
    query=query,
    G=G,
    search_func=hybrid_search,
    search_params={
        "dense_node_vectors": node_vectors_context_heavy,
        "sparse_node_vectors": node_vectors_sparse,
        "sparse_vectorizer": vectorizer_sparse,
        "node_texts": node_texts,  # same node texts used for both dense and sparse
        "model": model,
        "alpha": 0.7
    },
    k=4,
    hops=2,
    output_file=f"{example_name}.html"  # f"{path_data}/{example_name}.html",
)

Query: 'Events related to AI regulation'
Auto-generated color map for 4 node types:
  control: #ff6b6b
  domain: #4ecdc4
  standard: #95e1d3
  topic: #ffe66d
✓ Interactive graph saved to: AI_governance_control_mapping.html
  Nodes: 24
  Edges: 23


# Workflow

In [51]:
%%time

print(f"Query: '{query}'")

semantic_results, expanded, subgraph = search_and_expand_hybrid_filtered(
    query=query,
    G=G,
    node_vectors=node_vectors_context_heavy,
    node_texts=node_texts,
    sparse_vectors=node_vectors_sparse,
    sparse_vectorizer=vectorizer_sparse,
    structural_vectors=node_vectors_node2vec,
    model=model,
    k_search=5,
    max_hops=10,
    min_structural_sim=0.1,  # Must be structurally similar
    min_semantic_sim=0.1,  # AND semantically relevant to query
    structural_weight=0.5,  # 50-50 balance
    alpha=0.7,  # Weight alpha to attribute to semantic (dense) search, (1 - alpha) for keyword (sparse) search
)

report = generate_report_hybrid_workflow_results(semantic_results, expanded)
print(report)

Query: 'Events related to AI regulation'
Stage 1: Hybrid search for 'Events related to AI regulation'...
--------------------------------------------------------------------------------

Found 5 semantically relevant seed nodes:
  1. control_22 (score: 0.739)
     The organisation shall implement mechanisms for meaningful human oversight of AI systems, ensuring h...
  2. topic_Human Oversight and Intervention (score: 0.602)
     Topic: Human Oversight and Intervention. in domain Safe Responsible AI. includes: The organisation s...
  3. topic_Safety (score: 0.547)
     Topic: Safety. in domain Safe Responsible AI. includes: The organisation shall establish and maintai...
  4. topic_Explainability and Interpretability (score: 0.490)
     Topic: Explainability and Interpretability. in domain Safe Responsible AI. includes: The organisatio...
  5. control_8 (score: 0.432)
     The organisation shall implement mechanisms to ensure appropriate transparency regarding AI systems,...

Stage 2: H

In [76]:
# Access subgraph data
print(f"\nSubgraph ({subgraph}) node attributes:")
for node in list(subgraph.nodes()):
    print(f"\n* {node}:")
    for key, value in subgraph.nodes[node].items():
        if key not in ["statement"]:  # Skip long text
            print(f"  {key}: {value}")

# Export subgraph if needed
# nx.write_gml(subgraph, "filtered_subgraph.gml")

# Visualise subgraph
visualize_graph_with_pyvis(subgraph, output_file=f"{example_name}_expanded.html")


Subgraph (DiGraph with 22 nodes and 20 edges) node attributes:

* control_23:
  type: control
  domain: Safe Responsible AI
  master: RS-2
  topic: Safety
  is_seed: False
  seed_node: topic_Safety
  hop_distance: 1
  direction: successor
  structural_similarity: 0.978790819644928
  semantic_similarity: 0.511070966720581
  combined_score: 0.7449308633804321

* std_SOC2_CC2.3 P1.1 P1.2 P1.3:
  type: standard
  standard: SOC2
  reference: CC2.3 P1.1 P1.2 P1.3
  is_seed: False
  seed_node: control_8
  hop_distance: 1
  direction: successor
  structural_similarity: 0.9560324549674988
  semantic_similarity: 0.16102871298789978
  combined_score: 0.5585305690765381

* control_25:
  type: control
  domain: Safe Responsible AI
  master: RS-4
  topic: Explainability and Interpretability
  is_seed: False
  seed_node: topic_Explainability and Interpretability
  hop_distance: 1
  direction: successor
  structural_similarity: 0.9775872230529785
  semantic_similarity: 0.5295003652572632
  combined_s

# Results exploration

## Graph statistics

In [54]:
# Usage
stats = get_subgraph_stats(
    subgraph, include_node_breakdown=True, include_centrality=True, include_paths=True
)
print_subgraph_stats(stats, verbose=True)


SUBGRAPH STATISTICS

📊 Basic Metrics:
   nodes: 22
   edges: 20
   density: 0.04329004329004329
   is_connected: False

🔗 Degree Statistics:
   average: 1.82
   max: 4.00
   min: 1.00
   median: 1.00

🏷️  Node Types:
   control: 4
   standard: 12
   domain: 2
   topic: 4

🎯 Node Roles:
   seed: 3
   found: 19
   intermediate: 0

⭐ Most Central Nodes:
   By Degree:
      control_23: 0.190
      control_25: 0.190
      control_8: 0.190
      control_22: 0.190
      domain_Safe Responsible AI: 0.143
   By Betweenness:
      control_23: 0.014
      control_25: 0.014
      control_8: 0.014
      control_22: 0.014
      topic_Transparency, Disclosure and Reporting: 0.010

🔗 Edge Types:
   maps_to: 12
   contains: 4
   has_control: 4

📈 Relevance Scores:
   avg_score: 0.641
   max_score: 0.782
   min_score: 0.529
   nodes_with_scores: 19



## Node importance ranking

In [64]:
node_to_check = "topic_Safety"
focus_type = "topic"

In [65]:
# Usage examples
rankings = rank_nodes_by_importance(
    subgraph,
    methods="all",  # or ['pagerank', 'degree', 'relevance']
    top_k=10,
    aggregate="average",  # or 'max' or {'pagerank': 0.4, 'degree': 0.3, 'relevance': 0.3}
)

print_node_rankings(rankings, subgraph, show_details=True)


NODE IMPORTANCE RANKINGS

Ranked 22 nodes using: pagerank, degree, betweenness, eigenvector, relevance


📊 PAGERANK (Top 5):
   1. control_8 (control): 0.0713
   2. control_23 (control): 0.0580
   3. control_25 (control): 0.0580
   4. control_22 (control): 0.0580
   5. topic_Transparency, Disclosure and Reporting (topic): 0.0513

📊 DEGREE (Top 5):
   1. control_23 (control): 0.1905
   2. control_25 (control): 0.1905
   3. control_8 (control): 0.1905
   4. control_22 (control): 0.1905
   5. domain_Safe Responsible AI (domain): 0.1429

📊 BETWEENNESS (Top 5):
   1. control_23 (control): 0.0143
   2. control_25 (control): 0.0143
   3. control_8 (control): 0.0143
   4. control_22 (control): 0.0143
   5. topic_Transparency, Disclosure and Reporting (topic): 0.0095

❌ CLOSENESS: graph is not connected enough to compute closeness centrality

📊 EIGENVECTOR (Top 5):
   1. std_SOC2_CC2.3 P1.1 P1.2 P1.3 (standard): 0.2887
   2. std_EU_AI_ACT_50.2 50.3 50.4 50.5 (standard): 0.2887
   3. std_NIST_R

In [66]:
# Compare specific node
# Check if node exists first
if node_to_check in subgraph:
    node_comparison = compare_node_importance(node_to_check, rankings)
    if node_comparison:
        print(f"\nHow {node_to_check} ranks:")
        for method, info in node_comparison.items():
            print(f"  {method}: #{info['rank']} (score: {info['score']:.3f})")
    else:
        print(f"\{node_to_check} not in top rankings")
else:
    print(f"\{node_to_check} not in subgraph")


How topic_Safety ranks:
  degree: #7 (score: 0.095)
  betweenness: #6 (score: 0.010)


In [67]:
filtered_rankings, full_rankings = rank_nodes_by_importance_with_context(
    subgraph,
    focus_type=focus_type,
    methods="all",
    top_k=10,
    aggregate="average",  # or 'max' or {'pagerank': 0.4, 'degree': 0.3, 'relevance': 0.3}
)

print("\n=== CONTROL NODE RANKINGS (computed on full graph) ===")
print_node_rankings(filtered_rankings, subgraph)


=== CONTROL NODE RANKINGS (computed on full graph) ===

NODE IMPORTANCE RANKINGS

Ranked 22 nodes using: pagerank, degree, betweenness, eigenvector, relevance


📊 PAGERANK (Top 5):
   1. topic_Transparency, Disclosure and Reporting (topic): 0.0513

📊 DEGREE (Top 5):
   1. topic_Transparency, Disclosure and Reporting (topic): 0.0952
   2. topic_Safety (topic): 0.0952
   3. topic_Human Oversight and Intervention (topic): 0.0952
   4. topic_Explainability and Interpretability (topic): 0.0952

📊 BETWEENNESS (Top 5):
   1. topic_Transparency, Disclosure and Reporting (topic): 0.0095
   2. topic_Safety (topic): 0.0095
   3. topic_Human Oversight and Intervention (topic): 0.0095
   4. topic_Explainability and Interpretability (topic): 0.0095

❌ CLOSENESS: graph is not connected enough to compute closeness centrality

📊 EIGENVECTOR (Top 5):

📊 RELEVANCE (Top 5):
   1. topic_Human Oversight and Intervention (topic): 0.7743
   2. topic_Transparency, Disclosure and Reporting (topic): 0.7289

⭐ C

In [68]:
# Debug why rankings might be zero for some nodes in some metrics
diagnose_rankings(subgraph, focus_type)

Total nodes: 22
Filtered nodes (topic): 4
Filtered subgraph connected: False
Filtered subgraph edges: 0
Isolated nodes: 4/4

   Recommendation: Don't filter by node_type for ranking.
   Rank on full graph, then filter results for display.


## Explain node retrieval

In [69]:
# For a specific node
explanation = explain_retrieval(
    node_id=node_to_check,
    query=query,
    subgraph=subgraph,
    node_vectors=node_vectors_context_heavy,
    sparse_vectors=node_vectors_sparse,
    sparse_vectorizer=vectorizer_sparse,
    structural_vectors=node_vectors_node2vec,
    model=model,
)
print_explanation(explanation, verbose=True)


RETRIEVAL EXPLANATION: topic_Safety

Node Type: topic
Query: 'Events related to AI regulation'

🎯 Reason: SEED NODE
   Initial search score: 0.547
   - Semantic component: 0.564
   - Keyword component: 0.103

📝 Semantic Similarity: 0.564
   Moderate similarity

🔍 Keyword Matching: 0.103
   Very low similarity
   Top TF-IDF terms in node:
      - prevent ai: 0.275
      - prevent: 0.261
      - ai: 0.259

🕸️  Structural Similarity:
   Shares graph structure with seed nodes
   Similar to seed nodes:
      - topic_Explainability and Interpretability: 0.883
      - control_8: 0.836

📄 Content Preview:
   Safety



In [70]:
explanations = explain_top_results(
    query=query,
    results=semantic_results,
    subgraph=subgraph,
    node_vectors=node_vectors_context_heavy,
    sparse_vectors=node_vectors_sparse,
    sparse_vectorizer=vectorizer_sparse,
    structural_vectors=node_vectors_node2vec,
    model=model,
    top_k=3,
)


RETRIEVAL EXPLANATION: control_22

Node Type: control
Query: 'Events related to AI regulation'

🎯 Reason: FOUND VIA GRAPH TRAVERSAL
   From seed: topic_Human Oversight and Intervention
   Distance: 1 hops (successor)
   Structural similarity: 0.972
   Semantic similarity: 0.592
   Combined score: 0.782

📝 Semantic Similarity: 0.592
   Moderate similarity

🔍 Keyword Matching: 0.115
   Very low similarity
   Matching terms: to, ai
   Top TF-IDF terms in node:
      - maps: 0.366
      - human: 0.335
      - oversight: 0.321

🕸️  Structural Similarity:
   Shares graph structure with seed nodes
   Similar to seed nodes:
      - topic_Safety: 0.889
      - topic_Explainability and Interpretability: 0.885
      - control_8: 0.843

📄 Content Preview:
   The organisation shall implement mechanisms for meaningful human oversight of AI systems, ensuring humans maintain appropriate control over AI decision-making. This includes clearly defined procedures...


RETRIEVAL EXPLANATION: topic_Human O

# Take exploration results and ask LLM to answer original client query only using subgraph

In [71]:
api_keys = {
    "Anthropic": os.getenv("ANTHROPIC_API_KEY"),
    "Mistral": os.getenv("MISTRAL_API_KEY"),
    "OpenAI": os.getenv("OPENAI_API_KEY"),
}

## Create prompt

In [72]:
print(f"Query: '{query}'")

# Complete prompt
prompt = create_llm_prompt_with_graph(
    query=query,
    subgraph=subgraph,
    report=report,
    format="natural",  # or 'markdown'
)

# Send to LLM
print(prompt)

Query: 'Events related to AI regulation'
# Graph-Based Query Response

## User Query
"Events related to AI regulation"

## Search Results
HYBRID-FILTERED WORKFLOW RESULTS

1. SEED NODE (Hybrid Search Match):
   Node: control_22
   Semantic Score: 0.739
   Text: The organisation shall implement mechanisms for meaningful human oversight of AI systems, ensuring humans maintain appropriate control over AI decisio...
   Domain: Safe Responsible AI
   Topic: Human Oversight and Intervention

   HYBRID-FILTERED NEIGHBORS:
   (Must pass BOTH structural AND semantic thresholds)
   1. topic_Human Oversight and Intervention (predecessor, 1 hops)
      Combined: 0.774
      └─ Structural: 0.972
      └─ Semantic: 0.577
      Type: topic
      Topic: Human Oversight and Intervention. in domain Safe Responsible AI. includes...
   2. domain_Safe Responsible AI (predecessor, 2 hops)
      Combined: 0.711
      └─ Structural: 0.958
      └─ Semantic: 0.463
      Type: domain
      Domain: Safe Responsi

## Query LLM

In [73]:
%%time

provider = "Anthropic"

result = query_llm(
    prompt=prompt,
    # system_prompt=system_prompt,
    provider=provider,
    api_key=api_keys[provider],
    temperature=0.2,
)

CPU times: user 192 ms, sys: 27.8 ms, total: 220 ms
Wall time: 10.9 s


In [74]:
display(Markdown(result))

Based on the graph data provided, here's an analysis of events related to AI regulation:

Key Insights on AI Regulation:

1. Regulatory Standards and Frameworks:
- The EU AI Act (std_EU_AI_ACT_*) appears to be a central standard across multiple regulatory aspects
- Multiple standards are referenced, including:
  - EU AI Act
  - SOC2
  - NIST Risk Management Framework (NIST_RMF)
  - ISO27001

2. Primary Regulatory Focus Areas:
a) Human Oversight and Intervention
- Ensures meaningful human control over AI systems
- Highlighted by control_22: "implement mechanisms for meaningful human oversight"
- Mapped to EU AI Act sections 14.1-14.5

b) Safety
- Preventing AI systems from producing harmful outputs
- Control_23 focuses on establishing processes to prevent negative AI system outcomes
- Mapped to EU AI Act sections 5.1, 5.2, 9.1-9.3

c) Transparency and Reporting
- Mechanisms for clear AI system disclosure
- Control_8 emphasizes transparency, notification of AI use
- Mapped to EU AI Act sections 50.1-50.5

d) Explainability and Interpretability
- Ensuring AI decisions can be explained
- Control_25 focuses on making AI decisions comprehensible
- Mapped to EU AI Act sections 50.2-50.5

3. Regulatory Domains:
- Safe Responsible AI (primary domain)
- Regulatory Operations

4. Interconnected Regulatory Approach:
- Multiple standards and controls are interconnected
- Each focus area (oversight, safety, transparency) has specific controls
- Standards like EU AI Act provide comprehensive guidelines across these areas

Emerging Patterns:
- Strong emphasis on human-centric AI regulation
- Multi-layered approach covering technical, ethical, and operational aspects
- Consistent focus on preventing potential harm and ensuring responsible AI deployment

Limitations of Analysis:
- This interpretation is strictly based on the provided graph data
- Detailed specifics of each regulation are not fully expanded in this representation

In [75]:
print("Notebook finished !")

Notebook finished !
