# SHACL Validation Example

This notebook demonstrates how to use SHACL validation in K-GAP to validate RDF data in GraphDB.

## Prerequisites

1. GraphDB must be running and accessible
2. SHACL shapes must be loaded into the repository
3. Data to validate must be present in the repository
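The first prerequisite can be checked before running any validation. The sketch below is a hedged illustration, not part of K-GAP: the helper name `graphdb_reachable` is hypothetical, and it assumes GraphDB exposes the RDF4J-style `/repositories` listing under the base URL used later in this notebook.

```python
import urllib.request


def graphdb_reachable(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/repositories answers with HTTP 200."""
    url = base_url.rstrip("/") + "/repositories"
    try:
        request = urllib.request.Request(
            url, headers={"Accept": "application/sparql-results+json"}
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers URLs without a scheme
        return False
```

Calling `graphdb_reachable('http://graphdb:7200/')` from inside the notebook container should return `True` once GraphDB is up. A `True` result only shows the server answers; it does not confirm that shapes or data are loaded.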

## What is SHACL?

SHACL (Shapes Constraint Language) is a W3C standard for validating RDF graphs against a set of conditions. It allows you to define the expected structure and constraints of your RDF data.

In [None]:
# Import the SHACL validation module
import sys
sys.path.append('/workspace')  # Add workspace to path

from kgap_shacl import validate_repository, ValidationReport

## Example 1: Validate Entire Repository

Validate all data in the default repository against all SHACL shapes.

In [None]:
# Validate the entire repository
report = validate_repository()

# Print the report
report.print_report()

## Example 2: Check Validation Status

Check if the validation passed and get the number of violations.

In [None]:
report = validate_repository()

if report.conforms:
    print("✓ Validation PASSED - Data conforms to all SHACL shapes")
else:
    print(f"✗ Validation FAILED - Found {len(report.violations)} violations")
    print("\nViolations:")
    for i, violation in enumerate(report.violations, 1):
        print(f"  {i}. {violation}")
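The same status check can be wrapped in a small helper, for example to log one line per run. This is a minimal sketch: `summarize_report` is a hypothetical name, and it relies only on the `conforms` and `violations` attributes used above. Stand-in objects replace a real `ValidationReport` so the code runs without GraphDB.

```python
from types import SimpleNamespace


def summarize_report(report) -> str:
    """Build a one-line summary from an object with .conforms and .violations."""
    if report.conforms:
        return "PASSED: data conforms to all SHACL shapes"
    count = len(report.violations)
    noun = "violation" if count == 1 else "violations"
    return f"FAILED: {count} {noun}"


# Stand-in objects with made-up values (not real validation output)
passing = SimpleNamespace(conforms=True, violations=[])
failing = SimpleNamespace(conforms=False, violations=["ex:bob is missing foaf:name"])

print(summarize_report(passing))
print(summarize_report(failing))
```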

## Example 3: Validate a Specific Named Graph

If your data is organized in named graphs, you can validate a specific graph.

In [None]:
# Validate a specific named graph
graph_iri = "http://example.org/my-data-graph"
report = validate_repository(named_graph=graph_iri)

print(f"Validating graph: {graph_iri}")
report.print_report()
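When several named graphs need checking, the call above can be repeated in a loop. The sketch below assumes only that the validation function accepts a `named_graph` keyword and returns an object with a `conforms` attribute, as shown in this notebook. `validate_graphs` and `fake_validate` are hypothetical names, and `fake_validate` stands in for `validate_repository` so the example runs offline.

```python
from types import SimpleNamespace


def validate_graphs(validate, graph_iris):
    """Call validate(named_graph=iri) per IRI and return the non-conforming IRIs."""
    failed = []
    for iri in graph_iris:
        report = validate(named_graph=iri)
        if not report.conforms:
            failed.append(iri)
    return failed


def fake_validate(named_graph=None):
    # Stand-in: pretends that any graph IRI ending in "/bad" fails validation
    return SimpleNamespace(conforms=not named_graph.endswith("/bad"), violations=[])


graphs = ["http://example.org/good", "http://example.org/bad"]
print(validate_graphs(fake_validate, graphs))
```

In the notebook itself, `validate_graphs(validate_repository, graphs)` would be the corresponding call.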

## Example 4: Load Sample Data and Shapes

Let's load some sample SHACL shapes and data to demonstrate validation.

In [None]:
import requests
import os

# Configuration (normalize the base URL so it always ends with exactly one slash)
graphdb_url = os.getenv('GDB_BASE', 'http://graphdb:7200/').rstrip('/') + '/'
repository = os.getenv('GDB_REPO', 'kgap')
statements_url = f"{graphdb_url}repositories/{repository}/statements"

# Define a SHACL shape for foaf:Person
person_shape = """@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass foaf:Person ;
    sh:property [
        sh:path foaf:name ;
        sh:minCount 1 ;
        sh:datatype xsd:string ;
    ] ;
    sh:property [
        sh:path foaf:age ;
        sh:maxCount 1 ;
        sh:datatype xsd:integer ;
        sh:minInclusive 0 ;
        sh:maxInclusive 150 ;
    ] .
"""

# Upload the shape to the repository
response = requests.post(
    statements_url,
    headers={'Content-Type': 'text/turtle'},
    data=person_shape
)
print(f"Shapes loaded: {response.status_code}")

# Load valid sample data
valid_data = """@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .

ex:alice a foaf:Person ;
    foaf:name "Alice" ;
    foaf:age 30 .
"""

response = requests.post(
    statements_url,
    headers={'Content-Type': 'text/turtle'},
    data=valid_data
)
print(f"Valid data loaded: {response.status_code}")

# Load invalid sample data (missing required name)
invalid_data = """@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex: <http://example.org/> .

ex:bob a foaf:Person ;
    foaf:age 25 .
"""

response = requests.post(
    statements_url,
    headers={'Content-Type': 'text/turtle'},
    data=invalid_data
)
print(f"Invalid data loaded: {response.status_code}")
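The statements URL above is built by string concatenation, which depends on how the base URL ends. A slightly more defensive variant is sketched below. `statements_endpoint` is a hypothetical helper, and the optional `context` query parameter follows the RDF4J REST convention of passing the graph IRI in angle brackets. Whether K-GAP expects shapes or data in a particular named graph is not covered here.

```python
from urllib.parse import quote, urlencode


def statements_endpoint(base_url, repository, named_graph=None):
    """Build the /statements URL, optionally targeting one named graph."""
    url = f"{base_url.rstrip('/')}/repositories/{quote(repository, safe='')}/statements"
    if named_graph is not None:
        # RDF4J expects the context as an N-Triples encoded IRI: <iri>
        url += "?" + urlencode({"context": f"<{named_graph}>"})
    return url


print(statements_endpoint("http://graphdb:7200/", "kgap"))
print(statements_endpoint("http://graphdb:7200", "kgap",
                          "http://example.org/my-data-graph"))
```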

## Example 5: Run Validation on Sample Data

Now validate the data we just loaded. This should report one violation: `ex:bob` has no `foaf:name`, which the shape requires through `sh:minCount 1`.

In [None]:
report = validate_repository()
report.print_report()

# Should show a violation for ex:bob missing foaf:name
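To see why Alice passes and Bob fails, the constraints in `ex:PersonShape` can be restated in plain Python. This is only an illustration of what the shape requires, not how GraphDB or K-GAP evaluates SHACL. The function name `check_person` and the dict layout are assumptions made for the example.

```python
def check_person(person: dict) -> list:
    """Return the constraint problems for one foaf:Person-like record."""
    problems = []
    names = person.get("foaf:name", [])
    ages = person.get("foaf:age", [])

    # sh:minCount 1 and sh:datatype xsd:string on foaf:name
    if len(names) < 1:
        problems.append("foaf:name: fewer than 1 value (sh:minCount 1)")
    if any(not isinstance(name, str) for name in names):
        problems.append("foaf:name: value is not a string (sh:datatype)")

    # sh:maxCount 1, sh:datatype xsd:integer, and the 0..150 range on foaf:age
    if len(ages) > 1:
        problems.append("foaf:age: more than 1 value (sh:maxCount 1)")
    for age in ages:
        if isinstance(age, bool) or not isinstance(age, int):
            problems.append("foaf:age: value is not an integer (sh:datatype)")
        elif not 0 <= age <= 150:
            problems.append("foaf:age: value outside 0..150")
    return problems


alice = {"foaf:name": ["Alice"], "foaf:age": [30]}
bob = {"foaf:age": [25]}

print(check_person(alice))
print(check_person(bob))
```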

## Example 6: Validate Different Repository

You can specify a different repository name if you have multiple repositories.

In [None]:
# Validate a different repository
report = validate_repository(repository='other-repo')
report.print_report()

## Example 7: Get Different Report Formats

SHACL validation reports can be retrieved in different RDF formats.

In [None]:
# Get report in JSON-LD format
report_jsonld = validate_repository(accept_format='application/ld+json')
print("JSON-LD Report:")
print(report_jsonld.report_text[:500])  # Print first 500 chars

# Get report in RDF/XML format
report_xml = validate_repository(accept_format='application/rdf+xml')
print("\nRDF/XML Report:")
print(report_xml.report_text[:500])  # Print first 500 chars
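A JSON-LD report can also be inspected with the standard `json` module. The sketch below counts nodes typed as `sh:ValidationResult`. It assumes the report text is JSON-LD in expanded form, with full IRIs under `@type`. A compacted report that uses prefixes would need a JSON-LD processor instead. `count_results` is a hypothetical helper, and the sample document is hand-written, not real output.

```python
import json

SH_RESULT = "http://www.w3.org/ns/shacl#ValidationResult"


def count_results(report_text: str) -> int:
    """Count JSON-LD nodes whose @type includes sh:ValidationResult."""
    count = 0
    stack = [json.loads(report_text)]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if SH_RESULT in types:
                count += 1
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return count


# Hand-written sample in expanded JSON-LD form (illustrative only)
sample = json.dumps([
    {"@id": "_:report",
     "@type": ["http://www.w3.org/ns/shacl#ValidationReport"],
     "http://www.w3.org/ns/shacl#result": [{"@id": "_:r1"}]},
    {"@id": "_:r1", "@type": [SH_RESULT]},
])
print(count_results(sample))
```

With a real report, `count_results(report_jsonld.report_text)` would be the corresponding call.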

## Cleanup

Remove the sample data and shapes we added.

In [None]:
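# Remove all triples whose subject is in the ex: namespace, plus the
# blank-node property shapes attached to them.
# Note: DELETE WHERE does not allow FILTER, so use the DELETE { } WHERE { } form.
delete_query = """DELETE { ?s ?p ?o . ?o ?bp ?bo }
WHERE {
    ?s ?p ?o .
    FILTER(STRSTARTS(STR(?s), "http://example.org/"))
    OPTIONAL { ?o ?bp ?bo . FILTER(isBlank(?o)) }
}"""

# SPARQL updates must be sent to the repository's /statements endpoint
response = requests.post(
    f"{graphdb_url}repositories/{repository}/statements",
    headers={'Content-Type': 'application/sparql-update'},
    data=delete_query
)
print(f"Cleanup completed: {response.status_code}")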

## Summary

This notebook demonstrated:
- How to validate an entire repository
- How to validate specific named graphs
- How to check validation status and violations
- How to load SHACL shapes and test data
- How to get validation reports in different formats

For more information:
- [SHACL Specification](https://www.w3.org/TR/shacl/)
- [GraphDB SHACL Documentation](https://graphdb.ontotext.com/documentation/latest/shacl-validation.html)
- [K-GAP GraphDB Component Documentation](../docs/components/graphdb.md)