# Synchronization of ontology files
This notebook shows the current state of the system that synchronizes ontology files kept in a version control system with a Wikibase instance.

## Setup

We first add the local module to the Python path so that it can be imported, and then start the logging system.

In [1]:
import logging
import os
import sys

# set up module paths for imports
module_path = os.path.abspath(os.path.join('..'))
hercules_sync_path = os.path.abspath(os.path.join('..', 'hercules_sync'))
sys.path.append(module_path)
sys.path.append(hercules_sync_path)

# start logging system and set logging level
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logging.info("Starting logger")

INFO:root:Starting logger


We can now start importing the dependencies from hercules_sync to execute the synchronization:
* The GitFile class encapsulates the diff of a file before and after changes have been made.
* We will use the GraphDiffSyncAlgorithm to obtain the differences from each ontology file. This algorithm parses the files into a graph and computes the difference between the original and the modified graph.
* The OntologySynchronizer class uses the algorithm to return the list of operations that need to be executed in the triplestore.
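
To make the graph-difference idea more concrete, here is a minimal sketch that does not use hercules_sync at all. It assumes that each graph can be modelled as a Python set of (subject, predicate, object) tuples without blank nodes, so the difference reduces to two set subtractions. The function name `diff_graphs` and the sample triples are hypothetical, and the real GraphDiffSyncAlgorithm may work differently.

```python
# Illustrative sketch only: this is NOT the hercules_sync implementation.
# Assumption: a graph is a set of (subject, predicate, object) tuples
# and contains no blank nodes, so plain set operations are enough.

def diff_graphs(source, target):
    """Return the triples added to and removed from `source` to reach `target`."""
    added = target - source      # triples present only after the change
    removed = source - target    # triples present only before the change
    return added, removed


source_graph = {
    ("ex:HumanResource", "rdf:type", "owl:Class"),
    ("ex:ResearchPersonnel", "rdf:type", "owl:Class"),
}

target_graph = {
    ("ex:HumanResource", "rdf:type", "owl:Class"),
    ("ex:ResearchPersonnel", "rdf:type", "owl:Class"),
    ("ex:ResearchPersonnel", "rdfs:subClassOf", "ex:HumanResource"),
}

added, removed = diff_graphs(source_graph, target_graph)
```

Here `added` holds the single new subClassOf triple and `removed` is empty, which corresponds to a change that only needs addition operations.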

In [2]:
from hercules_sync.git import GitFile
from hercules_sync.synchronization import GraphDiffSyncAlgorithm, OntologySynchronizer

algorithm = GraphDiffSyncAlgorithm()
synchronizer = OntologySynchronizer(algorithm)

We use a URI factory mock to store the URI of each Wikibase entity. Since this example starts from scratch, we reset its internal state:

In [3]:
from hercules_sync.external.uri_factory_mock import URIFactory

factory = URIFactory()
factory.reset_factory()

Now we will create an instance of the WikibaseAdapter class to connect to our wikibase instance where the synchronization will be executed:

In [4]:
from hercules_sync.triplestore import WikibaseAdapter
from secret import USERNAME, PASSWORD

mediawiki_api_url = 'http://156.35.94.149:8181/w/api.php'
sparql_endpoint_url = 'http://156.35.94.149:8282/proxy/wdqs/bigdata/namespace/wdq/sparql'

adapter = WikibaseAdapter(mediawiki_api_url, sparql_endpoint_url, USERNAME, PASSWORD)

http://156.35.94.149:8181/w/api.php
Successfully logged in as WikibaseAdmin


Finally, we define a helper function that computes the operations for a file change and executes them in the triplestore:

In [5]:
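def execute_synchronization(source_content, target_content, synchronizer, adapter):
    """Synchronize the changes between two versions of a file with the triplestore."""
    # GitFile wraps the contents of the file before and after the change
    gitfile = GitFile(None, source_content, target_content)
    ops = synchronizer.synchronize(gitfile)
    # apply each operation through the adapter
    for op in ops:
        op.execute(adapter)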

## Adding content to the triplestore

In order to simulate the initial commit of an ontology file, we will start with an empty string as the source content, and the target content will represent the contents of the file after the commit:

In [6]:
source_content = ""

target_content = """
#################################################################
# Example ontology.                                             #
# This file is used to test the CI and synchronization systems. #
#################################################################

@prefix ex: <http://www.semanticweb.org/spitxa/ontologies/2020/1/asio-human-resource#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:AdministrativePersonnel rdf:type owl:Class ;
                           rdfs:subClassOf  ex:HumanResource ;
                           owl:disjointWith ex:ResearchPersonnel .

ex:HumanResource rdf:type owl:Class .

ex:ResearchPersonnel rdf:type owl:Class ;
                     rdfs:subClassOf ex:HumanResource .

ex:authors rdf:type owl:ObjectProperty .
"""

Before executing the synchronization, we briefly inspect the operations created by the algorithm:

In [7]:
gitfile = GitFile(None, source_content, target_content)
ops = synchronizer.synchronize(gitfile)

ops

[<hercules_sync.synchronization.operations.AdditionOperation at 0x11f04ec18>,
 <hercules_sync.synchronization.operations.AdditionOperation at 0x11f04e9b0>,
 <hercules_sync.synchronization.operations.AdditionOperation at 0x11f04ef60>,
 <hercules_sync.synchronization.operations.AdditionOperation at 0x11f06d048>,
 <hercules_sync.synchronization.operations.AdditionOperation at 0x11f06d160>,
 <hercules_sync.synchronization.operations.AdditionOperation at 0x11f06d278>,
 <hercules_sync.synchronization.operations.AdditionOperation at 0x11f06d390>]

In [8]:
str(ops[-1])

'AdditionOperation: URIElement: http://www.semanticweb.org/spitxa/ontologies/2020/1/asio-human-resource#AdministrativePersonnel - Type: item - URIElement: http://www.w3.org/2000/01/rdf-schema#subClassOf - Type: item - URIElement: http://www.semanticweb.org/spitxa/ontologies/2020/1/asio-human-resource#HumanResource - Type: item'

As we can see, we have 7 AdditionOperations to execute in the triplestore, which matches the 7 triples declared in the file.

We can now execute the synchronization using the function defined in the setup section:

In [9]:
execute_synchronization(source_content, target_content, synchronizer, adapter)

Please set P2302 and Q21502410 in your wikibase or set `core_props` manually.
Continuing with no core_props
  "Continuing with no core_props")


The sequence diagram below shows the complete process of adding new triples to the triplestore:

![Sequence diagram of adding a new triple to the triplestore](img/herc_sync_sequence_new_triple.png)

Removing content from the triplestore follows the same process, with the main difference that the 'remove_triple' method of WikibaseAdapter is called instead.
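
The symmetry between adding and removing can be sketched with a stand-in adapter that only records which method each operation calls. Everything in this block is hypothetical: `RecordingAdapter`, `SketchAddition` and `SketchRemoval` are illustrative names, and `create_triple` is an assumed name for the adding method. Only 'remove_triple' is taken from the description above.

```python
# Illustrative sketch only: these classes are NOT part of hercules_sync.

class RecordingAdapter:
    """Hypothetical stand-in for WikibaseAdapter that records calls instead of writing."""

    def __init__(self):
        self.calls = []

    def create_triple(self, triple):
        # assumed method name for the addition path
        self.calls.append(("create_triple", triple))

    def remove_triple(self, triple):
        # method name mentioned in the notebook text
        self.calls.append(("remove_triple", triple))


class SketchAddition:
    """Hypothetical operation that adds one triple through the adapter."""

    def __init__(self, triple):
        self.triple = triple

    def execute(self, adapter):
        adapter.create_triple(self.triple)


class SketchRemoval:
    """Hypothetical operation that removes one triple through the adapter."""

    def __init__(self, triple):
        self.triple = triple

    def execute(self, adapter):
        adapter.remove_triple(self.triple)


triple = ("ex:authors", "rdf:type", "owl:ObjectProperty")
recording_adapter = RecordingAdapter()

# Both operations share the same interface; only the adapter method differs.
for op in (SketchAddition(triple), SketchRemoval(triple)):
    op.execute(recording_adapter)
```

Because both operations expose the same `execute(adapter)` interface, a loop like the one in the helper function does not need to know which kind of operation it is running.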

## Modifying existing content

Now we are going to simulate a change in the ontology file previously created. The following changes are introduced:
* Add a new entity (ChangedPersonnel) of type owl:Class.
* Add labels and a description to the ResearchPersonnel entity.
* Add ChangedPersonnel as an additional subClassOf value of ResearchPersonnel, alongside HumanResource (for illustrative purposes).
* Change the disjointWith property of AdministrativePersonnel from ResearchPersonnel to ChangedPersonnel.
* Create a new 'authoredBy' property.
* Set 'authors' as a subProperty of 'authoredBy'.

In [10]:
source_content = target_content

target_content = """
#################################################################
# Example ontology.                                             #
# This file is used to test the CI and synchronization systems. #
#################################################################

@prefix ex: <http://www.semanticweb.org/spitxa/ontologies/2020/1/asio-human-resource#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix asio: <http://www.asio.es/asioontologies/asio#> .


ex:AdministrativePersonnel rdf:type owl:Class ;
                           rdfs:subClassOf  ex:HumanResource ;
                           owl:disjointWith ex:ChangedPersonnel .

ex:HumanResource rdf:type owl:Class .

ex:ChangedPersonnel rdf:type owl:Class .

ex:ResearchPersonnel rdf:type owl:Class ;
                        rdfs:subClassOf ex:ChangedPersonnel ;
                        rdfs:subClassOf ex:HumanResource ;
                        rdfs:comment "Personnel devoted to technical suport."@en ;
                        rdfs:label "Personal tècnic"@ca ,
                                   "Personal técnico"@es ,
                                   "Personnel technique"@fr ,
                                   "Pessoal técnico"@pt ,
                                   "Technical personnel"@en .

ex:authoredBy rdf:type owl:ObjectProperty .

ex:authors rdf:type owl:ObjectProperty ;
           rdfs:subPropertyOf ex:authoredBy .
"""

In [11]:
execute_synchronization(source_content, target_content, synchronizer, adapter)
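
A diff between two graphs has no notion of "update", so changing the value of a property appears as one removed triple plus one added triple. The sketch below illustrates this with the disjointWith change and the new subPropertyOf statement. It again assumes that graphs can be modelled as plain sets of tuples. The variable names are hypothetical, and the operations that hercules_sync actually emits for this step may differ.

```python
# Illustrative sketch only: graphs are modelled as sets of
# (subject, predicate, object) tuples, not parsed from Turtle.

before = {
    ("ex:AdministrativePersonnel", "owl:disjointWith", "ex:ResearchPersonnel"),
    ("ex:authors", "rdf:type", "owl:ObjectProperty"),
}

after = {
    ("ex:AdministrativePersonnel", "owl:disjointWith", "ex:ChangedPersonnel"),
    ("ex:authors", "rdf:type", "owl:ObjectProperty"),
    ("ex:authors", "rdfs:subPropertyOf", "ex:authoredBy"),
}

# The old disjointWith value is removed and the new one is added;
# the unchanged rdf:type triple appears in neither list.
to_remove = sorted(before - after)
to_add = sorted(after - before)
```

In this sketch `to_remove` contains only the old disjointWith triple, while `to_add` contains the new disjointWith triple and the subPropertyOf triple.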

## Removing content

Finally, we are going to remove some statements from our entities:
* Remove the type statement from the ChangedPersonnel entity.
* Remove the subClassOf statements (HumanResource and ChangedPersonnel) from the ResearchPersonnel entity.
* Remove the @pt label from ResearchPersonnel.
* Delete all statements about authors and authoredBy.

In [12]:
source_content = target_content

target_content = """
#################################################################
# Example ontology.                                             #
# This file is used to test the CI and synchronization systems. #
#################################################################

@prefix ex: <http://www.semanticweb.org/spitxa/ontologies/2020/1/asio-human-resource#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix asio: <http://www.asio.es/asioontologies/asio#> .


ex:AdministrativePersonnel rdf:type owl:Class ;
                           rdfs:subClassOf  ex:HumanResource ;
                           owl:disjointWith ex:ChangedPersonnel .

ex:HumanResource rdf:type owl:Class .

ex:ResearchPersonnel rdf:type owl:Class ;
                        rdfs:comment "Personnel devoted to technical suport."@en ;
                        rdfs:label "Personal tècnic"@ca ,
                                   "Personal técnico"@es ,
                                   "Personnel technique"@fr ,
                                   "Technical personnel"@en .
"""

In [13]:
execute_synchronization(source_content, target_content, synchronizer, adapter)

## Future work

For additional information about the work that needs to be done with the synchronization system, please visit: https://github.com/weso/hercules-sync/issues?q=is%3Aopen+is%3Aissue+label%3Awikibase+