# Part 1: Extract the data and put it in Neptune

Here we do the following: open the JSON archive, find the .jsonld files that are labeled as verifiable credentials (VCs), and turn their contents into dicts. From each dict we take the id and all the information stored under 'credentialSubject' and create a vertex in Neptune with those properties. Then we find all the ids of the form 'urn:uuid:...' that are present in the dict and create the corresponding edges in Neptune. For easy traversal, we also add inverted edges. Finally, we visualize the result.
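
To make the extraction step concrete, here is a minimal in-memory sketch that needs neither the archive nor Neptune. The sample document, its field values and the helper names 'vertex_properties' and 'mentioned_ids' are hypothetical, and the real files may be shaped differently. Unlike the helper defined later in this notebook, this variant also descends into lists, which is an assumption about what the export may contain.

```python
def vertex_properties(vc):
    """Return the id plus every non-nested value of 'credentialSubject'."""
    props = {'id': vc['id']}
    for key, value in vc['credentialSubject'].items():
        if key != 'id' and not isinstance(value, (dict, list)):
            props[key] = value
    return props


def mentioned_ids(node):
    """Collect every 'urn:uuid...' string found anywhere inside node."""
    if isinstance(node, str):
        return [node] if node.startswith('urn:uuid') else []
    if isinstance(node, dict):
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return []
    ids = []
    for child in children:
        ids.extend(mentioned_ids(child))
    return ids


# Hypothetical credential, shaped like the description above.
sample_vc = {
    'id': 'urn:uuid:aaaa',
    'credentialSubject': {
        'type': 'EmploymentAgreement',
        'candidateAgreement': {'id': 'urn:uuid:bbbb'},
        'attachments': ['urn:uuid:cccc'],
    },
}

properties = vertex_properties(sample_vc)
# Edge targets are all mentioned ids except the document's own id.
targets = [i for i in mentioned_ids(sample_vc) if i != sample_vc['id']]
print(properties)
print(targets)
```

Here 'properties' would become the vertex properties and each entry of 'targets' would become one edge plus its inverted counterpart.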

Open the archive.

In [1]:
import json
from zipfile import ZipFile

#myzip = ZipFile('hr-app-json-ld-export.zip', 'r')
myzip = ZipFile('json-ld-env-export-with-resignation-03-nov-2022.zip', 'r')
namelist = myzip.namelist()

Add utility functions: one extracts a JSON file from the archive and turns it into a dict, the other finds all the ids mentioned in the document, so that we can later find the edges of our graph.

In [2]:
def extract_dict(filename):
    """
    Make a dictionary out of the .jsonld file from myzip.
    """
    with myzip.open(filename) as json_file:
        return json.load(json_file)
        

def find_mentioned_ids(dct):
    """
    Find ids that are mentioned in the dict, including nested dicts.
    """
    ids = []
    for value in dct.values():
        if isinstance(value, dict):
            ids.extend(find_mentioned_ids(value))
        elif isinstance(value, str) and value.startswith('urn:uuid'):
            ids.append(value)
    return ids

Start with Gremlin: clear the database.

In [3]:
%%gremlin

g.V().hasLabel('VC_ulyss').drop().iterate()



Create a graph traversal object.

In [4]:
from gremlin_python import statics
from gremlin_python.structure.graph import Graph
from gremlin_python.process.graph_traversal import __
from gremlin_python.process.strategies import *
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

remoteConn = DriverRemoteConnection('wss://neptunecdkcluster6a0221d7-tf6zid0yicr7.cluster-cnhnmqpcjens.eu-west-1.neptune.amazonaws.com:8182/gremlin','g')
g = Graph().traversal().withRemote(remoteConn)

Add a function that puts all the VCs into the database. The credential subject information is added to the database as vertex properties for easy access. (In practice this is really slow; it might have been better to store the credential information in files or even in a separate array.) Also add a function that constructs the edges.
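
One likely reason for the slowness is that a separate request is sent for every property. A possible alternative, sketched below, is to chain all the 'property' steps onto the 'addV' traversal and submit it once. The helper name 'add_vertex_in_one_request' is hypothetical, and whether this is noticeably faster on this cluster is an assumption that has not been measured here.

```python
def add_vertex_in_one_request(g, label, vc_id, subject):
    """
    Build a single addV traversal carrying every non-dict property of
    subject, then submit it with one call to next().
    g is expected to be a Gremlin traversal source like the one created above.
    """
    traversal = g.addV(label).property('id', vc_id)
    for key, value in subject.items():
        if key != 'id' and not isinstance(value, dict):
            # Nothing is sent yet: this only extends the traversal.
            traversal = traversal.property(key, value)
    # One round trip for the vertex and all of its properties.
    return traversal.next()
```

Inside the loop over file names it could be called as add_vertex_in_one_request(g, 'VC_ulyss', data['id'], data['credentialSubject']).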

In [5]:
def add_vertices(namelist):
    """
    Add a vertex for every VC file in namelist. Return the added vertices.
    """
    vertices = []
    for name in namelist:
        if name[:2] != 'vc':
            continue
        data = extract_dict(name)
        v = g.addV('VC_ulyss').property('id', data['id']).next()
        for key in data['credentialSubject'].keys():
            if type(data['credentialSubject'][key]) is not dict and key != 'id':
                g.V(v.id).property(key, data['credentialSubject'][key]).next()
        vertices.append(v)
    return vertices
                
                
def add_edges(namelist):
    """
    Add edges (and inverted edges) starting from the vertices of the VC files in namelist.
    """
    for name in namelist:
        if name[:2] != 'vc':
            continue
        data = extract_dict(name)
        v1 = g.V().hasLabel('VC_ulyss').has('id', data['id']).next()
        for id_ in find_mentioned_ids(data):
            if id_ == data['id']:
                continue
            v2 = g.V().hasLabel('VC_ulyss').has('id', id_).next()
            g.V(v1).addE('e').to(v2).next()
            g.V(v2).addE('e_inv').to(v1).next()

In [6]:
add_vertices(namelist)
add_edges(namelist)

Visualize.

In [7]:
%%gremlin -p v,oute,inv -d type

g.V().hasLabel('VC_ulyss').outE('e').inV().hasLabel('VC_ulyss').path().by(elementMap())



# Part 2: Query the graph and construct the list of employees

Here we construct the list of current employees. We store it as a dict whose keys are the ids of employment agreements and whose values are Employee objects, which hold the name, surname and position of the employee. We also store the start date of the latest change in the employee's position.

The 'update_list_of_employees' function has three parts, which query employment agreements, employment addendums and resignation confirmations. First, we iterate over the employment agreements and add the newly appointed employees to our list, taking their name from the passport, their position from the job offer and their start date from the candidate agreement. Second, we iterate over the employment addendums and, if an addendum is not outdated and its start date has already come, update the position of the employee whose employment agreement is linked to this addendum. Third, we iterate over the resignation confirmations and, if the last working day has already come, remove the employee whose employment agreement is linked to this confirmation from the list.
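
The three passes can be illustrated without a graph. The sketch below applies the same rules to plain Python records; the function name 'current_positions', the record layout, the positions and the dates are invented for illustration and are not taken from the export.

```python
from datetime import date


def current_positions(today, agreements, addendums, resignations):
    """
    agreements:   {agreement_id: (position, start_date)}
    addendums:    [(agreement_id, new_position, from_date), ...]
    resignations: [(agreement_id, last_working_day), ...]
    Returns {agreement_id: (position, date_of_last_change)}.
    """
    # Pass 1: every agreement starts as a current employee.
    employees = dict(agreements)
    # Pass 2: an addendum applies if it is newer than the last recorded
    # change and its start date is not in the future.
    for agreement_id, position, from_date in addendums:
        last_change = employees[agreement_id][1]
        if last_change < from_date <= today:
            employees[agreement_id] = (position, from_date)
    # Pass 3: drop everyone whose last working day has arrived.
    for agreement_id, last_day in resignations:
        if last_day <= today:
            employees.pop(agreement_id, None)
    return employees


agreements = {
    'a1': ('Engineer', date(2022, 1, 10)),
    'a2': ('Editor', date(2022, 2, 1)),
}
addendums = [
    ('a1', 'Senior Engineer', date(2022, 6, 1)),  # already in force
    ('a1', 'Lead Engineer', date(2023, 1, 1)),    # not in force yet
]
resignations = [('a2', date(2022, 9, 30))]

result = current_positions(date(2022, 11, 3), agreements, addendums, resignations)
print(result)
```

With these inputs only 'a1' remains, holding the position from the addendum that is already in force. Because each addendum is compared with the date of the last recorded change, the result does not depend on the order in which the addendums are processed.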

As you may notice, this structure allows new documents to be added to the database, and the next query only needs to deal with them, not with the whole database again. For this reason the 'update_list_of_employees' function takes the lists of new documents as arguments. So, when you add new documents, keep the new vertices with types 'EmploymentAgreement', 'EmploymentAddendum' and 'ResignationConfirmation' and pass them to the 'update_list_of_employees' function.

To work with dates, use the datetime module. Add other utility functions.

In [8]:
from datetime import date

def make_date(str_time):
    """
    Make a date object from a 'yyyy-mm-dd' string.
    """
    y, m, d = map(int, str_time.split('-'))
    return date(y, m, d)


def get_property(id_, name):
    """
    Get property 'name' from VC with id_.
    """
    property_ = g.V(id_).valueMap().next()[name][0]
    return property_
    
    
def get_employment_agreement_id(id_):
    """
    Get id of the employment agreement linked to the VC with id_. Used to get the key
    for the list_of_employees dictionary.
    """
    employmentAgreement = g.V(id_).out('e').out('e').has('type', 'EmploymentAgreement').next()
    return employmentAgreement.id

Add the 'Employee' class.

In [9]:
class Employee:
    def __init__(self, name, surname, position, date_):
        self.name, self.surname, self.position, self.date = name, surname, position, date_
        
    def __str__(self):
        return self.name + ' ' + self.surname + ' ' + self.position

For each EmploymentAgreement, get the name and surname from the linked passport. Then find which position each employee holds today, or whether they have already resigned.

In [10]:
def update_list_of_employees(today,
                                employmentAgreements = g.V().hasLabel('VC_ulyss').has('type', 'EmploymentAgreement').toList(),
                                employmentAddendums = g.V().hasLabel('VC_ulyss').has('type', 'EmploymentAddendum').toList(),
                                resignationConfirmations = g.V().hasLabel('VC_ulyss').has('type', 'ResignationConfirmation').toList()):
    """
    Updates the list of employees. Takes four arguments: today's date, lists of employment agreements, 
    employment addendums and resignation confirmations to be added. 
    """
    global list_of_employees
    
    for employmentAgreement in employmentAgreements:
        # get name and surname from passport
        passport = g.V(employmentAgreement).out('e').out('e').has('type', 'Passport').next()
        name, surname = get_property(passport, 'GivenNames'), get_property(passport, 'Surname')
        # get position from JobOffer
        jobOffer = g.V(employmentAgreement).out('e').out('e').has('type', 'JobOffer').next()
        offered_position = get_property(jobOffer, 'OfferedPosition')
        # get start date from CandidateAgreement
        candidateAgreement = g.V(employmentAgreement).out('e').has('type', 'CandidateAgreement').next()
        start_date = make_date(get_property(candidateAgreement, 'StartDate'))
        # add a new employee to the list
        list_of_employees[employmentAgreement.id] = Employee(name, surname, offered_position, start_date)
        
    for employmentAddendum in employmentAddendums:
        employment_agreement_id = get_employment_agreement_id(employmentAddendum)
        # get new position and new start date from CompanyAddendum
        companyAddendum = g.V(employmentAddendum).out('e').has('type', 'CompanyAddendum').next()
        addendum_position = get_property(companyAddendum, 'Position')
        addendum_date = make_date(get_property(companyAddendum, 'From'))
        # check whether the start date has already come and the addendum is not outdated
        if list_of_employees[employment_agreement_id].date < addendum_date <= today:
            list_of_employees[employment_agreement_id].position = addendum_position
            list_of_employees[employment_agreement_id].date = addendum_date
        
    for resignationConfirmation in resignationConfirmations:
        employmentAgreement_id = get_employment_agreement_id(resignationConfirmation)
        # get last working day from ResignationConfirmation
        last_working_day = make_date(get_property(resignationConfirmation, 'LastWorkingDay'))
        # check whether the last working day has already come
        if last_working_day <= today:
            del list_of_employees[employmentAgreement_id]

Print the final list of employees.

In [11]:
today = date.today()
list_of_employees = dict()

update_list_of_employees(today)
for key in list_of_employees.keys():
    print(list_of_employees[key])

Ivan Ivanov celebration engineer
EmployeeWithAddendumDraft Hernandez Support Engineer
EmployeeWithAddendums Jones CFO
EmployeeAgain Larson Product Owner
Employee Davis Engineer
EmployeeWithDraftResignationNotice Evans Editor
EmployeeWithDraftResignationConfirmation Franco Secretary


# Part 3: Update the graph

Test the 'update' part. Let's remove some vertices, make a new list of employees, then add them back again and update our list.

In [12]:
ids_to_drop = ['urn:uuid:fcd7285a-650c-493a-b705-a32c97edf679',
                'urn:uuid:bab0b678-1d7c-40e5-a579-a76828fd8ccd',
                'urn:uuid:9b38551e-b641-41c9-96a0-af56cad14c8d',
                'urn:uuid:1c48d7a4-56b3-4350-bff8-6a1970bb5bd7',
                'urn:uuid:8f54da1a-cd0c-46ef-b9da-5088755a6a46']
for id_ in ids_to_drop:
    g.V().has('id', id_).drop().iterate()

Remake the list of employees.
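
A likely reason the function is defined again in the next cell is that Python evaluates default argument values once, when the 'def' statement runs, so the default vertex lists are frozen at definition time. A common alternative is a None sentinel that defers the lookup to call time. The sketch below shows the difference on a plain list; 'store', 'current_items', 'frozen_default' and 'fresh_default' are hypothetical names standing in for the graph, the graph query and the update function.

```python
store = ['a', 'b']


def current_items():
    """Stand-in for a graph query such as g.V()...toList()."""
    return list(store)


def frozen_default(items=current_items()):
    # The default was computed once, when this def statement ran.
    return items


def fresh_default(items=None):
    # None means "not supplied": look the items up at call time.
    if items is None:
        items = current_items()
    return items


store.append('c')
print(frozen_default())  # ['a', 'b']
print(fresh_default())   # ['a', 'b', 'c']
```

With the second form the function would not need to be re-run after the graph changes, and explicitly passed lists would still take precedence.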

In [13]:
def update_list_of_employees(today,
                                employmentAgreements = g.V().hasLabel('VC_ulyss').has('type', 'EmploymentAgreement').toList(),
                                employmentAddendums = g.V().hasLabel('VC_ulyss').has('type', 'EmploymentAddendum').toList(),
                                resignationConfirmations = g.V().hasLabel('VC_ulyss').has('type', 'ResignationConfirmation').toList()):
    """
    Updates the list of employees. Takes four arguments: today's date, lists of employment agreements, 
    employment addendums and resignation confirmations to be added. 
    """
    global list_of_employees
    
    for employmentAgreement in employmentAgreements:
        # get name and surname from passport
        passport = g.V(employmentAgreement).out('e').out('e').has('type', 'Passport').next()
        name, surname = get_property(passport, 'GivenNames'), get_property(passport, 'Surname')
        # get position from JobOffer
        jobOffer = g.V(employmentAgreement).out('e').out('e').has('type', 'JobOffer').next()
        offered_position = get_property(jobOffer, 'OfferedPosition')
        # get start date from CandidateAgreement
        candidateAgreement = g.V(employmentAgreement).out('e').has('type', 'CandidateAgreement').next()
        start_date = make_date(get_property(candidateAgreement, 'StartDate'))
        # add a new employee to the list
        list_of_employees[employmentAgreement.id] = Employee(name, surname, offered_position, start_date)
        
    for employmentAddendum in employmentAddendums:
        employment_agreement_id = get_employment_agreement_id(employmentAddendum)
        # get new position and new start date from CompanyAddendum
        companyAddendum = g.V(employmentAddendum).out('e').has('type', 'CompanyAddendum').next()
        addendum_position = get_property(companyAddendum, 'Position')
        addendum_date = make_date(get_property(companyAddendum, 'From'))
        # check whether the start date has already come and the addendum is not outdated
        if list_of_employees[employment_agreement_id].date < addendum_date <= today:
            list_of_employees[employment_agreement_id].position = addendum_position
            list_of_employees[employment_agreement_id].date = addendum_date
        
    for resignationConfirmation in resignationConfirmations:
        employmentAgreement_id = get_employment_agreement_id(resignationConfirmation)
        # get last working day from ResignationConfirmation
        last_working_day = make_date(get_property(resignationConfirmation, 'LastWorkingDay'))
        # check whether the last working day has already come
        if last_working_day <= today:
            del list_of_employees[employmentAgreement_id]

In [14]:
today = date.today()
list_of_employees = dict()

update_list_of_employees(today)
for key in list_of_employees.keys():
    print(list_of_employees[key])

Ivan Ivanov celebration engineer
EmployeeWithAddendums Jones Senior finance manager
EmployeeAgain Larson Product Owner
Employee Davis Engineer
EmployeeWithDraftResignationNotice Evans Editor
EmployeeWithDraftResignationConfirmation Franco Secretary


Add the removed vertices back. Construct three lists of vertices that need to be processed to update the list of employees.

In [15]:
files_to_add = ['vc_body_urn_uuid_fcd7285a-650c-493a-b705-a32c97edf679.jsonld', 
                'vc_body_urn_uuid_bab0b678-1d7c-40e5-a579-a76828fd8ccd.jsonld', 
                'vc_body_urn_uuid_9b38551e-b641-41c9-96a0-af56cad14c8d.jsonld', 
                'vc_body_urn_uuid_1c48d7a4-56b3-4350-bff8-6a1970bb5bd7.jsonld', 
                'vc_body_urn_uuid_8f54da1a-cd0c-46ef-b9da-5088755a6a46.jsonld']
                
new_vertices = add_vertices(files_to_add)
add_edges(files_to_add)

employmentAgreements = []
employmentAddendums = []
resignationConfirmations = []
for vertex in new_vertices:
    if get_property(vertex, 'type') == 'EmploymentAgreement':
        employmentAgreements.append(vertex)
    elif get_property(vertex, 'type') == 'EmploymentAddendum':
        employmentAddendums.append(vertex)
    elif get_property(vertex, 'type') == 'ResignationConfirmation':
        resignationConfirmations.append(vertex)

Update the list of employees.

In [17]:
today = date.today()

update_list_of_employees(today, employmentAgreements, employmentAddendums, resignationConfirmations)
for key in list_of_employees.keys():
    print(list_of_employees[key])

Ivan Ivanov celebration engineer
EmployeeWithAddendums Jones CFO
EmployeeAgain Larson Product Owner
Employee Davis Engineer
EmployeeWithDraftResignationNotice Evans Editor
EmployeeWithDraftResignationConfirmation Franco Secretary
EmployeeWithAddendumDraft Hernandez Support Engineer


As we can see, the list contains the same employees as the first one, which we got by adding all the vertices at once; only the order differs.
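
Since the two runs print the employees in a different order, the match is easiest to confirm by comparing the outputs without regard to order. The sketch below does this for the two printed lists, copied from the outputs above; the variable names are chosen here for illustration.

```python
first_run = [
    'Ivan Ivanov celebration engineer',
    'EmployeeWithAddendumDraft Hernandez Support Engineer',
    'EmployeeWithAddendums Jones CFO',
    'EmployeeAgain Larson Product Owner',
    'Employee Davis Engineer',
    'EmployeeWithDraftResignationNotice Evans Editor',
    'EmployeeWithDraftResignationConfirmation Franco Secretary',
]
after_update = [
    'Ivan Ivanov celebration engineer',
    'EmployeeWithAddendums Jones CFO',
    'EmployeeAgain Larson Product Owner',
    'Employee Davis Engineer',
    'EmployeeWithDraftResignationNotice Evans Editor',
    'EmployeeWithDraftResignationConfirmation Franco Secretary',
    'EmployeeWithAddendumDraft Hernandez Support Engineer',
]

# Sorting removes the dependence on insertion order.
same_employees = sorted(first_run) == sorted(after_update)
print(same_employees)  # True
```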