# Real Entities

## Problem Statement

In a given database many occurrences of the same entity can be found.

Because of homonyms and name variations, it is difficult to identify people during data entry,
especially when collecting information from various sources, such as parish records.

`timelink` introduces the concept of `real entity` to solve this problem.

A `real entity` represents a real person, object or event, that can be represented by one or more occurrences in the database.

A special table, called `links`, is used to link all occurrences of the same `real entity`.

Real entities are considered "interpretations" of the data and can be revised at any time by changing the
`links` table and the occurrences associated with it.

`timelink` also allows different users to create their own entries in the `links` table, so that each user can have their own interpretation of the data and their own set of real entities.
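The idea of per-user links can be sketched with a small in-memory SQLite database. This is only an illustration: the table layout, the column names (`rid`, `occurrence_id`, `username`), the user names, and the sample rows are all hypothetical and are not the actual `timelink` schema.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the real timelink tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE occurrences (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE links (
        rid TEXT,            -- identifier of the real entity
        occurrence_id TEXT,  -- occurrence attached to that real entity
        username TEXT        -- user who made this identification
    );
""")

# Made-up occurrences, including one name variation.
conn.executemany("INSERT INTO occurrences VALUES (?, ?)", [
    ("occ-1", "Matteo Ricci"),
    ("occ-2", "Matteo Ricci"),
    ("occ-3", "Matheus Riccius"),
])

# Two users interpret the same occurrences differently:
# only "bob" accepts the name variation as the same person.
conn.executemany("INSERT INTO links VALUES (?, ?, ?)", [
    ("rp-ricci", "occ-1", "alice"),
    ("rp-ricci", "occ-2", "alice"),
    ("rp-ricci", "occ-1", "bob"),
    ("rp-ricci", "occ-2", "bob"),
    ("rp-ricci", "occ-3", "bob"),
])


def occurrences_of(rid, username):
    """Return the occurrence ids one user has linked to a real entity."""
    rows = conn.execute(
        "SELECT occurrence_id FROM links "
        "WHERE rid = ? AND username = ? ORDER BY occurrence_id",
        (rid, username),
    ).fetchall()
    return [row[0] for row in rows]


alice_view = occurrences_of("rp-ricci", "alice")
bob_view = occurrences_of("rp-ricci", "bob")
```

Because an identification lives only in the `links` rows, revising it means inserting or deleting rows there; the occurrences themselves are never modified.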

This notebook shows how to manage real entities in `timelink`.

## Create a database

In [1]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook(db_name='rentities_tutorial')
tlnb.db.drop_db()
tlnb.db.create_db()
kleio_home = tlnb.kleio_server.get_kleio_home()
kleio_token = tlnb.kleio_server.get_token()
kleio_url = tlnb.kleio_server.get_url()
tlnb.print_info(show_token=True, show_password=True)

INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
INFO  [alembic.runtime.migration] Will assume non-transactional DDL.
INFO  [alembic.runtime.migration] Running stamp_revision  -> 48dd68d06c60


Timelink version: 1.1.15
Project name: tutorial
Project home: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial
Database type: sqlite
Database name: rentities_tutorial
Kleio image: timelinkserver/kleio-server
Kleio server token: F4ZqXQsCbs8UBlMAmQTJ5cfjmnIXBuCE
Kleio server URL: http://127.0.0.1:8089
Kleio server home: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial
Kleio server container: loving_zhukovsky
Kleio version requested: latest
Kleio server version: 12.6.577 (2024-10-24 16:53:53)
SQLite directory: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial/database/sqlite
TimelinkNotebook(project_name=tutorial, project_home=/Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial, db_type=sqlite, db_name=rentities_tutorial, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


Write the Kleio server settings to the VS Code workspace settings file.

In [None]:
import json
import os

# Path to the workspace settings file
workspace_path = os.path.abspath("../../../../../")
settings_path = os.path.join(workspace_path, '.vscode', 'settings.json')

# Ensure the .vscode directory exists
os.makedirs(os.path.dirname(settings_path), exist_ok=True)

# Read the existing settings
if os.path.exists(settings_path):
    with open(settings_path, 'r') as file:
        settings = json.load(file)
else:
    settings = {}

# Update the settings with the Kleio server connection details
settings['timelink.kleio.kleioServerHome'] = kleio_home
settings['timelink.kleio.kleioServerToken'] = kleio_token
settings['timelink.kleio.kleioServerUrl'] = kleio_url

# Write the updated settings back to the file
with open(settings_path, 'w') as file:
    json.dump(settings, file, indent=4)

print("Workspace settings updated successfully.")

Workspace settings updated successfully.


## Check the kleio files

In [None]:
kleio_files = tlnb.get_kleio_files(path="kleio/real-entities")
kleio_files

| | path | name | modified | status | translated | errors | warnings | import_status | import_errors | import_warnings | import_error_rpt | import_warning_rpt | imported | rpt_url | xml_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kleio/real-entities/real-entities.cli | real-entities.cli | 2024-10-16 04:21:34.075653+00:00 | V | 2024-10-16 04:21:00+00:00 | 0 | 0 | N | 0 | 0 | | | | /rest/reports/kleio/real-entities/real-entitie... | /rest/exports/kleio/real-entities/real-entitie... |


## Import data

In [None]:
tlnb.update_from_sources(path="kleio/real-entities")

## Check the database

In [None]:
tlnb.table_row_count_df()

| | table | count |
|---|---|---|
| 0 | acts | 1 |
| 1 | aregisters | 0 |
| 2 | attributes | 413 |
| 3 | class_attributes | 63 |
| 4 | classes | 13 |
| 5 | entities | 524 |
| 6 | geoentities | 0 |
| 7 | goods | 0 |
| 8 | kleiofiles | 1 |
| 9 | links | 0 |


## Create a new real entity

### Search for occurrences of "Matteo Ricci"

In [None]:
from timelink.api.models import Person

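with tlnb.db.session() as session:

    session.expire_on_commit = False

    # "%" in a LIKE pattern matches any run of characters, so this finds
    # names that start with "Mat" and end with "Ricci"
    occurrences = [
        (person_id, name)  # avoid shadowing the built-in id()
        for (person_id, name) in session.query(Person.id, Person.name)
        .filter(Person.name.like("Mat%Ricci"))
        .all()
    ]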
occurrences

[('deh-matteo-ricci', 'Matteo Ricci'),
 ('deh-joao-barradas-ref1', 'Matteo Ricci'),
 ('deh-lazzaro-cattaneo-ref1', 'Matteo Ricci'),
 ('deh-giovanni-cola-niccolo-ref1', 'Matteo Ricci'),
 ('deh-sabatino-de-ursis-ref1', 'Matteo Ricci'),
 ('deh-sebastien-fernandes-tchong-ref2', 'Matteo Ricci'),
 ('deh-jean-fernandes-tchong-ref1', 'Matteo Ricci'),
 ('deh-bento-de-gois-ref2', 'Matteo Ricci'),
 ('deh-manuel-pereira-yeou-ref1', 'Matteo Ricci'),
 ('deh-michele-ruggiere-ref4', 'Matteo Ricci')]
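
The `like("Mat%Ricci")` filter is translated to a SQL `LIKE` pattern, in which `%` stands for any sequence of characters. The sketch below shows the pattern's behaviour on made-up names in a throwaway SQLite table; the `persons` table and its rows are hypothetical and unrelated to the imported data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT, name TEXT)")

# Made-up names, chosen only to exercise the pattern.
conn.executemany("INSERT INTO persons VALUES (?, ?)", [
    ("p1", "Matteo Ricci"),      # starts with "Mat", ends with "Ricci"
    ("p2", "Matheus Ricci"),     # spelling variation, still matches
    ("p3", "Mateus Riccio"),     # ends with "Riccio", so no match
    ("p4", "Michele Ruggieri"),  # different person, no match
])

matches = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM persons WHERE name LIKE ? ORDER BY id",
        ("Mat%Ricci",),
    )
]
```

A pattern like this tolerates variation in the middle of the name but not at its end, so more than one pattern may be needed to collect every spelling. Whether `LIKE` is case sensitive depends on the database backend.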

### Create the real entity

In [None]:
from timelink.api.models.rentity import REntity, REntityStatus

real_matteo = REntity(id='rp-matteo-ricci', description="Matteo Ricci", type="Person", status=REntityStatus.MANUAL)

TypeError: 'type' is an invalid keyword argument for REntity
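
The wording of this error matches the one raised by SQLAlchemy's default declarative constructor, which accepts only keyword arguments that name attributes of the mapped class. The standard-library sketch below mimics that check so the failure can be reproduced without a database. `FakeREntity`, its attribute names, and the helper `accepted_and_rejected` are hypothetical stand-ins, not the `timelink` `REntity` model, and it is an assumption that `REntity` relies on the default constructor.

```python
class FakeREntity:
    """Hypothetical stand-in; NOT the timelink REntity class."""

    # Assumed attribute names, for illustration only.
    id = None
    description = None
    status = None

    def __init__(self, **kwargs):
        # Mimic the check: every keyword must name a class attribute.
        for key, value in kwargs.items():
            if not hasattr(type(self), key):
                raise TypeError(
                    f"{key!r} is an invalid keyword argument "
                    f"for {type(self).__name__}"
                )
            setattr(self, key, value)


def accepted_and_rejected(cls, kwargs):
    """Split keyword arguments into those cls accepts and those it rejects."""
    accepted = {k: v for k, v in kwargs.items() if hasattr(cls, k)}
    rejected = sorted(set(kwargs) - set(accepted))
    return accepted, rejected


args = {"id": "rp-matteo-ricci", "description": "Matteo Ricci", "type": "Person"}

try:
    FakeREntity(**args)
    message = None
except TypeError as exc:
    message = str(exc)

accepted, rejected = accepted_and_rejected(FakeREntity, args)
entity = FakeREntity(**accepted)  # succeeds once "type" is left out
```

For a real mapped class, the accepted names can be listed with `sqlalchemy.inspect(REntity).attrs.keys()`, which shows which keywords the constructor will take.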