# Using Timelink in notebooks: tutorial

> First time use: follow instructions in the [README.md](README.md) file in this directory.

## Initial setup

Create a TimelinkNotebook object. This will be the interface for most of your interactions with Timelink.

The first time you run this notebook, you will be prompted to install `ipykernel`, which is required to run Timelink in a notebook.

Note:
* The first run takes a little time because the required Docker images are downloaded.
* Timelink defaults to SQLite as the database; see the [Receipts notebook](1-receipts.ipynb) for more control.


In [1]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook()


TimelinkNotebook created a database to store data and a Kleio server to 
translate transcriptions in Kleio notation into data that can be imported
into a database.

To get information about the database and the Kleio server, run ```tlnb.print_info()```

In [2]:
tlnb.print_info()

Project name: tutorial
Project home: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial
Database type: sqlite
Database name: tutorial
Kleio image: timelinkserver/kleio-server
Kleio version: latest
Kleio server token: yNEupdoWvbKkAfALXitPVuC0D9PXUkPa
Kleio server URL: http://127.0.0.1:8089
Kleio server home: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial
SQLite directory: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial/database/sqlite
TimelinkNotebook(project_name=tutorial, project_home=/Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial, db_type=sqlite, db_name=tutorial, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Database status

Count the number of rows in each table in the database.


In [3]:
tlnb.table_row_count_df()

| | table | count |
|---|---|---|
| 0 | acts | 3 |
| 1 | attributes | 14680 |
| 2 | class_attributes | 63 |
| 3 | classes | 13 |
| 4 | entities | 17182 |
| 5 | geoentities | 215 |
| 6 | goods | 0 |
| 7 | kleiofiles | 3 |
| 8 | objects | 0 |
| 9 | persons | 873 |
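
As a rough illustration of what such a row count involves, the sketch below counts the rows of every table in a throwaway in-memory SQLite database, using only the Python standard library. It is not how `table_row_count_df()` is implemented; the `count_rows` helper and the miniature schema are assumptions made for the example.

```python
import sqlite3


def count_rows(conn: sqlite3.Connection) -> dict:
    """Return {table_name: row_count} for every table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    # Table names come from the database catalogue, so quoting them is enough.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in tables
    }


# Hypothetical miniature database, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE goods (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("p1", "Ana"), ("p2", "Rui")],
)

row_counts = count_rows(conn)
print(row_counts)
```

An empty table such as `goods` still appears in the result with a count of zero, as `goods` and `objects` do in the output above.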


### Show the available Kleio files

In [5]:
kleio_files = tlnb.get_kleio_files()
kleio_files

| | path | name | modified | status | translated | errors | warnings | import_status | import_errors | import_warnings | import_error_rpt | import_warning_rpt | imported | rpt_url | xml_url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kleio/auc-alunos.cli | auc-alunos.cli | 2024-02-04 08:38:20.975128+00:00 | V | 2024-02-04 08:38:00+00:00 | 0 | 0 | I | 0 | 0 | No errors | No warnings | 2024-02-04 08:38:54.291236 | /rest/reports/kleio/auc-alunos.rpt | /rest/exports/kleio/auc-alunos.xml |
| 1 | kleio/dehergne-a.cli | dehergne-a.cli | 2024-02-04 14:08:29.476407+00:00 | V | 2024-02-04 14:08:00+00:00 | 0 | 0 | I | 0 | 0 | No errors | No warnings | 2024-02-04 14:08:36.475775 | /rest/reports/kleio/dehergne-a.rpt | /rest/exports/kleio/dehergne-a.xml |
| 2 | kleio/dehergne-locations-1644.cli | dehergne-locations-1644.cli | 2024-02-04 08:40:21.505312+00:00 | V | 2024-02-04 08:40:00+00:00 | 0 | 0 | I | 0 | 0 | No errors | No warnings | 2024-02-04 08:40:24.149131 | /rest/reports/kleio/dehergne-locations-1644.rpt | /rest/exports/kleio/dehergne-locations-1644.xml |



# Update the database from Kleio sources.

This updates the translations of the source files and imports into the database those that were translated without errors.

Only new files, or files changed since the last import, are processed.

In [6]:
import logging
logging.basicConfig(level=logging.INFO)

tlnb.update_from_sources()
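
The "only new or changed files" rule can be pictured as a timestamp comparison. The sketch below is an assumption about the general idea, not Timelink's actual logic; the `needs_processing` helper, the file names and the timestamps are invented for the example.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_processing(modified: datetime, imported: Optional[datetime]) -> bool:
    """A file is processed when it was never imported or changed afterwards."""
    return imported is None or modified > imported


utc = timezone.utc
files = {
    # name: (last modification, last import or None if never imported)
    "unchanged.cli": (datetime(2024, 2, 4, 8, 38, tzinfo=utc),
                      datetime(2024, 2, 4, 8, 39, tzinfo=utc)),
    "edited.cli": (datetime(2024, 2, 5, 10, 0, tzinfo=utc),
                   datetime(2024, 2, 4, 8, 40, tzinfo=utc)),
    "new.cli": (datetime(2024, 2, 5, 11, 0, tzinfo=utc), None),
}

to_process = sorted(
    name for name, (modified, imported) in files.items()
    if needs_processing(modified, imported)
)
print(to_process)
```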

## Check the status of the files

Check the import status of the translated files. The status codes are:

* I: Imported
* E: Imported with errors
* W: Imported with warnings, no errors
* N: Not imported
* U: Translation updated, needs to be reimported
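
When scanning a long file list it can help to keep the status codes in a lookup table. The sketch below is only a convenience written for this tutorial: the `IMPORT_STATUS` dictionary, the `needs_action` helper and the choice of which codes call for action are assumptions, not part of the Timelink API.

```python
# Status codes as described in the list above.
IMPORT_STATUS = {
    "I": "Imported",
    "E": "Imported with errors",
    "W": "Imported with warnings, no errors",
    "N": "Not imported",
    "U": "Translation updated, needs to be reimported",
}

# Assumption: these are the codes that call for some follow-up.
ACTION_NEEDED = {"E", "N", "U"}


def needs_action(status: str) -> bool:
    """True when the import status suggests the file should be looked at."""
    return status in ACTION_NEEDED


# Hypothetical file list: (file name, import status code).
files = [("a.cli", "I"), ("b.cli", "E"), ("c.cli", "U"), ("d.cli", "W")]
pending = [name for name, status in files if needs_action(status)]

for name, status in files:
    if needs_action(status):
        print(f"{name}: {IMPORT_STATUS[status]}")
```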

In [7]:
imported_files_df = tlnb.get_import_status()
imported_files_df[["import_status","import_errors","import_warnings","name","errors","warnings","path"]].sort_values("name")


| | import_status | import_errors | import_warnings | name | errors | warnings | path |
|---|---|---|---|---|---|---|---|
| 0 | I | 0 | 0 | auc-alunos.cli | 0 | 0 | kleio/auc-alunos.cli |
| 1 | I | 0 | 0 | dehergne-a.cli | 0 | 0 | kleio/dehergne-a.cli |
| 2 | I | 0 | 0 | dehergne-locations-1644.cli | 0 | 0 | kleio/dehergne-locations-1644.cli |


### Check the translation report

In [9]:
rpt = tlnb.get_translation_report(imported_files_df, 1)
print(rpt)


KleioTranslator - server version 12.2 - build 560 2024-01-08 08:18:15
4-2-2024 14-8

Processing data file dehergne-a.cli
-------------------------------------------
Generic Act translation module with geoentities (XML).
     Joaquim Ramos de Carvalho (joaquim@uc.pt) 
** New document: kleio
kleio translation started
Structure: gacto2.str
Prefix: 
Autorel: 
Translation count: 178
Obs: 
** Processing source fonte$dehergne-a
7: lista$dehergne-notices-a
*** End of File

Line 541 "SAME AS" TO EXTERNAL REFERENCE EXPORTED (deh-belchior-miguel-carneiro-leitao) CHECK IF IT EXISTS BEFORE IMPORTING THIS FILE.
Line 814 "SAME AS" TO EXTERNAL REFERENCE EXPORTED (deh-jean-regis-lieou) CHECK IF IT EXISTS BEFORE IMPORTING THIS FILE.
Line 876 "SAME AS" TO EXTERNAL REFERENCE EXPORTED (deh-alessandro-cicero) CHECK IF IT EXISTS BEFORE IMPORTING THIS FILE.

Structure file: /usr/local/timelink/clio/src/gacto2.str
Structure processing report: /usr/local/timelink/clio/src/gacto2.srpt
Structure in JSON: /usr/loc
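
The `"SAME AS"` lines in the report above name external identifiers that should exist before the file is imported. As a small sketch, the line numbers and identifiers can be pulled out of the report text with a regular expression. The pattern is derived only from the three lines shown here and may need adjusting for other report formats.

```python
import re

# Three lines copied from the translation report shown above.
report = '''\
Line 541 "SAME AS" TO EXTERNAL REFERENCE EXPORTED (deh-belchior-miguel-carneiro-leitao) CHECK IF IT EXISTS BEFORE IMPORTING THIS FILE.
Line 814 "SAME AS" TO EXTERNAL REFERENCE EXPORTED (deh-jean-regis-lieou) CHECK IF IT EXISTS BEFORE IMPORTING THIS FILE.
Line 876 "SAME AS" TO EXTERNAL REFERENCE EXPORTED (deh-alessandro-cicero) CHECK IF IT EXISTS BEFORE IMPORTING THIS FILE.
'''

pattern = re.compile(
    r'^Line (\d+) "SAME AS" TO EXTERNAL REFERENCE EXPORTED \(([^)]+)\)',
    re.MULTILINE,
)

# List of (source line number, external identifier) pairs.
external_refs = [(int(line), ref) for line, ref in pattern.findall(report)]
for line, ref in external_refs:
    print(line, ref)
```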

### Get the import report for a file

In case there are errors in the import phase check the import report.

In [8]:
rpt = tlnb.get_import_rpt(imported_files_df, rows=[0])
print(rpt)

No errors


## Todo

This as data frame in a single function
* TimelinkNotebook.translate([files_df,paths=List[str], rows=List[int],status="T"])
* TimelinkNotebook.import([files_df,paths=List[str], rows=List[int],status="V"])



# Getting data

### Search for people, show in Kleio notation

In [9]:
from timelink.api.models import Person

show_only=10

with tlnb.db.session() as session:
    persons = session.query(Person).filter(Person.name.like('%Abegão')).all()

    for person in persons[:show_only]:
        print(person.to_kleio())
        print()

n$Estevão de Matos Abegão/m/id=140338/obs="""
      """

                  Id: 140338
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000002

                  Nome        : Estevão de Matos Abegão
                  Data inicial: 1728-10-01
                  Data final  : 1733-07-03
                  Filiação: Lourenço de Matos Pereira
                  Naturalidade: Elvas
                  Faculdade: Cânones

                  Matrícula(s): 01.10.1729
                  01.10.1730
                  01.10.1732

                  Instituta: 01.10.1728

                  Bacharel: 03.07.1733
              """
  """
  rel$function-in-act/n/auc-alumni-A-140337-140771/20200211
  atr$código-de-referência/""PT/AUC/ELU/UC-AUC/B/001-001/A/000002""/2021-05-17
  atr$data-do-registo/2021-05-17/2021-05-17
  atr$url/""https://pesquisa.auc.uc.pt/details?id=140338""/2021-05-17
  ls$uc.entrada/1728-10-01/1728-10-01
  ls$uc.saida/1733-07-03/1733-07-03
  ls$uc.entrada.ano/1728/1728-1

In [42]:
from sqlalchemy import select, and_
from sqlalchemy.orm import joinedload
from timelink.api.models import Person, Attribute

show_only=10

with tlnb.db.session() as session:
    stmt = select(Person).join(Person.attributes.and_(Attribute.the_type == 'naturalidade',
                                                      Attribute.the_value.like('Lisboa')))
    print(stmt)
    persons = session.execute(stmt).scalars().all()

    for person in persons[:show_only]:
        p = tlnb.db.get_person(person.id)
        print(p.to_kleio())


SELECT persons.id, entities.id AS id_1, entities.class, entities.inside, entities.the_order, entities.the_level, entities.the_line, entities.groupname, entities.updated, entities.indexed, persons.name, persons.sex, persons.obs 
FROM entities JOIN persons ON entities.id = persons.id JOIN (entities AS entities_1 JOIN attributes AS attributes_1 ON attributes_1.id = entities_1.id) ON entities.id = attributes_1.entity AND attributes_1.the_type = :the_type_1 AND attributes_1.the_value LIKE :the_value_1
n$Diogo Pereira de Melo e Aboim/m/id=140354/obs="""
      """

                  Id: 140354
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000020

                  Nome        : Diogo Pereira de Melo e Aboim
                  Data inicial: 1768-10-01
                  Data final  : 1768-10-01
                  Filiação:

                  Naturalidade:  Lisboa
                  Faculdade: Cânones

                  Matrícula(s): 01.10.1768
                  Instituta:
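
The generated SQL above joins persons to their attributes and filters on the attribute type and value. The sketch below reproduces that idea with the standard `sqlite3` module on a deliberately simplified, hypothetical schema (the real Timelink tables also involve `entities`). It also shows that `LIKE 'Lisboa'`, having no wildcard, matches only the whole value, whereas `LIKE 'Lisboa%'` matches any value starting with it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical schema and data for the illustration.
conn.executescript(
    """
    CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE attributes (
        id TEXT PRIMARY KEY, entity TEXT, the_type TEXT, the_value TEXT
    );
    INSERT INTO persons VALUES ('p1', 'Ana'), ('p2', 'Rui'), ('p3', 'Eva');
    INSERT INTO attributes VALUES
        ('a1', 'p1', 'naturalidade', 'Lisboa'),
        ('a2', 'p2', 'naturalidade', 'Coimbra'),
        ('a3', 'p3', 'naturalidade', 'Lisboa Ocidental'),
        ('a4', 'p3', 'faculdade', 'Lisboa');
    """
)

query = """
    SELECT persons.id, persons.name
    FROM persons
    JOIN attributes ON attributes.entity = persons.id
    WHERE attributes.the_type = :the_type
      AND attributes.the_value LIKE :the_value
    ORDER BY persons.id
"""

# No wildcard: only the whole value 'Lisboa' matches.
exact = conn.execute(
    query, {"the_type": "naturalidade", "the_value": "Lisboa"}
).fetchall()
# Trailing wildcard: any value starting with 'Lisboa' matches.
prefix = conn.execute(
    query, {"the_type": "naturalidade", "the_value": "Lisboa%"}
).fetchall()
print(exact)
print(prefix)
```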
 





### Get a DataFrame from attributes

#### Example: faculty, entry date, exit date and degree of people born in Coimbra

In [16]:
from timelink.pandas import entities_with_attribute


# Get the list of people with a certain value in a specific attribute
df = entities_with_attribute(
    entity_type='person',
    the_type='naturalidade',
    the_value='Coimbra',
    more_info=['name'],
    more_cols=['faculdade', 'uc.entrada', 'uc.saida'],
    db=tlnb.db,
    sql_echo=False,
)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 140349 to 140367
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               14 non-null     object
 1   naturalidade       14 non-null     object
 2   naturalidade.date  14 non-null     object
 3   naturalidade.obs   0 non-null      object
 4   faculdade          14 non-null     object
 5   faculdade.date     14 non-null     object
 6   faculdade.obs      1 non-null      object
 7   uc.entrada         14 non-null     object
 8   uc.entrada.date    14 non-null     object
 9   uc.entrada.obs     0 non-null      object
 10  uc.saida           14 non-null     object
 11  uc.saida.date      14 non-null     object
 12  uc.saida.obs       0 non-null      object
dtypes: object(13)
memory usage: 2.1+ KB


In [17]:
df.head(5)

| id | name | naturalidade | naturalidade.date | naturalidade.obs | faculdade | faculdade.date | faculdade.obs | uc.entrada | uc.entrada.date | uc.entrada.obs | uc.saida | uc.saida.date | uc.saida.obs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 140349 | António de Aboim | Coimbra | 1566-12-20 | | Cânones | 1566-12-20 | | 1566-12-20 | 1566-12-20 | | 1574-07-24 | 1574-07-24 | |
| 140671 | Diogo de Abreu | Coimbra | 1604-10-11 | | Cânones | 1604-10-11 | | 1604-10-11 | 1604-10-11 | | 1604-10-11 | 1604-10-11 | |
| 140643 | Bernardo de Abreu | Coimbra | 1612-10-15 | | Cânones | 1612-10-15 | | 1612-10-15 | 1612-10-15 | | 1616-10-19 | 1616-10-19 | |
| 140629 | Bartolomeu de Abreu | Coimbra | 1627-10-19 | | Cânones | 1627-10-19 | | 1627-10-19 | 1627-10-19 | | 1638-10-11 | 1638-10-11 | |
| 140662 | Cristóvão de Abreu | Coimbra | 1660-01-08 | | Cânones | 1660-01-08 | | 1660-01-08 | 1660-01-08 | | 1662-10-22 | 1662-10-22 | |
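
The resulting frame has one row per person and, for each requested attribute, three columns: the value, its `.date` and its `.obs`. The sketch below shows that long-to-wide reshaping in plain Python. It is not the implementation of `entities_with_attribute`; the `to_wide` helper is hypothetical, and the sample values are taken from the first rows of the table above.

```python
# Long format: one tuple per attribute (entity id, type, value, date, obs).
attributes = [
    ("140349", "naturalidade", "Coimbra", "1566-12-20", None),
    ("140349", "faculdade", "Cânones", "1566-12-20", None),
    ("140671", "naturalidade", "Coimbra", "1604-10-11", None),
]


def to_wide(rows, columns):
    """Pivot long attribute rows into {entity_id: {column: value}}."""
    wide = {}
    for entity_id, the_type, value, date, obs in rows:
        if the_type not in columns:
            continue
        record = wide.setdefault(entity_id, {})
        record[the_type] = value
        record[f"{the_type}.date"] = date
        record[f"{the_type}.obs"] = obs
    # Fill attributes a person lacks with None so every row has the same keys.
    for record in wide.values():
        for col in columns:
            for key in (col, f"{col}.date", f"{col}.obs"):
                record.setdefault(key, None)
    return wide


wide = to_wide(attributes, ["naturalidade", "faculdade"])
print(wide["140671"])
```

Note that this simple version keeps only the last value when a person has several attributes of the same type.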




### Remove empty columns

In [18]:
df.dropna(how='all', axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 140349 to 140367
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               14 non-null     object
 1   naturalidade       14 non-null     object
 2   naturalidade.date  14 non-null     object
 3   faculdade          14 non-null     object
 4   faculdade.date     14 non-null     object
 5   faculdade.obs      1 non-null      object
 6   uc.entrada         14 non-null     object
 7   uc.entrada.date    14 non-null     object
 8   uc.saida           14 non-null     object
 9   uc.saida.date      14 non-null     object
dtypes: object(10)
memory usage: 1.7+ KB


In [19]:
df.head(5)

| id | name | naturalidade | naturalidade.date | faculdade | faculdade.date | faculdade.obs | uc.entrada | uc.entrada.date | uc.saida | uc.saida.date |
|---|---|---|---|---|---|---|---|---|---|---|
| 140349 | António de Aboim | Coimbra | 1566-12-20 | Cânones | 1566-12-20 | | 1566-12-20 | 1566-12-20 | 1574-07-24 | 1574-07-24 |
| 140671 | Diogo de Abreu | Coimbra | 1604-10-11 | Cânones | 1604-10-11 | | 1604-10-11 | 1604-10-11 | 1604-10-11 | 1604-10-11 |
| 140643 | Bernardo de Abreu | Coimbra | 1612-10-15 | Cânones | 1612-10-15 | | 1612-10-15 | 1612-10-15 | 1616-10-19 | 1616-10-19 |
| 140629 | Bartolomeu de Abreu | Coimbra | 1627-10-19 | Cânones | 1627-10-19 | | 1627-10-19 | 1627-10-19 | 1638-10-11 | 1638-10-11 |
| 140662 | Cristóvão de Abreu | Coimbra | 1660-01-08 | Cânones | 1660-01-08 | | 1660-01-08 | 1660-01-08 | 1662-10-22 | 1662-10-22 |



## Counting

### Count attributes from an existing dataframe



In [20]:
# Create a column with the index values, which are the id numbers
df['id'] = df.index.values

col = 'faculdade'  # subtotal by this column

# Use pandas groupby and count the unique ids in each group
df_totals = df.groupby(col).agg({'id': 'nunique',
                                 'uc.entrada': 'min',
                                 'uc.saida': 'max'})

df_totals.sort_values('id', ascending=False).head(30)

| faculdade | id | uc.entrada | uc.saida |
|---|---|---|---|
| Cânones | 7 | 0000-00-00 | 1767-07-27 |
| Medicina | 5 | 1670-10-01 | 1823-10-20 |
| Direito | 1 | 1748-10-19 | 1748-10-19 |
| Matemática | 1 | 1868-10-02 | 1872-10-03 |
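
The minimum entry date for Cânones in the output above is `0000-00-00`, which suggests that at least one record carries a placeholder for an unknown date. Because the dates are compared as strings, such a placeholder always sorts first. The sketch below reproduces the grouping in plain Python and skips placeholders when taking the minimum. The sample rows, and the decision to ignore `0000-00-00`, are assumptions made for illustration.

```python
from collections import defaultdict

UNKNOWN = "0000-00-00"  # assumed placeholder for a missing date

# Hypothetical rows: (person id, faculdade, uc.entrada)
rows = [
    ("1", "Cânones", "0000-00-00"),
    ("2", "Cânones", "1566-12-20"),
    ("3", "Cânones", "1604-10-11"),
    ("4", "Medicina", "1670-10-01"),
]

# Group (id, date) pairs by faculty.
groups = defaultdict(list)
for person_id, faculdade, entrada in rows:
    groups[faculdade].append((person_id, entrada))

totals = {}
for faculdade, members in groups.items():
    known = [date for _, date in members if date != UNKNOWN]
    totals[faculdade] = {
        "id": len({pid for pid, _ in members}),       # unique persons
        "uc.entrada": min(known) if known else None,  # earliest known date
    }
print(totals)
```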



### Counting directly in the database

When an attribute has many values and there is no need to hold all the people in memory, count directly in the database.




In [25]:
from timelink.pandas import attribute_values

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)


In [26]:
df_totals.head(10)


| value | count | date_min | date_max |
|---|---|---|---|
| ? | 31 | 15550000 | 17581126 |
| Coimbra | 6 | 15420000 | 16770701 |
| Goa | 3 | 15480502 | 16660000 |
| Lisboa | 3 | 16680000 | 17460223 |
| Avignon | 2 | 17350731 | 17370927 |
| Paris | 2 | 16700926 | 16771029 |
| Saragoça | 2 | 16740613 | 16771115 |
| Évora | 2 | 15710304 | 17230000 |
| Alcalá de Henares | 1 | 15580413 | 15580413 |
| Bolonha | 1 | 17250728 | 17250728 |
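
Counting in the database amounts to a `GROUP BY` query, so only the totals travel back to the notebook. The sketch below shows the idea with the standard `sqlite3` module on a simplified, hypothetical `attributes` table. Whether `attribute_values` counts rows or distinct entities is not stated here, so the use of `COUNT(*)` is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical table and rows for the illustration.
conn.execute(
    "CREATE TABLE attributes "
    "(entity TEXT, the_type TEXT, the_value TEXT, the_date TEXT)"
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [
        ("p1", "jesuita-entrada", "Coimbra", "15420000"),
        ("p2", "jesuita-entrada", "Coimbra", "16770701"),
        ("p3", "jesuita-entrada", "Goa", "15480502"),
        ("p3", "naturalidade", "Lisboa", "15480502"),
    ],
)

# One output row per distinct value, with its frequency and date span.
rows = conn.execute(
    """
    SELECT the_value, COUNT(*) AS n, MIN(the_date), MAX(the_date)
    FROM attributes
    WHERE the_type = ?
    GROUP BY the_value
    ORDER BY n DESC, the_value
    """,
    ("jesuita-entrada",),
).fetchall()
for row in rows:
    print(row)
```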


#### Filter by dates



In [29]:
df_totals = attribute_values('grau',dates_between=('1535','1600'),db=tlnb.db)

In [30]:
df_totals.head(10)

| value | count | date_min | date_max |
|---|---|---|---|
| Bacharel | 9 | 1554-07-19 | 1593-06-09 |
| Formatura | 6 | 1574-07-24 | 1593-05-12 |
| Bacharel em Artes | 3 | 1574-03-14 | 1588-03-12 |
| Licenciado | 3 | 1574-06-03 | 1597-05-06 |
| Doutor | 2 | 1560-12-22 | 1589-10-27 |
| Licenciado em Artes | 2 | 1574-05-15 | 1589-01-13 |
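
The bounds in `dates_between=('1535','1600')` are given as year strings. How Timelink compares them with full dates is not documented here, so the sketch below only illustrates a pitfall worth keeping in mind: comparing full ISO date strings against a bare year excludes every date within the final year, while comparing on the year part includes them. The `year_in_range` helper and the sample dates are hypothetical.

```python
def year_in_range(date: str, first_year: str, last_year: str) -> bool:
    """True when the year part of an ISO date lies within the bounds, inclusive."""
    return first_year <= date[:4] <= last_year


dates = ["1534-12-31", "1554-07-19", "1597-05-06", "1600-03-01", "1601-01-01"]

# Comparing on the year part keeps dates that fall within the last year.
by_year = [d for d in dates if year_in_range(d, "1535", "1600")]

# Comparing whole strings drops them, because "1600-03-01" > "1600".
by_string = [d for d in dates if "1535" <= d <= "1600"]

print(by_year)
print(by_string)
```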


## View records

### View a person

#### Person attributes in a dataframe, one row per attribute

In [40]:
import pandas as pd
from timelink.pandas import group_attributes as person_attributes

pd.set_option('display.max_rows',1000)

id = '140687'
pdf = person_attributes([id],db=tlnb.db)  # note id in a list
pdf[['date','type','value','attr_obs']].sort_values(['date','type'])

| id | date | type | value | attr_obs |
|---|---|---|---|---|
| 140687 | 1821-10-31 | faculdade | "#29" | |
| 140687 | 1821-10-31 | faculdade.ano | "#29.1821" | |
| 140687 | 1821-10-31 | matricula | Filosofia | |
| 140687 | 1821-10-31 | matricula.ano | Filosofia.1821 | |
| 140687 | 1821-10-31 | naturalidade | Vilar de Veiga | |
| 140687 | 1821-10-31 | naturalidade.ano | Vilar de Veiga.1821 | |
| 140687 | 1821-10-31 | nome | Domingos José da Cunha Abreu | |
| 140687 | 1821-10-31 | nome-geografico | Vilar de Veiga | |
| 140687 | 1821-10-31 | nome-geografico.ano | Vilar de Veiga.1821 | |
| 140687 | 1821-10-31 | nome.apelido | José da Cunha Abreu | |


In [41]:
p = tlnb.db.get_person(id)
print(p.to_kleio())

n$Domingos José da Cunha Abreu/m/id=140687/obs="""
      """

                  Id: 140687
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000404

                  Nome        : Domingos José da Cunha Abreu
                  Data inicial: 1821-10-31
                  Data final  : 1821-11-05
                  Filiação:

                  Naturalidade: Vilar de Veiga
                  Faculdade: #29;Matemática

                  Matrícula(s): Filosofia: 1821/10/31 (obrigado)
                  Matemática: 1821/11/05 (obrigado)
                  Instituta:
                  Bacharel:
                  Formatura:
                  Licenciado:
                  Doutor:

                  Outras informações:
              """
  """
  rel$function-in-act/n/auc-alumni-A-140337-140771/20200211
  atr$código-de-referência/""PT/AUC/ELU/UC-AUC/B/001-001/A/000404""/2021-04-23
  atr$data-do-registo/2021-04-23/2021-04-23
  atr$url/""https://pesquisa.auc.uc.pt/details?id=140687""

#### Person attributes in a dataframe, attributes in columns

In [39]:
# Get the list of people that have a specific attribute

id = '316297'  # João Pedro Ribeiro
# id = '234295'  # Alexandre Metelo de Sousa
df = entities_with_attribute(
    entity_type='person',
    the_type='uc.entrada',  # we need a base attribute
    more_info=['name'],
    more_cols=['instituta', 'faculdade'],
    db=tlnb.db,
    sql_echo=False,
)
view_cols = ['name', 'faculdade', 'faculdade.date', 'faculdade.obs']
df.info()
df[view_cols].sort_values('faculdade.date')

<class 'pandas.core.frame.DataFrame'>
Index: 503 entries, 140337 to 269364
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   name             503 non-null    object
 1   uc.entrada       503 non-null    object
 2   uc.entrada.date  503 non-null    object
 3   uc.entrada.obs   0 non-null      object
 4   instituta        205 non-null    object
 5   instituta.date   205 non-null    object
 6   instituta.obs    205 non-null    object
 7   faculdade        447 non-null    object
 8   faculdade.date   447 non-null    object
 9   faculdade.obs    29 non-null     object
dtypes: object(10)
memory usage: 43.2+ KB


| id | name | faculdade | faculdade.date | faculdade.obs |
|---|---|---|---|---|
| 140345 | Jorge Abenajado | Medicina | 1537-12-17 | |
| 140669 | Diogo de Abreu | Leis | 1538-05-07 | |
| 140347 | Adrião de Aboim | Leis | 1539-12-07 | |
| 140708 | Filipe de Abreu | Cânones | 1548-06-23 | |
| 140541 | António de Abreu | Cânones | 1551-10-00 | |
| 140358 | João de Aboim | Cânones | 1551-10-01 | |
| 140670 | Diogo de Abreu | Leis | 1552-07-00 | |
| 140695 | Duarte de Abreu | Cânones | 1566-01-20 | |
| 140349 | António de Aboim | Cânones | 1566-12-20 | |
| 140742 | Francisco de Abreu | Medicina | 1573-10-03 | |


#### Examine potential duplicates

In [15]:
from timelink.pandas import display_group_attributes
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome-apelido','nome-primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','nome-geografico','instituta.ano']

dup_ids = ['234295','234710',]  # Alexandre Metelo de

display_group_attributes(dup_ids,
                             header_cols=['uc-entrada','naturalidade','faculdade','nome-pai'],
                             exclude_attributes=no_show,
                             sort_attributes=['date','type','value'],
                             cmap_name='Pastel1')

| | id | uc-entrada | naturalidade | faculdade | nome-pai |
|---|---|---|---|---|---|
| 0 | 234295 | 1704-11-07 | Marialva | Cânones | |
| 1 | 234710 | 1705-10-24 | Marialva | Matemática | Manuel Cardoso Metelo |


| | date | id | type | value | attr_obs |
|---|---|---|---|---|---|
| 0 | 1704-11-07 | 234295 | faculdade | Cânones | Cânones |
| 1 | 1704-11-07 | 234295 | instituta | 1704-11-07 | 07.11.1704 1704-11-07 |
| 2 | 1704-11-07 | 234295 | naturalidade | Marialva | |
| 3 | 1704-11-07 | 234295 | nome | Alexandre Metelo de Sousa | |
| 4 | 1704-11-07 | 234295 | uc-entrada | 1704-11-07 | |
| 5 | 1704-11-07 | 234295 | uc-entrada.ano | 1704 | |
| 6 | 1705-10-24 | 234710 | faculdade | Matemática | Matemática |
| 7 | 1705-10-24 | 234295 | matricula-faculdade | Cânones | 24.10.1705 |
| 8 | 1705-10-24 | 234710 | matricula-faculdade | Matemática | 24.10.1705 |
| 9 | 1705-10-24 | 234710 | naturalidade | Marialva | |
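
When judging whether two records describe the same person, it can help to list the attribute types on which they agree and those on which they differ. The sketch below does this in plain Python with a few `(id, type, value)` triples taken from the table above. It is a hypothetical helper for this tutorial, not part of `timelink.pandas`.

```python
from collections import defaultdict

# (id, type, value) triples taken from the comparison table above.
attributes = [
    ("234295", "faculdade", "Cânones"),
    ("234295", "naturalidade", "Marialva"),
    ("234295", "nome", "Alexandre Metelo de Sousa"),
    ("234710", "faculdade", "Matemática"),
    ("234710", "naturalidade", "Marialva"),
]

# type -> id -> set of values recorded for that id
values = defaultdict(lambda: defaultdict(set))
for entity_id, the_type, value in attributes:
    values[the_type][entity_id].add(value)

a, b = "234295", "234710"
# Types where both records share at least one value.
agree = sorted(t for t, by_id in values.items() if by_id[a] & by_id[b])
# Types present in both records but with no value in common.
differ = sorted(
    t for t, by_id in values.items()
    if by_id[a] and by_id[b] and not (by_id[a] & by_id[b])
)
print("agree:", agree)
print("differ:", differ)
```

Types recorded for only one of the two ids, such as `nome` in this sample, appear in neither list.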
