# Using Timelink in notebooks: tutorial

> First-time use: follow the instructions in the [README.md](README.md) file in this directory.

## Initial setup

Create a TimelinkNotebook object. This will be the interface for most of your interactions with Timelink.

The first time you run this notebook, you will be prompted to install `ipykernel`. This is required to run Timelink in a notebook.

Note:
* The first run takes a while because the required Docker images are downloaded.
* Timelink defaults to SQLite as the database; see the [Receipts notebook](1-receipts.ipynb) for more control.


In [1]:
from timelink.notebooks import TimelinkNotebook

tlnb = TimelinkNotebook()


TimelinkNotebook created a database to store data and a Kleio server to 
translate transcriptions in Kleio notation into data that can be imported
into a database.

To get information about the database and the Kleio server, call `tlnb.print_info()`.

In [2]:
tlnb.print_info()

Project name: tutorial
Project home: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial
Database type: sqlite
Database name: tutorial
Kleio image: timelinkserver/kleio-server
Kleio server token: JmJP05S7ogJAOgkalhX4Tk8LfY1xbi42
Kleio server URL: http://127.0.0.1:8088
Kleio server home: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial
Kleio server container: epic_robinson
Kleio version requested: latest
Kleio server version: 12.4.567 (2024-02-07 13:02:03)
SQLite directory: /Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial/database/sqlite
TimelinkNotebook(project_name=tutorial, project_home=/Users/jrc/develop/timelink-py/tests/timelink-home/projects/tutorial, db_type=sqlite, db_name=tutorial, kleio_image=timelinkserver/kleio-server, kleio_version=latest, postgres_image=postgres, postgres_version=latest)


### Database status

Count the number of rows in each table in the database.


In [3]:
tlnb.table_row_count_df()

,table,count
0,acts,40
1,attributes,15297
2,class_attributes,63
3,classes,13
4,entities,18644
5,geoentities,359
6,goods,0
7,kleiofiles,5
8,objects,0
9,persons,1066
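
As a rough illustration of what a per-table row count involves, the sketch below counts rows in every table of a SQLite database using only the standard library. It is not how `table_row_count_df()` is implemented; the helper `table_row_counts` and the toy tables are assumptions made for the example.

```python
import sqlite3


def table_row_counts(conn):
    """Return {table_name: row_count} for every user table (hypothetical helper)."""
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    # Table names come from sqlite_master itself, so quoting them is sufficient.
    return {
        name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Toy in-memory database standing in for a project database (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE goods (id TEXT PRIMARY KEY)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [("p1", "António de Abreu"), ("p2", "Diogo de Abreu")],
)

counts = table_row_counts(conn)
print(counts)
```

An empty table such as `goods` still appears in the result with a count of zero, which matches the shape of the output above.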


### Show the available Kleio files

In [4]:
kleio_files = tlnb.get_kleio_files()
kleio_files

,path,name,modified,status,translated,errors,warnings,import_status,import_errors,import_warnings,import_error_rpt,import_warning_rpt,imported,rpt_url,xml_url
0,kleio/auc-alunos.cli,auc-alunos.cli,2024-02-26 23:41:23.284867+00:00,V,2024-02-26 23:41:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-26 23:42:06.615588,/rest/reports/kleio/auc-alunos.rpt,/rest/exports/kleio/auc-alunos.xml
1,kleio/b1685.cli,b1685.cli,2024-02-26 23:40:15.982092+00:00,W,2024-02-26 23:40:00+00:00,0,1,I,0,0,No errors,No warnings,2024-02-26 23:42:21.330458,/rest/reports/kleio/b1685.rpt,/rest/exports/kleio/b1685.xml
2,kleio/dehergne-a.cli,dehergne-a.cli,2024-02-26 23:40:18.064364+00:00,V,2024-02-26 23:40:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-26 23:42:12.382473,/rest/reports/kleio/dehergne-a.rpt,/rest/exports/kleio/dehergne-a.xml
3,kleio/dehergne-locations-1644.cli,dehergne-locations-1644.cli,2024-02-26 23:40:18.379161+00:00,V,2024-02-26 23:40:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-26 23:42:15.727081,/rest/reports/kleio/dehergne-locations-1644.rpt,/rest/exports/kleio/dehergne-locations-1644.xml
4,kleio/dehergne-locations-1701.cli,dehergne-locations-1701.cli,2024-02-26 23:40:19.696118+00:00,V,2024-02-26 23:40:00+00:00,0,0,I,0,0,No errors,No warnings,2024-02-26 23:42:17.975926,/rest/reports/kleio/dehergne-locations-1701.rpt,/rest/exports/kleio/dehergne-locations-1701.xml



# Update the database from Kleio sources

`update_from_sources()` updates the translations of the source files and imports into the database the sources that have no errors.

Only new files, or files changed since the last import, are processed.

In [5]:
import logging
logging.basicConfig(level=logging.INFO)

tlnb.update_from_sources()
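
The rule "only new or changed files are processed" can be pictured as a comparison between each file's modification time and the time of its last import. The sketch below is an assumption about the general idea, not Timelink's actual logic; the function `files_needing_update` and the file names and timestamps are hypothetical.

```python
from datetime import datetime, timezone


def files_needing_update(modified, last_imported):
    """Return paths that are new, or modified after their last import (sorted)."""
    return sorted(
        path
        for path, mtime in modified.items()
        if path not in last_imported or mtime > last_imported[path]
    )


utc = timezone.utc
# Hypothetical modification times of source files.
modified = {
    "kleio/a.cli": datetime(2024, 2, 26, 23, 40, tzinfo=utc),
    "kleio/b.cli": datetime(2024, 2, 27, 9, 0, tzinfo=utc),
    "kleio/c.cli": datetime(2024, 2, 27, 10, 0, tzinfo=utc),
}
# Hypothetical times of the last successful import; c.cli was never imported.
last_imported = {
    "kleio/a.cli": datetime(2024, 2, 26, 23, 42, tzinfo=utc),
    "kleio/b.cli": datetime(2024, 2, 26, 23, 42, tzinfo=utc),
}

pending = files_needing_update(modified, last_imported)
print(pending)
```

Here `a.cli` is skipped because it has not changed since its import, `b.cli` is picked up because it was edited afterwards, and `c.cli` because it is new.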

## Check the status of the files

Check the import status of the translated files. The `import_status` column uses these codes:

* I: imported
* E: imported with errors
* W: imported with warnings, no errors
* N: not imported
* U: translation updated, needs to be reimported

In [6]:
imported_files_df = tlnb.get_import_status()
imported_files_df[["import_status","import_errors","import_warnings","name","errors","warnings","path"]].sort_values("name")


,import_status,import_errors,import_warnings,name,errors,warnings,path
0,I,0,0,auc-alunos.cli,0,0,kleio/auc-alunos.cli
1,I,0,0,b1685.cli,0,1,kleio/b1685.cli
2,I,0,0,dehergne-a.cli,0,0,kleio/dehergne-a.cli
3,I,0,0,dehergne-locations-1644.cli,0,0,kleio/dehergne-locations-1644.cli
4,I,0,0,dehergne-locations-1701.cli,0,0,kleio/dehergne-locations-1701.cli
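
When a project has many files, it can help to list only those whose status calls for action. The sketch below works on plain dictionaries rather than the dataframe, and the non-`I` statuses in the sample rows are invented for illustration; the function `needs_attention` is hypothetical.

```python
# Status codes as described in this notebook.
IMPORT_STATUS = {
    "I": "imported",
    "E": "imported with errors",
    "W": "imported with warnings, no errors",
    "N": "not imported",
    "U": "translation updated, needs to be reimported",
}

# Statuses that usually call for a follow-up (an assumption for this sketch).
ACTION_REQUIRED = {"E", "N", "U"}


def needs_attention(files):
    """Return (name, status description) for files that need a follow-up."""
    return [
        (f["name"], IMPORT_STATUS[f["import_status"]])
        for f in files
        if f["import_status"] in ACTION_REQUIRED
    ]


# Sample rows; the "U" and "E" statuses are hypothetical.
files = [
    {"name": "auc-alunos.cli", "import_status": "I"},
    {"name": "b1685.cli", "import_status": "U"},
    {"name": "dehergne-a.cli", "import_status": "E"},
]

todo = needs_attention(files)
print(todo)
```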


### Check the translation report

In [7]:
rpt = tlnb.get_translation_report(imported_files_df, 2)
print(rpt)


HTTPError: HTTP Error 400: Bad Request

### Get the import report for a file

If there are errors in the import phase, check the import report.

In [None]:
rpt = tlnb.get_import_rpt(imported_files_df, rows=[0])
print(rpt)

No errors


## Todo

Provide these operations as single functions that take a dataframe:
* TimelinkNotebook.translate([files_df,paths=List[str], rows=List[int],status="T"])
* TimelinkNotebook.import([files_df,paths=List[str], rows=List[int],status="V"])



# Getting data

### Search for people, show in Kleio notation

In [None]:
from timelink.api.models import Person

show_only=10

with tlnb.db.session() as session:
    persons = session.query(Person).filter(Person.name.like('%Abegão')).all()

    for person in persons[:show_only]:
        print(person.to_kleio())
        print()

n$Estevão de Matos Abegão/m/id=140338/obs="""
      """

                  Id: 140338
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000002

                  Nome        : Estevão de Matos Abegão
                  Data inicial: 1728-10-01
                  Data final  : 1733-07-03
                  Filiação: Lourenço de Matos Pereira
                  Naturalidade: Elvas
                  Faculdade: Cânones

                  Matrícula(s): 01.10.1729
                  01.10.1730
                  01.10.1732

                  Instituta: 01.10.1728

                  Bacharel: 03.07.1733
              """
  """
  rel$function-in-act/n/auc-alumni-A-140337-140771/20200211
  atr$código-de-referência/""PT/AUC/ELU/UC-AUC/B/001-001/A/000002""/2021-05-17
  atr$data-do-registo/2021-05-17/2021-05-17
  atr$url/""https://pesquisa.auc.uc.pt/details?id=140338""/2021-05-17
  ls$uc.entrada/1728-10-01/1728-10-01
  ls$uc.saida/1733-07-03/1733-07-03
  ls$uc.entrada.ano/1728/1728-1
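
The `like('%Abegão')` filter relies on SQL `LIKE` wildcards: `%` matches any run of characters, so `'%Abegão'` matches names ending in "Abegão", while `'%Abegão%'` matches names containing it anywhere. The sketch below demonstrates the difference with an in-memory SQLite table; the `persons` table here is a simplified stand-in, and the second and third names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO persons VALUES (?, ?)",
    [
        ("1", "Estevão de Matos Abegão"),
        ("2", "Abegão Rodrigues"),   # hypothetical name
        ("3", "António de Abreu"),
    ],
)


def names_like(pattern):
    """Return the names matching a SQL LIKE pattern, ordered by id."""
    rows = conn.execute(
        "SELECT name FROM persons WHERE name LIKE ? ORDER BY id", (pattern,)
    )
    return [row[0] for row in rows]


ends_with = names_like("%Abegão")    # names ending in "Abegão"
contains = names_like("%Abegão%")    # names containing "Abegão" anywhere
print(ends_with)
print(contains)
```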

### Search for people with specific attribute

In [None]:
from sqlalchemy import select
from timelink.api.models import Person, Attribute

show_only=10

with tlnb.db.session() as session:
    stmt = select(Person).join(Person.attributes.and_(Attribute.the_type == 'naturalidade',
                                                      Attribute.the_value.like('Soure')))
    print(stmt)
    persons = session.execute(stmt).scalars().all()

    for person in persons[:show_only]:
        p = tlnb.db.get_person(person.id)
        print(p.to_kleio())


SELECT persons.id, entities.id AS id_1, entities.class, entities.inside, entities.the_order, entities.the_level, entities.the_line, entities.groupname, entities.updated, entities.indexed, persons.name, persons.sex, persons.obs 
FROM entities JOIN persons ON entities.id = persons.id JOIN (entities AS entities_1 JOIN attributes AS attributes_1 ON attributes_1.id = entities_1.id) ON entities.id = attributes_1.entity AND attributes_1.the_type = :the_type_1 AND attributes_1.the_value LIKE :the_value_1
n$Anselmo Munhós de Abreu/m/id=140532/obs="""
      """

                  Id: 140532
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000227

                  Nome        : Anselmo Munhós de Abreu
                  Data inicial: 1707-04-06
                  Data final  : 1719-01-01
                  Filiação: António Munhós de Abreu
                  Naturalidade: Soure
                  Faculdade: Medicina

                  Matrícula(s): 26.10.1707
                  01





### Get a dataframe from attributes

#### Example: faculty, entry date, exit date and degree of people born in Coimbra

In [None]:
from timelink.pandas import entities_with_attribute


# Get the list of people with a certain value in a specific attribute
df = entities_with_attribute(
                    entity_type='person',
                    the_type='naturalidade',
                    the_value='Soure',
                    more_info=['name'],
                    more_cols=['faculdade','uc.entrada','uc.saida'],
                    db=tlnb.db,
                    sql_echo=False)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 4 entries, 140554 to 140533
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               4 non-null      object
 1   naturalidade       4 non-null      object
 2   naturalidade.date  4 non-null      object
 3   naturalidade.obs   0 non-null      object
 4   faculdade          4 non-null      object
 5   faculdade.date     4 non-null      object
 6   faculdade.obs      0 non-null      object
 7   uc.entrada         4 non-null      object
 8   uc.entrada.date    4 non-null      object
 9   uc.entrada.obs     0 non-null      object
 10  uc.saida           4 non-null      object
 11  uc.saida.date      4 non-null      object
 12  uc.saida.obs       0 non-null      object
dtypes: object(13)
memory usage: 620.0+ bytes


In [None]:
df.head(5)

id,name,naturalidade,naturalidade.date,naturalidade.obs,faculdade,faculdade.date,faculdade.obs,uc.entrada,uc.entrada.date,uc.entrada.obs,uc.saida,uc.saida.date,uc.saida.obs
140554,António de Abreu,Soure,1681-10-01,,Cânones,1681-10-01,,1681-10-01,1681-10-01,,1690-06-21,1690-06-21,
140532,Anselmo Munhós de Abreu,Soure,1707-04-06,,Medicina,1707-04-06,,1707-04-06,1707-04-06,,1719-01-01,1719-01-01,
140712,Fernando Munhós de Abreu,Soure,1719-01-09,,Cânones,1719-01-09,,1719-01-09,1719-01-09,,1722-10-17,1722-10-17,
140533,Anselmo Munhós de Abreu,Soure,1737-10-01,,Leis,1737-10-01,,1737-10-01,1737-10-01,,1740-10-01,1740-10-01,




### Remove empty columns

In [None]:
df.dropna(how='all', axis=1, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, 140349 to 140367
Data columns (total 10 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               14 non-null     object
 1   naturalidade       14 non-null     object
 2   naturalidade.date  14 non-null     object
 3   faculdade          14 non-null     object
 4   faculdade.date     14 non-null     object
 5   faculdade.obs      1 non-null      object
 6   uc.entrada         14 non-null     object
 7   uc.entrada.date    14 non-null     object
 8   uc.saida           14 non-null     object
 9   uc.saida.date      14 non-null     object
dtypes: object(10)
memory usage: 1.7+ KB


In [None]:
df.head(5)

id,name,naturalidade,naturalidade.date,faculdade,faculdade.date,faculdade.obs,uc.entrada,uc.entrada.date,uc.saida,uc.saida.date
140349,António de Aboim,Coimbra,1566-12-20,Cânones,1566-12-20,,1566-12-20,1566-12-20,1574-07-24,1574-07-24
140671,Diogo de Abreu,Coimbra,1604-10-11,Cânones,1604-10-11,,1604-10-11,1604-10-11,1604-10-11,1604-10-11
140643,Bernardo de Abreu,Coimbra,1612-10-15,Cânones,1612-10-15,,1612-10-15,1612-10-15,1616-10-19,1616-10-19
140629,Bartolomeu de Abreu,Coimbra,1627-10-19,Cânones,1627-10-19,,1627-10-19,1627-10-19,1638-10-11,1638-10-11
140662,Cristóvão de Abreu,Coimbra,1660-01-08,Cânones,1660-01-08,,1660-01-08,1660-01-08,1662-10-22,1662-10-22



## Counting

### Count attributes from an existing dataframe



In [None]:
# create a column with the index values, which are the id numbers
df['id'] = df.index.values

col = 'faculdade'  # subtotal by this column

# Use pandas groupby and count the unique ids in each group
df_totals = df.groupby(col).agg({'id': 'nunique',
                                 'uc.entrada': 'min',
                                 'uc.saida': 'max'})

df_totals.sort_values('id', ascending=False).head(30)

faculdade,id,uc.entrada,uc.saida
Cânones,7,0000-00-00,1767-07-27
Medicina,5,1670-10-01,1823-10-20
Direito,1,1748-10-19,1748-10-19
Matemática,1,1868-10-02,1872-10-03



### Counting directly in the database

When an attribute has many values and there is no need to hold every person in memory, do the count in the database.




In [None]:
from timelink.pandas import attribute_values

df_totals = attribute_values('jesuita-entrada',db=tlnb.db)


In [None]:
df_totals.head(10)


value,count,date_min,date_max
?,31,15550000,17581126
Coimbra,6,15420000,16770701
Goa,3,15480502,16660000
Lisboa,3,16680000,17460223
Avignon,2,17350731,17370927
Paris,2,16700926,16771029
Saragoça,2,16740613,16771115
Évora,2,15710304,17230000
Alcalá de Henares,1,15580413,15580413
Bolonha,1,17250728,17250728
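
Counting in the database amounts to a `GROUP BY` query that returns one row per attribute value, so only the totals travel to Python. The sketch below shows an equivalent query against a simplified, in-memory `attributes` table; the schema is reduced to four columns and the rows are illustrative, so it should be read as an approximation of the idea rather than the query `attribute_values` actually runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the attributes table (assumed columns).
conn.execute(
    "CREATE TABLE attributes "
    "(entity TEXT, the_type TEXT, the_value TEXT, the_date TEXT)"
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [
        ("p1", "jesuita-entrada", "Coimbra", "15420000"),
        ("p2", "jesuita-entrada", "Coimbra", "16770701"),
        ("p3", "jesuita-entrada", "Goa", "15480502"),
        ("p4", "naturalidade", "Soure", "16811001"),
    ],
)

# One row per value: distinct entities, earliest date, latest date.
totals = conn.execute(
    """
    SELECT the_value,
           COUNT(DISTINCT entity) AS n,
           MIN(the_date) AS date_min,
           MAX(the_date) AS date_max
    FROM attributes
    WHERE the_type = ?
    GROUP BY the_value
    ORDER BY n DESC, the_value
    """,
    ("jesuita-entrada",),
).fetchall()
print(totals)
```

Because the dates are stored as fixed-width strings, `MIN` and `MAX` on the text column give the earliest and latest dates without any conversion.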


#### Filter by dates



In [None]:
df_totals = attribute_values('grau',dates_between=('1535','1600'),db=tlnb.db)

In [None]:
df_totals.head(10)

value,count,date_min,date_max
Bacharel,9,1554-07-19,1593-06-09
Formatura,6,1574-07-24,1593-05-12
Bacharel em Artes,3,1574-03-14,1588-03-12
Licenciado,3,1574-06-03,1597-05-06
Doutor,2,1560-12-22,1589-10-27
Licenciado em Artes,2,1574-05-15,1589-01-13


## View records





### Search people by name, results in dataframe

In [None]:
from timelink.pandas import pname_to_df
people = pname_to_df(name='%Abegão%',
                     db=tlnb.db,)
people.head()

,id,name,sex,obs
0,140338,Estevão de Matos Abegão,m,"""""""\n\n Id: 140338\n Cód..."
1,140339,Jerónimo Rodrigues Abegão,m,"""""""\n\n Id: 140339\n Cód..."
2,140339-pai,Jerónimo Rodrigues Abegão,m,


### View a person

#### Person attributes in a dataframe, one row per attribute

In [None]:
import pandas as pd
from timelink.pandas import group_attributes as person_attributes

pd.set_option('display.max_rows',1000)

id = '140338'
pdf = person_attributes([id],db=tlnb.db)  # note id in a list
pdf[['the_date','the_type','the_value','attr_obs']].sort_values(['the_date','the_type'])

id,the_date,the_type,the_value,attr_obs
140338,1728-10-01,faculdade,Cânones,
140338,1728-10-01,faculdade.ano,Cânones.1728,
140338,1728-10-01,instituta,1728-10-01,01.10.1728 1728-10-01
140338,1728-10-01,instituta.ano,1728,01.10.1728 1728-10-01
140338,1728-10-01,naturalidade,Elvas,
140338,1728-10-01,naturalidade.ano,Elvas.1728,
140338,1728-10-01,nome,Estevão de Matos Abegão,
140338,1728-10-01,nome-geografico,Elvas,
140338,1728-10-01,nome-geografico.ano,Elvas.1728,
140338,1728-10-01,nome.apelido,Matos Abegão,


In [None]:
p = tlnb.db.get_person(id)
print(p.to_kleio())

n$Estevão de Matos Abegão/m/id=140338/obs="""
      """

                  Id: 140338
                  Código de referência: PT/AUC/ELU/UC-AUC/B/001-001/A/000002

                  Nome        : Estevão de Matos Abegão
                  Data inicial: 1728-10-01
                  Data final  : 1733-07-03
                  Filiação: Lourenço de Matos Pereira
                  Naturalidade: Elvas
                  Faculdade: Cânones

                  Matrícula(s): 01.10.1729
                  01.10.1730
                  01.10.1732

                  Instituta: 01.10.1728

                  Bacharel: 03.07.1733
              """
  """
  rel$function-in-act/n/auc-alumni-A-140337-140771/20200211
  atr$código-de-referência/""PT/AUC/ELU/UC-AUC/B/001-001/A/000002""/2021-05-17
  atr$data-do-registo/2021-05-17/2021-05-17
  atr$url/""https://pesquisa.auc.uc.pt/details?id=140338""/2021-05-17
  ls$uc.entrada/1728-10-01/1728-10-01
  ls$uc.saida/1733-07-03/1733-07-03
  ls$uc.entrada.ano/1728/1728-1

#### Person attributes in a dataframe, attributes in columns

Generate a table of people with several attributes in columns.

In [None]:
# Get the list of people with a certain value in a specific attribute
from timelink.pandas import entities_with_attribute

id = 140338
# id = '316297'  # João Pedro Ribeiro
# id = '234295'  # Alexandre Metelo de Sousa
df = entities_with_attribute(
                    entity_type='person',
                    the_type='faculdade',  # we need a base attribute
                    the_value='Teologia',
                    show_elements=['name'],
                    more_attributes=['naturalidade','uc.entrada','uc.saida'],
                    db=tlnb.db,
                    sql_echo=False)
view_cols = ['name','naturalidade','uc.entrada','faculdade','uc.saida','faculdade.obs']
df.info()
df[view_cols].sort_values(['name','uc.entrada'])

<class 'pandas.core.frame.DataFrame'>
Index: 29 entries, 140719 to 269145
Data columns (total 13 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   name               29 non-null     object
 1   faculdade          29 non-null     object
 2   faculdade.date     29 non-null     object
 3   faculdade.obs      3 non-null      object
 4   naturalidade       22 non-null     object
 5   naturalidade.date  22 non-null     object
 6   naturalidade.obs   0 non-null      object
 7   uc.entrada         29 non-null     object
 8   uc.entrada.date    29 non-null     object
 9   uc.entrada.obs     0 non-null      object
 10  uc.saida           29 non-null     object
 11  uc.saida.date      29 non-null     object
 12  uc.saida.obs       0 non-null      object
dtypes: object(13)
memory usage: 4.2+ KB


id,name,naturalidade,uc.entrada,faculdade,uc.saida,faculdade.obs
140511,Abílio Ferreira Gomes de Abreu,"Cuenca, Coimbra",1854-10-02,Teologia,1856-07-09,"""Teologia#29"""
140516,Agostinho Machado de Abreu,Gesteira,1721-10-01,Teologia,1726-10-01,
140375,Alexandre de Brito e Sousa Abranches,Santa Ovaia,1863-10-12,Teologia,1867-06-12,
140534,Antão de Abreu,,1638-07-20,Teologia,1639-11-20,
140350,António Camacho Guerreiro de Brito e Aboim,Mértola,1770-10-01,Teologia,1770-10-01,
140545,António Dinis de Abreu,Midões,1695-12-09,Teologia,1699-10-01,
140593,António Lopes Coelho de Abreu,Azóia,1861-10-11,Teologia,1864-06-20,
140552,António de Abreu,Sertã,1603-11-03,Teologia,1605-10-26,
140553,António de Abreu,Setúbal,1626-11-19,Teologia,1626-11-19,
140395,Augusto Cândido de Abranches,Moçambique,1852-10-12,Teologia,1861-05-27,
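
Putting attributes in columns is a pivot: rows of (entity, attribute type, value, date) become one record per entity, with a value column and a `.date` column for each requested attribute. The pure-Python sketch below illustrates that reshaping on a few rows taken from the output above. The function `attributes_in_columns` is hypothetical, and it simplifies by keeping only one value per attribute type and entity.

```python
from collections import defaultdict

# (entity id, attribute type, value, date), one row per attribute.
attribute_rows = [
    ("140552", "faculdade", "Teologia", "1603-11-03"),
    ("140552", "naturalidade", "Sertã", "1603-11-03"),
    ("140552", "uc.entrada", "1603-11-03", "1603-11-03"),
    ("140534", "faculdade", "Teologia", "1638-07-20"),
    ("140534", "uc.entrada", "1638-07-20", "1638-07-20"),
]


def attributes_in_columns(rows, base_type, base_value, more_types):
    """Pivot attribute rows into one record per entity (hypothetical helper)."""
    by_entity = defaultdict(dict)
    for entity, the_type, the_value, the_date in rows:
        # Simplification: a later row of the same type overwrites an earlier one.
        by_entity[entity][the_type] = (the_value, the_date)

    table = {}
    for entity, attrs in by_entity.items():
        # Keep only entities that have the base attribute with the wanted value.
        if attrs.get(base_type, (None, None))[0] != base_value:
            continue
        record = {}
        for the_type in [base_type, *more_types]:
            value, date = attrs.get(the_type, (None, None))
            record[the_type] = value
            record[f"{the_type}.date"] = date
        table[entity] = record
    return table


table = attributes_in_columns(
    attribute_rows, "faculdade", "Teologia", ["naturalidade", "uc.entrada"]
)
print(table["140552"])
print(table["140534"])
```

An entity without one of the extra attributes, such as `140534` with no `naturalidade`, keeps its row and gets empty cells, which is why some columns above show fewer non-null values than others.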


#### Examine potential duplicates

In [9]:
import pandas as pd
from timelink.pandas.group_attributes import display_group_attributes, group_attributes
pd.set_option('display.max_rows',250)

no_show=['código-de-referência','data-do-registo','url','faculdade.ano','naturalidade.ano',
         'matricula-faculdade.ano','nome.apelido','nome.primeiro','nome-geografico.ano',
         'grau.ano','matricula-outra.ano','matricula.ano','nome-geografico','instituta.ano',
         'uc.entrada.ano','uc.saida.ano']

dup_ids = ['140490','140503',]

display_group_attributes(dup_ids,
                         entity_type='person',
                         header_elements=['name'],
                         # header_attributes=['nome.nota','uc.entrada'],
                         exclude_attributes=no_show,
                         sort_attributes=['the_type', 'the_value', 'the_date'],
                         db=tlnb.db)

,name,the_type,the_type.date,the_type.obs,id
0,Manuel de Abrantes,1705-10-01,1705-10-01,,140490
1,Manuel de Abrantes,1706-10-01,1706-10-01,,140503


,name,the_type,the_value,the_date,attr_obs,id
0,Manuel de Abrantes,faculdade,Teologia,1705-10-01,,140490
1,Manuel de Abrantes,faculdade,Teologia,1706-10-01,,140503
2,Manuel de Abrantes,matricula,Teologia,1705-10-01,01.10.1705,140490
3,Manuel de Abrantes,matricula,Teologia,1706-10-01,01.10.1706,140503
4,Manuel de Abrantes,matricula,Teologia,1707-10-01,01.10.1707,140503
5,Manuel de Abrantes,matricula,Teologia,1708-10-01,01.10.1708,140503
6,Manuel de Abrantes,naturalidade,Vila de Matos,1706-10-01,,140503
7,Manuel de Abrantes,nome,Manuel de Abrantes,1705-10-01,,140490
8,Manuel de Abrantes,nome,Manuel de Abrantes,1706-10-01,,140503
9,Manuel de Abrantes,nome.nota,padre,1705-10-01,,140490
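
One way to read a side-by-side listing like this is to ask, for each attribute type, whether the two records agree, conflict, or whether only one of them carries the attribute. The sketch below applies that reading to a subset of the rows above in plain Python. The function `compare_attributes` and its three labels are assumptions made for the example, not part of Timelink.

```python
from collections import defaultdict

# (entity id, attribute type, value, date), taken from the listing above.
rows = [
    ("140490", "faculdade", "Teologia", "1705-10-01"),
    ("140503", "faculdade", "Teologia", "1706-10-01"),
    ("140490", "matricula", "Teologia", "1705-10-01"),
    ("140503", "matricula", "Teologia", "1706-10-01"),
    ("140503", "naturalidade", "Vila de Matos", "1706-10-01"),
    ("140490", "nome", "Manuel de Abrantes", "1705-10-01"),
    ("140503", "nome", "Manuel de Abrantes", "1706-10-01"),
    ("140490", "nome.nota", "padre", "1705-10-01"),
]


def compare_attributes(rows, id_a, id_b):
    """Label each attribute type as agree, conflict or only one record."""
    values = {id_a: defaultdict(set), id_b: defaultdict(set)}
    for entity, the_type, the_value, _date in rows:
        if entity in values:
            values[entity][the_type].add(the_value)

    report = {}
    for the_type in sorted(set(values[id_a]) | set(values[id_b])):
        a = values[id_a].get(the_type, set())
        b = values[id_b].get(the_type, set())
        if not a or not b:
            report[the_type] = "only one record"
        elif a & b:
            report[the_type] = "agree"
        else:
            report[the_type] = "conflict"
    return report


report = compare_attributes(rows, "140490", "140503")
for the_type, verdict in report.items():
    print(f"{the_type}: {verdict}")
```

No conflicts and overlapping dates, as here, suggest the two ids may describe the same person, although the final decision remains a matter of historical judgment.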
