Let us take an example record from the NIH API. It describes a person named Aagaard M. Kjersti, with the email aagaardt@bcm.edu and the NIH ID 8196581. This person's specialty is Obstetrics and Gynecology, and they are credited with a patent titled SYSTEM AND METHODS FOR FUNCTIONAL IMAGING OF THE PLACENTA. In addition, this person is affiliated with Baylor College of Medicine.

Just like the OpenAlex data, let us first upload the data for each entity (a person, an organization, a bioentity, and a work) to its own table.

In [1]:
import sys
import os
module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)
from SQLConnect import connect_and_query
from SQLConnect import insert_query_dict

In [2]:
person = {
    'origin_database': 'NIH Demo',
    'email': 'aagaardt@bcm.edu',
    'phone': None,
    'name': 'Aagaard M. Kjersti',
    'first_name': 'Aagaard',
    'middle_name': 'M.',
    'last_name': 'Kjersti',
    'nih_id': 8196581
}
person_query = [insert_query_dict('People', person)]

In [3]:
bio = [
    {
        'origin_database': 'NIH Demo',
        'name': 'Obstetrics',
    },
    {
        'origin_database': 'NIH Demo',
        'name': 'Gynecology',
    }
]
bio_query = [insert_query_dict('Bioentity', rec) for rec in bio]

In [4]:
org = {
    'origin_database': 'NIH Demo',
    'name': 'Baylor College of Medicine',
    'funding': None
}
org_query = [insert_query_dict('Org', org)]

In [5]:
work = {
    'origin_database': 'NIH Demo',
    'title': 'SYSTEM AND METHODS FOR FUNCTIONAL IMAGING OF THE PLACENTA',
    'start_date': None,
    'end_date': None,
    'type': 'Patent',
    'pmid': None
}
work_query = [insert_query_dict('Work', work)]

In [6]:
queries = person_query + org_query + bio_query + work_query
queries

['INSERT INTO People (origin_database, email, phone, name, first_name, middle_name, last_name, nih_id) VALUES ("NIH Demo", "aagaardt@bcm.edu", NULL, "Aagaard M. Kjersti", "Aagaard", "M.", "Kjersti", 8196581);',
 'INSERT INTO Org (origin_database, name, funding) VALUES ("NIH Demo", "Baylor College of Medicine", NULL);',
 'INSERT INTO Bioentity (origin_database, name) VALUES ("NIH Demo", "Obstetrics");',
 'INSERT INTO Bioentity (origin_database, name) VALUES ("NIH Demo", "Gynecology");',
 'INSERT INTO Work (origin_database, title, start_date, end_date, type, pmid) VALUES ("NIH Demo", "SYSTEM AND METHODS FOR FUNCTIONAL IMAGING OF THE PLACENTA", NULL, NULL, "Patent", NULL);']
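The generated statements above suggest how a record dict is turned into SQL: the keys become the column list, `None` becomes `NULL`, strings are wrapped in double quotes, and numbers are written as-is. The source of `insert_query_dict` is not shown in this notebook, so the following is only a minimal sketch of a function with the same visible behavior; the name `sketch_insert_query` and its internals are assumptions, not the actual implementation in `SQLConnect`.

```python
def sketch_insert_query(table, record):
    """Build an INSERT statement from a dict.

    Hypothetical stand-in for insert_query_dict, inferred only from
    the statements printed in this notebook.
    """
    def fmt(value):
        if value is None:
            return 'NULL'          # Python None becomes SQL NULL
        if isinstance(value, str):
            return f'"{value}"'    # strings are double-quoted, with no escaping
        return str(value)          # numbers are written as-is

    columns = ', '.join(record)    # dict keys, in insertion order
    values = ', '.join(fmt(v) for v in record.values())
    return f'INSERT INTO {table} ({columns}) VALUES ({values});'


demo_org = {
    'origin_database': 'NIH Demo',
    'name': 'Baylor College of Medicine',
    'funding': None,
}
print(sketch_insert_query('Org', demo_org))
```

Because the values are interpolated directly into the string, a value that itself contains a double quote would produce a broken statement. This sketch does not handle that case.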

In [7]:
connect_and_query(queries, ['INSERT' for _ in queries], 'UnmergedV1')

Connection to database established
MySQL connection is closed


[]

Now we can go ahead and upload the connectivity data, i.e., that Aagaard is affiliated with Baylor, specializes in Obstetrics and Gynecology, and is credited with the patent. First, we need the IDs of each of these entities, which the database assigned when the rows were inserted.

In [8]:
get_id_person = 'SELECT people_id, name FROM People WHERE origin_database = "NIH Demo"'
get_id_bio = 'SELECT bio_id, name FROM Bioentity WHERE origin_database = "NIH Demo"'
get_id_org = 'SELECT org_id, name FROM Org WHERE origin_database = "NIH Demo"'
get_id_work = 'SELECT work_id, title FROM Work WHERE origin_database = "NIH Demo"'
# Results come back in this order: person, bio, work, org
get_id_queries = [get_id_person, get_id_bio, get_id_work, get_id_org]

In [9]:
ids = connect_and_query(get_id_queries, ['SELECT' for _ in get_id_queries], 'UnmergedV1')
ids

Connection to database established
MySQL connection is closed


[[(9064, 'Aagaard M. Kjersti')],
 [(2708, 'Obstetrics'), (2709, 'Gynecology')],
 [(12587, 'SYSTEM AND METHODS FOR FUNCTIONAL IMAGING OF THE PLACENTA')],
 [(3139, 'Baylor College of Medicine')]]

In [10]:
people_to_id = {}
for rec in ids[0]:
    people_to_id[rec[1]] = rec[0]

In [11]:
bio_to_id = {}
for rec in ids[1]:
    bio_to_id[rec[1]] = rec[0]
bio_to_id

{'Obstetrics': 2708, 'Gynecology': 2709}

In [12]:
work_to_id = {}
for rec in ids[2]:
    work_to_id[rec[1]] = rec[0]

In [13]:
org_to_id = {}
for rec in ids[3]:
    org_to_id[rec[1]] = rec[0]
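The four cells above repeat the same pattern: turn a list of `(id, label)` rows into a label-to-ID dictionary. As a minimal sketch, the pattern can be written once as a helper. The names `rows_to_lookup`, `demo_ids`, and the `*_lookup` variables are hypothetical, and the rows are copied from the SELECT output shown earlier.

```python
def rows_to_lookup(rows):
    """Map the label in each (id, label) row to its id."""
    return {label: row_id for row_id, label in rows}


# Result rows copied from the SELECT output earlier in this notebook
demo_ids = [
    [(9064, 'Aagaard M. Kjersti')],
    [(2708, 'Obstetrics'), (2709, 'Gynecology')],
    [(12587, 'SYSTEM AND METHODS FOR FUNCTIONAL IMAGING OF THE PLACENTA')],
    [(3139, 'Baylor College of Medicine')],
]

# Unpack in the same order as the queries: person, bio, work, org
people_lookup, bio_lookup, work_lookup, org_lookup = (
    rows_to_lookup(rows) for rows in demo_ids
)
print(bio_lookup)
```

Keying on the name or title works for this demo because each label appears once. If two rows shared a label, the later row would silently overwrite the earlier one.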

In [14]:
people_org = {
    'people_id': people_to_id['Aagaard M. Kjersti'],
    'org_id': org_to_id['Baylor College of Medicine'],
    'year': None
}

people_spec = [
    {
        'people_id': people_to_id['Aagaard M. Kjersti'],
        'bio_id': bio_to_id['Obstetrics']
    },
    {
        'people_id': people_to_id['Aagaard M. Kjersti'],
        'bio_id': bio_to_id['Gynecology']
    }
]

work_people = {
    'people_id': people_to_id['Aagaard M. Kjersti'],
    'work_id': work_to_id['SYSTEM AND METHODS FOR FUNCTIONAL IMAGING OF THE PLACENTA']
}

In [15]:
queries = (
    [insert_query_dict('PeopleOrg', people_org)]
    + [insert_query_dict('PeopleSpec', rec) for rec in people_spec]
    + [insert_query_dict('WorkPeople', work_people)]
)
queries

['INSERT INTO PeopleOrg (people_id, org_id, year) VALUES (9064, 3139, NULL);',
 'INSERT INTO PeopleSpec (people_id, bio_id) VALUES (9064, 2708);',
 'INSERT INTO PeopleSpec (people_id, bio_id) VALUES (9064, 2709);',
 'INSERT INTO WorkPeople (people_id, work_id) VALUES (9064, 12587);']

In [16]:
connect_and_query(queries, ['INSERT' for _ in queries], 'UnmergedV1')

Connection to database established
MySQL connection is closed


[]
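One way to check that the link rows connect the intended entities is to join a link table back to the entity tables. The sketch below does this for the affiliation, using an in-memory SQLite database as a stand-in so that it runs without access to UnmergedV1. The table definitions are simplified assumptions that keep only the columns seen in this notebook, and the name `check_affiliation` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # throwaway database, not UnmergedV1
cur = conn.cursor()

# Simplified schemas (assumed): only the columns used in this notebook
cur.executescript("""
CREATE TABLE People (people_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Org (org_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE PeopleOrg (people_id INTEGER, org_id INTEGER, year INTEGER);
""")

# Same IDs as in the demo output above
cur.execute('INSERT INTO People (people_id, name) VALUES (?, ?)',
            (9064, 'Aagaard M. Kjersti'))
cur.execute('INSERT INTO Org (org_id, name) VALUES (?, ?)',
            (3139, 'Baylor College of Medicine'))
cur.execute('INSERT INTO PeopleOrg (people_id, org_id, year) VALUES (?, ?, ?)',
            (9064, 3139, None))

# Resolve the link row back to readable names
check_affiliation = """
SELECT p.name, o.name
FROM PeopleOrg AS po
JOIN People AS p ON p.people_id = po.people_id
JOIN Org AS o ON o.org_id = po.org_id
"""
rows = cur.execute(check_affiliation).fetchall()
conn.close()
print(rows)
```

The same pattern extends to `PeopleSpec` and `WorkPeople` by swapping in the corresponding tables and ID columns.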