In [1]:
import pandas as pd
from groupings import Groupings

### Load the pickled Groupings object

In [2]:
g = Groupings.read_pickle('sample_data', 'cosine', 0.4)

### There are 10 supergroups created from the sample data

In [3]:
g.supergroups

34    {(sample_data, NCT00398931), (sample_data, NCT...
30    {(sample_data, 29066877), (sample_data, 189791...
27    {(sample_data, NCT00537680), (sample_data, 224...
26    {(sample_data, NCT00537680), (sample_data, 127...
2     {(ema, 1094), (sample_data, 23907995), (sample...
24    {(sample_data, NCT01674621), (sample_data, NCT...
37    {(sample_data, NCT00537680), (sample_data, 224...
1     {(sample_data, NCT01674621), (sample_data, 253...
8     {(sample_data, 17988688), (sample_data, 114063...
0                     {(ema, 48), (ema, 32), (ema, 44)}
Name: supergroups, dtype: object

### The sizes of the supergroups

In [4]:
g.supergroup_sizes

34    48
30    32
27    11
26    11
2     10
24     9
37     8
1      8
8      3
0      3
Name: supergroups, dtype: int64
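
The sizes above appear to be the number of records in each supergroup. Here is a minimal sketch of that relationship, assuming `supergroups` is a pandas Series of sets. The toy Series and the names `toy_supergroups` and `toy_sizes` are hypothetical, and the `Groupings` class may compute `supergroup_sizes` differently.

```python
import pandas as pd

# Hypothetical toy data shaped like g.supergroups:
# each entry is a set of (source, record_id) tuples.
toy_supergroups = pd.Series(
    [
        {("ema", 48), ("ema", 32), ("ema", 44)},
        {("sample_data", "A"), ("sample_data", "B")},
    ],
    index=[0, 8],
    name="supergroups",
)

# Size of each supergroup = number of records in its set, largest first.
toy_sizes = toy_supergroups.map(len).sort_values(ascending=False)
print(toy_sizes)
```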

### Let's explore one of the supergroups (idx 24)

#### Here are all the records that belong to it:

In [5]:
g.get_records_per_group(24)

{('ema', 1039),
 ('sample_data', '27612281'),
 ('sample_data', '27826127'),
 ('sample_data', '28160873'),
 ('sample_data', 'NCT01343004'),
 ('sample_data', 'NCT01657162'),
 ('sample_data', 'NCT01674621'),
 ('sample_data', 'NCT03512262'),
 ('sample_data', 'NCT03710889')}

#### Here are the CUIs of these records:

In [6]:
ema = pd.read_pickle('../data/ema_cuis.pkl')
sample_cuis = pd.read_pickle('../data/cuis/batch_0.pkl')

In [7]:
# Collect the IDs of the sample_data records in supergroup 24
idx = [rec_id for source, rec_id in g.get_records_per_group(24)
       if source == 'sample_data']
sample_cuis.loc[idx]

Unnamed: 0,ent_text_disease,ent_text_drug,disease_cuis,drug_cuis
NCT01674621,[osteoporosis],"[ba058, abaloparatide]",{C0029456},{C4042342}
NCT01657162,[osteoporosis],"[ba058, abaloparatide, ba058-05-003]",{C0029456},{C4042342}
NCT03512262,[osteoporosis],[abaloparatide],{C0029456},{C4042342}
27826127,"[osteoporosis, fractures, bone marrow abnormal...","[abaloparatide, teriparatide]","{C0029456, C0016059, C4540463, C0016658}","{C0070093, C4042342}"
27612281,"[fractures, osteoporosis, vertebral fractures,...","[amino acid, abaloparatide]","{C0029456, C0080179, C0016658}","{C0002520, C4042342}"
28160873,"[osteoporosis, vertebral fractures, nonvertebr...","[alendronate, abaloparatide, aln]","{C0029456, C0016658}","{C0102118, C4042342}"
NCT03710889,[osteoporosis],[abaloparatide],{C0029456},{C4042342}
NCT01343004,"[fracture, osteoporosis, fractures]","[ba058, abaloparatide]","{C0029456, C0016658}",{C4042342}


In [8]:
ema.loc[[1039]]

Unnamed: 0,active_substance,disease_name,authoriz_status,disease_cuis,drug_cuis
1039,abaloparatide,Osteoporosis,Refused,{C0029456},{C4042342}


#### Looks correct!

All the records concern osteoporosis (C0029456) and the active substance abaloparatide (C4042342).

- Here is the [EMA record](https://www.ema.europa.eu/en/medicines/human/EPAR/eladynos) (listed as Refused in the data above)

- Here is one of the grouped CTgov records [NCT01674621](https://clinicaltrials.gov/ct2/show/NCT01674621?term=NCT01674621)

- Here is one of the grouped PubMed records [27826127](https://pubmed.ncbi.nlm.nih.gov/27826127/)
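
The visual check above can also be done programmatically. This minimal sketch intersects the CUI sets across records to find what they all share. The rows are a subset copied from the outputs above, and the names `records`, `shared_disease` and `shared_drug` are illustrative only.

```python
import pandas as pd

# A subset of the rows shown above, re-entered by hand for illustration.
records = pd.DataFrame(
    {
        "disease_cuis": [
            {"C0029456"},
            {"C0029456", "C0016059", "C4540463", "C0016658"},
            {"C0029456", "C0016658"},
        ],
        "drug_cuis": [
            {"C4042342"},
            {"C0070093", "C4042342"},
            {"C0102118", "C4042342"},
        ],
    },
    index=["NCT01674621", "27826127", "28160873"],
)

# CUIs present in every record of the group.
shared_disease = set.intersection(*records["disease_cuis"])
shared_drug = set.intersection(*records["drug_cuis"])
print(shared_disease, shared_drug)
```

A non-empty intersection in both columns means every record in the subset shares at least one disease CUI and one drug CUI.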