# Determine the active ingredients of RXCUIs associated with National Drug Codes

2019-04-22

Now that we have managed to convert NDCs into RXCUIs, we have an entry point into the RxNorm semantic network.
We will use the relationships in the semantic network to determine the active ingredients of each NDC.

## Version 1

For the first version of the algorithm, we assume that:
1. Every drug has exactly one active ingredient.
2. The active ingredient of each RXCUI can be reached through the following pattern:

`(RXCUI of NDC) -> [has_ingredient] -> (intermediate node) -> [has_precise_ingredient] -> (RXCUI of active ingredient)`
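The pattern above is a fixed sequence of relationship labels. Following such a sequence can be sketched generically. This is a minimal sketch, not the notebook's implementation. It assumes the network is stored as nested dictionaries (node → relationship → list of neighbours), which is the same shape as the adjacency list built later in this notebook. `follow_path`, `Graph` and `toy_graph` are hypothetical names, and the node ids are invented.

```python
from typing import Dict, List

# Hypothetical graph type: node id -> relationship name -> neighbour ids
Graph = Dict[int, Dict[str, List[int]]]

# Invented toy ids, not real RXCUIs
toy_graph: Graph = {
    1: {"has_ingredient": [10]},
    10: {"has_precise_ingredient": [100]},
}


def follow_path(graph: Graph, start: int, relations: List[str]) -> List[int]:
    """Return every node reachable from `start` by following `relations` in order."""
    frontier = [start]
    for rela in relations:
        next_frontier = []
        for node in frontier:
            # .get avoids inserting empty entries if `graph` is a defaultdict
            next_frontier.extend(graph.get(node, {}).get(rela, []))
        frontier = next_frontier
    return frontier


print(follow_path(toy_graph, 1, ["has_ingredient", "has_precise_ingredient"]))  # [100]
```

Because the sketch returns a list, the single-ingredient assumption becomes a check that the list has length 1. A longer path only needs extra labels.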


### Rationale

I used the drug Razadyne as a starting point for exploring the semantic network.
After examining all the edges around its nodes, I found that these two particular edges lead to the active ingredient.
This pattern held for all six Razadyne drugs.

In [1]:
import pandas as pd
from collections import defaultdict

## Read semantic network and convert to adjacency list

In [2]:
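def get_adj_list(rels_fname):
    """
    Build an adjacency list from the RxNorm relationship table.

    Each row (rxcui1, rela, rxcui2) becomes a directed edge
    rxcui2 -[rela]-> rxcui1, stored as adj_list[rxcui2][rela] -> [rxcui1, ...].
    """
    rel_table = pd.read_csv(rels_fname, sep='\t')

    adj_list = defaultdict(lambda: defaultdict(list))
    for row in rel_table.itertuples():
        adj_list[row.rxcui2][row.rela].append(row.rxcui1)

    return adj_list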
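To see which way the edges point, the same construction can be run on a tiny in-memory table. This is a sketch under one assumption: the relationship file has the columns `rxcui1`, `rela` and `rxcui2` that `get_adj_list` reads. The two rows are derived from the `adj_list[602734]` output shown below, and `rels` and `adj` are illustrative names.

```python
from collections import defaultdict

import pandas as pd

# Two rows consistent with the adj_list[602734] output in this notebook,
# using the column names get_adj_list reads
rels = pd.DataFrame(
    {
        "rxcui1": [583099, 602732],
        "rela": ["has_ingredient", "consists_of"],
        "rxcui2": [602734, 602734],
    }
)

adj = defaultdict(lambda: defaultdict(list))
for row in rels.itertuples():
    # The edge points from rxcui2 to rxcui1 and is labelled with rela
    adj[row.rxcui2][row.rela].append(row.rxcui1)

print(dict(adj[602734]))
# {'has_ingredient': [583099], 'consists_of': [602732]}
```

Because `adj` is a `defaultdict`, even reading `adj[some_missing_id]` silently inserts an empty entry. `adj.get(node, {})` looks a node up without growing the dictionary.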

In [3]:
fname = "../../pipeline/rxnorm/rxcui_rels.tsv"

adj_list = get_adj_list(fname)

In [4]:
len(adj_list)

196300

In [5]:
adj_list[602734]

defaultdict(list,
            {'has_dose_form': [317541],
             'consists_of': [330343, 602732],
             'tradename_of': [579148],
             'has_ingredient': [583099],
             'isa': [602733, 1178299, 1178300]})
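The output above lists the outgoing relationships of a single node. When exploring the network as described in the Rationale, it can also help to see which relationship labels occur at all, and how often. Below is a minimal sketch on a hypothetical toy adjacency list with the same nested shape. `count_relationships` and `toy_adj` are illustrative names. On the real data one would pass `adj_list` instead.

```python
from collections import Counter, defaultdict

# Hypothetical toy adjacency list: node -> relationship name -> neighbour ids
toy_adj = defaultdict(lambda: defaultdict(list))
toy_adj[1]["has_ingredient"].append(10)
toy_adj[1]["isa"].extend([2, 3])
toy_adj[10]["has_precise_ingredient"].append(100)


def count_relationships(adj):
    """Count how many edges of each relationship type leave any node."""
    counts = Counter()
    for edges in adj.values():
        for rela, targets in edges.items():
            counts[rela] += len(targets)
    return counts


print(count_relationships(toy_adj).most_common())
# [('isa', 2), ('has_ingredient', 1), ('has_precise_ingredient', 1)]
```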

---

## Determine active ingredients

In [6]:
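# Sentinel values returned when the two-hop pattern does not apply.
# They are negative, so they cannot collide with real RXCUIs.
NO_INGREDIENT_EDGE = -90000          # start node has no has_ingredient edge
MULTIPLE_INGREDIENTS = -7000         # start node has more than one has_ingredient edge
NO_PRECISE_INGREDIENT_EDGE = -500    # intermediate node has no has_precise_ingredient edge
MULTIPLE_PRECISE_INGREDIENTS = -40   # intermediate node has several has_precise_ingredient edges


def get_ingredient(start_node):
    """
    Return the RXCUI of the single active ingredient reached by following
    has_ingredient and then has_precise_ingredient from `start_node`,
    or one of the negative sentinel codes above if the pattern does not apply.
    """
    # .get avoids inserting empty entries into the defaultdict for unknown nodes
    edges = adj_list.get(start_node, {})

    # not the expected structure: no has_ingredient edge
    if "has_ingredient" not in edges:
        return NO_INGREDIENT_EDGE

    # multiple ingredients
    if len(edges["has_ingredient"]) > 1:
        return MULTIPLE_INGREDIENTS

    neighbour = edges["has_ingredient"][0]
    neighbour_edges = adj_list.get(neighbour, {})

    if "has_precise_ingredient" not in neighbour_edges:
        return NO_PRECISE_INGREDIENT_EDGE

    if len(neighbour_edges["has_precise_ingredient"]) > 1:
        return MULTIPLE_PRECISE_INGREDIENTS

    return neighbour_edges["has_precise_ingredient"][0]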
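`get_ingredient` signals failures with negative sentinels (-90000, -7000, -500, -40), which end up in the same column as real RXCUIs. An alternative design, sketched below, returns an `(ingredient, reason)` pair and takes the graph as a parameter instead of reading the global `adj_list`. This is a hypothetical sketch, not the notebook's method. `single_ingredient`, `Graph` and `toy_graph` are illustrative names, and the toy ids are invented to exercise each branch.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical graph type: node id -> relationship name -> neighbour ids
Graph = Dict[int, Dict[str, List[int]]]


def single_ingredient(graph: Graph, start: int) -> Tuple[Optional[int], str]:
    """Follow has_ingredient, then has_precise_ingredient, expecting one edge each.

    Returns (ingredient, "ok") on success, or (None, reason) on failure.
    """
    ingredients = graph.get(start, {}).get("has_ingredient", [])
    if not ingredients:
        return None, "no has_ingredient edge"
    if len(ingredients) > 1:
        return None, "multiple ingredients"

    precise = graph.get(ingredients[0], {}).get("has_precise_ingredient", [])
    if not precise:
        return None, "no has_precise_ingredient edge"
    if len(precise) > 1:
        return None, "multiple precise ingredients"

    return precise[0], "ok"


# Invented toy graph exercising every branch
toy_graph: Graph = {
    1: {"has_ingredient": [10]},
    10: {"has_precise_ingredient": [100]},
    2: {"has_ingredient": [10, 11]},
    3: {"has_ingredient": [12]},
    5: {"has_ingredient": [13]},
    13: {"has_precise_ingredient": [101, 102]},
}

for node in (1, 2, 3, 4, 5):
    print(node, single_ingredient(toy_graph, node))
```

With the real data, this could be used via `drugs["rxcui"].map(lambda c: single_ingredient(adj_list, c))`, since `.get` also works on nested `defaultdict`s.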

---

## Read all NDCs with RXCUIs

In [7]:
data = pd.read_csv("../../pipeline/merged_ndc_info.tsv", sep='\t')

In [8]:
data.shape

(265692, 22)

In [9]:
data.head(2)

| | rxcui | rxaui | NDCPACKAGECODE | suppress | PRODUCTID | PRODUCTNDC | PACKAGEDESCRIPTION | PRODUCTTYPENAME | PROPRIETARYNAME | NONPROPRIETARYNAME | ... | MARKETINGCATEGORYNAME | APPLICATIONNUMBER | LABELERNAME | SUBSTANCENAME | ACTIVE_NUMERATOR_STRENGTH | ACTIVE_INGRED_UNIT | PHARM_CLASSES | DEASCHEDULE | NDC_EXCLUDE_FLAG | LISTING_RECORD_CERTIFIED_THROUGH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 91349 | 3507080 | 12745-202-01 | N | 12745-202_7d063901-255c-bffc-e053-2a91aa0a91ee | 12745-202 | 59 mL in 1 BOTTLE, PLASTIC (12745-202-01) | HUMAN OTC DRUG | HYDROGEN PEROXIDE | HYDROGEN PEROXIDE | ... | OTC MONOGRAPH NOT FINAL | part333A | Medical Chemical Corporation | HYDROGEN PEROXIDE | 8.57 | g/100mL | | | N | 20191231.0 |
| 1 | 91349 | 3507080 | 12745-202-02 | N | 12745-202_7d063901-255c-bffc-e053-2a91aa0a91ee | 12745-202 | 118 mL in 1 BOTTLE, PLASTIC (12745-202-02) | HUMAN OTC DRUG | HYDROGEN PEROXIDE | HYDROGEN PEROXIDE | ... | OTC MONOGRAPH NOT FINAL | part333A | Medical Chemical Corporation | HYDROGEN PEROXIDE | 8.57 | g/100mL | | | N | 20191231.0 |


## Filter out relevant data

We only need the starting RXCUI of each NDC, so we keep the unique RXCUIs.

In [10]:
drugs = (data
    [["rxcui"]]
    .drop_duplicates()
    .sort_values("rxcui")
    .reset_index(drop=True)
)
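Several package-level NDCs can share one RXCUI. The two rows of `data.head(2)` above, for example, both have rxcui 91349. This is why deduplicating shrinks 265,692 rows to 41,576 unique RXCUIs. A small sketch on a toy frame shows the same chain collapsing duplicates. The third package code is an invented placeholder, and `packages` and `unique_rxcuis` are illustrative names.

```python
import pandas as pd

# Toy package-level rows: the first two mirror data.head(2);
# the third NDC code is an invented placeholder
packages = pd.DataFrame(
    {
        "NDCPACKAGECODE": ["12745-202-01", "12745-202-02", "00000-000-00"],
        "rxcui": [91349, 91349, 91792],
    }
)

# Same method chain as above: keep one row per RXCUI, sorted, with a fresh index
unique_rxcuis = (packages
    [["rxcui"]]
    .drop_duplicates()
    .sort_values("rxcui")
    .reset_index(drop=True)
)

print(unique_rxcuis)
```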

In [11]:
drugs.shape

(41576, 1)

In [12]:
drugs.head()

| | rxcui |
|---|---|
| 0 | 91349 |
| 1 | 91792 |
| 2 | 92582 |
| 3 | 92583 |
| 4 | 92584 |


We will analyze the algorithm's performance in a separate notebook.
Here we only map each starting RXCUI to its active ingredient.

## Use the semantic network to determine the active ingredients

In [13]:
ingredients = drugs.assign(
    active_ingredients = lambda df: df["rxcui"].map(get_ingredient)
)

In [14]:
ingredients.shape

(41576, 2)

In [15]:
ingredients.head()

| | rxcui | active_ingredients |
|---|---|---|
| 0 | 91349 | -90000 |
| 1 | 91792 | -500 |
| 2 | 92582 | 30145 |
| 3 | 92583 | 30145 |
| 4 | 92584 | 30145 |
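The head above mixes resolved ingredient RXCUIs (30145) with sentinel codes from `get_ingredient`. Ahead of the fuller analysis planned for another notebook, a quick outcome tally can be sketched by mapping each sentinel to a readable label and counting. `SENTINEL_LABELS` and `toy` are hypothetical names. The first five toy rows copy the `ingredients.head()` output, and the last row is invented. On the real data one would use `ingredients["active_ingredients"]` instead.

```python
import pandas as pd

# Sentinel values returned by get_ingredient, mapped to readable labels
SENTINEL_LABELS = {
    -90000: "no has_ingredient edge",
    -7000: "multiple ingredients",
    -500: "no has_precise_ingredient edge",
    -40: "multiple precise ingredients",
}

# Toy results shaped like `ingredients`; the last row is invented
toy = pd.DataFrame(
    {
        "rxcui": [91349, 91792, 92582, 92583, 92584, 99999],
        "active_ingredients": [-90000, -500, 30145, 30145, 30145, -7000],
    }
)

# Values that are not sentinels map to NaN and are counted as resolved
outcome = toy["active_ingredients"].map(SENTINEL_LABELS).fillna("resolved")
print(outcome.value_counts())
```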


## Save results to file

In [16]:
ingredients.to_csv("../../pipeline/ingredients/ndc_active_ingredients_version_1.tsv", sep='\t', index=False)