# Infer-Retrieve-Rerank Llama Pack

<a href="https://colab.research.google.com/github/run-llama/llama-hub/blob/main/llama_hub/llama_packs/research/infer_retrieve_rerank/infer_retrieve_rerank.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This is our implementation of the paper ["In-Context Learning for Extreme Multi-Label Classification"](https://arxiv.org/pdf/2401.12178.pdf) by D'Oosterlinck et al.

The paper proposes "infer-retrieve-rerank", a simple paradigm that uses frozen LLM and retriever models to perform "extreme" multi-label classification, where the label space is very large.
1. Given a user query, use an LLM to predict an initial set of labels.
2. For each prediction, retrieve the actual label from the corpus.
3. Given the final set of labels, rerank them using an LLM.
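
The three steps above can be sketched in plain Python. This is a toy illustration only: `toy_infer`, `toy_retrieve`, and `toy_rerank` are hypothetical stand-ins (keyword lookup and word overlap instead of an LLM and an embedding retriever), and the label list is made up.

```python
from typing import Callable, List

LABELS = ["abdominal pain", "nausea", "pyrexia", "headache"]  # toy label space


def toy_infer(query: str) -> List[str]:
    """Stand-in for the LLM: emit free-text guesses triggered by keywords."""
    guesses = {"stomach": "abdominal ache", "headache": "severe headache"}
    return [guess for key, guess in guesses.items() if key in query.lower()]


def toy_retrieve(pred: str) -> List[str]:
    """Stand-in for the retriever: labels sharing at least one word with the guess."""
    pred_words = set(pred.split())
    return [label for label in LABELS if pred_words & set(label.split())]


def toy_rerank(query: str, labels: List[str]) -> List[str]:
    """Stand-in for the LLM reranker: prefer labels whose words occur in the query."""
    text = query.lower()
    return sorted(labels, key=lambda l: sum(w in text for w in l.split()), reverse=True)


def infer_retrieve_rerank_sketch(
    query: str,
    infer: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    top_n: int = 3,
) -> List[str]:
    preds = infer(query)  # 1. infer free-form predictions
    candidates: List[str] = []
    for pred in preds:  # 2. ground each prediction in the real label space
        for label in retrieve(pred):
            if label not in candidates:
                candidates.append(label)
    return rerank(query, candidates)[:top_n]  # 3. rerank and truncate


result = infer_retrieve_rerank_sketch(
    "Patient reported stomach ache and a severe headache.",
    toy_infer,
    toy_retrieve,
    toy_rerank,
    top_n=2,
)
print(result)
```

The point of the middle step is that the free-text guess "abdominal ache" is not itself a valid label, but retrieval maps it onto "abdominal pain", which is.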

All of these steps can be implemented with LlamaIndex abstractions. In this notebook we show how to run "infer-retrieve-rerank" as a LlamaPack, and also how to build it from scratch.

## Try out a Dataset

We use the BioDEX dataset as mentioned in the paper.

Here is the [link to the paper](https://arxiv.org/pdf/2305.13395.pdf). Here is the [link to the GitHub repo](https://github.com/KarelDO/BioDEX).

In [11]:
import os
import logging
from dotenv import load_dotenv
import datasets
from llama_index import get_tokenizer
import re
from typing import Set, List

In [12]:
logging.basicConfig(level=logging.INFO)
log = logging.getLogger(__name__)

load_dotenv()

True

In [13]:
openai_api_key = os.getenv("OPENAI_API_KEY")
langchain_api_url = os.getenv("LANGCHAIN_ENDPOINT")
langchain_api_key = os.getenv("LANGCHAIN_API_KEY")

In [3]:
# load the report-extraction dataset
dataset = datasets.load_dataset("BioDEX/BioDEX-ICSR")

In [4]:
dataset

DatasetDict({
    train: Dataset({
        features: ['title', 'abstract', 'fulltext', 'target', 'pmid', 'fulltext_license', 'title_normalized', 'issue', 'pages', 'journal', 'authors', 'pubdate', 'doi', 'affiliations', 'medline_ta', 'nlm_unique_id', 'issn_linking', 'country', 'mesh_terms', 'publication_types', 'chemical_list', 'keywords', 'references', 'delete', 'pmc', 'other_id', 'safetyreportid', 'fulltext_processed'],
        num_rows: 9624
    })
    validation: Dataset({
        features: ['title', 'abstract', 'fulltext', 'target', 'pmid', 'fulltext_license', 'title_normalized', 'issue', 'pages', 'journal', 'authors', 'pubdate', 'doi', 'affiliations', 'medline_ta', 'nlm_unique_id', 'issn_linking', 'country', 'mesh_terms', 'publication_types', 'chemical_list', 'keywords', 'references', 'delete', 'pmc', 'other_id', 'safetyreportid', 'fulltext_processed'],
        num_rows: 2407
    })
    test: Dataset({
        features: ['title', 'abstract', 'fulltext', 'target', 'pmid', 'fulltext

### Define Dataset Processing Functions

Here we define some basic functions to get the set of reactions (labels) and samples from the BioDEX dataset.

In [5]:
tokenizer = get_tokenizer()


sample_size = 5


def get_reactions_row(raw_target: str) -> List[str]:
    """Get reactions from a single row."""
    reaction_pattern = re.compile(r"reactions:\s*(.*)")
    reaction_match = reaction_pattern.search(raw_target)
    if reaction_match:
        reactions = reaction_match.group(1).split(",")
        reactions = [r.strip().lower() for r in reactions]
    else:
        reactions = []
    return reactions


def get_reactions_set(dataset) -> Set[str]:
    """Get set of all reactions."""
    reactions = set()
    for data in dataset["train"]:
        reactions.update(set(get_reactions_row(data["target"])))
    return reactions


def get_samples(dataset, sample_size: int = 5):
    """Get processed samples.

    Each sample contains the source text and the list of reaction labels
    parsed from the raw target string.

    """
    samples = []
    for idx, data in enumerate(dataset["train"]):
        if idx >= sample_size:
            break
        text = data["fulltext_processed"]
        raw_target = data["target"]

        reactions = get_reactions_row(raw_target)

        samples.append({"text": text, "reactions": reactions})
    return samples
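
To see what the reaction parsing does, here is a small self-contained check on a made-up target string. The string layout is an assumption modeled on the regex above (a `reactions:` field followed by comma-separated terms); real BioDEX targets may contain other fields. `parse_reactions` is a hypothetical name that repeats the logic of `get_reactions_row` so the snippet runs on its own.

```python
import re
from typing import List


def parse_reactions(raw_target: str) -> List[str]:
    """Same regex and normalization as get_reactions_row, repeated for a standalone demo."""
    match = re.search(r"reactions:\s*(.*)", raw_target)
    if not match:
        return []
    return [r.strip().lower() for r in match.group(1).split(",")]


# Hypothetical target string, not copied from the dataset.
example_target = (
    "serious: 1\ndrugs: EXAMPLE DRUG\nreactions: Pancytopenia, Febrile neutropenia"
)
parsed = parse_reactions(example_target)
missing_field = parse_reactions("serious: 1")
print(parsed)
print(missing_field)
```

Labels are lowercased and stripped, so the label set built by `get_reactions_set` is case-insensitive. A target without a `reactions:` field yields an empty list.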

## Use LlamaPack

In this first section we use our infer-retrieve-rerank LlamaPack to output predicted labels.

In [6]:
# Option: if developing with the llama_hub package
# from llama_hub.llama_packs.research.infer_retrieve_rerank.base import InferRetrieveRerankPack

# Option: download_llama_pack
from llama_index.llama_pack import download_llama_pack

InferRetrieveRerankPack = download_llama_pack(
    "InferRetrieveRerankPack",
    "./irr_pack",
    # leave the below line commented out if using the notebook on main
    # llama_hub_url="https://raw.githubusercontent.com/run-llama/llama-hub/jerry/add_infer_retrieve_rerank/llama_hub"
)

In [7]:
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-0125")
pred_context = """\
The output predictions should be a list of comma-separated adverse \
drug reactions. \
"""
reranker_top_n = 10

pack = InferRetrieveRerankPack(
    get_reactions_set(dataset),
    llm=llm,
    pred_context=pred_context,
    reranker_top_n=reranker_top_n,
    verbose=True,
)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

In [8]:
samples = get_samples(dataset, sample_size=5)
pred_reactions = pack.run(inputs=[s["text"] for s in samples])
gt_reactions = [s["reactions"] for s in samples]



> Generating predictions for input 0: TITLE:
SARS-CoV-2-related ARDS in a maintenance hemodialysis patient: case report on tailored approach by daily hemodialysis, noninvasive ventilation, tocilizumab, anxiolytics, and point-of-care ultrasound.

ABSTRACT:
Without rescue drugs approved, holistic approach by daily hemodialysis, noninvasiv


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/cha

> Generated predictions: ['acute respiratory distress syndrome', 'fluid overload', 'hypertensive crisis', 'anxiety', 'respiratory distress', 'delirium', 'hypereosinophilic syndrome', 'eosinophilia', 'cardiovascular insufficiency', 'hypovolaemia']


> Generating predictions for input 1: TITLE:
Corynebacterium propinquum: A Rare Cause of Prosthetic Valve Endocarditis.

ABSTRACT:
Nondiphtheria Corynebacterium species are often dismissed as culture contaminants, but they have recently become increasingly recognized as pathologic organisms. We present the case of a 48-year-old male pat


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


> Generated predictions: ['tuberculosis', 'anger']


> Generating predictions for input 2: TITLE:
A Case of Pancytopenia with Many Possible Causes: How Do You Tell Which is the Right One?

ABSTRACT:
Systemic lupus erythematosus (SLE) often presents with cytopenia(s); however, pancytopenia is found less commonly, requiring the consideration of possible aetiologies other than the primary di


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


> Generated predictions: ['neutropenia', 'leukopenia', 'thrombocytopenia', 'agranulocytosis', 'granulocytopenia', 'lymphopenia', 'neutropenic infection', 'myelosuppression', 'thrombocytosis', 'myelopathy']


> Generating predictions for input 3: TITLE:
Hepatic Lesions with Secondary Syphilis in an HIV-Infected Patient.

ABSTRACT:
Syphilis among HIV-infected patients continues to be a public health concern, especially in men who have sex with men. The clinical manifestations of syphilis are protean; syphilitic hepatitis is an unusual complic


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

> Generated predictions: ['anger', 'pain', 'chills', 'cold sweat', 'tachycardia', 'atrial tachycardia', 'jarisch-herxheimer reaction', 'meningoencephalitis herpetic', 'vision blurred', 'optic discs blurred']


> Generating predictions for input 4: TITLE:
Managing Toe Walking, a Treatment Side Effect, in a Child With T-Cell Non-Hodgkin's Lymphoma: A Case Report.

ABSTRACT:
Background and Purpose: Children who have survived cancer are at risk of experiencing adverse effects of the cancer or its treatments. One of the adverse effects may be the 


INFO:httpx:HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

> Generated predictions: ['hepatosplenomegaly', 'spleen disorder', 'lymph node pain', 'night sweats', 'weight decreased', 'lymphadenopathy', 'skin disorder', 'lymphadenopathy', 'bone marrow disorder', 'blood disorder']


In [9]:
pred_reactions[2]

['neutropenia',
 'leukopenia',
 'thrombocytopenia',
 'agranulocytosis',
 'granulocytopenia',
 'lymphopenia',
 'neutropenic infection',
 'myelosuppression',
 'thrombocytosis',
 'myelopathy']

In [10]:
gt_reactions[2]

['bone marrow toxicity',
 'cytomegalovirus infection',
 'cytomegalovirus mucocutaneous ulcer',
 'febrile neutropenia',
 'leukoplakia',
 'odynophagia',
 'oropharyngeal candidiasis',
 'pancytopenia',
 'product use issue',
 'red blood cell poikilocytes present',
 'vitamin d deficiency']

## Define Infer-Retrieve-Rerank Pipeline

Here we define the core components needed for the full infer-retrieve-rerank pipeline. 

Refer to the [paper](https://arxiv.org/pdf/2401.12178.pdf) for more details. The paper implements the approach in DSPy; here we adapt it using LlamaIndex abstractions. As a result, the specific implementations (e.g. prompts, output parsing modules, reranking module) differ, even though conceptually we follow similar steps.

Our implementation uses fixed models, and does not do automatic distillation between teacher and student.

In [14]:
from llama_index.retrievers import BaseRetriever
from llama_index.llms.llm import LLM
from llama_index.llms import OpenAI
from llama_index.prompts import PromptTemplate
from llama_index.query_pipeline import QueryPipeline
from llama_index.postprocessor.types import BaseNodePostprocessor
from llama_index.postprocessor.rankGPT_rerank import RankGPTRerank
from llama_index.output_parsers import ChainableOutputParser
from typing import List

#### Index each Reaction with a Vector Index

Since the set of reactions is quite large, we can define a vector index over all reactions. That way we can retrieve the top k most semantically similar reactions to any prediction.

In [16]:
import random

all_reactions = get_reactions_set(dataset)
# random.sample(sorted(all_reactions), 5)  # sample() needs a sequence, not a set

In [17]:
from llama_index.schema import TextNode
from llama_index.embeddings import OpenAIEmbedding
from llama_index.ingestion import IngestionPipeline
from llama_index import VectorStoreIndex

reaction_nodes = [TextNode(text=r) for r in all_reactions]
pipeline = IngestionPipeline(transformations=[OpenAIEmbedding()])
reaction_nodes = await pipeline.arun(documents=reaction_nodes)

index = VectorStoreIndex(reaction_nodes)

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"

In [18]:
reaction_nodes[0].embedding

[-0.01811717078089714,
 -0.03355604037642479,
 0.021529430523514748,
 -0.04465554282069206,
 -0.013970952481031418,
 0.0057557751424610615,
 -0.029538584873080254,
 -0.02438800409436226,
 -0.014550392515957355,
 -0.020808350294828415,
 0.012792756780982018,
 -6.126375228632241e-05,
 -0.01785964146256447,
 0.004323269240558147,
 -0.003962728660553694,
 -0.007597107905894518,
 0.03950496017932892,
 -0.0051924302242696285,
 0.012644677422940731,
 -0.011949349194765091,
 -0.01608269102871418,
 0.01854209415614605,
 -0.004059302154928446,
 -0.00012222571240272373,
 -0.0077065578661859035,
 0.003914441913366318,
 -0.02172257751226425,
 -0.04161670058965683,
 -0.004210600629448891,
 -0.01035910751670599,
 0.01979110948741436,
 -0.01793690025806427,
 -0.02507045492529869,
 0.0036440363619476557,
 0.006702194456011057,
 -0.0015966802602633834,
 0.0014952782075852156,
 0.008884753100574017,
 0.010262534022331238,
 -0.01050074864178896,
 0.00669575622305274,
 0.00518921110779047,
 0.0113248415291

In [19]:
reaction_retriever = index.as_retriever(similarity_top_k=2)

In [20]:
nodes = reaction_retriever.retrieve("abdominal")
print([n.get_content() for n in nodes])

INFO:httpx:HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"


['abdominal pain', 'abdominal symptom']
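
Conceptually, the retriever embeds the query and returns the labels whose embeddings are closest to it. The sketch below shows top-k selection by cosine similarity in plain Python. The three-dimensional vectors are hand-made for illustration and are not real embeddings, and `cosine` and `top_k` are hypothetical helpers rather than LlamaIndex functions.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: List[float], label_vecs: Dict[str, List[float]], k: int = 2) -> List[str]:
    """Return the k labels most similar to the query vector."""
    ranked = sorted(
        label_vecs, key=lambda label: cosine(query_vec, label_vecs[label]), reverse=True
    )
    return ranked[:k]


# Toy vectors: the first axis loosely stands for "abdominal", the last for "head".
toy_vecs = {
    "abdominal pain": [0.9, 0.1, 0.0],
    "abdominal symptom": [0.8, 0.2, 0.1],
    "headache": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.0, 0.0]  # pretend embedding of the query "abdominal"
top = top_k(query_vec, toy_vecs, k=2)
print(top)
```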


#### Define Infer Prompt

We define an infer prompt that, given a document and relevant task context, generates a list of comma-separated predictions.

**NOTE**: This is our own prompt and not taken from the paper.

In [21]:
infer_prompt_str = """\

Your job is to output a list of predictions given context from a given piece of text. The text context,
and information regarding the set of valid predictions is given below. 

Return the predictions as a comma-separated list of strings.

Text Context:
{doc_context}

Prediction Info:
{pred_context}

Predictions: """

infer_prompt = PromptTemplate(infer_prompt_str)

#### Define Output Parser

We define a very simple output parser that can parse an output into a list of strings.

In [22]:
class PredsOutputParser(ChainableOutputParser):
    """Predictions output parser."""

    def parse(self, output: str) -> List[str]:
        """Parse predictions."""
        tokens = output.split(",")
        return [t.strip() for t in tokens]


preds_output_parser = PredsOutputParser()
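
The parsing logic can be checked without calling an LLM. The sketch below repeats the split-and-strip step in a standalone function (`parse_preds` is a hypothetical name used only here) and shows one limitation: when the model answers with a sentence instead of a list, the commas in that sentence still split it into "labels".

```python
from typing import List


def parse_preds(output: str) -> List[str]:
    """Split on commas and strip whitespace, as PredsOutputParser.parse does."""
    return [t.strip() for t in output.split(",")]


clean = parse_preds("fever, dizziness , palpitations")
refusal = parse_preds(
    "There is no information provided. Therefore, no predictions can be made."
)
print(clean)
print(refusal)
```

Because each parsed string is later sent to the retriever, a refusal like the second example would still retrieve some nearest labels. Filtering out such outputs before retrieval is one possible safeguard.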

#### Define Rerank Prompt

Here we define a rerank prompt that will reorder a batch of labels based on their relevance to the query.

In [23]:
rerank_str = """\
Given a piece of text, rank the {num} labels above based on their relevance \
to this piece of text. The labels \
should be listed in descending order using identifiers. \
The most relevant labels should be listed first. \
The output format should be [] > [], e.g., [1] > [2]. \
Only response the ranking results, \
do not say any word or explain. \

Here is a given piece of text: {query}. 

"""
rerank_prompt = PromptTemplate(rerank_str)
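
The prompt asks the LLM to answer with an ordering such as `[2] > [1]`. The sketch below shows one way such a string could be turned back into an ordered label list. It is an illustration under stated assumptions, not the parsing code inside `RankGPTRerank`. `apply_ranking` is a hypothetical helper, and the decision to ignore out-of-range identifiers and append unranked labels at the end is a design choice made for this example.

```python
import re
from typing import List


def apply_ranking(ranking: str, labels: List[str]) -> List[str]:
    """Reorder labels according to a '[2] > [1] > [3]' style ranking string."""
    order: List[int] = []
    for match in re.findall(r"\[(\d+)\]", ranking):
        idx = int(match) - 1  # identifiers in the prompt are 1-based
        if 0 <= idx < len(labels) and idx not in order:
            order.append(idx)
    # Labels the model did not mention keep their original relative order.
    missing = [i for i in range(len(labels)) if i not in order]
    return [labels[i] for i in order + missing]


labels = ["agranulocytosis", "bone marrow toxicity", "haematotoxicity"]
reordered = apply_ranking("[2] > [3] > [1]", labels)
partial = apply_ranking("[3] > [9]", labels)
print(reordered)
print(partial)
```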

#### Define Infer-Retrieve-Rerank Function

We define the infer-retrieve-rerank steps as a function.

In [24]:
def infer_retrieve_rerank(
    query: str,
    retriever: BaseRetriever,
    llm: LLM,
    pred_context: str,
    reranker_top_n: int = 3,
) -> List[str]:
    """Run infer-retrieve-rerank on one document and return the top labels."""
    # Infer: prompt the LLM for an initial, free-form list of predictions.
    infer_prompt_c = infer_prompt.as_query_component(
        partial={"pred_context": pred_context}
    )
    infer_pipeline = QueryPipeline(chain=[infer_prompt_c, llm, preds_output_parser])
    preds = infer_pipeline.run(query)

    print(f"PREDS: {preds}")
    # Retrieve: map each prediction to its nearest labels in the vector index.
    all_nodes = []
    for pred in preds:
        nodes = retriever.retrieve(str(pred))
        all_nodes.extend(nodes)

    # Rerank: ask the LLM to order the candidate labels and keep the top n.
    reranker = RankGPTRerank(
        llm=llm,
        top_n=reranker_top_n,
        rankgpt_rerank_prompt=rerank_prompt,
        # verbose=True,
    )
    reranked_nodes = reranker.postprocess_nodes(all_nodes, query_str=query)
    return [n.get_content() for n in reranked_nodes]
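
Since every prediction retrieves its own nearest labels, two predictions can return the same label, so the candidate list may contain repeats. The pack output earlier in this notebook lists 'lymphadenopathy' twice for input 4. Whether this is the cause there is an assumption. A minimal order-preserving deduplication step could look like the sketch below, where `dedupe_preserve_order` is a hypothetical helper that works on plain strings.

```python
from typing import Iterable, List


def dedupe_preserve_order(labels: Iterable[str]) -> List[str]:
    """Drop repeated labels (case-insensitive) while keeping first occurrences in order."""
    seen = set()
    unique: List[str] = []
    for label in labels:
        key = label.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(label)
    return unique


candidates = ["lymphadenopathy", "night sweats", "lymphadenopathy", "weight decreased"]
unique_candidates = dedupe_preserve_order(candidates)
print(unique_candidates)
```

In the function above, the same idea could be applied to the retrieved nodes, keyed on `n.get_content()`, before they are passed to the reranker. This variation has not been tested here.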

## Run Over Sample Data

Now we're ready to run over some sample data! 

In [25]:
samples = get_samples(dataset, sample_size=5)

In [34]:
reaction_retriever = index.as_retriever(similarity_top_k=2)
llm = OpenAI(model="gpt-3.5-turbo-0125")
pred_context = """\
The output predictions should be a list of comma-separated adverse \
drug reactions. \
"""

reranker_top_n = 10

pred_reactions = []
gt_reactions = []
for idx, sample in enumerate(samples):
    print(idx)
    cur_pred_reactions = infer_retrieve_rerank(
        sample["text"],
        reaction_retriever,
        llm,
        pred_context,
        reranker_top_n=reranker_top_n,
    )
    cur_gt_reactions = sample["reactions"]

    pred_reactions.append(cur_pred_reactions)
    gt_reactions.append(cur_gt_reactions)

0
PREDS: ['fluid overload', 'acute respiratory distress syndrome', 'anxiety', 'myocardial insufficiency', 'hypervolemia', 'hypovolemia', 'respiratory distress', 'allergic reaction', 'diarrhea', 'rash']
1
PREDS: ['fever', 'dizziness', 'dyspnea on exertion', 'intermittent chest pain', 'palpitations']
2
PREDS: ['azathioprine-induced myelotoxicity', 'drug-induced agranulocytosis']
3
PREDS: ['There is no information provided about adverse drug reactions in the given text context. Therefore', 'it is not possible to make any predictions about adverse drug reactions.']
4
PREDS: ['painful swelling in lymph nodes', 'weight loss', 'night sweats', 'hepatosplenomegaly', 'generalized lymphadenopathy', 'skin disorders', 'bone marrow disorders', 'blood disorders', 'misorientation of body segments', 'excessive backward pelvic tilt', 'excessive kyphosis', 's-shaped scoliosis', 'excessive pelvic obliquity', 'flat right foot contact', 'limited ankle dorsiflexion', 'toe walking', 'muscle weakness', 'limite

In [37]:
pred_reactions[2]

['agranulocytosis',
 'haematotoxicity',
 'bone marrow toxicity',
 'infantile genetic agranulocytosis']

In [38]:
gt_reactions[2]

['bone marrow toxicity',
 'cytomegalovirus infection',
 'cytomegalovirus mucocutaneous ulcer',
 'febrile neutropenia',
 'leukoplakia',
 'odynophagia',
 'oropharyngeal candidiasis',
 'pancytopenia',
 'product use issue',
 'red blood cell poikilocytes present',
 'vitamin d deficiency']
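
To put a number on the comparison above, one simple option is exact-match precision and recall over the two label sets. The sketch below uses the predicted and ground-truth lists printed above for sample 2. This is plain set overlap and not necessarily the metric used in the paper, and `precision_recall` is a hypothetical helper.

```python
from typing import List, Tuple


def precision_recall(pred: List[str], gt: List[str]) -> Tuple[float, float, List[str]]:
    """Exact-match precision and recall between predicted and ground-truth labels."""
    pred_set, gt_set = set(pred), set(gt)
    hits = pred_set & gt_set
    precision = len(hits) / len(pred_set) if pred_set else 0.0
    recall = len(hits) / len(gt_set) if gt_set else 0.0
    return precision, recall, sorted(hits)


pred_sample = [
    "agranulocytosis",
    "haematotoxicity",
    "bone marrow toxicity",
    "infantile genetic agranulocytosis",
]
gt_sample = [
    "bone marrow toxicity",
    "cytomegalovirus infection",
    "cytomegalovirus mucocutaneous ulcer",
    "febrile neutropenia",
    "leukoplakia",
    "odynophagia",
    "oropharyngeal candidiasis",
    "pancytopenia",
    "product use issue",
    "red blood cell poikilocytes present",
    "vitamin d deficiency",
]
precision, recall, hits = precision_recall(pred_sample, gt_sample)
print(f"precision={precision:.2f} recall={recall:.2f} hits={hits}")
```

Only 'bone marrow toxicity' matches exactly, so one of the four predictions is correct and one of the eleven ground-truth labels is recovered.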