# Retrieval using the first query from sampled_queries_1k.tsv

In [19]:
import pandas as pd
sampled_queries = pd.read_csv('sampled_queries_1k.tsv', sep='\t', header=0, names=['qid', 'query'], dtype={'qid': str, 'query': str})
test_queries = sampled_queries.head(1)
test_queries

Unnamed: 0,qid,query
0,507646,symptoms of flu a & b in children


In [20]:
collection_df = pd.read_csv("common_dataset_80k.tsv", sep="\t", header=None, names=["pid", "text"], dtype={"pid": str, "text": str})
collection_df.head()

Unnamed: 0,pid,text
0,448,A postal code (also known locally in various E...
1,466,"Therefore, all pathologists must have complete..."
2,646,Obesity is a complex disorder involving an exc...
3,1212,Which president appointed FBI Director James C...
4,1213,"Comey was confirmed by the Senate on July 29, ..."


In [31]:
qrels_dev = pd.read_csv('qrels.dev.tsv', sep='\t', header=None, names=['qid', '_', 'pid', 'label'], dtype={'qid': str, 'pid': str, 'label': int})
query_ground_truth = qrels_dev[qrels_dev['qid'].isin(test_queries['qid'])]
query_ground_truth = query_ground_truth.merge(collection_df, on='pid', how='left')
# To print the full text without truncation in Jupyter, use pandas display options:
pd.set_option('display.max_colwidth', None)
print(query_ground_truth['text'].iloc[0])

A: Symptoms of influenza in children include a high-grade fever, extreme fatigue, a headache, body aches and a dry, hacking cough, according to WebMD. Children with influenza may also experience a sore throat, belly pain, vomiting, and chills and shakes.


In [22]:
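def format_topN_results(topN_results, test_queries, collection_df):
    """Flatten {qid: [(pid, score), ...]} into one DataFrame row per (query, passage) pair."""
    rows = []
    for qid, results in topN_results.items():
        # Look up the query text for this qid.
        query = test_queries.loc[test_queries['qid'] == qid, 'query'].values[0]
        for pid, score in results:
            # Look up the passage text for this pid (a linear scan over the collection).
            passage = collection_df.loc[collection_df['pid'] == pid, 'text'].values[0]
            rows.append({'query': query, 'pid': pid, 'passage': passage, 'score': score})
    return pd.DataFrame(rows)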

## Test retrieval with TF-IDF

In [32]:
import joblib
from tf_idf_utils import retrieve_topN_for_queries

vectorizer = joblib.load('tfidf_vectorizer.joblib')
doc_matrix = joblib.load('tfidf_doc_matrix.joblib')

topN_results = retrieve_topN_for_queries(vectorizer, doc_matrix, collection_df['pid'], test_queries, topN=10)

pd.set_option('display.max_colwidth', None)
retrieval_df = format_topN_results(topN_results, test_queries, collection_df)
retrieval_df.head()


Retrieving: 100%|██████████| 1/1 [00:00<00:00, 37.69it/s]


Unnamed: 0,query,pid,passage,score
0,symptoms of flu a & b in children,822848,"Symptoms of flu. The symptoms of flu usually develop within one to three days of becoming infected. Most people will feel better within a week. However, you may have a lingering cough and still feel very tired for a further couple of weeks. Flu can give you any of the following symptoms: 1 a sudden fever - a temperature of 38C (100.4F) or above. a dry, chesty cough.",0.394323
1,symptoms of flu a & b in children,7071178,"What are the common flu symptoms? A: Common symptoms of the flu include a cough, sore throat, fever of 100 degrees or higher, chills, fatigue and headache, according to FLU.gov. Some people al... Full Answer >",0.345496
2,symptoms of flu a & b in children,7201019,"The main difference between cold and flu is that, generally, symptoms of the flu are usually a lot more severe. Each year, more than 200,000 people are hospitalized because of flu complications; flu is responsible for around 23,600 deaths every year.",0.34074
3,symptoms of flu a & b in children,1580421,Detecting early symptoms of the flu can prevent the spread of the virus and possibly help you treat the illness before it gets worse. Early symptoms can include: fatigue; body aches and chills; cough; sore throat; fever; gastrointestinal problems; There are also early flu symptoms that are unique to children.,0.313504
4,symptoms of flu a & b in children,196474,"All symptoms of the flu are usually gone in 7 to 10 days, except a cough, which might last up to two weeks. If symptoms are extended beyond this time frame, check with your do ... ctor to see if you need an examination.sually a high fever lasts only a few days for a normal bout of the flu without secondary infections, perhaps 3 days at the most, although low grade fevers can accompany flu s ... ymptoms for slightly longer.",0.28917
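
The helper `retrieve_topN_for_queries` comes from the local `tf_idf_utils` module, whose source is not shown in this notebook. As a rough illustration of the general technique only (not that module's actual implementation), the sketch below builds L2-normalized TF-IDF vectors in pure Python and ranks passages by cosine similarity. The whitespace tokenizer, the smoothed IDF variant, and all `toy_*` names are assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def to_vector(tokens, idf):
    """Build an L2-normalized sparse TF-IDF vector; terms without an IDF are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def build_tfidf(docs):
    """Return (idf, doc_vectors) for a list of raw passage strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, one common variant (an assumption, not taken from tf_idf_utils).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return idf, [to_vector(toks, idf) for toks in tokenized]

def top_n(query, idf, doc_vectors, pids, n=10):
    """Score every passage by cosine similarity and keep the n best."""
    q = to_vector(tokenize(query), idf)
    scored = [
        (pid, sum(w * d.get(t, 0.0) for t, w in q.items()))
        for pid, d in zip(pids, doc_vectors)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:n]

# Toy collection, invented for this example.
toy_pids = ["p1", "p2", "p3"]
toy_docs = [
    "flu symptoms in children include fever and cough",
    "postal codes are used to sort mail",
    "obesity is a complex disorder",
]
toy_idf, toy_vectors = build_tfidf(toy_docs)
toy_hits = top_n("symptoms of flu in children", toy_idf, toy_vectors, toy_pids, n=2)
print(toy_hits)
```

Because both query and passage vectors are unit length here, every score falls between 0 and 1, which matches the range of the TF-IDF scores in the table above.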


## Test retrieval with BM25

In [33]:
from whoosh import index
from whoosh.qparser import QueryParser, OrGroup
from whoosh.scoring import BM25F
from tqdm.auto import tqdm
IDX_DIR = "indexes/whoosh"
K1, B = 1.2, 0.75

topN_results = {}

ix = index.open_dir(IDX_DIR)
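# Whoosh's BM25F expects the uppercase keyword arguments B and K1;
# lowercase k1/b would land in **kwargs and not set these parameters.
with ix.searcher(weighting=BM25F(B=B, K1=K1)) as searcher: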
    qp = QueryParser("text", schema=ix.schema, group=OrGroup)
    it = test_queries[["qid","query"]].itertuples(index=False, name=None)

    for qid, query in it:
        q = qp.parse(query)
        results = searcher.search(q, limit=10)
        rows = [(r['pid'], r.score) for r in results]
        topN_results[str(qid)] = rows

pd.set_option('display.max_colwidth', None)
retrieval_df = format_topN_results(topN_results, test_queries, collection_df)
retrieval_df.head()

Unnamed: 0,query,pid,passage,score
0,symptoms of flu a & b in children,1580421,Detecting early symptoms of the flu can prevent the spread of the virus and possibly help you treat the illness before it gets worse. Early symptoms can include: fatigue; body aches and chills; cough; sore throat; fever; gastrointestinal problems; There are also early flu symptoms that are unique to children.,22.446466
1,symptoms of flu a & b in children,7087923,The flu is caused by a virus. Common symptoms of the flu include: Fever and chills; Cough; Sore throat; Runny or stuffy nose; Muscle or body aches; Headache; Feeling very tired; Some people with the flu may throw up or have diarrhea (watery poop) - this is more common in children than adults. It's also important to know that not everyone with the flu will have a fever. The flu is worse than the common cold.,21.515997
2,symptoms of flu a & b in children,7988185,"Yes | No Thank you! Flu shots are not made for children under the age of 6 months. If you read the vaccine insert and studies regarding the flu shot and kids, you will see that flu shots don't even work for children under the age of 2.es | No Thank you! Flu shots are not made for children under the age of 6 months. If you read the vaccine insert and studies regarding the flu shot and kids, you will see that flu shots don't even work for children under the age of 2.",20.992554
3,symptoms of flu a & b in children,7492976,"1 A specific syrup containing elderberry juice (Sambucol, Nature's Way) seems to relieve flu symptoms and reduce the length of time the flu lasts when taken by mouth within 24-48 hours of the first symptoms. 2 Some research also shows that an elderberry lozenge (ViraBLOC, HerbalScience) also reduces symptoms of the flu.Y MOUTH 1 : The flu: one tablespoon (15 mL) 4 times daily of a specific elderberry juice-containing syrup (Sambucol, Nature's Way) daily for 3-5 days. 2 A dose of 15 mL (1 tablespoon) twice daily for 3 days has been used in children.",20.895779
4,symptoms of flu a & b in children,35887,The Flu Is Contagious Most healthy adults may be able to infect other people beginning 1 day before symptoms develop and up to 5 to 7 days after becoming sick. Children may pass the virus for longer than 7 days.,20.811622
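
To make the roles of `K1` and `B` concrete, here is a minimal pure-Python sketch of the textbook Okapi BM25 formula. It is an illustration under stated assumptions, not Whoosh's implementation: Whoosh's `BM25F` may differ in details such as the IDF definition and field handling, and the toy documents and the `bm25_scores` helper are invented for this example.

```python
import math
from collections import Counter

K1, B = 1.2, 0.75  # same values as in the BM25 cell above

def bm25_scores(query_terms, docs_tokens, k1=K1, b=B):
    """Score each tokenized document against the query with textbook BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        # b controls how strongly longer-than-average documents are penalized.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 controls how quickly repeated occurrences of a term saturate.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores

toy_docs = [
    ["flu", "symptoms", "children"],
    ["flu", "symptoms", "children", "and", "many", "other", "unrelated", "words"],
    ["postal", "code"],
]
toy_query = ["flu", "symptoms"]

with_length_norm = bm25_scores(toy_query, toy_docs)
without_length_norm = bm25_scores(toy_query, toy_docs, b=0.0)
print(with_length_norm)
print(without_length_norm)
```

With `b=0.75` the short passage outscores the long one even though both contain each query term once; with `b=0.0` length is ignored and the two scores coincide.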


## Test retrieval with DPR

In [25]:
import numpy as np
embedding_filename = "passage_embeddings_80k.npy"

# Load the precomputed passage embeddings from disk
passage_embeddings = np.load(embedding_filename)
print(passage_embeddings.shape)

(80000, 768)


In [36]:
import faiss
dim = passage_embeddings.shape[1]  # typically 768 for DPR
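# L2-normalize the passage vectors in place. The query vector is not normalized in the
# search cell below, so inner-product scores equal cosine similarity times the query norm.
faiss.normalize_L2(passage_embeddings)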
index = faiss.IndexFlatIP(dim)
index.add(passage_embeddings)

print("Number of vectors in FAISS:", index.ntotal)

Number of vectors in FAISS: 80000


In [37]:
import numpy as np
import torch
import faiss
from tqdm import tqdm
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

question_encoder = DPRQuestionEncoder.from_pretrained("./dpr_question_encoder").to(DEVICE)
question_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("./dpr_question_encoder")

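# Start from an empty dict so BM25 results from the earlier cell are not carried over.
topN_results = {}
qids = test_queries['qid'].tolist()
queries_list = test_queries['query'].tolist()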

for qid, q in tqdm(zip(qids, queries_list), total=len(queries_list), desc="Retrieving"):
    inputs = question_tokenizer(q, return_tensors="pt", padding=True, truncation=True).to(DEVICE)
    with torch.no_grad():
        q_emb = question_encoder(**inputs).pooler_output
    D, I = index.search(q_emb.detach().cpu().numpy(), 10)

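    # FAISS returns row positions; this assumes passage_embeddings rows follow the order of collection_df.
    rows = [(collection_df.iloc[i]['pid'], float(d)) for i, d in zip(I[0], D[0])]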
    topN_results[str(qid)] = rows

pd.set_option('display.max_colwidth', None)
retrieval_df = format_topN_results(topN_results, test_queries, collection_df)
retrieval_df.head()

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
Retrieving: 100%|██████████| 1/1 [00:00<00:00,  2.66it/s]


Unnamed: 0,query,pid,passage,score
0,symptoms of flu a & b in children,7548862,"A: Symptoms of influenza in children include a high-grade fever, extreme fatigue, a headache, body aches and a dry, hacking cough, according to WebMD. Children with influenza may also experience a sore throat, belly pain, vomiting, and chills and shakes.",11.732758
1,symptoms of flu a & b in children,7619619,"Below are the symptoms that some individuals may experience in these three stages. Not all individuals will experience these symptoms. Early Stage of HIV. About 40% to 90% of people have flu-like symptoms within 2-4 weeks after HIV infection. Other people do not feel sick at all during this stage, which is also known as acute HIV infection. Early infection is defined as HIV infection in the past six months (recent) and includes acute (very recent) infections. Flu-like symptoms can include ...",10.516281
2,symptoms of flu a & b in children,7828612,"Flu Symptoms. The most common symptoms of the flu are chills, body aches, dizziness, headaches, nausea, lack of energy, flushed face, nausea and vomiting. Other symptoms may include asthma, heart failure, sweating, stuffy nose, loss of appetite and muscle aches. General symptoms last for about two to four days and then subside.",9.83946
3,symptoms of flu a & b in children,7480406,Symptoms of TEF in adult patients may include: 1 Chest pain. 2 Shortness of breath. 3 Labored breathing. Coughing or choking when eating or 1 drinking. Drooling or excess mucus in the mouth. Enlarged 1 abdomen. Swallowed food or liquids are coughed out or suctioned from the airways. Wheezing or bubbly sounds following each breath.,9.776333
4,symptoms of flu a & b in children,109900,"Influenza, commonly known as the flu, is an infectious disease caused by an influenza virus. Symptoms can be mild to severe. The most common symptoms include: a high fever, runny nose, sore throat, muscle pains, headache, coughing, and feeling tired. These symptoms typically begin two days after exposure to the virus and most last less than a week. The cough, however, may last for more than two weeks. In children, there may be nausea and vomiting, but these are not common in adults. Nausea and",9.574122
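
The DPR scores above are larger than 1, which a cosine similarity cannot be. This follows from the cells above: the passage embeddings are L2-normalized before indexing, while the query embedding is passed to `index.search` as produced by the encoder. The pure-Python sketch below, with invented two-dimensional vectors, shows that the resulting inner product is the cosine similarity multiplied by the query norm, so the ranking for a single query is the same as a cosine ranking even though the scores are on a different scale.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_norm(u):
    return math.sqrt(dot(u, u))

def l2_normalize(u):
    n = l2_norm(u)
    return [a / n for a in u]

# Toy vectors, invented for this example.
toy_query = [3.0, 4.0]  # norm 5, deliberately not unit length
toy_passages = [[1.0, 0.0], [0.0, 2.0], [6.0, 8.0]]
unit_passages = [l2_normalize(p) for p in toy_passages]

# Inner product with normalized passages and a raw query,
# mirroring the setup in the cells above.
mixed_scores = [dot(toy_query, p) for p in unit_passages]

# True cosine similarity: both sides normalized.
cosine_scores = [dot(l2_normalize(toy_query), p) for p in unit_passages]

def ranking(scores):
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

print(mixed_scores)
print(cosine_scores)
print(ranking(mixed_scores) == ranking(cosine_scores))
```

If scores comparable across queries are needed, one option is to normalize the query embedding as well before searching; the ranking for each individual query would stay the same.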


## Test re-ranking with Cross Encoder

In [38]:
from sentence_transformers import CrossEncoder

cross_model = CrossEncoder("./cross-encoder-model")

pairs = list(zip(retrieval_df["query"], retrieval_df["passage"]))

cross_scores = cross_model.predict(pairs, show_progress_bar=True)
retrieval_df["cross_score"] = cross_scores
retrieval_df = retrieval_df.sort_values(by=["cross_score"], ascending=[False])
pd.set_option('display.max_colwidth', None)
retrieval_df.head(10)

Batches: 100%|██████████| 1/1 [00:00<00:00,  3.61it/s]


Unnamed: 0,query,pid,passage,score,cross_score
0,symptoms of flu a & b in children,7548862,"A: Symptoms of influenza in children include a high-grade fever, extreme fatigue, a headache, body aches and a dry, hacking cough, according to WebMD. Children with influenza may also experience a sore throat, belly pain, vomiting, and chills and shakes.",11.732758,6.749188
5,symptoms of flu a & b in children,7590755,The list of signs and symptoms mentioned in various sources for Type B Influenza includes the 20 symptoms listed below: 1 Runny nose. 2 Sore throat. 3 Aching muscles. 4 Headache. 5 Cough. 6 Nasal congestion. 7 Malaise. 8 Fever.,9.54969,5.264629
4,symptoms of flu a & b in children,109900,"Influenza, commonly known as the flu, is an infectious disease caused by an influenza virus. Symptoms can be mild to severe. The most common symptoms include: a high fever, runny nose, sore throat, muscle pains, headache, coughing, and feeling tired. These symptoms typically begin two days after exposure to the virus and most last less than a week. The cough, however, may last for more than two weeks. In children, there may be nausea and vomiting, but these are not common in adults. Nausea and",9.574122,5.243699
2,symptoms of flu a & b in children,7828612,"Flu Symptoms. The most common symptoms of the flu are chills, body aches, dizziness, headaches, nausea, lack of energy, flushed face, nausea and vomiting. Other symptoms may include asthma, heart failure, sweating, stuffy nose, loss of appetite and muscle aches. General symptoms last for about two to four days and then subside.",9.83946,3.304995
7,symptoms of flu a & b in children,7480405,"Symptoms of TEF in infants are generally worse while they are feeding and may include: 1 Frothy white bubbles in the mouth. 2 Coughing or choking during feeding. 3 Vomiting. Blue color of the skin, especially during feeding. Trouble 1 breathing. Very round, full stomach.",9.446806,-2.090316
6,symptoms of flu a & b in children,7649123,"Other mild childhood illnesses: EBV infection in young children has also been linked to ear infections, diarrhea, other gastrointestinal symptoms, and cold symptoms in addition to the classic symptoms of IM.",9.508144,-8.698326
3,symptoms of flu a & b in children,7480406,Symptoms of TEF in adult patients may include: 1 Chest pain. 2 Shortness of breath. 3 Labored breathing. Coughing or choking when eating or 1 drinking. Drooling or excess mucus in the mouth. Enlarged 1 abdomen. Swallowed food or liquids are coughed out or suctioned from the airways. Wheezing or bubbly sounds following each breath.,9.776333,-9.161399
9,symptoms of flu a & b in children,7661062,About 1 out of 4 people with poliovirus infection will have flu-like symptoms that may include: 1 Sore throat. 2 Fever. 3 Tiredness. Nausea. 4 Headache. Stomach pain.,9.422058
8,symptoms of flu a & b in children,964216,"Signs and symptoms of depression in teens. 1 Sadness or hopelessness. 2 Irritability, anger, or hostility. 3 Tearfulness or frequent crying. 4 Withdrawal from friends and family. 5 Loss of interest in activities. 6 Poor school performance. 7 Changes in eating and sleeping habits.",9.438098,-10.150231
1,symptoms of flu a & b in children,7619619,"Below are the symptoms that some individuals may experience in these three stages. Not all individuals will experience these symptoms. Early Stage of HIV. About 40% to 90% of people have flu-like symptoms within 2-4 weeks after HIV infection. Other people do not feel sick at all during this stage, which is also known as acute HIV infection. Early infection is defined as HIV infection in the past six months (recent) and includes acute (very recent) infections. Flu-like symptoms can include ...",10.516281,-10.312838
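
A quick way to compare the rankings is the reciprocal rank of the relevant passage. The sketch below is a minimal illustration, not part of the notebook's pipeline. It assumes the qrels entry for this query is pid `7548862`, since that passage's text matches the ground-truth passage printed earlier; the pid lists are copied from the tables above, and the TF-IDF list covers only the five rows that were displayed.

```python
def reciprocal_rank(ranked_pids, relevant_pids):
    """Return 1/rank of the first relevant pid, or 0.0 if none is present."""
    for rank, pid in enumerate(ranked_pids, start=1):
        if pid in relevant_pids:
            return 1.0 / rank
    return 0.0

# Assumption: the relevant passage for qid 507646 is pid 7548862.
relevant = {"7548862"}

# First five TF-IDF results, as displayed by retrieval_df.head().
tfidf_top5 = ["822848", "7071178", "7201019", "1580421", "196474"]

# DPR candidates after cross-encoder re-ranking, in displayed order.
reranked = [
    "7548862", "7590755", "109900", "7828612", "7480405",
    "7649123", "7480406", "7661062", "964216", "7619619",
]

rr_tfidf_top5 = reciprocal_rank(tfidf_top5, relevant)
rr_reranked = reciprocal_rank(reranked, relevant)
print(rr_tfidf_top5, rr_reranked)
```

Averaging this value over many queries gives the mean reciprocal rank (MRR); with a single query it only indicates where the relevant passage landed.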
