In this part we build a large language model (LLM) question-answering bot for medical questions. Training an LLM takes a great deal of time and compute, so we use a technique called retrieval-augmented generation (RAG) instead. RAG has become popular because it avoids training or fine-tuning a model: as the name suggests, we retrieve reference passages relevant to the user's query and pass them to a pretrained LLM, which uses them as context when answering.

We use PubMed articles as the reference text for our model. First we load the articles and parse them into structured documents that include metadata. For this task we use the LangChain library, which provides ready-made integrations for LLMs, vector databases, embedding models and so on, so a complete generative AI pipeline can be assembled in a matter of minutes.

We begin by splitting the text into chunks of tokens and then use the E5 embedding model from Hugging Face to generate text embeddings. A text embedding is a vector representation of a piece of text. Embeddings are widely used in NLP because they let us treat text as numbers, which makes it possible to train models on it and to compute statistics such as the similarity between two passages.
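
To make the idea of comparing texts as vectors concrete, here is a minimal sketch of cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors are made-up toy values, not real E5 outputs, and `cosine_similarity` is a helper written only for this illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (hypothetical values).
depression = [0.9, 0.1, 0.0]
anxiety = [0.8, 0.3, 0.1]
fracture = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(depression, anxiety)
sim_unrelated = cosine_similarity(depression, fracture)

# Vectors pointing in similar directions score higher.
print(sim_related > sim_unrelated)
```

A vector store performs this kind of comparison between the query embedding and every stored chunk embedding, then returns the closest chunks.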

Next we create a Chroma vector store to hold these embedding vectors. For generation we use a Falcon model from Hugging Face. After loading the Falcon model we write the LLM prompt. A prompt is an initial set of instructions that tells the LLM how to process the user's query and in what form to give the answer. Finally we combine everything into a chain that takes a user's query and returns an answer.
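
The overall flow described above (retrieve relevant passages, then place them in a prompt) can be sketched without any libraries. This is only an illustration under simplifying assumptions: `retrieve` ranks passages by shared words instead of embedding similarity, and the passages, `retrieve` and `build_prompt` are all invented for this example.

```python
def retrieve(query, passages, k=2):
    """Rank passages by how many query words they share (a crude stand-in for vector search)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: len(query_words & set(p.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, context_passages):
    """Place the retrieved passages and the question into one prompt string."""
    context = "\n\n".join(context_passages)
    return (
        "Answer the question based only on the following context:\n"
        f"{context}\n"
        f"Question: {question}\n"
    )

# Hypothetical reference passages.
passages = [
    "Anxiety and depression are common mental health problems in students.",
    "Hip fractures are frequent in elderly patients.",
    "Sleep problems often accompany mental health conditions.",
]
question = "Which mental health problems are common?"

top = retrieve(question, passages, k=2)
prompt = build_prompt(question, top)
print(prompt)
```

The prompt string is what the LLM finally receives; the model itself is never retrained, only given different context for each query.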


In [None]:
!pip install langchain chromadb jq tiktoken sentence-transformers biopython accelerate -q


In [None]:
!pip install -q langchain-community


In [None]:
from langchain_community.document_loaders import JSONLoader

# Define the metadata extraction function.
def metadata_func(record: dict, metadata: dict) -> dict:
    # Guard against records that have no publication date.
    pub_date = record.get("pub_date") or {}
    metadata["year"] = pub_date.get("year")
    metadata["month"] = pub_date.get("month")
    metadata["day"] = pub_date.get("day")
    metadata["title"] = record.get("article_title")

    return metadata

# Each element of the top-level JSON array becomes one document;
# the abstract is used as the document text.
loader = JSONLoader(
    file_path='./pubmed.json',
    jq_schema='.[]',
    content_key='article_abstract',
    metadata_func=metadata_func)
data = loader.load()
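
The loader configuration implies that `pubmed.json` is a JSON array of objects with the keys `article_title`, `article_abstract` and `pub_date`. The sketch below shows one hypothetical record of that shape (all field values are invented) and what a metadata function of this kind would extract from it. `extract_metadata` is a standalone helper written for this illustration.

```python
import json

# A hypothetical record; only the key names come from the loader configuration.
sample = json.loads("""
[
  {
    "article_title": "Example title",
    "article_abstract": "Example abstract text.",
    "pub_date": {"year": "2023", "month": "05", "day": "17"}
  }
]
""")

def extract_metadata(record: dict) -> dict:
    """Pull the date parts and the title out of one record."""
    pub_date = record.get("pub_date") or {}
    return {
        "year": pub_date.get("year"),
        "month": pub_date.get("month"),
        "day": pub_date.get("day"),
        "title": record.get("article_title"),
    }

meta = extract_metadata(sample[0])
print(meta)
```

Keeping the title and date as metadata means each retrieved chunk can later be traced back to the article it came from.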

In [None]:
from langchain.text_splitter import TokenTextSplitter

# 128-token chunks; consecutive chunks share 50 tokens.
text_splitter = TokenTextSplitter(chunk_size=128, chunk_overlap=50)
chunks = text_splitter.split_documents(data)
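
To show how `chunk_size` and `chunk_overlap` interact, here is a minimal sliding-window sketch. It is not the library's implementation: `split_with_overlap` is a helper written for this example, and plain integers stand in for token ids.

```python
def split_with_overlap(tokens, chunk_size=128, chunk_overlap=50):
    """Split a token list into windows of chunk_size that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end
        start += step
    return chunks

# 300 fake tokens, split with the same settings as above.
tokens = list(range(300))
toy_chunks = split_with_overlap(tokens)

# First token, last token and length of every chunk.
print([(c[0], c[-1], len(c)) for c in toy_chunks])
```

With these settings the window advances 78 tokens at a time, so the last 50 tokens of one chunk reappear at the start of the next. The overlap reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.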

In [None]:
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

modelPath = "intfloat/e5-large-unsupervised"
embeddings = HuggingFaceEmbeddings(
    model_name=modelPath,
    model_kwargs={'device': 'cuda'},
    encode_kwargs={'normalize_embeddings': False})

# Embed the chunks and store them in a Chroma vector store.
db = Chroma.from_documents(chunks, embeddings)


In [None]:
# Optional: log in to the Hugging Face Hub (public models can be accessed without a token).
from huggingface_hub import notebook_login
notebook_login()


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_id = "Rocketknight1/falcon-rw-1b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

# Text-generation pipeline that produces at most 128 new tokens per call.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(
    pipeline=pipe,
    model_kwargs={"temperature": 0.5, "max_length": 512}
)


Device set to use cuda:0


In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
import time

PROMPT_TEMPLATE = """You are a medical chatbot. Answer the question based only on the following context:
{context}
You are allowed to rephrase the answer based on the context.
Question: {question}
"""
PROMPT = PromptTemplate.from_template(PROMPT_TEMPLATE)


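qa_chain = RetrievalQA.from_chain_type(
    llm,
    # Set the number of retrieved chunks through search_kwargs; a bare k=2
    # keyword is not picked up by the retriever, which then uses its default.
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True
)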

In [None]:
start_time = time.time()
# Query
query = "What are the most common mental health issues?"
result = qa_chain.invoke({"query": query})
# Execution time
print(f"\n--- {time.time() - start_time} seconds ---")
print(result['result'].strip())


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



--- 5.738328456878662 seconds ---
You are a medical chatbot. Answer the question based only on the following context:
To estimate the one-month prevalence of problematic psychological symptoms among Canadian postsecondary students, and to compare the prevalence by student characteristics.

Major depressive disorder (MDD), anxiety disorders, and somatic symptom disorder (SSD) are associated with quality of life (QoL) reduction. This cross-sectional study investigated the relationship between these conditions as categorical diagnoses and related psychopathologies with QoL, recognizing their frequent overlap.

In a nationwide study, we aimed to study the association of neighborhood deprivation with child and adolescent mental health problems.

Prospective and retrospective measures of childhood maltreatment identify largely different groups of individuals. However, it is unclear if these measures are differentially associated with psychopathology.
You are allowed to rephrase the answer b
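
The printed result above begins with the prompt text rather than with an answer, which suggests that the pipeline returned the prompt followed by the completion. Below is a minimal sketch of post-processing under that assumption. `strip_prompt_echo` and `format_sources` are helper names introduced here, and `FakeDoc` only imitates the `page_content` and `metadata` attributes of a LangChain document so that the example runs on its own.

```python
from dataclasses import dataclass, field

@dataclass
class FakeDoc:
    """Stand-in with the same two attributes as a LangChain document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def strip_prompt_echo(generated: str, question: str) -> str:
    """Return only the text after the echoed 'Question: ...' line, if it is present."""
    marker = f"Question: {question}"
    _, found, tail = generated.partition(marker)
    return tail.strip() if found else generated.strip()

def format_sources(docs) -> list:
    """One 'title (year)' string per retrieved document."""
    return [
        f"{d.metadata.get('title', 'untitled')} ({d.metadata.get('year', 'n.d.')})"
        for d in docs
    ]

# Hypothetical chain output with the same keys as the result dictionary above.
fake_result = {
    "result": "You are a medical chatbot. ...\nQuestion: What is X?\nX is an example.",
    "source_documents": [FakeDoc("...", {"title": "Example title", "year": "2023"})],
}

answer = strip_prompt_echo(fake_result["result"], "What is X?")
sources = format_sources(fake_result["source_documents"])
print(answer)
print(sources)
```

Because the chain was built with `return_source_documents=True`, the real `result["source_documents"]` list could be passed to `format_sources` in the same way to show which articles the answer drew on.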