In this part we build a large language model (LLM) question-answering bot for medical records. Training an LLM from scratch is cumbersome and demands a great deal of time and compute, so we again use a technique called retrieval-augmented generation (RAG). You may be wondering why, since we already did this in the last chapter. You are right, but this time the goal is to show that LLMs are not entirely safe to use, and that there are a few things to keep in mind before building anything with GenAI.

For this task we use a medical records dataset containing patient details and their medical history. We build the pipeline with the langchain library, which makes it easy to integrate LLMs, vector databases, embeddings and so on, so that a whole generative AI pipeline can be put together in a few minutes.

First we load the CSV file and parse it into structured documents, including their metadata. We then use the all-MiniLM-L6-v2 model from the sentence-transformers library to generate the embeddings. A text embedding is a vector representation of a piece of text. Embeddings are widely used in NLP because they let us treat text as numbers, so we can compare texts, compute statistics and train models on them.
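
To make the idea of "text as numbers" concrete, here is a minimal sketch that uses a toy word-count embedding instead of MiniLM. The function names `toy_embed` and `cosine_similarity` and the four-word vocabulary are assumptions made for illustration only; the real model produces learned dense vectors, but the comparison step (cosine similarity between a query vector and document vectors) works the same way.

```python
import math
from collections import Counter


def toy_embed(text, vocabulary):
    """Map text to a vector of word counts over a fixed vocabulary (toy stand-in for MiniLM)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vocabulary and texts, chosen only to illustrate the mechanics.
vocabulary = ["patient", "allergy", "medication", "appointment"]
query_vec = toy_embed("patient allergy", vocabulary)
doc_a = toy_embed("patient allergy medication", vocabulary)
doc_b = toy_embed("appointment", vocabulary)

print(cosine_similarity(query_vec, doc_a))  # shares two words with the query
print(cosine_similarity(query_vec, doc_b))  # shares no words with the query
```

A vector store ranks stored documents by a score of this kind and returns the closest ones for a given query.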

Next we create a ChromaDB vector store to hold these embedding vectors. For the LLM we use a Falcon model from Hugging Face. After loading the Falcon model we create the prompt. A prompt is an initial set of instructions that tells the LLM how to process the user's query, how to analyse it, and in what format to give the answer. Finally we integrate everything into a chain that the user can call with a query to get the desired answer.
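
The way a chain combines retrieved records with the prompt can be sketched in plain Python. This is a simplified illustration, not langchain's own code: `build_prompt` is a hypothetical helper, the two records are invented, and the template text is the one used later in this notebook. It assumes the retrieved records are simply joined with blank lines and pasted into `{context}`.

```python
PROMPT_TEMPLATE = """Answer the question based only on the following context:
{context}
You are allowed to rephrase the answer based on the context.
Question: {question}
"""


def build_prompt(retrieved_texts, question):
    """Join the retrieved records with blank lines and fill in the template."""
    context = "\n\n".join(retrieved_texts)
    return PROMPT_TEMPLATE.format(context=context, question=question)


# Hypothetical records, not taken from the dataset.
records = [
    "patient_id: 101\nname: Jane Doe\ngender: F",
    "patient_id: 102\nname: John Roe\ngender: M",
]
prompt = build_prompt(records, "What is the gender of Jane Doe?")
print(prompt)
```

Everything the retriever returns ends up verbatim in the text sent to the model, which is why the contents of the vector store matter for privacy.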

In the end we can clearly see that there is no safety or privacy setting in this LLM pipeline: it divulges any patient detail or statistic it is asked for. This is a serious breach of privacy and can lead to data leaks.
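
One possible precaution, shown here only as a hedged sketch, is to mask direct identifiers before any record is embedded and stored. The helper `redact_record`, the `SENSITIVE_FIELDS` set and the sample record are assumptions for illustration; which fields count as sensitive is a policy decision, and masking alone does not make the pipeline safe.

```python
# Assumed set of direct identifiers; adjust to the applicable privacy policy.
SENSITIVE_FIELDS = {"name", "date_of_birth"}


def redact_record(text, sensitive_fields=SENSITIVE_FIELDS):
    """Replace the value of each sensitive 'key: value' line with a placeholder."""
    redacted = []
    for line in text.splitlines():
        key, sep, _value = line.partition(": ")
        if sep and key.strip() in sensitive_fields:
            redacted.append(f"{key}: [REDACTED]")
        else:
            redacted.append(line)
    return "\n".join(redacted)


# Hypothetical record in the "key: value" layout used for each CSV row.
record = "patient_id: 101\nname: Jane Doe\ndate_of_birth: 1980-01-01\ngender: F"
safe = redact_record(record)
print(safe)
```

Applied before indexing, this keeps the masked values out of the vector store, so they cannot be copied into a prompt or an answer.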


**Dataset**:

The medical_records.csv file is a dataset containing sensitive patient information, including:

**patient_id**: Unique identifier for each patient.

**name**: Name of the patient.

**date_of_birth**: The patient's date of birth.

**gender**: The patient's gender.

**medical_conditions**: Diagnoses and health issues of the patient.

**medications**: Medications prescribed to the patient.

**allergies**: Known allergies of the patient.

**last_appointment_date**: Date of the patient's most recent appointment.
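
Each CSV row becomes one text document made of `column: value` lines, plus metadata recording where it came from. The sketch below emulates that conversion with the standard library only; `rows_to_documents` is a hypothetical helper, the sample row is invented, and the dictionary layout is a simplified stand-in for langchain's document objects.

```python
import csv
import io


def rows_to_documents(csv_text, source="medical_records.csv"):
    """Turn each CSV row into a 'key: value' text block with source/row metadata."""
    documents = []
    for index, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        page_content = "\n".join(f"{key}: {value}" for key, value in row.items())
        documents.append(
            {"page_content": page_content, "metadata": {"source": source, "row": index}}
        )
    return documents


# Hypothetical sample row, not taken from the dataset.
sample_csv = "patient_id,name,date_of_birth,gender\n101,Jane Doe,1980-01-01,F\n"
docs = rows_to_documents(sample_csv)
print(docs[0]["page_content"])
print(docs[0]["metadata"])
```

Because every column of a row lands in the same text block, retrieving a record for one question exposes all of its fields to the model.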



In [None]:
!pip install langchain chromadb sentence-transformers accelerate -q


In [None]:
!pip install -q langchain-community


In [None]:
from langchain_community.document_loaders import CSVLoader
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain_community.vectorstores import Chroma

# Embedding model used to turn each record into a vector
embedding_function = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

# Load the CSV file; each row becomes one document
loader = CSVLoader("./medical_records.csv", encoding="windows-1252")
documents = loader.load()

# Embed the documents and store the vectors in a Chroma vector store
db = Chroma.from_documents(documents, embedding_function)


In [None]:
from huggingface_hub import notebook_login

# Log in to the Hugging Face Hub (optional for public models)
notebook_login()


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_community.llms import HuggingFacePipeline

model_id = "Rocketknight1/falcon-rw-1b"

# Load the tokenizer and model; device_map='auto' places the model on the GPU if one is available
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map='auto')

# Wrap the model in a text-generation pipeline and expose it to langchain
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=128)
llm = HuggingFacePipeline(
    pipeline=pipe,
    model_kwargs={"temperature": 0.5, "max_length": 512}
)



In [None]:
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# The retrieved records fill {context}; the user's query fills {question}
PROMPT_TEMPLATE = """Answer the question based only on the following context:
{context}
You are allowed to rephrase the answer based on the context.
Question: {question}
"""
PROMPT = PromptTemplate.from_template(PROMPT_TEMPLATE)

# The number of retrieved documents is set through search_kwargs; a bare
# k=... keyword passed to as_retriever does not take effect.
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=db.as_retriever(search_kwargs={"k": 4}),
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=False
)

In [None]:
# Query
query = "What is the gender of Scott Webb?"
result = qa_chain.invoke({"query": query})
# Print the generated text
print(result['result'].strip())


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Answer the question based only on the following context:
patient_id: 53051
name: Mr. Scott Webb
date_of_birth: 1996-09-30
gender: F
medical_conditions: medical, minute, challenge
medications: young, teach, recently
allergies: week, successful, maybe
last_appointment_date: 2021-05-29

patient_id: 82960
name: Scott Webb
date_of_birth: 2007-12-06
gender: F
medical_conditions: know, daughter, address
medications: debate, threat, religious
allergies: government, assume, leave
last_appointment_date: 2022-09-21

patient_id: 1
name: Scott Webb
date_of_birth: 1967-04-28
gender: F
medical_conditions: Mrs, story, security
medications: example, parent, city
allergies: each, product, two
last_appointment_date: 2022-07-26

patient_id: 24899
name: Paul Webb
date_of_birth: 1990-03-02
gender: M
medical_conditions: remain, local, decision
medications: Republican, ago, into
allergies: before, claim, rock
last_appointment_date: 2021-12-24
You are allowed to rephrase the answer based on the context.
Questi