<a href="https://colab.research.google.com/github/tejaswinisamanta/PCOS-Chatbot/blob/main/PCOSchatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#Building BioMistral Medical RAG chatbot using BioMistral Open Source LLM

## Load the google drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install langchain sentence-transformers chromadb llama-cpp-python langchain_community pypdf


Collecting langchain
  Using cached langchain-0.3.0-py3-none-any.whl.metadata (7.1 kB)
Collecting sentence-transformers
  Downloading sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Collecting chromadb
  Downloading chromadb-0.5.7-py3-none-any.whl.metadata (6.8 kB)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.2.90.tar.gz (63.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m63.8/63.8 MB[0m [31m8.6 MB/s[0m eta [36m0:00:00[0m
[?25h  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
  Installing backend dependencies ... [?25l[?25hdone
  Preparing metadata (pyproject.toml) ... [?25l[?25hdone
Collecting langchain_community
  Downloading langchain_community-0.3.0-py3-none-any.whl.metadata (2.8 kB)
Collecting pypdf
  Downloading pypdf-5.0.0-py3-none-any.whl.metadata (7.4 kB)
Collecting langchain-core<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_core-0.3.5-py3-none-

#Importing Libraries

1. Read the PDF
2. Split the test into chunks
3. Convert chunks into embeddings
4. Store the embeddings
5. Load the LLM Model
6. Build end-to-end model

In [None]:
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain.vectorstores import Chroma
from langchain_community.llms import LlamaCpp
from langchain.chains import RetrievalQA, LLMChain


Import the document

In [None]:
loader = PyPDFDirectoryLoader("/content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/PDF")
docs = loader.load()

In [None]:
len(docs)   #no of pages in the document

59

In [None]:
docs[5]

Document(metadata={'source': '/content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/PDF/PCOS.pdf', 'page': 5}, page_content='and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019\n. Lancet. 2020,\n396:1204-22. \n10.1016/S0140-6736(20)30925-9\n2022 Akre et al. Cureus 14(8): e27689. DOI 10.7759/cureus.27689\n6\n of \n6')

Chunking

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=70)
chunks = text_splitter.split_documents(docs)

In [None]:
len(chunks)

1103

In [None]:
chunks[10]

Document(metadata={'source': '/content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/PDF/PCOS.pdf', 'page': 0}, page_content='Our understanding of the pathophysiological process, diagnosis, and therapy of PCOS has advanced\nrecently.\nCategories:\n Obstetrics/Gynecology\nKeywords:\n combined oral contraceptive pills, menstrual irregularity, hyperandrogenism, lifestyle interventions, pcos\nIntroduction And Background')

In [None]:
chunks[600]

Document(metadata={'source': '/content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/PDF/PCOS.pdf', 'page': 32}, page_content='environment during gestation. Ovaries are relatively quiescent until the onset of puberty.Detailed knowledge regarding follicular morphology in prepubertal and early pubertalovaries is lacking. Ovarian tissue obtained from prepubertal and early pubertal girls showsdifferences in follicle morphology')

Embedding Creations

In [None]:
import os
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "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"

In [None]:
embeddings = SentenceTransformerEmbeddings(model_name="NeuML/pubmedbert-base-embeddings")

Vector Store creation

In [None]:
vectorstore = Chroma.from_documents(chunks, embeddings)

In [None]:
query = "How to manage PCOS?"

In [None]:
search_results= vectorstore.similarity_search(query)

In [None]:
retriever = vectorstore.as_retriever(search_kwargs={'k':5})

In [None]:
retriever.get_relevant_documents(query)

[Document(metadata={'page': 0, 'source': '/content/drive/MyDrive/College/Career/Portfolio/CS/Medical Chatbot/PCOS.pdf'}, page_content='management should also include regular follow-up visits and planned transition to adult care providers.\nComprehensive knowledge regarding the pathogenesis of PCOS will enable earlier identification of girlswith high propensity to develop PCOS. Timely implementation of individualized therapeutic in-'),
 Document(metadata={'page': 30, 'source': '/content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/PDF/PCOS.pdf'}, page_content='management should also include regular follow-up visits and planned transition to adult care providers.\nComprehensive knowledge regarding the pathogenesis of PCOS will enable earlier identification of girlswith high propensity to develop PCOS. Timely implementation of individualized therapeutic in-'),
 Document(metadata={'page': 42, 'source': '/content/drive/MyDrive/College/Career/Portfolio/CS/Medical Chatbot/PCOS.pdf'}

LLM Model Loading

In [None]:
llm = LlamaCpp(
    model_path="/content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/BioMistral Model/BioMistral-7B.Q4_K_M.gguf",
    temperature=0.2,
    max_tokens=2048,
    top_p=1
)

llama_model_loader: loaded meta data with 21 key-value pairs and 291 tensors from /content/drive/MyDrive/College/Career/Portfolio/CS/PCOS chatbot/BioMistral Model/BioMistral-7B.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = hub
llama_model_loader: - kv   2:                       llama.context_length u32              = 32768
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_mo

# Use LLM, Retriever and Query, to generate final response

In [None]:
template = """
<|content|>
You are a Medical Assisant, specialised in PCOS diseases in women, that follows the instructions and generate accurate response based on the query and the contet provided.
Please be truthful and give direct answers.
</s>
<|user|>
{query}
</s>
<|assistant|>
"""

In [None]:
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
from langchain.prompts import ChatPromptTemplate

In [None]:
prompt = ChatPromptTemplate.from_template(template)

In [None]:
rag_chain = (
    {"context": retriever, "query": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

In [None]:
response = rag_chain.invoke(query)

Llama.generate: 84 prefix-match hit, remaining 1 prompt tokens to eval

llama_print_timings:        load time =    3560.97 ms
llama_print_timings:      sample time =     102.76 ms /   143 runs   (    0.72 ms per token,  1391.55 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     0 tokens (    -nan ms per token,     -nan tokens per second)
llama_print_timings:        eval time =  108413.35 ms /   143 runs   (  758.14 ms per token,     1.32 tokens per second)
llama_print_timings:       total time =  108684.31 ms /   143 tokens


In [None]:
response

'PCOS is a complex condition that requires a multifaceted approach for management. The first step is to rule out other causes of hyperandrogenemia, such as congenital adrenal hyperplasia or androgen-secreting tumors. Once these have been ruled out, lifestyle modifications, including weight loss, exercise, and dietary changes, can be recommended. Medications such as oral contraceptive pills, metformin, and spironolactone may also be prescribed to manage symptoms and improve fertility outcomes. It is important for patients to work closely with their healthcare provider to develop an individualized management plan based on their specific needs and goals.'

In [83]:
import sys

In [86]:
while True:
  user_input = input("Input query: ")
  if user_input == "exit":
    print("Exiting...")
    sys.exit()
    if user_input=="":
      continue
  result = rag_chain.invoke(user_input)
  print("Answer: ", result)

Llama.generate: 67 prefix-match hit, remaining 17 prompt tokens to eval

llama_print_timings:        load time =    3560.97 ms
llama_print_timings:      sample time =      50.21 ms /    71 runs   (    0.71 ms per token,  1413.95 tokens per second)
llama_print_timings: prompt eval time =  292492.41 ms /    17 tokens (17205.44 ms per token,     0.06 tokens per second)
llama_print_timings:        eval time =   57098.35 ms /    71 runs   (  804.20 ms per token,     1.24 tokens per second)
llama_print_timings:       total time =   69837.34 ms /    88 tokens


Answer:  Polycystic ovary syndrome (PCOS) is a common endocrine disorder in women of reproductive age, affecting approximately 6% to 10% of this population. It is characterized by irregular menstrual cycles, hyperandrogenism, and polycystic ovaries on ultrasound .


Llama.generate: 67 prefix-match hit, remaining 18 prompt tokens to eval

llama_print_timings:        load time =    3560.97 ms
llama_print_timings:      sample time =      97.61 ms /   139 runs   (    0.70 ms per token,  1424.01 tokens per second)
llama_print_timings: prompt eval time =   10415.08 ms /    18 tokens (  578.62 ms per token,     1.73 tokens per second)
llama_print_timings:        eval time =  106281.21 ms /   138 runs   (  770.15 ms per token,     1.30 tokens per second)
llama_print_timings:       total time =  116943.31 ms /   156 tokens


Answer:  PCOS is a common endocrine disorder in women of reproductive age, affecting approximately 10% of women worldwide. The management of PCOS typically involves lifestyle modifications, including weight loss and exercise, as well as pharmacological treatments such as oral contraceptives, insulin sensitizers, and anti-androgenic agents. In some cases, more invasive treatments such as laparoscopic ovarian drilling or in vitro fertilization may be necessary. It is important for individuals with PCOS to work closely with a healthcare provider to develop an individualized management plan that takes into account their specific symptoms and goals.
Input query: exit
Exiting...


SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
