<a href="https://colab.research.google.com/github/varunpothu/Smart_HealthCare_Chatbot/blob/main/QA_Model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Installing Required Libraries

In [59]:
%pip install pandas numpy scikit-learn sentence-transformers streamlit pyngrok




# Mounting Google Drive

In [18]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


# Importing the Libraries

In [57]:
import os
import pickle
import shutil
import warnings
from threading import Thread

import numpy as np
import pandas as pd
from pyngrok import ngrok
from sentence_transformers import SentenceTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

from google.colab import files

warnings.filterwarnings("ignore")


# Data Loading

In [20]:
# Load the Dataset
df = pd.read_csv("4.csv")

# Data Preprocessing


In [21]:
# Data Cleaning
# Remove duplicates and handle missing values
df.drop_duplicates(inplace=True)
df.dropna(subset=['question', 'answer'], inplace=True)

In [22]:
# Text Preprocessing
# Lowercase, remove special characters, and trim whitespace
df['question_clean'] = df['question'].str.lower().str.replace(r'[^\w\s]', '', regex=True).str.strip()
df['answer_clean'] = df['answer'].str.lower().str.replace(r'[^\w\s]', '', regex=True).str.strip()
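To see what the two cleaning cells do, here is a small sketch on a hypothetical toy DataFrame (the rows are invented, not taken from `4.csv`). Note that `[^\w\s]` also deletes hyphens and apostrophes, so a term like "x-ray" becomes "xray"; incoming queries need the same regex so both sides match.

```python
import pandas as pd

def clean_text(series):
    # Lowercase, drop anything that is not a word character or whitespace, trim
    return series.str.lower().str.replace(r'[^\w\s]', '', regex=True).str.strip()

# Hypothetical rows standing in for 4.csv
toy = pd.DataFrame({
    'question': ["What is Glaucoma?", "What is Glaucoma?", "Who gets it?", None],
    'answer':   ["An eye disease.",   "An eye disease.",   None,           "Unknown."],
})

toy.drop_duplicates(inplace=True)                       # row 1 repeats row 0
toy.dropna(subset=['question', 'answer'], inplace=True) # rows 2 and 3 are incomplete
toy['question_clean'] = clean_text(toy['question'])
toy['answer_clean'] = clean_text(toy['answer'])
print(toy[['question_clean', 'answer_clean']].to_string(index=False))
```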

# Model Implementation

In [40]:
# Model Loading
model = SentenceTransformer('all-MiniLM-L6-v2')

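# Generate embeddings in batches (faster than encoding one question at a time)
embeddings = model.encode(df['question_clean'].tolist(), batch_size=64, show_progress_bar=True)
df['question_embedding'] = [e.tolist() for e in embeddings]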

In [41]:
# Save the cleaned dataset with embeddings
output_dataset_path = 'cleaned_dataset_with_embeddings.pkl'
with open(output_dataset_path, 'wb') as f:
    pickle.dump(df, f)
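The Streamlit app later restores this file with `pickle.load`. A minimal round-trip sketch on a hypothetical two-row frame shows that the list-valued embedding column survives serialization unchanged. Only unpickle files you created or trust, since loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical frame with list-valued embeddings, shaped like df['question_embedding']
frame = pd.DataFrame({
    'question': ["what is fever", "how is glaucoma treated"],
    'question_embedding': [[0.6, 0.8], [1.0, 0.0]],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cleaned_dataset_with_embeddings.pkl')
    with open(path, 'wb') as f:
        pickle.dump(frame, f)
    # Same call the Streamlit app uses to restore the DataFrame
    with open(path, 'rb') as f:
        restored = pickle.load(f)

print(restored['question_embedding'].tolist())  # [[0.6, 0.8], [1.0, 0.0]]
```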

TF-IDF Similarity (Term Frequency-Inverse Document Frequency)

In [43]:
# Example documents and query
documents = ["What are the symptoms of glaucoma?",
             "How is glaucoma treated?",
             "What causes glaucoma?"]
query = "What are glaucoma symptoms?"

# Generate TF-IDF vectors
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(documents)
query_vector = vectorizer.transform([query])

# Compute cosine similarity
similarities = cosine_similarity(query_vector, tfidf_matrix)

# Format the similarities to two decimal places
formatted_similarities = np.round(similarities, 2)
print(formatted_similarities)


[[0.77 0.11 0.39]]
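`TfidfVectorizer` is used here with its defaults: lowercasing, tokens of two or more word characters, raw term counts, smoothed IDF `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, and L2-normalized rows. The sketch below reimplements those defaults in plain NumPy (the function names are illustrative, not part of scikit-learn) and reproduces the scores printed above. Because every row is unit-length, cosine similarity reduces to a dot product. The first document scores highest because it shares "what", "are", "symptoms" and "glaucoma" with the query; the second shares only "glaucoma", which appears in every document and so carries the minimum IDF of 1.

```python
import math
import re
from collections import Counter

import numpy as np

def tokenize(text):
    # Mirrors TfidfVectorizer defaults: lowercase, tokens of 2+ word characters
    return re.findall(r"(?u)\b\w\w+\b", text.lower())

def fit_idf(documents):
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    vocab = sorted(doc_freq)
    idf = {t: math.log((1 + n) / (1 + doc_freq[t])) + 1 for t in vocab}
    return vocab, idf

def tfidf_vector(text, vocab, idf):
    # Raw counts times IDF, then L2-normalized; out-of-vocabulary terms are ignored
    counts = Counter(tokenize(text))
    vec = np.array([counts[t] * idf[t] for t in vocab], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

documents = ["What are the symptoms of glaucoma?",
             "How is glaucoma treated?",
             "What causes glaucoma?"]
query = "What are glaucoma symptoms?"

vocab, idf = fit_idf(documents)
doc_matrix = np.vstack([tfidf_vector(d, vocab, idf) for d in documents])
query_vec = tfidf_vector(query, vocab, idf)

# Unit-length rows, so the dot product is the cosine similarity
scores = np.round(doc_matrix @ query_vec, 2)
print(scores)  # [0.77 0.11 0.39]
```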


Dot-Product Similarity (Sentence Embeddings)

In [45]:
documents = ["What are the symptoms of glaucoma?",
             "How is glaucoma treated?",
             "What causes glaucoma?"]

query = "What are glaucoma symptoms?"

# Generate embeddings
doc_embeddings = model.encode(documents)
query_embedding = model.encode(query)

# Compute dot product similarity
similarities = np.dot(doc_embeddings, query_embedding)

# Round similarities to two decimal places
formatted_similarities = np.round(similarities, 2)
print(formatted_similarities)


[0.99 0.72 0.78]
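These scores are raw dot products, which equal cosine similarities only when both vectors have unit length. The `all-MiniLM-L6-v2` pipeline as published ends with a normalization layer, which is consistent with the near-1 top score above; with a model that lacks one, pass `normalize_embeddings=True` to `model.encode` or normalize manually. The sketch below checks the identity on random 384-dimensional vectors used as stand-ins for real embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Random stand-ins for raw embeddings (384 dimensions, like all-MiniLM-L6-v2)
a = rng.normal(size=384)
b = rng.normal(size=384)

def l2_normalize(v):
    # Scale a vector to unit Euclidean length
    return v / np.linalg.norm(v)

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
dot_of_unit_vectors = np.dot(l2_normalize(a), l2_normalize(b))

print(np.isclose(cosine, dot_of_unit_vectors))  # True
```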


Metrics Evaluation

---


Retrieval function (dot-product similarity)

In [24]:
import re

# Retrieval Function
def get_answer(query, df, model):
    """
    Retrieve the most relevant answer for the given query.

    Parameters:
    - query (str): User's question
    - df (DataFrame): Preprocessed DataFrame with embeddings
    - model (SentenceTransformer): Pretrained embedding model

    Returns:
    - answer (str): Most relevant answer
    - source (str): Source of the answer
    - focus_area (str): Topic or focus area of the answer
    """
    # Clean the query the same way the stored questions were cleaned
    query_clean = re.sub(r'[^\w\s]', '', query.lower()).strip()

    # Encode the query
    query_embedding = model.encode(query_clean)

    # Compute similarity scores
    df['similarity'] = df['question_embedding'].apply(lambda x: np.dot(x, query_embedding))

    # Get the best match
    best_match = df.loc[df['similarity'].idxmax()]
    return best_match['answer'], best_match['source'], best_match['focus_area']
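`get_answer` scores one row at a time with `apply` and writes a `similarity` column into the shared `df`. A possible alternative, sketched below with a hypothetical `top_k_matches` helper and toy 2-dimensional unit vectors, stacks the embeddings into one matrix, scores every row with a single matrix-vector product, and returns the top k rows as a new DataFrame while leaving the input untouched. In the notebook, `query_embedding` would come from `model.encode`.

```python
import numpy as np
import pandas as pd

def top_k_matches(query_embedding, df, k=3):
    """Hypothetical helper: return the k rows whose embeddings score highest."""
    # (n, d) matrix built from the list-valued embedding column
    matrix = np.array(df['question_embedding'].tolist(), dtype=float)
    scores = matrix @ np.asarray(query_embedding, dtype=float)
    order = np.argsort(-scores)[:k]          # indices of the k largest scores
    result = df.iloc[order].copy()           # copy, so df itself is not modified
    result['similarity'] = scores[order]
    return result

# Toy data (hypothetical)
toy = pd.DataFrame({
    'question': ["what is fever", "how is glaucoma treated", "what causes glaucoma"],
    'question_embedding': [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],
})
best = top_k_matches([0.8, 0.6], toy, k=2)
print(best[['question', 'similarity']])
```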


Testing the bot

In [48]:
# Example Usage
if __name__ == "__main__":
    print("Healthcare Chatbot Initialized!")

    while True:
        # Take user input
        user_query = input("\nAsk your healthcare question (type 'exit' to quit): ")
        if user_query.lower() == 'exit':
            print("Goodbye!")
            break

        # Get answer
        try:
            answer, source, focus_area = get_answer(user_query, df, model)
            print(f"\nAnswer: {answer}\nSource: {source}\nFocus Area: {focus_area}")
        except Exception as e:
            print(f"\nSorry, something went wrong: {e}")

Healthcare Chatbot Initialized!

Ask your healthcare question (type 'exit' to quit): fever

Answer: A fever is a body temperature that is higher than normal. It is not an illness. It is part of your body's defense against infection. Most bacteria and viruses that cause infections do well at the body's normal temperature (98.6 F). A slight fever can make it harder for them to survive. Fever also activates your body's immune system.    Infections cause most fevers. There can be many other causes, including       -  Medicines    -  Heat exhaustion    -  Cancers    -  Autoimmune diseases       Treatment depends on the cause of your fever. Your health care provider may recommend using over-the-counter medicines such as acetaminophen or ibuprofen to lower a very high fever. Adults can also take aspirin, but children with fevers should not take aspirin. It is also important to drink enough liquids to prevent dehydration.
Source: MPlusHealthTopics
Focus Area: Fever

Ask your healthcare questio

Saving the model

In [46]:
# Save the Model
model_save_path = 'sentence_transformer_model'
model.save(model_save_path)


Compress the model

In [49]:
# Compress the model folder into a zip file
shutil.make_archive("sentence_transformer_model", 'zip', "sentence_transformer_model")


'/content/sentence_transformer_model.zip'

Download the Zip file

In [50]:
# Download the zip file
files.download("sentence_transformer_model.zip")


Temporarily uploading files to Colab

In [51]:
# Upload the .pkl dataset and the model .zip file
uploaded = files.upload()


Saving cleaned_dataset_with_embeddings.pkl to cleaned_dataset_with_embeddings (1).pkl
Saving sentence_transformer_model.zip to sentence_transformer_model (1).zip
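`SentenceTransformer` loads a model from a directory, so the uploaded archive has to be extracted before use. Colab also renamed the upload to `sentence_transformer_model (1).zip` because a file with the original name already existed. The minimal sketch below uses a dummy folder with a placeholder `config.json` inside a temporary directory to show that `shutil.unpack_archive` restores the layout `shutil.make_archive` produced; in the notebook you would pass the actual uploaded filename and a target such as `sentence_transformer_model`.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Dummy folder standing in for sentence_transformer_model/
    src = os.path.join(tmp, 'sentence_transformer_model')
    os.makedirs(src)
    with open(os.path.join(src, 'config.json'), 'w') as f:
        f.write('{}')

    # Archive the folder contents, as the compression cell does
    archive = shutil.make_archive(os.path.join(tmp, 'upload'), 'zip', src)

    # Extract into a fresh directory that SentenceTransformer(...) could load from
    restored_dir = os.path.join(tmp, 'restored_model')
    shutil.unpack_archive(archive, restored_dir)
    extracted = sorted(os.listdir(restored_dir))

print(extracted)  # ['config.json']
```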


Accessing the files directly from the Drive

In [53]:
model_save_path = '/content/drive/MyDrive/sentence_transformer_model'
dataset_path = '/content/drive/MyDrive/cleaned_dataset_with_embeddings.pkl'

# Streamlit Deployment

### Streamlit Front End Code

In [54]:
with open("streamlit_app.py", "w") as f:
    f.write("""
# Streamlit App for Healthcare Chatbot
import streamlit as st
import pickle
import numpy as np
from sentence_transformers import SentenceTransformer
import re
import pandas as pd

# Set page configuration (must be the first Streamlit command)
st.set_page_config(page_title="Smart Healthcare Chatbot", layout="wide")

# Function to clean and preprocess user query
def preprocess_query(query):
    return re.sub(r'[^\w\s]', '', query.lower()).strip()

# Load the model and dataset
@st.cache_resource
def load_model_and_data():
    model_path = '/content/drive/MyDrive/sentence_transformer_model'
    dataset_path = '/content/drive/MyDrive/cleaned_dataset_with_embeddings.pkl'
    model = SentenceTransformer(model_path)
    with open(dataset_path, 'rb') as f:
        df = pickle.load(f)
    return model, df

model, df = load_model_and_data()

# Sidebar for additional options
st.sidebar.header("Navigation")
st.sidebar.markdown(\"""
- Use the chatbot below to ask healthcare questions.
- Browse answers based on focus areas.
- Contact support for more help.
\""")

st.sidebar.info("**Current Dataset Size:** {} entries".format(len(df)))

# Main application
st.title("🩺 Smart Healthcare Chatbot")
st.markdown(\"""
This chatbot helps answer healthcare-related questions. Enter your query below, and the system will provide the most relevant information.
\""")

# User input
user_query = st.text_input("Ask your healthcare question:", placeholder="Type your question here...")
st.markdown("---")

if user_query:
    # Process query
    query_clean = preprocess_query(user_query)
    query_embedding = model.encode(query_clean)

    # Calculate similarities into a local Series instead of writing a column into
    # the cached DataFrame, which st.cache_resource shares across sessions
    similarities = df['question_embedding'].apply(lambda x: np.dot(query_embedding, x))
    ranked = similarities.sort_values(ascending=False)
    top_match = df.loc[ranked.index[0]]
    top_score = ranked.iloc[0]

    # Display response with highlighted colors
    st.success(f"### {top_match['answer']}")
    st.markdown(f"**Source:** <span style='color:blue;'>{top_match['source']}</span>", unsafe_allow_html=True)
    st.markdown(f"**Focus Area:** <span style='color:green;'>{top_match['focus_area']}</span>", unsafe_allow_html=True)
    st.markdown(f"### Similarity Score: <span style='color:orange;'>{top_score:.2f}</span>", unsafe_allow_html=True)

    # Suggest the next three closest questions (the top match is already shown above)
    st.markdown("---")
    st.subheader("Other Relevant Questions")
    for idx in ranked.index[1:4]:
        st.write(f"- **{df.loc[idx, 'question']}**")

# Footer
st.markdown("---")
st.info("💡 **Tip:** Answers are retrieved from a fixed Q&A dataset; the chatbot does not learn from your questions. For detailed inquiries, consult a healthcare professional.")
""")


Authenticating ngrok

In [55]:

# Authenticate ngrok (replace the placeholder with your own token; never publish a real one)
!ngrok authtoken YOUR_NGROK_AUTHTOKEN


Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml


Launching the Streamlit app behind an ngrok tunnel

In [60]:
# Define a function to run the Streamlit app
def run_streamlit():
    os.system("streamlit run streamlit_app.py --server.address 0.0.0.0 --server.port 8501")

# Start a thread to run the Streamlit app
thread = Thread(target=run_streamlit)
thread.start()

# Open a tunnel to the Streamlit port (8501) using Ngrok
public_url = ngrok.connect(addr="8501", proto="http", bind_tls=True)
print('Your Streamlit app is live at:', public_url)


Your Streamlit app is live at: NgrokTunnel: "https://bf2d-34-44-25-211.ngrok-free.app" -> "http://localhost:8501"
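`ngrok.connect` runs immediately after `thread.start()`, so the tunnel may open before Streamlit is listening, and requests through it fail until the server is up. One way to avoid that, sketched here with a hypothetical `wait_for_port` helper built only on the standard library, is to poll the port until a TCP connection succeeds.

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Hypothetical helper: True once (host, port) accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Succeeds as soon as something is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: server not ready yet, retry after a pause
            time.sleep(interval)
    return False
```

In the launch cell this would sit between `thread.start()` and `ngrok.connect(...)`, for example by opening the tunnel only when `wait_for_port("localhost", 8501)` returns `True`.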
