# Medi-Guide
## A LangChain Framework-Based NLP Chatbot Prototype Using the OpenAI API and RAG on the KEGG MEDICUS Database

Medi-Guide uses `Retrieval Augmented Generation` (RAG) on the [KEGG MEDICUS Database](https://www.genome.jp/kegg/medicus.html#:~:text=KEGG%20MEDICUS%20is%20a%20health,in%20Japan%20and%20the%20USA.) to give users accurate and scientifically valid information on medicines and drug interactions, as well as in-depth information about diseases and the human genome.

The KEGG MEDICUS Database contains information relevant to the following industries:

1. Pharmaceutical Industry: 
    The database includes information on various drugs and their efficacy in treating specific conditions 
    such as rheumatoid arthritis and cancer. Pharmaceutical companies can use this information for research 
    and development of new drugs or improving existing ones.

2. Biotechnology Industry: 
    The database contains information on genes, variations, and signaling pathways related to diseases such 
    as hepatocellular carcinoma. Biotech companies can utilize this information for developing targeted 
    therapies or diagnostic tools.

3. Healthcare Industry: 
    The database includes information on drugs used for antihypertensive and vasodilator purposes. 
    Healthcare providers can use this information to better understand the efficacy and potential 
    side effects of these drugs for patient treatment.

4. Research Institutions: 
    The database provides valuable information on various drugs, their mechanisms of action, and their 
    potential applications. Research institutions can use this information for conducting further studies 
    and advancing scientific knowledge in the field of medicine.

This is not an exhaustive list, and other industries or sectors may also find
value in the information contained in the database depending on their specific needs and interests.

A chat agent generates responses to prompts through a process called `chain-of-thought reasoning`. At each step, the agent reasons about the prompt and selects the most appropriate of its connected tools before generating an output.

The agent has three tools, namely:
- The Kegg Medicus Vector Database, Hosted via Pinecone.io
- Web Search via DuckDuckGo Search
- Agent Memory Summarization Tool

These tools provide the context for the conversational responses.
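
The tool-selection step can be pictured with a small, framework-free sketch. It assumes the model emits `Action:` and `Action Input:` lines, or a `Final Answer:` line, which is the format LangChain's zero-shot agent prompt asks for. The `parse_step` and `run_step` helpers and the stand-in tools are hypothetical and exist only for illustration.

```python
import re

# Hypothetical stand-ins for the three real tools; each maps a name to a callable.
TOOLS = {
    "Medicus Text Base": lambda q: f"[vector store answer for: {q}]",
    "Web Search": lambda q: f"[web results for: {q}]",
    "Summary": lambda q: f"[conversation summary for: {q}]",
}

# Matches an "Action: <tool>" line followed by an "Action Input: <text>" line.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_step(llm_text):
    """Return ("final", answer) or ("tool", name, tool_input) for one model turn."""
    if "Final Answer:" in llm_text:
        return ("final", llm_text.split("Final Answer:", 1)[1].strip())
    match = ACTION_RE.search(llm_text)
    if match is None:
        raise ValueError("model output contained neither an action nor a final answer")
    return ("tool", match.group(1).strip(), match.group(2).strip())


def run_step(llm_text):
    """Execute the tool named in the model output, or pass the final answer through."""
    step = parse_step(llm_text)
    if step[0] == "final":
        return step[1]
    _, name, tool_input = step
    if name not in TOOLS:
        return f"Unknown tool: {name}"
    return TOOLS[name](tool_input)


print(run_step(
    "Thought: I need drug data.\n"
    "Action: Medicus Text Base\n"
    "Action Input: warfarin interactions"
))
```

In the real agent, the string returned by the tool is fed back to the model as an observation, and the loop repeats until a final answer appears.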

The agent is customized via a system prompt that serves as a guardrail against discussing anything besides medical topics or giving advice that could be harmful to users. 
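
As an illustration of what such a guardrail could look like, the sketch below assembles a system-prompt prefix from explicit rules. The rule wording, `GUARDRAIL_RULES`, and `build_guarded_prefix` are assumptions made for this example, not the prompt used in the notebook.

```python
# Hypothetical guardrail rules; the wording is illustrative only.
GUARDRAIL_RULES = [
    "Only discuss medical and health-related topics.",
    "Politely decline any request outside those topics.",
    "Never give advice that could be harmful to the user.",
]


def build_guarded_prefix(role_line, rules):
    """Join a role description and numbered guardrail rules into one prompt prefix."""
    numbered = "\n".join(f"{i}. {rule}" for i, rule in enumerate(rules, start=1))
    return (
        f"{role_line}\n"
        f"Follow these rules at all times:\n{numbered}\n"
        "You have access to the following tools:"
    )


guarded_prefix = build_guarded_prefix(
    "Be conversational and act as an expert medical advisor.", GUARDRAIL_RULES
)
print(guarded_prefix)
```

A prefix built this way could be passed wherever the agent's prompt prefix is defined.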

## Outline
 * Dependencies
 * Chat Model
 * Document Loader
 * Text Splitter
 * Data Storage
 * Output Generation / Completion
 * Helper Functions

### Dependencies

In [1]:
#!pip install openai langchain tiktoken faiss-cpu python-dotenv pinecone-client duckduckgo-search

### Environment Variables

In [1]:
import openai
import os
from dotenv import load_dotenv 


load_dotenv()

True
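
Because every later cell depends on these variables, a quick presence check can save a confusing failure further down. This is a minimal sketch, assuming the three variable names used in this notebook (`OPENAI_API_KEY`, `PINECONE_API_KEY`, `PINECONE_ENV`); `missing_env_vars` is a hypothetical helper. It only detects unset or empty values and cannot tell whether a key is actually valid.

```python
import os

# Variable names taken from the os.getenv() calls in this notebook.
REQUIRED_VARS = ("OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV")


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```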

### Chat Model

In [2]:
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(openai_api_key=os.getenv("OPENAI_API_KEY"), 
                 temperature=0.0, 
                 model_name='gpt-4-1106-preview')

### Tokenizer

In [3]:
import tiktoken 
tiktoken.encoding_for_model('gpt-4-1106-preview')

<Encoding 'cl100k_base'>

In [4]:
tokenizer = tiktoken.get_encoding('cl100k_base')

def tiktoken_len(text):
    tokens = tokenizer.encode(
        text,
        disallowed_special=()
    )
    return len(tokens)
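
A length function like `tiktoken_len` is typically handed to a text splitter so that chunk sizes are measured in tokens rather than characters. The sketch below is a simplified, hypothetical greedy splitter (`chunk_text`), not LangChain's splitter. To stay dependency-free it uses a word counter as a stand-in; `tiktoken_len` could be passed as `length_fn` instead.

```python
def chunk_text(text, max_len, length_fn):
    """Greedily pack whitespace-separated words into chunks.

    A chunk is closed as soon as adding the next word would push
    length_fn(chunk) above max_len. A single word that is itself longer
    than max_len is emitted as its own (oversized) chunk.
    """
    chunks, current = [], []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and length_fn(candidate) > max_len:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def word_count(text):
    """Stand-in length function: counts whitespace-separated words."""
    return len(text.split())


print(chunk_text("one two three four five", max_len=2, length_fn=word_count))
# → ['one two', 'three four', 'five']
```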

### Embeddings

In [5]:
from langchain.embeddings.openai import OpenAIEmbeddings

model_name = 'text-embedding-ada-002'

embed = OpenAIEmbeddings(
    model=model_name,
    openai_api_key=os.getenv("OPENAI_API_KEY")  # read the key loaded by load_dotenv()
)

In [6]:
example_texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed.embed_documents(example_texts)
len(res), len(res[0])

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-63OKd***************************************etkP. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

### Vector Database Configuration

In [9]:
index_name = "kegg-medicus-database-index"

In [10]:
import pinecone
from tqdm.autonotebook import tqdm

# Connect to Pinecone with the credentials loaded from the environment
pinecone.init(
    api_key=os.getenv("PINECONE_API_KEY"),
    environment=os.getenv("PINECONE_ENV")
)
index = pinecone.Index(index_name)




In [12]:
from langchain.vectorstores import Pinecone

text_field = "text"

# switch back to normal index for langchain
index = pinecone.Index(index_name)

vectorstore = Pinecone(
    index=index, 
    embedding=embed,  # pass the Embeddings object itself
    text_key=text_field
)

In [13]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.19462,
 'namespaces': {'': {'vector_count': 19462}},
 'total_vector_count': 19462}
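
Conceptually, the retriever embeds the query and returns the stored vectors that lie closest to it. The toy sketch below illustrates that idea with cosine similarity on 3-dimensional vectors instead of the 1536 dimensions reported above. The similarity metric configured for the real index is not shown in this notebook, so cosine is an assumption, and `cosine_similarity`, `top_k`, and the sample vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, stored, k=2):
    """Return the k (id, score) pairs most similar to query_vec, best first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for stored document embeddings.
stored = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}

for doc_id, score in top_k([1.0, 0.1, 0.0], stored, k=2):
    print(doc_id, round(score, 3))
```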

### Q & A Chain

In [16]:
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever()
)
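
The `stuff` chain type places all retrieved documents into a single prompt and sends that prompt to the model in one call. A rough sketch of the idea follows; `build_stuff_prompt` is a hypothetical helper, and the prompt wording is an approximation rather than LangChain's exact template.

```python
def build_stuff_prompt(question, documents):
    """Concatenate every retrieved document into one context block ahead of the question."""
    context = "\n\n".join(documents)
    return (
        "Use the following pieces of context to answer the question at the end.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Helpful Answer:"
    )


# Placeholder strings stand in for chunks returned by the retriever.
retrieved = ["<retrieved chunk 1>", "<retrieved chunk 2>"]
print(build_stuff_prompt("What is Flavin adenine dinucleotide?", retrieved))
```

Because everything is sent at once, this approach only works while the combined chunks fit in the model's context window.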

In [17]:
query = "What is Flavin adenine dinucleotide?"
qa.run(query)

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-63OKd***************************************etkP. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

### Duck Search Function

In [18]:
from langchain.tools import DuckDuckGoSearchRun
from langchain.utilities import DuckDuckGoSearchAPIWrapper

wrapper = DuckDuckGoSearchAPIWrapper(max_results=10)
search = DuckDuckGoSearchRun(api_wrapper=wrapper, backend='text')

def duck_wrapper(input_text):
    """Run a DuckDuckGo search; return the results or a fallback message on failure."""
    try:
        return search.run(input_text)
    except Exception as er:
        print(er)
        return "There was an error fetching results for that query. Please try again"

In [19]:
query = "foods for better focus"
duck_wrapper(query)

'1. Fatty fish When people talk about brain foods, fatty fish is often at the top of the list. This type of fish includes salmon, trout, albacore tuna, herring, and sardines, all of which are rich... Tips to improve concentration November 20, 2023 Mindfulness, cognitive training, and a healthy lifestyle may help sharpen your focus. You\'re trying to concentrate, but your mind is wandering or you\'re easily distracted. What happened to the laser-sharp focus you once enjoyed? Yes, when it comes to staying mentally sharp and focused, eating plenty of brain foods matters, especially for our gray matter. The gut and the brain are tightly connected, and when we focus on giving our bodies whole, nutritious foods, we help take care of both. Which foods are best for your brain? While eating a nutrient-dense diet can contribute to brain health, there are certain foods — like leafy greens, eggs, fatty fish, and blueberries — that are particularly good for your brain. No single brain food can ensu

### Agent Memory

In [20]:
from langchain.memory import ConversationBufferMemory, ReadOnlySharedMemory
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

mem_template = """This is a conversation between a human and a bot:

{chat_history}

Write a summary of the conversation for {input}:
"""

mem_prompt = PromptTemplate(input_variables=["input", "chat_history"], template=mem_template)
memory = ConversationBufferMemory(memory_key="chat_history")
readonlymemory = ReadOnlySharedMemory(memory=memory)
summary_chain = LLMChain(
    llm=llm,
    prompt=mem_prompt,
    verbose=True,
    memory=readonlymemory,  # use the read-only memory to prevent the tool from modifying the memory
)
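
To make the data flow concrete, the sketch below shows roughly what the summary prompt looks like once the conversation buffer is filled in. It assumes the default `Human:` / `AI:` line prefixes of `ConversationBufferMemory`; `format_history`, the sample turn, and the reader name are hypothetical stand-ins, and no LangChain objects are involved.

```python
MEM_TEMPLATE = """This is a conversation between a human and a bot:

{chat_history}

Write a summary of the conversation for {input}:
"""


def format_history(turns):
    """Render (human, ai) message pairs as alternating 'Human:' / 'AI:' lines."""
    lines = []
    for human, ai in turns:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    return "\n".join(lines)


turns = [("Hello", "Hello! How can I assist you with your medical inquiries today?")]
prompt_text = MEM_TEMPLATE.format(chat_history=format_history(turns), input="a pharmacist")
print(prompt_text)
```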

### Tools

In [21]:
from langchain.agents import Tool

tools = [
    Tool(
        name='Medicus Text Base',
        func=qa.run,
        description=(
            '''use this tool to respond to queries about drugs (medicines), drug interactions
            (contraindications (CI) and precautions (P)), diseases, and the human genome'''
        )
    ), 
    Tool(
        name='Web Search',
        func=duck_wrapper,
        description=(
            '''use this tool to answer more general questions about health and wellness'''
        )
    ),
    Tool(
        name="Summary",
        func=summary_chain.run,
        description="useful for when you summarize a conversation. The input to this tool should be a string, representing who will read this summary.",
    )
]

tool_names = [tool.name for tool in tools]

### Prompt Template

In [22]:
from langchain.agents import ZeroShotAgent

prefix = """Be conversational and act as a smart, expert medical advisor. Answer questions as best as you can.
You have access to the following tools:"""

suffix = """Begin!

{chat_history}
Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"],
)

### Generating

In [23]:
from langchain.agents import AgentExecutor

llm_chain = LLMChain(llm=llm, prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools, verbose=True)

agent_chain = AgentExecutor.from_agent_and_tools(
    agent=agent, tools=tools, verbose=True, memory=memory
)

In [24]:
import langchain
langchain.debug = False

In [25]:
query = "Hello" 
agent_chain(query)



> Entering new AgentExecutor chain...
Thought: The user has greeted me, and I should respond in kind.
Final Answer: Hello! How can I assist you with your medical inquiries today?

> Finished chain.


{'input': 'Hello',
 'chat_history': '',
 'output': 'Hello! How can I assist you with your medical inquiries today?'}

In [26]:
query2 = "I have flu, can ginger cure me?" 
agent_chain(query2)



> Entering new AgentExecutor chain...


AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: sk-63OKd***************************************etkP. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}

In [None]:
query3 = "What properties in Ginger give it its anti-inflammatory and antioxidant properties?"
agent_chain(query3)

In [None]:
query4 = "What other plants or medicines have similar benefits?"
agent_chain(query4)

In [None]:
query5 = "You have access to the database, which is provided to you as a retrieval tool."
agent_chain(query5)

In [None]:
def bot_response(query):
    """Run the agent on a query and return only its text output."""
    return agent_chain(query)['output']

In [None]:
bot_response(query)