# Human Leaflets
## LLM Application for reasonable medicine usage

This notebook uses a large language model (LLM) to build a medicine leaflet analysis tool that answers questions about a specific medicine, based on its leaflet.

#### Basic Usage Scenario

* The user enters the name of the medicine they are interested in.
* The app finds the matching leaflet on the Internet.
* The user asks a question such as "How many times a day can I take it?" or "What are the adverse drug reactions?"
* The LLM answers based on the information given in the leaflet.
* The model's response is converted to speech (in Polish).


#### Technology

The notebook uses the LangChain Python library together with the GPT-4 model, as well as the text-to-speech model available through LangChain's ElevenLabs integration.


### Step 1: Environment setup and Library Import

In [None]:
# Install the required libraries:
!pip install langchain tiktoken openai pypdf chromadb wikipedia docx2txt unstructured youtube-transcript-api pytube
# Required by GoogleSearchAPIWrapper:
!pip install google-api-python-client
!pip install elevenlabs

In [None]:
# Import all necessary python packages
import sys
from pathlib import Path
from IPython.display import IFrame
from langchain.chains import (
    ConversationalRetrievalChain,
    LLMChain,
    RetrievalQA,
    SequentialChain,
    SimpleSequentialChain,
)
from langchain.chains.qa_with_sources import load_qa_with_sources_chain
from langchain.chains.question_answering import load_qa_chain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import (
    Docx2txtLoader,
    PyPDFLoader,
    TextLoader,
    UnstructuredURLLoader,
    WikipediaLoader,
    YoutubeLoader,
)
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.utilities import GoogleSearchAPIWrapper
import json
import os
import time
import urllib.request
from langchain.tools import ElevenLabsText2SpeechTool

Insert your API keys here:

In [None]:
# OpenAI API key (used by the GPT models)
os.environ["OPENAI_API_KEY"] = ''
# API key for the ElevenLabs text to speech tool
os.environ["ELEVEN_API_KEY"] = ''
# Google API keys
os.environ["GOOGLE_CSE_ID"] = ''
os.environ["GOOGLE_API_KEY"] = ''
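An empty key only causes an error later, when the first API call is made. As a minimal sketch, assuming the four variable names set above, the keys could be checked right after they are set. The helper name `find_missing_keys` is hypothetical and not part of any library:

```python
import os

# Names of the environment variables set in the cell above.
REQUIRED_KEYS = ["OPENAI_API_KEY", "ELEVEN_API_KEY", "GOOGLE_CSE_ID", "GOOGLE_API_KEY"]

def find_missing_keys(env, required=REQUIRED_KEYS):
    """Return the required names that are absent or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]

missing = find_missing_keys(os.environ)
if missing:
    print("Missing API keys:", ", ".join(missing))
```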

### Step 2: Create the Large Language Model

In [None]:
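# Define the LLM:
# You can choose gpt-4 or gpt-3.5-turbo
base_model = ChatOpenAI(model_name='gpt-4', temperature=0.3)
# Create the question-answering chain.
# "map_reduce" queries each document chunk separately, then combines the partial answers:
chain = load_qa_chain(llm=base_model, chain_type="map_reduce")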

In [None]:
# Utility functions:
def get_colored_text(text, color, bold=False):
    """Wrap text in ANSI escape codes for the given color (and optional bold)."""
    color_mapping = {
        'blue': '\033[34m',
        'red': '\033[31m',
        'yellow': '\033[33m',
        'green': '\033[32m',
        'purple': '\033[95m'
    }
    # Unknown colors fall back to no color code
    color_code = color_mapping.get(color.lower(), '')
    reset_code = '\033[0m'
    colored_text = f'{color_code}{text}{reset_code}'
    if bold:
        # The trailing reset code in colored_text also ends the bold style
        return f'\033[1m{colored_text}'
    return colored_text

def print_qa_message(question, answer):
    """Print a question/answer pair with highlighted labels."""
    print(f'{get_colored_text("Question:", color="blue", bold=True)} {question.strip()}')
    print(f'{get_colored_text("Answer:", color="blue", bold=True)} {answer.strip()}')

### Step 3: Get the Leaflet from the Internet

A class that retrieves the leaflet from the Internet:

In [None]:
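class Leaflet_Retriever():
    """Finds a medicine leaflet (PDF) via Google Search and caches the downloaded file."""

    def __init__(self):
        # Maps a normalised medicine name to the path of its downloaded leaflet
        self.medicines = {}

    def leaflet_from_google(self, medicine_name):
        # Polish query ("leaflet for the medicine <name>"), restricted to PDF files
        query = f"ulotka dla leku {medicine_name} filetype:pdf"
        search = GoogleSearchAPIWrapper()
        result = search.results(query, 10)
        # Take the first hit; an entry without a 'link' key means nothing usable was found
        leaflet_link = result[0].get('link') if result else None
        if leaflet_link is None:
            raise ValueError(f"No leaflet found for '{medicine_name}'")
        output_name = f"leaflet_{time.time()}.pdf"
        # Some servers reject requests that lack a browser-like User-Agent
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 SE 2.X MetaSr 1.0'}
        req = urllib.request.Request(leaflet_link, headers=headers)
        with urllib.request.urlopen(req) as response, open(output_name, "wb") as binary_file:
            binary_file.write(response.read())
        return output_name

    def __call__(self, medicine_name):
        # Reuse an already downloaded leaflet when the same medicine is requested again
        medicine_id = medicine_name.lower().strip()
        if medicine_id in self.medicines:
            file_name = self.medicines[medicine_id]
        else:
            file_name = self.leaflet_from_google(medicine_name)
            self.medicines[medicine_id] = file_name
        return file_name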

leaflet_retreiever = Leaflet_Retriever()
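The retriever relies on the first search hit. A slightly more defensive selection might look like the sketch below. It assumes each search result is a dictionary with an optional `link` key, as used in `leaflet_from_google`; `pick_pdf_link` and the sample data are hypothetical and not part of LangChain.

```python
from urllib.parse import urlparse

def pick_pdf_link(results):
    """Return the first result link whose URL path ends in .pdf, else None."""
    for item in results:
        link = item.get("link")
        # Compare only the URL path, so query strings such as ?download=1 are ignored
        if link and urlparse(link).path.lower().endswith(".pdf"):
            return link
    return None

# Made-up results for illustration only
sample_results = [
    {"title": "Entry without a link"},
    {"title": "Leaflet page", "link": "https://example.com/leaflet.html"},
    {"title": "Leaflet PDF", "link": "https://example.com/docs/Leaflet.PDF?download=1"},
]
print(pick_pdf_link(sample_results))
```

Returning `None` instead of raising lets the caller decide whether to retry with a different query or report the problem to the user.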

Create the PDF loader and provide the medicine name:

In [None]:
# Find the leaflet with Google Search and load it with LangChain:
file_name = leaflet_retreiever('Apap')
loader = PyPDFLoader(file_name)
pages = loader.load_and_split()
chain = load_qa_chain(llm=base_model, chain_type="map_reduce")

# Alternatively, you can load a local PDF file into LangChain directly:
# loader = PyPDFLoader("Leaflet_Super_Lek.pdf")
# pages = loader.load_and_split()
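A link that ends in `.pdf` can still return an HTML error page, which the PDF loader cannot parse. PDF files begin with the bytes `%PDF-`, so a quick check before loading is possible. The following is a sketch only; `looks_like_pdf` is a hypothetical helper and the two files are created just for the demonstration.

```python
import tempfile
from pathlib import Path

def looks_like_pdf(path):
    """Return True if the file starts with the PDF magic bytes."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"

# Demonstration with two throwaway files
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.pdf"
    good.write_bytes(b"%PDF-1.7\n")
    bad = Path(tmp) / "bad.pdf"
    bad.write_bytes(b"<html>Not found</html>")
    good_ok = looks_like_pdf(good)
    bad_ok = looks_like_pdf(bad)

print(good_ok, bad_ok)  # True False
```

In the notebook, such a check could be applied to `file_name` before it is passed to `PyPDFLoader`.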

### Step 4: Ask the model about a leaflet (English / Polish)

In [None]:
# The user asks questions about the leaflet (in Polish, to match the leaflet language):
example_queries = [
    "Jakie jest zalecane dawkowanie?",  # "What is the recommended dosage?"
    "Z jakimi lekami nie powinienem brać apapu?",  # "Which medicines should I not take Apap with?"
]
for query in example_queries:
    response = chain.run(input_documents=pages, question=query)
    print_qa_message(query, response)

### Step 5: Process Model Response

Numbers written as digits in the Polish model responses can be mispronounced when converted to speech.

Therefore, the LLM is asked to spell the numbers out as words:

(1, 2, 3, 4, 5, ...) -> ("jeden", "dwa", ...)

In [None]:
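# Formatting query (in Polish): "Convert the numbers given in the text into words,
# remember to convert larger numbers correctly (return only that text)"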
kwerenda = f"Przekonwertuj liczby podane w tekście na słowa, pamiętaj by poprawnie przekonwertować większe liczby (zwróć mi tylko ten tekst): {response}"
# Ask LLM to reformat the answer:
chat = ChatOpenAI()
processed_responce = chat([HumanMessage(content=kwerenda)])
print(processed_responce.content)
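For comparison, a rule-based conversion can be sketched in a few lines. The version below is an illustration, not the notebook's method: it only handles standalone integers from 0 to 99, leaves larger numbers and decimals untouched, and always uses the nominative masculine form. The names `number_to_polish` and `spell_small_numbers` are hypothetical.

```python
import re

UNITS = ["zero", "jeden", "dwa", "trzy", "cztery", "pięć", "sześć", "siedem",
         "osiem", "dziewięć", "dziesięć", "jedenaście", "dwanaście", "trzynaście",
         "czternaście", "piętnaście", "szesnaście", "siedemnaście", "osiemnaście",
         "dziewiętnaście"]
TENS = {2: "dwadzieścia", 3: "trzydzieści", 4: "czterdzieści", 5: "pięćdziesiąt",
        6: "sześćdziesiąt", 7: "siedemdziesiąt", 8: "osiemdziesiąt", 9: "dziewięćdziesiąt"}

def number_to_polish(n):
    """Spell an integer from 0 to 99 in Polish (nominative, masculine form)."""
    if not 0 <= n <= 99:
        raise ValueError("only 0..99 is supported in this sketch")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else f"{TENS[tens]} {UNITS[unit]}"

def spell_small_numbers(text):
    """Replace standalone one- or two-digit integers; leave everything else as-is."""
    # Skip digits that belong to a longer number or to a decimal such as 2,5
    pattern = r"(?<![\d.,])\d{1,2}(?![.,]?\d)"
    return re.sub(pattern, lambda m: number_to_polish(int(m.group())), text)

print(spell_small_numbers("Stosować 2 tabletki co 4 godziny (500 mg)."))
```

The output contains "dwa tabletki", whereas correct Polish requires "dwie tabletki". Handling grammatical gender and case with rules is considerably more work, which is one reason for delegating the conversion to the LLM.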

### Step 6: Text To Speech

The model response is converted to speech.
This can be done in different ways; two approaches are presented here:

* ElevenLabs text-to-speech model with Polish support (commercial)
* An open Polish text-to-speech model

#### ElevenLabs Text to Speech:

In [None]:
# Use the model and create the audio response
tts = ElevenLabsText2SpeechTool()
speech_file = tts.run(processed_responce.content)

In [None]:
# Play the audio in the notebook
from IPython.display import Audio
sampling_rate = 22_050
Audio(speech_file, rate=sampling_rate)

#### Alternative Text to Speech Model:

In [None]:
# Library installation
# (uninstall elevenlabs first)
#!pip install librosa
#!pip install TTS --quiet
# Object definitions:
#from TTS.api import TTS
#from uuid import uuid4
#import soundfile as sf
#from pathlib import Path

# Code for the audio generation:

#tts = TTS(model_name="tts_models/pl/mai_female/vits", progress_bar=True, gpu=False)
#def save_to_generated(audio_data, out_dir, sampling_rate,):
#  file_name = str(uuid4()) + ".wav"
#  output_file = out_dir / file_name
#  sf.write(output_file, audio_data, sampling_rate)
#  print(f"File {file_name} saved successfully!")
#sampling_rate = 22_050
#out_path = Path("generated_samples")
#out_path.mkdir(exist_ok=True)

# Save audio:

#audio = tts.tts(text=processed_responce.content)
#save_to_generated(audio, out_path, sampling_rate)
#from IPython.display import Audio
#Audio(audio, rate=sampling_rate)

### Authors:

Nikodem Matuszkiewicz

Aleksander Madajczak