<H1 style="text-align: center;">NLP for One Health</H1>
<h3 style="text-align: center;">From BERT to ChatGPT</h3>

|   |   |   |   |
|---|---|---|---|
| <img src="https://mood-h2020.eu/wp-content/uploads/2020/10/logo_Mood_texte-dessous_CMJN_vecto-300x136.jpg" alt="mood"/> | <img src="https://www.murdoch.edu.au/ResourcePackages/Murdoch2021/assets/dist/images/logo.svg" alt="murdoch" /> | <img src="https://www.umr-tetis.fr/images/logo-header-tetis.png" alt="tetis"/> | <img src="https://www.inrae.fr/themes/custom/inrae_socle/logo.svg" alt="INRAE" /> |

Speakers:

- **Rémy DECOUPES** - Research engineer UMR TETIS / INRAE
- **Maguelonne TEISSEIRE** - Prof. UMR TETIS / INRAE

------------------------
# Chapter 2: ChatGPT
[ChatGPT](https://chat.openai.com) is an AI chatbot developed by OpenAI. It relies on a generative language model: GPT-3.5 in the free version and GPT-4 with the paid subscription (20€/month).

## 2.1 Requirements
Since we will use ChatGPT through the [OpenAI API](https://platform.openai.com/), participants need to create an account there to obtain an API key.

+ Free access: $18 of credit is granted for a limited period and is not renewed.
+ Or add a payment method
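
Once the key is obtained, it is safer not to paste it directly into a shared notebook. A minimal sketch, assuming the key was exported beforehand as an environment variable named `OPENAI_API_KEY`; the helper `load_api_key` is hypothetical and not part of the `openai` library:

```python
import os


def load_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise a clear error.

    Assumption: the key was exported in the shell before starting the notebook.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running the notebook."
        )
    return key
```

The result can then be assigned where a key is expected, for example `openai.api_key = load_api_key()`.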

Then let's begin with `gpt-3.5-turbo`.

In [None]:
# installation
!pip install openai

In [None]:
import openai
openai.api_key = "sk-..."

In [None]:
# list models
models = openai.Model.list()

for model in models.data:
    print(f"{model.id}")


In [None]:
# print GPT-3.5 info (retrieve the model by id: a hard-coded list index is fragile)
print(openai.Model.retrieve("gpt-3.5-turbo"))

In [None]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[
        {"role": "system", "content": "You are an academic researcher"},
        {"role": "user", "content": "Could you explain me what one health concept is ?"},
    ],
    temperature=0,
)

print(response['choices'][0]['message']['content'])



## 2.2 Text generation with ChatGPT

Let's now try the same text generation that we tried with BERT.

In [None]:
my_sentence = "One Health is an approach calling for the collaborative efforts of multiple"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[
        {"role": "user", "content": f"Could you complete this following sentence ? {my_sentence}"},
    ],
    temperature=0,
)

print(response['choices'][0]['message']['content'])

In [None]:
my_question = "question: What does 'one health concept' mean ?"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[
        {"role": "user", "content": f"Could you answer to this following question ? {my_question}"},
    ],
    temperature=0,
)

print(response['choices'][0]['message']['content'])

**ChatGPT can also perform classical NLP tasks such as Named Entity Recognition**

Same example as for BERT: "2 swans found dead in Dordogne"

In [None]:
my_sentence = "2 swans found dead in Dordogne"

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0301",
    messages=[
        {"role": "user", "content": f"Do a Named Entity Recognition for the following sentence: '{my_sentence}'"},
    ],
    temperature=0,
)

print(response['choices'][0]['message']['content'])

## 2.3 Langchain
GPT-3.5 is not the only LLM! LangChain is a Python library that makes it easy to use different APIs (OpenAI, HuggingFace, Cohere, ...).

<img align="right" src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg">

From "A Survey of Large Language Models" - Zhao et al. - 2023 - META


In [None]:
!pip install langchain

### 2.3.1 With ChatGPT

In [None]:
import os
 
os.environ["OPENAI_API_KEY"] = "sk-..."



In [None]:
from langchain.llms import OpenAI
llm = OpenAI(model_name="gpt-3.5-turbo-0301")

llm("What does 'one health concept' mean ?")

### 2.3.2 With HuggingFace API

As with OpenAI ChatGPT, we need a HuggingFace API key. Follow this link for [instructions to get your HuggingFace API key](https://huggingface.co/docs/api-inference/quicktour#get-your-api-token).

In [None]:
import os 

os.environ["HUGGINGFACEHUB_API_TOKEN"] = "hf_..."

**Google Flan-T5-XL**

In [None]:
from langchain import HuggingFaceHub

llm = HuggingFaceHub(repo_id="google/flan-t5-xl")

llm("What does 'one health concept' mean ?")

**BigScience BLOOM**

In [None]:
from langchain import HuggingFaceHub

llm = HuggingFaceHub(repo_id="bigscience/bloom")

llm("What does 'one health concept' mean ?")
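
Since each `llm` object above is called like a function on a prompt string, the same question can be sent to several models to compare their answers side by side. A minimal sketch, assuming every model is available as a callable that takes a string and returns a string; the helper `ask_all` and the dictionary keys are hypothetical:

```python
from typing import Callable, Dict


def ask_all(question: str, llms: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same question to every model and collect the answers by name."""
    answers: Dict[str, str] = {}
    for name, llm in llms.items():
        try:
            answers[name] = llm(question)
        except Exception as err:
            # One failing API should not hide the answers of the other models.
            answers[name] = f"<error: {err}>"
    return answers
```

For instance, with the two HuggingFace models stored in variables named `flan` and `bloom` (names chosen here for illustration), `ask_all("What does 'one health concept' mean ?", {"flan-t5-xl": flan, "bloom": bloom})` returns one answer per model.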

## 2.4 Hallucination
LLMs have four main limitations:

+ **Hallucination**: the generated text could be incorrect
+ Continual learning: updating the model continuously is too complex. For example, ChatGPT (GPT-3.5) was trained on a corpus that stops in September 2021, so any information or event that occurred after that date is missing from its knowledge base.
+ Context size is limited: the amount of text that ChatGPT (GPT-3.5) can take into account is limited to 4,096 tokens.
+ Ethical issues: even though OpenAI provides ethical filters, the generated text can be toxic and contain discriminatory passages (racism, sexism, ...).
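
The context-size limit in the list above can be checked roughly before sending a prompt. A minimal sketch, assuming the common rule of thumb of about four characters per token for English text; this is an approximation and not the tokenizer's real count (OpenAI's `tiktoken` library gives exact counts). The function names and the number of tokens reserved for the answer are illustrative choices:

```python
import math

CHARS_PER_TOKEN = 4  # assumption: rough average for English text


def estimate_tokens(text: str) -> int:
    """Rough token estimate from the character count (not an exact tokenizer count)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_context(prompt: str, context_limit: int, reserved_for_answer: int = 256) -> bool:
    """Return True if the estimated prompt size leaves room for the answer."""
    return estimate_tokens(prompt) + reserved_for_answer <= context_limit
```

The limit is passed explicitly because it depends on the model being used.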

In the context of One Health, let's focus on hallucination.

In [None]:
from langchain.llms import OpenAI
llm = OpenAI(model_name="gpt-3.5-turbo-0301")

llm("Could you provide a bibliography dealing with One health concept in a scientific paper way ?")

In [None]:
print('Sure, here is a list of scientific papers related to the One Health concept:\n\n1. One Health: Interconnectedness of Human, Animal, and Environmental Health. Wolfe ND, Dunavan CP, Diamond J. Pediatr Infect Dis J. 2007 May;26(5):489-491. doi: 10.1097/INF.0b013e3180639e83.\n\n2. One Health: Challenges and Opportunities for a Holistic and Collaborative Approach to Health. Zinsstag J, Schelling E, Waltner-Toews D, Tanner M. Trans R Soc Trop Med Hyg. 2011 Sep;105(9):467-473. doi: 10.1016/j.trstmh.2011.04.010.\n\n3. One Health: From Concept to Action. Rabinowitz PM, Conti LA. Comp Immunol Microbiol Infect Dis. 2013 Nov;36(6):227-232. doi: 10.1016/j.cimid.2013.01.002.\n\n4. One Health: The Intersection of Human, Animal, and Environmental Health. Gibbs EPJ. Int J One Health. 2015;1:1-3. doi: 10.1111/tbed.12393.\n\n5. One Health: The Global Challenge of Epidemic Preparedness. Karesh WB, Dobson A, Lloyd-Smith JO, et al. Lancet Infect Dis. 2012 Feb;12(2):140-141. doi: 10.1016/S1473-3099(11)70304-2.\n\n6. The One Health Concept: 10 Years Old and a Long Road Ahead. Rüegg SR, McMahon BJ, Häsler B, et al. Front Vet Sci. 2018 Jul 5;5:14. doi: 10.3389/fvets.2018.00014.\n\n7. One Health and the Neglected Tropical Diseases: The Interface of Humans, Animals, and the Environment. Hotez PJ, Bottazzi ME, Zhan B, et al. PLoS Negl Trop Dis. 2012 Dec;6(12):e1612. doi: 10.1371/journal.pntd.0001612.\n\n8. One Health: A New Professional Imperative. Kahn LH. Vet Ital. 2009 Oct-Dec;45(4):495-510. doi: 10.1111/tbed.12393.\n\n9. One Health: A Conceptual Framework for Health and Environment Policy. Franco C, Andrade M, da Silva Júnior AJ, et al. Environ Health Prev Med. 2014 Jul;19(4):324-327. doi: 10.1007/s12199-014-0398-3.\n\n10. One Health and Climate Change: An Emerging Public Health Policy Framework. Morse SS, Mazet JAK, Woolhouse M, et al. Trans R Soc Trop Med Hyg. 2012 Aug;106(8):451-457. doi: 10.1016/j.trstmh.2012.02.015.')

**Try to access the DOIs generated by ChatGPT:**

For example, for the last paper: `https://doi.org/` + `10.1016/j.trstmh.2012.02.015` gives <https://doi.org/10.1016/j.trstmh.2012.02.015>, which leads to **DOI not found**.

ChatGPT simply made this reference up.
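
This manual check can be automated for the whole generated list. A minimal sketch using only the standard library: it extracts DOI-like strings with a regular expression, builds the `https://doi.org/` URLs as done by hand above, and flags DOIs that appear more than once (in the output above, items 4 and 8 share the same DOI). The function names and the regular expression are illustrative choices, and `doi_resolves` assumes that doi.org answers HTTP 404 for an unknown DOI; it needs network access.

```python
import re
import urllib.error
import urllib.request
from collections import Counter
from typing import List

# Assumption: a DOI starts with "10.", a 4-9 digit prefix, "/" and a suffix without spaces.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_dois(text: str) -> List[str]:
    """Return every DOI-like string found in the text, without trailing punctuation."""
    return [match.rstrip(".,;") for match in DOI_PATTERN.findall(text)]


def doi_url(doi: str) -> str:
    """Build the resolver URL for a DOI."""
    return "https://doi.org/" + doi


def duplicated_dois(dois: List[str]) -> List[str]:
    """Return the DOIs that appear more than once, a first hint of invented references."""
    return sorted(doi for doi, count in Counter(dois).items() if count > 1)


def doi_resolves(doi: str, timeout: float = 10.0) -> bool:
    """Return False when the resolver answers 404 for this DOI (requires network access)."""
    request = urllib.request.Request(doi_url(doi), method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        # Other HTTP errors usually come from the publisher's site after the redirect.
        return err.code != 404
```

A DOI that resolves is still not proof that the reference is correct: the title and authors must also match the page it leads to.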

## 2.5 Paper-QA
To limit hallucination, LLMs have to be grounded in real references.

Paper-QA is a Python library built upon LangChain that lets us ask questions and get answers from our own PDF papers. By grounding its responses with in-text citations, it reduces hallucinations.

In [None]:
!pip install paper-qa

In [None]:
import os
 
os.environ["OPENAI_API_KEY"] = "sk-..."

In [None]:
from paperqa import Docs

my_docs = ["...",
           "..."
          ]

docs = Docs()
for d in my_docs:
    docs.add(d)

In [None]:
answer = docs.query("Could you list all the environmental data that could have an impact on avian influenza outbreaks?")
print(answer.formatted_answer)
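
The grounding idea behind such a query can be illustrated without any API call. The sketch below is a toy illustration and not Paper-QA's actual implementation: it ranks passages by the number of words they share with the question, then builds a prompt that asks the model to answer only from the retrieved passages and to cite them by key. The function names and the citation-key format are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple


def _words(text: str) -> Set[str]:
    """Lowercased alphabetic words of a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def top_passages(question: str, passages: Dict[str, str], k: int = 2) -> List[Tuple[str, str]]:
    """Rank passages by the number of words shared with the question (toy retrieval)."""
    question_words = _words(question)
    ranked = sorted(
        passages.items(),
        key=lambda item: len(question_words & _words(item[1])),
        reverse=True,
    )
    return ranked[:k]


def grounded_prompt(question: str, passages: Dict[str, str], k: int = 2) -> str:
    """Build a prompt that restricts the answer to the retrieved, cited passages."""
    context = "\n".join(
        f"[{key}] {text}" for key, text in top_passages(question, passages, k)
    )
    return (
        "Answer the question using only the passages below and cite them by key. "
        "If they do not contain the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

Real systems replace the word-overlap ranking with embedding-based similarity search, but the principle is the same: the model is given the sources it must cite instead of being asked to recall them.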