<H1 style="text-align: center;">NLP for One Health</H1>
<h3 style="text-align: center;">From BERT to ChatGPT</h3>

The use of artificial intelligence (AI) in One Health has the potential to revolutionize disease detection and outbreak response. Natural Language Processing (NLP) is a subfield of AI that enables computers to understand and analyze large amounts of text data, including social media posts, news articles or medical records. 

In this practical session, we will explore how NLP can support early outbreak detection and crisis monitoring by mining newspapers and social media. We will start with BERT, a pre-trained language model that can be fine-tuned for specific NLP tasks. We will then move on to ChatGPT, a widely used large language model that generates human-like responses to natural language inputs.

Using these two models, we will work through practical examples of how NLP can be applied in the One Health context, including case studies of outbreak detection and social media monitoring. Participants will work with BERT and ChatGPT in hands-on exercises. By the end of the session, participants will have a deeper understanding of how NLP can be used in the One Health context and will be equipped to apply these techniques in their own work.


|   |   |   |   |
|---|---|---|---|
| <img src="https://mood-h2020.eu/wp-content/uploads/2020/10/logo_Mood_texte-dessous_CMJN_vecto-300x136.jpg" alt="mood"/> | <img src="https://www.murdoch.edu.au/ResourcePackages/Murdoch2021/assets/dist/images/logo.svg" alt="murdoch" /> | <img src="https://www.umr-tetis.fr/images/logo-header-tetis.png" alt="tetis"/> | <img src="https://www.inrae.fr/themes/custom/inrae_socle/logo.svg" alt="INRAE" /> |

Speaker: **Rémy DECOUPES** - Research engineer UMR TETIS / INRAE

------------------------

# 1. BERT
BERT ("[Bidirectional Encoder Representations from Transformers - Devlin et al - 2018](https://arxiv.org/abs/1810.04805)"), from Google Research, is an open-source pre-trained language model. It is built on the encoder of the Transformer architecture introduced in "[Attention is all you need - Vaswani et al - 2017](https://arxiv.org/abs/1706.03762)".

BERT-base was pre-trained on:
+ English Wikipedia (2.5 billion words)
+ BooksCorpus (0.8 billion words).

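Using two pre-training tasks:
+ Masked language modeling (MLM): predict tokens that have been hidden behind `[MASK]`
+ Next sentence prediction (NSP): decide whether one sentence actually follows another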
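
As a rough illustration of the masking task, the sketch below follows the corruption recipe described in the BERT paper: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% are swapped for a random token, and 10% are left unchanged. The function name `mask_tokens`, the whitespace tokenization, the tiny vocabulary and the per-token coin flip are simplifying assumptions for illustration; real pre-training operates on WordPiece ids.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted_tokens, labels).

    labels[i] holds the original token at selected positions and None elsewhere.
    """
    rng = random.Random(seed)
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            r = rng.random()
            if r < 0.8:
                corrupted.append(MASK)               # 80%: hide the token
            elif r < 0.9:
                corrupted.append(rng.choice(vocab))  # 10%: random replacement
            else:
                corrupted.append(tok)                # 10%: keep it unchanged
        else:
            labels.append(None)  # position not selected, no loss computed here
            corrupted.append(tok)
    return corrupted, labels

sentence = "one health links human animal and environmental health".split()
vocab = sorted(set(sentence))
# mask_prob is raised to 0.5 only so that a short sentence shows some masking
corrupted, labels = mask_tokens(sentence, vocab, mask_prob=0.5, seed=1)
print(corrupted)
print(labels)
```

The model is trained to predict the entries of `labels` that are not `None`, given the corrupted sequence as input.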

## 1.1 Transformers
Transformers is a Python library from Hugging Face that makes it easy to work with BERT-like models.




In [9]:
# installation
!pip install transformers
!pip install torch



In [1]:
# load BERT models
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertForMaskedLM: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertForMaskedLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForMaskedLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
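
The tokenizer loaded above splits text into WordPiece sub-words before the model sees it. The sketch below imitates the greedy longest-match-first strategy on a hand-made vocabulary. `toy_vocab` and `wordpiece` are hypothetical names, and the real `bert-base-uncased` vocabulary has roughly 30,000 entries, so actual splits will differ.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedily split one lowercase word into the longest pieces found in vocab.

    Pieces that do not start the word carry a '##' prefix.
    """
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces

toy_vocab = {"out", "##break", "##s", "health", "zoo", "##no", "##tic"}
print(wordpiece("outbreaks", toy_vocab))  # ['out', '##break', '##s']
print(wordpiece("zoonotic", toy_vocab))   # ['zoo', '##no', '##tic']
print(wordpiece("virus", toy_vocab))      # ['[UNK]']
```

Sub-word splitting is what lets a fixed vocabulary cover rare domain terms such as disease names, at the cost of breaking them into several pieces.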


## 1.2 NLP tasks with BERT
Let's use the `pipeline` helper from Transformers on common NLP tasks.

In [3]:
from transformers import pipeline

unmasker = pipeline('fill-mask', model='bert-base-uncased')
unmasker("The concept of One Health recognizes the interconnectedness of human, [MASK], and environmental health, and emphasizes the need for collaboration across sectors to address complex health challenges.")



[{'score': 0.41287505626678467,
  'token': 4111,
  'token_str': 'animal',
  'sequence': 'the concept of one health recognizes the interconnectedness of human, animal, and environmental health, and emphasizes the need for collaboration across sectors to address complex health challenges.'},
 {'score': 0.08554541319608688,
  'token': 2591,
  'token_str': 'social',
  'sequence': 'the concept of one health recognizes the interconnectedness of human, social, and environmental health, and emphasizes the need for collaboration across sectors to address complex health challenges.'},
 {'score': 0.052403345704078674,
  'token': 16928,
  'token_str': 'occupational',
  'sequence': 'the concept of one health recognizes the interconnectedness of human, occupational, and environmental health, and emphasizes the need for collaboration across sectors to address complex health challenges.'},
 {'score': 0.03880768641829491,
  'token': 5177,
  'token_str': 'mental',
  'sequence': 'the concept of one healt
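
The `score` values in this output are probabilities. As far as the `fill-mask` pipeline is concerned, the model produces one logit per vocabulary entry at the `[MASK]` position, a softmax turns the logits into probabilities, and the highest-scoring candidates are returned. Below is a minimal pure-Python sketch of that last step. The logits are made up, and the helper names `softmax` and `top_k` are illustrative assumptions, not values or functions from BERT.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_k(logits, k=5):
    """Return the k most probable candidates, best first."""
    probs = softmax(logits)
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return [{"token_str": tok, "score": p} for tok, p in ranked[:k]]

# Hypothetical logits for a handful of candidate tokens
fake_logits = {"animal": 9.1, "social": 7.5, "occupational": 7.0,
               "mental": 6.7, "banana": 1.0}
for candidate in top_k(fake_logits, k=3):
    print(candidate)
```

Because the softmax runs over the whole vocabulary, the scores of the few candidates shown by the pipeline do not need to sum to 1.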

In [6]:
# BERT is an encoder, not a text generator: this cell shows how poorly it continues a prompt
text_generator = pipeline('text-generation', model='bert-base-uncased')
text_generator("The concept of One Health recognizes the interconnectedness of ")

If you want to use `BertLMHeadModel` as a standalone, add `is_decoder=True.`
Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']
- This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'generated_text': 'The concept of One Health recognizes the interconnectedness of  of and and and and and and and and'}]
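
The repeated `and` is not surprising. BERT is an encoder trained to fill in tokens using context on both sides, whereas left-to-right generation needs a model in which each position attends only to earlier positions. The sketch below contrasts the two attention patterns as 0/1 matrices. `bidirectional_mask` and `causal_mask` are illustrative names, not library functions.

```python
def bidirectional_mask(n):
    """Every token may attend to every other token (BERT-style encoder)."""
    return [[1] * n for _ in range(n)]

def causal_mask(n):
    """Token i may attend only to tokens 0..i (GPT-style decoder)."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

tokens = ["one", "health", "recognizes", "the"]
for name, mask in [("bidirectional", bidirectional_mask(len(tokens))),
                   ("causal", causal_mask(len(tokens)))]:
    print(name)
    for tok, row in zip(tokens, mask):
        # Each row lists which positions this token is allowed to look at
        print(f"  {tok:<11}", row)
```

Models in the GPT family, including the one behind ChatGPT, use the causal pattern, which is why they can extend a prompt one token at a time.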