# Tutorial 2: Stance Detection on Tweets using NLP Methods

## Introduction

**Objective of the tutorial**: This tutorial will guide you through the process of stance detection on tweets using two main approaches: fine-tuning a BERT model and using large language models (LLMs).

**Prerequisites**: Basic Python skills and ML knowledge. Familiarity with NLP concepts is a plus.

## What is Stance Detection and Why is it Important?

Stance detection is an essential task in natural language processing that aims to determine the attitude or position expressed by an author towards a specific target, such as an entity, topic, or claim. The output of stance detection is typically a categorical label, such as "in-favor," "against," or "neutral," indicating the stance of the author in relation to the target. This task is critical for studying human belief dynamics, e.g., how people influence each other's opinions and how beliefs change over time.

There are two key challenges in stance detection, especially when working with large datasets like Twitter data. First, the underlying attitude expressed in the text is often subtle, which requires domain knowledge and context to correctly label the stance. Second, the corpus can be very large, with millions of tweets, making it impractical to manually annotate all of them.


[TODO]: Provide a few example tweets of stance detection in the wild.


In this tutorial, we will focus on stance detection in the context of COVID-19 vaccination tweets. We will analyze a dataset containing tweets about COVID-19 vaccination, with each tweet labeled as either in-favor, against, or neutral with respect to COVID-19 vaccination. Our goal is to develop a model that can accurately identify the stance expressed in these tweets.

To address these challenges, we will leverage advanced natural language processing (NLP) techniques like BERT and large language models (LLMs). BERT and LLMs are pre-trained on massive corpora, enabling them to capture subtle contextual information and better understand the nuances of language. With these NLP models, we can effectively adapt their general language understanding to the specific task of stance detection, even in cases where domain knowledge is required. This approach allows us to process large amounts of data with high accuracy while significantly reducing the need for manual annotation.


## Understanding BERT

In this section, we will briefly introduce BERT, a powerful NLP model that has been widely used in many NLP tasks. We will also discuss how BERT can be fine-tuned for stance detection.



### What is BERT and how it works

BERT, which stands for Bidirectional Encoder Representations from Transformers, is a natural language processing (NLP) model introduced by researchers at Google in 2018 that has had a major influence on how language is modeled and analyzed. The model is designed to learn useful representations for words from unlabeled text, which can then be fine-tuned for a wide range of NLP tasks, such as stance detection, sentiment analysis, and question answering.


In a nutshell, BERT is a powerful NLP model that leverages bidirectional context, the Transformer architecture, and a pre-training and fine-tuning approach to achieve state-of-the-art performance on a wide range of tasks. We will describe each of these components in more detail below.

[TODO] Should add a link to the BERT interactive tutorial here.


#### Bidirectional Context: Understanding Context in Both Directions

Language is complex, and understanding it is no simple task. Many earlier language models read text in a single direction, either left-to-right or right-to-left, which makes it difficult for them to use the full context of a word or phrase [TODO: should check whether this is true and should add examples]. BERT, however, is designed to process text in both directions, so it can interpret each word based on the words that come both before and after it. This bidirectional approach helps BERT capture subtle nuances of language, such as a negation that appears only later in the sentence.


[TODO] Should add a figure to illustrate the bidirectional context here.
[TODO] Should add an example to show why bidirectional context is important here.
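One way to picture the difference is as a visibility mask over token positions. The sketch below is illustrative only: the sentence and the helper names (`causal_mask`, `bidirectional_mask`, `visible_context`) are assumptions made for this example and are not part of BERT's actual code.

```python
def causal_mask(n):
    """Position i may only see positions 0..i (left-to-right model)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may see every other position (BERT-style encoder)."""
    return [[True] * n for _ in range(n)]


def visible_context(tokens, mask, position):
    """Return the tokens that `position` is allowed to look at."""
    return [tok for tok, allowed in zip(tokens, mask[position]) if allowed]


# Made-up example sentence.
tokens = ["the", "vaccine", "is", "not", "safe"]
target = tokens.index("is")

left_only = visible_context(tokens, causal_mask(len(tokens)), target)
both_sides = visible_context(tokens, bidirectional_mask(len(tokens)), target)

print(left_only)   # ['the', 'vaccine', 'is']
print(both_sides)  # ['the', 'vaccine', 'is', 'not', 'safe']
```

With the left-to-right mask, the representation of "is" cannot draw on "not safe", which is exactly the part of the sentence that carries the stance.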


#### A Powerful Architecture: Transformers

BERT is built upon the Transformer architecture, which was introduced by Vaswani et al. in 2017. The Transformer is a neural network architecture that relies on self-attention mechanisms to process all positions of an input sequence in parallel, rather than one token at a time as in recurrent neural networks (RNNs) such as long short-term memory (LSTM) networks. This parallel processing makes Transformers efficient to train on modern hardware and lets each token draw directly on every other token in the sequence, which makes them well-suited for NLP tasks.
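To make the idea of self-attention more concrete, here is a minimal pure-Python sketch of scaled dot-product self-attention. It is a simplification under stated assumptions: the token vectors are made up, and the queries, keys, and values are all set equal to the input, whereas a real Transformer applies learned projections and uses several attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values = input."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this token to every token in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New representation: weighted average of all token vectors.
        mixed = [sum(w * vec[d] for w, vec in zip(weights, vectors))
                 for d in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Three made-up 2-dimensional token vectors.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(embeddings)
```

Each iteration of the outer loop is independent of the others, which is why practical implementations compute all positions at once as matrix products.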


[TODO] Should add a figure to show the self-attention mechanism here.


#### Pre-training and Fine-tuning: Learning from Lots of Text and Adapting to Specific Tasks

One of the key reasons for BERT's success is its ability to learn from vast amounts of text and then adapt that knowledge to specific tasks. Training is composed of two stages: pre-training and fine-tuning.

##### Pre-training phase

During the pre-training phase, BERT is exposed to massive amounts of text from sources like Wikipedia and online books, allowing it to learn general language understanding. It is trained on two objectives: predicting missing words in a sentence (masked language modeling) and determining whether two sentences follow each other (next sentence prediction). This phase allows BERT to learn the relationships between words without any task-specific labels (e.g., stance labels are not needed for pre-training).
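The masked language modeling objective can be sketched in a few lines. This is a simplified illustration, not BERT's exact procedure: the function name `mask_tokens` and the example sentence are assumptions, and the original BERT recipe selects about 15% of tokens and sometimes swaps in a random or unchanged token instead of always using `[MASK]`.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and remember the originals as targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok  # the model must predict this token
        else:
            masked.append(tok)
    return masked, targets


# Made-up sentence; a higher mask_prob is used so the short example shows masks.
tokens = "getting vaccinated protects you and the people around you".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(targets)
```

The training signal comes entirely from the text itself: the hidden tokens serve as the labels, so no human annotation is needed.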

##### Fine-tuning phase

After pre-training, BERT can be fine-tuned for a specific task with a smaller labeled dataset (e.g., tweets with stance labels). Fine-tuning involves updating the model's weights using the labeled data, allowing BERT to adapt its general language understanding to the specific task. This process is relatively fast and requires less training data compared to training a model from scratch.



****


### BERT sub-word tokenization

One caveat of BERT is that it requires a special "sub-word tokenization" process (i.e., WordPiece tokenization). That is, it does not directly encode each individual word, but rather encodes each word as a sequence of "sub-word tokens". For example, the word "university" could be broken down into the sub-words "uni" and "versity," which are more likely to appear in the corpus than the full word itself. The exact split depends on the tokenizer's vocabulary, and frequent words are often kept as a single token.

Sub-word tokenization is important for several reasons. Just to name two important ones:

#### Consistent Representation of Similar Words

Tokenization ensures that the text is represented in a consistent manner, making it easier for the model to learn and identify patterns in the data. By breaking the text into tokens, the model can focus on the essential units of meaning, allowing it to better understand and analyze the input.
For example, let us consider the following two words: "anti-vaccine" and "antitrust".

These words share a common prefix "anti-", but they are related to different topics. Tokenization can help standardize the text by breaking these words down into smaller, overlapping tokens, e.g., ["anti", "-", "vaccine"] and ["anti", "trust"].

By representing the words as a sequence of tokens, the model can more effectively identify the commonality between them (the shared "anti" prefix) while also distinguishing the unique parts ("-vaccine" and "trust"). This approach helps the model learn the relationships between word parts and the overall meaning of words in a more generalizable way, while also capturing the nuances that make each word unique.

#### Handling Out-of-Vocabulary Words 

One of the challenges in NLP is dealing with words that the model has not encountered during training, also known as out-of-vocabulary (OOV) words. By using tokenization, BERT can handle OOV words more effectively. Subword tokenization breaks down words into smaller, meaningful parts that the model has likely seen before, allowing it to better understand and process previously unseen words.

For example, suppose we have a sentence containing a relatively newly coined word: "anti-vaxxer".

Here, the word "anti-vaxxer" is a neologism that may not be present in the model's vocabulary, particularly if the model was trained on older data. If we used a simple word-based tokenization, the model would struggle to process this word. However, using a subword tokenization approach, the word can be broken down into smaller parts that the model has likely seen before:

["anti", "-", "vaxx", "er"]

This breakdown allows the model to infer the meaning of the previously unseen word based on the subword components it has encountered during training. The model can recognize the "anti" prefix and the similarity of "vaxx" to "vacc" (as in "vaccine"). This enables BERT to better understand and process out-of-vocabulary words, especially those that are relatively new or coined, making it more robust and adaptable to a wide range of text inputs.
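The splitting described above can be sketched as a greedy longest-match-first search over a vocabulary, which is the general idea behind WordPiece. The vocabulary below is a hand-picked toy set and the function name `wordpiece` is an assumption for this example. Real BERT vocabularies contain tens of thousands of entries, so the actual splits will differ. The `##` prefix marks a piece that continues a word, and punctuation such as hyphens is split off in a separate step before this one, so the example uses hyphen-free words.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # marks a continuation of the word
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no known piece fits
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary chosen by hand for this example.
toy_vocab = {"anti", "vaccine", "vaxx", "##trust", "##vaxx", "##er"}

print(wordpiece("antitrust", toy_vocab))   # ['anti', '##trust']
print(wordpiece("antivaxxer", toy_vocab))  # ['anti', '##vaxx', '##er']
print(wordpiece("vaccine", toy_vocab))     # ['vaccine']
print(wordpiece("qzx", toy_vocab))         # ['[UNK]']
```

Even though "antivaxxer" is not in the toy vocabulary as a whole word, it is still represented by pieces the model has seen, including the same "anti" token that starts "antitrust".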

****


## Fine-tuning a BERT Model with HuggingFace

Now, let's fine-tune a BERT model using the HuggingFace Transformers library.



### Installing HuggingFace Transformers library


In [None]:
# !pip install transformers


### Loading a pretrained BERT model

In [None]:
from transformers import BertForSequenceClassification, BertTokenizer

model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
# Three output classes: in-favor, against, neutral
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=3)

### Preparing and processing the labeled dataset
_Instructions on loading and processing the labeled dataset_

In [None]:
# Load and preprocess the data here
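As one possible starting point, the sketch below cleans tweet text, encodes the three stance labels as integers, and makes a train/test split. Everything specific here is an assumption: the dataset is taken to be a list of `(text, label)` pairs, the rows are made up, and the `[URL]` and `[USER]` placeholders and the `LABEL2ID` mapping are illustrative choices rather than requirements of BERT.

```python
import random
import re

# Assumed label names, following the three stances used in this tutorial.
LABEL2ID = {"against": 0, "neutral": 1, "in-favor": 2}


def clean_tweet(text):
    """Replace URLs and user mentions with placeholders and tidy whitespace."""
    text = re.sub(r"https?://\S+", "[URL]", text)
    text = re.sub(r"@\w+", "[USER]", text)
    return re.sub(r"\s+", " ", text).strip()


def prepare(rows, test_fraction=0.2, seed=42):
    """Clean texts, encode labels, and split into train and test sets."""
    examples = [(clean_tweet(text), LABEL2ID[label]) for text, label in rows]
    random.Random(seed).shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]


# Made-up rows standing in for the real labeled dataset.
rows = [
    ("Got my second dose today! https://example.com", "in-favor"),
    ("@user I will never take this vaccine", "against"),
    ("Vaccine  sites open at 9am tomorrow", "neutral"),
    ("So grateful for the vaccine", "in-favor"),
    ("Not trusting a rushed vaccine", "against"),
]
train, test = prepare(rows)
```

The cleaned texts would then be passed through the BERT tokenizer before training.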

### Fine-tuning the model for stance detection
_Guide on how to fine-tune the BERT model for stance detection_

In [None]:
# Fine-tune the model here


### Evaluating the model and analyzing results
_Methods for evaluating the model and analyzing the results_

In [None]:
# Evaluate the model and analyze the results here
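A minimal sketch of two common evaluation metrics, accuracy and macro-averaged F1, written in plain Python so the arithmetic is visible. The label lists are made up for illustration. In practice one would likely use a library implementation, and macro-F1 is suggested here only because stance classes are often imbalanced.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Average the per-class F1 scores so each stance counts equally."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up gold labels and predictions.
labels = ["against", "neutral", "in-favor"]
y_true = ["against", "neutral", "in-favor", "in-favor", "against", "neutral"]
y_pred = ["against", "neutral", "in-favor", "neutral", "against", "against"]

print(round(accuracy(y_true, y_pred), 3))          # 0.667
print(round(macro_f1(y_true, y_pred, labels), 3))  # 0.656
```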


****

## Introduction to Large Language Models (LLMs) for Stance Detection

Large Language Models (LLMs) are a type of advanced natural language processing model that has gained significant attention in recent years. These models are designed to understand and generate human-like text by learning from vast amounts of data. In the context of stance detection, LLMs can offer powerful and flexible tools to analyze and classify text based on the stance or attitude expressed towards a particular topic.

### Pre-training: Learning from Massive Amounts of Text

Like BERT, LLMs also rely on pre-training to learn from massive amounts of text. During pre-training, LLMs are exposed to a large corpus of text, which allows them to learn the structure and style of human language. By learning from a diverse range of text sources, LLMs can build a rich understanding of language, including grammar, vocabulary, and context. This extensive knowledge can be particularly useful for detecting stances in text.


During training, LLMs optimize their parameters to minimize a loss function, which is a measure of the difference between the model's predictions and the actual target outputs. In the case of language models, the loss function is typically based on the likelihood of the correct next word (or token) given the context. By minimizing this loss function, the model learns to generate text that closely resembles the structure and style of the training data.
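As a rough illustration of this loss, the sketch below computes the average negative log-likelihood of the correct next token. The probability tables are made-up stand-ins for model outputs, and `next_token_loss` is a hypothetical helper name. A real model produces a probability for every token in its vocabulary at every position.

```python
import math


def next_token_loss(predicted_probs, target_tokens):
    """Average negative log-likelihood of the correct next tokens."""
    losses = [-math.log(probs[target])
              for probs, target in zip(predicted_probs, target_tokens)]
    return sum(losses) / len(losses)


# Made-up model outputs: at each step, a probability for every candidate token.
confident = [{"safe": 0.9, "risky": 0.1}, {"and": 0.8, "but": 0.2}]
unsure = [{"safe": 0.5, "risky": 0.5}, {"and": 0.5, "but": 0.5}]
targets = ["safe", "and"]

loss_confident = next_token_loss(confident, targets)
loss_unsure = next_token_loss(unsure, targets)
print(loss_confident < loss_unsure)  # True
```

The more probability the model assigns to the tokens that actually come next, the lower the loss, which is what training pushes towards.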

[TODO: add a diagram to show the loss function]

### Contrast with BERT
 

#### Model Size and Training Corpus Size

While BERT can itself be considered a large language model, it has far fewer parameters and was pre-trained on a smaller corpus than some of the more recent LLMs. Because of these constraints, BERT typically requires fine-tuning on a specific task, using a labeled dataset, to perform optimally.

Newer LLMs, like GPT-3 or Flan-T5, have been trained on even larger datasets and have demonstrated remarkable capabilities, including the ability to perform tasks with little or no fine-tuning. This is due to their extensive training, which allows them to generate more accurate and coherent responses in a variety of situations, including stance detection.


[TODO: add a table to illustrate the difference between BERT and LLMs]


#### The Potential of Prompting

While BERT is designed to be fine-tuned on specific tasks like stance detection using labeled data, some of these more recent LLMs can perform stance detection without fine-tuning, but with prompting techniques (i.e., the way you "ask" these models questions). These techniques involve providing the model with context or examples to guide its response, rather than relying on fine-tuning with labeled data.

For example, with a zero-shot approach, an LLM can perform stance detection on tweets without being fine-tuned on a specific dataset. The model can understand the task and generate an appropriate response based on its extensive knowledge learned during pretraining.
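A zero-shot setup can be sketched as two small helpers: one that turns a tweet into a prompt, and one that maps the model's free-text answer back onto a label. The prompt wording, the function names, and the label-matching rule are all assumptions for illustration. The call to an actual LLM is deliberately left out, and the parsing is crude (for example, it would misread "not in favor").

```python
LABELS = ("in-favor", "against", "neutral")


def build_zero_shot_prompt(tweet, target="COVID-19 vaccination"):
    """Describe the task in plain language, with no labeled examples."""
    options = ", ".join(LABELS)
    return (
        f"What is the stance of the following tweet towards {target}?\n"
        f"Answer with one of: {options}.\n\n"
        f"Tweet: {tweet}\n"
        "Stance:"
    )


def parse_stance(response):
    """Map free-text model output onto one of the known labels."""
    text = response.strip().lower().replace("in favor", "in-favor")
    for label in LABELS:
        if label in text:
            return label
    return None  # the model answered with something unexpected


prompt = build_zero_shot_prompt("Just booked my booster shot!")
print(prompt)
```

The prompt string would be sent to the model, and `parse_stance` applied to whatever text comes back.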



### Open-source model flan-t5

_Brief introduction to flan-t5_



### OpenAI's GPT 3.5

_Brief introduction to GPT 3.5_



## Prompting Techniques for LLMs

Let's explore three different prompting techniques for large language models.



### Zero-shot prompting

_Description of zero-shot prompting_



### Few-shot prompting

_Description of few-shot prompting_



### Chain-of-thought method

_Description of the chain-of-thought method_



## Stance Detection using LLMs

Now, let's implement stance detection using flan-t5 and GPT 3.5.



### Setting up the environment

_Instructions for setting up the environment, including installing required libraries_


In [None]:
# Set up the environment here


### Preparing input prompts for each method

_Guide on how to create input prompts for each prompting technique_


In [None]:

# Prepare input prompts here
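As a hedged sketch of what few-shot and chain-of-thought prompts might look like, the helpers below build both kinds of prompt as plain strings. The wording, the demonstration tweets, and the function names are made up for illustration. Effective prompts usually need some trial and error for each model.

```python
def build_few_shot_prompt(tweet, examples):
    """Prepend labeled examples so the model can imitate the pattern."""
    lines = ["Classify the stance of each tweet towards COVID-19 vaccination "
             "as in-favor, against, or neutral.", ""]
    for text, label in examples:
        lines += [f"Tweet: {text}", f"Stance: {label}", ""]
    lines += [f"Tweet: {tweet}", "Stance:"]
    return "\n".join(lines)


def build_cot_prompt(tweet):
    """Ask for reasoning first, then a final label on its own line."""
    return (
        "Classify the stance of the tweet towards COVID-19 vaccination "
        "as in-favor, against, or neutral.\n"
        f"Tweet: {tweet}\n"
        "First explain your reasoning step by step, then give the final "
        "answer on a new line in the form 'Stance: <label>'."
    )


# Hand-written demonstrations (made up for illustration).
demos = [
    ("Vaccines save lives. Get yours today.", "in-favor"),
    ("I refuse to be a guinea pig for this shot.", "against"),
]
few_shot = build_few_shot_prompt("Clinic opens at 8am for walk-ins.", demos)
cot = build_cot_prompt("Clinic opens at 8am for walk-ins.")
print(few_shot)
```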



### Implementing stance detection with flan-t5 and GPT 3.5

_Instructions for implementing stance detection using the two LLMs_


In [None]:

# Implement stance detection using flan-t5 and GPT 3.5



### Comparing results and performance

_Methods for comparing the results and performance of the two LLMs_


In [None]:

# Compare results and performance here




## Conclusion

In this tutorial, we learned how to perform stance detection on tweets using a fine-tuned BERT model and large language models.