---
title: "Applied Large Language Models"
subtitle: ''
author: Zach Dickson
institute: Fellow in Quantitative Methodology <br>London School of Economics
bibliography: references.bib
format:
  revealjs: 
    fontsize: 1.5em
    logo: figures/LSE_logo.svg
    embed-resources: true
    slide-number: true
    preview-links: auto
    transition: convex
    caption: true
    tabularx: true
    citation_package: biblatex
    transition-speed: fast
    theme: [simple, custom.scss]
    footer: <a></a>
---




# Schedule {.scrollable .smaller}




<figure>
  <img align="right" src="figures/LLMS_wordcloud.jpg" alt="Word cloud of LLM-related terms" style="width:45%">
<br>


- A Brief Introduction to Large Language Models (LLMs) (50 minutes)
  + Word embeddings vs. LLMs
  + Pre-trained models (BERT, GPT)
  + Applications in the social sciences 
  + Python basics 

**10 minute break**

- Applied Example: Text Classification 
  - Fine-tune a transformer model to predict ideology 
  - Validation and verification

**1 hour lunch**

- Applied Example: Topic Modeling & Text Clustering 
  - Extract issue topics from parliamentary bills


**10 minute break** 

- Everything else 
  + State-of-the-art applications
  + Validating our models 
  + Limitations 
  + Future applications 

</figure>



# My Background & Research Interests





# What are Large Language Models (LLMs)? 

- A language model is a machine learning model trained to predict the next word in a sentence given the previous words. 
  - Example: Autocomplete on your phone 

- These models work by estimating the probability of a token (e.g. word), or a sequence of tokens, given the context of the sentence. 
  - Example: "The cat is on the ___" 
    + cup: 2.3%
    + mat: 8.9%
    + computer: 1.2%
    + coffee: 0.9%
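  - Of these candidates, "mat" has the highest probability, so it is the model's most likely next word.
  - The probabilities shown do not sum to 100% because the rest of the vocabulary shares the remainder.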
  - A sequence of tokens could be a sentence, paragraph, or entire document.
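
The next-word example above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular model is implemented: the scores in `logits` are made up, and the vocabulary has only four words, so the resulting probabilities are larger than those on the slide.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - peak) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for candidate next words after "The cat is on the ___"
logits = {"mat": 2.0, "cup": 0.6, "computer": 0.0, "coffee": -0.3}

probs = softmax(logits)
best = max(probs, key=probs.get)  # candidate with the highest probability

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok:>10}: {p:.1%}")
print("Most likely next word:", best)
```

A real LLM produces one such score for every token in a vocabulary of tens of thousands of entries.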



::: footer
Introduction to Large Language Models
:::


# What are Large Language Models (LLMs)? 

- Modeling human language is very complex 
  - Syntax, semantics, pragmatics, etc. 
  - Context, ambiguity, and nuance 
  - Cultural and social norms

- As models get larger, they can capture more of these complexities 
  - More parameters, more data, more context 
  - Better at predicting the next word in a sentence 
  - Better at understanding the meaning of words and sentences



# Transformers 


- Transformers are a type of neural network architecture that has revolutionized natural language processing (NLP). 
  - Introduced by [Vaswani et al. (2017)](https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=attention+is+all+you+need&btnG=&oq=attention+i)
  - The model is based on the concept of "attention"

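- The original Transformer consists of an encoder and a decoder 
  - The encoder processes the input sequence and produces a sequence of hidden states
  - The decoder takes the hidden states produced by the encoder and generates the output sequence
  - Later models often keep only one half: BERT uses only the encoder, while GPT models use only the decoder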


# Transformers Architecture



<figure>
  <img align="center" src="figures/transformers_arc.png" alt="Transformer architecture diagram" style="width:95%">
</figure>


# Attention Mechanism



- Attention is a mechanism that allows the model to focus on different parts of the input sequence when making predictions. 
  - The model can learn which parts of the input are most important for making predictions. 
  - This allows the model to capture long-range dependencies in the data. 
  - The model can also learn to focus on different parts of the input depending on the context of the sentence.
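
As a rough sketch of the idea, the code below implements scaled dot-product attention, the form of attention used in Vaswani et al. (2017), in plain Python. The token vectors are hypothetical, and the learned projection matrices that a real transformer applies to produce queries, keys, and values are left out for brevity.

```python
import math

def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)  # attention weights sum to 1
        # The output is the weighted average of the value vectors
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional vectors for three tokens
tokens = ["the", "cat", "sat"]
x = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]

# Self-attention: queries, keys, and values all come from the same tokens
outputs, weights = attention(x, x, x)
for tok, row in zip(tokens, weights):
    print(tok, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the sequence, including itself.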



# Attention Mechanism


![Attention Mechanism](figures/attention.png)



# How do LLMs generate text? 

- LLMs generate text autoregressively: at each step, the model samples the next token from a probability distribution over its vocabulary. 
  - Each sampled token is appended to the input and the process repeats, so text is produced one token at a time. 
  - Generation is conditioned on an input, such as a prompt or other context. 
  - The output is usually coherent and grammatically correct, but it can also be nonsensical or incoherent.

- **Example:**
  - "My dog, Max, knows how to perform many traditional dog tricks. _______"
    - 2.3%: "For example, he can sit, stay, and roll over."
    - 2.1%: "He can also fetch a ball, and he loves to play with his toys."
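
The sampling loop can be sketched as follows. Everything here is an assumption for illustration: `toy_model` is a hand-written lookup table standing in for a language model, and it conditions only on the last token, whereas a real LLM conditions on all of the preceding text. The `temperature` parameter is a common way to control how random the sampling is.

```python
import math
import random

def sample_next(logits, temperature=1.0, rng=random):
    """Sample one token from softmax(logits / temperature)."""
    scaled = {tok: s / temperature for tok, s in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    tokens = list(weights)
    # random.choices normalizes the weights, so they need not sum to 1
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Hypothetical stand-in for a language model: last token -> next-token scores
toy_model = {
    "<start>": {"my": 2.0, "the": 1.0},
    "my": {"dog": 2.5, "cat": 1.0},
    "the": {"dog": 1.5, "cat": 1.5},
    "dog": {"sits": 1.5, "barks": 1.0, "<end>": 0.2},
    "cat": {"sits": 1.0, "sleeps": 1.5, "<end>": 0.2},
    "sits": {"<end>": 1.0},
    "barks": {"<end>": 1.0},
    "sleeps": {"<end>": 1.0},
}

def generate(temperature=1.0, seed=0, max_tokens=10):
    """Generate text one token at a time until <end> or max_tokens."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    token, output = "<start>", []
    for _ in range(max_tokens):
        token = sample_next(toy_model[token], temperature, rng)
        if token == "<end>":
            break
        output.append(token)
    return " ".join(output)

print(generate(temperature=1.0, seed=1))
print(generate(temperature=0.01, seed=1))  # very low temperature: top token almost always wins
```

Lower temperatures concentrate probability on the highest-scoring token, while higher temperatures make less likely continuations appear more often.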



# Pre-trained Models

- Pre-trained models are language models that have already been trained on a large corpus of text 
  - Typical sources include Wikipedia, news articles, and books 
  - Training is self-supervised (often described as unsupervised): the text itself supplies the prediction targets, so no hand-labeled data is required 
  - Training is computationally expensive and often takes several days or weeks

- Pre-trained models can be fine-tuned on a specific task or dataset
  - Fine-tuning involves updating the parameters of the pre-trained model on a smaller dataset that is specific to the task
  - Fine-tuning allows the model to learn the specific patterns and relationships in the data
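
The core idea of fine-tuning, continuing training from existing weights on a small task-specific dataset, can be illustrated with a toy model. In this sketch a three-parameter logistic classifier stands in for a transformer, and both the "pre-trained" weights and the task data are invented for illustration. Fine-tuning a real model would normally rely on a library such as Hugging Face `transformers` rather than hand-written gradient descent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, data):
    """Average cross-entropy of a logistic classifier on (features, label) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def fine_tune(weights, data, learning_rate=0.5, epochs=50):
    """Continue training from existing weights with plain gradient descent."""
    weights = list(weights)  # copy so the starting weights stay unchanged
    for _ in range(epochs):
        grads = [0.0] * len(weights)
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                grads[j] += (p - y) * xi / len(data)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights

# Hypothetical weights standing in for a pre-trained model
pretrained = [0.2, -0.1, 0.05]

# Hypothetical task data: (features, label), e.g. 1 = left, 0 = right
task_data = [
    ([1.0, 0.2, 1.0], 1),
    ([0.9, 0.1, 1.0], 1),
    ([0.1, 0.9, 1.0], 0),
    ([0.2, 1.0, 1.0], 0),
]

tuned = fine_tune(pretrained, task_data)
print("loss before:", round(log_loss(pretrained, task_data), 3))
print("loss after: ", round(log_loss(tuned, task_data), 3))
```

The loss on the task data falls because the weights start from their earlier values and are then adjusted to the new examples.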




# Pre-trained Models

- Example of a pre-trained model: [BERT](https://huggingface.co/google-bert/bert-base-uncased) (Bidirectional Encoder Representations from Transformers)
  - Introduced by [Devlin et al. (2018)](https://arxiv.org/abs/1810.04805)
  - Pre-trained on English Wikipedia and BooksCorpus
  - Can be fine-tuned on specific tasks, such as question answering, text classification, and named entity recognition



# BERT 


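```python
from transformers import pipeline

# Fill-mask pipeline backed by BERT; the model is downloaded on first use
unmasker = pipeline('fill-mask', model='bert-base-uncased')
# Returns the most likely tokens for [MASK], each with a score
unmasker("Hello I'm a [MASK] model.")
```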

# Classification, Categorization & Regression {.scrollable .smaller}

<br>

 
- Identify frames and narratives in political discourse: [Bailard et al. (2024)](https://doi.org/10.1017/S0003055423001478)
- Classify text by party, ideology or topic: [Lai et al. (2024)](https://doi.org/10.1017/pan.2023.42)
- Predict conflict intensity: [Häffner et al. (2023)](https://doi.org/10.1017/pan.2023.7) and [Wang (2023)](https://doi.org/10.1017/pan.2023.36)

<figure>
  <img src="figures/reddit.png" alt="Figure from Lai et al. (2024)" style="width:90%" class="center">
  <figcaption>Source: Lai et al. (2024) in *Political Analysis*</figcaption>
</figure>




::: footer
Political Science Applications
:::


# LLMs in Survey Experiments and Polling {.scrollable .smaller}

<br>

- Political persuasion and micro-targeting: [Hackenburg & Margetts (2023)](https://files.osf.io/v1/resources/wnt8b/providers/osfstorage/64d8f1fa94a6be0e0212e744?action=download&direct&version=1) and [Simchon et al. (2024)](https://doi.org/10.1093/pnasnexus/pgae035)

- Tailoring messages to specific audiences: [Mellon et al. (2024)](https://journals.sagepub.com/doi/10.1177/20531680241231468) and [Velez (n.d.)](figures/Latino_Issue_Publics.pdf)


- Synthetic survey data generation: [Bisbee et al. (2023)](https://osf.io/preprints/socarxiv/5ecfa), [Sanders et al. (2023)](https://arxiv.org/abs/2307.04781) and [Simmons & Hare (2023)](https://arxiv.org/abs/2310.17888)
  - A word of caution (see [Bisbee et al. (2023)](https://osf.io/preprints/socarxiv/5ecfa))


<figure>
  <img align="right" src="figures/public_opinion.jpg" alt="Public opinion illustration" style="width:60%">
<br>
</figure>



# Some Applied Examples {.scrollable .smaller}



1. Using the GPT-3.5 API
2. Classification Validation
3. Building your own classifier
    a. Validation and verification

<br>

[**Notebook**](https://colab.research.google.com/drive/1No_V3BzhWim9Zp1xfl6bfiiIYSoPwz9y?usp=sharing)



<figure>
  <img align="right" src="figures/Hugging-Face2.png" alt="Hugging Face logo" style="width:60%">
<br>
</figure>



<figure>
  <img align="right" src="figures/colab.png" alt="Google Colab logo" style="width:60%">
<br>
</figure>