<a
    href="https://colab.research.google.com/github/univiemops/tewa1-computational-cognition/blob/main/12%20Large%20Language%20Models.ipynb"
    target="_blank" rel="noopener"> <img
      src="https://colab.research.google.com/assets/colab-badge.svg"
      alt="Open In Colab"> </a>


# Tutorial 12 - Large Language Models

<img src="https://cdn.pixabay.com/photo/2022/12/07/11/48/parrot-7640960_960_720.png" alt="parrot" style="height:300px; background:white">


*Written and revised by Jozsef Arato, Mengfan Zhang, Dominik Pegler, Annika Trapple*  
Computational Cognition Course, University of Vienna  
https://github.com/univiemops/tewa1-computational-cognition

---

# This week's lab:

This week's assignment concentrates on applying large language models (LLMs) in psychological research, specifically OpenAI's ChatGPT. We will explore two main use cases: (1) generating synthetic response texts from simulated participants using the ChatGPT API, and (2) analyzing these responses with the same model to derive psychological insights.

**Learning Goals:**

- How to make API requests
- How to create synthetic text data with an LLM
- How to analyze a body of text with an LLM

## 0. Imports & Setup

In [None]:
# !pip install openai==1.35.0

In [None]:
import os
import random

import matplotlib.pyplot as plt

# from openai import OpenAI  # the default import, if you use a model hosted directly by OpenAI
# Below is the tewa way: our model is hosted on Microsoft Azure, so we alias the Azure client instead
from openai import AzureOpenAI as OpenAI
from wordcloud import WordCloud

## 1. Generate Text Data

Using a Large Language Model to generate synthetic text data can be beneficial for research, especially in simulation studies. It allows us to test and improve our methods and research tools under controlled conditions, without the ethical and practical constraints of collecting real-world data. 

### 1.1. Set Up the API-Client for the Data Generation Part

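Request the following credentials from your course instructors and store them in the variables below. The requests in 1.3 expect a `client` object created from these credentials.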

In [None]:
API_KEY = #
MODEL = #
AZURE_DEPLOYMENT = #
AZURE_ENDPOINT = #
API_VERSION = #
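
One way to keep secrets out of the notebook is to read them from environment variables and build the client from those values. The following is a minimal sketch, not the course's prescribed setup: the environment variable names and the helpers `load_credentials` and `make_client` are made up for illustration, and the keyword arguments assume the `AzureOpenAI` client from the `openai` package imported above.

```python
import os


def load_credentials(env=os.environ):
    """Collect the Azure OpenAI settings from environment variables.

    The variable names are illustrative assumptions, not a course convention.
    """
    return {
        "api_key": env.get("AZURE_OPENAI_API_KEY", ""),
        "azure_endpoint": env.get("AZURE_OPENAI_ENDPOINT", ""),
        "azure_deployment": env.get("AZURE_OPENAI_DEPLOYMENT", ""),
        "api_version": env.get("AZURE_OPENAI_API_VERSION", ""),
    }


def make_client(credentials):
    """Build the Azure client; the import is local so this sketch loads without openai installed."""
    from openai import AzureOpenAI

    return AzureOpenAI(**credentials)


credentials = load_credentials()
# Report which settings are still empty before trying to connect.
missing = [name for name, value in credentials.items() if not value]
print("Missing settings:", missing or "none")
# Once nothing is missing: client = make_client(credentials)
```

If you prefer to paste the values directly into the variables above, the same keyword arguments apply when creating the client.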

### 1.2. Set Up Prompt

Your task now is to think of a possible question in a questionnaire and then to write an instruction that tells a chatbot to answer that question. Store this instruction as a string in the variable `prompt`.

In [None]:
system = "You are TextBot, powered by GPT-4o-Mini, a large language model trained by OpenAI. TextBot focuses its attention on creating text in the same way and style a human would. Since every human is different TextBot is able to create diverse responses. In general TextBot generates rather simple human answers."
prompt = # YOUR CODE HERE

### 1.3. Run Text Generation

Now create the simulated data for `n_participants` using the prompt above. A single request looks like this:

```python
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ],
    max_tokens=75,
    temperature=temperature,
)
```

A `for` loop works well for this. Use a different, randomly drawn `temperature` for each participant (see the lecture slides for what the temperature parameter does).
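
The loop could be sketched as follows. This is one possible shape, not the reference solution: the helper `generate_responses`, the temperature range of 0.2 to 1.2, and the `OfflineClient` stand-in are assumptions made for illustration. The stand-in only mimics the shape of a chat completion response, so the loop can be tried without spending API calls.

```python
import random
from types import SimpleNamespace


def generate_responses(client, model, system, prompt, n_participants, seed=None):
    """Send one request per simulated participant, each with its own temperature."""
    rng = random.Random(seed)
    results = []
    for participant in range(n_participants):
        # Assumed range of 0.2 to 1.2; adjust it to what the lecture recommends.
        temperature = round(rng.uniform(0.2, 1.2), 2)
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": prompt},
            ],
            max_tokens=75,
            temperature=temperature,
        )
        results.append(
            {
                "participant": participant,
                "temperature": temperature,
                "text": response.choices[0].message.content,
            }
        )
    return results


class OfflineClient:
    """Hypothetical stand-in that mimics the response shape, for dry runs only."""

    def __init__(self):
        self.chat = SimpleNamespace(completions=SimpleNamespace(create=self._create))

    def _create(self, model, messages, max_tokens, temperature):
        text = f"(placeholder answer, temperature={temperature})"
        message = SimpleNamespace(content=text)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


# Dry run with the stand-in; swap in the real client, MODEL, system and prompt later.
dry_run = generate_responses(OfflineClient(), "any-model", "system text", "question text", 3, seed=0)
for row in dry_run:
    print(row["participant"], row["temperature"], row["text"])
```

Storing the temperature next to each text makes it easy to check later whether higher temperatures produced more varied answers.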

In [None]:
n_participants = # CHOOSE A NUMBER OF PARTICIPANTS (it is best to start with 1 and once your code works, increase it to something like 50)

# YOUR CODE HERE

### 1.4. Inspect Generated Data

In [None]:
# YOUR CODE HERE

## 2. Topic Modeling

Topic modeling is a statistical technique for identifying recurring themes in large bodies of text by detecting common patterns of word co-occurrence. Common methods are Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF). In psychology, we can use topic modeling to analyze free-text survey responses and enrich our data analysis: it lets us automate the extraction of prevalent topics, quantify qualitative data, and gain insights into respondents' mental models. Instead of the common methods mentioned above, we will take a newer approach and use a large language model to extract topics from our text dataset.

In our example, we want to know what the main topics were in the answers of the simulated participants.
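
For contrast with the LLM approach, the word co-occurrence idea that classical methods build on can be illustrated in a few lines. This is only a toy sketch: the helper `cooccurrence_counts`, its crude tokenization, and the example responses are made up for illustration, and it is neither LDA nor NMF.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(texts):
    """Count how often each pair of words appears in the same response."""
    pairs = Counter()
    for text in texts:
        # Crude tokenization: split on whitespace, strip punctuation, lowercase.
        words = {word.strip(".,!?;:").lower() for word in text.split()}
        words.discard("")
        # Sort so that each pair is always stored in the same order.
        pairs.update(combinations(sorted(words), 2))
    return pairs


# Made-up toy responses, for illustration only.
toy_responses = [
    "I sleep badly before exams.",
    "Exams make me sleep less.",
    "I enjoy long walks.",
]
counts = cooccurrence_counts(toy_responses)
print(counts[("exams", "sleep")])
```

Pairs that show up together in many responses hint at a shared theme; classical topic models formalize this intuition statistically.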

### 2.1. Set Up the API-Client for the Analysis Part

In [None]:
client = # YOUR CODE HERE (see 1.1., it's identical)

### 2.2. Set Up Prompt

Your task now is to define an instruction that makes your chatbot return the main topics across all text responses as a list of 10 words, each with an importance weight.
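
One way to build such an instruction is to number all responses and ask for a machine-readable answer. The sketch below is an assumption-laden example rather than the expected solution: the helper `build_topic_prompt`, the wording of the instruction, and the choice of JSON with `word` and `weight` keys are all illustrative.

```python
import json


def build_topic_prompt(responses, n_topics=10):
    """Combine all responses into one instruction that asks for weighted topic words."""
    numbered = "\n".join(f"{i}. {text}" for i, text in enumerate(responses, start=1))
    # A tiny format example helps the model stick to the requested structure.
    example = json.dumps([{"word": "example", "weight": 0.5}])
    return (
        f"Below are {len(responses)} free-text survey responses.\n"
        f"Return the {n_topics} single words that best capture the topics across all responses, "
        "each with an importance weight between 0 and 1.\n"
        f"Answer only with a JSON list in this format: {example}\n\n"
        f"Responses:\n{numbered}"
    )


print(build_topic_prompt(["First answer.", "Second answer."]))
```

Asking for a fixed format is a design choice that makes the reply easier to turn into Python lists afterwards; a plain comma-separated list would work too.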

In [None]:
# YOUR CODE HERE

### 2.3. Extract Topics from Text Corpus

Send a request to your chatbot as before, but this time with the new prompt. Store the 10 returned words in the variable `words` and their importance weights in the variable `relevance`. You can use a `temperature` of your choice.
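
How the reply is turned into `words` and `relevance` depends on the format your prompt asked for. The sketch below assumes the model was told to answer with a JSON list of objects with `word` and `weight` keys; the helper `parse_topics` and the sample reply are made up for illustration, with the sample standing in for `response.choices[0].message.content`.

```python
import json


def parse_topics(reply):
    """Extract words and weights from a reply containing a JSON list of word/weight items."""
    # Models sometimes wrap the JSON in extra text, so cut to the outermost brackets.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end <= start:
        raise ValueError("No JSON list found in the reply.")
    items = json.loads(reply[start : end + 1])
    words = [str(item["word"]) for item in items]
    relevance = [float(item["weight"]) for item in items]
    return words, relevance


# Made-up reply, used here instead of a real API response.
sample_reply = 'Here you go: [{"word": "stress", "weight": 0.9}, {"word": "sleep", "weight": 0.6}]'
words, relevance = parse_topics(sample_reply)
print(words, relevance)
```

Converting the weights to `float` matters because the word cloud in 2.4 needs numeric values.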

In [None]:
# YOUR CODE HERE

### 2.4. Create Wordclouds from Topics


In [None]:
# !pip install wordcloud

In [None]:
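def create_wordcloud(words, relevance):
    """Plot a word cloud in which each word is sized by its relevance weight."""
    # Map each word to its numeric weight
    word_freq = dict(zip(words, relevance))
    wordcloud = WordCloud(
        width=800, height=400, background_color="white"
    ).generate_from_frequencies(word_freq)
    plt.figure(figsize=(10, 5))
    plt.imshow(wordcloud, interpolation="bilinear")
    plt.axis("off")  # hide the axes; only the cloud itself is of interest
    plt.show()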


create_wordcloud(words, relevance)