# Large Language Models

There has been huge hype around easily accessible Large Language Models (LLMs). Tools such as ChatGPT, Copilot, and others keep appearing in new brands and versions every week. This technology is widely discussed because of its ethical and technical implications, and it is likely to stay around (officially or not) for at least a few years.

## How LLMs work

Language models are predictors. They are trained to predict a probability distribution over the next word given the $k$ previous words (called the *context*), that is:

$$
P(w_n | w_{n-1}, w_{n-2} \cdots w_{n-k})
$$

The model selects the next word by sampling from the distribution $P(w_n | w_{n-1}, w_{n-2} \cdots w_{n-k})$. The predicted word is then appended to the context. If the context is full, the oldest word, $w_{n-k}$, is discarded. The prediction then runs again. This is called *autoregressive generation*.
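
The loop described above can be sketched in a few lines of Python. This is a toy illustration that rests on strong assumptions. The hypothetical `TOY_MODEL` table stands in for a trained network. The context holds only $k = 1$ word. The probabilities were made up by hand.

```python
import random

# Hypothetical toy "language model" with a context of k = 1 word: for each previous
# word, a hand-made probability distribution over the next word. A real LLM computes
# this distribution with a neural network over a much longer context.
TOY_MODEL = {
    "<start>": {"I": 1.0},
    "I": {"took": 1.0},
    "took": {"my": 1.0},
    "my": {"dogs": 0.5, "children": 0.5},
    "dogs": {"for": 1.0},
    "for": {"a": 1.0},
    "a": {"walk": 1.0},
    "children": {"to": 1.0},
    "to": {"the": 1.0},
    "the": {"park": 1.0},
    "walk": {"<end>": 1.0},
    "park": {"<end>": 1.0},
}


def generate(model, max_words=20, seed=None):
    """Autoregressive generation: sample a word, add it to the context, repeat."""
    rng = random.Random(seed)
    context = ["<start>"]  # holds at most k = 1 word in this toy
    output = []
    for _ in range(max_words):
        distribution = model[context[-1]]  # P(w_n | w_{n-1})
        words = list(distribution)
        weights = list(distribution.values())
        next_word = rng.choices(words, weights=weights)[0]  # sample, don't just take the max
        if next_word == "<end>":
            break
        output.append(next_word)
        context = (context + [next_word])[-1:]  # inject the new word, discard the oldest
    return " ".join(output)


print(generate(TOY_MODEL, seed=0))
```

Running it with different seeds produces either of the two sentences encoded in the table. The only difference between runs is the random choice made after "my".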

## Strengths of LLMs

LLMs predict words based on their training data; that is, they *mimic* texts they have previously seen. However, this is clever mimicry, in which the model can recombine previous sentences. For example, a model trained on "I took my dogs for a walk" and "I took my children to the park" could combine these sentences and yield "I took my dog to the park".

The training data also contains dialogues, requests, and so on. As a consequence, the model learns which of the $k$ previous words are most relevant for predicting the next one, given the kind of text they belong to. Is this a conversation? A lecture? Should the prediction sound like a specialist, or a student? Should it only use words that children can understand?

Hence, by filling the context (that is, the previous $k$ words) with phrases that point towards a particular problem, we can explicitly "ask" the model, in natural language, to behave in a specific way. There are many strategies for doing this.

## Strategies for LLMs

Before experimenting with these strategies, navigate to [Google AI Studio](https://aistudio.google.com/prompts/new_chat) and, if necessary, create a free account. We will start from there.

### Chatting and behavior


#### Chat

In the *create prompt* tab, you should see a panel at the bottom that says "type something". Go ahead and type: "Hello, LLM. I am a student trying to figure out how to use LLMs for good, without running into ethical issues. Can you help me with that?"

The model will probably give you a lot of information. Save it for later; we will first explore the other options on this page.

#### System prompt

Take a look at the panel at the top labeled "system instructions". System instructions set the tone, style, and typical behavior of the model. For example, set the system instructions to: "You are a wicked witch living in a forest. You answer everything very briefly, always make a reference to your house made of candy, and creepily invite people to come in." Then ask again: "Hi. Can you tell me about LLMs?"

#### Temperature

Temperature is a parameter used in the sampling step, when the next word is drawn from the conditional distribution. The name is a metaphor borrowed from physics, as in the annealing processes used to treat metals. A higher temperature makes the distribution closer to uniform, so less likely words are picked more often. A lower temperature makes the output closer to deterministic. Move the temperature slider left and right and ask the same question several times. What happens?
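
A common way to apply temperature is sketched below. The model's raw scores $z_i$ (logits) are divided by the temperature $T$ before a softmax turns them into probabilities: $P_T(w_i) = \exp(z_i / T) / \sum_j \exp(z_j / T)$. We assume this standard formulation here; a given product may combine it with other sampling settings. The scores are hypothetical. Printing the probabilities for several values of $T$ shows two effects. As $T$ grows, the distribution flattens. As $T$ approaches zero, the probability concentrates on the top word.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores (logits) into probabilities: exp(z_i / T) / sum_j exp(z_j / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in scores]
    largest = max(scaled)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
scores = [2.0, 1.0, 0.1]
for t in (0.1, 1.0, 10.0):
    probs = softmax_with_temperature(scores, t)
    print(f"T={t}:", [round(p, 3) for p in probs])
```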

### Zero-shot prompting

*Zero-shot* prompting means giving the model instructions without any examples of the desired output. Because LLMs have been trained on a lot of data, they can produce text that makes a lot of sense even this way. This is not *creative* or *intelligent*; rather, it is a recombination of texts that most humans are unlikely to have read.

Use the system prompt to give your LLM some personality that could be useful for your research (for example: do you need an advisor? Do you need to talk to a specialist?).

One of my favourite ways to use it is to make the LLM pretend that it is a reviewer for my paper: "You are reviewer 2. You are going to find reasons to reject my paper, and ground them in actual facts. Also, give constructive feedback by indicating the ways each of the points could be addressed. However, we cannot run the experiment again - we can only rewrite the paper". This gives me timely, constructive feedback on my work, and often raises important points to discuss.

One important ethical consideration is that many communities consider it wrong or unethical to ask an LLM to write or generate deliverables for you. Science is highly grounded in trust and accountability, and a machine that predicts the next word has neither, even if the final product is of decent quality. People reading your paper want to read *your* words, not text generated automatically.

On the other hand, other communities see LLMs as powerful productivity tools whose use is advantageous. LLMs can especially benefit non-native speakers of a language by rephrasing sentences in a specific style. For example, take a difficult passage from this lesson and ask the model to rewrite it in the style of song lyrics from your favourite music genre.

### Few-shot prompting

LLMs have been trained on huge amounts of text and have thus seen much of what contemporary humans have written. This does not mean that they can generate exactly the content we want on the first try. An idea that has been shown to work well is to provide the LLM with some examples within the context, that is: "do X, Y, Z. Some examples are: A, B, C. The actual material to work with is: [the material]".

It is hard to know exactly which examples, and how many, are necessary. In practice, we usually start with zero-shot prompting and add examples as needed after analyzing the results.
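
The pattern above is easy to automate. The sketch below assembles a prompt from three parts: instructions, optional examples, and the material. With an empty list of examples it produces a zero-shot prompt, so you can start there and append examples as the results require. The function name `build_prompt`, the `(input, output)` example format, and the sample review are hypothetical choices for illustration.

```python
def build_prompt(instructions, material, examples=()):
    """Assemble "do X. Some examples are: A, B. The actual material to work with is: ..."

    With no examples this is a zero-shot prompt; each added example makes it few-shot.
    `examples` is a sequence of (input, expected_output) pairs (a hypothetical format).
    """
    parts = [instructions.strip()]
    if examples:
        parts.append("Some examples are:")
        for i, (example_input, example_output) in enumerate(examples, start=1):
            parts.append(f"Example {i}:\nInput: {example_input}\nOutput: {example_output}")
    parts.append(f"The actual material to work with is:\n{material.strip()}")
    return "\n\n".join(parts)


task = "Classify the sentiment of the review as positive, negative, or neutral. Answer with one word."
review = "The battery died after two days, but support replaced it quickly."

zero_shot = build_prompt(task, review)
few_shot = build_prompt(
    task,
    review,
    examples=[
        ("I love this phone!", "positive"),
        ("It broke on the first day.", "negative"),
    ],
)
print(few_shot)
```

Either string can then be pasted into AI Studio or sent through the API. Comparing the answers to `zero_shot` and `few_shot` is a quick way to see whether the examples help.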

### Retrieval-Augmented Generation (RAG)

LLMs work reasonably well at rephrasing and summarizing. When asked to generate content in a zero-shot setting, however, they can make things up (often called *hallucination*), producing text that reads as well-articulated but is wrong. To mitigate this, we can avoid asking for a free "generation". Instead, we ask for a summary of texts that we know are likely to contain the answer to our question. This is the idea behind Retrieval-Augmented Generation: first retrieve relevant documents, then place them in the context. For example, suppose we want to understand how LLMs perform in sentiment analysis. [This paper](https://arxiv.org/abs/2305.15005) contains interesting material to summarize, but we can also gather three different papers on the same topic:



```python
# Title, authors, and abstract of three papers on the same topic, copied by hand.
# paper1 is a raw string (r"""...""") so the backslash in \textsc is kept,
# instead of "\t" being interpreted as a tab character.
paper1 = r"""Sentiment Analysis in the Era of Large Language Models: A Reality Check
Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan, Lidong Bing
Sentiment analysis (SA) has been a long-standing research area in natural language processing. It can offer rich insights into human sentiments and opinions and has thus seen considerable interest from both academia and industry. With the advent of large language models (LLMs) such as ChatGPT, there is a great potential for their employment on SA problems. However, the extent to which existing LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper aims to provide a comprehensive investigation into the capabilities of LLMs in performing various sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifaceted analysis of subjective texts. We evaluate performance across 13 tasks on 26 datasets and compare the results against small language models (SLMs) trained on domain-specific datasets. Our study reveals that while LLMs demonstrate satisfactory performance in simpler tasks, they lag behind in more complex tasks requiring deeper understanding or structured sentiment information. However, LLMs significantly outperform SLMs in few-shot learning settings, suggesting their potential when annotation resources are limited. We also highlight the limitations of current evaluation practices in assessing LLMs' SA abilities and propose a novel benchmark, \textsc{SentiEval}, for a more comprehensive and realistic evaluation"""

paper2 = """Sentiment Analysis through LLM Negotiations
Xiaofei Sun, Xiaoya Li, Shengyu Zhang, Shuhe Wang, Fei Wu, Jiwei Li, Tianwei Zhang, Guoyin Wang
A standard paradigm for sentiment analysis is to rely on a singular LLM and makes the decision in a single round under the framework of in-context learning. This framework suffers the key disadvantage that the single-turn output generated by a single LLM might not deliver the perfect decision, just as humans sometimes need multiple attempts to get things right. This is especially true for the task of sentiment analysis where deep reasoning is required to address the complex linguistic phenomenon (e.g., clause composition, irony, etc) in the input.
To address this issue, this paper introduces a multi-LLM negotiation framework for sentiment analysis. The framework consists of a reasoning-infused generator to provide decision along with rationale, a explanation-deriving discriminator to evaluate the credibility of the generator. The generator and the discriminator iterate until a consensus is reached. The proposed framework naturally addressed the aforementioned challenge, as we are able to take the complementary abilities of two LLMs, have them use rationale to persuade each other for correction.
Experiments on a wide range of sentiment analysis benchmarks (SST-2, Movie Review, Twitter, yelp, amazon, IMDB) demonstrate the effectiveness of proposed approach: it consistently yields better performances than the ICL baseline across all benchmarks, and even superior performances to supervised baselines on the Twitter and movie review datasets.
"""

paper3 = """Sentiment Analysis in the Age of Generative AI
Jan Ole Krugmann, Jochen Hartmann
In the rapidly advancing age of Generative AI, Large Language Models (LLMs) such as ChatGPT stand at the forefront of disrupting marketing practice and research. This paper presents a comprehensive exploration of LLMs’ proficiency in sentiment analysis, a core task in marketing research for understanding consumer emotions, opinions, and perceptions. We benchmark the performance of three state-of-the-art LLMs, i.e., GPT-3.5, GPT-4, and Llama 2, against established, high-performing transfer learning models. Despite their zero-shot nature, our research reveals that LLMs can not only compete with but in some cases also surpass traditional transfer learning methods in terms of sentiment classification accuracy. We investigate the influence of textual data characteristics and analytical procedures on classification accuracy, shedding light on how data origin, text complexity, and prompting techniques impact LLM performance. We find that linguistic features such as the presence of lengthy, content-laden words improve classification performance, while other features such as single-sentence reviews and less structured social media text documents reduce performance. Further, we explore the explainability of sentiment classifications generated by LLMs. The findings indicate that LLMs, especially Llama 2, offer remarkable classification explanations, highlighting their advanced human-like reasoning capabilities. Collectively, this paper enriches the current understanding of sentiment analysis, providing valuable insights and guidance for the selection of suitable methods by marketing researchers and practitioners in the age of Generative AI.
"""
```

Then we can build a prompt that includes the three texts:

```python
prompt = f"""Make a summary of the current state of affairs in Sentiment Analysis using LLMs. Find conflicting perspectives and cite sources. Use these references:
Paper 1: {paper1}
Paper 2: {paper2}
Paper 3: {paper3}

Also, give the output written as an excerpt of a literature review. Use all citations in APA style. Do not use superlatives or embellishment at all.
"""

print(prompt)
```
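
Here the three papers were chosen by hand. In a RAG pipeline, this retrieval step is usually automated. The system ranks a collection of documents by relevance to the question and places only the best matches in the prompt. The sketch below uses a deliberately naive word-overlap score as a stand-in; practical systems typically use a search engine or embedding similarity instead. The documents, the query, and the `retrieve` helper are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple stand-in for real text processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, documents, top_k=2):
    """Return the names of the top_k documents sharing the most words with the query.

    Toy word-overlap ranking for illustration only; documents with no overlap are dropped.
    """
    query_words = set(tokenize(query))
    scored = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        scored.append((sum(counts[w] for w in query_words), name))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical mini-collection of documents.
documents = {
    "negotiation": "Two LLMs negotiate: a generator proposes a sentiment label and a discriminator evaluates it.",
    "benchmark": "We benchmark GPT-3.5, GPT-4 and Llama 2 against transfer learning models for sentiment classification.",
    "cooking": "Preheat the oven and bake the bread for forty minutes.",
}
query = "Can LLMs negotiate a sentiment label?"

hits = retrieve(query, documents)
retrieved_text = "\n\n".join(documents[name] for name in hits)
# Named rag_prompt so it does not overwrite the `prompt` variable used in later cells.
rag_prompt = f"Answer using only the sources below.\n\nSources:\n{retrieved_text}\n\nQuestion: {query}"
print(rag_prompt)
```

The irrelevant cooking text never reaches the model. This is the main point of retrieval: the context is limited, so it should be filled with material likely to contain the answer.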


Make a summary of the current state of affairs in Sentiment Analysis using LLMs. Find conflicting perspectives and cite sources. Use these references:
Paper 1: Sentiment Analysis in the Era of Large Language Models: A Reality Check
Wenxuan Zhang, Yue Deng, Bing Liu, Sinno Jialin Pan, Lidong Bing
Sentiment analysis (SA) has been a long-standing research area in natural language processing. It can offer rich insights into human sentiments and opinions and has thus seen considerable interest from both academia and industry. With the advent of large language models (LLMs) such as ChatGPT, there is a great potential for their employment on SA problems. However, the extent to which existing LLMs can be leveraged for different sentiment analysis tasks remains unclear. This paper aims to provide a comprehensive investigation into the capabilities of LLMs in performing various sentiment analysis tasks, from conventional sentiment classification to aspect-based sentiment analysis and multifacete

## Using prompts within programming

If you browse to https://ai.google.dev/gemini-api/docs, you will find an example of how Google's Gen AI API works. Before running the code, install the package (`pip install google-genai`) and get an API key from Google AI Studio. We can start with:

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # Get your API key at the Google AI Studio website!

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt,
)

print(response.text)
```

This works, but the output is free text. Let's improve the prompt so that its output can be embedded into our workflow, by asking for JSON that follows a schema:

```python
from google import genai


prompt = f"""Make a summary of the current state of affairs in Sentiment Analysis using LLMs. Find conflicting perspectives and cite sources. Use these references:
Paper 1: {paper1}
Paper 2: {paper2}
Paper 3: {paper3}

We need the output in a JSON format, using the following schema:
Author = {{'Name' : str}} # Each of the authors of the paper
Paper = {{'Title' : str, # the paper title 
  'Authors' : list[Author],
  'Main conclusion' : str # the main conclusion of the paper 
}}
Return: list[Paper]
"""

# Get your API key at the Google AI Studio website! Never publish a real key in a notebook.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt,
    config={
        'response_mime_type': 'application/json',  # VERY IMPORTANT: ask for JSON text only
    },
)

print(response.text)
```

The printed output (truncated here) looks like this:

```json
[
  {
    "Title": "Sentiment Analysis in the Era of Large Language Models: A Reality Check",
    "Authors": [
      {
        "Name": "Wenxuan Zhang"
      },
      {
        "Name": "Yue Deng"
      },
      {
        "Name": "Bing Liu"
      },
      {
        "Name": "Sinno Jialin Pan"
      },
      {
        "Name": "Lidong Bing"
      }
    ],
    "Main conclusion": "LLMs show satisfactory performance in simpler sentiment analysis tasks, but lag behind small language models (SLMs) in more complex tasks that require deeper understanding. However, LLMs outperform SLMs in few-shot learning, which means that LLMs are useful when there are few examples for training the models."
  },
  {
    "Title": "Sentiment Analysis through LLM Negotiations",
    "Authors": [
      {
        "Name": "Xiaofei Sun"
      },
      {
        "Name": "Xiaoya Li"
      },
      {
        "Name": "Shengyu Zhang"
      },
      {
        "Name": "Shuhe Wang"
      },
      {
        "Name": "Fei Wu"
...
```

Now we can parse the JSON text into a list of Python dictionaries and build a pandas DataFrame from it:

```python
import json
import pandas as pd

data = json.loads(response.text)  # a list of dicts, one per paper
df = pd.DataFrame(data)
df.head()
```

|   | Title | Authors | Main conclusion |
|---|---|---|---|
| 0 | Sentiment Analysis in the Era of Large Languag... | [{'Name': 'Wenxuan Zhang'}, {'Name': 'Yue Deng... | LLMs show satisfactory performance in simpler ... |
| 1 | Sentiment Analysis through LLM Negotiations | [{'Name': 'Xiaofei Sun'}, {'Name': 'Xiaoya Li'... | A multi-LLM negotiation framework, where LLMs ... |
| 2 | Sentiment Analysis in the Age of Generative AI | [{'Name': 'Jan Ole Krugmann'}, {'Name': 'Joche... | Large Language Models (LLMs) can compete with ... |
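
Even with `response_mime_type` set to JSON, it is worth checking the structure before trusting it. The sketch below parses the reply and tolerates a Markdown code fence around it, which a model may add when JSON output is not requested. It then checks each record against the fields we asked for in the prompt. The helper names `parse_llm_json` and `validate_papers` are hypothetical. The check is structural only: it cannot tell whether a title or conclusion is actually correct.

```python
import json

FENCE = "`" * 3  # three backticks, i.e. a Markdown code fence

# Fields requested in the prompt's schema, with the Python type json.loads produces.
REQUIRED_FIELDS = {"Title": str, "Authors": list, "Main conclusion": str}


def parse_llm_json(text):
    """Parse JSON from an LLM reply, tolerating an optional Markdown code fence around it."""
    cleaned = text.strip()
    if cleaned.startswith(FENCE):
        # Drop the opening fence line (which may carry a language tag) and the closing fence.
        cleaned = cleaned.split("\n", 1)[1] if "\n" in cleaned else ""
        cleaned = cleaned.rsplit(FENCE, 1)[0]
    return json.loads(cleaned)


def validate_papers(records):
    """Return a list of problems found; an empty list means the structure matches the schema."""
    if not isinstance(records, list):
        return ["expected a list of papers"]
    problems = []
    for i, paper in enumerate(records):
        if not isinstance(paper, dict):
            problems.append(f"item {i}: expected an object")
            continue
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in paper:
                problems.append(f"item {i}: missing '{field}'")
            elif not isinstance(paper[field], expected_type):
                problems.append(f"item {i}: '{field}' should be {expected_type.__name__}")
        authors = paper.get("Authors")
        if isinstance(authors, list):
            for j, author in enumerate(authors):
                if not (isinstance(author, dict) and isinstance(author.get("Name"), str)):
                    problems.append(f"item {i}: author {j} has no 'Name' string")
    return problems


sample = FENCE + 'json\n[{"Title": "T", "Authors": [{"Name": "A"}], "Main conclusion": "C"}]\n' + FENCE
print(validate_papers(parse_llm_json(sample)))  # prints []
print(validate_papers([{"Title": "T"}]))
```

In the notebook, you could run `validate_papers(parse_llm_json(response.text))` before building the DataFrame. That way you find missing fields before they turn into empty cells.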


## Challenge

LLMs are a very recent advance, and a new model appears every week or two. The challenge here is to find out how to use them in your own workflow. LLMs have proven useful for summarizing texts. Information retrieval, such as extracting the main conclusion of each paper as we did above, is one of the many ways to look at summarization.

Now it is in your hands. Let's do the following:

1. Find three articles that are relevant to your field of study and recent enough (published after 2022).
1. Use the procedure above to find their main conclusions, their arguments, and any other information relevant to your research.
1. Save the DataFrame you generated as an Excel spreadsheet. Check whether all the information is correct and, if necessary, correct it.
1. Use the results as a source to write a literature-review paragraph covering the three articles. Evaluate the literature review. What could be improved in it?
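
For step 3, note that the `Authors` column holds lists of dictionaries, which do not display well in a spreadsheet cell. A minimal sketch, assuming the column layout produced above, joins the names into one string before saving. Writing `.xlsx` files with pandas requires an Excel engine such as `openpyxl`, so the saving line is left commented out. `flatten_authors` and the example data are hypothetical.

```python
import pandas as pd


def flatten_authors(df):
    """Return a copy of df where 'Authors' (a list of {'Name': ...} dicts) becomes one string.

    Spreadsheet cells hold plain values, so joined names are easier to read and correct by hand.
    """
    flat = df.copy()
    flat["Authors"] = flat["Authors"].apply(
        lambda authors: "; ".join(author.get("Name", "") for author in authors)
    )
    return flat


# Hypothetical one-row example with the same columns as the DataFrame built above.
example = pd.DataFrame([
    {"Title": "Paper A", "Authors": [{"Name": "Ada"}, {"Name": "Alan"}], "Main conclusion": "..."},
])
print(flatten_authors(example)["Authors"][0])  # prints Ada; Alan

# Writing .xlsx needs an Excel engine, e.g. `pip install openpyxl`:
# flatten_authors(df).to_excel("papers.xlsx", index=False)
```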