# Generating LLM Perspectives & Comparing to Human Perspectives
So far, we've been using the human-provided responses as the "ground truth" for the full Overton window. However, we can also generate our own "ground truth" by asking LLMs to provide their own perspectives.

We then compare how the LLM perspectives map onto the human-provided responses.

This notebook reads the `habermas_machine_questions_with_responses.csv` file, generates LLM perspectives for each question, and saves them to a separate file, `habermas_machine_questions_with_LLM_generated_perspectives.csv`.

In [2]:
from dotenv import load_dotenv
import pandas as pd, numpy as np, os

# Load environment variables
load_dotenv()
DATA_PATH = os.getenv('DATA_PATH')
TEMP_PATH = os.getenv('TEMP_PATH')

df_questions = pd.read_csv(DATA_PATH+'habermas_machine_questions_with_responses.csv')

In [16]:
df_questions.head()

Unnamed: 0,question.text,own_opinion.text,question_topic,question_id,gpt-3.5-turbo,gpt-4o,gemini-1.5-flash-002,llama-3.1-8B,gemma-2b,mistral-7B
0,Are people who hold high political office and ...,['One minute I think they should disclose and ...,74,15,There is no universal ethical requirement for ...,The ethical requirement for individuals in hig...,There's no universally agreed-upon ethical sta...,The requirement for public officials to disclo...,\n\nThis question is complex and there is no e...,\n\n[INST] Are people who hold high political ...
1,Are the NHS and the UK welfare state fit for p...,"[""The NHS and Welfare are not working how they...",51,19,This is a complex question that can have diffe...,The National Health Service (NHS) and the wide...,"Whether the NHS and the UK welfare state are ""...",The NHS and the UK welfare state are complex s...,The UK's National Health Service (NHS) and its...,"\n\nThe NHS is a national treasure, but it is ..."
2,Are there any limits on what can be allowed to...,"[""We have to keep in mind some kind of decency...",52,34,"Yes, there are regulations and guidelines set ...","Yes, there are several limits and regulations ...","Yes, there are many limits on what can be broa...","Yes, there are limits on what can be allowed t...","Yes, there are many limits on what can be allo...",\n\n[INST] What are the limits on what can be ...
3,Are there any questions that we should never a...,['I cannot think of a topic which would fall i...,14,36,"There are certain topics that can be harmful, ...",The idea of restricting discussion or voting o...,There's no universally agreed-upon list of top...,While it's essential to maintain an open and i...,"Sure, there are certain questions that should ...",\n\n[INST] Are there any questions that we sho...
4,Are there limits to how much tax people should...,"[""I think there are limits for poorer people t...",52,37,There is debate and varying opinions on what c...,The question of whether there should be limits...,There's no universally agreed-upon limit to ho...,The concept of tax limits is a complex and deb...,"Sure, there are limits to how much tax people ...",\n\n[INST] What is the best way to raise money...


## Structured Generation of LLM Perspectives

In [10]:

from pydantic import BaseModel
from openai import OpenAI
import json

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

class SinglePerspective(BaseModel):
    perspective: str

class PerspectiveChain(BaseModel):
    steps: list[SinglePerspective]

def generate_perspectives(question: str):
    """
    Generate multiple answer perspectives for a question.

    Args:
        question: Context question or query

    Returns:
        A list of perspective strings, one per broad answer perspective.
    """
    system_prompt = """
You are a helpful assistant that generates multiple perspectives of answers to a question. You will be given a question and you will generate a list of possible answer perspectives. Make sure you cover all possible perspectives but do not repeat yourself.
"""

    prompt = f"""Question: {question}
Now, step by step, outline each broad answer perspective to this question."""

    chat_response = client.chat.completions.create(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=1,
        response_format={
            'type': 'json_schema',
            'json_schema': 
                {
                "name": "PerspectiveChain", 
                "schema": PerspectiveChain.model_json_schema()
                }
            } 
    )

    result_object = json.loads(chat_response.choices[0].message.content)
    return [step['perspective'] for step in result_object['steps']]


In [19]:
from tqdm import tqdm

# Using a loop rather than .apply for restart simplicity & tqdm
llm_perspectives = []
for question in tqdm(df_questions['question.text'], desc="Generating perspectives"):
    perspectives = generate_perspectives(question)
    llm_perspectives.append(perspectives)

df_questions['llm_perspectives'] = llm_perspectives

Generating perspectives: 100%|██████████| 100/100 [06:47<00:00,  4.08s/it]
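
The comment above mentions restart simplicity. A minimal sketch of what a restartable version of this loop could look like follows; `generate_with_checkpoint` and the checkpoint file are hypothetical helpers, not part of this notebook, and the sketch assumes each question text is unique so it can serve as a dictionary key.

```python
import json
import os

def generate_with_checkpoint(questions, generate_fn, checkpoint_path):
    """Call generate_fn once per question, saving results so a rerun can resume."""
    results = {}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            results = json.load(f)
    for question in questions:
        if question in results:
            continue  # already generated in an earlier run
        results[question] = generate_fn(question)
        # Rewrite the checkpoint after every question so little work is lost on a crash.
        with open(checkpoint_path, "w") as f:
            json.dump(results, f)
    return [results[q] for q in questions]
```

With the names used in this notebook, the call would presumably look like `generate_with_checkpoint(list(df_questions['question.text']), generate_perspectives, TEMP_PATH + 'perspectives_checkpoint.json')`, where the checkpoint filename is an arbitrary choice.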


In [22]:
df_llm_perspectives = df_questions[['question.text', 'question_topic', 'question_id', 'llm_perspectives']]
df_llm_perspectives.head()

Unnamed: 0,question.text,question_topic,question_id,llm_perspectives
0,Are people who hold high political office and ...,74,15,"[Yes, for Transparency and Accountability: Pub..."
1,Are the NHS and the UK welfare state fit for p...,51,19,[Advocates of the current system often argue t...
2,Are there any limits on what can be allowed to...,52,34,[Legal and Regulatory Perspective: Various cou...
3,Are there any questions that we should never a...,14,36,[Some argue that there are indeed questions wh...
4,Are there limits to how much tax people should...,52,37,"[Yes, there should be limits: A perspective th..."


In [23]:
# To keep things separate and clean, we're going to save these to a different file.
df_llm_perspectives.to_csv(DATA_PATH+'habermas_machine_questions_with_LLM_generated_perspectives.csv', index=False) 

# Comparative Analysis



In [3]:
import ast

df_llm_perspectives = pd.read_csv(DATA_PATH+'habermas_machine_questions_with_LLM_generated_perspectives.csv')
df_questions = pd.read_csv(DATA_PATH+'habermas_machine_questions_with_responses.csv')
df_questions['own_opinion.text'] = df_questions['own_opinion.text'].apply(ast.literal_eval)
df_llm_perspectives['llm_perspectives'] = df_llm_perspectives['llm_perspectives'].apply(ast.literal_eval)

In [4]:
# Match the df_llm_perspectives and df_questions by question_id
df_llm_perspectives = df_llm_perspectives.merge(df_questions[['question_id', 'own_opinion.text']], on='question_id', how='left')
df_llm_perspectives.head()

Unnamed: 0,question.text,question_topic,question_id,llm_perspectives,own_opinion.text
0,Are people who hold high political office and ...,74,15,"[Yes, for Transparency and Accountability: Pub...",[One minute I think they should disclose and t...
1,Are the NHS and the UK welfare state fit for p...,51,19,[Advocates of the current system often argue t...,[The NHS and Welfare are not working how they ...
2,Are there any limits on what can be allowed to...,52,34,[Legal and Regulatory Perspective: Various cou...,[We have to keep in mind some kind of decency ...
3,Are there any questions that we should never a...,14,36,[Some argue that there are indeed questions wh...,[I cannot think of a topic which would fall in...
4,Are there limits to how much tax people should...,52,37,"[Yes, there should be limits: A perspective th...",[I think there are limits for poorer people to...


In [11]:
from openai import OpenAI
import json

client = OpenAI(api_key=os.getenv('OPENAI_API_KEY'))

def are_these_perspectives_the_same(perspective_a: str, perspective_b: str):
    """
    Determine if two perspectives are the same.
    """
    system_prompt = f"""You will be given two perspectives and you will determine if they are the same. Read carefully the two perspectives and answer yes if they are expressing the same broad perspective or opinion. Answer no otherwise. ONLY say a single word: 'yes' or 'no'."""

    prompt = f"""Perspective A: {perspective_a}
Perspective B: {perspective_b}
Are these the same perspective/opinion? Yes/no answer:"""

    chat_response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt}
        ],
        temperature=0,
        max_tokens=1
    )
    return 1 if chat_response.choices[0].message.content.strip().lower() == 'yes' else 0

In [9]:
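row = df_llm_perspectives.iloc[0]  # first question, the same row the loop below stops on
# Build every row as its own list; [[0]*n]*m would repeat one shared row object.
same_perspective_matrix = [[0] * len(row['own_opinion.text']) for _ in row['llm_perspectives']]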

In [10]:
same_perspective_matrix

[[0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0],
 [0, 0, 0, 0, 0]]
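
One pitfall to keep in mind when pre-allocating a nested list like this: multiplying the outer list with `*` repeats references to a single inner list, so a write to one row shows up in every row. The short sketch below uses small illustrative dimensions rather than the real data, and contrasts that behavior with a list comprehension that builds independent rows.

```python
n_opinions, n_perspectives = 3, 2  # illustrative sizes, not taken from the dataset

# Outer `*` copies the reference to one inner list, so all rows are the same object.
aliased = [[0] * n_opinions] * n_perspectives
aliased[0][1] = 1

# A comprehension evaluates `[0] * n_opinions` once per row, giving separate lists.
independent = [[0] * n_opinions for _ in range(n_perspectives)]
independent[0][1] = 1

print(aliased)      # [[0, 1, 0], [0, 1, 0]]
print(independent)  # [[0, 1, 0], [0, 0, 0]]
```

Both matrices print identically while they are all zeros, which is why the problem is easy to miss until values are written into them.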

In [12]:
for index, row in df_llm_perspectives.iterrows():
    print(row['question.text'])
    print(row['own_opinion.text'])
    # Generate a pairwise comparison of each LLM perspective against each human opinion.
    # This needs one API call per (perspective, opinion) pair, and each call carries two full texts,
    # so it can be very slow (20s per question).
    # Build every row as its own list; [[0]*n]*m would make all rows share one list.
    same_perspective_matrix = [[0] * len(row['own_opinion.text']) for _ in row['llm_perspectives']]
    for i, perspective in enumerate(row['llm_perspectives']):
        for j, own_opinion in enumerate(row['own_opinion.text']):
            same_perspective_matrix[i][j] = are_these_perspectives_the_same(perspective, own_opinion)
    print(same_perspective_matrix)
    break

Are people who hold high political office and have a significant influence on public life ethically required to disclose details about their family wealth?
['One minute I think they should disclose and then the next I change my mind. Their personal wealth should not be a factor in their role however I could see that this could result in possible conflicts.', 'I think in this case it is not necessary for the individual in political office to disclose their income, i think this is essentially a private matter and does not really make a difference in their policies, i am not concerned about their wealth whether they are financially rich or poor, i am interested in what good and positive change they are making in their office, i think it is their personal life and no one else concern in overall conclusion.', 'I think people in any position should be able to have privacy.  I do think though depending on the job they have & any potential control they have on public finances or services would

In [25]:
row['llm_perspectives'][2]

'No, Right to Privacy: Public figures and their families have a right to privacy, and disclosing family wealth may be an unnecessary invasion of personal and family privacy.'

In [22]:
row['own_opinion.text'][4]

"I generally agree that this should happen and they ethically, should do so. Those in political office and high wealth, can influence both the public but also business and commercial activities, particularly if they have large stakes in businesses. I don't think it should be mandatory, however, being in such powerful positions the individual should morally and ethically feel obliged to disclose their tax affairs and family wealth. Naturally, individuals in powerful positions won't want to disclose their wealth as they feel it may distance themselves from the public, but being more open could only improve the relationship with the public."
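
Once a match matrix is filled in, one way to summarize it is to ask which human opinions are matched by at least one LLM perspective, and which LLM perspectives match no human opinion. The sketch below assumes the matrix is indexed as `matrix[i][j]` (LLM perspective `i`, human opinion `j`) with 0/1 entries, as in the loop above. `summarize_matches` and the example matrix are hypothetical and not part of this notebook.

```python
def summarize_matches(matrix):
    """Summarize a 0/1 match matrix with rows = LLM perspectives, columns = human opinions."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    # A human opinion is "covered" if any LLM perspective matched it.
    covered = [any(matrix[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # An LLM perspective is "unmatched" if it matched no human opinion.
    unmatched = [i for i in range(n_rows) if not any(matrix[i])]
    return {
        "human_coverage": sum(covered) / n_cols if n_cols else 0.0,
        "uncovered_human_opinions": [j for j, c in enumerate(covered) if not c],
        "unmatched_llm_perspectives": unmatched,
    }

# Made-up matrix: 3 LLM perspectives (rows) by 3 human opinions (columns).
example = [
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 0],
]
print(summarize_matches(example))
```

In the made-up example, two of the three human opinions are covered, the third is not, and the second LLM perspective matches nothing.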

### WIP Notes

It's really hard to determine whether two perspectives are the same. We're going to need to iterate on this.

In addition, using the LLM to do so is reasonably slow (but not too bad). Perhaps a clustering method would be better?
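
As a rough illustration of a cheaper alternative, the sketch below scores every pair with bag-of-words cosine similarity using only the standard library. This is an assumption for illustration, not a method used in this notebook: word overlap is a weak proxy for agreement (two opposing opinions can share most of their vocabulary), so sentence embeddings would likely replace the `Counter` vectors in practice, and any cutoff would need checking against LLM or human judgments. The helper names are hypothetical.

```python
import math
import re
from collections import Counter

def bow_vector(text):
    """Lowercased word counts for a text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two Counter word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(llm_perspectives, human_opinions):
    """Rows = LLM perspectives, columns = human opinions, entries in [0, 1]."""
    llm_vecs = [bow_vector(p) for p in llm_perspectives]
    human_vecs = [bow_vector(o) for o in human_opinions]
    return [[cosine_similarity(p, o) for o in human_vecs] for p in llm_vecs]
```

A matrix like this could feed a clustering step, or act as a pre-filter so that only the higher-scoring pairs are sent to `are_these_perspectives_the_same`.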