# GPT-5 Reasoning Analysis: Predicting Human Responses

This notebook examines GPT-5's reasoning when predicting how humans would respond in the false-belief task.

**Central Question**: Does GPT-5 actually "understand" the curse-of-knowledge bias, or is it just pattern matching?

## Structure

1. **Part 1**: Sample analysis of GPT-5-low reasoning (Alice/Beth, violin, first color set)
2. **Part 2**: Position-tracking analysis across conditions
3. **Part 3**: Reasoning effort comparison (Low vs Medium vs High)
4. **Part 4**: Standard curse-of-knowledge effect measures
5. **Key Findings**: Summary of all results

In [1]:
import pandas as pd
import json
import re
import textwrap

# Load data
df = pd.read_csv("model_responses_predict_human.csv")
print(f"Total rows: {len(df)}")

Total rows: 1350


In [2]:
# Filter to GPT-5-low, first name pair, first color set, violin
subset = df[
    (df['model'] == 'gpt-5-low') &
    (df['names'] == "('Alice', 'Beth')") &
    (df['colors'] == "('blue', 'purple', 'red', 'green')") &
    (df['stimulus'].str.contains('violin'))
].copy()

print(f"Filtered to {len(subset)} rows")
print(f"Conditions: {subset['condition'].unique()}")
print(f"Trials per condition: {subset.groupby('condition').size().to_dict()}")

Filtered to 15 rows
Conditions: ['Ignorance' 'Knowledge-plausible' 'Knowledge-implausible']
Trials per condition: {'Ignorance': 5, 'Knowledge-implausible': 5, 'Knowledge-plausible': 5}


In [3]:
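# Parse percentages from results
def extract_percentages(result_str):
    """Return the JSON dict inside <answer>...</answer>, or None if absent or malformed."""
    try:
        match = re.search(r'<answer>\s*(\{.*?\})\s*</answer>', result_str, re.DOTALL)
        if match:
            return json.loads(match.group(1))
    except (TypeError, json.JSONDecodeError):
        # TypeError: result_str is not a string (e.g. NaN); JSONDecodeError: malformed JSON
        pass
    return None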

subset['percentages'] = subset['result'].apply(extract_percentages)
print("Percentages parsed successfully")

Percentages parsed successfully
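
The parser returns `None` for any response without a well-formed `<answer>` block, so it can help to check both the parse and the totals before averaging. The following is a minimal sketch, assuming each answer is a flat JSON object mapping color names to numeric percentages. `parse_answer`, `sums_to_100`, and the sample strings are hypothetical illustrations, not rows from the dataset.

```python
import json
import re

ANSWER_RE = re.compile(r"<answer>\s*(\{.*?\})\s*</answer>", re.DOTALL)

def parse_answer(text):
    """Return the dict inside <answer>...</answer>, or None if missing or malformed."""
    if not isinstance(text, str):  # e.g. NaN from an empty CSV cell
        return None
    match = ANSWER_RE.search(text)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None

def sums_to_100(pct, tol=0.5):
    """True when the parsed percentages total 100 within a small tolerance."""
    return pct is not None and abs(sum(pct.values()) - 100) <= tol

# Hypothetical responses, for illustration only
good = 'Reasoning... <answer>{"blue": 80, "purple": 4, "red": 12, "green": 4}</answer>'
bad = "No answer tag here"

print(parse_answer(good))
print(sums_to_100(parse_answer(good)), sums_to_100(parse_answer(bad)))
```

Counting how many rows fail either check gives a more informative status line than an unconditional success message.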


## Condition 1: Ignorance

In this condition, the model is told the violin was "moved to another container" (no specific location).

**Expected human behavior**: High probability on blue (where Alice believes violin is)

In [4]:
ignorance = subset[subset['condition'] == 'Ignorance'].sort_values('trial')

print("="*100)
print("IGNORANCE CONDITION - GPT-5-low Reasoning")
print("="*100)

for idx, row in ignorance.iterrows():
    print(f"\n{'#'*100}")
    print(f"# TRIAL {row['trial']}")
    print(f"{'#'*100}")
    
    # Show percentages
    pct = row['percentages']
    if pct:
        print(f"\nPREDICTED DISTRIBUTION: Blue={pct.get('blue', '?')}%, Purple={pct.get('purple', '?')}%, Red={pct.get('red', '?')}%, Green={pct.get('green', '?')}%")
    
    # Show reasoning
    reasoning = row['reasoning']
    if pd.notna(reasoning) and reasoning.strip():
        print(f"\nREASONING:")
        print("-"*80)
        # Wrap text for readability
        wrapped = textwrap.fill(reasoning, width=100)
        print(wrapped)
    else:
        print("\n[No reasoning captured for this trial]")

IGNORANCE CONDITION - GPT-5-low Reasoning

####################################################################################################
# TRIAL 0
####################################################################################################

PREDICTED DISTRIBUTION: Blue=80%, Purple=4%, Red=12%, Green=4%

REASONING:
--------------------------------------------------------------------------------
**Analyzing false-belief task**  I need to infer how people predict behavior in a false-belief task
involving container movements. Initially, Alice places a violin in a blue container, but then Beth
moves it and swaps container positions, changing the layout. Average humans generally expect Alice
to search where she last left the violin, in the blue container. There's a subtlety here: some might
focus on the container's identity, while others may think Alice encoded its location. It's an
interesting cognitive challenge!

#############################################################

## Condition 2: Knowledge-Plausible

In this condition, the model is told the violin was moved to the **red container** (which now occupies blue's original position).

**Expected human behavior**: Still high on blue, but humans show curse-of-knowledge bias (inflated red)

In [5]:
knowledge_plausible = subset[subset['condition'] == 'Knowledge-plausible'].sort_values('trial')

print("="*100)
print("KNOWLEDGE-PLAUSIBLE CONDITION - GPT-5-low Reasoning")
print("="*100)

for idx, row in knowledge_plausible.iterrows():
    print(f"\n{'#'*100}")
    print(f"# TRIAL {row['trial']}")
    print(f"{'#'*100}")
    
    # Show percentages
    pct = row['percentages']
    if pct:
        print(f"\nPREDICTED DISTRIBUTION: Blue={pct.get('blue', '?')}%, Purple={pct.get('purple', '?')}%, Red={pct.get('red', '?')}%, Green={pct.get('green', '?')}%")
    
    # Show reasoning
    reasoning = row['reasoning']
    if pd.notna(reasoning) and reasoning.strip():
        print(f"\nREASONING:")
        print("-"*80)
        wrapped = textwrap.fill(reasoning, width=100)
        print(wrapped)
    else:
        print("\n[No reasoning captured for this trial]")

KNOWLEDGE-PLAUSIBLE CONDITION - GPT-5-low Reasoning

####################################################################################################
# TRIAL 0
####################################################################################################

PREDICTED DISTRIBUTION: Blue=85%, Purple=3%, Red=10%, Green=2%

REASONING:
--------------------------------------------------------------------------------
**Summarizing container identity responses**  I’ve found that when average participants are asked
about container identity, a significant majority choose blue. Based on studies, around 80-90% prefer
the original container. So, I’m thinking of setting something like blue at 85%, red at 10%, purple
at 3%, and green at 2%. I need to double-check that it adds up to 100%. Ultimately, I’ll present the
final choice with values like blue 85, red 10, purple 3, green 2 in a JSON format.

###############################################################################################

## Condition 3: Knowledge-Implausible

In this condition, the model is told the violin was moved to the **purple container** (not in blue's original position).

**Expected human behavior**: Similar to Ignorance (purple location doesn't create plausible alternative)

In [6]:
knowledge_implausible = subset[subset['condition'] == 'Knowledge-implausible'].sort_values('trial')

print("="*100)
print("KNOWLEDGE-IMPLAUSIBLE CONDITION - GPT-5-low Reasoning")
print("="*100)

for idx, row in knowledge_implausible.iterrows():
    print(f"\n{'#'*100}")
    print(f"# TRIAL {row['trial']}")
    print(f"{'#'*100}")
    
    # Show percentages
    pct = row['percentages']
    if pct:
        print(f"\nPREDICTED DISTRIBUTION: Blue={pct.get('blue', '?')}%, Purple={pct.get('purple', '?')}%, Red={pct.get('red', '?')}%, Green={pct.get('green', '?')}%")
    
    # Show reasoning
    reasoning = row['reasoning']
    if pd.notna(reasoning) and reasoning.strip():
        print(f"\nREASONING:")
        print("-"*80)
        wrapped = textwrap.fill(reasoning, width=100)
        print(wrapped)
    else:
        print("\n[No reasoning captured for this trial]")

KNOWLEDGE-IMPLAUSIBLE CONDITION - GPT-5-low Reasoning

####################################################################################################
# TRIAL 0
####################################################################################################

PREDICTED DISTRIBUTION: Blue=85%, Purple=8%, Red=6%, Green=1%

REASONING:
--------------------------------------------------------------------------------
**Analyzing false-belief task**  I'm thinking about a false-belief task related to human responses.
Alice places a violin in a blue container initially, while Beth moves it to a purple container and
swaps the positions. The new order ends up being red, green, purple, and blue from left to right.
The question is where Alice will look first. Most predict she’ll look in the original blue
container, as she’s unaware of the swap, but is there a twist? Some may confuse the location with
the container identity. The original location of blue is now occupied by red.

############

## Summary: What Does GPT-5 Think About Human Cognition?

In [7]:
# Compute averages per condition
print("="*100)
print("SUMMARY: GPT-5-low Predictions (Alice/Beth, violin, first color set)")
print("="*100)

for condition in ['Ignorance', 'Knowledge-plausible', 'Knowledge-implausible']:
    cond_data = subset[subset['condition'] == condition]
    
    blues = [p.get('blue', 0) for p in cond_data['percentages'] if p]
    reds = [p.get('red', 0) for p in cond_data['percentages'] if p]
    
    print(f"\n{condition}:")
    print(f"  Blue: {sum(blues)/len(blues):.1f}% (range: {min(blues)}-{max(blues)})")
    print(f"  Red:  {sum(reds)/len(reds):.1f}% (range: {min(reds)}-{max(reds)})")
    
    # Count how many trials have reasoning
    has_reasoning = cond_data['reasoning'].notna() & (cond_data['reasoning'] != '')
    print(f"  Trials with reasoning: {has_reasoning.sum()}/{len(cond_data)}")

SUMMARY: GPT-5-low Predictions (Alice/Beth, violin, first color set)

Ignorance:
  Blue: 83.4% (range: 80-90)
  Red:  8.4% (range: 3-12)
  Trials with reasoning: 4/5

Knowledge-plausible:
  Blue: 81.6% (range: 70-88)
  Red:  12.8% (range: 9-20)
  Trials with reasoning: 4/5

Knowledge-implausible:
  Blue: 70.4% (range: 25-85)
  Red:  20.4% (range: 6-65)
  Trials with reasoning: 3/5


## Key Questions to Answer from the Reasoning

1. **Does GPT-5 explicitly mention "curse of knowledge" or similar concepts?**
2. **Does it reason about Alice's perspective vs. the reader's perspective?**
3. **Does it mention that humans are imperfect at perspective-taking?**
4. **Does it explain WHY humans might assign probability to red (the true location)?**
5. **Is the reasoning consistent across trials, or does it vary?**

In [8]:
# Search for key phrases in reasoning
print("="*100)
print("KEYWORD ANALYSIS IN GPT-5 REASONING")
print("="*100)

keywords = [
    'curse of knowledge',
    'curse-of-knowledge',
    'bias',
    'egocentric',
    'perspective',
    'false belief',
    'theory of mind',
    'ToM',
    'cognitive',
    'limitation',
    'imperfect',
    'error',
    'mistake',
    'contamina',  # contamination, contaminated
    'leak',       # knowledge leak
    'intrusion',
    'Alice\'s perspective',
    'her perspective',
    'she believes',
    'she thinks',
    'doesn\'t know',
    'does not know',
    'unaware'
]

all_reasoning = subset['reasoning'].dropna().str.lower()

print(f"\nSearching {len(all_reasoning)} reasoning texts...\n")

for kw in keywords:
    count = all_reasoning.str.contains(kw.lower(), regex=False).sum()
    if count > 0:
        print(f"  '{kw}': found in {count} reasoning(s)")

KEYWORD ANALYSIS IN GPT-5 REASONING

Searching 11 reasoning texts...

  'false belief': found in 1 reasoning(s)
  'cognitive': found in 1 reasoning(s)
  'she believes': found in 1 reasoning(s)
  'unaware': found in 1 reasoning(s)
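
Plain substring matching has two blind spots for this keyword list: lowercasing `'ToM'` to `'tom'` would also match words such as "bottom", and `'false belief'` does not match the hyphenated "false-belief" that appears in the reasoning shown earlier. A sketch of a stricter matcher follows. `keyword_pattern`, `count_matches`, and the sample texts are hypothetical and not part of the original analysis.

```python
import re

def keyword_pattern(keyword, whole_word=False):
    """Compile a case-insensitive regex for a keyword.

    Spaces and hyphens inside the keyword match any run of hyphens or
    whitespace. A leading word boundary is always required; a trailing one
    is added only when whole_word is True, so stems like 'contamina' still
    match 'contamination'.
    """
    parts = re.split(r"[\s-]+", keyword.strip())
    body = r"[\s-]+".join(re.escape(p) for p in parts)
    suffix = r"\b" if whole_word else ""
    return re.compile(r"\b" + body + suffix, re.IGNORECASE)

def count_matches(texts, keyword, whole_word=False):
    """Number of texts containing at least one match for the keyword."""
    pattern = keyword_pattern(keyword, whole_word)
    return sum(1 for t in texts if pattern.search(t))

# Hypothetical reasoning snippets, for illustration only
texts = [
    "Analyzing a false-belief task about container movements.",
    "The violin is at the bottom of the list.",
    "This is a classic ToM problem; contamination is possible.",
]

print(count_matches(texts, "false belief"))
print(count_matches(texts, "ToM", whole_word=True))
print(count_matches(texts, "contamina"))
```

Short acronyms are the main case for `whole_word=True`; stems that are meant to match several word forms should leave it off.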


## Part 2: Position-Tracking Analysis Across Conditions

The "location vs identity" framing appears in GPT-5's reasoning. Does it affect predictions differently across conditions?

In [9]:
# Setup: Understanding the position-tracking issue
print("="*100)
print("POSITION-TRACKING: Why It Matters Differently Across Conditions")
print("="*100)

print("""
AFTER THE SWAP, container positions become:
  Set 1: (red, green, purple, blue) - RED is now leftmost
  Set 2: (purple, blue, red, green) - PURPLE is now leftmost

KEY INSIGHT:
  In K-Plausible:
    - True location = red (set 1) / purple (set 2)
    - Leftmost after swap = red (set 1) / purple (set 2)
    - TRUE LOCATION = LEFTMOST POSITION (they converge!)
    
  In K-Implausible:
    - True location = purple (set 1) / red (set 2)  
    - Leftmost after swap = red (set 1) / purple (set 2)
    - TRUE LOCATION ≠ LEFTMOST POSITION (they conflict!)

This means:
  - In K-P: "Position-tracking" and "reality-tracking" point to the SAME container
  - In K-I: "Position-tracking" and "reality-tracking" point to DIFFERENT containers
""")

POSITION-TRACKING: Why It Matters Differently Across Conditions

AFTER THE SWAP, container positions become:
  Set 1: (red, green, purple, blue) - RED is now leftmost
  Set 2: (purple, blue, red, green) - PURPLE is now leftmost

KEY INSIGHT:
  In K-Plausible:
    - True location = red (set 1) / purple (set 2)
    - Leftmost after swap = red (set 1) / purple (set 2)
    - TRUE LOCATION = LEFTMOST POSITION (they converge!)
    
  In K-Implausible:
    - True location = purple (set 1) / red (set 2)  
    - Leftmost after swap = red (set 1) / purple (set 2)
    - TRUE LOCATION ≠ LEFTMOST POSITION (they conflict!)

This means:
  - In K-P: "Position-tracking" and "reality-tracking" point to the SAME container
  - In K-I: "Position-tracking" and "reality-tracking" point to DIFFERENT containers
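
The convergence and conflict cases can be checked mechanically. This is a sketch under two assumptions: the `colors` tuples list the containers left to right before the swap, with the object starting in the leftmost one, and the swap is the fixed permutation `SWAP` below, which is inferred from the two printed layouts rather than taken from the stimulus code. `apply_swap` and `tracking_targets` are hypothetical helper names.

```python
# Assumed permutation: new position i holds the container originally at SWAP[i].
# Inferred from the two layouts printed above, not from the stimulus generator.
SWAP = (2, 3, 1, 0)

def apply_swap(original):
    """Return the left-to-right container order after the swap."""
    return tuple(original[i] for i in SWAP)

def tracking_targets(original, true_location):
    """Compare the position-tracking target with the object's true location."""
    after = apply_swap(original)
    belief = original[0]        # assumed: the object starts in the leftmost container
    position_target = after[0]  # container now occupying that leftmost slot
    return {
        "belief": belief,
        "position_target": position_target,
        "converge": position_target == true_location,
    }

set_1 = ("blue", "purple", "red", "green")
set_2 = ("green", "red", "purple", "blue")

print(apply_swap(set_1), apply_swap(set_2))
print(tracking_targets(set_1, "red"))     # K-Plausible, set 1
print(tracking_targets(set_1, "purple"))  # K-Implausible, set 1
```

Under these assumptions the position-tracking target equals the true location in K-Plausible and differs from it in K-Implausible, for both color sets.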



In [10]:
# Quantitative comparison: K-P vs K-I stability
print("="*100)
print("COMPARING PREDICTION STABILITY: K-Plausible vs K-Implausible")
print("="*100)

# Load and prepare full data
df_full = pd.read_csv("model_responses_predict_human.csv")
gpt5_low = df_full[df_full['model'] == 'gpt-5-low'].copy()
gpt5_low['pct'] = gpt5_low['result'].apply(extract_percentages)

# Color set definitions
color_set_1 = "('blue', 'purple', 'red', 'green')"
color_set_2 = "('green', 'red', 'purple', 'blue')"

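# Apply mapping to get belief container % (0 when the answer could not be parsed).
# The parse check comes first so a missing dict never reaches .get().
gpt5_low['belief_pct'] = gpt5_low.apply(
    lambda row: 0 if not row['pct']
                else row['pct'].get('blue', 0) if row['colors'] == color_set_1
                else row['pct'].get('green', 0),
    axis=1
)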

# Analyze each condition
results = []
for cond in ['Knowledge-plausible', 'Knowledge-implausible']:
    cond_data = gpt5_low[gpt5_low['condition'] == cond]
    outliers = cond_data[cond_data['belief_pct'] < 50]
    
    results.append({
        'Condition': cond,
        'Mean': f"{cond_data['belief_pct'].mean():.1f}%",
        'Std Dev': f"{cond_data['belief_pct'].std():.1f}%",
        'Range': f"{cond_data['belief_pct'].min():.0f}-{cond_data['belief_pct'].max():.0f}%",
        'Outliers (<50%)': f"{len(outliers)} ({100*len(outliers)/len(cond_data):.1f}%)"
    })

results_df = pd.DataFrame(results)
print("\nBelief Container Predictions (after color mapping):\n")
print(results_df.to_string(index=False))

print("\n" + "="*100)
print("OUTLIER ANALYSIS: Cases where belief container < 50%")
print("="*100)

for cond in ['Knowledge-plausible', 'Knowledge-implausible']:
    cond_data = gpt5_low[gpt5_low['condition'] == cond]
    outliers = cond_data[cond_data['belief_pct'] < 50]
    
    print(f"\n{cond}: {len(outliers)} outliers")
    
    if len(outliers) > 0:
        for idx, row in outliers.iterrows():
            pct = row['pct']
            max_container = max(pct, key=pct.get)
            print(f"  - Trial {row['trial']}: {max_container}={pct[max_container]}% (belief container={row['belief_pct']:.0f}%)")

COMPARING PREDICTION STABILITY: K-Plausible vs K-Implausible

Belief Container Predictions (after color mapping):

            Condition  Mean Std Dev   Range Outliers (<50%)
  Knowledge-plausible 80.9%    8.9% 20-100%        1 (1.1%)
Knowledge-implausible 76.3%   14.9%  17-92%        5 (5.6%)

OUTLIER ANALYSIS: Cases where belief container < 50%

Knowledge-plausible: 1 outliers
  - Trial 3: red=70% (belief container=20%)

Knowledge-implausible: 5 outliers
  - Trial 3: red=65% (belief container=25%)
  - Trial 1: red=65% (belief container=25%)
  - Trial 4: red=68% (belief container=17%)
  - Trial 1: red=70% (belief container=20%)
  - Trial 2: purple=72% (belief container=18%)
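
"Trial 1" appears twice in the K-Implausible list because trial numbers restart for each name pair and color set, so the trial number alone does not identify a row. The sketch below builds unambiguous outlier labels and keeps unparsed answers separate from genuine low values. `BELIEF_CONTAINER`, `belief_pct`, `outlier_label`, and the sample rows are hypothetical; the color-to-belief mapping mirrors the one used in the cell above.

```python
# Belief container per color set, mirroring the mapping in the analysis cell
BELIEF_CONTAINER = {
    "('blue', 'purple', 'red', 'green')": "blue",
    "('green', 'red', 'purple', 'blue')": "green",
}

def belief_pct(pct, colors):
    """Percentage on the belief container, or None when the answer did not parse."""
    if not pct:
        return None
    return pct.get(BELIEF_CONTAINER[colors], 0)

def outlier_label(row, threshold=50):
    """Describe a row whose belief-container share is below threshold, else None."""
    share = belief_pct(row["pct"], row["colors"])
    if share is None or share >= threshold:
        return None
    top = max(row["pct"], key=row["pct"].get)
    return (f"{row['names']} | {row['colors']} | trial {row['trial']}: "
            f"{top}={row['pct'][top]}% (belief container={share}%)")

# Hypothetical rows, for illustration only
rows = [
    {"names": "('Alice', 'Beth')", "colors": "('blue', 'purple', 'red', 'green')",
     "trial": 1, "pct": {"blue": 25, "purple": 5, "red": 65, "green": 5}},
    {"names": "('Alice', 'Beth')", "colors": "('blue', 'purple', 'red', 'green')",
     "trial": 2, "pct": {"blue": 85, "purple": 8, "red": 6, "green": 1}},
    {"names": "('Alice', 'Beth')", "colors": "('green', 'red', 'purple', 'blue')",
     "trial": 1, "pct": None},
]

for row in rows:
    label = outlier_label(row)
    if label:
        print(label)
```

Returning `None` for unparsed answers is a design choice: mapping them to 0 would count them as outliers and pull the condition mean down.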


In [11]:
# Full K-P framing analysis
print("="*100)
print("FRAMING ANALYSIS: Knowledge-Plausible (Full Dataset)")
print("="*100)

kp_full = gpt5_low[(gpt5_low['condition'] == 'Knowledge-plausible') & 
                   (gpt5_low['reasoning'].notna())].copy()

print(f"Total K-P trials with reasoning: {len(kp_full)}")

# Define phrase categories
location_phrases = ['track location', 'tracking location', 'location instead of color', 
                   'position instead', 'location-based', 'original position', 'leftmost',
                   'original location', 'by location', 'spatial location']
knowledge_phrases = ['they know', 'because they know', 'reality bias', 'reality reasoning',
                    'leak', 'where it actually is', 'actually in red', 'true location',
                    'know that', 'know the', 'know where']

location_count = 0
knowledge_count = 0

for idx, row in kp_full.iterrows():
    reasoning = row['reasoning'].lower()
    
    if any(phrase.lower() in reasoning for phrase in location_phrases):
        location_count += 1
    if any(phrase.lower() in reasoning for phrase in knowledge_phrases):
        knowledge_count += 1

print(f"\nFraming Analysis:")
print(f"  Location-confusion framing (Alice confuses position/color): {location_count} ({100*location_count/len(kp_full):.1f}%)")
print(f"  Participant-knowledge framing (participants know reality): {knowledge_count} ({100*knowledge_count/len(kp_full):.1f}%)")
print(f"\n  Ratio: {location_count}:{knowledge_count} (location:knowledge)")

# Examples of correct framing
print("\n" + "="*100)
print("EXAMPLES: Participant-Knowledge Framing (correct understanding)")
print("="*100)

examples_shown = 0
for idx, row in kp_full.iterrows():
    reasoning = row['reasoning']
    reasoning_lower = reasoning.lower()
    
    for phrase in knowledge_phrases:
        if phrase.lower() in reasoning_lower and examples_shown < 4:
            print(f"\n--- {row['names']} Trial {row['trial']} (phrase: '{phrase}') ---")
            print(reasoning[:400] + "..." if len(reasoning) > 400 else reasoning)
            examples_shown += 1
            break

FRAMING ANALYSIS: Knowledge-Plausible (Full Dataset)
Total K-P trials with reasoning: 77

Framing Analysis:
  Location-confusion framing (Alice confuses position/color): 32 (41.6%)
  Participant-knowledge framing (participants know reality): 7 (9.1%)

  Ratio: 32:7 (location:knowledge)

EXAMPLES: Participant-Knowledge Framing (correct understanding)

--- ('Denise', 'Vicky') Trial 4 (phrase: 'they know') ---
**Distributing color preferences**

I'm thinking about how people might respond to color choices. Some could get confused and choose the leftmost red, while others might pick red because they know the violin was moved. For the distribution, I’m considering about 70% for blue, 20% for red, and a small 5% each for purple and green. I want to ensure that these numbers add up to 100%. Now I'll format ...

--- ('Denise', 'Vicky') Trial 1 (phrase: 'true location') ---
**Finalizing color distribution**

I see that the "original location" belief implies that the leftmost position is signifi
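
The two framing counts above are not mutually exclusive: a single reasoning trace can contain both a location phrase and a knowledge phrase, so the 32:7 ratio may double-count some trials. A sketch of an exclusive four-way classification follows. The phrase lists are shortened subsets of the notebook's lists, and `classify_framing` is a hypothetical helper. The first sample text is quoted from the example output above; the other two are invented for illustration.

```python
from collections import Counter

# Shortened subsets of the notebook's phrase lists
LOCATION_PHRASES = ["leftmost", "original position", "original location"]
KNOWLEDGE_PHRASES = ["they know", "true location", "where it actually is"]

def classify_framing(reasoning):
    """Assign one label: 'both', 'location', 'knowledge', or 'neither'."""
    text = reasoning.lower()
    has_location = any(p in text for p in LOCATION_PHRASES)
    has_knowledge = any(p in text for p in KNOWLEDGE_PHRASES)
    if has_location and has_knowledge:
        return "both"
    if has_location:
        return "location"
    if has_knowledge:
        return "knowledge"
    return "neither"

samples = [
    # Quoted from the example output above
    "Some could get confused and choose the leftmost red, while others might "
    "pick red because they know the violin was moved.",
    # Invented for illustration
    "Alice may return to the original position of the container.",
    "Most adults simply pick the blue container.",
]

counts = Counter(classify_framing(s) for s in samples)
print(dict(counts))
```

Reporting the "both" bucket separately shows how often the two framings co-occur instead of folding those trials into each count.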

## Part 3: Reasoning Effort Comparison (Low vs Medium vs High)

In [12]:
# Reasoning effort gradient comparison
print("="*100)
print("REASONING EFFORT GRADIENT: Low → Medium → High")
print("="*100)

df_full = pd.read_csv("model_responses_predict_human.csv")

color_set_1 = "('blue', 'purple', 'red', 'green')"

results = []
for model in ['gpt-5-low', 'gpt-5-medium', 'gpt-5-high']:
    subset = df_full[df_full['model'] == model].copy()
    subset['pct'] = subset['result'].apply(extract_percentages)
    # Check the parse first so a missing dict never reaches .get(); unparsed rows score 0
    subset['belief_pct'] = subset.apply(
        lambda row: 0 if not row['pct']
                    else row['pct'].get('blue', 0) if row['colors'] == color_set_1
                    else row['pct'].get('green', 0),
        axis=1
    )
    
    for cond in ['Knowledge-plausible', 'Knowledge-implausible']:
        cond_data = subset[subset['condition'] == cond]
        outliers = len(cond_data[cond_data['belief_pct'] < 50])
        results.append({
            'Model': model,
            'Condition': cond.replace('Knowledge-', 'K-'),
            'Mean': cond_data['belief_pct'].mean(),
            'Std': cond_data['belief_pct'].std(),
            'Min': cond_data['belief_pct'].min(),
            'Outliers': outliers
        })

print(f"\n{'Model':<15} {'Condition':<15} {'Mean':>8} {'Std':>8} {'Min':>8} {'Outliers':>10}")
print("-"*70)
for r in results:
    print(f"{r['Model']:<15} {r['Condition']:<15} {r['Mean']:>7.1f}% {r['Std']:>7.1f}% {r['Min']:>7.0f}% {r['Outliers']:>10}")

REASONING EFFORT GRADIENT: Low → Medium → High

Model           Condition           Mean      Std      Min   Outliers
----------------------------------------------------------------------
gpt-5-low       K-plausible        80.9%     8.9%      20%          1
gpt-5-low       K-implausible      76.3%    14.9%      17%          5
gpt-5-medium    K-plausible        81.5%     6.7%      60%          0
gpt-5-medium    K-implausible      79.9%     8.8%      20%          1
gpt-5-high      K-plausible        82.2%     5.9%      70%          0
gpt-5-high      K-implausible      83.0%     5.0%      65%          0


In [13]:
# Compare framing across reasoning levels - UPDATED with research citations
print("="*100)
print("FRAMING COMPARISON ACROSS REASONING LEVELS")
print("="*100)

# Expanded phrase lists
cok_phrases = ['reality bias', 'reality reasoning', 'they know', 'because they know', 
               'know where', 'know that', 'know the', 'where it actually',
               'actually in', 'true location', 'actual location', 'really is',
               'know it was moved', 'know the violin', 'aware of']

position_phrases = ['track location', 'tracking location', 'location instead', 
                   'position instead', 'leftmost', 'original position', 'original location',
                   'by location', 'spatial']

research_phrases = ['research', 'studies', 'study', 'literature', 'classic', 
                    'sally-anne', 'wimmer', 'false-belief task', 'false belief task']

for model in ['gpt-5-low', 'gpt-5-medium', 'gpt-5-high']:
    kp = df_full[(df_full['model'] == model) & 
                 (df_full['condition'] == 'Knowledge-plausible') & 
                 (df_full['reasoning'].notna())]
    
    cok_count = sum(1 for _, r in kp.iterrows() 
                    if any(p in r['reasoning'].lower() for p in cok_phrases))
    pos_count = sum(1 for _, r in kp.iterrows() 
                    if any(p in r['reasoning'].lower() for p in position_phrases))
    res_count = sum(1 for _, r in kp.iterrows() 
                    if any(p in r['reasoning'].lower() for p in research_phrases))
    
    print(f"\n{model} (n={len(kp)}):")
    print(f"  Reality-bias framing (curse-of-knowledge): {cok_count} ({100*cok_count/len(kp):.1f}%)")
    print(f"  Position-tracking framing: {pos_count} ({100*pos_count/len(kp):.1f}%)")
    print(f"  Research/study citations: {res_count} ({100*res_count/len(kp):.1f}%)")

print("\n" + "="*100)
print("EXAMPLES: Research Citations (80-90% from literature)")
print("="*100)

# Show examples of research citations
examples_shown = 0
for model in ['gpt-5-low', 'gpt-5-medium', 'gpt-5-high']:
    data = df_full[(df_full['model'] == model) & (df_full['reasoning'].notna())]
    for idx, row in data.iterrows():
        reasoning = row['reasoning']
        if ('80' in reasoning or '90' in reasoning) and any(p in reasoning.lower() for p in research_phrases):
            if examples_shown < 3:
                print(f"\n--- {model} | {row['condition']} ---")
                print(reasoning[:350] + "..." if len(reasoning) > 350 else reasoning)
                examples_shown += 1
                break

FRAMING COMPARISON ACROSS REASONING LEVELS

gpt-5-low (n=77):
  Reality-bias framing (curse-of-knowledge): 16 (20.8%)
  Position-tracking framing: 28 (36.4%)
  Research/study citations: 35 (45.5%)

gpt-5-medium (n=89):
  Reality-bias framing (curse-of-knowledge): 13 (14.6%)
  Position-tracking framing: 28 (31.5%)
  Research/study citations: 55 (61.8%)

gpt-5-high (n=90):
  Reality-bias framing (curse-of-knowledge): 13 (14.4%)
  Position-tracking framing: 27 (30.0%)
  Research/study citations: 52 (57.8%)

EXAMPLES: Research Citations (80-90% from literature)

--- gpt-5-low | Ignorance ---
**Analyzing color positions**

I'm figuring out how the original blue position changes with a new order: red, green, purple, blue. After swapping, the sequence is now different. It seems like participants might still pick blue 80-90% of the time, but I also need to account for others who might be uncertain. Research suggests adults choose the origi...

--- gpt-5-medium | Ignorance ---
**Choosing distri

In [14]:
# Sample GPT-5-high reasoning that mentions position but doesn't cause outliers
print("="*100)
print("GPT-5-HIGH: How it handles position-tracking without creating outliers")
print("="*100)

ki_high = df_full[(df_full['model'] == 'gpt-5-high') & 
                  (df_full['condition'] == 'Knowledge-implausible') & 
                  (df_full['reasoning'].notna())]

# Find examples that mention position-tracking
examples_shown = 0
for idx, row in ki_high.iterrows():
    reasoning = row['reasoning'].lower()
    if any(p in reasoning for p in location_phrases) and examples_shown < 2:
        pct = extract_percentages(row['result']) or {}  # empty dict if the answer failed to parse
        belief = pct.get('blue', 0) if row['colors'] == color_set_1 else pct.get('green', 0)
        
        print(f"\n--- {row['names']} Trial {row['trial']} (Belief container = {belief}%) ---")
        print(row['reasoning'])
        examples_shown += 1

print("\n" + "="*100)
print("KEY DIFFERENCE: GPT-5-high mentions position-tracking as a MINORITY consideration,")
print("               not as a dominant factor. It still assigns 70-85% to belief container.")
print("="*100)

GPT-5-HIGH: How it handles position-tracking without creating outliers

--- ('Alice', 'Beth') Trial 4 (Belief container = 85%) ---
**Analyzing container choices**

I’m thinking about a scenario where most participants might choose blue—let’s say 80-90%. However, if positions are swapped, some might revert to the original location instead of sticking with the container colors. The original blue container could now be red due to the change in order. So, I need to deduce what that original order looked like and how the swaps impact choices. It gets complicated since participants may not know the earlier arrangements, especially with Alice and Beth’s comments.

--- ('Alice', 'Beth') Trial 0 (Belief container = 85%) ---
**Evaluating container identity tracking**

I’m considering how adults typically track container identity as 'object' rather than by location. If someone says "in the blue container," people might assume they’ll look there. I think I’d assign a majority percentage to blue, m

## Part 4: Effect Analysis (Standard Curse-of-Knowledge Measures)

In [15]:
# Effect calculation using standard curse-of-knowledge measures
print("="*100)
print("STANDARD CURSE-OF-KNOWLEDGE EFFECT MEASURES")
print("="*100)

print("""
The literature (Ryskin 2014) measures curse-of-knowledge by looking at the 
RED container (true location in K-P):

  Knowledge Effect = Red_KP - Red_Ignorance
  Plausibility Effect = Red_KP - Red_KI

If plausibility effect is positive: plausible knowledge biases more than implausible
""")

color_set_1 = "('blue', 'purple', 'red', 'green')"
color_set_2 = "('green', 'red', 'purple', 'blue')"

results = []
for model in ['gpt-5-low', 'gpt-5-medium', 'gpt-5-high']:
    subset = df_full[df_full['model'] == model].copy()
    subset['pct'] = subset['result'].apply(extract_percentages)
    
    # Apply mapping for red (K-P true location); unparsed rows score 0.
    # The parse check comes first so a missing dict never reaches .get().
    subset['red_mapped'] = subset.apply(
        lambda row: 0 if not row['pct']
                    else row['pct'].get('red', 0) if row['colors'] == color_set_1
                    else row['pct'].get('purple', 0), axis=1)
    
    ign_red = subset[subset['condition'] == 'Ignorance']['red_mapped'].mean()
    kp_red = subset[subset['condition'] == 'Knowledge-plausible']['red_mapped'].mean()
    ki_red = subset[subset['condition'] == 'Knowledge-implausible']['red_mapped'].mean()
    
    results.append({
        'Model': model,
        'Ign Red': ign_red,
        'KP Red': kp_red,
        'KI Red': ki_red,
        'Knowledge Effect': kp_red - ign_red,
        'Plausibility Effect': kp_red - ki_red
    })

print(f"\n{'Model':<15} {'Ign Red':>8} {'KP Red':>8} {'KI Red':>8} {'Know Eff':>10} {'Plaus Eff':>10}")
print("-"*70)
for r in results:
    print(f"{r['Model']:<15} {r['Ign Red']:>7.1f}% {r['KP Red']:>7.1f}% {r['KI Red']:>7.1f}% {r['Knowledge Effect']:>+9.1f}% {r['Plausibility Effect']:>+9.1f}%")

STANDARD CURSE-OF-KNOWLEDGE EFFECT MEASURES

The literature (Ryskin 2014) measures curse-of-knowledge by looking at the 
RED container (true location in K-P):

  Knowledge Effect = Red_KP - Red_Ignorance
  Plausibility Effect = Red_KP - Red_KI

If plausibility effect is positive: plausible knowledge biases more than implausible


Model            Ign Red   KP Red   KI Red   Know Eff  Plaus Eff
----------------------------------------------------------------------
gpt-5-low           9.0%    12.9%    12.1%      +4.0%      +0.9%
gpt-5-medium       11.8%    13.2%     8.2%      +1.4%      +5.0%
gpt-5-high          8.8%    12.5%     5.7%      +3.7%      +6.8%
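
The two effect measures are simple differences of condition means. The sketch below recomputes them from the one-decimal means displayed in the table; `cok_effects` is a hypothetical helper name. Because the notebook subtracts unrounded means, recomputing from the displayed values can differ by 0.1: for gpt-5-low, 12.9 - 9.0 gives 3.9 and 12.9 - 12.1 gives 0.8, against the +4.0 and +0.9 shown in the table.

```python
def cok_effects(red_ignorance, red_kp, red_ki):
    """Knowledge and plausibility effects on the mapped-red container (percentage points)."""
    return {
        "knowledge_effect": red_kp - red_ignorance,
        "plausibility_effect": red_kp - red_ki,
    }

# One-decimal means copied from the table above: (Ign Red, KP Red, KI Red)
displayed = {
    "gpt-5-low": (9.0, 12.9, 12.1),
    "gpt-5-medium": (11.8, 13.2, 8.2),
    "gpt-5-high": (8.8, 12.5, 5.7),
}

for model, means in displayed.items():
    e = cok_effects(*means)
    print(f"{model:<13} knowledge {e['knowledge_effect']:+.1f}  "
          f"plausibility {e['plausibility_effect']:+.1f}")
```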


In [16]:
# Why does GPT-5-low show almost no plausibility effect?
print("="*100)
print("WHY POSITION-TRACKING ERASES THE PLAUSIBILITY EFFECT")
print("="*100)

print("""
Expected pattern (curse-of-knowledge theory):
  - K-P should show MORE bias to red (plausible location)
  - K-I should show LESS bias to red (implausible location)
  - Plausibility Effect should be POSITIVE

What we see:
  - GPT-5-high: Plausibility Effect = +6.8% ✓ (correct pattern)
  - GPT-5-low:  Plausibility Effect = +0.9% ✗ (almost none)

Why GPT-5-low fails:
  - Position-tracking outliers in K-I give 65-70% to RED (leftmost position)
  - This inflates K-I red from ~6% to 12%
  - The plausibility modulation is erased by this bug

In other words:
  - GPT-5-low's K-I red is HIGH for the WRONG reason (position, not knowledge)
  - This makes K-I red ≈ K-P red, eliminating the plausibility effect
""")

WHY POSITION-TRACKING ERASES THE PLAUSIBILITY EFFECT

Expected pattern (curse-of-knowledge theory):
  - K-P should show MORE bias to red (plausible location)
  - K-I should show LESS bias to red (implausible location)
  - Plausibility Effect should be POSITIVE

What we see:
  - GPT-5-high: Plausibility Effect = +6.8% ✓ (correct pattern)
  - GPT-5-low:  Plausibility Effect = +0.9% ✗ (almost none)

Why GPT-5-low fails:
  - Position-tracking outliers in K-I give 65-70% to RED (leftmost position)
  - This inflates K-I red from ~6% to 12%
  - The plausibility modulation is erased by this bug

In other words:
  - GPT-5-low's K-I red is HIGH for the WRONG reason (position, not knowledge)
  - This makes K-I red ≈ K-P red, eliminating the plausibility effect
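
A back-of-envelope calculation shows how much a handful of outliers can move a condition mean. It rests on three assumptions: gpt-5-low has 90 K-I trials (implied by "5 (5.6%)" in the outlier table), the K-I mean on the mapped-red container is the displayed 12.1%, and each of the five listed outliers places its top value (65, 65, 68, 70, 72) on that mapped-red container. The last assumption is the weakest, since the outlier listing does not say which color set each row belongs to. `mean_excluding` is a hypothetical helper.

```python
def mean_excluding(mean_all, n_total, excluded_values):
    """Mean of the remaining observations after removing the given values."""
    remaining = n_total - len(excluded_values)
    if remaining <= 0:
        raise ValueError("cannot exclude every observation")
    return (mean_all * n_total - sum(excluded_values)) / remaining

# Assumed inputs (see the caveats above)
ki_red_mean = 12.1
n_trials = 90
outlier_red = [65, 65, 68, 70, 72]

baseline = mean_excluding(ki_red_mean, n_trials, outlier_red)
print(f"K-I red without outliers: {baseline:.1f}%")
print(f"Points contributed by outliers: {ki_red_mean - baseline:.1f}")
```

Running the same calculation on the actual rows, with the real per-row mapped-red values, would give the exact contribution of the outliers to the K-I mean.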



## Key Findings

### 1. GPT-5 Shows Partial Understanding of Curse-of-Knowledge

GPT-5 demonstrates awareness of the curse-of-knowledge mechanism through "reality bias" reasoning:

| Framing Type | GPT-5-low | GPT-5-medium | GPT-5-high |
|--------------|-----------|--------------|------------|
| Reality-bias ("they know where it is", "reality bias") | 20.8% | 14.6% | 14.4% |
| Position-tracking ("Alice might track location") | 36.4% | 31.5% | 30.0% |
| Research citations ("studies show 80-90%") | 45.5% | 61.8% | 57.8% |

**Key insight:** GPT-5 does recognize that participants' knowledge of reality can bias their predictions (reality-bias framing in roughly 14-21% of K-P reasoning traces), which is the curse-of-knowledge mechanism. However, the keyword search in Part 1 found no explicit use of the term "curse of knowledge" (0 mentions in the 11 sampled reasoning texts).

---

### 2. Heavy Reliance on Memorized Research

GPT-5 frequently cites false-belief research, stating that "studies show 80-90% choose the original container":

> *"Research suggests adults choose the original container correctly about 80-90%"*
>
> *"Classic tasks show high accuracy for initial container beliefs"*
>
> *"Previous studies, like the Sally-Anne task, show similar results"*

**The problem:** Those studies typically use 2-container setups (Sally-Anne), not 4-container layouts like our experiment. GPT-5 may be recalling headline figures from the literature rather than reasoning from first principles about this specific task.

---

### 3. Competing Framings for Why Red Gets Probability

GPT-5 offers two different explanations for why participants might choose red:

1. **Reality-bias (curse of knowledge):** "Some might pick red because they know the violin was moved" ✓
2. **Position-tracking (Alice's confusion):** "Alice might track location instead of color" ✗

The position-tracking explanation is technically wrong—it attributes the bias to Alice's potential confusion, not to participants' knowledge contamination. But both framings lead to assigning some probability to red.

---

### 4. Reasoning Effort Gradient

Higher reasoning effort yields more stable predictions in K-Implausible:

| Model | K-I Std | K-I Min | K-I Outliers |
|-------|---------|---------|--------------|
| GPT-5-low | 14.9% | 17% | 5 (5.6%) |
| GPT-5-medium | 8.8% | 20% | 1 (1.1%) |
| GPT-5-high | 5.0% | 65% | 0 (0.0%) |

All three models show similar framing distributions (30-36% position-tracking in K-P reasoning), but GPT-5-high treats alternative explanations as minority considerations rather than fully committing to them.

---

### 5. Position-Tracking Creates Variance Issues

Position-tracking reasoning creates outliers primarily in K-Implausible because position and reality CONFLICT:

| Condition | Position = Reality? | Outliers (GPT-5-low) |
|-----------|---------------------|----------------------|
| K-Plausible | YES (converge) | 1 (1.1%) |
| K-Implausible | NO (conflict) | 5 (5.6%) |

When GPT-5-low fully commits to position-tracking in K-I, it gives 65-70% to RED (leftmost after swap), which inflates K-I red and erases the plausibility effect.

---

### 6. Standard Curse-of-Knowledge Effects

Using Ryskin 2014 measures:

| Model | Knowledge Effect | Plausibility Effect |
|-------|------------------|---------------------|
| GPT-5-low | +4.0% | **+0.9%** |
| GPT-5-medium | +1.4% | **+5.0%** |
| GPT-5-high | +3.7% | **+6.8%** |

GPT-5-low shows almost no plausibility effect because position-tracking outliers inflate K-I red. GPT-5-high shows the expected pattern.

---

### 7. Summary

1. **GPT-5 partially understands curse-of-knowledge** through "reality bias" framing (roughly 14-21% of K-P reasoning traces), recognizing that participants' knowledge of reality can bias predictions

2. **Heavy reliance on memorized research** (~45-62% of K-P reasoning traces cite studies): GPT-5 anchors on "80-90% choose original container" from the literature, but those studies typically use simpler 2-container setups

3. **Competing framings coexist:** Reality-bias (correct mechanism) and position-tracking (wrong attribution) both appear in reasoning, sometimes in the same trial

4. **Higher reasoning effort improves stability** without changing conceptual framing: GPT-5-high hedges better, treating alternative explanations as minority factors

5. **Position-tracking variance** is a secondary issue that primarily affects GPT-5-low in K-Implausible, erasing the plausibility effect through inflated K-I red values