# Logistic regression

First, we load the result file containing the experimental outputs.

The dataset should include the following columns:

| Column Name | Description |
|--------------|-------------|
| **PersonaID** | Unique identifier of each persona. |
| **BeliefClimateExists** | Persona’s prior belief on whether climate change exists (five-point scale, stored as labels from `Strongly disagree` to `Strongly agree`). |
| **ClaimID** | Identifier of the presented claim. |
| **ClaimStanceLabel** | Whether the claim *supports* or *refutes* the existence of climate change. |
| **EvidencesVerdict** | Strength of evidence behind the claim (`SUPPORTS`, `REFUTES`, or `NOT_ENOUGH_INFO`). |
| **Evidence** | Text of the evidence used for the verdict. |
| **ClaimEntropy** | Information entropy of the claim (uncertainty measure). |
| **ModelDecisionOfClaim** | Model’s final decision on the claim (`Accept`, `Refute`, `Neutral`). |
| **ModelDecisionOfClaim_Reason** | Model’s reasoning behind its decision. |
| **ModelBeliefClimateExists** | Updated belief (1–5 scale) after reading the claim. |
| **ModelBeliefClimateExists_Reason** | Model’s reasoning behind the updated belief. |

In [None]:
import pandas as pd

# Change the file name here
df = pd.read_csv("data/prompts_level_3_with_model_outputs.csv")

# Keep rows where at least one reasoning field is present and free of a "model error" marker
mask = (
    (df["ModelDecisionOfClaim_Reason"].notna() & ~df["ModelDecisionOfClaim_Reason"].str.contains("model error", case=False, na=False)) |
    (df["ModelBeliefClimateExists_Reason"].notna() & ~df["ModelBeliefClimateExists_Reason"].str.contains("model error", case=False, na=False))
)
# .copy() avoids SettingWithCopyWarning when new columns are added below
df = df[mask].copy()

# Map the textual responses to numbers
df["BeliefClimateExists_num"] = df["BeliefClimateExists"].map({
    "Strongly disagree": -2, 
    "Strongly Disagree": -2,
    "Slightly disagree": -1,
    "Slightly Disagree": -1,
    "Neutral": 0, 
    "Slightly agree": 1,
    "Slightly Agree": 1,
    "Strongly agree": 2,
    "Strongly Agree": 2
})
df["ClaimStanceLabel_num"] = df["ClaimStanceLabel"].map({"REFUTES": -1, "SUPPORTS": 1})
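
The mapping above lists each capitalization variant by hand, and any label it does not list silently becomes `NaN`. A minimal sketch of a case-insensitive version that also reports unmapped labels is shown here; the helper name `map_likert` and the toy labels are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd

# Lower-case keys only; input labels are normalized before lookup.
LIKERT_SCORES = {
    "strongly disagree": -2,
    "slightly disagree": -1,
    "neutral": 0,
    "slightly agree": 1,
    "strongly agree": 2,
}


def map_likert(labels: pd.Series) -> pd.Series:
    """Map Likert labels to -2..2, ignoring case and surrounding whitespace."""
    return labels.str.strip().str.lower().map(LIKERT_SCORES)


# Toy data (hypothetical): includes one unknown label and one missing value.
toy = pd.Series(["Strongly Agree", "slightly disagree ", "Neutral", "Unsure", None])
scores = map_likert(toy)

# Labels that were present but could not be mapped.
unmapped = toy[scores.isna() & toy.notna()].unique().tolist()
print(scores.tolist())
print("Unmapped labels:", unmapped)
```

Checking `unmapped` before moving on makes it clear whether rows are lost to unexpected label spellings rather than to genuinely missing data.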

Alignment captures whether the persona’s prior belief matches the claim stance.

In [None]:
# Alignment in {+1, 0, -1}: only the sign of the belief is used, not its magnitude.
# +1 if belief sign and claim sign match (both >0 or both <0), -1 if they differ.
# Neutral (belief == 0) is assigned 0 for alignment
belief_sign = df["BeliefClimateExists_num"].apply(lambda x: 1 if x > 0 else (-1 if x < 0 else 0))
claim_sign  = df["ClaimStanceLabel_num"]  # already ±1
df["Alignment"] = belief_sign * claim_sign
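
To make the alignment rule concrete, here is a small sketch on made-up rows. It uses `np.sign` in place of the lambda, which should give the same result for numeric beliefs; the toy values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: belief on the -2..2 scale, claim stance as -1 / +1.
toy = pd.DataFrame({
    "BeliefClimateExists_num": [2, 1, 0, -1, -2, 2],
    "ClaimStanceLabel_num":    [1, -1, 1, -1, 1, -1],
})

# Sign of the belief times the claim stance: +1 aligned, -1 misaligned, 0 neutral prior.
toy["Alignment"] = np.sign(toy["BeliefClimateExists_num"]) * toy["ClaimStanceLabel_num"]
print(toy)
```

A believer shown a supporting claim and a skeptic shown a refuting claim both get `+1`, since in each case the claim agrees with the prior.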

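Simplify the model’s decision to a binary outcome (accept = 1, refute = 0). The mapping below leaves “Neutral” decisions as missing values, so those rows are dropped before fitting; coding them as 0.5 is an alternative if you want to retain them later.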

In [None]:
df["Decision_binary"] = df["ModelDecisionOfClaim"].map({"Accept": 1, "Refute": 0})
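
Because `Neutral` is not in the mapping, those decisions become `NaN`. A quick sketch on made-up decisions shows how to count what would be dropped; the toy values are assumptions.

```python
import pandas as pd

# Hypothetical decisions, including one "Neutral".
decisions = pd.Series(["Accept", "Refute", "Neutral", "Accept"])
binary = decisions.map({"Accept": 1, "Refute": 0})

# Rows that a later dropna() on this column would remove.
n_dropped = int(binary.isna().sum())
kept = binary.dropna().astype(int)
print(f"dropped {n_dropped} of {len(binary)} rows")
```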

Next, we fit the logistic regression.

We model whether the model/persona **accepts** a displayed claim as a function of
the alignment between its prior belief and the claim stance, and of the strength of evidence.
Instead of a single interaction term, we explicitly include four combinations of
alignment ($\text{Alignment}^+$, $\text{Alignment}^-$) and evidence polarity
($\text{EvidenceStrength}^+$, $\text{EvidenceStrength}^-$):

$$
\text{logit}\big(P(\text{Accept})\big)
= \beta_0
+ \beta_1 \text{Alignment}
+ \beta_2 \text{EvidenceStrength}
+ \beta_{3} I(\text{Alignment}^+, \text{EvidenceStrength}^+)
+ \beta_{4} I(\text{Alignment}^-, \text{EvidenceStrength}^+)
+ \beta_{5} I(\text{Alignment}^+, \text{EvidenceStrength}^-)
+ \beta_{6} I(\text{Alignment}^-, \text{EvidenceStrength}^-)
$$

**Where**

- **$Accept$**: Binary outcome (1 = Accept, 0 = Refute)  
- **$\text{Alignment}^+$ / $\text{Alignment}^-$**: Whether the claim aligns or conflicts with the model/persona’s prior belief  
- **$\text{EvidenceStrength}^+$ / $\text{EvidenceStrength}^-$**: Whether the evidence is strong or weak  
- **$I(\cdot,\cdot)$**: Indicator variables for each specific alignment–evidence combination  
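- **$\beta_1,\beta_2$**: Main effects of alignment and evidence polarity  
- **$\beta_{3}$–$\beta_{6}$**: Cell-specific adjustments for each alignment–evidence combination  

Note that the formula fitted in the code below enters only the four indicators. Alignment takes three values (−1, 0, +1) and evidence strength two, giving six cells, so an intercept, two main effects, and four indicators (seven parameters) cannot all be identified. With $\beta_1$ and $\beta_2$ omitted, the intercept is the log-odds of acceptance for personas with a neutral prior (Alignment = 0), pooled over evidence strength, and $\beta_{3}$–$\beta_{6}$ are log-odds differences relative to that baseline.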

**Interpretation**

- **$\beta_{3}$ > 0** → Aligned claims with *strong evidence* are especially likely to be accepted  
  (*strong confirmation bias under favorable evidence*)  
- **$\beta_{4}$ > 0** → Misaligned claims with *strong evidence* still tend to be accepted  
  (*the model is swayed by strong evidence despite a conflicting prior*)  
- **$\beta_{5}$ < 0** → Aligned claims with *weak evidence* are less likely to be accepted
- **$\beta_{6}$ < 0** → Misaligned claims with *weak evidence* are readily rejected  
  (*reinforced disconfirmation*)  

This formulation allows us to examine how confirmation bias manifests differently
across favorable and unfavorable evidence contexts rather than relying on a single
interaction term.
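
One thing worth checking about the equation above is whether all seven parameters can be estimated at once. With alignment in {−1, 0, +1} and evidence strength in {−1, +1} there are only six distinct cells, so a design with an intercept, both main effects, and all four indicators should be rank-deficient. The sketch below builds one design row per cell and compares ranks; the helper `design_row` is an assumption for illustration.

```python
import numpy as np

# The six possible (alignment, evidence strength) cells.
cells = [(a, e) for a in (-1, 0, 1) for e in (-1, 1)]


def design_row(a, e, main_effects):
    """One design-matrix row: intercept, optional main effects, four cell indicators."""
    indicators = [
        int(a == 1 and e == 1),
        int(a == -1 and e == 1),
        int(a == 1 and e == -1),
        int(a == -1 and e == -1),
    ]
    return [1] + ([a, e] if main_effects else []) + indicators


full = np.array([design_row(a, e, True) for a, e in cells])      # 7 columns
reduced = np.array([design_row(a, e, False) for a, e in cells])  # 5 columns

full_rank = np.linalg.matrix_rank(full)
reduced_rank = np.linalg.matrix_rank(reduced)
print(f"full: {full.shape[1]} columns, rank {full_rank}")
print(f"reduced: {reduced.shape[1]} columns, rank {reduced_rank}")
```

The indicator-only design has full column rank, which is consistent with fitting just the four indicators plus an intercept and reading each coefficient as a contrast against the neutral-prior rows.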

In [None]:
import statsmodels.formula.api as smf

# Evidence strength in {-1,+1}, -1 means weak evidence, 1 means strong evidence
# Both refute and not enough info are considered weak evidence
try:
    df["EvidenceStrength_num"] = df["EvidencesVerdict"].map({
        "REFUTES": -1, "NOT_ENOUGH_INFO": -1, "SUPPORTS": 1
    })
except KeyError:
    # EvidencesVerdict column is absent: fall back to the labels in the claims file
    df["ClaimID"] = df["ClaimID"].astype(str)
    evidence_df = pd.read_json("data/claims_EL3.json")
    evidence_df["claim_id"] = evidence_df["claim_id"].astype(str)
    df = df.merge(
        evidence_df[["claim_id", "claim_label"]],
        left_on="ClaimID",
        right_on="claim_id",
        how="left"
    )
    df["EvidenceStrength_num"] = df["claim_label"].map({
        "REFUTES": -1, "NOT_ENOUGH_INFO": -1, "SUPPORTS": 1
    })

# Four interaction variables
df["I_A1_E1"]  = ((df["Alignment"] ==  1) & (df["EvidenceStrength_num"] ==  1)).astype(int)
df["I_A_1_E1"] = ((df["Alignment"] == -1) & (df["EvidenceStrength_num"] ==  1)).astype(int)
df["I_A1_E_1"] = ((df["Alignment"] ==  1) & (df["EvidenceStrength_num"] == -1)).astype(int)
df["I_A_1_E_1"]= ((df["Alignment"] == -1) & (df["EvidenceStrength_num"] == -1)).astype(int)

In [None]:
# Fit logistic regression
use = df.dropna(subset=["Decision_binary", "Alignment", "EvidenceStrength_num"])
model = smf.logit(
    "Decision_binary ~ I_A1_E1 + I_A_1_E1 + I_A1_E_1 + I_A_1_E_1",
    data=use
).fit()
print(model.summary())

Optimization terminated successfully.
         Current function value: 0.618054
         Iterations 5
                           Logit Regression Results                           
Dep. Variable:        Decision_binary   No. Observations:                31504
Model:                          Logit   Df Residuals:                    31499
Method:                           MLE   Df Model:                            4
Date:                Thu, 30 Oct 2025   Pseudo R-squ.:                  0.1027
Time:                        22:43:20   Log-Likelihood:                -19471.
converged:                       True   LL-Null:                       -21700.
Covariance Type:            nonrobust   LLR p-value:                     0.000
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.6237      0.027     22.786      0.000       0.570       0.677
I_A1_E1        0.9523      0.

As a sanity check, inspect the raw acceptance proportion in each alignment × evidence cell.

In [None]:
use.groupby(["Alignment", "EvidenceStrength_num"])["Decision_binary"].mean()

Alignment  EvidenceStrength_num
-1         -1                      0.341779
            1                      0.736359
 0         -1                      0.497667
            1                      0.901434
 1         -1                      0.421397
            1                      0.828646
Name: Decision_binary, dtype: float64
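
The two coefficients visible in the printed summary can be checked against these proportions. Since the model contains only an intercept and cell indicators, the fitted probability for a cell should equal the inverse logit of the intercept plus that cell's coefficient. The sketch below does this for the aligned, strong-evidence cell using only the numbers shown above; the remaining coefficients are truncated in the output and are not reconstructed here.

```python
import math


def sigmoid(z):
    """Inverse logit: convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Values copied from the regression summary and the groupby output above.
intercept = 0.6237                   # baseline: rows with all four indicators equal to 0
coef_aligned_strong = 0.9523         # I_A1_E1
observed_aligned_strong = 0.828646   # Alignment = 1, EvidenceStrength_num = 1

p_baseline = sigmoid(intercept)
p_aligned_strong = sigmoid(intercept + coef_aligned_strong)
odds_ratio = math.exp(coef_aligned_strong)

print(f"baseline acceptance probability: {p_baseline:.3f}")
print(f"aligned + strong evidence:       {p_aligned_strong:.3f} (observed {observed_aligned_strong:.3f})")
print(f"odds ratio vs. baseline:         {odds_ratio:.2f}")
```

The baseline probability falls between the two neutral-prior proportions (0.498 and 0.901), as expected if the intercept pools neutral-prior rows across both evidence levels.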