# Logistic regression

First, we load the result file containing the experimental outputs.

The dataset should include the following columns:

| Column Name | Description |
|--------------|-------------|
| **PersonaID** | Unique identifier of each persona. |
| **BeliefClimateExists** | Persona’s prior belief on whether climate change exists, on a 5-point scale from "Strongly disagree" to "Strongly agree". |
| **ClaimID** | Identifier of the presented claim. |
| **ClaimStanceLabel** | Whether the claim *supports* or *refutes* the existence of climate change. |
| **EvidencesVerdict** | Strength of evidence behind the claim (`SUPPORTS`, `REFUTES`, or `NOT_ENOUGH_INFO`). |
| **Evidence** | Text of the evidence used for the verdict. |
| **ClaimEntropy** | Information entropy of the claim (uncertainty measure). |
| **ModelDecisionOfClaim** | Model’s final decision on the claim (`Accept`, `Refute`, `Neutral`). |
| **ModelDecisionOfClaim_Reason** | Model’s reasoning behind its decision. |
| **ModelBeliefClimateExists** | Updated belief (1–5 scale) after reading the claim. |
| **ModelBeliefClimateExists_Reason** | Model’s reasoning behind the updated belief. |

In [43]:
import pandas as pd

# Change the file name here
df = pd.read_csv("data/result_level_4.csv")

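# Filter out invalid rows: keep a row if at least one of the two reasoning
# fields is present and does not contain "model error"
mask = (
    (df["ModelDecisionOfClaim_Reason"].notna() & ~df["ModelDecisionOfClaim_Reason"].str.contains("model error", case=False, na=False)) |
    (df["ModelBeliefClimateExists_Reason"].notna() & ~df["ModelBeliefClimateExists_Reason"].str.contains("model error", case=False, na=False))
)

# .copy() avoids SettingWithCopyWarning when new columns are added below
df = df[mask].copy()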

# Map the textual responses to numbers
df["BeliefClimateExists_num"] = df["BeliefClimateExists"].map({
    "Strongly disagree": -2, 
    "Strongly Disagree": -2,
    "Slightly disagree": -1,
    "Slightly Disagree": -1,
    "Neutral": 0, 
    "Slightly agree": 1,
    "Slightly Agree": 1,
    "Strongly agree": 2,
    "Strongly Agree": 2
})
df["ClaimStanceLabel_num"] = df["ClaimStanceLabel"].map({"REFUTES": 0, "SUPPORTS": 1})
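Because `.map()` turns any label missing from the dictionary into `NaN` without warning, it can help to check which raw labels were left unmapped. The following is a minimal sketch on hypothetical toy data; the `unmapped_labels` helper and the `toy` frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def unmapped_labels(raw: pd.Series, mapped: pd.Series) -> list:
    """Return the distinct raw labels that were present but mapped to NaN."""
    missed = raw[mapped.isna() & raw.notna()]
    return sorted(missed.unique())

# Hypothetical toy data, for illustration only
toy = pd.DataFrame(
    {"BeliefClimateExists": ["Strongly agree", "neutral", "Slightly Disagree", None]}
)
belief_map = {"Strongly agree": 2, "Slightly Disagree": -1, "Neutral": 0}
toy["BeliefClimateExists_num"] = toy["BeliefClimateExists"].map(belief_map)

# "neutral" (lowercase) is not a key in belief_map, so it is reported
print(unmapped_labels(toy["BeliefClimateExists"], toy["BeliefClimateExists_num"]))
# ['neutral']
```

Applied to the real data, an empty list would suggest that every non-missing label was covered by the mapping.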

Alignment captures whether the persona’s prior belief matches the claim stance. A neutral prior (0) is coded as misaligned (`Alignment = 0`) regardless of the claim stance.

In [44]:
# Alignment: 1 = belief and claim point in same direction; 0 = misaligned
df["Alignment"] = (
    ((df["BeliefClimateExists_num"] > 0) & (df["ClaimStanceLabel_num"] == 1)) |
    ((df["BeliefClimateExists_num"] < 0) & (df["ClaimStanceLabel_num"] == 0))
).astype(int)
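To make the coding concrete, the sketch below applies the same rule to a few hypothetical rows (the `toy` frame is an illustrative assumption). It shows that a neutral prior ends up with `Alignment = 0`, the same value as a genuinely misaligned row.

```python
import pandas as pd

# Hypothetical toy rows: (belief, stance) pairs chosen for illustration
toy = pd.DataFrame({
    "BeliefClimateExists_num": [2, -1, 0, 1, -2],
    "ClaimStanceLabel_num":    [1,  0, 1, 0,  1],
})

# Same rule as above: aligned only if belief and stance point the same way
toy["Alignment"] = (
    ((toy["BeliefClimateExists_num"] > 0) & (toy["ClaimStanceLabel_num"] == 1)) |
    ((toy["BeliefClimateExists_num"] < 0) & (toy["ClaimStanceLabel_num"] == 0))
).astype(int)

print(toy["Alignment"].tolist())  # [1, 1, 0, 0, 0]
```

The third row (neutral belief, supporting claim) is coded 0. If neutral personas should not count as misaligned, one option would be to exclude them before fitting.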

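Simplify the model’s decision to a binary outcome (`Accept` = 1, `Refute` = 0). `Neutral` is not in the mapping, so it becomes `NaN` and those rows are dropped before the regression is fit. To retain neutral decisions, you could instead code them as 0.5.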

In [52]:
df["Decision_binary"] = df["ModelDecisionOfClaim"].map({"Accept": 1, "Refute": 0})
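The sketch below, on hypothetical toy data, shows what the mapping does to `Neutral` decisions and how many rows would be lost when missing values are dropped. The `toy` frame is an illustrative assumption.

```python
import pandas as pd

# Hypothetical decisions, for illustration only
toy = pd.DataFrame({"ModelDecisionOfClaim": ["Accept", "Neutral", "Refute", "Accept"]})
toy["Decision_binary"] = toy["ModelDecisionOfClaim"].map({"Accept": 1, "Refute": 0})

# "Neutral" has no key in the mapping, so it becomes NaN
n_missing = int(toy["Decision_binary"].isna().sum())
kept = toy.dropna(subset=["Decision_binary"])

print(n_missing, len(kept))  # 1 3
```

Running the same count on the real data shows how much of the sample the binary simplification discards.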

Then comes the logistic regression.

We model whether the model/persona **accepts** a displayed claim as a function of the alignment between its prior belief and the claim stance, and the strength of evidence:

$$
\text{logit}\big(P(\text{Accept})\big)
= \beta_0
+ \beta_1 \text{Alignment}
+ \beta_2 \text{EvidenceStrength}
+ \beta_3 \big(\text{Alignment} \times \text{EvidenceStrength}\big)
$$

**Where:**
- \( \text{Accept} \): Binary outcome (1 = Accept, 0 = Refute)  
- \( \text{Alignment} \): 1 if belief and claim stance are consistent; 0 otherwise  
- \( \text{EvidenceStrength} \): Ordinal variable representing the strength of evidence  
  (e.g., −1 = Refutes, 0 = Not Enough Info, +1 = Supports)  
- \( \beta_3 \): Coefficient on the interaction term, capturing how evidence moderates confirmation bias  

**Interpretation:**
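- \( \beta_1 > 0 \): Aligned claims are more likely to be accepted when evidence is inconclusive (\( \text{EvidenceStrength} = 0 \)), indicating confirmation bias.  
- \( \beta_2 > 0 \): For misaligned claims (\( \text{Alignment} = 0 \)), stronger supporting evidence increases acceptance.  
- \( \beta_3 < 0 \): The alignment effect, \( \beta_1 + \beta_3 \, \text{EvidenceStrength} \), shrinks as evidence becomes more supportive (bias moderation).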
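To see how the interaction works, the sketch below evaluates the model equation for each evidence level. The coefficient values are hypothetical and chosen only for illustration; they are not the fitted estimates. The `p_accept` helper is likewise an assumption introduced here.

```python
import numpy as np

def p_accept(alignment, evidence, b0, b1, b2, b3):
    """P(Accept) under the logit model with an Alignment x EvidenceStrength interaction."""
    log_odds = b0 + b1 * alignment + b2 * evidence + b3 * alignment * evidence
    return 1.0 / (1.0 + np.exp(-log_odds))

# Hypothetical coefficients for illustration; NOT the fitted values
b0, b1, b2, b3 = 0.5, 1.0, 0.8, -0.4

for evidence in (-1, 0, 1):
    # Log-odds gap between aligned and misaligned claims at this evidence level
    gap = b1 + b3 * evidence
    p_misaligned = p_accept(0, evidence, b0, b1, b2, b3)
    p_aligned = p_accept(1, evidence, b0, b1, b2, b3)
    print(f"evidence={evidence:+d}  gap={gap:.2f}  "
          f"P(misaligned)={p_misaligned:.3f}  P(aligned)={p_aligned:.3f}")
```

With a negative `b3`, the log-odds gap between aligned and misaligned claims is largest when the evidence refutes the claim and smallest when it supports it.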

In [50]:
import statsmodels.formula.api as smf

# Encode evidence strength ordinally
df["EvidenceStrength_num"] = df["EvidencesVerdict"].map({
    "REFUTES": -1,
    "NOT_ENOUGH_INFO": 0,
    "SUPPORTS": 1
})

# Drop NaN and fit logistic regression
model = smf.logit(
    "Decision_binary ~ Alignment * EvidenceStrength_num",
    data=df.dropna(subset=["Decision_binary", "Alignment", "EvidenceStrength_num"])
).fit()

print(model.summary())

Optimization terminated successfully.
         Current function value: 0.540792
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:        Decision_binary   No. Observations:                18099
Model:                          Logit   Df Residuals:                    18095
Method:                           MLE   Df Model:                            3
Date:                Wed, 29 Oct 2025   Pseudo R-squ.:                  0.1686
Time:                        13:36:25   Log-Likelihood:                -9787.8
converged:                       True   LL-Null:                       -11772.
Covariance Type:            nonrobust   LLR p-value:                     0.000
                                     coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------------------------------------
Intercept                          0.7419      0.024     31.361     