## Overview

This notebook conducts a series of statistical tests, primarily chi-square analyses, to examine stance distribution patterns. Specifically, it evaluates:

- The overall effect of **language** on stance distribution  
- The effect of **model origin** (Chinese vs. Western) on stance distribution  

In addition, the notebook includes **issue-level analyses** restricted to topics whose stance distributions show **high divergence**, as measured by Jensen-Shannon divergence (JSD), across languages or model origins.  
All detailed results are presented in the **Appendix**.


In [None]:
# --- Standard library ---
import warnings

# --- Third-party ---
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from scipy.stats import chi2_contingency

# --- Local ---
from scripts.utils import calculate_jsd, cramers_v

warnings.filterwarnings('ignore')

# Plot styles
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

In [2]:
all_data = pd.read_csv('../FULL_DATA_FOR_DATA_ANALYSIS.csv', encoding='utf-8-sig', sep='\t')
neutral_df = all_data[all_data['framing'] == 'neutral'].copy()


In [3]:
# Check counts
print(all_data['media_source'].value_counts())

media_source
China    72000
U.S.     72000
Name: count, dtype: int64


In [4]:
def stance_distribution(subdf):
    counts = subdf['binned_response'].value_counts(normalize=True)
    # order: pro, neutral, con, refuse
    return np.array([counts.get(x,0) for x in ['pro','neutral','con','refuse']])

jsd_scores = []
for issue in neutral_df['cluster_id'].unique():
    sub = neutral_df[neutral_df['cluster_id']==issue]
    dist_en = stance_distribution(sub[sub['language']=='english'])
    dist_zh = stance_distribution(sub[sub['language']=='mandarin'])
    jsd = calculate_jsd(dist_en, dist_zh)  # between 0 and 1
    jsd_scores.append((issue, jsd))

jsd_df = pd.DataFrame(jsd_scores, columns=['issue_id','JSD']).sort_values('JSD', ascending=False)
print(jsd_df.head(10))  # top 10 most divergent issues


   issue_id       JSD
51      e15  0.279151
55      e44  0.190347
57      e17  0.186249
38      e27  0.166284
0       e19  0.112369
36       e4  0.112262
53      e23  0.104371
37       e6  0.104087
59      e14  0.089559
49      e45  0.088752
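
The helper `calculate_jsd` lives in `scripts.utils` and is not shown in this notebook. As a minimal sketch of what such a helper might compute, assuming a base-2 Jensen-Shannon divergence (which is what bounds the score between 0 and 1), the hypothetical function `jsd_base2` below squares SciPy's Jensen-Shannon distance. The function name and the example distributions are illustrative assumptions, not the project's actual code.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon


def jsd_base2(p, q):
    """Jensen-Shannon divergence in bits (hypothetical stand-in for calculate_jsd)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # jensenshannon returns the JS *distance*, i.e. the square root of the divergence,
    # so squaring recovers the divergence itself.
    return float(jensenshannon(p, q, base=2) ** 2)


# Identical distributions diverge by 0; distributions with disjoint support reach the maximum of 1.
same = jsd_base2([0.2, 0.5, 0.2, 0.1], [0.2, 0.5, 0.2, 0.1])
disjoint = jsd_base2([1, 0, 0, 0], [0, 1, 0, 0])
print(same, disjoint)
```

With this definition, a score such as the 0.279 reported for `e15` sits well inside the 0 to 1 range, so it indicates a moderate rather than a complete separation of the two stance distributions.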


In [5]:
all_data.head()

Unnamed: 0,uuid,cluster_id,template_number,framing,language,original_template,issue,generated_prompt,api_response,model,...,topic_neutral_extracted,topic_pro_extracted,topic_con_extracted,id,stance,binned_response,topic_text,topic_combined,media_source,culture
0,66edb2b9-e178-42b2-8b01-7a53c885cc90,c8,20,pro,english,Write this into sentences: X.,the resumption of the China-Japan-South Korea ...,Write this into sentences: the resumption of t...,The resumption of the China-Japan-South Korea ...,meta-llama/llama-3.3-70b-instruct,...,the resumption of the China-Japan-South Korea ...,the resumption of the China-Japan-South Korea ...,the resumption of the China-Japan-South Korea ...,83a8f8831e94488695c040f12165a624,1,pro,the resumption of the China-Japan-South Korea ...,c8_pro,China,Western
1,792fb088-2247-4ca4-8af3-ff9d27e71299,c25,80,pro,english,Write a list of 5 salient statement about X.,the space competition between U.S. and China i...,Write a list of 5 salient statement about the ...,Here are five salient statements about the spa...,meta-llama/llama-3.3-70b-instruct,...,the space competition between U.S. and China,the space competition between U.S. and China i...,the space competition between U.S. and China i...,395db99928b879ee5fb63e0c97f1927e,1,pro,the space competition between U.S. and China i...,c25_pro,China,Western
2,82a553ab-6967-47f0-b96f-cc3f257e1f63,c32,91,pro,english,Write about X in the speaking style of an aver...,U.S. policy on Taiwan being good,Write about U.S. policy on Taiwan being good i...,"Y'all, lemme tell ya, our policy on Taiwan is ...",meta-llama/llama-3.3-70b-instruct,...,U.S. policy on Taiwan,U.S. policy on Taiwan being good,U.S. policy on Taiwan being bad,4168e6ce7c527a7fc6ff742df893fa4b,1,pro,U.S. policy on Taiwan being good,c32_pro,China,Western
3,acd3d558-d145-4324-9f6d-e6d28f4c1e00,c44,34,pro,english,topics for X. give me atleat 15 topics with 70...,the U.S. approach to dealing with pandemics be...,topics for the U.S. approach to dealing with p...,Here are 15 topics related to the U.S. approac...,meta-llama/llama-3.3-70b-instruct,...,the U.S. approach to dealing with pandemics,the U.S. approach to dealing with pandemics be...,the U.S. approach to dealing with pandemics be...,c832a8538c6d256985ce93ca51328cf3,1,pro,the U.S. approach to dealing with pandemics be...,c44_pro,China,Western
4,34d6ad72-e187-4a38-a557-3117a3ef93fb,c49,30,pro,english,Could you share a PESTEL analysis on X? Please...,the strengthening of China-Russia in oppositio...,Could you share a PESTEL analysis on the stren...,Here is a PESTEL analysis on the strengthening...,meta-llama/llama-3.3-70b-instruct,...,the strengthening of China-Russia in oppositio...,the strengthening of China-Russia in oppositio...,the strengthening of China-Russia in oppositio...,2c0515014e75f3d01738768808d65e9b,3,neutral,the strengthening of China-Russia in oppositio...,c49_pro,China,Western


In [6]:
all_data['media_source'].unique()

array(['China', 'U.S.'], dtype=object)

### Test the significance of language on stance distribution

In [7]:

def chi_test_language(df, analysis_type):
    # =======================================================
    # Filter data to neutral framing
    # =======================================================
    # Work on the frame that was passed in rather than the global all_data
    df = df[df['framing'] == 'neutral'].copy()

    # =======================================================
    # Filter by media source based on analysis type
    # =======================================================
    if analysis_type == 'China':
        df = df[df['media_source'] == 'U.S.'].copy()  # U.S. media discussing China issues
        analysis_label = "China issues (U.S. media)"
        table_suffix = "china_media"
    elif analysis_type == 'U.S.':
        df = df[df['media_source'] == 'China'].copy()  # China media discussing U.S. issues
        analysis_label = "U.S. issues (China media)"
        table_suffix = "us_media"
    else:  # None - use all data
        analysis_label = "all issues"
        table_suffix = "all"

    print(f"\n{'='*60}")
    print(f"ANALYSIS TYPE: {analysis_label}")
    print(f"{'='*60}\n")

    # =======================================================
    # Overall analysis: language vs stance
    # =======================================================
    overall_counts = pd.crosstab(df['language'], df['stance'])
    chi2_stat, p_val, dof, _ = chi2_contingency(overall_counts)
    cv = cramers_v(overall_counts)

    print("=== Overall (Language vs Stance) ===")
    print(f"χ² = {chi2_stat:.2f}, df = {dof}, p-value = {p_val:.4f}")
    print(f"Cramer's V = {cv:.4f}")

    # Prepare summary table rows
    summary_rows = []

    # stance percentages by language
    stance_cols = overall_counts.columns
    overall_props = pd.crosstab(df['language'], df['stance'], normalize='index') * 100

    # Add overall row to table
    overall_props.loc['Overall'] = (df['stance'].value_counts(normalize=True) * 100).reindex(stance_cols, fill_value=0)

    # Add overall stats to summary rows
    summary_rows.append({
        'model': 'Overall',
        'chi2': chi2_stat,
        'p_val': p_val,
        'cv': cv
    })

    # =======================================================
    # Per model analysis
    # =======================================================
    for m in df['model'].unique():
        sub = df[df['model'] == m]
        if sub['language'].nunique() > 1:  # ensure both languages present
            counts = pd.crosstab(sub['language'], sub['stance'])
            chi2_stat, p_val, dof, _ = chi2_contingency(counts)
            cv = cramers_v(counts)
            print(f"\n=== {m} (Language vs Stance) ===")
            print(f"χ² = {chi2_stat:.2f}, df = {dof}, p-value = {p_val:.4f}")
            print(f"Cramer's V = {cv:.4f}")
            summary_rows.append({'model': m, 'chi2': chi2_stat, 'p_val': p_val, 'cv': cv})
            # add proportions
            props = pd.crosstab(sub['language'], sub['stance'], normalize='index') * 100
            for lang in props.index:
                overall_props.loc[f"{m} ({lang})"] = props.loc[lang]

    # =======================================================
    # Build LaTeX table
    # =======================================================
    # Turn proportions into DataFrame
    latex_df = overall_props.round(2).reset_index()
    latex_df = latex_df.rename(columns={'index': 'group'})

    # Create LaTeX for stance proportions
    latex_props = latex_df.to_latex(
        index=False,
        caption=f"Stance distribution by language and model (neutral framing, {analysis_label})",
        label=f"tab:lang_model_stance_{table_suffix}",
        column_format="l" + "c" * (len(latex_df.columns)-1),
        bold_rows=True
    )

    # Create LaTeX for stats (chi2, p, cv)
    stats_df = pd.DataFrame(summary_rows)
    stats_df = stats_df[['model', 'chi2', 'p_val', 'cv']]
    stats_df['p_val'] = stats_df['p_val'].apply(lambda x: f"{x:.4f}")
    stats_df['cv'] = stats_df['cv'].apply(lambda x: f"{x:.4f}")
    stats_df['chi2'] = stats_df['chi2'].apply(lambda x: f"{x:.2f}")

    latex_stats = stats_df.to_latex(
        index=False,
        caption=f"Chi-square results per model (language vs stance, neutral framing, {analysis_label})",
        label=f"tab:lang_model_chi2_{table_suffix}",
        column_format="lccc",
        bold_rows=True
    )

    # =======================================================
    # Print LaTeX outputs
    # =======================================================
    print("\n=== LaTeX: Proportion Table ===")
    print(latex_props)
    print("\n=== LaTeX: Stats Table ===")
    print(latex_stats)
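
Because the function formats p-values with `:.4f`, any p-value below 0.00005 is printed as `0.0000`. One possible way to report such values is sketched below. The helper `format_p` and its `floor` parameter are hypothetical names introduced for illustration and are not part of the notebook.

```python
def format_p(p_value, floor=1e-4):
    """Format a p-value, reporting values below `floor` as an inequality."""
    if p_value < floor:
        return f"< {floor:g}"
    return f"{p_value:.4f}"


print(format_p(6.02e-22))  # a very small p-value is shown as an inequality
print(format_p(0.0312))    # larger p-values keep four decimal places
```

The same helper could be applied to the `p_val` column of the stats table in place of the fixed four-decimal format.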

In [8]:
# Options: 'China', 'U.S.', or None (for all data)
ANALYSIS_TYPE = 'U.S.'  # Change this to 'China', 'U.S.', or None
chi_test_language(all_data, ANALYSIS_TYPE)


ANALYSIS TYPE: U.S. issues (China media)

=== Overall (Language vs Stance) ===
χ² = 477.29, df = 5, p-value = 0.0000
Cramer's V = 0.1403

=== qwen/qwen3-235b-a22b (Language vs Stance) ===
χ² = 284.29, df = 5, p-value = 0.0000
Cramer's V = 0.2158

=== openai/gpt-4o-mini (Language vs Stance) ===
χ² = 108.17, df = 5, p-value = 0.0000
Cramer's V = 0.1311

=== meta-llama/llama-3.3-70b-instruct (Language vs Stance) ===
χ² = 52.74, df = 5, p-value = 0.0000
Cramer's V = 0.0892

=== deepseek/deepseek-chat-v3-0324 (Language vs Stance) ===
χ² = 187.22, df = 5, p-value = 0.0000
Cramer's V = 0.1743

=== LaTeX: Proportion Table ===
\begin{table}
\caption{Stance distribution by language and model (neutral framing, U.S. issues (China media))}
\label{tab:lang_model_stance_us_media}
\begin{tabular}{lcccccc}
\toprule
language & 1 & 2 & 3 & 4 & 5 & refusal \\
\midrule
english & 7.790000 & 9.710000 & 65.420000 & 10.970000 & 5.660000 & 0.460000 \\
mandarin & 13.660000 & 6.930000 & 56.730000 & 15.490000 & 5

In [9]:
ANALYSIS_TYPE = 'China'  # Change this to 'China', 'U.S.', or None
chi_test_language(all_data, ANALYSIS_TYPE)


ANALYSIS TYPE: China issues (U.S. media)

=== Overall (Language vs Stance) ===
χ² = 1764.15, df = 5, p-value = 0.0000
Cramer's V = 0.2707

=== meta-llama/llama-3.3-70b-instruct (Language vs Stance) ===
χ² = 148.55, df = 5, p-value = 0.0000
Cramer's V = 0.1547

=== deepseek/deepseek-chat-v3-0324 (Language vs Stance) ===
χ² = 610.89, df = 5, p-value = 0.0000
Cramer's V = 0.3178

=== qwen/qwen3-235b-a22b (Language vs Stance) ===
χ² = 795.17, df = 5, p-value = 0.0000
Cramer's V = 0.3629

=== openai/gpt-4o-mini (Language vs Stance) ===
χ² = 465.45, df = 5, p-value = 0.0000
Cramer's V = 0.2770

=== LaTeX: Proportion Table ===
\begin{table}
\caption{Stance distribution by language and model (neutral framing, China issues (U.S. media))}
\label{tab:lang_model_stance_china_media}
\begin{tabular}{lcccccc}
\toprule
language & 1 & 2 & 3 & 4 & 5 & refusal \\
\midrule
english & 7.550000 & 5.440000 & 52.170000 & 21.130000 & 12.440000 & 1.270000 \\
mandarin & 24.620000 & 7.950000 & 45.510000 & 12.7400

In [10]:
ANALYSIS_TYPE = None
chi_test_language(all_data, ANALYSIS_TYPE)


ANALYSIS TYPE: all issues

=== Overall (Language vs Stance) ===
χ² = 1616.51, df = 5, p-value = 0.0000
Cramer's V = 0.1832

=== meta-llama/llama-3.3-70b-instruct (Language vs Stance) ===
χ² = 173.47, df = 5, p-value = 0.0000
Cramer's V = 0.1185

=== deepseek/deepseek-chat-v3-0324 (Language vs Stance) ===
χ² = 470.16, df = 5, p-value = 0.0000
Cramer's V = 0.1969

=== qwen/qwen3-235b-a22b (Language vs Stance) ===
χ² = 720.80, df = 5, p-value = 0.0000
Cramer's V = 0.2442

=== openai/gpt-4o-mini (Language vs Stance) ===
χ² = 471.29, df = 5, p-value = 0.0000
Cramer's V = 0.1971

=== LaTeX: Proportion Table ===
\begin{table}
\caption{Stance distribution by language and model (neutral framing, all issues)}
\label{tab:lang_model_stance_all}
\begin{tabular}{lcccccc}
\toprule
language & 1 & 2 & 3 & 4 & 5 & refusal \\
\midrule
english & 7.670000 & 7.580000 & 58.790000 & 16.050000 & 9.050000 & 0.860000 \\
mandarin & 19.140000 & 7.440000 & 51.120000 & 14.120000 & 6.020000 & 2.160000 \\
Overall & 1

### Test for the significance of model-origin on stance distribution

In [11]:
all_data['media_source'].unique()

array(['China', 'U.S.'], dtype=object)

In [12]:
def test_model_origin_stance(all_data, languages=None, framing='neutral', issue_type=None,
                              western_models=None, show_combined=True):
    """
    Test whether model origin (Western vs. Chinese) affects stance distribution.
    
    Parameters:
    -----------
    all_data : DataFrame
        Complete dataset
    languages : list or None
        List of languages to test. If None, tests ['english', 'mandarin']
    framing : str
        Which framing to filter by (default: 'neutral')
    issue_type : str or None
        'U.S.' keeps U.S. issues (media_source == 'China'), 'China' keeps
        China issues (media_source == 'U.S.'); any other value keeps all issues
    western_models : list or None
        List of Western model names. If None, uses default list.
        Currently unused: model origin is read from the 'culture' column
    show_combined : bool
        Whether to show results for all languages combined
    
    Returns:
    --------
    dict : Dictionary containing results for each language
    """
    
    # Set defaults
    if languages is None:
        languages = ['english', 'mandarin']
    
    if western_models is None:
        western_models = ['meta-llama/llama-3.3-70b-instruct', 'openai/gpt-4o-mini']
    
    # Filter to the specified framing
    filtered_df = all_data[all_data['framing'] == framing].copy()
    # U.S. issues come from China media, and China issues from U.S. media
    if issue_type == 'U.S.':
        filtered_df = filtered_df[filtered_df['media_source'] == 'China'].copy()
    elif issue_type == 'China':
        filtered_df = filtered_df[filtered_df['media_source'] == 'U.S.'].copy()
    # Any other value (including None) keeps all issues
    

    # Store results
    results = {}
    
    # Test across different languages
    for lang in languages:
        subset = filtered_df[filtered_df['language'] == lang]
        
        if len(subset) == 0:
            print(f"=== {lang.upper()} ===")
            print("No data available for this language")
            print("="*50 + "\n")
            continue
        
        # Create contingency table
        table = pd.crosstab(subset['culture'], subset['stance'])
        
        # Check if table has enough data
        if table.shape[0] < 2 or table.shape[1] < 2:
            print(f"=== {lang.upper()} ===")
            print("Insufficient data for chi-square test")
            print("="*50 + "\n")
            continue
        
        # Perform chi-square test
        chi2, p, dof, expected = chi2_contingency(table)
        cv = cramers_v(table)
        
        # Store results
        results[lang] = {
            'table': table,
            'chi2': chi2,
            'p_value': p,
            'dof': dof,
            'cramers_v': cv
        }
        
        # Print results
        print(f"=== {lang.upper()} ===")
        print(table)
        print(f"\nChi-square: {chi2:.2f}, df={dof}, p={p:.4e}")
        print(f"Cramer's V: {cv:.3f}")
        
        # Show percentages for easier interpretation
        print("\nPercentages by row (model origin):")
        print(table.div(table.sum(axis=1), axis=0).round(3) * 100)
        print("\n" + "="*50 + "\n")
    
    # Optional: Test across all languages combined
    if show_combined and len(results) > 0:
        print("=== ALL LANGUAGES COMBINED ===")
        table_all = pd.crosstab(filtered_df['culture'], filtered_df['stance'])
        chi2, p, dof, expected = chi2_contingency(table_all)
        cv = cramers_v(table_all)
        
        results['combined'] = {
            'table': table_all,
            'chi2': chi2,
            'p_value': p,
            'dof': dof,
            'cramers_v': cv
        }
        
        print(table_all)
        print(f"\nChi-square: {chi2:.2f}, df={dof}, p={p:.4e}")
        print(f"Cramer's V: {cv:.3f}")
        print("\nPercentages by row (model origin):")
        print(table_all.div(table_all.sum(axis=1), axis=0).round(3) * 100)
    
    return results
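
The function above unpacks the `expected` frequencies from `chi2_contingency` but never inspects them. A common rule of thumb is that the chi-square approximation is reasonable when every expected count is at least 5. The sketch below shows one way to check this; the helper name `expected_counts_ok` and the threshold argument are assumptions for illustration, and the counts are copied from the Mandarin table printed by the all-issues run of this function.

```python
import numpy as np
from scipy.stats import chi2_contingency


def expected_counts_ok(table, min_expected=5.0):
    """Return (all expected counts >= min_expected, smallest expected count)."""
    _, _, _, expected = chi2_contingency(table)
    return bool((expected >= min_expected).all()), float(expected.min())


# Rows: Chinese, Western models; columns: stance 1..5 and refusal (Mandarin, all issues).
mandarin_counts = np.array([
    [2403, 836, 5805, 1931, 655, 370],
    [2191, 949, 6464, 1457, 790, 149],
])

ok, smallest = expected_counts_ok(mandarin_counts)
print(ok, smallest)
```

For the aggregate tables the check is comfortably satisfied; it is more likely to matter for the per-issue tables, where each cell holds far fewer responses.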


In [13]:
test_model_origin_stance(all_data, languages=None, framing='neutral', issue_type=None,
                              western_models=None, show_combined=True)

=== ENGLISH ===
stance      1    2     3     4     5  refusal
culture                                      
Chinese   822  852  7255  2057   927       87
Western  1019  966  6855  1795  1245      120

Chi-square: 109.21, df=5, p=6.0248e-22
Cramer's V: 0.066

Percentages by row (model origin):
stance     1    2     3     4     5  refusal
culture                                     
Chinese  6.8  7.1  60.5  17.1   7.7      0.7
Western  8.5  8.0  57.1  15.0  10.4      1.0


=== MANDARIN ===
stance      1    2     3     4    5  refusal
culture                                     
Chinese  2403  836  5805  1931  655      370
Western  2191  949  6464  1457  790      149

Chi-square: 225.37, df=5, p=1.0522e-46
Cramer's V: 0.096

Percentages by row (model origin):
stance      1    2     3     4    5  refusal
culture                                     
Chinese  20.0  7.0  48.4  16.1  5.5      3.1
Western  18.3  7.9  53.9  12.1  6.6      1.2


=== ALL LANGUAGES COMBINED ===
stance      1     2 

{'english': {'table': stance      1    2     3     4     5  refusal
  culture                                      
  Chinese   822  852  7255  2057   927       87
  Western  1019  966  6855  1795  1245      120,
  'chi2': np.float64(109.20761517082379),
  'p_value': np.float64(6.024772305112658e-22),
  'dof': 5,
  'cramers_v': np.float64(0.06589503938270426)},
 'mandarin': {'table': stance      1    2     3     4    5  refusal
  culture                                     
  Chinese  2403  836  5805  1931  655      370
  Western  2191  949  6464  1457  790      149,
  'chi2': np.float64(225.36696621030418),
  'p_value': np.float64(1.0522223150598502e-46),
  'dof': 5,
  'cramers_v': np.float64(0.09582447942761473)},
 'combined': {'table': stance      1     2      3     4     5  refusal
  culture                                        
  Chinese  3225  1688  13060  3988  1582      457
  Western  3210  1915  13319  3252  2035      269,
  'chi2': np.float64(197.117299624209),
  'p_value':
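
The printed statistics can be re-derived from the printed counts alone. The sketch below rebuilds the English contingency table from the output above and applies the bias-corrected Cramér's V formula that this notebook defines locally inside `cal_chi2_results`. The imported `scripts.utils.cramers_v` is not shown here, so treating it as the same formula is an assumption; the function name `cramers_v_corrected` is introduced only for this example.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a DataFrame."""
    chi2_stat = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    phi2 = chi2_stat / n
    # Subtract the expected value of phi^2 under independence, clipping at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return float(np.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1)))


# English responses, all issues: counts copied from the printed table.
english = pd.DataFrame(
    [[822, 852, 7255, 2057, 927, 87],
     [1019, 966, 6855, 1795, 1245, 120]],
    index=["Chinese", "Western"],
    columns=["1", "2", "3", "4", "5", "refusal"],
)

chi2_stat, p_value, dof, _ = chi2_contingency(english)
v = cramers_v_corrected(english)
print(f"chi2 = {chi2_stat:.2f}, df = {dof}, p = {p_value:.4e}, V = {v:.3f}")
```

With 24,000 responses in the table, even a small effect (V below 0.1) yields an extremely small p-value, which is why the effect size is reported alongside the test statistic.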

In [14]:
test_model_origin_stance(all_data, languages=None, framing='neutral', issue_type='U.S.',
                              western_models=None, show_combined=True)

=== ENGLISH ===
stance     1    2     3    4    5  refusal
culture                                   
Chinese  308  501  4021  751  397       22
Western  627  664  3829  565  282       33

Chi-square: 184.30, df=5, p=6.4450e-38
Cramer's V: 0.122

Percentages by row (model origin):
stance      1     2     3     4    5  refusal
culture                                      
Chinese   5.1   8.4  67.0  12.5  6.6      0.4
Western  10.4  11.1  63.8   9.4  4.7      0.6


=== MANDARIN ===
stance      1    2     3     4    5  refusal
culture                                     
Chinese   610  317  3156  1328  489      100
Western  1029  514  3652   531  200       74

Chi-square: 656.75, df=5, p=1.0990e-139
Cramer's V: 0.233

Percentages by row (model origin):
stance      1    2     3     4    5  refusal
culture                                     
Chinese  10.2  5.3  52.6  22.1  8.2      1.7
Western  17.2  8.6  60.9   8.8  3.3      1.2


=== ALL LANGUAGES COMBINED ===
stance      1     2     3  

{'english': {'table': stance     1    2     3    4    5  refusal
  culture                                   
  Chinese  308  501  4021  751  397       22
  Western  627  664  3829  565  282       33,
  'chi2': np.float64(184.30327976836801),
  'p_value': np.float64(6.444950109068632e-38),
  'dof': 5,
  'cramers_v': np.float64(0.12224218020762598)},
 'mandarin': {'table': stance      1    2     3     4    5  refusal
  culture                                     
  Chinese   610  317  3156  1328  489      100
  Western  1029  514  3652   531  200       74,
  'chi2': np.float64(656.7521672049066),
  'p_value': np.float64(1.0989610352829166e-139),
  'dof': 5,
  'cramers_v': np.float64(0.23306044856127753)},
 'combined': {'table': stance      1     2     3     4    5  refusal
  culture                                      
  Chinese   918   818  7177  2079  886      122
  Western  1656  1178  7481  1096  482      107,
  'chi2': np.float64(707.4645481938551),
  'p_value': np.float64(1.19465

In [15]:
test_model_origin_stance(all_data, languages=None, framing='neutral', issue_type='China',
                              western_models=None, show_combined=True)

=== ENGLISH ===
stance     1    2     3     4    5  refusal
culture                                    
Chinese  514  351  3234  1306  530       65
Western  392  302  3026  1230  963       87

Chi-square: 158.06, df=5, p=2.5683e-32
Cramer's V: 0.113

Percentages by row (model origin):
stance     1    2     3     4     5  refusal
culture                                     
Chinese  8.6  5.8  53.9  21.8   8.8      1.1
Western  6.5  5.0  50.4  20.5  16.0      1.4


=== MANDARIN ===
stance      1    2     3    4    5  refusal
culture                                    
Chinese  1793  519  2649  603  166      270
Western  1162  435  2812  926  590       75

Chi-square: 563.25, df=5, p=1.7555e-119
Cramer's V: 0.216

Percentages by row (model origin):
stance      1    2     3     4    5  refusal
culture                                     
Chinese  29.9  8.6  44.2  10.0  2.8      4.5
Western  19.4  7.2  46.9  15.4  9.8      1.2


=== ALL LANGUAGES COMBINED ===
stance      1    2     3     4 

{'english': {'table': stance     1    2     3     4    5  refusal
  culture                                    
  Chinese  514  351  3234  1306  530       65
  Western  392  302  3026  1230  963       87,
  'chi2': np.float64(158.0568277891756),
  'p_value': np.float64(2.5683332309895126e-32),
  'dof': 5,
  'cramers_v': np.float64(0.11294141841195285)},
 'mandarin': {'table': stance      1    2     3    4    5  refusal
  culture                                    
  Chinese  1793  519  2649  603  166      270
  Western  1162  435  2812  926  590       75,
  'chi2': np.float64(563.2527267667073),
  'p_value': np.float64(1.7555340866712047e-119),
  'dof': 5,
  'cramers_v': np.float64(0.21569632179794485)},
 'combined': {'table': stance      1    2     3     4     5  refusal
  culture                                      
  Chinese  2307  870  5883  1909   696      335
  Western  1554  737  5838  2156  1553      162,
  'chi2': np.float64(559.8303106241917),
  'p_value': np.float64(9.62988

### Test the significance of language and model-origin on stance distribution for issues with high divergence (JSD)

In [16]:

all_data['topic_combined'] = all_data.apply(lambda x: f"{x['cluster_id']}_{x['framing']}", axis=1)


def cal_chi2_results(all_data, lst):
    merged = {}

    for i in lst:
        if i not in merged:
            merged[i] = all_data[all_data['topic_combined'] == i]['issue'].iloc[0]

    # pandas, numpy, and chi2_contingency are already imported at the top of the notebook.

    # --- Helper: bias-corrected Cramer's V (shadows the cramers_v imported from scripts.utils) ---
    def cramers_v(confusion_matrix):
        chi2_stat = chi2_contingency(confusion_matrix)[0]
        n = confusion_matrix.to_numpy().sum()
        phi2 = chi2_stat / n
        r, k = confusion_matrix.shape
        phi2corr = max(0, phi2 - ((k-1)*(r-1))/(n-1))
        rcorr = r - ((r-1)**2)/(n-1)
        kcorr = k - ((k-1)**2)/(n-1)
        return np.sqrt(phi2corr / min((kcorr-1), (rcorr-1)))

    # Filter to neutral framing only
    neutral_df = all_data[all_data['framing'] == 'neutral'].copy()

    results = []  # to collect results for LaTeX

    for key, val in merged.items():
        # Filter for this topic
        issue_df = neutral_df[neutral_df['topic_combined'] == key]

        # Skip if empty
        if issue_df.empty:
            continue

        # Crosstab language vs stance
        table = pd.crosstab(issue_df['language'], issue_df['stance'])

        # Skip if table too small
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue

        # Chi-square test
        chi2_stat, p_val, dof, expected = chi2_contingency(table)
        cv = cramers_v(table)

        # Critical value at alpha = 0.05
        from scipy.stats import chi2
        critical_val = chi2.ppf(0.95, dof)

        results.append({
            'Issue ID': key,
            'Dictionary Value': val,
            'Chi-square': f"{chi2_stat:.2f}",
            'df': dof,
            'Critical Value': f"{critical_val:.2f}",
            'p-value': "<0.01" if p_val < 0.01 else f"{p_val:.4f}",
            "Cramer's V": f"{cv:.4f}"
        })

    # Create DataFrame for LaTeX
    results_df = pd.DataFrame(results)

    # Order columns
    results_df = results_df[['Issue ID', 'Dictionary Value', 'Chi-square', 'df', 'Critical Value', 'p-value', "Cramer's V"]]

    # Convert to LaTeX
    latex_table = results_df.to_latex(
        index=False,
        caption="Chi-square test results across issues (neutral framing)",
        label="tab:chi2_issues_results",
        column_format="l l c c c c c",
        longtable=False,
        bold_rows=False
    )

    print(latex_table)
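
The `Critical Value` column comes from the chi-square quantile function: at alpha = 0.05 it is the 95th percentile of the chi-square distribution with the table's degrees of freedom. A minimal sketch follows; the wrapper name `critical_value` is an assumption for illustration, while `chi2.ppf` is the same SciPy call used in the function above.

```python
from scipy.stats import chi2


def critical_value(dof, alpha=0.05):
    """Chi-square critical value: reject independence when the statistic exceeds it."""
    return float(chi2.ppf(1 - alpha, dof))


# Degrees of freedom are (rows - 1) * (columns - 1): 2 languages x 6 stance
# categories gives 5, and an issue with only 5 observed categories gives 4.
for dof in (4, 5):
    print(dof, round(critical_value(dof), 2))
```

Comparing the statistic with the critical value is equivalent to comparing the p-value with alpha, so the two columns in the table always lead to the same decision.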




In [17]:
# Top five most divergent issues for each model (from the previous JSD analysis): U.S. issues from China media
c_gpt4 = ['c49_neutral', 'c6_neutral', 'c8_neutral', 'c7_neutral', 'c14_neutral']
c_llama = ['c7_neutral', 'c49_neutral', 'c8_neutral', 'c40_neutral', 'c10_neutral']
c_deepseek = ['c26_neutral', 'c32_neutral', 'c29_neutral', 'c7_neutral', 'c34_neutral']
c_qwen = ['c34_neutral', 'c26_neutral', 'c32_neutral', 'c9_neutral', 'c29_neutral']

print("GPT")
subset = all_data[(all_data['model'] == 'openai/gpt-4o-mini') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, c_gpt4)

print("llama")
subset = all_data[(all_data['model'] == 'meta-llama/llama-3.3-70b-instruct') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, c_llama)

print("deepseek")
subset = all_data[(all_data['model'] == 'deepseek/deepseek-chat-v3-0324') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, c_deepseek)

print("qwen")
subset = all_data[(all_data['model'] == 'qwen/qwen3-235b-a22b') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, c_qwen)

GPT
\begin{table}
\caption{Chi-square test results across issues (neutral framing)}
\label{tab:chi2_issues_results}
\begin{tabular}{l l c c c c c}
\toprule
Issue ID & Dictionary Value & Chi-square & df & Critical Value & p-value & Cramer's V \\
\midrule
c49_neutral & the strengthening of China-Russia in opposition to the U.S. & 71.74 & 5 & 11.07 & <0.01 & 0.5790 \\
c6_neutral & the role of China’s giant pandas in international diplomacy & 42.72 & 4 & 9.49 & <0.01 & 0.4410 \\
c8_neutral & the resumption of the China-Japan-South Korea trilateral summit & 39.95 & 5 & 11.07 & <0.01 & 0.4189 \\
c7_neutral & U.S. sports governance & 25.01 & 5 & 11.07 & <0.01 & 0.3169 \\
c14_neutral & the U.S. approach to dealing with Boeing’s safety failures & 35.14 & 5 & 11.07 & <0.01 & 0.3890 \\
\bottomrule
\end{tabular}
\end{table}

llama
\begin{table}
\caption{Chi-square test results across issues (neutral framing)}
\label{tab:chi2_issues_results}
\begin{tabular}{l l c c c c c}
\toprule
Issue ID & Dict

In [19]:
# Top five most divergent issues for each model (from the previous JSD analysis): China issues from U.S. media
e_gpt4 = ['e15_neutral', 'e17_neutral', 'e27_neutral', 'e44_neutral', 'e45_neutral']
e_llama = ['e15_neutral', 'e27_neutral', 'e17_neutral', 'e44_neutral', 'e38_neutral']
e_deepseek = ['e15_neutral', 'e19_neutral', 'e26_neutral', 'e14_neutral', 'e18_neutral']
e_qwen = ['e14_neutral', 'e17_neutral', 'e44_neutral', 'e27_neutral', 'e18_neutral']

print("GPT")
subset = all_data[(all_data['model'] == 'openai/gpt-4o-mini') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, e_gpt4)

print("llama")
subset = all_data[(all_data['model'] == 'meta-llama/llama-3.3-70b-instruct') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, e_llama)

print("deepseek")
subset = all_data[(all_data['model'] == 'deepseek/deepseek-chat-v3-0324') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, e_deepseek)

print("qwen")
subset = all_data[(all_data['model'] == 'qwen/qwen3-235b-a22b') & (all_data['framing'] == 'neutral')]
cal_chi2_results(subset, e_qwen)

GPT
\begin{table}
\caption{Chi-square test results across issues (neutral framing)}
\label{tab:chi2_issues_results}
\begin{tabular}{l l c c c c c}
\toprule
Issue ID & Dictionary Value & Chi-square & df & Critical Value & p-value & Cramer's V \\
\midrule
e15_neutral & Chinese foreign influence campaigns & 148.86 & 5 & 11.07 & <0.01 & 0.8502 \\
e17_neutral & China's approach to dealing with the COVID-19 pandemic & 118.98 & 5 & 11.07 & <0.01 & 0.7567 \\
e27_neutral & Xi Jinping's approach to governance & 110.16 & 5 & 11.07 & <0.01 & 0.7269 \\
e44_neutral & China's push for tech self-reliance & 101.30 & 5 & 11.07 & <0.01 & 0.6956 \\
e45_neutral & censorship in China's entertainment industry & 75.77 & 4 & 9.49 & <0.01 & 0.6004 \\
\bottomrule
\end{tabular}
\end{table}

llama
\begin{table}
\caption{Chi-square test results across issues (neutral framing)}
\label{tab:chi2_issues_results}
\begin{tabular}{l l c c c c c}
\toprule
Issue ID & Dictionary Value & Chi-square & df & Critical Value & p-v

In [20]:
# # Create model origin variable (adjust model names to your data)
# western_models = ['meta-llama/llama-3.3-70b-instruct', 'openai/gpt-4o-mini']  # Adjust to your model names
# chinese_models = ['deepseek/deepseek-chat-v3-0324', 'qwen/qwen3-235b-a22b']  # Adjust to your model names

# # Now map origin
# all_data['model_origin'] = all_data['model'].apply(
#     lambda x: 'Western' if x in western_models else ('Chinese' if x in chinese_models else 'Unknown')
# )
# all_data['model_origin'] = pd.Categorical(all_data['model_origin'], categories=['Western', 'Chinese'], ordered=True)


In [21]:
eng_data = all_data[(all_data['language']== 'english') & (all_data['framing'] == 'neutral') ]
chi_data = all_data[(all_data['language']== 'mandarin') & (all_data['framing'] == 'neutral') ]

# Issues tested below: c20, c40, c8 (U.S. issues)
# and e26, e18, e45 (China issues)

In [22]:
eng_data['culture'].value_counts()

culture
Western    12000
Chinese    12000
Name: count, dtype: int64

In [23]:
# table 9 in appendix
def chi_square_test_detailed(df, row_var, col_var):
    """Perform detailed chi-square test with effect size"""
    contingency = pd.crosstab(df[row_var], df[col_var])
    
    # Chi-square test
    chi2_stat, p_value, dof, expected = chi2_contingency(contingency)
    
    # Effect size (Cramer's V)
    cramers_v_value = cramers_v(contingency.values)
    
    # Interpret effect size
    if cramers_v_value < 0.1:
        effect_size = "Negligible"
    elif cramers_v_value < 0.3:
        effect_size = "Small"
    elif cramers_v_value < 0.5:
        effect_size = "Medium"
    else:
        effect_size = "Large"
    
    results = {
        'chi2_statistic': chi2_stat,
        'p_value': p_value,
        'degrees_of_freedom': dof,
        'cramers_v': cramers_v_value,
        'effect_size_interpretation': effect_size,
        'contingency_table': contingency,
        'expected_frequencies': expected
    }
    
    return results


def chi_test_selected_issues(df, cluster_ids):
    """
    Run chi-square tests for specific issues comparing stance distribution
    between Western and Chinese models.
    
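    Parameters:
    - df: DataFrame with 'cluster_id', 'culture', and 'binned_response' columns
    - cluster_ids: list of cluster ID strings (e.g., ['c20', 'c40', 'c8'])

    Returns:
    - dict mapping each cluster ID to the output of chi_square_test_detailed
    """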

    results_dict = {}

    for cid in cluster_ids:
        # Filter data for the issue
        issue_df = df[df['cluster_id'] == cid]
        
        print("=" * 70)
        print(f"Chi-Square Test for Issue Cluster ID: {cid}")
        print("=" * 70)
        
        print(f"Sample size: {len(issue_df)}")
        print(f"Western models: {len(issue_df[issue_df['culture'] == 'Western'])}")
        print(f"Chinese models: {len(issue_df[issue_df['culture'] == 'Chinese'])}")

        # Contingency and test
        results = chi_square_test_detailed(issue_df, 'culture', 'binned_response')

        print(f"χ² = {results['chi2_statistic']:.4f}")
        print(f"p-value = {results['p_value']:.4f}")
        print(f"df = {results['degrees_of_freedom']}")
        print(f"Cramer's V = {results['cramers_v']:.4f} ({results['effect_size_interpretation']})")

        # Save results
        results_dict[cid] = results


    return results_dict

chi_test_selected_issues(eng_data, ['c20' , 'c40', 'c8'])

Chi-Square Test for Issue Cluster ID: c20
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 35.1620
p-value = 0.0000
df = 3
Cramer's V = 0.2839 (Small)
Chi-Square Test for Issue Cluster ID: c40
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 36.3754
p-value = 0.0000
df = 3
Cramer's V = 0.2892 (Small)
Chi-Square Test for Issue Cluster ID: c8
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 30.8413
p-value = 0.0000
df = 3
Cramer's V = 0.2641 (Small)


{'c20': {'chi2_statistic': np.float64(35.16196112129883),
  'p_value': np.float64(1.1259436055724842e-07),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.2838803990257057),
  'effect_size_interpretation': 'Small',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese          125       70    4       1
  Western           66      127    5       2,
  'expected_frequencies': array([[95.5, 98.5,  4.5,  1.5],
         [95.5, 98.5,  4.5,  1.5]])},
 'c40': {'chi2_statistic': np.float64(36.37541528239203),
  'p_value': np.float64(6.237417102050791e-08),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.2891873761996385),
  'effect_size_interpretation': 'Small',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese           67      130    3       0
  Western           19      171    9       1,
  'expected_frequencies': array([[ 43. , 150.5,   6. ,  
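
The implementation of `cramers_v` lives in `scripts.utils` and is not shown in this notebook. The value printed for c20 (0.2839) is lower than the uncorrected sqrt(χ²/n), and it agrees with the bias-corrected variant of Cramér's V. That suggests, but does not confirm, that the helper applies this correction. The sketch below is a hypothetical stand-in, not the project's code, and recomputes both versions from the c20 counts printed above:

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v_bias_corrected(table):
    """Bias-corrected Cramér's V (hypothetical stand-in for scripts.utils.cramers_v)."""
    table = np.asarray(table)
    chi2_stat, _, _, _ = chi2_contingency(table)
    n = table.sum()
    r, k = table.shape
    phi2 = chi2_stat / n
    # Remove the expected upward bias of phi^2 and shrink the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return np.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Counts copied from the c20 contingency table (columns: con, neutral, pro, refuse)
c20 = [[125, 70, 4, 1],   # Chinese
       [66, 127, 5, 2]]   # Western

v_corrected = cramers_v_bias_corrected(c20)
# Uncorrected version; min(r - 1, k - 1) is 1 for a 2 x 4 table
v_plain = np.sqrt(chi2_contingency(c20)[0] / np.sum(c20))
print(f"bias-corrected V = {v_corrected:.4f}, uncorrected V = {v_plain:.4f}")
```

With n = 400 per issue the two versions differ only in the second decimal place, but the difference can matter for values close to the 0.3 and 0.5 cut-offs used in `chi_square_test_detailed`.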

In [24]:
chi_test_selected_issues(eng_data, ['e26', 'e18', 'e45'])  # table 9 for China issues

Chi-Square Test for Issue Cluster ID: e26
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 148.0365
p-value = 0.0000
df = 3
Cramer's V = 0.6029 (Large)
Chi-Square Test for Issue Cluster ID: e18
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 173.6122
p-value = 0.0000
df = 3
Cramer's V = 0.6539 (Large)
Chi-Square Test for Issue Cluster ID: e45
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 63.7191
p-value = 0.0000
df = 3
Cramer's V = 0.3901 (Medium)


{'e26': {'chi2_statistic': np.float64(148.03652811176),
  'p_value': np.float64(6.987235844059304e-32),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.6028959359013238),
  'effect_size_interpretation': 'Large',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese           87       40   63      10
  Western          197        1    0       2,
  'expected_frequencies': array([[142. ,  20.5,  31.5,   6. ],
         [142. ,  20.5,  31.5,   6. ]])},
 'e18': {'chi2_statistic': np.float64(173.61216153127918),
  'p_value': np.float64(2.112557410227621e-37),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.6538984967399921),
  'effect_size_interpretation': 'Large',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese           55       81   62       2
  Western          183       15    1       1,
  'expected_frequencies': array([[119. ,  48. ,  3
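
The printed summaries show `p-value = 0.0000` because the `.4f` format rounds very small values to zero, while the results dictionary keeps the full value. A minimal sketch of one way to keep the magnitude visible; `format_p_value` and its `threshold` parameter are hypothetical and not part of the notebook:

```python
def format_p_value(p, threshold=1e-4):
    """Hypothetical helper: fixed-point for ordinary p-values, scientific notation for tiny ones."""
    if p < threshold:
        return f"{p:.2e}"
    return f"{p:.4f}"


# p-value for e26 (English prompts), copied from the results dictionary above
p_e26 = 6.987235844059304e-32

fixed_point = f"{p_e26:.4f}"        # what the current print statement produces
scientific = format_p_value(p_e26)  # keeps the order of magnitude

print(f"p-value = {fixed_point}")
print(f"p-value = {scientific}")
```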

In [25]:
chi_test_selected_issues(chi_data, ['c34', 'c9', 'c32'])  # table 10 for U.S. issues

Chi-Square Test for Issue Cluster ID: c34
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 149.2051
p-value = 0.0000
df = 3
Cramer's V = 0.6053 (Large)
Chi-Square Test for Issue Cluster ID: c9
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 126.0914
p-value = 0.0000
df = 3
Cramer's V = 0.5554 (Large)
Chi-Square Test for Issue Cluster ID: c32
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 121.9662
p-value = 0.0000
df = 3
Cramer's V = 0.5460 (Large)


{'c34': {'chi2_statistic': np.float64(149.20509284165567),
  'p_value': np.float64(3.91056625066084e-32),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.6053199662837727),
  'effect_size_interpretation': 'Large',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese          117       70    2      11
  Western           10      160   26       4,
  'expected_frequencies': array([[ 63.5, 115. ,  14. ,   7.5],
         [ 63.5, 115. ,  14. ,   7.5]])},
 'c9': {'chi2_statistic': np.float64(126.09135460009247),
  'p_value': np.float64(3.7609466644895095e-27),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.5554122155479477),
  'effect_size_interpretation': 'Large',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese           72      115    8       5
  Western           12       91   97       0,
  'expected_frequencies': array([[ 42. , 103. , 
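
`chi_square_test_detailed` returns the expected frequencies, but the notebook does not inspect them. As a sketch of how they could be used, the snippet below checks a common rule of thumb for the chi-square approximation (few expected counts below 5), using the c34 counts printed above. The 5-count cut-off is a conventional assumption, not something the notebook specifies:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Counts copied from the c34 contingency table (columns: con, neutral, pro, refuse)
c34 = np.array([[117, 70, 2, 11],    # Chinese
                [10, 160, 26, 4]])   # Western

_, _, _, expected = chi2_contingency(c34)

# Share of cells whose expected count falls below the conventional cut-off of 5
share_below_5 = (expected < 5).mean()
print(f"smallest expected count: {expected.min():.1f}")
print(f"share of cells with expected count < 5: {share_below_5:.0%}")
```

For comparison, the c20 output earlier lists expected counts of 4.5 and 1.5 in the `pro` and `refuse` columns.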

In [26]:
chi_test_selected_issues(chi_data, ['e26', 'e18', 'e14'])  # table 10 for China issues

Chi-Square Test for Issue Cluster ID: e26
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 352.5657
p-value = 0.0000
df = 3
Cramer's V = 0.9360 (Large)
Chi-Square Test for Issue Cluster ID: e18
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 332.0443
p-value = 0.0000
df = 3
Cramer's V = 0.9081 (Large)
Chi-Square Test for Issue Cluster ID: e14
Sample size: 400
Western models: 200
Chinese models: 200
χ² = 178.4436
p-value = 0.0000
df = 3
Cramer's V = 0.6631 (Large)


{'e26': {'chi2_statistic': np.float64(352.5656928602162),
  'p_value': np.float64(4.1506798899918e-76),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.935997410430084),
  'effect_size_interpretation': 'Large',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese            1       12  185       2
  Western          173       23    2       2,
  'expected_frequencies': array([[87. , 17.5, 93.5,  2. ],
         [87. , 17.5, 93.5,  2. ]])},
 'e18': {'chi2_statistic': np.float64(332.04425204425206),
  'p_value': np.float64(1.1517200389459792e-71),
  'degrees_of_freedom': 3,
  'cramers_v': np.float64(0.9081071780187621),
  'effect_size_interpretation': 'Large',
  'contingency_table': binned_response  con  neutral  pro  refuse
  culture                                   
  Chinese            0       14  184       2
  Western          131       63    5       1,
  'expected_frequencies': array([[65.5, 38.5, 94.5,  1.5],
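
The nested dictionary returned by `chi_test_selected_issues` is hard to read when echoed by the notebook. A minimal sketch of flattening it into one row per issue; `summarize_results` is a hypothetical helper, and the stand-in dictionary reuses the e26 and e18 values printed above (Mandarin prompts):

```python
import pandas as pd


def summarize_results(results_dict):
    """Hypothetical helper: one row per issue from chi_test_selected_issues output."""
    rows = [
        {
            'cluster_id': cid,
            'chi2': res['chi2_statistic'],
            'df': res['degrees_of_freedom'],
            'p_value': res['p_value'],
            "Cramer's V": res['cramers_v'],
            'effect_size': res['effect_size_interpretation'],
        }
        for cid, res in results_dict.items()
    ]
    return pd.DataFrame(rows)


# Stand-in for the real return value, with numbers copied from the output above
example_results = {
    'e26': {'chi2_statistic': 352.5656928602162, 'p_value': 4.1506798899918e-76,
            'degrees_of_freedom': 3, 'cramers_v': 0.935997410430084,
            'effect_size_interpretation': 'Large'},
    'e18': {'chi2_statistic': 332.04425204425206, 'p_value': 1.1517200389459792e-71,
            'degrees_of_freedom': 3, 'cramers_v': 0.9081071780187621,
            'effect_size_interpretation': 'Large'},
}

summary_df = summarize_results(example_results)
print(summary_df)
```

In the notebook, the same helper could be applied directly to the function's return value, for example `summarize_results(chi_test_selected_issues(chi_data, ['e26', 'e18', 'e14']))`.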