# BRMW – Regression Analysis

This notebook reproduces the OLS regression models reported in:

**Efthymiou, A., Kackovic, M., Rudinac, S., Worring, M., & Wijnberg, N. M. (2025).**  
*Being Ranked in a Material World: The Visual Originality of an Artwork and Its Effects on the Artist’s Canonization.*  
In *Organization Studies*.

---

The notebook reads the dataset (`../data/brmw_dataset.csv`) and estimates the models presented in the paper.  
Each set of regressions corresponds to one evaluative regime: the **expert**, **peer**, or **market** selection system.

In [None]:
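# === 1. Import dependencies ==========================================================
import numpy as np
import pandas as pd
import os
import statsmodels.formula.api as smf

# summary_col builds the side-by-side regression table used in section 6B
from statsmodels.iolib.summary2 import summary_col

from tqdm.notebook import tqdm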

In [None]:
# === 2. Load dataset ================================================================
file = '../data/brmw_dataset.csv'
df = pd.read_csv(file)

# Display structure to confirm expected variables and types
df.info()

In [None]:
# === 3. Define model specifications =================================================
# Each model incrementally adds theoretically relevant covariates

cols = {
    'model_1': 'visual_originality_max',
    'model_2': 'visual_originality_max + career_stage + style_stage',
    'model_3': 'visual_originality_max + career_stage + style_stage + '
               'gender + active_years + number_of_styles + number_of_paintings',
    'model_4': 'visual_originality_dominant_style_max',
    'model_5': 'visual_originality_dominant_style_max + career_stage + style_stage',
    'model_6': 'visual_originality_dominant_style_max + career_stage + style_stage + '
               'gender + active_years + number_of_styles + number_of_paintings'
}
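
The six specifications above follow a nested pattern: each originality measure enters alone, then with the stage controls, then with the artist-level controls. As a minimal sketch of that pattern, the same right-hand sides can be generated from building blocks. The helper `build_specs` and the list names below are hypothetical and not part of the original notebook; only the variable names are taken from `cols`.

```python
# Building blocks; the variable names are copied from the `cols` dictionary.
originality_measures = ['visual_originality_max',
                        'visual_originality_dominant_style_max']
stage_controls = ['career_stage', 'style_stage']
artist_controls = ['gender', 'active_years',
                   'number_of_styles', 'number_of_paintings']


def build_specs(measures, *control_blocks):
    """Return {'model_1': rhs, ...}: each measure alone, then with each
    control block added cumulatively (hypothetical helper)."""
    specs = {}
    idx = 1
    for measure in measures:
        terms = [measure]
        specs[f'model_{idx}'] = ' + '.join(terms)
        idx += 1
        for block in control_blocks:
            terms = terms + list(block)
            specs[f'model_{idx}'] = ' + '.join(terms)
            idx += 1
    return specs


cols_generated = build_specs(originality_measures, stage_controls, artist_controls)
for name, rhs in cols_generated.items():
    print(f"{name}: {rhs}")
```

Comparing `cols_generated` with the hand-written dictionary is a quick way to catch a typo in one of the six formulas.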

In [None]:
# === 4. Choose evaluative system (dependent variable) ===============================
# You can switch this variable to:
#   'peer_selection_system_ranking'   or
#   'market_selection_system_ranking'
# to reproduce regressions for other evaluative regimes.
ranking = 'expert_selection_system_ranking'
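
Before fitting, it can help to confirm that the chosen dependent variable and every regressor actually exist in the dataset, since a misspelled column name only surfaces as a formula error later. The sketch below is an illustration under stated assumptions: `formula_variables` and `missing_columns` are hypothetical helpers, the regex only handles plain additive formulas like the ones in this notebook (no transformations such as `np.log(...)` or `C(...)`), and the example column list is made up.

```python
import re


def formula_variables(rhs):
    """Return the bare variable names in a simple additive formula string."""
    return set(re.findall(r'[A-Za-z_]\w*', rhs))


def missing_columns(ranking, specs, available):
    """List every variable needed by the models that is absent from `available`."""
    needed = {ranking}
    for rhs in specs.values():
        needed |= formula_variables(rhs)
    return sorted(needed - set(available))


# Made-up example: 'style_stage' is deliberately left out of the column list.
example_specs = {
    'model_1': 'visual_originality_max',
    'model_2': 'visual_originality_max + career_stage + style_stage',
}
example_columns = ['expert_selection_system_ranking',
                   'visual_originality_max', 'career_stage']

missing = missing_columns('expert_selection_system_ranking',
                          example_specs, example_columns)
print(missing)  # ['style_stage']
```

In the notebook itself, the equivalent call would be `missing_columns(ranking, cols, df.columns)`, which should return an empty list.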

In [None]:
# === 5. Fit OLS regressions =========================================================
results = {}

for k, v in tqdm(cols.items(), desc="Estimating models"):
    mod = smf.ols(formula=f"{ranking} ~ {v}", data=df)
    res = mod.fit()   # Default (nonrobust) OLS standard errors
    results[k] = res
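
The loop above fits one evaluative regime at a time. As a rough sketch, the three regimes could also be estimated in one pass by looping over the dependent variables. The helper `fit_all` is hypothetical, and the data frame below is synthetic stand-in data with only two regressors, not the BRMW dataset; the ranking column names are the ones listed in section 4.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rankings = ['expert_selection_system_ranking',
            'peer_selection_system_ranking',
            'market_selection_system_ranking']


def fit_all(data, dependents, specs):
    """Fit every specification for every dependent variable.
    Keys of the returned dict are (dependent, model_name) tuples."""
    fitted = {}
    for dep in dependents:
        for name, rhs in specs.items():
            fitted[(dep, name)] = smf.ols(formula=f"{dep} ~ {rhs}", data=data).fit()
    return fitted


# Synthetic stand-in data (assumption: used here only so the sketch runs on its own).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    'visual_originality_max': rng.normal(size=n),
    'career_stage': rng.normal(size=n),
})
for dep in rankings:
    demo[dep] = 0.5 * demo['visual_originality_max'] + rng.normal(size=n)

demo_specs = {
    'model_1': 'visual_originality_max',
    'model_2': 'visual_originality_max + career_stage',
}

all_results = fit_all(demo, rankings, demo_specs)
print(len(all_results))  # 3 regimes x 2 specifications = 6
```

With the real data, `fit_all(df, rankings, cols)` would return eighteen fitted models, which can then be filtered by regime before building a table.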

In [None]:
# === 6A. Example full regression output =============================================
# Display the detailed summary of the final model (Model 6) for the selected evaluative system.
# This includes coefficients, standard errors, t-statistics, and diagnostic statistics.
results['model_6'].summary()
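
The models in this notebook use the default (nonrobust) OLS standard errors. As an optional check, and not something the notebook itself does, the same specification can be refitted with heteroskedasticity-robust standard errors by passing `cov_type='HC3'` to `fit()`. The sketch below uses synthetic data with deliberately non-constant error variance; the numbers it prints say nothing about the BRMW results.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumption): the error spread grows with |x|.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n) * (0.5 + np.abs(x))
demo = pd.DataFrame({'expert_selection_system_ranking': y,
                     'visual_originality_max': x})

model = smf.ols('expert_selection_system_ranking ~ visual_originality_max', data=demo)
res_default = model.fit()               # default nonrobust covariance
res_hc3 = model.fit(cov_type='HC3')     # heteroskedasticity-robust covariance

# Coefficients are identical; only the standard errors change.
comparison = pd.DataFrame({
    'coef': res_default.params,
    'se_default': res_default.bse,
    'se_hc3': res_hc3.bse,
})
print(comparison)
```

If the two standard-error columns are close for the real models, the default errors are probably adequate; a large gap would suggest reporting the robust ones alongside.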

In [None]:
# === 6B. Summarize regression results ================================================
variable_order = [
    'visual_originality_max',                    # VISUAL ORIGINALITY
    'visual_originality_dominant_style_max',     # VISUAL ORIGINALITY (DOMINANT STYLE)
    'career_stage',                              # CAREER STAGE
    'style_stage',                               # STYLE STAGE
    'gender',                                    # GENDER
    'active_years',                              # ACTIVE YEARS
    'number_of_styles',                          # # OF STYLES
    'number_of_paintings'                        # # OF PAINTINGS
]

summary_table = summary_col(
    results=list(results.values()),
    stars=True,
    float_format='%0.3f',
    model_names=list(results.keys()),
    info_dict={
        'N': lambda x: f"{int(x.nobs)}",
        'R2': lambda x: f"{x.rsquared:.3f}",
        'AIC': lambda x: f"{x.aic:.1f}",
        'BIC': lambda x: f"{x.bic:.1f}",
        'Log-Likelihood': lambda x: f"{x.llf:.1f}"
    },
    regressor_order=variable_order
)

print(summary_table)
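
The object returned by `summary_col` can be rendered in more than one format, which is convenient when the table has to go into a manuscript. The following is a minimal sketch on synthetic data, assuming the table is wanted as plain text, as LaTeX, and as a pandas DataFrame; the two-model setup is illustrative only.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Synthetic stand-in data (assumption), not the BRMW dataset.
rng = np.random.default_rng(2)
n = 150
demo = pd.DataFrame({
    'visual_originality_max': rng.normal(size=n),
    'career_stage': rng.normal(size=n),
})
demo['expert_selection_system_ranking'] = (
    0.4 * demo['visual_originality_max'] + rng.normal(size=n)
)

fits = {
    'model_1': smf.ols('expert_selection_system_ranking ~ visual_originality_max',
                       data=demo).fit(),
    'model_2': smf.ols('expert_selection_system_ranking ~ visual_originality_max'
                       ' + career_stage', data=demo).fit(),
}

table = summary_col(
    results=list(fits.values()),
    stars=True,
    float_format='%0.3f',
    model_names=list(fits.keys()),
    info_dict={'N': lambda r: f"{int(r.nobs)}"},
    regressor_order=['visual_originality_max', 'career_stage'],
)

as_text = table.as_text()     # same content that print(table) shows
as_latex = table.as_latex()   # LaTeX source for the table
as_frame = table.tables[0]    # underlying pandas DataFrame

print(as_text)
```

From there, `as_frame.to_csv(...)` or writing `as_latex` to a `.tex` file would save the table; the output path is left to the reader.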