# Multiple Regression Models without Dummy Variables - SABE Dataset

This notebook fits multiple regression models on the SABE dataset, keeping variables in their original coding (no dummy encoding). The steps are:
1. Identifying numeric, ordinal, and binary variables
2. Standardizing only numeric variables (nunique ≥ 5)
3. Checking for multicollinearity
4. Implementing regression models for three dependent variables:
   - minimental
   - coherencia
   - memoria_subjetiva

In [179]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
sns.set(style='whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

## 1. Loading and Exploring the Dataset

In [180]:
# Load the SABE dataset with coherencia
df = pd.read_csv('transformed_csv/sabe_with_coherencia.csv')
print(f"Dataset shape: {df.shape}")
df.head()

Dataset shape: (16601, 101)


| | accesibilidad_vivienda | accesibilidad_hogar | seguridad_barrio | inseguridad_ambiental | percepcion_tradicional_vejez | percepcion_funcional_vejez | vejez_positiva | vejez_negativa | uso_medios_tradicionales | uso_medios_digitales | ... | categoria_cognitiva | ataque_corazon | diabetes | infarto | osteoporosis | artritis | hipertension | minimental_norm | memoria_subjetiva_norm | coherencia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 4.0 | 3.0 | 25.0 | 2.0 | 0.0 | 1.0 | 0 | 1 | 4 | 2 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0.625 | -0.125 |
| 1 | 1.0 | 1.0 | 24.0 | 2.0 | 1.0 | 1.0 | 3 | 3 | 4 | 2 | ... | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0.833333 | 0.375 | 0.458333 |
| 2 | 1.0 | 2.0 | 22.0 | 2.0 | 1.0 | 2.0 | 2 | 0 | 4 | 2 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0.666667 | 1.0 | -0.333333 |
| 3 | 3.0 | 2.0 | 26.0 | 3.0 | 0.0 | 3.0 | 2 | 0 | 4 | 2 | ... | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0.166667 | 0.5 | -0.333333 |
| 4 | 4.0 | 0.0 | 22.0 | 1.0 | 1.0 | 2.0 | 1 | 1 | 4 | 2 | ... | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0.166667 | 0.5 | -0.333333 |


In [181]:
# Check data types and missing values
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16601 entries, 0 to 16600
Columns: 101 entries, accesibilidad_vivienda to coherencia
dtypes: float64(54), int64(47)
memory usage: 12.8 MB


In [182]:
# Check summary statistics
df.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| accesibilidad_vivienda | 16601.0 | 2.992952 | 1.026580 | 0.0 | 2.000000 | 3.000000 | 4.000000 | 6.000 |
| accesibilidad_hogar | 16601.0 | 1.701163 | 1.406835 | 0.0 | 1.000000 | 1.000000 | 2.000000 | 10.000 |
| seguridad_barrio | 16601.0 | 14.723089 | 4.816063 | 1.0 | 11.000000 | 13.000000 | 18.000000 | 30.000 |
| inseguridad_ambiental | 16601.0 | 1.985784 | 0.905294 | 0.0 | 1.000000 | 2.000000 | 3.000000 | 7.000 |
| percepcion_tradicional_vejez | 16601.0 | 0.736582 | 0.777820 | 0.0 | 0.000000 | 1.000000 | 1.000000 | 2.000 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| artritis | 16601.0 | 0.255045 | 0.435900 | 0.0 | 0.000000 | 0.000000 | 1.000000 | 1.000 |
| hipertension | 16601.0 | 0.516656 | 0.499738 | 0.0 | 0.000000 | 1.000000 | 1.000000 | 1.000 |
| minimental_norm | 16601.0 | 0.601450 | 0.320992 | 0.0 | 0.333333 | 0.666667 | 0.833333 | 1.000 |
| memoria_subjetiva_norm | 16601.0 | 0.635255 | 0.169714 | 0.0 | 0.500000 | 0.625000 | 0.750000 | 1.000 |


## 2. Identifying Variable Types (Numeric, Ordinal, Binary)

In [183]:
# Function to check if a column is numeric
def is_numeric(column):
    return pd.api.types.is_numeric_dtype(column)

In [184]:
# Count unique values for each column
unique_counts = pd.DataFrame({'column': df.columns, 'nunique': df.nunique()})
unique_counts['is_numeric'] = [is_numeric(df[col]) for col in df.columns]
unique_counts = unique_counts.sort_values('nunique', ascending=False)

print("Unique value counts for each column:")
unique_counts

Unique value counts for each column:


| | column | nunique | is_numeric |
|---|---|---|---|
| peso | peso | 90 | True |
| circ_cintura | circ_cintura | 67 | True |
| talla | talla | 55 | True |
| impacto_salud_bucal | impacto_salud_bucal | 49 | True |
| edad | edad | 40 | True |
| ... | ... | ... | ... |
| ejercicio | ejercicio | 2 | True |
| cancer | cancer | 2 | True |
| asma | asma | 2 | True |
| derrame_cerebral | derrame_cerebral | 2 | True |


In [185]:
# Categorize variables based on nunique and type
numeric_vars = [col for col in df.columns if is_numeric(df[col]) and df[col].nunique() >= 5]
ordinal_vars = [col for col in df.columns if is_numeric(df[col]) and 2 < df[col].nunique() <= 4]
binary_vars = [col for col in df.columns if df[col].nunique() <= 2]

# Print summary of variable types
print(f"Number of numeric variables (nunique ≥ 5): {len(numeric_vars)}")
print(f"Number of ordinal variables (nunique ≤ 4): {len(ordinal_vars)}")
print(f"Number of binary variables (nunique ≤ 2): {len(binary_vars)}")

# Verify that all columns are accounted for
total_categorized = len(numeric_vars) + len(ordinal_vars) + len(binary_vars)
print(f"Total categorized variables: {total_categorized}")
print(f"Total columns in dataset: {len(df.columns)}")

# Check if any columns weren't categorized
all_categorized = set(numeric_vars + ordinal_vars + binary_vars)
uncategorized = [col for col in df.columns if col not in all_categorized]
if uncategorized:
    print(f"Uncategorized variables: {uncategorized}")

Number of numeric variables (nunique ≥ 5): 43
Number of ordinal variables (nunique ≤ 4): 17
Number of binary variables (nunique ≤ 2): 41
Total categorized variables: 101
Total columns in dataset: 101
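
The bucketing rule above can be sketched as a single helper that assigns each column to exactly one group. This is a minimal illustration, not the notebook's code: `classify_columns`, its `numeric_min` parameter, and the toy values are hypothetical, and it assumes every column is already numeric.

```python
import pandas as pd

def classify_columns(frame, numeric_min=5):
    """Split columns into numeric / ordinal / binary by distinct-value count."""
    groups = {"numeric": [], "ordinal": [], "binary": []}
    for col in frame.columns:
        n_unique = frame[col].nunique()
        if n_unique <= 2:
            groups["binary"].append(col)       # 1 or 2 distinct values
        elif n_unique < numeric_min:
            groups["ordinal"].append(col)      # 3 or 4 distinct values
        else:
            groups["numeric"].append(col)      # 5 or more distinct values
    return groups

# Toy data (hypothetical values; column names borrowed from the dataset)
toy = pd.DataFrame({
    "edad": [60, 65, 70, 75, 80, 85],
    "estrato": [1, 2, 3, 1, 2, 3],
    "diabetes": [0, 1, 0, 1, 0, 0],
})
groups = classify_columns(toy)
print(groups)
```

Because the three branches are mutually exclusive, the group sizes always add up to the number of columns, which is the check the cell above performs by hand.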


In [186]:
# List the variables in each category
print("Numeric variables (sample):")
print(numeric_vars[:10] if len(numeric_vars) > 10 else numeric_vars)
print("\nOrdinal variables:")
print(ordinal_vars)
print("\nBinary variables (sample):")
print(binary_vars[:10] if len(binary_vars) > 10 else binary_vars)

Numeric variables (sample):
['accesibilidad_vivienda', 'accesibilidad_hogar', 'seguridad_barrio', 'inseguridad_ambiental', 'uso_medios_tradicionales', 'recibe_ayuda', 'hace_trabajo_voluntario', 'nivel_apoyo_otros', 'memoria_subjetiva', 'estres_vida_temprana']

Ordinal variables:
['percepcion_tradicional_vejez', 'percepcion_funcional_vejez', 'vejez_positiva', 'vejez_negativa', 'uso_medios_digitales', 'uso_redes_sociales_informales', 'participacion_fisica_salud', 'participacion_cultural_comunitaria', 'participacion_politica_gremial', 'dieta_balanceada', 'tiene_depresion', 'uses_public_transport', 'comido_menos', 'comidas_al_dia', 'consumo_cigarrillo', 'color_piel', 'categoria_cognitiva']

Binary variables (sample):
['no_esta_informado', 'participacion_religiosa', 'participacion_personas_mayores', 'no_participa_grupos', 'salida_forzada', 'salida_desastre', 'salida_economica', 'salida_servicios', 'salida_familiar', 'area_vivienda']


## 3. Standardizing Numeric Variables

In [187]:
# Identify the dependent variables
dependent_vars = ['minimental', 'coherencia', 'memoria_subjetiva']

# Check if they exist in the dataset and find alternatives if not
actual_dependent_vars = []
for var in dependent_vars:
    if var in df.columns:
        actual_dependent_vars.append(var)
    else:
        # Try to find a similar variable
        if var == 'minimental':
            potential_matches = [col for col in df.columns if 'mini' in col.lower() and 'mental' in col.lower()]
        elif var == 'coherencia':
            potential_matches = [col for col in df.columns if 'coher' in col.lower()]
        elif var == 'memoria_subjetiva':
            potential_matches = [col for col in df.columns if 'memoria' in col.lower() and 'subj' in col.lower()]
        else:
            potential_matches = []
            
        if potential_matches:
            print(f"Using '{potential_matches[0]}' instead of '{var}'")
            actual_dependent_vars.append(potential_matches[0])
        else:
            print(f"Warning: No match found for {var}")

print(f"Dependent variables for regression: {actual_dependent_vars}")

Dependent variables for regression: ['minimental', 'coherencia', 'memoria_subjetiva']


In [188]:
# Create a copy of the dataframe for standardization
sabe_std = df.copy()

# Get numeric variables to standardize (excluding dependent variables)
numeric_to_standardize = [var for var in numeric_vars if var not in actual_dependent_vars]
print(f"Number of numeric variables to standardize: {len(numeric_to_standardize)}")

# Also standardize the dependent variables if they are numeric
dep_to_standardize = [var for var in actual_dependent_vars if var in numeric_vars]
if dep_to_standardize:
    print(f"Also standardizing these dependent variables: {dep_to_standardize}")
    numeric_to_standardize += dep_to_standardize

Number of numeric variables to standardize: 40
Also standardizing these dependent variables: ['minimental', 'coherencia', 'memoria_subjetiva']


In [189]:
# Standardize numeric variables using z-scores (mean=0, std=1)
scaler = StandardScaler()
if numeric_to_standardize:
    # Only standardize rows without NaN values
    mask = ~df[numeric_to_standardize].isnull().any(axis=1)
    sabe_std.loc[mask, numeric_to_standardize] = scaler.fit_transform(df.loc[mask, numeric_to_standardize])
    
    # describe() reports the sample std (ddof=1) while StandardScaler divides by the
    # population std (ddof=0), so the values print as 1.00003 rather than exactly 1
    print("After standardization (sample of numeric variables):")
    print(sabe_std[numeric_to_standardize[:5]].describe().T[['mean', 'std']])

After standardization (sample of numeric variables):
                                  mean      std
accesibilidad_vivienda   -5.274044e-15  1.00003
accesibilidad_hogar      -7.874485e-16  1.00003
seguridad_barrio          1.929920e-15  1.00003
inseguridad_ambiental     1.254530e-15  1.00003
uso_medios_tradicionales  1.454122e-14  1.00003
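
The reported std of 1.00003 follows from the two different variance conventions. `StandardScaler` divides by the population standard deviation (ddof=0), while `describe()` reports the sample standard deviation (ddof=1), so the ratio is sqrt(n / (n - 1)). The sketch below reproduces this with plain NumPy on made-up values; the variable names are illustrative only.

```python
import numpy as np

# Toy values (hypothetical), z-scored with the population formula,
# which is the convention StandardScaler uses
x = np.arange(1.0, 11.0)
z = (x - x.mean()) / x.std(ddof=0)

n = len(z)
pop_std = z.std(ddof=0)               # exactly 1 up to rounding
sample_std = z.std(ddof=1)            # slightly above 1
expected_ratio = np.sqrt(n / (n - 1))

# Same ratio for the 16601 rows of the SABE dataset
ratio_sabe = np.sqrt(16601 / 16600)
print(f"population std: {pop_std:.5f}")
print(f"sample std:     {sample_std:.5f} (expected {expected_ratio:.5f})")
print(f"ratio for n=16601: {ratio_sabe:.5f}")
```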


In [190]:
# Save the standardized dataset
sabe_std.to_csv('sabe_std_without_dummies.csv', index=False)
print("Standardized dataset saved as 'sabe_std_without_dummies.csv'")

Standardized dataset saved as 'sabe_std_without_dummies.csv'


## 4. Checking for Multicollinearity

In [191]:
# List columns that still contain missing values (an empty Index means none)
sabe_std.columns[sabe_std.isna().sum() > 0]

Index([], dtype='object')

In [192]:
# Function to calculate VIF
def calculate_vif(X):
    vif_data = pd.DataFrame()
    vif_data["Variable"] = X.columns
    vif_data["VIF"] = [variance_inflation_factor(X.values, i) for i in range(X.shape[1])]
    return vif_data.sort_values("VIF", ascending=False)
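
One caveat worth checking: `variance_inflation_factor` regresses each column on the others exactly as they appear in the matrix it is given, so if no constant column is included the auxiliary regressions have no intercept. For predictors that are not mean-centered (here the ordinal and binary variables), this may inflate the reported VIF. The sketch below computes the intercept-based VIF by hand with NumPy as a cross-check. `vif_with_intercept` and the toy data are hypothetical and are not part of the notebook.

```python
import numpy as np
import pandas as pd

def vif_with_intercept(X):
    """VIF_j = 1 / (1 - R²_j), regressing column j on the others plus an intercept."""
    rows = []
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        design = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - resid.var() / y.var()
        vif = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
        rows.append({"Variable": col, "VIF": vif})
    return pd.DataFrame(rows).sort_values("VIF", ascending=False, ignore_index=True)

# Toy data: three independent columns, two of them high-prevalence 0/1 flags
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "score": rng.normal(size=500),
    "flag_a": (rng.random(500) < 0.9).astype(int),
    "flag_b": (rng.random(500) < 0.9).astype(int),
})
vif_table = vif_with_intercept(toy)
print(vif_table)
```

With independent columns all three values should sit close to 1, even though both flags are far from mean zero.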

In [193]:
# Prepare variables for regression (predictors)
all_predictors = numeric_to_standardize + ordinal_vars + binary_vars
predictors = [var for var in all_predictors if var not in actual_dependent_vars]

# For VIF calculation, we need to drop rows with NaN values
sabe_dropna = sabe_std[predictors].dropna()
print(f"Shape after dropping NaN values: {sabe_dropna.shape}")

Shape after dropping NaN values: (16601, 98)


In [194]:
vif_predictors = predictors
print(f"Using all {len(vif_predictors)} predictors for VIF calculation")

Using all 98 predictors for VIF calculation


In [195]:
# Calculate VIF
X_vif = sabe_dropna[vif_predictors]
vif_df = calculate_vif(X_vif)

# Display predictors with high VIF (> 10 indicates potential multicollinearity)
print("Predictors with high VIF (> 10):")
high_vif = vif_df[vif_df['VIF'] > 10]
high_vif

Predictors with high VIF (> 10):


| | Variable | VIF |
|---|---|---|
| 92 | ataque_corazon | inf |
| 81 | derrame_cerebral | inf |
| 68 | sabe_leer | 48.330983 |
| 69 | sabe_escribir | 45.369077 |
| 50 | tiene_depresion | 34.222687 |
| 53 | comidas_al_dia | 29.067236 |
| 67 | sexo | 25.762315 |
| 51 | uses_public_transport | 19.958451 |
| 56 | categoria_cognitiva | 18.099564 |
| 57 | no_esta_informado | 17.285786 |


In [196]:
# Create a list of predictors with acceptable VIF values (< 10)
low_vif_predictors = vif_df[vif_df['VIF'] < 10]['Variable'].tolist()
print(f"Number of predictors with acceptable VIF: {len(low_vif_predictors), low_vif_predictors}")

# For the regression, use only the predictors with acceptable VIF (all predictors were checked)
final_predictors = low_vif_predictors
print(f"Total predictors for regression: {len(final_predictors)}")

Number of predictors with acceptable VIF: (83, ['uso_medios_digitales', 'consumo_cigarrillo', 'color_piel', 'habitacion_unica', 'no_participa_grupos', 'dieta_balanceada', 'vejez_positiva', 'recibio_dinero_ultimo_mes', 'dependencia_economica', 'comido_menos', 'participacion_religiosa', 'percepcion_funcional_vejez', 'peso', 'uso_medios_tradicionales', 'usa_gafas', 'circ_cintura', 'hipertension', 'talla', 'percepcion_tradicional_vejez', 'nivel_educativo', 'va_al_medico', 'a_educacion', 'hospitalizacion', 'vejez_negativa', 'homeopatia', 'beneficiario_col_mayor', 'sintomas_ultimo_mes', 'num_veces_hospitalizado', 'artritis', 'desplazado', 'participacion_personas_mayores', 'estrato', 'ejercicio', 'minimental_norm', 'salida_economica', 'num_people_depending', 'participacion_cultural_comunitaria', 'participacion_fisica_salud', 'num_personas_hogar', 'salida_forzada', 'problemas_auditivos', 'autopercepcion_salud', 'edad', 'salida_familiar', 'salida_servicios', 'num_rooms_house', 'diabetes', 'oste
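
Dropping every predictor above the threshold in a single pass can remove both members of a collinear pair, even though removing one of them would be enough. A common alternative is to drop the worst predictor, recompute, and repeat. The sketch below illustrates that idea on toy data; `vif_series`, `prune_by_vif`, and the column names are hypothetical, and the VIF is computed by hand with an intercept rather than with the notebook's `calculate_vif`.

```python
import numpy as np
import pandas as pd

def vif_series(X):
    """VIF per column: 1 / (1 - R²) from a regression on the other columns plus intercept."""
    out = {}
    for col in X.columns:
        y = X[col].to_numpy(dtype=float)
        others = X.drop(columns=col).to_numpy(dtype=float)
        design = np.column_stack([np.ones(len(X)), others])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        r2 = 1.0 - (y - design @ coef).var() / y.var()
        out[col] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(out)

def prune_by_vif(X, threshold=10.0):
    """Drop the highest-VIF column repeatedly until all remaining VIFs are <= threshold."""
    kept, dropped = list(X.columns), []
    while len(kept) > 1:
        vifs = vif_series(X[kept])
        worst = vifs.idxmax()
        if vifs[worst] <= threshold:
            break
        kept.remove(worst)
        dropped.append(worst)
    return kept, dropped

# Toy data: x2 is nearly a copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
toy = pd.DataFrame({
    "x1": x1,
    "x2": x1 + rng.normal(scale=0.05, size=200),
    "x3": rng.normal(size=200),
})
kept, dropped = prune_by_vif(toy)
print("kept:", kept, "dropped:", dropped)
```

In this toy case a one-shot filter would discard both `x1` and `x2`, while the iterative version keeps one of them.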

## 5. Implementing Multiple Regression Models

In [197]:
# Function to run regression and evaluate results
def run_regression(df, y_var, predictors, print_summary=True):
    # Prepare X and y
    X = df[predictors]
    y = df[y_var]
    
    # Add constant
    X = sm.add_constant(X)
    
    # Fit the model
    model = sm.OLS(y, X).fit()
    
    # Get predictions
    y_pred = model.predict(X)
    
    # Calculate metrics
    metrics = {
        'r2': model.rsquared,
        'adj_r2': model.rsquared_adj,
        'rmse': np.sqrt(mean_squared_error(y, y_pred)),
        'mae': mean_absolute_error(y, y_pred)
    }
    
    if print_summary:
        print(f"\n===== Regression Results for {y_var} =====\n")
        print(f"Number of predictors: {len(predictors)}")
        print(f"R²: {metrics['r2']:.4f}")
        print(f"Adjusted R²: {metrics['adj_r2']:.4f}")
        print(f"RMSE: {metrics['rmse']:.4f}")
        print(f"MAE: {metrics['mae']:.4f}")
        print("\nModel Summary:")
        print(model.summary().tables[1])
    
    return model, metrics

### 5.1 Regression for Minimental

In [198]:
# Get the minimental variable
minimental_var = 'minimental'
final_predictors = [col for col in final_predictors if col not in ['minimental', 'coherencia','minimental_norm']]
    
# Run regression for minimental
minimental_model, minimental_metrics = run_regression(
    sabe_std, minimental_var, final_predictors
)

# Identify significant predictors (p-value < 0.05)
minimental_results = minimental_model.summary().tables[1].data
significant_predictors = []
for row in minimental_results[1:]:  # Skip the header row
    var_name = row[0]
    p_value = float(row[4])
    if p_value < 0.05:
        significant_predictors.append((var_name, p_value))

print("\nSignificant predictors of minimental (p < 0.05):")
for var, p in sorted(significant_predictors, key=lambda x: x[1])[:10]:
    print(f"{var}: p = {p:.4f}")


===== Regression Results for minimental =====

Number of predictors: 82
R²: 0.1358
Adjusted R²: 0.1315
RMSE: 0.9296
MAE: 0.7854

Model Summary:
                                         coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------------
const                                 -0.0482      0.057     -0.850      0.395      -0.159       0.063
uso_medios_digitales                   0.0035      0.013      0.278      0.781      -0.021       0.028
consumo_cigarrillo                     0.0041      0.007      0.598      0.550      -0.009       0.018
color_piel                            -0.1142      0.011    -10.386      0.000      -0.136      -0.093
habitacion_unica                       0.0374      0.021      1.752      0.080      -0.004       0.079
no_participa_grupos                    0.0034      0.027      0.126      0.900      -0.050       0.057
dieta_balanceada               

### 5.3 Regression for Coherencia

In [205]:
# Get the coherencia variable
coherencia_var = 'coherencia'
final_predictors = [col for col in final_predictors if col not in ['minimental', 'coherencia','memoria_subjetiva', 'minimental_norm','memoria_subjetiva_norm']]
    
# Run regression for coherencia
coherencia_model, coherencia_metrics = run_regression(
    sabe_std, coherencia_var, final_predictors
)

# Identify significant predictors (p-value < 0.05)
coherencia_results = coherencia_model.summary().tables[1].data
significant_predictors = []
for row in coherencia_results[1:]:  # Skip the header row
    var_name = row[0]
    p_value = float(row[4])
    if p_value < 0.05:
        significant_predictors.append((var_name, p_value))

print("\nSignificant predictors of coherencia (p < 0.05):")
for var, p in sorted(significant_predictors, key=lambda x: x[1])[:10]:
    print(f"{var}: p = {p:.4f}")


===== Regression Results for coherencia =====

Number of predictors: 81
R²: 0.1755
Adjusted R²: 0.1715
RMSE: 0.9080
MAE: 0.7474

Model Summary:
                                         coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------------
const                                 -0.0357      0.055     -0.646      0.518      -0.144       0.073
uso_medios_digitales                  -0.0070      0.012     -0.572      0.568      -0.031       0.017
consumo_cigarrillo                     0.0019      0.007      0.287      0.774      -0.011       0.015
color_piel                            -0.0646      0.011     -6.024      0.000      -0.086      -0.044
habitacion_unica                       0.0255      0.021      1.225      0.220      -0.015       0.066
no_participa_grupos                    0.0235      0.026      0.890      0.374      -0.028       0.075
dieta_balanceada               

### 5.4 Regression for Memoria Subjetiva

In [206]:
# Get the memoria_subjetiva variable
memoria_var = 'memoria_subjetiva'
final_predictors = [col for col in final_predictors if col not in ['coherencia','memoria_subjetiva', 'memoria_subjetiva_norm']]
    
# Run regression for memoria_subjetiva
memoria_model, memoria_metrics = run_regression(
    sabe_std, memoria_var, final_predictors
)
    
# Identify significant predictors (p-value < 0.05)
memoria_results = memoria_model.summary().tables[1].data
significant_predictors = []
for row in memoria_results[1:]:  # Skip the header row
    var_name = row[0]
    p_value = float(row[4])
    if p_value < 0.05:
        significant_predictors.append((var_name, p_value))

print("\nSignificant predictors of memoria_subjetiva (p < 0.05):")
for var, p in sorted(significant_predictors, key=lambda x: x[1])[:10]:
    print(f"{var}: p = {p:.4f}")


===== Regression Results for memoria_subjetiva =====

Number of predictors: 81
R²: 0.1861
Adjusted R²: 0.1821
RMSE: 0.9022
MAE: 0.7061

Model Summary:
                                         coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------------------------
const                                 -0.0111      0.055     -0.201      0.840      -0.119       0.097
uso_medios_digitales                   0.0204      0.012      1.681      0.093      -0.003       0.044
consumo_cigarrillo                     0.0033      0.007      0.489      0.625      -0.010       0.016
color_piel                            -0.0676      0.011     -6.341      0.000      -0.088      -0.047
habitacion_unica                       0.0131      0.021      0.635      0.526      -0.027       0.054
no_participa_grupos                   -0.0423      0.026     -1.610      0.107      -0.094       0.009
dieta_balanceada        

## 6. Comparing Model Results

In [207]:
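# Create a comparison table for the three models
model_comparison = {
    'minimental': minimental_metrics,
    'coherencia': coherencia_metrics,
    'memoria_subjetiva': memoria_metrics,
}
# Number of observations used in each fit
observations = {
    'minimental': int(minimental_model.nobs),
    'coherencia': int(coherencia_model.nobs),
    'memoria_subjetiva': int(memoria_model.nobs),
}
if model_comparison: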
    # Create a dataframe with metrics for each model
    comparison_df = pd.DataFrame(model_comparison).T
    comparison_df['n_observations'] = pd.Series(observations)
    comparison_df = comparison_df.round(4)
    
    print("Model Comparison:")
    print(comparison_df)
    
    # Visualize the results
    plt.figure(figsize=(12, 6))
    sns.barplot(x=comparison_df.index, y=comparison_df['r2'])
    plt.title("R² Comparison Across Models")
    plt.ylabel("R²")
    plt.xlabel("Dependent Variable")
    plt.ylim(0, 1)  # R² is between 0 and 1
    plt.grid(axis='y', alpha=0.3)
    plt.show()
    
    plt.figure(figsize=(12, 6))
    sns.barplot(x=comparison_df.index, y=comparison_df['adj_r2'])
    plt.title("Adjusted R² Comparison Across Models")
    plt.ylabel("Adjusted R²")
    plt.xlabel("Dependent Variable")
    plt.ylim(0, 1)  # Adjusted R² can be negative in general; 0 to 1 covers the values here
    plt.grid(axis='y', alpha=0.3)
    plt.show()
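
A quick consistency check on the reported metrics: the outcomes were z-scored with `StandardScaler`, so each has population variance 1, and every model includes an intercept. In that case SST = n, so the in-sample MSE equals SSR / n = 1 - R² and the RMSE should equal sqrt(1 - R²). The sketch below applies this to the R² and RMSE values printed in Section 5; small differences are expected because the printed R² is rounded to four decimals.

```python
import math

# R² and RMSE as printed in the regression outputs of Section 5
reported = {
    "minimental": {"r2": 0.1358, "rmse": 0.9296},
    "coherencia": {"r2": 0.1755, "rmse": 0.9080},
    "memoria_subjetiva": {"r2": 0.1861, "rmse": 0.9022},
}

implied = {}
for name, m in reported.items():
    # For a z-scored outcome and a model with intercept: RMSE = sqrt(1 - R²)
    implied[name] = math.sqrt(1.0 - m["r2"])
    print(f"{name}: implied RMSE = {implied[name]:.4f}, reported RMSE = {m['rmse']:.4f}")
```

The practical consequence is that RMSE carries no information beyond R² for these standardized outcomes, so comparing the models on R² alone loses nothing.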

## 7. Comparing Top Predictors Across Models

In [208]:
# Function to extract top predictors from a model
def get_top_predictors(model, top_n=10):
    results = model.summary().tables[1].data
    predictors = []
    
    for row in results[1:]:  # Skip the header row
        var_name = row[0]
        coefficient = float(row[1])
        p_value = float(row[4])
        
        if p_value < 0.05 and var_name != 'const':  # Significant and not the intercept
            predictors.append({
                'variable': var_name,
                'coefficient': coefficient,
                'p_value': p_value,
                'abs_coef': abs(coefficient)
            })
    
    # Sort by absolute coefficient value
    predictors.sort(key=lambda x: x['abs_coef'], reverse=True)
    return predictors[:top_n]

In [209]:
# Get top predictors for each model
top_predictors = {}

if 'minimental_model' in locals():
    top_predictors['minimental'] = get_top_predictors(minimental_model)
    
if 'coherencia_model' in locals():
    top_predictors['coherencia'] = get_top_predictors(coherencia_model)
    
if 'memoria_model' in locals():
    top_predictors['memoria_subjetiva'] = get_top_predictors(memoria_model)

# Print top predictors for each model
for model_name, predictors in top_predictors.items():
    print(f"\nTop predictors for {model_name}:")
    for i, p in enumerate(predictors, 1):
        print(f"{i}. {p['variable']}: coef = {p['coefficient']:.4f}, p = {p['p_value']:.4f}")


Top predictors for minimental:
1. nivel_educativo: coef = 0.1274, p = 0.0000
2. color_piel: coef = -0.1142, p = 0.0000
3. edad: coef = -0.1134, p = 0.0000
4. enfermedad_mental: coef = -0.1109, p = 0.0000
5. usa_gafas: coef = 0.1090, p = 0.0000
6. cancer: coef = 0.1037, p = 0.0050
7. artritis: coef = 0.0927, p = 0.0000
8. a_educacion: coef = 0.0672, p = 0.0000
9. infarto: coef = 0.0602, p = 0.0080
10. desplazado: coef = 0.0579, p = 0.0060

Top predictors for coherencia:
1. nivel_educativo: coef = 0.1377, p = 0.0000
2. salida_desastre: coef = -0.1267, p = 0.0270
3. enfermedad_mental: coef = -0.1155, p = 0.0000
4. problemas_auditivos: coef = -0.1129, p = 0.0000
5. edad: coef = -0.1120, p = 0.0000
6. a_educacion: coef = 0.0729, p = 0.0000
7. autopercepcion_salud: coef = 0.0659, p = 0.0000
8. color_piel: coef = -0.0646, p = 0.0000
9. comido_menos: coef = 0.0614, p = 0.0000
10. usa_gafas: coef = 0.0609, p = 0.0000

Top predictors for memoria_subjetiva:
1. problemas_auditivos: coef = 0.2872,

## 8. Conclusion

This analysis has performed multiple regression modeling on the SABE dataset without converting variables to dummy variables. The main findings include:

1. **Data Preparation**:
   - Identified and standardized numeric variables (nunique ≥ 5)
   - Preserved ordinal (nunique of 3 or 4) and binary (nunique ≤ 2) variables in their original form
   - Checked for multicollinearity and selected predictors with acceptable VIF values

2. **Model Comparison**:
   - Minimental: R² = 0.1358, Adjusted R² = 0.1315
   - Coherencia: R² = 0.1755, Adjusted R² = 0.1715
   - Memoria Subjetiva: R² = 0.1861, Adjusted R² = 0.1821

3. **Key Predictors**:
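   - For Minimental: nivel_educativo (0.1274), color_piel (-0.1142), edad (-0.1134), enfermedad_mental (-0.1109), usa_gafas (0.1090)
   - For Coherencia: nivel_educativo (0.1377), salida_desastre (-0.1267), enfermedad_mental (-0.1155), problemas_auditivos (-0.1129), edad (-0.1120)
   - For Memoria Subjetiva: problemas_auditivos (0.2872) ranks first; [list remaining top predictors]
   - Common predictors across models: nivel_educativo, edad, enfermedad_mental, color_piel, usa_gafas, and a_educacion are in the top 10 for both Minimental and Coherencia; color_piel is also significant for Memoria Subjetiva (p < 0.001)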

4. **Implications**:
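   - The models explain a modest share of variance (R² between 0.14 and 0.19), so most variation in all three outcomes is left unexplained by these predictors
   - Coherence between objective and subjective memory is most strongly associated with nivel_educativo, salida_desastre, enfermedad_mental, problemas_auditivos, and edad
   - Different factors predict objective performance versus subjective assessment: problemas_auditivos ranks first for memoria_subjetiva (coef = 0.2872) but is not among the top 10 for minimental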

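This non-dummy approach keeps one coefficient per original variable, which keeps the models compact. Because the numeric predictors and all three outcomes are z-scored, those coefficients are in standard-deviation units, and each ordinal variable entered as a single term is assumed to have equally spaced levels.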

## 9. Next Steps

Potential next steps for further analysis:

1. **Model Refinement**: Create more parsimonious models using only significant predictors
2. **Compare Approaches**: Compare these results with those from models using dummy variables
3. **Interaction Effects**: Test for interactions between key predictors
4. **Cross-Validation**: Implement k-fold cross-validation to assess model stability
5. **Subgroup Analysis**: Run models separately for different demographic groups (e.g., age, education)
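
As a starting point for the cross-validation step, the sketch below runs 5-fold cross-validation of a linear regression with scikit-learn and reports out-of-sample R² per fold. It uses synthetic data rather than the SABE dataset, and the number of folds, the seeds, and the coefficients are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data standing in for the standardized predictors and one outcome
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 5))
y = X @ np.array([0.5, -0.3, 0.0, 0.2, 0.0]) + rng.normal(size=300)

# Shuffle before splitting so folds do not depend on row order
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")

print("R² per fold:", np.round(scores, 4))
print(f"Mean R²: {scores.mean():.4f} (std {scores.std():.4f})")
```

Comparing the mean out-of-sample R² with the in-sample values reported above would indicate how much of the fit is due to the large number of predictors.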