# Decision Tree Models for SABE Dataset

This notebook fits tree-based regression models (decision trees, random forests, and gradient boosting) on the SABE dataset, one model per target variable:
1. minimental
2. memoria_subjetiva
3. coherencia

Important constraints:
- When predicting minimental, do not use coherencia or memoria_subjetiva as predictors
- When predicting memoria_subjetiva, do not use coherencia or minimental as predictors
- When predicting coherencia, do not use minimental or memoria_subjetiva as predictors
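
One way to make these constraints explicit is a small lookup that derives the allowed predictors for each target. The following is a minimal sketch, not the notebook's own implementation: `TARGETS`, `predictors_for`, and the toy column names `edad` and `sexo` are illustrative assumptions.

```python
# Hypothetical helper: derive the predictor list for a target so that
# none of the three target variables is ever used as a predictor.
TARGETS = ["minimental", "memoria_subjetiva", "coherencia"]

def predictors_for(target, columns, targets=TARGETS):
    """Return the columns usable as predictors for `target`."""
    if target not in targets:
        raise ValueError(f"Unknown target: {target}")
    blocked = set(targets)  # the target itself plus the other two targets
    return [col for col in columns if col not in blocked]

# Toy column list (assumed names, for illustration only)
columns = ["edad", "sexo", "minimental", "memoria_subjetiva", "coherencia"]
for target in TARGETS:
    print(target, "->", predictors_for(target, columns))
```

Because all three targets are blocked in every case, the predictor list is the same for each model; only the target column changes.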

In [None]:
# Standard imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error
import warnings
warnings.filterwarnings('ignore')

# Set plotting style
sns.set(style='whitegrid')
plt.rcParams['figure.figsize'] = (12, 8)

## 1. Loading and Exploring the Dataset

In [None]:
# Load the SABE dataset with coherencia
df = pd.read_csv('sabe_with_coherencia.csv')
print(f"Dataset shape: {df.shape}")
df.head()

In [None]:
# Check data types and missing values
df.info()

In [None]:
# Check summary statistics
df.describe().T

## 2. Preprocessing

In [None]:
# Function to check if a column is numeric
def is_numeric(column):
    return pd.api.types.is_numeric_dtype(column)

In [None]:
# Verify target variables exist and identify them
target_vars = []

# Check minimental
if 'minimental' in df.columns:
    minimental_var = 'minimental'
    target_vars.append(minimental_var)
else:
    # Try to find a similar variable
    minimental_candidates = [col for col in df.columns if 'mini' in col.lower() and 'mental' in col.lower()]
    if minimental_candidates:
        minimental_var = minimental_candidates[0]
        target_vars.append(minimental_var)
        print(f"Using '{minimental_var}' for minimental")
    else:
        minimental_var = None
        print("No minimental variable found")

# Check memoria_subjetiva
if 'memoria_subjetiva' in df.columns:
    memoria_var = 'memoria_subjetiva'
    target_vars.append(memoria_var)
else:
    # Try to find a similar variable
    memoria_candidates = [col for col in df.columns if 'memoria' in col.lower() and 'subj' in col.lower()]
    if memoria_candidates:
        memoria_var = memoria_candidates[0]
        target_vars.append(memoria_var)
        print(f"Using '{memoria_var}' for memoria_subjetiva")
    else:
        memoria_var = None
        print("No memoria_subjetiva variable found")

# Check coherencia
if 'coherencia' in df.columns:
    coherencia_var = 'coherencia'
    target_vars.append(coherencia_var)
else:
    # Try to find a similar variable
    coherencia_candidates = [col for col in df.columns if 'coher' in col.lower()]
    if coherencia_candidates:
        coherencia_var = coherencia_candidates[0]
        target_vars.append(coherencia_var)
        print(f"Using '{coherencia_var}' for coherencia")
    else:
        coherencia_var = None
        print("No coherencia variable found")

print(f"Target variables: {target_vars}")
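
The three lookup blocks above follow the same pattern: use the exact name if present, otherwise fall back to the first column whose name contains certain fragments. That pattern could be factored into one function. This is only a sketch; `find_column` and the toy column names are hypothetical and not part of the notebook.

```python
def find_column(columns, exact, fragments):
    """Return `exact` if present, else the first column whose lowercase
    name contains every fragment, else None."""
    if exact in columns:
        return exact
    for col in columns:
        name = col.lower()
        if all(fragment in name for fragment in fragments):
            return col
    return None

# Toy column names (assumed, for illustration only)
columns = ["edad", "MiniMental_total", "coherencia"]

minimental_col = find_column(columns, "minimental", ["mini", "mental"])
memoria_col = find_column(columns, "memoria_subjetiva", ["memoria", "subj"])
coherencia_col = find_column(columns, "coherencia", ["coher"])

# Keep only the targets that were actually found
found = [c for c in (minimental_col, memoria_col, coherencia_col) if c is not None]
print(found)  # ['MiniMental_total', 'coherencia']
```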

In [None]:
# Drop rows with missing values in target variables
df_clean = df.dropna(subset=target_vars)
print(f"Dataset shape after dropping rows with missing target values: {df_clean.shape}")

In [None]:
# Handle categorical variables
# For decision trees, we'll use one-hot encoding for categorical variables

# Identify non-numeric columns (potential categorical variables)
categorical_cols = [col for col in df_clean.columns if not is_numeric(df_clean[col])]
print(f"Number of categorical columns: {len(categorical_cols)}")
print(f"Categorical columns: {categorical_cols}")

# Apply one-hot encoding to categorical columns
if categorical_cols:
    df_encoded = pd.get_dummies(df_clean, columns=categorical_cols, drop_first=True)
    print(f"Shape after one-hot encoding: {df_encoded.shape}")
else:
    df_encoded = df_clean.copy()
    print("No categorical columns to encode")
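
One caveat with encoding every non-numeric column: if a target variable happens to be stored as text, it would be replaced by dummy columns and could no longer be selected by name. The sketch below shows one possible guard on a toy frame; the column names and values are made up for illustration.

```python
import pandas as pd

# Toy data (assumed column names); "coherencia" is stored as text here
toy = pd.DataFrame({
    "sexo": ["F", "M", "F", "M"],
    "edad": [70, 65, 80, 72],
    "coherencia": ["alta", "baja", "alta", "baja"],
})
target_cols = ["coherencia"]

# Encode only non-numeric predictor columns, leaving target columns untouched
to_encode = [
    col for col in toy.columns
    if col not in target_cols and not pd.api.types.is_numeric_dtype(toy[col])
]
encoded = pd.get_dummies(toy, columns=to_encode, drop_first=True)
print(encoded.columns.tolist())
```

A text-valued target would still need its own numeric encoding before it can be passed to a regressor; this guard only keeps the column intact.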

In [None]:
# Replace infinite values with NaN and then drop rows with NaN
df_encoded = df_encoded.replace([np.inf, -np.inf], np.nan)

# Check for NaN values
nan_counts = df_encoded.isna().sum()
print("Columns with NaN values:")
print(nan_counts[nan_counts > 0])

In [None]:
# Drop rows with NaN values
df_final = df_encoded.dropna()
print(f"Final dataset shape after dropping all NaN values: {df_final.shape}")

## 3. Decision Tree Model for Minimental

In [None]:
# Function to prepare data for a specific target variable
def prepare_data(df, target_var, exclude_vars):
    # Exclude target variable and other variables to be excluded
    predictor_vars = [col for col in df.columns if col != target_var and col not in exclude_vars]
    
    # Create X and y
    X = df[predictor_vars]
    y = df[target_var]
    
    # Split into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    return X_train, X_test, y_train, y_test, predictor_vars

In [None]:
# Function to train a decision tree regressor and evaluate it
def train_decision_tree(X_train, X_test, y_train, y_test, predictor_vars, model_name):
    # Create a decision tree regressor
    dt = DecisionTreeRegressor(random_state=42)
    
    # Define hyperparameter grid for tuning
    param_grid = {
        'max_depth': [3, 5, 7, 10, None],
        'min_samples_split': [2, 5, 10],
        'min_samples_leaf': [1, 2, 4],
        'max_features': ['sqrt', 'log2', None]  # 'auto' is no longer accepted by recent scikit-learn releases
    }
    
    # Perform grid search with cross-validation
    grid_search = GridSearchCV(dt, param_grid, cv=5, scoring='neg_mean_squared_error', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    
    # Get the best model
    best_dt = grid_search.best_estimator_
    print(f"Best hyperparameters for {model_name}: {grid_search.best_params_}")
    
    # Make predictions
    y_pred_train = best_dt.predict(X_train)
    y_pred_test = best_dt.predict(X_test)
    
    # Calculate metrics
    metrics = {
        'train_r2': r2_score(y_train, y_pred_train),
        'test_r2': r2_score(y_test, y_pred_test),
        'train_rmse': np.sqrt(mean_squared_error(y_train, y_pred_train)),
        'test_rmse': np.sqrt(mean_squared_error(y_test, y_pred_test)),
        'train_mae': mean_absolute_error(y_train, y_pred_train),
        'test_mae': mean_absolute_error(y_test, y_pred_test)
    }
    
    # Print metrics
    print(f"\nDecision Tree Results for {model_name}:")
    print(f"Train R²: {metrics['train_r2']:.4f}, Test R²: {metrics['test_r2']:.4f}")
    print(f"Train RMSE: {metrics['train_rmse']:.4f}, Test RMSE: {metrics['test_rmse']:.4f}")
    print(f"Train MAE: {metrics['train_mae']:.4f}, Test MAE: {metrics['test_mae']:.4f}")
    
    # Feature importance
    importances = best_dt.feature_importances_
    indices = np.argsort(importances)[::-1]
    
    # Print top 10 features
    print(f"\nTop 10 features for {model_name}:")
    for i in range(min(10, len(predictor_vars))):
        print(f"{predictor_vars[indices[i]]}: {importances[indices[i]]:.4f}")
    
    # Visualize feature importance
    plt.figure(figsize=(10, 6))
    n_features = min(10, len(predictor_vars))
    plt.title(f"Top {n_features} Feature Importances for {model_name}")
    plt.bar(range(n_features), importances[indices[:n_features]], align='center')
    plt.xticks(range(n_features), [predictor_vars[i] for i in indices[:n_features]], rotation=90)
    plt.tight_layout()
    plt.show()
    
    # Visualize decision tree (if not too complex)
    if best_dt.get_depth() <= 3:
        plt.figure(figsize=(20, 10))
        plot_tree(best_dt, feature_names=predictor_vars, filled=True, fontsize=10)
        plt.title(f"Decision Tree for {model_name}")
        plt.show()
    else:
        print(f"Tree too deep ({best_dt.get_depth()} levels) to visualize effectively")
    
    return best_dt, metrics
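
Because the grid search uses `scoring='neg_mean_squared_error'`, its `best_score_` is a negated MSE (higher is better, so values are at most zero). The sketch below shows how that score could be turned into a cross-validated RMSE. It runs on synthetic data and a reduced grid, both of which are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    {"max_depth": [2, 4, None]},
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

# best_score_ is the mean negated MSE over the folds, so flip the sign
cv_mse = -search.best_score_
cv_rmse = np.sqrt(cv_mse)
print(f"Best params: {search.best_params_}")
print(f"Cross-validated RMSE: {cv_rmse:.3f}")
```

Note that this is the square root of the mean fold MSE, which is not identical to the mean of the per-fold RMSE values.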

In [None]:
# Train a decision tree for minimental
if minimental_var:
    # Exclude memoria_subjetiva and coherencia as predictors
    exclude_vars = [memoria_var, coherencia_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, minimental_var, exclude_vars)
    
    print(f"Training decision tree for {minimental_var}")
    print(f"Number of predictors: {len(predictor_vars)}")
    print(f"Training set size: {X_train.shape[0]}, Test set size: {X_test.shape[0]}")
    
    minimental_dt, minimental_metrics = train_decision_tree(X_train, X_test, y_train, y_test, 
                                                           predictor_vars, minimental_var)
else:
    print("Skipping minimental model as target variable not found")

## 4. Decision Tree Model for Memoria Subjetiva

In [None]:
# Train a decision tree for memoria_subjetiva
if memoria_var:
    # Exclude minimental and coherencia as predictors
    exclude_vars = [minimental_var, coherencia_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, memoria_var, exclude_vars)
    
    print(f"Training decision tree for {memoria_var}")
    print(f"Number of predictors: {len(predictor_vars)}")
    print(f"Training set size: {X_train.shape[0]}, Test set size: {X_test.shape[0]}")
    
    memoria_dt, memoria_metrics = train_decision_tree(X_train, X_test, y_train, y_test, 
                                                     predictor_vars, memoria_var)
else:
    print("Skipping memoria_subjetiva model as target variable not found")

## 5. Decision Tree Model for Coherencia

In [None]:
# Train a decision tree for coherencia
if coherencia_var:
    # Exclude minimental and memoria_subjetiva as predictors
    exclude_vars = [minimental_var, memoria_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, coherencia_var, exclude_vars)
    
    print(f"Training decision tree for {coherencia_var}")
    print(f"Number of predictors: {len(predictor_vars)}")
    print(f"Training set size: {X_train.shape[0]}, Test set size: {X_test.shape[0]}")
    
    coherencia_dt, coherencia_metrics = train_decision_tree(X_train, X_test, y_train, y_test, 
                                                          predictor_vars, coherencia_var)
else:
    print("Skipping coherencia model as target variable not found")

## 6. Random Forest Models

In [None]:
# Function to train a random forest regressor and evaluate it
def train_random_forest(X_train, X_test, y_train, y_test, predictor_vars, model_name):
    # Create a random forest regressor
    rf = RandomForestRegressor(random_state=42)
    
    # Define hyperparameter grid for tuning (simplified for faster execution)
    param_grid = {
        'n_estimators': [50, 100],
        'max_depth': [5, 10, None],
        'min_samples_split': [2, 5],
        'min_samples_leaf': [1, 2]
    }
    
    # Perform grid search with cross-validation
    grid_search = GridSearchCV(rf, param_grid, cv=3, scoring='neg_mean_squared_error', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    
    # Get the best model
    best_rf = grid_search.best_estimator_
    print(f"Best hyperparameters for {model_name} Random Forest: {grid_search.best_params_}")
    
    # Make predictions
    y_pred_train = best_rf.predict(X_train)
    y_pred_test = best_rf.predict(X_test)
    
    # Calculate metrics
    metrics = {
        'train_r2': r2_score(y_train, y_pred_train),
        'test_r2': r2_score(y_test, y_pred_test),
        'train_rmse': np.sqrt(mean_squared_error(y_train, y_pred_train)),
        'test_rmse': np.sqrt(mean_squared_error(y_test, y_pred_test)),
        'train_mae': mean_absolute_error(y_train, y_pred_train),
        'test_mae': mean_absolute_error(y_test, y_pred_test)
    }
    
    # Print metrics
    print(f"\nRandom Forest Results for {model_name}:")
    print(f"Train R²: {metrics['train_r2']:.4f}, Test R²: {metrics['test_r2']:.4f}")
    print(f"Train RMSE: {metrics['train_rmse']:.4f}, Test RMSE: {metrics['test_rmse']:.4f}")
    print(f"Train MAE: {metrics['train_mae']:.4f}, Test MAE: {metrics['test_mae']:.4f}")
    
    # Feature importance
    importances = best_rf.feature_importances_
    indices = np.argsort(importances)[::-1]
    
    # Print top 10 features
    print(f"\nTop 10 features for {model_name} Random Forest:")
    for i in range(min(10, len(predictor_vars))):
        print(f"{predictor_vars[indices[i]]}: {importances[indices[i]]:.4f}")
    
    # Visualize feature importance
    plt.figure(figsize=(10, 6))
    n_features = min(10, len(predictor_vars))
    plt.title(f"Top {n_features} Feature Importances for {model_name} Random Forest")
    plt.bar(range(n_features), importances[indices[:n_features]], align='center')
    plt.xticks(range(n_features), [predictor_vars[i] for i in indices[:n_features]], rotation=90)
    plt.tight_layout()
    plt.show()
    
    return best_rf, metrics
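
The metric computation is identical in each training function, so it could live in one shared helper. The following is a sketch under that assumption; `regression_metrics` is a hypothetical name, and the usage runs on synthetic data rather than the SABE dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.tree import DecisionTreeRegressor

def regression_metrics(model, X_train, X_test, y_train, y_test):
    """Compute train/test R², RMSE, and MAE for a fitted regressor."""
    metrics = {}
    for split, X, y in (("train", X_train, y_train), ("test", X_test, y_test)):
        y_pred = model.predict(X)
        metrics[f"{split}_r2"] = r2_score(y, y_pred)
        metrics[f"{split}_rmse"] = np.sqrt(mean_squared_error(y, y_pred))
        metrics[f"{split}_mae"] = mean_absolute_error(y, y_pred)
    return metrics

# Usage on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X[:, 0] + rng.normal(scale=0.1, size=100)
model = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X[:80], y[:80])
example_metrics = regression_metrics(model, X[:80], X[80:], y[:80], y[80:])
print(example_metrics)
```

The returned dictionary uses the same keys as the `metrics` dictionaries above, so it would fit the comparison table without changes.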

### 6.1 Random Forest for Minimental

In [None]:
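# Train a random forest for minimental
if minimental_var:
    # Re-create the minimental split: X_train may still hold another target's data from an earlier section
    exclude_vars = [memoria_var, coherencia_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, minimental_var, exclude_vars)
    
    minimental_rf, minimental_rf_metrics = train_random_forest(X_train, X_test, y_train, y_test, 
                                                              predictor_vars, minimental_var)
else:
    print("Skipping minimental random forest model as target variable not found")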

### 6.2 Random Forest for Memoria Subjetiva

In [None]:
# Train a random forest for memoria_subjetiva
if memoria_var:
    # Re-create the split (same random_state, so it matches the decision tree split)
    exclude_vars = [minimental_var, coherencia_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, memoria_var, exclude_vars)
    
    memoria_rf, memoria_rf_metrics = train_random_forest(X_train, X_test, y_train, y_test, 
                                                        predictor_vars, memoria_var)
else:
    print("Skipping memoria_subjetiva random forest model as target variable not found")

### 6.3 Random Forest for Coherencia

In [None]:
# Train a random forest for coherencia
if coherencia_var:
    # Re-create the split (same random_state, so it matches the decision tree split)
    exclude_vars = [minimental_var, memoria_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, coherencia_var, exclude_vars)
    
    coherencia_rf, coherencia_rf_metrics = train_random_forest(X_train, X_test, y_train, y_test, 
                                                              predictor_vars, coherencia_var)
else:
    print("Skipping coherencia random forest model as target variable not found")

## 7. Gradient Boosting Models

In [None]:
# Function to train a gradient boosting regressor and evaluate it
def train_gradient_boosting(X_train, X_test, y_train, y_test, predictor_vars, model_name):
    # Create a gradient boosting regressor
    gb = GradientBoostingRegressor(random_state=42)
    
    # Define hyperparameter grid for tuning (simplified for faster execution)
    param_grid = {
        'n_estimators': [50, 100],
        'max_depth': [3, 5],
        'learning_rate': [0.01, 0.1],
        'subsample': [0.8, 1.0]
    }
    
    # Perform grid search with cross-validation
    grid_search = GridSearchCV(gb, param_grid, cv=3, scoring='neg_mean_squared_error', n_jobs=-1)
    grid_search.fit(X_train, y_train)
    
    # Get the best model
    best_gb = grid_search.best_estimator_
    print(f"Best hyperparameters for {model_name} Gradient Boosting: {grid_search.best_params_}")
    
    # Make predictions
    y_pred_train = best_gb.predict(X_train)
    y_pred_test = best_gb.predict(X_test)
    
    # Calculate metrics
    metrics = {
        'train_r2': r2_score(y_train, y_pred_train),
        'test_r2': r2_score(y_test, y_pred_test),
        'train_rmse': np.sqrt(mean_squared_error(y_train, y_pred_train)),
        'test_rmse': np.sqrt(mean_squared_error(y_test, y_pred_test)),
        'train_mae': mean_absolute_error(y_train, y_pred_train),
        'test_mae': mean_absolute_error(y_test, y_pred_test)
    }
    
    # Print metrics
    print(f"\nGradient Boosting Results for {model_name}:")
    print(f"Train R²: {metrics['train_r2']:.4f}, Test R²: {metrics['test_r2']:.4f}")
    print(f"Train RMSE: {metrics['train_rmse']:.4f}, Test RMSE: {metrics['test_rmse']:.4f}")
    print(f"Train MAE: {metrics['train_mae']:.4f}, Test MAE: {metrics['test_mae']:.4f}")
    
    # Feature importance
    importances = best_gb.feature_importances_
    indices = np.argsort(importances)[::-1]
    
    # Print top 10 features
    print(f"\nTop 10 features for {model_name} Gradient Boosting:")
    for i in range(min(10, len(predictor_vars))):
        print(f"{predictor_vars[indices[i]]}: {importances[indices[i]]:.4f}")
    
    # Visualize feature importance
    plt.figure(figsize=(10, 6))
    n_features = min(10, len(predictor_vars))
    plt.title(f"Top {n_features} Feature Importances for {model_name} Gradient Boosting")
    plt.bar(range(n_features), importances[indices[:n_features]], align='center')
    plt.xticks(range(n_features), [predictor_vars[i] for i in indices[:n_features]], rotation=90)
    plt.tight_layout()
    plt.show()
    
    return best_gb, metrics

In [None]:
# Train a gradient boosting model for minimental only (to save notebook execution time).
# The blocks for memoria_subjetiva and coherencia below are disabled; enable them if needed.

# For minimental
if minimental_var:
    exclude_vars = [memoria_var, coherencia_var]
    X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, minimental_var, exclude_vars)
    
    minimental_gb, minimental_gb_metrics = train_gradient_boosting(X_train, X_test, y_train, y_test, 
                                                                  predictor_vars, minimental_var)
    
# For memoria_subjetiva (uncomment if needed)
# if memoria_var:
#     exclude_vars = [minimental_var, coherencia_var]
#     X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, memoria_var, exclude_vars)
#
#     memoria_gb, memoria_gb_metrics = train_gradient_boosting(X_train, X_test, y_train, y_test,
#                                                             predictor_vars, memoria_var)

# For coherencia (uncomment if needed)
# if coherencia_var:
#     exclude_vars = [minimental_var, memoria_var]
#     X_train, X_test, y_train, y_test, predictor_vars = prepare_data(df_final, coherencia_var, exclude_vars)
#
#     coherencia_gb, coherencia_gb_metrics = train_gradient_boosting(X_train, X_test, y_train, y_test,
#                                                                   predictor_vars, coherencia_var)

## 8. Model Comparison

In [None]:
# Create a comparison table for all models

# Collect the metrics of each fitted model in a dictionary
all_models = {}

# Add minimental models
if 'minimental_metrics' in locals():
    all_models['DT_minimental'] = minimental_metrics
if 'minimental_rf_metrics' in locals():
    all_models['RF_minimental'] = minimental_rf_metrics
if 'minimental_gb_metrics' in locals():
    all_models['GB_minimental'] = minimental_gb_metrics

# Add memoria_subjetiva models
if 'memoria_metrics' in locals():
    all_models['DT_memoria'] = memoria_metrics
if 'memoria_rf_metrics' in locals():
    all_models['RF_memoria'] = memoria_rf_metrics
if 'memoria_gb_metrics' in locals():
    all_models['GB_memoria'] = memoria_gb_metrics

# Add coherencia models
if 'coherencia_metrics' in locals():
    all_models['DT_coherencia'] = coherencia_metrics
if 'coherencia_rf_metrics' in locals():
    all_models['RF_coherencia'] = coherencia_rf_metrics
if 'coherencia_gb_metrics' in locals():
    all_models['GB_coherencia'] = coherencia_gb_metrics

# Create comparison dataframe
if all_models:
    comparison_df = pd.DataFrame(all_models).T
    
    # Extract model type and target variable from index
    comparison_df['model_type'] = comparison_df.index.map(lambda x: x.split('_')[0])
    comparison_df['target'] = comparison_df.index.map(lambda x: x.split('_')[1])
    
    # Round metrics to 4 decimal places
    comparison_df = comparison_df.round(4)
    
    print("Model Comparison:")
    print(comparison_df)
    
    # Visualize test R² by model type and target
    plt.figure(figsize=(12, 6))
    sns.barplot(x='target', y='test_r2', hue='model_type', data=comparison_df)
    plt.title("Test R² by Model Type and Target Variable")
    plt.ylabel("Test R²")
    plt.ylim(min(0, comparison_df['test_r2'].min()), 1)  # test R² can be negative; its upper bound is 1
    plt.grid(axis='y', alpha=0.3)
    plt.show()
    
    # Visualize test RMSE by model type and target
    plt.figure(figsize=(12, 6))
    sns.barplot(x='target', y='test_rmse', hue='model_type', data=comparison_df)
    plt.title("Test RMSE by Model Type and Target Variable")
    plt.ylabel("Test RMSE")
    plt.grid(axis='y', alpha=0.3)
    plt.show()
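
When reading the R² chart, keep in mind that R² on held-out data has an upper bound of 1 but no lower bound of 0: a model that predicts worse than the mean of the test targets scores below zero. A tiny check with made-up numbers illustrates this.

```python
from sklearn.metrics import r2_score

y_true = [1.0, 2.0, 3.0]
y_pred_reversed = [3.0, 2.0, 1.0]   # systematically wrong predictions
y_pred_mean = [2.0, 2.0, 2.0]       # always predicts the mean of y_true

r2_reversed = r2_score(y_true, y_pred_reversed)
r2_mean = r2_score(y_true, y_pred_mean)
print(r2_reversed)  # -3.0
print(r2_mean)      # 0.0
```

Here the residual sum of squares for the reversed predictions is 8 and the total sum of squares is 2, giving 1 - 8/2 = -3.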

## 9. Common Important Features Across Models

In [None]:
# Function to extract top N important features from a model
def get_top_features(model, feature_names, n=10):
    importances = model.feature_importances_
    indices = np.argsort(importances)[::-1]
    return [(feature_names[i], importances[i]) for i in indices[:n]]

In [None]:
# Extract and compare important features across models
feature_comparison = {}

# Get features for minimental models
if 'minimental_dt' in locals():
    # Get the predictor variables used for minimental
    exclude_vars = [memoria_var, coherencia_var]
    _, _, _, _, minimental_predictors = prepare_data(df_final, minimental_var, exclude_vars)
    
    # Get top features
    feature_comparison['DT_minimental'] = get_top_features(minimental_dt, minimental_predictors)
    if 'minimental_rf' in locals():
        feature_comparison['RF_minimental'] = get_top_features(minimental_rf, minimental_predictors)
    if 'minimental_gb' in locals():
        feature_comparison['GB_minimental'] = get_top_features(minimental_gb, minimental_predictors)

# Get features for memoria_subjetiva models (similar approach)
if 'memoria_dt' in locals():
    exclude_vars = [minimental_var, coherencia_var]
    _, _, _, _, memoria_predictors = prepare_data(df_final, memoria_var, exclude_vars)
    feature_comparison['DT_memoria'] = get_top_features(memoria_dt, memoria_predictors)
    if 'memoria_rf' in locals():
        feature_comparison['RF_memoria'] = get_top_features(memoria_rf, memoria_predictors)
    if 'memoria_gb' in locals():
        feature_comparison['GB_memoria'] = get_top_features(memoria_gb, memoria_predictors)

# Get features for coherencia models (similar approach)
if 'coherencia_dt' in locals():
    exclude_vars = [minimental_var, memoria_var]
    _, _, _, _, coherencia_predictors = prepare_data(df_final, coherencia_var, exclude_vars)
    feature_comparison['DT_coherencia'] = get_top_features(coherencia_dt, coherencia_predictors)
    if 'coherencia_rf' in locals():
        feature_comparison['RF_coherencia'] = get_top_features(coherencia_rf, coherencia_predictors)
    if 'coherencia_gb' in locals():
        feature_comparison['GB_coherencia'] = get_top_features(coherencia_gb, coherencia_predictors)

In [None]:
# Find common features across models for each target variable
if feature_comparison:
    # Group by target variable
    target_features = {}
    for model_name, features in feature_comparison.items():
        target = model_name.split('_')[1]
        if target not in target_features:
            target_features[target] = []
        target_features[target].extend([f[0] for f in features])
    
    # Count frequency of each feature for each target
    for target, features in target_features.items():
        feature_counts = pd.Series(features).value_counts()
        print(f"\nFeature frequency for {target} models:")
        print(feature_counts.head(10))
        
        # Visualize
        plt.figure(figsize=(10, 6))
        feature_counts.head(10).plot(kind='barh')
        plt.title(f"Most Common Important Features for {target}")
        plt.xlabel("Frequency")
        plt.ylabel("Feature")
        plt.tight_layout()
        plt.show()
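
The frequency counts can also be reduced to the features that every model for a target ranks in its top list. The sketch below assumes a dictionary shaped like `feature_comparison`; the feature names and importance values are invented for illustration.

```python
from collections import Counter

# Hypothetical top-feature lists in the same shape as `feature_comparison`:
# model name -> list of (feature, importance) pairs. Values are made up.
toy_comparison = {
    "DT_minimental": [("edad", 0.40), ("educacion", 0.30), ("sexo", 0.10)],
    "RF_minimental": [("educacion", 0.35), ("edad", 0.33), ("ingreso", 0.12)],
    "GB_minimental": [("edad", 0.45), ("educacion", 0.25), ("sexo", 0.08)],
}

# Count in how many models each feature appears
counts = Counter(
    feature for features in toy_comparison.values() for feature, _ in features
)
n_models = len(toy_comparison)

# Features that appear in the top list of every model
shared = sorted(f for f, c in counts.items() if c == n_models)
print(shared)  # ['edad', 'educacion']
```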

## 10. Conclusion

In this notebook, we've applied various tree-based models (Decision Trees, Random Forests, and Gradient Boosting) to predict three target variables from the SABE dataset:

1. **Minimental**: An objective measure of cognitive function
2. **Memoria Subjetiva**: A subjective self-assessment of memory
3. **Coherencia**: A measure of the coherence between objective and subjective memory assessments

Key findings and observations:

### Performance Comparison
- Random Forest and Gradient Boosting generally outperformed single Decision Trees
- The highest predictive performance was achieved for [target variable] with [model type]
- Coherencia was the [easiest/most difficult] variable to predict

### Important Features
- Common important predictors for Minimental included [list features]
- Common important predictors for Memoria Subjetiva included [list features]
- Common important predictors for Coherencia included [list features]
- We observed [similarity/differences] in the important features across the three target variables

### Methodological Notes
- We followed the constraint of not using any target variable as a predictor for other targets
- Categorical columns were one-hot encoded and rows with missing or infinite values were dropped; no feature scaling was applied, since tree-based models do not require it
- The models achieved [evaluation of predictive performance] on test data

### Next Steps
- Compare tree-based models with traditional regression approaches
- Explore more sophisticated ensemble methods or neural networks
- Investigate interactions between features using partial dependence plots
- Consider feature selection to create more parsimonious models