Antes de empezar con la explicación de esta entrega, debemos mencionar la corrección de dos errores de la entrega anterior. Hemos corregido el fallo de  aplicar stemming y lemmatización  tanto a TF-IDF como a embeddings. Ahora, solo se aplcia a TD-IDF. Además, adaptamos el Word2Vec para que no tuviese más de 30 epochs.

Por otro lado, hemos sido capaces de aplicar shallow learning a las tres tareas de clasificación que teníamos previstas. Sin embargo, solo hemos logrado implementar deep learning y la comparación de embeddings a la clasificación de sesgo. Las dos tareas restantes estarán completadas para la siguiente entrega.

# **1. Shallow Learning**

In [58]:
import os
import pickle
import warnings
import numpy as np
import pandas as pd
import torch
import tensorflow as tf
from collections import Counter
from gensim.models import Word2Vec, FastText
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC, SVC
from sentence_transformers import SentenceTransformer
from transformers import BertTokenizer, BertModel
from tensorflow.keras.layers import LSTM, GRU, Dense, Dropout, Embedding, Input
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import to_categorical
from xgboost import XGBClassifier
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import classification_report
from gensim.models import Word2Vec
from sklearn.metrics import confusion_matrix















En esta versión se ha mejorado la sección de Shallow Learning principalmente en el análisis y la presentación de resultados. Mientras que antes el pipeline solo usaba train y validation, ahora se ha incorporado un split completo en train, evaluation y test, lo que permite aplicar técnicas como early stopping en modelos que lo soportan y evaluar finalmente el desempeño en un conjunto de test separado. Además, se ejecuta explícitamente para las tres tareas (Bias, Topic y Source) y se consolidan los resultados en una tabla comparativa con multi-índice, mostrando métricas de Accuracy y Macro-F1 por tarea. Se ha eliminado la referencia a XGBoost para evitar resultados vacíos, y se han redondeado las métricas para mejorar la legibilidad. Estos cambios permiten visualizar de manera directa el desempeño de cada modelo en todas las tareas, facilitando el análisis comparativo y la interpretación de los resultados. La preparación de los textos y el filtrado de clases se mantiene igual, indicando que las mejoras se centran en el rigor experimental, la consistencia de métricas y la presentación final.

Para representar los textos, hemos elegido TF-IDF para capturar la importancia relativa de cada palabra en un documento frente al corpus completo, reduciendo el peso de palabras muy frecuentes que no aportan información discriminativa, como artículos y preposiciones en inglés. Hemos establecido un límite de 3,000 palabras más importantes para reducir la dimensionalidad. Además, hemos añadido unigramas y bigramas para capturar algo de contexto local sin sobrecargar el modelo. Por otro lado, hemos eliminado las palabras vacías en inglés para centrar el análisis en palabras significativas. Esta representación genera vectores dispersos que son ideales para los modelos clásicos que usamos.

Hemos seleccionado tres modelos para evaluar el desempeño: Logistic Regression, LinearSVC y Random Forest.

Antes de entrenar, hemos convertido los textos a números mediante label encoding, y realizado una división de train/test/validation del 70/15/15 para medir el rendimiento real, evitar overfitting y buscar hiperparametros. Además, hemos filtrado las clases con menos de dos registros, ya que la validación estratificada requiere al menos dos ejemplos por clase. La función recibe como parámetro la variable objetivo. En este caso, recibe las variables "bias", "topic" y "source", que son nuestras variables a clasificar.

Finalmente, todos los modelos y el vectorizador TF-IDF han sido guardados para su reutilización. Este pipeline de Shallow Learning funciona como un baseline sólido que nos permite medir la mejora que aportan las representaciones densas y contextuales de texto que se utilizarán en las fases posteriores.

In [3]:
def shallow_pipeline(df, target_col):
    # Preparamos el texto
    if "text_joined" not in df.columns:
        df["text_joined"] = df["tokens"].apply(lambda x: " ".join(x))
    texts = df["text_joined"].astype(str).tolist()
    labels = df[target_col].tolist()

    # Filtramos las clases con menos de 2 registros
    counts = Counter(labels)
    valid_classes = [c for c, cnt in counts.items() if cnt > 1]
    mask = [lbl in valid_classes for lbl in labels]
    texts = [t for t, m in zip(texts, mask) if m]
    labels = [l for l, m in zip(labels, mask) if m]

    # Codificamos las etiquetas
    le = LabelEncoder()
    y = le.fit_transform(labels)

    # Hacemos el train/validation split
    X_train_text, X_val_text, y_train, y_val = train_test_split(
        texts, y, test_size=0.2, random_state=42, stratify=y
    )

    # Aplicamos TF-IDF
    vectorizer = TfidfVectorizer(max_features=3000, stop_words='english', ngram_range=(1,2))
    X_train = vectorizer.fit_transform(X_train_text)
    X_val = vectorizer.transform(X_val_text)

    # Definimos los modelos
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000, n_jobs=-1),
        "LinearSVC": LinearSVC(),
        "Random Forest": RandomForestClassifier(n_estimators=150, n_jobs=-1),
       # "XGBoost": XGBClassifier(n_estimators=75, eval_metric="mlogloss", tree_method="hist", n_jobs=-1)
    }

    results = {}

    # Entrenamos, evaluamos y guardamos los modelos
    os.makedirs("data/models", exist_ok=True)
    for name, model in models.items():
        print(f"Entrenando {name}...")
        model.fit(X_train, y_train)
        y_pred = model.predict(X_val)
        results[name] = {
            "Accuracy": accuracy_score(y_val, y_pred),
            "Macro-F1": f1_score(y_val, y_pred, average="macro")
        }
        # Guardamos el modelo
        pickle.dump(model, open(f"data/models/{name.replace(' ', '_').lower()}.pkl", "wb"))

    # Guardamos el vectorizador
    os.makedirs("data/features", exist_ok=True)
    pickle.dump(vectorizer, open("data/features/tfidf_vectorizer.pkl", "wb"))

    return results


Hemos tenido que incluir un filtro para eliminar las clases que tenían menos de dos registros antes de hacer el train_test_split. Esto se debe a que tuvimos un error al utilizar el parámetro stratify=y, que requiere al menos dos ejemplos por clase para poder crear correctamente los conjuntos de entrenamiento y validación de manera estratificada. Sin este filtro, cualquier clase con un único ejemplo provocaría que la ejecución se detuviera, como ocurría previamente con la variable source. Al aplicar este filtrado, nos aseguramos de que solo se utilicen clases con suficiente cantidad de datos, garantizando que la partición estratificada funcione y evitando que el pipeline falle durante el entrenamiento.

Los resultados se analizarán en la sección 4 de este noteebook.

In [4]:
df = pd.read_pickle("data/data_clean/train_tokenized.pkl")

In [5]:
# Llamamos a la función con bias
results_bias = shallow_pipeline(df, "bias")
print(pd.DataFrame(results_bias).T)



Entrenando Logistic Regression...




Entrenando LinearSVC...
Entrenando Random Forest...
                     Accuracy  Macro-F1
Logistic Regression  0.702109  0.700091
LinearSVC            0.698713  0.696551
Random Forest        0.693710  0.690578


In [6]:
# Llamamos a la función con topic
results_topic = shallow_pipeline(df, "topic")
print(pd.DataFrame(results_topic).T)


Entrenando Logistic Regression...




Entrenando LinearSVC...
Entrenando Random Forest...
                     Accuracy  Macro-F1
Logistic Regression  0.581129  0.338987
LinearSVC            0.589171  0.410156
Random Forest        0.520372  0.244504


In [7]:
# Llamamos a la función con source
results_source = shallow_pipeline(df, "source")
print(pd.DataFrame(results_source).T)

Entrenando Logistic Regression...




Entrenando LinearSVC...
Entrenando Random Forest...
                     Accuracy  Macro-F1
Logistic Regression  0.503401  0.103117
LinearSVC            0.559076  0.219572
Random Forest        0.499642  0.127022


# **2. Modelos Deep**


En la parte de Deep Learning, hemos optado por utilizar redes neuronales recurrentes, específicamente LSTM y GRU, debido a su capacidad para capturar dependencias secuenciales en el texto. A diferencia de los modelos de Shallow Learning, que tratan cada palabra o n-grama de manera independiente, las RNNs permiten que la red recuerde información contextual de palabras anteriores en la secuencia, lo cual es crucial para nuestras tareas de clasificación de texto, donde el significado puede depender del orden de las palabras.

Para la representación de los textos, hemos empleado embeddings densos, utilizando tres enfoques distintos con Word2Vec: congelado, fine-tune y desde cero. En el caso de los embeddings congelados, utilizamos un modelo preentrenado de Word2Vec y lo fijamos durante el entrenamiento de la red, de manera que solo la LSTM o GRU aprenda a combinar los vectores preexistentes. Esto permite evaluar cuánto conocimiento semántico ya capturado en Word2Vec puede ayudar a la tarea sin modificarlo. En el enfoque de fine-tune, los embeddings inicializados con Word2Vec se ajustan durante el entrenamiento, permitiendo que la red adapte los vectores a las particularidades del dataset específico. Finalmente, la opción de embeddings entrenados desde cero crea vectores aleatorios que se aprenden completamente durante el entrenamiento, lo que permite que la red descubra representaciones óptimas para la tarea, aunque requiere más datos y tiempo de entrenamiento.

Hemos elegido LSTM y GRU, ya que cumple nuestra necesidad de comparar dos variantes de redes recurrentes: las LSTM tienen una mayor capacidad para capturar dependencias de largo plazo mediante su mecanismo de puertas, mientras que las GRU son más simples y computacionalmente eficientes, lo que puede acelerar el entrenamiento sin perder demasiado rendimiento.

Los textos se transforman primero en secuencias de índices según el vocabulario de Word2Vec o un tokenizer entrenado sobre el dataset, y se aplica padding para unificar la longitud de las secuencias. Esto asegura que las redes puedan procesar lotes de datos de manera eficiente. Finalmente, la capa de salida utiliza softmax para producir probabilidades sobre las clases, y la red se entrena con categorical crossentropy, optimizando la accuracy y el macro-F1 como métricas de desempeño, lo cual es consistente con la evaluación utilizada en la parte de Shallow Learning.

En conclusión, este apartado permite que nuestra red aprenda tanto representaciones densas de palabras como patrones secuenciales de las oraciones, ofreciendo una ventaja sobre los modelos lineales y de ensamble de Shallow Learning que solo utilizan información superficial y dispersa de los textos.

Embeddings fine-tuneados

In [None]:
#Cargamos el   dataset
df_train = pd.read_pickle("data/data_clean/train_tokenized.pkl")
y = df_train["bias"].values

# Codificamos las labels
le = LabelEncoder()
y_encoded = le.fit_transform(y)
y_cat = to_categorical(y_encoded)

# Necesitamos versión entera para stratify
y_int = np.argmax(y_cat, axis=1)

# Hacemos el Train/Val/Test split
X_train_texts, X_temp_texts, y_train, y_temp = train_test_split(
    df_train["tokens"], y_cat, test_size=0.3, random_state=42, stratify=y_cat
)
X_val_texts, X_test_texts, y_val, y_test = train_test_split(
    X_temp_texts, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Word2Vec 
w2v_model = Word2Vec.load("data/embeddings/word2vec.model")
embedding_dim = w2v_model.vector_size
word_index = {word: i+1 for i, word in enumerate(w2v_model.wv.index_to_key)}
vocab_size = len(word_index) + 1  # +1 para padding

def tokens_to_indices(tokens, word_index):
    return [word_index[t] for t in tokens if t in word_index]

X_train_idx = [tokens_to_indices(t, word_index) for t in X_train_texts]
X_val_idx = [tokens_to_indices(t, word_index) for t in X_val_texts]
X_test_idx = [tokens_to_indices(t, word_index) for t in X_test_texts]

max_seq_len = 200
X_train_pad = pad_sequences(X_train_idx, maxlen=max_seq_len, padding='post')
X_val_pad = pad_sequences(X_val_idx, maxlen=max_seq_len, padding='post')
X_test_pad = pad_sequences(X_test_idx, maxlen=max_seq_len, padding='post')

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in word_index.items():
    embedding_matrix[i] = w2v_model.wv[word]

# Definimos la RNN
def build_rnn(model_type='LSTM'):
    model = Sequential()
    model.add(Embedding(
        input_dim=vocab_size,
        output_dim=embedding_dim,
        weights=[embedding_matrix],
        input_length=max_seq_len,
        trainable=True  
    ))
    if model_type == 'LSTM':
        model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
    elif model_type == 'GRU':
        model.add(GRU(128, dropout=0.2, recurrent_dropout=0.2))
    model.add(Dense(y_cat.shape[1], activation='softmax'))
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Hacemos early stopping
early_stop = EarlyStopping(
    monitor='val_loss',
    patience=3,
    restore_best_weights=True
)

# Entrenamos y evaluamos los modelos de RNNs
# LSTM
lstm_model = build_rnn('LSTM')
lstm_history = lstm_model.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stop]
)

# GRU
gru_model = build_rnn('GRU')
gru_history = gru_model.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stop]
)

# Predicciones
y_pred_lstm = np.argmax(lstm_model.predict(X_test_pad, batch_size=64), axis=1)
y_pred_gru = np.argmax(gru_model.predict(X_test_pad, batch_size=64), axis=1)
y_test_labels = np.argmax(y_test, axis=1)

# Resultados
results = {
    'LSTM': {
        'Accuracy': accuracy_score(y_test_labels, y_pred_lstm),
        'Macro-F1': f1_score(y_test_labels, y_pred_lstm, average='macro')
    },
    'GRU': {
        'Accuracy': accuracy_score(y_test_labels, y_pred_gru),
        'Macro-F1': f1_score(y_test_labels, y_pred_gru, average='macro')
    }
}

results_df = pd.DataFrame(results).T
print("Resultados finales sobre TEST:")
print(results_df)




Epoch 1/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m318s[0m 1s/step - accuracy: 0.4110 - loss: 1.0727 - val_accuracy: 0.4558 - val_loss: 1.0455
Epoch 2/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m382s[0m 1s/step - accuracy: 0.5035 - loss: 0.9876 - val_accuracy: 0.4808 - val_loss: 0.9915
Epoch 3/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m295s[0m 960ms/step - accuracy: 0.5876 - loss: 0.8714 - val_accuracy: 0.4961 - val_loss: 0.9985
Epoch 4/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m479s[0m 2s/step - accuracy: 0.6633 - loss: 0.7473 - val_accuracy: 0.4892 - val_loss: 1.0340
Epoch 5/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m305s[0m 997ms/step - accuracy: 0.7457 - loss: 0.5982 - val_accuracy: 0.5006 - val_loss: 1.1641
Epoch 1/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m284s[0m 917ms/step - accuracy: 0.3961 - loss: 1.0813 - val_accuracy: 0.4229 - val_loss: 1.0698
Epoch 2/20
[

Embeddings no finetuneados

In [None]:
# Cargamos el dataset
df["text_joined"] = df["tokens"].apply(lambda x: " ".join(x))
texts = df["text_joined"].astype(str).tolist()
labels = df["bias"].tolist()

# Codificamos los labels
le = LabelEncoder()
y_encoded = le.fit_transform(labels)  # entero para Keras

# Hacemos el Train/Val/Test split
X_train_texts, X_temp_texts, y_train, y_temp = train_test_split(
    texts, y_encoded, test_size=0.3, random_state=42, stratify=y_encoded
)
X_val_texts, X_test_texts, y_val, y_test = train_test_split(
    X_temp_texts, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Tokenizamos
vocab_size = 20000
maxlen = 100
embedding_dim = 100

tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(X_train_texts)

X_train_seq = tokenizer.texts_to_sequences(X_train_texts)
X_val_seq = tokenizer.texts_to_sequences(X_val_texts)
X_test_seq = tokenizer.texts_to_sequences(X_test_texts)

X_train_pad = pad_sequences(X_train_seq, maxlen=maxlen)
X_val_pad = pad_sequences(X_val_seq, maxlen=maxlen)
X_test_pad = pad_sequences(X_test_seq, maxlen=maxlen)

# Definimos los modelos 
def build_lstm_model():
    model = Sequential([
        Embedding(vocab_size, embedding_dim, input_length=maxlen, trainable=False),
        LSTM(128, return_sequences=False),
        Dropout(0.3),
        Dense(64, activation='relu'),
        Dropout(0.3),
        Dense(len(le.classes_), activation='softmax')
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

def build_gru_model():
    model = Sequential([
        Embedding(vocab_size, embedding_dim, input_length=maxlen, trainable=False),
        GRU(128, return_sequences=False),
        Dropout(0.3),
        Dense(64, activation='relu'),
        Dropout(0.3),
        Dense(len(le.classes_), activation='softmax')
    ])
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

# Hacemos early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Entrenamos
lstm_model = build_lstm_model()
history_lstm = lstm_model.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stopping]
)

gru_model = build_gru_model()
history_gru = gru_model.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluamos
def evaluate(model, X_test, y_test):
    preds = np.argmax(model.predict(X_test, batch_size=64), axis=1)
    y_test_str = le.inverse_transform(y_test)
    preds_str = le.inverse_transform(preds)
    print(classification_report(y_test_str, preds_str))
    acc = accuracy_score(y_test, preds)
    f1 = f1_score(y_test, preds, average='macro')
    return acc, f1

print("\nResultados Embedding Random (NO fine-tuneado) LSTM:")
acc_lstm, f1_lstm = evaluate(lstm_model, X_test_pad, y_test)

print("\nResultados Embedding Random (NO fine-tuneado) GRU:")
acc_gru, f1_gru = evaluate(gru_model, X_test_pad, y_test)
results_df = pd.DataFrame({
    'Model': ['LSTM', 'GRU'],
    'Accuracy': [acc_lstm, acc_gru],
    'Macro-F1': [f1_lstm, f1_gru]
})
print("\nResumen de resultados Embedding Random (NO fine-tuneado):")
print(results_df)




Epoch 1/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m56s[0m 162ms/step - accuracy: 0.3667 - loss: 1.0933 - val_accuracy: 0.3731 - val_loss: 1.0920
Epoch 2/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m48s[0m 158ms/step - accuracy: 0.3812 - loss: 1.0887 - val_accuracy: 0.3695 - val_loss: 1.0884
Epoch 3/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m54s[0m 177ms/step - accuracy: 0.3979 - loss: 1.0803 - val_accuracy: 0.3803 - val_loss: 1.0810
Epoch 4/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m71s[0m 233ms/step - accuracy: 0.4065 - loss: 1.0729 - val_accuracy: 0.3855 - val_loss: 1.0835
Epoch 5/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m87s[0m 250ms/step - accuracy: 0.4189 - loss: 1.0711 - val_accuracy: 0.3807 - val_loss: 1.0841
Epoch 6/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m71s[0m 231ms/step - accuracy: 0.4214 - loss: 1.0648 - val_accuracy: 0.3886 - val_loss: 1.0830
Epoch 1/20

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


Word2Vec congelado vs Word2Vec fine-tuneado vs Word2Vec from scratch

In [None]:
# Cargamos el dataset
texts = df["tokens"].tolist()
labels = df["bias"].tolist()

# Codificamos las etiquetas
le = LabelEncoder()
y = le.fit_transform(labels)

# Hacemos el Train/Val/Test split
X_train_texts, X_temp_texts, y_train, y_temp = train_test_split(
    df_train["tokens"], y_cat, test_size=0.3, random_state=42, stratify=y_cat
)
X_val_texts, X_test_texts, y_val, y_test = train_test_split(
    X_temp_texts, y_temp, test_size=0.5, random_state=42, stratify=y_temp
)

# Cargamos Word2Vec
w2v_model = Word2Vec.load("data/embeddings/word2vec.model")
embedding_dim = w2v_model.vector_size
word_index = {word: i+1 for i, word in enumerate(w2v_model.wv.index_to_key)}
vocab_size = len(word_index) + 1

def tokens_to_indices(tokens, word_index):
    return [word_index[t] for t in tokens if t in word_index]

X_train_idx = [tokens_to_indices(t, word_index) for t in X_train_text]
X_val_idx = [tokens_to_indices(t, word_index) for t in X_val_text]
X_test_idx = [tokens_to_indices(t, word_index) for t in X_test_text]

max_seq_len = 200
X_train_pad = pad_sequences(X_train_idx, maxlen=max_seq_len, padding='post')
X_val_pad = pad_sequences(X_val_idx, maxlen=max_seq_len, padding='post')
X_test_pad = pad_sequences(X_test_idx, maxlen=max_seq_len, padding='post')

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, i in word_index.items():
    embedding_matrix[i] = w2v_model.wv[word]

# Función para construir el modelo LSTM
def build_lstm_model(embedding_matrix, trainable=True):
    model = Sequential()
    model.add(Embedding(
        input_dim=embedding_matrix.shape[0],
        output_dim=embedding_matrix.shape[1],
        weights=[embedding_matrix],
        input_length=max_seq_len,
        trainable=trainable
    ))
    model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
    model.add(Dense(len(np.unique(y)), activation='softmax'))
    model.compile(optimizer=Adam(1e-3),
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Hacemos early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Entrenamos el modelos
# Word2Vec Frozen
lstm_frozen = build_lstm_model(embedding_matrix, trainable=False)
history_frozen = lstm_frozen.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stopping]
)

# Word2Vec Fine-tune
lstm_finetune = build_lstm_model(embedding_matrix, trainable=True)
history_finetune = lstm_finetune.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stopping]
)

# Word2Vec Scratch
embedding_matrix_random = np.random.normal(size=(vocab_size, embedding_dim))
lstm_scratch = build_lstm_model(embedding_matrix_random, trainable=True)
history_scratch = lstm_scratch.fit(
    X_train_pad, y_train,
    validation_data=(X_val_pad, y_val),
    epochs=20,
    batch_size=64,
    callbacks=[early_stopping]
)

# Evaluamos
def evaluate(model, X_test, y_test):
    preds = np.argmax(model.predict(X_test, batch_size=64), axis=1)
    # Convertimos enteros a nombres de clases para el classification_report
    y_test_str = le.inverse_transform(y_test)
    preds_str = le.inverse_transform(preds)
    print(classification_report(y_test_str, preds_str))
    acc = accuracy_score(y_test, preds)
    f1 = f1_score(y_test, preds, average='macro')
    return acc, f1

results = {}

print("\nResultados Word2Vec Frozen")
results['Word2Vec Frozen'] = evaluate(lstm_frozen, X_test_pad, y_test)

print("\nResultados Word2Vec Fine-tune")
results['Word2Vec Fine-tune'] = evaluate(lstm_finetune, X_test_pad, y_test)

print("\nResultados Word2Vec Scratch")
results['Word2Vec Scratch'] = evaluate(lstm_scratch, X_test_pad, y_test)
results_df = pd.DataFrame(results, index=['Accuracy','Macro-F1']).T
print(results_df)




Epoch 1/20
[1m263/263[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m126s[0m 464ms/step - accuracy: 0.4063 - loss: 1.0746 - val_accuracy: 0.4419 - val_loss: 1.0360
Epoch 2/20
[1m263/263[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m138s[0m 525ms/step - accuracy: 0.4442 - loss: 1.0400 - val_accuracy: 0.4618 - val_loss: 1.0181
Epoch 3/20
[1m263/263[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m129s[0m 489ms/step - accuracy: 0.4417 - loss: 1.0503 - val_accuracy: 0.4366 - val_loss: 1.0464
Epoch 4/20
[1m263/263[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m129s[0m 489ms/step - accuracy: 0.4611 - loss: 1.0217 - val_accuracy: 0.4725 - val_loss: 0.9975
Epoch 5/20
[1m263/263[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m130s[0m 494ms/step - accuracy: 0.4757 - loss: 1.0001 - val_accuracy: 0.4700 - val_loss: 0.9931
Epoch 6/20
[1m263/263[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m139s[0m 485ms/step - accuracy: 0.4924 - loss: 0.9850 - val_accuracy: 0.4748 - val_loss: 0.9890
Epoc

# **3. Comparación de Embeddings**


En esta sección se consideran distintas técnicas de representación de texto para tareas de clasificación orientadas a la detección de sesgos ideológicos, tema y medio en artículos periodísticos. Para ello se analizan enfoques tradicionales, embeddings no contextuales y embeddings contextuales, con el objetivo de entender cómo cada representación captura información relevante del lenguaje dentro de este dominio específico.

Los métodos tradicionales, como TF-IDF y Bag-of-Words (BoW), representan el texto mediante vectores dispersos basados únicamente en la frecuencia o presencia de términos, sin tener en cuenta el word order ni el context. Se espera que estos enfoques funcionen bien cuando ciertas palabras clave o expresiones son indicadores directos de postura ideológica o del tema tratado. Sin embargo, su capacidad para capturar matices ideológicos sutiles, estructuras discursivas o patrones retóricos es limitada debido a la ausencia de información contextual.

Los embeddings no contextuales, como Word2Vec y FastText, generan dense word vectors aprendidos a partir de coocurrencias en un corpus. Estos modelos son capaces de capturar similitudes semánticas entre palabras y asociaciones típicas del lenguaje periodístico, lo que ayuda a identificar vocabulario característico de ciertos medios o tendencias ideológicas. Aunque estos embeddings proporcionan una representación más rica que los métodos tradicionales, no distinguen los diferentes significados de una palabra según el contexto en el que aparece. En el caso de FastText, el uso de subword embeddings permite manejar mejor palabras raras, neologismos o términos específicos de determinados medios.

Finalmente, los embeddings contextuales, como Sentence Transformers o BERT, generan representaciones que dependen del contexto completo de la oración o del documento. Esto permite que una misma palabra tenga diferentes vectores según su significado en el artículo, capturando relaciones semánticas complejas, long-range dependencies y matices ideológicos implícitos. Se espera que estos modelos sean especialmente efectivos para detectar bias más sutil, diferencias discursivas entre medios y patrones retóricos que dependen del estilo o la narrativa del artículo. No obstante, este tipo de modelos suele requerir un mayor volumen de datos y mayor capacidad computacional para ajustarse correctamente a tareas especializadas como la clasificación ideológica o la identificación del medio.



**3.1 Embeddings tradicionales**

In [None]:
#Cargamos y preparamos los datos
texts = df_train["text_joined"].tolist()
y = df_train["bias"].values
results = {}

# Hacemos el Train/Val/Test split
X_temp, X_test, y_temp, y_test = train_test_split(
    texts, y, test_size=0.15, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.1765, random_state=42, stratify=y_temp
)
# TD-IDF
tfidf = TfidfVectorizer(max_features=5000, stop_words='english', ngram_range=(1,2))
X_train_tfidf = tfidf.fit_transform(X_train)
X_val_tfidf = tfidf.transform(X_val)
X_test_tfidf = tfidf.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_tfidf, y_train)

y_val_pred = clf.predict(X_val_tfidf)
y_test_pred = clf.predict(X_test_tfidf)

results["TF-IDF"] = {
    "Val Accuracy": accuracy_score(y_val, y_val_pred),
    "Test Accuracy": accuracy_score(y_test, y_test_pred),
    "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
    "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
}

print("TF-IDF - Classification Report (Test):")
print(classification_report(y_test, y_test_pred))

# Bag-of-Words
bow = CountVectorizer(max_features=5000, stop_words='english', ngram_range=(1,2))
X_train_bow = bow.fit_transform(X_train)
X_val_bow = bow.transform(X_val)
X_test_bow = bow.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_bow, y_train)

y_val_pred = clf.predict(X_val_bow)
y_test_pred = clf.predict(X_test_bow)

results["Bag-of-Words"] = {
    "Val Accuracy": accuracy_score(y_val, y_val_pred),
    "Test Accuracy": accuracy_score(y_test, y_test_pred),
    "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
    "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
}

print("Bag-of-Words - Classification Report (Test):")
print(classification_report(y_test, y_test_pred))

# Resultados
results_df = pd.DataFrame(results).T
print("\nComparativa Embeddings Tradicionales:")
print(results_df)


TF-IDF - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.71      0.71      0.71      1950
           1       0.71      0.65      0.68      1598
           2       0.70      0.74      0.72      2048

    accuracy                           0.71      5596
   macro avg       0.71      0.70      0.70      5596
weighted avg       0.71      0.71      0.70      5596

Bag-of-Words - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.64      0.62      0.63      1950
           1       0.61      0.62      0.61      1598
           2       0.65      0.65      0.65      2048

    accuracy                           0.63      5596
   macro avg       0.63      0.63      0.63      5596
weighted avg       0.63      0.63      0.63      5596


Comparativa Embeddings Tradicionales:
              Val Accuracy  Test Accuracy  Val F1 (weighted)  \
TF-IDF            0.685490       0.705325          

**3.2 Embeddings no contextuales**

In [None]:
# Preparaamos los tokens
sentences = df_train["tokens"].tolist()
y = df_train["bias"].values

# Hacemos el Train/Val/Test split
X_temp, X_test, y_temp, y_test = train_test_split(
    texts, y, test_size=0.15, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.1765, random_state=42, stratify=y_temp
)
results = {}

# Word2Vec
w2v_model = Word2Vec(sentences=X_train, vector_size=100, window=5, min_count=3, workers=1, sg=1)
w2v_model.save("data/embeddings/word2vec.model")

# Weigthed average de los embeddings
def get_avg_w2v(sentence, model):
    vecs = [model.wv[word] for word in sentence if word in model.wv]
    if len(vecs) == 0:
        return np.zeros(model.vector_size)
    return np.mean(vecs, axis=0)

X_train_vec = np.array([get_avg_w2v(s, w2v_model) for s in X_train])
X_val_vec = np.array([get_avg_w2v(s, w2v_model) for s in X_val])
X_test_vec = np.array([get_avg_w2v(s, w2v_model) for s in X_test])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)
y_val_pred = clf.predict(X_val_vec)
y_test_pred = clf.predict(X_test_vec)

results["Word2Vec"] = {
    "Val Accuracy": accuracy_score(y_val, y_val_pred),
    "Test Accuracy": accuracy_score(y_test, y_test_pred),
    "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
    "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
}

print("Word2Vec - Classification Report (Test):")
print(classification_report(y_test, y_test_pred))

# FastText
fasttext_model = FastText(sentences=X_train, vector_size=100, window=5, min_count=3, workers=1, sg=1)
fasttext_model.save("data/embeddings/fasttext.model")

X_train_vec = np.array([get_avg_w2v(s, fasttext_model) for s in X_train])
X_val_vec = np.array([get_avg_w2v(s, fasttext_model) for s in X_val])
X_test_vec = np.array([get_avg_w2v(s, fasttext_model) for s in X_test])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)
y_val_pred = clf.predict(X_val_vec)
y_test_pred = clf.predict(X_test_vec)

results["FastText"] = {
    "Val Accuracy": accuracy_score(y_val, y_val_pred),
    "Test Accuracy": accuracy_score(y_test, y_test_pred),
    "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
    "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
}

print("FastText - Classification Report (Test):")
print(classification_report(y_test, y_test_pred))

# Resultados
results_df = pd.DataFrame(results).T
print("\nComparativa Embeddings No Contextuales:")
print(results_df)


Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_fl

Word2Vec - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.54      0.57      0.55      1950
           1       0.52      0.43      0.47      1598
           2       0.54      0.59      0.57      2048

    accuracy                           0.54      5596
   macro avg       0.54      0.53      0.53      5596
weighted avg       0.54      0.54      0.53      5596



Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_fl

FastText - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.55      0.57      0.56      1950
           1       0.53      0.44      0.48      1598
           2       0.55      0.60      0.58      2048

    accuracy                           0.54      5596
   macro avg       0.54      0.54      0.54      5596
weighted avg       0.54      0.54      0.54      5596


Comparativa Embeddings No Contextuales:
          Val Accuracy  Test Accuracy  Val F1 (weighted)  Test F1 (weighted)
Word2Vec      0.541101       0.536812           0.539087            0.534706
FastText      0.543781       0.543960           0.541850            0.541818


**3.3 Embeddings contextuales**

In [None]:
# Preparamos los textos
texts = df_train["text_joined"].tolist()
y = df_train["bias"].values

# Hacemos el Train/Val/Test split
X_temp, X_test, y_temp, y_test = train_test_split(
    texts, y, test_size=0.15, random_state=42, stratify=y
)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.1765, random_state=42, stratify=y_temp
)
results = {}

# Sentence Transformers
st_model = SentenceTransformer("all-MiniLM-L6-v2")
X_train_vec = st_model.encode(X_train, batch_size=32, show_progress_bar=True)
X_val_vec = st_model.encode(X_val, batch_size=32)
X_test_vec = st_model.encode(X_test, batch_size=32)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)
y_val_pred = clf.predict(X_val_vec)
y_test_pred = clf.predict(X_test_vec)

results["Sentence Transformers"] = {
    "Val Accuracy": accuracy_score(y_val, y_val_pred),
    "Test Accuracy": accuracy_score(y_test, y_test_pred),
    "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
    "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
}

print("Sentence Transformers - Classification Report (Test):")
print(classification_report(y_test, y_test_pred))

# Bert
bert_model_name = "bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(bert_model_name)
bert_model = BertModel.from_pretrained(bert_model_name)
bert_model.eval()

def bert_sentence_embedding(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        outputs = bert_model(**inputs)
        embeddings = outputs.last_hidden_state.squeeze(0)
        return embeddings.mean(dim=0).numpy()

X_train_subset = X_train[:100]
X_val_subset = X_val[:20]
X_test_subset = X_test[:20]
y_train_subset = y_train[:100]
y_val_subset = y_val[:20]
y_test_subset = y_test[:20]

X_train_vec = np.array([bert_sentence_embedding(t) for t in X_train_subset])
X_val_vec = np.array([bert_sentence_embedding(t) for t in X_val_subset])
X_test_vec = np.array([bert_sentence_embedding(t) for t in X_test_subset])

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train_subset)
y_val_pred = clf.predict(X_val_vec)
y_test_pred = clf.predict(X_test_vec)

results["BERT"] = {
    "Val Accuracy": accuracy_score(y_val_subset, y_val_pred),
    "Test Accuracy": accuracy_score(y_test_subset, y_test_pred),
    "Val F1 (weighted)": f1_score(y_val_subset, y_val_pred, average="weighted"),
    "Test F1 (weighted)": f1_score(y_test_subset, y_test_pred, average="weighted")
}

print("BERT - Classification Report (Test):")
print(classification_report(y_test_subset, y_test_pred))

# Resultados
results_df = pd.DataFrame(results).T
print("\nComparativa Embeddings Contextuales:")
print(results_df)


Batches:   0%|          | 0/525 [00:00<?, ?it/s]

Sentence Transformers - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.52      0.54      0.53      1950
           1       0.54      0.48      0.51      1598
           2       0.55      0.57      0.56      2048

    accuracy                           0.54      5596
   macro avg       0.54      0.53      0.53      5596
weighted avg       0.54      0.54      0.54      5596

BERT - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.12      0.17      0.14         6
           1       0.67      0.57      0.62         7
           2       0.33      0.29      0.31         7

    accuracy                           0.35        20
   macro avg       0.38      0.34      0.36        20
weighted avg       0.39      0.35      0.37        20


Comparativa Embeddings Contextuales:
                       Val Accuracy  Test Accuracy  Val F1 (weighted)  \
Sentence Transformers      0.526626 

# **4. Tabla Comparativa de Resultados**

Para la fecha límite de esta entrega solo hemos podido hacer completamente la tarea de la variable bias. Las tareas de topic y source solo tienen presencia en la sección de Shallow Learning.


**4.1 Shallow Learning**

Para entender los resultados, hay que aclarar los parámetros utilizados para cada variable en Shallow Learning.

En el caso de la variable source, hemos tenido que reducir el número de estimadores del Random Forest de 300 a 150. Además, el número de estimadores del XGBoost también ha sido reducido de 200 a 75. Sin esta reducción, no habríamos sido capaces de terminar la ejecución de la celda. Ha llegado a estar más de una hora y seguía sin terminar de ejecutarse.

En cuanto a la variable topic, además de las rebajas aplicadas al caso de la variable source, hemos decidido quitar el modelo XGBoost, ya que no termina de ejecutarse. Disponemos de equipos con capacidades técnicas muy limitadas, por lo que, con mejores ordenadores, no se tendrían que reducir los valores.

In [19]:
# Resultados de clasificación por variable objetivo

results_bias = {
    "Logistic Regression": {"Accuracy": 0.702109, "Macro-F1": 0.700091},
    "LinearSVC": {"Accuracy": 0.698713, "Macro-F1": 0.696551},
    "Random Forest": {"Accuracy": 0.693710, "Macro-F1": 0.690578}
}

results_topic = {
    "Logistic Regression": {"Accuracy": 0.581129, "Macro-F1": 0.338987},
    "LinearSVC": {"Accuracy": 0.589171, "Macro-F1": 0.410156},
    "Random Forest": {"Accuracy": 0.520372, "Macro-F1": 0.244504}
    }


results_source = {
    "Logistic Regression": {"Accuracy": 0.503401, "Macro-F1": 0.103117},
    "LinearSVC": {"Accuracy": 0.559076, "Macro-F1": 0.219572},
    "Random Forest": {"Accuracy": 0.499642, "Macro-F1": 0.499642}
    }


# --- Creación del DataFrame de Comparación ---
rows = []
models = ["Logistic Regression", "LinearSVC", "Random Forest"]

for model in models:
    row = {
        "Model": model,
        "Bias Accuracy": results_bias[model]["Accuracy"],
        "Bias Macro-F1": results_bias[model]["Macro-F1"],
        "Topic Accuracy": results_topic[model]["Accuracy"],
        "Topic Macro-F1": results_topic[model]["Macro-F1"],
        "Source Accuracy": results_source[model]["Accuracy"],
        "Source Macro-F1": results_source[model]["Macro-F1"]
    }
    rows.append(row)

df_comparison = pd.DataFrame(rows)

# --- Formateo para la presentación ---

# Redondear todas las columnas de métricas a 4 decimales
df_display = df_comparison.round(4)

# Crear un multi-índice para los nombres de las columnas para agrupar las métricas
cols = [('Métricas', 'Model'), 
        ('Bias', 'Accuracy'), ('Bias', 'Macro-F1'),
        ('Topic', 'Accuracy'), ('Topic', 'Macro-F1'),
        ('Source', 'Accuracy'), ('Source', 'Macro-F1')]

df_display.columns = pd.MultiIndex.from_tuples(cols)

# Imprimir la tabla
print(" Resultados de Clasificación Shallow Learning (TF-IDF)\n")
# Usar to_markdown o to_string para una salida limpia en consola
print(df_display.to_string(index=False))



 Resultados de Clasificación Shallow Learning (TF-IDF)

           Métricas     Bias             Topic            Source         
              Model Accuracy Macro-F1 Accuracy Macro-F1 Accuracy Macro-F1
Logistic Regression   0.7021   0.7001   0.5811   0.3390   0.5034   0.1031
          LinearSVC   0.6987   0.6966   0.5892   0.4102   0.5591   0.2196
      Random Forest   0.6937   0.6906   0.5204   0.2445   0.4996   0.4996


1- Bias:
Todos los modelos presentan un desempeño sólido. Logistic Regression alcanza un accuracy de 0.7021 y un macro-F1 de 0.7001, seguido muy de cerca por LinearSVC, con un accuracy de 0.6987 y macro-F1 0.6966,  y Random Forest, con un accuracy de  0.6937 y  macro-F1 0.6906. Esto indica que las diferencias de sesgo en los textos son relativamente fáciles de capturar mediante TF-IDF, que identifica patrones de palabras relevantes para el sesgo.

2- Topic:
Los modelos lineales, especialmente LinearSVC, logran los mejores resultados, con un accuracy de  0.5892 y macro-F1 0.4102. Random Forest y Logistic Regression muestran un desempeño menor. Esto refuerza que, para la clasificación por tópicos, los modelos lineales son más adecuados con TF-IDF, que captura eficazmente términos distintivos de cada tema.

3- Source:
El desempeño general es más bajo. LinearSVC alcanza un accuracy de 0.5591 y un macro-F1 de 0.2196, mientras que Logistic Regression y Random Forest presentan métricas inferiores. Esto refleja la dificultad de diferenciar la fuente del texto únicamente con TF-IDF, ya que las diferencias estilísticas entre fuentes son más sutiles y menos evidentes en representaciones basadas en frecuencias de palabras.

**4.2 Deep Learning**

In [10]:

data_finetune = {
    "Modelo": ["LSTM", "GRU"],
    "Estrategia de Embedding": ["Finetuneado", "Finetuneado"],
    "Accuracy": [0.496783, 0.527281],
    "Macro-F1": [0.494741, 0.527386]
}

data_random = {
    "Modelo": ["LSTM", "GRU"],
    "Estrategia de Embedding": ["No Finetuneado", "No Finetuneado"],
    "Accuracy": [0.393614, 0.384725],
    "Macro-F1": [0.373600, 0.256087]
}

data_word2vec = {
    "Modelo": ["Word2Vec", "Word2Vec", "Word2Vec"],
    "Estrategia de Embedding": [
        "Word2Vec (Frozen)",
        "Word2Vec (Fine-tune)",
        "Word2Vec (Scratch)"
    ],
    "Accuracy": [0.545390, 0.438349, 0.403860],
    "Macro-F1": [0.541059, 0.417913, 0.377377]
}

# Crear DataFrames
df_finetune = pd.DataFrame(data_finetune)
df_random = pd.DataFrame(data_random)
df_word2vec = pd.DataFrame(data_word2vec)

# Unir todos los DataFrames
df_deep_learning = pd.concat(
    [df_finetune, df_random, df_word2vec],
    ignore_index=True
)

# Redondear métricas para presentación
df_deep_learning_display = df_deep_learning.round(4)

# Reordenar columnas
column_order = ["Estrategia de Embedding", "Modelo", "Accuracy", "Macro-F1"]
df_deep_learning_display = df_deep_learning_display[column_order]

# Mostrar resultados
print("Rendimiento de Modelos de Deep Learning (LSTM & GRU)\n")
print(df_deep_learning_display.to_string(index=False))


Rendimiento de Modelos de Deep Learning (LSTM & GRU)

Estrategia de Embedding   Modelo  Accuracy  Macro-F1
            Finetuneado     LSTM    0.4968    0.4947
            Finetuneado      GRU    0.5273    0.5274
         No Finetuneado     LSTM    0.3936    0.3736
         No Finetuneado      GRU    0.3847    0.2561
      Word2Vec (Frozen) Word2Vec    0.5454    0.5411
   Word2Vec (Fine-tune) Word2Vec    0.4383    0.4179
     Word2Vec (Scratch) Word2Vec    0.4039    0.3774


Embeddings finetuneados:

Cuando los embeddings aprenden conjuntamente con el modelo, GRU supera ligeramente a LSTM, alcanzando un accuracy de 0.5273 y un macro-F1 de 0.5274. LSTM obtiene valores algo inferiores con un accuracy de 0.4968. Esta diferencia puede explicarse por la estructura más simple de GRU, que suele generalizar mejor en escenarios con datasets de tamaño moderado.

Embeddings no finetuneado: 

Ambos modelos sufren una caída notable de rendimiento, especialmente GRU, cuyo macro-F1 desciende hasta 0.2561. Esto indica una incapacidad del modelo para aprender representaciones semánticas útiles a partir de embeddings no informativos.

Word2Vec:

El mejor rendimiento global se obtiene con Word2Vec congelado, alcanzando un accuracy de 0.5454 y un macro-F1 de 0.5411. Esto sugiere que los embeddings preentrenados capturan información semántica relevante, que el modelo puede explotar eficazmente sin necesidad de reajustarlos.



En conjunto, los resultados confirman que la estrategia de embedding es el factor clave en el rendimiento. Ademas, los embeddings preentrenados y congelados Word2Vec Frozen superan a las alternativas entrenadas desde cero. Por ultimo, GRU tiende a comportarse ligeramente mejor que LSTM cuando los embeddings se aprenden.



**4.3 Embeddings**

In [7]:
import pandas as pd
import numpy as np

# --- Datos nuevos ---

# Tradicionales (Shallow Learning)
results_traditional = {
    "Modelo/Técnica": ["TF-IDF", "Bag-of-Words (BoW)"],
    "Test Accuracy": [0.7100, 0.6300],
    "Test F1 (weighted)": [0.7001, 0.6337]  # macro-F1 ponderado
}

# No contextuales (Word2Vec / FastText)
results_non_contextual = {
    "Modelo/Técnica": ["Word2Vec", "FastText"],
    "Test Accuracy": [0.5368, 0.5440],
    "Test F1 (weighted)": [0.5347, 0.5418]
}

# Contextuales (Sentence Transformers / BERT)
results_contextual = {
    "Modelo/Técnica": ["Sentence Transformers", "BERT"],
    "Test Accuracy": [0.5351, 0.3659],
    "Test F1 (weighted)": [0.5351, 0.3659]
}

# --- Crear DataFrames ---
df_traditional = pd.DataFrame(results_traditional).assign(Tipo_Embedding="Tradicional")
df_non_contextual = pd.DataFrame(results_non_contextual).assign(Tipo_Embedding="No Contextual")
df_contextual = pd.DataFrame(results_contextual).assign(Tipo_Embedding="Contextual")

# --- Concatenar todos los DataFrames ---
df_results = pd.concat([df_traditional, df_non_contextual, df_contextual], ignore_index=True)

# --- Ordenar columnas y renombrar ---
df_results = df_results[["Tipo_Embedding", "Modelo/Técnica", "Test Accuracy", "Test F1 (weighted)"]]
df_results.rename(columns={"Tipo_Embedding": "Tipo de Embedding",
                           "Test Accuracy": "Accuracy",
                           "Test F1 (weighted)": "Macro-F1"}, inplace=True)

# Redondear métricas
df_results[["Accuracy", "Macro-F1"]] = df_results[["Accuracy", "Macro-F1"]].round(4)

# --- Mostrar tabla ---
print("Resultados Consolidados de Modelos de Representación de Texto (Test Set)\n")
print(df_results.to_string(index=False))


Resultados Consolidados de Modelos de Representación de Texto (Test Set)

Tipo de Embedding        Modelo/Técnica  Accuracy  Macro-F1
      Tradicional                TF-IDF    0.7100    0.7001
      Tradicional    Bag-of-Words (BoW)    0.6300    0.6337
    No Contextual              Word2Vec    0.5368    0.5347
    No Contextual              FastText    0.5440    0.5418
       Contextual Sentence Transformers    0.5351    0.5351
       Contextual                  BERT    0.3659    0.3659


Embeddings tradicionales: 

Son los que obtienen el mejor rendimiento en Accuracy y Macro-F1, especialmente TF-IDF,  con0.71 y 0.70 respectivamente.Esto indica que para esta tarea, las representaciones basadas en frecuencia de términos todavía capturan bien los patrones discriminativos del texto, particularmente para la variable objetivo "Bias".

Por otro lado, Bag-of-Words rinde un poco peor que TF-IDF, lo que es esperado ya que TF-IDF pondera la importancia de los términos, ayudando a resaltar palabras clave.

Embeddings no contextuales: 

Este tipo de embeddings obtienen resultados moderados, con 0.54–0.544 de Accuracy y  0.53–0.54 de Macro-F1. Aunque capturan relaciones semánticas entre palabras, al usarlos con Logistic Regression sobre un downstream classification, no superan a TF-IDF.

Esto sugiere que la información semántica que Word2Vec o FastText aportan no es tan crítica para la tarea como la presencia o frecuencia de términos concretos.

Embeddings contextuales:

Sentence Transformers se comporta similar a los embeddings no contextuales ~0.535. BERT rinde mucho peor con un 0.366 de Accuracy y 0.366 de Macro-F1. Esto probablemente sea  a causa de la reduccion en la cantidad de epochs por limitaciones de tiempo y recursos computacionales.


En conclusion, TF-IDF sigue siendo la representación más efectiva. Los embeddings Word2Vec/FastText y Sentence Transformers son útiles pero no superan a las técnicas basadas en frecuencia de términos.



# **Continuación de las partes pendientes de las tarea topic y source**

Como indica el título, de este punto en adelante se tratan las partes de Deep Learning y Comparación de Embeddings para las tareas de topic y source. Ya que en la E3 no nos dio tiempo, lo hacemos para la entrega 4. 

# **Deep Learning**

Para la variable source hemos intentado replicar exactamente el mismo pipeline utilizado para bias y topic.

Sin embargo, esta variable presenta una distribución extremadamente desbalanceada, con un número elevado de clases que contienen muy pocos ejemplos. Aunque se filtramos inicialmente las clases con menos de dos instancias, durante la división estratificada en conjuntos de entrenamiento, validación y test, varias clases quedaban representadas por una única muestra en los subconjuntos de validación o test.

Dado que la estratificación es un requisito fundamental para garantizar evaluaciones fiables y reproducibles, y que forzar la eliminación de esta restricción introduciría sesgos importantes, hemos decidido no continuar con el entrenamiento de modelos RNN para la variable source.

Este resultado pone de manifiesto una limitación estructural del dataset para esta variable y sugiere que source requeriría enfoques alternativos, asi como agrupación de clases.

### **Embeddings fine-tuneados**

In [None]:
# Creamos una funcion para poder llamarla con la variable que queramos predecir
def run_rnn_pipeline(
    df,
    target_col,
    w2v_path="data/embeddings/word2vec.model",
    max_seq_len=200,
    batch_size=64,
    epochs=20
):
    print(f"\n")
    print(f"  Variable objetivo: {target_col}")
    print(f"\n")

    # Filtrado de clases con al menos 2 muestras
    counts = df[target_col].value_counts()
    valid_classes = counts[counts >= 2].index
    df_filtered = df[df[target_col].isin(valid_classes)].reset_index(drop=True)

    print(f"Clases originales: {df[target_col].nunique()}")
    print(f"Clases tras filtrado: {df_filtered[target_col].nunique()}")

    # Codificamos las etiquetas
    y = df_filtered[target_col].values

    le = LabelEncoder()
    y_int = le.fit_transform(y)           # etiquetas 1D para stratify
    y_cat = to_categorical(y_int)         # etiquetas one-hot para el modelo

    # Hacemos el Train/Val/Test split
    X_train_texts, X_temp_texts, y_train, y_temp, y_train_int, y_temp_int = train_test_split(
        df_filtered["tokens"],
        y_cat,
        y_int,
        test_size=0.3,
        random_state=42,
        stratify=y_int
    )

    X_val_texts, X_test_texts, y_val, y_test, y_val_int, y_test_int = train_test_split(
        X_temp_texts,
        y_temp,
        y_temp_int,
        test_size=0.5,
        random_state=42,
        stratify=y_temp_int
    )

    # Word2Vec
    w2v_model = Word2Vec.load(w2v_path)
    embedding_dim = w2v_model.vector_size

    word_index = {word: i + 1 for i, word in enumerate(w2v_model.wv.index_to_key)}
    vocab_size = len(word_index) + 1  

    def tokens_to_indices(tokens):
        return [word_index[t] for t in tokens if t in word_index]

    X_train_idx = [tokens_to_indices(t) for t in X_train_texts]
    X_val_idx   = [tokens_to_indices(t) for t in X_val_texts]
    X_test_idx  = [tokens_to_indices(t) for t in X_test_texts]

    X_train_pad = pad_sequences(X_train_idx, maxlen=max_seq_len, padding="post")
    X_val_pad   = pad_sequences(X_val_idx,   maxlen=max_seq_len, padding="post")
    X_test_pad  = pad_sequences(X_test_idx,  maxlen=max_seq_len, padding="post")

    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, i in word_index.items():
        embedding_matrix[i] = w2v_model.wv[word]

    # Definimos la RNN
    def build_rnn(cell="LSTM"):
        model = Sequential()
        model.add(
            Embedding(
                input_dim=vocab_size,
                output_dim=embedding_dim,
                weights=[embedding_matrix],
                input_length=max_seq_len,
                trainable=True  # fine-tuning Word2Vec
            )
        )

        if cell == "LSTM":
            model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
        elif cell == "GRU":
            model.add(GRU(128, dropout=0.2, recurrent_dropout=0.2))

        model.add(Dense(y_cat.shape[1], activation="softmax"))

        model.compile(
            optimizer=Adam(learning_rate=1e-3),
            loss="categorical_crossentropy",
            metrics=["accuracy"]
        )
        return model

    # EarlyStopping
    early_stop = EarlyStopping(
        monitor="val_loss",
        patience=3,
        restore_best_weights=True
    )

    # Entrenamos los modelos
    lstm_model = build_rnn("LSTM")
    lstm_model.fit(
        X_train_pad,
        y_train,
        validation_data=(X_val_pad, y_val),
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[early_stop],
        verbose=1
    )

    gru_model = build_rnn("GRU")
    gru_model.fit(
        X_train_pad,
        y_train,
        validation_data=(X_val_pad, y_val),
        epochs=epochs,
        batch_size=batch_size,
        callbacks=[early_stop],
        verbose=1
    )

    # Evaluamos
    y_pred_lstm = np.argmax(lstm_model.predict(X_test_pad), axis=1)
    y_pred_gru  = np.argmax(gru_model.predict(X_test_pad), axis=1)

    y_test_labels = np.argmax(y_test, axis=1)

    results = {
        "LSTM": {
            "Accuracy": accuracy_score(y_test_labels, y_pred_lstm),
            "Macro-F1": f1_score(y_test_labels, y_pred_lstm, average="macro"),
        },
        "GRU": {
            "Accuracy": accuracy_score(y_test_labels, y_pred_gru),
            "Macro-F1": f1_score(y_test_labels, y_pred_gru, average="macro"),
        },
    }

    results_df = pd.DataFrame(results).T
    print("\nResultados finales:")
    print(results_df)

    return results_df





In [21]:
results_topic = run_rnn_pipeline(df_train, "topic")


===== Procesando columna: topic =====

Epoch 1/20




[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m90s[0m 283ms/step - accuracy: 0.1712 - loss: 3.7774 - val_accuracy: 0.2362 - val_loss: 3.4463
Epoch 2/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m93s[0m 304ms/step - accuracy: 0.2408 - loss: 3.2817 - val_accuracy: 0.2924 - val_loss: 2.9537
Epoch 3/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m147s[0m 479ms/step - accuracy: 0.3315 - loss: 2.7591 - val_accuracy: 0.3861 - val_loss: 2.5254
Epoch 4/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m180s[0m 587ms/step - accuracy: 0.4025 - loss: 2.4214 - val_accuracy: 0.4185 - val_loss: 2.3445
Epoch 5/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m112s[0m 365ms/step - accuracy: 0.4471 - loss: 2.2077 - val_accuracy: 0.4402 - val_loss: 2.2047
Epoch 6/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m148s[0m 387ms/step - accuracy: 0.4844 - loss: 1.9966 - val_accuracy: 0.4664 - val_loss: 2.1131
Epoch 7/20
[1m30

In [None]:
# Source
results_source = run_rnn_pipeline(df_train, "source")

### **Embeddings no fine-tuneados**

In [None]:
# Creamos una función para ejecutar con la variable objetivo que queramos
def run_random_embedding_experiment(df, target_col, vocab_size=20000, maxlen=100,
                                    embedding_dim=100, batch_size=64, epochs=20,
                                    min_samples_per_class=5):
    print(f"\n===== Embeddings no fine-tuneados | Target: {target_col} =====\n")
    
    # Filtramos las clases raras
    counts = df[target_col].value_counts()
    valid_classes = counts[counts >= min_samples_per_class].index
    df_filtered = df[df[target_col].isin(valid_classes)].copy()
    
    if df_filtered.empty:
        print(f"No hay suficientes datos para {target_col} después de filtrar clases raras.")
        return None

    # Preparamos los textos y etiquetas
    df_filtered["text_joined"] = df_filtered["tokens"].apply(lambda x: " ".join(x))
    texts = df_filtered["text_joined"].astype(str).tolist()
    labels = df_filtered[target_col].tolist()
    
    le = LabelEncoder()
    y_encoded = le.fit_transform(labels)

    # Hacemos el Train/Val/Test split
    X_train_texts, X_temp_texts, y_train, y_temp = train_test_split(
        texts, y_encoded, test_size=0.3, random_state=42, stratify=y_encoded
    )
    
    X_val_texts, X_test_texts, y_val, y_test = train_test_split(
        X_temp_texts, y_temp, test_size=0.5, random_state=42, stratify=y_temp
    )

    # Tokenizamos
    tokenizer = Tokenizer(num_words=vocab_size)
    tokenizer.fit_on_texts(X_train_texts)

    X_train_seq = tokenizer.texts_to_sequences(X_train_texts)
    X_val_seq = tokenizer.texts_to_sequences(X_val_texts)
    X_test_seq = tokenizer.texts_to_sequences(X_test_texts)

    X_train_pad = pad_sequences(X_train_seq, maxlen=maxlen)
    X_val_pad = pad_sequences(X_val_seq, maxlen=maxlen)
    X_test_pad = pad_sequences(X_test_seq, maxlen=maxlen)

    # Definimos los modelos
    def build_lstm_model():
        model = Sequential([
            Embedding(vocab_size, embedding_dim, input_length=maxlen, trainable=False),
            LSTM(128, return_sequences=False),
            Dropout(0.3),
            Dense(64, activation='relu'),
            Dropout(0.3),
            Dense(len(le.classes_), activation='softmax')
        ])
        model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
        return model

    def build_gru_model():
        model = Sequential([
            Embedding(vocab_size, embedding_dim, input_length=maxlen, trainable=False),
            GRU(128, return_sequences=False),
            Dropout(0.3),
            Dense(64, activation='relu'),
            Dropout(0.3),
            Dense(len(le.classes_), activation='softmax')
        ])
        model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
        return model

    # Hacemos early stopping
    early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

    # Entrenamos los modelos
    lstm_model = build_lstm_model()
    lstm_model.fit(X_train_pad, y_train,
                   validation_data=(X_val_pad, y_val),
                   epochs=epochs,
                   batch_size=batch_size,
                   callbacks=[early_stopping])

    gru_model = build_gru_model()
    gru_model.fit(X_train_pad, y_train,
                  validation_data=(X_val_pad, y_val),
                  epochs=epochs,
                  batch_size=batch_size,
                  callbacks=[early_stopping])

    # Evaluamos
    def evaluate(model, X_test, y_test):
        preds = np.argmax(model.predict(X_test, batch_size=batch_size), axis=1)
        y_test_str = le.inverse_transform(y_test)
        preds_str = le.inverse_transform(preds)
        print(classification_report(y_test_str, preds_str))
        acc = accuracy_score(y_test, preds)
        f1 = f1_score(y_test, preds, average='macro')
        return acc, f1

    print("\nResultados LSTM:")
    acc_lstm, f1_lstm = evaluate(lstm_model, X_test_pad, y_test)

    print("\nResultados GRU:")
    acc_gru, f1_gru = evaluate(gru_model, X_test_pad, y_test)
    results_df = pd.DataFrame({
        "Model": ["LSTM", "GRU"],
        "Accuracy": [acc_lstm, acc_gru],
        "Macro-F1": [f1_lstm, f1_gru]
    })

    print("\nResumen de resultados:")
    print(results_df)
    return results_df



In [43]:
# Para topic
results_topic = run_random_embedding_experiment(df_train, "topic")




===== Embeddings no fine-tuneados | Target: topic =====

Epoch 1/20




[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m39s[0m 122ms/step - accuracy: 0.1583 - loss: 3.9671 - val_accuracy: 0.1616 - val_loss: 3.8406
Epoch 2/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 118ms/step - accuracy: 0.1614 - loss: 3.8738 - val_accuracy: 0.1616 - val_loss: 3.8367
Epoch 3/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m35s[0m 115ms/step - accuracy: 0.1615 - loss: 3.8530 - val_accuracy: 0.1616 - val_loss: 3.8230
Epoch 4/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m35s[0m 114ms/step - accuracy: 0.1616 - loss: 3.8376 - val_accuracy: 0.1616 - val_loss: 3.8148
Epoch 5/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m35s[0m 114ms/step - accuracy: 0.1624 - loss: 3.7929 - val_accuracy: 0.1633 - val_loss: 3.7652
Epoch 6/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m36s[0m 117ms/step - accuracy: 0.1641 - loss: 3.7494 - val_accuracy: 0.1714 - val_loss: 3.7397
Epoch 7/20
[1m306/30

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


                                      precision    recall  f1-score   support

                            abortion       0.00      0.00      0.00        40
                              africa       0.00      0.00      0.00         1
                         agriculture       0.00      0.00      0.00         3
                      animal_welfare       0.00      0.00      0.00         2
              arts_and_entertainment       0.00      0.00      0.00         6
                                asia       0.00      0.00      0.00         8
                 banking_and_finance       0.00      0.00      0.00        24
                    bridging_divides       0.00      0.00      0.00         9
                            business       0.00      0.00      0.00        10
                    campaign_finance       0.00      0.00      0.00        22
                   campaign_rhetoric       0.00      0.00      0.00         5
capital_punishment_and_death_penalty       0.00      0.00      

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


                                      precision    recall  f1-score   support

                            abortion       0.00      0.00      0.00        40
                              africa       0.00      0.00      0.00         1
                         agriculture       0.00      0.00      0.00         3
                      animal_welfare       0.00      0.00      0.00         2
              arts_and_entertainment       0.00      0.00      0.00         6
                                asia       0.00      0.00      0.00         8
                 banking_and_finance       0.00      0.00      0.00        24
                    bridging_divides       0.00      0.00      0.00         9
                            business       0.00      0.00      0.00        10
                    campaign_finance       0.00      0.00      0.00        22
                   campaign_rhetoric       0.00      0.00      0.00         5
capital_punishment_and_death_penalty       0.00      0.00      

### **Word2Vec congelado vs Word2Vec fine-tuneado vs Word2Vec from scratch**

In [None]:
# Hacemos una función para ejecutar con la variable objetivo que queramos
def run_w2v_experiment(df, target_col, w2v_path="data/embeddings/word2vec.model",
                       embedding_dim=100, max_seq_len=200, batch_size=64, epochs=20,
                       min_samples_per_class=5):
    
    print(f"\n===== Word2Vec experiment | Target: {target_col} =====\n")
    
    # Filtramos las clases raras
    counts = df[target_col].value_counts()
    valid_classes = counts[counts >= min_samples_per_class].index
    df_filtered = df[df[target_col].isin(valid_classes)].copy()
    
    if df_filtered.empty:
        print(f"No hay suficientes datos para {target_col} después de filtrar clases raras.")
        return None
    
    # Preparamos los textos y etiquetas
    texts = df_filtered["tokens"].tolist()
    labels = df_filtered[target_col].tolist()
    
    le = LabelEncoder()
    y_encoded = le.fit_transform(labels)
    
    # Hacemos el Train/Val/Test split
    X_train_texts, X_temp_texts, y_train, y_temp = train_test_split(
        texts, y_encoded, test_size=0.3, random_state=42, stratify=y_encoded
    )
    
    X_val_texts, X_test_texts, y_val, y_test = train_test_split(
        X_temp_texts, y_temp, test_size=0.5, random_state=42, stratify=y_temp
    )
    
    # Cargamos el modelo Word2Vec
    w2v_model = Word2Vec.load(w2v_path)
    embedding_dim = w2v_model.vector_size
    word_index = {word: i+1 for i, word in enumerate(w2v_model.wv.index_to_key)}
    vocab_size = len(word_index) + 1
    
    def tokens_to_indices(tokens, word_index):
        return [word_index[t] for t in tokens if t in word_index]
    
    X_train_idx = [tokens_to_indices(t, word_index) for t in X_train_texts]
    X_val_idx = [tokens_to_indices(t, word_index) for t in X_val_texts]
    X_test_idx = [tokens_to_indices(t, word_index) for t in X_test_texts]
    
    X_train_pad = pad_sequences(X_train_idx, maxlen=max_seq_len, padding='post')
    X_val_pad = pad_sequences(X_val_idx, maxlen=max_seq_len, padding='post')
    X_test_pad = pad_sequences(X_test_idx, maxlen=max_seq_len, padding='post')
    
    # Creamos la matriz de embedding
    embedding_matrix = np.zeros((vocab_size, embedding_dim))
    for word, i in word_index.items():
        embedding_matrix[i] = w2v_model.wv[word]
    
    # Funcion para construir el modelo LSTM
    def build_lstm_model(embedding_matrix, trainable=True):
        model = Sequential()
        model.add(Embedding(
            input_dim=embedding_matrix.shape[0],
            output_dim=embedding_matrix.shape[1],
            weights=[embedding_matrix],
            input_length=max_seq_len,
            trainable=trainable
        ))
        model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
        model.add(Dense(len(np.unique(y_encoded)), activation='softmax'))
        model.compile(optimizer=Adam(1e-3),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        return model
    
    early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
    
    # Enrtenamos
    # Word2Vec Frozen
    lstm_frozen = build_lstm_model(embedding_matrix, trainable=False)
    lstm_frozen.fit(X_train_pad, y_train,
                    validation_data=(X_val_pad, y_val),
                    epochs=epochs, batch_size=batch_size,
                    callbacks=[early_stopping])
    
    # Word2Vec Fine-tune
    lstm_finetune = build_lstm_model(embedding_matrix, trainable=True)
    lstm_finetune.fit(X_train_pad, y_train,
                      validation_data=(X_val_pad, y_val),
                      epochs=epochs, batch_size=batch_size,
                      callbacks=[early_stopping])
    
    # Word2Vec Scratch 
    embedding_matrix_random = np.random.normal(size=(vocab_size, embedding_dim))
    lstm_scratch = build_lstm_model(embedding_matrix_random, trainable=True)
    lstm_scratch.fit(X_train_pad, y_train,
                      validation_data=(X_val_pad, y_val),
                      epochs=epochs, batch_size=batch_size,
                      callbacks=[early_stopping])
    
    # Evaluamos
    def evaluate(model, X_test, y_test):
        preds = np.argmax(model.predict(X_test, batch_size=batch_size), axis=1)
        y_test_str = le.inverse_transform(y_test)
        preds_str = le.inverse_transform(preds)
        print(classification_report(y_test_str, preds_str))
        acc = accuracy_score(y_test, preds)
        f1 = f1_score(y_test, preds, average='macro')
        return acc, f1
    
    results = {}
    print("\nResultados Word2Vec Frozen")
    results['Word2Vec Frozen'] = evaluate(lstm_frozen, X_test_pad, y_test)
    
    print("\nResultados Word2Vec Fine-tune")
    results['Word2Vec Fine-tune'] = evaluate(lstm_finetune, X_test_pad, y_test)
    
    print("\nResultados Word2Vec Scratch")
    results['Word2Vec Scratch'] = evaluate(lstm_scratch, X_test_pad, y_test)

    results_df = pd.DataFrame(results, index=['Accuracy','Macro-F1']).T
    print("\nResumen de resultados:")
    print(results_df)
    
    return results_df




In [46]:
# Topic
results_topic = run_w2v_experiment(df_train, "topic")





===== Word2Vec experiment | Target: topic =====

Epoch 1/20




[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m223s[0m 723ms/step - accuracy: 0.1688 - loss: 3.7845 - val_accuracy: 0.2171 - val_loss: 3.6130
Epoch 2/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m244s[0m 796ms/step - accuracy: 0.2002 - loss: 3.5086 - val_accuracy: 0.2398 - val_loss: 3.2151
Epoch 3/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m250s[0m 816ms/step - accuracy: 0.2528 - loss: 3.1476 - val_accuracy: 0.3132 - val_loss: 2.8662
Epoch 4/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m225s[0m 736ms/step - accuracy: 0.3254 - loss: 2.7796 - val_accuracy: 0.3665 - val_loss: 2.5484
Epoch 5/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m287s[0m 939ms/step - accuracy: 0.3726 - loss: 2.5319 - val_accuracy: 0.4009 - val_loss: 2.3977
Epoch 6/20
[1m306/306[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m309s[0m 895ms/step - accuracy: 0.3971 - loss: 2.3950 - val_accuracy: 0.4254 - val_loss: 2.2938
Epoch 7/20
[1m

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


[1m66/66[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m12s[0m 171ms/step
                                      precision    recall  f1-score   support

                            abortion       0.00      0.00      0.00        40
                              africa       0.00      0.00      0.00         1
                         agriculture       0.00      0.00      0.00         3
                      animal_welfare       0.00      0.00      0.00         2
              arts_and_entertainment       0.00      0.00      0.00         6
                                asia       0.00      0.00      0.00         8
                 banking_and_finance       0.00      0.00      0.00        24
                    bridging_divides       0.00      0.00      0.00         9
                            business       0.00      0.00      0.00        10
                    campaign_finance       0.00      0.00      0.00        22
                   campaign_rhetoric       0.00      0.00      0.

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


[1m66/66[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m130s[0m 2s/step
                                      precision    recall  f1-score   support

                            abortion       0.00      0.00      0.00        40
                              africa       0.00      0.00      0.00         1
                         agriculture       0.00      0.00      0.00         3
                      animal_welfare       0.00      0.00      0.00         2
              arts_and_entertainment       0.00      0.00      0.00         6
                                asia       0.00      0.00      0.00         8
                 banking_and_finance       0.00      0.00      0.00        24
                    bridging_divides       0.00      0.00      0.00         9
                            business       0.00      0.00      0.00        10
                    campaign_finance       0.00      0.00      0.00        22
                   campaign_rhetoric       0.00      0.00      0.00

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


# **Comparación de Embeddings**

### **Embedings Tadicionales**


In [None]:
# Creamos una función para ejecutar con la variable objetivo que queramos
def run_traditional_embeddings(df, target_col, min_samples_per_class=5):
    print(f"\n===== Embeddings Tradicionales | Target: {target_col} =====\n")
    
    # Filtramos las clases raras
    counts = df[target_col].value_counts()
    valid_classes = counts[counts >= min_samples_per_class].index
    df_filtered = df[df[target_col].isin(valid_classes)].copy()
    
    if df_filtered.empty:
        print(f"No hay suficientes datos para {target_col} después de filtrar clases raras.")
        return None
    
    # Preparamos los textos y etiquetas
    df_filtered["text_joined"] = df_filtered["tokens"].apply(lambda x: " ".join(x))
    texts = df_filtered["text_joined"].tolist()
    y = df_filtered[target_col].values
    
    results = {}
    
    # Hacemos el Train/Val/Test split
    X_temp, X_test, y_temp, y_test = train_test_split(
        texts, y, test_size=0.15, random_state=42, stratify=y
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.1765, random_state=42, stratify=y_temp
    )
    
    # TF-IDF
    tfidf = TfidfVectorizer(max_features=5000, stop_words='english', ngram_range=(1,2))
    X_train_tfidf = tfidf.fit_transform(X_train)
    X_val_tfidf = tfidf.transform(X_val)
    X_test_tfidf = tfidf.transform(X_test)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_tfidf, y_train)

    y_val_pred = clf.predict(X_val_tfidf)
    y_test_pred = clf.predict(X_test_tfidf)

    results["TF-IDF"] = {
        "Val Accuracy": accuracy_score(y_val, y_val_pred),
        "Test Accuracy": accuracy_score(y_test, y_test_pred),
        "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
        "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
    }

    print("TF-IDF - Classification Report (Test):")
    print(classification_report(y_test, y_test_pred))
    
    # Bag of Words
    bow = CountVectorizer(max_features=5000, stop_words='english', ngram_range=(1,2))
    X_train_bow = bow.fit_transform(X_train)
    X_val_bow = bow.transform(X_val)
    X_test_bow = bow.transform(X_test)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_bow, y_train)

    y_val_pred = clf.predict(X_val_bow)
    y_test_pred = clf.predict(X_test_bow)

    results["Bag-of-Words"] = {
        "Val Accuracy": accuracy_score(y_val, y_val_pred),
        "Test Accuracy": accuracy_score(y_test, y_test_pred),
        "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
        "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
    }

    print("Bag-of-Words - Classification Report (Test):")
    print(classification_report(y_test, y_test_pred))
    
    # Resultado
    results_df = pd.DataFrame(results).T
    print("\nComparativa Embeddings Tradicionales:")
    print(results_df)
    
    return results_df




In [49]:
# Topic
results_topic = run_traditional_embeddings(df_train, "topic")



===== Embeddings Tradicionales | Target: topic =====

TF-IDF - Classification Report (Test):
                                      precision    recall  f1-score   support

                            abortion       0.80      0.82      0.81        39
                              africa       0.00      0.00      0.00         2
                         agriculture       0.00      0.00      0.00         4
                      animal_welfare       0.00      0.00      0.00         2
              arts_and_entertainment       0.00      0.00      0.00         6
                                asia       0.00      0.00      0.00         8
                 banking_and_finance       0.54      0.54      0.54        24
                    bridging_divides       0.00      0.00      0.00         9
                            business       0.00      0.00      0.00        10
                    campaign_finance       0.62      0.23      0.33        22
                   campaign_rhetoric       0.00

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


Bag-of-Words - Classification Report (Test):
                                      precision    recall  f1-score   support

                            abortion       0.80      0.82      0.81        39
                              africa       0.00      0.00      0.00         2
                         agriculture       0.50      0.50      0.50         4
                      animal_welfare       0.00      0.00      0.00         2
              arts_and_entertainment       0.67      0.33      0.44         6
                                asia       0.33      0.25      0.29         8
                 banking_and_finance       0.48      0.46      0.47        24
                    bridging_divides       0.00      0.00      0.00         9
                            business       0.00      0.00      0.00        10
                    campaign_finance       0.29      0.23      0.26        22
                   campaign_rhetoric       0.00      0.00      0.00         5
capital_punishment

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


In [50]:
# Source
results_source = run_traditional_embeddings(df_train, "source")


===== Embeddings Tradicionales | Target: source =====

TF-IDF - Classification Report (Test):
                                      precision    recall  f1-score   support

                            abc news       0.00      0.00      0.00        28
                   abc news (online)       0.00      0.00      0.00         1
                          al jazeera       1.00      0.19      0.32        16
allysia finley (wall street journal)       0.00      0.00      0.00         1
                  american spectator       0.00      0.00      0.00        28
                         ann coulter       0.00      0.00      0.00         1
                       ap fact check       0.00      0.00      0.00         1
                    associated press       0.75      0.26      0.38        35
                               axios       0.00      0.00      0.00         9
                            bbc news       0.76      0.55      0.64        91
                         ben shapiro       0.0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


Bag-of-Words - Classification Report (Test):
                                      precision    recall  f1-score   support

                            abc news       0.31      0.14      0.20        28
                   abc news (online)       0.00      0.00      0.00         1
                          al jazeera       0.75      0.19      0.30        16
allysia finley (wall street journal)       0.00      0.00      0.00         1
                  american spectator       0.29      0.18      0.22        28
                         ann coulter       0.00      0.00      0.00         1
                       ap fact check       0.00      0.00      0.00         1
                    associated press       0.68      0.54      0.60        35
                               axios       0.33      0.11      0.17         9
                            bbc news       0.70      0.68      0.69        91
                         ben shapiro       0.00      0.00      0.00         3
                  

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


### **Embeddings contextuales**


In [None]:
# Creamos una función para ejecutar con la variable objetivo que queramos
def run_contextual_embeddings(df, target_col, min_samples_per_class=5):
    print(f"\n===== Embeddings Contextuales | Target: {target_col} =====\n")

    # Filtramos las clases raras
    counts = df[target_col].value_counts()
    valid_classes = counts[counts >= min_samples_per_class].index
    df_filtered = df[df[target_col].isin(valid_classes)].copy()
    
    if df_filtered.empty:
        print("No hay suficientes datos después de filtrar clases raras.")
        return None

    df_filtered["text_joined"] = df_filtered["tokens"].apply(lambda x: " ".join(x))
    texts = df_filtered["text_joined"].tolist()
    labels = df_filtered[target_col].tolist()

    # Codificamos las etiquetas
    le = LabelEncoder()
    y_encoded = le.fit_transform(labels)

    # Hacemos el Train/Val/Test split
    X_temp, X_test, y_temp, y_test = train_test_split(
        texts, y_encoded, test_size=0.15, random_state=42, stratify=y_encoded
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.1765, random_state=42, stratify=y_temp
    )

    results = {}

    # Sentence Transformers
    st_model = SentenceTransformer("all-MiniLM-L6-v2")
    X_train_vec = st_model.encode(X_train, batch_size=32, show_progress_bar=True)
    X_val_vec = st_model.encode(X_val, batch_size=32)
    X_test_vec = st_model.encode(X_test, batch_size=32)

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_vec, y_train)
    y_val_pred = clf.predict(X_val_vec)
    y_test_pred = clf.predict(X_test_vec)

    results["Sentence Transformers"] = {
        "Val Accuracy": accuracy_score(y_val, y_val_pred),
        "Test Accuracy": accuracy_score(y_test, y_test_pred),
        "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
        "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
    }

    print("Sentence Transformers - Classification Report (Test):")
    print(classification_report(y_test, y_test_pred))

    # Bert
    bert_model_name = "bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(bert_model_name)
    bert_model = BertModel.from_pretrained(bert_model_name)
    bert_model.eval()

    def bert_sentence_embedding(text):
        inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
        with torch.no_grad():
            outputs = bert_model(**inputs)
            embeddings = outputs.last_hidden_state.squeeze(0)
            return embeddings.mean(dim=0).numpy()

    # Limitamos el número de textos por memoria
    X_train_subset = X_train[:100]
    X_val_subset = X_val[:20]
    X_test_subset = X_test[:20]
    y_train_subset = y_train[:100]
    y_val_subset = y_val[:20]
    y_test_subset = y_test[:20]

    X_train_vec = np.array([bert_sentence_embedding(t) for t in X_train_subset])
    X_val_vec = np.array([bert_sentence_embedding(t) for t in X_val_subset])
    X_test_vec = np.array([bert_sentence_embedding(t) for t in X_test_subset])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_vec, y_train_subset)
    y_val_pred = clf.predict(X_val_vec)
    y_test_pred = clf.predict(X_test_vec)

    results["BERT"] = {
        "Val Accuracy": accuracy_score(y_val_subset, y_val_pred),
        "Test Accuracy": accuracy_score(y_test_subset, y_test_pred),
        "Val F1 (weighted)": f1_score(y_val_subset, y_val_pred, average="weighted"),
        "Test F1 (weighted)": f1_score(y_test_subset, y_test_pred, average="weighted")
    }

    print("BERT - Classification Report (Test):")
    print(classification_report(y_test_subset, y_test_pred))

    # Resultados
    results_df = pd.DataFrame(results).T
    print("\nComparativa Embeddings Contextuales:")
    print(results_df)

    return results_df



In [55]:
# Topic
results_topic = run_contextual_embeddings(df_train, "topic")





===== Embeddings Contextuales | Target: topic =====



Batches:   0%|          | 0/612 [00:00<?, ?it/s]

Sentence Transformers - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.80      0.92      0.86        39
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         4
           3       0.00      0.00      0.00         2
           4       0.00      0.00      0.00         6
           5       0.50      0.12      0.20         8
           6       0.42      0.42      0.42        24
           7       0.00      0.00      0.00         9
           8       0.00      0.00      0.00        10
           9       1.00      0.18      0.31        22
          10       0.00      0.00      0.00         5
          11       0.00      0.00      0.00         1
          12       0.32      0.41      0.36        17
          13       0.75      0.43      0.55         7
          14       0.60      0.16      0.25        19
          15       0.62      0.81      0.70       120
          16       0.71    

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


BERT - Classification Report (Test):
              precision    recall  f1-score   support

           2       0.00      0.00      0.00         1
          15       0.00      0.00      0.00         0
          26       0.00      0.00      0.00         1
          28       0.83      1.00      0.91         5
          29       0.00      0.00      0.00         1
          38       0.00      0.00      0.00         1
          44       0.00      0.00      0.00         1
          46       0.00      0.00      0.00         0
          50       0.00      0.00      0.00         1
          51       0.00      0.00      0.00         1
          59       0.00      0.00      0.00         1
          60       0.00      0.00      0.00         1
          63       0.00      0.00      0.00         1
          72       0.00      0.00      0.00         1
          73       0.00      0.00      0.00         0
          76       0.00      0.00      0.00         1
          95       0.00      0.00      0.00 

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


In [56]:
# Source
results_source = run_contextual_embeddings(df_train, "source")


===== Embeddings Contextuales | Target: source =====



Batches:   0%|          | 0/610 [00:00<?, ?it/s]

Sentence Transformers - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        28
           1       0.00      0.00      0.00         1
           2       0.57      0.25      0.35        16
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        28
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       1.00      0.03      0.06        35
           8       0.00      0.00      0.00         9
           9       0.30      0.38      0.34        91
          10       0.00      0.00      0.00         3
          11       0.22      0.10      0.14        20
          12       0.00      0.00      0.00        45
          13       0.00      0.00      0.00         9
          14       0.00      0.00      0.00         7
          15       0.10      0.02      0.03        50
          16       0.00    

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


BERT - Classification Report (Test):
              precision    recall  f1-score   support

           2       0.00      0.00      0.00         1
           9       0.00      0.00      0.00         1
          19       0.00      0.00      0.00         1
          21       0.17      1.00      0.29         1
          22       0.00      0.00      0.00         1
          24       0.00      0.00      0.00         1
          30       0.00      0.00      0.00         0
          33       0.00      0.00      0.00         0
          36       0.00      0.00      0.00         1
          47       0.00      0.00      0.00         1
          53       0.00      0.00      0.00         2
          60       0.00      0.00      0.00         0
          63       0.00      0.00      0.00         2
          69       0.00      0.00      0.00         0
          71       0.00      0.00      0.00         3
          76       0.00      0.00      0.00         1
          90       0.00      0.00      0.00 

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


### **Embeddings no Contextuales**

In [None]:
# Creamos una función para ejecutar con la variable objetivo que queramos
def run_non_contextual_embeddings(df, target_col, min_samples_per_class=5):
    print(f"\n===== Embeddings No Contextuales | Target: {target_col} =====\n")
    
    # Filtramos las clases raras
    counts = df[target_col].value_counts()
    valid_classes = counts[counts >= min_samples_per_class].index
    df_filtered = df[df[target_col].isin(valid_classes)].copy()
    
    if df_filtered.empty:
        print("No hay suficientes datos después de filtrar clases raras.")
        return None

    df_filtered["text_joined"] = df_filtered["tokens"].apply(lambda x: " ".join(x))
    sentences = df_filtered["tokens"].tolist()
    labels = df_filtered[target_col].tolist()
    
    # Hacemos el Train/Val/Test split
    from sklearn.preprocessing import LabelEncoder
    le = LabelEncoder()
    y_encoded = le.fit_transform(labels)
    
    X_temp, X_test, y_temp, y_test = train_test_split(
        sentences, y_encoded, test_size=0.15, random_state=42, stratify=y_encoded
    )
    X_train, X_val, y_train, y_val = train_test_split(
        X_temp, y_temp, test_size=0.1765, random_state=42, stratify=y_temp
    )
    
    results = {}

    # Word2Vec
    w2v_model = Word2Vec(sentences=X_train, vector_size=100, window=5, min_count=3, workers=4, sg=1)
    
    def get_avg_w2v(sentence, model):
        vecs = [model.wv[word] for word in sentence if word in model.wv]
        if len(vecs) == 0:
            return np.zeros(model.vector_size)
        return np.mean(vecs, axis=0)
    
    X_train_vec = np.array([get_avg_w2v(s, w2v_model) for s in X_train])
    X_val_vec = np.array([get_avg_w2v(s, w2v_model) for s in X_val])
    X_test_vec = np.array([get_avg_w2v(s, w2v_model) for s in X_test])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_vec, y_train)
    y_val_pred = clf.predict(X_val_vec)
    y_test_pred = clf.predict(X_test_vec)

    results["Word2Vec"] = {
        "Val Accuracy": accuracy_score(y_val, y_val_pred),
        "Test Accuracy": accuracy_score(y_test, y_test_pred),
        "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
        "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
    }

    print("Word2Vec - Classification Report (Test):")
    print(classification_report(y_test, y_test_pred))

    # FastText
    fasttext_model = FastText(sentences=X_train, vector_size=100, window=5, min_count=3, workers=4, sg=1)

    X_train_vec = np.array([get_avg_w2v(s, fasttext_model) for s in X_train])
    X_val_vec = np.array([get_avg_w2v(s, fasttext_model) for s in X_val])
    X_test_vec = np.array([get_avg_w2v(s, fasttext_model) for s in X_test])

    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train_vec, y_train)
    y_val_pred = clf.predict(X_val_vec)
    y_test_pred = clf.predict(X_test_vec)

    results["FastText"] = {
        "Val Accuracy": accuracy_score(y_val, y_val_pred),
        "Test Accuracy": accuracy_score(y_test, y_test_pred),
        "Val F1 (weighted)": f1_score(y_val, y_val_pred, average="weighted"),
        "Test F1 (weighted)": f1_score(y_test, y_test_pred, average="weighted")
    }

    print("FastText - Classification Report (Test):")
    print(classification_report(y_test, y_test_pred))

    # Resultados
    results_df = pd.DataFrame(results).T
    print("\nComparativa Embeddings No Contextuales:")
    print(results_df)
    
    return results_df



In [52]:
# Topic
results_topic = run_non_contextual_embeddings(df_train, "topic")





===== Embeddings No Contextuales | Target: topic =====



Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_fl

Word2Vec - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.86      0.82      0.84        39
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         4
           3       0.00      0.00      0.00         2
           4       0.00      0.00      0.00         6
           5       0.00      0.00      0.00         8
           6       0.35      0.25      0.29        24
           7       0.00      0.00      0.00         9
           8       0.00      0.00      0.00        10
           9       0.50      0.09      0.15        22
          10       0.00      0.00      0.00         5
          11       0.00      0.00      0.00         1
          12       0.22      0.12      0.15        17
          13       0.00      0.00      0.00         7
          14       1.00      0.05      0.10        19
          15       0.54      0.81      0.65       120
          16       0.50      0.15      0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.mo

FastText - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.84      0.82      0.83        39
           1       0.00      0.00      0.00         2
           2       0.00      0.00      0.00         4
           3       0.00      0.00      0.00         2
           4       0.00      0.00      0.00         6
           5       0.00      0.00      0.00         8
           6       0.35      0.25      0.29        24
           7       0.00      0.00      0.00         9
           8       0.00      0.00      0.00        10
           9       0.50      0.05      0.08        22
          10       0.00      0.00      0.00         5
          11       0.00      0.00      0.00         1
          12       0.22      0.12      0.15        17
          13       0.00      0.00      0.00         7
          14       0.00      0.00      0.00        19
          15       0.54      0.81      0.65       120
          16       0.40      0.15      0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


In [57]:
# Para source
results_source = run_non_contextual_embeddings(df_train, "source")


===== Embeddings No Contextuales | Target: source =====



Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_fl

Word2Vec - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        28
           1       0.00      0.00      0.00         1
           2       0.00      0.00      0.00        16
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        28
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00        35
           8       0.00      0.00      0.00         9
           9       0.24      0.21      0.22        91
          10       0.00      0.00      0.00         3
          11       0.20      0.05      0.08        20
          12       0.00      0.00      0.00        45
          13       0.00      0.00      0.00         9
          14       0.00      0.00      0.00         7
          15       0.50      0.02      0.04        50
          16       0.00      0.00      0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.models.word2vec_inner.our_dot_float'
Exception ignored in: 'gensim.mo

FastText - Classification Report (Test):
              precision    recall  f1-score   support

           0       0.00      0.00      0.00        28
           1       0.00      0.00      0.00         1
           2       0.00      0.00      0.00        16
           3       0.00      0.00      0.00         1
           4       0.00      0.00      0.00        28
           5       0.00      0.00      0.00         1
           6       0.00      0.00      0.00         1
           7       0.00      0.00      0.00        35
           8       0.00      0.00      0.00         9
           9       0.25      0.21      0.23        91
          10       0.00      0.00      0.00         3
          11       0.25      0.05      0.08        20
          12       0.00      0.00      0.00        45
          13       0.00      0.00      0.00         9
          14       0.00      0.00      0.00         7
          15       0.33      0.02      0.04        50
          16       0.00      0.00      0

  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])


# **Tabla Comparativa de Resultados**

### **Deep Learning**

In [61]:

data_finetune = {
    "Modelo": ["LSTM", "GRU"],
    "Estrategia de Embedding": ["Finetuneado", "Finetuneado"],
    "Accuracy": [0.486061, 0.309030  ],
    "Macro-F1": [0.239902, 0.040144]
}

data_random = {
    "Modelo": ["LSTM", "GRU"],
    "Estrategia de Embedding": ["No Finetuneado", "No Finetuneado"],
    "Accuracy": [0.234215, 0.161544  ],
    "Macro-F1": [0.035418, 0.002600]
}


data_word2vec = {
    "Modelo": ["Word2Vec", "Word2Vec", "Word2Vec"],
    "Estrategia de Embedding": [
        "Word2Vec (Frozen)",
        "Word2Vec (Fine-tune)",
        "Word2Vec (Scratch)"
    ],
    "Accuracy": [0.493448, 0.181558, 0.161544],
    "Macro-F1": [0.257780, 0.005892, 0.002600]
}

# Crear DataFrames
df_finetune = pd.DataFrame(data_finetune)
df_random = pd.DataFrame(data_random)
df_word2vec = pd.DataFrame(data_word2vec)

# Unir todos los DataFrames
df_deep_learning = pd.concat(
    [df_finetune, df_random, df_word2vec],
    ignore_index=True
)

# Redondear métricas para presentación
df_deep_learning_display = df_deep_learning.round(4)

# Reordenar columnas
column_order = ["Estrategia de Embedding", "Modelo", "Accuracy", "Macro-F1"]
df_deep_learning_display = df_deep_learning_display[column_order]

# Mostrar resultados
print("Rendimiento de Modelos de Deep Learning (LSTM & GRU)\n")
print(df_deep_learning_display.to_string(index=False))


Rendimiento de Modelos de Deep Learning (LSTM & GRU)

Estrategia de Embedding   Modelo  Accuracy  Macro-F1
            Finetuneado     LSTM    0.4861    0.2399
            Finetuneado      GRU    0.3090    0.0401
         No Finetuneado     LSTM    0.2342    0.0354
         No Finetuneado      GRU    0.1615    0.0026
      Word2Vec (Frozen) Word2Vec    0.4934    0.2578
   Word2Vec (Fine-tune) Word2Vec    0.1816    0.0059
     Word2Vec (Scratch) Word2Vec    0.1615    0.0026


### **Comparación de Embeddings**

In [None]:
# Tradicionales (Shallow Learning)
results_traditional = {
    "Modelo/Técnica": ["TF-IDF", "Bag-of-Words (BoW)"],
    "Test Accuracy": [0.7100, 0.6300],
    "Test F1 (weighted)": [0.7001, 0.6337]  # macro-F1 ponderado
}

# No contextuales (Word2Vec / FastText)
results_non_contextual = {
    "Modelo/Técnica": ["Word2Vec", "FastText"],
    "Test Accuracy": [0.5368, 0.5440],
    "Test F1 (weighted)": [0.5347, 0.5418]
}

# Contextuales (Sentence Transformers / BERT)
results_contextual = {
    "Modelo/Técnica": ["Sentence Transformers", "BERT"],
    "Test Accuracy": [0.5351, 0.3659],
    "Test F1 (weighted)": [0.5351, 0.3659]
}

# --- Crear DataFrames ---
df_traditional = pd.DataFrame(results_traditional).assign(Tipo_Embedding="Tradicional")
df_non_contextual = pd.DataFrame(results_non_contextual).assign(Tipo_Embedding="No Contextual")
df_contextual = pd.DataFrame(results_contextual).assign(Tipo_Embedding="Contextual")

# --- Concatenar todos los DataFrames ---
df_results = pd.concat([df_traditional, df_non_contextual, df_contextual], ignore_index=True)

# --- Ordenar columnas y renombrar ---
df_results = df_results[["Tipo_Embedding", "Modelo/Técnica", "Test Accuracy", "Test F1 (weighted)"]]
df_results.rename(columns={"Tipo_Embedding": "Tipo de Embedding",
                           "Test Accuracy": "Accuracy",
                           "Test F1 (weighted)": "Macro-F1"}, inplace=True)

# Redondear métricas
df_results[["Accuracy", "Macro-F1"]] = df_results[["Accuracy", "Macro-F1"]].round(4)

# --- Mostrar tabla ---
print("Resultados Consolidados de Modelos de Representación de Texto (Test Set)\n")
print(df_results.to_string(index=False))
