### Feature Extraction for Classification

- We first extract information from the text-based data collected previously.

- Let's load the data and drop all irrelevant columns (determined in the feature analysis notebook).

In [94]:
import numpy as np
import pandas as pd
import gzip
import json
import re

from pprint import pprint
from sklearn.feature_extraction.text import TfidfVectorizer
from pandas import json_normalize

# Load the compressed JSON file
df = pd.read_json('training-dataset.jsonl.gz', lines=True, compression='gzip')

In [95]:
df_profile = json_normalize(df['profile'])
df_posts = df['posts']

In [96]:
dropped_columns = [
    # From cell 8
    'id', 'bio_links', 'country_block', 'eimu_id', 'external_url', 'fbid',
    'profile_pic_url', 'profile_picture_base64', 'fb_profile_biolink',
    'show_account_transparency_details', 'business_address_json',
    
    # From cell 13
    'business_phone_number', 'overall_category_name', 'business_email',
    'ai_agent_type', 'restricted_by_viewer', 'business_category_name',
    'post_count', 'category_enum',
    
    # From cell 19 (zero_variance_columns)
    'is_professional_account', 'is_private', 'is_regulated_c18',
    'is_guardian_of_viewer', 'is_supervised_by_viewer', 'is_supervised_user',
    'is_embeds_disabled', 'is_joined_recently', 'is_verified_by_mv4b',
    'is_supervision_enabled',
    
    # From cell 26
    'business_contact_method',
    
    # From cell 33
    'should_show_category', 'has_clips', 'hide_like_and_view_counts',
    'should_show_public_contacts'
]

In [97]:
df_profile = df_profile.drop(dropped_columns, axis=1)

In [98]:
df_profile.head(0)

Unnamed: 0,username,full_name,biography,category_name,follower_count,following_count,is_business_account,is_verified,highlight_reel_count,entities


In [99]:
text_columns = ["category_name", "biography", "username", "entities", "full_name"]

# Fill missing values with empty strings
df_profile[text_columns] = df_profile[text_columns].fillna('')

- I will merge all the text-based fields into a single block.

- Normally the category name would be encoded as a categorical feature; however, encoding it yields 432 distinct categories, which is too many.

- Instead, we keep that information by concatenating the category name to the biography.

- The full name and the entities field are concatenated to the biography as well.
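
To make the trade-off concrete, here is a minimal sketch on a toy frame (the rows are invented; only the column names come from this notebook). It compares the width of a one-hot encoding with the single text column produced by concatenation:

```python
import pandas as pd

# Toy stand-in for df_profile; the values are invented for illustration
toy = pd.DataFrame({
    "category_name": ["Brand", "Personal blog", None, "Brand"],
    "biography": ["kahve ve tatlı", None, "gezi notları", "indirim"],
    "full_name": ["Kafe A", "Blog B", "Gezgin C", "Magaza D"],
})

text_cols = ["category_name", "biography", "full_name"]
toy[text_cols] = toy[text_cols].fillna("")

# One-hot encoding adds one column per distinct category value
one_hot_width = pd.get_dummies(toy["category_name"]).shape[1]

# Concatenation keeps the category as ordinary tokens in one text column
toy["text_features"] = (
    toy["category_name"] + " " + toy["biography"] + " " + toy["full_name"]
).str.strip()

print(one_hot_width)
print(toy["text_features"].tolist())
```

On the real data the one-hot width would be the 432 categories mentioned above, while the concatenated text stays a single column that the vectorizers below can consume.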

In [100]:
df_profile['text_features'] = df_profile['category_name'] + ' ' + df_profile['biography'] + ' ' + df_profile['entities'] + ' ' + df_profile['full_name']

In [101]:
df_profile.head(0)

Unnamed: 0,username,full_name,biography,category_name,follower_count,following_count,is_business_account,is_verified,highlight_reel_count,entities,text_features


Now I will process the posts: all of an account's post captions are concatenated into a single text.

In [102]:

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
turkish_stopwords = stopwords.words('turkish')

def preprocess_text(text: str):
    # Lower-case the text with casefold() rather than str.lower().
    # Note: neither is Turkish-locale aware (ASCII 'I' becomes 'i', not 'ı').
    text = str(text)
    text = text.casefold()

    # Remove URLs
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)

    # Remove special characters and punctuation
    # Note: emojis are removed here as well; you may want to keep them
    text = re.sub(r'[^a-zçğıöşü0-9\s#@]', '', text)

    # Remove numbers
    text = re.sub(r'\d+', '', text)

    # Remove extra whitespaces
    text = re.sub(r'\s+', ' ', text).strip()

    return text

# Remove Turkish stopwords
def remove_stopwords(text):
    text = str(text)
    words = text.split()
    words = [word for word in words if word not in turkish_stopwords]
    return ' '.join(words)


df_profile['text_features'].apply(preprocess_text).apply(remove_stopwords)

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\TahaÇAKMAK\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


0       local business #mediaplanning #mediabuying #so...
1       personal blog beyaz yakalıların dünyasına hoşg...
2       brand sogutozuftz avm ankara macrocenter migro...
3       dijital iletişim yönetimiinfo@vimerangcomq dij...
4       energy company totalenergies istasyonları resm...
                              ...                        
5410    healthbeauty şifalı bitkiler atölyesi whatsapp...
5411    yildir yuvamiz türkiye fabrikamız çalışanımız ...
5412                                     taris zeytinyagı
5413    public figure iklimce sohbetlerle ilgili iklim...
5414    coffee shop yıl mahişçi blokları opet benzinli...
Name: text_features, Length: 5415, dtype: object
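
One caveat on the lower-casing step: `casefold()` follows the default Unicode rules, so an ASCII `I` becomes `i` rather than the Turkish dotless `ı`. A minimal sketch of a Turkish-aware alternative is shown below. The helper name `turkish_lower` is hypothetical and not part of this notebook, and because the captions mix Turkish and English, applying it everywhere would also turn English words such as "INSTAGRAM" into "ınstagram", so treat it as an option to evaluate rather than a drop-in fix.

```python
# Map the two problematic capitals before the generic lower-casing step
_TR_UPPER_TO_LOWER = str.maketrans({"I": "ı", "İ": "i"})

def turkish_lower(text: str) -> str:
    """Lower-case text using Turkish rules for dotted and dotless i."""
    return text.translate(_TR_UPPER_TO_LOWER).lower()

# Compare the Turkish-aware result with the default casefold() result
print(turkish_lower("ISPARTA İZMİR"))
print("ISPARTA İZMİR".casefold())
```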

In [103]:
posts = json_normalize(df['posts'])

In [104]:
posts.shape

(5415, 35)

In [105]:
# Initialize lists to store results
total_likes = []
total_comments = []
max_likes = []
captions_concatenated = []
averages = []


# One row per account, one column per post slot (5415 x 35 here)
n_rows, n_cols = posts.shape

for r in range(n_rows):
    total_like = 0
    total_comment = 0
    max_like = 0
    captions = ""
    for c in range(n_cols):
        post = posts.loc[r, c]
        # Empty post slots are None; skip them
        if post is not None:
            # Missing or None counts are treated as 0
            like_count = int(post.get("like_count") or 0)
            comment_count = int(post.get("comments_count") or 0)

            # Accumulate total likes and comments, and track the max likes
            total_like += like_count
            total_comment += comment_count
            max_like = max(max_like, like_count)
            captions += (post.get("caption") or "") + " "

    # Append the results for the current user (row)
    total_likes.append(total_like)
    total_comments.append(total_comment)
    max_likes.append(max_like)
    # Note: this divides by the number of post slots (35),
    # not by the number of posts actually present
    averages.append(total_like / n_cols)
    captions_concatenated.append(captions)

# After the loop, create a DataFrame to store these results
results_df = pd.DataFrame({
    'total_likes': total_likes,
    'total_comments': total_comments,
    'max_likes': max_likes,
    'average like': averages,
    'captions_concatenated': captions_concatenated
})

results_df.head()

results_df['captions_concatenated'] = results_df['captions_concatenated'].apply(preprocess_text).apply(remove_stopwords)


In [106]:
results_df.head()

Unnamed: 0,total_likes,total_comments,max_likes,average like,captions_concatenated
0,404,12,26,11.542857,cumhuriyetimizin yılı kutlu olsun oriflame duo...
1,1126,33,122,32.171429,diyaloğun yaşanmadığı bir online toplantı olma...
2,1103,13,77,31.514286,bugün bir fincan köpüklü türk kahvesiyle taçla...
3,8186,70,7345,233.885714,saygı özlemle #atatürk #kasım #kasim #vimerang...
4,7960,55905,1250,227.428571,başöğretmenimiz gazi mustafa kemal atatürkün ı...
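
The per-account aggregation can also be written as a small helper that works on one account's list of posts. The sketch below is an assumption-laden alternative, not the code that produced the table above: the function name `summarize_posts` is hypothetical, and unlike the loop it averages over the posts that actually exist rather than over all 35 slots.

```python
from typing import Optional

def summarize_posts(posts: list[Optional[dict]]) -> dict:
    """Aggregate like/comment counts and captions over the posts that exist."""
    present = [p for p in posts if p is not None]
    likes = [int(p.get("like_count") or 0) for p in present]
    comments = [int(p.get("comments_count") or 0) for p in present]
    captions = " ".join(p.get("caption") or "" for p in present)
    return {
        "total_likes": sum(likes),
        "total_comments": sum(comments),
        "max_likes": max(likes, default=0),
        # Average over posts actually present, not over the 35 slots
        "average_like": sum(likes) / len(present) if present else 0.0,
        "captions_concatenated": captions,
    }

example = [
    {"like_count": 10, "comments_count": 2, "caption": "merhaba"},
    {"like_count": None, "comments_count": 1, "caption": None},
    None,  # empty slot
]
print(summarize_posts(example))
```

Whether the average should be taken over present posts or over a fixed number of slots depends on how the feature is used downstream; the two only agree for accounts with all 35 posts.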


In [107]:
# Create a new dataframe with concatenated text
df_combined = pd.DataFrame({
    'combined_text': df_profile['text_features'] + ' ' + results_df['captions_concatenated']
})

# Apply the same preprocessing to keep consistency
df_combined['combined_text'] = df_combined['combined_text'].apply(preprocess_text).apply(remove_stopwords)
# Concat usernames to the combined text
df_combined = pd.concat([df_profile['username'],df_combined], axis=1)


df_combined.head()

Unnamed: 0,username,combined_text
0,deparmedya,local business #mediaplanning #mediabuying #so...
1,beyazyakaliyiz,personal blog beyaz yakalıların dünyasına hoşg...
2,kafesfirin,brand sogutozuftz avm ankara macrocenter migro...
3,vimerang,dijital iletişim yönetimiinfo@vimerangcomq dij...
4,totalenergies_istasyonlari,energy company totalenergies istasyonları resm...


In [108]:
# Load the training classification data
training_data = pd.read_csv('train-classification.csv')
training_data.head()

Unnamed: 0.1,Unnamed: 0,label
0,taskirancemal,Mom and Children
1,tam_kararinda,Food
2,spart4nn,Food
3,sosyalyiyiciler,Food
4,sonaydizdarahad,Mom and Children


In [109]:
# Rename the column 'Unnamed: 0' to 'username'
training_data = training_data.rename(columns={'Unnamed: 0': 'username'})
# Merge the combined text with the labels; the default inner join keeps only labelled accounts
df_combined = pd.merge(df_combined, training_data, on='username')

df_combined.head()

Unnamed: 0,username,combined_text,label
0,deparmedya,local business #mediaplanning #mediabuying #so...,Tech
1,kafesfirin,brand sogutozuftz avm ankara macrocenter migro...,Food
2,vimerang,dijital iletişim yönetimiinfo@vimerangcomq dij...,Tech
3,mustafa_yalcinn38,politician talas belediye baskanı talas beledi...,Health and Lifestyle
4,zorluenergysolutions,türkiyenin ilindeki yaygın elektrikli şarj ist...,Tech
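
Because the merge is an inner join, accounts without a label silently disappear (for example, `beyazyakaliyiz` is no longer in the preview). A minimal sketch of how the number of dropped rows could be checked, using toy frames with made-up usernames and the `indicator` option of `pd.merge`:

```python
import pandas as pd

# Toy stand-ins; usernames and texts are made up
features = pd.DataFrame({"username": ["a", "b", "c"], "combined_text": ["x", "y", "z"]})
labels = pd.DataFrame({"username": ["a", "c", "d"], "label": ["Food", "Tech", "Art"]})

# A left join with indicator=True records where each row came from
checked = pd.merge(features, labels, on="username", how="left", indicator=True)
n_unlabelled = int((checked["_merge"] == "left_only").sum())

# Keep only the labelled rows, which is what the inner join does
labelled = checked[checked["_merge"] == "both"].drop(columns="_merge")
print(n_unlabelled, len(labelled))
```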


In [110]:
# Convert all labels to lowercase for consistency
df_combined['label'] = df_combined['label'].str.lower()

In [111]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

# Features (combined text) and target (label)
X = df_combined['combined_text']
y = df_combined['label']

# Splitting data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Vectorizing the text
vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Model training
model = RandomForestClassifier()
model.fit(X_train_tfidf, y_train)

# Predictions
y_pred = model.predict(X_test_tfidf)

# Evaluation
print("Classification Report for TF-IDF + Random Forest:")
print(classification_report(y_test, y_pred))


Classification Report for TF-IDF + Random Forest:
                      precision    recall  f1-score   support

                 art       0.31      0.13      0.19        38
       entertainment       0.36      0.37      0.37        59
             fashion       0.57      0.75      0.65        55
                food       0.76      0.93      0.83       114
              gaming       0.00      0.00      0.00         5
health and lifestyle       0.51      0.67      0.58        96
    mom and children       1.00      0.03      0.06        34
              sports       1.00      0.15      0.26        27
                tech       0.62      0.80      0.70        59
              travel       0.77      0.66      0.71        62

            accuracy                           0.60       549
           macro avg       0.59      0.45      0.43       549
        weighted avg       0.63      0.60      0.56       549
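
The gap between the macro and weighted averages comes from class imbalance: the macro average gives `gaming` (5 samples) the same weight as `food` (114 samples). As a worked check, the sketch below recomputes both averages from the per-class F1 scores in the report. Because the inputs are already rounded to two decimals, the results only match the report approximately.

```python
# (f1-score, support) per class, copied from the report above
report = {
    "art": (0.19, 38), "entertainment": (0.37, 59), "fashion": (0.65, 55),
    "food": (0.83, 114), "gaming": (0.00, 5), "health and lifestyle": (0.58, 96),
    "mom and children": (0.06, 34), "sports": (0.26, 27), "tech": (0.70, 59),
    "travel": (0.71, 62),
}

total_support = sum(n for _, n in report.values())

# Macro: plain mean over classes; weighted: mean weighted by class support
macro_f1 = sum(f1 for f1, _ in report.values()) / len(report)
weighted_f1 = sum(f1 * n for f1, n in report.values()) / total_support

print(total_support, round(macro_f1, 3), round(weighted_f1, 3))
```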





In [112]:
from gensim.models import Word2Vec
from sklearn.preprocessing import StandardScaler
import numpy as np

# Tokenize the text data
def tokenize_text(text):
    return text.split()

# Prepare sentences for Word2Vec
sentences = df_combined['combined_text'].apply(tokenize_text).values

# Train Word2Vec model
w2v_model = Word2Vec(sentences, 
                    vector_size=100,  # Dimension of word vectors
                    window=5,         # Context window size
                    min_count=1,      # Ignore words that appear less than this
                    workers=4)        # Number of threads to train the model

# Function to create document vectors by averaging word vectors
def get_doc_vector(text, model):
    words = text.split()
    word_vecs = []
    for word in words:
        if word in model.wv:
            word_vecs.append(model.wv[word])
    if len(word_vecs) == 0:
        return np.zeros(model.vector_size)
    return np.mean(word_vecs, axis=0)

# Create document vectors for all texts
X_w2v = np.array([get_doc_vector(text, w2v_model) for text in df_combined['combined_text']])

# Scale the features (note: the scaler is fit on all rows, before the train/test split)
scaler = StandardScaler()
X_w2v_scaled = scaler.fit_transform(X_w2v)

# Split the data
X_train_w2v, X_test_w2v, y_train, y_test = train_test_split(X_w2v_scaled, y, test_size=0.2, random_state=42)

# Train Random Forest with Word2Vec features
rf_w2v = RandomForestClassifier(n_estimators=100, random_state=42)
rf_w2v.fit(X_train_w2v, y_train)

# Make predictions
y_pred_w2v = rf_w2v.predict(X_test_w2v)

# Print classification report
print("Classification Report for Word2Vec + Random Forest:")
print(classification_report(y_test, y_pred_w2v))

Classification Report for Word2Vec + Random Forest:
                      precision    recall  f1-score   support

                 art       0.33      0.13      0.19        38
       entertainment       0.30      0.42      0.35        59
             fashion       0.45      0.64      0.53        55
                food       0.77      0.71      0.74       114
              gaming       0.00      0.00      0.00         5
health and lifestyle       0.48      0.61      0.54        96
    mom and children       0.60      0.09      0.15        34
              sports       0.45      0.19      0.26        27
                tech       0.49      0.68      0.57        59
              travel       0.56      0.45      0.50        62

            accuracy                           0.51       549
           macro avg       0.45      0.39      0.38       549
        weighted avg       0.52      0.51      0.49       549
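
A note on the preprocessing order in this cell: the scaler is fit before the split, so test rows contribute to the scaling statistics. Random forests are largely insensitive to per-feature rescaling, so the effect on these numbers is probably small, but the leak-free order matters for other models. A minimal NumPy sketch with random stand-in data (the array and the positional split are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 4))  # stand-in for X_w2v

# Hypothetical 80/20 split by position, just for the sketch
X_train, X_test = X[:80], X[80:]

# Statistics come from the training rows only
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

X_train_scaled = (X_train - mean) / std
X_test_scaled = (X_test - mean) / std  # reuses the training statistics

print(X_train_scaled.mean(axis=0).round(6))
print(X_test_scaled.mean(axis=0).round(2))
```

With scikit-learn the same idea is to call `fit_transform` on the training split and `transform` on the test split.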





In [114]:
from transformers import AutoTokenizer, AutoModel
from torch import nn
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
import numpy as np
from tqdm import tqdm
from sklearn.metrics import classification_report

# Initialize BERT tokenizer and model
tokenizer = AutoTokenizer.from_pretrained('dbmdz/bert-base-turkish-uncased')
bert = AutoModel.from_pretrained('dbmdz/bert-base-turkish-uncased')

# Custom Dataset class
class TextDataset(Dataset):
    def __init__(self, texts, labels, tokenizer, max_len=256):
        self.texts = texts
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len
        
    def __len__(self):
        return len(self.texts)
    
    def __getitem__(self, idx):
        text = str(self.texts.iloc[idx])
        label = self.labels.iloc[idx]
        
        encoding = self.tokenizer(
            text,
            add_special_tokens=True,
            max_length=self.max_len,
            return_token_type_ids=False,
            padding='max_length',
            truncation=True,
            return_attention_mask=True,
            return_tensors='pt'
        )
        
        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label)
        }

# BERT Classifier
class BERTClassifier(nn.Module):
    def __init__(self, bert, num_classes):
        super(BERTClassifier, self).__init__()
        self.bert = bert
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(bert.config.hidden_size, num_classes)
        
    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs[0][:, 0]
        pooled_output = self.dropout(pooled_output)
        logits = self.classifier(pooled_output)
        return logits

# Prepare data
# Convert labels to numerical format
label_dict = {label: idx for idx, label in enumerate(df_combined['label'].unique())}
y_numeric = df_combined['label'].map(label_dict)

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    df_combined['combined_text'], 
    y_numeric, 
    test_size=0.2, 
    random_state=42
)

# Create datasets
train_dataset = TextDataset(X_train, y_train, tokenizer)
test_dataset = TextDataset(X_test, y_test, tokenizer)

# Create dataloaders
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=16)

# Initialize model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = BERTClassifier(bert, len(label_dict))
model = model.to(device)

# Training parameters
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)
criterion = nn.CrossEntropyLoss()
epochs = 5

# Training loop
model.train()
for epoch in range(epochs):
    total_loss = 0
    for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}/{epochs}'):
        optimizer.zero_grad()
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        
        outputs = model(input_ids, attention_mask)
        loss = criterion(outputs, labels)
        
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    
    print(f'Epoch {epoch + 1}, Loss: {total_loss/len(train_loader)}')

# Evaluation
model.eval()
predictions = []
actual = []

with torch.no_grad():
    for batch in test_loader:
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        
        outputs = model(input_ids, attention_mask)
        _, predicted = torch.max(outputs, 1)
        
        predictions.extend(predicted.cpu().numpy())
        actual.extend(labels.cpu().numpy())

# Convert numeric predictions back to labels
reverse_label_dict = {v: k for k, v in label_dict.items()}
predictions = [reverse_label_dict[p] for p in predictions]
actual = [reverse_label_dict[a] for a in actual]

# Print classification report
print("\nClassification Report:")
print(classification_report(actual, predictions))

Epoch 1/5: 100%|██████████| 137/137 [01:34<00:00,  1.44it/s]


Epoch 1, Loss: 1.4909805912170968


Epoch 2/5: 100%|██████████| 137/137 [01:36<00:00,  1.42it/s]


Epoch 2, Loss: 0.8066387352717184


Epoch 3/5: 100%|██████████| 137/137 [01:38<00:00,  1.39it/s]


Epoch 3, Loss: 0.49136408722966257


Epoch 4/5: 100%|██████████| 137/137 [01:37<00:00,  1.40it/s]


Epoch 4, Loss: 0.2871207228324709


Epoch 5/5: 100%|██████████| 137/137 [01:37<00:00,  1.40it/s]


Epoch 5, Loss: 0.1480402083741161

Classification Report:
                      precision    recall  f1-score   support

                 art       0.40      0.74      0.52        38
       entertainment       0.39      0.41      0.40        59
             fashion       0.74      0.84      0.79        55
                food       0.82      0.93      0.87       114
              gaming       1.00      0.60      0.75         5
health and lifestyle       0.90      0.45      0.60        96
    mom and children       0.50      0.38      0.43        34
              sports       0.88      0.78      0.82        27
                tech       0.75      0.83      0.79        59
              travel       0.72      0.69      0.70        62

            accuracy                           0.68       549
           macro avg       0.71      0.66      0.67       549
        weighted avg       0.72      0.68      0.68       549
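
The next cell continues fine-tuning with a linear warm-up schedule. As a rough illustration of what such a schedule does to the learning rate, the sketch below reimplements the multiplier in plain Python. It assumes the usual behaviour of `get_linear_schedule_with_warmup` (linear ramp from 0 to 1 during warm-up, then linear decay to 0); the function name `linear_warmup_factor` is hypothetical.

```python
def linear_warmup_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

batches_per_epoch = 137                # from the progress bars above
total_steps = batches_per_epoch * 10   # 10 epochs
warmup_steps = int(total_steps * 0.1)  # 10% warm-up

# Learning rate at a few points of the run, for a base rate of 2e-5
for step in (0, 68, 137, 753, 1370):
    lr = 2e-5 * linear_warmup_factor(step, warmup_steps, total_steps)
    print(step, f"{lr:.2e}")
```

With these numbers the warm-up lasts exactly one epoch (137 steps), after which the rate decays linearly over the remaining nine.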



In [115]:
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

import torch.nn.functional as F

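# Hyperparameters
EPOCHS = 10
BATCH_SIZE = 16
LEARNING_RATE = 2e-5
WARMUP_STEPS = 0.1  # Warm-up ratio (10% of total steps); converted to a step count below
WEIGHT_DECAY = 0.01
DROPOUT_RATE = 0.2

# Update the model's dropout.
# Note: `model` is the classifier already fine-tuned in the previous cell,
# so training continues from those weights.
# Assigning model.bert.dropout only adds an attribute; the encoder's internal
# dropout layers are not changed by it. Only model.dropout (before the classifier) takes effect.
model.bert.dropout = nn.Dropout(DROPOUT_RATE)
model.dropout = nn.Dropout(DROPOUT_RATE)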

# Calculate total steps for scheduler
total_steps = len(train_loader) * EPOCHS
warmup_steps = int(total_steps * WARMUP_STEPS)

# Initialize optimizer with weight decay
no_decay = ['bias', 'LayerNorm.weight']
optimizer_grouped_parameters = [
    {
        'params': [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)],
        'weight_decay': WEIGHT_DECAY
    },
    {
        'params': [p for n, p in model.named_parameters() if any(nd in n for nd in no_decay)],
        'weight_decay': 0.0
    }
]

optimizer = AdamW(optimizer_grouped_parameters, lr=LEARNING_RATE)
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=warmup_steps, num_training_steps=total_steps)

# Training loop with validation
best_val_loss = float('inf')
for epoch in range(EPOCHS):
    # Training
    model.train()
    total_train_loss = 0
    for batch in tqdm(train_loader, desc=f'Epoch {epoch + 1}/{EPOCHS}'):
        optimizer.zero_grad()
        
        input_ids = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        labels = batch['labels'].to(device)
        
        outputs = model(input_ids, attention_mask)
        loss = F.cross_entropy(outputs, labels)
        
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        
        optimizer.step()
        scheduler.step()
        
        total_train_loss += loss.item()
    
    avg_train_loss = total_train_loss / len(train_loader)
    
    # Validation (evaluated on the test split; no separate validation set is held out)
    model.eval()
    total_val_loss = 0
    predictions = []
    actual = []
    
    with torch.no_grad():
        for batch in test_loader:
            input_ids = batch['input_ids'].to(device)
            attention_mask = batch['attention_mask'].to(device)
            labels = batch['labels'].to(device)
            
            outputs = model(input_ids, attention_mask)
            loss = F.cross_entropy(outputs, labels)
            total_val_loss += loss.item()
            
            _, predicted = torch.max(outputs, 1)
            predictions.extend(predicted.cpu().numpy())
            actual.extend(labels.cpu().numpy())
    
    avg_val_loss = total_val_loss / len(test_loader)
    
    # Save best model
    if avg_val_loss < best_val_loss:
        best_val_loss = avg_val_loss
        torch.save(model.state_dict(), 'best_model.pt')
    
    print(f'Epoch {epoch + 1}:')
    print(f'Average training loss: {avg_train_loss:.4f}')
    print(f'Average validation loss: {avg_val_loss:.4f}')
    
    # Convert predictions back to labels and print classification report
    predictions = [reverse_label_dict[p] for p in predictions]
    actual = [reverse_label_dict[a] for a in actual]
    print("\nClassification Report:")
    print(classification_report(actual, predictions))
    print("="*50)

Epoch 1/10: 100%|██████████| 137/137 [01:35<00:00,  1.43it/s]


Epoch 1:
Average training loss: 0.0570
Average validation loss: 1.5067

Classification Report:
                      precision    recall  f1-score   support

                 art       0.49      0.66      0.56        38
       entertainment       0.48      0.27      0.35        59
             fashion       0.65      0.87      0.74        55
                food       0.86      0.89      0.88       114
              gaming       1.00      0.60      0.75         5
health and lifestyle       0.69      0.67      0.68        96
    mom and children       0.56      0.41      0.47        34
              sports       0.87      0.74      0.80        27
                tech       0.71      0.81      0.76        59
              travel       0.80      0.79      0.80        62

            accuracy                           0.71       549
           macro avg       0.71      0.67      0.68       549
        weighted avg       0.70      0.71      0.70       549



Epoch 2/10: 100%|██████████| 137/137 [01:37<00:00,  1.41it/s]


Epoch 2:
Average training loss: 0.0284
Average validation loss: 1.7672

Classification Report:
                      precision    recall  f1-score   support

                 art       0.49      0.55      0.52        38
       entertainment       0.48      0.25      0.33        59
             fashion       0.65      0.87      0.74        55
                food       0.85      0.90      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.77      0.65      0.70        96
    mom and children       0.49      0.50      0.49        34
              sports       0.84      0.78      0.81        27
                tech       0.70      0.83      0.76        59
              travel       0.72      0.76      0.74        62

            accuracy                           0.70       549
           macro avg       0.70      0.69      0.69       549
        weighted avg       0.70      0.70      0.69       549



Epoch 3/10: 100%|██████████| 137/137 [01:37<00:00,  1.40it/s]


Epoch 3:
Average training loss: 0.0085
Average validation loss: 1.7789

Classification Report:
                      precision    recall  f1-score   support

                 art       0.55      0.45      0.49        38
       entertainment       0.45      0.47      0.46        59
             fashion       0.66      0.84      0.74        55
                food       0.85      0.90      0.88       114
              gaming       1.00      0.60      0.75         5
health and lifestyle       0.80      0.58      0.67        96
    mom and children       0.56      0.44      0.49        34
              sports       0.85      0.81      0.83        27
                tech       0.72      0.81      0.76        59
              travel       0.67      0.77      0.72        62

            accuracy                           0.70       549
           macro avg       0.71      0.67      0.68       549
        weighted avg       0.71      0.70      0.70       549



Epoch 4/10: 100%|██████████| 137/137 [01:37<00:00,  1.40it/s]


Epoch 4:
Average training loss: 0.0016
Average validation loss: 1.8745

Classification Report:
                      precision    recall  f1-score   support

                 art       0.52      0.61      0.56        38
       entertainment       0.49      0.34      0.40        59
             fashion       0.68      0.87      0.76        55
                food       0.87      0.90      0.89       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.78      0.67      0.72        96
    mom and children       0.57      0.59      0.58        34
              sports       0.81      0.78      0.79        27
                tech       0.73      0.83      0.78        59
              travel       0.77      0.76      0.76        62

            accuracy                           0.73       549
           macro avg       0.72      0.71      0.71       549
        weighted avg       0.72      0.73      0.72       549



Epoch 5/10: 100%|██████████| 137/137 [01:37<00:00,  1.41it/s]


Epoch 5:
Average training loss: 0.0006
Average validation loss: 1.9171

Classification Report:
                      precision    recall  f1-score   support

                 art       0.51      0.63      0.56        38
       entertainment       0.49      0.36      0.41        59
             fashion       0.70      0.87      0.77        55
                food       0.87      0.89      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.73      0.66      0.69        96
    mom and children       0.58      0.44      0.50        34
              sports       0.84      0.78      0.81        27
                tech       0.68      0.85      0.76        59
              travel       0.80      0.76      0.78        62

            accuracy                           0.72       549
           macro avg       0.72      0.70      0.71       549
        weighted avg       0.72      0.72      0.71       549



Epoch 6/10: 100%|██████████| 137/137 [01:37<00:00,  1.41it/s]


Epoch 6:
Average training loss: 0.0005
Average validation loss: 2.0398

Classification Report:
                      precision    recall  f1-score   support

                 art       0.46      0.66      0.54        38
       entertainment       0.47      0.31      0.37        59
             fashion       0.70      0.85      0.77        55
                food       0.87      0.89      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.71      0.70      0.71        96
    mom and children       0.50      0.38      0.43        34
              sports       0.88      0.78      0.82        27
                tech       0.73      0.81      0.77        59
              travel       0.80      0.76      0.78        62

            accuracy                           0.71       549
           macro avg       0.71      0.69      0.70       549
        weighted avg       0.71      0.71      0.71       549



Epoch 7/10: 100%|██████████| 137/137 [01:37<00:00,  1.40it/s]


Epoch 7:
Average training loss: 0.0011
Average validation loss: 1.9643

Classification Report:
                      precision    recall  f1-score   support

                 art       0.55      0.58      0.56        38
       entertainment       0.47      0.41      0.44        59
             fashion       0.69      0.87      0.77        55
                food       0.87      0.89      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.76      0.66      0.70        96
    mom and children       0.62      0.47      0.53        34
              sports       0.84      0.78      0.81        27
                tech       0.71      0.86      0.78        59
              travel       0.77      0.76      0.76        62

            accuracy                           0.72       549
           macro avg       0.73      0.71      0.71       549
        weighted avg       0.72      0.72      0.72       549



Epoch 8/10: 100%|██████████| 137/137 [01:37<00:00,  1.41it/s]


Epoch 8:
Average training loss: 0.0003
Average validation loss: 1.9943

Classification Report:
                      precision    recall  f1-score   support

                 art       0.55      0.63      0.59        38
       entertainment       0.50      0.41      0.45        59
             fashion       0.70      0.87      0.77        55
                food       0.87      0.89      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.75      0.67      0.71        96
    mom and children       0.58      0.41      0.48        34
              sports       0.84      0.78      0.81        27
                tech       0.70      0.86      0.77        59
              travel       0.78      0.76      0.77        62

            accuracy                           0.73       549
           macro avg       0.73      0.71      0.71       549
        weighted avg       0.72      0.73      0.72       549



Epoch 9/10: 100%|██████████| 137/137 [01:37<00:00,  1.41it/s]


Epoch 9:
Average training loss: 0.0003
Average validation loss: 2.0121

Classification Report:
                      precision    recall  f1-score   support

                 art       0.51      0.63      0.56        38
       entertainment       0.51      0.39      0.44        59
             fashion       0.70      0.87      0.77        55
                food       0.87      0.89      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.75      0.66      0.70        96
    mom and children       0.62      0.47      0.53        34
              sports       0.84      0.78      0.81        27
                tech       0.70      0.86      0.77        59
              travel       0.80      0.76      0.78        62

            accuracy                           0.73       549
           macro avg       0.73      0.71      0.71       549
        weighted avg       0.73      0.73      0.72       549



Epoch 10/10: 100%|██████████| 137/137 [01:37<00:00,  1.40it/s]


Epoch 10:
Average training loss: 0.0003
Average validation loss: 2.0156

Classification Report:
                      precision    recall  f1-score   support

                 art       0.50      0.63      0.56        38
       entertainment       0.50      0.37      0.43        59
             fashion       0.70      0.87      0.77        55
                food       0.87      0.89      0.88       114
              gaming       1.00      0.80      0.89         5
health and lifestyle       0.75      0.66      0.70        96
    mom and children       0.62      0.47      0.53        34
              sports       0.84      0.78      0.81        27
                tech       0.70      0.86      0.77        59
              travel       0.80      0.76      0.78        62

            accuracy                           0.72       549
           macro avg       0.73      0.71      0.71       549
        weighted avg       0.72      0.72      0.72       549
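
Reading the logs above: the training loss already starts at 0.0570 because the model continues from the previous cell, and the validation loss is lowest in epoch 1 and rises afterwards while accuracy stays around 0.70 to 0.73. Under the checkpoint rule in the loop (save when validation loss improves), `best_model.pt` would therefore hold the epoch 1 weights. The sketch below compares that choice with selecting by macro-average F1, using the logged values:

```python
# Validation loss and macro-average F1 per epoch, copied from the logs above
val_loss = [1.5067, 1.7672, 1.7789, 1.8745, 1.9171, 2.0398, 1.9643, 1.9943, 2.0121, 2.0156]
macro_f1 = [0.68, 0.69, 0.68, 0.71, 0.71, 0.70, 0.71, 0.71, 0.71, 0.71]

# Epoch numbers are 1-based; max() returns the first epoch reaching the maximum
best_by_loss = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_by_f1 = max(range(len(macro_f1)), key=macro_f1.__getitem__) + 1

print(best_by_loss, best_by_f1)
```

The two criteria disagree here, so which checkpoint counts as "best" depends on whether calibrated probabilities (loss) or class-level accuracy (F1) matters more. Since the same split is used for selection and for reporting, the reported scores are likely somewhat optimistic either way.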

