# FinBERT Profiling Notebook

This notebook profiles the FinBERT training and inference process to establish performance baselines. It uses PyTorch Profiler to analyze:
- Data loading time
- Forward pass time
- Backward pass time
- Optimizer step time
- Inference time

This serves as a benchmark before optimization.

**Note on Device Support:**
- **CUDA (NVIDIA GPUs)**: Full profiling support with separate CPU and CUDA time tracking
- **MPS (Apple Silicon)**: Only CPU time profiling available. While computation runs on GPU, PyTorch Profiler cannot separately track MPS GPU time
- **CPU**: Standard CPU time profiling


## Modules


In [1]:
from pathlib import Path
import shutil
import os
import logging
import sys
sys.path.append('..')

from textblob import TextBlob
from pprint import pprint
from sklearn.metrics import classification_report

from transformers import AutoModelForSequenceClassification

from finbert.finbert import *
import finbert.utils as tools

import torch
from torch.profiler import profile, record_function, ProfilerActivity
import numpy as np
import pandas as pd

%load_ext autoreload
%autoreload 2

project_dir = Path.cwd().parent
pd.set_option('max_colwidth', None)


  from .autonotebook import tqdm as notebook_tqdm


In [2]:
logging.basicConfig(format = '%(asctime)s - %(levelname)s - %(name)s -   %(message)s',
                    datefmt = '%m/%d/%Y %H:%M:%S',
                    level = logging.ERROR)


## Profiled FinBERT Class

This class extends the base FinBERT class to add profiling instrumentation to the training process.


In [None]:
class ProfiledFinBert(FinBert):
    """Extended FinBert class with profiling instrumentation.
    
    Note: GPU-specific profiling (ProfilerActivity.CUDA) only works with NVIDIA CUDA devices.
    For MPS (Apple Silicon), only CPU profiling is available, though actual computation runs on GPU.
    """
    
    def __init__(self, config):
        super().__init__(config)
        self.profile_results = {}
    
    def train(self, train_examples, model):
        """
        Trains the model with profiling instrumentation.
        """
        validation_examples = self.get_data('validation')
        global_step = 0
        self.validation_losses = []
        
        # Training
        train_dataloader = self.get_loader(train_examples, 'train')
        model.train()
        step_number = len(train_dataloader)
        
        # Setup profiler - CUDA profiling only works with NVIDIA GPUs, not MPS
        activities = [ProfilerActivity.CPU]
        if self.device.type == "cuda":
            activities.append(ProfilerActivity.CUDA)
        
        print("\\n" + "="*80)
        print("Starting Profiled Training")
        print(f"Device: {self.device}")
        print(f"Profiling activities: {activities}")
        if self.device.type == "mps":
            print("Note: MPS profiling shows CPU time only. Actual GPU execution time not separately tracked.")
        print("="*80 + "\\n")
        
        i = 0
        
        with profile(
            activities=activities,
            record_shapes=True,
            profile_memory=True,
            with_stack=False
        ) as prof:
            
            for epoch in trange(int(self.config.num_train_epochs), desc="Epoch"):
                model.train()
                tr_loss = 0
                nb_tr_examples, nb_tr_steps = 0, 0
                
                for step, batch in enumerate(tqdm(train_dataloader, desc='Iteration')):
                    
                    # Gradual unfreezing logic
                    if (self.config.gradual_unfreeze and i == 0):
                        for param in model.bert.parameters():
                            param.requires_grad = False
                    
                    if (step % (step_number // 3)) == 0:
                        i += 1
                    
                    if (self.config.gradual_unfreeze and i > 1 and i < self.config.encoder_no):
                        for k in range(i - 1):
                            try:
                                for param in model.bert.encoder.layer[self.config.encoder_no - 1 - k].parameters():
                                    param.requires_grad = True
                            except:
                                pass
                    
                    if (self.config.gradual_unfreeze and i > self.config.encoder_no + 1):
                        for param in model.bert.embeddings.parameters():
                            param.requires_grad = True
                    
                    # Data loading profiling
                    with record_function("data_transfer"):
                        batch = tuple(t.to(self.device) for t in batch)
                        input_ids, attention_mask, token_type_ids, label_ids, agree_ids = batch
                    
                    # Forward pass profiling
                    with record_function("forward_pass"):
                        logits = model(input_ids, attention_mask, token_type_ids)[0]
                    
                    # Loss calculation profiling
                    with record_function("loss_calculation"):
                        weights = self.class_weights.to(self.device)
                        if self.config.output_mode == "classification":
                            loss_fct = CrossEntropyLoss(weight=weights)
                            loss = loss_fct(logits.view(-1, self.num_labels), label_ids.view(-1))
                        elif self.config.output_mode == "regression":
                            loss_fct = MSELoss()
                            loss = loss_fct(logits.view(-1), label_ids.view(-1))
                        
                        if self.config.gradient_accumulation_steps > 1:
                            loss = loss / self.config.gradient_accumulation_steps
                    
                    # Backward pass profiling
                    with record_function("backward_pass"):
                        loss.backward()
                    
                    tr_loss += loss.item()
                    nb_tr_examples += input_ids.size(0)
                    nb_tr_steps += 1
                    
                    # Optimizer step profiling
                    if (step + 1) % self.config.gradient_accumulation_steps == 0:
                        with record_function("optimizer_step"):
                            if self.config.fp16:
                                lr_this_step = self.config.learning_rate * warmup_linear(
                                    global_step / self.num_train_optimization_steps, self.config.warm_up_proportion)
                                for param_group in self.optimizer.param_groups:
                                    param_group['lr'] = lr_this_step
                            torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                            self.optimizer.step()
                            self.scheduler.step()
                            self.optimizer.zero_grad()
                            global_step += 1
                    
                    # Only profile first epoch to save time
                    if epoch == 0 and step >= 20:
                        break
                
                # Break after first epoch for profiling
                if epoch == 0:
                    print("\\n" + "="*80)
                    print("Profiling complete for first epoch (20 steps)")
                    print("Continuing full training without profiling...")
                    print("="*80 + "\\n")
                    break
        
        # Print profiler results
        print("\\n" + "="*80)
        print("PROFILING RESULTS - Training")
        print("="*80 + "\\n")
        
        print("\\nBy CPU Time:")
        print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
        
        if self.device.type == "cuda":
            print("\\nBy CUDA Time:")
            print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
        
        print("\\n" + "="*80 + "\\n")
        
        # Store results
        self.profile_results['training'] = prof.key_averages()
        
        # Continue with full training without profiling
        for epoch in trange(int(self.config.num_train_epochs), desc="Epoch"):
            model.train()
            tr_loss = 0
            nb_tr_examples, nb_tr_steps = 0, 0
            
            for step, batch in enumerate(tqdm(train_dataloader, desc='Iteration')):
                
                if (self.config.gradual_unfreeze and i == 0):
                    for param in model.bert.parameters():
                        param.requires_grad = False
                
                if (step % (step_number // 3)) == 0:
                    i += 1
                
                if (self.config.gradual_unfreeze and i > 1 and i < self.config.encoder_no):
                    for k in range(i - 1):
                        try:
                            for param in model.bert.encoder.layer[self.config.encoder_no - 1 - k].parameters():
                                param.requires_grad = True
                        except:
                            pass
                
                if (self.config.gradual_unfreeze and i > self.config.encoder_no + 1):
                    for param in model.bert.embeddings.parameters():
                        param.requires_grad = True
                
                batch = tuple(t.to(self.device) for t in batch)
                input_ids, attention_mask, token_type_ids, label_ids, agree_ids = batch
                
                logits = model(input_ids, attention_mask, token_type_ids)[0]
                weights = self.class_weights.to(self.device)
                
                if self.config.output_mode == "classification":
                    loss_fct = CrossEntropyLoss(weight=weights)
                    loss = loss_fct(logits.view(-1, self.num_labels), label_ids.view(-1))
                elif self.config.output_mode == "regression":
                    loss_fct = MSELoss()
                    loss = loss_fct(logits.view(-1), label_ids.view(-1))
                
                if self.config.gradient_accumulation_steps > 1:
                    loss = loss / self.config.gradient_accumulation_steps
                else:
                    loss.backward()
                
                tr_loss += loss.item()
                nb_tr_examples += input_ids.size(0)
                nb_tr_steps += 1
                
                if (step + 1) % self.config.gradient_accumulation_steps == 0:
                    if self.config.fp16:
                        lr_this_step = self.config.learning_rate * warmup_linear(
                            global_step / self.num_train_optimization_steps, self.config.warm_up_proportion)
                        for param_group in self.optimizer.param_groups:
                            param_group['lr'] = lr_this_step
                    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
                    self.optimizer.step()
                    self.scheduler.step()
                    self.optimizer.zero_grad()
                    global_step += 1
            
            # Validation
            validation_loader = self.get_loader(validation_examples, phase='eval')
            model.eval()
            
            valid_loss, valid_accuracy = 0, 0
            nb_valid_steps, nb_valid_examples = 0, 0
            
            for input_ids, attention_mask, token_type_ids, label_ids, agree_ids in tqdm(validation_loader, desc="Validating"):
                input_ids = input_ids.to(self.device)
                attention_mask = attention_mask.to(self.device)
                token_type_ids = token_type_ids.to(self.device)
                label_ids = label_ids.to(self.device)
                agree_ids = agree_ids.to(self.device)
                
                with torch.no_grad():
                    logits = model(input_ids, attention_mask, token_type_ids)[0]
                    
                    if self.config.output_mode == "classification":
                        loss_fct = CrossEntropyLoss(weight=weights)
                        tmp_valid_loss = loss_fct(logits.view(-1, self.num_labels), label_ids.view(-1))
                    elif self.config.output_mode == "regression":
                        loss_fct = MSELoss()
                        tmp_valid_loss = loss_fct(logits.view(-1), label_ids.view(-1))
                    
                    valid_loss += tmp_valid_loss.mean().item()
                    nb_valid_steps += 1
            
            valid_loss = valid_loss / nb_valid_steps
            self.validation_losses.append(valid_loss)
            print("Validation losses: {}".format(self.validation_losses))
            
            if valid_loss == min(self.validation_losses):
                try:
                    os.remove(self.config.model_dir / ('temporary' + str(best_model)))
                except:
                    print('No best model found')
                torch.save({'epoch': str(epoch), 'state_dict': model.state_dict()},
                           self.config.model_dir / ('temporary' + str(epoch)))
                best_model = epoch
        
        # Save the trained model
        checkpoint = torch.load(self.config.model_dir / ('temporary' + str(best_model)))
        model.load_state_dict(checkpoint['state_dict'])
        model_to_save = model.module if hasattr(model, 'module') else model
        output_model_file = os.path.join(self.config.model_dir, WEIGHTS_NAME)
        torch.save(model_to_save.state_dict(), output_model_file)
        output_config_file = os.path.join(self.config.model_dir, CONFIG_NAME)
        with open(output_config_file, 'w') as f:
            f.write(model_to_save.config.to_json_string())
        os.remove(self.config.model_dir / ('temporary' + str(best_model)))
        
        return model


## Profiled Predict Function

This function profiles the inference process for sentiment prediction.


In [None]:
def profiled_predict(text, model, write_to_csv=False, path=None, use_gpu=False, gpu_name='cuda:0', batch_size=5):
    """
    Predict sentiments with profiling instrumentation.
    
    Note: GPU-specific profiling (ProfilerActivity.CUDA) only works with NVIDIA CUDA devices.
    For MPS (Apple Silicon), only CPU profiling is available, though actual computation runs on GPU.
    """
    from nltk.tokenize import sent_tokenize
    from finbert.utils import InputExample, convert_examples_to_features, softmax, chunks, get_device
    from transformers import AutoTokenizer
    
    model.eval()
    tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
    
    # Use the device helper function for better device selection
    if use_gpu:
        device = get_device(no_cuda=False)
        # If user specified a specific CUDA device, use it
        if device.type == "cuda" and gpu_name.startswith("cuda:"):
            device = torch.device(gpu_name)
    else:
        device = torch.device("cpu")
    
    print(f"\\n{'='*80}")
    print(f"Starting Profiled Inference")
    print(f"Device: {device}")
    if device.type == "mps":
        print("Note: MPS profiling shows CPU time only. Actual GPU execution time not separately tracked.")
    print(f"{'='*80}\\n")
    
    label_list = ['positive', 'negative', 'neutral']
    label_dict = {0: 'positive', 1: 'negative', 2: 'neutral'}
    result = pd.DataFrame(columns=['sentence', 'logit', 'prediction', 'sentiment_score'])
    
    # Setup profiler - CUDA profiling only works with NVIDIA GPUs, not MPS
    activities = [ProfilerActivity.CPU]
    if device.type == "cuda":
        activities.append(ProfilerActivity.CUDA)
    
    with profile(
        activities=activities,
        record_shapes=True,
        profile_memory=True,
        with_stack=False
    ) as prof:
        
        with record_function("sentence_tokenization"):
            sentences = sent_tokenize(text)
        
        for batch in chunks(sentences, batch_size):
            with record_function("create_examples"):
                examples = [InputExample(str(i), sentence) for i, sentence in enumerate(batch)]
            
            with record_function("convert_to_features"):
                features = convert_examples_to_features(examples, label_list, 64, tokenizer)
            
            with record_function("prepare_tensors"):
                all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long).to(device)
                all_attention_mask = torch.tensor([f.attention_mask for f in features], dtype=torch.long).to(device)
                all_token_type_ids = torch.tensor([f.token_type_ids for f in features], dtype=torch.long).to(device)
            
            with torch.no_grad():
                with record_function("model_to_device"):
                    model = model.to(device)
                
                with record_function("inference_forward"):
                    logits = model(all_input_ids, all_attention_mask, all_token_type_ids)[0]
                
                with record_function("postprocess_results"):
                    logits = softmax(np.array(logits.cpu()))
                    sentiment_score = pd.Series(logits[:, 0] - logits[:, 1])
                    predictions = np.squeeze(np.argmax(logits, axis=1))
                    
                    batch_result = {'sentence': batch,
                                    'logit': list(logits),
                                    'prediction': predictions,
                                    'sentiment_score': sentiment_score}
                    
                    batch_result = pd.DataFrame(batch_result)
                    result = pd.concat([result, batch_result], ignore_index=True)
    
    # Print profiler results
    print(f"\\n{'='*80}")
    print("PROFILING RESULTS - Inference")
    print(f"{'='*80}\\n")
    
    print("\\nBy CPU Time:")
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=20))
    
    if device.type == "cuda":
        print("\\nBy CUDA Time:")
        print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=20))
    
    print(f"\\n{'='*80}\\n")
    
    result['prediction'] = result.prediction.apply(lambda x: label_dict[x])
    if write_to_csv:
        result.to_csv(path, sep=',', index=False)
    
    return result


### Setting path variables


In [5]:
cl_path = project_dir/'models'/'sentiment'
cl_data_path = project_dir/'data'/'sentiment_data'


###  Configuring training parameters


In [6]:
# Clean the cl_path
try:
    shutil.rmtree(cl_path) 
except:
    pass

bertmodel = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', cache_dir=None, num_labels=3)

config = Config(   data_dir=cl_data_path,
                   bert_model=bertmodel,
                   num_train_epochs=4,
                   model_dir=cl_path,
                   max_seq_length = 48,
                   train_batch_size = 32,
                   learning_rate = 2e-5,
                   output_mode='classification',
                   warm_up_proportion=0.2,
                   local_rank=-1,
                   discriminate=True,
                   gradual_unfreeze=True)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [7]:
finbert = ProfiledFinBert(config)
finbert.base_model = 'bert-base-uncased'
finbert.config.discriminate=True
finbert.config.gradual_unfreeze=True


In [None]:

finbert.prepare_model(label_list=['positive','negative','neutral'])


11/22/2025 20:16:47 - INFO - finbert.finbert -   device: mps n_gpu: 1, distributed training: False, 16-bits training: False


## Fine-tune the model with profiling


In [9]:
# Get the training examples
train_data = finbert.get_data('train')


In [10]:
model = finbert.create_the_model()


### Training with Profiling

This will profile the first 20 steps of the first epoch, then continue with normal training.


In [11]:
trained_model = finbert.train(train_examples = train_data, model = model)


11/22/2025 20:16:48 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:16:48 - INFO - finbert.utils -   guid: train-1
11/22/2025 20:16:48 - INFO - finbert.utils -   tokens: [CLS] after the reporting period , bio ##tie north american licensing partner so ##max ##on pharmaceuticals announced positive results with na ##lm ##efe ##ne in a pilot phase 2 clinical trial for smoking ce ##ssa ##tion [SEP]
11/22/2025 20:16:48 - INFO - finbert.utils -   input_ids: 101 2044 1996 7316 2558 1010 16012 9515 2167 2137 13202 4256 2061 17848 2239 24797 2623 3893 3463 2007 6583 13728 27235 2638 1999 1037 4405 4403 1016 6612 3979 2005 9422 8292 11488 3508 102 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:16:48 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:16:48 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/

Starting Profiled Training
Device: mps
Profiling activities: [<ProfilerActivity.CPU: 0>]


Iteration:  18%|█▊        | 20/109 [00:07<00:34,  2.56it/s]
Epoch:   0%|          | 0/4 [00:07<?, ?it/s]


Profiling complete for first epoch (20 steps)
Continuing full training without profiling...
PROFILING RESULTS - Training
\nBy CPU Time:
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                       loss_calculation         0.07%       5.979ms        38.69%        3.253s     154.928ms           0 B           0 B            21  
                                           forward_pass         0.82%      69.062ms        37.32%        3.138s     149.445ms           0 B         -84 B            21  
              

Iteration: 100%|██████████| 109/109 [00:23<00:00,  4.73it/s]
11/22/2025 20:17:24 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:17:24 - INFO - finbert.utils -   guid: validation-1
11/22/2025 20:17:24 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
11/22/2025 20:17:24 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:17:24 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:17:24 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:17:24 - INFO - finbert.utils -   label: neutral (id = 2)
11/22/2025 20:17:24 

Validation losses: [0.8772950309973496]
No best model found


Iteration: 100%|██████████| 109/109 [00:37<00:00,  2.94it/s]
11/22/2025 20:18:04 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:18:04 - INFO - finbert.utils -   guid: validation-1
11/22/2025 20:18:04 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
11/22/2025 20:18:04 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:18:04 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:18:04 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:18:04 - INFO - finbert.utils -   label: neutral (id = 2)
11/22/2025 20:18:04 

Validation losses: [0.8772950309973496, 0.6009280452361474]


Iteration: 100%|██████████| 109/109 [00:44<00:00,  2.45it/s]
11/22/2025 20:18:51 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:18:51 - INFO - finbert.utils -   guid: validation-1
11/22/2025 20:18:51 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
11/22/2025 20:18:51 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:18:51 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:18:51 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:18:51 - INFO - finbert.utils -   label: neutral (id = 2)
11/22/2025 20:18:51 

Validation losses: [0.8772950309973496, 0.6009280452361474, 0.5078935806567852]


Iteration: 100%|██████████| 109/109 [00:56<00:00,  1.94it/s]
11/22/2025 20:19:50 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:19:50 - INFO - finbert.utils -   guid: validation-1
11/22/2025 20:19:50 - INFO - finbert.utils -   tokens: [CLS] our in - depth expertise extends to the fields of energy , industry , urban & mobility and water & environment [SEP]
11/22/2025 20:19:50 - INFO - finbert.utils -   input_ids: 101 2256 1999 1011 5995 11532 8908 2000 1996 4249 1997 2943 1010 3068 1010 3923 1004 12969 1998 2300 1004 4044 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:50 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:50 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:50 - INFO - finbert.utils -   label: neutral (id = 2)
11/22/2025 20:19:50 

Validation losses: [0.8772950309973496, 0.6009280452361474, 0.5078935806567852, 0.46449002852806676]


Epoch: 100%|██████████| 4/4 [02:51<00:00, 42.87s/it]


## Test the model


In [12]:
test_data = finbert.get_data('test')


In [13]:
results = finbert.evaluate(examples=test_data, model=trained_model)


11/22/2025 20:19:54 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:19:54 - INFO - finbert.utils -   guid: test-1
11/22/2025 20:19:54 - INFO - finbert.utils -   tokens: [CLS] the bristol port company has sealed a one million pound contract with cooper specialised handling to supply it with four 45 - ton ##ne , custom ##ised reach stack ##ers from ko ##ne ##cr ##ane ##s [SEP]
11/22/2025 20:19:54 - INFO - finbert.utils -   input_ids: 101 1996 7067 3417 2194 2038 10203 1037 2028 2454 9044 3206 2007 6201 17009 8304 2000 4425 2009 2007 2176 3429 1011 10228 2638 1010 7661 5084 3362 9991 2545 2013 12849 2638 26775 7231 2015 102 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:54 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:54 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:54 - IN

### Prepare the classification report


In [14]:
def report(df, cols=['label','prediction','logits']):
    cs = CrossEntropyLoss(weight=finbert.class_weights)
    loss = cs(torch.tensor(list(df[cols[2]])),torch.tensor(list(df[cols[0]])))
    print("Loss:{0:.2f}".format(loss))
    print("Accuracy:{0:.2f}".format((df[cols[0]] == df[cols[1]]).sum() / df.shape[0]) )
    print("\\nClassification Report:")
    print(classification_report(df[cols[0]], df[cols[1]]))


In [15]:
results['prediction'] = results.predictions.apply(lambda x: np.argmax(x,axis=0))


In [16]:
report(results,cols=['labels','prediction','predictions'])


Loss:0.51
Accuracy:0.77
\nClassification Report:
              precision    recall  f1-score   support

           0       0.66      0.76      0.71       267
           1       0.62      0.91      0.74       128
           2       0.90      0.74      0.81       575

    accuracy                           0.77       970
   macro avg       0.73      0.81      0.75       970
weighted avg       0.80      0.77      0.77       970



  loss = cs(torch.tensor(list(df[cols[2]])),torch.tensor(list(df[cols[0]])))


## Get predictions with profiling


In [17]:
text = "Later that day Apple said it was revising down its earnings expectations in \
the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China. \
The news rapidly infected financial markets. Apple's share price fell by around 7% in after-hours \
trading and the decline was extended to more than 10% when the market opened. The dollar fell \
by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering \
some ground. Asian stockmarkets closed down on January 3rd and European ones opened lower. \
Yields on government bonds fell as investors fled to the traditional haven in a market storm."


In [18]:
cl_path = project_dir/'models'/'sentiment'
model = AutoModelForSequenceClassification.from_pretrained(cl_path, cache_dir=None, num_labels=3)


In [19]:
import nltk
nltk.download('punkt')


[nltk_data] Downloading package punkt to
[nltk_data]     /Users/taimurshaikh/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


True

In [20]:
result = profiled_predict(text, model)


11/22/2025 20:19:59 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:19:59 - INFO - finbert.utils -   guid: 0
11/22/2025 20:19:59 - INFO - finbert.utils -   tokens: [CLS] later that day apple said it was rev ##ising down its earnings expectations in the fourth quarter of 2018 , largely because of lower sales and signs of economic weakness in china . [SEP]
11/22/2025 20:19:59 - INFO - finbert.utils -   input_ids: 101 2101 2008 2154 6207 2056 2009 2001 7065 9355 2091 2049 16565 10908 1999 1996 2959 4284 1997 2760 1010 4321 2138 1997 2896 4341 1998 5751 1997 3171 11251 1999 2859 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:59 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:59 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 

Starting Profiled Inference
Device: cpu


  logits = softmax(np.array(logits.cpu()))
  result = pd.concat([result, batch_result], ignore_index=True)
11/22/2025 20:19:59 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:19:59 - INFO - finbert.utils -   guid: 0
11/22/2025 20:19:59 - INFO - finbert.utils -   tokens: [CLS] yields on government bonds fell as investors fled to the traditional haven in a market storm . [SEP]
11/22/2025 20:19:59 - INFO - finbert.utils -   input_ids: 101 16189 2006 2231 9547 3062 2004 9387 6783 2000 1996 3151 4033 1999 1037 3006 4040 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:59 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:19:59 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

PROFILING RESULTS - Inference
\nBy CPU Time:
-----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                 Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
-----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                    inference_forward        13.80%      37.599ms        87.63%     238.733ms     119.367ms          72 B    -248.22 MB             2  
                                         aten::linear         0.78%       2.130ms        58.05%     158.139ms       1.069ms     121.52 MB           0 B           148  
                                          aten::addmm        52.06%     141.819ms        56.46%     153.818ms      

In [21]:
blob = TextBlob(text)
result['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]
result.head()


Unnamed: 0,sentence,logit,prediction,sentiment_score,textblob_prediction
0,"Later that day Apple said it was revising down its earnings expectations in the fourth quarter of 2018, largely because of lower sales and signs of economic weakness in China.","[0.14201558, 0.83104205, 0.026942372]",negative,-0.689026,0.051746
1,The news rapidly infected financial markets.,"[0.0788608, 0.59356976, 0.32756948]",negative,-0.514709,0.0
2,Apple's share price fell by around 7% in after-hours trading and the decline was extended to more than 10% when the market opened.,"[0.056227226, 0.90114564, 0.042627137]",negative,-0.844918,0.5
3,"The dollar fell by 3.7% against the yen in a matter of minutes after the announcement, before rapidly recovering some ground.","[0.12553856, 0.8490222, 0.025439167]",negative,-0.723484,0.0
4,Asian stockmarkets closed down on January 3rd and European ones opened lower.,"[0.07658135, 0.7828065, 0.14061213]",negative,-0.706225,-0.051111


In [22]:
print(f'Average sentiment is %.2f.' % (result.sentiment_score.mean()))


Average sentiment is -0.71.


## Second example with profiling


In [23]:
text2 = "Shares in the spin-off of South African e-commerce group Naspers surged more than 25% \
in the first minutes of their market debut in Amsterdam on Wednesday. Bob van Dijk, CEO of \
Naspers and Prosus Group poses at Amsterdam's stock exchange, as Prosus begins trading on the \
Euronext stock exchange in Amsterdam, Netherlands, September 11, 2019. REUTERS/Piroschka van de Wouw \
Prosus comprises Naspers' global empire of consumer internet assets, with the jewel in the crown a \
31% stake in Chinese tech titan Tencent. There is 'way more demand than is even available, so that's \
good,' said the CEO of Euronext Amsterdam, Maurice van Tilburg. 'It's going to be an interesting \
hour of trade after opening this morning.' Euronext had given an indicative price of 58.70 euros \
per share for Prosus, implying a market value of 95.3 billion euros ($105 billion). The shares \
jumped to 76 euros on opening and were trading at 75 euros at 0719 GMT."


In [24]:
result2 = profiled_predict(text2, model)
blob = TextBlob(text2)
result2['textblob_prediction'] = [sentence.sentiment.polarity for sentence in blob.sentences]


11/22/2025 20:20:00 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:20:00 - INFO - finbert.utils -   guid: 0
11/22/2025 20:20:00 - INFO - finbert.utils -   tokens: [CLS] shares in the spin - off of south african e - commerce group nas ##pers surged more than 25 % in the first minutes of their market debut in amsterdam on wednesday . [SEP]
11/22/2025 20:20:00 - INFO - finbert.utils -   input_ids: 101 6661 1999 1996 6714 1011 2125 1997 2148 3060 1041 1011 6236 2177 17235 7347 18852 2062 2084 2423 1003 1999 1996 2034 2781 1997 2037 3006 2834 1999 7598 2006 9317 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:20:00 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:20:00 - INFO - finbert.utils -   token_type_ids: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Starting Profiled Inference
Device: cpu


  logits = softmax(np.array(logits.cpu()))
  result = pd.concat([result, batch_result], ignore_index=True)
11/22/2025 20:20:00 - INFO - finbert.utils -   *** Example ***
11/22/2025 20:20:00 - INFO - finbert.utils -   guid: 0
11/22/2025 20:20:00 - INFO - finbert.utils -   tokens: [CLS] euro ##ne ##xt had given an indicative price of 58 . 70 euros per share for pro ##sus , implying a market value of 95 . 3 billion euros ( $ 105 billion ) . [SEP]
11/22/2025 20:20:00 - INFO - finbert.utils -   input_ids: 101 9944 2638 18413 2018 2445 2019 24668 3976 1997 5388 1012 3963 19329 2566 3745 2005 4013 13203 1010 20242 1037 3006 3643 1997 5345 1012 1017 4551 19329 1006 1002 8746 4551 1007 1012 102 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:20:00 - INFO - finbert.utils -   attention_mask: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
11/22/2025 20:20:00 - INFO - finbert.utils -   token_type_id

PROFILING RESULTS - Inference
\nBy CPU Time:
-----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                                 Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg       CPU Mem  Self CPU Mem    # of Calls  
-----------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  
                                    inference_forward        19.26%      39.431ms        93.35%     191.154ms      95.577ms          84 B    -289.52 MB             2  
                                         aten::linear         0.84%       1.722ms        56.77%     116.238ms     785.391us     141.77 MB           0 B           148  
                                          aten::addmm        49.25%     100.856ms        54.92%     112.465ms     7

In [25]:
result2


Unnamed: 0,sentence,logit,prediction,sentiment_score,textblob_prediction
0,Shares in the spin-off of South African e-commerce group Naspers surged more than 25% in the first minutes of their market debut in Amsterdam on Wednesday.,"[0.7369177, 0.06687665, 0.19620562]",positive,0.670041,0.25
1,"Bob van Dijk, CEO of Naspers and Prosus Group poses at Amsterdam's stock exchange, as Prosus begins trading on the Euronext stock exchange in Amsterdam, Netherlands, September 11, 2019.","[0.18849334, 0.043508735, 0.7679979]",neutral,0.144985,0.0
2,"REUTERS/Piroschka van de Wouw Prosus comprises Naspers' global empire of consumer internet assets, with the jewel in the crown a 31% stake in Chinese tech titan Tencent.","[0.26907253, 0.02341426, 0.70751315]",neutral,0.245658,0.0
3,"There is 'way more demand than is even available, so that's good,' said the CEO of Euronext Amsterdam, Maurice van Tilburg.","[0.73105943, 0.06360316, 0.2053374]",positive,0.667456,0.533333
4,'It's going to be an interesting hour of trade after opening this morning.',"[0.58056855, 0.10513434, 0.31429714]",positive,0.475434,0.5
5,"Euronext had given an indicative price of 58.70 euros per share for Prosus, implying a market value of 95.3 billion euros ($105 billion).","[0.231448, 0.05059284, 0.7179592]",neutral,0.180855,0.0
6,The shares jumped to 76 euros on opening and were trading at 75 euros at 0719 GMT.,"[0.33516562, 0.042202037, 0.6226323]",neutral,0.292964,0.0


In [26]:
print(f'Average sentiment is %.2f.' % (result2.sentiment_score.mean()))


Average sentiment is 0.38.


## Summary

This notebook provides baseline profiling data for:
1. **Training operations**: Data loading, forward pass, loss calculation, backward pass, optimizer step
2. **Inference operations**: Tokenization, feature conversion, model forward pass, postprocessing

The profiling output shows CPU time (and CUDA time if available) for each operation, which can be used to identify optimization opportunities.
