# Script 3 - Sentiment Analysis

_Script by Tim Hebestreit, thebestr@smail.uni-koeln.de_

In this script we use a pretrained German BERT model for sentiment analysis. We use the transformers library from Hugging Face to load the model and classify each article as having positive, neutral, or negative sentiment. The inference runs are additionally logged with Weights & Biases (W&B), which records metrics and metadata about each run.

The needed packages are installed first:

In [1]:
!pip install transformers torch wandb weave



Next, we import the libraries that are used for the analysis.

In [2]:
# --- IMPORTS ---

import pandas as pd
import torch
from transformers import pipeline
import os
from tqdm.auto import tqdm
import wandb
import time
import weave



The output from Script 2 (corpus.csv) is used as input for this script, which contains the parsed and processed corpus of German news articles. The output file, which is the corpus with added sentiment, is defined here as well.

Further, we define thresholds for classifying an article as positive or negative. This is needed because the language in news articles is often fairly neutral, even when describing positive or negative events. An example is "Die Technologie birgt einige Risiken" ("The technology carries some risks"), which is very likely to be classified as neutral even though the statement describes the technology negatively. By lowering the thresholds, more articles are classified as positive or negative, provided the score for that category is sufficiently high (here, at least 0.35).

In [3]:
# --- CONFIG ---

INPUT_FILE = "../data/csv/corpus.csv"
OUTPUT_FILE = "../data/csv/corpus_with_sentiment.csv"

THRESHOLD_NEGATIVE = 0.35 
THRESHOLD_POSITIVE = 0.35

We load the corpus data and take a quick look at its shape and first five rows.

In [4]:
# --- LOAD AND INSPECT CORPUS DATA ---

if os.path.exists(INPUT_FILE):
    df = pd.read_csv(INPUT_FILE)
    print(f"Corpus data loaded. Initial shape: {df.shape}")
    print(df.head(5))
    
else:
    print(f"Input file not found: {INPUT_FILE}")
    print("Please run Script 1 to generate raw data, and then run Script 2 to create corpus data.")

Corpus data loaded. Initial shape: (54133, 9)
                                  Source  \
0                            Focus-Money   
1                       Berliner Zeitung   
2  Broadcaster: SRF 08:08 AM MEZ/CET SRF   
3                        dpa-AFX ProFeed   
4                        VDI nachrichten   

                                               Title  \
0  BUCHHALTUNGSPROGRAMME IM TEST; Software mit Gü...   
1  Wie in Hollywood; Das Computerspiel "Total War...   
2  nano spezial: «... wollt Ihr ewig leben?» vom ...   
3  IRW-News: Codebase Ventures Inc.: Pressland sc...   
4                    Gesucht: Leitbild für Mobilität   

                                                Text        Date  Word_Count  \
0  Die Buchhaltungssoftware von Lexware ist wie i...  2023-05-03        1914   
1  VON FELIX FIRME Tao Ying und Mei Yun sind best...  2019-05-31         965   
2  es ist alles nicht was Erhöhung Einzelheit sei...  2018-06-06        3204   
3  IRW-PRESS: Codebase Ventures 

W&B is used to track the inference runs of this project; its configuration is set up here. W&B is optional: the analysis itself does not depend on it, so you can skip the login and authentication process. This cell also determines whether the script runs on the CPU or the GPU.
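Since W&B is optional, one way to run the notebook without an account is to switch the W&B client into its disabled mode before `wandb.init` is called. The fragment below is a minimal sketch under that assumption: it relies on the `WANDB_MODE` environment variable, and placing it ahead of the cell that calls `wandb.init` is a suggestion, not part of the original script. Behaviour may differ between `wandb` versions.

```python
import os

# Assumption: this runs before wandb.init(). In "disabled" mode the W&B client
# is expected to turn its calls (init, log, finish) into no-ops, so no login
# is required.
os.environ["WANDB_MODE"] = "disabled"

print(os.environ["WANDB_MODE"])
```

Removing the line, or setting the value to `"online"`, should restore normal logging.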

In [5]:
# --- WEIGHTS AND BIASES CONFIG ---

# Select the device for this run. If CUDA is available, the GPU is used for faster processing;
# otherwise (e.g. on machines without an NVIDIA GPU), the CPU is used.
# While the GPU is faster, the CPU is sufficient for this relatively small model.
device = "gpu" if torch.cuda.is_available() else "cpu"

# Initialize W&B project
wandb.init(
    project="it-news-sentiment",
    name="bert-inference-full-run-final",
    config={
        "model_name": "oliverguhr/german-sentiment-bert",
        "threshold_negative": THRESHOLD_NEGATIVE,
        "threshold_positive": THRESHOLD_POSITIVE,
        "dataset_size": len(df),
        "batch_size": 32,
        "device": device,
        "truncation": True,
        "max_length": 512
    }
)

[34m[1mwandb[0m: Currently logged in as: [33mthebestr[0m to [32mhttps://api.wandb.ai[0m. Use [1m`wandb login --relogin`[0m to force relogin


[34m[1mwandb[0m: Initializing weave.
[36m[1mweave[0m: Logged in as Weights & Biases user: thebestr.
[36m[1mweave[0m: View Weave data at https://wandb.ai/thebestr/it-news-sentiment/weave


For the sentiment analysis, the german-sentiment-bert model from Hugging Face was chosen: https://huggingface.co/oliverguhr/german-sentiment-bert. BERT models are much smaller than generative LLMs and run inference faster. For the roughly 54,000 articles in the corpus, this model is powerful enough without being overkill for the scope of the paper.

In [6]:
# --- LOAD MODEL ---
model_name = "oliverguhr/german-sentiment-bert"
print(f"Loading model: {model_name} ...")

# Create a transformers pipeline, which provides a simple interface for model inference
usegpu = -1 if device == "cpu" else 0 # device parameter of the pipeline function accepts -1 for CPU and 0 for the first GPU
sentiment_pipeline = pipeline("sentiment-analysis", model=model_name, device=usegpu, truncation=True, top_k=None) # Use truncation, as the start of the article is sufficient for sentiment

Loading model: oliverguhr/german-sentiment-bert ...


This helper function prepares the input by combining title and text and returning only the first 2000 characters, which speeds up tokenization. Little is lost by this: because of truncation, the model only sees the beginning of each article anyway (at most 512 tokens), which is sufficient to analyze the sentiment of the article.

In [7]:
# --- HELPER FUNCTION ---

def prepare_text(row):
    
    # Combine title and text
    title = str(row['Title']) if pd.notna(row['Title']) else ""
    text = str(row['Text']) if pd.notna(row['Text']) else "" 
    combined = title + ". " + text
    # Shorten combined text to 2000 characters to improve performance
    return combined[:2000] 
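The behaviour of this helper on edge cases can be illustrated without pandas. The sketch below is a stand-in, not the function used in the script: `is_missing` and `prepare_text_plain` are hypothetical names, rows are plain dictionaries, and `None` or a float NaN is treated as a missing value in place of the `pd.notna` check.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (stand-in for the pd.notna check)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def prepare_text_plain(row, max_chars=2000):
    """Combine title and text, then keep only the first max_chars characters."""
    title = "" if is_missing(row.get("Title")) else str(row["Title"])
    text = "" if is_missing(row.get("Text")) else str(row["Text"])
    return (title + ". " + text)[:max_chars]


# Illustrative rows: one very long article and one with a missing title
rows = [
    {"Title": "Gesucht: Leitbild für Mobilität", "Text": "x" * 5000},
    {"Title": float("nan"), "Text": "Kurzer Text"},
]
prepared = [prepare_text_plain(row) for row in rows]

print(len(prepared[0]))  # 2000
print(prepared[1])       # ". Kurzer Text"
```

Note that a missing title leaves a leading `". "` in the combined string. This should be harmless for the classifier, but it is worth knowing when inspecting the prepared texts.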

This helper function is utilized to classify articles according to the custom sentiment threshold defined above.

In [8]:
# --- HELPER FUNCTION 2 ---

def apply_custom_logic(scores_list):

    # The input for this function are the raw scores (e.g. {'label': 'neutral', 'score': 0.6})
    # First, we transform these into a simple dictionary: {'neutral': 0.6}
    scores = {item['label']: item['score'] for item in scores_list}

    # From the dict, we extract the positive and negative score
    neg_score = scores.get('negative', 0)
    pos_score = scores.get('positive', 0)
    
    # We check the negative score first, as it exceeds its threshold more often than the positive score does
    if neg_score >= THRESHOLD_NEGATIVE:
        # Check if the sentiment is still more positive than negative (e.g. neg: 0.35, pos: 0.6)
        if pos_score > neg_score:
            return 'positive', pos_score
        return 'negative', neg_score
        
    if pos_score >= THRESHOLD_POSITIVE:
        return 'positive', pos_score
        
    # If no threshold was reached, return the label with the highest score (usually neutral)
    max_label = max(scores, key=scores.get)
    return max_label, scores[max_label]
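To see how the thresholds change the outcome compared with simply taking the highest score, the function can be run on a few hand-made score lists. The sketch below repeats the function so that it runs on its own; the score values are invented for illustration and `as_scores` is a hypothetical helper that mimics the list-of-dicts shape described in the comments above.

```python
THRESHOLD_NEGATIVE = 0.35
THRESHOLD_POSITIVE = 0.35


def apply_custom_logic(scores_list):
    # Same logic as the helper above, repeated to keep the example self-contained
    scores = {item["label"]: item["score"] for item in scores_list}
    neg_score = scores.get("negative", 0)
    pos_score = scores.get("positive", 0)
    if neg_score >= THRESHOLD_NEGATIVE:
        if pos_score > neg_score:
            return "positive", pos_score
        return "negative", neg_score
    if pos_score >= THRESHOLD_POSITIVE:
        return "positive", pos_score
    max_label = max(scores, key=scores.get)
    return max_label, scores[max_label]


def as_scores(positive, neutral, negative):
    """Hypothetical helper: build a score list in the assumed pipeline format."""
    return [
        {"label": "positive", "score": positive},
        {"label": "neutral", "score": neutral},
        {"label": "negative", "score": negative},
    ]


# Invented score combinations, not real model output
examples = {
    "clearly neutral": as_scores(0.05, 0.90, 0.05),
    "neutral is highest, negative reaches threshold": as_scores(0.05, 0.55, 0.40),
    "negative reaches threshold, positive is larger": as_scores(0.60, 0.05, 0.35),
    "neutral is highest, positive reaches threshold": as_scores(0.36, 0.60, 0.04),
}
results = {name: apply_custom_logic(s) for name, s in examples.items()}

for name, (label, score) in results.items():
    print(f"{name}: {label} ({score})")
```

The second and fourth cases are the ones the thresholds were introduced for: a plain argmax would label both as neutral, while the custom logic returns negative and positive respectively.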

The final preparation steps are to apply the prepare_text function and to set the batch size and the result lists. If you do not want to run the analysis on the entire dataset, uncomment the line that takes only the head of the DataFrame, e.g. to analyze just 50 articles.

In [9]:
# --- PREPARATION ---

df_analyze = df.copy()
# df_analyze = df.head(50).copy()  # Test run: uncomment and adjust to analyze fewer articles

# Prepare text using helper function above
print("Prepare text (shortening to 2000 characters)...")
texts_to_analyze = df_analyze.apply(prepare_text, axis=1).tolist()

# Set batch size and results list
batch_size = 32
final_labels = []
final_scores = []
print("Setup complete. Ready to start sentiment analysis.")

Prepare text (shortening to 2000 characters)...
Setup complete. Ready to start sentiment analysis.


Now, this cell performs the actual analysis.

In [10]:
# --- RUN THE ANALYSIS ---

start_time = time.time()

# Loop through all texts, one batch at a time
for i in tqdm(range(0, len(texts_to_analyze), batch_size)):
    batch = texts_to_analyze[i : i + batch_size]
    
    try:
        # Because of top_k=None, the pipeline returns a list of lists:
        # for each article, the labels and scores of all classes
        batch_results_raw = sentiment_pipeline(batch)
        
        # Use helper function to classify the sentiment of each article
        for result_list in batch_results_raw:
            label, score = apply_custom_logic(result_list)
            final_labels.append(label)
            final_scores.append(score)
            
    except Exception as e:
        print(f"Error processing batch {i}: {e}")
        final_labels.extend(['error'] * len(batch))
        final_scores.extend([0.0] * len(batch))
    
    # W&B logging
    if i % (batch_size * 10) == 0:
        wandb.log({"progress": (i / len(texts_to_analyze)) * 100})

# Stop the timer and log the total duration
duration = time.time() - start_time
wandb.log({"duration_seconds": duration})

  0%|          | 0/1692 [00:00<?, ?it/s]
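The batching and error-handling pattern of this loop can be tried out without loading the model. The sketch below is a simplified illustration under stated assumptions: `fake_pipeline` is a hypothetical stand-in for `sentiment_pipeline` that fails on purpose for one text, and the label is taken as the highest score instead of going through the threshold logic.

```python
def fake_pipeline(batch):
    """Hypothetical stand-in for the real pipeline; fails if a text contains 'BOOM'."""
    if any("BOOM" in text for text in batch):
        raise ValueError("simulated failure")
    # One list of label/score dicts per input text, as with top_k=None
    return [[{"label": "neutral", "score": 1.0}] for _ in batch]


texts = [f"text {n}" for n in range(7)]
texts[4] = "BOOM"  # this text makes its whole batch fail
batch_size = 3
labels = []

for i in range(0, len(texts), batch_size):
    batch = texts[i : i + batch_size]
    try:
        for result in fake_pipeline(batch):
            top = max(result, key=lambda item: item["score"])
            labels.append(top["label"])
    except Exception:
        # Pad with one placeholder per text so results stay aligned with inputs
        labels.extend(["error"] * len(batch))

print(labels)
# ['neutral', 'neutral', 'neutral', 'error', 'error', 'error', 'neutral']
```

Two things are visible here: the last batch is simply shorter than `batch_size`, and a single failing text marks its entire batch as `error`. The padding keeps the result list the same length as the input, which the length check before saving relies on.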

Finally, we have the sentiment label and score for each article. All that is left to do is to add them as new columns to the DataFrame, save the results to a CSV file, and finish the Weights & Biases run.

In [14]:
# --- COMBINE AND SAVE RESULTS ---

if len(final_labels) == len(df_analyze):
    # Save sentiment label and score as new columns of the dataframe
    df_analyze['Sentiment_Label'] = final_labels
    df_analyze['Sentiment_Score'] = final_scores
    
    # Save final csv corpus
    df_analyze.to_csv(OUTPUT_FILE, index=False)
    print("-" * 60)
    print(f"File saved at:\n{OUTPUT_FILE}")
    print("-" * 60)
    
    # Give a short overview on the sentiment distribution
    print(df_analyze['Sentiment_Label'].value_counts())
    
    # Log distribution as W&B table
    sentiment_counts = df_analyze['Sentiment_Label'].value_counts()
    data = [[label, count] for label, count in sentiment_counts.items()]
    table = wandb.Table(data=data, columns=["label", "count"])
    wandb.log({"sentiment_distribution_sensitive": wandb.plot.bar(table, "label", "count")})
    print("W&B Run complete.")
    
# Report a mismatch if the number of results does not match the number of articles
else:
    print("Error: Result size does not match corpus size.")
    print(f"Articles: {len(df_analyze)}, Results: {len(final_labels)}")

# Finish W&B run
wandb.finish()

------------------------------------------------------------
File saved at:
../data/csv/corpus_with_sentiment_sensitive.csv
------------------------------------------------------------
Sentiment_Label
neutral     47676
negative     6239
positive      218
Name: count, dtype: int64
W&B Run complete.


Run history:
duration_seconds  ▁
progress          ▁▂▂▂▂▂▃▃▃▃▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇▇███

Run summary:
duration_seconds  16745.90152
progress          99.90209
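The label counts printed above are easier to interpret as shares of the corpus. The following sketch recomputes them from the counts shown in the output; it is an illustrative addition and not part of the original notebook.

```python
# Counts copied from the value_counts() output above
counts = {"neutral": 47676, "negative": 6239, "positive": 218}

total = sum(counts.values())
shares = {label: round(100 * n / total, 2) for label, n in counts.items()}

print(f"Total articles: {total}")
for label, share in shares.items():
    print(f"{label}: {share} %")
```

The total matches the corpus size of 54,133 articles. Even with the lowered thresholds, roughly 88% of the articles remain neutral, about 11.5% are negative, and well under 1% are positive.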
