# 🧠 RoBERTa Sentiment Inference on MongoDB Data

This notebook loads text data from a MongoDB collection, runs sentiment analysis with a pretrained RoBERTa model, and (optionally) writes the predicted labels back to the collection.

## 1. 🔧 Setup & Imports

In [1]:
!pip install transformers torch pymongo python-dotenv pandas



In [2]:
import os
import torch
import pandas as pd
from pymongo import MongoClient
from dotenv import load_dotenv
from transformers import AutoTokenizer, AutoModelForSequenceClassification



## 2. 🔌 Connecting to MongoDB & Loading the Data

In [None]:
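# Read the connection string from a local .env file (expects a MONGO_URI entry)
load_dotenv()
MONGO_URI = os.getenv("MONGO_URI")
if not MONGO_URI:
    # Without this check, MongoClient(None) silently falls back to localhost:27017
    raise RuntimeError("MONGO_URI is not set; add it to your .env file")

client = MongoClient(MONGO_URI)
db = client["ukraineBiasDB"]
collection = db["tweets_balanced"]

# Copy each tweet's text into a "sentiment" column and add an empty "target"
# column that will later hold the predicted class id
pipeline = [
    {
        '$project': {
            '_id': 1,
            'sentiment': '$text',
            'target': {'$literal': ''}
        }
    }
]

cursor = collection.aggregate(pipeline)
train = pd.DataFrame(list(cursor))
print(train.head())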

                         _id  \
0   67e02b4955c5d9c79f5dc4f5   
1   67e02b4955c5d9c79f5dc4f6   
2   67e02b4955c5d9c79f5dc4f7   
3   67e02b4955c5d9c79f5dc4f8   
4   67e02b4955c5d9c79f5dc4f9   
..                       ...   
65  67e02b4955c5d9c79f5dc536   
66  67e02b4955c5d9c79f5dc537   
67  67e02b4955c5d9c79f5dc538   
68  67e02b4955c5d9c79f5dc539   
69  67e02b4955c5d9c79f5dc53a   

                                            sentiment target  
0   BREAKING: Trump responds to the bombshell New ...         
1   🔴 L'Occident a armé l'Ukraine et craint mainte...         
2   🚨BREAKING: Elon Musk says that American politi...         
3   What a twist! China may take part in peacekeep...         
4   Nothing to see here, just actors in Ukraine ge...         
..                                                ...    ...  
65  People need to stop calling this thing communa...         
66  Very true. Rte was dead quiet while Hezbollah ...         
67  🚨NOTICIA NACIONAL! 📢 ¡NO EXITE NINGUN CAMPO DE...         
68  An eyewitness recounts the horrific moment whe...         
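The pipeline above projects every document, including any whose `text` field is missing or empty. Such documents would likely show up as missing values in the DataFrame and make `predict_sentiment` fail later. Below is a minimal sketch of a pipeline builder that optionally filters them out and caps the document count for quick dry runs. The function name `build_projection_pipeline` and its parameters are hypothetical. Only the `$match`, `$limit`, and `$project` stages themselves are standard MongoDB aggregation stages.

```python
def build_projection_pipeline(text_field="text", skip_empty=True, limit=None):
    """Build an aggregation pipeline mirroring the notebook's projection.

    Hypothetical helper: the stages are standard MongoDB, the defaults are assumptions.
    """
    pipeline = []
    if skip_empty:
        # Keep only documents whose text field is a non-empty string
        pipeline.append({"$match": {text_field: {"$type": "string", "$ne": ""}}})
    if limit is not None:
        # Cap the number of documents, e.g. for a quick dry run
        pipeline.append({"$limit": int(limit)})
    pipeline.append({
        "$project": {
            "_id": 1,
            "sentiment": f"${text_field}",  # tweet text, under the notebook's column name
            "target": {"$literal": ""},     # placeholder for the predicted class id
        }
    })
    return pipeline
```

In this notebook it would replace the inline list, e.g. `collection.aggregate(build_projection_pipeline(limit=10))`.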

## 3. 🤗 Loading the Model & Preparing Inference

In [4]:
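# Pretrained RoBERTa checkpoint fine-tuned for 3-class tweet sentiment.
# Note: it was trained on English tweets, so French or Spanish texts in the
# collection may be scored less reliably.
model_name = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Mapping from the model's output index to its label name
labels = {0: "negative", 1: "neutral", 2: "positive"}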
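The model card for `cardiffnlp/twitter-roberta-base-sentiment` shows an example preprocessing step that replaces user mentions with `@user` and links with `http` before inference. The notebook skips this step. Below is a minimal sketch of such a function, following the model card's example. The name `preprocess` is an assumption, and splitting on single spaces is a simplification rather than real tokenization.

```python
def preprocess(text: str) -> str:
    """Replace @mentions with '@user' and links with 'http' (model-card style)."""
    tokens = []
    for t in text.split(" "):
        # A lone '@' is kept as-is; '@name' becomes the generic '@user'
        t = "@user" if t.startswith("@") and len(t) > 1 else t
        # Any token starting with 'http' (http/https links) becomes 'http'
        t = "http" if t.startswith("http") else t
        tokens.append(t)
    return " ".join(tokens)
```

It could be applied before inference with `train["sentiment"].map(preprocess)`.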

## 4. 🔍 Sentiment Inference on the Texts

In [9]:
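def predict_sentiment(text):
    """Return the predicted class id (0 = negative, 1 = neutral, 2 = positive) for one text."""
    # Tokenize a single text; truncation keeps long inputs within the model's maximum length
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():  # no gradients needed for inference
        logits = model(**inputs).logits
        prediction = torch.argmax(logits, dim=1).item()
    return prediction

# Run the model on every text and store the class id in the "target" column
train["target"] = train["sentiment"].apply(predict_sentiment)
train[["sentiment", "target"]]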

|     | sentiment | target |
|----:|-----------|-------:|
| 0   | BREAKING: Trump responds to the bombshell New ... | 0 |
| 1   | 🔴 L'Occident a armé l'Ukraine et craint mainte... | 1 |
| 2   | 🚨BREAKING: Elon Musk says that American politi... | 0 |
| 3   | What a twist! China may take part in peacekeep... | 1 |
| 4   | Nothing to see here, just actors in Ukraine ge... | 0 |
| ... | ... | ... |
| 65  | People need to stop calling this thing communa... | 0 |
| 66  | Very true. Rte was dead quiet while Hezbollah ... | 0 |
| 67  | 🚨NOTICIA NACIONAL! 📢 ¡NO EXITE NINGUN CAMPO DE... | 1 |
| 68  | An eyewitness recounts the horrific moment whe... | 0 |
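`predict_sentiment` runs one forward pass per tweet. Below is a minimal sketch of batched inference with the same tokenizer settings. It also returns a softmax confidence for each predicted class. The function name `predict_batched`, the default `batch_size=32`, and the `device` argument are assumptions, and the model is expected to already be on that device.

```python
import torch


def predict_batched(texts, tokenizer, model, batch_size=32, device="cpu"):
    """Return (class_ids, confidences) for a sequence of texts, in mini-batches."""
    class_ids, confidences = [], []
    with torch.no_grad():  # inference only, no gradient tracking
        for start in range(0, len(texts), batch_size):
            # list() also accepts a pandas Series slice
            batch = list(texts[start:start + batch_size])
            # padding=True pads each batch to its longest text
            enc = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
            enc = {k: v.to(device) for k, v in enc.items()}
            probs = torch.softmax(model(**enc).logits, dim=-1)
            conf, pred = probs.max(dim=-1)  # highest probability and its index
            class_ids.extend(pred.tolist())
            confidences.extend(conf.tolist())
    return class_ids, confidences
```

In this notebook it might be called as `ids, conf = predict_batched(train["sentiment"].tolist(), tokenizer, model)`. You could then set `train["target"] = ids`, and `[labels[i] for i in ids]` would give readable label names via the `labels` dict from section 3. Larger batches mean fewer forward passes but more memory per pass.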


## 5. 💾 Saving the Results Back to MongoDB (optional)

In [None]:
"""for i, row in train.iterrows():
    collection.update_one(
        {"_id": row["_id"]},
        {"$set": {"target": int(row["target"])}}
    )
    """