<a href="https://colab.research.google.com/github/ujie22/Sentiment-Analysis/blob/main/Sentiment-Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### Chinese emotional-tone dataset

In [None]:
!pip install -U gdown



In [None]:
file_id = "11SiaL55Lqq68FaUEhwOOHYWKoBeTrAlc"
file_name = "data.csv"

# Pass the file ID positionally; recent gdown releases no longer need --id
!gdown $file_id -O $file_name


Downloading...
From: https://drive.google.com/uc?id=11SiaL55Lqq68FaUEhwOOHYWKoBeTrAlc
To: /content/data.csv
100% 315k/315k [00:00<00:00, 84.7MB/s]


In [None]:
import pandas as pd

# Read the CSV file
df = pd.read_csv("data.csv")
df.head()

Unnamed: 0,text,emotion
0,你要不要去吃午餐？,平淡語氣
1,誒誒誒！我甄選上了！,開心語調
2,我幾天身體好像有點不太舒服，肚子好痛,悲傷語調
3,我的小專題組員都不做事，幹!超後悔跟他一組,憤怒語調
4,他們是不是吵架了？不會打起來吧？,平淡語氣
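
Before training, it can help to check how balanced the labels are. The following is a minimal sketch that uses only the five preview rows above; `label_counts` is a hypothetical helper name, and a real check would run over the whole `emotion` column.

```python
from collections import Counter

# The five (text, emotion) rows shown in the preview above.
rows = [
    ("你要不要去吃午餐？", "平淡語氣"),
    ("誒誒誒！我甄選上了！", "開心語調"),
    ("我幾天身體好像有點不太舒服，肚子好痛", "悲傷語調"),
    ("我的小專題組員都不做事，幹!超後悔跟他一組", "憤怒語調"),
    ("他們是不是吵架了？不會打起來吧？", "平淡語氣"),
]


def label_counts(pairs):
    """Count how often each emotion label occurs."""
    return Counter(emotion for _, emotion in pairs)


counts = label_counts(rows)
for emotion, n in counts.most_common():
    print(emotion, n)
```

On the full DataFrame, `df["emotion"].value_counts()` gives the same kind of tally.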


In [None]:
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
import torch

In [None]:
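label2id = {
    '厭惡語調': 0,  # disgusted tone
    '平淡語氣': 1,  # neutral tone
    '悲傷語調': 2,  # sad tone
    '憤怒語調': 3,  # angry tone
    '疑問語調': 4,  # questioning tone
    '開心語調': 5,  # happy tone
    '關切語調': 6,  # concerned tone
    '驚奇語調': 7   # surprised tone
}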
id2label = {v: k for k, v in label2id.items()}
print(id2label)

{0: '厭惡語調', 1: '平淡語氣', 2: '悲傷語調', 3: '憤怒語調', 4: '疑問語調', 5: '開心語調', 6: '關切語調', 7: '驚奇語調'}
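
The hand-written mapping above follows the order you get from sorting the label strings, which is also how the training cell later rebuilds it. A small sketch of that derivation, using a made-up, unordered list of observed labels (`labels_seen` is illustrative, not data from the CSV):

```python
# Labels as they might be encountered in the data: unordered, with repeats.
labels_seen = [
    "平淡語氣", "開心語調", "悲傷語調", "憤怒語調", "平淡語氣",
    "驚奇語調", "關切語調", "疑問語調", "厭惡語調", "開心語調",
]

# Sorting the unique labels gives every label a stable integer id.
label_names = sorted(set(labels_seen))
label2id = {label: idx for idx, label in enumerate(label_names)}
id2label = {idx: label for label, idx in label2id.items()}

# The mapping written out by hand in the notebook.
handwritten = {
    "厭惡語調": 0, "平淡語氣": 1, "悲傷語調": 2, "憤怒語調": 3,
    "疑問語調": 4, "開心語調": 5, "關切語調": 6, "驚奇語調": 7,
}
print(label2id == handwritten)
```

Keeping the two mappings identical matters because the model config stores one of them while the dataset labels are encoded with the other.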


### Install and import the required packages

In [None]:
!pip install transformers datasets scikit-learn



### Load the Chinese model

In [None]:
# Load the Chinese model
model_name = "IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment"
hf_model = BertForSequenceClassification.from_pretrained(
    model_name,
    num_labels=len(label2id), # Use the number of labels from the dataset
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True
)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([2, 768]) in the checkpoint and torch.Size([8, 768]) in the model instantiated
- classifier.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([8]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
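
The message above says that the 2-class sentiment head stored in the checkpoint was discarded and replaced by a freshly initialised 8-class head, which is why fine-tuning is required before the predictions mean anything. A rough sketch of what that swap amounts to in parameter terms, using only the shapes printed in the log (`linear_params` is a hypothetical helper):

```python
def linear_params(in_features, out_features):
    """Number of weights plus biases in a fully connected layer."""
    return in_features * out_features + out_features


hidden_size = 768  # second dimension of classifier.weight in the log above

old_head = linear_params(hidden_size, 2)  # checkpoint: 2 sentiment classes
new_head = linear_params(hidden_size, 8)  # this notebook: 8 emotion labels

print("checkpoint head parameters:", old_head)
print("new head parameters:", new_head)
```

Only this small head starts from random values; the encoder weights are still loaded from the checkpoint.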


In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
hf_model.to(device)

BertForSequenceClassification(
  (bert): BertModel(
    (embeddings): BertEmbeddings(
      (word_embeddings): Embedding(21128, 768, padding_idx=1)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): BertEncoder(
      (layer): ModuleList(
        (0-11): 12 x BertLayer(
          (attention): BertAttention(
            (self): BertSdpaSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): BertSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm((768,), eps=1e

### To read or store data on Google Drive, copy the model files into the Colab working directory (skip this when training from scratch)

In [None]:
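# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# List the contents of the model folder
!ls /content/drive/MyDrive/emotion_model

# Once the files are confirmed to be there, copy the folder into the current directory
!cp -r /content/drive/MyDrive/emotion_model ./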
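
The "check first, then copy" step above is done by eye with `ls`. A minimal sketch of doing the same check in Python follows. The file names in `EXPECTED_FILES` are assumptions about what a saved checkpoint typically contains, and `missing_files` is a hypothetical helper; adjust both to what your folder really holds.

```python
from pathlib import Path
import tempfile

# Assumed file names for a saved checkpoint; not taken from the notebook.
EXPECTED_FILES = ("config.json", "model.safetensors", "vocab.txt")


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from directory."""
    root = Path(directory)
    return [name for name in expected if not (root / name).is_file()]


# Demo on a temporary folder that only contains config.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text("{}", encoding="utf-8")
    demo_missing = missing_files(tmp)

print(demo_missing)
```

On Colab this could be pointed at a checkpoint folder such as `/content/drive/MyDrive/emotion_model/checkpoint-1248` before running the copy.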


#### Load the trained model and tokenizer (skip this when training from scratch)

In [None]:
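# Assign to hf_model, the name the later cells use, and move it to the active device
hf_model = BertForSequenceClassification.from_pretrained("./emotion_model/checkpoint-1248")
hf_model.to(device)
tokenizer = BertTokenizer.from_pretrained("./emotion_model/checkpoint-1248")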

### Build and preprocess the training dataset

In [None]:
from datasets import Dataset

# Load the Chinese emotion dataset from the local CSV file
df = pd.read_csv("data.csv")  # has text and emotion columns
dataset = Dataset.from_pandas(df)
dataset = dataset.train_test_split(test_size=0.2)

# Build the label mapping; cover every label in both splits and sort so the ids stay stable
label_names = sorted(set(dataset["train"]["emotion"]) | set(dataset["test"]["emotion"]))
label2id = {label: idx for idx, label in enumerate(label_names)}
id2label = {v: k for k, v in label2id.items()}

# Load the Chinese tokenizer
tokenizer = BertTokenizer.from_pretrained("IDEA-CCNL/Erlangshen-Roberta-110M-Sentiment")

# Tokenization
def tokenize(example):
    return {
        **tokenizer(example["text"], truncation=True, padding="max_length", max_length=128),
        # Store the label as a plain int; the Trainer moves batches to the device itself
        "label": label2id[example["emotion"]],
    }

tokenized_dataset = dataset.map(tokenize)



Map:   0%|          | 0/3327 [00:00<?, ? examples/s]

Map:   0%|          | 0/832 [00:00<?, ? examples/s]
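
The `truncation=True, padding="max_length", max_length=128` arguments make every encoded example exactly 128 tokens long. The sketch below illustrates the idea in plain Python. It is a simplification, not the tokenizer's implementation: the token ids are made up, `pad_or_truncate` and `pad_id` are hypothetical, and special-token handling is ignored.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                   # truncation: drop anything past max_length
    n_pad = max_length - len(kept)
    input_ids = kept + [pad_id] * n_pad             # padding: fill the remainder with pad_id
    attention_mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask


# A short sequence gets padded, a long one gets cut.
short_ids, short_mask = pad_or_truncate([101, 872, 1962, 102], max_length=8)
long_ids, long_mask = pad_or_truncate(list(range(200)), max_length=128)

print(short_ids, short_mask)
print(len(long_ids), sum(long_mask))
```

Fixed-length inputs let the default collator stack examples into a batch without any per-batch padding logic, at the cost of some wasted computation on short sentences.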

### Train the model with the Hugging Face Trainer

In [None]:
training_args = TrainingArguments(
    output_dir="./emotion_model",
    eval_strategy="epoch",
    report_to="none",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
)

trainer = Trainer(
    model=hf_model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    eval_dataset=tokenized_dataset["test"],
    processing_class=tokenizer,  # newer name for the deprecated tokenizer= argument
)

trainer.train()

Epoch,Training Loss,Validation Loss
1,No log,0.446777
2,0.666800,0.47372
3,0.230600,0.462898


TrainOutput(global_step=1248, training_loss=0.3884927737407195, metrics={'train_runtime': 302.2292, 'train_samples_per_second': 33.025, 'train_steps_per_second': 4.129, 'total_flos': 656563229079552.0, 'train_loss': 0.3884927737407195, 'epoch': 3.0})
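
The step count in the training output can be reproduced by hand. This is a small sketch under a few assumptions: a single device, no gradient accumulation, and the Trainer's default logging interval of 500 steps, none of which this notebook overrides.

```python
import math

train_examples = 3327  # size of the training split reported by the Map progress bar
batch_size = 8         # per_device_train_batch_size
epochs = 3             # num_train_epochs

steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

# With logging every 500 steps, the first training-loss entry appears only
# once 500 steps have been completed.
default_logging_steps = 500
first_logged_epoch = math.ceil(default_logging_steps / steps_per_epoch)

print(steps_per_epoch, total_steps, first_logged_epoch)
```

This matches `global_step=1248` and the `checkpoint-1248` folder name, and it is consistent with the "No log" entry for epoch 1: the first epoch ends before step 500 is reached.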

### Save the trained model to Google Drive to avoid retraining every time (skip this if you train without saving)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# Manually copy the trained model folder to Drive
!cp -r ./emotion_model /content/drive/MyDrive/


### Try out the model's predictions (for testing)

In [None]:
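import torch.nn.functional as F

# Read a sentence from the user (prompt: "Enter a sentence about your mood or thoughts")
text = input("請輸入你的心情或想法句子：\n> ")

# Encode the text and run it through the model on the same device as the model
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128).to(device)
hf_model.eval()  # make sure dropout is disabled at inference time
with torch.no_grad():
    logits = hf_model(**inputs).logits
    probs = F.softmax(logits, dim=-1)[0].cpu().numpy()  # probability distribution over the labels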


In [None]:
print("\n你的情緒分析結果：")  # "Your emotion analysis result:"
# Print each label with its probability as a percentage
for label, idx in label2id.items():
    print(f"{label:<8}: {probs[idx]*100:.2f}%")
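
The percentages printed above come from applying softmax to the model's raw logits. A minimal pure-Python sketch of that step, with made-up logits for the eight labels (the values are illustrative, not model output):

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits, one per emotion label id (0..7).
logits = [0.1, 2.3, -1.0, 0.4, 0.0, 1.2, -0.5, 0.3]
probs = softmax(logits)

# The predicted label is the index with the highest probability.
best = max(range(len(probs)), key=probs.__getitem__)

for idx, p in enumerate(probs):
    print(f"label {idx}: {p * 100:.2f}%")
print("predicted label id:", best)
```

Softmax preserves the ordering of the logits, so taking the argmax of the probabilities selects the same label as taking the argmax of the logits.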

### Configure the LLM

In [None]:
from openai import OpenAI
import gradio as gr

In [None]:
import os
from google.colab import userdata

### Use the Groq service

In [None]:
api_key = userdata.get('Groq')

In [None]:
os.environ["OPENAI_API_KEY"] = api_key

In [None]:
llm_model = "meta-llama/llama-4-scout-17b-16e-instruct"
base_url = "https://api.groq.com/openai/v1"

In [None]:
client = OpenAI(
    base_url=base_url  # not needed when calling OpenAI's own API
)
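
The cells above hand the Groq key to the client indirectly, by storing it in the `OPENAI_API_KEY` environment variable. An alternative, sketched here, is to pass the key explicitly and fail early when it is missing. `groq_client_kwargs` is a hypothetical helper and the key in the demo is a placeholder.

```python
def groq_client_kwargs(api_key, base_url="https://api.groq.com/openai/v1"):
    """Build the keyword arguments for an OpenAI-compatible client."""
    if not api_key:
        raise ValueError("No API key provided; add it to the Colab secrets first.")
    return {"api_key": api_key, "base_url": base_url}


# Demo with a placeholder key.
kwargs = groq_client_kwargs("placeholder-key")
print(kwargs["base_url"])
```

In the notebook this would be used as `client = OpenAI(**groq_client_kwargs(userdata.get('Groq')))`, which keeps the Groq key out of a variable named after another provider.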

### Generate a companion-style reply from the user's detected emotion, their message, any follow-up question, and the selected role

In [None]:
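def generate_emotion_response(emotion, user_input, followup_question, role):
    """Ask the LLM for a supportive reply written in the voice of the selected role."""
    # Tone instructions per role (in Traditional Chinese); the keys must match the Gradio dropdown choices
    prompt_options = {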
    "溫柔姐姐👧🏼": "請用溫柔、理解、像姊姊般的語氣給出陪伴與安慰，用列點給出建議",
    "理性分析師🙋‍♀️": "請用冷靜、理性、邏輯清晰的語氣分析使用者狀況，並列點提供實用建議",
    "搞笑朋友🧏": "請用幽默、搞笑但不失溫暖的語氣安慰對方，讓他笑出來,像朋友在身邊一樣",
    "療癒小狗🐶": "請用像忠誠可愛的小狗陪伴主人的方式來回應，語氣天真、暖心，像汪汪叫的撒嬌與貼心陪伴，可以用一些可愛語氣或小表情符號增加親切感，讓主人可以更開心的面對各種處境",
    "貼心男友👦🏻": "請用溫柔體貼、像男朋友對待戀人那樣的語氣給予安慰與支持，展現自己的貼心,語氣親密但不輕浮，像真的陪在身邊。可以適當加入暱稱、鼓勵的語氣、願意傾聽的態度，讓對方感覺到有人懂、有人在。請列點給出實用建議，語氣可幽默但以溫柔為主。",
    }
    # Look up the tone instruction for the selected role (empty if the role is unknown)
    tone_instruction = prompt_options.get(role, "")

    # Assemble the prompt (kept in Traditional Chinese, the language the reply must use)
    prompt = f"""
    你是一位具有同理心的情緒陪伴聊天機器人，目前的角色是：{role}。

    {tone_instruction}
    請根據以下資訊生成回應：
    - 使用者目前的主要情緒是：{emotion}
    - 使用者說：{user_input}
    - 若使用者有追問：{followup_question}

    請包含：
    1. 一句安慰或支持的語句
    2. 1～2 個具體可行的建議（例如：呼吸練習、日記書寫、與朋友傾訴、出門散心、看影片放鬆等），越有趣越有效越好，每次不要一直重複
    3. 可視情況加入角色風格（如幽默或理性分析）
    4. 如果是作業問答的問題的話，請認真解決問題!!

    風格要求：
    - 回應長度 80～150 字
    - 語氣應符合 {role} 的風格
    - 若表達嚴重壓力，可適度提醒求助他人，提出有效解決方法
    - 可加入顏文字或表情符號增加親和力
    請全文使用繁體中文回覆
    """
    response = client.chat.completions.create(
        model=llm_model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
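
Two details of the function above are worth a sketch: an unknown role silently yields an empty tone instruction, and the triple-quoted f-string sends its leading indentation to the LLM. One possible approach, shown with hypothetical English placeholders rather than the notebook's actual prompts, is to dedent a template once and fall back to a default tone.

```python
import textwrap

# Hypothetical placeholder tones; the notebook's real ones are in Traditional Chinese.
TONE_BY_ROLE = {
    "gentle sister": "Reply gently, like an understanding older sister.",
    "analyst": "Reply calmly and logically, with bullet-point advice.",
}
DEFAULT_TONE = "Reply warmly and supportively."

# Dedent the template once, then fill it in, so user text cannot affect the dedent.
PROMPT_TEMPLATE = textwrap.dedent("""\
    You are an empathetic companion chatbot playing the role: {role}.
    {tone}
    The user's main emotion is: {emotion}
    The user said: {user_input}
    """)


def build_prompt(role, emotion, user_input):
    """Fill the template, falling back to DEFAULT_TONE for unknown roles."""
    tone = TONE_BY_ROLE.get(role, DEFAULT_TONE)
    return PROMPT_TEMPLATE.format(
        role=role, tone=tone, emotion=emotion, user_input=user_input
    )


prompt = build_prompt("unknown role", "sad", "line one\nline two")
print(prompt)
```

Whether the stray indentation changes the model's answers is not something the notebook measures; removing it simply makes the prompt match what was intended.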

### Add a summary with suggestions

In [None]:
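def summarize_emotions(history):
    """Ask the LLM to summarise the conversation and offer a few suggestions."""
    if not history:
        return "目前尚未有任何對話。"  # "No conversation yet."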

    full_text = "\n".join([f"使用者：{u}\nAI：{a}" for u, a in history])
    prompt = f"""
    你是一個溫柔有洞察力的情緒諮詢師。請幫我總結以下對話紀錄，並給出 2~3 句溫暖且列點出具體的建議，可以說服使用者：

    對話內容如下：
    {full_text}

    請使用繁體中文回覆，語氣溫柔與鼓勵。
    """
    response = client.chat.completions.create(
        model=llm_model,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content
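
The summary prompt above grows with every turn because the whole history is pasted in. A minimal sketch of capping the transcript at the most recent turns follows; `format_history` and the `max_turns` default are hypothetical, and the English speaker tags stand in for the notebook's Chinese ones.

```python
def format_history(history, max_turns=10):
    """Render the last max_turns (user, ai) pairs as a plain-text transcript."""
    if max_turns < 1:
        raise ValueError("max_turns must be at least 1")
    recent = history[-max_turns:]
    return "\n".join(f"User: {u}\nAI: {a}" for u, a in recent)


demo_history = [
    ("first message", "first reply"),
    ("second message", "second reply"),
    ("third message", "third reply"),
]
transcript = format_history(demo_history, max_turns=2)
print(transcript)
```

Dropping older turns trades completeness of the summary for a bounded prompt size; the right cutoff depends on the model's context window.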

### Write the text file

In [None]:
def export_summary_txt(history, summary):
    """Write the chat log and the summary to a UTF-8 text file and return its path."""
    full_content = "【對話紀錄】\n\n"
    for user_msg, bot_resp in history:
        full_content += f"使用者：{user_msg}\nAI：{bot_resp}\n\n"

    full_content += "【總結與建議】\n\n" + summary

    path = "/content/情緒總結.txt"
    with open(path, "w", encoding="utf-8") as f:
        f.write(full_content)
    return path
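
The export function writes to `/content`, which exists on Colab but not necessarily elsewhere. A small sketch of the same write-then-return-path pattern using a temporary directory follows; `export_text` and its default file name are hypothetical.

```python
import os
import tempfile


def export_text(content, filename="summary.txt", directory=None):
    """Write content as UTF-8 and return the path of the file."""
    if directory is None:
        directory = tempfile.mkdtemp()  # fresh folder that works on any platform
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


# Round trip: write Chinese text, then read it back.
written_path = export_text("【總結與建議】\n多休息")
with open(written_path, encoding="utf-8") as f:
    read_back = f.read()
print(written_path)
```

Passing `encoding="utf-8"` explicitly, as the notebook already does, is what keeps the Chinese text intact regardless of the platform's default encoding.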


### Display Traditional Chinese in matplotlib plots on Colab
### Download the Taipei Sans TC Beta font and register it with matplotlib

In [None]:
# Quote the URL so the shell does not treat "&" as a background operator
!wget -O TaipeiSansTCBeta-Regular.ttf "https://drive.google.com/uc?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_&export=download"

import matplotlib
import matplotlib.font_manager

matplotlib.font_manager.fontManager.addfont('TaipeiSansTCBeta-Regular.ttf')
matplotlib.rc('font', family='Taipei Sans TC Beta')

--2025-06-20 12:31:33--  https://drive.google.com/uc?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_
Resolving drive.google.com (drive.google.com)... 142.251.31.100, 142.251.31.138, 142.251.31.139, ...
Connecting to drive.google.com (drive.google.com)|142.251.31.100|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_ [following]
--2025-06-20 12:31:34--  https://drive.usercontent.google.com/download?id=1eGAsTN1HBpJAkeVM57_C7ccp7hbgSz3_
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 74.125.143.132, 2a00:1450:4013:c03::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|74.125.143.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20659344 (20M) [application/octet-stream]
Saving to: ‘TaipeiSansTCBeta-Regular.ttf’


2025-06-20 12:31:40 (31.7 MB/s) - ‘TaipeiSansTCBeta-Regular.ttf’ saved [20659344/20659344]



## Build a Web App with Gradio

In [None]:
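import numpy as np
import matplotlib.pyplot as plt
import torch.nn.functional as F  # repeated here in case the optional test cell was skipped
from PIL import Image
from io import BytesIO

def respond(message, history, role):
    """Classify the message, ask the LLM for a reply, and plot the emotion distribution."""
    inputs = tokenizer(message, return_tensors="pt", truncation=True, padding=True, max_length=128).to(device)
    hf_model.eval()  # make sure dropout is disabled at inference time
    with torch.no_grad():
        logits = hf_model(**inputs).logits
        probs = F.softmax(logits, dim=-1)[0].cpu().numpy()

    # Pick the most likely emotion
    pred_idx = int(np.argmax(probs))
    pred_emotion = hf_model.config.id2label[pred_idx]

    # Ask the LLM for a comforting reply based on the predicted emotion
    response_text = generate_emotion_response(
        emotion=pred_emotion,
        user_input=message,
        followup_question="",
        role=role
    )
    # Append to the chat history (the list held in gr.State is updated in place)
    history.append((message, response_text))

    # Plot the emotion distribution, with labels in the model's own id order
    labels = [hf_model.config.id2label[i] for i in range(len(probs))]
    fig, ax = plt.subplots()
    ax.barh(labels, probs, color="skyblue")
    ax.set_xlim(0, 1)
    ax.set_title("情緒分佈")  # "Emotion distribution"
    fig.tight_layout()

    # Render the figure to an in-memory PNG and hand it to Gradio as a PIL image
    buf = BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # close the figure so repeated calls do not accumulate open figures
    buf.seek(0)
    emotion_img = Image.open(buf)
    return history, emotion_img
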
with gr.Blocks() as demo:
    gr.Markdown("# 情緒分析陪伴聊天機器人🩷💛")
    with gr.Row():
        with gr.Column(scale=2):
             with gr.Row():
                role_selector = gr.Dropdown(
                    choices=["溫柔姐姐👧🏼", "理性分析師🙋‍♀️", "搞笑朋友🧏","療癒小狗🐶","貼心男友👦🏻"],
                    label="選擇聊天角色",
                    value="溫柔姐姐👧🏼"
                )
             with gr.Row():
                chatbot = gr.Chatbot(label="聊天對話", height=600)
        with gr.Column(scale=2):
            with gr.Row():
                msg = gr.Textbox(placeholder="輸入你的心情...", label="請輸入")
            with gr.Row():
                emotion_plot = gr.Image(label="即時情緒分佈", type="pil", height=300)
            summary_output = gr.Textbox(label="總結與建議", lines=5)
            with gr.Row():
                summary_btn = gr.Button("📑 總結心情並給予建議")
                txt_download_btn = gr.Button("⬇️ 下載文字檔")
            file_output = gr.File(label="點我下載 .txt 檔")
    state = gr.State([])
    msg.submit(respond, [msg, state, role_selector], [chatbot, emotion_plot]) \
       .then(lambda: "", None, msg)  # clear the input box; takes no arguments because no inputs are passed
    summary_btn.click(summarize_emotions, inputs=state, outputs=summary_output)
    txt_download_btn.click(export_summary_txt, inputs=[state, summary_output], outputs=file_output)
demo.launch(debug=True)


