# MedGemma 1.5 Quantization Results Comparison

## Configurations compared
- **Original (FP16)**: unquantized half-precision baseline
- **W4A4**: 4-bit weights + 4-bit activations
- **W4A8**: 4-bit weights + 8-bit activations

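## Prerequisites
Run notebooks 01, 02, and 03 first. Each one writes its own scores file (`original_scores.json`, `w4a4_scores.json`, `w4a8_scores.json`) to `/kaggle/working/`.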
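
The comparison cells in this notebook read only four fields from each scores file: `scores.rg_e`, `scores.rg_er`, `scores.rg_er_bar`, and `gpu_gb`. A minimal sketch of a check for those fields might look like the following. The helper name `missing_fields` is hypothetical, and the example values are placeholders rather than measured results.

```python
# Fields read by the comparison cells; anything else in the file is ignored.
REQUIRED_SCORES = ("rg_e", "rg_er", "rg_er_bar")


def missing_fields(result):
    """Return the names of required fields absent from a parsed scores dict."""
    missing = []
    scores = result.get("scores")
    if not isinstance(scores, dict):
        missing.append("scores")
    else:
        missing.extend(f"scores.{k}" for k in REQUIRED_SCORES if k not in scores)
    if "gpu_gb" not in result:
        missing.append("gpu_gb")
    return missing


# Placeholder values for illustration only, not measured results.
example = {"scores": {"rg_e": 0.30, "rg_er": 0.28, "rg_er_bar": 0.22}, "gpu_gb": 9.0}
print(missing_fields(example))         # []
print(missing_fields({"scores": {}}))  # ['scores.rg_e', 'scores.rg_er', 'scores.rg_er_bar', 'gpu_gb']
```

Running such a check right after loading each file would surface a malformed file before the table or plot cells fail with a `KeyError`.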

## Evaluation metrics
- **RadGraph F1**: RG_E (entities), RG_ER (entities + relations), RG_ER_bar (complete)
- **GPU memory**: peak usage (GB)

# Cell 1: Load results

In [None]:
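import json
import os

import pandas as pd

RESULTS_DIR = "/kaggle/working"


def load_scores(filename):
    """Return the parsed scores file, or None if it has not been generated yet."""
    path = os.path.join(RESULTS_DIR, filename)
    if not os.path.exists(path):
        print(f"⚠️ {filename} not found; run the corresponding notebook first")
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


original = load_scores("original_scores.json")
w4a4 = load_scores("w4a4_scores.json")
w4a8 = load_scores("w4a8_scores.json")

# Report how many files were actually found instead of always claiming success.
loaded = sum(r is not None for r in (original, w4a4, w4a8))
status = "✅" if loaded == 3 else "⚠️"
print(f"{status} Loaded {loaded}/3 result files")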

# Cell 2: RadGraph F1 comparison

In [None]:
# Build the comparison table


def pct_change(value, baseline):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100


def make_row(label, result, baseline=None):
    """Format one table row; deltas are added only when a baseline is available."""
    s = result["scores"]
    rg_er = f"{s['rg_er'] * 100:.2f}"
    mem = f"{result['gpu_gb']:.2f}"
    if baseline:
        # Without a baseline, omit the delta rather than print a misleading +0.0%.
        rg_er += f" ({pct_change(s['rg_er'], baseline['scores']['rg_er']):+.1f}%)"
        mem += f" ({pct_change(result['gpu_gb'], baseline['gpu_gb']):+.1f}%)"
    return [label, f"{s['rg_e'] * 100:.2f}", rg_er, f"{s['rg_er_bar'] * 100:.2f}", mem]


data = []
if original:
    data.append(make_row("Original (FP16)", original))
if w4a4:
    data.append(make_row("W4A4", w4a4, baseline=original))
if w4a8:
    data.append(make_row("W4A8", w4a8, baseline=original))

df_compare = pd.DataFrame(
    data,
    columns=["Model", "RG_E", "RG_ER (commonly reported)", "RG_ER_bar", "GPU Memory (GB)"],
)

print("\n" + "=" * 80)
print("MedGemma 1.5 quantization comparison (Kaggle P100)")
print("=" * 80)
print(df_compare.to_string(index=False))
print("=" * 80)

# Save the comparison table
df_compare.to_csv("/kaggle/working/comparison_results.csv", index=False)
print("\n✅ Comparison results saved to /kaggle/working/comparison_results.csv")

# Cell 3: Visual comparison

In [None]:
import matplotlib.pyplot as plt

if original and w4a4 and w4a8:
    models = ["Original", "W4A4", "W4A8"]
    rg_er_scores = [
        original["scores"]["rg_er"] * 100,
        w4a4["scores"]["rg_er"] * 100,
        w4a8["scores"]["rg_er"] * 100
    ]
    gpu_mem = [
        original["gpu_gb"],
        w4a4["gpu_gb"],
        w4a8["gpu_gb"]
    ]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

    # RadGraph F1
    ax1.bar(models, rg_er_scores, color=["#4CAF50", "#FF9800", "#2196F3"])
    ax1.set_ylabel("RG_ER Score")
    ax1.set_title("RadGraph F1 (Entity+Relation)")
    ax1.set_ylim(0, max(rg_er_scores) * 1.2)
    for i, v in enumerate(rg_er_scores):
        ax1.text(i, v + 1, f"{v:.2f}", ha="center", fontsize=10)

    # GPU Memory
    ax2.bar(models, gpu_mem, color=["#4CAF50", "#FF9800", "#2196F3"])
    ax2.set_ylabel("GPU Memory (GB)")
    ax2.set_title("Peak GPU Memory Usage")
    ax2.set_ylim(0, max(gpu_mem) * 1.2)
    for i, v in enumerate(gpu_mem):
        ax2.text(i, v + 0.2, f"{v:.2f}", ha="center", fontsize=10)

    plt.tight_layout()
    plt.savefig("/kaggle/working/comparison_plot.png", dpi=150, bbox_inches="tight")
    plt.show()
    print("✅ Plot saved to /kaggle/working/comparison_plot.png")
else:
    print("⚠️ Some results are missing; run notebooks 01, 02, and 03 first")

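# Summary

## Expected results
- **Original**: RG_ER ≈ 27–30 (published MIMIC-CXR baseline), peak GPU memory ≈ 8–10 GB
- **W4A4**: RG_ER drops slightly, by 2–5%; peak GPU memory falls substantially, to 3–5 GB
- **W4A8**: RG_ER stays close to the original (under 2% drop); peak GPU memory falls moderately, to 5–7 GB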
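
The percentage drops above are read here as relative changes against the Original baseline, which is how Cell 2 computes its deltas. A minimal sketch of that arithmetic, using placeholder numbers rather than measured results (the function name `relative_change_pct` is hypothetical):

```python
def relative_change_pct(value, baseline):
    """Relative change of value against baseline, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100


# Placeholder scores on the 0-100 scale, for illustration only.
baseline_rg_er = 28.0
quantized_rg_er = 27.0

absolute_drop = baseline_rg_er - quantized_rg_er  # in score points
relative = relative_change_pct(quantized_rg_er, baseline_rg_er)
print(f"{absolute_drop:.1f} points, {relative:+.1f}%")  # 1.0 points, -3.6%
```

A one-point absolute drop from a baseline near 28 is therefore a relative drop of roughly 3.6%, so the two readings differ noticeably at this score range.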

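## Key findings
- W4A8 can serve as the best trade-off on the P100: the accuracy loss is small and the memory savings are clear
- W4A4 suits severely memory-constrained scenarios, such as edge devices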