RobustnessPilot: an automated LLM evaluation framework that tests 3 models (14B–70B parameters) across 23 failure modes using 7 prompt strategies | First-author paper submitted to IEEE SRDS 2026
python benchmarking research deep-learning hpc ieee prompt-engineering llm-evaluation robustness-testing
Updated Apr 21, 2026 - Java