Repositories
Showing 10 of 69 repositories
git-disl/awesome_LLM-harmful-fine-tuning-papers
git-disl/awesome-LLM-game-agent-papers
Safety-Tax
Public
This is the official code for the paper "Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable".
git-disl/Safety-Tax
Python
6
Apache-2.0
0
0
0
Updated Mar 4, 2025
Virus
Public
This is the official code for the paper "Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation".
git-disl/Virus
Python
44
Apache-2.0
2
0
0
Updated Feb 2, 2025
Booster
Public
This is the official code for the paper "Booster: Tackling Harmful Fine-tuning for Large Language Models via Attenuating Harmful Perturbation" (ICLR 2025).
git-disl/Booster
Shell
16
Apache-2.0
0
0
0
Updated Jan 5, 2025
git-disl/llm-topla
Jupyter Notebook
5
0
1
0
Updated Jan 2, 2025
git-disl/PFT
Python
1
0
0
0
Updated Dec 6, 2024
git-disl/Chameleon
Python
6
1
1
0
Updated Nov 18, 2024
Vaccine
Public
This is the official code for the paper "Vaccine: Perturbation-aware Alignment for Large Language Models" (NeurIPS 2024).
git-disl/Vaccine
Shell
39
Apache-2.0
4
0
0
Updated Nov 18, 2024
git-disl/PokeLLMon
Python
176
15
1
0
Updated Oct 12, 2024