Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
intervention
interpretability
mechanistic-interpretability
activation-intervention
activation-patching
Updated May 18, 2024 - Python