Does Quantization Kill Interpretability? Scaling study across 5 models (124M–2.8B): RTN destroys induction heads in small models; GPTQ preserves them at all scales.
pythia quantization ai-safety sparse-autoencoder mechanistic-interpretability gptq transformerlens transformer-circuits induction-heads scaling-study
Updated Mar 11, 2026 - Python
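For context on the baseline named in the description: round-to-nearest (RTN) quantizes each weight independently against a single per-tensor scale, which is the property that can distort fragile attention circuits in small models. A minimal sketch (function name and layout are illustrative, not this repo's API):

```python
def rtn_quantize(weights, bits=8):
    """Symmetric round-to-nearest (RTN) fake-quantization.

    Scales weights into the signed integer grid [-qmax, qmax],
    rounds each value independently, then dequantizes back to
    floats so the rounding error can be inspected directly.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0  # avoid div-by-zero
    quantized = [round(w / scale) for w in weights]  # per-weight rounding
    return [q * scale for q in quantized]


# The largest-magnitude weight is reproduced exactly; smaller
# weights pick up rounding error proportional to the scale.
deq = rtn_quantize([0.5, -1.0, 0.25])
```

GPTQ, by contrast, rounds weights sequentially and compensates each rounding error using second-order (Hessian) information, which is consistent with the description's finding that it better preserves circuit structure.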