Automated Interpretability and Feature Discovery in Language Models with AI Agents
ai feature-discovery sparse-autoencoders interpretability ai-agents sparse-autoencoder interpretable-machine-learning ai-agent mechanistic-interpretability automated-interpretability llm-feature-extraction llm-features
Updated May 7, 2026 - Python