mechanistic-interpretability

Mechanistically interpretable neurosymbolic AI (Nature Comput Sci 2024): losslessly compressing NNs to computer code and discovering new algorithms which generalize out-of-distribution and outperform human-designed algorithms

program-synthesis knowledge-distillation inductive-logic-programming domain-adaptation explainable-ai interpretable distilling neurosymbolic model-distillation out-of-distribution-generalization mechanistic-interpretability

Updated Feb 20, 2024
Python

stanfordnlp / pyvene

Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions

intervention interpretability mechanistic-interpretability activation-intervention activation-patching

Updated Jul 5, 2024
Python

mechanistic-interpretability

Here are 11 public repositories matching this topic...

cx0 / mech-interpretability

AlejoAcelas / Mech-Interp-Challenges

Nix07 / binding-circuit-discovery

evan-lloyd / graphpatch

francescortu / comp-mech

koayon / atp_star

aryamanarora / causalgym

steering-vectors / steering-vectors

taufeeque9 / codebook-features

pauljblazek / deepdistilling

stanfordnlp / pyvene
