#

mechanistic-interpretability

Here are 32 public repositories matching this topic...

SAGARSS24 / MTB_manuscript_data

Physiological modeling into the metaverse of Mycobacterium tuberculosis beta CA inhibition mechanism

machine-learning systems-biology drug-design tuberculosis mechanism-of-action mechanistic-interpretability

Updated May 23, 2024

Lejoon / cup-transformer

A project that simulates a game of shuffling cups with a hidden ball underneath one of them. It also trains a Transformer based deep learning model to predict the final position of the ball after a series of swaps.

deep-learning transformers mechanistic-interpretability

Updated Jan 7, 2024
Jupyter Notebook

Zhaoyi-Li21 / creme

Implementation for paper "Understanding and Patching Compositional Reasoning in LLMs" @ ACL2024-Findings, Bangkok, Thailand.

multi-hop-reasoning large-language-models mechanistic-interpretability compositional-reasoning factual-reasoning

Updated May 17, 2024

AlejoAcelas / Interp-Benchmarks

Reversed-engineered Transformer models as a benchmark for interpretability methods

benchmark pytorch causal-analysis mechanistic-interpretability

Updated Feb 1, 2024
Jupyter Notebook

AlejoAcelas / Organizer-Mech-Interp-Challenges

Organizer's repository for the Transformer Interpretability CodaBench competition

competitive-programming transformer mechanistic-interpretability

Updated Jan 8, 2024
Jupyter Notebook

AlejoAcelas / ARENA_2.0_Exhibit

Solution to ML assignments from the Alignment Research Engineering Accelerator (ARENA) in-person program

nlp cuda transformers pytorch rl torch-lightning mechanistic-interpretability

Updated Sep 11, 2023
Jupyter Notebook

matthiasdellago / visualising-attention

Visualising (self)-attention as a vector field: exploring and building intuition. Based on anvaka.github.io/fieldplay.

visualization machine-learning transformer attention attention-mechanism vector-field mechanistic-interpretability

Updated May 6, 2023
GLSL

cx0 / mech-interpretability

Exploring length generalization in the context of indirect object identification (IOI) task for mechanistic interpretability.

ioi mechanistic-interpretability indirect-object-identification

Updated Jan 5, 2024
Python

zroe1 / toy-models-of-superposition

A replication of "Toy Models of Superposition," a groundbreaking machine learning research paper published by authors affiliated with Anthropic and Harvard in 2022.

machine-learning python3 pytorch toy-models mechanistic-interpretability

Updated Dec 30, 2023
Jupyter Notebook

AlejoAcelas / bayesian-transformers

Interpretability on 1-layer Transformer models that converge on the Bayesian-optimal solution for statistical tasks

transformers bayesian-inference mechanistic-interpretability

Updated Jan 8, 2024
Jupyter Notebook

AlejoAcelas / Mech-Interp-Challenges

Starting Kit for the CodaBench competition on Transformer Interpretability

competitive-programming transformer mechanistic-interpretability

Updated Sep 8, 2023
Python

Nix07 / binding-circuit-discovery

This repository contains the code used for the experiments in the paper "Discovering Variable Binding Circuitry with Desiderata".

mechanistic-interpretability science-of-deep-learning

Updated Mar 12, 2024
Python

daspartho / pronoun-prediction

Identifying Circuit behind Pronoun Prediction in GPT-2 Small

interpretability gpt-2 mechanistic-interpretability

Updated May 24, 2023
Jupyter Notebook

evan-lloyd / graphpatch

graphpatch is a library for activation patching on PyTorch neural network models.

pytorch interpretability large-language-models mechanistic-interpretability

Updated Jun 15, 2024
Python

DeanHazineh / Emergent-World-Representations-Othello

A mechanistic interpretability study invvestigating a sequential model trained to play the board game Othello

intervention othello-ai gpt-2 mechanistic-interpretability

Updated Jun 18, 2024
Jupyter Notebook

francescortu / comp-mech

Competition of Mechanisms: Tracing How Language Models Handle Facts and Counterfactuals

interpretability llm mechanistic-interpretability

Updated May 24, 2024
Python

lkopf / cosy

CoSy: Evaluating Textual Explanations

machine-learning xai-evaluation mechanistic-interpretability global-explainability

Updated May 31, 2024
Jupyter Notebook

apartresearch / deepdecipher

🦠 DeepDecipher: An open source API to MLP neurons

api website machine-learning research academic interpretability interpretability-methods interpretability-jam mechanistic-interpretability

Updated May 2, 2024
Rust

koayon / atp_star

PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind)

machine-learning large-language-models mechanistic-interpretability

Updated Apr 16, 2024
Python

Nix07 / finetuning

This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".

finetuning entity-tracking mechanistic-interpretability science-of-deep-learning

Updated Mar 21, 2024
Jupyter Notebook

Improve this page

Add a description, image, and links to the mechanistic-interpretability topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the mechanistic-interpretability topic, visit your repo's landing page and select "manage topics."