A game theoretic approach to explain the output of any machine learning model.
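The game-theoretic idea behind this approach is the Shapley value: a feature's attribution is its marginal contribution to the model's output, averaged over all subsets of the other features. A minimal pure-Python sketch of the exact computation (the toy coalition value function and feature names are invented for illustration; real libraries approximate this sum for large feature sets):

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, feature_names):
    """Exact Shapley values: for each feature, average its marginal
    contribution value_fn(S | {f}) - value_fn(S) over all subsets S,
    with the standard combinatorial weights."""
    n = len(feature_names)
    attributions = {}
    for f in feature_names:
        others = [g for g in feature_names if g != f]
        phi = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi += weight * (value_fn(set(subset) | {f}) - value_fn(set(subset)))
        attributions[f] = phi
    return attributions

# Toy "model" output as a function of which features are active
# (hypothetical, for illustration only).
def toy_model(active):
    score = 0.0
    if "age" in active:
        score += 2.0
    if "income" in active:
        score += 3.0
    if "age" in active and "income" in active:
        score += 1.0  # interaction term; Shapley splits it evenly
    return score

print(shapley_values(toy_model, ["age", "income"]))
# age -> 2.5, income -> 3.5 (the 1.0 interaction is split between them)
```

Note the efficiency property: the attributions sum to `toy_model({age, income}) - toy_model({})` = 6.0.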
ir_explain: a Python Library of Explainable IR Methods
A curated list of awesome responsible machine learning resources.
For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. Open-sourced and constantly updated.
A methodology designed to measure the contribution of the features to the predictive performance of any econometric or machine learning model.
[ICML'24] Official PyTorch Implementation of TimeX++
TrustyAI Explainability Toolkit
ReFT: Representation Finetuning for Language Models
graphpatch is a library for activation patching on PyTorch neural network models.
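Activation patching means running a model on one input while swapping in an intermediate activation recorded from another run, to localize which internal component causes a behavior. A minimal sketch of the idea on a toy two-layer "model" (the layer functions and values are invented for illustration, not graphpatch's API):

```python
def run_with_patch(layers, x, patch_layer=None, patch_value=None):
    """Run a stack of layer functions in order, optionally replacing
    ('patching') the activation after one layer with a stored value."""
    act = x
    for i, layer in enumerate(layers):
        act = layer(act)
        if i == patch_layer:
            act = patch_value  # swap in the activation from another run
    return act

# Toy two-layer "model".
layers = [lambda v: v + 1, lambda v: v * 2]

clean = run_with_patch(layers, 3)  # (3 + 1) * 2 = 8
# Record the clean run's activation after layer 0 (here, 4), then patch
# it into a run on a very different ("corrupted") input.
patched = run_with_patch(layers, 100, patch_layer=0, patch_value=4)
print(clean, patched)  # 8 8 -> the patched activation restores the clean output
```

Because the patched run reproduces the clean output regardless of the corrupted input, the causal information is localized to the patched activation.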
Stanford NLP Python Library for Understanding and Improving PyTorch Models via Interventions
Creating a PyTorch LSTM and Transformer to classify movies by genre and visualizing the LSTM's reasoning process
An attribution library for LLMs
The nnsight package enables interpreting and manipulating the internals of deep learning models.
A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
Fit interpretable models. Explain blackbox machine learning.
Explain a black-box module in natural language.
Class activation maps for your PyTorch models (CAM, Grad-CAM, Grad-CAM++, Smooth Grad-CAM++, Score-CAM, SS-CAM, IS-CAM, XGrad-CAM, Layer-CAM)
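All of these CAM variants share one core recipe, which plain Grad-CAM states most simply: weight each activation map of a conv layer by the spatial mean of its gradient, sum over channels, and keep only the positive evidence. A minimal NumPy sketch of that formula (the synthetic tensors are invented for illustration; a real library hooks the activations and gradients out of a PyTorch model):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap from a conv layer's activations and the
    gradients of the target score w.r.t. those activations.
    Both inputs have shape (channels, H, W)."""
    weights = gradients.mean(axis=(1, 2))             # alpha_k: GAP over each gradient map
    cam = np.tensordot(weights, activations, axes=1)  # sum_k alpha_k * A_k -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence only
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1]
    return cam

# Tiny synthetic example: 2 channels on a 4x4 feature map.
rng = np.random.default_rng(0)
heatmap = grad_cam(rng.random((2, 4, 4)), rng.random((2, 4, 4)))
print(heatmap.shape)  # (4, 4)
```

The listed variants mostly differ in how the channel weights are computed (e.g. higher-order gradient terms in Grad-CAM++, or forward-pass scores instead of gradients in Score-CAM).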
Explaining black boxes with a SMILE: Statistical Model-agnostic Interpretability with Local Explanations
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
Knowledge Circuits in Pretrained Transformers