WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference

This repository is the Pytorch implementation of WINA (Weight Informed Neuron Activation). WINA is a novel, simple, and training-free sparse activation framework that jointly considers hidden state magnitudes and the column-wise ℓ2-norms of weight matrices. We show that this leads to a sparsification strategy that obtains optimal approximation error bounds with theoretical guarantees tighter than existing techniques. Empirically, WINA also outperforms state-of-the-art methods in average performance at the same sparsity levels, across a diverse set of LLM architectures and datasets. These results position WINA as a new performance frontier for training-free sparse activation in LLM inference, advancing training-free sparse activation methods and setting a robust baseline for efficient inference.

The current release supports:

Llama-2-7B
Llama-3-8B
Qwen-2.5-7B
Phi-4 (14B)

Install

Clone the repo and navigate to wina:

git clone https://github.com/microsoft/wina.git
cd wina

Set up environment:

conda create -yn wina python=3.11
conda activate wina

pip install -r requirements.txt

Sparsity Allocation

We assign sparsity for each weight matrices instead of a uniform sparsity across the model through the greedy algorithm proposed in TEAL.

💡Note: If you want to assign uniform sparsity for all matrices, you can skip this part.

Construct hidden states histograms.

python wina/grab_acts.py --model_name [MODEL_PATH] --output_path [OUTPUT_PATH] --sparse_mode [SPARSE_MODE] --transform(Optional)

Allocate sparsity

python wina/greedyopt.py --model_name [MODEL_PATH] --output_path [OUTPUT_PATH] --sparse_mode [SPARSE_MODE] --model_type [MODEL_TYPE] --transform(Optional)

💡Notes: The '--transform' flag enables tensor transformation. Simply omit this flag if you wish to skip the transformation process.

ℹ️ Arguments

[MODEL_PATH]: Path to base model, can be remote path or local path
[OUTPUT_PATH]: Path to save outputs, including hidden states histograms and sparsity allocation results.
[SPARSE_MODE]: Can be WINA or TEAL
[MODEL_TYPE]: Can be one of the following four:
- Llama-3-8B
- Llama-2-7B
- Qwen-2.5-7B
- Phi-4-14B

Evaluation

python eval.py --base_model [MODEL_PATH] --save_path [OUTPUT_PATH] --sparsity [sparsity] --sparse_mode [SPARSE_MODE] --greedy

💡Notes: The --greedy flag enables per-matrix sparsity assignment. Ensure Sparsity Allocation is completed before using this option.

Reproduction

Dense model

bash scripts/baseline/qwen-2.5-7b.bash [MODEL_PATH]
bash scripts/baseline/llama-2-7b.bash [MODEL_PATH]
bash scripts/baseline/llama-3-8b.bash [MODEL_PATH]
bash scripts/baseline/phi_4_14b.bash [MODEL_PATH]

WINA Sparsification

bash scripts/wina/qwen_2.5_7b.bash [MODEL_PATH]
bash scripts/wina/llama_2_7b.bash [MODEL_PATH]
bash scripts/wina/llama_3_8b.bash [MODEL_PATH]
bash scripts/wina/phi_4_14b.bash [MODEL_PATH]

TEAL Sparsification

bash scripts/teal/qwen_2.5_7b.bash [MODEL_PATH]
bash scripts/teal/llama_2_7b.bash [MODEL_PATH]
bash scripts/teal/llama_3_8b.bash [MODEL_PATH]
bash scripts/teal/phi_4_14b.bash [MODEL_PATH]

TEAL-transform Sparsification

bash scripts/teal_transform/qwen_2.5_7b.bash [MODEL_PATH]
bash scripts/teal_transform/llama_2_7b.bash [MODEL_PATH]
bash scripts/teal_transform/llama_3_8b.bash [MODEL_PATH]
bash scripts/teal_transform/phi_4_14b.bash [MODEL_PATH]

Citation

If you find this repo useful, please consider citing:

@misc{chen2025winaweightinformedneuron,
      title={WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference}, 
      author={Sihan Chen and Dan Zhao and Jongwoo Ko and Colby Banbury and Huiping Zhuang and Luming Liang and Tianyi Chen},
      year={2025},
      eprint={2505.19427},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.19427}, 
}

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
figures		figures
lm_eval		lm_eval
scripts		scripts
utils		utils
wina		wina
README.md		README.md
SECURITY.md		SECURITY.md
eval.py		eval.py
lm_eval_harness.py		lm_eval_harness.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference

Contents

Install

Sparsity Allocation

Evaluation

Reproduction

Dense model

WINA Sparsification

TEAL Sparsification

TEAL-transform Sparsification

Citation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

microsoft/wina

Folders and files

Latest commit

History

Repository files navigation

WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference

Contents

Install

Sparsity Allocation

Evaluation

Reproduction

Dense model

WINA Sparsification

TEAL Sparsification

TEAL-transform Sparsification

Citation

About

Resources

Code of conduct

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Packages