Maryam Aliakbarpour · Vladimir Braverman · Junze Yin · Haochen Zhang
AISTATS 2026 Spotlight
📄 Paper: Support Basis: Fast Attention Beyond Bounded Entries
🔗 Code: https://github.com/yinj66/support_basis
Softmax attention has quadratic complexity in the sequence length `n`, which limits the scalability of large language models (LLMs). Prior sub-quadratic algorithms rely on the bounded-entry assumption, which rarely holds in practice.
This repository implements Support-Basis Decomposition, a new framework that:
- Removes the bounded-entry assumption
- Achieves sub-quadratic runtime
- Matches the error guarantees of prior work
The repository includes:
- 🔬 Computational efficiency experiments
- 📊 Entry distribution visualizations
- 🧠 Downstream benchmark evaluations
- 🤖 Modified LLaDA model implementation
```
├── Computational efficiency experiment
│   └── supp_basis_optimization.py
├── Downstream task experiment
│   ├── generate.py
│   ├── run_all.sh
│   ├── run_chebyshev_ARC_challenge.py
│   ├── run_chebyshev_ARC_easy.py
│   ├── run_chebyshev_GPQA.py
│   ├── run_chebyshev_hellaswag.py
│   └── run_chebyshev_llmu.py
├── Entry distributions
│   ├── opt_tinyllama_phi.py
│   ├── visualize_llada.py
│   └── visualize_tiny_llama.py
├── LLaDA-8B-Instruct-B&S-light
│   ├── modeling_llada.py
│   ├── configuration_llada.py
│   ├── config.json
│   └── ...
└── README.md
```
File: `Computational efficiency experiment/supp_basis_optimization.py`
This script implements:
- Exact attention
- Chebyshev polynomial approximation
- Hybrid support-basis attention
- Runtime comparison
To run:

```bash
cd "Computational efficiency experiment"
python supp_basis_optimization.py
```
You may modify:

- `N` (sequence length)
- `d` (hidden dimension)
- `DEGREE` (Chebyshev polynomial degree)
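For intuition, the Chebyshev branch can be sketched in a few lines of NumPy. This is an illustrative toy, not the repository's implementation; `cheb_exp` and all constants here are made up. The idea: fit a degree-`DEGREE` Chebyshev polynomial to `exp` on a bounded interval, then evaluate it entrywise on a score matrix whose entries stay inside that interval.

```python
import numpy as np
from numpy.polynomial import chebyshev as C

def cheb_exp(degree, bound=1.0):
    """Return an entrywise Chebyshev approximation of exp on [-bound, bound]."""
    k = np.arange(degree + 1)
    # Chebyshev nodes of the first kind, scaled to [-bound, bound]
    nodes = bound * np.cos((2 * k + 1) * np.pi / (2 * (degree + 1)))
    coeffs = C.chebfit(nodes / bound, np.exp(nodes), degree)
    return lambda x: C.chebval(np.asarray(x) / bound, coeffs)

rng = np.random.default_rng(0)
A = rng.uniform(-1.0, 1.0, size=(4, 4))   # toy score matrix with bounded entries
approx = cheb_exp(degree=6)(A)
err = np.max(np.abs(approx - np.exp(A)))  # small uniform error at low degree
```

A degree-6 fit already approximates `exp` on `[-1, 1]` to well below `1e-4`, which is why a low-degree polynomial suffices once the large entries have been split off.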
We empirically validate that the entries of Q and K are sub-Gaussian.
Supported models:
- TinyLlama/TinyLlama-1.1B-Chat-v1.0
- microsoft/phi-2
- GSAI-ML/LLaDA-8B-Base
- facebook/opt-1.3b
Scripts:

```
Entry distributions/
├── opt_tinyllama_phi.py
├── visualize_llada.py
└── visualize_tiny_llama.py
```
Each script:
- Extracts Q/K entries
- Plots entry distributions
- Draws ±√(log n) thresholds
Example:

```bash
cd "Entry distributions"
python visualize_tiny_llama.py
```
These experiments support the theoretical assumption that large entries are rare.
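The check these scripts perform can be mimicked on synthetic data. In this sketch, Gaussian samples stand in for real Q/K activations (the real scripts extract them from the models above); we simply count the fraction of entries falling outside the ±√(log n) band.

```python
import numpy as np

n = 4096                                      # toy "sequence length"
rng = np.random.default_rng(1)
entries = rng.normal(0.0, 1.0, size=100_000)  # stand-in for flattened Q/K entries

tau = np.sqrt(np.log(n))                      # the +/- sqrt(log n) threshold
frac_large = np.mean(np.abs(entries) > tau)   # fraction outside the band
```

For sub-Gaussian entries this fraction decays polynomially in `n`, which is what makes the sparse "large-entry" support cheap to handle exactly.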
Directory: `Downstream task experiment/`

We evaluate hybrid Chebyshev attention on multiple benchmarks:
- ARC-Challenge
- ARC-Easy
- GPQA
- HellaSwag
- MMLU
Each script modifies the LLaDA model configuration:
```python
model.config.hybrid_exact_ratio = args.exact_ratio
model.config.hybrid_chebyshev_degree = args.chebyshev_degree
```
To run a single benchmark:

```bash
python run_chebyshev_ARC_challenge.py \
    --model_name "path/to/LLaDA-8B-Instruct-B&S" \
    --exact_ratio 0.2 \
    --chebyshev_degree 4
```
To run everything:

```bash
bash run_all.sh
```
This runs:
- Chebyshev degrees 4 and 6
- Exact ratio = 0.2
- All downstream tasks
| Parameter | Meaning |
|---|---|
| `exact_ratio` | Fraction of entries computed exactly |
| `chebyshev_degree` | Polynomial degree for the dense part |
| `steps` | Reverse diffusion steps |
| `block_length` | Semi-autoregressive block size |
Directory: `LLaDA-8B-Instruct-B&S-light/`
Contains:
- Modified attention implementation
- Hybrid support-basis logic
- Config hooks
Key parameters:
```python
model.config.hybrid_exact_ratio
model.config.hybrid_chebyshev_degree
```
These control:
- Single-threshold support basis
- Polynomial degree
- Exact computation sparsity
1. Split Q and K into:
   - Small entries (dense)
   - Large entries (sparse)
2. Compute:
   - Exact attention on the sparse support
   - Polynomial approximation on the dense support
3. Combine via:

   $$ \exp(A^{(s)}) + \exp(A^{(L)}) - \mathbf{1}_{n \times n} $$
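The three steps above can be sketched end to end in NumPy. This is a toy illustration with a made-up threshold and degree, computed densely for readability, not the repository's optimized implementation:

```python
import numpy as np
from numpy.polynomial import chebyshev as C

rng = np.random.default_rng(2)
n, tau, deg = 64, 1.0, 8
A = rng.normal(0.0, 0.7, size=(n, n))        # toy score matrix (Q K^T / sqrt(d))

# 1) split by magnitude: small entries -> dense part, large entries -> sparse part
small = np.where(np.abs(A) <= tau, A, 0.0)
large_mask = np.abs(A) > tau

# 2a) dense part: low-degree Chebyshev approximation of exp on [-tau, tau]
k = np.arange(deg + 1)
nodes = tau * np.cos((2 * k + 1) * np.pi / (2 * (deg + 1)))
coeffs = C.chebfit(nodes / tau, np.exp(nodes), deg)
exp_small = C.chebval(small / tau, coeffs)
# 2b) sparse part: exact exp on the rare large entries, identity (= 1) elsewhere
exp_large = np.where(large_mask, np.exp(A), 1.0)

# 3) combine: exp(A^(s)) + exp(A^(L)) - 1_{n x n}  (each entry lives in one part)
approx = exp_small + exp_large - np.ones((n, n))
err = np.max(np.abs(approx - np.exp(A)))
```

Here the "sparse" part is materialized densely for clarity; the point of the decomposition is that in practice the large-entry support is sparse enough to handle exactly, while the bounded remainder admits a low-degree polynomial.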
Install:

```bash
pip install torch transformers datasets scipy matplotlib tqdm
```
For GPU experiments:
- CUDA 11+
- PyTorch ≥ 2.0 recommended
- `Computational efficiency experiment/`
- `Entry distributions/`
- `Downstream task experiment/`
If you use this work, please cite:
```bibtex
@inproceedings{aliakbarpour2026support,
  title={Support Basis: Fast Attention Beyond Bounded Entries},
  author={Maryam Aliakbarpour and Vladimir Braverman and Junze Yin and Haochen Zhang},
  booktitle={The 29th International Conference on Artificial Intelligence and Statistics},
  year={2026},
  url={https://openreview.net/forum?id=IgpnZIGFsD}
}
```