[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Updated Jun 17, 2025 - Jupyter Notebook