Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
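As an illustration of the fp8 trick this entry refers to: a minimal sketch, assuming PyTorch >= 2.1 (for the torch.float8_e4m3fn dtype), of storing a weight in fp8 with a per-tensor scale and dequantizing into half precision for the matmul. The function names and the per-tensor scaling scheme are illustrative, not the repo's actual code.

```python
import torch

def quantize_fp8(w: torch.Tensor):
    """Illustrative: scale a weight into the fp8 e4m3 range and cast it.

    Returns the fp8 tensor plus the per-tensor scale needed to undo it.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max      # 448.0 for e4m3
    scale = w.abs().max().clamp(min=1e-12) / fp8_max    # per-tensor scale (a simplification)
    return (w / scale).to(torch.float8_e4m3fn), scale

def fp8_linear(x: torch.Tensor, w_fp8: torch.Tensor, scale: torch.Tensor):
    """Matmul with an fp8-stored weight, computed in half precision."""
    w = w_fp8.to(torch.float16) * scale                 # dequantize on the fly
    return x.to(torch.float16) @ w.t()

x = torch.randn(4, 64)
w = torch.randn(128, 64)
w_fp8, s = quantize_fp8(w)
y = fp8_linear(x, w_fp8, s)
print(y.shape, y.dtype)  # torch.Size([4, 128]) torch.float16
```

Real fp8 kernels (e.g. scaled-mm on Hopper/Ada GPUs) multiply directly in fp8 rather than dequantizing first; the sketch only shows the storage-plus-scale idea.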
[NeurIPS'23] Speculative Decoding with Big Little Decoder
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
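For readers new to the technique several of these entries build on: a minimal draft-then-verify sketch in PyTorch, loosely following Leviathan et al. The target_model and draft_model callables (mapping a (1, T) token tensor to (1, T, vocab) logits) are assumed interfaces, not any repo's API, and this shows only the greedy special case; the paper's full method uses rejection sampling over the two models' probabilities.

```python
import torch

@torch.no_grad()
def speculative_greedy_step(target_model, draft_model, ids: torch.Tensor, k: int = 4):
    """One draft-then-verify step (greedy variant of speculative decoding)."""
    # 1. Draft: the small model proposes k tokens autoregressively (greedy).
    draft_ids = ids
    for _ in range(k):
        nxt = draft_model(draft_ids)[:, -1].argmax(-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, nxt], dim=-1)

    # 2. Verify: ONE target forward pass scores all k proposals at once.
    greedy = target_model(draft_ids).argmax(-1)        # target's choice after each prefix
    target_next = greedy[:, ids.shape[1] - 1:-1]       # its predictions for the k drafted slots
    proposed = draft_ids[:, ids.shape[1]:]

    # 3. Accept the longest prefix where target and draft agree, then take
    #    one "free" token from the target at the first disagreement.
    n_ok = int((target_next == proposed).long().cumprod(-1).sum())
    accepted = proposed[:, :n_ok]
    bonus = greedy[:, ids.shape[1] - 1 + n_ok : ids.shape[1] + n_ok]
    return torch.cat([ids, accepted, bonus], dim=-1)

# Toy usage: a deterministic "model" (an embedding lookup posing as logits),
# used as both draft and target, so every drafted token is accepted.
vocab = 100
emb = torch.randn(vocab, vocab)
fake_model = lambda ids: emb[ids]
out = speculative_greedy_step(fake_model, fake_model, torch.tensor([[1, 2, 3]]), k=4)
print(out.shape)  # torch.Size([1, 8]): 3 prompt + 4 accepted + 1 bonus token
```

The speedup comes from step 2: when the draft agrees with the target, up to k + 1 tokens are emitted for a single forward pass of the large model.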
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Fast Forward-Only Deep Neural Network Library for the Nao Robots
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Verification of the effectiveness of speculative decoding on Japanese text.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder