Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized fp8 matmul; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
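As an illustration of the fp8 trick this entry refers to: a minimal sketch, assuming PyTorch >= 2.1 (for the torch.float8_e4m3fn dtype), of storing a weight in fp8 with a per-tensor scale and dequantizing into half precision for the matmul. The function names and the per-tensor scaling scheme are illustrative, not the repo's actual code.

```python
import torch

def quantize_fp8(w: torch.Tensor):
    """Illustrative: scale a weight into the fp8 e4m3 range and cast it.

    Returns the fp8 tensor plus the per-tensor scale needed to undo it.
    """
    fp8_max = torch.finfo(torch.float8_e4m3fn).max      # 448.0 for e4m3
    scale = w.abs().max().clamp(min=1e-12) / fp8_max    # per-tensor scale (a simplification)
    return (w / scale).to(torch.float8_e4m3fn), scale

def fp8_linear(x: torch.Tensor, w_fp8: torch.Tensor, scale: torch.Tensor):
    """Matmul with an fp8-stored weight, computed in half precision."""
    w = w_fp8.to(torch.float16) * scale                 # dequantize on the fly
    return x.to(torch.float16) @ w.t()

x = torch.randn(4, 64)
w = torch.randn(128, 64)
w_fp8, s = quantize_fp8(w)
y = fp8_linear(x, w_fp8, s)
print(y.shape, y.dtype)  # torch.Size([4, 128]) torch.float16
```

Real fp8 kernels (e.g. scaled-mm on Hopper/Ada GPUs) multiply directly in fp8 rather than dequantizing first; the sketch only shows the storage-plus-scale idea.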
[NeurIPS'23] Speculative Decoding with Big Little Decoder
This is the official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models"
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
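For readers new to the technique several of these entries build on: a minimal draft-then-verify sketch in PyTorch, loosely following Leviathan et al. The target_model and draft_model callables (mapping a (1, T) token tensor to (1, T, vocab) logits) are assumed interfaces, not any repo's API, and this shows only the greedy special case; the paper's full method uses rejection sampling over the two models' probabilities.

```python
import torch

@torch.no_grad()
def speculative_greedy_step(target_model, draft_model, ids: torch.Tensor, k: int = 4):
    """One draft-then-verify step (greedy variant of speculative decoding)."""
    # 1. Draft: the small model proposes k tokens autoregressively (greedy).
    draft_ids = ids
    for _ in range(k):
        nxt = draft_model(draft_ids)[:, -1].argmax(-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, nxt], dim=-1)

    # 2. Verify: ONE target forward pass scores all k proposals at once.
    greedy = target_model(draft_ids).argmax(-1)        # target's choice after each prefix
    target_next = greedy[:, ids.shape[1] - 1:-1]       # its predictions for the k drafted slots
    proposed = draft_ids[:, ids.shape[1]:]

    # 3. Accept the longest prefix where target and draft agree, then take
    #    one "free" token from the target at the first disagreement.
    n_ok = int((target_next == proposed).long().cumprod(-1).sum())
    accepted = proposed[:, :n_ok]
    bonus = greedy[:, ids.shape[1] - 1 + n_ok : ids.shape[1] + n_ok]
    return torch.cat([ids, accepted, bonus], dim=-1)

# Toy usage: a deterministic "model" (an embedding lookup posing as logits),
# used as both draft and target, so every drafted token is accepted.
vocab = 100
emb = torch.randn(vocab, vocab)
fake_model = lambda ids: emb[ids]
out = speculative_greedy_step(fake_model, fake_model, torch.tensor([[1, 2, 3]]), k=4)
print(out.shape)  # torch.Size([1, 8]): 3 prompt + 4 accepted + 1 bonus token
```

The speedup comes from step 2: when the draft agrees with the target, up to k + 1 tokens are emitted for a single forward pass of the large model.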
Demo code for CVPR2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers"
Fast Forward-Only Deep Neural Network Library for the Nao Robots
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Verification of the effectiveness of speculative decoding on Japanese text.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder