A high-performance inference system for large language models, designed for production environments.
Port of OpenAI's Whisper model in C/C++
🍅🍅🍅 YOLOv5-Lite: evolved from YOLOv5; the model is only 900+ KB (int8) or 1.7 MB (fp16) and reaches 15 FPS on a Raspberry Pi 4B
Explore LLM model deployment based on AXera's AI chips
Whisper Dart is a cross-platform library for Dart and Flutter that converts audio to text (speech-to-text) using OpenAI's Whisper models
Transformer-related optimizations, including BERT and GPT
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
A Fast Neural Machine Translation System developed in C++.
An Implementation of Transformer (Attention Is All You Need) in DyNet
Fast tensor-packaging library for text, image, video, and audio data, compatible with PyTorch, TensorFlow, and NumPy 🖼️🎵🎥 ➡️ 🧠
Tuatara: Deep Learning OCR Engine
A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.
LightSeq: A High Performance Library for Sequence Processing and Generation
Deploy LSTR, a Transformer-based end-to-end real-time lane line detection model, with ONNXRuntime; includes both C++ and Python versions of the program
Final semester project in electrical engineering
An open-source implementation of a sequence-to-sequence speech processing engine
Running BERT without Padding
Converts common data formats from one type to another, such as JSON, XML, and datasets