Spyral AI
- London, UK
- https://www.spyral.ai
- @rob_clucas
Stars
A web-based graphical editor for ZMK keymaps.
Efficient and optimized tokenizer engine for LLM inference serving
A datacenter-scale distributed inference serving framework
[COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
A collection of 500+ real-world ML & LLM system design case studies from 100+ companies. Learn how top tech firms implement GenAI in production.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
aider is AI pair programming in your terminal
✨ AI-powered coding, seamlessly in Neovim
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A lightweight data processing framework built on DuckDB and 3FS.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A high-performance and efficient message queue developed in Rust
Reward-guided Speculative Decoding (RSD) for efficiency and effectiveness.
FlashInfer: Kernel Library for LLM Serving
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
Experiments on speculative sampling with Llama models
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
Caliptra IP and firmware for an integrated Root of Trust block