Georgia Institute of Technology
- ATL, GA
- in/lxaw
- https://lxaw.github.io/index.html
Stars
Everything about the SmolLM2 and SmolVLM family of models
[ICLR 2025] SDTT: a simple and effective distillation method for discrete diffusion models
Fast inference from large language models via speculative decoding
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
A high-throughput and memory-efficient inference and serving engine for LLMs
Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text"
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Official PyTorch implementation for "Large Language Diffusion Models"
[ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length
[ICML 2024] CLLMs: Consistency Large Language Models
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
Best practices for distilling large language models.
Development repository for the Triton language and compiler
[CVPR 2025] Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
ConceptAttention: A method for interpreting multi-modal diffusion transformers.
Helpful tools and examples for working with flex-attention
Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
Some conferences' accepted paper lists (including AI, ML, and Robotics)
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & V…
Examples and guides for using the OpenAI API
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.