I work for myself
- HangZhou
- https://chenghuawang.github.io/keep-moving-forward/
Lists (10)
👀 AI
📖 Learning
☄️ Compile & Building
compiler tools. building tools.
⛏️ Computer Graphics
📚 Database
Databases.
🍇 Desktop App toolchain
🌱 Distributed Sys
Distributed system
🌟 MLSys
machine learning system. deep learning framework.
Starred repositories
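Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for paper reading, polishing, and writing. Modular design; supports custom shortcut buttons and function plugins; supports project analysis and self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…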
Touying is a powerful package for creating presentation slides in Typst.
nanobind: tiny and efficient C++/Python bindings
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
A lightweight data processing framework built on DuckDB and 3FS.
Wan: Open and Advanced Large-Scale Video Generative Models
Analyze computation-communication overlap in V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
Benchmark code for the "Online normalizer calculation for softmax" paper
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
A template for modern C++ projects using CMake, Clang-Format, CI, unit testing and more, with support for downstream inclusion.
Witness the aha moment of VLM with less than $3.
A fork to add multimodal model training to open-r1
Fully open data curation for reasoning models
My learning notes/codes for ML SYS.
Code for the NeurIPS'24 paper QuaRot: end-to-end 4-bit inference of large language models.
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.
A fluent design widgets library based on C++ Qt/PyQt/PySide. Make Qt Great Again.
🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.
[CVPR'25] RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
Fully open reproduction of DeepSeek-R1