Starred repositories

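A practical interactive interface for GPT, GLM, and other large language models, specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins; project analysis and self-explanation for Python, C++, and other codebases; PDF/LaTeX paper translation and summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, m…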

Python 67,835 8,319 Updated Mar 8, 2025

Touying is a powerful package for creating presentation slides in Typst.

Typst 1,139 31 Updated Mar 5, 2025

nanobind: tiny and efficient C++/Python bindings

C++ 2,635 220 Updated Mar 2, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 7,738 685 Updated Mar 8, 2025

A lightweight data processing framework built on DuckDB and 3FS.

Python 4,028 336 Updated Mar 5, 2025

Wan: Open and Advanced Large-Scale Video Generative Models

Python 7,713 789 Updated Mar 7, 2025

Analyze computation-communication overlap in V3/R1.

899 116 Updated Mar 3, 2025

Expert Parallelism Load Balancer

Python 1,040 151 Updated Feb 27, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,542 246 Updated Mar 5, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 4,857 474 Updated Mar 10, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 7,093 612 Updated Mar 6, 2025

Benchmark code for the "Online normalizer calculation for softmax" paper

Cuda 85 7 Updated Jul 27, 2018

FlashMLA: Efficient MLA decoding kernels

C++ 11,219 785 Updated Mar 1, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 610 40 Updated Mar 9, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

6,700 195 Updated Mar 4, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,629 93 Updated Mar 7, 2025

A template for modern C++ projects using CMake, Clang-Format, CI, unit testing and more, with support for downstream inclusion.

CMake 1,782 220 Updated Mar 16, 2024

Witness the aha moment of VLM with less than $3.

Python 3,099 242 Updated Mar 1, 2025

A fork to add multimodal model training to open-r1

Python 998 51 Updated Feb 8, 2025

Fully open data curation for reasoning models

Python 1,477 126 Updated Feb 23, 2025

My learning notes/codes for ML SYS.

Python 1,335 69 Updated Mar 8, 2025

s1: Simple test-time scaling

Python 5,899 678 Updated Mar 6, 2025

Code for the NeurIPS 2024 paper: QuaRot, end-to-end 4-bit inference of large language models.

Python 352 32 Updated Nov 26, 2024

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python 1,034 125 Updated Apr 17, 2024

📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.

Cuda 129 5 Updated Mar 5, 2025

A fluent design widgets library based on C++ Qt/PyQt/PySide. Make Qt Great Again.

Python 6,288 609 Updated Mar 9, 2025

🌾 OAT: A research-friendly framework for LLM online alignment, including preference learning, reinforcement learning, etc.

Python 213 12 Updated Feb 24, 2025

[CVPR'25] RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness

Python 305 12 Updated Mar 4, 2025

Fully open reproduction of DeepSeek-R1

Python 22,431 2,011 Updated Mar 9, 2025