g023's TurboXInf 🚀: 2x+ faster inference for Qwen3-1.77B or Qwen3.5-2B on an RTX 3060. Custom Triton INT8 GEMV kernels fuse dequantization into the matrix-vector product, halving memory traffic, and are paired with torch.compile. Reaches 113 tok/s (vs. a 56.4 tok/s baseline) with no quality loss at INT8; INT4 gives even better results. MIT License.
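Why fusing dequantization halves memory traffic: single-token decoding is typically a matrix-vector product (GEMV) whose speed is bound by how many weight bytes are read. Weights stored as INT8 (1 byte each) instead of a 16-bit format (2 bytes each) cut those reads roughly in half, but only if the kernel dequantizes on the fly rather than first writing a full-precision copy of the weights back to memory. The pure-Python sketch below assumes symmetric per-row INT8 quantization and a 16-bit baseline. The function names are hypothetical, and the code illustrates the arithmetic only; it is not the project's Triton kernel or its actual quantization scheme.

```python
# Hypothetical illustration of a fused-dequantization INT8 GEMV.
# Assumes symmetric per-row quantization; NOT the TurboXInf Triton kernel.
from typing import List, Tuple


def quantize_rows_int8(w: List[List[float]]) -> Tuple[List[List[int]], List[float]]:
    """Per-row symmetric INT8 quantization: w[i][j] ~= q[i][j] * scales[i]."""
    q, scales = [], []
    for row in w:
        max_abs = max(abs(v) for v in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q, scales


def gemv_int8_fused(q: List[List[int]], scales: List[float], x: List[float]) -> List[float]:
    """y = W @ x computed directly from INT8 weights.

    Dequantization is folded into the dot product: each INT8 weight is read
    once and the row scale is applied once to the accumulated sum, so no
    dequantized floating-point copy of W is ever materialized.
    """
    y = []
    for row, s in zip(q, scales):
        acc = 0.0
        for q_ij, x_j in zip(row, x):
            acc += q_ij * x_j  # INT8 weight used as-is
        y.append(acc * s)      # one scale multiply per output element
    return y


def weight_bytes(rows: int, cols: int, bytes_per_elem: int, scale_bytes_per_row: int = 0) -> int:
    """Bytes a GEMV must read for the weight matrix (plus optional per-row scales)."""
    return rows * cols * bytes_per_elem + rows * scale_bytes_per_row


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -2.0]]
    x = [1.0, 2.0, 3.0]
    q, scales = quantize_rows_int8(w)
    print(gemv_int8_fused(q, scales, x))  # close to the exact [-0.75, -4.0]
    fp16 = weight_bytes(4096, 4096, 2)    # 16-bit weights
    int8 = weight_bytes(4096, 4096, 1, 2) # INT8 weights + 16-bit per-row scales
    print(int8 / fp16)                    # roughly 0.5
```

For a 4096×4096 layer, 16-bit weights take 33,554,432 bytes, while INT8 weights plus one 16-bit scale per row take 16,785,408 bytes, just over half. Applying the scale once per output row, instead of once per weight, keeps the extra compute negligible next to the bytes saved.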