An optimized implementation of the Kimi Linear architecture - a hybrid linear attention mechanism outperforming traditional full attention.
Updated May 19, 2026 - Python
A project to build GPU acceleration for LLaMA models on local computers and AWS, leveraging GPU resources for efficient inference and training.
AdaAttn is a GPU-native attention mechanism that dynamically adapts both numerical precision and matrix rank at runtime, reducing memory bandwidth and computational overhead in large language models without sacrificing model quality. It does so by aligning linear algebra operations with modern GPU hardware characteristics.
Air-gapped, on-prem LLM assistant for software engineering teams. No external network calls. Full audit trail. RBAC + OIDC/LDAP auth.
A comprehensive implementation of NVIDIA NeMo Guardrails for AI safety and responsible AI development.
This project implements PDDL-INSTRUCT with Logical Chain-of-Thought (LCoT), a novel approach to improve Large Language Model (LLM) performance on automated planning tasks. The system enhances planning capabilities through:
CSNePS Knowledge Graph Service is a production-ready enterprise system that bridges symbolic AI reasoning with modern ontology engineering. The system combines CSNePS (Cognitive Systems for Natural language Processing and Structured information) - a powerful semantic network reasoning engine - with comprehensive OWL ontology support, advanced graph