A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Jun 15, 2025 - Python
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
🎨 ComfyUI standalone pack for Intel GPUs.
Purplecoin/XPU Core integration/staging tree
Sentiment classification app built with RoBERTa and optimized with Intel OpenVINO for deployment on Intel XPU devices. The project demonstrates how a large language model can be fine-tuned and accelerated for real-time inference, making it well suited to low-latency AI applications.