inference
Here are 45 public repositories matching this topic...
MONAI Deploy aims to become the de facto standard for developing, packaging, testing, deploying, and running medical AI applications in clinical production.
Updated Mar 26, 2025 - Shell
Create, list, update, and delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference examples.
Updated Apr 20, 2026 - Shell
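This repo's own scripts aren't reproduced here, but the create/list/update/delete lifecycle it describes maps onto the standard eksctl CLI. A minimal sketch, assuming eksctl is installed and AWS credentials are configured; the cluster name, region, node group name, and node counts are illustrative:

```sh
# Create a small cluster, list clusters, scale a node group, then tear down.
eksctl create cluster --name demo-inference --region us-west-2 --nodes 2
eksctl get cluster --region us-west-2
eksctl scale nodegroup --cluster demo-inference --region us-west-2 \
  --name ng-1 --nodes 4
eksctl delete cluster --name demo-inference --region us-west-2
```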
[ARCHIVED 2026-04-20: stampby retired; see bong-water-water-bong] No longer actively developed.
Updated Apr 20, 2026 - Shell
Community benchmark database for running LLMs on Apple Silicon Macs
Updated Apr 20, 2026 - Shell
Edge Insights for Vision (eiv) is a package that helps auto-install Intel® GPU drivers and set up the environment for inference application development with the OpenVINO™ toolkit.
Updated Sep 29, 2025 - Shell
Docker image for a self-hosted Whisper speech-to-text server with an OpenAI-compatible transcription API. Powered by faster-whisper. Supports all Whisper models, JSON/SRT/VTT output, SSE streaming, offline mode, and multi-arch (amd64, arm64).
Updated Apr 22, 2026 - Shell
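Because the transcription endpoint is OpenAI-compatible, a standard client call should work against the container. A minimal sketch, assuming the server listens on localhost:8000; the port and filename are illustrative:

```sh
# The form fields (file, model, response_format) follow the standard OpenAI
# transcription API; response_format=srt selects one of the advertised
# output formats (JSON/SRT/VTT).
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@meeting.wav \
  -F model=whisper-1 \
  -F response_format=srt
```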
The definitive Strix Halo LLM guide — 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
Updated Mar 21, 2026 - Shell
Set up and run OpenVINO in a Dockerized Ubuntu environment on an Intel CPU with integrated graphics.
Updated Apr 10, 2019 - Shell
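For the integrated-graphics case, the key Docker detail is passing the iGPU device node into the container. A minimal sketch, assuming one of the official OpenVINO dev images; the image tag is illustrative and not necessarily the one this repo uses:

```sh
# --device /dev/dri exposes the Intel integrated GPU to the container so
# OpenVINO can target the "GPU" device in addition to the CPU.
docker run -it --rm --device /dev/dri openvino/ubuntu18_dev:latest bash
```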
🚀 The Ultimate Curated List of LLMOps Tools, Frameworks, and Resources - A comprehensive collection of the best tools for Large Language Model Operations
Updated Jan 12, 2026 - Shell
The Private AI Setup Dream Guide for Demos automates the installation of the software needed for a local private AI setup, utilizing AI models (LLMs and diffusion models) for use cases such as general assistance, business ideas, coding, image generation, systems administration, marketing, planning, and more.
Updated Dec 20, 2025 - Shell
This project demonstrates a real-time AI "Meeting Coach" showcasing the use of Confluent Cloud for Apache Flink AI Inference functions to build a real-time Retrieval-Augmented Generation (RAG) pipeline. The demo uses both a static knowledge base of sales documents and real-time simulated meeting data.
Updated Apr 22, 2026 - Shell
Running Llama 3.1 8B and other LLMs on RK3588 NPU - benchmarks and setup guides
Updated Apr 9, 2026 - Shell
Self-hosted AI gateway. 82 models across 12 providers, free-tier-first routing with automatic fallback. Local CPU inference, transcription, and TTS. Models autonomously browse the web (stealth), run agentic Claude Code, and manage object storage via MCP. Expose publicly through Cloudflare Tunnel.
Updated Apr 21, 2026 - Shell
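The "expose publicly through Cloudflare Tunnel" step can be reproduced generically. A minimal sketch, assuming the gateway listens on localhost:8080; the port is an assumption, and a named, account-backed tunnel would be the choice for anything permanent:

```sh
# A quick tunnel: cloudflared prints a public trycloudflare.com URL that
# forwards to the local gateway, with no DNS or account setup required.
cloudflared tunnel --url http://localhost:8080
```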
Rust ONNX Runtime inference for YOLO and other models.
Updated Mar 30, 2026 - Shell
The K3ai plugins repo is where we host all the optional capabilities of k3ai. The main goal of the repo is to keep k3ai simple and lightweight while adding capabilities in the form of manifests or Helm charts.
Updated Nov 2, 2021 - Shell
🦄 Distributed Inference on Kubernetes with DRA and MIG
Updated Sep 21, 2024 - Shell
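For the MIG half of that setup, a pod consumes a GPU slice through an extended resource name. A minimal sketch using the NVIDIA device plugin's mixed-strategy resource names rather than the repo's DRA-based flow; the pod name, image, and slice size are illustrative:

```sh
# Requests one 1g.5gb MIG slice; the scheduler places the pod on a node
# advertising that resource, and nvidia-smi -L lists the visible slice.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference-demo
spec:
  containers:
  - name: worker
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1
EOF
```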