inference
Here are 45 public repositories matching this topic...
MONAI Deploy aims to become the de facto standard for developing, packaging, testing, deploying, and running medical AI applications in clinical production.
Updated Mar 26, 2025 - Shell
Create, list, update, and delete Amazon EKS clusters. Deploy and manage software on EKS. Run distributed model training and inference examples.
Updated Apr 20, 2026 - Shell
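This repo's own scripts aren't reproduced here, but the create/list/update/delete lifecycle it describes maps onto the standard eksctl CLI. A minimal sketch, assuming eksctl is installed and AWS credentials are configured; the cluster name, region, node group name, and node counts are illustrative:

```sh
# Create a small cluster, list clusters, scale a node group, then tear down.
eksctl create cluster --name demo-inference --region us-west-2 --nodes 2
eksctl get cluster --region us-west-2
eksctl scale nodegroup --cluster demo-inference --region us-west-2 \
  --name ng-1 --nodes 4
eksctl delete cluster --name demo-inference --region us-west-2
```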
[ARCHIVED 2026-04-20: stampby retired; see bong-water-water-bong] No longer actively developed.
Updated Apr 20, 2026 - Shell
Community benchmark database for running LLMs on Apple Silicon Macs
Updated Apr 20, 2026 - Shell
Edge Insights for Vision (eiv) is a package that helps auto-install Intel® GPU drivers and set up the environment for inference application development with the OpenVINO™ toolkit.
Updated Sep 29, 2025 - Shell
Docker image for a self-hosted Whisper speech-to-text server with an OpenAI-compatible transcription API. Powered by faster-whisper. Supports all Whisper models, JSON/SRT/VTT output, SSE streaming, offline mode, and multi-arch (amd64, arm64).
Updated Apr 22, 2026 - Shell
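Because the transcription endpoint is OpenAI-compatible, a standard client call should work against the container. A minimal sketch, assuming the server listens on localhost:8000; the port and filename are illustrative:

```sh
# The form fields (file, model, response_format) follow the standard OpenAI
# transcription API; response_format=srt selects one of the advertised
# output formats (JSON/SRT/VTT).
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@meeting.wav \
  -F model=whisper-1 \
  -F response_format=srt
```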
The definitive Strix Halo LLM guide — 65 t/s on a $2,999 mini PC. Live benchmarks, tested optimizations, and everything that doesn't work.
Updated Mar 21, 2026 - Shell
Set up and run OpenVINO in a Dockerized Ubuntu environment on an Intel CPU with integrated graphics.
Updated Apr 10, 2019 - Shell
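For the integrated-graphics case, the key Docker detail is passing the iGPU device node into the container. A minimal sketch, assuming one of the official OpenVINO dev images; the image tag is illustrative and not necessarily the one this repo uses:

```sh
# --device /dev/dri exposes the Intel integrated GPU to the container so
# OpenVINO can target the "GPU" device in addition to the CPU.
docker run -it --rm --device /dev/dri openvino/ubuntu18_dev:latest bash
```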
🚀 The Ultimate Curated List of LLMOps Tools, Frameworks, and Resources - A comprehensive collection of the best tools for Large Language Model Operations
Updated Jan 12, 2026 - Shell
The Private AI Setup Dream Guide for Demos automates the installation of the software needed for a local private AI setup, utilizing AI models (LLMs and diffusion models) for use cases such as general assistance, business ideas, coding, image generation, systems administration, marketing, planning, and more.
Updated Dec 20, 2025 - Shell
This project demonstrates a real-time AI "Meeting Coach" showcasing the use of Confluent Cloud for Apache Flink AI Inference functions to build a real-time Retrieval-Augmented Generation (RAG) pipeline. The demo uses both a static knowledge base of sales documents and real-time simulated meeting data.
Updated Apr 22, 2026 - Shell
Running Llama 3.1 8B and other LLMs on RK3588 NPU - benchmarks and setup guides
Updated Apr 9, 2026 - Shell
Self-hosted AI gateway. 82 models across 12 providers, free-tier-first routing with automatic fallback. Local CPU inference, transcription, and TTS. Models autonomously browse the web (stealth), run agentic Claude Code, and manage object storage via MCP. Expose publicly through Cloudflare Tunnel.
Updated Apr 21, 2026 - Shell
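The "expose publicly through Cloudflare Tunnel" step can be reproduced generically. A minimal sketch, assuming the gateway listens on localhost:8080; the port is an assumption, and a named, account-backed tunnel would be the choice for anything permanent:

```sh
# A quick tunnel: cloudflared prints a public trycloudflare.com URL that
# forwards to the local gateway, with no DNS or account setup required.
cloudflared tunnel --url http://localhost:8080
```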
Rust ONNX Runtime inference for YOLO and other models.
Updated Mar 30, 2026 - Shell
The K3ai plugins repo is where we host all the optional capabilities of k3ai. The main goal of the repo is to keep k3ai simple and lightweight while adding capabilities in the form of manifests or Helm charts.
Updated Nov 2, 2021 - Shell
🦄 Distributed Inference on Kubernetes with DRA and MIG
Updated Sep 21, 2024 - Shell
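For the MIG half of that setup, a pod consumes a GPU slice through an extended resource name. A minimal sketch using the NVIDIA device plugin's mixed-strategy resource names rather than the repo's DRA-based flow; the pod name, image, and slice size are illustrative:

```sh
# Requests one 1g.5gb MIG slice; the scheduler places the pod on a node
# advertising that resource, and nvidia-smi -L lists the visible slice.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference-demo
spec:
  containers:
  - name: worker
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    command: ["nvidia-smi", "-L"]
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1
EOF
```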