LLM bootstrap loader for local CPU/GPU inference with fully customizable chat.
⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud, or on dedicated AI hardware.
Efficient and general syntactical decoding for Large Language Models
Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT
Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
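Multi-LoRA servers of this kind generally share one frozen base model and keep many small low-rank adapter pairs, switching between them per request. Below is a minimal sketch of that idea; the shapes, the `adapters` registry, and the `lora_linear` helper are illustrative assumptions, not any particular server's API:

```python
# Sketch of per-request LoRA adapter selection: one shared base weight
# plus many small (A, B) low-rank pairs, chosen by adapter id.
import torch

d_in, d_out, rank = 64, 64, 8
base_weight = torch.randn(d_out, d_in)  # frozen base-model weight

adapters = {  # hypothetical registry: adapter id -> low-rank factors (A, B)
    "customer-a": (torch.randn(rank, d_in), torch.randn(d_out, rank)),
    "customer-b": (torch.randn(rank, d_in), torch.randn(d_out, rank)),
}

def lora_linear(x: torch.Tensor, adapter_id: str, scale: float = 1.0):
    """y = x W^T + scale * (x A^T) B^T  -- base output plus adapter delta.

    Real LoRA initializes B to zero and sets scale = alpha / rank; random
    values are used here only to make the per-adapter difference visible.
    """
    a, b = adapters[adapter_id]
    return x @ base_weight.T + scale * (x @ a.T) @ b.T

y = lora_linear(torch.randn(1, d_in), "customer-a")
```

Because the base weight dominates memory, thousands of such adapters can be kept resident and batched together, which is what makes this serving pattern scale.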
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
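Serving tools like these expose the standard OpenAI chat-completions API, so the official openai Python client works against a local endpoint unchanged. A minimal sketch; the base URL, port, and model id below are assumptions, and each tool documents its own defaults:

```python
# Querying a locally served OpenAI-compatible endpoint with the openai client.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",  # hypothetical local endpoint
    api_key="not-needed-locally",         # most local servers ignore the key
)

response = client.chat.completions.create(
    model="mistral-7b-instruct",          # hypothetical model id
    messages=[{"role": "user", "content": "Summarize LoRA in one sentence."}],
)
print(response.choices[0].message.content)
```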
The official evaluation suite and dynamic data release for MixEval.
Pretrain, finetune, and deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit quantization, LoRA, and more.
Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector DB, and RAG.
PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference.
JetStream is a throughput- and memory-optimized engine for LLM inference on XLA devices, starting with TPUs (GPU support planned; PRs welcome).
llmon-py is a multimodal web UI for Llama 3 8B.
The easiest way to serve AI/ML models in production: build model inference services, LLM APIs, multi-model inference graphs/pipelines, LLM/RAG apps, and more!
🔮 SuperDuperDB: Bring AI to your database! Build, deploy and manage any AI application directly with your existing data infrastructure, without moving your data. Including streaming inference, scalable model training and vector search.
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
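Weight quantization is the simplest of the compression techniques such toolkits offer. Here is a minimal sketch of symmetric per-tensor int8 quantization; production implementations add per-channel scales, calibration data, and fused low-precision kernels:

```python
# Symmetric int8 weight quantization: map floats to [-127, 127] with one
# scale, then recover approximate floats by multiplying the scale back.
import torch

def quantize_int8(w: torch.Tensor):
    scale = w.abs().max() / 127.0  # per-tensor scale (per-channel in practice)
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(4, 4)
q, s = quantize_int8(w)
print((w - dequantize(q, s)).abs().max())  # worst-case quantization error
```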
ARS: Article Retrieval System
Implementation of the paper "Fast Inference from Transformers via Speculative Decoding" (Leviathan et al., 2023).
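The paper's core loop: a cheap draft model proposes a few tokens, the expensive target model verifies them all in one pass, each token is accepted with probability min(1, p/q), and the first rejection triggers a corrected resample from the residual distribution. A minimal sketch with toy stand-in distributions in place of real models:

```python
# Sketch of one speculative decoding step (Leviathan et al. 2023).
# draft_probs / target_probs are random stand-ins, not real LLMs.
import torch

vocab = 16
torch.manual_seed(0)

def draft_probs(prefix):   # stand-in for the cheap draft model
    return torch.softmax(torch.randn(vocab), dim=-1)

def target_probs(prefix):  # stand-in for the expensive target model
    return torch.softmax(torch.randn(vocab), dim=-1)

def speculative_step(prefix, gamma=4):
    # 1) Draft model proposes gamma tokens autoregressively.
    proposed, q_list, ctx = [], [], list(prefix)
    for _ in range(gamma):
        q = draft_probs(ctx)
        tok = torch.multinomial(q, 1).item()
        proposed.append(tok); q_list.append(q); ctx.append(tok)
    # 2) Target model scores every proposed position (one batched pass in practice).
    p_list = [target_probs(prefix + proposed[:i]) for i in range(gamma + 1)]
    # 3) Accept token i with prob min(1, p/q); resample on first rejection.
    accepted = []
    for i, tok in enumerate(proposed):
        p, q = p_list[i], q_list[i]
        if torch.rand(()) < min(1.0, (p[tok] / q[tok]).item()):
            accepted.append(tok)
        else:
            residual = torch.clamp(p - q, min=0)
            residual /= residual.sum()
            accepted.append(torch.multinomial(residual, 1).item())
            return prefix + accepted  # stop at the first rejection
    # 4) All gamma accepted: sample one bonus token from the target.
    bonus = torch.multinomial(p_list[gamma], 1).item()
    return prefix + accepted + [bonus]

print(speculative_step([1, 2, 3]))
```

The accept/resample rule makes the output distribution exactly match sampling from the target model alone; the speedup comes from verifying gamma draft tokens with a single target-model pass.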
Semantic embedding-based system for question answering from PDFs with visual analysis tools.
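The retrieval core of such a system is short: embed the PDF's text chunks once, then rank them against each question by cosine similarity. A minimal sketch using sentence-transformers; the model name and the example chunks are illustrative choices:

```python
# Embedding-based retrieval: encode passages once, rank them per question.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

chunks = [  # in practice: text extracted from the PDF and split into passages
    "LoRA freezes base weights and trains low-rank update matrices.",
    "Speculative decoding verifies draft tokens with the target model.",
    "Flash attention reduces memory traffic in the attention kernel.",
]
chunk_emb = model.encode(chunks, convert_to_tensor=True)

question = "How does LoRA fine-tuning work?"
scores = util.cos_sim(model.encode(question, convert_to_tensor=True), chunk_emb)[0]
print(chunks[int(scores.argmax())])  # best-matching passage
```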