The easiest way to serve AI/ML models in production - Build Model Inference Service, LLM APIs, Multi-model Inference Graph/Pipelines, LLM/RAG apps, and more!
Pretrain, finetune, and deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit quantization, LoRA, and more.
Run any open-source LLM, such as Llama 2 or Mistral, as an OpenAI-compatible API endpoint in the cloud.
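Several of the projects above expose open-source models behind the OpenAI chat-completions request format, so existing client code works unchanged. A minimal sketch of that request shape follows; the model name is an illustrative assumption, not tied to any specific project listed here.

```python
import json

def build_chat_request(model, messages, temperature=0.7):
    """Build an OpenAI-style /v1/chat/completions request body.

    OpenAI-compatible servers accept this same JSON shape, with the
    model field naming whichever open-source model they are serving.
    """
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
    }

# Illustrative model name; a compatible server would list its own models.
body = build_chat_request(
    "llama-2-7b-chat",
    [{"role": "user", "content": "Hello!"}],
)
print(json.dumps(body, indent=2))
```

In practice you would POST this body to the server's `/v1/chat/completions` route (or point an OpenAI SDK client's base URL at it) rather than print it.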
🔮 SuperDuperDB: Bring AI to your database! Build, deploy, and manage any AI application directly with your existing data infrastructure, without moving your data, including streaming inference, scalable model training, and vector search.
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Code examples and resources for DBRX, a large language model developed by Databricks
⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel platforms ⚡
Sparsity-aware deep learning inference runtime for CPUs
irresponsible innovation. Try now at https://chat.dev/
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
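Serving thousands of fine-tuned LLMs is feasible because each LoRA fine-tune is only a low-rank delta (B·A) added to frozen base weights, so many adapters can share one base model in memory. A dependency-free sketch of that forward pass, with illustrative toy shapes (rank 1, 2-dimensional weights), not taken from any project above:

```python
def matmul(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + alpha * B (A x): frozen base output plus the
    low-rank adapter delta. Swapping (A, B) swaps the fine-tune."""
    base = matmul(W, x)
    delta = matmul(B, matmul(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight (identity here)
A = [[1.0, 1.0]]              # rank-1 down-projection (1x2)
B = [[0.5], [0.5]]            # rank-1 up-projection (2x1)
y = lora_forward(W, A, B, [1.0, 2.0])
```

Because W never changes, a server can keep one copy of the base model and batch requests that use different (A, B) pairs.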
RayLLM - LLMs on Ray
LLMs and Machine Learning done easily
A library to communicate with ChatGPT, Claude, Copilot, Gemini, HuggingChat, and Pi
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
AI-powered cybersecurity chatbot designed to provide helpful and accurate answers to cybersecurity-related queries, as well as code analysis and scan analysis.
LLMFlows - Simple, Explicit and Transparent LLM Apps
The official repo of the Aquila2 series from BAAI, including pretrained and chat large language models.
A tool for generating function arguments and choosing what function to call with local LLMs
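Function calling with a local LLM typically works by prompting the model with JSON-schema tool definitions, having it emit a JSON call, and dispatching that call on the host. A hedged sketch of the host-side plumbing; the tool, registry, and model output below are made-up illustrations, not the API of the project above:

```python
import json

# Hypothetical tool definition in the common JSON-schema style that
# function-calling prompts use.
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

def dispatch(call_json, registry):
    """Parse the model's JSON function call and invoke the named tool."""
    call = json.loads(call_json)
    return registry[call["name"]](**call["arguments"])

# Stand-in for a real tool implementation and a real model completion.
registry = {"get_weather": lambda city: f"Sunny in {city}"}
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = dispatch(model_output, registry)
```

The interesting part such tools handle is the other half: constraining the local model so its output is guaranteed to parse as a valid call against the schema.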
Efficient AI Inference & Serving