# llm-inference

Here are 425 public repositories matching this topic...

Siris2314 / ytsum

Summarize YT videos in one go

pypi-package llms llm-inference togetherai distil-whisper mixtral-8x7b distil-whisper-large-v3

Updated Jun 9, 2024
Python

openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

nlp natural-language-processing ai computer-vision deep-learning transformers inference speech-recognition yolo recommendation-system performance-boost good-first-issue openvino diffusion-models stable-diffusion generative-ai llm-inference optimize-ai deploy-ai

Updated Jun 9, 2024
C++

vectorch-ai / ScaleLLM

A high-performance inference system for large language models, designed for production environments.

performance gpu model production cuda efficiency inference transformer llama speculative serving llm llm-inference llama3

Updated Jun 9, 2024
C++

B0-B / blowtorch-transformer-api

LLM bootstrap loader for local CPU/GPU inference with fully customizable chat.

bootstrap cpu gpu transformer gpt customgpt llm-inference llama2 llama3

Updated Jun 8, 2024
Python

beam-cloud / beta9

The open-source serverless GPU container runtime.

gpu distributed-computing cuda self-hosted fine-tuning ml-platform large-language-models llm generative-ai llm-inference

Updated Jun 8, 2024
Go

autonomi-ai / nos

⚡️ A fast and flexible PyTorch inference server that runs locally, on any cloud or AI HW.

machine-learning computer-vision inference inference-acceleration generative-ai llm-inference

Updated Jun 8, 2024
Python

uiuc-focal-lab / syncode

Efficient and general syntactical decoding for Large Language Models

parser large-language-models llm llm-inference

Updated Jun 8, 2024
Python

webgptorg / promptbook

Library to supercharge your use of large language models

openai autogpt llm-inference

Updated Jun 8, 2024
TypeScript

lofcz / LlmTornado

One .NET library to consume OpenAI, Anthropic, Cohere, Azure, and self-hosted APIs.

sdk chatbot openai sonnet cohere gpt-4 llm-inference gpt4-turbo anthropic-ai command-r-plus gpt4o

Updated Jun 8, 2024
C#

Hoshinonyaruko / Gensokyo-llm

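An open-source agent project that supports 6 chat platforms, Onebotv11 one-to-many connections, streaming messages, agent conversations with keyboard bubble generation, and 6 large-model APIs (with more being added). It can convert multiple large-model APIs into a common format that carries context.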

chatbot qqbot ai-agents onebot onebot-plugin llm onebot11 llm-inference ai-agents-framework llm-api

Updated Jun 8, 2024
Go

microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap

chat chatbot gpt chat-application agent-based-framework agent-oriented-programming gpt-4 chatgpt llmops gpt-35-turbo llm-agent llm-inference agentic llm-framework agentic-agi

Updated Jun 8, 2024
Jupyter Notebook

felladrin / MiniSearch

Minimalist web-searching app with an AI assistant that runs directly from your browser. Uses Web-LLM, Ratchet-ML, Wllama and SearXNG. Demo: https://felladrin-minisearch.hf.space

search nlp search-engine machine-learning information-retrieval typescript ai artificial-intelligence webapp question-answering searxng llm gpu-accelerated generative-ai llm-inference retrieval-augmented-generation web-llm ratchet-ml wllama

Updated Jun 9, 2024
TypeScript

davmacario / MDI-LLM

Implementation of Model-Distributed Inference for Large Language Models, built on top of LitGPT

ai torch llms llm-inference

Updated Jun 8, 2024
Python

expectedparrot / edsl

Design, conduct and analyze results of AI-powered surveys and experiments. Simulate social science and market research with large numbers of AI agents and LLMs.

python open-source openai surveys experiments domain-specific-language market-research social-science synthetic-data data-labeling llm anthropic llm-agent llm-inference llama2 llm-framework mixtral deepinfra

Updated Jun 8, 2024
Python

Opla / opla

Empower Your Productivity with Local AI Assistants

llama gpt aiassistant opla ai-assistant llm generative-ai llmops llamacpp localai llm-inference local-ai llama2 aiagent ai-agent-front

Updated Jun 8, 2024
TypeScript

InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

llama cuda-kernels deepspeed llm fastertransformer llm-inference turbomind internlm llama2 codellama llama3

Updated Jun 8, 2024
Python

katsumiar / WiseOwlChat

A ChatBot written in C# using OpenAI's API

c-sharp wpf chatbot knowledge-base semanticsearch web-interaction plugins-api openai-api dotnet7 llm-inference function-calling

Updated Jun 8, 2024
C#

eastriverlee / LLM.swift

LLM.swift is a simple and readable library that allows you to interact with large language models locally with ease for macOS, iOS, watchOS, tvOS, and visionOS.

macos swift ios tvos watchos llm llm-inference visionos gguf

Updated Jun 8, 2024
Swift

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

gpu cuda pytorch tvm llm-inference flash-attention large-large-models

Updated Jun 8, 2024
Cuda

Mobile-Artificial-Intelligence / maid_llm

maid_llm is a Dart implementation of llama.cpp used by the Mobile Artificial Intelligence Distribution (Maid)

facebook meta llama gemma mistral mobile-ai llm flutter-ai llamacpp ggml llm-inference local-ai llama2 gguf mixtral

Updated Jun 8, 2024
Dart
