InferenceNexus
Popular repositories

- text-generation-inference (Python, forked from huggingface/text-generation-inference): Large Language Model Text Generation Inference. A request sketch appears after this list.
- ipex-llm (Python, forked from intel/ipex-llm): Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc. A loading sketch appears after this list.
- litellm (Python, forked from BerriAI/litellm): Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in the OpenAI format (Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, SageMaker, HuggingFace, Replicate, Groq). A call sketch appears after this list.
- litgpt (Python, forked from Lightning-AI/litgpt): 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale.
- inference-benchmarker (Rust, forked from huggingface/inference-benchmarker): Inference server benchmarking tool.
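As a minimal sketch of how a client talks to a running text-generation-inference server, the snippet below posts to TGI's documented /generate endpoint. It assumes a server was already launched locally on port 8080 with a model loaded; the host, port, prompt, and sampling parameters are illustrative.

```python
# Minimal sketch: query a locally running text-generation-inference server.
# Assumes the server is already up and listening on localhost:8080 (an
# assumption); the payload shape follows TGI's /generate API.
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is the capital of France?",
        "parameters": {"max_new_tokens": 32, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])  # TGI responds with {"generated_text": "..."}
```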
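For ipex-llm, a rough sketch of its drop-in, transformers-style loading path: weights are quantized to INT4 at load time for low-memory local inference. The model id is illustrative, and the snippet assumes ipex-llm and transformers are installed; on an Intel GPU you would additionally move the model and inputs to the "xpu" device.

```python
# Minimal sketch of ipex-llm's transformers-compatible loading path.
# The model id is illustrative and assumed to be available locally
# or downloadable from the Hugging Face Hub.
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
# load_in_4bit quantizes weights to INT4 for low-memory local inference
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Explain XPU offload in one sentence.", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```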
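For litellm, a minimal sketch of the unified completion call that returns responses in the OpenAI format. It assumes an OPENAI_API_KEY is set in the environment; swapping the model string (e.g., to an Anthropic or Bedrock model) routes the same call to a different provider.

```python
# Minimal sketch of litellm's unified completion call.
# Assumes OPENAI_API_KEY is set in the environment; the model
# string is illustrative and selects the provider.
from litellm import completion

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)  # OpenAI-format response object
```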
Repositories

- triton-server (forked from triton-inference-server/server): The Triton Inference Server provides an optimized cloud and edge inferencing solution.
- litgpt (forked from Lightning-AI/litgpt): 20+ high-performance LLMs with recipes to pretrain, finetune, and deploy at scale. A generation sketch appears after this list.
- litellm (forked from BerriAI/litellm): Python SDK and proxy server (LLM gateway) to call 100+ LLM APIs in the OpenAI format (Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, SageMaker, HuggingFace, Replicate, Groq).
- FastChat (forked from lm-sys/FastChat): An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
- ncnn (forked from Tencent/ncnn): A high-performance neural network inference framework optimized for mobile platforms.
- onnxruntime (forked from microsoft/onnxruntime): ONNX Runtime, a cross-platform, high-performance ML inference and training accelerator. A session sketch appears after this list.
- optimum-intel (forked from huggingface/optimum-intel): 🤗 Optimum Intel: Accelerate inference with Intel optimization tools.
- ipex-llm (forked from intel/ipex-llm): Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
- llama-box (forked from gpustack/llama-box): LM inference server implementation based on llama.cpp.
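For litgpt, a short sketch using its high-level Python API, assuming litgpt is installed and can fetch the (illustrative) microsoft/phi-2 checkpoint on first use.

```python
# Minimal sketch of litgpt's high-level Python API. The checkpoint
# name is illustrative and is downloaded/converted on first use.
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")
text = llm.generate("What do Llamas eat?", max_new_tokens=30)
print(text)
```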
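For onnxruntime, a minimal inference-session sketch. The model path "model.onnx", the input shape, and the CPU execution provider are assumptions standing in for a real exported model.

```python
# Minimal sketch of an ONNX Runtime inference session. "model.onnx"
# and the dummy input shape are placeholders for a real exported model.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name          # discover the graph's input name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: dummy})   # None = fetch all outputs
print(outputs[0].shape)
```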
People

This organization has no public members.