llms

Tensor parallelism is all you need. Run LLMs on weak devices or make powerful devices even more powerful by distributing the workload and dividing the RAM usage.

neural-network distributed-computing llm llms open-llm llm-inference llama2 distributed-llm llama3

Updated May 29, 2024
C++

janhq / cortex

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM). Powers 👋 Jan

ai cuda llama accelerated inference-engine openai-api llm stable-diffusion llms llamacpp llama2 gguf tensorrt-llm

Updated May 29, 2024
C++

infiniflow / infinity

The AI-native database built for LLM applications, providing incredibly fast full-text and vector search

Updated May 29, 2024
C++

llms

Here are 6 public repositories matching this topic...

smvorwerk / xlstm-cuda

abdeladim-s / pyllamacpp

epsilla-cloud / vectordb

b4rtaz / distributed-llama

janhq / cortex

infiniflow / infinity