# numa

Here are 4 public repositories matching this topic...
A LLaMA2-7b chatbot with memory that runs on the CPU, optimized using smooth quantization, 4-bit quantization, or Intel® Extension for PyTorch with bfloat16.

Topics: meta, cpu, optimization, chatbot, intel, llama, numa, int8, ipex, 4-bit-cpu, huggingface, streamlit, bfloat16, neural-compression, chatgpt, langchain, llama2, meta-ai, smooth-quantization, chatbot-memory

Updated Feb 27, 2024 - Python
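The 4-bit quantization named in the description can be sketched in a few lines. This is a minimal, illustrative example of symmetric int4 quantization with a per-tensor scale (assumed helper names, not the repository's actual code, which applies the technique to model weights via specialized libraries):

```python
# Symmetric 4-bit quantization: map floats to integers in [-8, 7]
# using a single scale factor, then recover approximate floats.
# Assumes at least one nonzero weight (scale would be 0 otherwise).

def quantize_int4(weights):
    """Quantize a list of floats to 4-bit integers plus a scale factor."""
    scale = max(abs(w) for w in weights) / 7.0  # 7 = largest positive int4
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float weights from the 4-bit representation."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.07]
q, scale = quantize_int4(weights)   # q == [2, -7, 5, 1]
approx = dequantize_int4(q, scale)  # close to the original weights
```

The payoff is storage: each weight needs 4 bits instead of 32, at the cost of the small rounding error visible in `approx`.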
Python Multi-Process Execution Pool: a concurrent, asynchronous execution pool with custom resource constraints (memory, timeouts, affinity, CPU cores, and caching), load balancing, and profiling of external apps on NUMA architectures.

Topics: multiprocessing, parallel-computing, numa, monitoring-server, cache-control, task-queue, application-framework, parallel-processing, execution-pool, benchmarking-framework, load-balancing, in-memory-computations

Updated Aug 28, 2019 - Python
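The CPU-affinity constraint mentioned above is the core NUMA trick: pinning a worker process to the cores of one NUMA node keeps its memory accesses node-local. A minimal sketch using the standard library (assumed function name, not the repository's API; `os.sched_setaffinity` is Linux-only):

```python
# Pin the current process to a set of CPU cores (Linux-only).
# An execution pool would call this inside each worker at startup,
# passing the core set of the NUMA node the worker is assigned to.
import os

def pin_to_cores(cores):
    """Restrict the current process to the given CPU cores."""
    os.sched_setaffinity(0, set(cores))      # 0 = the calling process
    return sorted(os.sched_getaffinity(0))   # cores the OS will now use

original = os.sched_getaffinity(0)  # remember the starting core set
allowed = pin_to_cores([0])         # run only on core 0 from now on
pin_to_cores(original)              # restore the original affinity
```

In a real pool each worker would be given a disjoint core set, so the OS scheduler cannot migrate it across NUMA nodes mid-run.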