Chinese documentation · README.zh-CN.md
RAGret (not “regret” 😂) is a self-hosted RAG web app aimed at small teams (about 15–30 people) and low-cost servers (GPU memory ≤ 8 GB).
With RAGret, team members can publish knowledge bases to a shared hub, subscribe to others’ bases, and query them afterward via HTTP GET.
- Creators can set visibility scopes and control access flexibly.
- Agent-friendly: the app exposes API keys and a SKILL.md so agents can plug into team workflows quickly.
- Quick Q&A: a built-in chat UI (LangGraph agent) that lists your knowledge bases and searches them with natural language. Configure an openai (default) or anthropic LLM via .env.
- Ingest via tar upload or GitLab / GitHub webhooks, so it fits common doc storage habits. Support for Feishu and similar online docs is planned.
- Many formats: PDF, Word (docx), Excel (xlsx), Markdown (md), Email (eml), TXT, CSV, web links (html).
- Bilingual UI (Chinese / English), light / dark themes, and brand tweaks via YAML (e.g. favicon, page title).
Indexing and retrieval use BCE embedding + SQLite + BM25 + RRF fusion + BCE reranking, backed by:
- BCEmbedding (GitHub)
- Models on Hugging Face (bce-embedding-base_v1, bce-reranker-base_v1)
- SQLite FTS5 BM25 for lexical recall (keyword / sparse signal)
- RRF (Reciprocal Rank Fusion) to merge dense and BM25 candidate lists before reranking
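To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over two candidate lists. The function name `rrf_fuse`, the chunk ids, and the constant `k=60` (a commonly used default) are assumptions for illustration; the value and tie-breaking used inside RAGret are not stated in this README.

```python
from collections import defaultdict


def rrf_fuse(ranked_lists, k=60):
    """Merge ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed default, not a RAGret setting.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical chunk ids from the dense (embedding) and BM25 retrievers.
dense_hits = ["c3", "c1", "c7"]
bm25_hits = ["c1", "c9", "c3"]
fused = rrf_fuse([dense_hits, bm25_hits])
print(fused)
```

Chunks that appear in both lists (here `c1` and `c3`) rise above chunks found by only one retriever, which is the property that makes the fused list a reasonable input for the reranker.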
Pick one GPU path: CUDA or Intel XPU. Pick one runtime: local Python or Docker.
General notes:
- Use only one GPU stack and one runtime per environment.
- Hugging Face mirror (optional): if downloads are slow or blocked, set HF_ENDPOINT before running warmup_hf_models.py or docker build (see below).
# Windows PowerShell
$env:HF_ENDPOINT = "https://hf-mirror.com"
# Linux / macOS
export HF_ENDPOINT=https://hf-mirror.com
- Python 3.10+ (tested on 3.12). Create a venv or conda env.
- Install PyTorch for your GPU (pick one):
  - NVIDIA CUDA: follow Start Locally, or e.g.
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
  - Intel XPU: follow Get started with Intel GPU. After installing the Intel GPU drivers, e.g.
    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
- App deps:
  pip install -r requirements.txt
- Models (once, before indexing/search): from the repo root, online:
  python warmup_hf_models.py
  Weights land in ./models. You can also download BCE weights manually into ./models.
- Verify GPU:
  - CUDA: python -c "import torch; print(torch.cuda.is_available())" → True
  - XPU: python -c "import torch; print(torch.xpu.is_available())" → True
  On Intel XPU, only embedding uses the GPU; upstream BCEmbedding rerank does not support XPU.
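If you prefer one script over two one-liners, the GPU checks can be combined as in the sketch below. The helper name `pick_device` is hypothetical and not part of RAGret; it only calls `torch.cuda.is_available()` and `torch.xpu.is_available()`, and falls back to "cpu" when PyTorch is missing or was built without XPU support.

```python
def pick_device():
    """Return "cuda", "xpu", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no torch.xpu module, so look it up defensively.
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"


device = pick_device()
print(device)
```

A result of "cpu" on a GPU machine usually means the wrong PyTorch wheel or a missing driver.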
This repo’s Docker image targets CUDA only (Dockerfile). For Intel XPU, use local Python above.
Build (warmup bakes weights into /opt/hf in the image):
docker build -t ragret .
# China mirror; disable proxy when using it
docker build -t ragret --build-arg HF_ENDPOINT=https://hf-mirror.com .
Run with --gpus all (or --gpus "device=0"):
docker run --name ragret -it --gpus all -p 8765:8765 ragret
You can also download the models to the host machine and skip warmup during the build, then mount the host model directory into the container:
docker build -t ragret --build-arg RAGRET_SKIP_WARMUP=1 .
docker run --name ragret -it --gpus all -p 8765:8765 -v /path/on/host/models:/opt/hf ragret
Copy the template to .env at the repo root and edit the values (.env is gitignored):
cp .env.example .env # Windows: copy .env.example .env
See .env.example for variable names and comments (host/port, Quick Q&A LLM, optional image ingest / vision, public URL, paths).
All settings use the RAGRET_ prefix. RAGRET_LLM_PROVIDER must be openai (default) or anthropic; any other value falls back to openai. CLI flags such as --host or --llm-model override .env when provided (no separate provider flag).
Without LLM settings, Quick Q&A falls back to direct index search (no LangGraph agent).
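The provider rule above can be pictured with a small sketch. The function `resolve_llm_provider` is hypothetical and is not RAGret's actual code; trimming whitespace and lowercasing the value are assumptions added for illustration.

```python
import os

ALLOWED_PROVIDERS = ("openai", "anthropic")


def resolve_llm_provider(env=None):
    """Return the provider named by RAGRET_LLM_PROVIDER, or "openai".

    Unset, empty, or unrecognized values fall back to "openai".
    The strip/lower normalization is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    value = env.get("RAGRET_LLM_PROVIDER", "").strip().lower()
    return value if value in ALLOWED_PROVIDERS else "openai"


print(resolve_llm_provider({"RAGRET_LLM_PROVIDER": "anthropic"}))
print(resolve_llm_provider({"RAGRET_LLM_PROVIDER": "something-else"}))
```

Because an unrecognized value falls back to openai rather than raising an error, a typo in .env can go unnoticed; check the variable if the wrong provider seems to be in use.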
- Build the web UI (output goes to ragret/static/):
  cd frontend
  npm install
  npm run build
  cd ..
- Run the API from the repo root:
  python -m ragret serve # or: python ragret.py serve
  Open http://127.0.0.1:8765/ (or your RAGRET_HOST / RAGRET_PORT). The home page is Quick Q&A; use the sidebar for Knowledge plaza, tasks, and account settings.
  Optional overrides:
  python -m ragret serve --host 0.0.0.0 --port 8765
pip install -r requirements.txt
pytest tests/ -q
After sign-in, the home page is Quick Q&A: ask questions in natural language. The agent can list knowledge bases you own or subscribe to and call retrieval tools against them.
- Configure RAGRET_LLM_* in .env (see above) for full agent mode.
- Without LLM config, answers come from a simple multi-KB index search.
- Chat history is stored in the browser. Use Clear chat to reset to the welcome message; signing out clears the cached history for that account.
Use Knowledge plaza in the sidebar (/plaza) to browse and subscribe to bases.
Open Account:
- Change avatar, theme, and language.
- Create up to 3 API keys to query knowledge bases you own or subscribe to.
- For GitHub or GitLab webhooks, paste a PAT in the right fields. For safety, scope the PAT to read-only repo access.
Open Add knowledge base:
- Required: name and description so agents can pick the right base.
- Optional: README-style description file and cover image.
- Set visibility. Locked defaults to creator-only; you can add members after creation.
- Choose type. Tar upload is straightforward; webhook is shown below.
The first build pulls from the repo, so repo URL and branch are required. Copy Webhook URL and Secret Token into your repo’s webhook settings, then click build.
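For background on what the Secret Token is for, the sketch below shows how a webhook receiver can check it. This illustrates the general GitHub and GitLab conventions, not RAGret's internal code, which this README does not describe. GitHub sends an HMAC-SHA256 of the request body in the X-Hub-Signature-256 header, and GitLab sends the token itself in X-Gitlab-Token; the function names and sample values are hypothetical.

```python
import hashlib
import hmac


def github_signature_ok(secret: str, body: bytes, header_value: str) -> bool:
    """Check a GitHub X-Hub-Signature-256 header ("sha256=<hex digest>")."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), header_value.encode("utf-8"))


def gitlab_token_ok(secret: str, header_value: str) -> bool:
    """Check a GitLab X-Gitlab-Token header, which carries the secret as-is."""
    return hmac.compare_digest(secret.encode("utf-8"), header_value.encode("utf-8"))


# Hypothetical secret and payload, for illustration only.
secret = "example-secret"
body = b'{"ref": "refs/heads/main"}'
signature = "sha256=" + hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
print(github_signature_ok(secret, body, signature))
```

Either way, the token pasted into the repo's webhook settings has to match the one shown by RAGret exactly, or deliveries will be rejected.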
On modest hardware, chunking and indexing are queued. Each build click or webhook run registers a task.
Open Task list to see queued and running jobs, and cancel tasks when needed.
Open My knowledge bases, then select one to manage.
Notes:
- For webhook bases, if you rename the base, update the webhook URL in the repo.
- Rebuilds are incremental for all types. To add files via tar, upload an archive of the full document set you want indexed.
- Webhook bases can be pulled from the repo manually on this page.
- Use the search box at the bottom to try retrieval against the base.
- Subscribe in Knowledge plaza (sidebar).
- Copy an API key from Account.
- Set RAGRET_API_KEY.
HTTP API examples (BASE = e.g. http://127.0.0.1:8765):
# List subscribed indexes (API key)
curl -sS -H "X-API-Key: $RAGRET_API_KEY" "$BASE/api/subscribe-indexes"
# Search (API key)
curl -sS -G "$BASE/api/search/INDEX_NAME" -H "X-API-Key: $RAGRET_API_KEY" --data-urlencode "query=…"
# Quick Q&A (signed-in session cookie or Bearer token from /api/auth/login)
curl -sS -X POST "$BASE/api/quick-qa" -H "Content-Type: application/json" \
-H "Authorization: Bearer $SESSION_TOKEN" \
-d '{"question":"What is in my docs?","lang":"en"}'
Search JSON now includes parent_url, line_start, and line_end per hit. For tool-augmented RAG:
- Call /api/search/{kb} to get hit chunks plus citation offsets.
- web_fetch the parent_url (requires Authorization: Bearer or X-API-Key).
- Read or grep around line_start..line_end for nearby context.
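The search call and the "read around the hit" step can be sketched in Python with only the standard library. The helper names `build_search_request` and `context_window` are hypothetical. The sketch assumes that line_start and line_end are 1-based and inclusive, which this README does not confirm, and it makes no assumption about the rest of the response JSON.

```python
import urllib.parse
import urllib.request


def build_search_request(base, index_name, query, api_key):
    """Build the GET request for /api/search/{kb}, authenticated by X-API-Key."""
    path = "/api/search/" + urllib.parse.quote(index_name, safe="")
    url = base.rstrip("/") + path + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def context_window(text, line_start, line_end, margin=3):
    """Return (line_number, line) pairs around a hit, with `margin` extra lines.

    Assumes 1-based, inclusive line numbers (an assumption of this sketch).
    """
    lines = text.splitlines()
    first = max(line_start - margin, 1)
    last = min(line_end + margin, len(lines))
    return [(n, lines[n - 1]) for n in range(first, last + 1)]


req = build_search_request("http://127.0.0.1:8765", "team docs", "reset password", "my-key")
print(req.full_url)

# Stand-in for the text fetched from a hit's parent_url.
sample = "\n".join(f"line {i}" for i in range(1, 11))
window = context_window(sample, 4, 5, margin=1)
print(window)
```

To send the request, pass `req` to `urllib.request.urlopen` and decode the body with `json.load`. The same X-API-Key header can be used when fetching a hit's parent_url.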
Agents: download SKILL.md from the UI (SKILL.md in the sidebar) and import it into Claude Code, Cursor, OpenClaw, or other agent tools.
- More formats: tables, PPT, images.
- Sync with Feishu and similar online docs.
- Distributed deployment for higher concurrency and larger teams.
- Stability fixes across the stack.





