A full-featured desktop client for llama.cpp, built with Electron + Vite + React + TypeScript. Chat with local GGUF models, manage llama-server processes, convert Hugging Face models to GGUF, monitor training, and build multi-step AI workflows — all from one app.
中文说明见本文档下半部分。
# 1. Install dependencies
npm install
# 2. Launch in development mode (hot reload)
npm run electron:devThen, inside the app:
- Open the Models tab → Add model, pick a
.gguffile, set a port. - Click Start to launch the
llama-serverprocess for that model. - Switch to the Chat tab, select the model in the top bar, and start chatting.
You need a llama-server binary on your PATH (from a built llama.cpp). See Installing Dependencies.
- Streaming (SSE) responses with live token output, or non-streaming mode.
- Markdown rendering with code highlighting, GFM tables, and KaTeX math (with copy-LaTeX buttons).
- Thinking / reasoning mode — toggle it in the top bar to send
reasoning_effort(low / medium / high); the model's reasoning is shown in a collapsible block. - Per-message metrics below each AI reply: a live thinking timer, live tokens/sec while generating, and final average / peak tokens/sec plus token count.
- Multimodal input — attach images to a message for vision-capable models.
- Edit any user message to re-run the conversation from that point.
- JSON mode, web search flag, adjustable temperature / max tokens, and custom system + formatting prompts.
- Background generation: switch conversations mid-stream and the reply keeps saving to its own conversation.
- Add, edit, and delete GGUF model configurations (path, port, extra
llama-serverargs). - Start / stop a
llama-serverprocess per model, with live server logs. - Flags for multimodal and web-search support per model.
- Local history in the sidebar: new, rename (right-click), and delete.
- Stored on disk as JSON (see Data storage).
- Run
convert_hf_to_gguf.pyfrom a local model directory or a Hugging Face model ID. - Pick output path and quantization type; conversion logs stream live.
- Requires Python and a llama.cpp checkout containing the script (see below).
- Embed a running TensorBoard instance by URL, or
- Point at a training log file (
.log,.txt,.jsonl); the chart refreshes every 2 s.
- Visual node-graph editor to chain AI steps. Node types:
- Input / Output — entry and exit of the graph.
- LLM — call a configured model with a system prompt and a
{{input}}template. - Tool — run a shell command (with optional working directory). Every command requires explicit confirmation before it runs.
- Router — branch by keyword; each route has its own output port. Unmatched input flows from the
defaultport. - Merge — combine branches by
concat,vote(most common answer), orfirst.
- Drag from an output port to an input port to connect; click an edge to delete it.
- Runs are validated before execution (missing model, no output node, cycles) with clear messages.
- Live tokens/sec while a node generates, per-node output inspection, and aggregate average / peak after the run.
- Dark / light theme, persisted across sessions.
- Bilingual UI: English and 简体中文.
- Node.js >= 18. Run
npm installin the project folder.
- Build llama.cpp and ensure
llama-server(orllama-server.exeon Windows) is on yourPATH, or place it next to the packaged app. - Download GGUF models (e.g. from Hugging Face) and add them in the Models tab.
- Python 3.8+ on your
PATH. - The
convert_hf_to_gguf.pyscript from a llama.cpp checkout, plus its requirements:pip install -r /path/to/llama.cpp/requirements.txt
npm run electron:build
# Output in release/| Platform | Run |
|---|---|
| Windows | release\Llamacore Setup x.x.x.exe, then open from the Start menu. |
| macOS | Open the .dmg, drag to Applications. If blocked, allow it in System Settings → Privacy & Security. |
| Linux | chmod +x "release/Llamacore-x.x.x.AppImage" then run it. |
Conversations, model configs, settings, and workflows are stored locally as JSON:
- Windows:
%APPDATA%\Llamacore\llamacore.json - macOS:
~/Library/Application Support/Llamacore/llamacore.json - Linux:
~/.config/Llamacore/llamacore.json
This software is provided "as is", without warranty of any kind, express or implied. Use it at your own risk.
- Workflow Tool nodes execute shell commands on your machine. Although every command requires explicit confirmation, you are solely responsible for what you run. Do not execute untrusted commands or untrusted workflows.
- The app launches and manages local
llama-serverand Python processes; it does not sandbox them. - Llamacore is an independent client and is not affiliated with or endorsed by the llama.cpp project or any model provider. Model outputs may be inaccurate — verify anything important.
- You are responsible for complying with the licenses and usage terms of any models you load.
Released under the MIT License — see LICENSE.
Copyright (c) 2026 Lucas Vann (陆凯文).
This app bundles or depends on third-party open-source software (llama.cpp, Electron, React, Vite, Tailwind CSS, Zustand, i18next, react-markdown, KaTeX, Recharts, and others), each under its own license. See NOTICE for the list.
基于 llama.cpp 的功能完整桌面客户端,使用 Electron + Vite + React + TypeScript 构建。可与本地 GGUF 模型对话、管理 llama-server 进程、将 Hugging Face 模型转换为 GGUF、监测训练,并搭建多步骤 AI 工作流——全部集成在一个应用中。
# 1. 安装依赖
npm install
# 2. 启动开发模式(支持热重载)
npm run electron:dev随后在应用内:
- 打开模型标签页 →「添加模型」,选择
.gguf文件并设置端口。 - 点击启动,为该模型启动
llama-server进程。 - 切换到对话标签页,在顶栏选择该模型,开始对话。
你需要在系统 PATH 中提供 llama-server 可执行文件(来自编译好的 llama.cpp)。详见安装依赖。
- 流式(SSE)逐字输出,或非流式模式。
- Markdown 渲染,含代码高亮、GFM 表格、KaTeX 公式(带「复制 LaTeX」按钮)。
- 思考 / 推理模式——在顶栏开启后,每次请求携带
reasoning_effort(低 / 中 / 高),推理过程以可折叠块展示。 - 每条消息的指标显示在 AI 回复下方:实时思考计时、生成中的实时 tokens/秒,以及结束后的平均 / 峰值 tokens/秒和 token 总数。
- 多模态输入——为支持视觉的模型附加图片。
- 可编辑任意用户消息,从该处重新生成对话。
- JSON 模式、联网搜索开关、可调温度 / 最大 token,以及自定义 system 与格式化提示词。
- 后台生成:对话进行中切换会话,回复仍会保存到其所属的会话。
- 添加、编辑、删除 GGUF 模型配置(路径、端口、额外
llama-server参数)。 - 按模型独立启动 / 停止
llama-server进程,并查看实时日志。 - 可为每个模型标记是否支持多模态与联网搜索。
- 侧边栏本地记录:新建、重命名(右键)、删除。
- 以 JSON 形式存储于本地(见数据存储)。
- 从本地模型目录或 Hugging Face 模型 ID 运行
convert_hf_to_gguf.py。 - 选择输出路径与量化类型;转换日志实时输出。
- 需要 Python 以及包含该脚本的 llama.cpp 仓库(见下文)。
- 通过 URL 嵌入正在运行的 TensorBoard,或
- 指定训练日志文件(
.log、.txt、.jsonl),图表每 2 秒刷新。
- 可视化节点图编辑器,串联多个 AI 步骤。节点类型:
- 输入 / 输出——图的入口与出口。
- LLM——以 system 提示词和
{{input}}模板调用已配置的模型。 - 工具——执行 Shell 命令(可设工作目录)。每条命令执行前都需显式确认。
- 路由——按关键词分支,每条路由有独立输出端口;未匹配的输入走
default端口。 - 合并——以
concat(拼接)、vote(取最多)或first(取第一个)方式合并分支。
- 从输出端口拖拽到输入端口即可连线;点击连线可删除。
- 运行前会校验(未选模型、缺少输出节点、存在环)并给出明确提示。
- 节点生成时显示实时 tokens/秒,可逐节点查看输出,运行结束后给出平均 / 峰值汇总。
- 深色 / 浅色主题,跨会话保留。
- 双语界面:English 与简体中文。
- Node.js >= 18。在项目目录运行
npm install。
- 编译 llama.cpp,确保
llama-server(Windows 为llama-server.exe)在PATH中,或放在打包后应用的同级目录。 - 下载 GGUF 模型(如从 Hugging Face),在模型标签页添加。
PATH中的 Python 3.8+。- 来自 llama.cpp 仓库的
convert_hf_to_gguf.py脚本及其依赖:pip install -r /path/to/llama.cpp/requirements.txt
npm run electron:build
# 输出目录:release/| 平台 | 运行方式 |
|---|---|
| Windows | 运行 release\Llamacore Setup x.x.x.exe,再从开始菜单打开。 |
| macOS | 打开 .dmg 并拖入 Applications。若被拦截,在「系统设置 → 隐私与安全性」中允许打开。 |
| Linux | chmod +x "release/Llamacore-x.x.x.AppImage" 后运行。 |
对话记录、模型配置、设置与工作流均以 JSON 形式存储于本地:
- Windows:
%APPDATA%\Llamacore\llamacore.json - macOS:
~/Library/Application Support/Llamacore/llamacore.json - Linux:
~/.config/Llamacore/llamacore.json
本软件按「现状」提供,不附带任何明示或暗示的担保。使用风险由你自行承担。
- 工作流的「工具」节点会在你的机器上执行 Shell 命令。 尽管每条命令都需显式确认,但你需对所运行的内容负全部责任。请勿执行不受信任的命令或工作流。
- 应用会启动并管理本地
llama-server与 Python 进程,并不对其进行沙箱隔离。 - Llamacore 是独立客户端,与 llama.cpp 项目及任何模型提供方无隶属或背书关系。模型输出可能不准确,重要信息请自行核实。
- 你需自行遵守所加载模型的许可证与使用条款。
基于 MIT 许可证发布——详见 LICENSE。
版权所有 (c) 2026 Lucas Vann (陆凯文)。
本应用打包或依赖第三方开源软件(llama.cpp、Electron、React、Vite、Tailwind CSS、Zustand、i18next、react-markdown、KaTeX、Recharts 等),各自遵循其许可证,列表见 NOTICE。