Knowledge Chunking and RAG
Knowledge bases are the core input layer in GEOFlow. High-quality knowledge makes generation more stable and more grounded in real business material. Low-quality knowledge turns automation into a noise amplifier.
For a complete explanation of knowledge base principles, usage, vectorization rules, retrieval algorithms, and governance logic, start with: AI Knowledge Base Tutorial.
If you only want to upload files and preview chunks, an embedding model is not required.
If you want RAG retrieval during article generation, you need:
- at least one working chat model
- at least one working embedding model
- the embedding model selected as the default embedding model
- PostgreSQL pgvector available
- the knowledge base re-saved, re-uploaded, or refreshed so that vectors are written
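The prerequisites above can be read as a simple checklist. The following is a minimal sketch, not GEOFlow's own code: `ModelInfo`, `rag_readiness`, and all field names are hypothetical and only illustrate how the five conditions combine.

```python
from dataclasses import dataclass


@dataclass
class ModelInfo:
    # Hypothetical record for a configured model; not a GEOFlow type.
    name: str
    kind: str  # "chat" or "embedding"
    working: bool
    is_default_embedding: bool = False


def rag_readiness(models, pgvector_available, vectors_refreshed):
    """Return the unmet prerequisites; an empty list means RAG retrieval can run."""
    problems = []
    if not any(m.kind == "chat" and m.working for m in models):
        problems.append("no working chat model")
    working_embeddings = [m for m in models if m.kind == "embedding" and m.working]
    if not working_embeddings:
        problems.append("no working embedding model")
    if not any(m.is_default_embedding for m in working_embeddings):
        problems.append("no default embedding model selected")
    if not pgvector_available:
        problems.append("pgvector not available")
    if not vectors_refreshed:
        problems.append("knowledge base not refreshed for vector writing")
    return problems
```

Returning a list of problems rather than a single boolean makes it easier to show the user exactly which prerequisite is missing.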
Knowledge chunking is configured from the AI Models page.
| Strategy | Description | Best fit |
|---|---|---|
| Structured rule chunking | GEOFlow chunks by headings, paragraphs, length, and overlap windows | Recommended default; stable, controllable, low-cost |
| Automatic strategy | GEOFlow chooses a suitable strategy from configuration | Use when you want simpler setup |
| LLM semantic planning | A chat model plans semantic boundaries; GEOFlow rebuilds final chunks from the source text | Long documents, complex structures, or documents where semantic completeness matters |
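To make the structured rule strategy concrete, here is a minimal sketch of chunking by headings, paragraphs, length, and an overlap window. It assumes markdown-style `#` headings and blank-line paragraph breaks; `rule_chunks`, `max_chars`, and `overlap` are illustrative names and defaults, not GEOFlow's actual parameters.

```python
import re


def rule_chunks(text, max_chars=800, overlap=100):
    """Split by headings, then pack paragraphs into chunks of roughly max_chars.

    The last `overlap` characters of a finished chunk are carried into the
    next chunk of the same section. A single oversized paragraph is kept whole.
    """
    # Split before every markdown heading line, so sections never mix.
    sections = re.split(r"(?m)^(?=#{1,6} )", text)
    chunks = []
    for section in sections:
        paragraphs = [p.strip() for p in section.split("\n\n") if p.strip()]
        current = ""
        for para in paragraphs:
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) <= max_chars or not current:
                current = candidate
            else:
                chunks.append(current)
                tail = current[-overlap:] if overlap else ""
                current = f"{tail}\n\n{para}" if tail else para
        if current:
            chunks.append(current)
    return chunks
```

Because every step is deterministic string handling, this kind of chunking needs no model call, which is why it is the stable, low-cost default.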
Semantic planning does not rewrite the knowledge base and does not generate new knowledge.
It does one thing:
Plan chunk boundaries from the source document.
The chunks stored in the database are still rebuilt from the original text. This balances semantic completeness, cost, speed, and traceability.
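The "plan boundaries, then rebuild from source" idea can be sketched as follows. This assumes the planner returns character offsets into the source; the actual plan format GEOFlow uses is not described here, and `rebuild_chunks` is a hypothetical name.

```python
def rebuild_chunks(source, boundaries):
    """Slice the original text at planned offsets.

    Chunk text always comes from `source`; the model contributes only
    the positions at which to cut.
    """
    # Ignore offsets outside the document and drop duplicates.
    starts = sorted({b for b in boundaries if 0 < b < len(source)})
    edges = [0, *starts, len(source)]
    return [source[a:b] for a, b in zip(edges, edges[1:]) if source[a:b].strip()]
```

For example, `rebuild_chunks("Intro text. Details here. Summary.", [12, 26])` yields three slices whose concatenation is the original string, so every stored chunk can be traced back to its exact position in the source.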
If semantic planning fails, for example because:
- the model times out
- JSON is invalid
- boundary counts are abnormal
- planned boundaries cannot be mapped back to source text
GEOFlow falls back to structured rule chunking so knowledge ingestion can still complete.
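The fallback behavior can be illustrated with a small wrapper. This is a sketch under assumptions: the plan is taken to be JSON of the form `{"boundaries": [offsets]}`, and `validate_plan`, `chunk_with_fallback`, `plan_fn`, and `rule_fn` are hypothetical names standing in for the planner call and the rule chunker.

```python
import json


def validate_plan(raw_json, source, max_boundaries=200):
    """Return sorted boundary offsets, or raise ValueError if the plan is unusable."""
    try:
        plan = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        raise ValueError("invalid JSON") from exc
    boundaries = plan.get("boundaries") if isinstance(plan, dict) else None
    if not isinstance(boundaries, list) or not 0 < len(boundaries) <= max_boundaries:
        raise ValueError("abnormal boundary count")
    for b in boundaries:
        # Every boundary must map to a real position inside the source text.
        if not isinstance(b, int) or isinstance(b, bool) or not 0 < b < len(source):
            raise ValueError("boundary cannot be mapped to source text")
    return sorted(set(boundaries))


def chunk_with_fallback(source, plan_fn, rule_fn):
    """Try semantic planning first; fall back to rule chunking on any failure."""
    try:
        starts = validate_plan(plan_fn(source), source)
    except (ValueError, TimeoutError):
        return "structured_rule", rule_fn(source)
    edges = [0, *starts, len(source)]
    return "llm_semantic", [source[a:b] for a, b in zip(edges, edges[1:])]
```

Returning the strategy name alongside the chunks lets the caller record which path actually produced them.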
Each chunk keeps as much metadata as possible:
- chunk title
- section path
- chunking strategy
- sequence number
- token / character estimate
- source hash
This metadata is useful for previewing, debugging, rebuilding, and future retrieval improvements.
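As an illustration of the metadata fields listed above, the sketch below models one chunk's record. The field names, the SHA-256 hash, and the four-characters-per-token estimate are assumptions made for the example, not GEOFlow's schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ChunkMeta:
    # Hypothetical shape; mirrors the metadata list, not a real GEOFlow table.
    title: str
    section_path: list
    strategy: str
    sequence: int
    char_count: int
    token_estimate: int
    source_hash: str


def build_meta(source, chunk, title, section_path, strategy, sequence):
    """Build metadata for one chunk cut from `source`."""
    return ChunkMeta(
        title=title,
        section_path=section_path,
        strategy=strategy,
        sequence=sequence,
        char_count=len(chunk),
        # Rough heuristic (assumption): about four characters per token.
        token_estimate=max(1, len(chunk) // 4),
        # Hash of the whole source document, so a changed source can be detected.
        source_hash=hashlib.sha256(source.encode("utf-8")).hexdigest(),
    )
```

Comparing a stored `source_hash` with the hash of the current document is one simple way to decide whether chunks need rebuilding.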
If chunks are created but no vectors are written, the cause is usually that the embedding model is missing, disabled, failing, or not selected as the default embedding model.
If semantic planning does not seem to take effect, check that the LLM semantic planning strategy is selected and that a working chat model is available for planning. Even when semantic planning fails, GEOFlow falls back to structured rule chunking, so ingestion still completes.
For semantic planning, use a fast, low-cost chat model with enough context, such as a lightweight Gemini or OpenAI-compatible model. Boundary planning does not require the heaviest reasoning model.
Semantic planning is not always necessary. Structured rule chunking is usually enough for clean documents. Semantic planning is most valuable for long, complex documents with cross-section semantics.