When a markdown directory is ingested, MarkdownParser splits each .md into a
directory structure by heading, breaking relative links such as [x](./other.md)
and relative image references: their targets no longer live at the original
paths. This change rewrites relative links before writing each split section,
compensating for the path transformations ingest introduces (source file →
directory, target .md → directory or file).
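The depth compensation described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: the function name rewrite_relative and the example layout (docs/guide.md split into docs/guide/intro.md) are hypothetical, and the sketch covers only the relative-depth adjustment, not the mapping of a target .md to its ingest landing.

```python
import posixpath


def rewrite_relative(link: str, source_file: str, section_file: str) -> str:
    """Re-express `link` (written relative to source_file) relative to section_file.

    Hypothetical helper: both paths are in the original disk coordinate
    system, so the result does not depend on where the import is stored.
    """
    # Resolve the link against the directory of the original source file.
    target = posixpath.normpath(
        posixpath.join(posixpath.dirname(source_file), link)
    )
    # Recompute the path from the directory the split section is written to.
    return posixpath.relpath(target, posixpath.dirname(section_file))


# Assumed layout: docs/guide.md is split into docs/guide/intro.md,
# so the section file sits one level deeper than the source file did.
print(rewrite_relative("./img/a.png", "docs/guide.md", "docs/guide/intro.md"))
```

Because the section file is one directory deeper than its source, a link that used to start at ./ now has to climb one level first.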
Key properties:
- Disk-coordinate: relative paths are computed from the original on-disk paths,
  so the result is unaffected by --to.
- Conservative: only targets that exist on disk and fall inside the import_root
  subtree are rewritten; external links, in-page anchors, absolute paths,
  missing targets, and targets outside the subtree are left unchanged.
- Zero ingest assumptions: the target .md's ingest layout is obtained by running
  the same MarkdownParser into an in-memory FS
  (_target_split_files/_inmemory_split_files). Whether the landing is a file or
  a directory (_doc_landing), its name, the section files, and their content all
  come from the parser itself. However parse_content changes in the future, one
  in-memory parse yields the final layout, so the rewrite follows it
  automatically and stays consistent with real ingest. No naming or splitting
  rules are reimplemented, and nothing assumes that a .md always becomes a
  directory.
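The conservative rule in the list above can be sketched as a single predicate. This is an illustration under stated assumptions, not the PR's code: should_rewrite and its parameters are hypothetical names, percent-encoded paths are not decoded, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path
from urllib.parse import urlsplit


def should_rewrite(link: str, source_file: Path, import_root: Path) -> bool:
    """Return True only for relative links whose target exists inside import_root.

    Hypothetical helper: everything else is meant to be left untouched.
    """
    parts = urlsplit(link)
    if parts.scheme or parts.netloc:
        return False  # external link (https://..., mailto:..., //host/...)
    if not parts.path:
        return False  # in-page anchor such as "#top", or an empty link
    if parts.path.startswith("/"):
        return False  # absolute path
    # Resolve against the original on-disk location of the source file.
    target = (source_file.parent / parts.path).resolve()
    if not target.is_relative_to(import_root.resolve()):
        return False  # escapes the import_root subtree
    return target.exists()  # a missing target is left as-is
```

Any link for which the predicate returns False would be written out unchanged, so a false negative costs at most one link that stays broken and never corrupts a working one.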
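A .md link with an anchor is resolved through the in-memory parse to the exact
section file (headings are matched by GitHub slug). Single-file documents
(including a future case where small .md files are not split into directories)
keep their suffix and point at that file. Images and bare directories keep
their path, with only the relative depth adjusted. The rewrite path is async
and is triggered only by DirectoryParser; single-file ingest does not rewrite
links.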
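The anchor-to-section lookup can be sketched as below. This is a simplified approximation: github_slug implements only the common lowercase, strip-punctuation, spaces-to-hyphens rule and ignores GitHub's de-duplication suffixes, and find_section with its file-to-heading mapping is a hypothetical stand-in for what the in-memory parse would provide.

```python
import re
from typing import Dict, Optional


def github_slug(heading: str) -> str:
    """Approximate a GitHub-style heading slug (simplified, no -1/-2 suffixes)."""
    slug = heading.strip().lower()
    # Drop punctuation; \w keeps Unicode letters, digits, and underscores.
    slug = re.sub(r"[^\w\- ]", "", slug)
    return slug.replace(" ", "-")


def find_section(anchor: str, headings: Dict[str, str]) -> Optional[str]:
    """Return the section file whose heading slug equals `anchor`, else None.

    `headings` maps section file path -> heading title (hypothetical shape).
    """
    for section_file, title in headings.items():
        if github_slug(title) == anchor:
            return section_file
    return None


# Hypothetical section files for one split document.
sections = {"guide/Quick_Start.md": "Quick Start!", "guide/API.md": "API"}
print(find_section("quick-start", sections))
```

When no heading matches, returning None lets the caller fall back to the document landing and leave the anchor unresolved.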
Adds tests/parse/test_markdown_link_rewrite.py (22 passed).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
When a markdown directory is ingested, MarkdownParser splits each .md into a directory structure by heading, breaking relative links such as [x](./other.md) and relative image references: their targets no longer live at the original paths. This PR rewrites relative links before writing each split section, compensating for the path transformations ingest introduces (source file → directory, target .md → directory or file).

Key design
- Disk-coordinate: unaffected by the --to target location.
- Async rewrite path (triggered only by DirectoryParser).
- Zero ingest assumptions: the target .md's ingest layout is obtained by running the same MarkdownParser into an in-memory FS (_target_split_files/_inmemory_split_files). Whether the landing is a file or a directory (_doc_landing), its name, the section files, and their content all come from the parser itself. However parse_content changes in the future, one in-memory parse yields the final layout, consistent with real ingest. No naming or splitting rules are reimplemented, and nothing assumes that a .md always becomes a directory.

Test plan
tests/parse/test_markdown_link_rewrite.py: 22 passed. End-to-end coverage of directory links, small-file anchors and queries, large-file section location, images, bare directories, external links, absolute paths, out-of-bounds targets, and the forward-looking "future small .md is not split into a directory" scenario. ruff check is clean.

🤖 Generated with Claude Code