v2.1.0
DeepDoc 2.1.0 is a quality and reliability release — no new pipeline stages, but a broad set of correctness fixes across the scanner, planner, generator, chatbot, and CLI that address real-world failures observed after the 2.0.0 launch.
Fixed
Scanner
- DeepDoc-generated dirs excluded from scan: `site/`, `.deepdoc/`, `chatbot_backend/`, and the configured `output_dir` were being scanned as source files, causing giant-file warnings on scaffold files like `chatbot-panel.tsx`. These dirs are now excluded both in `config.py` defaults and as a hardcoded implicit guard in `scan_repo()`.
- Redis false-positive reduced: bare `r` was included in the Redis connection-variable pattern in `scanner/common.py`, causing almost any single-letter variable to match. Removed; the pattern now requires `cache`, `redis`, or `client`.
- Artifact scan is now deterministic: `artifacts.py` iterates `file_contents` in sorted order so artifact detection results are stable across runs.
- Path stripping handles stacked prefixes: `scanner/utils.py` now strips all leading `./` segments (e.g. `../../foo`) rather than only the first one.
- NestJS multi-controller support: `detect_nestjs` previously used `.search()`, so only the first `@Controller` in a file was found. It now uses `.finditer()` with a sorted controller-span list; each `@Get`/`@Post`/etc. is assigned the base path of the nearest preceding `@Controller`.
- Fastify mount attribute fix: `repo_resolver.py` referenced the non-existent `parent_info.local_fastify_mounts`; corrected to `parent_info.local_mounts`.
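The multi-controller fix can be pictured with a minimal sketch. The regexes and the `extract_routes` helper below are hypothetical and simplified for illustration; they are not the actual `detect_nestjs` code, which these notes do not show. The sketch collects every `@Controller` position with `finditer()`, sorts them, and uses a binary search to attach each route decorator to the nearest preceding controller.

```python
import re
from bisect import bisect_right

# Hypothetical, simplified patterns (assumption: string-literal arguments only).
CONTROLLER_RE = re.compile(r"@Controller\(\s*['\"]([^'\"]*)['\"]\s*\)")
ROUTE_RE = re.compile(
    r"@(Get|Post|Put|Patch|Delete)\(\s*(?:['\"]([^'\"]*)['\"])?\s*\)"
)


def extract_routes(source: str) -> list[tuple[str, str]]:
    """Return (HTTP method, full path) pairs for every route decorator."""
    # Sorted list of (offset, base path) for every controller in the file.
    controllers = sorted(
        (m.start(), m.group(1)) for m in CONTROLLER_RE.finditer(source)
    )
    starts = [start for start, _ in controllers]

    routes = []
    for m in ROUTE_RE.finditer(source):
        # Index of the last controller that starts before this decorator.
        idx = bisect_right(starts, m.start()) - 1
        base = controllers[idx][1] if idx >= 0 else ""
        sub = m.group(2) or ""
        parts = [p for p in (base.strip("/"), sub.strip("/")) if p]
        routes.append((m.group(1).upper(), "/" + "/".join(parts)))
    return routes
```

With a single `.search()` call, every route in a two-controller file would inherit the first controller's base path; the span lookup avoids that.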
Planner
- Bucket slug collision guard: fallback slug generation now checks for existing slugs and appends a counter suffix (`-2`, `-3`, …) to avoid silent collisions where two buckets resolve to the same slug and one overwrites the other in the plan.
- Consolidation cycle guard: added a `visited` set to the bucket-consolidation while-loop; a bucket that has already been considered as a merge target cannot be merged again, preventing infinite loops when the merge-candidate graph has a cycle.
- Unified nav section normalizer: `_normalize_nav_section` was defined in both `heuristics.py` and `nav_shaping.py` with diverging logic (the `heuristics.py` copy had backend-specific remaps the `nav_shaping.py` copy lacked). The duplicate is removed; the canonical version in `nav_shaping.py` now includes all remaps for universal and backend-specific cases.
- Duplicate `_decompose_buckets` removed: `heuristics.py` both imported `_decompose_buckets` from `bucket_refinement` and redefined it locally (~158 lines). The local shadow is deleted; all callers resolve to the single canonical version in `bucket_refinement.py`.
- Terminal corruption during parallel decompose fixed: `_llm_step` wrapped every LLM call in a `Rich.Live()` context on the shared module-level `console`. With up to 6 concurrent `ThreadPoolExecutor` workers each opening a `Live` context simultaneously, terminal output was corrupted (garbled progress bars, interleaved escape sequences). The `Live` context manager has been removed entirely.
- Incidental HTTP bucket double-merge prevented: `bucket_refinement.py` now tracks absorbed slugs in a `merge_target_slugs` set; a bucket that has already absorbed another cannot itself be absorbed again in the same pass.
- Smart-update `merged_plan` was incomplete: the `DocPlan` constructor call in `smart_update_v2.py` omitted `orphaned_files`, `integration_candidates`, and `classification`. These are now propagated so incremental replans retain full plan context.
- Orphaned slugs removed from stale set: after `_handle_deleted_files` runs, slugs for fully-orphaned buckets are now filtered out of `change_set.stale_bucket_slugs` to prevent a redundant regeneration attempt on pages that have already been deleted.
- Copy-renamed files trigger regeneration: the incremental update stale check used `status_code == "R"` (rename only); it now checks `status_code in ("R", "C")` to also catch copy operations.
- Update success requires zero failures: `pages_failed >= 0` (always true since the count is non-negative) replaced with the correct `pages_failed == 0`.
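The slug collision guard amounts to a small loop. The helper below is a sketch under assumptions: the name `unique_slug` and the `taken` set are hypothetical, and the planner's real data structures may differ.

```python
def unique_slug(base: str, taken: set[str]) -> str:
    """Return `base`, or `base` with the first free -N suffix (N >= 2).

    The chosen slug is added to `taken`, so later calls see it as used.
    """
    slug = base
    counter = 2
    while slug in taken:
        slug = f"{base}-{counter}"
        counter += 1
    taken.add(slug)
    return slug
```

Reserving the slug inside the helper keeps the check and the registration in one place, so two buckets cannot both pass the check before either is recorded.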
Generator
- Null guard on `generation_hints`: `evidence.py` accessed `bucket.generation_hints.get(...)` directly; if the field was `None` this raised `AttributeError`. Now guarded with `(bucket.generation_hints or {}).get(...)`.
- Manifest loaded once per run: `generation.py` was loading the on-disk manifest once per bucket inside the stale check. The manifest is now loaded once in `generate_all` and passed into `_bucket_is_stale`, eliminating redundant I/O in large repos.
- Non-transient LLM errors no longer retry: the generation retry loop was sleeping and retrying on all exceptions. Auth failures, invalid model names, and quota errors now raise immediately; only rate-limit and transient errors trigger the backoff retry.
- MDX brace escaping skips JSX prop lines: the broad `{…}` brace escape in `post_processors.py` was mangling JSX prop assignments like `component={MyComp}`. Lines containing `={` are now excluded from broad brace escaping.
- Dead unreachable `return` removed: an unreachable `return content` statement at the end of a branch in `post_processors.py` was silently masking the actual return path. Deleted.
- Empty list YAML frontmatter: `_merge_frontmatter_fields` was writing empty lists as `key:\n []` (block-style), which gray-matter/Fumadocs rejected. Empty lists are now written as `key: []` (flow-style).
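The retry change can be sketched as follows. `TransientLLMError` and `call_with_retry` are hypothetical names introduced for illustration; how the real loop classifies provider exceptions is not described in these notes. The point is only that the `except` clause names the retryable class, so everything else propagates on the first attempt.

```python
import time


class TransientLLMError(Exception):
    """Hypothetical stand-in for rate-limit and other retryable failures."""


def call_with_retry(call, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `call()`, backing off and retrying only on TransientLLMError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except TransientLLMError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last transient error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
        # Any other exception (auth, bad model name, quota) is not caught
        # here, so it reaches the caller immediately without sleeping.
```

Passing `sleep` as a parameter is a testing convenience in this sketch, not something the source describes.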
Chatbot
- FAISS invalid-embedding filter: `chatbot/persistence.py` now filters out results with `score <= -0.5`, preventing corrupted or zero-magnitude embeddings from appearing as top search hits.
- SSE streams no longer hang: all three SSE endpoints (`/stream`, `/deep-research/stream`, `/code-deep/stream`) used a blocking `tokens.get()` with no timeout; a silently-dead generator thread would stall the HTTP response forever. Each endpoint now uses `tokens.get(timeout=30)` and emits a `ping` keepalive event on timeout.
- Citation dedup by range, not just path: `answer_mixin.py` was deduplicating citations by file path only, collapsing distinct line ranges in the same file to a single entry. The dedup key is now `(path, start_line, end_line)`.
- Leading `./` stripped from citation paths: regex-matched file paths in `answer_mixin.py` now have any leading `./` stripped before lookup, matching how paths are stored in the index.
- Azure `api_version` propagated to chatbot: when the chatbot inherits its LLM config from `llm.*`, `api_version` is now included in the inherited config alongside `base_url` and `api_key_env`.
CLI / Config
- `deepdoc config set` type inference uses defaults: `_set_nested` used the existing config value to infer the target type; if the value was `None` (key not yet set), it fell through to a plain string assignment. It now walks `DEFAULT_CONFIG` as a type oracle when the existing value is absent.
- Azure provider validated before generation starts: selecting `--provider azure` previously wrote a config that caused generation to start and then fail silently mid-run because LiteLLM couldn't reach the endpoint. `LLMClient.__init__` now validates that `base_url` and `api_version` are both present and non-empty before any LLM call is made, raising a loud box error that names exactly what is missing and shows the correct YAML snippet to fix it. The same check runs in `build_chat_client` for chatbot Azure configs. `deepdoc init --provider azure` now writes placeholder values for both fields into `.deepdoc.yaml` and shows Azure-specific next steps so users know what to fill in before running `generate`.
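The "defaults as a type oracle" idea can be illustrated with a small sketch. Everything here is hypothetical: the `DEFAULT_CONFIG` contents, the `coerce` rules, and the `set_nested` helper are stand-ins, not the actual `_set_nested` implementation. When the key is not yet set in the user's config, the type is inferred from the matching entry in the defaults rather than falling back to a string.

```python
DEFAULT_CONFIG = {  # hypothetical subset, for illustration only
    "llm": {"temperature": 0.2, "max_retries": 3, "api_version": ""},
    "chatbot": {"enabled": False},
}


def _lookup(tree, keys):
    """Walk nested dicts; return None if any key along the path is missing."""
    for key in keys:
        if not isinstance(tree, dict) or key not in tree:
            return None
        tree = tree[key]
    return tree


def coerce(raw: str, reference):
    """Convert `raw` to the type of `reference`; fall back to the plain string."""
    if isinstance(reference, bool):  # check bool first: bool subclasses int
        return raw.strip().lower() in ("1", "true", "yes", "on")
    if isinstance(reference, int):
        return int(raw)
    if isinstance(reference, float):
        return float(raw)
    return raw


def set_nested(config: dict, dotted_key: str, raw: str) -> None:
    """Set `dotted_key` in `config`, inferring the type from config or defaults."""
    keys = dotted_key.split(".")
    reference = _lookup(config, keys)
    if reference is None:
        # Key not set yet: consult the defaults for the expected type.
        reference = _lookup(DEFAULT_CONFIG, keys)
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = coerce(raw, reference)
```

Keys that appear in neither the config nor the defaults still end up as plain strings in this sketch.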
Nav / Site
- `whats-changed` page appears in nav on first run: two ordering bugs prevented the changelog page from appearing in the sidebar on the very first `generate`: (1) `pipeline_v2.py` was calling `_build_site()` before `_record_changelog()`, so the nav was built without the slug; (2) `smart_update_v2.py` was calling `_append_changelog()` after `_rebuild_nav()`, so the updated plan nav was never written to the site. Both fixed by reordering the calls.
- `whats-changed` synthetic page registered before nav loop: `engine.py` now injects a synthetic `DocPage` for `whats-changed` into `slug_to_page` before the nav-structure loop, so it isn't silently skipped (it isn't a `DocBucket`, so `plan.pages` never contains it).
Glossary
- Glossary evidence cap: `bucket_injection.py` was feeding up to 30 model files as evidence for the domain-glossary bucket; capped at 10.
- Glossary length limits enforced: the domain-glossary prompt now enforces a 40-term hard cap, an explicit skip-list for generic fields (`id`, `created_at`, `email`, etc.), grouped output via `<Accordions>`, a single Mermaid diagram maximum, and a 300-line page length limit. Previously the LLM wrote individual entries for every model field, producing pages that exceeded 5,000 lines.
Changelog page
- Richer changelog entries: `changelog_writer.py` now generates commit metadata tables, bulleted page lists with links, source file lists, and strategy explanation blocks per entry, replacing the previous one-liner accordion entries.