Dev #116

Merged
zTgx merged 30 commits into main from dev
Apr 24, 2026

@zTgx zTgx commented Apr 24, 2026

Summary

Changes

Checklist

  • Code compiles (cargo build)
  • Tests pass (cargo test --lib --all-features)
  • No new clippy warnings (cargo clippy --all-features)
  • Public APIs have documentation comments
  • Python bindings updated (if Rust API changed)

Notes

zTgx added 30 commits April 24, 2026 14:57
… of Python strategy layer

BREAKING CHANGE: The ask() method and QueryContext have been removed from the Rust engine
as retrieval functionality has been migrated to the Python strategy layer. Users should
now use Engine.ask() from the Python SDK for retrieval operations. This removes the
retrieval-related code from the Rust core including the ask method implementation,
QueryContext struct, and RetrieverClient stub.
…iguration

- Remove vectorless-agent crate from workspace members in Cargo.toml
- Move vectorless-rerank from active member to commented out section
- Remove entire vectorless-agent module including command parsing,
  configuration, and context modules
- Update member list to reflect removal of agent-related components

BREAKING CHANGE: Removed the entire validator module from vectorless-config
which was not being used. This affects the ConfigValidator implementation
and related traits that were previously available.

- Remove validator module from lib.rs
- Delete entire validator.rs file with all validation logic
- Remove unused re-exports in vectorless-llm/src/lib.rs
- Remove unused imports and types throughout the codebase

perf: optimize memo store by removing unused key builder

- Remove MemoKeyBuilder struct and all related methods
- Clean up unused Fingerprint import in store.rs
- Remove age() method from MemoEntry as it was unused
- Simplify test imports and remove related test

refactor: update Python bindings with skip_from_py_object attribute

- Add skip_from_py_object to all PyO3 class definitions
- This optimizes Python object creation and prevents circular references
- Affects Answer, Evidence, ReasoningTrace, Config, DocumentInfo, Concept
- Affects Document, NodeInfo, MatchResult, FindResult, WordCount
- Affects CollectedEvidence, TopicEntry, SectionSummary, TocEntry
- Affects NodeStats, SimilarResult, SectionCard, DocCard, ConceptInfo
- Affects Engine, VectorlessError, WeightedKeyword, EdgeEvidence
- Affects GraphEdge, DocumentGraphNode, DocumentGraph, LlmMetricsReport
- Affects RetrievalMetricsReport, and MetricsReport classes

feat: introduce shared blackboard system for worker collaboration

- Add Discovery and SharedBlackboard classes for inter-worker communication
- Enable workers to share findings, leads, and cross-references
- Provide formatted context views for individual workers
- Implement discovery extraction from worker outputs
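
A minimal sketch of how a shared blackboard of this kind might look. The `Discovery` and `SharedBlackboard` names come from the commit message; every field and method below (`worker_id`, `kind`, `post`, `context_for`) is an assumption made for illustration, not the project's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Discovery:
    """One item posted by a worker (field names are illustrative)."""
    worker_id: str
    kind: str      # e.g. "finding", "lead", "cross_ref"
    content: str


@dataclass
class SharedBlackboard:
    """Append-only store that workers write to and read from."""
    discoveries: list[Discovery] = field(default_factory=list)

    def post(self, discovery: Discovery) -> None:
        self.discoveries.append(discovery)

    def context_for(self, worker_id: str) -> str:
        """Format everything posted by *other* workers for one worker."""
        lines = [
            f"[{d.kind}] {d.worker_id}: {d.content}"
            for d in self.discoveries
            if d.worker_id != worker_id
        ]
        return "\n".join(lines)


board = SharedBlackboard()
board.post(Discovery("worker-a", "finding", "Revenue figures are in section 3"))
board.post(Discovery("worker-b", "lead", "See the annual report appendix"))
view_for_a = board.context_for("worker-a")
```

Filtering out a worker's own posts keeps its context view limited to information it does not already have; whether the real implementation does this is not stated in the PR.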

feat: implement query reasoning pipeline replacing understanding

- Replace QueryPlan with QueryAnalysis in dispatcher
- Introduce QueryAnalyzer for multi-stage query analysis
- Add reasoning types: Ambiguity, EntityRef, TemporalConstraint
- Include RetrievalStrategy and QueryAnalysis components
- Update dispatch function to use reasoning instead of understanding

docs: update ask module documentation terminology

- Change "query understanding" to "query reasoning" in docstrings
- Reflect the shift from understanding to reasoning in comments
…val config

- Move indexer, llm_pool, metrics, and storage modules from types/
  directory to src/ directory root
- Remove retrieval module as it's no longer needed in core configuration
- Update lib.rs to export modules directly instead of through types mod
- Add comprehensive Config struct with validation capabilities
- Include ConfigValidationError and ValidationError types for proper
  configuration validation
- Add tests for configuration defaults and validation

feat(blackboard): enhance discovery extraction with cross-document refs

- Extract cross-references from evidence content using regex patterns
- Add "cross_ref" and "lead" discovery types for document references
- Track evidence-referenced documents and generate lead discoveries
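
The regex-based extraction described above could be sketched roughly as follows. The pattern and the `extract_cross_refs` helper are hypothetical; the PR does not show the actual patterns used.

```python
import re

# Hypothetical pattern: matches phrases such as 'see "Annual Report"'
# or 'refer to "Q3 Summary"'. The real patterns are not shown in this PR.
CROSS_REF_PATTERN = re.compile(r'(?:see|refer to)\s+"([^"]+)"', re.IGNORECASE)


def extract_cross_refs(evidence_text: str) -> list[str]:
    """Return referenced document names, deduplicated, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in CROSS_REF_PATTERN.finditer(evidence_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


refs = extract_cross_refs(
    'Totals differ; see "Annual Report" and refer to "Q3 Summary". See "Annual Report".'
)
```

Each extracted name could then be recorded as a "cross_ref" or "lead" discovery, as the commit describes.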

refactor(analyzer): consolidate JSON parsing utilities

- Move _parse_json_response function to shared utils module
- Import parse_json_response from vectorless.ask.utils
- Update QueryAnalyzer to use consolidated JSON parsing utility

feat(analyzer): add analysis completion tracking

- Add analysis_complete flag to QueryAnalysis to track whether
  deep analysis stages completed successfully
- Set analysis_complete=False when deep analysis stages fail
- Propagate completion status through re_analyze method

refactor(python): remove deprecated retrieval configuration

- Remove set_top_k and set_max_iterations methods from PyConfig
- Update Config documentation to reflect removal of retrieval params
- Remove RetrievalConfig from python exports

refactor(utils): centralize JSON response parsing logic

- Create parse_json_response utility function in utils module
- Consolidate JSON parsing logic from analyzer and verifier modules
- Handle markdown-wrapped JSON and extract JSON blocks properly
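
As a rough illustration of what such a utility might do, the sketch below strips an optional markdown fence before decoding. Only the name `parse_json_response` comes from the PR; the body is an assumption. It also raises `ValueError` chained to the original `JSONDecodeError`, in line with the error-handling commit that appears later in this PR.

```python
import json
import re
from typing import Any

# Build the fence marker indirectly so this example can itself live in a fenced block.
_TICKS = "`" * 3
_FENCE = re.compile(_TICKS + r"(?:json)?\s*(.*?)" + _TICKS, re.DOTALL)


def parse_json_response(text: str) -> Any:
    """Parse an LLM reply that may wrap its JSON in a markdown fence.

    Raises:
        ValueError: if the candidate text is not valid JSON.
    """
    fenced = _FENCE.search(text)
    candidate = fenced.group(1) if fenced else text
    try:
        return json.loads(candidate.strip())
    except json.JSONDecodeError as e:
        # Keep the original decoder error available as __cause__.
        raise ValueError(f"could not parse JSON response: {e}") from e
```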

feat(verify): improve evidence reference formatting and scoring

- Update verify prompt to use "doc_name/node_title" format for
  evidence references
- Modify DimensionScore to accept both "doc_name/node_title" and
  "node_title" formats
- Calculate overall confidence from dimension scores instead of
  relying on LLM self-assessment
…e module

- Remove SufficiencyConfig struct and related default functions
- Remove CacheConfig struct and related default functions
- Remove StrategyConfig and all related strategy configuration structs
- Update module documentation to reflect removal of sufficiency types
- Clean up tests by removing sufficiency and strategy config tests
- Keep only storage-related configuration types in the module

- Add explicit ValueError documentation in docstring when JSON parsing fails
- Implement proper exception handling for JSONDecodeError
- Wrap JSON parsing in a try/except block and raise a descriptive ValueError
- Include original exception in the raised error using 'from e' syntax

Remove the ReferenceResolver struct that was caching resolved references
for batch resolution. The implementation is no longer needed and has been
removed from the codebase.
…ine module

Remove the `ask` method documentation reference from the Engine module's
main documentation block, along with unused imports for Answer, Evidence,
and ReasoningTrace types that were only used by the removed ask functionality.

BREAKING CHANGE: The ask method has been removed from the Engine API.

- Remove unused FieldKey and Field structs from bm25 module
- Remove entire memory backend implementation including tests
- Remove memory backend from storage backend module exports
- Remove example documentation from storage lib.rs
- Remove unused file system imports from persistence module
- Remove PersistenceOptions struct and related configuration methods
- Remove file-based save/load functions with atomic write logic
- Remove index save/load functions that used file operations
- Update tests to use bytes-based serialization instead of file-based
- Simplify checksum verification tests to work with byte arrays

Removed the unused test module from workspace.rs that contained
a helper function for creating test documents which was no longer
being used in the codebase.

BREAKING CHANGE: drop support for Python 3.10 and require Python >= 3.11

- Update pyproject.toml to require Python >= 3.11
- Remove Python 3.10 classifier from package metadata
- Remove conditional tomli dependency since tomllib is built-in from 3.11+
- Update mypy and ruff configurations to target Python 3.11
- Simplify TOML loading code by removing Python version checks

refactor: improve type safety with protocol-based typing

- Replace loose `Any` and `Callable` types with structured protocols
- Add DocLoader and EventCallback protocols for better type checking
- Update dispatcher and orchestrator to use typed parameters
- Remove dynamic imports for tomllib in favor of built-in module

refactor: modernize optional type annotations

- Convert Optional[T] to T | None union syntax throughout codebase
- Update engine class constructors and methods with modern typing
- Standardize nullability patterns across all modules

perf: replace asyncio.gather with TaskGroup for better error handling

- Use TaskGroup instead of gather(return_exceptions=True) for worker tasks
- Maintain same fault-tolerance while improving async execution
- Update batch compilation to use TaskGroup for better resource management
- Change project description from "Document Understanding Engine for AI"
  to "Knowing by reasoning, not vectors." in both Cargo.toml and pyproject.toml
- Update Python version requirement from 3.9+ to 3.11+ in installation docs

BREAKING CHANGE: Removed QueryResult, QueryResultItem, QueryMetrics,
EvidenceItem, and Confidence types from vectorless-engine as query
logic has been moved to Python layer.

Also removed QueryEvent enum and related functionality from events
module since query handling is now managed externally.
…nderstanding module

BREAKING CHANGE: Removed Answer, Evidence, ReasoningTrace, and TraceStep types
from the understanding module as they were no longer used. Also removed
SufficiencyLevel from format exports.

- Remove Answer, Evidence, ReasoningTrace, TraceStep from understanding exports
- Remove SufficiencyLevel from format exports
- Clean up related documentation comments
…functions

BREAKING CHANGE: Remove SufficiencyLevel enum from vectorless-document
crate and consolidate keyword extraction and evidence formatting
utilities into shared module.

- Remove SufficiencyLevel enum from format.rs as it's no longer used
- Move extract_keywords function to vectorless/ask/utils.py as single
  source of truth
- Move format_evidence function to vectorless/ask/utils.py as single
  source of truth
- Replace in-memory response cache with bounded LRU cache in LLMClient
- Add structured error types for ask pipeline operations
- Remove Answer-related Python bindings that were unused
- rename "vectorless-index" crate to "vectorless-compiler"
- update IndexMode enum to SourceFormat
- rename IndexInput to CompilerInput and PipelineResult to CompileResult
- update IndexContext to CompileContext and related stage names
- rename IndexStage trait to CompileStage across all modules
- update documentation to reflect document compilation instead of indexing

- Introduce extract_llm_insights function in blackboard module to
  identify findings relevant to other documents using LLM analysis
- Add new import and export for extract_llm_insights in ask module
- Integrate LLM insight extraction in orchestrator for multi-document
  scenarios with additional error handling
- Include comprehensive docstring explaining the functionality and cost
  implications

- Rename `stages` module to `passes` across the compiler
- Update all stage-related structs to use pass terminology:
  - `EnhanceStage` → `EnhancePass`
  - `ValidateStage` → `ValidatePass`
  - `ConceptExtractionStage` → `ConceptPass`
  - `NavigationCompileStage` → `NavigationPass`
  - `OptimizeStage` → `OptimizePass`
  - `ReasoningCompileStage` → `ReasoningPass`
  - `VerifyStage` → `VerifyPass`
  - `BuildStage` → `BuildPass`
  - `ParseStage` → `ParsePass`

- Update trait implementations from `CompileStage` to `CompilePass`
- Change result types from `StageResult` to `PassResult`
- Restructure module organization with frontend, analysis, and backend passes
- Update import paths to use new `crate::passes` module structure
- Fix all test references to use new pass naming convention
…ategories

- Replace "Priority" labels with semantic phase categories (Frontend,
  Analysis, Transform, Backend) to better reflect the compilation
  pipeline stages

- Update descriptions for clarity: change "Tree integrity checks
  (optional)" to "Tree integrity checks", remove "(optional)" from
  various stages as they are conditionally executed based on pipeline
  configuration rather than being truly optional

- Add new "Concept" stage at priority 47 between "Reasoning Idx" and
  "Navigation Idx" phases

- Change the description of the "Reasoning Idx" stage to "Symbol table
  (keyword→path mapping)" and that of the "Navigation Idx" stage to
  "Debug info for runtime navigation"

- Add "Output validation" stage at priority 55 between "Reasoning Idx"
  and "Optimize" phases

- Update checkpointing description from "stage group" to "pass group"
  for more accurate terminology

- Add overview documentation explaining the compiler architecture and
  phase breakdown
- Document pipeline infrastructure including CompilePass trait,
  PipelineExecutor, and PipelineOrchestrator components
- Detail all compilation passes with their priorities, dependencies,
  and functionality
- Provide configuration guide for PipelineOptions and related types
- Explain incremental compilation mechanism with change detection
- Document checkpoint and resume functionality for pipeline recovery
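
To make the priority-ordered pass model concrete, here is a small Python stand-in for the Rust pipeline. It is a sketch only: the `Pass` dataclass, `run_pipeline`, and the rule that lower priorities run first are assumptions, and the priority of 10 for the parse pass is invented. Only the values 47 (Concept) and 55 (Output validation) appear in this PR.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pass:
    """Hypothetical Python stand-in for a Rust `CompilePass` implementor."""
    name: str
    priority: int  # assumption: lower numbers run earlier
    run: Callable[[dict[str, object]], None]


def run_pipeline(passes: list[Pass], context: dict[str, object]) -> list[str]:
    """Run passes in ascending priority order and return the order used."""
    executed: list[str] = []
    for compile_pass in sorted(passes, key=lambda p: p.priority):
        compile_pass.run(context)
        executed.append(compile_pass.name)
    return executed


def mark(flag: str) -> Callable[[dict[str, object]], None]:
    """Build a trivial pass body that records a flag in the shared context."""
    def _run(context: dict[str, object]) -> None:
        context[flag] = True
    return _run


passes = [
    Pass("output_validation", 55, mark("validated")),
    Pass("parse", 10, mark("parsed")),  # priority 10 is made up
    Pass("concept", 47, mark("concepts")),
]
context: dict[str, object] = {}
order = run_pipeline(passes, context)
```

The real executor also tracks dependencies and checkpoints, which this sketch leaves out.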

feat(compiler): implement RoutePass for query routing table generation

- Build intent routes from nodes with question hints for Agent
  acceleration
- Create concept routes from topic tags to enable semantic navigation
- Calculate relevance scores based on content richness and hint count
- Limit route targets to improve performance and reduce memory usage

refactor(compiler): extend CompileContext with agent acceleration data

- Add query_routes field for pre-computed routing table storage
- Include chain_index for reasoning chain navigation
- Add content_overlap map to prevent duplicate content visits
- Introduce evidence_scores for per-node quality assessment
- Update context cloning and result extraction methods accordingly
…ndalone usage

- Add documentation page for writing custom passes with implementation examples
- Document the parsers module and RawNode structure for document parsing
- Create standalone usage guide for vectorless-compiler crate
- Update sidebar configuration to include new documentation pages

feat(compiler): implement backend analysis passes for chain, overlap, and scoring

- Add ChainPass to build reasoning chain index from document references
- Implement OverlapPass to detect content overlap between leaf nodes using
  Jaccard similarity
- Create ScorePass to compute evidence quality scores based on density,
  data richness, and specificity
- Register new passes in the pipeline executor with appropriate priorities
- Update module exports to make new passes available
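
For readers unfamiliar with the metric, Jaccard similarity between two token sets A and B is |A ∩ B| / |A ∪ B|. The sketch below applies it to leaf-node text. The actual OverlapPass is written in Rust and its tokenization and threshold are not shown in this PR, so the whitespace tokenizer, the `find_overlaps` helper, and the 0.5 threshold are all assumptions.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over lowercase whitespace-separated tokens."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0  # convention chosen here for two empty inputs
    return len(tokens_a & tokens_b) / len(union)


def find_overlaps(
    leaves: dict[str, str], threshold: float
) -> list[tuple[str, str, float]]:
    """Compare every pair of leaves once and keep pairs at or above the threshold."""
    ids = sorted(leaves)
    overlaps = []
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            score = jaccard_similarity(leaves[left], leaves[right])
            if score >= threshold:
                overlaps.append((left, right, score))
    return overlaps


leaves = {
    "n1": "net revenue grew in 2023",
    "n2": "net revenue grew in 2024",
    "n3": "staff headcount",
}
overlaps = find_overlaps(leaves, threshold=0.5)
```

In this example `n1` and `n2` share four of six distinct tokens, so they are the only pair reported.
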
…hains, overlap detection and scoring

- Add RoutePass for building query routing tables with intent and concept routes
- Add ChainPass for creating reasoning chain indexes from document references
- Add OverlapPass for detecting content overlap with Jaccard similarity algorithm
- Add ScorePass for evidence quality scoring using density, data richness and specificity
- Update pipeline executor to include new backend stages at priorities 52-58
- Add comprehensive unit tests for each new pass covering edge cases and end-to-end scenarios
- Update documentation diagrams to show new backend components (Route, Chain, Overlap, Score)
- Add metrics recording for each new pass including timing and count statistics
- Update validation pass to track new output flags
- Export NodeReference type for external usage

- Add RoutePass for pre-computed query routing table to accelerate
  agent-based queries
- Add ChainPass for building reasoning chain index from
  in-document cross-references
- Add OverlapPass for detecting content overlap between leaf nodes
  using Jaccard similarity
- Add ScorePass for computing per-node evidence quality scores
  based on density, richness, and specificity metrics

Update documentation to reflect 15 passes instead of 1 in the
pipeline, including detailed descriptions of new passes, their
dependencies, and data flow diagrams.

Modify ChainPass implementation to use proper RefType enum instead
of string matching for reference classification.

BREAKING CHANGE: Renamed IndexMetrics to CompileMetrics and
IndexedDocument to CompiledDocument throughout the codebase.

- Updated documentation to reflect compile pipeline terminology
- Changed metric type from IndexMetrics to CompileMetrics
- Renamed internal document type from IndexedDocument to
  CompiledDocument
- Added new agent acceleration data fields to compiled document
- Updated schema version from 1 to 2 due to structural changes
- Modified persistence layer to include new index types
…stomStageBuilder

BREAKING CHANGE: Removed deprecated StageResult type alias that was marked
for removal since version 0.2.0. Also removed CustomStageBuilder struct
which was unused in the codebase. These changes clean up the API surface
and remove dead code.

Add pre-computed agent acceleration data structures to the Document
type including query routing tables, reasoning chain indices, content
overlap maps, and evidence quality scores. Update documentation to
reflect compilation terminology instead of ingestion terminology.

BREAKING CHANGE: Document understanding terminology changed from
ingestion to compilation process.

feat(navigator): implement agent acceleration query methods

Add new methods to DocumentNavigator for querying agent acceleration
data including intent routes, concept routes, reasoning chains,
content overlaps, and evidence scores. Include helper method for
node ID conversion.

refactor(python): expose agent acceleration APIs to Python bindings

Expose new agent acceleration data structures and query methods
through Python bindings. Add corresponding Python wrapper classes
and async methods for all new functionality.

feat(agent): utilize acceleration data for improved keyword hints

Enhance agent keyword hint generation by incorporating pre-computed
concept routes and evidence quality scores alongside traditional
keyword index matches. Provide richer context for agent decision
making.
…hanced phases

- Rename "Index Pipeline" to "Compile Pipeline" to better reflect the compilation nature of the process
- Replace stage-based terminology with phase-based structure (Frontend → Analysis → Transform → Backend)
- Add detailed documentation for new backend passes including Route, Chain, Overlap, Score, and Verify
- Document agent acceleration data and how it guides worker navigation
- Update references from "indexing" to "compilation" throughout the architecture documentation

feat(navigator): optimize concept routes lookup with early termination

- Refactor concept_routes method to return early when no targets found
- Limit results to one ConceptRouteInfo instead of collecting all matches
- Improve performance by avoiding unnecessary collection operations

fix(python): ensure error messages are properly converted to strings

- Convert string literals to owned String objects in VectorlessError creation
- Maintain consistency in error message handling across Python bindings
- Prevent potential issues with string ownership in error contexts

- Replace old module path `vectorless::parser::markdown` with new path
  `vectorless_compiler::parse::markdown::config::MarkdownConfig`
- Update examples to use proper crate structure and remove outdated
  configuration methods
- Change rust code blocks to `rust,ignore` to prevent compilation errors

- Move pipeline module declaration after passes module
- Reorder re-exported types from pipeline and config modules
- Adjust import order for consistency across multiple files

refactor(pipeline): reorder context field exports

- Move ChainIndex before Concept in Document imports
- Remove unnecessary blank lines in context definitions
- Clean up unused imports in various pipeline modules

refactor(passes): reorganize pass module structure

- Move chain module before reasoning in backend
- Reorder imports and module declarations consistently
- Move parse module after build in frontend
- Move split module after enrich in transform

refactor(engine): clean up imports and declarations

- Reorder compiler imports in engine module
- Simplify import grouping in indexer module
- Move engine module declaration position

refactor(storage): remove redundant documentation comment

style: format function calls and assertions with proper line breaks

- Wrap long assertion statements in test cases
- Format method chaining for better readability
- Break down complex expressions across multiple lines

- **Compile pipeline**: renamed index pipeline to compile pipeline with passes-based architecture
- **Compiler refactor**: renamed stages to passes, removed deprecated `StageResult` alias and `CustomStageBuilder`
- New backend compilation passes: query routing, reasoning chains, overlap detection, and scoring
- Agent acceleration data added to compiled documents
- LLM-powered cross-document insight extraction in ask module
- Enhanced JSON parsing with proper error handling
- Upgraded minimum Python version to 3.11
- Removed unused modules: agent, memory backend, validation, ReferenceResolver, SufficiencyLevel
- Restructured configuration modules and removed legacy retrieval config
- Simplified storage layer by removing memory backend
- Documentation updates for architecture and compilation pipeline

vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vectorless | Ready | Preview, Comment | Apr 24, 2026 3:13pm |

@zTgx zTgx merged commit 9acb0b9 into main Apr 24, 2026
7 checks passed