Feat understanding by zTgx · Pull Request #108 · vectorlessflow/vectorless

zTgx · 2026-04-23T03:25:08Z

Summary

Changes

Checklist

Code compiles (cargo build)
Tests pass (cargo test --lib --all-features)
No new clippy warnings (cargo clippy --all-features)
Public APIs have documentation comments
Python bindings updated (if Rust API changed)

Notes

- Add PyAnswer wrapper with content, evidence, confidence, and trace getters
- Rename DocumentInfo to reflect "understood" instead of "indexed"
- Change id field to doc_id for clarity
- Replace summary field with concepts extraction
- Update section_count and rename list method to list_documents
- Add Concept class for key concept extraction
- Refactor Engine methods: index->ingest, query->ask, remove->forget
- Remove deprecated streaming and context modules
- Update documentation examples to use new API
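To make the renames above concrete, here is a minimal in-memory sketch of the new API shape. It is a toy stand-in, not the real vectorless binding: the constructor, the `ingest(name, text)` signature, the `name` field, the ID format, and the keyword matching inside `ask` are all assumptions made for illustration. Only the method names (`ingest`, `ask`, `forget`, `list_documents`) and the field names (`doc_id`, `concepts`, `section_count`, `content`, `evidence`, `confidence`, `trace`) come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class Answer:
    """Hypothetical stand-in for the PyAnswer wrapper."""
    content: str
    evidence: list[str] = field(default_factory=list)
    confidence: float = 0.0
    trace: list[str] = field(default_factory=list)


@dataclass
class DocumentInfo:
    """Hypothetical stand-in: doc_id replaces id, concepts replaces summary."""
    doc_id: str
    name: str
    section_count: int
    concepts: list[str] = field(default_factory=list)


class Engine:
    """In-memory toy engine (assumption); only the method names mirror the PR."""

    def __init__(self) -> None:
        self._docs: dict[str, tuple[DocumentInfo, str]] = {}
        self._next_id = 1

    def ingest(self, name: str, text: str) -> DocumentInfo:  # formerly index
        sections = [s for s in text.split("\n\n") if s.strip()]
        info = DocumentInfo(f"doc-{self._next_id}", name, len(sections))
        self._next_id += 1
        self._docs[info.doc_id] = (info, text)
        return info

    def list_documents(self) -> list[DocumentInfo]:  # formerly list
        return [info for info, _ in self._docs.values()]

    def ask(self, question: str) -> Answer:  # formerly query
        # Naive keyword match; the real engine reasons over a document tree.
        terms = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
        evidence, trace = [], []
        for info, text in self._docs.values():
            trace.append(f"scan {info.doc_id}")
            for section in text.split("\n\n"):
                if any(t in section.lower() for t in terms):
                    evidence.append(section.strip())
        content = evidence[0] if evidence else "no answer"
        return Answer(content, evidence, 1.0 if evidence else 0.0, trace)

    def forget(self, doc_id: str) -> bool:  # formerly remove
        return self._docs.pop(doc_id, None) is not None
```

A caller would ingest a document, read `info.doc_id`, call `ask(...)`, and inspect `answer.confidence`, `len(answer.evidence)`, and `answer.trace`, which matches the output fields the updated examples are said to display.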

- Rename "reasoning-native document intelligence engine" to "Document Understanding Engine for AI"
- Update project structure to reflect cargo workspace with vectorless-core/vectorless and vectorless-py crates
- Change Engine.query() to Engine.ask() in retrieval flow
- Update build commands to use workspace root
- Adjust development workflow paths to use crates/vectorless
- Update Python binding paths to crates/vectorless-py/src/lib.rs
- Add Python SDK development notes

Update the development workflow documentation to reflect new directory structure:

- Change feature implementation path from crates/vectorless/src/ to vectorless-core/vectorless/src/
- Update Python bindings path from crates/vectorless-py/src/lib.rs to vectorless-core/vectorless-py/src/lib.rs
- Update Python SDK path from python/vectorless/ to vectorless/

- Update description from "Reasoning-based Document Engine" to "Document Understanding Engine for AI"
- Bump minimum Python requirement from 3.9 to 3.10
- Update author name from "vectorless developers" to "Vectorless"
- Remove Python 3.9 classifier and add clarifying comment for tomli dependency
- Update keywords to better reflect document understanding focus
- Update mypy and ruff target versions to Python 3.10
- Add uv tool configuration with dev dependencies

- Remove exclude directive from Cargo.toml that was excluding docs/, examples/, and .* patterns
- Delete all example files including deep_retrieval.rs, events.rs, flow.rs, graph.rs, index_directory.rs, index_incremental.rs, and index_pdf.rs

- Change Rust examples description from "Rust examples (flow, indexing, pdf, batch, etc.)" to "Rust examples (legacy, no new additions)"
- Add Python examples entry with description "Python examples (primary, for Python ecosystem)"
- Remove samples/ directory from documentation

- Replace deprecated IndexContext and QueryContext imports with IngestInput
- Update method names: index/list/remove to ingest/list_documents/forget
- Change query API usage from QueryContext to direct ask method call
- Update terminology from 'indexing' to 'ingesting' and 'understanding'
- Rename challenge queries to challenge questions
- Add confidence, evidence count, and trace steps to output display
- Update variable names from doc.id to doc.doc_id for consistency

- Create HISTORY.md to track project changes and version history

- Add complete history tracking from initial release (0.1.0) to current version (0.1.11)
- Document core principles: "Reason don't vector", "Model fails we fail", "No thought no answer"
- Include detailed changelog covering agent-based retrieval architecture, navigation commands, orchestrator supervisor loop, and query understanding pipeline
- Track evolution from basic indexing to reasoning-based document engine
- Document PDF parsing improvements, streaming retrieval, and multi-document support

- Downgrade workspace package version from 0.1.32 to 0.1.12
- Update description from "Reasoning-based Document Engine" to "Document Understanding Engine for AI"
- Change pyproject.toml to use dynamic version management instead of hardcoded version 0.1.11

- Introduce ConceptExtractionStage that extracts key concepts from document topics and summaries using LLM calls
- Add fallback mechanism for keyword-based concept extraction when LLM is unavailable
- Implement maximum limits for topics (20) and concepts (15) to control processing scope
- Add proper error handling with fallback to basic extraction on LLM failures

feat(document): add utility methods for document navigation

- Add `cat()` method to get node content by ID for agent commands
- Add `find()` method to search nodes by keyword in title/content
- Add `node_title()` method to retrieve node titles by ID
- Add `section_count()` method to get total number of sections

refactor(index): integrate concept extraction into pipeline

- Register ConceptExtractionStage in pipeline executor at priority 47
- Update pipeline documentation to reflect new stage ordering
- Modify IndexContext to include concepts field for stage output
- Update PipelineResult to include concepts for final output

refactor(storage): persist concepts in indexed documents

- Add concepts field to PersistedDocument struct with serde serialization
- Include concepts in IndexedDocument for runtime access
- Ensure concepts are properly saved and loaded during persistence

refactor(indexer): pass concepts through indexing workflow

- Update indexer to transfer concepts from pipeline results to indexed documents
- Ensure concepts are properly persisted along with other document metadata
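The fallback behaviour described for ConceptExtractionStage can be sketched roughly as follows. This is an illustration in Python rather than the Rust stage itself. The caps of 20 topics and 15 concepts are taken from the commit message; the function names, the `llm` callable signature, the stopword list, and the frequency-based keyword heuristic are assumptions.

```python
import re
from collections import Counter
from typing import Callable, Optional

MAX_TOPICS = 20    # cap stated in the commit message
MAX_CONCEPTS = 15  # cap stated in the commit message

# Tiny illustrative stopword list (assumption, not from the PR).
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "into", "are", "was"}


def keyword_fallback(topics: list[str], summary: str) -> list[str]:
    """Frequency-based keyword extraction, used when the LLM path yields nothing."""
    text = " ".join(topics + [summary]).lower()
    words = [w for w in re.findall(r"[a-z][a-z\-]{2,}", text) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(MAX_CONCEPTS)]


def extract_concepts(
    topics: list[str],
    summary: str,
    llm: Optional[Callable[[list[str], str], list[str]]] = None,
) -> list[str]:
    """Try the LLM first; fall back to keywords if it is missing, fails, or returns nothing."""
    topics = topics[:MAX_TOPICS]
    if llm is not None:
        try:
            concepts = llm(topics, summary)
            if concepts:
                return concepts[:MAX_CONCEPTS]
        except Exception:
            pass  # any LLM failure falls through to the keyword path
    return keyword_fallback(topics, summary)
```

Truncating the topics before the LLM call bounds prompt size, and truncating the result bounds what later stages have to persist, which is presumably the "processing scope" the limits are meant to control.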

Add trace_steps field to Output and WorkerOutput structs to capture reasoning trace steps during agent navigation. Initialize trace_steps in constructors and extend WorkerState with trace collection capabilities. Add navigation index building and verification stage to pipeline that validates ingest output reliability by checking tree structure, document summary, and concept extraction results before persistence. Refactor document loading to use unified Document structure and implement trace collection in agent state management.
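As a rough illustration of the verification stage mentioned above, the sketch below checks the three things the commit message names (tree structure, document summary, concept extraction results) and reports problems before anything would be persisted. The `Node` and `IngestOutput` types, the individual checks, and the message strings are hypothetical; the actual stage may validate different properties.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical document tree node."""
    title: str
    children: list["Node"] = field(default_factory=list)


@dataclass
class IngestOutput:
    """Hypothetical bundle of what the pipeline produced for one document."""
    root: Optional[Node]
    summary: str
    concepts: list[str]


def verify(output: IngestOutput) -> list[str]:
    """Return a list of problems; an empty list means the output may be persisted."""
    problems: list[str] = []
    if output.root is None:
        problems.append("tree: missing root")
    else:
        # Walk the whole tree and flag nodes that cannot be navigated by title.
        stack = [output.root]
        while stack:
            node = stack.pop()
            if not node.title.strip():
                problems.append("tree: node with empty title")
            stack.extend(node.children)
    if not output.summary.strip():
        problems.append("summary: empty")
    if not output.concepts:
        problems.append("concepts: none extracted")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller log every issue in a single pass and decide whether any of them should block persistence.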

- Split the main crate into multiple specialized crates including vectorless-error, vectorless-document, vectorless-config, vectorless-utils, vectorless-scoring, vectorless-graph, vectorless-events, vectorless-metrics, vectorless-llm, vectorless-storage, vectorless-query, vectorless-index, vectorless-agent, vectorless-retrieval, vectorless-rerank, and vectorless-engine
- Add comprehensive command parsing system for agent navigation with support for ls, cd, cat, find, grep, head, findtree, wc, pwd, check, and done commands
- Implement quote-stripping and multi-level target resolution with exact, case-insensitive, substring, and numeric matching
- Add extended target resolution with deep search capability up to depth 4 using BFS algorithm
- Create agent configuration system with worker and answer pipeline settings including navigation budgets and evidence caps
- Implement structured output types for agent results including evidence collection, metrics tracking, and confidence scoring
- Add read-only context wrappers for accessing document navigation indices, content trees, and reasoning indexes
- Include comprehensive test suite for command parsing and target resolution functionality
- Add Python script to fix crate:: import references across split modules
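The command parsing and target resolution described above might look roughly like this. It is a Python sketch, not the Rust implementation: the command list, the matching order (exact, case-insensitive, substring, numeric), and the BFS depth limit of 4 come from the commit message, while the `Node` type, the function names, the 1-based numeric index, and the exact depth accounting are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

COMMANDS = {"ls", "cd", "cat", "find", "grep", "head", "findtree", "wc", "pwd", "check", "done"}
MAX_DEPTH = 4  # deep-search limit stated in the commit message


@dataclass
class Node:
    """Hypothetical document tree node."""
    title: str
    children: list["Node"] = field(default_factory=list)


def parse_command(line: str) -> tuple[str, str]:
    """Split a line into (command, target), stripping one pair of surrounding quotes."""
    name, _, rest = line.strip().partition(" ")
    if name not in COMMANDS:
        raise ValueError(f"unknown command: {name}")
    target = rest.strip()
    if len(target) >= 2 and target[0] == target[-1] and target[0] in "\"'":
        target = target[1:-1]
    return name, target


def resolve(children: list[Node], target: str) -> Optional[Node]:
    """Match among direct children: exact, case-insensitive, substring, then numeric."""
    if not target:
        return None
    lowered = target.lower()
    for node in children:
        if node.title == target:
            return node
    for node in children:
        if node.title.lower() == lowered:
            return node
    for node in children:
        if lowered in node.title.lower():
            return node
    if target.isdigit() and 1 <= int(target) <= len(children):
        return children[int(target) - 1]  # 1-based index (assumption)
    return None


def resolve_deep(root: Node, target: str, max_depth: int = MAX_DEPTH) -> Optional[Node]:
    """Breadth-first search for a match among descendants up to max_depth levels below root."""
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        hit = resolve(node.children, target)
        if hit is not None:
            return hit
        if depth + 1 < max_depth:
            queue.extend((child, depth + 1) for child in node.children)
    return None
```

Because the search is breadth-first, a shallow match always wins over a deeper one with the same title, which keeps `cd`-style navigation predictable when section names repeat across the tree.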

- Add vectorless-rerank dependency to vectorless-agent
- Introduce Evidence type in vectorless-rerank and re-export it from vectorless-agent instead of defining locally
- Move query-related types (EvidenceItem, QueryMetrics, QueryResultItem, Confidence) from vectorless-engine to vectorless-retrieval
- Update imports across multiple modules to use correct paths after refactoring
- Add necessary dependencies (regex, serde_json) and remove vectorless-agent dependency from vectorless-rerank
- Update module visibility for config, memo, and throttle in vectorless-llm

This change centralizes query result types in vectorless-retrieval module and introduces proper re-ranking capabilities through the new vectorless-rerank module.

BREAKING CHANGE: Evidence type is now re-exported from vectorless-rerank::types instead of being defined in vectorless-agent.

- Move tempfile to dev-dependencies in Cargo.toml
- Update import path from crate::llm::throttle to crate::throttle in client.rs and executor.rs test modules
- Fixes incorrect module path references in test code

Change import from crate::document::DocumentTree to crate::tree::DocumentTree across multiple test modules to align with updated module structure.

BREAKING CHANGE: This change updates the internal module structure and import paths for DocumentTree.

Change the type annotation from crate::DocumentTree to vectorless_document::DocumentTree for consistency with module structure.

feat(retriever): import additional types and update module paths

Import DocContext, Scope, and WorkspaceContext from vectorless_agent config module and update QueryResult import path from crate::client to super::types.

refactor(retriever): remove redundant module prefix in type usage

Replace agent::DocContext with DocContext and update agent::Scope and agent::WorkspaceContext to their respective unqualified imports.

chore(retrieval): add indextree as dev dependency

Add indextree to dev-dependencies section of Cargo.toml for workspace configuration.

- Move some import statements to improve code readability and maintain consistent ordering
- Reorder some field declarations and function calls to follow standard Rust formatting conventions
- Remove unused pub(crate) mod test_support from vectorless-engine
- Remove unused test_support.rs file as it's no longer needed
- Adjust some long lines to fit within 100 character limit
- Move DocumentGraphConfig export to proper location in types module
- Reorder some struct field initializations for better readability

- Replace direct doc.as_context() call with explicit DocContext construction using individual fields (tree, nav_index, reasoning_index, doc_name)
- Update concurrency configuration to use proper type conversion from throttle config

refactor(graph): consolidate configuration in vectorless-config

- Remove local DocumentGraphConfig implementation
- Add vectorless-config dependency to vectorless-graph
- Re-export DocumentGraphConfig from vectorless_config as single source of truth

refactor(python): update module imports to use vectorless_engine

- Replace ::vectorless imports with ::vectorless_engine in python bindings for Answer, Config, DocumentInfo, Engine, Error, Graph, and Metrics types
- This ensures consistent usage of the engine module across Python API

- Add re-exports of Config from vectorless_config
- Add re-exports of core document types (Answer, Concept, DocumentInfo, etc.)
- Add re-exports of error handling types (Error, Result)
- Add re-exports of event types (EventEmitter, IndexEvent, QueryEvent, etc.)
- Add re-exports of graph types (DocumentGraph, DocumentGraphNode, etc.)
- Add re-exports of metrics types (LlmMetricsReport, MetricsReport, etc.)
- Add re-export of DocumentTree from vectorless_document

…raph crate

- Remove tracing and tokio dependencies from vectorless-config
- Add vectorless-graph as dependency instead
- Remove graph module from types and update import to use vectorless-graph
- Move DocumentGraphConfig re-export to use vectorless_graph crate

refactor(vectorless-graph): move DocumentGraphConfig implementation to graph crate

- Remove vectorless-config dependency from vectorless-graph
- Implement DocumentGraphConfig directly in vectorless-graph crate
- Include all configuration fields and methods for document graph settings
- Maintain same API interface while moving implementation to correct location

BREAKING CHANGE: Remove the entire vectorless core module including:

- Cargo.toml configuration and dependencies
- Single document challenge example that tested deep reasoning
- Agent command parsing system with navigation commands (ls, cd, cat, find, grep, etc.)
- Target resolution logic for document tree navigation
- All associated tests and implementations

This removes the core vectorless functionality that enabled AI-powered document navigation and reasoning capabilities.

…ned crates Update CLAUDE.md to reflect the new architecture with 17 fine-grained Rust crates instead of the previous monolithic structure. Add detailed tree view of the new crate organization and dependency layers showing compilation isolation benefits. Remove the fix_imports.py script that was used for the crate splitting process as it's no longer needed. Update development workflow instructions to reflect the new multi-crate structure and add information about cargo test counts and specific crate building commands.

Move DocumentGraphConfig import to maintain consistent ordering and improve code organization.

fix(engine): format ConcurrencyConfig initialization

Properly format the ConcurrencyConfig initialization across multiple lines to improve readability.

refactor(lib): consolidate DocumentTree export

Move DocumentTree export to correct location in engine lib to avoid duplicate exports and maintain proper module structure.

refactor(python): format graph module imports

Reformat imports in python graph module to follow consistent multi-line style for better readability.

Remove complete examples directory containing various demonstration files including batch indexing, document management, error handling, index metrics, PDF indexing, and session walkthrough examples.

The entire examples folder with all subdirectories and files has been removed, including:

- README.md files explaining each example
- main.py implementation files
- Directory indexing and management examples
- Error handling demonstrations
- Index metrics and PDF indexing examples
- Session API walkthrough materials

Add a comprehensive example demonstrating advanced document indexing and querying capabilities. The example includes a realistic technical report about quantum computing research with complex inter-lab dependencies, financial data, and technical specifications.

The challenge demonstrates the engine's ability to handle queries requiring deep navigation through the document tree, cross-referencing details across distant sections, and extracting information from nested structures rather than surface-level keyword matching.

Includes five challenge questions that test:

- Cross-referencing device characterization needs with equipment specs
- Tracing dependency chains between research milestones
- Calculating impacts from distributed data points
- Complex multi-step reasoning across document sections

vercel · 2026-04-23T03:25:13Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
vectorless	Ready	Preview, Comment	Apr 23, 2026 3:25am

zTgx added 28 commits April 22, 2026 22:00

docs(HISTORY): add history tracking file

6c3ecb5

- Create HISTORY.md to track project changes and version history

refactor(vectorless-llm): update import paths and add dev dependency

a1f8373

- Move tempfile to dev-dependencies in Cargo.toml
- Update import path from crate::llm::throttle to crate::throttle in client.rs and executor.rs test modules
- Fixes incorrect module path references in test code

refactor(builder): update indexer client import path

db2396d

refactor(engine): remove explicit type annotation in source_path mapping

060345c

zTgx merged commit 4cc38f4 into dev Apr 23, 2026
2 checks passed

zTgx merged 28 commits into dev from feat-understanding
