Add test coverage for VectorSimilarityUtils functions #165

Copilot · 2026-01-04T00:56:39Z

The VectorSimilarityUtils module (cosineSimilarity, jaccardSimilarity, hammingDistance, hammingSimilarity) was extracted from VectorSimilarityTask but lacked dedicated tests.

Changes

Added VectorSimilarityUtils.test.ts with 63 test cases covering:
- All four similarity/distance functions
- All typed array types (Int8Array, Uint8Array, Int16Array, Uint16Array, Float32Array, Float64Array)
- Edge cases: zero vectors, orthogonal vectors, opposite vectors, single element vectors
- Error handling for mismatched vector lengths
- Cross-function consistency (e.g., hammingSimilarity = 1 - hammingDistance)
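The cross-function consistency and length-mismatch checks can be sketched as follows. This is a minimal illustration, not the module's code: `hammingDistanceSketch`, `hammingSimilaritySketch`, and `assertSameLength` are hypothetical stand-ins, and the normalized definition of Hamming distance (the fraction of differing positions) is an assumption inferred from the `hammingSimilarity = 1 - hammingDistance` relation above.

```typescript
// Hypothetical stand-ins; the real VectorSimilarityUtils functions may differ.
type NumericArray =
  | Int8Array
  | Uint8Array
  | Int16Array
  | Uint16Array
  | Float32Array
  | Float64Array;

// Throws when the two vectors cannot be compared element by element.
function assertSameLength(a: NumericArray, b: NumericArray): void {
  if (a.length !== b.length) {
    throw new Error(`Vector length mismatch: ${a.length} vs ${b.length}`);
  }
}

// Assumed definition: fraction of positions whose values differ, in [0, 1].
function hammingDistanceSketch(a: NumericArray, b: NumericArray): number {
  assertSameLength(a, b);
  if (a.length === 0) return 0; // assumption: empty vectors count as identical
  let differing = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) differing++;
  }
  return differing / a.length;
}

// By construction, similarity and distance always sum to 1.
function hammingSimilaritySketch(a: NumericArray, b: NumericArray): number {
  return 1 - hammingDistanceSketch(a, b);
}
```

Under these assumptions, a consistency test only needs to compute both values for the same pair of vectors and check that they sum to 1, and an error-handling test only needs to pass vectors of different lengths and expect a throw.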

Example Test Coverage

test("should calculate cosine similarity for orthogonal vectors", () => {
  const a = new Float32Array([1, 0, 0]);
  const b = new Float32Array([0, 1, 0]);
  expect(cosineSimilarity(a, b)).toBeCloseTo(0.0, 5);
});

test("should work with Int8Array", () => {
  const a = new Int8Array([10, 20, 30]);
  const b = new Int8Array([10, 20, 30]);
  expect(cosineSimilarity(a, b)).toBeCloseTo(1.0, 5);
});
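The per-type tests above repeat the same assertion for each typed array, so they can also be written as a table-driven loop. The sketch below is illustrative only: `cosineSketch`, `factories`, and `checkAllTypedArrays` are hypothetical names, `cosineSketch` is a stand-in rather than the module's `cosineSimilarity`, and returning 0 for a zero vector is an assumption about the edge-case behavior.

```typescript
// Hypothetical stand-in for cosineSimilarity; the real implementation may differ.
function cosineSketch(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  // Assumption: a zero vector yields 0 instead of NaN.
  return denominator === 0 ? 0 : dot / denominator;
}

// One factory per typed array type listed in the PR description.
const factories: Array<[string, (values: number[]) => ArrayLike<number>]> = [
  ["Int8Array", (v) => new Int8Array(v)],
  ["Uint8Array", (v) => new Uint8Array(v)],
  ["Int16Array", (v) => new Int16Array(v)],
  ["Uint16Array", (v) => new Uint16Array(v)],
  ["Float32Array", (v) => new Float32Array(v)],
  ["Float64Array", (v) => new Float64Array(v)],
];

// Runs the identical-vector and orthogonal-vector cases for every type.
function checkAllTypedArrays(): Array<{ name: string; identical: number; orthogonal: number }> {
  return factories.map(([name, make]) => ({
    name,
    identical: cosineSketch(make([10, 20, 30]), make([10, 20, 30])),
    orthogonal: cosineSketch(make([1, 0, 0]), make([0, 1, 0])),
  }));
}
```

Opposite-vector cases are left out of the loop because unsigned types cannot hold negative components; those would need a separate table restricted to signed and floating-point arrays.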

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* [feat] New VectorQuantizeTask, updated VectorSimilarityTask * [WIP] rework document * [refactor] Update task input handling and smartClone method for improved input data handling for tests - Replaced structuredClone and JSON methods with a new smartClone function that deep-clones plain objects and arrays while preserving class instances by reference. - quick versions of tasks as functions now pass input to run not the constructor which means no defaults and cloning * [refactor] Removed unnecessary checks for undefined values when copying additional input properties. * [refactor] Enhance tasks with service registry integration - Updated IExecuteContext and IRunConfig to include registry support. - Refactored TaskRunner and TaskGraphRunner to utilize the service registry for improved task execution and model retrieval. - Ensured backward compatibility while enhancing the overall architecture for better service management. - Introduced a service registry to manage model repositories and execution contexts in AiTask. * [feat] Introduce input resolver system for enhanced schema handling - Added a new InputResolver to manage schema-annotated inputs, allowing for automatic resolution of string IDs to their corresponding instances. - Implemented repository and model resolution capabilities, improving task input handling and validation. - Created new schemas for tabular, vector, and document repositories to facilitate input resolution. - Enhanced AiTask and TaskRunner to utilize the input resolver for better integration with service registries. - Added comprehensive tests to ensure the functionality of the input resolver system and its integration with tasks. * [feat] Introduce new AI tasks and enhance document processing capabilities - Added several new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others to support advanced document processing workflows. 
- Enhanced the input handling for tasks to streamline the integration with the service registry and improve task execution. - Updated the documentation to reflect the new tasks and their functionalities, ensuring clarity for developers. - Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system. * Update packages/ai/src/source/ProvenanceUtils.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/ai/src/source/StructuralParser.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/ai/src/source/ProvenanceUtils.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/ai/src/task/ChunkToVectorTask.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/ai/src/task/QueryExpanderTask.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/task-graph/src/task/Task.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/util/src/vector/Tensor.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Initial plan * Update packages/ai/src/task/VectorQuantizeTask.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update packages/util/src/vector/VectorUtils.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Optimize quantizeToUint8 and quantizeToUint16 with single-pass min/max Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Remove unused query variable from InputResolver test (#161) * Initial plan * Remove unused query variable from InputResolver test Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Fix edge case: return non-zero range for empty 
arrays in findMinMax Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Fix markdown auto-detection to use header pattern matching (#157) * Initial plan * Improve markdown auto-detection with robust pattern matching Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Remove unused task class imports from ChunkToVector.test.ts (#160) * Initial plan * Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Fix inconsistent vector/tensor terminology in Tensor.ts (#167) * Initial plan * Update Tensor.ts to use consistent "tensor" terminology throughout Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Optimize VectorQuantizeTask min/max calculation for large vectors (#168) * Initial plan * Optimize quantizeToUint8 and quantizeToUint16 to use single loop for min/max Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Add empty array guard to quantization methods Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Replace unsafe type assertions with type-safe field extraction in Document.addVariant (#158) * Initial plan * Use extractConfigFields for type-safe provenance handling Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Add 
comprehensive tests for type-safe provenance handling Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Add test coverage for VectorSimilarityUtils functions (#165) * Initial plan * Add comprehensive tests for VectorSimilarityUtils Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Extract magic number to named constant in ProvenanceUtils (#159) * Initial plan * Extract magic number 512 to DEFAULT_MAX_TOKENS constant Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Add support for Float16Array in normalize function of VectorUtils.ts * Update packages/ai/src/source/StructuralParser.ts Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Fix naming inconsistency between Vector type and TensorSchema (#169) * Initial plan * Fix naming inconsistency: rename Vector to Tensor in Tensor.ts Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Enhance jaccardSimilarity function to handle negative values by normalizing inputs to a non-negative range. This includes calculating the global minimum across both vectors and adjusting values accordingly. * [test] Add tests for VectorUtils, covering magnitude, inner product, normalization, and handling of various TypedArray types. 
Update normalize function to support an additional parameter for Float32Array conversion. * Add circular reference detection to smartClone method (#162) * Initial plan * Add circular reference detection to smartClone method Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Fix circular reference detection to handle shared references correctly Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> * Refactor TaskEvents to import TaskStatus from TaskTypes and add unit tests for smartClone method - Updated TaskEvents to import TaskStatus from the correct module. - Added comprehensive unit tests for the smartClone method, including cases for circular reference detection and handling various data structures. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com> Co-authored-by: Steven Roussey <sroussey@gmail.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

* Rag (#170) * [feat] New VectorQuantizeTask, updated VectorSimilarityTask * [WIP] rework document * [refactor] Update task input handling and smartClone method for improved input data handling for tests - Replaced structuredClone and JSON methods with a new smartClone function that deep-clones plain objects and arrays while preserving class instances by reference. - quick versions of tasks as functions now pass input to run not the constructor which means no defaults and cloning * [refactor] Removed unnecessary checks for undefined values when copying additional input properties. * [refactor] Enhance tasks with service registry integration - Updated IExecuteContext and IRunConfig to include registry support. - Refactored TaskRunner and TaskGraphRunner to utilize the service registry for improved task execution and model retrieval. - Ensured backward compatibility while enhancing the overall architecture for better service management. - Introduced a service registry to manage model repositories and execution contexts in AiTask. * [feat] Introduce input resolver system for enhanced schema handling - Added a new InputResolver to manage schema-annotated inputs, allowing for automatic resolution of string IDs to their corresponding instances. - Implemented repository and model resolution capabilities, improving task input handling and validation. - Created new schemas for tabular, vector, and document repositories to facilitate input resolution. - Enhanced AiTask and TaskRunner to utilize the input resolver for better integration with service registries. - Added comprehensive tests to ensure the functionality of the input resolver system and its integration with tasks. * [feat] Introduce new AI tasks and enhance document processing capabilities - Added several new tasks including ChunkToVectorTask, ContextBuilderTask, DocumentEnricherTask, HierarchicalChunkerTask, and others to support advanced document processing workflows. 
- Enhanced the input handling for tasks to streamline the integration with the service registry and improve task execution. - Updated the documentation to reflect the new tasks and their functionalities, ensuring clarity for developers. - Implemented comprehensive tests for the new tasks to validate their behavior and integration within the workflow system. * Update packages/ai/src/task/QueryExpanderTask.ts * Update packages/task-graph/src/task/Task.ts * Update packages/util/src/vector/Tensor.ts * Update packages/ai/src/task/VectorQuantizeTask.ts * Update packages/util/src/vector/VectorUtils.ts * Optimize quantizeToUint8 and quantizeToUint16 with single-pass min/max * Remove unused query variable from InputResolver test (#161) * Fix edge case: return non-zero range for empty arrays in findMinMax * Fix markdown auto-detection to use header pattern matching (#157) * Improve markdown auto-detection with robust pattern matching * Remove unused task class imports from ChunkToVector.test.ts (#160) * Remove unused imports ChunkToVectorTask and HierarchicalChunkerTask * Fix inconsistent vector/tensor terminology in Tensor.ts (#167) * Update Tensor.ts to use consistent "tensor" terminology throughout * Optimize VectorQuantizeTask min/max calculation for large vectors (#168) * Optimize quantizeToUint8 and quantizeToUint16 to use single loop for min/max * Add empty array guard to quantization methods * Replace unsafe type assertions with type-safe field extraction in Document.addVariant (#158) * Use extractConfigFields for type-safe provenance handling * Add comprehensive tests for type-safe provenance handling * Add test coverage for VectorSimilarityUtils functions (#165) * Add comprehensive tests for VectorSimilarityUtils * Extract magic number to named constant in ProvenanceUtils (#159) * Extract magic number 512 to DEFAULT_MAX_TOKENS constant * Add support for Float16Array in normalize function of VectorUtils.ts * Fix naming inconsistency between Vector type and 
TensorSchema (#169) * Fix naming inconsistency: rename Vector to Tensor in Tensor.ts * Enhance jaccardSimilarity function to handle negative values by normalizing inputs to a non-negative range. This includes calculating the global minimum across both vectors and adjusting values accordingly. * [test] Add tests for VectorUtils, covering magnitude, inner product, normalization, and handling of various TypedArray types. Update normalize function to support an additional parameter for Float32Array conversion. * Add circular reference detection to smartClone method (#162) * Fix circular reference detection to handle shared references correctly * Refactor TaskEvents to import TaskStatus from TaskTypes and add unit tests for smartClone method * Enhance StructuralParser to include title in document nodes * Enhance DocumentSchema to include title field and update required properties * [refactor] Document and InputResolver modules - Removed re-export of schemas and types from Document.ts for cleaner module structure. - Enhanced AiTask's getDefaultQueueName method to handle single model inputs and throw an error for multiple models. - Cleaned up InputResolver by removing unnecessary re-exports, streamlining the module for better clarity. - Added comprehensive tests for Document functionality, ensuring robust handling of variants and provenance. * Refactor Document class to use optional chaining and nullish coalescing in getChunks method for improved safety. Update README to clarify vector metadata structure and enrich metadata fields for hierarchical documents. * Refactor DocumentEnricherTask to utilize ModelConfig for summary and NER model parameters, enhancing type safety and clarity in method signatures. * Refactor provenance handling to support array structure - Updated the Provenance type to be an array of ProvenanceItem, allowing for multiple provenance entries. 
- Modified extractConfigFields and related functions to handle provenance as an array, enhancing type safety and flexibility. - Adjusted Document and task classes to utilize the new provenance structure, ensuring consistent handling across the codebase. - Updated tests to reflect changes in provenance structure and validate functionality. * Refactor ModelRegistry and InputResolver for improved type handling of model arrays - Updated setGlobalModelRepository parameter name for clarity. - Enhanced resolveModelFromRegistry to support both single and array of model IDs. - Modified resolveSchemaInputs to handle string values and arrays of strings more effectively, ensuring proper resolution of inputs. * [refactor] Remove ArrayTask from between JobQueueTask and Task. Refactor AI task schemas to simplify model handling - Updated various AI task schemas to replace array-based model definitions with single model references, enhancing clarity and type safety. - Adjusted input schemas for tasks such as BackgroundRemovalTask, ImageClassificationTask, and others to reflect these changes. - Removed unnecessary type handling for model arrays in AiTask and AiVisionTask classes, streamlining the codebase. - Enhanced the GraphAsTask and JobQueueTask classes to support the new model structure, ensuring compatibility across the task framework. * [refator] Remove Provenance from task and task graph - Removed the Provenance type and related handling from various classes, including Task, TaskRunner, and Dataflow, to streamline the codebase. - Updated Document and HierarchicalChunkerTask to directly use VariantProvenance, enhancing clarity and type safety. - Adjusted method signatures and removed unused provenance-related methods across the task graph framework. - Updated tests to reflect changes in provenance structure and validate functionality. 
* [refactor] Simplify Document handling by removing Provenance and variants - Removed Provenance-related functionality from the Document class, including the handling of variants and associated methods. - Updated Document methods to manage chunks directly, enhancing clarity and reducing complexity. - Adjusted related schemas and tests to reflect the removal of Provenance and the shift to a chunk-based structure. - Ensured compatibility across the codebase by updating references and method signatures accordingly.

Initial plan

a8743e9

Copilot AI assigned Copilot and sroussey Jan 4, 2026

Copilot AI mentioned this pull request Jan 4, 2026

[feat] Repo registries and RAG workflows #154

Draft

Copilot started work on behalf of sroussey January 4, 2026 01:00

Add comprehensive tests for VectorSimilarityUtils

b0119af

Co-authored-by: sroussey <127349+sroussey@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Update changes based on feedback for VectorQuantizeTask~~ Add test coverage for VectorSimilarityUtils functions Jan 4, 2026

Copilot AI requested a review from sroussey January 4, 2026 01:11

Copilot finished work on behalf of sroussey January 4, 2026 01:11

sroussey approved these changes Jan 4, 2026

sroussey marked this pull request as ready for review January 4, 2026 01:19

sroussey merged commit f8611f2 into rag Jan 4, 2026

sroussey mentioned this pull request Jan 4, 2026

Repo registries and RAG #171

Draft

sroussey mentioned this pull request Jan 6, 2026

RAG, no provenance #175

Open
