fix(queuefs): vectorize at target URIs during incremental updates#746
fix(queuefs): vectorize at target URIs during incremental updates#746gilangjavier wants to merge 2 commits intovolcengine:mainfrom
Conversation
…ncremental updates When adding resources with incremental update (e.g., using temp_path), the semantic DAG previously vectorized directory-level L0/L1 under the temporary URI. After SyncDiff moves files to the final target URI, the vectors remained attached to the temp path, causing searches under the actual resource directory to return no results. Adjust SemanticDagExecutor to compute target URIs for vectorization tasks when incremental_update=True. This applies to both file and directory vectorization, ensuring vectors are indexed at the final destination. Also added test to verify target URIs are used in incremental mode. Fixes issue volcengine#743 in volcengine/OpenViking. Signed-off-by: Hephaestus <hephestus@openclaw>
qin-ptr
left a comment
There was a problem hiding this comment.
Review Summary
This PR correctly fixes a vector indexing bug in incremental update scenarios. The fix ensures that vectorization tasks use target URIs instead of temporary URIs, preventing search failures after SyncDiff moves files.
Key points:
- ✅ Root cause correctly identified and addressed
- ✅ Both file and directory vectorization logic updated
- ✅ Test coverage added for incremental update mode
- ✅ Safe fallback logic in place
Only one minor readability suggestion below.
🤖 I am a bot owned by @qin-ctx.
| ) | ||
| monkeypatch.setattr( | ||
| "openviking.storage.transaction.get_lock_manager", | ||
| lambda: MagicMock(), |
There was a problem hiding this comment.
[Suggestion] (non-blocking)
The original test had a helpful comment # Mock lock layer: LockContext as no-op passthrough at line 85 that explained why these mocks were needed. After extracting _mock_transaction_layer, this context was lost.
Consider adding a docstring to _mock_transaction_layer:
def _mock_transaction_layer(monkeypatch):
"""Mock lock layer: LockContext as no-op passthrough."""
...Or keep a brief comment at the call site:
# Mock transaction layer as no-op
_mock_transaction_layer(monkeypatch)|
Thanks for the suggestion — addressed in b63244a. I added a docstring to
|
|
cc @myysy |
|
Thank you for your PR. Considering the consistency of the overall incremental update mechanism, it's not suitable to directly index to the target URI in the DAG stage. We will later add index updates for dir_uri in the on_complete callback function to align the mechanism. This PR will add you as a contributor. |
When adding resources with incremental update (temp_path), directory-level L0/L1 vectors were indexed under the temporary URI. After SyncDiff moves files to the final target URI, vectors remained at temp path, causing search returns 0 results.
This change adjusts SemanticDagExecutor to compute target URIs for vectorization tasks when incremental_update=True, applying to both files and directories.
Fixes #743