Newly extracted memories are not searchable via vector search (embedding index not updated) #422

@1960697431

Environment

  • OS: macOS (Darwin)
  • OpenViking version: Latest from main branch
  • Embedding provider: Volcengine doubao-embedding-vision-250615 (multimodal, dimension=1024)
  • VLM provider: kimi-k2.5 via DashScope
  • Config mode: Standalone server (python -m openviking.server.bootstrap)

Problem

Memories extracted via the Session API (/api/v1/sessions/{id}/extract) are written to the filesystem correctly, but cannot be found by vector search (/api/v1/search/find or /api/v1/search/search). Only pre-existing memories (e.g., profile.md) that were indexed before the issue began are searchable.

Reproduction Steps

# 1. Create session
curl -s -X POST http://127.0.0.1:1933/api/v1/sessions -H "Content-Type: application/json" -d '{}'
# Returns session_id

# 2. Add message
curl -s -X POST "http://127.0.0.1:1933/api/v1/sessions/{session_id}/messages" \
  -H "Content-Type: application/json" \
  -d '{"role": "user", "content": "I love Sichuan hotpot, especially the dual-flavor pot. I go every Friday evening."}'

# 3. Extract memories
curl -s -X POST "http://127.0.0.1:1933/api/v1/sessions/{session_id}/extract" \
  -H "Content-Type: application/json" -d '{}'
# Returns extracted memory with URI and abstract ✅

# 4. Wait for embedding queue to process
sleep 15

# 5. Search - FAILS to find the new memory
curl -s -X POST http://127.0.0.1:1933/api/v1/search/find \
  -H "Content-Type: application/json" \
  -d '{"query": "hotpot", "limit": 10, "score_threshold": 0.0, "target_uri": "viking://user/default/memories"}'
# Only returns old profile.md, NOT the newly extracted memory ❌
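
The fixed `sleep 15` in step 4 leaves open the possibility that the queue was simply slow. A polling variant rules that out. This is a minimal sketch: `poll_until` and `search_mentions` are hypothetical helper names, `MEMORY_URI` stands for the URI returned in step 3, and it assumes the search response contains each hit's URI as plain text.

```shell
#!/bin/sh
# Sketch only: poll /search/find instead of sleeping a fixed 15 s.
# poll_until and search_mentions are illustrative helpers, not part of OpenViking.

BASE_URL="${BASE_URL:-http://127.0.0.1:1933}"

# poll_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
# Runs CMD until it succeeds or ATTEMPTS runs are used up.
# Returns 0 on the first success, 1 if every attempt failed.
poll_until() {
  _pu_attempts=$1
  _pu_delay=$2
  shift 2
  _pu_i=1
  while [ "$_pu_i" -le "$_pu_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$_pu_i" -lt "$_pu_attempts" ]; then
      sleep "$_pu_delay"
    fi
    _pu_i=$((_pu_i + 1))
  done
  return 1
}

# search_mentions NEEDLE
# Succeeds if the raw response body of the step-5 request contains NEEDLE.
# Assumption: the response includes the URI of every returned memory.
search_mentions() {
  curl -s -X POST "$BASE_URL/api/v1/search/find" \
    -H "Content-Type: application/json" \
    -d '{"query": "hotpot", "limit": 10, "score_threshold": 0.0, "target_uri": "viking://user/default/memories"}' \
    | grep -qF -- "$1"
}

# Usage (12 attempts, 5 s apart):
#   poll_until 12 5 search_mentions "$MEMORY_URI" && echo "found" || echo "still missing"
```

If the memory is still missing after a minute of polling, queue latency is unlikely to be the cause.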

Observations

| Check | Result |
| --- | --- |
| File written to disk | ~/.openviking/data/viking/default/user/default/memories/preferences/mem_*.md exists with correct content |
| /api/v1/content/read | ✅ Returns correct content |
| Embedding queue | ✅ Shows Processed: 1, Errors: 0 |
| VikingDB vector count | ✅ Increments (e.g., 22 → 23) |
| /api/v1/search/find | ❌ New memory not returned, only old profile.md |
| /api/v1/search/search | ❌ Same: new memory not returned |
| abstract field in /api/v1/fs/ls | ❌ Empty string "" for all mem_*.md files |
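
The on-disk check can be repeated independently of the API, which helps separate a storage problem from an indexing problem. A small sketch, assuming the data directory shown above; `memory_files_matching` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list extracted memory files on disk that mention a keyword.
# memory_files_matching is an illustrative helper, not part of OpenViking.

# Directory taken from the observation above; adjust if the data root differs.
MEM_DIR="${MEM_DIR:-$HOME/.openviking/data/viking/default/user/default/memories}"

# memory_files_matching DIR KEYWORD
# Prints every mem_*.md file under DIR whose content contains KEYWORD
# (fixed string, case-insensitive). Prints nothing if no file matches.
memory_files_matching() {
  find "$1" -type f -name 'mem_*.md' -exec grep -liF -- "$2" {} +
}

# Usage:
#   memory_files_matching "$MEM_DIR" "hotpot"
```

A file that shows up here but never in search results points at the indexing path rather than at extraction.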

Analysis

  • The embedding queue reports successful processing with no errors
  • The vector count in VikingDB increments, suggesting that a vector was written
  • However, the new memory is never returned by search, even with score_threshold: 0.0
  • All mem_*.md files have empty abstract fields in directory listings, while directory-level .abstract.md files have content
  • Hypothesis: The embedding queue may be processing the directory-level abstract updates rather than the actual memory file content, or the vector is being stored without proper URI association for search retrieval
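
One way to probe the URI-association part of the hypothesis is to repeat the search without the `target_uri` filter. If the new vector then appears under a different URI, the filter is what excludes it. The sketch below only builds the request body; `find_payload` is a hypothetical helper, and it is an assumption that the endpoint accepts a request with no `target_uri`.

```shell
#!/bin/sh
# Sketch only: build the /api/v1/search/find body with or without target_uri.
# find_payload is an illustrative helper, not part of OpenViking.

BASE_URL="${BASE_URL:-http://127.0.0.1:1933}"

# find_payload QUERY [TARGET_URI]
# Prints a JSON body using the same limit and score_threshold as step 5.
# No JSON escaping is done: QUERY and TARGET_URI must not contain
# double quotes or backslashes.
find_payload() {
  if [ -n "${2:-}" ]; then
    printf '{"query": "%s", "limit": 10, "score_threshold": 0.0, "target_uri": "%s"}\n' "$1" "$2"
  else
    printf '{"query": "%s", "limit": 10, "score_threshold": 0.0}\n' "$1"
  fi
}

# Usage: compare the filtered and unfiltered result sets.
#   curl -s -X POST "$BASE_URL/api/v1/search/find" -H "Content-Type: application/json" \
#     -d "$(find_payload hotpot viking://user/default/memories)"
#   curl -s -X POST "$BASE_URL/api/v1/search/find" -H "Content-Type: application/json" \
#     -d "$(find_payload hotpot)"
```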

Additional Issue: Database Lock Contention

During testing, we also encountered a persistent issue where agfs-server (child process of bootstrap) holds the vectordb LOCK file, causing the bootstrap's internal retry loop to fail repeatedly with:

vikingdb - ERROR - Failed to open data db: IO error: lock .../vectordb/context/store/LOCK: Resource temporarily unavailable
uvicorn.error - ERROR - [Errno 48] error while attempting to bind on address ('127.0.0.1', 1933): [errno 48] address already in use

This creates a cascading failure: bootstrap retries every ~10s, each attempt fails on both the DB lock and port binding, filling logs with errors. The server appears healthy (/health returns {"status":"ok"}) but the vectordb collection context cannot be properly initialized on retry iterations.
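
To quantify the retry loop, the two error signatures above can be counted in the server log. This is a minimal sketch: `count_matches`, `lock_errors`, and `bind_errors` are hypothetical helpers, and the log file path is whatever the server output was redirected to. The commented `lsof` lines are a suggestion for identifying the holder of the lock and the port; `LOCK_FILE` must be set to the full path from the error message, which is elided here.

```shell
#!/bin/sh
# Sketch only: count the lock and bind failures in a captured server log.
# The helper names are illustrative, not part of OpenViking.

# count_matches FILE FIXED_STRING
# Prints the number of lines in FILE containing FIXED_STRING (0 if none).
count_matches() {
  grep -cF -- "$2" "$1" || true
}

# lock_errors FILE: failed attempts to open the vectordb store.
lock_errors() {
  count_matches "$1" "Failed to open data db"
}

# bind_errors FILE: failed attempts to bind the HTTP port.
bind_errors() {
  count_matches "$1" "error while attempting to bind on address"
}

# Usage:
#   echo "lock: $(lock_errors server.log)  bind: $(bind_errors server.log)"
#
# To see which process holds the resources (requires lsof):
#   lsof "$LOCK_FILE"                  # LOCK_FILE: full path from the error message
#   lsof -nP -iTCP:1933 -sTCP:LISTEN   # process listening on the server port
```

Equal, steadily growing counts for both signatures would be consistent with each retry failing on both the lock and the port, as described above.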

Expected Behavior

  1. Memories extracted via Session API should be searchable via /api/v1/search/find and /api/v1/search/search within a reasonable time after extraction
  2. abstract field should be populated for extracted mem_*.md files
  3. Bootstrap retry loop should not conflict with its own child process (agfs-server) over database locks
