
async chunk#19

Merged
wangle201210 merged 3 commits into main from feat/async-chunk on Jun 20, 2025



@wangle201210
Owner

The initial index no longer generates QA; a separate interface is provided to run the indexer on docs.

@wangle201210 wangle201210 changed the title from "Feat/async chunk" to "async chunk" on Jun 19, 2025
@wangle201210 wangle201210 requested a review from Copilot on June 19, 2025 14:04

Copilot AI left a comment


Pull Request Overview

This PR refactors the indexing pipeline to separate initial document indexing from asynchronous chunk embedding and QA generation, and provides dedicated async indexing interfaces.

  • Extracted EsHit2Document for cleaner retriever result parsing and unified metadata keys.
  • Disabled inline QA in the synchronous indexer graph and added a new async indexer graph.
  • Introduced IndexAsync and indexAsyncByDocsID methods in Rag, plus corresponding tests.
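
The split described above can be illustrated with a minimal sketch. It is not the code from this PR: `Store`, `IndexSync`, and `EnrichAsync` are hypothetical names, and an in-memory map stands in for the real index. Only the `qa_content` metadata key is taken from the change summary. The synchronous step stores chunks without QA, and a separate call fills in QA in the background, skipping documents that already have it.

```go
package main

import (
	"fmt"
	"sync"
)

// Doc is a hypothetical stand-in for an indexed chunk.
type Doc struct {
	ID       string
	Content  string
	Metadata map[string]string
}

// Store is a hypothetical in-memory index guarded by a mutex.
type Store struct {
	mu   sync.Mutex
	docs map[string]*Doc
}

func NewStore() *Store { return &Store{docs: make(map[string]*Doc)} }

// IndexSync stores the chunks immediately and does not generate QA.
func (s *Store) IndexSync(contents []string) []string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ids := make([]string, 0, len(contents))
	for _, c := range contents {
		id := fmt.Sprintf("doc-%d", len(s.docs)+1)
		s.docs[id] = &Doc{ID: id, Content: c, Metadata: map[string]string{}}
		ids = append(ids, id)
	}
	return ids
}

// EnrichAsync generates QA for the given IDs in background goroutines.
// Documents that are missing or already carry qa_content are skipped.
// The returned WaitGroup lets callers (for example tests) wait for completion.
func (s *Store) EnrichAsync(ids []string, genQA func(string) string) *sync.WaitGroup {
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		go func(id string) {
			defer wg.Done()
			s.mu.Lock()
			d, ok := s.docs[id]
			if !ok || d.Metadata["qa_content"] != "" {
				s.mu.Unlock()
				return
			}
			content := d.Content
			s.mu.Unlock()

			// The slow step runs without holding the lock.
			qa := genQA(content)

			s.mu.Lock()
			d.Metadata["qa_content"] = qa
			s.mu.Unlock()
		}(id)
	}
	return &wg
}

func main() {
	s := NewStore()
	ids := s.IndexSync([]string{"alpha", "beta"})
	wg := s.EnrichAsync(ids, func(c string) string { return "Q: what is " + c + "?" })
	wg.Wait()
	for _, id := range ids {
		fmt.Println(id, s.docs[id].Metadata["qa_content"])
	}
}
```

Returning a `*sync.WaitGroup` is one way to let a test wait on the background work deterministically instead of sleeping for a fixed interval.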

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| server/core/retriever/retriever.go | Extracted result parser into EsHit2Document and wired embedding |
| server/core/rag_test.go | Added tests for async indexing and sleep to wait for QA tasks |
| server/core/rag.go | Added idxerAsync, IndexAsync, and indexAsyncByDocsID logic |
| server/core/indexer/qa.go | Skip QA generation if qa_content already present |
| server/core/indexer/orchestration.go | Removed synchronous QA node from indexer graph |
| server/core/indexer/loader.go | Changed UseNameAsID from true to false for file loader |
| server/core/indexer/indexer_async.go | New async indexer component |
| server/core/indexer/indexer.go | Inline ID generation, ext-data extraction, disabled QA embedding |
| server/core/indexer/async.go | Added BuildIndexerAsync graph |
| server/core/common/consts.go | Removed duplicate DocExtra; consolidated metadata constants |

Comments suppressed due to low confidence (4)

server/core/retriever/retriever.go:32

  • [nitpick] The variable name embeddingIns11 is unclear. Consider renaming to embedding or embedder for readability.
	embeddingIns11, err := common.NewEmbedding(ctx, conf)

server/core/indexer/orchestration.go:16

  • Multiple QA-related lines are commented out. Remove dead code or extract QA toggling into a configuration flag to improve clarity.
		// QA                   = "QA"

server/core/rag_test.go:69

  • This test logs the returned IDs but does not assert their correctness. Add assertions (e.g., len(ids) > 0) to catch regressions.
func TestIndexAsyncByDocsID(t *testing.T) {

server/core/indexer/indexer.go:91

  • The boolean in the map lookup is assigned to e, which can be confused with an error value. Rename it to ok for clarity.
	for _, key := range common.ExtKeys {
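
The last suggestion can be illustrated with a small sketch of the comma-ok idiom. The names `extKeys` and `extractExt` and the key values are hypothetical stand-ins, not the identifiers or data in server/core/indexer/indexer.go.

```go
package main

import "fmt"

// extKeys is a hypothetical stand-in for common.ExtKeys.
var extKeys = []string{"_file_name", "_extension"}

// extractExt copies the known keys that are present in meta into a new map.
func extractExt(meta map[string]string) map[string]string {
	ext := make(map[string]string)
	for _, key := range extKeys {
		// "ok" reports whether the key is present; it is not an error value.
		if v, ok := meta[key]; ok {
			ext[key] = v
		}
	}
	return ext
}

func main() {
	meta := map[string]string{"_file_name": "a.md", "other": "x"}
	fmt.Println(extractExt(meta))
}
```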

Comment thread server/core/rag_test.go
@wangle201210 wangle201210 merged commit 1da96de into main Jun 20, 2025
