feat(backend): enrich RAG sources with legal citations and chunk identifiers #47
Merged
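📌 Overview
This PR extends the sources attached to LLM responses in the Pinecone-based RAG pipeline from a filename-centric structure to a statute- and article-based structure, so that the frontend can build a more trustworthy source UI.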
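🔧 Key changes
- Enriched the `sources` structure
  - `citation` (law name and article notation)
  - `law_title`, `law_short`
  - `article_no`, `article_title`, `clause_no`, `item_no`
  - `chunk_id` (rebuilt from source + doc_sha + chunk_index)
  - `snippet` (text used to summarize the source)
- `[1]`, `[2]` citation notation retained
🎯 Purpose of the change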
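🔍 Verification results
- The `sources` field correctly includes the statute metadata
- `chunk_id` is generated as a proper value rather than an empty string
📝 Notes
- Limited to the `chain_builder` layer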
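
For reviewers who want a concrete picture of the enriched `sources` entry, the fields listed under Key changes could be assembled roughly as follows. This is a minimal Python sketch, not the code from this PR: the function names, the `:`-joined `chunk_id` layout, the English citation wording, the `index` key, and the 200-character snippet limit are all assumptions made for illustration.

```python
from typing import Any, Dict

# Metadata keys named in the PR description; everything else below is assumed.
LAW_KEYS = (
    "law_title", "law_short",
    "article_no", "article_title", "clause_no", "item_no",
)


def build_chunk_id(meta: Dict[str, Any]) -> str:
    """Rebuild a chunk id from source + doc_sha + chunk_index.

    The "source:doc_sha:chunk_index" layout is hypothetical. An empty string
    is returned when any part is missing, so callers can detect the gap.
    """
    parts = [meta.get("source"), meta.get("doc_sha"), meta.get("chunk_index")]
    # Compare against None/"" explicitly: a chunk_index of 0 is valid.
    if any(p is None or p == "" for p in parts):
        return ""
    return ":".join(str(p) for p in parts)


def build_citation(meta: Dict[str, Any]) -> str:
    """Format a human-readable citation (assumed format: 'LAW Article N Clause M')."""
    pieces = [str(meta.get("law_short") or meta.get("law_title") or "")]
    for label, key in (("Article", "article_no"),
                       ("Clause", "clause_no"),
                       ("Item", "item_no")):
        value = meta.get(key)
        if value not in (None, ""):
            pieces.append(f"{label} {value}")
    return " ".join(p for p in pieces if p)


def build_source(index: int, meta: Dict[str, Any], text: str,
                 max_snippet: int = 200) -> Dict[str, Any]:
    """Assemble one sources entry from a retrieved chunk's metadata and text."""
    entry: Dict[str, Any] = {"index": index, "citation": build_citation(meta)}
    for key in LAW_KEYS:
        entry[key] = meta.get(key, "")
    entry["chunk_id"] = build_chunk_id(meta)
    entry["snippet"] = text.strip()[:max_snippet]
    return entry
```

Here `index` is meant to line up with the `[1]`, `[2]` markers in the answer text, so the frontend can pair each marker with its citation and snippet. Returning an empty `chunk_id` when a part is missing mirrors the empty-string case mentioned in the verification results, though how the actual code handles missing metadata is not stated in this PR.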