fix(chunking): preserve sentence order in NlpSentenceChunking #1913

Open
ntohidi wants to merge 1 commit into develop from fix/nlp-sentence-chunking-1909

ntohidi (Collaborator) commented Apr 11, 2026

Summary

Test plan

  • Verified sentence order is preserved with 10 ordered sentences
  • Verified duplicate sentences are no longer silently removed
  • Verified deterministic output across multiple runs
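
The three checks above could be written as assertions along these lines. This is only a sketch: `split_sentences` is a hypothetical regex-based stand-in for the chunker, not the NlpSentenceChunking implementation, and the sample sentences are made up for illustration.

```python
import re

def split_sentences(text):
    # Hypothetical stand-in for the chunker: naive split after ., ! or ?
    # (the real class uses an NLTK tokenizer, which is not reproduced here).
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

# 1. Order is preserved for 10 ordered sentences.
ordered = [f"Sentence number {i}." for i in range(1, 11)]
text = " ".join(ordered)
chunks = split_sentences(text)
assert chunks == ordered

# 2. Duplicate sentences are kept, not silently removed.
dup = split_sentences("Same line. Same line. Different line.")
assert dup == ["Same line.", "Same line.", "Different line."]

# 3. Repeated calls return the same list.
assert all(split_sentences(text) == chunks for _ in range(5))
```

Note that the third check only covers repeated calls inside one process. Set iteration order for strings can differ between interpreter runs because of hash randomization, so a cross-run check would need to launch separate processes.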

Remove the broken re-import of load_nltk_punkt (it is already imported at module level).
Replace list(set(sens)) with a plain return of the sentence list: set() destroyed
document order and silently dropped duplicate sentences.
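
The effect of the change can be illustrated with a small sketch. The function names `chunk_old` and `chunk_new` and the sample list are hypothetical; only the `list(set(sens))` expression and the variable name `sens` come from the commit message.

```python
# Hypothetical sentence list, standing in for the tokenizer output.
sens = [
    "First sentence.",
    "Second sentence.",
    "First sentence.",   # intentional duplicate
    "Third sentence.",
]

def chunk_old(sens):
    # Old behavior: set() drops duplicates and gives no ordering guarantee.
    return list(set(sens))

def chunk_new(sens):
    # New behavior: return the sentences unchanged, in document order.
    return list(sens)

old = chunk_old(sens)
new = chunk_new(sens)

print(len(old), len(new))  # the old result has one sentence fewer
print(new == sens)         # the new result matches the input order exactly
```

The order of `old` is deliberately not shown, since it depends on string hashing and may change from one interpreter run to the next.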