-
Notifications
You must be signed in to change notification settings - Fork 4k
.Net: Fix TextChunker.SplitPlainTextParagraphs to handle embedded newlines in input strings #12558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
markwallace-microsoft
merged 12 commits into
microsoft:main
from
shethaadit:shethaadit/FixBug12556
Jun 25, 2025
Merged
.Net: Fix TextChunker.SplitPlainTextParagraphs to handle embedded newlines in input strings #12558
markwallace-microsoft
merged 12 commits into
microsoft:main
from
shethaadit:shethaadit/FixBug12556
Jun 25, 2025
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Co-authored-by: Kyle Rader <126627085+kyle-rader-msft@users.noreply.github.com>
@shethaadit, thanks for the contribution. Looks like there is a build failure. Can you take a look? |
Hi @westey-m, I just updated the code and seems the build is passing locally. |
westey-m
approved these changes
Jun 25, 2025
markwallace-microsoft
approved these changes
Jun 25, 2025
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
Summary
Fixes issue #12556 where
TextChunker.SplitPlainTextParagraphs
does not properly handle embedded newlines in input strings.Problem
The
SplitPlainTextParagraphs
method had two issues:"\n\r"
(LF+CR) which is not a standard line ending format - should be"\r\n"
(CR+LF) for Windows or"\n"
for UnixThis caused the method to process text with embedded newlines as single units instead of handling each line separately.
Solution
s_plaintextSplitOptions
array to use"\n"
as the separator for proper newline recognitionSplitPlainTextParagraphs
to useSelectMany
withSplit('\n')
to handle embedded newlines\r\n
,\r
,\n
) to ensure consistent handlingChanges
s_plaintextSplitOptions
array to use correct newline separatorSplitPlainTextParagraphs
method to split embedded newlines before processingTesting
CanSplitTextParagraphsOnNewlines