.Net: Bug: TextChunker does not actually split on newlines

**Context**

While debugging a summarization pipeline over a large git diff, we found that the text chunker split the output into only 2 giant chunks. We assumed it was splitting the text into lines and then grouping those lines into paragraphs; in reality, we only get 2 "paragraphs", with the newline characters preserved inside them.

**Describe the bug**
`TextChunker.SplitPlainTextLines` does not actually produce a list that is split on newline characters, as its name implies, _if the input token count is less than the maxTokenCount per line passed in_.

This is quite confusing, as the name implies it is going to split on lines. If we use this function to get a list of input strings and then pass that list to `SplitPlainTextParagraphs`, we aren't really getting what we would expect, since our input isn't nearly as split as we might think.

**To Reproduce**

This unit test fails:
```c#
[Theory]
[InlineData("First line\r\nSecond line\r\nThird line")]
[InlineData("First line\nSecond line\nThird line")]
public void ActuallySplitsOnNewLines(string input)
{
    var result = TextChunker.SplitPlainTextLines(input, 10);

    var expected = new[]
    {
        "First line",
        "Second line",
        "Third line"
    };

    Assert.Equal(expected, result); // ❌
}
```

with message:
```
Assert.Equal() Failure: Collections differ
                        ↓ (pos 0)
Expected: string[]     ["First line", "Second line", "Third line"]
Actual:   List<string> ["First line\nSecond line\nThird line"]
                        ↑ (pos 0)
```

**Expected behavior**

I am unsure whether this is intentional, but I don't think it is, given the presence of `\r\n` in the textSplitOptions, which is itself not a valid line ending.

`TextChunker.SplitPlainTextLines` should produce a list of strings that are split on `\n` characters.
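
To make the expectation concrete, here is a minimal sketch of the output shape I have in mind. It is not the `TextChunker` implementation and ignores token limits entirely; `NewlineSplitSketch` and `SplitOnNewLines` are hypothetical names used only for illustration, built on the standard `string.Split` overload.

```c#
using System;
using System.Collections.Generic;

public static class NewlineSplitSketch
{
    // Hypothetical helper, not part of TextChunker.
    // Splits on Windows ("\r\n") and Unix ("\n") line endings and
    // keeps empty entries so blank lines are not silently dropped.
    public static List<string> SplitOnNewLines(string input)
    {
        var parts = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        return new List<string>(parts);
    }
}
```

Calling `NewlineSplitSketch.SplitOnNewLines` on either input from the failing test above returns the three separate lines. Something like this could also serve as a pre-split step before chunking into paragraphs until the behavior is clarified.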

**Platform**
 - Language: [DotNet]



.Net: Bug: TextChunker does not actually split on newlines #12556
