Commit 562d33d

Remove UTF-8 byte[] APIs
- Drop `byte[]` overloads across Tomlyn (lexer/parser/syntax/reader/serializer) to avoid implying native UTF-8 byte support
- Keep `Stream` overloads for convenient file IO (UTF-8 only) and clarify behavior in docs
- Remove `Encoding.UTF8.GetString` usage and obsolete UTF-8 decode helpers
1 parent 3a8deb2 commit 562d33d

19 files changed: +210 -734 lines changed

site/docs/getting-started.md

Lines changed: 1 addition & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -66,7 +66,7 @@ var toml = TomlSerializer.Serialize(new ServerConfig { Host = "example.com", Por
6666
## Streams (UTF-8)
6767

6868
> [!TIP]
69-
> Prefer `Stream` or `byte[]` overloads for files - they avoid allocating an intermediate `string` and read UTF-8 directly.
69+
> `Stream` overloads are for convenience. Tomlyn reads the entire stream into memory before parsing (see [Performance](performance.md)).
7070
7171
Tomlyn provides `Stream` and `TextReader` overloads to avoid manual `StreamReader`/`StreamWriter` boilerplate:
7272

@@ -83,13 +83,6 @@ using var output = File.Create("config_out.toml");
8383
TomlSerializer.Serialize(output, config);
8484
```
8585

86-
You can also deserialize from `byte[]` (UTF-8):
87-
88-
```csharp
89-
byte[] utf8Bytes = File.ReadAllBytes("config.toml");
90-
var config = TomlSerializer.Deserialize<ServerConfig>(utf8Bytes)!;
91-
```
92-
9386
## Configure options
9487

9588
[`TomlSerializerOptions`](xref:Tomlyn.TomlSerializerOptions) is an immutable `sealed record` - create it once and reuse it:

site/docs/low-level.md

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -125,16 +125,12 @@ var parser = TomlParser.Create(toml, parserOptions);
125125

126126
### Input sources
127127

128-
[`TomlParser.Create(...)`](xref:Tomlyn.Parsing.TomlParser) accepts `string`, `TextReader`, and `byte[]` (UTF-8):
128+
[`TomlParser.Create(...)`](xref:Tomlyn.Parsing.TomlParser) accepts `string` and `TextReader`:
129129

130130
```csharp
131131
// From a file stream
132132
using var reader = new StreamReader("config.toml");
133133
var parser = TomlParser.Create(reader);
134-
135-
// From UTF-8 bytes
136-
byte[] utf8 = File.ReadAllBytes("config.toml");
137-
var parser2 = TomlParser.Create(utf8);
138134
```
139135

140136
## SyntaxParser (full-fidelity syntax tree)
@@ -168,7 +164,7 @@ Console.WriteLine(doc.ToString());
168164
| [`SyntaxParser.Parse(...)`](xref:Tomlyn.Parsing.SyntaxParser) | Tolerant - collects errors in `doc.Diagnostics`, always returns a tree. |
169165
| [`SyntaxParser.ParseStrict(...)`](xref:Tomlyn.Parsing.SyntaxParser) | Strict - throws [`TomlException`](xref:Tomlyn.TomlException) on the first error. |
170166

171-
Both accept `string` and [`TomlLexer`](xref:Tomlyn.Parsing.TomlLexer) inputs. `Parse` also accepts `TextReader` and `byte[]`.
167+
Both accept `string` and [`TomlLexer`](xref:Tomlyn.Parsing.TomlLexer) inputs. `Parse` also accepts `TextReader`.
172168

173169
### Syntax tree structure
174170

site/docs/performance.md

Lines changed: 14 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22
title: Performance
33
---
44

5-
Tomlyn is designed for high throughput and low allocations.
5+
Tomlyn is designed for high throughput and low allocations for **configuration-style** TOML workloads.
66

77
## Architecture
88

@@ -13,20 +13,29 @@ Tomlyn is designed for high throughput and low allocations.
1313
| [`TomlSerializer`](xref:Tomlyn.TomlSerializer) | Reads directly from the parser stream when possible; source-generated metadata avoids reflection. |
1414
| [`SyntaxParser`](xref:Tomlyn.Parsing.SyntaxParser) | Allocates tree nodes - use only when full-fidelity round-tripping is needed. |
1515

16+
## What “high-performance” means for Tomlyn
17+
18+
Tomlyn is optimized around the common TOML use-case: **configuration files** that can be loaded in memory.
19+
Tomlyn is not a streaming parser and it does not aim to match the design goals of `System.Text.Json` for high-throughput web API scenarios.
20+
21+
Internally, Tomlyn parses a **UTF-16 `ReadOnlyMemory<char>`** view of the TOML payload.
22+
This means that:
23+
24+
- Passing TOML as `string` is the most direct path.
25+
- When you use `Stream` or `TextReader`, Tomlyn reads the entire payload into memory before parsing.
26+
1627
## Practical tips
1728

1829
| Tip | Why |
1930
| --- | --- |
2031
| **Cache [`TomlSerializerOptions`](xref:Tomlyn.TomlSerializerOptions)** | It is an immutable `sealed record`. Creating it once avoids repeated metadata resolution. |
2132
| **Prefer source generation** | [`TomlSerializerContext`](xref:Tomlyn.Serialization.TomlSerializerContext) / [`TomlTypeInfo<T>`](xref:Tomlyn.TomlTypeInfo`1) avoids reflection, reduces startup, and is required for NativeAOT. |
22-
| **Use `Stream` overloads** | [`TomlSerializer.Deserialize<T>(Stream)`](xref:Tomlyn.TomlSerializer) reads UTF-8 directly, avoiding a full-string allocation. |
23-
| **Use `byte[]` overloads** | [`TomlSerializer.Deserialize<T>(byte[])`](xref:Tomlyn.TomlSerializer) avoids the `string` → `byte[]` conversion for pre-loaded data. |
33+
| **Use `Stream`/`TextReader` for convenience** | Tomlyn reads the entire content into memory before parsing; use these overloads to integrate with existing IO code. |
2434
| **Set `SourceName` once** | Attach it to cached options, not per-call - it doesn't affect parsing, only exception messages. |
2535
| **Avoid [`SyntaxParser`](xref:Tomlyn.Parsing.SyntaxParser) for mapping** | [`SyntaxParser`](xref:Tomlyn.Parsing.SyntaxParser) builds a full tree. If you only need .NET objects, use [`TomlSerializer`](xref:Tomlyn.TomlSerializer) directly. |
2636

2737
> [!TIP]
28-
> For the best performance, combine source generation with `Stream` or `byte[]` overloads.
29-
> This gives you zero-reflection, zero-intermediate-string deserialization.
38+
> For the best performance, combine source generation with a cached `TomlSerializerContext` and cached `TomlSerializerOptions`.
3039
3140
## When to use each API
3241
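
To make the "reads the entire payload into memory" note in the `performance.md` change concrete, here is a minimal sketch that uses only BCL types. `ReadAllUtf8` is a hypothetical helper, not a Tomlyn API, and this is not necessarily how Tomlyn implements its `Stream` overloads. It only illustrates why a `Stream` input ends up as a fully buffered UTF-16 payload before a `ReadOnlyMemory<char>`-based parser can start.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of Tomlyn): decode a whole UTF-8 stream into a string.
static string ReadAllUtf8(Stream stream)
{
    // ReadToEnd materializes the complete payload as a UTF-16 string.
    // leaveOpen: true keeps the caller's stream usable after the reader is disposed.
    using var reader = new StreamReader(
        stream,
        Encoding.UTF8,
        detectEncodingFromByteOrderMarks: true,
        bufferSize: 4096,
        leaveOpen: true);
    return reader.ReadToEnd();
}

using var stream = new MemoryStream(
    Encoding.UTF8.GetBytes("host = \"example.com\"\nport = 8080\n"));

// The whole stream is consumed before any parsing could begin.
string toml = ReadAllUtf8(stream);

// A UTF-16 view like the one the docs describe as the parser's internal input.
ReadOnlyMemory<char> payload = toml.AsMemory();
Console.WriteLine($"{payload.Length} UTF-16 chars buffered");
```

Under this model a `Stream` overload saves boilerplate rather than allocations, which matches the revised wording of the tip.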

site/docs/serialization.md

Lines changed: 3 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -19,17 +19,13 @@ var toml = TomlSerializer.Serialize(new Person("Ada", 37));
1919
var person = TomlSerializer.Deserialize<Person>(toml)!;
2020
```
2121

22-
[`TomlSerializer`](xref:Tomlyn.TomlSerializer) provides overloads for `string`, `byte[]` (UTF-8), `Stream`, and `TextReader`/`TextWriter`:
22+
[`TomlSerializer`](xref:Tomlyn.TomlSerializer) provides overloads for `string`, `Stream`, and `TextReader`/`TextWriter`:
2323

2424
```csharp
25-
// From/to a file stream (UTF-8)
25+
// From a file stream (UTF-8)
2626
using var stream = File.OpenRead("config.toml");
2727
var config = TomlSerializer.Deserialize<MyConfig>(stream);
2828

29-
// From a UTF-8 byte array
30-
byte[] utf8 = File.ReadAllBytes("config.toml");
31-
var config2 = TomlSerializer.Deserialize<MyConfig>(utf8);
32-
3329
// To a TextWriter
3430
using var writer = new StreamWriter("output.toml");
3531
TomlSerializer.Serialize(writer, config);
@@ -576,5 +572,5 @@ if (!TomlSerializer.TryDeserialize<MyConfig>(toml, out var config))
576572
}
577573
```
578574

579-
`TryDeserialize` is available for all input types (`string`, `byte[]`, `Stream`, `TextReader`)
575+
`TryDeserialize` is available for all input types (`string`, `Stream`, `TextReader`)
580576
and with all metadata styles (options, context, type info).

src/Tomlyn.Tests/NewApiExceptionLocationTests.cs

Lines changed: 0 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -41,29 +41,6 @@ public void Parse_SyntaxError_IncludesLocation()
4141
Assert.That(ex.Message, Does.Contain("test.toml("));
4242
}
4343

44-
[Test]
45-
public void Parse_InvalidUtf8_IncludesByteOffset()
46-
{
47-
// "a = " followed by an invalid UTF-8 sequence: 0xC3 0x28
48-
var bytes = new byte[] { (byte)'a', (byte)' ', (byte)'=', (byte)' ', 0xC3, 0x28, (byte)'\n' };
49-
var options = new TomlSerializerOptions { SourceName = "utf8.toml" };
50-
51-
var parser = TomlParser.Create(bytes, options);
52-
var ex = Assert.Throws<TomlException>(() =>
53-
{
54-
while (parser.MoveNext())
55-
{
56-
}
57-
});
58-
59-
Assert.That(ex, Is.Not.Null);
60-
Assert.That(ex!.Span.HasValue, Is.True);
61-
Assert.That(ex.Offset, Is.EqualTo(4));
62-
Assert.That(ex.Line, Is.EqualTo(1));
63-
Assert.That(ex.Column, Is.EqualTo(5));
64-
Assert.That(ex.Message, Does.Contain("utf8.toml("));
65-
}
66-
6744
[Test]
6845
public void Parse_InvalidKeyHexEscape_IncludesLocation()
6946
{

src/Tomlyn.Tests/NewApiMetadataStreamOverloadsTests.cs

Lines changed: 9 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
11
using System.IO;
2+
using System.Text;
23
using NUnit.Framework;
34
using Tomlyn.Model;
45
using Tomlyn.Serialization;
@@ -38,27 +39,21 @@ public void Serialize_Stream_ObjectTypeInfo_WritesToml()
3839
}
3940

4041
[Test]
41-
public void Deserialize_Utf8Bytes_ObjectTypeInfo_UsesByteOffsets()
42+
public void Deserialize_Stream_ObjectTypeInfo_Works()
4243
{
4344
var context = new BuiltInContext();
4445
var typeInfo = context.GetTypeInfo(typeof(TomlTable), context.Options);
4546
Assert.NotNull(typeInfo);
4647

47-
var bytes = new byte[]
48+
using var stream = new MemoryStream();
49+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), bufferSize: 1024, leaveOpen: true))
4850
{
49-
(byte)'a',
50-
(byte)' ',
51-
(byte)'=',
52-
(byte)' ',
53-
(byte)'"',
54-
0xC3,
55-
0x28,
56-
(byte)'"',
57-
};
51+
writer.Write("a = 1\n");
52+
}
5853

59-
var ex = Assert.Throws<TomlException>(() => TomlSerializer.Deserialize(bytes, typeInfo!));
60-
Assert.NotNull(ex);
61-
Assert.AreEqual(5, ex!.Offset);
54+
stream.Position = 0;
55+
var table = (TomlTable)TomlSerializer.Deserialize(stream, typeInfo!)!;
56+
Assert.AreEqual(1L, (long)table["a"]);
6257
}
6358

6459
[Test]

src/Tomlyn.Tests/NewApiSerializerOverloadTests.cs

Lines changed: 31 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -37,11 +37,17 @@ public void Deserialize_String_Type_WithContext_Works()
3737
}
3838

3939
[Test]
40-
public void Deserialize_Utf8Bytes_WithContext_Works()
40+
public void Deserialize_Stream_WithContext_Works()
4141
{
4242
var context = TestTomlSerializerContext.Default;
43-
var bytes = Encoding.UTF8.GetBytes(SampleToml);
44-
var person = TomlSerializer.Deserialize<GeneratedPerson>(bytes, context);
43+
using var stream = new MemoryStream();
44+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), bufferSize: 1024, leaveOpen: true))
45+
{
46+
writer.Write(SampleToml);
47+
}
48+
49+
stream.Position = 0;
50+
var person = TomlSerializer.Deserialize<GeneratedPerson>(stream, context);
4551

4652
Assert.That(person, Is.Not.Null);
4753
Assert.That(person!.Name, Is.EqualTo("Ada"));
@@ -75,7 +81,13 @@ public void Deserialize_TextReader_Type_WithContext_Works()
7581
public void Deserialize_Stream_WithTypeInfo_Works()
7682
{
7783
var context = TestTomlSerializerContext.Default;
78-
using var stream = new MemoryStream(Encoding.UTF8.GetBytes(SampleToml));
84+
using var stream = new MemoryStream();
85+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), bufferSize: 1024, leaveOpen: true))
86+
{
87+
writer.Write(SampleToml);
88+
}
89+
90+
stream.Position = 0;
7991
var person = TomlSerializer.Deserialize(stream, context.GeneratedPerson);
8092

8193
Assert.That(person, Is.Not.Null);
@@ -124,12 +136,17 @@ public void TryDeserialize_String_WithContext_ReturnsFalseOnFailure()
124136
}
125137

126138
[Test]
127-
public void TryDeserialize_Utf8Bytes_WithContext_ReturnsFalseOnFailure()
139+
public void TryDeserialize_Stream_WithContext_ReturnsFalseOnFailure()
128140
{
129141
var context = TestTomlSerializerContext.Default;
130-
var bytes = Encoding.UTF8.GetBytes("name = \"Ada\"\nage = \"not-a-number\"\n");
142+
using var stream = new MemoryStream();
143+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), bufferSize: 1024, leaveOpen: true))
144+
{
145+
writer.Write("name = \"Ada\"\nage = \"not-a-number\"\n");
146+
}
131147

132-
var ok = TomlSerializer.TryDeserialize<GeneratedPerson>(bytes, context, out var value);
148+
stream.Position = 0;
149+
var ok = TomlSerializer.TryDeserialize<GeneratedPerson>(stream, context, out var value);
133150

134151
Assert.That(ok, Is.False);
135152
Assert.That(value, Is.Null);
@@ -151,7 +168,13 @@ public void TryDeserialize_TextReader_Type_WithContext_ReturnsFalseOnFailure()
151168
public void TryDeserialize_Stream_Type_WithContext_ReturnsFalseOnFailure()
152169
{
153170
var context = TestTomlSerializerContext.Default;
154-
using var stream = new MemoryStream(Encoding.UTF8.GetBytes("name = \"Ada\"\nage = \"not-a-number\"\n"));
171+
using var stream = new MemoryStream();
172+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), bufferSize: 1024, leaveOpen: true))
173+
{
174+
writer.Write("name = \"Ada\"\nage = \"not-a-number\"\n");
175+
}
176+
177+
stream.Position = 0;
155178

156179
var ok = TomlSerializer.TryDeserialize(stream, typeof(GeneratedPerson), context, out var value);
157180
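
The `MemoryStream` + `StreamWriter` setup is repeated verbatim in several of the tests above. As a possible follow-up (a sketch only, not part of this commit), it could be folded into a small helper. `CreateUtf8Stream` and its `emitBom` parameter are hypothetical names introduced here for illustration.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical test helper: writes text to a MemoryStream as UTF-8 and rewinds it.
static MemoryStream CreateUtf8Stream(string text, bool emitBom = false)
{
    var stream = new MemoryStream();
    // leaveOpen: true so disposing the writer flushes without closing the stream.
    using (var writer = new StreamWriter(
        stream,
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: emitBom),
        bufferSize: 1024,
        leaveOpen: true))
    {
        writer.Write(text);
    }

    // Rewind so the consumer reads from the start.
    stream.Position = 0;
    return stream;
}

using var plain = CreateUtf8Stream("a = 1\n");
using var withBom = CreateUtf8Stream("a = 1\n", emitBom: true);
Console.WriteLine($"{plain.Length} bytes without BOM, {withBom.Length} bytes with BOM");
```

Each test body would then shrink to one `using var stream = CreateUtf8Stream(...)` line followed by the call under test.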

src/Tomlyn.Tests/NewApiUtf8StreamExceptionLocationTests.cs

Lines changed: 18 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
1+
using System.IO;
2+
using System.Text;
13
using NUnit.Framework;
24
using Tomlyn.Model;
35

@@ -6,40 +8,34 @@ namespace Tomlyn.Tests;
68
public sealed class NewApiUtf8StreamExceptionLocationTests
79
{
810
[Test]
9-
public void Deserialize_Utf8Bytes_InvalidUtf8_ThrowsTomlExceptionWithByteOffset()
11+
public void Deserialize_Stream_SyntaxError_ThrowsTomlExceptionWithLocation()
1012
{
11-
// a = "<invalid utf-8>"
12-
var bytes = new byte[]
13+
using var stream = new MemoryStream();
14+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), bufferSize: 1024, leaveOpen: true))
1315
{
14-
(byte)'a',
15-
(byte)' ',
16-
(byte)'=',
17-
(byte)' ',
18-
(byte)'"',
19-
0xC3, // invalid sequence start (expects continuation byte in 0x80..0xBF, but 0x28 is '(')
20-
0x28,
21-
(byte)'"',
22-
};
16+
writer.Write("a = [\n");
17+
}
2318

24-
var ex = Assert.Throws<TomlException>(() => TomlSerializer.Deserialize<TomlTable>(bytes));
19+
stream.Position = 0;
20+
var ex = Assert.Throws<TomlException>(() => TomlSerializer.Deserialize<TomlTable>(stream));
2521
Assert.NotNull(ex);
2622
Assert.NotNull(ex!.Span);
2723

28-
Assert.AreEqual(5, ex.Offset, "Offset must be the character position of the invalid sequence.");
29-
Assert.AreEqual(1, ex.Line);
30-
Assert.AreEqual(6, ex.Column);
24+
Assert.That(ex.Line, Is.EqualTo(2));
25+
Assert.That(ex.Column, Is.GreaterThan(0));
3126
}
3227

3328
[Test]
34-
public void Deserialize_Utf8Bytes_WithBom_Succeeds()
29+
public void Deserialize_Stream_WithBom_Succeeds()
3530
{
36-
var bytes = new byte[]
31+
using var stream = new MemoryStream();
32+
using (var writer = new StreamWriter(stream, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true), bufferSize: 1024, leaveOpen: true))
3733
{
38-
0xEF, 0xBB, 0xBF, // UTF-8 BOM
39-
(byte)'a', (byte)' ', (byte)'=', (byte)' ', (byte)'1', (byte)'\n',
40-
};
34+
writer.Write("a = 1\n");
35+
}
4136

42-
var table = TomlSerializer.Deserialize<TomlTable>(bytes);
37+
stream.Position = 0;
38+
var table = TomlSerializer.Deserialize<TomlTable>(stream);
4339

4440
Assert.NotNull(table);
4541
Assert.AreEqual(1L, (long)table!["a"]);
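
For context on the BOM test, the following sketch (BCL only, no Tomlyn calls) shows how the same bytes as the removed `byte[]` test decode in .NET. `Encoding.UTF8.GetString` keeps the BOM as a leading U+FEFF character, while a `StreamReader` consumes it. Whether Tomlyn's `Stream` overloads rely on `StreamReader` is an assumption; the passing test only shows that a BOM-prefixed stream is accepted.

```csharp
using System;
using System.IO;
using System.Text;

// UTF-8 BOM followed by "a = 1\n", the same payload as the removed byte[] test.
byte[] bytes =
{
    0xEF, 0xBB, 0xBF,
    (byte)'a', (byte)' ', (byte)'=', (byte)' ', (byte)'1', (byte)'\n',
};

// GetString does not strip the preamble: the result starts with U+FEFF.
string raw = Encoding.UTF8.GetString(bytes);

// StreamReader detects and skips the preamble before decoding.
using var reader = new StreamReader(
    new MemoryStream(bytes),
    Encoding.UTF8,
    detectEncodingFromByteOrderMarks: true);
string viaReader = reader.ReadToEnd();

Console.WriteLine($"raw starts with BOM char: {raw[0] == '\uFEFF'}");
Console.WriteLine($"reader output length: {viaReader.Length}");
```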

src/Tomlyn.Tests/StandardTests.cs

Lines changed: 13 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -105,14 +105,15 @@ private static void ValidateSpec(string type, string inputName, string toml, str
105105
}
106106

107107
{
108-
var docUtf8 = SyntaxParser.Parse(Encoding.UTF8.GetBytes(toml), inputName);
109-
var roundtripUtf8 = docUtf8.ToString();
110-
if (roundtrip != roundtripUtf8)
108+
using var reader = new StringReader(toml);
109+
var docFromReader = SyntaxParser.Parse(reader, inputName);
110+
var roundtripFromReader = docFromReader.ToString();
111+
if (roundtrip != roundtripFromReader)
111112
{
112113
Console.WriteLine($"Testing {inputName}");
113114
Dump(toml, doc, roundtrip);
114115
}
115-
Assert.AreEqual(roundtrip, roundtripUtf8, "The UTF8 version doesn't match with the UTF16 version");
116+
Assert.AreEqual(roundtrip, roundtripFromReader, "The TextReader version doesn't match with the string version");
116117
}
117118
}
118119

@@ -211,6 +212,13 @@ public static IEnumerable ListTomlFiles(string type)
211212

212213
if (type == InvalidSpec)
213214
{
215+
// The toml-test "invalid/encoding" suite validates raw UTF-8 byte-level correctness.
216+
// Tomlyn's test harness feeds TOML as text (string/TextReader), so these cases are not applicable here.
217+
if (normalizedFile.IndexOf("/invalid/encoding/", StringComparison.OrdinalIgnoreCase) >= 0)
218+
{
219+
goto next_file;
220+
}
221+
214222
for (var i = 0; i < Toml11ValidButTomlTestMarksInvalid.Length; i++)
215223
{
216224
if (normalizedFile.EndsWith(Toml11ValidButTomlTestMarksInvalid[i], StringComparison.OrdinalIgnoreCase))
@@ -222,7 +230,7 @@ public static IEnumerable ListTomlFiles(string type)
222230

223231
var functionName = Path.GetFileName(file);
224232

225-
var input = Encoding.UTF8.GetString(File.ReadAllBytes(file));
233+
var input = File.ReadAllText(file);
226234

227235
string? json = null;
228236
if (type == "valid")
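
The `invalid/encoding` skip above follows from decoding at the text level. The sketch below uses only BCL types and the byte sequence from the removed `Parse_InvalidUtf8_IncludesByteOffset` test. With .NET's default `Encoding.UTF8`, the invalid byte is silently replaced by U+FFFD, so a parser that receives a `string` never sees the raw error. A caller who needs byte-level validation can decode with a strict `UTF8Encoding` first. This is a caller-side option, not something this commit adds to Tomlyn.

```csharp
using System;
using System.Text;

// "a = " followed by the invalid UTF-8 sequence 0xC3 0x28, then a newline.
byte[] bytes = { (byte)'a', (byte)' ', (byte)'=', (byte)' ', 0xC3, 0x28, (byte)'\n' };

// Lenient decoding: the malformed byte becomes U+FFFD and decoding continues.
string lenient = Encoding.UTF8.GetString(bytes);
bool replaced = lenient.Contains('\uFFFD');

// Strict decoding: throwOnInvalidBytes makes malformed input raise an exception.
var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
bool threw = false;
try
{
    strict.GetString(bytes);
}
catch (DecoderFallbackException)
{
    threw = true;
}

Console.WriteLine($"lenient replaced: {replaced}, strict threw: {threw}");
```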
