Safe preparation module for splitting long rich-text (Markdown) messages for delivery through transports with message length limits (Telegram, etc.).
Pipeline: markdown → parser → normalized IR → planner → renderer → typed chunks
- Markdown parsing via markdown-it, normalization to a compact IR
- Two rendering modes: `rich-html` (safe subset: `<b>`, `<i>`, `<code>`, `<pre>`, `<a>`; code blocks preserve language info via `class="language-*"`) and `plain-text`
- 5-level strategy escalation: preserve → split-blocks → split-blocks-soft → plain-text → forced-plain-text
- Greedy packing (maximal prefix per chunk)
- Budget is checked against the final rendered content (`content.length`); in `rich-html` this includes HTML escaping/markup overhead
- Unicode-safe splitting (never breaks surrogate pairs)
- `replanTail()` for replanning the undelivered tail after a transport reject
- Deterministic: same input + same transport profile = same plan
- No network requests, no transport SDK dependency
```bash
npm install message-chunker
```

Requires Node.js >= 18.
This package is ESM-only. Use import / export, not CommonJS require().
```js
import { planDelivery } from 'message-chunker';

const plan = planDelivery({
  markdown: '# Hello\n\nThis is a **long** message...',
  preferredMode: 'auto', // 'auto' | 'rich-html' | 'plain-text'
  strategy: 'preserve', // starting strategy
  transport: {
    maxTextLength: 4096,
    safeTextBudget: 3600,
    supportsPlainText: true,
    supportsMultipartPlainText: true,
    supportsRichHtml: true,
    countMethod: 'string-length',
  },
});

for (const chunk of plan.chunks) {
  console.log(`[${chunk.index + 1}/${chunk.total}] (${chunk.mode})`);
  console.log(chunk.content);
}

console.log('Strategy used:', plan.diagnostics.usedStrategy);
console.log('Mode used:', plan.diagnostics.usedMode);
console.log('Had forced split:', plan.diagnostics.hadForcedSplit);
```

Within a single `DeliveryPlan`, `usedStrategy` and `usedMode` apply to the whole plan. Mixed rich-html/plain-text delivery is possible only across separate plans, for example an original plan in `rich-html` followed by a replanned tail in `plain-text`.
```js
import { planDelivery, replanTail, nextStrategy } from 'message-chunker';

const markdown = '...';
const transport = { /* ... */ };

const plan = planDelivery({ markdown, preferredMode: 'auto', strategy: 'preserve', transport });

// Send chunks sequentially...
// If chunk i is rejected by the transport:
const tail = replanTail({
  markdown,
  previousPlan: plan,
  failedChunkIndex: 2, // chunk 2 failed
  preferredMode: 'auto',
  nextStrategy: nextStrategy(plan.diagnostics.usedStrategy) || 'forced-plain-text',
  transport,
  rejectReason: 'too-long', // 'too-long' | 'invalid-markup'
});

// tail.chunks has fresh indices 0..M-1
// Chunks 0..1 from the original plan are considered delivered
```

`replanTail()` returns a new, separate plan for the undelivered tail. It may use a different `usedStrategy` and `usedMode` from the original plan. This is how mixed-format delivery is supported when, for example, the original `rich-html` chunk is rejected as `invalid-markup`.
`planDelivery()`: builds a delivery plan for a Markdown message.
PlanRequest:
| Field | Type | Description |
|---|---|---|
| `markdown` | `string` | Source Markdown text |
| `preferredMode` | `'auto' \| 'rich-html' \| 'plain-text'` | Rendering mode preference |
| `strategy` | `SplitStrategy` | Starting strategy |
| `transport` | `TransportProfile` | Transport capabilities |
DeliveryPlan:
| Field | Type | Description |
|---|---|---|
| `chunks` | `PlannedChunk[]` | Ordered chunks ready for delivery |
| `diagnostics` | `PlanDiagnostics` | Detailed diagnostic information |
PlanDiagnostics:

| Field | Type | Description |
|---|---|---|
| `sourceLength` | `number` | Source Markdown length |
| `plainTextLengthEstimate` | `number` | Plain-text length estimate after normalization/rendering |
| `normalizedBlockCount` | `number` | Number of top-level blocks after normalization |
| `chunkCount` | `number` | Number of chunks in the plan |
| `requestedStrategy` | `SplitStrategy` | Strategy requested by the caller |
| `usedStrategy` | `SplitStrategy` | First strategy that produced a valid plan |
| `requestedMode` | `'auto' \| 'rich-html' \| 'plain-text'` | Mode preference requested by the caller |
| `usedMode` | `'rich-html' \| 'plain-text'` | Rendering mode actually used for this plan |
| `hadDegradation` | `boolean` | `true` if the strategy or mode had to degrade, or unsupported Markdown was simplified |
| `degradedToPlainText` | `boolean` | `true` if planning ended up in plain-text after a non-plain-text preference |
| `hadForcedSplit` | `boolean` | `true` if at least one actual chunk boundary in this plan used a forced Unicode-safe split |
| `splitBlockTypes` | `string[]` | Unique block types that actually had to be split |
`replanTail()`: replans the undelivered tail after a transport reject.
PlannedChunk:

| Field | Type | Description |
|---|---|---|
| `index` | `number` | 0-based index in the plan |
| `total` | `number` | Total number of chunks |
| `mode` | `'rich-html' \| 'plain-text'` | Rendering mode used |
| `content` | `string` | Rendered chunk content |
| `estimatedLength` | `number` | `content.length` |
| `sourceRange` | `SourceRange` | Opaque reference into normalized IR |
TransportProfile:

| Field | Type | Description |
|---|---|---|
| `maxTextLength` | `number` | Hard transport limit |
| `safeTextBudget` | `number` | Safe budget (>= 200, must not exceed `maxTextLength`), checked against the final rendered chunk content |
| `supportsPlainText` | `boolean` | Transport accepts plain text |
| `supportsMultipartPlainText` | `boolean` | Transport accepts multiple plain-text messages |
| `supportsRichHtml` | `boolean` | Transport accepts rich HTML |
| `countMethod` | `'string-length'` | Length counting method |
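
The constraints in the table can be checked before planning. The following is a minimal sketch of such a check, not the library's `validateTransportProfile()` implementation: `checkProfile` is a hypothetical helper, and it covers only the rules stated in the table above.

```js
// Hypothetical helper: collects violations of the rules stated in the
// TransportProfile table. Unlike validateTransportProfile(), which is
// documented as throwing, this sketch returns a list of problems.
function checkProfile(tp) {
  const problems = [];
  if (!(tp.safeTextBudget >= 200)) {
    problems.push('safeTextBudget must be >= 200');
  }
  if (!(tp.safeTextBudget <= tp.maxTextLength)) {
    problems.push('safeTextBudget must not exceed maxTextLength');
  }
  if (tp.countMethod !== 'string-length') {
    problems.push("countMethod must be 'string-length'");
  }
  return problems;
}

// Same values as the quick-start profile.
const profile = {
  maxTextLength: 4096,
  safeTextBudget: 3600,
  supportsPlainText: true,
  supportsMultipartPlainText: true,
  supportsRichHtml: true,
  countMethod: 'string-length',
};

console.log(checkProfile(profile)); // no violations for this profile
console.log(checkProfile({ ...profile, safeTextBudget: 5000 }));
```

Returning a list instead of throwing is a choice made for this sketch only, so that all violations can be reported at once.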
- `STRATEGY_LADDER`: array of all strategies in escalation order
- `nextStrategy(strategy)`: returns the next more aggressive strategy, or `null`
- `isAtLeastAsAggressive(a, b)`: compares two strategies
- `validateTransportProfile(tp)`: throws on invalid profile
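
To illustrate how the ladder helpers relate to each other, here is a small stand-alone sketch. It is not the library source: `LADDER`, `nextRung` and `atLeastAsAggressive` are hypothetical stand-ins, the behavior for unknown strategy names is an assumption, and the comparison direction is inferred from the name `isAtLeastAsAggressive`.

```js
// Strategy names in escalation order, as listed in this README.
const LADDER = [
  'preserve',
  'split-blocks',
  'split-blocks-soft',
  'plain-text',
  'forced-plain-text',
];

// Stand-in for nextStrategy(): the next rung, or null at the top.
function nextRung(strategy) {
  const i = LADDER.indexOf(strategy);
  // Assumption: unknown names are rejected; the library may behave differently.
  if (i === -1) throw new RangeError(`unknown strategy: ${strategy}`);
  return i + 1 < LADDER.length ? LADDER[i + 1] : null;
}

// Stand-in for isAtLeastAsAggressive(a, b): true when a is at or above b.
function atLeastAsAggressive(a, b) {
  return LADDER.indexOf(a) >= LADDER.indexOf(b);
}

console.log(nextRung('preserve')); // 'split-blocks'
console.log(nextRung('forced-plain-text')); // null
console.log(atLeastAsAggressive('plain-text', 'split-blocks')); // true
```

The `null` at the top of the ladder is why the replanning examples fall back with `|| 'forced-plain-text'`.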
| Strategy | Description |
|---|---|
| `preserve` | Keep as single chunk if it fits |
| `split-blocks` | Split at block boundaries (paragraphs, headings, etc.) |
| `split-blocks-soft` | Split within blocks (sentences, punctuation) |
| `plain-text` | Same as `split-blocks-soft` but in plain-text mode |
| `forced-plain-text` | Last resort: split at `\n\n` → `\n` → whitespace → Unicode-safe forced cut |
- Splitting is based on the maximal prefix that fits the budget, not on a balanced split.
- For `rich-html`, fit is checked after the final render/escape step, so HTML overhead can move the split point left compared with plain text.
- Within that fitting prefix, the planner prefers softer boundaries according to the current block rule.
- Forced Unicode-safe split is used only when no softer allowed boundary exists inside the fitting prefix.
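
The rules above can be sketched in a few lines. This is a deliberately simplified illustration, not the planner's actual algorithm: it works on a flat string instead of the normalized IR, treats a space as the only soft boundary, and all function names are hypothetical.

```js
// Escape the characters that matter for HTML text content.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function isHighSurrogate(code) {
  return code >= 0xd800 && code <= 0xdbff;
}

// Largest n such that render(text.slice(0, n)).length <= budget.
// Assumes the rendered length never shrinks as the prefix grows.
function maxFittingPrefix(text, budget, render) {
  let n = 0;
  while (n < text.length && render(text.slice(0, n + 1)).length <= budget) n += 1;
  return n;
}

function packGreedy(text, budget, render = (s) => s) {
  const chunks = [];
  let rest = text;
  while (rest.length > 0) {
    let n = maxFittingPrefix(rest, budget, render);
    if (n < rest.length) {
      // Prefer the last space inside the fitting prefix (soft boundary).
      const soft = rest.lastIndexOf(' ', n - 1);
      if (soft > 0) {
        n = soft + 1;
      } else if (n > 0 && isHighSurrogate(rest.charCodeAt(n - 1))) {
        // Forced cut: step back rather than split a surrogate pair.
        n -= 1;
      }
    }
    if (n === 0) throw new RangeError('budget too small to make progress');
    chunks.push(render(rest.slice(0, n)));
    rest = rest.slice(n);
  }
  return chunks;
}

const text = 'a<b a<b a<b';
console.log(packGreedy(text, 8)); // [ 'a<b a<b ', 'a<b' ]
console.log(packGreedy(text, 8, escapeHtml)); // [ 'a&lt;b ', 'a&lt;b ', 'a&lt;b' ]
```

With the same budget, the escaped rendering needs three chunks where plain text needs two, because each `<` costs four characters after escaping.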
The library provides replanning tools but does not hardcode the retry policy — that is the caller's responsibility.
For a `too-long` reject, the typical caller reaction is to lower the budget, raise the strategy, or both.

```js
// Transport rejected chunk 1 as too long → lower budget and escalate strategy
const tail = replanTail({
  markdown,
  previousPlan: plan,
  failedChunkIndex: 1,
  preferredMode: 'auto',
  nextStrategy: nextStrategy(plan.diagnostics.usedStrategy) || 'forced-plain-text',
  transport: { ...transport, safeTextBudget: transport.safeTextBudget - 400 },
  rejectReason: 'too-long',
});
```

For an `invalid-markup` reject, the typical caller reaction is to switch to plain-text mode for the remaining tail.
```js
// Transport rejected rich-html chunk → replan tail as plain-text
const tail = replanTail({
  markdown,
  previousPlan: plan,
  failedChunkIndex: 2,
  preferredMode: 'plain-text',
  nextStrategy: plan.diagnostics.usedStrategy,
  transport,
  rejectReason: 'invalid-markup',
});
```

This may produce a mixed final delivery:

- the already delivered prefix stays `rich-html`;
- the replanned tail goes as `plain-text`.

That mixed result is expected and supported.
Any other transport error is not a module-level reject reason. The integration layer must decide whether to retry sending, abort, or map the error to `too-long` / `invalid-markup` before calling `replanTail()`.
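
One way an integration layer might make that decision is sketched below. Everything in it is an assumption made for illustration: the error shape `{ code, description }`, the matched substrings (modeled on a Telegram-like transport), and the `action` values are not part of message-chunker.

```js
// Hypothetical error shape: { code: number, description: string }.
// The substrings below are assumptions about one transport's error texts;
// adjust them to whatever your transport actually returns.
function classifyTransportError(err) {
  const text = String(err.description || '').toLowerCase();

  // Errors that map onto the two module-level reject reasons.
  if (text.includes('too long')) {
    return { action: 'replan', rejectReason: 'too-long' };
  }
  if (text.includes("can't parse entities")) {
    return { action: 'replan', rejectReason: 'invalid-markup' };
  }

  // Rate limits and server-side failures: resend the same chunk later.
  if (err.code === 429 || err.code >= 500) {
    return { action: 'retry' };
  }

  // Anything else is treated as fatal in this sketch.
  return { action: 'abort' };
}

console.log(classifyTransportError({ code: 400, description: 'Bad Request: message is too long' }));
console.log(classifyTransportError({ code: 429, description: 'Too Many Requests' }));
```

Only the `replan` outcome leads to a `replanTail()` call; `retry` resends the same chunk unchanged, and `abort` stops delivery.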
- Underscore emphasis (`_text_`, `__text__`) is intentionally not supported; it is treated as literal text
- Tables are not supported; they fall through as text paragraphs
- Images become text: `alt (src)`
- Raw HTML is escaped in rich-html mode, kept literal in plain-text
- Unicode splitting is surrogate-pair safe but not grapheme-cluster safe (ZWJ sequences may be split)
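
The last limitation is easiest to see with a concrete ZWJ sequence. The sketch below contrasts a surrogate-pair-safe cut with a grapheme-aware cut built on the standard `Intl.Segmenter` API. Both functions are hypothetical illustrations, not library code, and the grapheme-aware variant is shown only as something a caller could apply on its own side.

```js
// Family emoji: three code points joined by zero-width joiners (U+200D).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

// Surrogate-pair-safe cut: never ends the head on a lone high surrogate.
function surrogateSafeCut(text, max) {
  let n = Math.min(max, text.length);
  const last = text.charCodeAt(n - 1);
  if (n < text.length && last >= 0xd800 && last <= 0xdbff) n -= 1;
  return [text.slice(0, n), text.slice(n)];
}

// Grapheme-aware cut: only cuts between whole grapheme clusters.
function graphemeSafeCut(text, max) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let n = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (n + segment.length > max) break;
    n += segment.length;
  }
  return [text.slice(0, n), text.slice(n)];
}

const input = 'ab' + family;

// Valid UTF-16 on both sides, but the family emoji is torn apart.
console.log(surrogateSafeCut(input, 5).map((s) => s.length)); // [ 5, 5 ]

// The whole cluster moves to the tail instead.
console.log(graphemeSafeCut(input, 5).map((s) => s.length)); // [ 2, 8 ]
```

The grapheme-aware cut can return an empty head when a single cluster is longer than the limit, so a caller using it still needs a fallback.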
Run the tests:

```bash
npm test
```

Full local check (install, test, package check):

```bash
npm install
npm test
npm run pack:check
```

License: MIT