message-chunker

A module that safely splits long rich-text (Markdown) messages into chunks for delivery through transports with message length limits (Telegram, etc.).

Pipeline: markdown → parser → normalized IR → planner → renderer → typed chunks

Features

Markdown parsing via markdown-it, normalization to a compact IR
Two rendering modes: rich-html (safe subset: <b>, <i>, <code>, <pre>, <a>; code blocks preserve language info via class="language-*") and plain-text
5-level strategy escalation: preserve → split-blocks → split-blocks-soft → plain-text → forced-plain-text
Greedy packing (maximal prefix per chunk)
Budget is checked against the final rendered content (content.length); in rich-html this includes HTML escaping/markup overhead
Unicode-safe splitting (never breaks surrogate pairs)
replanTail() for replanning the undelivered tail after a transport reject
Deterministic: same input + same transport profile = same plan
No network requests, no transport SDK dependency
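The budget rule in the list above matters because escaping and markup change the length of a chunk. The following is a minimal sketch of that effect, assuming the renderer escapes `&`, `<`, and `>` (the README does not list the exact set). `escapeHtml` is a hypothetical helper written for illustration, not a function exported by the package.

```javascript
// Hypothetical helper: escape the characters that are unsafe in HTML text.
// The exact set escaped by message-chunker is an assumption here.
function escapeHtml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

const source = 'a < b && c';
const rendered = `<b>${escapeHtml(source)}</b>`;

// The rendered string, not the source, is what has to fit the budget.
console.log(source.length, rendered.length);
```

A source that fits the budget exactly as plain text can therefore exceed it once rendered as rich-html, which is why the check runs on the final content.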

Installation

npm install message-chunker

Requires Node.js >= 18.

This package is ESM-only. Use import / export, not CommonJS require().

Quick start

import { planDelivery } from 'message-chunker';

const plan = planDelivery({
    markdown: '# Hello\n\nThis is a **long** message...',
    preferredMode: 'auto',       // 'auto' | 'rich-html' | 'plain-text'
    strategy: 'preserve',         // starting strategy
    transport: {
        maxTextLength: 4096,
        safeTextBudget: 3600,
        supportsPlainText: true,
        supportsMultipartPlainText: true,
        supportsRichHtml: true,
        countMethod: 'string-length',
    },
});

for (const chunk of plan.chunks) {
    console.log(`[${chunk.index + 1}/${chunk.total}] (${chunk.mode})`);
    console.log(chunk.content);
}

console.log('Strategy used:', plan.diagnostics.usedStrategy);
console.log('Mode used:', plan.diagnostics.usedMode);
console.log('Had forced split:', plan.diagnostics.hadForcedSplit);

Within a single DeliveryPlan, usedStrategy and usedMode apply to the whole plan. Mixed rich-html/plain-text delivery is possible only across separate plans, for example: original plan in rich-html + replanned tail in plain-text.

Replanning after reject

import { planDelivery, replanTail, nextStrategy } from 'message-chunker';

const markdown = '...';
const transport = { /* ... */ };

const plan = planDelivery({ markdown, preferredMode: 'auto', strategy: 'preserve', transport });

// Send chunks sequentially...
// If chunk i is rejected by the transport:
const tail = replanTail({
    markdown,
    previousPlan: plan,
    failedChunkIndex: 2,          // chunk 2 failed
    preferredMode: 'auto',
    nextStrategy: nextStrategy(plan.diagnostics.usedStrategy) || 'forced-plain-text',
    transport,
    rejectReason: 'too-long',     // 'too-long' | 'invalid-markup'
});

// tail.chunks has fresh indices 0..M-1
// Chunks 0..1 from the original plan are considered delivered

replanTail() returns a new, separate plan for the undelivered tail. It may use a different usedStrategy and usedMode from the original plan. This is how mixed-format delivery is supported when, for example, the original rich-html chunk is rejected as invalid-markup.
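A caller-side delivery loop built around this flow might look like the sketch below. `send` and `replan` are injected, hypothetical callbacks: `send` is assumed to resolve to `{ ok: true }` or `{ ok: false, rejectReason }`, and `replan` is assumed to wrap `replanTail()`. Whether a replanned tail can itself be passed back as `previousPlan` is not stated in this README, so treat repeated replanning as an assumption and keep `maxReplans` small.

```javascript
// Sketch of a caller-side delivery loop. `send` and `replan` are injected,
// hypothetical callbacks; only the plan/chunk shapes come from this README.
async function deliver(plan, send, replan, maxReplans = 3) {
    let current = plan;
    let replans = 0;
    let delivered = 0;
    let i = 0;
    while (i < current.chunks.length) {
        const chunk = current.chunks[i];
        // Assumed contract: send() resolves to { ok: true } or
        // { ok: false, rejectReason: 'too-long' | 'invalid-markup' }.
        const result = await send(chunk);
        if (result.ok) {
            delivered += 1;
            i += 1;
            continue;
        }
        if (replans >= maxReplans) {
            return { delivered, completed: false };
        }
        replans += 1;
        // replan() is expected to wrap replanTail(); tail indices restart at 0.
        current = await replan(current, chunk.index, result.rejectReason);
        i = 0;
    }
    return { delivered, completed: true };
}
```

Chunks that were already accepted are never re-sent: after a reject, the loop continues with the first chunk of the new tail.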

API

`planDelivery(request): DeliveryPlan`

Build a delivery plan for a Markdown message.

PlanRequest:

| Field | Type | Description |
| --- | --- | --- |
| `markdown` | `string` | Source Markdown text |
| `preferredMode` | `'auto' \| 'rich-html' \| 'plain-text'` | Rendering mode preference |
| `strategy` | `SplitStrategy` | Starting strategy |
| `transport` | `TransportProfile` | Transport capabilities |

DeliveryPlan:

| Field | Type | Description |
| --- | --- | --- |
| `chunks` | `PlannedChunk[]` | Ordered chunks ready for delivery |
| `diagnostics` | `PlanDiagnostics` | Detailed diagnostic information |

`PlanDiagnostics`

| Field | Type | Description |
| --- | --- | --- |
| `sourceLength` | `number` | Source markdown length |
| `plainTextLengthEstimate` | `number` | Plain-text length estimate after normalization/rendering |
| `normalizedBlockCount` | `number` | Number of top-level blocks after normalization |
| `chunkCount` | `number` | Number of chunks in the plan |
| `requestedStrategy` | `SplitStrategy` | Strategy requested by the caller |
| `usedStrategy` | `SplitStrategy` | First strategy that produced a valid plan |
| `requestedMode` | `'auto' \| 'rich-html' \| 'plain-text'` | Mode preference requested by the caller |
| `usedMode` | `'rich-html' \| 'plain-text'` | Rendering mode actually used for this plan |
| `hadDegradation` | `boolean` | `true` if strategy/mode had to degrade or unsupported markdown was simplified |
| `degradedToPlainText` | `boolean` | `true` if planning ended up in plain-text after a non-plain-text preference |
| `hadForcedSplit` | `boolean` | `true` if at least one actual chunk boundary in this plan used forced Unicode-safe split |
| `splitBlockTypes` | `string[]` | Unique block types that actually had to be split |

`replanTail(request): ReplannedTail`

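Replan the undelivered tail after a transport reject. As the examples in this README show, the request takes `markdown`, `previousPlan`, `failedChunkIndex`, `preferredMode`, `nextStrategy`, `transport`, and `rejectReason` (`'too-long'` or `'invalid-markup'`). The result is a separate plan whose chunks are re-indexed from 0.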

`PlannedChunk`

| Field | Type | Description |
| --- | --- | --- |
| `index` | `number` | 0-based index in the plan |
| `total` | `number` | Total number of chunks |
| `mode` | `'rich-html' \| 'plain-text'` | Rendering mode used |
| `content` | `string` | Rendered chunk content |
| `estimatedLength` | `number` | `content.length` |
| `sourceRange` | `SourceRange` | Opaque reference into normalized IR |

`TransportProfile`

| Field | Type | Description |
| --- | --- | --- |
| `maxTextLength` | `number` | Hard transport limit |
| `safeTextBudget` | `number` | Safe budget (>= 200, must not exceed `maxTextLength`), checked against the final rendered chunk content |
| `supportsPlainText` | `boolean` | Transport accepts plain text |
| `supportsMultipartPlainText` | `boolean` | Transport accepts multiple plain-text messages |
| `supportsRichHtml` | `boolean` | Transport accepts rich HTML |
| `countMethod` | `'string-length'` | Length counting method |
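The constraints stated in this table can be checked before planning. The sketch below is an illustration of those documented rules only; it is not the package's `validateTransportProfile()`, `checkProfile` is a hypothetical name, and the integer requirement is an assumption.

```javascript
// Sketch of the documented TransportProfile constraints. This is not the
// package's validateTransportProfile(); checkProfile is a hypothetical name.
function checkProfile(tp) {
    const problems = [];
    // Assumption: lengths are whole numbers of UTF-16 code units.
    if (!Number.isInteger(tp.maxTextLength) || tp.maxTextLength <= 0) {
        problems.push('maxTextLength must be a positive integer');
    }
    if (!Number.isInteger(tp.safeTextBudget) || tp.safeTextBudget < 200) {
        problems.push('safeTextBudget must be an integer >= 200');
    }
    if (tp.safeTextBudget > tp.maxTextLength) {
        problems.push('safeTextBudget must not exceed maxTextLength');
    }
    if (tp.countMethod !== 'string-length') {
        problems.push("countMethod must be 'string-length'");
    }
    return problems;
}
```

Returning a list of problems instead of throwing makes it easy to log every violation at once; the packaged helper throws instead.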

Helpers

STRATEGY_LADDER — array of all strategies in escalation order
nextStrategy(strategy) — returns the next more aggressive strategy, or null
isAtLeastAsAggressive(a, b) — compares two strategies
validateTransportProfile(tp) — throws on invalid profile
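To make the ladder semantics concrete, here is a small sketch using local stand-ins rather than the package exports. The ordering comes from the strategy list in this README; the reading of `isAtLeastAsAggressive(a, b)` as "`a` sits at or above `b` on the ladder" and the error on unknown names are assumptions.

```javascript
// Local stand-ins for illustration; the package exports its own
// STRATEGY_LADDER, nextStrategy and isAtLeastAsAggressive.
const LADDER = [
    'preserve',
    'split-blocks',
    'split-blocks-soft',
    'plain-text',
    'forced-plain-text',
];

// Returns the next more aggressive strategy, or null at the top of the ladder.
function next(strategy) {
    const i = LADDER.indexOf(strategy);
    if (i === -1) throw new RangeError(`unknown strategy: ${strategy}`);
    return i + 1 < LADDER.length ? LADDER[i + 1] : null;
}

// Assumed reading: true when `a` sits at or above `b` on the ladder.
function atLeastAsAggressive(a, b) {
    return LADDER.indexOf(a) >= LADDER.indexOf(b);
}
```

Because the top of the ladder yields `null`, callers typically supply a fallback, as the replanning examples do with `|| 'forced-plain-text'`.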

Strategy escalation

| Strategy | Description |
| --- | --- |
| `preserve` | Keep as a single chunk if it fits |
| `split-blocks` | Split at block boundaries (paragraphs, headings, etc.) |
| `split-blocks-soft` | Split within blocks (sentences, punctuation) |
| `plain-text` | Same as split-blocks-soft but in plain-text mode |
| `forced-plain-text` | Last resort: split at `\n\n` → `\n` → whitespace → Unicode-safe forced cut |

Planning semantics

Splitting is based on the maximal prefix that fits the budget, not on a balanced split.
For rich-html, fit is checked after the final render/escape step, so HTML overhead can move the split point left compared with plain text.
Within that fitting prefix, the planner prefers softer boundaries according to the current block rule.
Forced Unicode-safe split is used only when no softer allowed boundary exists inside the fitting prefix.
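The rules above can be sketched for the plain-text case as follows. This is an illustrative reimplementation under simplifying assumptions, not the planner's actual code: the boundary preference follows the `forced-plain-text` order (`\n\n`, then `\n`, then a space), only the space character counts as whitespace, and `splitGreedy` is a hypothetical name.

```javascript
// Largest cut position <= limit that does not separate a surrogate pair.
function safeCut(text, limit) {
    if (limit >= text.length) return text.length;
    const before = text.charCodeAt(limit - 1);
    const after = text.charCodeAt(limit);
    const splitsPair =
        before >= 0xd800 && before <= 0xdbff && after >= 0xdc00 && after <= 0xdfff;
    return splitsPair ? limit - 1 : limit;
}

// Inside the maximal fitting prefix, prefer the softest available boundary.
function pickBoundary(text, limit) {
    const hardCut = safeCut(text, limit);
    if (hardCut === text.length) return { cut: hardCut, forced: false };
    const prefix = text.slice(0, hardCut);
    for (const sep of ['\n\n', '\n', ' ']) {
        const at = prefix.lastIndexOf(sep);
        if (at > 0) return { cut: at + sep.length, forced: false };
    }
    return { cut: hardCut, forced: true }; // no softer boundary in the prefix
}

// Greedy packing: each chunk takes as much as the budget allows.
function splitGreedy(text, budget) {
    if (budget < 2) throw new RangeError('budget must be at least 2 code units');
    const chunks = [];
    let rest = text;
    let hadForcedSplit = false;
    while (rest.length > 0) {
        const { cut, forced } = pickBoundary(rest, budget);
        chunks.push(rest.slice(0, cut));
        hadForcedSplit = hadForcedSplit || forced;
        rest = rest.slice(cut);
    }
    return { chunks, hadForcedSplit };
}
```

For example, `splitGreedy('aaaa bbbb cccc', 10)` cuts after the second space rather than balancing the two chunks, and a forced cut is reported only when the prefix contains no separator at all.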

Reject handling scenarios

The library provides replanning tools but does not hardcode the retry policy — that is the caller's responsibility.

`too-long` — chunk exceeded the transport limit

Typical caller reaction: lower the budget, raise the strategy, or both.

// Transport rejected chunk 1 as too long → lower budget and escalate strategy
const tail = replanTail({
    markdown,
    previousPlan: plan,
    failedChunkIndex: 1,
    preferredMode: 'auto',
    nextStrategy: nextStrategy(plan.diagnostics.usedStrategy) || 'forced-plain-text',
    // Lower the budget, but stay at or above the documented minimum of 200
    transport: { ...transport, safeTextBudget: Math.max(200, transport.safeTextBudget - 400) },
    rejectReason: 'too-long',
});

`invalid-markup` — transport rejected the markup

Typical caller reaction: switch to plain-text mode for the remaining tail.

// Transport rejected rich-html chunk → replan tail as plain-text
const tail = replanTail({
    markdown,
    previousPlan: plan,
    failedChunkIndex: 2,
    preferredMode: 'plain-text',
    nextStrategy: plan.diagnostics.usedStrategy,
    transport,
    rejectReason: 'invalid-markup',
});

This may produce a mixed final delivery:

already delivered prefix stays rich-html;
replanned tail goes as plain-text.

That mixed result is expected and supported.

Other transport errors (401, 429, 5xx, network failures)

These are not module-level reject reasons. The integration layer must decide whether to retry sending, abort, or map the error to too-long / invalid-markup before calling replanTail().
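One way an integration layer might make that decision is sketched below. Everything here is hypothetical: the error shape `{ status, description }`, the description substrings used to recognize rejects, and the `retry` / `abort` / `replan` actions are assumptions for illustration and need adjusting to the actual transport.

```javascript
// Hypothetical error shape: { status?: number, description?: string }.
// The description patterns are assumptions; adjust them to your transport.
function classifyTransportError(err) {
    const status = err.status;
    const text = (err.description || '').toLowerCase();
    if (status === 400 && text.includes('too long')) {
        return { action: 'replan', rejectReason: 'too-long' };
    }
    if (status === 400 && text.includes("can't parse entities")) {
        return { action: 'replan', rejectReason: 'invalid-markup' };
    }
    if (status === undefined || status === 429 || status >= 500) {
        return { action: 'retry' }; // network failure, rate limit, server error
    }
    return { action: 'abort' }; // 401 and other client errors
}
```

Only the `replan` outcome leads to a `replanTail()` call; retries re-send the same chunk unchanged.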

Limitations

Underscore emphasis (_text_, __text__) is intentionally not supported — treated as literal text
Tables are not supported — they fall through as text paragraphs
Images become text: alt (src)
Raw HTML is escaped in rich-html mode, kept literal in plain-text
Unicode splitting is surrogate-pair safe but not grapheme-cluster safe (ZWJ sequences may be split)
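The last limitation can be seen with the standard `Intl.Segmenter` API. The snippet below is a demonstration only and does not call the package: it shows a cut that keeps every surrogate pair intact yet still lands inside a single user-perceived character.

```javascript
// Family emoji: three code points joined by ZWJ (U+200D), one grapheme.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(family)].map((s) => s.segment);

// Cutting after 3 UTF-16 code units does not break a surrogate pair,
// but it separates the first emoji (plus a dangling ZWJ) from the rest.
const head = family.slice(0, 3);
const tail = family.slice(3);

console.log(family.length, graphemes.length);
```

Callers that need grapheme-level safety would have to avoid forced cuts, for example by choosing a budget and strategy under which `hadForcedSplit` stays `false`.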

Testing

npm test

Development

npm install
npm test
npm run pack:check

License

MIT
