refactor(editor): optimize code block highlighting #12817
base: canary
Conversation
Walkthrough

This change introduces a new tokenizer architecture for code blocks, shifting from static, precomputed token arrays to a dynamic, stateful tokenization model. New classes and interfaces for tokenization and Shiki integration are added, and existing components are refactored to use the tokenizer for line-by-line syntax highlighting with improved state management and caching.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CodeBlockComponent
    participant CodeTokenizer
    participant ShikiTokenProvider
    User->>CodeBlockComponent: Edits code or changes language
    CodeBlockComponent->>ShikiTokenProvider: (If needed) Load language/theme
    CodeBlockComponent->>CodeTokenizer: Create or update tokenizer
    loop For each visible line
        CodeBlockComponent->>CodeTokenizer: tokenizeLine({lineContent, lineIndex})
        CodeTokenizer->>ShikiTokenProvider: tokenize(lineContent, state)
        ShikiTokenProvider-->>CodeTokenizer: TokenizationResult (tokens, endState)
        CodeTokenizer-->>CodeBlockComponent: Tokens for line
        CodeBlockComponent->>AffineCodeUnit: Render with tokens and lineIndex
    end
```
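As a rough sketch of the per-line flow in the diagram above, the driver below threads each line's end state into the next call. The type shapes follow the reviewed `types.ts` (`Token`, `TokenizationResult`, `TokensProvider`); the driver function itself is illustrative and not code from the PR.

```ts
interface TokenizerState {
  equals(other: TokenizerState): boolean;
}

type Token = { content: string; offset: number; color?: string };

type TokenizationResult = { lineTokens: Token[]; endState: TokenizerState };

interface TokensProvider<State extends TokenizerState> {
  tokenize(lineContent: string, state: State): TokenizationResult;
}

// Tokenize every line, feeding each line the end state of the previous one.
// Threading the state is what keeps multi-line constructs (template strings,
// block comments) highlighted correctly.
function tokenizeAll<State extends TokenizerState>(
  lines: string[],
  provider: TokensProvider<State>,
  initialState: State
): Token[][] {
  let state = initialState;
  return lines.map(lineContent => {
    const { lineTokens, endState } = provider.tokenize(lineContent, state);
    state = endState as State;
    return lineTokens;
  });
}
```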
Actionable comments posted: 2
🔭 Outside diff range comments (1)
blocksuite/affine/blocks/code/src/code-block.ts (1)
108-115: ⚠️ Potential issue: Old tokenizer lingers when language is cleared

If `modelLang` is `null` we `return` without resetting `this.tokenizer$`; the previous tokenizer keeps highlighting with an out-of-date grammar.

```diff
- if (modelLang === null) {
-   return;
- }
+ if (modelLang === null) {
+   this.tokenizer$.value = null;
+   return;
+ }
```
🧹 Nitpick comments (3)
blocksuite/affine/blocks/code/src/code-block.ts (1)
167-172: Clear tokenizer cache on text edits

The second effect touches `tokenizer$.value` but never invalidates cached tokens when the document changes, leading to stale highlighting after large edits/line re-orders. Consider `tokenizer?.clearCache()` whenever `text.deltas$` changes.
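A minimal sketch of that suggestion, assuming the component keeps the tokenizer in a `tokenizer$` signal and the model's text exposes a `deltas$` signal as referenced above; the `effect` helper is assumed to come from the signals library the component already uses.

```ts
import { effect } from '@preact/signals-core';

// Whenever the document text changes, drop cached line tokens so stale
// highlighting cannot survive large edits or line re-orders.
function watchTextEdits(component: {
  tokenizer$: { value: { clearCache(): void } | null };
  model: { text: { deltas$: { value: unknown } } };
}) {
  return effect(() => {
    component.model.text.deltas$.value; // subscribe to text edits
    component.tokenizer$.value?.clearCache();
  });
}
```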
blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts (1)

57-60: Heavy `structuredClone` call

Deep-cloning every line's tokens on each render can be costly. A shallow clone (`lineTokens.map(t => ({ ...t }))`) or slice when you only mutate `.content` is usually sufficient and avoids browser support quirks for `structuredClone`.
blocksuite/affine/blocks/code/src/highlight/tokenizer.ts (1)

27-34: Unbounded cache may grow indefinitely

`_tokenizedLines` is never pruned. Very large files or long editing sessions will leak memory. Add a simple LRU policy or reset on large size.
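A minimal LRU sketch for that cache; the capacity and the wrapper name are illustrative, not taken from the PR.

```ts
// Map preserves insertion order, so the first key is always the least
// recently used entry once get() re-inserts on access.
class LruLineCache<V> {
  private map = new Map<number, V>();

  constructor(private capacity = 2000) {}

  get(line: number): V | undefined {
    const value = this.map.get(line);
    if (value !== undefined) {
      this.map.delete(line); // refresh recency
      this.map.set(line, value);
    }
    return value;
  }

  set(line: number, value: V) {
    if (this.map.has(line)) this.map.delete(line);
    this.map.set(line, value);
    if (this.map.size > this.capacity) {
      const oldest = this.map.keys().next().value as number;
      this.map.delete(oldest); // evict least recently used
    }
  }
}
```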
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- blocksuite/affine/blocks/code/src/code-block-inline.ts (1 hunks)
- blocksuite/affine/blocks/code/src/code-block.ts (6 hunks)
- blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts (4 hunks)
- blocksuite/affine/blocks/code/src/highlight/tokenizer.ts (1 hunks)
- blocksuite/framework/std/src/inline/components/v-line.ts (1 hunks)
🧰 Additional context used
🪛 Biome (1.9.4)
blocksuite/affine/blocks/code/src/highlight/tokenizer.ts
[error] 86-86: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (2)
blocksuite/affine/blocks/code/src/code-block-inline.ts (1)
25-29: Confirm renderer signature compatibility

`renderer: ({ delta, lineIndex }) => …` introduces `lineIndex`, but `InlineSpecExtension`'s `renderer` type may still expect only `{ delta }`. Please re-check the extension declaration to avoid a TS compile error in CI.

blocksuite/affine/blocks/code/src/highlight/tokenizer.ts (1)
57-65: Verify `getInternalStack`/`equals` API availability

`GrammarState#getInternalStack()` and `.equals()` are not part of Shiki's public typings as of v1. If they change, this will break at runtime. Double-check the API or guard with optional chaining.
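One way to guard against that, sketched under the assumption that the state object wraps Shiki's grammar state as described above; the structural types are illustrative and the comparison falls back to "not equal" (forcing re-tokenization) when the methods are missing.

```ts
type MaybeStack = {
  getInternalStack?: () => { equals?: (other: unknown) => boolean } | undefined;
};

class ShikiTokenizerState {
  constructor(readonly grammarState?: unknown) {}

  equals(other: ShikiTokenizerState): boolean {
    // Feature-detect the non-public API instead of calling it blindly.
    const stackA = (this.grammarState as MaybeStack | undefined)?.getInternalStack?.();
    const stackB = (other.grammarState as MaybeStack | undefined)?.getInternalStack?.();
    if (!stackA || !stackB) return false; // unknown states: treat as different
    return typeof stackA.equals === 'function' ? stackA.equals(stackB) : false;
  }
}
```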
Actionable comments posted: 2
🧹 Nitpick comments (3)
blocksuite/affine/blocks/code/src/highlight/shiki.ts (1)
24-42: Be explicit about single-line token extraction & guard against Shiki API changes

`codeToTokens` returns an array-of-lines; we currently grab `tokens[0]` without checking length. If Shiki ever changes the return shape (e.g. returns an empty array for blank lines) this will throw.

```diff
- return {
-   lineTokens: res.tokens[0],
-   endState: new ShikiTokenizerState(res.grammarState),
- };
+ const firstLineTokens = res.tokens[0] ?? [];
+ return {
+   lineTokens: firstLineTokens,
+   endState: new ShikiTokenizerState(res.grammarState),
+ };
```

blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (2)
74-80: Tight loop could walk the entire buffer each call

`_guessStateForLine` walks backwards one line at a time until it hits a cached state. For large files this is O(N) per miss and can be noticeable during fast scrolling. Consider storing the last line index whose `endState` is known and jump straight to it, or maintain a prefix-map of the latest valid state to achieve amortised O(1).
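A rough sketch of the first idea, assuming lines are tokenized in order so the cache is contiguous from line 0; `_lastKnownLine` is a hypothetical field added for illustration.

```ts
class IncrementalTokenizer<State> {
  private _tokenizedLines = new Map<number, { endState: State }>();
  private _lastKnownLine = -1; // highest line index with a cached endState

  protected _remember(lineIndex: number, endState: State) {
    this._tokenizedLines.set(lineIndex, { endState });
    this._lastKnownLine = Math.max(this._lastKnownLine, lineIndex);
  }

  // Like the original, this returns the end state of the nearest cached line,
  // but jumps there directly instead of walking back one line at a time.
  protected _guessStateForLine(lineIndex: number, initialState: State): State {
    const nearest = Math.min(lineIndex - 1, this._lastKnownLine);
    if (nearest < 0) return initialState;
    return this._tokenizedLines.get(nearest)?.endState ?? initialState;
  }
}
```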
🧰 Tools
🪛 Biome (1.9.4)
[error] 76-76: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
75-77: Biome hint – optional chaining improves readability

The static analyser suggestion is minor but valid:

```diff
- const prevLineCache = this._tokenizedLines.get(lineIndex - 1);
- if (prevLineCache && prevLineCache.token.endState) {
-   return prevLineCache.token.endState as State;
+ const prevLineCache = this._tokenizedLines.get(lineIndex - 1);
+ const endState = prevLineCache?.token.endState as State | undefined;
+ if (endState) {
+   return endState;
  }
```

🧰 Tools
🪛 Biome (1.9.4)
[error] 76-76: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- blocksuite/affine/blocks/code/src/code-block.ts (6 hunks)
- blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts (4 hunks)
- blocksuite/affine/blocks/code/src/highlight/shiki.ts (1 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/index.ts (1 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (1 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/types.ts (1 hunks)
✅ Files skipped from review due to trivial changes (2)
- blocksuite/affine/blocks/code/src/tokenizer/index.ts
- blocksuite/affine/blocks/code/src/tokenizer/types.ts
🚧 Files skipped from review as they are similar to previous changes (2)
- blocksuite/affine/blocks/code/src/highlight/affine-code-unit.ts
- blocksuite/affine/blocks/code/src/code-block.ts
🧰 Additional context used
🧬 Code Graph Analysis (2)
blocksuite/affine/blocks/code/src/highlight/shiki.ts (1)
blocksuite/affine/blocks/code/src/tokenizer/types.ts (3)
- TokenizerState (1-3)
- TokensProvider (22-24)
- TokenizationResult (17-20)
blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (1)
blocksuite/affine/blocks/code/src/tokenizer/types.ts (5)
- TokenizerState (1-3)
- TokenizationResult (17-20)
- TokensProvider (22-24)
- LineKey (5-9)
- Token (11-15)
🪛 Biome (1.9.4)
blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts
[error] 76-76: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
Actionable comments posted: 1
🧹 Nitpick comments (4)
blocksuite/affine/blocks/code/src/tokenizer/types.ts (1)
5-8: Consider marking `LineKey` fields as `readonly`

`LineKey` is mainly used as a cache-key. Marking its properties immutable prevents accidental mutation that could silently break cache invariants.

```diff
-export type LineKey = {
-  lineContent: string;
-  lineIndex: number;
-};
+export type LineKey = {
+  readonly lineContent: string;
+  readonly lineIndex: number;
+};
```

blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (3)
74-80: Tiny readability win: use optional chaining

Biome's hint is valid – the manual null-check can be collapsed:

```diff
- const prevLineCache = this._tokenizedLines.get(lineIndex - 1);
- if (prevLineCache && prevLineCache.token.endState) {
-   return prevLineCache.token.endState as State;
- }
+ const maybeState = this._tokenizedLines.get(lineIndex - 1)?.token.endState;
+ if (maybeState) {
+   return maybeState as State;
+ }
```

This removes one level of nesting and clarifies intent.
🧰 Tools
🪛 Biome (1.9.4)
[error] 76-76: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
90-95: Empty-array sentinel may hide errors

`getLineTokens` returns `[]` when a line hasn't been tokenized. Call-sites cannot distinguish "not yet tokenized" from "line is empty". Returning `undefined` (or throwing) forces explicit handling and avoids subtle rendering bugs.

```diff
- return this._tokenizedLines.get(index)?.token.lineTokens ?? [];
+ const cached = this._tokenizedLines.get(index);
+ if (!cached) return undefined; // or throw
+ return cached.token.lineTokens;
```
58-71: Potential memory growth: expose a pruning strategy

`_tokenizedLines` grows unbounded during long-lived editing sessions. Consider:

• Exposing a `prune(afterLine: number)` method that drops caches before a given index, or
• Switching to an LRU map with a reasonable cap.

Prevents unnecessary memory usage in large documents.
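A minimal sketch of the `prune` option; the class and field names mirror the review's wording and are illustrative only.

```ts
class CachedTokenizer<Entry> {
  private _tokenizedLines = new Map<number, Entry>();

  // Drop cached lines before the given index, e.g. lines that have scrolled
  // far out of view, so the map stays bounded without a full LRU.
  prune(afterLine: number) {
    for (const lineIndex of this._tokenizedLines.keys()) {
      if (lineIndex < afterLine) this._tokenizedLines.delete(lineIndex);
    }
  }
}
```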
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- blocksuite/affine/blocks/code/src/highlight/shiki.ts (1 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (1 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/types.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- blocksuite/affine/blocks/code/src/highlight/shiki.ts
🧰 Additional context used
🧬 Code Graph Analysis (1)
blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (1)
blocksuite/affine/blocks/code/src/tokenizer/types.ts (5)
- TokenizerState (1-3)
- TokenizationResult (16-19)
- TokensProvider (21-24)
- LineKey (5-8)
- Token (10-14)
🪛 Biome (1.9.4)
blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts
[error] 76-76: Change to an optional chain.
Unsafe fix: Change to an optional chain.
(lint/complexity/useOptionalChain)
🔇 Additional comments (2)
blocksuite/affine/blocks/code/src/tokenizer/types.ts (1)
10-14: Align naming with common token interfaces (`value` vs `content`)

Most syntax-highlighting libs (Shiki, Prism, etc.) use `content` or `value` consistently. If the rest of the pipeline (e.g. `AffineCodeUnit`) expects `value`, a mismatch here becomes a foot-gun. Double-check downstream usages; consider renaming before this type propagates broadly.

blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (1)
9-16: Great – `HighlightedLine` now carries the generic `State`

The earlier unsafe casts are gone, restoring proper type safety.
Nice catch!
Actionable comments posted: 0
🧹 Nitpick comments (5)
blocksuite/affine/blocks/code/src/tokenizer/types.ts (5)
1-3: Clarify the `equals` contract & document invariants

`equals` sits at the heart of cache-invalidation logic, yet the interface doesn't specify what constitutes equality (structural, referential, locale-sensitive, etc.). A brief JSDoc will prevent divergent implementations and subtle bugs.

```diff
 export interface TokenizerState {
+  /**
+   * Determine whether two states are equivalent for the purpose of
+   * incremental tokenisation. Implementations **must** satisfy reflexive,
+   * symmetric and transitive properties because the result is used as a
+   * cache key.
+   */
   equals(other: TokenizerState): boolean;
 }
```
5-8: Make `LineKey` immutable to avoid accidental mutation

`LineKey` is presumably used as a `Map`/`WeakMap` key. Mutating properties after insertion would break lookup semantics. Marking it `Readonly` prevents this class of bugs at compile time.

```diff
-export type LineKey = {
-  lineContent: string;
-  lineIndex: number;
-};
+export type LineKey = Readonly<{
+  lineContent: string;
+  lineIndex: number;
+}>;
```
10-14: Consider a readonly `Token` interface for safer downstream use

Tokens are typically treated as value objects. Freezing their fields helps avoid accidental in-place edits that desynchronise caches or renderers.

```diff
-export type Token = {
-  content: string;
-  offset: number;
-  color?: string;
-};
+export interface Token {
+  readonly content: string;
+  readonly offset: number;
+  readonly color?: string;
+}
```
16-19: Rename `lineTokens` → `tokens` for concision (optional)

Within the `TokenizationResult` context, all tokens belong to a single line; the prefix doesn't add information and slightly lengthens property access.

```diff
-export type TokenizationResult = {
-  lineTokens: Token[];
-  endState: TokenizerState;
-};
+export interface TokenizationResult {
+  tokens: Token[];
+  endState: TokenizerState;
+}
```
21-24: Evaluate need for async tokenisation

Some providers (e.g., WASM-backed highlighters or remote services) are inherently asynchronous. If future extenders will need async, considering a Promise-based overload now may avoid breaking changes later.

```ts
tokenize(line: string, state: State): TokenizationResult | Promise<TokenizationResult>;
```

Not urgent, but worth weighing while the API is still greenfield.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- blocksuite/affine/blocks/code/src/code-block.ts (7 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts (1 hunks)
- blocksuite/affine/blocks/code/src/tokenizer/types.ts (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- blocksuite/affine/blocks/code/src/tokenizer/tokenizer.ts
- blocksuite/affine/blocks/code/src/code-block.ts
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           canary   #12817      +/-   ##
==========================================
- Coverage   55.91%   55.59%   -0.33%
==========================================
  Files        2652     2654       +2
  Lines      125440   125511      +71
  Branches    19948    19904      -44
==========================================
- Hits        70142    69777     -365
+ Misses      53550    53426     -124
- Partials     1748     2308     +560
```

Flags with carried forward coverage won't be shown. View full report in Codecov by Sentry.
Limitation:
If a new line is added, the subsequent lines need retokenization. (Will fix it in the next PR.)