Bug Description
ForgeCode's paste formatter truncates pasted Chinese text when the text contains certain Unicode characters such as 公 or 破.
This appears to affect both:
- the zsh : prompt paste hook, because it calls forge zsh format --buffer "$BUFFER"
- the interactive Forge TUI paste path, because paste events also call the same paste formatting logic
Steps to Reproduce
Run:
forge --version
forge zsh format --buffer ': abc公def'
forge zsh format --buffer ': abc破def'
forge zsh format --buffer ': abc布def'
forge zsh format --buffer ': abc突def'
forge zsh format --buffer ': 2026 春季 BOOOMJam 开发主题公布让我们一起探索,突破「视界限」!'
Expected Behavior
All non-path pasted text should be preserved unchanged.
Expected examples:
: abc公def
: abc破def
: 2026 春季 BOOOMJam 开发主题公布让我们一起探索,突破「视界限」!
Actual Behavior
On ForgeCode 2.12.5, the text is truncated. For example:
forge zsh format --buffer ': abc公def'
# outputs: :
forge zsh format --buffer ': abc破def'
# outputs: :
Characters like 布 and 突 do not trigger the same truncation in my test.
Suspected Cause
The likely issue is in crates/forge_main/src/zsh/paste.rs, specifically find_token_end() / wrap_tokens().
The code scans input.as_bytes(), casts each byte to char, and then calls is_whitespace():
.map(|b| (*b as char).is_whitespace())
For UTF-8 Chinese characters:
公 = e5 85 ac
破 = e7 a0 b4
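When cast to char independently, the continuation bytes 0x85 and 0xA0 become U+0085 (NEXT LINE) and U+00A0 (NO-BREAK SPACE), and Rust's char::is_whitespace() returns true for both. The formatter therefore treats the middle of a UTF-8 character as a token boundary. The later safe .get() calls prevent a crash, but they still cause truncation. By contrast, 布 (e5 b8 83) and 突 (e7 aa 81) contain no byte that maps to a whitespace code point, which is consistent with those characters not triggering the truncation.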
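The misclassification can be shown in isolation with a minimal sketch that uses only the standard library. The helper name bytewise_whitespace_offsets is hypothetical and is not taken from the Forge codebase; it only mirrors the byte-cast pattern quoted above.

```rust
/// Hypothetical helper (not from the Forge source): applies the same
/// byte-wise check as the quoted snippet and returns every byte offset
/// that the check classifies as whitespace.
fn bytewise_whitespace_offsets(input: &str) -> Vec<usize> {
    input
        .as_bytes()
        .iter()
        .enumerate()
        // Each byte is cast to `char` on its own, so a UTF-8 continuation
        // byte is judged as if it were a complete code point.
        .filter(|(_, b)| (**b as char).is_whitespace())
        .map(|(i, _)| i)
        .collect()
}
```

For "abc公def" this reports offset 4, which is the 0x85 byte inside 公. For "abc布def" it reports nothing, matching the observation that 布 does not trigger the truncation.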
A char-boundary-safe implementation should probably iterate with char_indices() instead of scanning individual bytes.
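One possible shape for such a scan is sketched below. The function name find_token_end_chars and its signature are assumptions made for illustration; the real find_token_end() in paste.rs may take different arguments and handle more cases (quoting, paths), so this is not a drop-in patch.

```rust
/// Hypothetical char-boundary-safe scan (name and signature are assumptions).
/// Returns the byte offset of the first whitespace code point at or after
/// `start`, or `input.len()` if there is none or `start` is not a valid
/// char boundary.
fn find_token_end_chars(input: &str, start: usize) -> usize {
    input
        // `get` returns None instead of panicking on a bad boundary.
        .get(start..)
        .and_then(|rest| {
            rest.char_indices()
                // `char_indices` yields whole code points, so continuation
                // bytes are never inspected individually.
                .find(|(_, c)| c.is_whitespace())
                .map(|(i, _)| start + i)
        })
        .unwrap_or(input.len())
}
```

Because the offsets come from char_indices(), any slice taken at the returned position lands on a character boundary, so the token containing 公 or 破 is kept whole.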
Related Issues / PRs
This looks related to, but not fully fixed by:
Those fixed or reduced UTF-8 byte-boundary crashes, but this current case still reproduces on 2.12.5 as silent truncation.
Forge Version
Operating System & Version
macOS, accessed from a PC over SSH.
Installation Method
Installed binary at ~/.local/bin/forge.
Configuration
No special configuration required to reproduce. The CLI command forge zsh format --buffer ... is enough.