# Token Counting Mismatch: Anthropic-Compatible API

## Summary

ZLM's Anthropic-compatible API has significant tokenization differences from Anthropic's tokenizer, causing Claude Code to hit context limits at ~80% reported usage.
## Environment

- Tool: @z_ai/coding-helper v0.0.7
- Plan: glm_coding_plan_global
- Model: opus → glm-4.7
- Endpoint: https://api.z.ai/api/anthropic
## Bug Description

When Claude Code reports 80% context used, the API returns:

```
API Error: The model has reached its context window limit.
```
## Root Cause

GLM's tokenizer counts tokens very differently for structured data:

| Content Type | GLM vs Claude Ratio | Impact |
|---|---|---|
| Natural text | 0.89x - 0.97x | GLM more efficient ✅ |
| JavaScript/Python | ~0.95x | Minor difference |
| JSON | 1.43x - 1.49x | 43-49% more tokens ⚠️ |
| Special chars | 2.43x - 2.49x | 143-149% more tokens 🔴 |
## Example

```python
# JSON content (300x repeated)
input = '{"key": "value", "nested": {"data": 123}} ' * 300

# Claude estimate: ~3,150 tokens
# GLM actual:      ~4,506 tokens
# Ratio:           1.43x
```
Impact: When working with JSON/config files, Claude's reported "80%" corresponds to ~114% of GLM's actual context window (80% × 1.43).
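The impact figure above can be sketched as a one-line conversion. This is a hypothetical helper, not part of any shipped tool; the 1.43x ratio is the measured JSON ratio from the table above.

```python
def effective_usage(reported_pct: float, ratio: float) -> float:
    """Scale Claude Code's reported context usage by the measured
    GLM/Claude token ratio to estimate actual usage on GLM's side."""
    return reported_pct * ratio

# With the ~1.43x JSON ratio, a reported "80% used" is already over the limit:
print(f"{effective_usage(80, 1.43):.0f}%")  # → 114%
```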
## Models Affected

All GLM models show the same pattern:

| Model | JSON | Special Chars |
|---|---|---|
| glm-4.7 | 1.49x | 2.49x |
| glm-5.1 | 1.49x | 2.49x |
## Expected Behavior

Anthropic-compatible APIs should match Anthropic's tokenizer within ±5%. The current deviation of 43-149% breaks tools that rely on accurate token counting.
## Reproduction

```python
import requests

# Repeated JSON payload that triggers the inflated token count
content = '{"key": "value", "nested": {"data": 123}} ' * 300

response = requests.post(
    "https://api.z.ai/api/anthropic/v1/messages",
    headers={
        "x-api-key": "YOUR_KEY",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "opus",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": content}],
    },
)

usage = response.json()["usage"]
print(f"GLM tokens: {usage['input_tokens']}")
print(f"Claude estimate: ~{len(content) // 4}")  # rough chars/4 heuristic
print(f"Ratio: {usage['input_tokens'] / (len(content) // 4):.2f}x")
```
## Possible Solutions

- Fix tokenization - match Anthropic's tokenizer in the compatibility layer
- Return adjusted usage - provide Claude-compatible token counts in the response
- Document the difference - if intentional, clearly document it and provide a conversion formula
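As a stopgap until one of these lands, the second option can be approximated client-side. This is a sketch only: the helper name and the per-content-type ratio table are hypothetical, populated from the measurements reported above.

```python
# Assumed ratios, taken from the measurements in this report (not an official API)
RATIOS = {"json": 1.43, "special_chars": 2.43, "text": 0.93, "code": 0.95}

def claude_equivalent_tokens(glm_tokens: int, content_type: str) -> int:
    """Scale a GLM-reported token count back to a rough Claude-equivalent
    count, using an empirically measured ratio per content type."""
    ratio = RATIOS.get(content_type, 1.0)
    return round(glm_tokens / ratio)

# GLM reported ~4,506 tokens for the JSON example; scaled back this is
# close to Claude's ~3,150 estimate:
print(claude_equivalent_tokens(4506, "json"))  # → 3151
```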
## Priority

High - makes the API unusable for coding workflows involving JSON/config files.