
Token Counting Mismatch: Anthropic-Compatible API #12

@jtstothard

Description


Summary

Z.ai's Anthropic-compatible API tokenizes content significantly differently from Anthropic's tokenizer, causing Claude Code to hit the context window limit when it reports only ~80% usage.

Environment

  • Tool: @z_ai/coding-helper v0.0.7
  • Plan: glm_coding_plan_global
  • Model: opus (served as glm-4.7)
  • Endpoint: https://api.z.ai/api/anthropic

Bug Description

When Claude Code reports 80% context used, the API returns:

API Error: The model has reached its context window limit.

Root Cause

GLM's tokenizer counts tokens very differently for structured data:

| Content type | GLM vs Claude ratio | Impact |
| --- | --- | --- |
| Natural text | 0.89x - 0.97x | GLM more efficient ✅ |
| JavaScript/Python | ~0.95x | Minor difference |
| JSON | 1.43x - 1.49x | 43-49% more tokens ⚠️ |
| Special chars | 2.43x - 2.49x | 143-149% more tokens 🔴 |

Example

```
# JSON content (300x repeated)
input: '{"key": "value", "nested": {"data": 123}} ' * 300
Claude estimate: ~3,150 tokens
GLM actual:      ~4,506 tokens
Ratio:           1.43x
```

Impact: when working with JSON/config files, the "80%" that Claude Code reports corresponds to roughly 114% of GLM's actual context window, so requests fail before the displayed limit is reached.
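As a sanity check on that figure, the arithmetic can be sketched as follows (assuming the fixed 1.43x JSON ratio measured above):

```python
# Sketch of the "80% reported = ~114% actual" arithmetic, assuming the
# 1.43x JSON tokenization ratio measured above.
claude_reported_fraction = 0.80  # what Claude Code displays
json_ratio = 1.43                # GLM tokens per Claude-estimated token

# GLM's actual usage of the same context window:
glm_actual_fraction = claude_reported_fraction * json_ratio
print(f"{glm_actual_fraction:.0%}")  # ~114%
```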

Models Affected

All GLM models show the same pattern:

| Model | JSON | Special chars |
| --- | --- | --- |
| glm-4.7 | 1.49x | 2.49x |
| glm-5.1 | 1.49x | 2.49x |

Expected Behavior

Anthropic-compatible APIs should match Anthropic's tokenizer within ±5%. Current deviation of 40-149% breaks tools that rely on accurate token counting.

Reproduction

```python
import requests

# The same JSON payload as in the example above
content = '{"key": "value", "nested": {"data": 123}} ' * 300

response = requests.post(
    "https://api.z.ai/api/anthropic/v1/messages",
    headers={
        "x-api-key": "YOUR_KEY",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "opus",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": content}],
    },
)

usage = response.json()["usage"]
claude_estimate = len(content) // 4  # rough ~4 chars/token heuristic
print(f"GLM tokens: {usage['input_tokens']}")
print(f"Claude estimate: ~{claude_estimate}")
print(f"Ratio: {usage['input_tokens'] / claude_estimate:.2f}x")
```

Possible Solutions

  1. Fix tokenization - Match Anthropic's tokenizer in the compatibility layer
  2. Return adjusted usage - Provide Claude-compatible token counts in the response
  3. Document the difference - If intentional, clearly document and provide conversion formula
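For option 3, a client-side workaround could look like the sketch below. The `claude_equivalent_tokens` helper and the midpoint ratios are illustrative assumptions drawn from the measurements above, not part of any API:

```python
# Hypothetical conversion helper (solution 3): scale a GLM-reported usage
# figure by a measured per-content-type ratio to approximate what
# Anthropic's tokenizer would report. Ratios are midpoints of the ranges
# measured above; content-type detection is left to the caller.
RATIOS = {
    "text": 0.93,     # midpoint of 0.89x-0.97x
    "code": 0.95,     # JavaScript/Python
    "json": 1.46,     # midpoint of 1.43x-1.49x
    "special": 2.46,  # midpoint of 2.43x-2.49x
}

def claude_equivalent_tokens(glm_tokens: int, content_type: str = "json") -> int:
    """Approximate the Anthropic-tokenizer count from a GLM usage figure."""
    return round(glm_tokens / RATIOS[content_type])

print(claude_equivalent_tokens(4506, "json"))  # ~3086, close to Claude's ~3,150 estimate
```

This is only a stopgap: ratios drift with content mix, so fixing the tokenizer or the reported usage (options 1 and 2) remains the proper fix.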

Priority

High - Makes the API unusable for coding workflows involving JSON/config files.
