
Token Counting Mismatch: Anthropic-Compatible API #12

@jtstothard

Description


Summary

Z.ai's Anthropic-compatible API tokenizes content significantly differently from Anthropic's tokenizer, causing Claude Code to hit the context window limit when it reports only ~80% usage.

Environment

  • Tool: @z_ai/coding-helper v0.0.7
  • Plan: glm_coding_plan_global
  • Model: opus (served as glm-4.7)
  • Endpoint: https://api.z.ai/api/anthropic

Bug Description

When Claude Code reports 80% context used, the API returns:

API Error: The model has reached its context window limit.

Root Cause

GLM's tokenizer counts tokens very differently for structured data:

| Content type | GLM vs Claude ratio | Impact |
| --- | --- | --- |
| Natural text | 0.89x - 0.97x | GLM more efficient ✅ |
| JavaScript/Python | ~0.95x | Minor difference |
| JSON | 1.43x - 1.49x | 43-49% more tokens ⚠️ |
| Special chars | 2.43x - 2.49x | 143-149% more tokens 🔴 |

Example

```
# JSON content (300x repeated)
input: '{"key": "value", "nested": {"data": 123}} ' * 300
Claude estimate: ~3,150 tokens
GLM actual:      ~4,506 tokens
Ratio:           1.43x
```

Impact: when working with JSON/config files, the "80%" that Claude Code reports corresponds to roughly 114% of GLM's actual context window, so requests fail before the displayed limit is reached.
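As a sanity check on that figure, the arithmetic can be sketched as follows (assuming the fixed 1.43x JSON ratio measured above):

```python
# Sketch of the "80% reported = ~114% actual" arithmetic, assuming the
# 1.43x JSON tokenization ratio measured above.
claude_reported_fraction = 0.80  # what Claude Code displays
json_ratio = 1.43                # GLM tokens per Claude-estimated token

# GLM's actual usage of the same context window:
glm_actual_fraction = claude_reported_fraction * json_ratio
print(f"{glm_actual_fraction:.0%}")  # ~114%
```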

Models Affected

All GLM models show the same pattern:

| Model | JSON | Special chars |
| --- | --- | --- |
| glm-4.7 | 1.49x | 2.49x |
| glm-5.1 | 1.49x | 2.49x |

Expected Behavior

Anthropic-compatible APIs should match Anthropic's tokenizer within ±5%. Current deviation of 40-149% breaks tools that rely on accurate token counting.

Reproduction

```python
import requests

# The same JSON payload as in the example above
content = '{"key": "value", "nested": {"data": 123}} ' * 300

response = requests.post(
    "https://api.z.ai/api/anthropic/v1/messages",
    headers={
        "x-api-key": "YOUR_KEY",
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "opus",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": content}],
    },
)

usage = response.json()["usage"]
claude_estimate = len(content) // 4  # rough ~4 chars/token heuristic
print(f"GLM tokens: {usage['input_tokens']}")
print(f"Claude estimate: ~{claude_estimate}")
print(f"Ratio: {usage['input_tokens'] / claude_estimate:.2f}x")
```

Possible Solutions

  1. Fix tokenization - Match Anthropic's tokenizer in the compatibility layer
  2. Return adjusted usage - Provide Claude-compatible token counts in the response
  3. Document the difference - If intentional, clearly document and provide conversion formula
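For option 3, a client-side workaround could look like the sketch below. The `claude_equivalent_tokens` helper and the midpoint ratios are illustrative assumptions drawn from the measurements above, not part of any API:

```python
# Hypothetical conversion helper (solution 3): scale a GLM-reported usage
# figure by a measured per-content-type ratio to approximate what
# Anthropic's tokenizer would report. Ratios are midpoints of the ranges
# measured above; content-type detection is left to the caller.
RATIOS = {
    "text": 0.93,     # midpoint of 0.89x-0.97x
    "code": 0.95,     # JavaScript/Python
    "json": 1.46,     # midpoint of 1.43x-1.49x
    "special": 2.46,  # midpoint of 2.43x-2.49x
}

def claude_equivalent_tokens(glm_tokens: int, content_type: str = "json") -> int:
    """Approximate the Anthropic-tokenizer count from a GLM usage figure."""
    return round(glm_tokens / RATIOS[content_type])

print(claude_equivalent_tokens(4506, "json"))  # ~3086, close to Claude's ~3,150 estimate
```

This is only a stopgap: ratios drift with content mix, so fixing the tokenizer or the reported usage (options 1 and 2) remains the proper fix.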

Priority

High - Makes the API unusable for coding workflows involving JSON/config files.
