fix(webapp): sanitize OTel attributes on ClickHouse JSON parse rejection #3659
Walkthrough: This PR implements UTF-16 surrogate sanitization for ClickHouse JSON parse failures in OTEL attributes.
ClickHouse's JSONEachRow parser rejects rows containing unpaired UTF-16 surrogates (`Cannot parse JSON object here ... ParallelParsingBlock InputFormat`), losing the whole 5–10k-row batch through the scheduler's retry path. Locally reproduced with ~10 KB rows; the 100 MB size-stress error is distinct (`Size of JSON object is extremely large`), so the root cause is content quality, not size.

`ClickhouseEventRepository.#flushBatch` and `#flushLlmMetricsBatch` now retry once after sanitizing every row in the batch: any string with a lone surrogate is replaced with `"[invalid-utf16]"`. ClickHouse's `at row N` hint is logged for observability but not used to slice; its semantics under `input_format_parallel_parsing` aren't reliable, and a whole-batch scan catches multi-row poisoning in one pass.

If the retry also fails: loud error log with sample row, `permanentlyDroppedBatches` increments, return normally. Deterministic parse failures don't benefit from the scheduler's transient-retry backoff. Non-parse errors propagate unchanged.

Detection reuses `detectBadJsonStrings` via `JSON.stringify(value)`, with a latent regex bug fixed: the low-surrogate nibble matched `[cd]` instead of `[c-f]`, missing U+DE00–U+DFFF and false-flagging common emoji pairs (e.g. 😀). Healthy batches pay zero scan cost: the check only runs when ClickHouse has already rejected the batch.
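The retry-after-sanitize flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `hasLoneSurrogate`, `sanitizeValue`, `isJsonParseRejection`, `flushWithSanitizeRetry`, and the `InsertFn` type are hypothetical names, rows are assumed to be plain JSON-like objects, and detection uses a code-unit loop rather than the regex-based `detectBadJsonStrings` the PR reuses. Letting a non-parse error on the second attempt propagate is also an assumption.

```typescript
// Placeholder string taken from the PR description.
const INVALID_UTF16 = "[invalid-utf16]";

// True if the string contains a high surrogate without a following low
// surrogate, or a low surrogate without a preceding high surrogate.
function hasLoneSurrogate(s: string): boolean {
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (c >= 0xd800 && c <= 0xdbff) {
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair: skip the low surrogate
        continue;
      }
      return true; // unpaired high surrogate
    }
    if (c >= 0xdc00 && c <= 0xdfff) return true; // unpaired low surrogate
  }
  return false;
}

// Recursively replace every offending string value with the placeholder.
// Object keys are left untouched in this sketch.
function sanitizeValue(value: unknown): unknown {
  if (typeof value === "string") {
    return hasLoneSurrogate(value) ? INVALID_UTF16 : value;
  }
  if (Array.isArray(value)) return value.map(sanitizeValue);
  if (value !== null && typeof value === "object") {
    const src = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(src)) out[key] = sanitizeValue(src[key]);
    return out;
  }
  return value;
}

type Row = Record<string, unknown>;
type InsertFn = (rows: Row[]) => Promise<void>; // hypothetical insert callback

// Assumption: parse rejections are recognised by the message quoted in the PR.
function isJsonParseRejection(err: unknown): boolean {
  return err instanceof Error &&
    err.message.indexOf("Cannot parse JSON object here") !== -1;
}

async function flushWithSanitizeRetry(
  rows: Row[],
  insert: InsertFn,
  stats: { permanentlyDroppedBatches: number }
): Promise<void> {
  try {
    await insert(rows); // healthy batches never pay for a scan
    return;
  } catch (err) {
    if (!isJsonParseRejection(err)) throw err; // non-parse errors propagate
  }
  const sanitized = rows.map((row) => sanitizeValue(row) as Row);
  try {
    await insert(sanitized); // single retry with the whole batch sanitized
  } catch (err) {
    if (!isJsonParseRejection(err)) throw err;
    stats.permanentlyDroppedBatches++;
    console.error("dropping batch: sanitized retry was also rejected", {
      sampleRow: sanitized[0],
    });
    // Return normally: a deterministic failure gains nothing from backoff.
  }
}
```

Scanning the whole batch rather than slicing at the reported row keeps the sketch independent of the `at row N` hint, which matches the reasoning given in the description.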
fa0034d to 8cc9e85
Before fix:


After fix: