Sanitize LLM output #30

ytsarev · 2025-11-19T22:39:34Z

Description of your changes

Problem

When using the claude-haiku-4-5 model for cost savings, the model wraps its JSON/YAML output in markdown code blocks (```json) despite prompt instructions not to. This caused:

Agent parsing failures with "unable to parse agent output" errors
Messy Kubernetes event messages showing raw markdown in Operations

Similar behaviour was observed with sonnet-4-5, but there it could be corrected with prompt modifications.

Sanitization is useful in any case and makes function-claude output and behaviour much more deterministic.

Solution

Implemented defense-in-depth markdown stripping with clean code reuse:

Error extraction: Extract JSON from langchaingo agent errors when parsing fails
Response cleaning: Strip markdown from successful responses
Event messages: Reuse cleaned response for Kubernetes events (no redundant stripping)

Key Changes

Added extractJSONFromAgentError() - testable function for agent error handling
Modified resourceFrom() to return cleaned string alongside parsed resources
Updated Operations pipeline to use cleaned response for event messages
Added stripMarkdownCodeBlocks() helper used across both paths
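
For readers without the diff open, the strip-then-parse idea can be sketched roughly as below. This is a minimal illustration, not the merged code: the body of stripMarkdownCodeBlocks() is a guess at one reasonable implementation, parseCleaned() is a hypothetical stand-in for resourceFrom(), and the sketch only handles JSON, whereas the PR also covers YAML.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stripMarkdownCodeBlocks is an illustrative sketch, not the PR's implementation.
// It removes one leading ``` fence line (with an optional language tag) and one
// trailing ``` fence, then trims surrounding whitespace.
func stripMarkdownCodeBlocks(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "```") {
		return s // nothing to strip
	}
	if i := strings.IndexByte(s, '\n'); i >= 0 {
		// Drop the whole opening fence line, including a tag such as "json".
		s = s[i+1:]
	} else {
		// Single-line input: only the backticks can be removed safely.
		s = strings.TrimPrefix(s, "```")
	}
	// The closing fence may be missing if the output was truncated.
	s = strings.TrimSuffix(strings.TrimSpace(s), "```")
	return strings.TrimSpace(s)
}

// parseCleaned (hypothetical name) returns the parsed object together with the
// cleaned string, so a caller can reuse the cleaned text for event messages
// instead of stripping a second time.
func parseCleaned(raw string) (map[string]any, string, error) {
	cleaned := stripMarkdownCodeBlocks(raw)
	var obj map[string]any
	if err := json.Unmarshal([]byte(cleaned), &obj); err != nil {
		return nil, cleaned, fmt.Errorf("cannot parse cleaned output: %w", err)
	}
	return obj, cleaned, nil
}

func main() {
	raw := "```json\n{\"kind\":\"SQLInstance\"}\n```"
	obj, cleaned, err := parseCleaned(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cleaned)     // the text that would go into the event message
	fmt.Println(obj["kind"]) // the parsed field
}
```

Returning the cleaned string from the parse step is what lets the event path skip a second stripping pass.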

Test Coverage

8 test cases for extractJSONFromAgentError() (nil, unrelated, JSON, YAML, generic, whitespace, wrapped)
4 test cases for resourceFrom() markdown scenarios with cleaned string validation
All 21 tests passing

E2E testing with https://github.com/upbound/configuration-aws-database-ai

Before: Operation failure with (```json) in the function output

Warning  FunctionInvocation  9s  ops/operation.ops.crossplane.io  failed to invoke pipeline step "upbound-function-claude": cannot
run Function "upbound-function-claude": rpc error: code = Unknown desc = unable to parse agent output: ```json
{"apiVersion":"aws.platform.upbound.io/v1alpha1","kind":"SQLInstance","metadata":{"name":"rds-metrics-database-ai-mysql","namespace":
"database-team","annotations":{"intelligent-scaling/last-analyzed":"2025-11-19T19:04:06Z","intelligent-scaling/last-scaled-decision":
"No scaling needed. CPU 4.4% < 20%, Memory 175.6MB available > 60%, Connections 0 < 30%. Instance is underutilized but within safe
downscale thresholds. Maintaining db.t3.micro as minimum instance class."}},"spec":{}}

After: Operation success with clean JSON output

Events:
  Type    Reason           Age    From                             Message
  ----    ------           ----   ----                             -------
  Normal  RunPipelineStep  2m21s  ops/operation.ops.crossplane.io  Pipeline step "upbound-function-claude": {
  "apiVersion": "aws.platform.upbound.io/v1alpha1",
  "kind": "SQLInstance",
  "metadata": {
    "name": "rds-metrics-database-ai-mysql",
    "namespace": "database-team",
    "annotations": {
      "intelligent-scaling/last-analyzed": "2025-11-19T22:28:27Z",
      "intelligent-scaling/last-scaled-decision": "Rate limit applied: analysis performed < 1 minute ago. Skipping re-analysis. Previous analysis: No scaling required. Metrics remain optimal: CPU 2.45%, Connections 0, Memory 14% free. Instance at minimum class db.t3.micro."
    }
  }
}

Results

✅ Haiku model works reliably with cost savings
✅ Clean Kubernetes event messages without markdown
✅ No redundant stripping operations
✅ Works correctly in both success and error cases

I have:

Read and followed Crossplane's contribution process.
Added or updated unit tests for my change.

Signed-off-by: Yury Tsarev <yury@upbound.io>

tnthornton

Thanks @ytsarev ! The main question I have is if we really need to rely on string parsing (which can be pretty brittle) or if there's a path for structured processing of the errors 👍

tnthornton · 2025-11-20T00:31:42Z

tests/test-function-claude/main.k

+                        port: 8080
+                        protocol: "TCP"


I bet this continues to flake.

@tnthornton most probably, but it was stable during the last 4 RC releases on this branch: https://github.com/upbound/function-claude/actions/workflows/ci.yml

tnthornton · 2025-11-20T00:44:05Z

fn.go

+// extractJSONFromAgentError attempts to extract JSON from langchaingo agent parsing errors.
+// When the agent framework fails to parse output, it returns an error containing the raw output.
+// This function extracts and cleans that output for a second parsing attempt.
+// Returns the extracted content and true if extraction succeeded, empty string and false otherwise.
+func extractJSONFromAgentError(err error) (string, bool) {
+	if err == nil {
+		return "", false
+	}
+	if !strings.Contains(err.Error(), "unable to parse agent output") {
+		return "", false
+	}
+	parts := strings.SplitN(err.Error(), "unable to parse agent output: ", 2)
+	if len(parts) != 2 {
+		return "", false
+	}
+	cleaned := stripMarkdownCodeBlocks(parts[1])
+	return cleaned, true
+}


Is there any possible path where we don't have to do string parsing? e.g. are the errors structured in any way?

Good catch! I checked the langchaingo source code to verify.

langchaingo uses plain sentinel errors without structured fields.

Error definition (agents/errors.go:19):

ErrUnableToParseOutput = errors.New("unable to parse agent output")

Error wrapping with output (https://github.com/tmc/langchaingo/blob/main/agents/conversational.go#L157):

return nil, nil, fmt.Errorf("%w: %s", ErrUnableToParseOutput, output)

The error wraps the sentinel with fmt.Errorf and appends the raw output as a string. There's no custom error type with accessible fields - just a formatted string.

No structured alternative exists - ParserErrorHandler in the same file is for handling/formatting errors, not for accessing the output.

String parsing via strings.SplitN() is currently the only way to extract the embedded output. Our extractJSONFromAgentError() function uses defensive pattern matching and is covered by 8 test cases.

Makes sense. Thanks for digging into it more deeply and for adding the test coverage. Hypothetically, if the strings change, the tests "pop" and we can account for it.

ytsarev added 4 commits November 19, 2025 20:54

Trim markdown for more deterministic output

1df35ed

Signed-off-by: Yury Tsarev <yury@upbound.io>

Additional output cleanup earlier in the chain

8ecac35

Signed-off-by: Yury Tsarev <yury@upbound.io>

Refactor markdown cleanup and cover with unit tests

ba8f1f2

Signed-off-by: Yury Tsarev <yury@upbound.io>

Reuse sanitizing on all steps include event log

81c13f0

Signed-off-by: Yury Tsarev <yury@upbound.io>

ytsarev requested a review from tnthornton November 19, 2025 23:11

tnthornton reviewed Nov 20, 2025

ytsarev requested a review from tnthornton November 20, 2025 12:16

tnthornton approved these changes Nov 20, 2025

tnthornton merged commit abc2d82 into main Nov 20, 2025
10 checks passed
