@ytsarev ytsarev commented Nov 19, 2025

Description of your changes

Problem

When the claude-haiku-4-5 model is used for cost savings, it wraps JSON/YAML output in markdown code blocks (```json) despite prompt instructions. This caused:

  • Agent parsing failures with "unable to parse agent output" errors
  • Messy Kubernetes event messages showing raw markdown in Operations

Similar behaviour was observed with sonnet-4-5, but there it could be corrected with prompt modifications.

Sanitization is useful in any case and makes function-claude output and behaviour much more deterministic.

Solution

Implemented defense-in-depth markdown stripping with clean code reuse:

  1. Error extraction: Extract JSON from langchaingo agent errors when parsing fails
  2. Response cleaning: Strip markdown from successful responses
  3. Event messages: Reuse cleaned response for Kubernetes events (no redundant stripping)

Key Changes

  • Added extractJSONFromAgentError() - testable function for agent error handling
  • Modified resourceFrom() to return cleaned string alongside parsed resources
  • Updated Operations pipeline to use cleaned response for event messages
  • Added stripMarkdownCodeBlocks() helper used across both paths
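
The body of stripMarkdownCodeBlocks() is not shown in this conversation, so the following is only a minimal sketch of how such a helper might behave, assuming it removes a single outer fence. The helper body and the parseCleaned() wrapper are hypothetical; parseCleaned() illustrates the "return the cleaned string alongside the parsed result" idea described for resourceFrom(), not its actual signature.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// stripMarkdownCodeBlocks is a hypothetical reimplementation, not the PR's code.
// It removes one opening fence (with an optional language tag such as "json")
// and one closing fence, if they are present.
func stripMarkdownCodeBlocks(s string) string {
	s = strings.TrimSpace(s)
	if !strings.HasPrefix(s, "```") {
		return s // nothing to strip
	}
	if i := strings.Index(s, "\n"); i >= 0 {
		// Drop the whole opening fence line, including the language tag.
		s = s[i+1:]
	} else {
		// Single-line input: only the backticks can be removed safely.
		s = strings.TrimPrefix(s, "```")
	}
	s = strings.TrimSpace(s)
	// A missing closing fence is tolerated; TrimSuffix is then a no-op.
	s = strings.TrimSuffix(s, "```")
	return strings.TrimSpace(s)
}

// parseCleaned is a hypothetical stand-in for the resourceFrom() change:
// it returns the parsed object together with the cleaned text so that
// callers can reuse the text (for example in an event message).
func parseCleaned(raw string) (map[string]any, string, error) {
	cleaned := stripMarkdownCodeBlocks(raw)
	var obj map[string]any
	if err := json.Unmarshal([]byte(cleaned), &obj); err != nil {
		return nil, cleaned, fmt.Errorf("cannot parse cleaned output: %w", err)
	}
	return obj, cleaned, nil
}

func main() {
	raw := "```json\n{\"kind\":\"SQLInstance\"}\n```"
	obj, cleaned, err := parseCleaned(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cleaned)
	fmt.Println(obj["kind"])
}
```

Returning the cleaned string from the parsing step means the stripping runs once per response, which matches the "no redundant stripping" goal stated above.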

Test Coverage

  • 8 test cases for extractJSONFromAgentError() (nil, unrelated, JSON, YAML, generic, whitespace, wrapped)
  • 4 test cases for resourceFrom() markdown scenarios with cleaned string validation
  • All 21 tests passing

E2E testing with https://github.com/upbound/configuration-aws-database-ai

  • Before: Operation failure with (```json) in the function output
Warning  FunctionInvocation  9s  ops/operation.ops.crossplane.io  failed to invoke pipeline step "upbound-function-claude": cannot
run Function "upbound-function-claude": rpc error: code = Unknown desc = unable to parse agent output: ```json
{"apiVersion":"aws.platform.upbound.io/v1alpha1","kind":"SQLInstance","metadata":{"name":"rds-metrics-database-ai-mysql","namespace":
"database-team","annotations":{"intelligent-scaling/last-analyzed":"2025-11-19T19:04:06Z","intelligent-scaling/last-scaled-decision":
"No scaling needed. CPU 4.4% < 20%, Memory 175.6MB available > 60%, Connections 0 < 30%. Instance is underutilized but within safe
downscale thresholds. Maintaining db.t3.micro as minimum instance class."}},"spec":{}}
  • After: Operation success with clean JSON output
Events:
  Type    Reason           Age    From                             Message
  ----    ------           ----   ----                             -------
  Normal  RunPipelineStep  2m21s  ops/operation.ops.crossplane.io  Pipeline step "upbound-function-claude": {
  "apiVersion": "aws.platform.upbound.io/v1alpha1",
  "kind": "SQLInstance",
  "metadata": {
    "name": "rds-metrics-database-ai-mysql",
    "namespace": "database-team",
    "annotations": {
      "intelligent-scaling/last-analyzed": "2025-11-19T22:28:27Z",
      "intelligent-scaling/last-scaled-decision": "Rate limit applied: analysis performed < 1 minute ago. Skipping re-analysis. Previous analysis: No scaling required. Metrics remain optimal: CPU 2.45%, Connections 0, Memory 14% free. Instance at minimum class db.t3.micro."
    }
  }
}

Results

✅ Haiku model works reliably with cost savings
✅ Clean Kubernetes event messages without markdown
✅ No redundant stripping operations
✅ Works correctly in both success and error cases

Signed-off-by: Yury Tsarev <yury@upbound.io>
@ytsarev ytsarev requested a review from tnthornton November 19, 2025 23:11
@tnthornton tnthornton left a comment

Thanks @ytsarev ! The main question I have is if we really need to rely on string parsing (which can be pretty brittle) or if there's a path for structured processing of the errors 👍

Comment on lines +84 to +85
port: 8080
protocol: "TCP"

I bet this continues to flake.

@ytsarev ytsarev Nov 20, 2025

@tnthornton Most probably, but it has been stable across the last 4 RC releases built from this branch: https://github.com/upbound/function-claude/actions/workflows/ci.yml

Comment on lines +243 to +260
// extractJSONFromAgentError attempts to extract JSON from langchaingo agent parsing errors.
// When the agent framework fails to parse output, it returns an error containing the raw output.
// This function extracts and cleans that output for a second parsing attempt.
// Returns the extracted content and true if extraction succeeded, empty string and false otherwise.
func extractJSONFromAgentError(err error) (string, bool) {
	if err == nil {
		return "", false
	}
	if !strings.Contains(err.Error(), "unable to parse agent output") {
		return "", false
	}
	parts := strings.SplitN(err.Error(), "unable to parse agent output: ", 2)
	if len(parts) != 2 {
		return "", false
	}
	cleaned := stripMarkdownCodeBlocks(parts[1])
	return cleaned, true
}

Is there any possible path where we don't have to do string parsing? e.g. are the errors structured in any way?

Good catch! I checked the langchaingo source code to verify.

langchaingo uses plain sentinel errors without structured fields.

  1. Error definition (agents/errors.go:19):
ErrUnableToParseOutput = errors.New("unable to parse agent output")
  2. Error wrapping with output (https://github.com/tmc/langchaingo/blob/main/agents/conversational.go#L157):
  return nil, nil, fmt.Errorf("%w: %s", ErrUnableToParseOutput, output)

The error wraps the sentinel with fmt.Errorf and appends the raw output as a string. There's no custom error type with accessible fields - just a formatted string.

No structured alternative exists - ParserErrorHandler in the same file is for handling/formatting errors, not for accessing the output.

String parsing via strings.SplitN() is currently the only way to extract the embedded output. Our extractJSONFromAgentError() function uses defensive pattern matching and is covered by 8 test cases.
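
To make the limitation concrete, here is a small sketch of what the %w wrapping quoted above does and does not allow. It assumes the error chain reaches the caller intact, and it uses a local stand-in sentinel rather than importing langchaingo. The names errUnableToParseOutput, wrapLikeAgent and rawOutputFrom are hypothetical. Detection uses errors.Is here, which is a swapped-in alternative to the strings.Contains check in the PR. Recovering the output still requires cutting it out of the message string.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errUnableToParseOutput is a local stand-in for the langchaingo sentinel
// quoted above; it is defined here only to keep the sketch self-contained.
var errUnableToParseOutput = errors.New("unable to parse agent output")

// wrapLikeAgent mimics the wrapping quoted above: sentinel, colon, raw output.
func wrapLikeAgent(output string) error {
	return fmt.Errorf("%w: %s", errUnableToParseOutput, output)
}

// rawOutputFrom detects the sentinel structurally with errors.Is, then
// recovers the embedded output by cutting at the sentinel's message prefix.
// The second step is still string parsing: the output has no field of its own.
func rawOutputFrom(err error) (string, bool) {
	if !errors.Is(err, errUnableToParseOutput) {
		return "", false
	}
	prefix := errUnableToParseOutput.Error() + ": "
	_, after, found := strings.Cut(err.Error(), prefix)
	return after, found
}

func main() {
	err := wrapLikeAgent("```json\n{\"kind\":\"SQLInstance\"}")
	out, ok := rawOutputFrom(err)
	fmt.Printf("%v %q\n", ok, out)
}
```

errors.Is only helps while the wrapped chain survives. Once an error has been flattened to text, for example after crossing an RPC boundary, only the string check remains, which is one reason a substring match can be the more forgiving choice.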

Makes sense. Thanks for digging into it deeper and for adding the test coverage. Hypothetically, if the strings change, the tests "pop" and we can account for it.

@ytsarev ytsarev requested a review from tnthornton November 20, 2025 12:16
@tnthornton tnthornton merged commit abc2d82 into main Nov 20, 2025
10 checks passed