feat: support JSON-RPC 2.0 over http in golang's uprobe #161

Open
titaneric wants to merge 25 commits into main from feat/otel-bpf-patch-4

Conversation


@titaneric titaneric commented Jun 14, 2025

I try to capture the HTTP request body by introducing a new hook for the net/http.(*body).Read function.
The body struct is initialized in readTransfer, which is called from readRequest.

readTransfer prepares the body reader, and body's io.Reader implementation is here.

Note that I capture a partial HTTP request body, retrieve the Content-Type in BPF, and send them to userspace.
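
To make the idea concrete, the detection boils down to scanning a bounded prefix of the captured body for the JSON-RPC marker. A minimal userspace-style sketch of that check (looks_like_jsonrpc2 and BODY_PREFIX_LEN are hypothetical names, not the helpers used in this PR):

#include <string.h>

#define BODY_PREFIX_LEN 128

/* Hypothetical sketch: decide whether a captured body prefix looks like
 * JSON-RPC 2.0 by scanning a bounded window for the "jsonrpc" member.
 * In the BPF version the bounds must be compile-time constants so the
 * verifier can prove termination. */
static int looks_like_jsonrpc2(const unsigned char *buf, int len)
{
    static const char marker[] = "\"jsonrpc\"";
    const int mlen = (int)sizeof(marker) - 1;

    if (len > BODY_PREFIX_LEN)
        len = BODY_PREFIX_LEN; /* only a partial body is captured anyway */

    for (int i = 0; i + mlen <= len; i++) {
        if (memcmp(buf + i, marker, mlen) == 0)
            return 1;
    }
    return 0;
}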

Test JSON-RPC code:
package main

import (
    "io"
    "log"
    "net"
    "net/http"
    "net/rpc"
    "net/rpc/jsonrpc"
)

// Args defines the arguments for the RPC methods.
type Args struct {
    A, B int
}

// Arith provides methods for arithmetic operations.
type Arith struct{}

// Multiply multiplies two numbers and returns the result.
func (t *Arith) Multiply(args *Args, reply *int) error {
    *reply = args.A * args.B
    return nil
}

// ReadWriteCloserWrapper wraps an io.Reader and io.Writer to implement io.ReadWriteCloser.
type ReadWriteCloserWrapper struct {
    io.Reader
    io.Writer
}

// Close is a no-op to satisfy the io.ReadWriteCloser interface.
func (w *ReadWriteCloserWrapper) Close() error {
    return nil
}

func main() {
    // Register the Arith service.
    arith := new(Arith)
    rpc.Register(arith)

    // Set up an HTTP handler for JSON-RPC.
    http.HandleFunc("/jsonrpc", func(w http.ResponseWriter, r *http.Request) {
        if r.Method != http.MethodPost {
            http.Error(w, "Only POST method is allowed", http.StatusMethodNotAllowed)
            return
        }
        // Wrap the request body and response writer in a ReadWriteCloser.
        conn := &ReadWriteCloserWrapper{Reader: r.Body, Writer: w}
        // Serve the request using JSON-RPC codec.
        rpc.ServeCodec(jsonrpc.NewServerCodec(conn))
    })

    // Start the HTTP server.
    go func() {
        log.Println("JSON-RPC server is listening on HTTP port 8080...")
        if err := http.ListenAndServe(":8080", nil); err != nil {
            log.Fatalf("Error starting HTTP server: %v", err)
        }
    }()

    // Optionally, start a TCP server as well.
    listener, err := net.Listen("tcp", ":1234")
    if err != nil {
        log.Fatalf("Error starting TCP server: %v", err)
    }
    log.Println("JSON-RPC server is listening on TCP port 1234...")

    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Printf("Connection error: %v", err)
            continue
        }
        // Serve the connection using JSON-RPC.
        go rpc.ServeCodec(jsonrpc.NewServerCodec(conn))
    }
}

Sample request

curl -X POST http://localhost:8080/jsonrpc \
           -H "Content-Type: application/json" \
           -d '{
             "jsonrpc": "2.0",
             "method": "Arith.Multiply",
             "params": [{"A": 7, "B": 8}],
             "id": 1
           }'
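
For reference, Go's net/rpc/jsonrpc codec should answer the request above with roughly the following (note that the stock codec does not echo the "jsonrpc": "2.0" member):

{"id":1,"result":56,"error":null}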

Output:

make dev
sudo -E ./bin/ebpf-instrument --config ~/beyla-config.yaml
time=2025-06-15T12:19:40.682Z level=DEBUG msg="found relevant function for instrumentation" component=goexec.instructions function=net/http.(*conn).serve offsets="{Start:2521664 Returns:[2522012 2522356 2522372 2522388 2523316 2523332 2523412 2523428 2523452 2523860 2524116 2524228 2524444 2524564 2525716 2525856]}"
time=2025-06-15T12:19:40.682Z level=DEBUG msg="found relevant function for instrumentation" component=goexec.instructions function=net/http.(*body).Read offsets="{Start:2567744 Returns:[2568020 2568076 2568104]}"
time=2025-06-15T12:19:40.682Z level=DEBUG msg="found relevant function for instrumentation" component=goexec.instructions function=net/http.serverHandler.ServeHTTP offsets="{Start:2581872 Returns:[2582068]}"
time=2025-06-15T12:19:40.682Z level=DEBUG msg="Can't read all offsets from DWARF info. Checking in prefetched database" component=goexec.structMemberOffsets
time=2025-06-15T12:19:40.683Z level=DEBUG msg="Go version detected" component=goexec.structMemberOffsets version=1.24.2
...
2025-06-15 12:19:53.615121953 (504.832µs[494.916µs]) HTTP 200 Arith.M /jsonrpc [::1 as ::1:38966]->[::1 as ::1:8080] contentLen:140B responseLen:0B svc=[server go] traceparent=[00-dea07b178064c9d2d08b9856b20bdb8c-b2bf279cc2145e9f[0000000000000000]-01]
time=2025-06-15T12:19:53.925Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[Arith.M 200 /* titan:54337 server ::1 8080 server ]"
time=2025-06-15T12:19:53.925Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[Arith.M 200 /* titan:54337 server ::1 8080 server ]"
time=2025-06-15T12:19:53.925Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[Arith.M 200 /* titan:54337 server ::1 8080 server ]"

The ~/beyla-config.yaml used for the run above:

open_port: 1234,8080
trace_printer: text
log_level: DEBUG
ebpf:
  bpf_debug: true
  enable_context_propagation: true
  track_request_headers: true
  traffic_control_l7_context_propagation: true
  protocol_debug_print: true
routes:
    unmatched: heuristic
prometheus_export:
  port: 9090
  path: /metrics
otel_traces_export:
  endpoint: http://localhost:4318

I hope the present direction is correct for the implementation. I would like to hear more discussion and feedback, and I will continue working on response body capturing, error-code handling, etc.

P.S. The patch comes from grafana/beyla#2008

@titaneric titaneric requested a review from a team as a code owner June 14, 2025 13:32
@titaneric titaneric marked this pull request as draft June 14, 2025 13:33

codecov bot commented Jun 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 63.27%. Comparing base (cc30f28) to head (91bc4cb).
Report is 6 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (cc30f28) and HEAD (91bc4cb).

HEAD has 1 upload less than BASE
Flag BASE (cc30f28) HEAD (91bc4cb)
k8s-integration-test 1 0
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #161      +/-   ##
==========================================
- Coverage   71.02%   63.27%   -7.75%     
==========================================
  Files         177      178       +1     
  Lines       19121    19161      +40     
==========================================
- Hits        13580    12125    -1455     
- Misses       4815     6355    +1540     
+ Partials      726      681      -45     
Flag Coverage Δ
integration-test-arm 35.81% <100.00%> (+0.04%) ⬆️
k8s-integration-test ?
oats-test 36.37% <100.00%> (+0.16%) ⬆️
unittests 45.94% <0.00%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown.


@grcevski
Contributor

Thanks for this @titaneric !

I checked the 5.15 kernel failure and it appears that there's some code that the old verifier complains about:

autoinstrumenter-1      | ; bpf_dbg_printk("=== uprobe/proc readContinuedLineSlice returns === ");
autoinstrumenter-1      | 24: (05) goto pc+101
autoinstrumenter-1      | ; bpf_dbg_printk("=== uprobe/proc readContinuedLineSlice returns === ");
autoinstrumenter-1      | 126: (85) call bpf_get_current_pid_tgid#14
autoinstrumenter-1      | 127: (bf) r8 = r0
autoinstrumenter-1      | 128: (bf) r1 = r6
autoinstrumenter-1      | 129: (07) r1 += 88
autoinstrumenter-1      | 130: (b7) r2 = 20
autoinstrumenter-1      | 131: (85) call bpf_get_current_comm#16
autoinstrumenter-1      |  R0_w=inv(id=3) R1_w=mem(id=0,ref_obj_id=2,off=88,imm=0) R2_w=inv20 R6=mem(id=0,ref_obj_id=2,off=0,imm=0) R7=ctx(id=0,off=0,imm=0) R8_w=inv(id=3) R10=fp0 refs=2
autoinstrumenter-1      | last_idx 131 first_idx 24
autoinstrumenter-1      | regs=4 stack=0 before 130: (b7) r2 = 20
autoinstrumenter-1      | 132: (77) r8 >>= 32
autoinstrumenter-1      | 133: (7b) *(u64 *)(r6 +0) = r8
autoinstrumenter-1      |  R0_w=inv(id=0) R6=mem(id=0,ref_obj_id=2,off=0,imm=0) R7=ctx(id=0,off=0,imm=0) R8_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 refs=2
autoinstrumenter-1      | 134: (bf) r1 = r6
autoinstrumenter-1      | 135: (b7) r2 = 0
autoinstrumenter-1      | 136: (85) call bpf_ringbuf_submit#132
autoinstrumenter-1      | ; void *goroutine_addr = GOROUTINE_PTR(ctx);
autoinstrumenter-1      | 137: (79) r6 = *(u64 *)(r7 +8)
autoinstrumenter-1      | ; u64 len = (u64)GO_PARAM2(ctx);
autoinstrumenter-1      | 138: (79) r9 = *(u64 *)(r7 +40)
autoinstrumenter-1      | 139: (b7) r1 = 100
autoinstrumenter-1      | 140: (bf) r2 = r9
autoinstrumenter-1      | ; if (len > sizeof(temp)) {
autoinstrumenter-1      | 141: (2d) if r1 > r9 goto pc+1
autoinstrumenter-1      | 
autoinstrumenter-1      | from 141 to 143: R1=inv100 R2=inv(id=4,umax_value=99,var_off=(0x0; 0x7f)) R6=inv(id=0) R7=ctx(id=0,off=0,imm=0) R8=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R9=inv(id=4,umax_value=99,var_off=(0x0; 0x7f)) R10=fp0
autoinstrumenter-1      | ; u8 *buf = (u8 *)GO_PARAM1(ctx);
autoinstrumenter-1      | time=2025-06-16T19:23:40.233Z level=WARN msg="couldn't load tracer" component=ebpf.ProcessTracer error="loading and assigning BPF objects: field BeylaUprobeReadContinuedLineSliceReturns: program beyla_uprobe_readContinuedLineSliceReturns: load program: permission denied: invalid indirect read from stack R1 off -108+0 size 99 (74 line(s) omitted)" required=true
autoinstrumenter-1      | 143: (79) r3 = *(u64 *)(r7 +80)
autoinstrumenter-1      | 144: (bf) r1 = r10
autoinstrumenter-1      | 145: (07) r1 += -108
autoinstrumenter-1      | ; bpf_probe_read(temp, len, buf);
autoinstrumenter-1      | 146: (85) call bpf_probe_read#4
autoinstrumenter-1      | invalid indirect read from stack R1 off -108+0 size 99
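
For context, "invalid indirect read from stack" typically means the 5.15 verifier cannot prove that a variable-length access touches only initialized stack bytes. A common workaround is to zero-initialize the buffer and clamp the length to a constant bound before the read; a sketch only (GO_PARAM2 is this project's register macro, the rest follows the log above):

u8 temp[100] = {};              // zero-init so every stack byte is provably initialized
u64 len = (u64)GO_PARAM2(ctx);  // length returned by body.Read
if (len > sizeof(temp)) {
    len = sizeof(temp);         // clamp to the compile-time bound
}
bpf_probe_read(temp, len, buf); // bounded + initialized keeps old verifiers happy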

@grcevski
Contributor

If you need help debugging this let us know.

@titaneric
Author

Ohh, I will check it on an older kernel version. I developed this on a relatively new one.

❯ uname -a
Linux titan 6.11.0-26-generic #26~24.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr 17 19:41:24 UTC 2 aarch64 aarch64 aarch64 GNU/Linux

@titaneric titaneric force-pushed the feat/otel-bpf-patch-4 branch 2 times, most recently from 7799536 to 2bdba15 Compare June 17, 2025 15:01
@titaneric
Author

OK, I could reproduce this verifier error on Ubuntu 22.04, and I will try to resolve it.

titan@titan:~/sources/opentelemetry-ebpf-instrumentation$ uname -a
Linux titan 5.15.0-141-generic #151-Ubuntu SMP Sun May 18 21:36:08 UTC 2025 aarch64 aarch64 aarch64 GNU/Linux
titan@titan:~/sources/opentelemetry-ebpf-instrumentation$ 

@titaneric
Author

titaneric commented Jun 17, 2025

I fixed it on kernel 5.15, and it also works on newer kernels such as 6.11!

time=2025-06-17T15:57:48.258Z level=DEBUG msg="submitting traces on timeout" component=ringbuf.Tracer len=1
time=2025-06-17T15:57:48.260Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues=[go]
2025-06-17 15:57:47.61735747 (1.114756ms[1.102506ms]) HTTP 200 Arith.M /jsonrpc [192.168.106.5 as 192.168.106.5:60376]->[192.168.106.6 as 192.168.106.6:8080] contentLen:140B responseLen:0B svc=[server go] traceparent=[00-dc8cf5f20469a22fa08442e737a60e05-2e0d40b8ea61b0c6[0000000000000000]-01]

If you find any verifier errors on a different kernel, please let me know!

Contributor

@rafaelroquetto rafaelroquetto left a comment


A few comments - thank you again for sticking with this - it's great work! I hope you don't mind my multiple passes, it's a big PR and it's difficult to catch everything in one go.

@rafaelroquetto
Contributor

@titaneric do you think it would be possible to add an integration test as well? Similar to the example you mentioned in the review description. Happy to help with that as well if you prefer to share the burden.

@titaneric
Author

Yes! I would like to add the integration test as well.

@titaneric
Author

titaneric commented Jun 18, 2025

I have hit a verifier error (max instruction limit) on kernel 5.15 in 2976d5e. I will try to resolve it first, then update everything you suggested and test on both kernels.

@rafaelroquetto
Contributor

@titaneric whatever is easier for you - the max instruction limit error may warrant a tail call (feel free to paste the error - I checked the logs for the failing VM 5.15 integration test but I couldn't spot it there - for some reason the otel collector appears to be crashing?)

@titaneric
Author

Thank you! The verifier error on kernel 5.15 is attached below.
beyla-ebpf-debug.log

@rafaelroquetto
Contributor

Thanks @titaneric - yeah, it seems we will have to tail call. The fact that it didn't happen earlier is sheer luck - the compiler generates an instruction count that sits close to the allowed limit, so any change in the code ends up triggering it.

If you need a reference implementation, have a look at:

If you need help with this, let me know and I can try and have a go at it tomorrow.

@titaneric
Author

titaneric commented Jun 19, 2025

I roughly understand how tail calls work: the preparation of the PROG_ARRAY and the call itself.
Since the root cause seems to come from the beyla_uprobe_bodyReadReturn function, I think we might need to wrap the following code snippet in a tail call? (A minimal tail-call skeleton is sketched after the snippet.)

            if (read_go_str_n(
                    "http body", (void *)invocation->body_addr, n, body_buf, sizeof(body_buf))) {
                bpf_dbg_printk("body is %s", body_buf);
                if (is_jsonrpc2_body((const unsigned char *)body_buf, sizeof(body_buf))) {
                    char method_buf[JSONRPC_METHOD_BUF_SIZE] = {};
                    u32 method_len = extract_jsonrpc2_method(
                        (const unsigned char *)body_buf, sizeof(body_buf), method_buf);
                    if (method_len > 0) {
                        bpf_dbg_printk("JSON-RPC method: %s", method_buf);
                        read_go_str_n("JSON-RPC method",
                                      (void *)method_buf,
                                      method_len,
                                      invocation->method,
                                      sizeof(invocation->method));
                    }
                }
            }
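
For reference, the tail-call plumbing itself is small; a minimal skeleton (all names below are made up, not the PR's actual programs) would look like:

// A PROG_ARRAY map holds the continuation program; the parent jumps into it.
struct {
    __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u32);
} jump_table SEC(".maps");

SEC("uprobe/body_read_return_tail")
int parse_jsonrpc_body(struct pt_regs *ctx) {
    // the heavy JSON-RPC parsing would live here, in its own program
    return 0;
}

SEC("uprobe/body_read_return")
int body_read_return(struct pt_regs *ctx) {
    // cheap checks first, then hand off
    bpf_tail_call(ctx, &jump_table, 0); // does not return on success
    return 0;                           // reached only if the tail call fails
}

The slot has to be populated with the continuation program at load time (from userspace or via libbpf's static PROG_ARRAY initialization), and since each program is verified separately, the complexity budget is split as well.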

@rafaelroquetto
Contributor

> Thank you for your suggestion. I could try to study tail call first.

My pleasure - just ping me if you need anything, I'm looking forward to merging this change :)

@titaneric
Author

It does not work by simply adopting the tail call (titaneric@8b5e12f); I might need some time to investigate it tomorrow.

@rafaelroquetto
Contributor

@titaneric no worries - feel free to share updated logs, or the actual kernel version you are testing if you want me to give it a go as well

            if (read_go_str_n(
                    "http body", (void *)invocation->body_addr, n, body_buf, sizeof(body_buf))) {
                bpf_dbg_printk("body is %s", body_buf);
                if (is_jsonrpc2_body((const unsigned char *)body_buf, sizeof(body_buf))) {
Contributor


you can try to tail here instead - i.e.

if (!is_json_content_type(...)) {
    return ...
}

if (!read_go_str_n()) {
    return ...
}

// all good, tail

bpf_tailcall_....

this will split the program complexity into two different bpf programs

Author


resolved in b3dc7d4

Contributor

@rafaelroquetto rafaelroquetto left a comment


Thank you so much for your changes @titaneric - let me know if you need help rebasing main (there's a conflict) - we can then re-trigger the tests and, if everything looks good, go ahead and merge this! :)

@titaneric titaneric force-pushed the feat/otel-bpf-patch-4 branch from 3d5c004 to b3dc7d4 Compare June 20, 2025 19:42
@titaneric
Author

Thank you for the information, I will try it tomorrow!
Also, I have tried the tail call and allocated the body memory from a BPF map in b3dc7d4. It works on kernel 6.11 but not on 5.15 (max instruction limit, again).
bpf-tail-code.log
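
For readers following along, "allocating the body from a BPF map" typically means a single-slot per-CPU array used as scratch space instead of a large on-stack buffer, since the BPF stack is limited to 512 bytes; a sketch with hypothetical names:

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, char[1024]); // scratch buffer larger than the stack allows
} body_scratch SEC(".maps");

static __always_inline char *get_body_scratch(void) {
    __u32 zero = 0;
    return bpf_map_lookup_elem(&body_scratch, &zero); // may be NULL, caller must check
}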

    // get content-type from readContinuedLineSlice
    if (header_inv && header_inv->content_type[0]) {
        bpf_dbg_printk("Found content type in ongoing request: %s", header_inv->content_type);
        // __builtin_memcpy(invocation.content_type,
Author


Should we copy the previously set content type into the new invocation?
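
If we do, the carry-over would be something like the following sketch (source and length arguments are assumptions based on the field names visible in the diff above):

if (header_inv && header_inv->content_type[0]) {
    __builtin_memcpy(invocation.content_type,
                     header_inv->content_type,         // content type captured by the header probe
                     sizeof(invocation.content_type)); // assumes both buffers have the same size
}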

@titaneric titaneric marked this pull request as ready for review June 22, 2025 08:26
@titaneric titaneric force-pushed the feat/otel-bpf-patch-4 branch from d8a395e to eb938aa Compare June 22, 2025 08:55
@titaneric
Author

titaneric commented Jun 22, 2025

I ran the mcp-go server example, and it works!

go run server.go -t http

with mcp-go client example

go run client.go --http http://localhost:8080/mcp

Result log:
mcp-server.log

Partial result:

2025-06-22 10:06:38.62210638 (4.441337ms[4.024504ms]) HTTP 200 initial /mcp [127.0.0.1 as 127.0.0.1:36896]->[127.0.0.1 as 127.0.0.1:8080] contentLen:177B responseLen:0B svc=[server go] traceparent=[00-9c70703095a7979a788673a8cdf70c73-c6dfbb050b503ca4[0000000000000000]-01]
2025-06-22 10:06:38.62210638 (644.613µs[637.693µs]) HTTP 202 notific /mcp [127.0.0.1 as 127.0.0.1:36896]->[127.0.0.1 as 127.0.0.1:8080] contentLen:66B responseLen:0B svc=[server go] traceparent=[00-1d02d1b36f1bfb82b72ddeb969dd9953-2ea1990ceb7144c7[0000000000000000]-01]
2025-06-22 10:06:38.62210638 (1.711144ms[1.69105ms]) HTTP 200 tools/l /mcp [127.0.0.1 as 127.0.0.1:36896]->[127.0.0.1 as 127.0.0.1:8080] contentLen:58B responseLen:0B svc=[server go] traceparent=[00-32505e9e074d9e312ed0026e638703fa-138d73580aa4cb49[0000000000000000]-01]
2025-06-22 10:06:38.62210638 (431.006µs[427.963µs]) HTTP 200 resourc /mcp [127.0.0.1 as 127.0.0.1:36896]->[127.0.0.1 as 127.0.0.1:8080] contentLen:62B responseLen:0B svc=[server go] traceparent=[00-0f0c9828e70b90469d5e0517c6734e66-a2fc7d723c9249b1[0000000000000000]-01]
time=2025-06-22T10:06:39.568Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues=[go]
time=2025-06-22T10:06:39.570Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[initial 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[initial 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[initial 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[notific 202 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[notific 202 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[notific 202 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[tools/l 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[tools/l 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[tools/l 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[resourc 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[resourc 200 /* titan:378356 server 127.0.0.1 8080 server ]"
time=2025-06-22T10:06:39.573Z level=DEBUG msg="storing new metric label set" component=prom.Expirer labelValues="[resourc 200 /* titan:378356 server 127.0.0.1 8080 server ]"

F.Y.I. MCP spec

@titaneric
Author

@rafaelroquetto, I have added the tail call, stack-based array replacement, and integration test. Please take your time to review it!
