
E2E tests for long_tcp_conns metrics and accesslogs #1330


Open
wants to merge 1 commit into
base: main

Conversation

yp969803
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:
E2E tests for long_tcp_conns metrics and accesslogs
Which issue(s) this PR fixes:
Fixes #1322

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

@@ -0,0 +1,25 @@
//go:build integ
Member

Maybe we could put this test in the baseline directly, just like L4 telemetry test case, ref: https://github.com/kmesh-net/kmesh/blob/main/test/e2e/baseline_test.go#L780

Contributor Author

OK @YaoZengzeng, I will do that. Thanks for the help.

@yp969803
Contributor Author

yp969803 commented Apr 24, 2025

No packages found for open file /home/yash/go/src/github.com/kmesh-net/kmesh/test/e2e/restart_test.go

I am getting this warning for every file in the e2e directory in VS Code. Go linting and code navigation are also not working in the e2e directory. @YaoZengzeng

@yp969803
Contributor Author

//go:build integ
// +build integ

I think the build-constraint comments above are the reason. What do they do?

@yp969803
Contributor Author

yp969803 commented Apr 24, 2025

ok got it, they are required to separate unit and integration tests
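
For reference, a minimal sketch of how the `//go:build integ` line decides whether a file is compiled. It uses the standard `go/build/constraint` package to evaluate the constraint against a set of tags; the helper name `included` is made up for illustration and is not part of kmesh.

```go
package main

import (
	"fmt"
	"go/build/constraint"
)

// included reports whether a file carrying the given build-constraint line
// would be compiled when exactly the given tags are set.
// (Hypothetical helper, for illustration only.)
func included(line string, tags ...string) (bool, error) {
	expr, err := constraint.Parse(line)
	if err != nil {
		return false, err
	}
	set := make(map[string]bool, len(tags))
	for _, t := range tags {
		set[t] = true
	}
	return expr.Eval(func(tag string) bool { return set[tag] }), nil
}

func main() {
	for _, tags := range [][]string{nil, {"integ"}} {
		ok, err := included("//go:build integ", tags...)
		if err != nil {
			panic(err)
		}
		fmt.Printf("tags=%v included=%v\n", tags, ok)
	}
}
```

Without the `integ` tag the file is excluded, which is likely why the editor reports "No packages found". Passing the tag to the tooling (for example `go test -tags=integ ./test/e2e/...`, or the build-flags setting of gopls) should make the files visible again; the exact editor setting is an assumption to check against the gopls documentation.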

@yp969803
Contributor Author

@YaoZengzeng I want to deploy

apiVersion: v1
kind: Pod
metadata:
  name: ws-server
  labels:
    app: ws-server
spec:
  containers:
  - name: ws-server
    image: node:18
    command: ["/bin/sh", "-c"]
    args:
      - |
        mkdir server 
        cd server 
        npm init -y
        npm install ws
        # Quote the delimiter so the shell does not expand the backticks in the JavaScript below.
        cat << 'EOF' > server.js
        const WebSocket = require('ws');
        const server = new WebSocket.Server({ port: 8080 });
        server.on('listening', () => {
          console.log('🚀 WebSocket server started on ws://localhost:8080');
        });

        server.on('connection', socket => {
          console.log('Client connected');

          // Send data every second; keep the handle so the timer can be stopped.
          const timer = setInterval(() => {
            socket.send('Hello world');
            console.log('Sent message to client');
          }, 1000);

          socket.on('message', message => {
            console.log('Received');
          });

          socket.on('close', () => {
            clearInterval(timer); // stop sending once the client is gone
            console.log('Client disconnected');
          });
        });
        EOF
        node server.js

    ports:
    - containerPort: 8080

---
apiVersion: v1
kind: Service
metadata:
  name: ws-server
  labels:
    app: ws-server
spec:
  selector:
    app: ws-server
  ports:
  - protocol: TCP
    port: 8080        # Port exposed by the service
    targetPort: 8080  # Port the server container listens on

--- 

apiVersion: v1
kind: Pod
metadata:
  name: ws-client
  labels:
    app: ws-client
spec:
  containers:
  - name: ws-client
    image: node:18
    command: ["/bin/sh", "-c"]
    args:
      - |
        mkdir client
        cd client
        npm init -y
        npm install ws 
        # Quote the delimiter so the shell does not expand the backticks in the JavaScript below.
        cat << 'EOF' > client.js
        const WebSocket = require('ws');
        const socket = new WebSocket('ws://ws-server.default.svc.cluster.local:8080');

        socket.on('open', () => {
          console.log('Connected to server');
          socket.send('Hello from client');
          console.log('Sent message to server');
        });

        socket.on('message', data => {
          console.log(`Received`);

        });

        socket.on('close', () => {
          console.log('Connection closed');
        });
        EOF
        node client.js

for long_conn testing. How can I do this?

@YaoZengzeng
Member

You should use the deployed test application as in other test cases.

Deploy ref: https://github.com/kmesh-net/kmesh/blob/main/test/e2e/main_test.go#L146

Usage ref: https://github.com/kmesh-net/kmesh/blob/main/test/e2e/baseline_test.go#L780

It seems that there is no direct way to specify a long connection, but maybe we can simulate it by specifying a number of reqs and delay of each req, ref: https://github.com/istio/istio/blob/master/tests/integration/ambient/baseline_test.go#L3316

We use the same e2e testing framework as istio. You can check if there are more use cases to ref in it.

@yp969803
Contributor Author

maybe we can simulate it by specifying a number of reqs and delay of each req

There should be only a single connection that lasts a long time and exchanges data while it is open.

@yp969803
Contributor Author

Is it possible to deploy custom images in the istio test framework? @YaoZengzeng

@kmesh-bot kmesh-bot added size/L and removed size/S labels Apr 25, 2025
@yp969803
Contributor Author

@YaoZengzeng can you check my approach?

@yp969803
Contributor Author

@YaoZengzeng @hzxuzhonghu can you review the test?

@yp969803 yp969803 force-pushed the issue1322 branch 5 times, most recently from 2ee32da to 18c4f29 Compare April 26, 2025 04:31
@yp969803
Contributor Author


RUN   TestLongConnL4Telemetry
2025-04-26T04:43:17.817076Z	info	tf	=== BEGIN: Test: '_home_runner_work_kmesh_kmesh_test_e2e[TestLongConnL4Telemetry]' ===
2025-04-26T04:43:17.824679Z	info	tf	Checking pods ready...
2025-04-26T04:43:17.824699Z	info	tf	Checking pods ready...
2025-04-26T04:43:17.828047Z	info	tf	Failed retrieving pods: no matching pod found for selectors: [gateway.networking.k8s.io/gateway-name=namespace-waypoint]
2025-04-26T04:43:18.029903Z	info	tf	Checking pods ready...
2025-04-26T04:43:18.029936Z	info	tf	Checking pods ready...
2025-04-26T04:43:18.034353Z	info	tf	  [ 0]           namespace-waypoint-7c4db7565c-6z484         Pending (Pending)
2025-04-26T04:43:18.435639Z	info	tf	Checking pods ready...
2025-04-26T04:43:18.435670Z	info	tf	Checking pods ready...
2025-04-26T04:43:18.441857Z	info	tf	  [ 0]           namespace-waypoint-7c4db7565c-6z484         Pending (Pending)
2025-04-26T04:43:19.243002Z	info	tf	Checking pods ready...
2025-04-26T04:43:19.243031Z	info	tf	Checking pods ready...
2025-04-26T04:43:19.247699Z	info	tf	  [ 0]           namespace-waypoint-7c4db7565c-6z484         Pending (Pending)
2025-04-26T04:43:20.849662Z	info	tf	Checking pods ready...
2025-04-26T04:43:20.849758Z	info	tf	Checking pods ready...
2025-04-26T04:43:20.853779Z	info	tf	  [ 0]           namespace-waypoint-7c4db7565c-6z484         Running (Ready)
2025-04-26T04:43:21.111652Z	info	tf	Checking pods ready...
2025-04-26T04:43:21.111681Z	info	tf	Checking pods ready...
2025-04-26T04:43:21.117641Z	warn	tf	More than one pod found matching selectors: []
2025-04-26T04:43:21.117669Z	info	tf	  [ 0]         enrolled-to-kmesh-v1-5c887bf547-rr7pg         Running (Ready)
2025-04-26T04:43:21.216875Z	info	tf	Checking pods ready...
2025-04-26T04:43:21.216906Z	info	tf	Checking pods ready...
2025-04-26T04:43:21.224677Z	warn	tf	More than one pod found matching selectors: []
2025-04-26T04:43:21.224701Z	info	tf	  [ 0]         enrolled-to-kmesh-v1-5c887bf547-rr7pg         Running (Ready)
    baseline_test.go:1008: prometheus query: prometheus.Query{Metric:"kmesh_tcp_workload_sent_bytes_total", Aggregation:"", Labels:map[string]string{"destination_app":"ws-server", "destination_pod_name":"ws-server", "reporter":"destination", "source_app":"ws-client", "source_workload":"ws-client"}}
    baseline_test.go:1012: could not query for traffic from ws-client to ws-server: value not found (query: kmesh_tcp_workload_sent_bytes_total{destination_app="ws-server",destination_pod_name="ws-server",reporter="destination",source_app="ws-client",source_workload="ws-client",})
    baseline_test.go:1012: could not query for traffic from ws-client to ws-server: value not found (query: kmesh_tcp_workload_sent_bytes_total{destination_app="ws-server",destination_pod_name="ws-server",reporter="destination",source_app="ws-client",source_workload="ws-client",})
    baseline_test.go:1012: could not query for traffic from ws-client to ws-server: value not found (query: kmesh_tcp_workload_sent_bytes_total{destination_app="ws-server",destination_pod_name="ws-server",reporter="destination",source_app="ws-client",source_workload="ws-client",})
    baseline_test.go:1012: could not query for traffic from ws-client to ws-server: value not found (query: kmesh_tcp_workload_sent_bytes_total{destination_app="ws-server",destination_pod_name="ws-server",reporter="destination",source_app="ws-client",source_workload="ws-client",})
    baseline_test.go:1022: no metrics found for kmesh_tcp_workload_sent_bytes_total{}
    baseline_test.go:1023: could not validate TCP long connection L4 telemetry for ws-client to ws-server: timeout while waiting after 4 attempts (last error: expected condition not met)
2025-04-26T04:43:56.227638Z	info	tf	=== DONE (failed):  Test: '_home_runner_work_kmesh_kmesh_test_e2e[TestLongConnL4Telemetry] (38.410559091s)' ===
    baseline_test.go:987: failed to delete ws-client: exit status 1
--- FAIL: TestLongConnL4Telemetry (38.45s)

@yp969803
Contributor Author

@hzxuzhonghu can you review the test?


codecov bot commented Apr 28, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 45.82%. Comparing base (02a0c79) to head (a55aeab).
Report is 18 commits behind head on main.

see 3 files with indirect coverage changes



Timeout: time.Second,
Check: check.OK(),
To: localDst,
NewConnectionPerRequest: false,
Member

stc.Logf("prometheus query: %#v", query)
err := retry.Until(func() bool {
stc.Logf("sending call from %q to %q", deployName(localSrc), localDst.Config().Service)
localSrc.CallOrFail(stc, opt)
Member

Also, does this test want to read the metrics while the long connection is still open? This function call is synchronous, so by the time it returns, the connection has already been terminated.

Contributor Author

What about running this function in another goroutine?

@yp969803
Contributor Author

yp969803 commented Apr 28, 2025

@YaoZengzeng can you check now? I think it should work now.

@yp969803
Contributor Author

baseline_test.go:845: sending call from "enrolled-to-kmesh-v1" to "enrolled-to-kmesh"
    asm_amd64.s:1700: 2 errors occurred:
        	* failed calling enrolled-to-kmesh (cluster=cluster-0)->'tcp://enrolled-to-kmesh.echo-1-19476.svc.cluster.local:80': call failed from enrolled-to-kmesh (cluster=cluster-0) to tcp://enrolled-to-kmesh.echo-1-19476.svc.cluster.local:80 (using tcp): expected no error, but encountered rpc error: code = Unknown desc = 20/20 requests had errors; first error: expect to recv message with StatusCode=200, got [17] SourceIP=10.244.1.6
        [17] Url=tcp://enrolled-to-kmesh.echo-1-19476.svc.cluster.local:80
        [17 body] HTTP/1.1 400 Bad Request
        [17 body] Content-Type: text/plain; charset=utf-8
        [17 body] Connection: close
        [17 body] 
        [17 body] 400 Bad Request
        . Return EOF
        	* failed calling enrolled-to-kmesh (cluster=cluster-0)->'tcp://enrolled-to-kmesh.echo-1-19476.svc.cluster.local:80': call failed from enrolled-to-kmesh (cluster=cluster-0) to tcp://enrolled-to-kmesh.echo-1-19476.svc.cluster.local:80 (using tcp): expected no error, but encountered rpc error: code = Unknown desc = 20/20 requests had errors; first error: expect to recv message with StatusCode=200, got [9] SourceIP=10.244.1.8

@YaoZengzeng why does this happen? I am using the HTTP port now.

Port: echo.Port{Name: "http"},
Scheme: scheme.TCP,
Count: 20,
Timeout: time.Second,
Member

It should be bigger, the delay is three seconds

localSrc := src
opt := echo.CallOptions{
Port: echo.Port{Name: "http"},
Scheme: scheme.TCP,
Member

This should be HTTP as well, I think.

@yp969803
Contributor Author

@YaoZengzeng I have made the changes.

@yp969803
Contributor Author

/retest

@yp969803 yp969803 force-pushed the issue1322 branch 4 times, most recently from 6ad1730 to a66aa0f Compare April 28, 2025 14:11
@yp969803
Contributor Author

If I comment out observe_data, the tests still pass, so we have to change something.

@yp969803
Contributor Author

Multiple requests are sent in parallel.

@yp969803
Contributor Author

I don't think it is reusing the socket.
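
One way to check socket reuse directly, sketched with the standard library's `net/http/httptrace`. This only shows how Go's own HTTP transport behaves for sequential requests; whether the echo client behind `NewConnectionPerRequest: false` reuses its socket is not confirmed here. The helper name `reusedFlags` is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// reusedFlags sends n sequential GETs through one client and records, for
// each request, whether the transport reused an existing connection.
// (Hypothetical helper, for illustration only.)
func reusedFlags(n int) ([]bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	defer srv.Close()

	client := srv.Client()
	flags := make([]bool, 0, n)
	for i := 0; i < n; i++ {
		req, err := http.NewRequest(http.MethodGet, srv.URL, nil)
		if err != nil {
			return nil, err
		}
		trace := &httptrace.ClientTrace{
			// GotConn fires once a connection has been obtained for the request.
			GotConn: func(info httptrace.GotConnInfo) { flags = append(flags, info.Reused) },
		}
		req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))

		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		// Drain and close the body so the connection returns to the idle pool.
		if _, err := io.Copy(io.Discard, resp.Body); err != nil {
			resp.Body.Close()
			return nil, err
		}
		resp.Body.Close()
	}
	return flags, nil
}

func main() {
	flags, err := reusedFlags(3)
	fmt.Println(flags, err)
}
```

With sequential requests the first entry is false (a fresh dial) and later entries are true. Requests issued in parallel each need their own connection, which would match the observation that multiple requests are sent in parallel without reuse.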

Check: check.OK(),
HTTP: echo.HTTP{Path: "/?delay=3s"},
To: localDst,
NewConnectionPerRequest: false,
Member

@kmesh-bot kmesh-bot added size/L and removed size/M labels Apr 29, 2025
Signed-off-by: Yash Patel <yp969803@gmail.com>
@kmesh-bot
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign lec-bit for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot added size/M and removed size/L labels May 22, 2025
@yp969803
Contributor Author

yp969803 commented May 22, 2025

i want to run

kmeshctl monitoring --all enable

in this test how to do it @hzxuzhonghu @YaoZengzeng

Development

Successfully merging this pull request may close these issues.

E2E tests for lon_conn accesslogs and metrics
3 participants