-
Notifications
You must be signed in to change notification settings - Fork 207
[DO NOT MERGE] Ringbuf redux #1687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
dave-tucker
wants to merge
2,372
commits into
sustainable-computing-io:main
from
dave-tucker:ringbuf-redux
Closed
[DO NOT MERGE] Ringbuf redux #1687
dave-tucker
wants to merge
2,372
commits into
sustainable-computing-io:main
from
dave-tucker:ringbuf-redux
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Kaiyi <kaiyiliu21@gmail.com>
sustainable-computing-io#1458) Signed-off-by: Vimal Kumar <vimal78@gmail.com>
Signed-off-by: sustainable-computing-bot <bot@sustainable-computing.io>
removes debugging logging from GPU metrics Signed-off-by: Vibhu Prashar <vibhu.sharma2929@gmail.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
…d-verbose fix: remove logging while collecting GPU metrics
feat(ci)add equinix metal instance to CI
Signed-off-by: sustainable-computing-bot <bot@sustainable-computing.io>
…ing-io#1463) Signed-off-by: Sam Yuan <yy19902439@126.com>
…able-computing-io#1462) Signed-off-by: Huamin Chen <hchen@redhat.com>
This PR: * Corrects the query for `bpf_cpu_time_ms_total` metric * Fixes the display issue for panels showing CPU time, cache miss, CPU cycles and instructions Signed-off-by: Vibhu Prashar <vibhu.sharma2929@gmail.com>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
…x-dev-dash fix(dev-dashboard): update and correct metrics in dev dashboard
Signed-off-by: sustainable-computing-bot <bot@sustainable-computing.io>
Signed-off-by: Sunyanan Choochotkaew <sunyanan.choochotkaew1@ibm.com>
This commit simplifies the containerID from Cgroup path parsing logic. All container IDs from all supported runtimes (including podman) are 64 character alphanumeric strings. As such we can simply strip the `.scope` suffix and take the last 64 characters. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Huamin Chen <hchen@redhat.com>
…puting-io/validator chore(validator)update validator usage; remove job from prom query
…g-io#1473) This commit adds a `label` - `version` to `kepler_exporter_build_info` metrics to expose the version of kepler installed on the machine / node. Fixes: sustainable-computing-io#1449 Signed-off-by: Sunil Thaha <sthaha@redhat.com>
* add vm name option to validator Signed-off-by: Huamin Chen <hchen@redhat.com> * validator: externalize the query Signed-off-by: Huamin Chen <hchen@redhat.com> --------- Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
…puting-io/build-fix make: quote ldflags
…-io#1477) Previously only a `MapSize` (lesser than size of processs) of entries used to be copied from processes bpf map. The right use of a batch size less than the size of map is as follows: Ref: https://lwn.net/Articles/797808/ ``` p_key = NULL; p_next_key = &key; while (true) { err = bpf_map_lookup_batch(fd, p_key, &p_next_key, keys, values, &batch_size, elem_flags, flags); if (err) ... if (p_next_key) break; // done if (!p_key) p_key = p_next_key; } ``` This PR fixes the issue by creating a entries byte array with the same size as the `processes` map. Additionally, this commit makes use of `bytes.Reader` instead of bytes.Buffer to avoid unnecessary copy of ephemeral byte array. Signed-off-by: Sunil Thaha <sthaha@redhat.com>
…ainable-computing-io#1479) Signed-off-by: marceloamaral <marcelo.amaral1@ibm.com>
Signed-off-by: sustainable-computing-bot <bot@sustainable-computing.io>
…nable-computing-io#1461) * Adding New Metric Cases to Case module Signed-off-by: Kaiyi <kaiyiliu21@gmail.com> * WIP: Add Test Cases for Prometheus, Cases, and Stresser I included basic pytest test cases for pure methods used for producing error metrics from prometheus response data, cleaning data from prometheus response data, and generating prom queries for validation. I will also include basic test cases for stresser related methods like for connecting to vm, copying files to vm, and running the scripts for vm. The purpose of these test cases are to ensure the code made during PRs do not break the functionality of the validator. Reported-by: Kaiyi <kaliu@redhat.com> Signed-off-by: Kaiyi <kaiyiliu21@gmail.com> * Include additional test cases for prometheus and stresser. Included fixtures for cases, prometheus, and stresser. Included additional test cases for prometheus and stresser. Signed-off-by: Kaiyi <kaiyiliu21@gmail.com> * Skip Cases test Due to previous changes to the Cases API, this test case needs to be updated in future. Signed-off-by: Kaiyi <kaiyiliu21@gmail.com> --------- Signed-off-by: Kaiyi <kaiyiliu21@gmail.com>
…ing gpu metrics (sustainable-computing-io#1480) Signed-off-by: marceloamaral <marcelo.amaral1@ibm.com>
Signed-off-by: sustainable-computing-bot <bot@sustainable-computing.io>
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
…unused-config cleanup(config): remove unused kernel src dir
The resolves pid 0 to system_processes without (throwing) any error since there is no command associated with it. This commit fixes the following log I0802 03:24:03.647656 733341 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running: stat /proc/0: no such file or directory, set comm=system_processes I0802 03:24:03.648079 733341 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running, set comm=system_processes 0802 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running: stat /proc/0: no such file or directory, set comm=system_processes ... I0802 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running, set comm=system_processes Signed-off-by: Sunil Thaha <sthaha@redhat.com>
…lve-pid-0 fix: resolve pid 0 to system_processes
…#1671) Processing events in the same goroutine as the ring buffer reader requires acquiring a mutex, which blocks ringbuf event processing causing a backlog. To avoid this, send events via a buffered channel to a dedicated event processing goroutine to ensure that the ringbuf remains unblocked. This has decreased CPU load from 1-3% on my machine to 0-1% CPU load. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Bumps the go-dependencies group with 2 updates: [github.com/onsi/gomega](https://github.com/onsi/gomega) and [golang.org/x/sys](https://github.com/golang/sys). Updates `github.com/onsi/gomega` from 1.34.0 to 1.34.1 - [Release notes](https://github.com/onsi/gomega/releases) - [Changelog](https://github.com/onsi/gomega/blob/master/CHANGELOG.md) - [Commits](onsi/gomega@v1.34.0...v1.34.1) Updates `golang.org/x/sys` from 0.22.0 to 0.23.0 - [Commits](golang/sys@v0.22.0...v0.23.0) --- updated-dependencies: - dependency-name: github.com/onsi/gomega dependency-type: direct:production update-type: version-update:semver-patch dependency-group: go-dependencies - dependency-name: golang.org/x/sys dependency-type: direct:production update-type: version-update:semver-minor dependency-group: go-dependencies ... Signed-off-by: dependabot[bot] <support@github.com>
This commit fixes dashboards to show MAPE and MSE for the time range selected in the panel. Thus the dashboards now show the MSE and MAPE that the validator script would produce if it was run for the same time range. Signed-off-by: Sunil Thaha <sthaha@redhat.com>
This commit modifies validator in the following way 1. It now makes use of TestResult object to hold all information about the test, so that other formats (e.g. json) can be easily generated. 2. Modifies generation of Markdown to make use of the TestResult dataclass 3. Fixes prometheus timestamp based filter to include samples that are within the next scrape of the first series. 4. Markdown result now produces a summary table Signed-off-by: Sunil Thaha <sthaha@redhat.com>
…r-refactor chore(validator): store test result in an object
…puting-io/dependabot/go_modules/go-dependencies-258b52e974 build(deps): bump the go-dependencies group with 2 updates
Add a workflow that runs the pre-commit jobs to cover the case where a developer hasn't opted into pre-commit. Also add the conventional commit message hook and enable more golang linters. The commits also tidies up the markdown files that had line length issues. Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Huamin Chen <hchen@redhat.com>
check metrics before and after e2e tests
…uting-io#1671)" This reverts commit c99e399. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This reverts commit cde7833. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
This reverts commit ca4cdf6. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
…ert-ringbuf Revert "Merge pull request sustainable-computing-io#1628 from dave-tucker/new-ebpf"
Signed-off-by: Sunil Thaha <sthaha@redhat.com>
…dator-node-info chore(validator): fix missing node-info in md
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
By default the name chosen to define the Virtual Machine ID is the one found in /proc/<pid>/cgroup/. Many IaaS platform like Openstack use metadata in libvirt to customise among other things the instance's ID. Allow the user to use the metadata in libvirt to set the name of the vm metrics and gives a better user experience. Set "LIBVIRT_METADATA_URI" with the uri of the XML namespace identifier. Set "LIBVIRT_METADATA_TOKEN" to choose the string used for VM ID. By default, the token "name" is used. Signed-off-by: Anthony Harivel <aharivel@redhat.com>
…t-metadata Libvirt metadata vm_id usage
This uses a ring buffer to send events from eBPF back to userspace. Doing so allows for our eBPF probes to complete quickly, and pushes all delta calculation and summary back into userspace. Moves the logic that resolves comm from procfs into its own pkg. In addition, a cache is added to avoid hitting procfs every time we update process metrics. Removes the bpftest pkg and includes the test programs in the main kepler.bpf.c file. This prevents drift in the go generate flags and generally makes it easier to write tests. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
The resolves pid 0 to system_processes without (throwing) any error since there is no command associated with it. This commit fixes the following log I0802 03:24:03.647656 733341 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running: stat /proc/0: no such file or directory, set comm=system_processes I0802 03:24:03.648079 733341 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running, set comm=system_processes 0802 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running: stat /proc/0: no such file or directory, set comm=system_processes ... I0802 process_bpf_collector.go:100] failed to resolve comm for PID 0: process not running, set comm=system_processes Signed-off-by: Sunil Thaha <sthaha@redhat.com>
…#1671) Processing events in the same goroutine as the ring buffer reader requires acquiring a mutex, which blocks ringbuf event processing causing a backlog. To avoid this, send events via a buffered channel to a dedicated event processing goroutine to ensure that the ringbuf remains unblocked. This has decreased CPU load from 1-3% on my machine to 0-1% CPU load. Signed-off-by: Dave Tucker <dave@dtucker.co.uk>
🤖 SeineSailor Here is a concise summary of the pull request changes: The "[DO NOT MERGE] Ringbuf redux" pull request reintroduces the ring buffer functionality to the codebase, affecting the build process, testing procedures, and internal implementation. Key modifications include:
Observations and suggestions for improvement:
|
should we start a new branch for this change? |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Brings back the ringbuf work, just to see if it's something we want to continue with or not.