Using eBPF and predefined inspections to minimize "observability tax" #234

xluffy opened this issue Feb 29, 2024 · 0 comments
xluffy commented Feb 29, 2024

1. The problem

Observability is a critical aspect of any infrastructure, as it allows teams to identify and troubleshoot issues quickly. However, making a system observable is not without its costs. It is a time- and resource-consuming process because it requires adding instrumentation to every application.

Let's see what integrating an APM (Application Performance Monitoring) tool into a system looks like in practice:

  • Integrate the appropriate APM SDK into each service
  • Enable request tracing in every ORM, database/queue/http/grpc client that is being used
  • Test and re-deploy every service
  • Figure out which dashboards can be helpful in troubleshooting and configure them
  • Define alerting rules for your metrics and/or traces to be notified when something goes wrong
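
To make the per-service effort concrete, here is a minimal sketch of what manual instrumentation tends to look like at a single call site. It does not use any real APM SDK: the `traced` decorator, the `SPANS` list, and `get_user` are hypothetical names made up for illustration, and a real SDK would export spans to a backend rather than keep them in memory.

```python
import functools
import time

# Hypothetical in-memory span sink; a real APM SDK would export these instead.
SPANS = []


def traced(operation):
    """Record the duration and outcome of every call to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception:
                status = "error"
                raise
            finally:
                # Runs on both success and failure, so every call yields a span.
                SPANS.append({
                    "operation": operation,
                    "status": status,
                    "duration_s": time.perf_counter() - start,
                })
        return wrapper
    return decorator


# Hypothetical database call; each client call site needs similar wrapping.
@traced("db.get_user")
def get_user(user_id):
    return {"id": user_id, "name": "example"}
```

Multiply this by every ORM, database, queue, HTTP, and gRPC client in every service, and the cost of the steps above becomes easier to picture.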

My optimistic estimate is that it can take over 40 hours for an experienced engineer to instrument a system of 10 services. Keep in mind, however, that you will have to repeat most of these steps every time you add a new service. I think the term "observability tax" is particularly well-suited to describe the time, resources, and effort that companies have to spend if they want to maintain a high level of visibility into their infrastructure.

Additionally, it can be challenging to manually instrument all parts of a system due to the presence of third-party and legacy services. This can result in "blind spots" where certain parts of the system are not observable.

2. eBPF is revolutionizing security, networking, ... and 🎉observability🎉

eBPF (extended Berkeley Packet Filter) is a game-changing technology that can eliminate the need to manually instrument application code.

It allows users to attach custom programs to various parts of the Linux kernel, such as system calls, network functions, and tracepoints. Such eBPF programs can be used for a wide range of purposes, including networking, security, and observability.

For example, we can create an eBPF program that traces all HTTP requests made by a container, and this will be effective for any app running on the host, whether it is nginx, curl, or a Java application.
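
The reason this works for any app is that the request bytes pass through the same kernel interfaces no matter which program wrote them. As a rough userspace sketch (not the eBPF program itself, and not taken from the source), recognizing an HTTP/1.x request in a captured buffer could look like the following. The function name `parse_http_request` and the method list are assumptions made for illustration.

```python
# Assumed set of methods to recognize; a real tracer may accept more.
HTTP_METHODS = (b"GET", b"POST", b"PUT", b"DELETE", b"HEAD", b"PATCH", b"OPTIONS")


def parse_http_request(payload: bytes):
    """Return (method, path) if payload starts with an HTTP/1.x request line, else None."""
    line, sep, _ = payload.partition(b"\r\n")
    if not sep:
        # No complete request line in this buffer.
        return None
    parts = line.split(b" ")
    if len(parts) != 3:
        return None
    method, path, version = parts
    if method not in HTTP_METHODS or not version.startswith(b"HTTP/1."):
        return None
    return method.decode("ascii"), path.decode("ascii", errors="replace")
```

The same check applies whether the bytes came from nginx, curl, or a JVM, which is why no per-application changes are needed.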

Given that the Linux kernel in most of its supported versions now offers at least minimal support for eBPF, we decided to create a real zero-instrumentation observability tool based on it.

To gather telemetry data, the agent uses eBPF for several purposes, such as:

  • Container discovery: the agent traces the task_newtask and sched_process_exit tracepoints to discover the containers that are running on a node
  • Tracing TCP connections: the sys_connect and inet_sock_set_state tracepoints allow the agent to discover TCP connections and LISTEN sockets of the container
  • Tracing application layer protocol requests: the agent follows the sys_write/writev/sendto and sys_read/readv/recvfrom tracepoints to trace the requests that the containers make to other services using application layer protocols such as HTTP, Postgres, and Redis.
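
For the container discovery step, a process-creation event only gives the agent a PID, so something has to map that PID to a container. The source does not say how this is done. One common approach, assumed here, is to read `/proc/<pid>/cgroup` and look for the 64-character hex ID that runtimes such as Docker and containerd typically embed in the cgroup path. The helper name `container_id_from_cgroup` is hypothetical.

```python
import re

# Assumption: the container runtime embeds a 64-character hex ID in the cgroup path.
_CONTAINER_ID = re.compile(r"([0-9a-f]{64})")


def container_id_from_cgroup(cgroup_text: str):
    """Return the first 64-hex container ID found in /proc/<pid>/cgroup content, or None."""
    for line in cgroup_text.splitlines():
        # Each line has the form "hierarchy-id:controllers:path".
        path = line.split(":", 2)[-1]
        match = _CONTAINER_ID.search(path)
        if match:
            return match.group(1)
    return None
```

A process whose cgroup path carries no such ID (for example, a regular host session) yields `None` and can be treated as not belonging to a container.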

https://coroot.com/blog/minimizing-observability-tax
