New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

probes falling back to proc parsing due to late eBPF events #2650

Closed
rade opened this Issue Jun 27, 2017 · 13 comments

Comments

Projects
None yet
3 participants
@rade
Member

rade commented Jun 27, 2017

We see this:

<probe> ERRO: 2017/06/27 05:02:25.336993 tcp tracer received event with timestamp 615263068049191 even though the last timestamp was 615263068049504. Stopping the eBPF tracker. 

after running probes for hours, even days.

The message is produced by the logic introduced in #2334, even though we are running Ubuntu with a 4.4.0-81 kernel, which should not exhibit the problem described in that issue.

@alban any idea how we might go about tracking this down?

Also, as a workaround, is there any downside to simply restarting all the eBPF tracking when getting this error, i.e. going through the same initialisation as when initially starting the probes, including the initial proc walk? We'd need to guard against "flapping", obviously.

@rade rade added the bug label Jun 27, 2017

@2opremio

This comment has been minimized.

Collaborator

2opremio commented Jun 27, 2017

@alban has suggested offline:

if it is reproducible, I would add logs to print the cpu number for both events..

I would try to reproduce an issue with the tcptracer-bpf test program (without scope) in the dev environment with your kernel, using a script like #2507 (comment) with different cpu# in different terminals

@2opremio

This comment has been minimized.

Collaborator

2opremio commented Jul 11, 2017

Could this be due to iovisor/gobpf#42 ?

@alban

This comment has been minimized.

Collaborator

alban commented Jul 12, 2017

@rade How many cpus do you have on the machine where the bug was produced?

I am currently trying to reproduce this on GCE with this config:

$ uname -r
4.4.0-81-generic
$ grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 14.04.5 LTS"
$ cat /sys/devices/system/cpu/{possible,present,online}
0
0
0
@rade

This comment has been minimized.

Member

rade commented Jul 12, 2017

# uname -r
4.4.0-83-generic
# grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 16.04.1 LTS"
# cat /sys/devices/system/cpu/{possible,present,online}
0-127
0-15
0-15
@rade

This comment has been minimized.

Member

rade commented Jul 12, 2017

FYI, on our dev cluster of seven machines, all the probes were started on 2017/06/29 21:51. Four of them are still running in ebpf mode now, the other three encountered a timestamp discrepancy at

  • 2017/06/30 07:50:49.966603
  • 2017/07/01 22:42:24.916713
  • 2017/07/04 02:03:31.517207

respectively.

So this error does appear to be quite rare.

@alban

This comment has been minimized.

Collaborator

alban commented Jul 12, 2017

I can reproduce the bug using tcptracer-bpf/tests/tracer in about 1 minute execution with the following:

n1-highcpu-16 (16 vCPUs, 14.4 GB memory)
$ uname -r
4.4.0-81-generic
$ grep PRETTY /etc/os-release
PRETTY_NAME="Ubuntu 14.04.5 LTS"
$ cat /sys/devices/system/cpu/{possible,present,online}
0-15
0-15
0-15

(I added more vCPUs)

I start nginx with sudo docker run -d nginx and run the test script with:

$ cat test/test.sh 
#!/bin/bash
cpu=$1
for i in $(seq 1 10000) ; do
  echo -n "$i "
  for j in $(seq 1 500) ; do
    taskset --cpu-list $cpu wget -O /dev/null http://172.17.0.2 2>/dev/null
  done
done
echo

I open several terminals and run test/test.sh 0, test/test.sh 1,test/test.sh 2 etc. to load several cpus. I can reproduce the bug when there is at least 3 test scripts running in parallel.

@rade

This comment has been minimized.

Member

rade commented Jul 12, 2017

great! should be easy to fix then ;)

@alban

This comment has been minimized.

Collaborator

alban commented Jul 17, 2017

I cannot reproduce this on my laptop but only on GCE with the configuration above. Trying to see what the difference could be, I notice that GCE has a clock skew daemon but disabling it didn't help.

I wrote a program (bpf_clocks) to compare the kernel clock bpf_ktime_get_ns() and the userspace clock clock_gettime(CLOCK_MONOTONIC) since they need to be the same for the ordering to work. I tested a couple of time and both clocks seemed to be the same. I want to automate this test a bit more to run it plenty of times over a couple of minutes.

Next thing to check is if the clock is consistent between cpus or if one cpu can have a different clock than another. I suspect each cpu can give different results because of two comments I read:

  • "other CPUs are likely to be able observe [time going backward]" in __ktime_get_fast_ns
  • "There is no promise that the timestamp counters of multiple CPUs on a single motherboard will be synchronized" (source)

I am not sure how to write a test to check if it is the cause of the problem. But that's what I would like to check next.

@rade

This comment has been minimized.

Member

rade commented Jul 17, 2017

@alban do you want me to give you access to one of our clusters where the problem occurs?

@alban

This comment has been minimized.

Collaborator

alban commented Jul 18, 2017

@rade no need for now because I can reproduce the bug without.

Status update: still exploring...

I continued with my bpf_clocks program to compare the kernel and userspace clocks and looked at more instances and I didn't notice discrepancies between the clocks.

I wrote another program to look at the clocks between cpus based on https://github.com/karthick18/ticket_spinlock (with this patch) and I also didn't notice discrepancies.


I am exploring a different possible race: the execution of one eBPF function (such as kprobe__tcp_set_state()) is not instantaneous and there could be some time between the time it takes a timestamp with bpf_ktime_get_ns() and the time the perf event is sent to the ring buffer with bpf_perf_event_output().

With virtual CPUs on platforms like GCE and AWS, there is no guarantees that a CPU is not suspended by the hypervisor, so different execution of eBPF functions can take variable amount of time. We could have the following scenario:

image

  1. cpu#1 takes a timetamp
  2. cpu#2 takes a timestamp and sends the event with the timestamp to the ring buffer
  3. userspace checks the clock with clock_gettime() and reads the event from cpu#2.
  4. cpu#1 sends the event to the ring buffer
  5. later, userspace receives the event from cpu#1 but it notices its timestamp is before the event from cpu#2 and it cannot reorder it because the event from cpu#2 has already been forwarded to the user since it was before the clock_gettime() barrier.

In order to check if this scenario really happens, I added a new type of event (named type TCP_EVENT_TYPE_CLOCK) emitted after each tcp connect, accept or close (kinvolk/tcptracer-bpf@baa1e79). I am inspecting the logs. I see that between 2 bpf_ktime_get_ns(), there could be between 800ns to 2000ns. It is not that small compared to 2 tcp events (often ~50000ns)

@alban

This comment has been minimized.

Collaborator

alban commented Jul 19, 2017

When looking at the timestamps in the logs, I was not able to observe the race described in my comment above so not sure if it is a correct explanation.

I gave Ubuntu 16.10 with Linux 4.8.0-59-generic a try and the bug is still reproducible there. So it does not seem due to an old kernel.

@alban

This comment has been minimized.

Collaborator

alban commented Jul 20, 2017

I didn't reach a conclusion on the root cause so I will go on with the workaround:

Also, as a workaround, is there any downside to simply restarting all the eBPF tracking when getting this error, i.e. going through the same initialisation as when initially starting the probes, including the initial proc walk? We'd need to guard against "flapping", obviously.

@alban alban referenced this issue Jul 20, 2017

Merged

restart eBPF tracking on error #2735

4 of 4 tasks complete
@2opremio

This comment has been minimized.

Collaborator

2opremio commented Jul 20, 2017

That's quite a big hack but I don't have a better suggestion ...

@rade rade added this to the 1.6 milestone Jul 24, 2017

alban added a commit to kinvolk/scope that referenced this issue Jul 24, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

alban added a commit to kinvolk/scope that referenced this issue Jul 24, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

alban added a commit to kinvolk/scope that referenced this issue Jul 25, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

alban added a commit to kinvolk/scope that referenced this issue Jul 25, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

@rade rade modified the milestones: 1.6, Next Jul 26, 2017

alban added a commit to kinvolk/scope that referenced this issue Aug 7, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

alban added a commit to kinvolk/scope that referenced this issue Aug 8, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

alban added a commit to kinvolk/scope that referenced this issue Aug 17, 2017

EbpfTracker: restart it when it dies
EbpfTracker can die when the tcp events are received out of order. This
can happen with a buggy kernel or apparently in other cases, see:
weaveworks#2650

As a workaround, restart EbpfTracker when an event is received out of
order. This does not seem to happen often, but as a precaution,
EbpfTracker will not restart if the last failure is less than 5 minutes
ago.

This is not easy to test but I added instrumentation to trigger a
restart:

- Start Scope with:
    $ sudo WEAVESCOPE_DOCKER_ARGS="-e SCOPE_DEBUG_BPF=1" ./scope launch

- Request a stop with:
    $ echo stop | sudo tee /proc/$(pidof scope-probe)/root/var/run/scope/debug-bpf

@rade rade closed this in 8fe3538 Aug 18, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment