Elevated latencies during single-node faults #2739

@aphyr

In TigerBeetle 0.16.17 through 0.16.27, single-node failures are frequently associated with higher latencies across all clients, often by three to five orders of magnitude. For instance, consider this test: we killed one of three nodes and saw latencies jump from 1-50 ms to ~100 seconds. Latencies remained elevated until the node was restarted, almost a thousand seconds later.

[Image: latency plot from the test described above]
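
As a rough check on the "three to five orders of magnitude" figure, the arithmetic can be sketched as below. The helper name `orders_of_magnitude` is hypothetical and only for illustration; the inputs are the latencies quoted above (1-50 ms normally, ~100 seconds during the fault), not new measurements.

```python
import math


def orders_of_magnitude(baseline_ms: float, elevated_ms: float) -> float:
    """Return log10 of the ratio between an elevated and a baseline latency."""
    return math.log10(elevated_ms / baseline_ms)


# Figures quoted in the report: 1-50 ms normally, ~100 s (100,000 ms) during the fault.
elevated_ms = 100_000.0
low = orders_of_magnitude(50.0, elevated_ms)   # relative to the slowest normal latency
high = orders_of_magnitude(1.0, elevated_ms)   # relative to the fastest normal latency

print(f"{low:.1f} to {high:.1f} orders of magnitude")
# prints "3.3 to 5.0 orders of magnitude"
```

So the jump in this particular test falls within the range stated above.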

High latencies may recover spontaneously after tens to hundreds of seconds, or persist for thousands of seconds.

The problem can occur even when the failed node is not the leader. It also seems more likely in smaller clusters than in larger ones.
