HLC timer support for Proxy #1110

Open
wants to merge 1 commit into
base: next

dormando
Member

@dormando dormando commented Feb 20, 2024

While working on replication support for routelib, I want to:

  • Be able to know if all N copies of an item across pools are the same.
  • Have some useful information about when the items were stored and where they came from.
  • Ensure that item identifiers are unique, allowing out of band repair decisions (LWW or delete/retry) if a copy is gone or missing.

For this, I want:

  • A globally atomically incrementing value.
  • This value reflects the time and which host stored the item.
  • This value cannot go backwards per proxy instance.

The problem with timestamps

  • REALTIME clocks can go backwards on computers: often small microsecond or millisecond adjustments, and occasionally by whole seconds.
  • MONOTONIC clocks will always go forward, but do not represent real time. They represent the uptime of the system or start time of a process, and will slowly drift out of sync.

Monotonic clocks are useful, but cannot easily be compared cross-machine and do not represent "real time" so we cannot "loosely order" lists of items or know when items were created.

Hybrid Logical Clocks

HLC's are a well-used pattern and can (as we'll show later) usefully be compressed to 64bits. UUID's could also solve this problem but are 128bits or longer.

We don't need or use (at least right now) the gossip synchronization part of HLC's. Instead we use them to do loose global ordering and strict local ordering.

In this case, an HLC is defined as:

  1. 41 bits of epoch time at millisecond resolution (known as Physical Time)
  2. 15 bits of logical counter
  3. 8 bits of opaque (used as a machine identifier in a pool)
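A minimal sketch of how the three fields could be packed into one 64bit value. The field widths come from the list above; the bit placement (Physical Time in the high bits, opaque in the low bits) and every name here (`hlc_pack`, `hlc_time_ms`, `hlc_counter`, `hlc_opaque`) are assumptions for illustration, not the PR's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the description. Their order inside the 64bit
 * value is an assumption of this sketch. */
#define HLC_TIME_BITS    41
#define HLC_COUNTER_BITS 15
#define HLC_OPAQUE_BITS  8

#define HLC_TIME_MAX    ((UINT64_C(1) << HLC_TIME_BITS) - 1)
#define HLC_COUNTER_MAX ((UINT64_C(1) << HLC_COUNTER_BITS) - 1) /* 32767 */
#define HLC_OPAQUE_MAX  ((UINT64_C(1) << HLC_OPAQUE_BITS) - 1)  /* 255 */

/* Assemble Physical Time, logical counter and opaque into one value. */
static inline uint64_t hlc_pack(uint64_t time_ms, uint64_t counter,
                                uint64_t opaque) {
    assert(time_ms <= HLC_TIME_MAX);
    assert(counter <= HLC_COUNTER_MAX);
    assert(opaque <= HLC_OPAQUE_MAX);
    return (time_ms << (HLC_COUNTER_BITS + HLC_OPAQUE_BITS))
         | (counter << HLC_OPAQUE_BITS)
         | opaque;
}

/* Extract the individual fields again. */
static inline uint64_t hlc_time_ms(uint64_t hlc) {
    return hlc >> (HLC_COUNTER_BITS + HLC_OPAQUE_BITS);
}

static inline uint64_t hlc_counter(uint64_t hlc) {
    return (hlc >> HLC_OPAQUE_BITS) & HLC_COUNTER_MAX;
}

static inline uint64_t hlc_opaque(uint64_t hlc) {
    return hlc & HLC_OPAQUE_MAX;
}
```

With this placement a plain integer comparison orders values by Physical Time first, then by logical counter, then by opaque.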

In the proxy we use the REALTIME clock as the source for Physical Time. When a timestamp is requested we:

  • Check if realtime milliseconds is higher than the Physical Time we recorded the last time we were called.
  • If so, record the new Physical Time and reset the logical counter to 1
  • Assemble the 64bit time and return upstream

If realtime is the same or has gone backwards:

  • Keep the previously recorded Physical Time and increase the logical counter by 1
  • Assemble the 64bit time and return upstream

If the logical counter would overflow:

  • Return nil, causing an error upstream.
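The steps above can be sketched roughly as follows. This is not the PR's implementation: `struct hlc`, `hlc_next`, `hlc_now` and `realtime_ms` are hypothetical names, the bit placement (time, then counter, then opaque) is assumed, and a `false` return stands in for the nil returned to Lua. The current time is passed in as a parameter so the backwards-clock case can be exercised without touching the system clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define HLC_COUNTER_MAX 32767u /* largest value of a 15 bit counter */

struct hlc {
    uint64_t last_ms;  /* highest Physical Time handed out so far */
    uint32_t counter;  /* logical counter within last_ms */
    uint8_t opaque;    /* machine identifier */
};

/* REALTIME clock in milliseconds since the epoch. */
static uint64_t realtime_ms(void) {
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* Produce the next timestamp for the clock reading now_ms.
 * Returns false if the logical counter would overflow. */
static bool hlc_next(struct hlc *h, uint64_t now_ms, uint64_t *out) {
    assert(h != NULL && out != NULL);
    if (now_ms > h->last_ms) {
        /* Time moved forward: adopt it and restart the counter. */
        h->last_ms = now_ms;
        h->counter = 1;
    } else {
        /* Same millisecond, or the clock stepped backwards: keep the
         * old Physical Time and advance the logical counter. */
        if (h->counter >= HLC_COUNTER_MAX) {
            return false;
        }
        h->counter++;
    }
    /* Assumed layout: 41 bits time, 15 bits counter, 8 bits opaque. */
    *out = (h->last_ms << 23) | ((uint64_t)h->counter << 8) | h->opaque;
    return true;
}

/* Convenience wrapper that reads the REALTIME clock itself. */
static bool hlc_now(struct hlc *h, uint64_t *out) {
    return hlc_next(h, realtime_ms(), out);
}
```

Because the stored Physical Time never decreases and the counter only grows within one millisecond, successive values from one `struct hlc` are strictly increasing until the counter is exhausted.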

Properties of the HLC

  • Time cannot go backwards.
  • The same time can be repeated 32767 times per millisecond before we throw an error.
  • If NTP is being used properly, time should not jump by huge values (rarely a full second, usually by milliseconds a few times per day).
  • Thus the clock can continue to move forward when time temporarily moves backwards.

This gives us:

  • If we have N copies of an item across N pools, and all N HLC values match, the items were set at the same time.
  • If one of the copies is "newer" than other copies, but all copies have the same opaque value, we can guarantee the newest copy is the latest.
  • If one of the copies is "newer", but not all of the opaques match, we know that the data was set by multiple machines and cannot be trusted. See advanced notes below for repairing beyond an uncertainty window.
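The three cases above could be checked with something like the sketch below. The names (`hlc_verdict`, `hlc_compare_copies`) are made up for illustration, and it assumes the opaque occupies the low 8 bits of the value, which the description does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum hlc_verdict {
    HLC_IN_SYNC,     /* every copy carries the same HLC */
    HLC_NEWEST_WINS, /* values differ, one writer: highest is the latest */
    HLC_UNTRUSTED    /* values differ and more than one writer is involved */
};

/* Classify n copies of an item by their HLC values.
 * *newest is set to the index of the highest HLC seen. */
static enum hlc_verdict hlc_compare_copies(const uint64_t *hlc, size_t n,
                                           size_t *newest) {
    assert(hlc != NULL && newest != NULL && n > 0);
    int same_value = 1;
    int same_opaque = 1;
    *newest = 0;
    for (size_t i = 1; i < n; i++) {
        if (hlc[i] != hlc[0]) {
            same_value = 0;
        }
        /* Assumed layout: opaque lives in the low 8 bits. */
        if ((hlc[i] & 0xff) != (hlc[0] & 0xff)) {
            same_opaque = 0;
        }
        if (hlc[i] > hlc[*newest]) {
            *newest = i;
        }
    }
    if (same_value) {
        return HLC_IN_SYNC;
    }
    return same_opaque ? HLC_NEWEST_WINS : HLC_UNTRUSTED;
}
```

A caller would repair from `*newest` only on `HLC_NEWEST_WINS`, and fall back to some other policy (delete/retry, or the uncertainty window described later) on `HLC_UNTRUSTED`.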

Implementation via CAS

  • Client flags are only 32bits.
  • HLC's cannot reasonably fit in 32bits.
  • Client flags are used for other things (compression flags, data encoding, or other such information)

However, our CAS value is a 64bit opaque. So long as a CAS value is generally increasing for the same item, it does not matter what the number is or if it's globally unique.

CAS ID's are generated per memcached node via a 64bit number, incrementing by one each time an item is updated. Thus CAS ID's across nodes cannot relate to each other. We also cannot set our own CAS.

Setting our own CAS

  • Add E flag to the meta protocol, which means "if operation is successful, use this value as your new CAS"
  • This lets us set the HLC as the CAS value on a successful update of an item.
  • The CAS feature continues to work as normal: if cas does not match exactly, the update will fail.
  • If using "old CAS causes stale bit to be set" mode, this will still generally work since HLC's are generally always increasing, and a higher CAS value than requested will fail.
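As a rough illustration of what a request carrying the new flag might look like on the wire, the sketch below formats the command line of a meta set with an `E` token, optionally preceded by a `C` (compare CAS) token. The helper name `format_ms_ecas` and the convention that a `compare_cas` of 0 means "no compare" are assumptions of this example only; the data block that follows the command line is omitted.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define KEY_MAX_LENGTH 250 /* memcached's key length limit */

/* Write "ms <key> <datalen> [C<compare_cas>] E<new_cas>\r\n" into buf.
 * Returns the number of bytes written, or -1 if the key is too long
 * or the buffer is too small. */
static int format_ms_ecas(char *buf, size_t buflen, const char *key,
                          size_t datalen, uint64_t compare_cas,
                          uint64_t new_cas) {
    assert(buf != NULL && key != NULL);
    if (strlen(key) > KEY_MAX_LENGTH) {
        return -1;
    }
    int n;
    if (compare_cas != 0) {
        /* Only update if the stored CAS matches, then install new_cas. */
        n = snprintf(buf, buflen, "ms %s %zu C%" PRIu64 " E%" PRIu64 "\r\n",
                     key, datalen, compare_cas, new_cas);
    } else {
        /* Unconditional set that installs new_cas on success. */
        n = snprintf(buf, buflen, "ms %s %zu E%" PRIu64 "\r\n",
                     key, datalen, new_cas);
    }
    return (n < 0 || (size_t)n >= buflen) ? -1 : n;
}
```

In the replication use case, `new_cas` would be the HLC generated by the proxy, so every pool ends up storing the same CAS for the same write.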

The CAS value is still opaque and does NOT directly relate to an HLC. This leaves flexibility in the system for other kinds of data consistency models:

  1. Set CAS from data row versions from source data.
  2. Set CAS from a combination of row versions and CRC32
  3. Different configurations of HLC
  4. A global data versioner
  5. etc.

HLC's in the proxy

HLC's are standard Lua objects. They use a global data structure that is synchronized across all worker VM's in a proxy.
You can thus have one HLC for all sets going through a proxy, one HLC per namespace if the proxy handles many unrelated pools of data, and so on. Each HLC has its own logical counter, so having many HLC's means more logical capacity to absorb time errors.

Advanced break-fix resolution

  • If N copies of an item exist, but they do not have the same time and do not have the same opaque ID's, it's still possible to automatically repair the data if a 'time uncertainty window' is used.

Basically:

  • We state that: we manage NTP carefully and observe that the clocks in a cluster cannot be more than 5 seconds apart. It takes a great deal of work to get very close (milliseconds), but multiple seconds can be done reasonably.
  • Thus, if the opaque ID's do not match, but the "highest time" is more than 5 seconds newer than the next newest item, we can reasonably assume this is the latest write and repair the copies.
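A sketch of that rule, under the same assumed bit layout (Physical Time in the top 41 bits, so shifting right by 23 yields milliseconds). `hlc_pick_repair_source` and the constants are hypothetical names; copies with an identical HLC are treated as the same write and do not count as the "next newest item".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HLC_TIME_SHIFT     23   /* 15 counter bits + 8 opaque bits */
#define HLC_UNCERTAINTY_MS 5000 /* the 5 second bound from the text */

/* Returns the index of the copy to repair from, or -1 if some older
 * copy is within the uncertainty window of the newest one. */
static int hlc_pick_repair_source(const uint64_t *hlc, size_t n) {
    assert(hlc != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (hlc[i] > hlc[best]) {
            best = i;
        }
    }
    uint64_t best_ms = hlc[best] >> HLC_TIME_SHIFT;
    for (size_t i = 0; i < n; i++) {
        if (hlc[i] == hlc[best]) {
            continue; /* same write as the newest copy */
        }
        /* hlc[best] is the maximum, so best_ms >= ms and this
         * subtraction cannot wrap. */
        uint64_t ms = hlc[i] >> HLC_TIME_SHIFT;
        if (best_ms - ms <= HLC_UNCERTAINTY_MS) {
            return -1; /* too close to call */
        }
    }
    return (int)best;
}
```

A result of -1 means the copies were written too close together to order safely across machines, so the caller would need another policy such as delete/retry.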

TODO:

Time estimate: two days of work. Likely less.

  • Finish implementing HLC API (reading the data off of a request/response and appending the data to a request)
  • Test suite for the lua HLC side
  • More tests for the CAS change
  • Implement in routelib to kick the API's tires.
  • Maybe: make bit amounts for logical / opaque selectable.

@dormando dormando added the WIP and proxy (worklogs and issues related to proxy) labels Feb 20, 2024
@dormando
Member Author

was excited to finish this today but felt like shit all day. pushing up what I can and finishing tomorrow I hope.

Think my plan will be to let this sit for at least one extra week to think about it and make adjustments. Should cap the wait time at two weeks though.

@dormando
Member Author

Yeah don't have any API's at all for parsing meta responses, which has finally come to bite me :) Might be best to do that then split into another PR and upstream sooner.

@dormando
Member Author

Was sick for a few days :'(

I'm going to do a brute force method for the parse-from-res for now and get on with tests and integration.

@dormando
Member Author

Basic roundtripping works. Basic tests are in.

Still not doing amazing right now so working pretty slowly. Next step is to fill in more tests and see about adding to routelib.

@dormando
Member Author

tempted to slow roll this and come back to it in a week. Got some bugs out just now but I am again off-schedule for releases and need to go cut down the PR/issue backlog some.

@dormando
Member Author

dormando commented Mar 6, 2024

Next goal: squash and rebase, split CAS work and upstream that first.

@dormando dormando changed the title from "Allow overriding CAS value in meta and add HLC timer support to Proxy" to "HLC timer support for Proxy" Mar 8, 2024
@dormando
Member Author

dormando commented Mar 8, 2024

#1115 has been split off of this.

Test lib updates and mcp_request updates were also split off and upstreamed already.

Going to hold this change for one more week for adjustments/tests. The CAS change it depends on should upstream shortly.

@dormando
Member Author

Note to self: need to document that this should only be used with a local clock that doesn't experience daylight saving time, or else it could throw errors for an hour once a year.

Provides a basic "hybrid logical clock" object for use mainly in
overriding CAS values with a descriptive value.
@dormando
Member Author

dormando commented Apr 5, 2024

rebased now that Ecas is upstreamed.

going to hold this PR until I can use it with routelib.

@@ -0,0 +1,14 @@
#ifndef PROXY_HLC_H
#define PROXY_HLC
Member Author


fix: PROXY_HLC_H

Labels
proxy worklogs and issues related to proxy WIP