Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

graft sqld #465

Merged
merged 1,241 commits into from
Oct 17, 2023
Merged

graft sqld #465

merged 1,241 commits into from
Oct 17, 2023

Conversation

MarinPostma
Copy link
Collaborator

@MarinPostma MarinPostma commented Oct 17, 2023

graft sqld into libsql-server

haaawk and others added 30 commits May 17, 2023 14:18
This is needed for Atlas integration.

Signed-off-by: Piotr Jastrzebski <piotr@chiselstrike.com>
Co-authored-by: Doug Stevenson <doug@chiselstrike.com>
Signed-off-by: Piotr Jastrzebski <piotr@chiselstrike.com>
When registering WAL methods and later opening a connection
with them, sqld is currently subject to race condition,
where multiple threads can register methods by the same name,
and then look up the same instance of them.
That's dangerous, because our methods hold state.
The short-term solution for this problem is to make register+find
globally atomic. A proper solution will allow libsql_open()
to take a pointer to user state, making any race conditions moot.
If the WAL methods are not deregistered from a global libSQL
table before dropping them, we end up with having a dangling
pointer in the libSQL structure. It often worked, because
the freed memory just didn't manage to get overwritten yet,
but it was a time bomb nonetheless.

The scenario was as follows:
1. Fiber A registers WAL methods W1
2. Fiber A frees WAL methods W1, but does not deregister them
   from the libSQL global data structure
3. Fiber B registers WAL methods W2, and registration involves
   iterating over the data structure that contains a dangling
   pointer, after the methods were freed by (2.)
Secure mode has a performance and overhead penalty,
and we don't really need to be robust against memory reuse attacks
in sqld.
Simple measurements show that secure mode can bump resident memory
usage as much as 250%, plus there's the cpu cost of encrypting
freelists and other features that they offer.
libSQL has no notion of checking how much physical memory is available, if
there's swap space configured, etc. Normally it limits its own allocations to
2GiB, but on small edge machines it can already prove to be too much to handle.
To remedy that, two command-line options are added to sqld -
`soft_heap_limit_mb` and `hard_heap_limit_mb`. Soft heap limit hints to libSQL
that it should not allocate once the limit is reached and put pressure on
shrinking caches, but it will choose to overallocate rather than fail with
SQLITE_NOMEM error. Hard limit bails out once an allocation is attempted past
the limit.  While we could automatically detect the hardware setup on boot,
command-line options are good enough for a start to avoid out-of-memory (OOM)
conditions. A good rule of thumb would be to set the soft limit to, say, 70% of
available physical memory, and hard limit to 90%, leaving some leeway for the
operating system.

Local tests imply that libSQL is really good at keeping allocations above
even soft limit, with a negligible error margin.
Due to how sqld, one of the prime users of bottomless, works,
we should ignore passive checkpoints instead of upgrading them
to full checkpoints. Reason is, sqld uses short-standing connections
per each request, and we certainly don't want to checkpoint
multiple times per second.
It can be recovered once we decide to make it production-ready,
and meanwhile let's drop it so that we end up with cleaner code.
* make WalHook stateless

The WalHook cannot be statefull. Instead, it ties a WalHook
implementation to some context, and provides a way to safely register
methods, and retrieve the passed context fromt the Wal pointer.

* update LibsqlDb to use new WalHooks

* logger hook

* replica hook

* glue stuff
* Propose a stateful Hrana-over-HTTP API

* Separate WebSocket specific parts of Hrana into a submodule

* Implement the stream mechanism for HTTP API v2

* Implement stream expiration

* Refactor and document

* Report a nice error when wrong step index is used in BatchCond

* Small fixes

* Address comments from code review

* Fixup Cargo.lock

* Add the --http-self-url command-line argument
Since the expected size of the values vector is known,
let's allocate its memory once instead of relying on reallocations
to grow the vector.
* wal_hook: add on_savepoint_undo and on_checkpoint

They will be overwritten along with bottomless integration

* treewide: integrate bottomless as WAL hook

From now on, bottomless can be enabled in sqld as a regular WalHook.
TODO: the main functionality, i.e. actually replicating frames is not hooked to
anything yet. It will be done, but it should be heavily considered to just
queue writes with an unbounded channel (which allows sync insertions), so that
everything happens in the background.

* treewide: replicate and checkpoint frames with bottomless

Right now the replication and checkpointing is sync and blocking
- that will need to be rewritten to async later.
* introduce QueryResultBuilder

* hrana query result builder

* http query result builder

* posgres result builder

* update Database trait

* update all modules to use QueryResultBuilder

* add tests

* cleanup unused types

* bump toolchain

* cargo fmt

* remove postgres

* review edits
MarinPostma and others added 25 commits October 6, 2023 13:59
* add savepoint and release stmt categories

* forbid savepoints in legacy http
This reverts be7386 and removes decoding limits for the `ReplicationLogProxyClient`
so that when replication requests get forwarded to the primary and it returns a full
batch of `1024` frames it does not trigger the 4mb decoding limit set by default for
tonic gRPC clients.
* Group all checks together in the same workflow

Reusing artifacts ends up being faster than parallelizing with the
current CI resources.

* Set same RUSTFLAGS in both jobs && remove unnecessary steps

* Cache rust artifacts
This will allow ARMv7 architecture users to compile sqld.
Wasmtime currently does not have support for that arch.
* wip

* add prometheus metrics probes

* expose metrics admin endpoint

* move metric route to /metrics

* use histogram macro
* Optimize runtime container layers

* Use Github Actions cache for Docker

* Move rm -rf /var/lib/apt/lists/* to its own line
* add same thread injector

* refactor Frame API; primary return commit frame for snapshots

* hook new injector

* add test assets

* handle fatal replication error

* use persistent replication table

* add doc commen to snapshot frames_iter_from

* review fixes

* fmt

* handle potential injector execute error
While testing multitenancy, it's more convenient to use a wildcard
domain rather than edit /etc/hosts manually.
* connection: fix various comment, code var and doc typos

* replication: fix various comment, code var and doc typos

* namespace: fix various comment, code var and doc typos

* metrics: fix typo in metric docstring

* hrana/stmt: fix code typo verion -> version

* query_result_builder: fix code typo weigths -> weights

* h2c: fix typo in docstring

* query_analysis: fix typo in docstring
* Add embedded replica test

* sqld: Add `extended_code` support
For specific edge cases, a 32MiB threshold is a little small.
Let's bump it to at least 256MiB of data and inform about vacuuming
with info log level.
There's no reason I'm aware of to only make checkpoint-interval-s
option effective if bottomless is enabled. Botomlessless instances
should also be able to checkpoint periodically.
@MarinPostma MarinPostma added this pull request to the merge queue Oct 17, 2023
Merged via the queue into main with commit 37b695b Oct 17, 2023
7 checks passed
@MarinPostma MarinPostma deleted the graft-sqld branch October 17, 2023 10:22
@MarinPostma MarinPostma restored the graft-sqld branch October 17, 2023 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet