[Feature]: Make the library embeddable #577

Open
1 task done
AshleySchaeffer opened this issue Dec 14, 2023 · 2 comments
Labels
enhancement (New feature or request)

Comments

AshleySchaeffer commented Dec 14, 2023

Description of the feature

Hello,

I just came across Xline when on the hunt for a KV store that I can embed in my own Rust binaries. I don't really want to run a container/node for a persistent KV store, but I want to be able to share state between multiple instances of my own binaries.

Is this something you'd consider adding? Essentially, it would be like RocksDB et al., but with the ability to run it in a tokio runtime alongside my own code and interact with it just like the existing client does; under the hood it would just be calling methods instead of communicating with a remote node.

SurrealDB is embeddable as a library, but the license and corporate interest put me off. It also relies on database backends for persistence (e.g. TiKV); I'd rather it just handled persistence in RocksDB and did the distribution itself. Still, it's about as close as I've seen:

https://crates.io/crates/surrealdb

Code of Conduct

  • I agree to follow this project's Code of Conduct
AshleySchaeffer added the enhancement label Dec 14, 2023
Phoenix500526 (Collaborator) commented

Hi @AshleySchaeffer. From your description, it seems like what you need is a local KV database. Xline adopts CURP as its consensus protocol, with a focus on applications over WANs with high latency. We may not consider adding such a feature in the short term.

AshleySchaeffer commented Dec 19, 2023

Hey @Phoenix500526. While I totally appreciate that you may not wish to add the feature, I just wanted to clarify what my original request meant.

Today, if you want to build a distributed service with shared state across all of its instances, you might write a REST API using Rust's Axum framework and couple it with a database of some kind (e.g. xline-kv, MongoDB, Postgres, etc.) which maintains the state. All the instances of the Axum service you've written communicate across the network with the database to read and update state, and you pay the cost of those network round trips (as Xline points out in its marketing materials, latency is an issue). In addition, you now have two components to manage operationally: your Rust Axum service and the database. They must be monitored, scaled, managed, and maintained separately. In most modern setups, you'll do all of that within a container orchestration framework that may run these systems on the same nodes (hosts or servers) or across multiple nodes.
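For concreteness, the status quo looks roughly like this: a minimal sketch assuming an etcd-compatible client such as the etcd-client crate (Xline exposes an etcd-compatible API), with the endpoint and keys as placeholders:

```rust
// Minimal sketch of the "two systems" setup: the app talks to a separately
// deployed Xline/etcd cluster over the network for every read and write.
// Assumes the tokio and etcd-client crates.
use etcd_client::Client;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder endpoint; in practice this points at the external cluster.
    let mut client = Client::connect(["http://127.0.0.1:2379"], None).await?;

    // Each call is a network round trip to a second system that also has to
    // be deployed, monitored and scaled.
    client.put("greeting", "hello from the app", None).await?;
    let resp = client.get("greeting", None).await?;
    if let Some(kv) = resp.kvs().first() {
        println!("{} = {}", kv.key_str()?, kv.value_str()?);
    }
    Ok(())
}
```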

SurrealDB takes a slightly different approach. If you used it in the system described above, instead of having two systems you would have one. The REST API would be hosted inside SurrealDB itself (they provide a way to load code into an instance) and your code would interact with it locally (I'm not sure exactly how this works; I'd guess via a localhost interface, but I could be wrong). The latency cost of these communications is drastically reduced compared with the system above, and SurrealDB still does everything it would have been doing before (e.g. syncing state via consensus amongst its nodes); it's simply faster to communicate with. The big pro is that, operationally, you only have to manage one system.

The downside of SurrealDB's approach is that you are limited in which libraries and frameworks you can use. It also appears you may be restricted to WASM (I could be wrong). This could be a complete deal breaker for many.

What I propose is that instead of running the app in the database's ecosystem, the database runs in the app's ecosystem. To stick with my example above: if the database could be executed in the same context as the Axum-based REST API (i.e. within one or more tokio/OS threads, but in the same process), communicating with it would be as cheap as dispatching messages via channels (extremely low latency) or potentially direct memory access. You could compile the entire stack (the REST API and the database) into a single binary and have only one system to manage. The database would scale up and down with your app. You'd write code as you do today and manage the database at the code level, just as you manage database connections or any other in-process state today. The database library would simply spawn some threads to execute on, and do what it does today with regard to the consensus process.
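To make the "channels instead of sockets" point concrete, here's a purely illustrative, self-contained sketch. The store task below is just a HashMap behind a tokio mpsc channel, standing in for where the embedded Xline engine (storage plus CURP tasks) would run; none of it is Xline's actual API:

```rust
use std::collections::HashMap;
use tokio::sync::{mpsc, oneshot};

// Stand-in for the embedded database: in the real proposal this task would be
// the Xline engine (storage + consensus) running on its own tokio tasks.
enum Cmd {
    Put { key: String, value: Vec<u8> },
    Get { key: String, reply: oneshot::Sender<Option<Vec<u8>>> },
}

fn spawn_store() -> mpsc::Sender<Cmd> {
    let (tx, mut rx) = mpsc::channel::<Cmd>(64);
    tokio::spawn(async move {
        let mut data: HashMap<String, Vec<u8>> = HashMap::new();
        while let Some(cmd) = rx.recv().await {
            match cmd {
                Cmd::Put { key, value } => { data.insert(key, value); }
                Cmd::Get { key, reply } => { let _ = reply.send(data.get(&key).cloned()); }
            }
        }
    });
    tx
}

#[tokio::main]
async fn main() {
    let store = spawn_store();

    // "Writes" and "reads" are channel sends within the same process, not
    // network round trips to a separately deployed node.
    store.send(Cmd::Put { key: "greeting".into(), value: b"hello".to_vec() }).await.unwrap();

    let (reply_tx, reply_rx) = oneshot::channel();
    store.send(Cmd::Get { key: "greeting".into(), reply: reply_tx }).await.unwrap();
    let value = reply_rx.await.unwrap();
    println!("{:?}", value.map(|v| String::from_utf8_lossy(&v).into_owned()));
}
```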

The downsides to this are, in my opinion, almost completely mitigated. Some will say that segregation helps contain bugs: if the database crashes, your app is still live; only the state it depends on in the database can't be accessed (so it's technically not live, it's reporting errors). Some will also say that you can scale more granularly: if the database doesn't need more compute it need not be allocated any, you can scale the REST API separately to make things more efficient, and so on. My argument is that in modern orchestration frameworks, unless you've put effort into scheduling, your apps/APIs and database will likely run on the same node anyway, so there is no real point in adding the overhead of scheduling compute resources on top of the OS. You might get a level of resiliency against one component exhausting resources (e.g. k8s can limit CPU and RAM usage), but again I'd argue that situation is probably going to cause breakage/outages anyway.

In short, for Xline this would mean making a Rust crate that lets you start an instance of the database in the current process, providing a config that matches the config elements you already have, plus a handful of methods to get/put keys in much the same way you would with a HashMap.
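To be clear about the shape I'm imagining, here's a rough sketch. Everything in it is hypothetical (the embedded_xline crate, EmbeddedXline, XlineConfig, and every method are made up); it's only meant to show the ergonomics, not a real API:

```rust
// Hypothetical API sketch only: none of these types or methods exist today.
use embedded_xline::{EmbeddedXline, XlineConfig}; // made-up crate and types

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Reuse the config surface the server already exposes (node name, peers,
    // storage path, ...), just built in code rather than read from a file.
    let config = XlineConfig::builder()
        .name("node1")
        .data_dir("/var/lib/xline")
        .peers(["10.0.0.2:2380", "10.0.0.3:2380"])
        .build();

    // Would spawn the storage engine and CURP tasks onto the current runtime.
    let db = EmbeddedXline::start(config).await?;

    // HashMap-like access; no network hop for the local node.
    db.put("feature-flag", "on").await?;
    if let Some(value) = db.get("feature-flag").await? {
        println!("feature-flag = {}", String::from_utf8_lossy(&value));
    }

    db.shutdown().await?;
    Ok(())
}
```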

Does that make sense? I believe it aligns with the goals of Xline (high performance in high-latency scenarios), but again, I completely understand if you'd prefer to focus elsewhere or disagree completely 😄

If you disagree, feel free to close the issue. No hard feelings!
