
Clustering capabilities (write master w/ read-only slave followers) #136

Open
yuer1727 opened this issue Apr 29, 2019 · 21 comments
@yuer1727

Hello valeriansaliou:
Do you have any plan to develop a multi-node version? I mean that high availability and scalability are important for the project to become popular and to be used in production environments.

@valeriansaliou
Owner

No, not planned.

@valeriansaliou
Owner

Though if you're interested, I'm still open to contributions, provided you can work it out in a simple way.

@toxuin

toxuin commented May 23, 2019

Here's a simple idea for clustering, correct me if I am missing something:

Implement a "proxy" in front of 2+ nodes of sonic which would do two things:

  • Maintain channel connections to the nodes, proxying requests from it's own clients. Ingest requests go to EVERY node, Search requests go to a random node (round robin?).
  • Have a "snapshot" of either the whole index or replay-log on file in order to seed a new cluster node with data when it joins cluster

Bonus badass points for every node being able to act as such a proxy, elected by a consensus algorithm, instead of an appointed-once proxy (which would become a single point of failure).
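
A rough sketch of that proxy, assuming a plain line-oriented exchange with each node over Sonic Channel (node addresses are placeholders; session start, authentication and error handling are left out):

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::TcpStream;

// Hypothetical proxy: ingest commands fan out to every node, search commands
// go to a single node picked round-robin.
struct SonicProxy {
    nodes: Vec<String>, // e.g. ["10.0.0.1:1491", "10.0.0.2:1491"]
    next: usize,        // round-robin cursor for search traffic
}

impl SonicProxy {
    // Send one command line to a node and read back one response line
    // (framing is simplified compared to a real Sonic Channel session).
    fn send(addr: &str, command: &str) -> std::io::Result<String> {
        let mut stream = TcpStream::connect(addr)?;
        stream.write_all(command.as_bytes())?;
        stream.write_all(b"\n")?;

        let mut line = String::new();
        BufReader::new(stream).read_line(&mut line)?;
        Ok(line)
    }

    // PUSH / POP / FLUSH* must reach every node to keep them identical.
    fn ingest(&self, command: &str) -> std::io::Result<()> {
        for node in &self.nodes {
            Self::send(node, command)?;
        }
        Ok(())
    }

    // QUERY / SUGGEST only needs one node; rotate to spread the load.
    fn search(&mut self, command: &str) -> std::io::Result<String> {
        let node = &self.nodes[self.next % self.nodes.len()];
        self.next = self.next.wrapping_add(1);
        Self::send(node, command)
    }
}
```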

@valeriansaliou
Owner

valeriansaliou commented May 24, 2019

That would be a pragmatic solution, but I'd be more open to a built-in replication system in Sonic, which could use the Sonic Channel protocol to 'SYNC' between nodes.

With the proxy solution, how would you implement synchronization of 'lost' commands when a given node goes offline for a few seconds? Same as for new nodes, with a full replay from the point it lost synchronization? A replay method would also imply that the proxy stores all executed commands over time, and ideally periodically 'compacts' them, e.g. when a PUSH is followed by a FLUSHB or FLUSHO that cancels out the earlier PUSH.

A built-in Sonic clustering protocol could use database 'snapshots' to handle recoveries (I think RocksDB natively supports this), and only re-synchronize the latest part that was lost on a given node due to downtime. I'll read more on how Redis and others do this, as they have native clustering.
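
A sketch of that compaction step, over a hypothetical replay log of simplified PUSH/FLUSHB/FLUSHO entries (these shapes are illustrative, not Sonic's actual structures):

```rust
use std::collections::HashSet;

// Simplified replay-log entries.
#[derive(Clone, Debug)]
enum Command {
    Push { collection: String, bucket: String, object: String, text: String },
    FlushB { collection: String, bucket: String },
    FlushO { collection: String, bucket: String, object: String },
}

// Drop every PUSH that a later FLUSHB or FLUSHO cancels out, so a node
// catching up only replays commands that still have an effect.
fn compact(log: &[Command]) -> Vec<Command> {
    let mut flushed_buckets: HashSet<(String, String)> = HashSet::new();
    let mut flushed_objects: HashSet<(String, String, String)> = HashSet::new();
    let mut kept = Vec::new();

    // Walk backwards so flushes that happen later in time are known before
    // the pushes they would cancel.
    for command in log.iter().rev() {
        match command {
            Command::FlushB { collection, bucket } => {
                flushed_buckets.insert((collection.clone(), bucket.clone()));
                kept.push(command.clone());
            }
            Command::FlushO { collection, bucket, object } => {
                flushed_objects.insert((collection.clone(), bucket.clone(), object.clone()));
                kept.push(command.clone());
            }
            Command::Push { collection, bucket, object, .. } => {
                let bucket_key = (collection.clone(), bucket.clone());
                let object_key = (collection.clone(), bucket.clone(), object.clone());

                if !flushed_buckets.contains(&bucket_key) && !flushed_objects.contains(&object_key) {
                    kept.push(command.clone());
                }
            }
        }
    }

    kept.reverse();
    kept
}
```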

@valeriansaliou
Owner

A nice side effect of clustering is that one could build a large Sonic-based infrastructure, with several "ingestor" nodes and several "search" nodes.

@toxuin

toxuin commented May 24, 2019

I like the things you are proposing!

In any shape, clustering for fault tolerance is a requirement for production use – at least for us.

May I suggest re-opening this issue, even just as a roadmap item?

@valeriansaliou
Owner

Sure, leaving it open now.

@valeriansaliou valeriansaliou changed the title any plan to develop distributed version? Clustering capabilities May 24, 2019
@valeriansaliou valeriansaliou added the feature Anything that is related to a new feature label May 24, 2019
@benjamincburns

Possibly relevant: Rocksplicator (article, repo).

I don't know how directly useful this would be as-is, but given that it's billed as a set of libraries for clustering RocksDB-backed services, I'd imagine at the very least there are some good lessons learned re: clustering strategy here.

@valeriansaliou valeriansaliou added this to the v2.0.0 milestone Jul 8, 2019
@SINHASantos

SINHASantos commented Sep 16, 2019

I think the Raft Consensus Algorithm would be a great addition to Sonic.

Actix Raft could be integrated with Sonic to add clustering capability.

Also, the distributed search idea could be borrowed from Bleve, which is used by Couchbase.

@SINHASantos

We see many products built on RocksDB use Raft successfully (CockroachDB, ArangoDB, etc.).

@valeriansaliou valeriansaliou changed the title Clustering capabilities Clustering capabilities (write master w/ read-only slave followers) Jun 19, 2020
@valeriansaliou valeriansaliou self-assigned this Jun 19, 2020
@valeriansaliou
Owner

For simplicity's sake, the replication protocol will be inspired by the Redis replication protocol: a primary write master, followed by N read slaves (i.e. read-only). If the master goes down, reads remain available on the slaves but writes are rejected. When the master is recovered, the slaves catch up to the master binlog and writes can be resumed by the connected libraries (possibly from a queue).
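
A minimal sketch of the write-gating a read-only slave would do; the role enum and the command classification are assumptions for illustration, not existing Sonic internals:

```rust
// Ingest commands are refused unless this node currently holds the master
// role; connected libraries can queue them and retry once a master is back.
#[derive(PartialEq)]
enum ReplicationRole {
    Master,
    Slave,
}

fn handle_command(role: &ReplicationRole, command: &str) -> Result<&'static str, &'static str> {
    let is_write = command.starts_with("PUSH")
        || command.starts_with("POP")
        || command.starts_with("FLUSH");

    if is_write && *role != ReplicationRole::Master {
        return Err("ERR read-only replica, writes are rejected");
    }

    Ok("OK")
}
```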

@SINHASantos

My two cents: you may consider the CRDT model used by Redis for Active-Active replication.

  1. Active-Active database replication ensures writes are never rejected
  2. All the servers are used, so you make the most of every server
  3. Since there are no transactions and it is only search data, there is no risk of data loss

@Tenzer

Tenzer commented Oct 24, 2022

As a stop-gap workaround for the missing clustering functionality, would it be possible to use a distributed file system for synchronizing changes between multiple Sonic instances, and then (via a setting somewhere) configure only one of the Sonic instances to have write access, leaving the other ones read-only?

I don't know if it would work to have multiple Sonic instances reading from the same files on disk, or what would happen to one Sonic instance when another instance writes to the files, but if that part works somewhat, it should "only" be a matter of implementing a read-only setting, I think.

@valeriansaliou
Owner

That would unfortunately not work, as RocksDB holds a LOCK file on the file system. I'm unsure if RocksDB can be started in RO mode, ignoring any LOCK file. Can you provide more references on that? If it's easily done, then it's a quick win.

However, properly-done clustering needs full data replication, meaning that a second, independent server holds a full copy of the data on a different file system. That way, if the master fails, the slave can be elected as master, and the old master can be re-synchronized from the old slave in case of total data loss on one replica.

@Tenzer

Tenzer commented Oct 24, 2022

It looks like RocksDB can be opened in read-only mode. There's a page here that specifically talks about this: https://github.com/facebook/rocksdb/wiki/Read-only-and-Secondary-instances.
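
For reference, both modes described on that wiki page are exposed by the rust-rocksdb crate; a minimal sketch with placeholder paths (recent crate versions, no error handling beyond `?`):

```rust
use rocksdb::{Options, DB};

fn open_readers() -> Result<(), rocksdb::Error> {
    let opts = Options::default();

    // Read-only instance: guaranteed not to modify the database, and sees
    // the data as of open time.
    let read_only = DB::open_for_read_only(&opts, "/var/lib/sonic/store/kv/default", false)?;
    let _ = read_only.get(b"some_key")?;

    // Secondary instance: keeps its own private directory and can
    // periodically catch up with the primary's WAL and SST files.
    let secondary = DB::open_as_secondary(
        &opts,
        "/var/lib/sonic/store/kv/default",   // primary's path (e.g. on a shared file system)
        "/var/lib/sonic/store/kv/secondary", // this instance's private path
    )?;
    secondary.try_catch_up_with_primary()?;

    Ok(())
}
```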

@valeriansaliou
Owner

Excellent, that can work then.

@SINHASantos

SINHASantos commented Oct 26, 2022

Maybe support databases like TiDB (built on RocksDB) and extend the features of TiDB Cluster / replication.

@charandas

Would it be worth moving this to the 1.5.0 milestone? I am open to taking a look at this and sending a PR based on this comment.

@valeriansaliou
Owner

Sure! Totally open to moving this to an earlier milestone. Open to PRs on this, thank you so much!

@charandas

I looked at the code a bit this evening. As I understand it, all the DB references (whether KV or FST) are opened lazily using the acquire functions in StoreGenericPool, which the specific implementations call when there is an executor action requiring the use of the store. The Tasker also accesses the stores lazily when doing its tasks.

I am assuming it's straightforward enough to utilize the RocksDB secondary replicas, as they allow catch-up. When I say read-only below, I mean the Sonic replica, not the RocksDB read-only replica.

My main question for now has to do with syncing the FST store: that doesn't come for free like RocksDB syncing does. The executor currently keeps the KV and FST stores in sync by performing pushes/pops/other operations on both stores in tandem. The writer will continue to do so, but for readers, should there be an evented system that broadcasts all successfully completed write operations to them? On one hand, this approach sounds really complicated to implement, and I wonder if constructing the FST by looking up all the keys in the synced KV store is an option on the reader's side (see the sketch below)?

Anyone else here who has thoughts about this? (I am a newbie 😄 in both Rust and Sonic).
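
A sketch of that rebuild-from-KV option, assuming the terms to index can be enumerated as keys of the replicated KV store (paths and layout are placeholders). It relies on RocksDB iterating keys in sorted byte order, which is exactly the insertion order fst::SetBuilder requires:

```rust
use fst::SetBuilder;
use rocksdb::{IteratorMode, Options, DB};

// Rebuild an FST file on the reader side by streaming keys out of the
// already-replicated KV store (recent rust-rocksdb, where the iterator
// yields Result items).
fn rebuild_fst(kv_path: &str, fst_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    let kv = DB::open_for_read_only(&Options::default(), kv_path, false)?;

    let writer = std::io::BufWriter::new(std::fs::File::create(fst_path)?);
    let mut builder = SetBuilder::new(writer)?;

    for item in kv.iterator(IteratorMode::Start) {
        let (key, _value) = item?;
        // Keys arrive in sorted order, as fst::SetBuilder requires.
        builder.insert(&key)?;
    }

    builder.finish()?;
    Ok(())
}
```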

@valeriansaliou
Owner

Hello there,

On the FST side of things, as FSTs are guaranteed to be limited in max size per file, we could think about having a lightweight replication protocol work out insertions/deletions in the FST. If the replication loses sync, we could stream the modified FST files from the master disk to the slave disk, using checksums for comparison (see the sketch below). Note however that, to guarantee integrity, this would also require sending the not-yet-consolidated FST operations to the slave, so that the next CONSOLIDATE on master and slave results in the same FST file being updated with the same pending operations.
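
A sketch of the checksum comparison, using a dependency-free FNV-1a hash as a stand-in for whatever digest the real protocol would pick:

```rust
// Hash an FST file so master and slave can cheaply compare copies and decide
// whether the file needs to be streamed again.
fn fst_checksum(path: &str) -> std::io::Result<u64> {
    let bytes = std::fs::read(path)?;

    // FNV-1a over the file contents (stable across machines and runs).
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    for byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
    }

    Ok(hash)
}

// Example decision on the master side (paths are placeholders):
fn needs_restream(master_path: &str, slave_reported_sum: u64) -> std::io::Result<bool> {
    Ok(fst_checksum(master_path)? != slave_reported_sum)
}
```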

Does that make sense?
