
Ludicrous mode in Dgraph: eventual consistency #3903

Open
bjzhaoqing opened this issue Feb 16, 2022 · 3 comments
Labels
community (Source: who proposed the issue) · type/feature req (Type: feature request)

Comments

@bjzhaoqing
Ludicrous mode is available in Dgraph v20.03.1 and later.

Ludicrous mode allows a Dgraph database to ingest data at an incredibly fast speed, but with fewer guarantees. In normal mode, Dgraph provides strong consistency. In Ludicrous mode, Dgraph provides eventual consistency, so any mutation that succeeds should be available eventually. This means changes are applied more slowly during periods of peak data ingestion, and might not be immediately reflected in query results.

In Dgraph, every node in the Raft cluster is readable (both leader and followers).

In normal mode, Dgraph provides strong consistency: when a network partition occurs, minority nodes can neither write nor read.

But in Ludicrous mode, Dgraph provides eventual consistency: when a network partition occurs, minority nodes are not writable, but remain readable.
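
To make the distinction concrete, here is a minimal Go sketch of the two read paths (all types and names are hypothetical illustrations, not Dgraph or Nebula source code): a linearizable read that refuses to answer without a quorum, versus a stale local read that a minority-side follower can still serve.

```go
// Minimal sketch of quorum-gated vs. stale reads. Hypothetical types only.
package main

import (
	"errors"
	"fmt"
)

type Node struct {
	store       map[string]string // locally applied state machine
	peersAlive  int               // reachable members, including self
	clusterSize int
}

func (n *Node) hasQuorum() bool {
	return n.peersAlive > n.clusterSize/2
}

// LinearizableRead refuses to answer on the minority side of a partition:
// without a quorum the node cannot prove its state is up to date.
func (n *Node) LinearizableRead(key string) (string, error) {
	if !n.hasQuorum() {
		return "", errors.New("no quorum: cannot serve a consistent read")
	}
	return n.store[key], nil
}

// StaleRead returns whatever the node has applied locally. This is the
// eventually consistent behavior requested in this issue: still readable
// during a partition, but possibly behind the leader.
func (n *Node) StaleRead(key string) string {
	return n.store[key]
}

func main() {
	n := &Node{store: map[string]string{"k": "v1"}, peersAlive: 1, clusterSize: 3}
	if _, err := n.LinearizableRead("k"); err != nil {
		fmt.Println("linearizable:", err)
	}
	fmt.Println("stale:", n.StaleRead("k")) // may lag, but stays available
}
```

The trade-off is exactly the one described above: the stale path stays available during a partition, at the cost of possibly returning data that lags behind the leader.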

I would like to have this feature in Nebula Graph: when I only need eventual consistency, every follower node should be readable.

Take a Nebula cluster consisting of three machines as an example, where each machine runs a meta node, a graph node, and a storage node.

I would expect that when a network partition occurs, the majority nodes remain readable and writable, while the minority nodes are readable but not writable.

When two machines go down at the same time and cannot be recovered, I hope the one remaining machine has a way to break away from the original cluster and become a new single-node cluster providing read and write service as soon as possible. After all, this machine's meta and storage nodes hold a complete copy of the data, and we can scale the cluster back out online later.
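
What is being asked for here resembles the escape hatch some Raft-based systems already ship (e.g. etcd's `--force-new-cluster` flag): the survivor rewrites its membership to contain only itself, so quorum becomes 1. A hypothetical sketch of such an operation (not an existing Nebula command):

```go
// Hypothetical "break away" operation: the surviving node drops all lost
// peers from its Raft membership so quorum becomes 1 and service resumes.
package main

import "fmt"

type Membership struct {
	Self  string
	Peers []string
}

// ForceNewCluster rewrites the configuration to contain only this node.
// Safe only when the other machines are permanently lost.
func ForceNewCluster(m *Membership) {
	m.Peers = []string{m.Self}
	fmt.Printf("new single-node cluster: %v (quorum = 1)\n", m.Peers)
}

func main() {
	m := &Membership{Self: "node3", Peers: []string{"node1", "node2", "node3"}}
	ForceNewCluster(m)
}
```

The caveat with any such operation is that it is only safe when the lost machines are truly gone: if they ever come back, they must be wiped and rejoined as new members, otherwise they would form a split brain with the new cluster.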

There is also the possibility that, on the cloud provider we use, two of the three virtual machines in our Nebula cluster end up on the same underlying physical host.

In the current Nebula cluster, if two machines go down, it takes a long time for the business to recover. We care more about availability, especially read availability, so I need the features described above.

@wey-gu
Contributor

wey-gu commented Feb 18, 2022

Dear @bjzhaoqing, thanks a lot for composing the user story on this feature!

It's indeed meaningful for extreme machine-failure cases (2 nodes down).

BTW, on most IaaS providers an anti-affinity VM placement policy can be used to ensure machines sit in different fault sets (at least at the host-machine level), though the issue could still come from higher-level accidents (same switch, same power source, etc.).
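
For reference, here is one way such a policy can be requested, sketched with the AWS SDK for Go (the group name is made up, and other IaaS providers expose equivalent anti-affinity APIs): a "spread" placement group asks the provider to place each instance on distinct underlying hardware.

```go
// Sketch: create a "spread" placement group so instances launched into it
// land on different racks with separate network and power. Illustrative
// only; the group name below is hypothetical.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)

	_, err := svc.CreatePlacementGroup(&ec2.CreatePlacementGroupInput{
		GroupName: aws.String("nebula-anti-affinity"), // hypothetical name
		Strategy:  aws.String("spread"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("placement group created; launch the 3 Nebula VMs into it")
}
```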

Thanks!

cc @Sophie-Xie

@bjzhaoqing
Author

Dear @wey-gu

Thank you for your reply. I understand the anti-affinity VM placement policy, and it is very effective for small and medium customers. For a large customer like our company, the demand for machines is very large: for example, we may need 1,500 virtual machines, but when the IaaS provider has only 1,000 hosts, having two of our virtual machines on one host is inevitable. We have communicated with our suppliers about this. We need to notify cloud vendors of our machine needs in advance, because it takes them time to stock up; and when our business shrinks and we return machines, they cannot sell them to other customers in a short period of time. This is the reality we have to work with.

For various reasons, when we designed our Nebula Graph architecture we chose to span two data centers, for example Baidu Cloud and Huawei Cloud: in a three-node cluster, two nodes are located in Baidu Cloud and one in Huawei Cloud, and each application accesses the Nebula nodes in its local cloud.

According to our past failure statistics, dedicated-line problems (both the cloud vendors' and our own) account for the highest proportion of failures. In that case the applications in Baidu Cloud are unaffected, but the applications in Huawei Cloud can neither read nor write. Like most Internet businesses we care more about availability, so we do need minority nodes (meta, graph, and storage) to provide read-only service when a network partition occurs.

@bjzhaoqing
Author

Dear @wey-gu

Can this feature be done? If so, how long would the lead time be?

@Sophie-Xie added the community (Source: who proposed the issue) label on Mar 30, 2022