
Ludicrous mode in Dgraph: eventual consistency #3903

Open
bjzhaoqing opened this issue Feb 16, 2022 · 3 comments
Labels
community (Source: who proposed the issue) · type/feature req (Type: feature request)

Comments

@bjzhaoqing
Ludicrous mode is available in Dgraph v20.03.1 and later.

Ludicrous mode allows a Dgraph database to ingest data at an incredibly fast speed, but with fewer guarantees. In normal mode, Dgraph provides strong consistency. In Ludicrous mode, Dgraph provides eventual consistency, so any mutation that succeeds should be available eventually. This means changes are applied more slowly during periods of peak data ingestion, and might not be immediately reflected in query results.

In Dgraph, every node in the Raft cluster is readable (both leader and followers).

In normal mode, Dgraph provides strong consistency: when a network partition occurs, minority nodes can neither write nor read.

But in Ludicrous mode, Dgraph provides eventual consistency: when a network partition occurs, minority nodes are not writable, but remain readable.
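
To make the distinction concrete, here is a minimal Go sketch of the two read paths (all types and names are hypothetical illustrations, not Dgraph or Nebula source code): a linearizable read that refuses to answer without a quorum, versus a stale local read that a minority-side follower can still serve.

```go
// Minimal sketch of quorum-gated vs. stale reads. Hypothetical types only.
package main

import (
	"errors"
	"fmt"
)

type Node struct {
	store       map[string]string // locally applied state machine
	peersAlive  int               // reachable members, including self
	clusterSize int
}

func (n *Node) hasQuorum() bool {
	return n.peersAlive > n.clusterSize/2
}

// LinearizableRead refuses to answer on the minority side of a partition:
// without a quorum the node cannot prove its state is up to date.
func (n *Node) LinearizableRead(key string) (string, error) {
	if !n.hasQuorum() {
		return "", errors.New("no quorum: cannot serve a consistent read")
	}
	return n.store[key], nil
}

// StaleRead returns whatever the node has applied locally. This is the
// eventually consistent behavior requested in this issue: still readable
// during a partition, but possibly behind the leader.
func (n *Node) StaleRead(key string) string {
	return n.store[key]
}

func main() {
	n := &Node{store: map[string]string{"k": "v1"}, peersAlive: 1, clusterSize: 3}
	if _, err := n.LinearizableRead("k"); err != nil {
		fmt.Println("linearizable:", err)
	}
	fmt.Println("stale:", n.StaleRead("k")) // may lag, but stays available
}
```

The trade-off is exactly the one described above: the stale path stays available during a partition, at the cost of possibly returning data that lags behind the leader.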

I would like to have this feature in Nebula Graph: when I only need eventual consistency, every follower node should be readable.

Take a Nebula cluster consisting of three machines as an example, where each machine runs a meta node, a graph node, and a storage node.

I would expect that when a network partition occurs, the majority nodes remain readable and writable, while the minority nodes are readable but not writable.

When two machines go down at the same time and cannot be recovered, I hope the one remaining machine has a way to break away from the original cluster and become a new single-node cluster providing read and write service as soon as possible. After all, this machine's meta and storage nodes hold a complete copy of the data, and we can scale the cluster back out online later.
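
What is being asked for here resembles the escape hatch some Raft-based systems already ship (e.g. etcd's `--force-new-cluster` flag): the survivor rewrites its membership to contain only itself, so quorum becomes 1. A hypothetical sketch of such an operation (not an existing Nebula command):

```go
// Hypothetical "break away" operation: the surviving node drops all lost
// peers from its Raft membership so quorum becomes 1 and service resumes.
package main

import "fmt"

type Membership struct {
	Self  string
	Peers []string
}

// ForceNewCluster rewrites the configuration to contain only this node.
// Safe only when the other machines are permanently lost.
func ForceNewCluster(m *Membership) {
	m.Peers = []string{m.Self}
	fmt.Printf("new single-node cluster: %v (quorum = 1)\n", m.Peers)
}

func main() {
	m := &Membership{Self: "node3", Peers: []string{"node1", "node2", "node3"}}
	ForceNewCluster(m)
}
```

The caveat with any such operation is that it is only safe when the lost machines are truly gone: if they ever come back, they must be wiped and rejoined as new members, otherwise they would form a split brain with the new cluster.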

There is also the possibility that, on the cloud provider we use, two of the three virtual machines in our Nebula cluster end up on the same underlying physical host.

In the current Nebula cluster, if two machines go down, it takes a long time for the business to recover. We care more about availability, especially read availability, so I need the features described above.

@wey-gu
Contributor

wey-gu commented Feb 18, 2022

Dear @bjzhaoqing, thanks a lot for composing the user story on this feature!

It's indeed meaningful for extreme machine-failure cases (2 nodes down).

BTW, on most IaaS providers an anti-affinity VM placement policy can be used to ensure machines sit in different fault sets (at least at the host-machine level), though the issue could still come from higher-level accidents (same switch, same power source, etc.).
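
For reference, here is one way such a policy can be requested, sketched with the AWS SDK for Go (the group name is made up, and other IaaS providers expose equivalent anti-affinity APIs): a "spread" placement group asks the provider to place each instance on distinct underlying hardware.

```go
// Sketch: create a "spread" placement group so instances launched into it
// land on different racks with separate network and power. Illustrative
// only; the group name below is hypothetical.
package main

import (
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	svc := ec2.New(sess)

	_, err := svc.CreatePlacementGroup(&ec2.CreatePlacementGroupInput{
		GroupName: aws.String("nebula-anti-affinity"), // hypothetical name
		Strategy:  aws.String("spread"),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("placement group created; launch the 3 Nebula VMs into it")
}
```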

Thanks!

cc @Sophie-Xie

@bjzhaoqing
Author

Dear @wey-gu

Thank you for your reply. I understand the anti-affinity VM placement policy, and it is very effective for small and medium customers. For a large customer like our company, the demand for machines is very large: for example, we may need 1,500 virtual machines, but when the IaaS provider has only 1,000 hosts, having two of our virtual machines on one host is inevitable. We have communicated with our suppliers about this. We need to notify cloud vendors of our machine needs in advance, because it takes them time to stock up; and when our business shrinks and we return machines, they cannot sell them to other customers in a short period of time. This is the reality we have to work with.

For various reasons, when we designed our Nebula Graph architecture we chose to span two data centers, for example Baidu Cloud and Huawei Cloud: in a three-node cluster, two nodes are located in Baidu Cloud and one in Huawei Cloud, and each application accesses the Nebula nodes in its local cloud.

According to our past failure statistics, dedicated-line problems (both the cloud vendors' and our own) account for the highest proportion of failures. In that case the applications in Baidu Cloud are unaffected, but the applications in Huawei Cloud can neither read nor write. Like most Internet businesses we care more about availability, so we do need minority nodes (meta, graph, and storage) to provide read-only service when a network partition occurs.

@bjzhaoqing
Author

Dear @wey-gu

Can this feature be done? If so, how long would the lead time be?

@Sophie-Xie added the community (Source: who proposed the issue) label on Mar 30, 2022