
Introduce sharding rules to MongoDB collections #642

Closed
wants to merge 11 commits

Conversation

sejongk
Contributor

@sejongk sejongk commented Sep 13, 2023

What this PR does / why we need it:
This PR introduces sharding rules to MongoDB collections to distribute loads on the database cluster.
It takes reference from #472.

Which issue(s) this PR fixes:

Addresses #673

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Didn't break anything

@sejongk sejongk added the enhancement 🌟 New feature or request label Sep 13, 2023
@sejongk sejongk self-assigned this Sep 13, 2023
@sejongk sejongk marked this pull request as draft September 13, 2023 09:47
@codecov

codecov bot commented Sep 13, 2023

Codecov Report

Attention: 116 lines in your changes are missing coverage. Please review.

Comparison is base (940941a) 49.47% compared to head (bb175e1) 49.19%.
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #642      +/-   ##
==========================================
- Coverage   49.47%   49.19%   -0.28%     
==========================================
  Files          69       69              
  Lines        9951    10129     +178     
==========================================
+ Hits         4923     4983      +60     
- Misses       4512     4608      +96     
- Partials      516      538      +22     
Files Coverage Δ
server/backend/database/mongo/indexes.go 57.14% <ø> (ø)
server/backend/database/mongo/client.go 38.95% <39.58%> (-0.52%) ⬇️

... and 8 files with indirect coverage changes


@sejongk
Contributor Author

sejongk commented Sep 29, 2023

There are some issues I’ve been handling.

The first issue is how to guarantee the uniqueness of fields. In Yorkie, there is a requirement to enforce unique constraints on several field combinations (e.g. owner and name in the projects collection).

However, MongoDB does not support unique indexes across shards, except when the unique index contains the full shard key as a prefix (ref. https://www.mongodb.com/docs/manual/core/sharding-shard-key/). Even then, MongoDB enforces uniqueness on the full compound key, not on a single field. This follows from how sharding works in MongoDB: indexes are built and maintained per shard, not globally, so unique indexes are also enforced per shard. In addition, uniqueness is supported only for ranged shard keys, not for hashed ones.
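As a concrete sketch of this constraint (hypothetical mongosh commands; the ranged owner shard key here is for illustration only, not the key chosen below): a unique index can only be created when the full shard key is a prefix of it, and uniqueness is then enforced on the whole compound key.

// Ranged shard key (a hashed key cannot back a unique index).
sh.shardCollection("yorkie-meta.projects", { owner: 1 })
// Allowed: the unique index is prefixed by the full shard key { owner: 1 },
// so uniqueness is enforced on the (owner, name) pair.
db.projects.createIndex({ owner: 1, name: 1 }, { unique: true })
// Not allowed on this sharded collection: a unique index on name alone,
// since name does not contain the shard key as a prefix.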

The documentation suggests using a proxy collection for each combination that must be globally unique (ref. https://www.mongodb.com/docs/manual/tutorial/unique-constraints-on-arbitrary-fields/), so I implemented this method (see the commit "Introduce proxy collections to guarantee uniqueness"). It works, but the collection design becomes complex, the extra operations on the proxies are costly, and we would need a way to make those operations atomic.
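The tutorial's pattern is roughly the following (hypothetical collection and field names): keep a small unsharded proxy collection with a unique index per field combination, claim the combination in the proxy first, and insert the real document only if the claim succeeds.

// Proxy collection holding one document per (owner, name) pair.
db.project_owner_names.createIndex({ owner: 1, name: 1 }, { unique: true })
// Step 1: claim the pair; a duplicate-key error means it is already taken.
db.project_owner_names.insertOne({ owner: ownerID, name: "my-project" })
// Step 2: insert the real document only after step 1 succeeds. If step 2
// fails, the proxy document must be rolled back, which is why the two
// steps need to be executed atomically.
db.projects.insertOne({ owner: ownerID, name: "my-project" })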

Therefore, I've implemented an application-level lock to get rid of the proxy collections. Before an insertion, we check whether a document with the same combination already exists and create a new document only if it does not (using MongoDB's upsert). The application-level lock makes this check-and-insert atomic and prevents conflicts between server instances. The current in-memory lock does not support distributed locking; we could later replace it with Redis or a TTL-indexed MongoDB collection.
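The check-then-create step can be sketched as a single upsert (hypothetical mongosh; field names are for illustration). The surrounding application-level lock is what keeps concurrent servers from racing on the same combination:

// Held under the application-level lock for (owner, name).
// $setOnInsert writes the fields only when no matching document exists,
// so an existing document is left untouched and no duplicate is created.
db.projects.updateOne(
  { owner: ownerID, name: "my-project" },
  { $setOnInsert: { owner: ownerID, name: "my-project", created_at: new Date() } },
  { upsert: true }
)
// The result's upsertedId is set only when a new document was created;
// otherwise the caller knows the combination already exists.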

I'm going to run benchmarks to measure the performance trade-offs of each approach precisely.

The second issue is the choice of shard keys. I chose the following keys for good performance under the current query patterns. Note that ranged sharding on server_seq is used for the changes collection, because range queries over server_seq are frequent there.

sh.shardCollection("yorkie-meta.projects", { _id: "hashed" })
sh.shardCollection("yorkie-meta.users", { username: "hashed" })
sh.shardCollection("yorkie-meta.clients", { project_id: "hashed" })
sh.shardCollection("yorkie-meta.documents", { project_id: "hashed" })
sh.shardCollection("yorkie-meta.changes", { doc_id: "hashed", server_seq: 1 })
sh.shardCollection("yorkie-meta.snapshots", { doc_id: "hashed" })
sh.shardCollection("yorkie-meta.syncedseqs", { doc_id: "hashed" })

MongoDB provides balancing mechanisms for sharded clusters (ref. https://www.mongodb.com/docs/manual/core/sharding-data-partitioning/#range-migration, https://www.mongodb.com/docs/manual/core/sharding-balancer-administration/). The balancer process automatically migrates data when a sharded collection's data is unevenly distributed across the shards. See Migration Thresholds (ref. https://www.mongodb.com/docs/manual/core/sharding-balancer-administration/#std-label-sharding-migration-thresholds) for more details.
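For reference, standard mongosh helpers can be used to inspect how the balancer is behaving under these shard keys:

// Overall sharding status, including chunk distribution per shard.
sh.status()
// Whether the balancer is enabled, and whether a round is in progress.
sh.getBalancerState()
sh.isBalancerRunning()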

Any idea about these issues?

@sejongk sejongk marked this pull request as ready for review September 29, 2023 11:42
Member

@krapie krapie left a comment


Thank you for your contribution.
I left some small comments below.

The first thing is how to guarantee the uniqueness of fields. In the Yorkie, there is a requirement to force unique constraints on several field combinations (e.g. owner and name in the project collection).

I personally think distributed locking is the lowest-priority option we have, considering our current sharded cluster mode, since we are avoiding communication between server instances in that mode.

The proxy collection approach seems more attractive to me for now, since it is MongoDB's official suggestion and we already have some DB operations that require atomic execution.

But benchmarking will clearly show what is most suitable for our situation.

The second thing is about the shard key. I choose the following keys for good performance under the current query patterns. Note that the server_seq ranged-sharding is used for changes, because range query is frequently used for the changes.

I think we will have to continuously benchmark and tune the collection shard keys, starting from your selection.

}

info.ID = types.ID(result.UpsertedID.(primitive.ObjectID).Hex())
println("infoID!!", info.ID, result.UpsertedID)
Member


Seems like this is for testing purposes.
Consider removing this code later.

]
}
)

Member


Please add a newline here.

return err
}

// NOTE: If the project is already being created by another, it is
Member


What does this comment mean?
Could you give me an explanation for this?

@@ -1466,6 +1641,24 @@ func (c *Client) collection(
Collection(name, opts...)
}

func (c *Client) deleteProjectProxyInfo(
Member


Are we still using this function?
It seems like you have introduced a lock instead of the proxy collections.

@hackerwins hackerwins marked this pull request as draft November 23, 2023 01:45
@sejongk sejongk closed this Nov 23, 2023
@sejongk sejongk deleted the add-sharding-rule-to-mongodb branch November 23, 2023 14:52
Labels
enhancement 🌟 New feature or request
3 participants