Introduce MongoDB sharding rules to Client-wide collections #768

Closed
wants to merge 7 commits into from

Conversation

sejongk
Contributor

@sejongk sejongk commented Jan 19, 2024

What this PR does / why we need it:
This commit introduces MongoDB sharding rules to clients and client-wide collections (e.g. syncedSeqs).

Some fields continue to use client_id instead of (client_key, client_id):

  • actor_id in the changes and syncedSeqs collections
  • owner in the documents collection
  • publisher in DocEvent of the pubsub package

Currently, Yorkie uses the MongoDB ObjectId as the client_id.
According to the official MongoDB post, an ObjectId is 12 bytes long (4-byte timestamp + 3-byte machine ID + 2-byte process ID + 3-byte counter), so duplicates are unlikely in practical use cases. (Newer MongoDB versions replace the machine ID and process ID fields with a single 5-byte random value, which does not change this argument.)
Furthermore, the chance of a duplicate actor_id or publisher within a single document is even lower.
Since Yorkie is not currently expected to go beyond these practical use cases, it seems reasonable to keep client_id as the reference key for the fields listed above.
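
To make the byte layout concrete, here is a minimal sketch that splits a hex-encoded ObjectId into the timestamp, the middle five bytes, and the counter. It uses only the Go standard library; `objectIDParts`, `splitObjectID`, and the sample ID are hypothetical names and values for illustration, not part of Yorkie or the MongoDB driver.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
	"time"
)

// objectIDParts is a hypothetical view of a 12-byte ObjectId.
type objectIDParts struct {
	Timestamp time.Time // bytes 0..3: seconds since the Unix epoch, big-endian
	Middle    [5]byte   // bytes 4..8: machine ID + process ID in the layout quoted above
	Counter   uint32    // bytes 9..11: 3-byte big-endian counter
}

// splitObjectID decodes a 24-character hex string into its parts.
func splitObjectID(hexID string) (objectIDParts, error) {
	raw, err := hex.DecodeString(hexID)
	if err != nil {
		return objectIDParts{}, err
	}
	if len(raw) != 12 {
		return objectIDParts{}, errors.New("an ObjectId must be exactly 12 bytes")
	}
	var p objectIDParts
	secs := binary.BigEndian.Uint32(raw[0:4])
	p.Timestamp = time.Unix(int64(secs), 0).UTC()
	copy(p.Middle[:], raw[4:9])
	p.Counter = uint32(raw[9])<<16 | uint32(raw[10])<<8 | uint32(raw[11])
	return p, nil
}

func main() {
	// The sample ID is made up for this example.
	p, err := splitObjectID("65aa3f1b0123456789abcdef")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Timestamp, p.Middle, p.Counter)
}
```

Two IDs collide only if all three parts match, which is why a collision requires the same second, the same generator, and the same counter value.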

In addition, client_key currently has no use case, not even for the owner field in the documents collection, so it seems possible to remove the client_key field from clients and client-wide collections.
We could then use client_id as the shard key instead (with a unique constraint), eliminating the risk of duplicate client_id values.
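
The reasoning behind this relies on how MongoDB enforces uniqueness in a sharded collection: a unique index must be prefixed by the shard key, because each shard can only check its own documents. The following is a rough in-memory simulation of that behavior, not MongoDB code. It assumes the clients collection is currently sharded by client_key; `clientDoc`, `cluster`, and the routing hash are all hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// clientDoc is a hypothetical, minimal stand-in for a clients record.
type clientDoc struct {
	ClientKey string
	ClientID  string
}

// cluster simulates shards that each enforce a unique client_id locally.
type cluster struct {
	shards   []map[string]bool      // per-shard set of stored client_ids
	shardKey func(clientDoc) string // field used to route a document
}

func newCluster(n int, shardKey func(clientDoc) string) *cluster {
	c := &cluster{shards: make([]map[string]bool, n), shardKey: shardKey}
	for i := range c.shards {
		c.shards[i] = map[string]bool{}
	}
	return c
}

// route picks a shard by hashing the shard key value (FNV-1a, for illustration).
func (c *cluster) route(d clientDoc) int {
	h := fnv.New32a()
	h.Write([]byte(c.shardKey(d)))
	return int(h.Sum32() % uint32(len(c.shards)))
}

// insert returns false when the target shard already holds the client_id.
func (c *cluster) insert(d clientDoc) bool {
	s := c.shards[c.route(d)]
	if s[d.ClientID] {
		return false
	}
	s[d.ClientID] = true
	return true
}

func main() {
	a := clientDoc{ClientKey: "key-0", ClientID: "same-id"}
	b := clientDoc{ClientKey: "key-1", ClientID: "same-id"}

	byKey := newCluster(4, func(d clientDoc) string { return d.ClientKey })
	fmt.Println("shard by client_key:", byKey.insert(a), byKey.insert(b))

	byID := newCluster(4, func(d clientDoc) string { return d.ClientID })
	fmt.Println("shard by client_id: ", byID.insert(a), byID.insert(b))
}
```

When routing by client_key, the two documents land on different shards and the duplicate client_id goes unnoticed. When routing by client_id, both go to the same shard, so the local uniqueness check catches the second insert.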

Moreover, there are alternatives for handling duplicate client_ids:

  • Introduce a UUID generator into the Yorkie cluster
  • Use actor_id as a combination of client_key and client_id
    • This may increase the size of CRDT metadata and the size of a Document snapshot.
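
As a rough illustration of both alternatives, the sketch below generates a random (version 4) UUID with the standard library and builds a composite actor ID by concatenating client_key and client_id. `newUUIDv4` and `compositeActorID` are hypothetical helpers, and the concatenation is only one possible encoding; neither reflects how Yorkie would actually implement these options.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newUUIDv4 returns a random (version 4) UUID string built from crypto/rand.
func newUUIDv4() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version field to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// compositeActorID concatenates client_key and a 12-byte client_id.
// This encoding is an assumption made for the size comparison only.
func compositeActorID(clientKey string, clientID [12]byte) []byte {
	out := make([]byte, 0, len(clientKey)+len(clientID))
	out = append(out, clientKey...)
	return append(out, clientID[:]...)
}

func main() {
	u, err := newUUIDv4()
	if err != nil {
		panic(err)
	}
	fmt.Println("uuid:", u)

	var id [12]byte
	composite := compositeActorID("my-client-key", id)
	fmt.Println("plain actor_id bytes:    ", len(id))
	fmt.Println("composite actor_id bytes:", len(composite))
}
```

The composite ID grows by the length of client_key for every actor reference, which is the metadata and snapshot size concern noted above; a raw UUID would be 16 bytes compared with 12 for an ObjectId.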

Which issue(s) this PR fixes:

Addresses #673

Special notes for your reviewer:

Does this PR introduce a user-facing change?:


Additional documentation:


Checklist:

  • Added relevant tests or not required
  • Didn't break anything

@sejongk sejongk marked this pull request as draft January 19, 2024 09:39

codecov bot commented Jan 19, 2024

Codecov Report

Attention: 31 lines in your changes are missing coverage. Please review.

Comparison is base (7b3df9d) 50.73% compared to head (f07f121) 50.73%.

| Files | Patch % | Lines |
|---|---|---|
| client/client.go | 0.00% | 8 Missing ⚠️ |
| server/backend/database/mongo/client.go | 60.00% | 8 Missing ⚠️ |
| server/backend/database/client_info.go | 0.00% | 5 Missing ⚠️ |
| server/rpc/yorkie_server.go | 82.75% | 4 Missing and 1 partial ⚠️ |
| api/types/resource_ref_key.go | 0.00% | 2 Missing ⚠️ |
| server/backend/database/memory/database.go | 80.00% | 0 Missing and 2 partials ⚠️ |
| server/backend/housekeeping/housekeeping.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #768   +/-   ##
=======================================
  Coverage   50.73%   50.73%           
=======================================
  Files          70       70           
  Lines       10234    10274   +40     
=======================================
+ Hits         5192     5213   +21     
- Misses       4517     4536   +19     
  Partials      525      525           


sejongk and others added 2 commits January 19, 2024 19:59
This commit introduces MongoDB sharding rules to documents and
document-wide collections (e.g. Changes, SyncedSeqs, Snapshots).

Changes:
- Introduce RefKey to represent ID and shard key together instead of ID.
- Change the reference key of Users from _id to username.
- Introduce encoders for types.ID, time.ActorID and ClientDocInfo to MongoDB.
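
A minimal sketch of the RefKey idea described in this commit message: carry the shard key together with the ID so that every lookup can include the shard key and be routed to a single shard. The `RefKey` type, its field names, and the `Filter` helper are hypothetical here and are not taken from the Yorkie codebase.

```go
package main

import "fmt"

// RefKey is a hypothetical reference that pairs an ID with its shard key.
type RefKey struct {
	ShardKey string // value of the collection's shard key field
	ID       string // the document's own identifier
}

// String renders the key for logs and error messages.
func (r RefKey) String() string {
	return fmt.Sprintf("%s.%s", r.ShardKey, r.ID)
}

// Filter builds a query filter containing both the shard key field and _id.
// shardField is the (assumed) name of the shard key field in the collection.
func (r RefKey) Filter(shardField string) map[string]string {
	return map[string]string{shardField: r.ShardKey, "_id": r.ID}
}

func main() {
	ref := RefKey{ShardKey: "project-1", ID: "65aa3f1b0123456789abcdef"}
	fmt.Println(ref)
	fmt.Println(ref.Filter("project_id"))
}
```

A lookup by ID alone would have to be broadcast to every shard, whereas a filter that also contains the shard key can be targeted.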
@sejongk sejongk force-pushed the introduce-sharding-rules-to-clients-collection branch from f8f61f2 to 2788be4 Compare January 22, 2024 05:10
@sejongk sejongk force-pushed the introduce-sharding-rules-to-clients-collection branch from 2788be4 to 8d00c78 Compare January 22, 2024 05:29
sejongk and others added 3 commits January 22, 2024 17:11
This commit rolls back the MongoDB sharding rules for the `users`
collection to avoid unnecessary collection sharding. Like the
`projects` collection, the `users` collection is expected to store
a relatively small amount of data.
@sejongk sejongk marked this pull request as ready for review January 22, 2024 11:20
@sejongk sejongk changed the title Introduce MongoDB sharding rules to clients collection Introduce MongoDB sharding rules to Client-wide collections Jan 22, 2024
@hackerwins hackerwins closed this Jan 25, 2024