Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ActorID type with UUID or Long instead of String to reduce the size of CRDT meta #36

Closed
hackerwins opened this issue Mar 30, 2020 · 12 comments · Fixed by #145
Closed
Labels
enhancement 🌟 New feature or request good first issue 🐤 Good for newcomers

Comments

@hackerwins
Copy link
Member

hackerwins commented Mar 30, 2020

The sizes of the types:

Name Binary Size String Size Features
[UUID] 16 bytes 36 chars configuration free, not sortable
[shortuuid] 16 bytes 22 chars configuration free, not sortable
[Snowflake] 8 bytes up to 20 chars needs machine/DC configuration, needs central server, sortable
[ObjectID] 12 bytes 24 chars <-- currently we use configuration free, sortable
xid 12 bytes 20 chars configuration free, sortable
uint64 8 bytes N/A configuration free, needs central server, sortable
@hackerwins hackerwins added the enhancement 🌟 New feature or request label Apr 11, 2020
@hackerwins hackerwins added the good first issue 🐤 Good for newcomers label May 16, 2020
@computerphilosopher
Copy link
Contributor

Could I try this issue?

@hackerwins
Copy link
Member Author

@computerphilosopher Sure. :)

@dc7303
Copy link
Member

dc7303 commented Nov 30, 2020

@hackerwins
The current ActorID is checked by the _id value of the clients collection in mongoDB.
Are you thinking of adding the actorID field to the clients collection to fix this issue?

@hackerwins
Copy link
Member Author

@dc7303

Yes. If we use a type other than ObjectID, we will need to add a new field to the clients collection. Looking at it again, choosing a UUID will increase its size more than it is now.

The sizes of the types listed in this issue are:

Even if we use uint64, the actorID is expected to decrease by only 33%. We need to calculate how much it reduces in the snapshot.

@autumn-n
Copy link
Contributor

@hackerwins Could I try this if still alive?

@hackerwins
Copy link
Member Author

@autumn-n Sure. If you have any questions, please feel free to comment.

@autumn-n
Copy link
Contributor

@hackerwins

Thank you :) It's the first time with Golang, MongoDB, but I'll give it a try.

Before getting started, I'd like to hear your thoughts on the below.

  1. Is it breaking backward compatibility, right?
  2. What's a proper name for the new field in ClientInfo? I have no idea but ID, Key. :(
  3. MongoDB doesn't support unsigned integers due to BSON spec. So I got a fail (overflow) when I try to store the value of rand.uint64() to DB. Any ideas?

@hackerwins
Copy link
Member Author

  1. Is it breaking backward compatibility, right?

Yes, this breaks backward compatibility.

  1. What's a proper name for the new field in ClientInfo? I have no idea but ID, Key. :(

I think id is appropriate.

  1. MongoDB doesn't support unsigned integers due to BSON spec. So I got a fail (overflow) when I try to store the value of rand.uint64() to DB. Any ideas?

I didn't expect the problem because I saw uint64 at bsonspec. However, it seems a bit cumbersome according to what I searched for.


I think we can reduce the size of actor_id(or client_id) of API if we define it strictly using it as array of 12 bytes rather than string.

Name Binary Size String Size Features
[UUID] 16 bytes 36 chars configuration free, not sortable
[shortuuid] 16 bytes 22 chars configuration free, not sortable
[Snowflake] 8 bytes up to 20 chars needs machine/DC configuration, needs central server, sortable
[ObjectID] 12 bytes <-- suggestion 24 chars <-- currently we use configuration free, sortable
xid 12 bytes 20 chars configuration free, sortable
uint64 8 bytes <-- new field N/A configuration free, sortable

Introducing a new field with uint64 can only reduce it by 33.33% more(12 bytes -> 8 bytes), so I think it would be good to define the ObjectID type strictly in the API.

@autumn-n
Copy link
Contributor

  1. What's a proper name for the new field in ClientInfo? I have no idea but ID, Key. :(

I think id is appropriate.

Ok, then is it gonna be like this?

type ClientInfo struct {
	_ID       ID                    `bson:"_id_fake"`   // `ID` -> `_ID` or something else
	ID        uint64                `bson:"id"`         // new
	Key       string                `bson:"key"`
	Status    string                `bson:"status"`
	Documents map[ID]*ClientDocInfo `bson:"documents"`
	CreatedAt time.Time             `bson:"created_at"`
	UpdatedAt time.Time             `bson:"updated_at"`
}
  1. MongoDB doesn't support unsigned integers due to BSON spec. So I got a fail (overflow) when I try to store the value of rand.uint64() to DB. Any ideas?

I didn't expect the problem because I saw uint64 at bsonspec. However, it seems a bit cumbersome according to what I searched for.

Because uint64 is supported only for the timestamp.

I think we can reduce the size of actor_id(or client_id) of API if we define it strictly using it as array of 12 bytes rather than string.
...
Introducing a new field with uint64 can only reduce it by 33.33% more(12 bytes -> 8 bytes), so I think it would be good to define the ObjectID type strictly in the API.

Sorry I didn't catch it. Could you please tell me more details between actor_id as an array of 12 bytes and Introducing a new field with uint64 ... (12 bytes -> 8 bytes)?

Let me sort out what I understand so far.

From my understanding, It wants to reduce the size of data on the two points.

  1. Response data from a server
  2. Stored data on DB. Especially snapshot of the snapshots collection or operations of the changes collection?

Both are used binary data from serializing with Protobuf, so I made a test briefly.

doc := document.New("c1", "d1")
_ = doc.Update(func(root *proxy.ObjectProxy) error {
	//root.SetString("test", "123456789012345678901234") // "size: 102" <-- currently we use
	//root.SetBytes("test", make([]byte, 12)) // "size: 90" <-- suggestion
	//root.SetLong("test", math.MaxInt64) // "size: 86" <-- suggestion?
	return nil
})
bytes, _ := converter.ObjectToBytes(doc.RootObject())
fmt.Printf("size: %d\n", len(bytes))

The integer is the best, but no big difference with an array of 12 bytes. So I thought that to avoid overflow, actor_id could be defined as uint64 and limit the value up to a max of int64.

@hackerwins
Copy link
Member Author

@autumn-n Thank you for your confirmation.

We can reduce it by 50% just by strictly using the type of ObjectID in the Protobuf message. So it is not yet necessary to define a new field of type integer. This is because the logic becomes more complex and can only be reduced by an additional 33.33%.

In summary:

  1. We don't need to add a new field to the collection.
  2. Change the types of actor_id and client_id from string to 12 bytes array in Protobuf message.

@autumn-n
Copy link
Contributor

@hackerwins
Ahw ok, thank you. I'm on it.

JS SDK has to be also changed together.

@hackerwins
Copy link
Member Author

@autumn-n Thanks for the contribution.
I noticed that the size of the snapshot is more than 10% smaller than before.
#9 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement 🌟 New feature or request good first issue 🐤 Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants