[RFC] Dictionaries support #172

Open
vasiliy-t opened this issue Dec 10, 2018 · 3 comments

@vasiliy-t

Motivating example

An application deals with Users. Users have Attributes, each Attribute has a type, and there is a dictionary table (~10k records) linking each attribute type to an attribute name localization table and an attribute groups table.
Users are stored on vshard storages and accessed through vshard routers.
There is a REST API /profile endpoint that returns user profile info, including grouped attributes with localized names.

Problem statement

This example illustrates a use case for dictionaries. Dictionaries hold few records, are required to process almost any request, are defined by the user rather than by configuration, and, past some point, change rarely.

There are several ways to deal with dictionaries in a sharded cluster:

  • shard dictionary records and fetch them on the router with additional requests to the storages. To mitigate the extra network roundtrips, the records can be cached on the router for a short period of time, but that requires additional caching and expiration logic for dictionaries and some cache warm-up time

  • provide each storage instance with its own full copy of the dictionaries, so that each storage can get the required data locally

  • set up a standalone tarantool instance to handle dictionaries. Dictionaries are clearly the most accessed data in this case, so one instance may not be sufficient, and this option also requires additional network roundtrips
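
The caching and expiration logic mentioned in the first option could look roughly like the sketch below. It is plain Lua and only an illustration: `new_dict_cache`, `fetch`, `ttl` and `now` are hypothetical names, and in a real router `fetch` would presumably wrap a request to a storage (for instance via `vshard.router.callro`).

```lua
-- Minimal TTL cache sketch for dictionary records kept on a router.
-- `fetch(key)` and `now()` are hypothetical callbacks supplied by the
-- application: `fetch` performs the storage request, `now` returns the
-- current time in seconds (defaults to os.time).
local function new_dict_cache(fetch, ttl, now)
    now = now or os.time
    local entries = {}   -- key -> {value = ..., expires = ...}
    local cache = {}

    function cache.get(key)
        local entry = entries[key]
        local t = now()
        if entry ~= nil and entry.expires > t then
            return entry.value       -- fresh hit, no network roundtrip
        end
        local value = fetch(key)     -- miss or expired: ask the storage
        entries[key] = {value = value, expires = t + ttl}
        return value
    end

    -- Drop a single record, e.g. after the application updates it.
    function cache.invalidate(key)
        entries[key] = nil
    end

    return cache
end
```

Even in this small form the cache needs an expiration policy and starts cold, which is the overhead the first option refers to.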

Storing a copy of the dictionaries on each storage seems to be the best option, but it requires additional application logic: when a new instance is set up, the dictionaries must be there before the node starts processing requests, and dictionary updates must be applied consistently on each instance.

This seems like a pretty common use case, so it seems reasonable to implement dictionary support directly in vshard.

@Gerold103
Collaborator

First, the text is too long, full of your application details, and hard to understand. Please rephrase what you want in more general terms. Second, I am sure that such 'lua-sharding' is not a common need that cannot be implemented as an application on top of the current vshard. Vshard shards buckets consisting of tuples from spaces, not application-specific or language-specific in-memory data.

@Gerold103
Collaborator

Just for the record: I would have understood an idea to additionally shard arbitrary user data, but I did not understand the text in the first comment or why only dictionaries should be sharded. My proposal would look like this: I provide the user with an interface, a set of hooks, which vshard calls when it tries to reshard. The user implements this interface so that it returns an iterator from which vshard fetches data and transfers it. On the destination storage, another user hook is called, which applies the data. This would make vshard independent of the type of data. An example of an interface to register your iterators:

--
-- Register a custom sharded storage. Can be different from space.
-- @a storage is an object having methods:
--
-- * storage.iterator(bucket_id)
-- Get an iterator object for a specified bucket and having
-- method next(), returning a next object in this bucket of
-- this storage.
--
-- * storage.store(bucket_id, object)
-- Store an object, transferred from a remote storage.
--
-- * storage.gc(bucket_id)
-- Remove content of a specified bucket.
--
function vshard.storage.register_custom(name, storage)
-- ...
end
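
To make the proposed interface more concrete, here is a minimal sketch of an object that would satisfy it, backed by a plain Lua table. Note that `vshard.storage.register_custom` is only a proposal in this thread, so the sketch does not call it; `new_table_storage` and `transfer_bucket` are hypothetical names, and `transfer_bucket` merely imitates what vshard might do with the hooks when it moves a bucket.

```lua
-- Sketch of a custom storage with the three proposed methods.
-- Objects are kept in a Lua table keyed by bucket_id (an assumption
-- made for illustration; any other backing store would do).
local function new_table_storage()
    local buckets = {}   -- bucket_id -> array of objects
    local storage = {}

    -- Return an iterator whose next() yields the objects of a bucket
    -- one by one and nil once the bucket is exhausted.
    function storage.iterator(bucket_id)
        local objects = buckets[bucket_id] or {}
        local pos = 0
        local it = {}
        function it.next()
            pos = pos + 1
            return objects[pos]
        end
        return it
    end

    -- Store an object received from a remote storage.
    function storage.store(bucket_id, object)
        local objects = buckets[bucket_id]
        if objects == nil then
            objects = {}
            buckets[bucket_id] = objects
        end
        table.insert(objects, object)
    end

    -- Remove the content of a bucket.
    function storage.gc(bucket_id)
        buckets[bucket_id] = nil
    end

    return storage
end

-- Hypothetical stand-in for the resharding step: drain the source
-- iterator into the destination, then garbage collect the source.
local function transfer_bucket(src, dst, bucket_id)
    local it = src.iterator(bucket_id)
    local object = it.next()
    while object ~= nil do
        dst.store(bucket_id, object)
        object = it.next()
    end
    src.gc(bucket_id)
end
```

The sketch uses dot calls (`storage.iterator(bucket_id)`, `it.next()`) to match the notation in the interface comment above.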

@Gerold103
Collaborator

After a verbal discussion it turned out that the 'dictionary table' here is a space which should be fully stored on each instance in the cluster. In fact, this is a feature request for tarantool/tarantool#3982. If urgent, this issue can be solved without core support via a special cluster-wide bucket.

@kyukhin kyukhin added this to the wishlist milestone Jul 29, 2021
@kyukhin kyukhin changed the title Dictionaries support [RFC] Dictionaries support Mar 31, 2023
@Gerold103 Gerold103 added feature A new functionality complicated labels May 22, 2023
@kyukhin kyukhin removed this from the wishlist milestone May 23, 2023