Make space format recursive #5991

Open
alyapunov opened this issue Apr 14, 2021 · 10 comments
Labels
feature A new functionality raw idea

@alyapunov
Contributor

alyapunov commented Apr 14, 2021

Currently you can specify a space format:
box.schema.space.create('test', {format = {{name = 'id', type = 'num'}, {name = 'data', type = 'array'}}})
The short form also works:
box.schema.space.create('test', {format = {{'id', 'num'}, {'data', 'array'}}})
But as far as I know, there's no way to specify a format for nested arrays:
box.schema.space.create('test', {format = {{name = 'id', type = 'num'}, {name = 'data', type = 'array', format = {{'name', 'str'},{'age', 'num'}}}}})
I think we should implement this; it would make the schema complete. Of course, all the usual things, like UPDATE by JSON path, must work with such a format.
For a complete schema, there are some things that should be investigated:

  • Format for maps.
  • Specify array size, like field_count in space.
  • Specify nameless format like {type='num'}
  • Alternative formats. For example for RTREE index a field must be array of two numbers / array of four numbers.
  • Default values.
  • Default formats, e.g. a format for the remaining fields: format={{'name','str'},{'age','num'},default_format={type='num'}}.
  • Index definition: format = {{name = 'id', type = 'num', index = 'primary'}}
  • Set indexed fields.
  • Set not indexed fields (even if the space has index by the field).
  • Merge formats. It would be great if 1) a user could append to a format without specifying the whole format, and 2) an index created its own format, which is merged with the user's to produce a cumulative one.
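
To make the nested-format idea concrete, here is a minimal sketch in plain Lua of how a recursive format could be checked against a tuple. Everything in it (normalize, type_checks, validate) is hypothetical and written only for illustration; it is not Tarantool API, and it covers only the 'num', 'str' and 'array' types from the examples above. Both the long and the short field forms are accepted, and a nameless field such as {type='num'} is reported by its position.

```lua
-- Hypothetical sketch: none of these helpers exist in Tarantool.

-- Accept both the long form ({name = 'id', type = 'num'}) and the
-- short form ({'id', 'num'}); a missing type is treated as 'any'.
local function normalize(def)
    return {
        name = def.name or def[1],
        type = def.type or def[2] or 'any',
        format = def.format,
    }
end

-- Only the types used in the examples above are covered.
local type_checks = {
    any = function() return true end,
    num = function(v) return type(v) == 'number' end,
    str = function(v) return type(v) == 'string' end,
    array = function(v) return type(v) == 'table' end,
}

-- Check `tuple` (a Lua array) against `format`, descending into
-- nested formats. Returns true, or false plus an error message that
-- starts with the dotted path of the offending field.
local function validate(tuple, format, path)
    path = path or ''
    for i, raw in ipairs(format) do
        local def = normalize(raw)
        local value = tuple[i]
        local where = path .. (def.name or tostring(i))
        local check = type_checks[def.type]
        if check == nil then
            return false, where .. ': unknown type ' .. tostring(def.type)
        end
        if not check(value) then
            return false, where .. ': expected ' .. def.type
        end
        if def.format ~= nil then
            local ok, err = validate(value, def.format, where .. '.')
            if not ok then
                return false, err
            end
        end
    end
    return true
end

local format = {
    {name = 'id', type = 'num'},
    {name = 'data', type = 'array', format = {{'name', 'str'}, {'age', 'num'}}},
}

print(validate({1, {'Alice', 21}}, format))
print(validate({1, {'Alice', 'x'}}, format))
```

The second call fails on the nested 'age' field, so the error path is data.age. Nullability, maps, array sizes and default formats from the list above are deliberately left out of the sketch.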
UPDATE. Can't stop. I think the 'foreign key' feature is also a format issue. I haven't thought much about referential integrity, but as for selection, imagine the following:
box.schema.space.create('person', {format = {'id', 'name', 'age'}}):create_index('pk')
box.schema.space.create('account', {format = {{'id'}, {'person', foreign_space='person'}}}):create_index('pk')
--box.schema.space.create('account', {format = {{'id'}, {'person', foreign_space=512, foreign_index='pk'}}}):create_index('pk')
box.space.person:insert{1, 'Alice', 21}
box.space.account:insert{1, 1}
tarantool> box.space.account:get{1}
---
- [1, 1]
...
-- We could get [1, 'Alice', 21] here, but it would break backward compatibility. Maybe it's better to make it configurable.
tarantool> box.space.account:get{1}.person
---
- 1
...
tarantool> box.space.account:get{1}.person.name
---
- Alice
...
@alyapunov alyapunov added feature A new functionality incoming raw idea and removed incoming labels Apr 14, 2021
@olegrok
Collaborator

olegrok commented Apr 14, 2021

There are at least two issues for some points specified in your list:

@parihaaraka

In accordance with the advice, I'd expect an option to enable indirect nesting (within a string or varbinary value). It looks like a cornerstone of the most performant msgpack usage.

@Totktonada
Member

In theory, we could also provide the ability to strip known key names to decrease the stored tuple size. Say, when a field is known to store a dictionary with keys 'foo' and 'bar' (and it is known that it cannot store other keys), there is no reason to actually store those key names. We can pack {foo = 6, bar = 7} into [6, 7]. In fact, we don't store top-level field names (when 'foo' and 'bar' are columns), but we do store them for nested dictionaries.

However, I guess it will be tricky to implement in a backward compatible way. OTOH, we can create a space with an option like is_dense = true/false (or is_packed) if it is actually impossible without such an explicit hint.
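
A rough sketch of the packing idea in plain Lua, assuming the format supplies a fixed, ordered list of key names. The helpers pack_dense and unpack_dense are hypothetical names invented for this illustration, not Tarantool API.

```lua
-- Hypothetical helpers illustrating the idea; not Tarantool API.
-- `keys` is the fixed, ordered list of key names known from the format.

-- Turn a map into a positional array, rejecting keys that the
-- format does not know about.
local function pack_dense(map, keys)
    local known = {}
    local out = {}
    for i, key in ipairs(keys) do
        known[key] = true
        out[i] = map[key]
    end
    for key in pairs(map) do
        if not known[key] then
            error('unexpected key: ' .. tostring(key))
        end
    end
    return out
end

-- Restore the map from the positional array using the same key list.
local function unpack_dense(array, keys)
    local map = {}
    for i, key in ipairs(keys) do
        map[key] = array[i]
    end
    return map
end

local keys = {'foo', 'bar'}
local packed = pack_dense({foo = 6, bar = 7}, keys)  -- positional: 6, then 7
local restored = unpack_dense(packed, keys)          -- foo = 6, bar = 7
```

The sketch ignores missing keys, which would leave holes in the packed array; a real implementation would need an explicit placeholder for them.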

@Totktonada
Member

Recursive schemas are quite hard to read as a YAML / JSON / Lua object. It is worth considering an IDL (interface description language), as Protocol Buffers or Avro IDL do.

@Totktonada
Member

{name = 'data', type = 'array', format = {{'name', 'str'},{'age', 'num'}}}

Is it an array or a map? Stored as an array, but allows access to array elements using string names? Interesting. Maybe it resolves my point about dense storage.

@parihaaraka

pgpro way
I remember they were going to store the JSON schema only, but they go further

@alyapunov
Contributor Author

{name = 'data', type = 'array', format = {{'name', 'str'},{'age', 'num'}}}

Is it an array or a map? Stored as an array, but allows access to array elements using string names? Interesting. Maybe it resolves my point about dense storage.

It's an array, like the tuple itself now. It's an array, but (some) indexes have alternative names.
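
As an illustration of "an array whose indexes have alternative names", here is a small plain-Lua sketch using a metatable. The with_names helper is hypothetical and is not how Tarantool implements tuple field names; it only shows that positional storage and name-based access can coexist.

```lua
-- Hypothetical helper; not Tarantool API.
-- Attach name-based access to a plain Lua array according to `format`.
local function with_names(array, format)
    -- Map each field name to its position in the array.
    local index_of = {}
    for i, def in ipairs(format) do
        local name = def.name or def[1]
        if name ~= nil then
            index_of[name] = i
        end
    end
    -- __index fires only for keys absent from the array itself,
    -- so positional access is unaffected.
    return setmetatable(array, {
        __index = function(t, key)
            local i = index_of[key]
            if i ~= nil then
                return rawget(t, i)
            end
            return nil
        end,
    })
end

local data = with_names({'Alice', 21}, {{'name', 'str'}, {'age', 'num'}})
print(data[1], data.name)  -- both refer to the first element
print(data[2], data.age)   -- both refer to the second element
```

The value is still stored as a two-element array, so no key names take up space in it.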

@Gerold103
Collaborator

I see you added something about foreign keys - they are not a part of the format for sure. They are separate entities called "constraints". They are supposed to be created independently like indexes and SQL CHECKs. AFAIK, the only constraint which is a part of the format now is "not nullable" (when is_nullable is omitted or set to false), and it would be better to keep it like that.

We can, though, add some syntactic sugar to Lua to create them together with the space.

@kyukhin kyukhin added this to the wishlist milestone Jul 13, 2021
@R-omk

R-omk commented Jun 14, 2023

related #3142


7 participants