Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lua serializers should use FFI to check and set options #4521

Open
Gerold103 opened this issue Sep 23, 2019 · 15 comments
Open

Lua serializers should use FFI to check and set options #4521

Gerold103 opened this issue Sep 23, 2019 · 15 comments
Assignees
Labels
app bug Something isn't working

Comments

@Gerold103
Copy link
Collaborator

There are several modules in Lua allowing to transform data from one format to another. YAML, Msgpack, Msgpackffi, JSON. They are available as require('msgpack'), require('yaml'), require('json'). They provide method cfg{}. It takes a dictionary of options to update. The options are members of struct luaL_serializer object in C.
The problem is that this structure needs to be visible in Lua. For example, so as it would be possible to check msgpack.cfg.encode_max_depth. Currently it is implemented quite inaccurate. Any update of struct luaL_serializer is copied manually to its Lua table, in luaL_serializer_cfg().
A better solution would be to declare struct luaL_serializer via ffi in Lua, and directly read and update the C structure. It would allow to drop luaL_serializer_cfg(), other related functions, and will solve a bug, that a user can do: <serializer>.cfg.<key> = <value> instead of <serializer>.cfg({<key> = <value>}), it won't update any options in fact, and the user won't get an error either. After the patch both ways should work the same.

@Gerold103 Gerold103 added bug Something isn't working good first issue Good for newcomers app labels Sep 23, 2019
@Totktonada
Copy link
Member

The similar issue about box.cfg{} is #2867.

@kyukhin kyukhin added this to the 2.4.1 milestone Sep 26, 2019
@OKriw OKriw self-assigned this Oct 22, 2019
@igormunkin
Copy link
Collaborator

Considering existing API and dubious benefits of FFI usage, the implementation of luaL_serializer interfaces via cdata seems to be less convenient way than one using userdata. As discussed offline the latter alternative will be elaborated to fix the issue. In case provided benchmarks for the userdata + metatable are considered acceptable, the approach can be adopted for #2867 mentioned above. Otherwise an alternative using FFI machinery will be implemented and compared to patch using userdata.

@Gerold103
Copy link
Collaborator Author

It has nothing to do with speed. What benchmarks we are talking about? This is just options, for Lua. This is only about ease of usage. Current implementation is a spaghetti of lua C API usage and Lua code.

And why it is less convenient? It allows to drop lots of lua C code trying to update a table + a structure simultaneously. It is inconvenient now.

@igormunkin
Copy link
Collaborator

@Gerold103, I guess you're confused a little with my comment. I totally agree with your remark about Lua + C spaghetti and two separate sources, that need to be synced after every update, and I'm all for the idea you proposed above. I wrote that FFI usage seems to be an inconvenient way to implement your proposal. Considering msgpuck/yaml/json initialization on Tarantool start phase and the lack of C API for manipulating with cdata (the set hacked within Tarantool utils is incomplete), implementation via userdata looks to be an alternative with much handy future maintanence.

Benchmarks I mentioned in my comment can show the performance deviation, introduced as a result of the patch to be made. Feel free to correct me if I'm wrong: box.cfg is more performance critical than configs for msgpuck/yaml/json modules. Therefore whether performance is not nerfed a lot after the discussed changes, we can adopt the elaborated approach for #2867.

@Gerold103
Copy link
Collaborator Author

Gerold103 commented Nov 13, 2019

Is it possible to expose a C structure to Lua not via FFI? The point is to remove Lua C API usage, which would not be possible if a serializer would be a userdata.
And what a problem about cdata and API for working with it? We have plenty of simple examples. SWIM, tuple, ibuf, uuid, and more - all of them can be used at start phase, and work with cdata objects both in Lua and C.
As FFI I mean ffi.cdef to declare a serializer structure. Not usage of C functions via FFI.

@igormunkin
Copy link
Collaborator

OK, now I don't get it.

  • Why do we need to expose a C structure to Lua?
  • Why do we need to remove Lua C API usage?

I see the userdata approach more convenient than divide module initialization in C part via hacked interfaces and Lua chunk, converted in a C literal.

Serializer seems to be a simple entity, obtainable in Lua and C, having the following overloaded metamethods: __index, __newindex, __serialize, __pairs (I hope I haven't missed a vital one). I propose to implement a such entity via userdata with metatable containing all methods listed above.

@Gerold103
Copy link
Collaborator Author

  • We need to expose a C serializer to Lua to remove Lua C usage;
  • We need to remove Lua C usage because it is too complex. It is based on some weird offsets to each member of the serializer, and that each of them is integer. And it is a duplicate of information - it is stored twice in Lua and in C. What makes possible to change it in Lua, but don't in C, and vice versa.

Userdata way won't be really different in terms of complexity, what makes this issue useless. It was about simplification, and userdata won't solve it.

Actually I don't care really much so as to continue arguing. Do whatever you consider right.

@Totktonada
Copy link
Member

We discussed this voicely and Igor said that creation and instantiation of a cdata type from C code is tricky and requires luaL_loadstring() calls that looks hacky. However I looked over the code and it seems that all necessary tricks are already encapsulated in src/lua/utils.c. So both ways looks equivalent for me except that cdata field accesses will be possibly faster.

@igormunkin
Copy link
Collaborator

The main problem I see for now, that cfg table to be used in Lua space is filled with values, so it need to be synced after every modification with the corresponding structure in C. My proposal is to set of __index/__newindex metamethods either for an empty cfg table or for userdata with the serializer payload. Implementing metamethods via Lua C API leads to a sole configuration source, that can be obtained via both C and Lua space, and looks to me a way more relevant approach for purposes described within this issue.

Since I'm neither the ticket author nor its assignee, I do not insist on the approach and @OKriw can compare both mentioned approaches and choose any of them she consider more convenient. My initial comment was only a dump of our recent offline discussion.

@OKriw
Copy link
Contributor

OKriw commented Feb 14, 2020

As we had agreed with @Totktonada and @igormunkin we need to try both cases and see which way is better/faster. I have implemented userdata+ metatable variant, however at the moment it may be pointless even to test i, due to #4770 . Because patch that turns on __pairs/ipairs metamethod might be reverted.
At the moment I can say that userdata looks + metamethods looks over complicated. So, we need to decide should we even test variant with userdata + metatable, due to the reason I have mentioned before.
After adding necessary functionality described in the task I have found #4761 and fixed it.

@igormunkin
Copy link
Collaborator

@Gerold103, the FFI approach has the one killing benefit: LuaJIT takes into account the __pairs/__ipairs for cdata even w/o LUAJIT_ENABLE_LUA52COMPAT being enabled. Therefore it pretends to be a table much better than userdata does. Now, I totally agree with your original proposal.

@OKriw, I left a pure Lua PoC of implementing the pairs for an FFI structure below, please consider it:

local ffi = require('ffi')

local function buildcdef(name, struct)
	local cdef = string.format("struct %s {", name)
	for field, ftype in pairs(struct) do
		cdef = cdef .. string.format('%s %s;', ftype, field)
	end
	return cdef .. '};'
end

local struct = {
	encode_sparse_convert = 'int',
	encode_sparse_ratio = 'int',
	encode_sparse_safe = 'int',
	encode_max_depth = 'int',
	encode_deep_as_nil = 'int',
	encode_invalid_numbers = 'int',
	encode_number_precision = 'int',
	encode_load_metatables = 'int',
	encode_use_tostring = 'int',
	encode_invalid_as_nil = 'int',
	decode_invalid_numbers = 'int',
	decode_save_metatables = 'int',
	decode_max_depth = 'int',
}

ffi.cdef(buildcdef('serializer', struct))

local serializer = ffi.metatype('struct serializer', {
	__pairs = function(s)
		return function(s, field)
			local name = next(struct, field)
			if not name then return nil end
			return name, s[name]
		end, s, nil
	end
})

local json = ffi.new(serializer,
	1,   -- encode_sparse_convert
	2,   -- encode_sparse_ratio
	10,  -- encode_sparse_safe
	128, -- encode_max_depth
	0,   -- encode_deep_as_nil
	1,   -- encode_invalid_numbers
	14,  -- encode_number_precision
	1,   -- encode_load_metatables
	0,   -- encode_use_tostring
	0,   -- encode_invalid_as_nil
	1,   -- decode_invalid_numbers
	1,   -- decode_save_metatables
	128, -- decode_max_depth
)

for k, v in pairs(json) do print(k, v) end

@Totktonada
Copy link
Member

After 676369b we have 'on update' triggers, which should be fired at serializer options updates. Is there way to keep this with cdata-based config?

@kyukhin kyukhin modified the milestones: 2.4.1, 2.4.2 Apr 10, 2020
@kyukhin kyukhin modified the milestones: 2.4.2, wishlist May 13, 2020
@OKriw OKriw removed their assignment Aug 8, 2020
@ligurio ligurio self-assigned this Apr 13, 2021
@kyukhin kyukhin removed prio6 labels Dec 9, 2021
@Totktonada
Copy link
Member

After 676369b we have 'on update' triggers, which should be fired at serializer options updates. Is there way to keep this with cdata-based config?

It seems, no.

tarantool> ffi = require('ffi')
tarantool> ffi.cdef([[struct foo { int x; };]])
tarantool> ffi.metatype('struct foo', {__index = function(...) print('__index', ...) end, __newindex = function(...) print('__newindex', ...) end})
tarantool> f = ffi.new('struct foo')

tarantool> f.y = 42
__newindex  cdata<struct foo>: 0x41dc7ca8  y  42
---
...

tarantool> f.x = 42
---
...

@Totktonada
Copy link
Member

I mean, it does not work neither of the ways:

  1. If we want to allow setting of known options directly (json.cfg.encode_use_tostring = true), we can't execute any code here (not even validation).
  2. If we want to disallow it and keep only assignments via a call (json.cfg({encode_use_tostring = true})), we can't forbid assigning of existing options directly.

OTOH, it seems, we can define a structure with one _internal field, which represents saved configuration values, define __index to read values, __newindex to set them (and perform any validation / actions) and __pairs and __ipairs to traverse over them. In fact, we can even use an empty structure + captured table (but that's harder to access from C land).

(__newindex may forbid assignments instead of setting configuration options, if we want.)

@Totktonada
Copy link
Member

NB: Don't forget about tab-completion of the options:

tarantool> json.cfg.encode_<Tab><Tab>
json.cfg.encode_deep_as_nil       json.cfg.encode_invalid_as_nil    json.cfg.encode_load_metatables   json.cfg.encode_number_precision  json.cfg.encode_sparse_ratio      json.cfg.encode_use_tostring      
json.cfg.encode_error_as_ext      json.cfg.encode_invalid_numbers   json.cfg.encode_max_depth         json.cfg.encode_sparse_convert    json.cfg.encode_sparse_safe 

@Mons Mons self-assigned this Feb 9, 2023
@TarantoolBot TarantoolBot removed the teamP label Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
app bug Something isn't working
Projects
None yet
Development

No branches or pull requests

9 participants