Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce unified Lua ffi.cdef builder #4202

Open
Gerold103 opened this issue May 8, 2019 · 18 comments
Open

Introduce unified Lua ffi.cdef builder #4202

Gerold103 opened this issue May 8, 2019 · 18 comments
Labels
feature A new functionality lua

Comments

@Gerold103
Copy link
Collaborator

Lua ffi is quite fast and simple mechanism allowing to call C functions directly, without using Lua C API. It is usually faster, sometimes much faster. But ffi requires to define all the C methods again in a Lua file in ffi.cdef function. It is as bad as any other code duplication - when the original API is changed, a one could forget to update Lua files and ffi.cdef arguments. It is especially painful, when ffi C definition is really big.

Interestingly, we have exactly the same issue with module.h file - it is in fact a set of copy-pasted code blocks from some header files. But it is copy-pasted automatically. CMake calls a special routine walking over specified header files and extracting code inside /** \cond public */ and /** \endcond public */ tags.

Goal of this ticket is to introduce a similar mechanism to generate ffi C definitions. API RFC:

Option 1

File box_example.h

/* \cond ffi */

int
box_example1(void);

int
box_example2(void);

int
box_example3(void);

/* \endcond ffi */

File box_example.lua

ffi = require('ffi')
ffi.cdef_file('box_example.h')

Option 2

This option allows to collect C definitions by name, from multiple files.

File box_example.h

/* \cond lua box_example */

int
box_example1(void);

int
box_example2(void);

int
box_example3(void);

/* \endcond lua box_example */

File box_another_example.h

/* \cond lua box_example */

int
box_another_example4(void);

int
box_another_example5(void);

/* \endcond lua box_example */

File box_example.lua

ffi = require('ffi')
ffi.cdef_tag('box_example')
@Gerold103 Gerold103 added the lua label May 8, 2019
@Gerold103
Copy link
Collaborator Author

Just to realize the problem: SWIM needs to duplicate 96 lines from two headers; schema.lua duplicates 97 lines; tuple.lua duplicates 49 lines; buffer.lua duplicates 29 lines; digest.lua duplicates 10 lines; error.lua duplicates 27 lines; fiber.lua - 8 lines; iconv.lua - 5 lines; uuid.lua - 24 lines.

@kyukhin kyukhin added the feature A new functionality label May 9, 2019
@kyukhin kyukhin added this to the 2.3.0 milestone May 9, 2019
@kyukhin kyukhin modified the milestones: 2.3.1, 2.4.0 Aug 6, 2019
@servoin
Copy link
Contributor

servoin commented Dec 13, 2019

I propose to combine the two options into:
ffi.cdef_file(files, [, tag])
Where
"files" is a list of headers in Linux-style command line expansion (using space separated names and file masks);
"tag" is an optional tag string (like in option 2).
So, for example,
ffi.cdef_file("box_example*.h", "box_example")
will look for all .h files with names starting from box_example, and include all definitions in
/* \cond ffi box_example */ blocks.

@Gerold103
Copy link
Collaborator Author

Gerold103 commented Dec 13, 2019

Thanks for the proposal! It is good that this thing is started.
I think that an ability to cdef multiple files at once is an overkill. And it has nothing to do with command line. It is enough to be able to include one file. Just like C style '#include'. Then you can call ffi.cdef_file several times if you need. Although in 99% you don't need.
Also I don't think we need a wildcard. That is even a bigger overkill, which adds unnecessary complexity to the implementation.
I mean I proposed it as the option 2, but now I think it might be an overkill.

@igormunkin
Copy link
Collaborator

Wow, the feature is one vital for both Tarantool developers and users, and I'd like to add my two cents.

At first, I guess ffi.cdef_file is not such a good name for such a great feature. Maybe ffi.include or ffi.export, what do you think?

As for me, the processing of /* \cond */ sentinels seems to be excess, since we can just separate our public and private interfaces and split them into two headers. Then you just add to your Lua sources the following line as proposed above

ffi.include('example.h')

Moreover I see another approach to differ functions to be exported via ffi.include. Considering possible FFI misusage (e.g the one leading to FFI sandwiches), we can explicitly mark FFI function signature with a special keyword (also to be provided via LuaJIT public headers). Thus only marked functions are exported to Lua space. Please consider the following

#define FFISAFE

FFISAFE int
box_example1(void);

FFISAFE int
box_example1(void);

There are only functions mentioned in the original proposal. And what about typedefs and enums declarations?

@Gerold103
Copy link
Collaborator Author

I like ffi.include. Export does not fit. We don't export. We rather import.

Talking of cond labels. The problem is that we can't really split the headers. There is many of them, and that way does not scale IMO. Also it is unavoidable that such public headers will have #include in them, what won't be parsed by ffi. So we need tags. It is easy to implement, and allows not to change any of the existing C code.

Talking of FFISAFE sounds good. Maybe even better than tags. But then you need to design how to apply them to structures, typedefs and enums. Because often we export them too. So maybe tags are better. Don't know.

What bother me more - can we do that import at runtime? And how?

@Totktonada
Copy link
Member

We can just load all generated definitons at once at startup, before loading built-in Lua modules. This way ffi.{cdef_file,cdef_tag,import} will not be needed.

This solution looks as easiest for me and it handles the stated problem. Aside of that we don't need to design a general purpose feature that nobody actually asks.

FFISAFE proposal looks interesting, however I don't sure it actually handles FFI misuse problem: what if a function that is called from FFISAFE one is changed? It seems that this is the primary source of misusages.

Prevention of FFI misusages deserves its own issue. I guess it requires a way more work, then automatic generation of cdefs for FFI.

@servoin
Copy link
Contributor

servoin commented Dec 16, 2019

Thank you all for your comments!
But I need a clear understanding of what should I do, because there are many opinions and ideas about the implementation.
Please provide me with the following info:

  1. What will be the name of proposed function?
  2. What should this function do in particular?
    Thanks.

@Totktonada
Copy link
Member

I propose to don't add any ffi.* function, but do ffi.cdef() for generated declarations under hood.

Waiting for comments from @Gerold103 and @igormunkin.

@Gerold103
Copy link
Collaborator Author

What is 'generated declarations' you are talking about? Only a few things are a part of public API in module.h. I need to be able to export internal not generated structures, functions, types. Otherwise this is useless. Moreover, public 'generated' API is fixed. It never changes, so this code does not need that feature.

Also what is 'load at startup'? You need ffi.cdef for that, there is no other way to export structures into Lua land.

Talking of the problem about 'misusage' - it has nothing to do with this ticket. The ticket is purely about simplification of ffi.cdef.

@Totktonada
Copy link
Member

I'm not about module.h, but another generated file. It seems I should describe my proposal better.

So, I propose to generate a file that may look like so:

local ffi = require('ffi')
ffi.cdef([[
<all declarations between /* \cond ffi */ and /* \endcond ffi */ here>
]])

Then add the file name to lua_sources list in src/CMakeLists.txt and to lua_modules in src/lua/init.c. This is basically all.

@Gerold103
Copy link
Collaborator Author

Thanks for the explanation. Yeah, that looks good to me as well. I don't have a strict preference to ffi.import or your option. Both solve the problem.

@igormunkin
Copy link
Collaborator

@Totktonada, thanks for the proposal! Yours one looks to fit better regarding the problem to be solved. For internal usage I'm totally OK with your solution, and I totally agree that we need another issue for "FFI misusage" problem. Could you please dump the final proposal as @servoin asked before?

@servoin
Copy link
Contributor

servoin commented Dec 17, 2019

So as I currently understand, there should be some module which generates .lua source file containing ffi.cdef with definitions from some .h file using tags to include only parts of this .h file. Correct me if I'm wrong. What I don't understand for now is how this module will be called during the build process. Could you please give an example?

@Totktonada
Copy link
Member

Could you please dump the final proposal as @servoin asked before?

#4202 (comment) is not enough? Don't know what to add here.

What I don't understand for now is how this module will be called during the build process. Could you please give an example?

In the similar way as module.h is generated. See extra/apigen and cmake/module.cmake.

@Totktonada
Copy link
Member

BTW, should we do the same for luaL_cdef() calls?

@servoin
Copy link
Contributor

servoin commented Dec 26, 2019

Currently, there are ffi.cdefs not only in tarantool, but also in its submodules, namely: msgpuck and small. The question is: what should we do with these ffi.cdefs in submodules? Leave them as is or apply changes for unified cdef builder (it will require separate patches for submodules)?

Also, the question about luaL_cdef() calls is still open.

@olegrok
Copy link
Collaborator

olegrok commented Dec 26, 2019

Leave them as is

IMO, yes, because this libraries are standalone and tarantool depends on them but not the opposite

@Totktonada
Copy link
Member

Looks connected with #911.

servoin added a commit that referenced this issue Jan 21, 2020
Lua ffi is quite fast and simple mechanism allowing to call
C functions directly, without using Lua C API. It is usually
faster, sometimes much faster. But ffi requires to define all
the C methods again in a Lua file in ffi.cdef function. It is
as bad as any other code duplication - when the original API
is changed, a one could forget to update Lua files and ffi.cdef
arguments. It is especially painful, when ffi C definition is really big.

The goal is to introduce a mechanism to generate ffi C definitions.

The implementation.
===================

The mechanism to generate ffi C definitions is similar to how
module.h is generated. There is a script extra/fficdefgen which
parses specified set of C headers and produces content for ffi.cdef.
The idea is similar to module.h generation: use comment tags in
C headers to rip off the content between two tags:
/* \cond ffi */
and
/* \endcond ffi */
Everything between these tags will be the output of this script.

The output of extra/fficdefgen is placed into src/lua/load_ffi_defs.lua file.
This file contains three parts:
1. Header - from src/lua/load_ffi_defs.header.lua
2. Output from extra/fficdefgen
3. Footer - from src/lua/load_ffi_defs.footer.lua
So we can add some C definitions manually before and after those
which are automatically generated. This is important since
some definitions cannot be added automatically, e.g. from system
headers, submodules and containing identifiers which are Lua keywords
(see comments in load_ffi_defs.header.lua and load_ffi_defs.footer.lua).

load_ffi_defs.lua file is produced by rebuild_module_fficdef function in
cmake/fficdef.cmake CMake script.

load_ffi_defs.lua is then compiled into src/lua/load_ffi_defs.lua.c source
(see lua_sources list in CMakeLists.txt).

load_ffi_defs.lua content is referenced in lua_modules in src/lua/init.c
and is loaded at startup with all ffi.cdef definitions gathered from
specified C headers and load_ffi_defs.header.lua/load_ffi_defs.footer.lua.

So, what we need to include in C headers are the tags.
But how do we know which C headers to include into load_ffi_defs.lua?
There is a list (similar to one for module.h) in src/CMakeLists.txt file
called fficdef_headers. rebuild_module_fficdef function is also
called from CMakeLists.txt.

Important notice. For now, only core C headers are included into
fficdef_headers list. This means that headers from Tarantool
submodules are not included and some C definitions are thereby
manually placed into load_ffi_defs.header.lua/load_ffi_defs.footer.lua.

Currently, this ffi.cdef builder is kind of internal mechanism, but
can be used by anyone following the scenario:
1. Include /* \cond ffi */ and /* \endcond ffi */ tags into needed C header.
2. Include needed C header in fficdef_headers list in CMakeLists.txt.

Closes: #4202
servoin added a commit that referenced this issue Jan 21, 2020
Lua ffi is quite fast and simple mechanism allowing to call
C functions directly, without using Lua C API. It is usually
faster, sometimes much faster. But ffi requires to define all
the C methods again in a Lua file in ffi.cdef function. It is
as bad as any other code duplication - when the original API
is changed, a one could forget to update Lua files and ffi.cdef
arguments. It is especially painful, when ffi C definition is really big.

The goal is to introduce a mechanism to generate ffi C definitions.

The implementation.
===================

The mechanism to generate ffi C definitions is similar to how
module.h is generated. There is a script extra/fficdefgen which
parses specified set of C headers and produces content for ffi.cdef.
The idea is similar to module.h generation: use comment tags in
C headers to rip off the content between two tags:
/* \cond ffi */
and
/* \endcond ffi */
Everything between these tags will be the output of this script.

The output of extra/fficdefgen is placed into src/lua/load_ffi_defs.lua file.
This file contains three parts:
1. Header - from src/lua/load_ffi_defs.header.lua
2. Output from extra/fficdefgen
3. Footer - from src/lua/load_ffi_defs.footer.lua
So we can add some C definitions manually before and after those
which are automatically generated. This is important since
some definitions cannot be added automatically, e.g. from system
headers, submodules and containing identifiers which are Lua keywords
(see comments in load_ffi_defs.header.lua and load_ffi_defs.footer.lua).

load_ffi_defs.lua file is produced by rebuild_module_fficdef function in
cmake/fficdef.cmake CMake script.

load_ffi_defs.lua is then compiled into src/lua/load_ffi_defs.lua.c source
(see lua_sources list in CMakeLists.txt).

load_ffi_defs.lua content is referenced in lua_modules in src/lua/init.c
and is loaded at startup with all ffi.cdef definitions gathered from
specified C headers and load_ffi_defs.header.lua/load_ffi_defs.footer.lua.

So, what we need to include in C headers are the tags.
But how do we know which C headers to include into load_ffi_defs.lua?
There is a list (similar to one for module.h) in src/CMakeLists.txt file
called fficdef_headers. rebuild_module_fficdef function is also
called from CMakeLists.txt.

Important notice. For now, only core C headers are included into
fficdef_headers list. This means that headers from Tarantool
submodules are not included and some C definitions are thereby
manually placed into load_ffi_defs.header.lua/load_ffi_defs.footer.lua.

Currently, this ffi.cdef builder is kind of internal mechanism, but
can be used by anyone following the scenario:
1. Include /* \cond ffi */ and /* \endcond ffi */ tags into needed C header.
2. Include needed C header in fficdef_headers list in CMakeLists.txt.

Closes: #4202
@kyukhin kyukhin modified the milestones: 2.4.1, 2.5.1 Mar 26, 2020
@kyukhin kyukhin modified the milestones: 2.5.1, 2.6.1 Apr 23, 2020
@kyukhin kyukhin modified the milestones: 2.6.1, wishlist Oct 23, 2020
@Mons Mons removed the in progress label Dec 4, 2020
tsafin added a commit that referenced this issue Dec 20, 2020
Created infrastructure which simplifies building of cdef header
for integration into SQL AST module. We will use approach
similar to module_api scripts, but integrated into build the
different way, e.g:

1. we created `cdef_source` cmake macro, which is similar
   to `lua_source` macro and embeds source literal to the
   Tarantool executable, but this file is created as result
   of preprocessing similar to module_api;
2. we use `\cond ffi` / `\endcond ffi` for marking regions
   we would like to keep for exporting to FFI;
3. we preprocess using the referenced file as start. Preprocessor
   invoked with `-CC` option first to retain `\cond`/'\endcond`
   comments ;
4. We cleanup the generated file using sed to delete empty lines
   end prerpocessor noise;
5. We leave only relevant `\(end)cond` region;
6. Then preprocess again to remove comments and trim header;
7. The generated file to be embedded to C as literal via `txt2c`;
8. And at the initialization of Tarantool we could use this literal
   for luaL_cdef to make known whole AST hierarchy via
   single hop;

Could not resist and not note that I hate cmake, it took more
than single day to figure out the proper way to construct
$CC command line extended with all necessary include paths.
I had to use `COMMAND_EXPAND_LISTS`
`add_custom_command`' option, which is there only since
cmake 3.8, and if we had to use anything older then
we ought to rework this approach using external script.
But I'd prefer we not do it.

Inspired by #4202
tsafin added a commit that referenced this issue Dec 20, 2020
Prior commit has established facilities for semi-automatic
build of header file whih might be consumed by ffi.cdef.
Here we use these facilities for SQL AST generation. All relevant
headers have been marked appropriately with:
- `/** \cond ffi */` for range begin;
- `/** \endcond ffi */` for range end.

This all sounds very similar to module_api, but integrated
in a different way - we create `sql_ast_cdef[]` literal, not publicly
accessible headers.

Please not be surprised with strange renames of typedefs used with
`struct StructName` here and there. It simplifies integration with
LuaJIT ffi compiler.

NB! Eventually we have to fix error propagation from LuaJIT
ffi compiler in luaL_cdef. At the moment it simply produces
'Unknown exception' and stops. I had to use luajit for checking
syntax errors there, which is  gracefully shown there

```
20:08 $ luajit loadcdef.lua
luajit: loadcdef.lua:12: declaration specifier expected near 'ExprList' \
at line 83
stack traceback:
        [C]: in function 'cdef'
        loadcdef.lua:12: in main chunk
        [C]: at 0x5584f179c1d0
```

Still is inspired by #4202
tsafin added a commit that referenced this issue Dec 21, 2020
Prior commit has established facilities for semi-automatic
build of header file whih might be consumed by ffi.cdef.
Here we use these facilities for SQL AST generation. All relevant
headers have been marked appropriately with:
- `/** \cond ffi */` for range begin;
- `/** \endcond ffi */` for range end.

This all sounds very similar to module_api, but integrated
in a different way - we create `sql_ast_cdef[]` literal, not publicly
accessible headers.

Please not be surprised with strange renames of typedefs used with
`struct StructName` here and there. It simplifies integration with
LuaJIT ffi compiler.

NB! Eventually we have to fix error propagation from LuaJIT
ffi compiler in luaL_cdef. At the moment it simply produces
'Unknown exception' and stops. I had to use luajit for checking
syntax errors there, which is  gracefully shown there

```
20:08 $ luajit loadcdef.lua
luajit: loadcdef.lua:12: declaration specifier expected near 'ExprList' \
at line 83
stack traceback:
        [C]: in function 'cdef'
        loadcdef.lua:12: in main chunk
        [C]: at 0x5584f179c1d0
```

Still is inspired by #4202
tsafin added a commit that referenced this issue Dec 22, 2020
Created infrastructure which simplifies building of cdef header
for integration into SQL AST module. We will use approach
similar to module_api scripts, but integrated into build the
different way, e.g:

1. we created `cdef_source` cmake macro, which is similar
   to `lua_source` macro and embeds source literal to the
   Tarantool executable, but this file is created as result
   of preprocessing similar to module_api;
2. we use `\cond ffi` / `\endcond ffi` for marking regions
   we would like to keep for exporting to FFI;
3. we preprocess using the referenced file as start. Preprocessor
   invoked with `-CC` option first to retain `\cond`/'\endcond`
   comments ;
4. We cleanup the generated file using sed to delete empty lines
   end prerpocessor noise;
5. We leave only relevant `\(end)cond` region;
6. Then preprocess again to remove comments and trim header;
7. The generated file to be embedded to C as literal via `txt2c`;
8. And at the initialization of Tarantool we could use this literal
   for luaL_cdef to make known whole AST hierarchy via
   single hop;

Could not resist and not note that I hate cmake, it took more
than single day to figure out the proper way to construct
$CC command line extended with all necessary include paths.
I had to use `COMMAND_EXPAND_LISTS`
`add_custom_command`' option, which is there only since
cmake 3.8, and if we had to use anything older then
we ought to rework this approach using external script.
But I'd prefer we not do it.

Inspired by #4202
tsafin added a commit that referenced this issue Dec 22, 2020
Prior commit has established facilities for semi-automatic
build of header file whih might be consumed by ffi.cdef.
Here we use these facilities for SQL AST generation. All relevant
headers have been marked appropriately with:
- `/** \cond ffi */` for range begin;
- `/** \endcond ffi */` for range end.

This all sounds very similar to module_api, but integrated
in a different way - we create `sql_ast_cdef[]` literal, not publicly
accessible headers.

Please not be surprised with strange renames of typedefs used with
`struct StructName` here and there. It simplifies integration with
LuaJIT ffi compiler.

NB! Eventually we have to fix error propagation from LuaJIT
ffi compiler in luaL_cdef. At the moment it simply produces
'Unknown exception' and stops. I had to use luajit for checking
syntax errors there, which is  gracefully shown there

```
20:08 $ luajit loadcdef.lua
luajit: loadcdef.lua:12: declaration specifier expected near 'ExprList' \
at line 83
stack traceback:
        [C]: in function 'cdef'
        loadcdef.lua:12: in main chunk
        [C]: at 0x5584f179c1d0
```

Still is inspired by #4202
@kyukhin kyukhin removed the incoming label Jul 29, 2021
@igormunkin igormunkin removed the teamL label Sep 15, 2022
@igormunkin igormunkin removed this from the wishlist milestone Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new functionality lua
Projects
None yet
Development

No branches or pull requests

7 participants