RFC: provide a build system for Linux distribution integration #581

Closed
eli-schwartz opened this issue Mar 16, 2020 · 10 comments · Fixed by #602

Comments

@eli-schwartz
Contributor

Linux distributions would like to provide a system libtree-sitter.so and headers in /usr/include, sufficient to build and link against the tree-sitter software. This would be available as a mutually inclusive alternative to the currently documented way to use tree-sitter, which is to copy the project wholesale as a vendored subdirectory inside another project's git sources, and manually integrate it into that other project's build system.

Requirements for distribution packaging:

  • Respect standard conventions such as $CFLAGS and $LDFLAGS
  • Compile the sources using one authoritative command
  • Link the sources into a shared library, preferably with a soname indicating its current ABI level.
  • (optionally) allow the user to enable a static library instead of, or in addition to, the shared library
  • Install the library(/ies) to /usr/lib (or some build system convention like --libdir)
  • Install the public headers needed to link to libtree-sitter.(so|a) to /usr/include (or some build system convention like --includedir)
  • Install a pkg-config file to /usr/lib/pkgconfig/, e.g. tree-sitter.pc, which will cause the command pkg-config --libs --cflags tree-sitter to emit whichever C compiler flags are needed to link to tree-sitter. Typically, that would be -I/usr/include -L/usr/lib -ltree-sitter.

Not in scope:
Preventing people from continuing to copy source files into their project, if they really want to.

Suggestion:
The meson build system is a fairly popular build system that can automatically handle most integration details for building and installing software. You simply define a library(), tag it with install: true, and add all source files and headers; everything else is handled automatically. #467 implements meson for this project, and could be used as inspiration for a mutually satisfactory solution. One bonus of meson is that it can automatically generate pkg-config files via its Pkgconfig module.

@eli-schwartz
Contributor Author

eli-schwartz commented Mar 16, 2020

From the linked PR, you said:

The reason that this particular PR is a non-starter is that Tree-sitter is used in a very large number of downstream projects which all use different C build systems, and which all rely on the fact that they can trivially incorporate Tree-sitter into their build statically by adding one source file and two include paths. This is by far the most popular way of using this library, and it will most likely remain as the main way, which is recommended in the docs and exercised by the project's CI build.

I would like to respond that nearly every C build system provides a way to detect "system dependencies" using the standardized pkg-config interface. Certainly,

  • meson (builtin via dependency()),
  • cmake (builtin via find_package(PkgConfig); pkg_check_modules()),
  • golang (builtin via // #cgo pkg-config: tree-sitter),
  • cabal (builtin via pkgconfig-depends),
  • ruby (builtin via mkmf.rb's pkg_config),
  • python-pkgconfig,
  • and pkg-config-rs

among others, support this detection. (Implementing it in any other build system is as simple as writing the glue code to execute the pkg-config program with the name of the dependency, and capture its output for reuse as C compiler arguments. For example, chromium does it via gyp.)
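
The glue described above can be sketched in a few lines of shell. This is only an illustration under stated assumptions: the scratch directory, the stand-in tree-sitter.pc with its /opt/demo prefix, and the ts_flags variable are all hypothetical, and the real file would come from an installed tree-sitter package.

```shell
# Hypothetical scratch directory holding a stand-in tree-sitter.pc, so the
# sketch runs even where tree-sitter is not installed system-wide.
pcdir=$(mktemp -d)
cat > "$pcdir/tree-sitter.pc" <<'EOF'
prefix=/opt/demo
libdir=${prefix}/lib
includedir=${prefix}/include

Name: tree-sitter
Description: stand-in metadata for illustration only
Version: 0.16.5
Libs: -L${libdir} -ltree-sitter
Cflags: -I${includedir}
EOF

# The glue itself: run pkg-config with the dependency name and capture its
# output so it can be reused as C compiler arguments.
ts_flags=""
if command -v pkg-config >/dev/null 2>&1; then
    ts_flags=$(PKG_CONFIG_PATH="$pcdir" pkg-config --cflags --libs tree-sitter)
fi
echo "flags: ${ts_flags:-<pkg-config not available>}"

# A build script would then pass the captured flags along, for example:
#   cc -o app app.c $ts_flags
```

The guard around pkg-config only keeps the sketch from failing on machines without it; a real build system would normally treat a missing pkg-config or a missing dependency as a configuration error.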

I believe that there would be significant interest by downstream projects in using system tree-sitter installations.

Furthermore, there are a number of environments (most Linux distributions tend to be very concerned about this) where vendoring source code of one project inside another project is a very big no-no and can cause proposed package uploads to be rejected or delayed.

@maxbrunsfeld
Contributor

maxbrunsfeld commented Mar 16, 2020

Hey, thanks for listing out the requirements clearly. I think it seems reasonable to include something in this repo that helps with this use case.

Not in scope:
Preventing people from continuing to copy source files into their project, if they really want to

I believe that there would be significant interest by downstream projects in using system tree-sitter installations.

As we discuss this, I do think it'd be helpful if you understood the reasons why the library is currently consumed the way that it is. I'm not sure whether it's necessary to go into detail, but for the majority of use cases, the fact is that the downstream application needs to:

  1. statically link the library
  2. explicitly control its version (via a git submodule or whatever)

Does that make sense? Moving on, I totally acknowledge that we should support dynamic linking better, provided that it doesn't complicate the existing use cases.

@maxbrunsfeld
Contributor

Some questions about the requirements that you listed. I'm not super familiar with the differences between the various Linux package management systems, so I appreciate you explaining this stuff.

  1. Is pkg-config a fairly universal thing that's used by all C libraries and package managers?
  2. Do pkg-config files need to be dynamically generated, even for simple libraries without dependencies, as opposed to simply being hand-written? If so, what parts of the config file cannot be captured by a simple static .pc file?
  3. Would a simple Makefile suffice for this purpose? I like Make because it is universal.

Overall, I'd just really like to minimize 1) the dependencies associated with the project, and 2) the code/config that I take responsibility for maintaining and which is not consumed by the existing project structure (e.g. the CLI, the existing language bindings, etc).

@maxbrunsfeld
Contributor

I'm looking at the Arch package sources for some packages that seem to have a similar complexity to Tree-sitter. It seems like there are a number of different patterns that exist.

In some cases, the project repos themselves just contain a Makefile, and the Arch packages specify their own .pc files:

In other cases, project repos themselves contain both a Makefile and a simple static .pc file:

In some cases, projects just contain a Makefile, and it's not clear that any pkg-config files are created at all 🤔:

It seems like most of the more complex libraries use either autotools or CMake.


It seems like we can all agree that Tree-sitter should probably at least have a Makefile, instead of its current 20-line shell script. So I'd like to propose, as a starting point for this discussion, the following initial steps:

  1. Replace the current shell script script/build-lib with a Makefile which can build static and dynamic libraries, and install header files.
  2. Add a static .pc file at the root of the repo, and have the Makefile copy that .pc file to lib/pkgconfig.

If we need further automated configuration, I think I'd prefer to use CMake as opposed to autotools, or something more obscure like Meson.

@eli-schwartz Thoughts?

@eli-schwartz
Contributor Author

eli-schwartz commented Mar 16, 2020

I do understand that downstream projects may need to carefully control the version, however, I think pkg-config can serve this purpose too: the downstream project can require a specific version or version range such as pkg-config --libs --cflags 'foo >= 2.1 foo < 4'.
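
A rough sketch of such a version check, using a stand-in tree-sitter.pc so that it runs without a system installation. The scratch directory, the check_range helper and the chosen version bounds are hypothetical and only serve to illustrate the constraint syntax.

```shell
# Hypothetical stand-in metadata declaring version 0.16.5.
pcdir=$(mktemp -d)
cat > "$pcdir/tree-sitter.pc" <<'EOF'
Name: tree-sitter
Description: stand-in metadata for illustration only
Version: 0.16.5
Libs: -ltree-sitter
EOF

# check_range is an illustrative helper: it prints "satisfied" or
# "unsatisfied" for a pkg-config module list with version constraints,
# or "skipped" when pkg-config itself is not installed.
check_range() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo skipped
    elif PKG_CONFIG_PATH="$pcdir" pkg-config --exists "$1"; then
        echo satisfied
    else
        echo unsatisfied
    fi
}

in_range=$(check_range 'tree-sitter >= 0.16 tree-sitter < 1')
too_new=$(check_range 'tree-sitter >= 2')
echo "0.16 <= version < 1: $in_range; version >= 2: $too_new"
```

A downstream build could run a check like this first and fall back to a bundled copy only when the system version is outside the range it supports.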

Satisfying missing external dependencies can often be solved with a fallback such as meson's Wrap dependencies or cmake's ExternalProject that downloads on demand and builds a static library.

(I agree, too, that static libraries can be useful!)

Is pkg-config a fairly universal thing that's used by all C libraries and package managers?

It's a widely accepted standard, and though it isn't universally used by C libraries, projects that depend on C libraries prefer to discover them using pkg-config. It's pretty universally supported by build systems, and some, like https://conan.io/, will actually try to create one for you if the C library dependency doesn't already have one. (This is sometimes not so great, because you can only depend on it if you use conan for everything, and also because you have to guess what the upstream developer intended. Hence, it's preferred to have an upstream one, as opposed to pathologically confusing cases like lua, where many downstream vendors provide their own pkg-config files with different names so you cannot actually rely on them.)

Do pkg-config files need to be dynamically generated, even for simple libraries without dependencies, as opposed to simply being hand-written? If so, what parts of the config file cannot be captured by a simple static .pc file?

They don't, though it can be convenient, and it becomes very convenient if your library has multiple dependencies, some of which are optionally enabled via build-time options. I'd generally suggest that if you're going to use a build system which knows how to autogenerate it, why not take advantage... but if not, you don't have to. The alternative is a simple, static template such as the following example:

prefix=@PREFIX@
libdir=@LIBDIR@
includedir=@INCLUDEDIR@

Name: tree-sitter
Description: An incremental parsing system for programming tools
URL: https://tree-sitter.github.io/
Version: @VERSION@
Libs: -L${libdir} -ltree-sitter
Cflags: -I${includedir}

And you could use sed to replace @VERSION@ with 0.16.5, or hard-code it and manually edit the .pc file every time. PREFIX/LIBDIR/INCLUDEDIR should also be configurable if the build system does anything other than installing straight to /usr/lib and /usr/include. This is something that meson's autogenerator would do automatically, but it could also be done with a custom sed line, custom cmake configure_file rules, and so on.
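
As a minimal sketch of the sed approach: the snippet below writes the template to a scratch directory and fills in the placeholders in a single pass. The scratch directory, the tree-sitter.pc.in file name and the /usr/local prefix are assumptions made for the example, not something the project prescribes.

```shell
# Write the template to a hypothetical scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/tree-sitter.pc.in" <<'EOF'
prefix=@PREFIX@
libdir=@LIBDIR@
includedir=@INCLUDEDIR@

Name: tree-sitter
Description: An incremental parsing system for programming tools
URL: https://tree-sitter.github.io/
Version: @VERSION@
Libs: -L${libdir} -ltree-sitter
Cflags: -I${includedir}
EOF

# Assumed install locations and version; a Makefile would normally supply
# these as variables.
prefix=/usr/local
libdir=$prefix/lib
includedir=$prefix/include
version=0.16.5

# "|" is used as the sed delimiter because the replacement values contain
# slashes. The output is written once, to a new file.
sed -e "s|@PREFIX@|$prefix|" \
    -e "s|@LIBDIR@|$libdir|" \
    -e "s|@INCLUDEDIR@|$includedir|" \
    -e "s|@VERSION@|$version|" \
    "$workdir/tree-sitter.pc.in" > "$workdir/tree-sitter.pc"

cat "$workdir/tree-sitter.pc"
```

The quoted heredoc keeps ${libdir} and ${includedir} literal in the template, so pkg-config expands them later rather than the shell.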

Would a simple Makefile suffice for this purpose? I like Make because it is universal.

A Makefile would be fine, as long as it has an install target and the compiler line uses CFLAGS. POSIX Make has built-in inference rules for compiling .c -> .o while making use of CFLAGS (the .c.o double-suffix rule), and GNU Make includes a $(LINK.o) helper macro for LDFLAGS, but for portability you can just use $(CC) -shared -o $@ $^ $(LDFLAGS) -Wl,-soname,libtree-sitter.so.$(SONAME)
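
To make the compile and link steps concrete, here is a rough shell equivalent of what such a Makefile rule would run. Everything named here is an assumption for illustration: the one-function demo.c stands in for the real sources, libdemo for libtree-sitter, and somajor for the soname version. The -fPIC and -shared flags are the usual GCC/Clang options for building an ELF shared object, and -Wl,-soname is specific to GNU-style linkers, so the build is skipped on other systems.

```shell
# Respect the caller's CC, CFLAGS and LDFLAGS, with assumed defaults.
CC=${CC:-cc}
CFLAGS=${CFLAGS:--O2}
LDFLAGS=${LDFLAGS:-}
somajor=0   # hypothetical ABI level used in the soname

# Stand-in source file in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/demo.c" <<'EOF'
int demo_answer(void) { return 42; }
EOF

built=skipped
if command -v "$CC" >/dev/null 2>&1 && [ "$(uname -s)" = Linux ]; then
    # Compile with the user's CFLAGS, then link with the user's LDFLAGS
    # and record the soname in the resulting shared object.
    $CC $CFLAGS -fPIC -c -o "$workdir/demo.o" "$workdir/demo.c" &&
    $CC -shared -o "$workdir/libdemo.so.$somajor" "$workdir/demo.o" \
        $LDFLAGS -Wl,-soname,"libdemo.so.$somajor" &&
    built=yes
fi
echo "shared library build: $built"
```

CFLAGS and LDFLAGS are deliberately left unquoted so that multiple flags split into separate arguments, which is how Make would expand them.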

@maxbrunsfeld
Contributor

maxbrunsfeld commented Mar 18, 2020

And you could use sed to replace @VERSION@ with 0.16.5

This seems like a fine solution to me. How about we add a tree-sitter.pc and a Makefile that work similarly to the ones in RE2? I think that the Makefile could be a lot simpler than RE2's Makefile, because it doesn't need to deal with things like running tests, benchmarking, and fuzzing, since these are dealt with elsewhere. It can just be make and make install.

@eli-schwartz
Contributor Author

That seems pretty reasonable to me as well. Thanks for the consideration!

As a slight matter of personal taste, I would recommend naming it tree-sitter.pc.in inside the git repo, and installing it with sed -e 's/.../.../g' -e 's/.../.../g' tree-sitter.pc.in > $(DESTDIR)$(libdir)/pkgconfig/tree-sitter.pc. Rationale:

  • .in is a popular naming convention for files which need to be configured, which this does,
  • it completely sidesteps the problem of detecting how to "portably" run sed -i/--in-place (just because it works on macOS and Linux doesn't mean it is in the POSIX manual, and sure, I'm not usually afraid of GNU options, but if you don't need it anyway...),
  • and it means you write the file only once, rather than three or four times (one for each separate sed and one for the install -m).

@maxbrunsfeld
Contributor

Love it. I too was hoping to avoid that sed-in-place business.

@maxbrunsfeld
Contributor

Is anyone interested in opening a PR with these additions?

@eli-schwartz
Contributor Author

eli-schwartz commented Apr 21, 2020

I've finally gotten the time to take a look at this and I created #602 which I believe should cover all the bases. I've attempted to be fairly generic to ensure it's broadly usable. Tell me if there's anything you'd like tweaked.

It's inspired by some similar Makefiles e.g. the one mentioned above, but I've taken a few liberties to do things my (hopefully simpler) way.


2 participants