Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable unbundling dependencies and linking to the system libraries instead. #20284

Merged
merged 9 commits into from Jul 13, 2018

Conversation

perfinion
Copy link
Member

@perfinion perfinion commented Jun 25, 2018

This series adds a framework to be able to unbundle the dependencies from tensorflow. Previously, bazel rebuilds every single dep from scratch. This allows distro packages to disable bundling on a per-dep basis and link with the libraries that already exist on the system.

For more information why deps should be unbundled, see these links: https://wiki.gentoo.org/wiki/Why_not_bundle_dependencies
https://fedoraproject.org/wiki/Bundled_Libraries

The deps I have unbundled so far work on my Gentoo machine. Thanks to dennisjenkins@google.com for testing this on Gentoo as well. There are still more that need to be unbundled but this covers enough to be useful already. This has not been tested on any other distro than Gentoo but I don't forsee any big incompatibilities.

The libs that are unbundled are configured by setting build --action_env TF_SYSTEM_LIBS="comma,sep,list"
eg:

build --action_env TF_SYSTEM_LIBS="com_googlesource_code_re2,nasm,jpeg,png_archive,org_sqlite,gif_archive,six_archive,astor_archive,termcolor_archive,pcre,swig,curl,grpc,lmdb,zlib_archive,snappy,flatbuffers,cython,jemalloc"

Once configured, the tarball for the dep will not be downloaded or extracted, and the BUILD file will be swapped to the system_build_file instead which contains different rules to compile and link against the system package.
There are also macros if_system_lib(name, a, b) and if_not_system_lib(name, a, b) to configure other things in the build.

DO NOT MERGE THIS YET. I'm opening this for comments are review here before its ready for merging.

@perfinion
Copy link
Member Author

@angersson so you can make sure this doesn't conflict with your changes.
@meteorcloudy can you take a look at the bazel bits?

@angerson
Copy link
Contributor

@jart Can you have a look at this as well?

The gist of this looks good to me, but I'm not certain of the implications for other systems, etc. I'm working on updating my internal proposal to better work with this.

fail("\n%sSystem Library Configuration Error:%s %s\n" % (red, no_color, msg))


def _enable_syslibs(repository_ctx):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is Linux specific feature, can you disable it on Windows?
You can check if it's building on Windows with a function like:

def _is_windows(repository_ctx):
"""Returns true if the host operating system is windows."""
os_name = repository_ctx.os.name.lower()
if os_name.find("windows") != -1:
return True
return False

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I added this

meteorcloudy
meteorcloudy previously approved these changes Jun 26, 2018
Copy link
Member

@meteorcloudy meteorcloudy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment about Windows.
LGTM from Bazel side, thanks for the great work!

@perfinion
Copy link
Member Author

I pushed an update:

  • disabled the syslibs_configure stuff on windows.
  • fixed the search path in cython.BUILD.
  • rebased on to latest master.
  • master had bumped jsoncpp too so I added a commit to unbundle that as well.
  • grpc.BUILD added targets for the secure libs too.

The header file paths for jsoncpp are different from the system paths so I had to symlink them. bazel complains when I tried doing includes=["/usr/include/jsoncpp"]. Maybe in a future bazel version we can add some configs to allow non- hermetic builds using the system headers or searching pkg-config or something.

I've run this patchset in the gentoo ebuild both bundled and unbundled and its working so far :)

@msarahan
Copy link

Thank you for working on this! I've been trying to use this for Anaconda's packages, too. I'm having some difficulty with the hardcoding of paths, such as with cython: https://github.com/tensorflow/tensorflow/pull/20284/files#diff-c8176cf1a58a40560176239b30ef7e52R8

Is there a good way to have that be more dynamic? For example, all of our build paths are relative to an environment variable, $PREFIX. For you, this is probably /usr or maybe /usr/local. For us, it can be anything, and essentially never /usr. Perhaps we could use $PREFIX in the build files, and have it fall back to a default of /usr when it is not defined?

@perfinion
Copy link
Member Author

I have pushed sci-libs/tensorflow-1.9.0_rc1-r2 to the gentoo tree with these patches. the system-libs USE-flag enabled the system ones, disabling the flag uses the bundled stuff like before in case there are any issues. Hopefully I dont get too many bugs filed because of it :D

@perfinion
Copy link
Member Author

@msarahan yeah, the prefixing is something that I forsaw as problematic. Normal gentoo uses $ROOT=/, but we definitely do support both cross-compiling (eg $ROOT=/usr/arm-linux-gnu/) and also installing things as a whole alongside another distro/machine/whatever (eg $PREFIX=/home/jason/root/). Currently those cases are probably also broken.

I was thinking to write some simple scripts like find-binary.sh and find-python.sh and change the genrule()'s to use those instead of directly cp'ing fixed paths. Once there are scripts then it would be quite easy to add extra logic per-distro or to search pkg-config for whatever flags. Not every distro puts binaries in the same place too so scripts would be more robust.

About $PREFIX specifically, bazel strips basically everything from the environment currently. There are issues filed with the bazel team and it will eventually more variables. In the mean time you'd need to add something like build --action_env=PREFIX.

This patch set is already getting pretty large so I think we should get this basic functionality merged in then later patchsets will be smaller and much easier to review :) Other than editing some paths in third_party/systemlibs/*.BUILD, has the rest mostly been working for you? If some of the core bits fail horribly for your case, that would probably warrant fixing first. You can look at the gentoo ebuilds for inspiration too but they aren't pretty yet: https://github.com/gentoo/gentoo/tree/master/sci-libs/tensorflow

@msarahan
Copy link

Thanks @perfinion. The problem with hardcoded paths at all is that we don't really use them for conda packages. $PREFIX points to some path that looks like /usr or /usr/local (with include, bin, and lib subdirs), but it is not a fixed path. For example, during a package build, it will be something like

/home/msarahan/miniconda3/conda-bld/tensorflow_(timestamp)/_h_env_(pad)

with (pad) representing enough placeholder characters to get us to a 255 character padding for any fixed paths that get baked into binaries. That's essential to relocatability.

Not having dynamic paths is a pretty big non-starter for us.

Adding --action_env=PREFIX is definitely something I knew we'd need to use - I'm just curious about how to incorporate that into the BUILD files.

To put headers and libs onto the search paths, I first symlinked my $PREFIX var in my build script to a location within the Tensforflow third_party folder, and added that folder to the include and link paths:

licenses(["notice"])  # BSD/MIT-like license

filegroup(
    name = "LICENSE",
    visibility = ["//visibility:public"],
)

cc_library(
    name = "png",
    copts = ["-Ithird_party/systemlibs/include"],
    linkopts = ["-lpng", "-Lthirdparty/systemlibs/lib"],
    visibility = ["//visibility:public"],
)

That worked OK, seemingly. Next I was hung up on the hard-coded paths as I mentioned before. I began to try something a little different, based on what I found at https://stackoverflow.com/questions/43928653/call-llvm-config-prefix-and-use-it-in-a-build-rule/43936381#43936381.

I removed the BUILD files completely, and instead tried to generate them more dynamically:

def _curl_impl(repository_ctx):
  prefix_path = repository_ctx.os.getenv("PREFIX", "/usr")
  repository_ctx.symlink(prefix_path, "external_prefix")
  repository_ctx.file("curl.BUILD", """
licenses(["notice"])  # MIT/X derivative license

filegroup(
    name = "COPYING",
    visibility = ["//visibility:public"],
)

cc_library(
    name = "curl",
    copts = ["-Iexternal_prefix/include"],
    linkopts = ["-lcurl", "-Lexternal_prefix/lib"],
    visibility = ["//visibility:public"],
)
""")

I have not gotten this working yet, but I think it might work. It would still require --action_env=PREFIX, but that's fine with me. Any thoughts on the viability of this approach? It need not be part of this PR, if it is better to get this initial work in and then adapt.

@msarahan
Copy link

PS: for the executables that needed to be copied, I symlinked after using which to find them:

def _cython_impl(repository_ctx):
  cython_path = repository_ctx.which("cython")
  repository_ctx.file("cython.BUILD", """
licenses(["notice"])  # Apache-2.0

genrule(
    name = "lncython",
    outs = ["cython.py"],
    cmd = "ln -s {} $@",
)

# May not be named "cython", since that conflicts with Cython/ on OSX
sh_binary(
    name = "cython_binary",
    srcs = ["cython.py"],
    visibility = ["//visibility:public"],
)
""".format(cython_path))

@perfinion
Copy link
Member Author

@msarahan for cython specifically, it used to prefix it with $PYTHON_BIN_PATH which made things not work on gentoo since we have a special thing to select which python version is the default. that has been reverted so looks like its really easy now.

You dont need to bother with any of the generating stuff, the cmd= and copts and linkopts can all do make and bash variable expansion.
https://docs.bazel.build/versions/master/be/make-variables.html#make-var-substitution
https://docs.bazel.build/versions/master/be/common-definitions.html#sh-tokenization

You should be able to have just cython.BUILD with this:

genrule(
    name = "lncython",
    outs = ["cython.py"],
    cmd = "ln -s $$(which cython) $@",
)

sh_binary(
    name = "cython_binary",
    srcs = ["cython.py"],
    visibility = ["//visibility:public"],
)

Alternatively, in .bazelrc set build --define PREFIX=/usr then use cmd = "ln -s $(PREFIX)/bin/cython"
The problem is if PREFIX is not defined it fails, I havent figured out yet how to make it default to a blank value.

putting an echo $${PREFIX} inside cmd= does not work either but I think it should. That would be the best since we could use bash defaulting to do eg: "${PREFIX:-/usr}/bin/cython"

In copts and linkopts you can probably do "-I$(PREFIX)/include".

@yifeif yifeif assigned yifeif and unassigned rmlarsen Jun 28, 2018
@msarahan
Copy link

Seems very close to working, but I'm confused on an error with the external grpc. I have put up my patch (I am applying it to 1.9.0rc1 right now): https://gist.github.com/msarahan/e38ccd45521617356a099d50b172f2b8

The error is:

Analyzing: target //tensorflow/tools/pip_package:build_pip_package (149 packages loaded)
Analyzing: target //tensorflow/tools/pip_package:build_pip_package (149 packages loaded)
ERROR: /Users/msarahan/miniconda3/conda-bld/tensorflow-base_1530276141984/work/tensorflow/core/distributed_runtime/rpc/BUILD:97:1: no such package '@grpc//': tf_http_archive rule //external:grpc must create a directory and referenced by '//tensorflow/core/distributed_runtime/rpc:grpc_channel'
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: no such package '@grpc//': tf_http_archive rule //external:grpc must create a directory
INFO: Elapsed time: 9.950s

My build script calls bazel with:

     bazel ${BAZEL_OPTS} build \
           --verbose_failures \
           --config=opt \
           --action_env TF_SYSTEM_LIBS="${TF_SYSTEM_LIBS}" \
           --define PREFIX="${PREFIX}" \
           //tensorflow/tools/pip_package:build_pip_package

Before I was passing in --define as you recommended, it was dying in earlier externals, such as png. That gives me confidence in the overall approach, but it seems I have something wrong. If I remove grpc from the list of externals, it gets further, but dies on missing the license from LMDB. It should not be necessary to list actual files in the external stuff, right?

@perfinion
Copy link
Member Author

@msarahan I rebased everything and also added the $(which foo) stuff

about your grpc error: its probably cuz you need to use double $$'s here:
cmd = "ln -s $(which grpc_python_plugin) $@",

I added those parts tho already so hopefully it should work if you just add the linkopts and includes.
I wonder if its easier for now to just add the linkopts and copts globally in the bazelrc?

@perfinion perfinion force-pushed the unbundle branch 2 times, most recently from 94e87b9 to 140c34b Compare July 4, 2018 13:37
@perfinion
Copy link
Member Author

I think I'm satisfied now with this patchset so far. It should be ready for review then merging when the workflow rejig is done.

I've pushed the older version with the gentoo _rc1-r2 ebuild. The _rc2 ebuild with the latest patches will be pushed to gentoo once some of the deps are bumped too.

@martinwicke
Copy link
Member

Can you fix the buildifier failure?

@angerson
Copy link
Contributor

angerson commented Jul 9, 2018

@perfinion Thanks for keeping up with updating this!

Would you mind commenting with some commands to demonstrate how to test this on Ubuntu? It'll double as documentation for later. I'm far from an expert on builds, and as a user I wouldn't know how to go from 'jpeg is one of the packages that can come unbundled' to 'I need to apt install libjpeg-dev and then provide --build=....

Signed-off-by: Jason Zaman <jason@perfinion.com>
…em_lib macros

Signed-off-by: Jason Zaman <jason@perfinion.com>
Signed-off-by: Jason Zaman <jason@perfinion.com>
The grpc license files are deeper in the heirarchy so are hard to nop
out in the system version of the BUILD file.

Signed-off-by: Jason Zaman <jason@perfinion.com>
The jsoncpp headers are included with a different path so we have to
symlink them so the are in the dir structure that is expected.

Signed-off-by: Jason Zaman <jason@perfinion.com>
…resh

Signed-off-by: Jason Zaman <jason@perfinion.com>
@perfinion
Copy link
Member Author

@martinwicke @angersson I fixed the buildifier style fix and rebased. can you re-trigger the tests?

I think the nobuild error was unrelated, looks like this commit fixes it. The rebase should pick that up too.

commit b13b2c15d2e864270b28eb5c5d3ec9f53d3932a7
Author: Yu-Cheng Ling <ycling@google.com>
Date:   Fri Jul 13 08:38:45 2018

    Fix the broekn BUILD rule under lite/delegates/eager.

nobuild error:
ERROR: /tmpfs/src/github/tensorflow/tensorflow/contrib/lite/delegates/eager/BUILD:22:1: no such package 'testing/base/public': BUILD file not found on package path and referenced by '//tensorflow/contrib/lite/delegates/eager:util_test'
ERROR: Analysis of target '//tensorflow/contrib/lite/delegates/eager:util_test' failed; build aborted: no such package 'testing/base/public': BUILD file not found on package path

@angerson angerson added the kokoro:force-run Tests on submitted change label Jul 13, 2018
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jul 13, 2018
@angerson angerson added the ready to pull PR ready for merge process label Jul 13, 2018
@angerson
Copy link
Contributor

The changes here need manual intervention to work with the new PR import process, which I am currently working on handling.

@tensorflow-copybara tensorflow-copybara merged commit 8fc498b into tensorflow:master Jul 13, 2018
tensorflow-copybara pushed a commit that referenced this pull request Jul 13, 2018
PiperOrigin-RevId: 204539752
@angerson
Copy link
Contributor

Done and merged! Thanks!

@dennisjenkins75
Copy link

Yeah! Thank you all for vastly improving the TensorFlow experience on Gentoo Linux.

@meteorcloudy
Copy link
Member

/cc @dslomov

@perfinion perfinion deleted the unbundle branch July 24, 2018 09:20
perfinion added a commit to perfinion/tensorflow that referenced this pull request Aug 4, 2018
tensorflow#20284 Added the framework
to unbundle deps and use system libraries. The TF_SYSTEM_LIBS variable
needed to be added manually. This makes configure handle it for
convenience. There is no prompt yet, that will be added later.

Signed-off-by: Jason Zaman <jason@perfinion.com>
perfinion added a commit to perfinion/tensorflow that referenced this pull request Aug 14, 2018
tensorflow#20284 Added the framework
to unbundle deps and use system libraries. The TF_SYSTEM_LIBS variable
needed to be added manually. This makes configure handle it for
convenience. There is no prompt yet, that will be added later.

Signed-off-by: Jason Zaman <jason@perfinion.com>
perfinion added a commit to perfinion/tensorflow that referenced this pull request Aug 17, 2018
tensorflow#20284 Added the framework
to unbundle deps and use system libraries. The TF_SYSTEM_LIBS variable
needed to be added manually. This makes configure handle it for
convenience. There is no prompt yet, that will be added later.

Signed-off-by: Jason Zaman <jason@perfinion.com>
@timokau
Copy link

timokau commented Jun 22, 2019

@perfinion do you observe a significant change in build time on gentoo when using system libraries? I tried to adopt this for the nix package, but to my surprise the build time barely changed (around 1h on a 12 core machine, 3.5h on a 4 core).

perfinion added a commit to perfinion/tensorflow that referenced this pull request Aug 9, 2019
tensorflow#20284 Added the framework
to unbundle deps and use system libraries. The TF_SYSTEM_LIBS variable
needed to be added manually. This makes configure handle it for
convenience. There is no prompt yet, that will be added later.

Signed-off-by: Jason Zaman <jason@perfinion.com>
@perfinion
Copy link
Member Author

@timokau Sorry, didnt see this earlier, yeah its a fair bit faster if you unbundle all of them. GRPC, protobuf and several others are decently big but you're right stuff like libpng are fairly small so are not a huge compile time. The biggest part I've noticed tho is the linking stage is a lot faster and uses a lot less RAM than when static linking. Also the python deps were really annoying unless unbundled.

Also feel free to email me (jason at perfinion dot com) if you have any issues with packaging, i'd love to help but I might not see this thread. :)

@timokau
Copy link

timokau commented Sep 21, 2019

Thank you for your feedback and the offer :)

We have since managed to get the source build working and made use of most of the decoupled dependencies. Bundled deps are not quite as painful with nix (as everything has its own FHS etc.), but it's still nice to be able to unbundle them. Thank you for your work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes ready to pull PR ready for merge process
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet