Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add soversion to libraries #22797

Merged
merged 1 commit into from
Mar 22, 2019
Merged

Conversation

perfinion
Copy link
Member

@perfinion perfinion commented Oct 6, 2018

Add version to library sonames

Previously the libs were just "libtensorflow.so" without any version.
This adds the version to the library (eg libtensorflow.so.1.13.0) and
adds the appropriate symlinks to the full name from the base and
soversion.

The soname is used by compilers to fill in the DT_NEEDED section in the
header of binaries that link to the library. Having the version means
that different versions of the library are able to co-exist and if the
ABI changes, programs linking to the lib do not break.

For more info see:
https://www.debian.org/doc/debian-policy/ch-sharedlibs.html
https://autotools.io/libtool/version.html

Signed-off-by: Jason Zaman jason@perfinion.com

@asimshankar
Copy link
Contributor

@perfinion : Thanks for the PR, but could you elaborate a bit on this? When building from source we'll always only have a single shared library, not a versioned one. So perhaps what you want is for this versioning to be done in the binary distributions rather than by changes to the BUILD rules?

Or am I missing something?

@asimshankar asimshankar added stat:awaiting response Status - Awaiting response from author and removed awaiting review Pull request awaiting review labels Oct 23, 2018
@perfinion
Copy link
Member Author

@asimshankar Sorry, the background is at https://groups.google.com/a/tensorflow.org/forum/#!topic/build/0FjOGrinf5k I should have linked that in the beginning.
basically the version needs to be put in the .so file when building. right now the soname is just "libtensorflow.so" so it cant be versioned later. the soname needs to be set and also requires the symlinks so the other bits of the build work.

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 24, 2018
@rmlarsen rmlarsen requested a review from gunan December 5, 2018 00:03
@angerson
Copy link
Contributor

angerson commented Dec 5, 2018

@asimshankar, @perfinion mentioned the other day that this naming setup was working for everything but Java. Could you take another look at it? (@perfinion, can you describe what you're doing again, like you did in the SIG Build meeting?)

@perfinion
Copy link
Member Author

A summary of the reasons for this change:

Distros are quite strict about versioning the libraries properly so that binary compatibility is easy. Whenever a library makes a breaking change the verison in the library is supposed to change accordingly, then other binaries will link to those specific versions. Lets take nsync as a specific example.

$ ls -al /usr/lib64/libnsync*
lrwxrwxrwx. 1 root root    13 Sep 21 01:58 /usr/lib64/libnsync.so -> libnsync.so.1*
lrwxrwxrwx. 1 root root    18 Sep 21 01:58 /usr/lib64/libnsync.so.1 -> libnsync.so.1.20.1*
-rwxr-xr-x. 1 root root 42856 Sep 21 01:58 /usr/lib64/libnsync.so.1.20.1*
lrwxrwxrwx. 1 root root    17 Sep 21 01:58 /usr/lib64/libnsync_cpp.so -> libnsync_cpp.so.1*
lrwxrwxrwx. 1 root root    22 Sep 21 01:58 /usr/lib64/libnsync_cpp.so.1 -> libnsync_cpp.so.1.20.1*
-rwxr-xr-x. 1 root root 51144 Sep 21 01:58 /usr/lib64/libnsync_cpp.so.1.20.1*
$ scanelf -ByF '%S %p' /usr/lib64/libnsync*1.20.1
libnsync.so.1 /usr/lib64/libnsync.so.1.20.1
libnsync_cpp.so.1 /usr/lib64/libnsync_cpp.so.1.20.1

when building a program that uses nsync, you'd build with gcc foo.c -lnsync then the compiler would look for libnsync.so, follow the symlink and link with that, then it will take the soname that is embedded in the library and put that in the NEEDED. So now when nsync releases a 2.0.0 version the lib will update the version and libnsync.so.2.0.0 will exist and the symlinks will point to that instead. libnsync.so.1 will still link to libnsync.so.1.20.1. This means both libnsync.so.1 and libnsync.so.2 can be installed at the same time and any existing binaries that link with the old one will keep working just fine and new compiled binaries will use the new one. More info from debian is here: https://www.debian.org/doc/debian-policy/ch-sharedlibs.html but the idea is exactly the same on all distros.

The linking is like this, notice the .1 that its using.

$ ldd /usr/lib64/libtensorflow_framework.so 
[ ... snip ... ]
	libnsync_cpp.so.1 => /usr/lib64/libnsync_cpp.so.1 (0x00007f7e751e2000)

Bazel does not really understand these versioning things so I've used a bunch of genrules to make the symlinks and they work well for all the C++ and go stuff. java is the bit that isn't linking quite right to them. or rather, it is linking to them properly, but later fails because it needs the lib itself and the symlink but only has one.

@rthadur rthadur added the size:M CL Change Size: Medium label Jan 2, 2019
@angerson
Copy link
Contributor

Asim has left the company, and @sjamesr is the new owner for the Java releases -- maybe he can help with the breakages.

@sjamesr
Copy link
Contributor

sjamesr commented Feb 8, 2019

Hi there, I looked at this a couple of days ago and couldn't reproduce the failures. I'll take another shot at it again today.

@perfinion
Copy link
Member Author

@sjamesr hey, thanks for looking into this! yeah, I rebased and tried recently too and the java tests pass on my machine. I'm not sure what changed to make it work now tho. The tests still appear to be failing on some of the builders (but not all, so it clearly sometimes works?). I've been playing around with building with the remote execution in a container to see if that'll help find the error but not gotten that going yet. Do let me know if you find how to repro the issues :)

@perfinion
Copy link
Member Author

I managed to get the java tests to fail using RBE. The targets all build properly, just the tests fail at runtime. Hopefully i'll figure out how to make them run now.

@perfinion
Copy link
Member Author

@angersson @sjamesr It should pass java and go tests now. RBE is a lot stricter so I could track things down. Basically added tf_binary_additional_srcs() to data= and a few places like that so It was properly in the deps. The symlink is not a proper library according to bazel so it can't be put in srcs= or deps= like normal but data= followed the same pattern as several other libs.

I'm seeing a failure of the tests in //tensorflow/lite/testing/... but those appear turned off in at least some of the CI bots so not sure if they are known bad?

@perfinion perfinion added the kokoro:force-run Tests on submitted change label Feb 16, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 16, 2019
Copy link
Contributor

@angerson angerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for this!

@angerson angerson added the ready to pull PR ready for merge process label Mar 18, 2019
@gunan gunan added the kokoro:force-run Tests on submitted change label Mar 18, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 18, 2019

return select({
clean_dep("//tensorflow:macos_with_framework_shared_object"): [
clean_dep("//tensorflow:libtensorflow_framework%s.dylib" % suffix),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@allenlavoie about the macos change here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me; sounds like name.major.minor.dylib is idiomatic.

@rthadur rthadur assigned rthadur and unassigned rmlarsen and ymodak Mar 19, 2019
@rthadur rthadur added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Mar 19, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 19, 2019
@tensorflow-copybara tensorflow-copybara merged commit ac1f6cc into tensorflow:master Mar 22, 2019
tensorflow-copybara pushed a commit that referenced this pull request Mar 22, 2019
tensorflow-copybara pushed a commit that referenced this pull request Mar 23, 2019
@perfinion perfinion deleted the soversion branch March 24, 2019 16:43
perfinion added a commit to perfinion/tensorflow that referenced this pull request Apr 4, 2019
tensorflow#22797 added the versioned
tensorflow_framework libraries. The pip wheel had both the .so.1 and
.so.1.13.1 libs but was missing the unversioned .so lib. This adds it
back.
The full .so.1.13.1 lib is probably not needed here but removing it
should come separately later on.

Signed-off-by: Jason Zaman <jason@perfinion.com>
perfinion added a commit to perfinion/tensorflow that referenced this pull request Apr 5, 2019
tensorflow#22797 added the versioned
tensorflow_framework libraries. The pip wheel had both the .so.1 and
.so.1.13.1 libs but was missing the unversioned .so lib. This adds it
back.

Also, fix //tensorflow/tools/lib_package:clib tar file to include
versioned libs.

Signed-off-by: Jason Zaman <jason@perfinion.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes ready to pull PR ready for merge process size:M CL Change Size: Medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.