Enable dead code elimination to reduce binary size #1477

Closed
rojer opened this issue Dec 5, 2019 · 25 comments

Comments

@rojer
Contributor

rojer commented Dec 5, 2019

Go's dead code elimination algorithm is described and implemented here.
TL;DR: elimination of unused public methods at link time is inhibited if reflection-based function or method calls are used anywhere.

quick and dirty experiment shows that if DCE remains enabled, the uncompressed CPIO image size of just core goes down by ~2.3M: 15165376 -> 12707776, a 16% reduction.
with all exp commands, it's about the same: 19130524 -> 16562332, about 13%

current uses of reflection in u-root are:

  1. elvish, to invoke built-in functions (here).
  2. dmidecode, to parse structs (addressed by dmidecode: Do not use reflect.Value.MethodByName #1476)
  3. text/template - uses reflection in struct attribute access, used in go/doc package, which is imported by go/build package.

(3) is the toughest to address, it's all in go core libraries.
but the payoff is significant, might be worth exploring.

@rojer
Contributor Author

rojer commented Dec 5, 2019

compressed (XZ) image size reduction: 4325624 -> 3684448 (15%)
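
For anyone who wants to re-check the percentages, here is a small sketch of the arithmetic. The byte counts are the ones quoted in the comments above; the helper name `reductionPercent` is made up for this example.

```go
package main

import "fmt"

// reductionPercent returns how much smaller after is than before, in percent.
func reductionPercent(before, after int64) float64 {
	return float64(before-after) / float64(before) * 100
}

func main() {
	// Sizes in bytes, copied from the comments above.
	fmt.Printf("CPIO, core only:     %.1f%%\n", reductionPercent(15165376, 12707776)) // 16.2%
	fmt.Printf("CPIO, core plus exp: %.1f%%\n", reductionPercent(19130524, 16562332)) // 13.4%
	fmt.Printf("XZ-compressed image: %.1f%%\n", reductionPercent(4325624, 3684448))   // 14.8%
}
```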

@rminnich
Member

rminnich commented Dec 5, 2019

I am confused, how do you invoke it? My familiarity with DCE was when Go's linker was Ken's linker from Plan 9, which just always did DCE. When did it become an option and why, I wonder? We should always do DCE; I just assumed we did.

@rojer
Contributor Author

rojer commented Dec 5, 2019

Go always does DCE, when it can be sure that code is indeed dead.
use of reflection makes that impossible to say, as any code that seems dead may not really be, leading to runtime panics. so Go just disables DCE when reflection is used.
it used to be that use of anything from the reflect package would disable DCE, but over time that has been refined to just function/method invocation - see detailed description in the link above.
as described, i think it should be possible to get rid of these uses of reflection, re-enabling DCE and reducing the resulting binary size.
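
To make the mechanism concrete, here is a minimal sketch of the kind of code that gets in the way. The type `Greeter` and the helper `callByName` are invented for illustration. The method name is only known at run time, so the linker cannot prove that `Unused` is dead and, per the description linked above, has to keep the exported methods of reachable types. The exact rules depend on the Go version.

```go
package main

import (
	"fmt"
	"reflect"
)

// Greeter is a hypothetical type used only for this illustration.
type Greeter struct{}

// Hello is called directly, so it is reachable in any case.
func (Greeter) Hello() string { return "hello" }

// Unused is never called directly. Without reflection the linker could
// drop it; with the reflective lookup below it may have to be kept.
func (Greeter) Unused() string { return "unused" }

// callByName looks up a zero-argument, string-returning method by a name
// that is only known at run time and invokes it via reflection.
func callByName(v interface{}, name string) (string, bool) {
	m := reflect.ValueOf(v).MethodByName(name)
	if !m.IsValid() {
		return "", false
	}
	out := m.Call(nil)
	return out[0].String(), true
}

func main() {
	fmt.Println(Greeter{}.Hello())
	res, ok := callByName(Greeter{}, "Unused")
	fmt.Println(res, ok)
}
```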

@rojer
Contributor Author

rojer commented Dec 5, 2019

https://pastebin.com/wYLyp58R is the quick and dirty hack for go std library and elvish that re-enables DCE and that i used to get the numbers above.

@rminnich
Member

rminnich commented Dec 5, 2019

so, wow, that is very interesting, I had no idea. Would it be worth extending the bb tool to warn when it sees code practices that disable DCE? Or is there a tool usage that can do that for us?

@rminnich
Member

rminnich commented Dec 5, 2019

Neat, if you want to put in that elvish fix I'll approve :-)

@hugelgupf
Member

How does the fmt package affect us? Doesn't fmt use reflection, and fmt is used by nearly everything?

@rojer
Contributor Author

rojer commented Dec 5, 2019

@rminnich i think it would, but first we need to figure out what to do about the go std library deps.

re: elvish - i'm pretty sure it breaks stuff, so no, it's not ready to be checked as is :) i'm not familiar with elvish enough (= at all) to be able to tell what exactly breaks. but since it's only builtin functions we're talking about, i guess they can be enumerated and the list made explicit, instead of just allowing anything via reflection?

@hugelgupf it does, but not the parts that prevent DCE from working, i.e. not Value.Call or struct method lookup. it does seem that all that keeps us from getting DCE working is the text/template stuff, which i'm pretty sure is not too important.

however, since the changes are all in the go runtime, i'm not sure what would be the best way to go about it. provide a patched runtime? this is not very user-friendly. but i guess it could be done, e.g. provide a docker container with patched go that can be used to build a smaller binary.
also, should it be raised as an issue upstream? maybe go can provide a build tag that disables method/function invocation in the template library, so users that care about binary size and are ok with not having that functionality (i.e. us) can build with that tag?

so, i can start by looking at elvish and coming up with a proper fix.
then, try to make a nice patch for go stdlib to build template w/o the naughty reflection bits and see if we can get any traction with it.

@rjoleary
Contributor

rjoleary commented Dec 6, 2019

Thanks @rojer! That's a really cool find.

For text/template, I don't grasp the dependency graph. I thought go/build was only used at build time?

Also, is there an easy test to determine if a binary was built without DCE? It would be helpful for the u-root build to print a warning because this could easily regress again in the future.

@rojer
Contributor Author

rojer commented Dec 6, 2019

@rjoleary so, i investigated the text/template use in u-root. turns out, the situation with unused code elimination in Go is pretty bad - unused global variables like maps and lists just cannot be eliminated (golang/go#31704).
text/template happens to have a very expensive such map: a map of builtin functions, including call, which invokes reflect.Value.Call and thus disables DCE.
it, and all the functions it references, are thus considered alive (which also retains a good amount of code), and DCE is disabled.

the template functionality is not even used anywhere. the way we end up with this anchor around our neck is through cmds/core/installcommand, which uses pkg/golang, which uses go/build, which invokes go/doc.Synopsis. that is a short helper function that does not use text/template at all, but other stuff in go/doc does and, because of the aforementioned bug, text/template.init gets retained, which retains builtins, which retains call, which uses reflect.Value.Call, which disables DCE. ugh.

i filed golang/go#36021 - that way, if text/template.builtins is not used, it is never initialized. this will solve this particular case.
i also commented on the bigger issue here - golang/go#2559 (comment)
i bet there are quite a few global vars that retain dead code, so if that is implemented, it's bound to shave off some more bytes.
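
The general shape of such a lazy-initialization fix can be sketched as follows. This is an illustration under assumed names (`builtins`, `builtinsOnce`, `builtinsMap`), not the actual text/template patch: the table is built on first use inside a function instead of in a package-level initializer, so if nothing reachable calls the accessor, the linker should be able to drop the table and whatever it refers to.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Eager form, shown only as a comment: a package-level map literal is
// filled in during package initialization, so everything it references
// stays reachable even if the map is never read.
//
//	var builtins = map[string]func(string) string{"upper": strings.ToUpper}

var (
	builtinsOnce sync.Once
	builtinsMap  map[string]func(string) string
)

// builtins builds the table on first use and returns the same map on
// every later call.
func builtins() map[string]func(string) string {
	builtinsOnce.Do(func() {
		builtinsMap = map[string]func(string) string{
			"upper": strings.ToUpper,
			"lower": strings.ToLower,
		}
	})
	return builtinsMap
}

func main() {
	fmt.Println(builtins()["upper"]("dce")) // prints "DCE"
}
```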

as for ways to detect if DCE is working, it is possible to have a sentinel method on some struct that should normally be eliminated and have a check at the end of build if it has been - go tool nm bb | grep DeadMethod or some such.
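
A sketch of the check side of that idea in Go, assuming the symbol listing has already been captured as text. The helper `hasSymbol` and the sample lines are made up; the sample only imitates the address/type/name layout that `go tool nm` prints.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasSymbol reports whether any symbol name in nm-style output contains
// want. Each line is expected to end with the symbol name.
func hasSymbol(nmOutput, want string) bool {
	sc := bufio.NewScanner(strings.NewReader(nmOutput))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue
		}
		// The symbol name is the last column.
		if strings.Contains(fields[len(fields)-1], want) {
			return true
		}
	}
	return false
}

func main() {
	// Made-up sample lines, not output from a real u-root build.
	sample := "  4a1b20 T main.main\n  4a1c00 T main.(*sentinel).DeadMethod\n"
	if hasSymbol(sample, "DeadMethod") {
		fmt.Println("DeadMethod is still linked in: DCE looks disabled")
	} else {
		fmt.Println("DeadMethod was eliminated: DCE looks enabled")
	}
}
```

Note that this only works on a binary that still has its symbol table, so the check has to run before stripping.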

@hugelgupf
Member

With #1358 I can probably get rid of the go/build dependency on pkg/golang.

@xaionaro
Contributor

xaionaro commented Dec 6, 2019

Let me put my useless cents here :)

text/template happens to have a very expensive such map: a map of builtin functions, including call, which invokes reflect.Value.Call and thus disables DCE.

In my very humble opinion all such stuff that prevents DCE (like call) should be disableable via a Go build tag (like forcedce or noreflectcall). A patch to Golang with noreflectcall seems to be quite easy to prepare, but I doubt the upstream will accept it :)
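
To illustrate the idea, a file guarded by such a tag could look roughly like this. `noreflectcall` is just the name suggested above and is not an existing Go build tag; `callAny` is a hypothetical helper. Building with `go build -tags noreflectcall` would exclude this file, and a sibling file guarded by `//go:build noreflectcall` would then have to supply a `callAny` that avoids `reflect.Value.Call`.

```go
//go:build !noreflectcall

// This file is compiled unless the hypothetical noreflectcall tag is set.
// A sibling file guarded by "//go:build noreflectcall" would define the
// same function without reflect.Value.Call, for example by panicking or
// returning nothing.
package main

import (
	"fmt"
	"reflect"
)

// callAny invokes an arbitrary function value through reflection and
// returns its results as plain interface values.
func callAny(fn interface{}, args ...interface{}) []interface{} {
	in := make([]reflect.Value, len(args))
	for i, a := range args {
		in[i] = reflect.ValueOf(a)
	}
	out := reflect.ValueOf(fn).Call(in)
	res := make([]interface{}, len(out))
	for i, o := range out {
		res[i] = o.Interface()
	}
	return res
}

func main() {
	fmt.Println(callAny(func(a, b int) int { return a + b }, 1, 2))
}
```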


Also, gcc has LTO, which does a lot of things, including DCE. A wild, unconfirmed guess: if somebody uses gccgo or LLVM, they may be able to use:

  • DCE provided by gccgo/LLVM (if it works, it is lower-level and may be able to handle this case).
  • Something like -Os (man gcc).

Also, just some food for thought:

cat > hello_world.go <<EOF
package main

import "fmt"

func main() {
	fmt.Println("Hello world!")
}
EOF
xaionaro@ubuntu:~/test$ go build -compiler gccgo -o gccgo.bin hello_world.go 
xaionaro@ubuntu:~/test$ go build -o gc.bin -ldflags="-w -s" hello_world.go 
xaionaro@ubuntu:~/test$ strip -s *.bin
xaionaro@ubuntu:~/test$ ls -l
total 1432
-rwxrwxr-x 1 xaionaro xaionaro 1434968 Dec  6 22:42 gc.bin
-rwxrwxr-x 1 xaionaro xaionaro   23464 Dec  6 22:42 gccgo.bin
-rw-rw-r-- 1 xaionaro xaionaro      73 Dec  6 22:09 hello_world.go
xaionaro@ubuntu:~/test$ go build -o gc-debug.bin hello_world.go 
xaionaro@ubuntu:~/test$ go build -compiler gccgo -o gccgo-debug.bin hello_world.go 
xaionaro@ubuntu:~/test$ nm gc-debug.bin | grep -c FieldByNameFunc
13
xaionaro@ubuntu:~/test$ nm gccgo-debug.bin | grep -c FieldByNameFunc
0

There's also tinygo, but its standard library is heavily trimmed and it does not come close to meeting u-root's requirements.


Also, the disassembled code of a binary produced by the Go compiler does not look compact. I believe DCE is not the only strategy for shrinking the binary; -Os-like strategies are valid, too.


In total:

  • Maybe we should try to implement a noreflectcall build tag.
  • Maybe LLVM-based compilation will help shrink the binary substantially. Moreover, there are already projects trying to compile Go via LLVM, so maybe somebody should try it :)

@rojer
Contributor Author

rojer commented Dec 6, 2019

@xaionaro, ooh, this is very interesting. so the gccgo-built binary trims a lot more fat. i wonder how/if reflection works in gccgo-built binaries. it may be that LTO/DCE (--gc-sections?) is disabled in that case. definitely something to look into.

@xaionaro
Contributor

xaionaro commented Dec 7, 2019

Eh... I've just tried to compile u-root/bb with gccgo and:

$ go build -compiler gccgo -gccgoflags "-Os -flto"; strip -s bb; ls -lh bb
-rwxrwxr-x 1 xaionaro xaionaro 18M Dec  7 00:20 bb

I also had to do a lot of dirty hacks to make it work, but the size is even worse than with a simple go build -ldflags '-w -s'. It's not as simple as I hoped :(

Looks like u-root/bb has some dependency that disables gccgo's DCE as well.


Also I've just tried to compile every tool separately:

TOOL gccgo(bytes) gc(bytes) ratio(gccgo/gc)
ip 3935912 2519040 1.56247
dhclient 4794104 3096576 1.5482
init 3289280 2207744 1.48988
elvish 4815232 4079616 1.18032
fusermount 1848008 1945600 0.94984
kexec 2416552 2572288 0.939456
ls 1975320 2379776 0.830045
sshd 2811632 3817472 0.736517
ntpdate 1802416 2465792 0.730968
strace 1430792 2097152 0.682255
rmmod 1020840 1531904 0.666386
insmod 1016744 1556480 0.653233
shutdown 973440 1531904 0.635445
cpio 1457392 2306048 0.631987
hostname 968664 1540096 0.628963
mount 1046496 1691648 0.618625
losetup 1019632 1650688 0.617701
umount 987880 1642496 0.60145
hwclock 994920 1654784 0.601239
mknod 972912 1634304 0.595307
uname 977288 1642496 0.595002
mkfifo 968968 1634304 0.592893
dmesg 968776 1634304 0.592776
which 973168 1658880 0.586642
switch_root 987856 1699840 0.581146
stty 1080712 1949696 0.554298
find 1171408 2203648 0.531577
rsdp 967024 1921024 0.50339
ps 986648 1970176 0.500792
tr 933888 1900544 0.491379
less 1206536 2469888 0.488498
strings 910824 1880064 0.484464
basename 906808 1871872 0.484439
cp 944400 1961984 0.481349
more 906728 1888256 0.480193
md5sum 915272 1908736 0.479517
mktemp 911416 1908736 0.477497
shasum 915256 1929216 0.474419
tee 906872 1925120 0.471073
rm 911136 1937408 0.470286
tar 936696 2146304 0.436423
gzip 744224 1912832 0.389069
pci 777280 2420736 0.321092
gpgv 645488 2392064 0.269846
wget 817016 5324800 0.153436
dd 145088 1720320 0.0843378
gpt 137272 2084864 0.0658422
id 109256 1679360 0.0650581
io 98304 1613824 0.0609137
grep 114432 1937408 0.0590645
installcommand 178272 3264512 0.0546091
cmp 82712 1667072 0.0496151
man 86480 1916928 0.0451138
truncate 74600 1671168 0.0446394
tail 73816 1658880 0.0444975
kill 66424 1495040 0.0444296
lddfiles 84040 1945600 0.0431949
free 76064 1884160 0.0403702
sync 31296 851968 0.0367338
ln 58480 1732608 0.0337526
date 62352 1912832 0.0325967
df 54232 1679360 0.0322933
chroot 58392 1818624 0.0321078
true 23008 806912 0.0285136
wc 44776 1597440 0.0280298
msr 44496 1650688 0.026956
uptime 40328 1572864 0.0256399
false 23280 909312 0.0256018
comm 41008 1658880 0.0247203
sort 40600 1675264 0.024235
seq 40424 1712128 0.0236104
scp 40296 1732608 0.0232574
lsmod 36048 1556480 0.02316
sleep 36160 1630208 0.0221812
readlink 36264 1642496 0.0220786
echo 36248 1642496 0.0220689
mkdir 36424 1650688 0.022066
hexdump 36304 1650688 0.0219933
uniq 36296 1650688 0.0219884
mv 36216 1667072 0.0217243
pwd 36392 1675264 0.0217231
chmod 36240 1683456 0.0215271
unshare 36600 1781760 0.0205415
ping 44976 2228224 0.0201847
printenv 27600 1458176 0.0189278
netcat 41800 2265088 0.018454
dirname 27816 1531904 0.0181578
cat 27792 1642496 0.0169206
yes 23608 1560576 0.0151277

Total:

xaionaro@ubuntu:/tmp$ du -ms gc/ gccgo/
163	gc/
63	gccgo/

So these numbers give hope. It looks like gccgo's DCE does work with most of the tools. But, for example, gccgo's ip weighs more than gc's.

Maybe I'll be able to investigate this. I'll try to look into it on Monday :)


Update: I will try to spend a little time investigating the gccgo case this week.

@xaionaro
Contributor

xaionaro commented Dec 10, 2019

Created a ticket: golang/go#36073

It appears that DCE basically does not work with gccgo at all :(

An update: it appears I was just using -gccgoflags slightly wrong.

@xaionaro
Contributor

xaionaro commented Dec 10, 2019

An update:
Good news! It appears gccgo can reduce the binary size from 15884288 to 12190424. And the binary still works :)

root@ubuntu:~/go/src/github.com/u-root/u-root/bb# ./bb ls
bb
main.go

The line to compile:

go build -compiler gccgo -gccgoflags=all='-flto -Os -fdata-sections -ffunction-sections -Wl,--gc-sections'

What is required to use it:

  • Update the Go runtime of gccgo to add crypto/ed25519 (or apply another workaround).
  • Update the Go runtime of gccgo to add crypto/poly1305 (or apply another workaround).
  • Update the Go runtime of gccgo to add os.(*File).SyscallConn().
  • Fix gccgo or github.com/u-root/u-root/pkg/boot/multiboot/internal/trampoline so that this trampoline compiles. The current problem:
# github.com/u-root/u-root/pkg/boot/multiboot/internal/trampoline
../pkg/boot/multiboot/internal/trampoline/trampoline_linux_amd64.s:10:10: fatal error: textflag.h: No such file or directory
 #include "textflag.h"
          ^~~~~~~~~~~~
compilation terminated.
#

@rojer
Contributor Author

rojer commented Dec 14, 2019

#1486 removes use of reflection from elvish without (i hope) breaking any of the functionality.
confirmed that with this and the text/template patch DCE starts working:

[rojer@nbd ~/go/src/github.com/u-root/u-root/bb master]$ GOROOT=/home/rojer/go/go_210284 GOARCH=arm GOARM=5 go build -v && ll bb
-rwxr-xr-x 1 rojer rojer 18057313 Dec 14 18:32 bb
[rojer@nbd ~/go/src/github.com/u-root/u-root/bb elvish_noreflect]$ GOROOT=/home/rojer/go/go GOARCH=arm GOARM=5 go build -v && ll bb
-rwxr-xr-x 1 rojer rojer 18193874 Dec 14 18:33 bb
[rojer@nbd ~/go/src/github.com/u-root/u-root/bb elvish_noreflect]$ GOROOT=/home/rojer/go/go_210284 GOARCH=arm GOARM=5 go build -v && ll bb
-rwxr-xr-x 1 rojer rojer 15209315 Dec 14 18:33 bb

unfortunately, the aforementioned patch will be integrated into Go 1.15, which will be released in Aug 2020, so we need to find some way of incorporating it into builds before that. i'll look into it.

@xaionaro
Contributor

xaionaro commented Dec 15, 2019

An update from my side:

  1. I made it work with gccgo without my dirty u-root hacks... But it appears I was stupid and did not notice that gccgo (in contrast to gc) builds binaries that link dynamically against libgo, libgcc, libc and others, even if CGO is disabled. So even if DCE for Go code works better, the binary still weighs more once I add -static. And those libs contain a lot of stuff that should be eliminated (like atanf) but is not, so it seems DCE works badly for them. In this context I tried the gold linker instead of bfd, but it does not work with gccgo and -flto :(. I found a bug report about the same linking problem from the Rust community, and they simply decided to exclude gold (and use bfd). So I decided this is a long road (though probably still doable) and I should put this idea aside for a while. So now I'm trying to find an easier way to reduce the size of bb.

  2. Another approach is to port it to 386 (from amd64). Of course, this could only help reduce the binary size on amd64. A preliminary experiment showed the size of the binary going down from 16MiB to 14MiB. So now I'm trying to make it work with GOARCH=386 and I will create another issue and/or PR for this.

rminnich pushed a commit that referenced this issue Dec 16, 2019
Use of `reflect.Value.Call` disables unused public method elimination at build time,
which leaves unused code in the resulting binary,
see #1477.

Elvish uses `reflect.Value.Call` to invoke builtin functions.
Replace its use with a static set of possible function signatures:
these cover all existing use cases and can be extended if needed.

Saves approx. 3.2M on amd64 (go 1.13.5 + text/template patch):
```
[rojer9@rojer9-t470 ~/go/src/github.com/u-root/u-root/bb master]$ GOROOT=$HOME/go_210284 go build && ls -l bb
-rwxrwxr-x. 1 rojer9 rojer9 20645771 Dec 16 16:40 bb
[rojer9@rojer9-t470 ~/go/src/github.com/u-root/u-root/bb master]$ gc elvish_noreflect
Switched to branch 'elvish_noreflect'
[rojer9@rojer9-t470 ~/go/src/github.com/u-root/u-root/bb elvish_noreflect]$ GOROOT=$HOME/go_210284 go build && ls -l bb
-rwxrwxr-x. 1 rojer9 rojer9 17257671 Dec 16 16:41 bb
```

Signed-off-by: Deomid "rojer" Ryabkov <rojer9@fb.com>
@rojer
Contributor Author

rojer commented Dec 17, 2019

for DCE to work on libraries, they need to be built with -ffunction-sections as well, which they may not already be.
GOARCH=386 will definitely help a lot - instructions are more compact and it will reduce the symbol table quite considerably (will be similar to arm, which is much smaller - see my quick experiment here).

@rojer
Contributor Author

rojer commented Dec 20, 2019

as an interim solution, i have created a docker image that contains go with patched stdlib - #1491

it's pretty easy to use and delivers the result:

[rojer@nbd ~/go/src/github.com/u-root/u-root/tools/golang_patched golang_patched]$ docker build -t golang-patched .
Sending build context to Docker daemon  9.728kB
Step 1/4 : FROM golang:1.13.5-alpine
...
Successfully built 794bb9635caf
Successfully tagged golang-patched:latest
[rojer@nbd ~/go/src/github.com/u-root/u-root/tools/golang_patched golang_patched]$ docker run --rm -it -u $(id -u):$(id -u) \
>       -v $(go env GOPATH)/src:/go/src \
>       -v $(go env GOCACHE):/go/.cache \
>       -v $PWD:/out \
>     golang-patched sh -c 'go build github.com/u-root/u-root && \
>                           ./u-root -build=bb -o /out/initramfs'
2019/12/20 14:14:54 Disabling CGO for u-root...
2019/12/20 14:14:54 Build environment: GOARCH=amd64 GOOS=linux GOROOT=/usr/local/go GOPATH=/go CGO_ENABLED=0
2019/12/20 14:14:54 Filename is /out/initramfs
2019/12/20 14:15:03 Successfully wrote initramfs.
[rojer@nbd ~/go/src/github.com/u-root/u-root/tools/golang_patched golang_patched]$ ll initramfs 
-rwxr-xr-x 1 rojer rojer 12666816 Dec 20 14:15 initramfs

rvdm82 pushed a commit to rvdm82/u-root that referenced this issue Jan 6, 2020
Signed-off-by: Deomid "rojer" Ryabkov <rojer9@fb.com>
Signed-off-by: Rob Vandermeulen <rvandermeulen@google.com>
@xaionaro
Contributor

xaionaro commented Jan 14, 2020

(will be similar to arm, which is much smaller - see my quick experiment here).

A possibly related problem: golang/go#36313

experiment0@uroot:~/go/src/github.com/u-root/u-root/bb$ go tool nm -size -sort size bb | head -5
  e39500    4808314 r runtime.pclntab
 133ed00      65744 D runtime.trace
  e28080      34254 r runtime.findfunctab
  9af670      33410 T github.com/u-root/u-root/pkg/gpt.init
 1315f20      30720 D crypto/ed25519/internal/edwards25519.base

UPD: Just for future ideas, @rojer's approach with NOOP-ing ld.ftabaddstring seems to be useful here: 15675392 vs 14729216 (-6%).

@xaionaro
Contributor

xaionaro commented Jan 15, 2020

The flag -gcflags=all=-l disables function inlining, which reduces the binary by 9.5% in the uncompressed state and 15% in the compressed state.

-rwxr-xr-x. 1 experiment0 experiment0 14417920 Jan 15 11:08 bb-noinline
-rwxr-xr-x. 1 experiment0 experiment0 15917056 Jan 15 11:08 bb
-rwxr-xr-x. 1 experiment0 experiment0 3835208 Jan 15 11:08 bb-noinline.xz
-rwxr-xr-x. 1 experiment0 experiment0 4508904 Jan 15 11:10 bb.xz

(go build -ldflags="-w -s" vs go build -gcflags=all=-l -ldflags="-w -s")

I'll prepare a PR to use this flag.

Maybe we should create a metatask for any size-reduction methods or rename this task?


UPD: The PR: #1512

@rojer
Contributor Author

rojer commented Jan 15, 2020

wow, awesome!

Maybe we should create a metatask for any size-reduction methods or rename this task?

yes, i think so.

@xaionaro
Contributor

xaionaro commented Jan 16, 2020

@rojer: JFYI, I summarized all the ways I found to reduce a Go binary here:
https://github.com/xaionaro/documentation/blob/master/golang/reduce-binary-size.md

For example, we may also consider adding -gcflags=all=-wb=false to disable "write barriers" (but how safe that is still needs research; I opened a ticket) and -gcflags=all=-B to disable "bounds checks" (but whether that is justified needs discussion).


UPD: The link above points to an obsolete document. I need to update it :(
One big size optimization not mentioned there: downgrading to Go 1.8 saves a lot of space.

rojer pushed a commit that referenced this issue Mar 3, 2020
#1477

Signed-off-by: Deomid "rojer" Ryabkov <rojer9@fb.com>
@rojer
Contributor Author

rojer commented Aug 27, 2020

so, DCE is working, we are using it and #1808 added a test to make sure we don't regress.
