tcpdump 4.1.1 bus error when 'gcc -O2' (default) optimizatio #130

Closed
guyharris opened this issue Apr 16, 2013 · 4 comments

Comments

@guyharris

Converted from SourceForge issue 3042751, submitted by itillman

Platform:
tcpdump 4.1.1
libpcap 1.1.1
Solaris 10
gcc 4.2.4

(Also verified with gcc 3.3.6 on Solaris 9, gcc 4.1.2 on Solaris 9, gcc 4.3.4 on Solaris 10, and gcc 4.4.3 on Solaris 10.)

tcpdump 4.1.1 built with gcc 4.2.4 on Solaris 10 (defaults to 'gcc -O2'):

% /var/local/src/tcpdump-4.1.1/tcpdump -r sample.pcap -ne
reading from file sample.pcap, link-type EN10MB (Ethernet)
Bus Error (core dumped)

Adding -q makes the problem go away:

% /var/local/src/tcpdump-4.1.1/tcpdump -r sample.pcap -neq
reading from file sample.pcap, link-type EN10MB (Ethernet)
14:24:55.977040 04:1e:64:d7:92:e0 > 00:06:2a:db:a0:c0, 802.1Q, length 346: vlan 2000, p 0, ethertype IPv4, 140.180.46.122.68 > 128.112.128.1.67: UDP, length 300

Rebuild tcpdump adding '-g' to CCOPT in Makefile, so we can debug more easily.
Crashes as expected:

% /var/local/src/tcpdump-4.1.1/tcpdump -r sample.pcap -ne
reading from file sample.pcap, link-type EN10MB (Ethernet)
Bus Error (core dumped)

% gdb /var/local/src/tcpdump-4.1.1/tcpdump ./core
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.10".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /var/local/src/tcpdump-4.1.1/tcpdump...done.
Reading symbols from /usr/local/lib/libcrypto.so.0.9.8...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libcrypto.so.0.9.8
Reading symbols from /usr/local/lib/libpcap.so.1...(no debugging symbols found)...done.
Loaded symbols for /usr/local/lib/libpcap.so.1
Reading symbols from /lib/libsocket.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libsocket.so.1
Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/libc.so.1...(no debugging symbols found)...done.
[Thread debugging using libthread_db enabled]
[New Thread 1 (LWP 1)]
Loaded symbols for /lib/libc.so.1
Reading symbols from /lib/libdl.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.1
Reading symbols from /usr/local/lib/libgcc_s.so.1...done.
Loaded symbols for /usr/local/lib/libgcc_s.so.1
Reading symbols from /platform/SUNW,SPARC-Enterprise/lib/libc_psr.so.1...(no debugging symbols found)...done.
Loaded symbols for /platform/SUNW,SPARC-Enterprise/lib/libc_psr.so.1
Reading symbols from /lib/ld.so.1...(no debugging symbols found)...done.
Loaded symbols for /lib/ld.so.1
Core was generated by `/var/local/src/tcpdump-4.1.1/tcpdump -r sample.pcap -ne'.
Program terminated with signal 10, Bus error.
#0  bootp_print (cp=0x178f86 <error reading variable>, length=300) at print-bootp.c:76
76              if (bp->bp_htype == 1 && bp->bp_hlen == 6 && bp->bp_op == BOOTPREQUEST) {
(gdb) bt
#0  bootp_print (cp=0x178f86 <error reading variable>, length=300) at print-bootp.c:76
#1  0x00035084 in ip_print_demux (ndo=0x1708d8, ipds=0xffbff3b0) at print-ip.c:434
#2  0x000356a0 in ip_print (ndo=0x1708d8, bp=<value optimized out>, length=328) at print-ip.c:687
#3  0x00030870 in ethertype_print (ether_type=<value optimized out>, p=0x178f6a <error reading variable>, length=328, caplen=328) at print-ether.c:258
#4  0x00030cd0 in ether_print (p=0x178f6a <error reading variable>, length=328, caplen=328, print_encap_header=0, encap_header_arg=0x0) at print-ether.c:217
#5  0x00030da0 in ether_if_print (h=<value optimized out>, p=0x178f58 "") at print-ether.c:240
#6  0x0006f554 in print_packet (user=0xffbff7b8 "", h=0xffbff5bc, sp=0x178f58 "") at tcpdump.c:1611
#7  0xff1c6140 in pcap_offline_read () from /usr/local/lib/libpcap.so.1
#8  0xff1b81fc in pcap_loop () from /usr/local/lib/libpcap.so.1
#9  0x000700e0 in main (argc=4, argv=0xffbff844) at tcpdump.c:1287

When I look at this core with ddd, ddd tells me that at print-bootp.c:76
the value 'bp' has been optimized out.

Ditto for when I go up a frame to print-ip.c:434 and look at value 'up'.

(That's what led me to try rebuilding without optimization to find
that worked around the issue.)

Rebuilding tcpdump without -O2 (regardless of whether I include -g) makes the problem go away:

% /var/local/src/tcpdump-4.1.1/tcpdump -r sample.pcap -ne
reading from file sample.pcap, link-type EN10MB (Ethernet)
14:24:55.977040 04:1e:64:d7:92:e0 > 00:06:2a:db:a0:c0, ethertype 802.1Q (0x8100), length 346: vlan 2000, p 0, ethertype IPv4, 140.180.46.122.68 > 128.112.128.1.67: BOOTP/DHCP, Request from 04:1e:64:d7:92:e0, length 300

So it seems like the problem is related to gcc optimizing away something
that tcpdump wants to reference.

@guyharris

Submitted by guy_harris

Is this Solaris-on-SPARC, or Solaris-on-x86/x86-64? This isn't BSD, so SIGBUS isn't being misused for a reference to an invalid pointer; the most common cause of SIGBUS on Solaris-on-SPARC is, as I remember, an unaligned access - they cause a trap on SPARC.
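The addresses in the backtrace above are consistent with an alignment problem. The sketch below (the helper name is_aligned is mine, not something from tcpdump) checks the cp value passed to bootp_print: 0x178f86 is a multiple of 2 but not of 4, so a 32-bit load from that address would be unaligned. The packet buffer itself (0x178f58) is 4-byte aligned, and the 46-byte offset between the two matches 14 bytes of Ethernet header plus 4 bytes of VLAN tag plus, assuming an IP header without options, 20 bytes of IP and 8 bytes of UDP.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative helper (not part of tcpdump): returns nonzero if addr is a
 * multiple of align. align must be a nonzero power of two.
 */
int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}
```

For example, is_aligned(0x178f86, 2) is true while is_aligned(0x178f86, 4) is false.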

If this is Solaris-on-SPARC, and, for example, the optimizer is assuming that bp points to a properly-aligned structure after

bp = (const struct bootp *)cp;

and is optimizing

if (bp->bp_htype == 1 && bp->bp_hlen == 6 && bp->bp_op == BOOTPREQUEST)

into, for example, a load of the 32-bit word pointed to by bp followed by a mask against 0xFFFFFF00 and a comparison against 0x01010600, that would fail if bp isn't properly aligned.
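To make the suspected transformation concrete, here is a minimal sketch of the two equivalent tests. It assumes the first four bytes of the BOOTP header are bp_op, bp_htype, bp_hlen and bp_hops, in that order, and that BOOTPREQUEST is 1; the function names are mine. The word version assembles the big-endian value from bytes so that it runs on any machine. An optimizer that believes the pointer is 4-byte aligned can replace that assembly with a single 32-bit load, which is the step that would trap on an unaligned address.

```c
#include <stdint.h>

#define BOOTPREQUEST 1

/* The test as written in the source: three independent one-byte loads. */
int is_ether_request_bytes(const uint8_t *cp)
{
    return cp[1] == 1 && cp[2] == 6 && cp[0] == BOOTPREQUEST;
}

/*
 * The merged form: one big-endian 32-bit value, with the low byte
 * (bp_hops) masked off, compared against op=1, htype=1, hlen=6.
 * Built from bytes here so the sketch is safe at any alignment.
 */
int is_ether_request_word(const uint8_t *cp)
{
    uint32_t w = ((uint32_t)cp[0] << 24) | ((uint32_t)cp[1] << 16) |
                 ((uint32_t)cp[2] << 8)  |  (uint32_t)cp[3];

    return (w & UINT32_C(0xFFFFFF00)) == UINT32_C(0x01010600);
}
```

Both functions give the same answer for every input; they differ only in how the bytes reach the comparison.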

If something such as that is the case, we might have to add an EXTRACT_8BITS() macro and hope that using tricks similar to what we have for EXTRACT_16BITS() keeps the optimizer from assuming bp is aligned on a 2-byte or 4-byte boundary.
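A rough sketch of what such a byte accessor could look like follows. The macro name, the offset constants and the function are hypothetical and are not tcpdump's actual definitions. The idea is to read the fields through a const uint8_t pointer derived from the raw packet pointer, without ever casting to a struct pointer, so the compiler has no wider alignment to assume. Whether a given compiler still merges the loads safely is up to that compiler.

```c
#include <stddef.h>
#include <stdint.h>

#define BOOTPREQUEST 1

/* Hypothetical byte offsets of the leading BOOTP header fields. */
enum { OFF_BP_OP = 0, OFF_BP_HTYPE = 1, OFF_BP_HLEN = 2 };

/* Hypothetical accessor: a single one-byte load, no alignment implied. */
#define SKETCH_EXTRACT_8BITS(p) (*(const uint8_t *)(p))

/*
 * Returns nonzero if the captured bytes start a BOOTP request with an
 * Ethernet hardware type (1) and a 6-byte hardware address length.
 */
int bootp_is_ether_request(const uint8_t *cp, size_t caplen)
{
    if (caplen < 3)
        return 0;   /* not enough captured data to look at the fields */

    return SKETCH_EXTRACT_8BITS(cp + OFF_BP_HTYPE) == 1 &&
           SKETCH_EXTRACT_8BITS(cp + OFF_BP_HLEN) == 6 &&
           SKETCH_EXTRACT_8BITS(cp + OFF_BP_OP) == BOOTPREQUEST;
}
```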

(bp is probably "optimized out" because it has the same address in it that cp does; if this is SPARC, it probably just uses the register in which cp was passed.)

@guyharris

Submitted by guy_harris

ld [%i0], %g2
sethi %hi(16843776), %g1
and %g2, -256, %g2
or %g1, 512, %g1
cmp %g2, %g1

Ha, ha. That's exactly what it's doing....
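Decoding the constants in the listing above shows why it matches the prediction. On SPARC, sethi %hi(x) sets the upper 22 bits of the register from x and clears the low 10, and the following or fills in the remaining bits. A small sketch of that arithmetic (the function names are mine):

```c
#include <stdint.h>

/* What "sethi %hi(x), reg" leaves in reg: x with its low 10 bits cleared. */
uint32_t sparc_hi(uint32_t x)
{
    return x & ~UINT32_C(0x3FF);
}

/* sethi %hi(16843776), %g1 ; or %g1, 512, %g1 */
uint32_t compare_constant(void)
{
    return sparc_hi(UINT32_C(16843776)) | UINT32_C(512);
}

/* and %g2, -256, %g2 : -256 as a 32-bit pattern */
uint32_t mask_constant(void)
{
    return (uint32_t)-256;
}
```

compare_constant() evaluates to 0x01010600 and mask_constant() to 0xFFFFFF00, so the sequence is a single 32-bit ld from the pointer in %i0, a mask, and one compare.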

@guyharris

Submitted by itillman

It is indeed Solaris on SPARC.

@guyharris

Submitted by guy_harris

I've checked a change into the trunk and 4.1 branch that prevents GCC 4.2.4, at least, from doing that optimization on "struct bootp". It prevents the crash, at least on the machine on which I tested it.
