Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"stack smashing" segmentation fault with znc-git on armv7h #1459

Closed
aphirst opened this issue Nov 21, 2017 · 31 comments
Closed

"stack smashing" segmentation fault with znc-git on armv7h #1459

aphirst opened this issue Nov 21, 2017 · 31 comments

Comments

@aphirst
Copy link

aphirst commented Nov 21, 2017

I'm running znc-git from the AUR on my ODROID-XU4, running Arch Linux ARM, after adding armv7h to the architecture field. Things were fine until I decided to update/rebuild yesterday.

After between 15m-1h of being online, the service suddenly stops.

[alarm@alarm ~]$ znc -v
ZNC 1.7.x-git-836-4a3f62f1 - https://znc.in
IPv6: yes, SSL: yes, DNS: threads, charset: yes, i18n: no, build: autoconf
[alarm@alarm ~]$ systemctl status znc
● znc.service - ZNC, an advanced IRC bouncer
   Loaded: loaded (/usr/lib/systemd/system/znc.service; enabled; vendor preset: disabled)
   Active: failed (Result: core-dump) since Mon 2017-11-20 22:30:02 GMT; 12h ago
  Process: 12211 ExecStart=/usr/bin/znc -f (code=dumped, signal=ABRT)
 Main PID: 12211 (code=dumped, signal=ABRT)

Nov 20 21:50:18 alarm znc[12211]: Adding server [irc.rizon.net +6697]...
Nov 20 21:50:18 alarm znc[12211]: Loading user module [chansaver]...
Nov 20 21:50:18 alarm znc[12211]: Loading user module [controlpanel]...
Nov 20 21:50:18 alarm znc[12211]: Loading user module [log]...
Nov 20 21:50:18 alarm znc[12211]: Loading user module [cert]...
Nov 20 21:50:18 alarm znc[12211]: Staying open for debugging [pid: 12211]
Nov 20 21:50:18 alarm znc[12211]: ZNC 1.7.x-git-836-4a3f62f1 - https://znc.in
Nov 20 22:30:02 alarm znc[12211]: *** stack smashing detected ***: <unknown> terminated
Nov 20 22:30:02 alarm systemd[1]: znc.service: Main process exited, code=dumped, status=6/ABRT
Nov 20 22:30:02 alarm systemd[1]: znc.service: Failed with result 'core-dump'.

I would report this in IRC, but none of my clients are set up for direct connection, only through my ZNC session.

If there's any additional info which would be helpful, please just ask; I'll be more than happy to generate/provide it.

@aphirst
Copy link
Author

aphirst commented Nov 21, 2017

It seems that this issue occurs even on stable ZNC 1.6.5. I am now inclined to suspect not ZNC, but an external module I use, znc-clientbuffer.

I will rebuild that package with debug symbols, report the issue there too, and see how this develops.

@psychon
Copy link
Member

psychon commented Nov 21, 2017

Sorry, but I don't think that coredump is of much use. Coredumps are specific to the binary in use, so in this case this depends on your /usr/bin/znc.
Do you have coredumpctl? If so, could you try https://wiki.archlinux.org/index.php/Core_dump#Examining_a_core_dump.
If not (and you do have gdb), then gdb /usr/bin/znc YOUR_CORE_FILE and running bt and/or bt full would be useful.

Also, don't upload your coredumps to the internet. For example, I am fairly sure that your /etc/hosts looks like this:

# /etc/hosts: static lookup table for host names
#<ip-address>	<hostname.domain.org>	<hostname>
127.0.0.1	localhost.localdomain	localhost
::1		localhost.localdomain	localhost
192.168.0.2	alarm			alarm
192.168.0.2	odroid-xu4		odroid-xu4
192.168.0.2	aphirst.ddns.net	aphirst
192.168.0.3	odroid-c2		odroid-c2
192.168.0.4	libreelec		libreelec
192.168.0.4	rpi3			rpi3
192.168.0.5	shimmer			shimmer
192.168.0.6	rakka			rakka
192.168.0.7	pino			pino
192.168.0.10	alarmpi			alarmpi
192.168.0.10	rpi2			rpi2
# End of file

I didn't find anything that looks like a password in there, but it might be possible to find passwords. And SSL private keys. And other secrets.

Edit: And you are using systemd's resolved, so ZNC ended up connecting to DBus.
Edit: You have a non-working IRC server in your config.

freedesktop.resolve1.DnsError.NXDOMAIN
shsentry.ponychat.net' not found

Edit: Okay, your password for rizonirpg is in there and also something that looks like a secret to enter some special channel.

@aphirst
Copy link
Author

aphirst commented Nov 21, 2017

Perhaps that was short sighted. I've taken it down. Thanks for the injection of common sense.

I'll have a look at this again later on.

@aphirst
Copy link
Author

aphirst commented Nov 21, 2017

What I've had chance to test so far:

  1. I re-enabled my old znc setup on my rPi2 (also running ALARM), and observed that it too experienced the same "stack smash" segfault.
  2. I uninstalled the clientbuffer add-on altogether, and (after the requisite changes in znc.conf) experienced the same behaviour on both machines.

I've closed the ticket over at znc-clientbuffer. I'll report back here once again as soon as I have the time to do some more thorough debugging. I'll also see if I can get in touch with any other ALARM users - the issue may be isolated to there (quirks in memory management for some core system package, or just badly built in some way etc.).

@0KiB
Copy link

0KiB commented Nov 24, 2017

Iam also experiencing this same error with raspberry 2, os: Arch Linux ARM
After i did full system update. Heres pacman.log of updated packages: https://pastebin.com/7CdNc9Ny

Fault code:
*** stack smashing detected ***: terminated
Aborted (core dumped)

@aphirst
Copy link
Author

aphirst commented Nov 24, 2017

I think we have grounds now to raise this with the Archlinux ARM maintainers. https://archlinuxarm.org/forum/viewtopic.php?f=15&t=12288

@psychon
Copy link
Member

psychon commented Nov 25, 2017

Okay, so on that list of upgrades: The znc upgrade has the commit message "icu 60.1 rebuild", so does not change any code. The following could be related: openssl, icu, (libidn2?)
So if you want/can, you could try with older versions of that.

Now on something more productive:
To get a backtrace from a core file:

  • gdb /usr/bin/znc your-core-file
  • bt full
    To run znc under gdb and get a backtrace once it crashes/aborts:
  • start screen or tmux if needed (if you have no idea what this is: you don't)
  • gdb /usr/bin/znc
  • run --foreground --debug
  • Wait for znc to abort. The stack smashing detected will still be printed when this happens.
  • The last few lines of output could be helpful, so write them down
  • bt full gets you a backtrace.

@aphirst
Copy link
Author

aphirst commented Nov 25, 2017

Huh, I'm surprised you got to see a commit message for that anywhere. On the archlinuxarm website some packages, ZNC included, just display an error instead of the PKGBUILD or changelog. I figure they must also have a git repo I'm not aware of.

@psychon
Copy link
Member

psychon commented Nov 26, 2017

The upgrade is:

[2017-11-22 18:43] [ALPM] upgraded znc (1.6.5-6 -> 1.6.5-7)

and this is just what https://git.archlinux.org/svntogit/community.git/commit/trunk?h=packages/znc&id=0a4fcc82a47522079462178a168b1f7161af3326 does. I didn't even know that archlinuxarm had its own website, but I would guess it does not deviate much from archlinux PKGBUILDs, no?

@Bonstra
Copy link

Bonstra commented Nov 26, 2017

I'm facing the same issue on ArchLinuxArm running on an ARMv5 machine.
I can confirm it is related to the recent icu upgrade to 60.1.
ZNC appears to crash while translating charsets for the result of a WHO command, for some users whose real name contains non-ASCII characters.
Here is the stack trace I could get:
https://gist.github.com/Bonstra/f38a184d0af66ee71d0dc9f8936db65c

I rebuilt znc with the --disable-charset option, and could not reproduce the bug after that.

@psychon
Copy link
Member

psychon commented Nov 27, 2017

So, apparently it is ucnv_convertEx from icu that causes the stack smash. I stared a bit at icoConv() and the API docs for ucnv_convertEx and could not spot anything wrong. Also, the list of changes for icu do not suggest anything interesting here. Oh and: why does this only seem to affect ARM? I took a quick look at https://wiki.debian.org/ArchitectureSpecificsMemo and the difference between amd64 and arm64 mostly seems to be signed/unsigned char. Hm... :-(

@aphirst
Copy link
Author

aphirst commented Nov 27, 2017

I can confirm @Bonstra's findings, and after adding --disable-charset to the build options, ZNC survives overnight, which it hasn't been able to do since the affected upgrades.

I've tried to keep some of the ALARM maintainers up-to-date on IRC but I suspect this might take some communication between them and the znc devs. I'll update the ALARM forum thread later today.

@AcouBass
Copy link

I don't have much to add aside from a 'me too' - running ALARM on a Pi 3.

Does the --disable-charset build option affect functionality at all? I'm happy to use it as a workaround until we get a proper fix!

@psychon
Copy link
Member

psychon commented Dec 1, 2017

Does the --disable-charset build option affect functionality at all?

Well, all charset conversion is disabled. So if you configured charsets in ZNC, they will no longer be applied.
(ZNC internally assumes UTF-8, or at least so I have been told. If you have e.g. channel names in Latin1, they will not be displayed properly on the web interface. The same applies to user names, nicks etc. Oh and I guess modpython with Python3 will get unhappy about invalid UTF8, if it occurs. Dunno what else.)

@AntiCompositeNumber
Copy link

I'm me too-ing this as well, ALARM on a Pi 2. I decided to downgrade ICU and znc by one release, instead of the build flag. I've got a module that requires modpython, which fails the build --disable-charset. Standard Arch warnings about partial upgrades apply, ICU did receive a soname bump this upgrade.

@srl295
Copy link

srl295 commented Dec 6, 2017

@psychon came here from http://bugs.icu-project.org/trac/ticket/13490#comment:1 — there's not a lot of detail here. What are the two converters being used? ucnv_convertEx converts from codepage to codepage via UTF-16 as a pivot. Can you extract the actual data values being passed? You can call ucnv_getName() on the in/out converters to get their canonical names.

@psychon
Copy link
Member

psychon commented Dec 6, 2017

@srl295 I don't really know details either and I am still hoping for more myself. Most of what I know is from https://gist.github.com/Bonstra/f38a184d0af66ee71d0dc9f8936db65c which is a gdb bt full. Frame 2 has the input data for ICU in sBuff and there is only one non-ASCII character in there (but I don't know its encoding...? Since this is copy&paste from a terminal running gdb in a likely not-too-ancient Linux, I guess this data is UTF8-encoded?). Hm... these two lines confuse me (oh and the line numbers in GDB's output don't match that source file, yay...). Hm... Gdb says that icuConv was called with the last two arguments being the same pointer, so it must be the first call that causes the problem here, the one with m_cnvIntStrict as both arguments and that is initialized as... being an UTF-8 converter with UCNV_FROM_U_CALLBACK_STOP and UCNV_TO_U_CALLBACK_STOP.

Wait, what? So, if I understand the above correctly, the code is trying to convert from UTF-8 to UTF-8 in strict mode when the stack overflow occurs.

To summarise:

  • this function
  • When called with the input string (first argument, CS_STRING is basically just std::string) :bruford.mozilla.org 352 Bonstra #rust ahf anubis.0x90.dk belew.mozilla.org ahf H :0 Alexander Færøy\r\n encoded in UTF-8
  • and cnv_in = cnv_out and set up like this
  • might, possible, depending on the phase of the moon and ARM, cause a stack overflow

@Bonstra What are your ZNC's encoding settings? Specifically for the Mozilla IRC server where someone is in #rust.

However, all of the above can very well be wrong. The gdb backtrace says

        indata = 0x8be1e0 ""
        indataend = 0xbefff374 ""

The difference between these two pointers is huge, but is supposed to be beginning and end of the input string to be converted...?? (well, &indata is given to ucnv_convertEx, so could be to this weird value somewhere in there, but seems unlikely; more likely could be that the above is part of the stack overflow.)
Uhm, indataend seems to point into the stack. It's the same value as outdataend?!? Staring at this gdb backtrace just seems to raise more questions...

And another random comment: The only interesting thing that I can see in https://wiki.debian.org/ArchitectureSpecificsMemo is that arm has unsigned char while amd64 has signed char. That could be part of why this is only seen on arm...?

You can call ucnv_getName() on the in/out converters to get their canonical names.

I cannot reproduce the issue. I just tried to get the people who can reproduce this to provide some more information and eventually gave mostly up.

Edit: Another random bit of information that I get from the bt full output: The buffer contents at the end of the function indicate that ICU first converted the input up to the æ and then the remaining r\r\n\0 (since this last part is at the beginning of the buffer and thus was likely produces by the last call to ICU, while beyond that there is "older" data; note that the printent contents of the buffer is shorter than 100 byte :-/ )

@srl295
Copy link

srl295 commented Dec 7, 2017

@psychon can you pull out the call site, the source code that calls ICU at this point?

Hm, doing a utf-8 to utf-8 conversion? Might be better to just skip conversion there.

As of ICU 60, UTF-8 error handling follows the w3c spec. This might result in longer error strings on output, but only in the presence of malformed input. A crash as a result would be most likely from a latent bug in calling ICU.

@srl295
Copy link

srl295 commented Dec 7, 2017

is the code here? https://github.com/jimloco/Csocket/blob/master/Csocket.cc I haven't evaluated the loop there yet, but I think it's doing a round about way of trying to see if something is valid UTF-8.

@DarthGandalf
Copy link
Member

DarthGandalf commented Dec 7, 2017

Yes, that's the code. https://github.com/jimloco/Csocket/blob/054857fe8304277ad6ab1182d854ad6ee4f24a65/Csocket.cc#L2498 is the relevant line.

There is a setting which says "receive as UTF-8 and as ___, send as ___", or "receive as UTF-8 and as ___, send as UTF-8", or "receive and send in ___ only".
Sending part doesn't matter in this case, because the offending string is received from server. It tries to parse the received string as UTF-8, and if it fails, falls back to a different encoding. That different encoding can also be UTF-8, in which case it'll fail again.

edit: I missed the "send as UTF-8" option, duplicating the other one twice

@DarthGandalf
Copy link
Member

m_cnvInt and m_cnvIntStrict stand for "internal", because internally ZNC assumes everything is UTF-8, and all conversion happens on the edge.

@psychon
Copy link
Member

psychon commented Dec 7, 2017

@srl295 My (longish) comment above has a link to the function and a description of what its parameters are in the list starting with "to summarize"

@srl295
Copy link

srl295 commented Dec 7, 2017

maybe jimloco/CSocket should be involved in this discussion? since the crash is there. when i get a round tuit I will review the conversion loop there to see if i see any red flags.

the CSocket loop (the call to ICU) seems buggy on first glance. It doesn't detect the case where the pivotbuf has run out of space. It seems like it might get stuck in a loop continuously appending data.

@DarthGandalf
Copy link
Member

Maybe he should. But it was me who added encodings support to Csocket. @jimloco

It doesn't detect the case where the pivotbuf has run out of space

In what cases 100 bytes can be not enough for the pivot?
AFAIR, I wrote it this way to make sure both cases are triggered often: the whole string fits, and single call to ucnv_convertEx is enough; the input string is bigger than the buffer, so that ucnv_convertEx is called several times.

@Bonstra
Copy link

Bonstra commented Dec 7, 2017

@psychon

What are your ZNC's encoding settings? Specifically for the Mozilla IRC server where someone is in #rust.

I have ClientEncoding = ^ISO-8859-15 set for the user and Encoding = ^ISO-8859-15 set for the network. This are the settings I used when the backtrace I posted was generated; however, I made a minimal test setup with Encoding = ^UTF-8 and ClientEncoding left unset, and it crashes just the same.

I'm making network captures in order to inject the data received just before the crash into an ICU sample program to see if it is enough to trigger the bug.

@srl295
Copy link

srl295 commented Dec 7, 2017

@DarthGandalf my err, sorry, pivotbuf is managed by the call. this ought to be OK.

Still, is the idea (in the utf-8/utf-8 case) to validate whether it is valid UTF-8? that can be done without as many moving parts

@psychon
Copy link
Member

psychon commented Dec 7, 2017

I have ClientEncoding = ^ISO-8859-15 set for the user and Encoding = ^ISO-8859-15 set for the network.

Okay, so this function is called with ^ISO-8859-15 to set up all the involved converters. That matches the call of icoConv with both converters being the same (only done since there is a leading ^).

Do the other people who can reproduce this have similar settings? Does it also happen without the ^? (and if so, a bt full from gdb for such a case might be useful, too)

@Bonstra
Copy link

Bonstra commented Dec 8, 2017

It appears to be a bug in ICU, where multibyte sequences may be written beyond the end of the output buffer, erasing part of the stack canary.
I filled info about the bug upstream: https://ssl.icu-project.org/trac/ticket/13490

@psychon
Copy link
Member

psychon commented Dec 8, 2017

Thanks a tons Bonstra!

I applied the following patch to your poc.cpp to make it compile without warnings:

--- poc.cpp.orig	2017-12-08 09:18:27.039252381 +0100
+++ poc.cpp	2017-12-08 09:19:16.010615391 +0100
@@ -11,7 +11,7 @@
 #include "unicode/uloc.h"
 #include "unicode/unistr.h"
 
-int main(int argc, char *argv[])
+int main()
 {
   const uint8_t bytes[110] = {
   /* First 100 bytes */
@@ -33,7 +33,6 @@ int main(int argc, char *argv[])
   UChar pivot[100];
   char target[104];
   UErrorCode status = U_ZERO_ERROR;
-  int32_t len;
   UConverter *src_cnv = ucnv_open("utf-8", &status);
   assert(status == U_ZERO_ERROR);
   UConverter *tgt_cnv = ucnv_open("utf-8", &status);
@@ -54,7 +53,7 @@ int main(int argc, char *argv[])
   ucnv_convertEx(tgt_cnv, src_cnv, &outdata, &target[100], &indata, &source[110], &pivot[0], &pivotsrc, &pivottgt, pivotlimit, true, true, &status);
   assert(status == U_BUFFER_OVERFLOW_ERROR);
 
-  if (target[100] == 0xba && target[101] == 0xd && target[102] == 0xca && target[103] == 0xfe) {
+  if (target[100] == '\xba' && target[101] == '\x0d' && target[102] == '\xca' && target[103] == '\xfe') {
     printf("Ok! No overflow in target buffer.\n");
   } else {
     printf("Buffer overflow detected:\nExpected ba d ca fe\nGot %x %x %x %x\n",

Building and running the result works fine:

$ gcc -Wall -Wextra poc.cpp -licuuc && ./a.out 
Ok! No overflow in target buffer.

I'm on Debian testing, so the above was with libicu-dev 57.1-8. I then installed libicu-dev 60.1-1 from experimental and the result changed:

$ gcc -Wall -Wextra poc.cpp -licuuc && ./a.out
Buffer overflow detected:
Expected ba d ca fe
Got bf bd ca fe

Indeed, the three byte sequence 0xef 0xbf 0xbd ended up beyond the output's buffer's end.

Since this pretty much confirms that the issue is in ICU, I am closing this bug.

@psychon psychon closed this as completed Dec 8, 2017
@DarthGandalf
Copy link
Member

Even though the bug is in ICU, perhaps we should stop attempting conversion from UTF-8 to UTF-8 this way.

@srl295
Copy link

srl295 commented Dec 8, 2017

for reasons unclear to me, there's a new ICU issue (with a commit against it) https://ssl.icu-project.org/trac/ticket/13510

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants