Encountering SIGSEGV after a successful request (statically linked, libpacparser, ppc64le) #17

mzpqnxow · 2021-09-23T14:38:54Z

Using the "feature" I submitted (static linking) in #16, I'm encountering a failure that seems to be associated with joining a thread after a request is complete. I'm entering this just for awareness and as a reminder for me to look further into it. I doubt you want to be supporting/debugging ppc64le and statically linked CNTLM, if you even have access to a ppc64le Linux machine :)

The request finishes successfully (the client gets the results) but then cntlm goes down with a SIGSEGV:

Connection                     => close
Content-Length                 => 93
Proxy-Authenticate             => NTLM
Sending headers (5)...
Body included. Length: 93
data_send: read 93 of 93 / 93 of 93 (errno = ok)
data_send: wrote 93 of 93
Body sent.
PROXY CLOSING CONNECTION
forward_request: palive=0, authok=0, ntlm=0, closed=1

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
Joined thread 70366707252944; rc: 0

In gdb I'm seeing the following. It appears to be a NULL pointer dereference, likely associated with the -1 return from proxy_thread():

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
[LWP 16317 exited]
Joined thread 70367536021200; rc: 0

Thread 1 "cntlm" received signal SIGSEGV, Segmentation fault.
0x0000000010002b08 in main ()
(gdb) 
(gdb) x/4i $pc
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
   0x10002b10 <main+10576>:	bl      0x1013408c <select+8>
   0x10002b14 <main+10580>:	nop
(gdb) i r vs0 r9
vs0            {uint128 = 0x00000000000000000000000000000000, v2_double = {0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v16_int8 = {0x0 <repeats 16 times>}}
r9             0x0	0
(gdb) bt
#0  0x0000000010002b08 in main ()
(gdb)

I'll look into this more as I get a chance. If you prefer, you can close the issue here and I can open it on my fork

mzpqnxow · 2021-09-23T14:49:02Z

For reference:

				ret = direct_request(thread_data, request);
#ifdef ENABLE_PACPARSER
			} else if (pacparser_initialized) {
				/* If PAC is available, use it to serve request. */
				ret = pac_forward_request(thread_data, request, pac_list);
			} else {
				/* Else use statically configured proxies. */
				ret = forward_request(thread_data, request, NULL);
			}
#else
			}
			else
				ret = forward_request(thread_data, request);
#endif

			if (debug)
				printf("proxy_thread: request rc = %p\n", (void *)ret);
#ifdef ENABLE_PACPARSER
		} while (ret != NULL && ret != (void *)-1 && ret != (void *)-2);
#else
		} while (ret != NULL && ret != (void *)-1);
#endif
		if (debug)
			printf("proxy_thread: request rc = %p\n", (void *)&request);
		free_rr_data(&request);
	/*
	 * If client asked for proxy keep-alive, loop unless the last server response
	 * requested (Proxy-)Connection: close.
	 */
#ifdef ENABLE_PACPARSER
	} while (keep_alive && ret != (void *)-1 && ret != (void *)-2 && !serialize);
#else
	} while (keep_alive && ret != (void *)-1 && !serialize);
#endif

	/*
	 * Add ourselves to the "threads to join" list.
	 */
	if (!serialize) {
		if (debug)
			printf("threads_mtx = %p\n", (void *)&threads_mtx);
		pthread_mutex_lock(&threads_mtx);
		pthread_t thread_id = pthread_self();
		threads_list = plist_add(threads_list, (unsigned long)thread_id, NULL);
		pthread_mutex_unlock(&threads_mtx);
	}

#ifdef ENABLE_PACPARSER
	plist_free(pac_list);
#endif
	free(thread_data);
	close(cd);

	return NULL;
}

Another important note: this was cross-compiled, so I'll need to try a native toolchain on a ppc64le host to see whether that is part of the cause.

mzpqnxow · 2021-09-23T15:17:11Z

It appears that this is associated with a linking mistake made by a human, specifically, me.

I would like to:

Reproduce this and see if a check can be added on whichever pointer is getting dereferenced. Maybe I'm missing something with my quick glance, or maybe one of those functions is a cpp macro with a dereference in it, but I don't see anything being dereferenced. If I can find it, I'll wrap a null check or assert around it.
Make it more difficult for someone to make the same mistake I did when building.
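
The sentinel handling in the excerpt above can be sketched in isolation. This is a minimal illustration, not cntlm code: should_stop() and guarded_copy() are hypothetical names. It shows that (void *)-1 is only ever compared, never dereferenced, and what a null check around a suspect pointer load might look like.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper mirroring the loop conditions in the excerpt:
 * NULL, (void *)-1 and (void *)-2 are sentinels that are only compared
 * against, so a -1 return by itself should not cause a fault.
 */
static int should_stop(const void *rc)
{
	return rc == NULL || rc == (void *)-1 || rc == (void *)-2;
}

/*
 * Hypothetical guard: copy two longs through src only if src is non-NULL.
 * Returns 0 on success and -1 if the source pointer is NULL.
 */
static int guarded_copy(long dst[2], const long *src)
{
	if (src == NULL)
		return -1;
	dst[0] = src[0];
	dst[1] = src[1];
	return 0;
}
```

A guard like this would turn the crash into a detectable error, but it would not explain why the pointer became NULL in the first place.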

jschwartzenberg · 2021-10-17T20:41:50Z

Maybe I misunderstand something, but what happens when you compile with debug symbols and type bt in GDB after it segfaults?

mzpqnxow · 2021-10-18T23:58:05Z

@jschwartzenberg I think your guess that you may be misunderstanding is correct. If you're reading this issue as "bug in libpacparser" then it's a misunderstanding- sorry for that. If you look at the backtrace above, it's dying in main()

Obviously it's theoretically possible for libpacparser to be at fault, but there's no indication of that. I included libpacparser in the title of the issue because libpacparser is statically linked into it, meaning only that it's not a typical build.

If you're just curious about the bug then I have some more details as I've traced it back a little bit. Also, I noticed that building with -fsanitize=undefined -fsanitize-undefined-trap-on-error seems to prevent (or at least hide/mitigate) the issue. Also, preparing the static library archive and relinking "fixed" it

Anyway, if you want to see what gdb thinks the faulting line of code is, here it is with -g:

NTLM-to-basic: Returning client auth request.
forward_request: palive=0, authok=0, ntlm=1, closed=0

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
[LWP 86868 exited]
Joined thread 70367536021200; rc: 0

Thread 1 "cntlm" received signal SIGSEGV, Segmentation fault.
main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) x/2i $pc
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
(gdb) bt
#0  main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
(gdb) i r vs0 r9
vs0            {uint128 = 0x00000000000000000000000000000000, v2_double = {0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
    0x0, 0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}}
r9             0x0	0
(gdb)

Somehow it's getting NULL for the address that holds the values used to reset the timeval structure, and faulting while resetting it on line 1873 of cntlm/main.c (at d546bfe):

tv.tv_sec = 1;

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) x/20i $pc-32
   0x10002ae8 <main+10536>:	bne     cr7,0x10002ac0 <main+10496>
   0x10002aec <main+10540>:	ld      r9,464(r1)     <-------- r1 + 464 holds {1, 0}, used to reset the timeval
   0x10002af0 <main+10544>:	li      r10,400
   0x10002af4 <main+10548>:	mr      r7,r14
   0x10002af8 <main+10552>:	li      r6,0
   0x10002afc <main+10556>:	li      r5,0
   0x10002b00 <main+10560>:	mr      r4,r19
   0x10002b04 <main+10564>:	li      r3,1024
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
   0x10002b10 <main+10576>:	bl      0x1013408c <select+8>
(gdb) i r r1
r1             0x3fffffffe860	70368744171616
(gdb) x/gx $r1+464
0x3fffffffea30:	0x0000000010148b20
(gdb) x/2wx *($r1+464)
0x10148b20:	0x00000001	0x00000000
(gdb) i r r9
r9             0x10148b20	269781792    <----- Successfully loaded &{0x1, 0x0} to r9
(gdb) x/2wx $r9
0x10148b20:	0x00000001	0x00000000       <----- Will be written to &tv.tv_sec in the stxvd2x

So it looks relatively straightforward: a pointer to {1, 0} in BSS is stored in memory, then there's a vectorized load and store to reset the timeval on each iteration...
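
For context, the source pattern behind that load/store pair is the usual one of resetting the timeout before every select() call, since select() is allowed to modify it. A minimal sketch, assuming a hypothetical helper idle_cycles() and a short microsecond timeout instead of cntlm's one-second value:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Hypothetical helper: run `cycles` idle select() calls, resetting the
 * timeout before each one. Returns how many calls timed out, i.e. how
 * many times select() returned 0.
 */
static int idle_cycles(int cycles, long usec)
{
	struct timeval tv;
	int timed_out = 0;

	assert(usec >= 0 && usec < 1000000);

	for (int i = 0; i < cycles; i++) {
		/* select() may modify tv, so it is reset on every iteration. */
		tv.tv_sec = 0;
		tv.tv_usec = usec;

		/* No descriptors are watched here; the call only sleeps. */
		if (select(0, NULL, NULL, NULL, &tv) == 0)
			timed_out++;
	}
	return timed_out;
}
```

Calling idle_cycles(3, 1000) runs three 1 ms waits. With a constant initializer such as {1, 0}, a compiler may fold the two field stores into one 16-byte copy from a constant, which would be consistent with the lxvd2x/stxvd2x pair, though that is an inference from the disassembly rather than something confirmed here.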

(gdb) commands          
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>x/gx $r9
>c
>end
(gdb) c
Continuing.

I let it continue, idle for a few select() cycles, printing out $r9 each time, to make sure it looks right...

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Now I make the request to try to trigger the SIGSEGV...

[New LWP 88769]

******* Round 1 C: 5 *******
Reading headers (5)...
HEAD: CONNECT www.google.com:443 HTTP/1.1

Thread 1 "cntlm" hit Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001
Parsed PAC Proxies:
   PROXY XXXXXXX 8080
Created PAC list with 1 item(s):
List data: 1 => 0x0x1020a540

~~~~~~~ (1/1) PAC PROXY XXXXX:8080 ~~~~~~~
Thread processing...
cntlm[88690]: Resolving proxy XXXXX
Resolve XXXX:
     10.x.x.x
so_connect: x.x.x.x : 8080 
Host                           => www.google.com:443
User-Agent                     => curl/7.47.0
Proxy-Connection               => Keep-Alive
NTLM-to-basic: Returning client auth request.
forward_request: palive=0, authok=0, ntlm=1, closed=0

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
[LWP 89021 exited]
Joined thread 70367536021200; rc: 0

The fault is coming, but it will break first...

Thread 1 "cntlm" hit Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) x/2i $pc
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
(gdb) i r r9
r9             0x0	0

So r9 ended up with NULL, causing the vectorized load through $r9 to SIGSEGV. It would normally hold a readable memory address, specifically a pointer to the BSS where the source timeval ({0x1, 0x0}) lives.

Continuing on, to let it crash...

(gdb) step

Thread 1 "cntlm" received signal SIGSEGV, Segmentation fault.
main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb)

I'm not going to dig into how/why this is happening, but if I were going to start, it would be by setting a watchpoint on $r1 + 464, which appears to be on the stack. It should hold a pointer to the BSS, but it's getting NULL. So I guess, technically, this is stack corruption... shrug

I should probably just close, I don't want to waste anyone else's time (or my own time!) any further :)

BTW- I think letting it run with the sanitizer flags might cause the select() loop to wait forever, which most people probably won't notice- but it's not ideal. I'm guessing the timeval gets stuck forever at {0, 0} as the reinitialization instruction(s) fail each time and are caught by the sanitizer (???)
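
On that guess: per POSIX, a timeval of {0, 0} makes select() poll and return immediately, while blocking indefinitely requires a NULL timeout pointer. So a timeval stuck at {0, 0} would more plausibly show up as a busy loop than as an endless wait. Whether that is what happens under the sanitizer flags was not verified here. A small sketch, using a hypothetical helper zero_timeout_elapsed_ms(), that times one such call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: time a single select() call with a zeroed timeout.
 * Returns the elapsed time in milliseconds, or -1 if select() did not
 * report a plain timeout (return value 0).
 */
static long zero_timeout_elapsed_ms(void)
{
	struct timespec start, end;
	struct timeval tv = {0, 0};

	clock_gettime(CLOCK_MONOTONIC, &start);
	int rc = select(0, NULL, NULL, NULL, &tv);
	clock_gettime(CLOCK_MONOTONIC, &end);

	if (rc != 0)
		return -1;

	return (long)(end.tv_sec - start.tv_sec) * 1000L
	     + (end.tv_nsec - start.tv_nsec) / 1000000L;
}
```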

mzpqnxow · 2021-10-18T23:58:44Z

I'm going to close this before it distracts or confuses anyone else :)

The issue goes away when the static library archive is produced properly anyway

mzpqnxow closed this as completed Oct 18, 2021
