Encountering SIGSEGV after successful request (statically linked, libpacparser, ppc64le) #17

Closed
mzpqnxow opened this issue Sep 23, 2021 · 5 comments

@mzpqnxow
Contributor

Using the "feature" I submitted in #16 (static linking), I'm encountering a failure that seems to be associated with joining a thread after a request completes. I'm filing this for awareness and as a reminder to myself to look into it further. I doubt you want to be supporting or debugging a statically linked CNTLM on ppc64le, if you even have access to a ppc64le Linux machine :)

The request finishes successfully (the client gets the results) but then cntlm goes down with a SIGSEGV:

Connection                     => close
Content-Length                 => 93
Proxy-Authenticate             => NTLM
Sending headers (5)...
Body included. Length: 93
data_send: read 93 of 93 / 93 of 93 (errno = ok)
data_send: wrote 93 of 93
Body sent.
PROXY CLOSING CONNECTION
forward_request: palive=0, authok=0, ntlm=0, closed=1

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
Joined thread 70366707252944; rc: 0

In gdb I see the following. It appears to be a NULL pointer dereference, likely associated with the -1 request rc reported by proxy_thread():

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
[LWP 16317 exited]
Joined thread 70367536021200; rc: 0

Thread 1 "cntlm" received signal SIGSEGV, Segmentation fault.
0x0000000010002b08 in main ()
(gdb) 
(gdb) x/4i $pc
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
   0x10002b10 <main+10576>:	bl      0x1013408c <select+8>
   0x10002b14 <main+10580>:	nop
(gdb) i r vs0 r9
vs0            {uint128 = 0x00000000000000000000000000000000, v2_double = {0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v16_int8 = {0x0 <repeats 16 times>}}
r9             0x0	0
(gdb) bt
#0  0x0000000010002b08 in main ()
(gdb) 

I'll look into this more as I get a chance. If you prefer, you can close the issue here and I can open it on my fork

@mzpqnxow
Contributor Author

mzpqnxow commented Sep 23, 2021

For reference:

				ret = direct_request(thread_data, request);
#ifdef ENABLE_PACPARSER
			} else if (pacparser_initialized) {
				/* If PAC is available, use it to serve request. */
				ret = pac_forward_request(thread_data, request, pac_list);
			} else {
				/* Else use statically configured proxies. */
				ret = forward_request(thread_data, request, NULL);
			}
#else
			}
			else
				ret = forward_request(thread_data, request);
#endif

			if (debug)
				printf("proxy_thread: request rc = %p\n", (void *)ret);
#ifdef ENABLE_PACPARSER
		} while (ret != NULL && ret != (void *)-1 && ret != (void *)-2);
#else
		} while (ret != NULL && ret != (void *)-1);
#endif
		if (debug)
			printf("proxy_thread: request rc = %p\n", (void *)&request);
		free_rr_data(&request);
	/*
	 * If client asked for proxy keep-alive, loop unless the last server response
	 * requested (Proxy-)Connection: close.
	 */
#ifdef ENABLE_PACPARSER
	} while (keep_alive && ret != (void *)-1 && ret != (void *)-2 && !serialize);
#else
	} while (keep_alive && ret != (void *)-1 && !serialize);
#endif

	/*
	 * Add ourselves to the "threads to join" list.
	 */
	if (!serialize) {
		if (debug)
			printf("threads_mtx = %p\n", &threads_mtx);
		pthread_mutex_lock(&threads_mtx);
		pthread_t thread_id = pthread_self();
		threads_list = plist_add(threads_list, (unsigned long)thread_id, NULL);
		pthread_mutex_unlock(&threads_mtx);
	}

#ifdef ENABLE_PACPARSER
	plist_free(pac_list);
#endif
	free(thread_data);
	close(cd);

	return NULL;
}

Another important note: this was cross-compiled, so I'll need to try a native toolchain on a ppc64le host to see whether that is part of the cause.

@mzpqnxow
Contributor Author

It appears that this is associated with a linking mistake made by a human, specifically, me.

I would like to:

  1. Reproduce this and see if a check can be added on whichever pointer is getting dereferenced. Maybe I'm missing something in my quick glance, or maybe one of those functions is a cpp macro with a dereference in it, but I don't see anything being dereferenced. If I can find it, I'll wrap a NULL check or assert around it
  2. Make it more difficult for someone to make the same mistake I did when building

@jschwartzenberg
Collaborator

Maybe I misunderstand something, but what happens when you compile with debug symbols and type bt in GDB after it segfaults?

@mzpqnxow
Contributor Author

@jschwartzenberg I think your guess that you may be misunderstanding is correct. If you're reading this issue as "bug in libpacparser" then it's a misunderstanding, sorry for that. If you look at the backtrace above, it's dying in main()

Obviously it's theoretically possible for libpacparser to be at fault, but there's no indication of that. I included libpacparser in the title of the issue because it is statically linked into the binary, meaning only that this is not a typical build

If you're just curious about the bug then I have some more details as I've traced it back a little bit. Also, I noticed that building with -fsanitize=undefined -fsanitize-undefined-trap-on-error seems to prevent (or at least hide/mitigate) the issue. Also, preparing the static library archive and relinking "fixed" it

Anyway, if you want to see what gdb thinks the faulting line of code is, here it is with -g:

NTLM-to-basic: Returning client auth request.
forward_request: palive=0, authok=0, ntlm=1, closed=0

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
[LWP 86868 exited]
Joined thread 70367536021200; rc: 0

Thread 1 "cntlm" received signal SIGSEGV, Segmentation fault.
main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) x/2i $pc
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
(gdb) bt
#0  main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
(gdb) i r vs0 r9
vs0            {uint128 = 0x00000000000000000000000000000000, v2_double = {0x0, 0x0}, v4_float = {0x0, 0x0, 0x0, 0x0}, v4_int32 = {0x0, 0x0, 0x0, 0x0}, v8_int16 = {0x0, 0x0, 0x0, 0x0, 0x0, 
    0x0, 0x0, 0x0}, v16_int8 = {0x0 <repeats 16 times>}}
r9             0x0	0
(gdb) 

Somehow it's getting NULL for the address that holds the values used to reset the timeval structure, and faulting while resetting it on line 1873 of cntlm/main.c (at d546bfe), which reads: tv.tv_sec = 1;

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) x/20i $pc-32
   0x10002ae8 <main+10536>:	bne     cr7,0x10002ac0 <main+10496>
   0x10002aec <main+10540>:	ld      r9,464(r1)     <-------- r1 + 464 holds {1, 0}, used to reset the timeval
   0x10002af0 <main+10544>:	li      r10,400
   0x10002af4 <main+10548>:	mr      r7,r14
   0x10002af8 <main+10552>:	li      r6,0
   0x10002afc <main+10556>:	li      r5,0
   0x10002b00 <main+10560>:	mr      r4,r19
   0x10002b04 <main+10564>:	li      r3,1024
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
   0x10002b10 <main+10576>:	bl      0x1013408c <select+8>
(gdb) i r r1
r1             0x3fffffffe860	70368744171616
(gdb) x/gx $r1+464
0x3fffffffea30:	0x0000000010148b20
(gdb) x/2wx *($r1+464)
0x10148b20:	0x00000001	0x00000000
(gdb) i r r9
r9             0x10148b20	269781792    <----- Successfully loaded &{0x1, 0x0} to r9
(gdb) x/2wx $r9
0x10148b20:	0x00000001	0x00000000       <----- Will be written to &tv.tv_sec in the stxvd2x

So it looks relatively straightforward, a pointer to {1, 0} in BSS is stored in memory, then there's a vectorized load and store to reset the timeval on each iteration...

(gdb) commands          
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>x/gx $r9
>c
>end
(gdb) c
Continuing.

I let it continue, idle for a few select() cycles, printing out $r9 each time, to make sure it looks right...

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001

Now I make the request to try to trigger the SIGSEGV...

[New LWP 88769]

******* Round 1 C: 5 *******
Reading headers (5)...
HEAD: CONNECT www.google.com:443 HTTP/1.1

Thread 1 "cntlm" hit Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
0x10148b20:	0x0000000000000001
Parsed PAC Proxies:
   PROXY XXXXXXX 8080
Created PAC list with 1 item(s):
List data: 1 => 0x0x1020a540

~~~~~~~ (1/1) PAC PROXY XXXXX:8080 ~~~~~~~
Thread processing...
cntlm[88690]: Resolving proxy XXXXX
Resolve XXXX:
     10.x.x.x
so_connect: x.x.x.x : 8080 
Host                           => www.google.com:443
User-Agent                     => curl/7.47.0
Proxy-Connection               => Keep-Alive
NTLM-to-basic: Returning client auth request.
forward_request: palive=0, authok=0, ntlm=1, closed=0

Thread finished.
proxy_thread: request rc = 0xffffffffffffffff
[LWP 89021 exited]
Joined thread 70367536021200; rc: 0

The fault is coming, but it will break first...

Thread 1 "cntlm" hit Breakpoint 1, main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) x/2i $pc
=> 0x10002b08 <main+10568>:	lxvd2x  vs0,0,r9
   0x10002b0c <main+10572>:	stxvd2x vs0,r1,r10
(gdb) i r r9
r9             0x0	0

So r9 ended up NULL, causing the vectorized load through r9 to SIGSEGV. It would normally hold a readable memory address, specifically a pointer to the BSS where the source timeval ({0x1, 0x0}) lives

Continuing on, to let it crash...

(gdb) step

Thread 1 "cntlm" received signal SIGSEGV, Segmentation fault.
main (argc=<optimized out>, argv=<optimized out>) at main.c:1872
1872			tv.tv_sec = 1;
(gdb) 

I'm not going to dig into how or why this is happening, but if I were to start, it would be by setting a watchpoint on $r1 + 464, which appears to be on the stack. It should hold a pointer to the BSS, but it's getting NULL. So I guess, technically, this is stack corruption... shrug

I should probably just close, I don't want to waste anyone else's time (or my own time!) any further :)

BTW- I think letting it run with the sanitizer flags might cause the select() loop to wait forever, which most people probably won't notice- but it's not ideal. I'm guessing the timeval gets stuck forever at {0, 0} as the reinitialization instruction(s) fail each time and are caught by the sanitizer (???)

@mzpqnxow
Contributor Author

I'm going to close this before it distracts or confuses anyone else :)

The issue goes away when the static library archive is produced properly anyway
