Some questions about the code #10
Thank you for your questions.
Yes, during the binary rewriting phase, some system calls are hooked. However, the initially applied hook function (enter_syscall) simply invokes the kernel-space system call that the rewriting program wishes to perform, so the program can continue to run.
Regarding reference materials for libopcodes, I could not find a decent one; I worked out its usage from scattered online materials. We needed the NEW_DIS_ASM ifdef section because the API of the disassembler library changed as of version 2.39 (please refer to the Makefile), and the code has to differentiate between the new and old APIs.
The check for whether rax is 3n + 1 supports a technique that improves the efficiency of the code at virtual addresses from 0 to the maximum syscall number, described in https://github.com/yasukata/zpoline/tree/master/Documentation#reducing-nop-overhead-by-0xeb-0x6a-0x90-may-2023 .
As seen there, in case 2 (the address is n * 3 + 1), the code pushes 0x90 to the stack; this 0x90 is not necessary and we wish to discard it.
if (rax_register_value % 3 == 1) {
	rsp_register_value += 8;
}
Regarding rt_sigreturn, we omitted it from the C-based hook function implementation because it is a bit special and complicated to handle.
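As a minimal sketch of the 3n + 1 check described above: the helper name and its parameters are hypothetical and are not taken from the zpoline sources. It assumes that rax holds the trampoline address that was jumped to and that the extra push occupies one 8-byte stack slot.

```c
#include <stdint.h>

/*
 * Hypothetical helper (illustrative names, not from zpoline).
 * Assumption: rax is the address in the trampoline that was jumped to.
 * When that address is 3n + 1, the trampoline executed a push of 0x90,
 * leaving one unneeded 8-byte slot on the stack, which is dropped here.
 */
static uint64_t discard_pushed_0x90(uint64_t rax, uint64_t rsp)
{
	if (rax % 3 == 1)
		rsp += 8;
	return rsp;
}
```

For example, an entry at address 4 (3 * 1 + 1) would have its stack pointer moved up by 8, while entries at addresses 3 and 5 would be left unchanged.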
The reason why we differentiate the setting for the clone system call comes from the code below.
void ____asm_impl(void)
{
	/*
	 * enter_syscall triggers a kernel-space system call
	 */
	asm volatile (
	".globl enter_syscall \n\t"
	"enter_syscall: \n\t"
	"movq %rdi, %rax \n\t"
	"movq %rsi, %rdi \n\t"
	"movq %rdx, %rsi \n\t"
	"movq %rcx, %rdx \n\t"
	"movq %r8, %r10 \n\t"
	"movq %r9, %r8 \n\t"
	"movq 8(%rsp),%r9 \n\t"
	".globl syscall_addr \n\t"
	"syscall_addr: \n\t"
	"syscall \n\t"
	"ret \n\t"
	);
After the hook is applied, we trigger a system call through the code above (enter_syscall); this implementation gets back to the caller by ret. The point here is that, in my understanding, a new thread made by the clone system call will initially come to the instruction right after syscall.
/* push return address to the stack */
rsi -= sizeof(uint64_t);
*((uint64_t *) rsi) = retptr;
Thank you very much for your interest.
Thank you for your reply.
I have a few questions regarding the filtering of clone system calls. First, why not filter the fork system call?
Second, if the flags contain CLONE_VM, the child and the parent share the parent's memory, so I think there is a return address on the stack at this time. Finally, I would like to understand how the system call hook overhead is measured in Section 3.2 of the paper.
Thank you for your message.
I guess it may depend on how "directly" rt_sigreturn is called, but the following code, which triggers rt_sigreturn (syscall number 15), caused a segmentation fault in my environment (x86-64); so I think rt_sigreturn is executed and does something.
int main(void)
{
	asm volatile ("movq $15, %rax");
	asm volatile ("syscall");
	return 0;
}
But, essentially, it seems that rt_sigreturn is not meant to be called directly, according to the manual ( https://man7.org/linux/man-pages/man2/sigreturn.2.html ).
Yes, when I removed this addition to rsp, I found that a handler registered with the signal system call did not work properly.
Regarding clone/fork, please let me first summarize my understanding.
Yes, I also think there is a return address in the stack at this time, but that is only for the stack of the parent thread. As mentioned in 4(ii), even if the stack of the parent thread has a return address at the top of it, the stack of the child thread may not have it; this is why we manually put it on the stack of the child thread in syscall_hook.
In contrast, as mentioned in 4(i), the stack of a process newly created by fork has the return address at its top; therefore, we do not need the procedure that syscall_hook performs for clone + CLONE_VM to manually put the return address on the stack of the child thread. This is why we do not filter the fork system call.
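The difference between clone + CLONE_VM and fork could be sketched roughly as follows. This is only an illustration under stated assumptions: prepare_child_stack and SKETCH_CLONE_VM are hypothetical names, the flag value is copied from the Linux headers, and the check for a zero stack pointer is a defensive addition of this sketch rather than something described in this thread.

```c
#include <stdint.h>

/* Assumption: the value of CLONE_VM in the Linux headers, repeated here
 * so that the sketch does not depend on kernel headers. */
#define SKETCH_CLONE_VM 0x00000100UL

/*
 * Hypothetical helper: for clone with CLONE_VM, the child starts on its
 * own, fresh stack, so the address that ret should jump to is stored at
 * the top of that stack before the system call is issued. For fork-like
 * calls the child gets a copy of the parent's stack, which already holds
 * the return address, so nothing is done.
 * The child_sp != 0 check is an assumption of this sketch.
 */
static uint64_t prepare_child_stack(uint64_t flags, uint64_t child_sp,
				    uint64_t retptr)
{
	if ((flags & SKETCH_CLONE_VM) && child_sp != 0) {
		/* push return address to the stack */
		child_sp -= sizeof(uint64_t);
		*((uint64_t *) (uintptr_t) child_sp) = retptr;
	}
	return child_sp;
}
```

The returned value would be the stack pointer handed to clone, so that the ret following syscall in the child pops the stored address.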
To measure the system call overhead, I use the following program, which executes a loop a certain number of times (specified by -c). For this time, let's say we compile the following program and generate an executable file named a.out.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <getopt.h>
#include <assert.h>
extern pid_t do_getpid(void);
void __do_getpid(void)
{
	asm volatile (".globl do_getpid");
	asm volatile ("do_getpid:");
	/* 39 is the getpid system call number on x86-64 */
	asm volatile ("movq $39, %rax");
	asm volatile ("syscall");
	asm volatile ("ret");
}
int main(int argc, char* const* argv)
{
	int ch;
	unsigned long loopcnt = 0;
	while ((ch = getopt(argc, argv, "c:")) != -1) {
		switch (ch) {
		case 'c':
			loopcnt = atol(optarg);
			break;
		default:
			printf("unknown option\n");
			exit(1);
		}
	}
	if (!loopcnt) {
		printf("please specify loop count by -c\n");
		exit(0);
	}
	{
		pid_t my_pid = getpid();
		{
			unsigned long t;
			{
				/* timestamp right before the loop */
				struct timespec ts;
				clock_gettime(CLOCK_REALTIME, &ts);
				t = ts.tv_sec * 1000000000UL + ts.tv_nsec;
			}
			{
				unsigned long i;
				for (i = 0; i < loopcnt; i++)
					assert(my_pid == do_getpid());
			}
			{
				/* timestamp right after the loop */
				struct timespec ts;
				clock_gettime(CLOCK_REALTIME, &ts);
				t = ts.tv_sec * 1000000000UL + ts.tv_nsec - t;
			}
			printf("average %lu nsec\n", t / loopcnt);
		}
	}
	return 0;
}
For the hook-applied case, to avoid executing the kernel-space getpid system call, I use the following hook program, which always returns a dummy value (10000 this time) for a getpid system call rather than entering the kernel.
#include <stdio.h>
#include <syscall.h>
typedef long (*syscall_fn_t)(long, long, long, long, long, long, long);
static syscall_fn_t next_sys_call = NULL;
static long hook_function(long a1, long a2, long a3,
			  long a4, long a5, long a6,
			  long a7)
{
	/* a1 is the system call number */
	if (a1 == __NR_getpid)
		return 10000;
	else
		return next_sys_call(a1, a2, a3, a4, a5, a6, a7);
}
int __hook_init(long placeholder __attribute__((unused)),
		void *sys_call_hook_ptr)
{
	/* keep the previous hook and install hook_function in its place */
	next_sys_call = *((syscall_fn_t *) sys_call_hook_ptr);
	*((syscall_fn_t *) sys_call_hook_ptr) = hook_function;
	return 0;
}
To try this, please replace the content of
The following command will execute
No, we do not include the hook setup time in the system call hook overhead. Thank you very much for your questions.
Thanks for your detailed reply. I will use the information you provided to reproduce the results. Finally, bro, this is really a great study.
@yasukata Hello, I saw that you used ptrace to hijack system calls in your paper, and I wonder how this is done. Is it through PTRACE_SYSEMU or PTRACE_SYSCALL? Can you share your code? My understanding is that all system calls are intercepted, that other calls are executed normally, and that only the getpid system call is redirected into asm_syscall_hook.
Thank you for your message. The following program could be used for the getpid test; it leverages PTRACE_SYSCALL rather than PTRACE_SYSEMU. In the following program, to selectively change the behavior of a specific system call:
#include <stdio.h>
#include <stddef.h>
#include <stdbool.h>
#include <unistd.h>
#include <assert.h>
#include <wait.h>
#include <syscall.h>
#include <sys/user.h>
#include <sys/ptrace.h>
int main(int argc, char* const* argv)
{
	pid_t pid;
	assert(argc > 1);
	pid = fork();
	assert(pid >= 0);
	if (pid == 0) {
		/* child: ask to be traced, then run the target program */
		assert(!ptrace(PTRACE_TRACEME, 0L, 0L, 0L));
		execvp(argv[1], &argv[1]);
	} else {
		int status;
		pid = wait(&status);
		assert(!ptrace(PTRACE_SETOPTIONS, pid, 0, PTRACE_O_EXITKILL));
		assert(!ptrace(PTRACE_SYSCALL, pid, 0, 0));
		while (1) {
			bool skipped = false;
			struct user_regs_struct regs;
			/* stop at system call entry */
			pid = wait(&status);
			if (WIFEXITED(status))
				break;
			assert(!ptrace(PTRACE_GETREGS, pid, 0, &regs));
			if (regs.orig_rax == __NR_getpid) {
				assert(!ptrace(PTRACE_POKEUSER, pid,
					       offsetof(struct user_regs_struct, orig_rax),
					       __NR_getpid));
				skipped = true;
			}
			assert(!ptrace(PTRACE_SYSCALL, pid, 0, 0));
			/* stop at system call exit */
			pid = wait(&status);
			if (WIFEXITED(status))
				break;
			if (skipped) {
				/* overwrite the return value with the dummy value */
				regs.rax = 10000;
				assert(!ptrace(PTRACE_SETREGS, pid, 0, &regs));
			}
			assert(!ptrace(PTRACE_SYSCALL, pid, 0, 0));
		}
	}
	return 0;
}
Let's say the code above is saved in a file named
The following executes the test, where a.out is the program attached in the previous post; it executes getpid a certain number of times specified by -c.
Thank you for your question.
Thank you for your detailed reply
Regarding the second question, after debugging, I found that the rip values at system call entry and exit are the same, so the idea of adjusting rip that I mentioned is not feasible.
Yes, I think this is correct; even if we do not have The reason why I put
I also think it would be nice if we could avoid getpid just for canceling the originally requested system call, but I could not find other easy options. I think the discussion in the paper at https://www.usenix.org/conference/atc22/presentation/jansen (Appendix B.2, Reducing Per-syscall ptrace Stops) provides good insight. Thank you for your message.
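Because rip is the same at the entry and exit stops, a PTRACE_SYSCALL-based tracer generally has to track which of the two stops it is at by itself. The sketch below shows one way to do that bookkeeping. The helper names are hypothetical, and the use of PTRACE_O_TRACESYSGOOD is an assumption of this sketch; the program posted earlier in this thread sets only PTRACE_O_EXITKILL and relies on strict alternation of wait calls.

```c
#include <signal.h>
#include <stdbool.h>
#include <sys/wait.h>

/*
 * Hypothetical helper. Assumption: the tracer has set
 * PTRACE_O_TRACESYSGOOD, so syscall stops are reported as
 * SIGTRAP | 0x80 and can be told apart from other SIGTRAP stops.
 */
static bool is_syscall_stop(int status)
{
	return WIFSTOPPED(status) && WSTOPSIG(status) == (SIGTRAP | 0x80);
}

/* Hypothetical tracker for the alternation of entry and exit stops. */
struct syscall_phase {
	bool in_syscall;
};

/* Call once per syscall stop: returns true at entry, false at exit. */
static bool next_stop_is_entry(struct syscall_phase *p)
{
	p->in_syscall = !p->in_syscall;
	return p->in_syscall;
}
```

A tracer loop would call is_syscall_stop on each status returned by wait and, when it is true, call next_stop_is_entry to decide whether to inspect the arguments (entry) or rewrite the return value (exit).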
Thanks for your reply.
Thank you very much for the series of questions. I will close this issue, but please feel free to reopen it or open another one if you have further comments or questions.
This is great work. I have some trouble reading the code in main.c; can you help me with the following:
Thanks for your help!!