# System Call

![](resources/systemcall01.png)

## system call stack frame

![](resources/systemcall03.png)
![](resources/systemcall04.png)

1. Normal C function call stack frame
  
![](resources/systemcall02.png)

2. A system call copies all of its parameters from the registers onto the kernel stack, forming a normal C function call stack frame.

   The real system call function then runs just like a normal C function.

3. The `SAVE_ALL` macro pushes the registers onto the kernel stack to build that frame:
   ```c
   #define SAVE_ALL \
	cld; \
	pushl %es; \
	pushl %ds; \
	pushl %eax; \
	pushl %ebp; \
	pushl %edi; \
	pushl %esi; \
	pushl %edx; \
	pushl %ecx; \
	pushl %ebx; \
	movl $(__USER_DS), %edx; \
	movl %edx, %ds; \
	movl %edx, %es;
   ```
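
The push order in `SAVE_ALL` can be checked with a small user-space simulation. This is only a sketch: `saved_regs`, `sim_stack`, and the `sim_*` helpers are hypothetical names, and the register-to-argument mapping in the comments assumes the i386 `int 0x80` convention (system call number in `%eax`, arguments in `%ebx`, `%ecx`, `%edx`, `%esi`, `%edi`, `%ebp`). Because `%ebx` is pushed last, it ends up at the lowest address, which is where a C function that takes its arguments on the stack expects its first parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the frame SAVE_ALL leaves on the kernel stack.
 * Fields run from the lowest address (last push) to the highest (first push). */
struct saved_regs {
	uint32_t ebx;	/* 1st system call argument (assumed convention) */
	uint32_t ecx;	/* 2nd argument */
	uint32_t edx;	/* 3rd argument */
	uint32_t esi;	/* 4th argument */
	uint32_t edi;	/* 5th argument */
	uint32_t ebp;	/* 6th argument */
	uint32_t eax;	/* system call number on entry */
	uint32_t ds;
	uint32_t es;
};

#define SIM_STACK_SLOTS 16

/* Simulated stack that grows toward lower indices, like the x86 stack. */
struct sim_stack {
	uint32_t slots[SIM_STACK_SLOTS];
	size_t sp;	/* index of the top slot; SIM_STACK_SLOTS when empty */
};

static void sim_init(struct sim_stack *s)
{
	memset(s->slots, 0, sizeof(s->slots));
	s->sp = SIM_STACK_SLOTS;
}

/* Equivalent of pushl: move the stack pointer down, then store. */
static void sim_pushl(struct sim_stack *s, uint32_t value)
{
	assert(s->sp > 0);
	s->slots[--s->sp] = value;
}

/* Push the registers in the same order as SAVE_ALL, then read the frame
 * back from the top of the simulated stack. */
static struct saved_regs sim_save_all(struct sim_stack *s,
				      const struct saved_regs *cpu)
{
	struct saved_regs frame;

	sim_pushl(s, cpu->es);
	sim_pushl(s, cpu->ds);
	sim_pushl(s, cpu->eax);
	sim_pushl(s, cpu->ebp);
	sim_pushl(s, cpu->edi);
	sim_pushl(s, cpu->esi);
	sim_pushl(s, cpu->edx);
	sim_pushl(s, cpu->ecx);
	sim_pushl(s, cpu->ebx);
	memcpy(&frame, &s->slots[s->sp], sizeof(frame));
	return frame;
}
```

After `sim_save_all` returns, the slot at the top of the simulated stack holds the `ebx` value and the next slot holds `ecx`, matching the left-to-right argument order of a C call.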

--------------------------------

## exception table

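The kernel may hit a Page Fault exception, possibly caused by a bad system call argument. The exception table exists to handle this case.

Because few instructions in a system call access memory, every instruction that can trigger this exception is recorded in the exception table.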

![](resources/systemcall05.png)
![](resources/systemcall06.png)
![](resources/systemcall07.png)
![](resources/systemcall08.png)
![](resources/systemcall09.png)
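
The idea can be sketched in user space as follows. The names `ex_entry`, `ex_search`, and `ex_fixup` are hypothetical, and the `fixup` field (the address to resume at) is an assumption about the entry layout; only the `insn` field appears in the kernel listings in this section. The sketch assumes the table is already sorted by `insn`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table entry: one record per instruction that may fault. */
struct ex_entry {
	unsigned long insn;	/* address of the instruction that may fault */
	unsigned long fixup;	/* assumed: address to resume at if it does */
};

/* Binary search over a table sorted by insn; returns NULL when absent. */
static const struct ex_entry *ex_search(const struct ex_entry *table,
					size_t count, unsigned long addr)
{
	size_t lo = 0, hi = count;

	assert(table != NULL || count == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].insn < addr)
			lo = mid + 1;
		else if (table[mid].insn > addr)
			hi = mid;
		else
			return &table[mid];
	}
	return NULL;
}

/* If the faulting instruction is listed, redirect *ip to its fixup code
 * and return 1. Otherwise leave *ip alone and return 0, meaning the
 * fault was not an expected one. */
static int ex_fixup(const struct ex_entry *table, size_t count,
		    unsigned long *ip)
{
	const struct ex_entry *e = ex_search(table, count, *ip);

	if (!e)
		return 0;
	*ip = e->fixup;
	return 1;
}
```

A fault handler built this way would call `ex_fixup` with the saved instruction pointer: a return value of 1 means execution can continue at the fixup address, and 0 means the fault came from an instruction nobody registered.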


### linux2.6/kernel/extable.c

```c
extern struct exception_table_entry __start___ex_table[];
extern struct exception_table_entry __stop___ex_table[];

/* Sort the kernel's built-in exception table */
void __init sort_main_extable(void)
{
	sort_extable(__start___ex_table, __stop___ex_table);
}

/* Given an address, look for it in the exception tables. */
const struct exception_table_entry *search_exception_tables(unsigned long addr)
{
	const struct exception_table_entry *e;

	e = search_extable(__start___ex_table, __stop___ex_table-1, addr);
	if (!e)
		e = search_module_extables(addr);
	return e;
}
```


### linux2.6/lib/extable.c

```c
extern struct exception_table_entry __start___ex_table[];
extern struct exception_table_entry __stop___ex_table[];

#ifndef ARCH_HAS_SORT_EXTABLE
/*
 * The exception table needs to be sorted so that the binary
 * search that we use to find entries in it works properly.
 * This is used both for the kernel exception table and for
 * the exception tables of modules that get loaded.
 */
void sort_extable(struct exception_table_entry *start,
		  struct exception_table_entry *finish)
{
	struct exception_table_entry el, *p, *q;

	/* insertion sort */
	for (p = start + 1; p < finish; ++p) {
		/* start .. p-1 is sorted */
		if (p[0].insn < p[-1].insn) {
			/* move element p down to its right place */
			el = *p;
			q = p;
			do {
				/* el comes before q[-1], move q[-1] up one */
				q[0] = q[-1];
				--q;
			} while (q > start && el.insn < q[-1].insn);
			*q = el;
		}
	}
}
#endif

#ifndef ARCH_HAS_SEARCH_EXTABLE
/*
 * Search one exception table for an entry corresponding to the
 * given instruction address, and return the address of the entry,
 * or NULL if none is found.
 * We use a binary search, and thus we assume that the table is
 * already sorted.
 */
const struct exception_table_entry *
search_extable(const struct exception_table_entry *first,
	       const struct exception_table_entry *last,
	       unsigned long value)
{
	while (first <= last) {
		const struct exception_table_entry *mid;

		mid = (last - first) / 2 + first;
		/*
		 * careful, the distance between entries can be
		 * larger than 2GB:
		 */
		if (mid->insn < value)
			first = mid + 1;
		else if (mid->insn > value)
			last = mid - 1;
		else
			return mid;
	}
	return NULL;
}
#endif

```

### linux2.6/kernel/module.c

```c
/* Given an address, look for it in the module exception tables. */
const struct exception_table_entry *search_module_extables(unsigned long addr)
{
	unsigned long flags;
	const struct exception_table_entry *e = NULL;
	struct module *mod;

	spin_lock_irqsave(&modlist_lock, flags);
	list_for_each_entry(mod, &modules, list) {
		if (mod->num_exentries == 0)
			continue;
				
		e = search_extable(mod->extable,
				   mod->extable + mod->num_exentries - 1,
				   addr);
		if (e)
			break;
	}
	spin_unlock_irqrestore(&modlist_lock, flags);

	/* Now, if we found one, we are running inside it now, hence
           we cannot unload the module, hence no refcnt needed. */
	return e;
}
```


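1. `sort_extable` sorts the exception table with an insertion sort. The `[-1]` index is a trick: `p[-1]` means `*(p - 1)`, the element just before the one `p` points to.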

2. The values of the EIP and CS registers are saved on the stack frame so that the exception handler can return, just like a normal function return.

![](resources/systemcall10.png)
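
The `[-1]` indexing used by `sort_extable` can be tried in isolation. The sketch below applies the same loop shape to a plain `int` array; `insertion_sort` is a hypothetical name, and the early return for short ranges is an addition that the kernel version does not have. A negative index is valid here because `p` and `q` always point at least one element past `start` when `[-1]` is evaluated.

```c
#include <assert.h>
#include <stddef.h>

/* Sort the half-open range [start, finish) in ascending order. */
static void insertion_sort(int *start, int *finish)
{
	int el, *p, *q;

	assert(start != NULL && finish != NULL);
	if (finish - start < 2)
		return;	/* nothing to sort */

	for (p = start + 1; p < finish; ++p) {
		/* start .. p-1 is sorted; p[-1] is the element before p */
		if (p[0] < p[-1]) {
			el = *p;
			q = p;
			do {
				/* shift the larger element up one slot */
				q[0] = q[-1];
				--q;
			} while (q > start && el < q[-1]);
			*q = el;
		}
	}
}
```

The `q > start` test comes first in the loop condition, so `q[-1]` is never read when `q` has reached the start of the array.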


-------------------------