Since there are still several unavoidable C code in the MesaPy's codebase, we seek to formally verify these code to guarantee their memory safety. These C code are located in RPython's translator/JIT backends and its libraries. To cross validate our verification results, We utilize three state-of-the-art verification tools -- Seahorn, Smack, and TIS to prove conclusively that the memory safety issues can never occur.
The memory safety issues in our current checks include
- Buffer overflow
- Buffer over-read
- Null pointer dereference
- Memory leak
Here are three verification tools:
-
Seahorn: leverages the SMT solver and Abstract Interpretation techniques to check memory safte issues. http://seahorn.github.io/blog/
-
SMACK: translates the LLVM compiler's popular intermediate representation (IR) into the Boogie intermediate verification language (IVL), and leverages the Boogie and Corral verifiers for memory safety checks. https://github.com/smackers/smack
-
TrustInSoft Analyzer: uses formal methods to deliver mathematical guarantees. With an extensible framework that enables collaboration between different formal methods, TrustInSoft Analyzer can mathematically guarantee that a given flaw is not present in your source code. https://trust-in-soft.com/
By formal verification, we found some "sat" cases which could be potential vulnerabilities. Here, we use two cases to show the verification results. However, the potential memory safety problems depend on the input data. The possibilities of vulnerabilities need more investigations.
static Bigint *
Balloc(int k)
{
int x;
Bigint *rv;
unsigned int len;
if (0<=k <= Kmax && (rv = freelist[k]))
freelist[k] = rv->next;
else {
x = 1 << k; // -----------------------------> potential integer overflow
len = (sizeof(Bigint) + (x-1)*sizeof(ULong) + sizeof(double) - 1)
/sizeof(double);
if (k <= Kmax && pmem_next - private_mem + len <= PRIVATE_mem) {
rv = (Bigint*)pmem_next;
pmem_next += len;
}
else {
rv = (Bigint*)MALLOC(len*sizeof(double));
if (rv == NULL)
return NULL;
}
rv->k = k;
rv->maxwds = x;
}
rv->sign = rv->wds = 0;
return rv;
}
Above example locates in dtoa.c
, and it shows there could be a potential Integer
Overflow at labeled line. K could be any value including these greater than 32.
In such case, x will be overflowed.
RPY_EXTERN
long *pypy_jit_codemap_del(unsigned long addr, unsigned int size)
{
unsigned long search_key = addr + size - 1;
long *result;
skipnode_t *node;
/* There should be either zero or one codemap entry in the range.
In theory it should take the complete range, but for alignment
reasons the [addr, addr+size] range can be slightly bigger. */
node = skiplist_search(&jit_codemap_head, search_key);
pypy_codemap_invalid_set(1);
skiplist_remove(&jit_codemap_head, node->key); // -----------> invalid memory access
pypy_codemap_invalid_set(0);
/* there should be at most one */
assert(skiplist_search(&jit_codemap_head, search_key)->key < addr);
result = ((codemap_data_t *)node->data)->bytecode_info;
free(node);
return result;
}
Above example locates in codemap.c
. It shows a potential invalid memory
access. skiplist_search
could return NULL, and node could be NULL. As a result,
node->key will cause a crash.
We aim to formally verify all C code in MesaPy. Here is a list of all C sources (except code for Windows platform).
rpython/jit/backend/llsupport/src/codemap.c
rpython/jit/backend/llsupport/src/skiplist.c
rpython/translator/c/gcc/test/elf64/track_jumptable.c
rpython/translator/c/src/threadlocal.c
rpython/translator/c/src/exception.c
rpython/translator/c/src/entrypoint.c
rpython/translator/c/src/signals.c
rpython/translator/c/src/stacklet/tests.c
rpython/translator/c/src/stacklet/stacklet.c
rpython/translator/c/src/mem.c
rpython/translator/c/src/profiling.c
rpython/translator/c/src/ll_math.c
rpython/translator/c/src/thread_pthread.c
rpython/translator/c/src/stack.c
rpython/translator/c/src/int.c
rpython/translator/c/src/thread.c
rpython/translator/c/src/asm.c
rpython/translator/c/src/support.c
rpython/translator/c/src/instrument.c
rpython/translator/c/src/dtoa.c
rpython/translator/c/src/debug_print.c
rpython/translator/c/src/debug_traceback.c
rpython/translator/c/src/asm_gcc_x86.c
rpython/translator/c/src/ll_strtod.c
rpython/translator/c/src/rtyper.c
We also found that besides above C sources, there are some C functions embedded in Python sources. When building MesaPy, the embedded functions will be filled into cache files for later building and linking. Therefore, we also consider to verify these C functions to make sure all possible C source code are verified.
${PYPY_USESSION_DIR}/module_cache/module_22.c
${PYPY_USESSION_DIR}/module_cache/module_23.c
${PYPY_USESSION_DIR}/module_cache/module_24.c
${PYPY_USESSION_DIR}/module_cache/module_25.c
${PYPY_USESSION_DIR}/module_cache/module_26.c
${PYPY_USESSION_DIR}/module_cache/module_27.c
${PYPY_USESSION_DIR}/module_cache/module_28.c
${PYPY_USESSION_DIR}/module_cache/module_29.c
${PYPY_USESSION_DIR}/module_cache/module_30.c
${PYPY_USESSION_DIR}/module_cache/module_31.c
${PYPY_USESSION_DIR}/module_cache/module_32.c
${PYPY_USESSION_DIR}/module_cache/module_33.c
${PYPY_USESSION_DIR}/module_cache/module_34.c
In each directory of verification tools, we also list C sources and functions we have already verified.