Home

Tavis Ormandy edited this page Jul 17, 2016 · 17 revisions

Welcome to ctypes.sh, a foreign function interface for bash.

ctypes.sh is a bash plugin that provides a foreign function interface directly in your shell. In other words, it allows you to call routines in shared libraries from within bash.

To help illustrate what ctypes.sh does, here is a trivial example.

    $ dlcall puts "Hello, World"
    Hello, World

    # A more complex example, use libm to calculate sin(PI/2)
    $ dlopen libm.so.6
    0x172ebf0
    $ dlcall -r double sin double:1.57079632679489661923
    double:1.000000

All of the ctypes.sh builtins are documented. Access the documentation via the bash help command, for example, help dlopen.

Using ctypes.sh

ctypes.sh is a bash plugin. Do not confuse plugins with scripts; they are unrelated concepts. Plugins are rarely used, but allow you to extend bash at runtime with additional builtins.

A script that automates the loading process and provides some convenience functions is available to source from your scripts.

    $ source ctypes.sh

If you're using ctypes.sh in a script and want to verify that it loaded correctly, you can import it like this.

#!/bin/bash

# Stop early if the plugin could not be loaded.
if ! source ctypes.sh; then
    echo "please install ctypes.sh to continue" >&2
    exit 1
fi

Coding Conventions

It is a common pattern in bash to obtain output like so

    $ output=$(command --flags input)

or equivalently

    $ output=`command --flags input`

This creates a subshell, and any modifications made inside the subshell are discarded once the command completes. A bash programmer might naturally expect to write this to get a handle to a shared library:

    $ handle=$(dlopen libc.so.6)  # DO NOT DO THIS

However, the library is loaded only in the subshell, which exits as soon as the command completes, so the handle retrieved will be invalid in the parent shell. This was probably not what was intended. The correct way to do this is described in the section below. However, if you're doing something whose only effect is its output (such as printing a string), command substitution will work. For example:

    $ string=$(dlcall printf "%lf" double:1.2345)

This will make the value of $string the string 1.234500, because the %lf format prints six decimal places by default.
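The subshell behaviour is ordinary bash and can be observed without ctypes.sh. The following sketch uses a hypothetical function, bump, that both changes shell state and prints output. Only the output survives command substitution, which is the same reason a handle obtained through $(dlopen ...) is of no use to the parent shell.

```shell
#!/bin/bash
# Pure-bash sketch: state changed inside $(...) never reaches the parent shell.
# "bump" and "counter" are made-up names for this illustration.
counter=0

bump() {
    counter=$((counter + 1))   # side effect on shell state
    echo "bumped"              # output, which is what $(...) captures
}

output=$(bump)                 # runs in a subshell
echo "output=$output counter=$counter"   # prints: output=bumped counter=0

bump > /dev/null               # runs in the current shell
echo "counter=$counter"        # prints: counter=1
```

The first call leaves counter untouched because the increment happened in a child process. The second call, made directly, updates it.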

ctypes.sh is a powerful interface, and naturally allows you to shoot yourself in the foot. Using invalid handles or pointers is a good way to crash your shell, or cause unexpected behaviour.

    $ dlcall -h 0xdeadbeef crashcrashcrash
    Segmentation fault

Never report a bug to the bash maintainers unless you can reproduce it without the ctypes plugin loaded. The source of the bug is most likely either ctypes.sh, or your script.

Troubleshooting starting ctypes.sh

Bash plugins are not commonly used, and distributions often package the feature incompletely or incorrectly due to limited testing. Header files are very rarely provided and dynamic symbols are sometimes exported incorrectly.

  • A list of known-working distributions and platforms is here
  • A list of symptoms and suggested fixes is here. TODO

Loading dynamic shared objects

ctypes.sh provides comprehensive access to the dlopen interface, but for typical usage the defaults will work fine.

    $ dlopen libz.so
    0x2232450

By default, libraries are added to the global scope, so you probably won't need to use the handle returned. If you do need it, simply look up the soname in the DLHANDLES array. You would usually only need to do this if you want

    $ echo ${DLHANDLES[libz.so]}
    0x2232450
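In a script it can be useful to fail cleanly when a library has not been opened. The sketch below is only an illustration: handle_for is a hypothetical helper, DLHANDLES is assumed to behave like a bash associative array keyed by soname, and the array is filled in by hand with the address from the example above so that the snippet runs without ctypes.sh loaded. In real use dlopen populates DLHANDLES.

```shell
#!/bin/bash
# Stand-in for the array that dlopen maintains. The address is the one
# printed in the example above, not a live handle.
declare -A DLHANDLES=([libz.so]=0x2232450)

# Hypothetical helper: print the handle recorded for a soname, or fail
# if nothing has been recorded for it.
handle_for() {
    local soname=$1
    if [[ -z ${DLHANDLES[$soname]+set} ]]; then
        echo "no handle recorded for $soname" >&2
        return 1
    fi
    echo "${DLHANDLES[$soname]}"
}

handle_for libz.so                                   # prints 0x2232450
handle_for libm.so.6 || echo "libm.so.6 is not open"
```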

Pseudo-handles

Two pseudo-handles are provided by ctypes.sh, $RTLD_DEFAULT and $RTLD_NEXT. These special handles are described in the dlopen(3) manual.

Accessing bash internals

If you want to reference an internal bash symbol (for example, to look up the address of a bash variable), you don't need to use a handle. $RTLD_DEFAULT is assumed by default, so it is sufficient to do this:

    $ foobar="hello bash internals"
    $ dlcall -r pointer get_string_value foobar
    pointer:0x222b3f0
    $ dlcall puts pointer:0x222b3f0
    hello bash internals

Note that get_string_value is a symbol provided by bash, not ctypes.sh. ctypes.sh simply allows you to access these internal symbols.

Controlling dlopen

The default mode of dlopen is suitable for most operations, but for more control over the load you may specify flags on the command line. dlopen supports bash-style switches for the most common flags, or you may pass the flag names as arguments if no switch exists.

    $ dlopen libz.so RTLD_GLOBAL RTLD_LAZY

or

    $ dlopen -l -g libz.so

If you need a very rarely used flag that ctypes.sh does not know about, you can specify it numerically.

    $ dlopen libc.so.6 0x101
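As an illustration of where a number like 0x101 comes from: on Linux with glibc, RTLD_LAZY is 0x1 and RTLD_GLOBAL is 0x100, so the value above is the bitwise OR of the two. The sketch below does that arithmetic in plain bash. The variable names are made up for this example, and the constants are an assumption that holds for glibc. Other platforms may use different values, so check your system's dlfcn.h before relying on them.

```shell
#!/bin/bash
# Flag values as defined by glibc on Linux (assumption: other C libraries
# and operating systems may differ).
flag_lazy=0x001      # RTLD_LAZY
flag_global=0x100    # RTLD_GLOBAL

# Combine the flags with bitwise OR and format the result in hex.
mode=$(printf '%#x' $((flag_lazy | flag_global)))
echo "$mode"         # prints 0x101
```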

For a full list of options supported, use the builtin help

    $ help dlopen

By default, libraries will be opened at global scope using RTLD_GLOBAL, but you can disable this with the -g flag.

Accessing functions from loaded libraries

To obtain a list of exported symbols from a loaded library, use the standard UNIX command nm.

    $ nm -D /lib64/libz.so
    00000000000023f0 T adler32
    000000000000c680 T compress
    0000000000002a60 T crc32
    00000000000046a0 T deflate
    ...

To call a function, you must know its return type and its parameters. You then call the function with dlcall. By default, the return value is stored in the DLRETVAL variable, but you can change that if you wish.

Let's look at an example before the details are explained.

    # What are the parameters to crc32?
    $ grep crc32 /usr/include/zlib.h
    unsigned long crc32(unsigned long crc, const char *buf, unsigned len);

    $ dlopen libz.so
    0x2232450
    $ dlcall -r long crc32 long:0 "hello" 5
    long:907060870

    # What is that in hex?
    $ printf "%#x\n" ${DLRETVAL##*:}
    0x3610a686
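The ${DLRETVAL##*:} expansion used above is plain bash parameter expansion. As a small sketch (no ctypes.sh required; the variable is assigned by hand with the value from the example, and kind, value and s are names chosen for illustration), the type and the value can be separated like this:

```shell
#!/bin/bash
# Value copied from the crc32 example; dlcall would normally set this.
DLRETVAL="long:907060870"

kind=${DLRETVAL%%:*}     # everything before the first colon -> long
value=${DLRETVAL#*:}     # everything after the first colon  -> 907060870

printf '%s %#x\n' "$kind" "$value"    # prints: long 0x3610a686

# For values that may themselves contain colons, "#*:" (shortest match)
# keeps the whole value, while "##*:" (longest match) keeps only the
# part after the last colon.
s="string:10:30"
echo "${s#*:}"     # prints 10:30
echo "${s##*:}"    # prints 30
```

For numeric results the two forms are interchangeable. The shortest-match form is the safer habit when the value might be an arbitrary string.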

Encoding types

Because bash only supports two primitive data types (strings and integers), it is necessary to introduce a syntax to encode additional types that might be encountered. ctypes.sh uses prefixed type strings, like so:

    <primitive type>:<formatted value>

For example:

    float:3.141459
    long:25979456
    int8:-2
    string:hello
    pointer:0xdeadbeef

However, for convenience, if a type is not prefixed, then the following rule applies:

  • If it can be parsed perfectly as an integer, it is assumed to be an integer.
    • perfectly means *endptr == '\0' and endptr != nptr; see the strtoul manual for details.
  • Otherwise, it is assumed to be a nul-terminated C string.

You should always use a prefix for non-hardcoded values, or unexpected colons or integers might disrupt parsing.
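To get a feel for the rule, here is a rough pure-bash approximation. It is not the parser ctypes.sh uses: guess_kind is a hypothetical helper, and the regular expression only covers an optional sign followed by decimal or 0x-prefixed hex digits, whereas strtoul also skips leading whitespace and may accept other bases. Treat it as a sketch of the idea that the whole argument must be consumed.

```shell
#!/bin/bash
# Hypothetical helper: guess how an unprefixed argument would be treated.
# Assumption: only decimal and 0x-style hex integers are considered.
guess_kind() {
    local arg=$1
    local re='^[+-]?(0[xX][0-9a-fA-F]+|[0-9]+)$'
    if [[ $arg =~ $re ]]; then
        echo integer
    else
        echo string
    fi
}

guess_kind 5        # integer
guess_kind 0x10     # integer
guess_kind hello    # string
guess_kind 12abc    # string: trailing characters, so not a perfect parse
guess_kind ""       # string: nothing was consumed (endptr == nptr)
```

This is also why untrusted input should be prefixed: a user-supplied string such as 42 would otherwise be passed as an integer rather than as text.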

dlcall recognises some common primitive type names:

Prefix       Example               Range          Notes
uint8        uint8:128             0 to 255
int8         int8:-12              -128 to 127
uint16       uint16:387
int16        int16:-922
uint32       uint32:299769
int32        int32:-1
uint64       uint64:11
int64        int64:-123
float        float:3.1412
double       double:12e10
char         char:10
uchar        uchar:102
ushort       ushort:123
short        short:-123
unsigned     unsigned:0
int          int:-23
ulong        ulong:1231
long         long:123
longdouble   longdouble:1.23
pointer      pointer:0xdeadbeef
string       string:hello
void         void:
rawdouble    rawdouble:0x1.8p+0
rawfloat     rawfloat:0x1.8p+0

A good way to experiment with prefixed types is by calling printf.

    $ dlcall printf "%s %u %p %c" string:Hello unsigned:123 pointer:0xdeadbeef int:10

Return types

Structures and pointers

ctypes.sh can automatically import most structure definitions from libraries via the struct command.

More information on using struct is available here.

Accessing bash arrays

TODO

Callbacks and function pointers

ctypes.sh can generate callable function pointers to bash functions, for use as callbacks or function pointers. Examples of where this is necessary are the standard library functions qsort and bsearch.

To write a native callable function, first define the function. The first parameter will be a pointer to store the return code, followed by the formal parameters.

It is usually not possible to return the value using the return command in bash, because the exit status of a bash function is limited to small integers (0 to 255). For this reason, the function is given a pointer to storage of the required return type.
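The limit on return is easy to see in plain bash. The sketch below (function and variable names are made up for illustration) shows a status being truncated to its low 8 bits, followed by the usual pure-bash workaround in which the caller names the place where the result should be stored. That is loosely analogous to the pointer a callback receives as its first parameter.

```shell
#!/bin/bash
# A bash function's "return value" is an exit status: only the low
# 8 bits survive.
big() { return 300; }

status=0
big || status=$?
echo "status=$status"    # prints status=44 (300 modulo 256)

# Workaround: the caller passes the name of a variable to fill in.
big_into() { printf -v "$1" '%d' 300; }

big_into answer
echo "answer=$answer"    # prints answer=300
```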

Let's see how this works by calling qsort from bash.

#!/bin/bash

source ctypes.sh

declare -i sortsize=128     # size of array
declare -a values           # array of values
set -e

# int compare(const void *, const void *)
function compare {
    local -a x=(int)
    local -a y=(int)
    local -a result

    # extract the parameters
    unpack $2 x
    unpack $3 y

    # remove the prefix
    x=${x##*:}
    y=${y##*:}

    # calculate result: negative when x < y, so qsort sorts ascending
    result=(int:$((x - y)))

    # return result to caller
    pack $1 result

    return
}

# Generate a function pointer to compare that can be called from native code.
callback -n compare compare int pointer pointer

# Generate an array of random values
for ((i = 0; i < sortsize; i++)); do
    values+=(int:$RANDOM)
done

# Verify that array is not sorted
if sort --check=silent --numeric <(IFS=$'\n'; echo "${values[*]##*:}"); then
    echo FAIL
    exit 1
fi

# Allocate space for integers
dlcall -n buffer -r pointer malloc $((sortsize * 4))

# Pack our random array into that native array
pack $buffer values

# Now qsort can sort them
dlcall qsort $buffer long:$sortsize long:4 $compare

# Unpack the sorted array back into a bash array
unpack $buffer values

# Verify they're sorted
if ! sort --check --numeric <(IFS=$'\n'; echo "${values[*]##*:}"); then
    echo FAIL
    exit 1
fi

echo PASS

Here is the output

$ bash qsort.sh 
PASS
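The sortedness checks in the script rely on two pieces of plain bash that are worth seeing in isolation. This sketch needs no ctypes.sh. The array contents are made up, and it assumes a sort that supports the same --check=silent and --numeric options the script uses.

```shell
#!/bin/bash
values=(int:3 int:1 int:2)

# "${values[*]##*:}" strips the type prefix from every element, and [*]
# joins the results with the first character of IFS (a newline here).
# Command substitution runs in a subshell, so the IFS change stays local.
joined=$(IFS=$'\n'; echo "${values[*]##*:}")
echo "$joined"           # prints 3, 1 and 2 on separate lines

# sort --check exits non-zero when its input is not already sorted.
if sort --check=silent --numeric <<<"$joined"; then
    echo sorted
else
    echo "not sorted"    # this branch runs for the values above
fi
```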

Exported values

Sometimes you may want to access an exported symbol that is not a function, or you want to know the address of an exported function. For example, you might want to know the address of environ or errno.

To do this, use the dlsym builtin. For example, here is how to access a bash internal symbol.

# I don't want to use $!, let's grab it from inside bash.
$ dlsym last_asynchronous_pid
pointer:0x6ecf14
$ pid=(int)
$ sleep 100 &
[2] 57271
$ unpack pointer:0x6ecf14 pid
$ echo ${pid##*:}
57271

More Examples

$ dlopen libm.so.6
0x1dcead0
$ dlcall -n result -r double sin double:123.123
double:-0.565374