
0004: Static compilation for cfunction

Jameson Nash edited this page Mar 13, 2018 · 8 revisions

Primary issue

The way cfunction is currently defined requires a compiler to be available at runtime to emit the calling-convention translator. Since C calling conventions are fixed (there is no dispatch), it is not clear how much this runtime flexibility is used in practice. This Julep is a corollary to Julep 0001 (static-ccall). Secondly, cfunction does not handle closures (non-singleton functions), which is somewhat surprising for a runtime construct and has further limited its practical usage.

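For example, let's use qsort_r to demonstrate how we might currently use cfunction in a relatively advanced case. (qsort_r is not part of POSIX; this example follows the BSD/macOS variant, which passes the thunk before the comparator. glibc's qsort_r orders its arguments differently.)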

function sort!(a::Array{T}, cmp::C) where {T, C}
    if isbits(T)
        compar_bits(cmp, a, b)::Cint = cmp(a, b)
        c = cfunction(compar_bits, Cint, Tuple{Ref{C}, Ref{T}, Ref{T}})
        width = sizeof(T)
    elseif is_pointer_array(a) # = !Base.datatype_pointerfree(Base.RefValue{T})
        compar_ptr(cmp, a, b)::Cint = cmp(a::T, b::T)
        c = cfunction(compar_ptr, Cint, Tuple{Ref{C}, Ref{Any}, Ref{Any}})
        width = sizeof(Ptr{T})
    else
        error("Array{$T} does not have a C-compatible layout")
    end
    ccall(:qsort_r, Cvoid, (Ptr{T}, Csize_t, Csize_t, Ptr{Cvoid}, Ptr{Cvoid}),
        a, length(a), width, Ref(cmp), c)
    return a
end

Summary of changes

Remove the jl_function_ptr runtime function and the cfunction function, and replace them with a @cfunction macro with similar syntax. In the example above, this is just a small change to the syntax of the call:

    c = @cfunction(compar_bits, Cint, (Ref{C}, Ref{T}, Ref{T}))::Ptr{Cvoid}
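For comparison, here is a minimal sketch of the non-$ form as it exists in Julia 1.x. It assumes a Julia 1.x toolchain on a platform whose libc exports the standard C qsort, which takes no thunk. The names threeway and c_sort! are hypothetical. Because threeway is a global name, the macro yields a plain Ptr{Cvoid}:

```julia
# Three-way comparison, as qsort expects: negative, zero, or positive.
# With Ref{Cdouble} argument types, the Julia function receives the loaded values.
function threeway(a::Cdouble, b::Cdouble)::Cint
    return a < b ? -1 : (a > b ? 1 : 0)
end

function c_sort!(v::Vector{Cdouble})
    # Non-$ form: threeway is a global name, so the result is a Ptr{Cvoid}
    # that never needs to be garbage collected.
    fptr = @cfunction(threeway, Cint, (Ref{Cdouble}, Ref{Cdouble}))
    ccall(:qsort, Cvoid, (Ptr{Cdouble}, Csize_t, Csize_t, Ptr{Cvoid}),
          v, length(v), sizeof(Cdouble), fptr)
    return v
end

v = c_sort!([1.3, -2.7, 4.4, 3.1])
println(v)
```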

To also support capturing closures, the function name must instead be prefixed with a $ (alternative: the local keyword?):

let cmp = @cfunction($compare_bits, Cint, (Ref{C}, Ref{T}, Ref{T}))::Ref{Ptr{Cvoid}}
    GC.@preserve cmp begin
        use(cmp[])  # use: placeholder for any code that consumes the raw pointer
    end
end

This formulation allocates a box for compare_bits and keeps both it and the function pointer stored in cmp alive for as long as the cmp Ref itself is alive.
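A minimal sketch of the same lifetime rule in Julia 1.x, where the $ form returns a Base.CFunction handle rather than a Ref. The helper names make_adder and call_through_pointer are hypothetical. Outside a ccall argument list, the raw pointer is only valid while the handle is rooted, hence the GC.@preserve block:

```julia
# A closure over k; Cint + Cint is still Cint, matching the declared return type.
make_adder(k::Cint) = (x::Cint) -> x + k

function call_through_pointer(k::Integer, x::Integer)
    adder = make_adder(Cint(k))
    cf = @cfunction($adder, Cint, (Cint,))   # a Base.CFunction handle, not a bare pointer
    # Root cf explicitly while the raw pointer is in use.
    return GC.@preserve cf begin
        fptr = Base.unsafe_convert(Ptr{Cvoid}, cf)
        ccall(fptr, Cint, (Cint,), x)        # call through the raw function pointer
    end
end

r = call_through_pointer(10, 5)
println(r)
```

Closure cfunctions rely on trampolines, so they are not available on every platform.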

We could rewrite the qsort_r example above to ignore the thunk parameter entirely (as a stand-in for a C API that does not provide one), capturing cmp in a closure instead:

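    compare_bits = (unused, a, b) -> convert(Cint, cmp(a, b))
    let c = @cfunction($compare_bits, Cint, (Ptr{Cvoid}, Ref{T}, Ref{T}))
        GC.@preserve c begin
            # the thunk is unused, so pass C_NULL in its place
            ccall(:qsort_r, Cvoid, (Ptr{T}, Csize_t, Csize_t, Ptr{Cvoid}, Ptr{Cvoid}),
                a, length(a), width, C_NULL, c[])
        end
    end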
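In Julia 1.x the same idea works with the standard C qsort, which genuinely has no thunk parameter. The comparator captures by in a closure, and the static parameter T appears in the signature, as the ccall rules below permit. This is a sketch under those assumptions. The name closure_sort! is hypothetical, and closure cfunctions are not available on every platform:

```julia
# Sort an isbits vector with C's qsort, comparing elements by the key function `by`.
function closure_sort!(v::Vector{T}, by) where {T}
    isbitstype(T) || throw(ArgumentError("closure_sort! needs an isbits element type"))
    compare = (a::T, b::T) -> begin
        ka, kb = by(a), by(b)
        Cint(ka < kb ? -1 : (ka > kb ? 1 : 0))
    end
    cb = @cfunction($compare, Cint, (Ref{T}, Ref{T}))   # cb isa Base.CFunction
    # ccall converts cb to Ptr{Cvoid} and keeps it rooted for the call's duration.
    ccall(:qsort, Cvoid, (Ptr{T}, Csize_t, Csize_t, Ptr{Cvoid}),
          v, length(v), sizeof(T), cb)
    return v
end

w = closure_sort!([3, -5, 1, -2], abs)
println(w)
```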

Goals

  • Statically compilable (use declarative syntax rather than runtime discovery for the C signature)
  • No dependency on ccall to a C++ code-generator at runtime (except trampoline / nest support)
  • Support all current features
    • Parameterized return and call types for Ptr, Array, Ref
    • Calls happen in a new jl_current_world environment
    • Calls can happen on a thread (and, in theory, could also initialize Julia)
    • Can be exposed as aliases in static compilation
  • Add support for attributes (pure, fortran, compiler target)
  • Add support for calling convention on cfunction
  • Don't require the evaluation of arbitrary code to compile the function body
  • Removed features
    • Type signature must now be static at definition time
    • Called function must now be a global-bound name (Q: allow local function definitions?)
  • Simplify / hide the cfunction internal cache (and related logic) – this should make it easier to add support for attributes
    • Removing jl_function_ptr from the runtime means we won't need to care about its performance
    • Removes the special-case in ccall for intercepting the call to jl_function_ptr
    • Avoid needing to add logic to inference to infer this ccall
    • Use a simple id-dict over a MethodInstance instead
  • Avoid needing to add backedges
    • But still optimize already generated/running methods to inject an updated handle, without changing the pointer
    • Always indirect the handle through a PLT+GOT
    • Track the GOT location, so that the pointer can later be overwritten with a new trampoline (if the target method instance changes for the latest world)
  • Be clear (syntactically) when the input closure argument needs a GC root, but keep the common, performance-optimal case even simpler.
  • Never capture the current world-age in a closure: it is not a first-class value
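One consequence of the static-signature goal can be sketched against Julia 1.x's @cfunction. This assumes its current behavior of rejecting a non-literal argument-type tuple with an ArgumentError at macro-expansion time. The helper rejects_runtime_signature is hypothetical:

```julia
function rejects_runtime_signature()
    try
        # argtypes is just a symbol here, not a literal tuple such as (Cdouble,)
        macroexpand(Main, :(@cfunction(sin, Cdouble, argtypes)))
        return false                                  # expansion unexpectedly succeeded
    catch err
        # some Julia versions wrap errors raised inside a macro in a LoadError
        inner = err isa LoadError ? err.error : err
        return inner isa ArgumentError
    end
end

println(rejects_runtime_signature())
```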

Non-goals

  • Improving va_args support (left as an exercise for a future PR)
  • Avoiding all trampoline usage (avoided for specialized code, but very useful in generic code)

Implementation details

In most cases, FemtoCleaner.jl should be able to perform this transformation without difficulty, although a few cases (Gtk.jl, for example) will also require changing the caller into a macro, or manually adding or removing a $ (if FemtoCleaner guesses wrong).

The @cfunction(compar_bits, ...) call in the example above would expand to: Expr(:cfunction, Ptr{Cvoid}, $compar_bits, $Cint, $svec(Ref{C}, Ref{T}, Ref{T}), :(:ccall)). This ensures that all necessary information, except for the static parameters of the function, is preserved in the IR, so the cfunction pointer can be emitted directly, immediately after method definition. This guarantees that when the JIT is available (and we're compiling a fully specialized method signature), we will be able to compile a specific copy of this cfunction for the containing function.

Evaluate signature parameters using ccall rules:

  • Eval all arguments in global scope + sparams-env. Same spec as ccall:
    • Permit dependencies on the static parameters of the enclosing function that resolve to simple UnionAll bounds and don't impact the ABI of the type (all pointers are equivalent, but other values may not be handled identically across all ABIs).
    • Don't permit uncomputable dependencies on the static parameters of the function
    • The first argument is not evaluated if marked by a $ (wrapped in Expr(:$)), but will instead be expanded normally in the current scope
  • Don't permit dependencies on the later definition of other functions, late binding resolution, etc.

Lowering is very similar to ccall. In general:

@cfunction(fcn, ReturnType, (ArgTypes...,), attrs...)

splices into the AST the result of evaluating:

Expr(:cfunction, Ptr{Cvoid}, QuoteNode(fcn), ReturnType, svec(ArgTypes...), :(:ccall), :(attrs)...)

And

@cfunction($fcn, ReturnType, (ArgTypes...,), attrs...)

splices into the AST the result of evaluating:

Expr(:cfunction, CFunction, :(fcn), ReturnType, svec(ArgTypes...), :(:ccall), :(attrs)...)
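The two lowerings can be inspected with @macroexpand. In Julia 1.x (assumed here; exact printing varies by version), the plain form produces Ptr{Cvoid} as the first argument, with the name quoted. The $ form produces Base.CFunction, with the name left as an expression to evaluate. The name f need not be defined, since expansion is purely syntactic:

```julia
# f is never defined: macro expansion only rearranges syntax
plain   = @macroexpand @cfunction(f, Cint, (Cint,))
closure = @macroexpand @cfunction($f, Cint, (Cint,))

println(plain.head, " ", plain.args[1], " ", plain.args[2])
println(closure.head, " ", closure.args[1], " ", closure.args[2])
```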

The first argument gives the type of the value the expression produces, which is distinct from the C return type given by the third argument. It must be either Ptr{Cvoid} or a struct type with the correct number and types of fields (currently Ptr{Cvoid}, Any, Ptr{Cvoid}, Ptr{Cvoid}). Only the first field is directly meaningful: it holds the function-pointer handle. In most usages, this value will be implicitly converted to Ptr{Cvoid} during a ccall.
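In Julia 1.x, the struct described here is Base.CFunction. The following sketch checks its field types and confirms that the first field is the pointer handed to C on conversion to Ptr{Cvoid}. The name inspect_handle is hypothetical:

```julia
function inspect_handle()
    g = (x::Cint) -> x + Cint(1)          # any callable; the $ form wraps it in a handle
    cf = @cfunction($g, Cint, (Cint,))
    types = fieldtypes(typeof(cf))
    # the first field is the raw function pointer that ccall passes to C
    same = getfield(cf, 1) == Base.unsafe_convert(Ptr{Cvoid}, cf)
    return typeof(cf), types, same
end

T, types, same = inspect_handle()
println(T, " ", types, " ", same)
```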

The second argument evaluates to the function handle. If the first argument is Ptr{Cvoid}, it will be evaluated at method-definition time; otherwise, it is expected to be a runtime value.

The third argument is evaluated at method-definition time to give the C return type.

The fourth argument is evaluated at method-definition time to be the list of argument types.

The fifth argument gives the calling convention.

The remaining arguments will give any additional attributes (when implemented).

If needed, the backend will emit a trampoline that loads the extra parameter from the hidden nest argument. This trampoline will not be freed (via a finalizer) until the returned object is garbage collected, and the Julia handle fcn will be GC-protected for the same duration. However, if fcn is a constant value (as in syntax case 1), no trampoline will be emitted, and neither the result nor fcn will ever be garbage collected.

When compiling a generic, unspecialized signature, the backend will also need to emit trampoline code so that those parameter values can be captured on the fly. These trampolines can be stored in a callsite-specific id-hash table that maps method-instance <=> cfunction-fptr-closure. For syntax case 1 above, this also sidesteps the question of garbage collecting them: they are never collected, and there won't be many of them.

The trampoline functions will be stored in an htable_t indexed by the Julia handle fcn. A second mapping from these Julia handles back to their trampoline(s) will also exist, so that when the finalizer is triggered the trampolines can be garbage collected (returning their memory to the trampoline memory pool).

Supporting details

Edit history

3/2/18 vtjnash: Added notes that a trampoline will be needed to handle the general case, and how that'll be handled in caching, gc, etc.

3/8/18 vtjnash: Added sketch of closure support via Ref (for gc) and trampolines (for late-load / capture).

3/9/18 vtjnash: Corrections to closure support lowering

3/13/18 vtjnash: Redesign closure support based on implementation learning. Need to try to ensure that allocations happen in the correct place with the right types, and that lifetimes are associated properly with user intent.

Comments