Bind to cython from Numba #43

Open

stuartarchibald opened this issue Oct 21, 2019 · 8 comments

@stuartarchibald

@mroeschke @jreback I'm not really sure where to put this, so it's going here. Feel free to close instantly/move the issue elsewhere!

RE this comment pandas-dev#28987 (comment): I'm not entirely sure of the context as I'm not familiar with the pandas code base, however I thought I'd mention that Numba can actually call Cython exported functions (docs). Numba does this a lot already to get the BLAS bindings from SciPy, and https://github.com/numba/numba-scipy also does this to bind to the scipy.special functions. I'm wondering if this might help with the code duplication hinted at in the above?
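
For reference, the BLAS route looks roughly like this (a minimal sketch, assuming SciPy is installed and that ddot keeps its usual Fortran-style pointer signature in scipy.linalg.cython_blas):

import ctypes
import numpy as np
from numba import njit
from numba.extending import get_cython_function_address

# scipy.linalg.cython_blas publishes its functions via __pyx_capi__; ddot has
# the Fortran-style C signature:
#   double ddot(int *n, double *dx, int *incx, double *dy, int *incy)
addr = get_cython_function_address("scipy.linalg.cython_blas", "ddot")
functype = ctypes.CFUNCTYPE(
    ctypes.c_double,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_int),
)
ddot = functype(addr)

@njit
def dot(a, b):
    # BLAS wants the scalars by pointer, so stage them in one-element arrays
    n = np.empty(1, dtype=np.int32)
    inc = np.empty(1, dtype=np.int32)
    n[0] = a.size
    inc[0] = 1
    return ddot(n.ctypes, a.ctypes, inc.ctypes, b.ctypes, inc.ctypes)

a = np.arange(3.0)
print(dot(a, a))  # 5.0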

One thing to note, though, is that Numba can call the Cython functions, but the contents of those functions cannot take part in the optimisation passes (like inlining) that are performed by LLVM. So, depending on how hot the functions are, maintaining two implementations may yield better performance.

@mroeschke

Ah thanks for the insight @stuartarchibald!

Just for context, essentially pandas currently has:

  1. A Cython class to calculate window boundaries.
  2. A Cython function to perform window aggregations.

(They are together in one function today but are planned to be split out.)

So essentially, 2) will be replaced with a Numba function, and 1) is what's referenced in the issue that we were thinking would need to be duplicated. Eventually we want 1) to be in Numba as well, but during the migration period we might be able to do what you suggested!

@mroeschke

Hi @stuartarchibald,

Coming back to your original suggestion here (almost 2 years later, wow!) to call Cython from Numba, I had a few clarification questions, as I am not too familiar with accessing C functions:

  1. Does this approach work if the Cython functions are not cdef or cpdef?
  2. How would defining ctypes.CFUNCTYPE work if the Cython function has memoryview or ndarray arguments/return types? Typically the Cython functions have signatures like:
def roll_something(
    const float64_t[:] values,
    ndarray[int64_t] start, 
    ndarray[int64_t] end, 
    int64_t minp
) -> ndarray[float64_t]:

@mroeschke

@stuartarchibald whenever you have a moment I would greatly appreciate the feedback. Thanks!

@stuartarchibald

Hi @mroeschke

  1. Not sure. The key thing is that the __pyx_capi__ attribute on the module exists and the functions are described in there, as this is what Numba "looks" at. I also had some questions about this: Questions about __pyx_capi__ use/stability, cython/cython#3077.
  2. I think you can probably "lie" about the array types, either with ctypes.c_void_p or by saying they are ctypes.c_intp and passing the arrays as ndarray.ctypes.data. These are just guesses based on things that have been known to work previously. There's also the question of what to do with the returned array, as it'll need converting into a Numba datamodel representation and will likely need reference counting. I suspect that @intrinsic will be needed to help describe types and do conversions etc.

If you have a potentially runnable example of what you want to do, I can take a look.
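
On point 1, one quick check is to look at the module's __pyx_capi__ dict directly, which is essentially what get_cython_function_address reads; the capsule names encode the exported C signatures. A minimal sketch, using scipy.linalg.cython_blas only because it is known to publish one:

# Peek at what a compiled Cython module exports; the entries are PyCapsules
# whose repr includes the C signature that ctypes/Numba will need to match.
import scipy.linalg.cython_blas as cython_blas

for name, capsule in list(cython_blas.__pyx_capi__.items())[:3]:
    print(name, capsule)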

@mroeschke

Thanks for the reply @stuartarchibald and sorry for the delay on my end:

As an example, the aggregations.pyx file would contain:

import numpy as np

cimport numpy as cnp
from numpy cimport float64_t, int64_t, ndarray

cnp.import_array()

cdef float64_t NaN = <float64_t>np.nan

cdef extern from "src/headers/cmath" namespace "std":
    bint notnan(float64_t) nogil
    int signbit(float64_t) nogil

cdef inline float64_t calc_mean(int64_t minp, Py_ssize_t nobs,
                                Py_ssize_t neg_ct, float64_t sum_x) nogil:
    cdef:
        float64_t result

    if nobs >= minp and nobs > 0:
        result = sum_x / <float64_t>nobs
        if neg_ct == 0 and result < 0:
            # all positive
            result = 0
        elif neg_ct == nobs and result > 0:
            # all negative
            result = 0
        else:
            pass
    else:
        result = NaN
    return result


cdef inline void add_mean(float64_t val, Py_ssize_t *nobs, float64_t *sum_x,
                          Py_ssize_t *neg_ct, float64_t *compensation) nogil:
    """ add a value from the mean calc using Kahan summation """
    cdef:
        float64_t y, t

    # Not NaN
    if notnan(val):
        nobs[0] = nobs[0] + 1
        y = val - compensation[0]
        t = sum_x[0] + y
        compensation[0] = t - sum_x[0] - y
        sum_x[0] = t
        if signbit(val):
            neg_ct[0] = neg_ct[0] + 1


cdef inline void remove_mean(float64_t val, Py_ssize_t *nobs, float64_t *sum_x,
                             Py_ssize_t *neg_ct, float64_t *compensation) nogil:
    """ remove a value from the mean calc using Kahan summation """
    cdef:
        float64_t y, t

    if notnan(val):
        nobs[0] = nobs[0] - 1
        y = - val - compensation[0]
        t = sum_x[0] + y
        compensation[0] = t - sum_x[0] - y
        sum_x[0] = t
        if signbit(val):
            neg_ct[0] = neg_ct[0] - 1


def roll_mean(const float64_t[:] values, ndarray[int64_t] start,
              ndarray[int64_t] end, int64_t minp) -> np.ndarray:
    cdef:
        float64_t val, compensation_add = 0, compensation_remove = 0, sum_x = 0
        int64_t s, e
        Py_ssize_t nobs = 0, i, j, neg_ct = 0, N = len(values)
        ndarray[float64_t] output

    output = np.empty(N, dtype=np.float64)

    with nogil:

        for i in range(0, N):
            s = start[i]
            e = end[i]

            if i == 0:

                # setup
                for j in range(s, e):
                    val = values[j]
                    add_mean(val, &nobs, &sum_x, &neg_ct, &compensation_add)

            else:

                # calculate deletes
                for j in range(start[i - 1], s):
                    val = values[j]
                    remove_mean(val, &nobs, &sum_x, &neg_ct, &compensation_remove)

                # calculate adds
                for j in range(end[i - 1], e):
                    val = values[j]
                    add_mean(val, &nobs, &sum_x, &neg_ct, &compensation_add)

            output[i] = calc_mean(minp, nobs, neg_ct, sum_x)

    return output

And then the .py file would have:

import ctypes
from numba.extending import get_cython_function_address

addr = get_cython_function_address("aggregations", "roll_mean")
functype = ctypes.CFUNCTYPE(
    ctypes.c_float, # should be a float array
    ctypes.c_int, # should be an int array
    ctypes.c_int, # should be an int array
    ctypes.c_int
)
roll_mean = functype(addr)

@stuartarchibald

@mroeschke thanks for the example, I've spent a bit of time looking at this and have found the following...

I'm now pretty sure that functions need to be declared with api for them to appear in __pyx_capi__, e.g. cdef api ndarray roll_mean(...).

I am not convinced that it'll be possible to pass NumPy arrays to Cython functions from Numba, because Numba has an internal representation of a NumPy array that is not the Python object Cython seems to expect. It should be possible to pass the data from the NumPy array through to Cython, though.

I am also not convinced that it'll be possible to return a NumPy array from Cython, again because Numba's internal representation of a NumPy array is not a Python object. There's also reference counting involved, which may get tricky.

However, in the above, it might just be a question of working out how to "spell" in ctypes what it is that Cython wants/needs, which might be a question for the Cython devs. If you can create a ctypes representation of the types going into the Cython function, it might be possible to wire Numba's representation through.

How to proceed...

I'd probably start with a trivial cython function that does something like:

def foo(x, y): #x and y are arrays
  y[0] = x[0]

and try and work out how to make that into something via cython that produces a C-like signature, e.g.

void c_foo(double * x_data, size_t x_len, double * y_data, size_t y_len) {}

then from there work out how to call that with ctypes using NumPy arrays, and finally do that from Numba.

An example of the above might be:
aggregations.pyx:

ctypedef api double cy_float64_t
ctypedef api long long int cy_int64_t

cdef api void roll_mean(cy_float64_t * x_data, cy_int64_t x_len,
                        cy_float64_t * y_data, cy_int64_t y_len):
    with nogil:
        x_data[0] = y_data[0]

call_aggregations.py:

import ctypes
from numba.extending import get_cython_function_address
from numba import njit
import numpy as np

addr = get_cython_function_address("aggregations", "roll_mean")
functype = ctypes.CFUNCTYPE(
    ctypes.c_voidp,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_long,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_long,
)
roll_mean = functype(addr)

n = 5
x = np.zeros(n)
y = np.ones(n)

@njit
def sink(*args):
    return args[0]

@njit
def foo(x, y):
    roll_mean(x.ctypes, len(x), y.ctypes, len(y))
    sink(x, y)

print(x)
foo(x, y)
print(x)

which gives:

$ python call_aggregations.py 
[0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0.]
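
For completeness, the intermediate step above (exercising the binding from plain Python with ctypes, before involving Numba) might look like the following; it assumes the aggregations module above has been built, and uses ctypes.c_longlong for the long long arguments:

import ctypes
import numpy as np
from numba.extending import get_cython_function_address

addr = get_cython_function_address("aggregations", "roll_mean")
functype = ctypes.CFUNCTYPE(
    None,                                               # void return
    ctypes.POINTER(ctypes.c_double), ctypes.c_longlong,
    ctypes.POINTER(ctypes.c_double), ctypes.c_longlong,
)
c_roll_mean = functype(addr)

x = np.zeros(5)
y = np.ones(5)
# outside of Numba the raw data pointers have to be passed explicitly
c_roll_mean(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), len(x),
            y.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), len(y))
print(x)  # [1. 0. 0. 0. 0.]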

Another option here is to use https://numba.readthedocs.io/en/stable/user/withobjmode.html#numba.objmode, which reacquires the GIL and makes the call to whatever is in the block using the CPython APIs.
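
A minimal sketch of that objmode route, assuming the Python-level roll_mean is importable as a regular callable (the py_roll_mean stand-in below is just illustrative):

import numpy as np
from numba import njit, objmode

def py_roll_mean(values, start, end, minp):
    # stand-in for a Python-level (e.g. Cython def) rolling-mean function
    return np.cumsum(values)[:len(start)]

@njit
def jitted_caller(values, start, end, minp):
    # nopython work can happen before/after; the objmode block reacquires the
    # GIL and calls ordinary Python, so the type of anything leaving the block
    # has to be declared up front.
    with objmode(out="float64[:]"):
        out = py_roll_mean(values, start, end, minp)
    return out

values = np.arange(5.0)
start = np.array([0, 1, 2], dtype=np.int64)
end = np.array([2, 3, 4], dtype=np.int64)
print(jitted_caller(values, start, end, 1))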

@mroeschke

Awesome, thanks for all this investigation @stuartarchibald! I really appreciate all the detail and examples.

Just for some background: in pandas we have some Cython aggregations (mean, for example) that currently only apply to DataFrame.rolling.mean, and we are exploring whether Numba could share that aggregation across DataFrame.groupby.mean and DataFrame.mean.

While your demo shows this is possible, it appears we would have to somewhat refactor our current Cython aggregations for Numba. This is a possible path forward, but we are also exploring just rewriting our Cython aggregations in Numba as well.

@stuartarchibald

@mroeschke No problem.

I'll raise this at the weekly public Numba meeting to see if anyone else has any ideas for transporting data to cython.

The issue with Numba calling the Cython bindings is that they appear to Numba as opaque function calls, which limits the optimisation possible and also obviously requires a function call to be made. If you were to rewrite the aggregations as Numba jit-compiled functions, you'd likely get the benefit of them taking part in LLVM's optimisation routines, so inlining and IPO would potentially take place, leading to faster code.

I guess this comes down to trade-offs over effort required now, long-term maintenance, and performance.
