Bind to cython from Numba #43

Open

stuartarchibald opened this issue Oct 21, 2019 · 8 comments

@stuartarchibald

@mroeschke @jreback I'm not really sure where to put this, so it's going here. Feel free to close instantly/move the issue elsewhere!

RE this comment pandas-dev#28987 (comment): I'm not entirely sure of the context as I'm not familiar with the pandas code base, however I thought I'd mention that Numba can actually call Cython exported functions (docs). Numba does this a lot already to get the BLAS bindings from SciPy, and https://github.com/numba/numba-scipy also does this to bind to the scipy.special functions. I'm wondering if this might help with the code duplication hinted at in the above?
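
For reference, the BLAS route looks roughly like this (a minimal sketch, assuming SciPy is installed and that ddot keeps its usual Fortran-style pointer signature in scipy.linalg.cython_blas):

import ctypes
import numpy as np
from numba import njit
from numba.extending import get_cython_function_address

# scipy.linalg.cython_blas publishes its functions via __pyx_capi__; ddot has
# the Fortran-style C signature:
#   double ddot(int *n, double *dx, int *incx, double *dy, int *incy)
addr = get_cython_function_address("scipy.linalg.cython_blas", "ddot")
functype = ctypes.CFUNCTYPE(
    ctypes.c_double,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_double),
    ctypes.POINTER(ctypes.c_int),
)
ddot = functype(addr)

@njit
def dot(a, b):
    # BLAS wants the scalars by pointer, so stage them in one-element arrays
    n = np.empty(1, dtype=np.int32)
    inc = np.empty(1, dtype=np.int32)
    n[0] = a.size
    inc[0] = 1
    return ddot(n.ctypes, a.ctypes, inc.ctypes, b.ctypes, inc.ctypes)

a = np.arange(3.0)
print(dot(a, a))  # 5.0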

One thing to note, though, is that Numba can call the Cython functions, but the contents of those functions cannot take part in the optimisation passes (like inlining) that are performed by LLVM. So, depending on how hot the functions are, maintaining two implementations may yield better performance.

@mroeschke

Ah thanks for the insight @stuartarchibald!

Just for context, essentially pandas currently has:

  1. A Cython class to calculate window boundaries.
  2. A Cython function to perform window aggregations.

(They are together in one function today but are planned to be split out.)

So essentially, 2) will be replaced with a Numba function, and 1) is what's referenced in the issue that we were thinking would need to be duplicated. Eventually we want 1) to be in Numba as well, but during the migration period we might be able to do what you suggested!

@mroeschke

Hi @stuartarchibald,

Coming back to your original suggestion here (almost 2 years later, wow!) to call Cython from Numba, I had a few clarification questions, as I am not too familiar with accessing C functions:

  1. Does this approach work if the Cython functions are not cdef or cpdef?
  2. How would defining ctypes.CFUNCTYPE work if the Cython function has memoryview or ndarray arguments/return types? Typically the Cython functions have signatures like:
def roll_something(
    const float64_t[:] values,
    ndarray[int64_t] start, 
    ndarray[int64_t] end, 
    int64_t minp
) -> ndarray[float64_t]:

@mroeschke

@stuartarchibald whenever you have a moment I would greatly appreciate the feedback. Thanks!

@stuartarchibald

Hi @mroeschke

  1. Not sure. The key thing is that the __pyx_capi__ attribute on the module exists and the functions are described in there, as this is what Numba "looks" at. I also had some questions about this: Questions about __pyx_capi__ use/stability, cython/cython#3077.
  2. I think you can probably "lie" about the array types, either with ctypes.c_void_p or by saying they are ctypes.c_intp and passing the arrays as ndarray.ctypes.data. These are just guesses based on things that have been known to work previously. There's also the question of what to do with the returned array, as it'll need converting into a Numba datamodel representation and will likely need reference counting. I suspect that @intrinsic will be needed to help describe types and do conversions etc.

If you have a potentially runnable example of what you want to do, I can take a look.
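
On point 1, one quick check is to look at the module's __pyx_capi__ dict directly, which is essentially what get_cython_function_address reads; the capsule names encode the exported C signatures. A minimal sketch, using scipy.linalg.cython_blas only because it is known to publish one:

# Peek at what a compiled Cython module exports; the entries are PyCapsules
# whose repr includes the C signature that ctypes/Numba will need to match.
import scipy.linalg.cython_blas as cython_blas

for name, capsule in list(cython_blas.__pyx_capi__.items())[:3]:
    print(name, capsule)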

@mroeschke

Thanks for the reply @stuartarchibald and sorry for the delay on my end:

As an example, the aggregations.pyx file would contain:

import numpy as np

cimport numpy as cnp
from numpy cimport float64_t, int64_t, ndarray

cnp.import_array()

cdef float64_t NaN = <float64_t>np.nan

cdef extern from "src/headers/cmath" namespace "std":
    bint notnan(float64_t) nogil
    int signbit(float64_t) nogil

cdef inline float64_t calc_mean(int64_t minp, Py_ssize_t nobs,
                                Py_ssize_t neg_ct, float64_t sum_x) nogil:
    cdef:
        float64_t result

    if nobs >= minp and nobs > 0:
        result = sum_x / <float64_t>nobs
        if neg_ct == 0 and result < 0:
            # all positive
            result = 0
        elif neg_ct == nobs and result > 0:
            # all negative
            result = 0
        else:
            pass
    else:
        result = NaN
    return result


cdef inline void add_mean(float64_t val, Py_ssize_t *nobs, float64_t *sum_x,
                          Py_ssize_t *neg_ct, float64_t *compensation) nogil:
    """ add a value from the mean calc using Kahan summation """
    cdef:
        float64_t y, t

    # Not NaN
    if notnan(val):
        nobs[0] = nobs[0] + 1
        y = val - compensation[0]
        t = sum_x[0] + y
        compensation[0] = t - sum_x[0] - y
        sum_x[0] = t
        if signbit(val):
            neg_ct[0] = neg_ct[0] + 1


cdef inline void remove_mean(float64_t val, Py_ssize_t *nobs, float64_t *sum_x,
                             Py_ssize_t *neg_ct, float64_t *compensation) nogil:
    """ remove a value from the mean calc using Kahan summation """
    cdef:
        float64_t y, t

    if notnan(val):
        nobs[0] = nobs[0] - 1
        y = - val - compensation[0]
        t = sum_x[0] + y
        compensation[0] = t - sum_x[0] - y
        sum_x[0] = t
        if signbit(val):
            neg_ct[0] = neg_ct[0] - 1


def roll_mean(const float64_t[:] values, ndarray[int64_t] start,
              ndarray[int64_t] end, int64_t minp) -> np.ndarray:
    cdef:
        float64_t val, compensation_add = 0, compensation_remove = 0, sum_x = 0
        int64_t s, e
        Py_ssize_t nobs = 0, i, j, neg_ct = 0, N = len(values)
        ndarray[float64_t] output

    output = np.empty(N, dtype=np.float64)

    with nogil:

        for i in range(0, N):
            s = start[i]
            e = end[i]

            if i == 0:

                # setup
                for j in range(s, e):
                    val = values[j]
                    add_mean(val, &nobs, &sum_x, &neg_ct, &compensation_add)

            else:

                # calculate deletes
                for j in range(start[i - 1], s):
                    val = values[j]
                    remove_mean(val, &nobs, &sum_x, &neg_ct, &compensation_remove)

                # calculate adds
                for j in range(end[i - 1], e):
                    val = values[j]
                    add_mean(val, &nobs, &sum_x, &neg_ct, &compensation_add)

            output[i] = calc_mean(minp, nobs, neg_ct, sum_x)

    return output

And then the .py file would have:

import ctypes
from numba.extending import get_cython_function_address

addr = get_cython_function_address("aggregations", "roll_mean")
functype = ctypes.CFUNCTYPE(
    ctypes.c_float, # should be a float array
    ctypes.c_int, # should be an int array
    ctypes.c_int, # should be an int array
    ctypes.c_int
)
roll_mean = functype(addr)

@stuartarchibald

@mroeschke thanks for the example, I've spent a bit of time looking at this and have found the following...

I'm now pretty sure that functions need to be declared with api for them to appear in __pyx_capi__, e.g. cdef api ndarray roll_mean(...).

I am not convinced that it'll be possible to pass NumPy arrays to Cython functions from Numba, because Numba has an internal representation of a NumPy array that is not the Python object Cython seems to expect. It should be possible to pass the data from the NumPy array through to Cython, though.

I am also not convinced that it'll be possible to return a NumPy array from Cython, again because Numba's internal representation of a NumPy array is not a Python object. There's also reference counting involved, which may get tricky.

However, in the above, it might just be a question of working out how to "spell" in ctypes what it is that Cython wants/needs, which might be a question for the Cython devs. If you can create a ctypes representation of the types going into the Cython function, it might be possible to wire Numba's representation through.

How to proceed...

I'd probably start with a trivial cython function that does something like:

def foo(x, y): #x and y are arrays
  y[0] = x[0]

and try and work out how to make that into something via cython that produces a C-like signature, e.g.

void c_foo(double * x_data, size_t x_len, double * y_data, size_t y_len) {}

then from there work out how to call that with ctypes using NumPy arrays, and finally do that from Numba.

An example of the above might be:
aggregations.pyx:

ctypedef api double cy_float64_t
ctypedef api long long int cy_int64_t

cdef api void roll_mean(cy_float64_t * x_data, cy_int64_t x_len,
                        cy_float64_t * y_data, cy_int64_t y_len):
    with nogil:
        x_data[0] = y_data[0]

call_aggregations.py:

import ctypes
from numba.extending import get_cython_function_address
from numba import njit
import numpy as np

addr = get_cython_function_address("aggregations", "roll_mean")
functype = ctypes.CFUNCTYPE(
    ctypes.c_voidp,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_long,
    ctypes.POINTER(ctypes.c_double),
    ctypes.c_long,
)
roll_mean = functype(addr)

n = 5
x = np.zeros(n)
y = np.ones(n)

@njit
def sink(*args):
    return args[0]

@njit
def foo(x, y):
    roll_mean(x.ctypes, len(x), y.ctypes, len(y))
    sink(x, y)

print(x)
foo(x, y)
print(x)

which gives:

$ python call_aggregations.py 
[0. 0. 0. 0. 0.]
[1. 0. 0. 0. 0.]
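
For completeness, the intermediate step above (exercising the binding from plain Python with ctypes, before involving Numba) might look like the following; it assumes the aggregations module above has been built, and uses ctypes.c_longlong for the long long arguments:

import ctypes
import numpy as np
from numba.extending import get_cython_function_address

addr = get_cython_function_address("aggregations", "roll_mean")
functype = ctypes.CFUNCTYPE(
    None,                                               # void return
    ctypes.POINTER(ctypes.c_double), ctypes.c_longlong,
    ctypes.POINTER(ctypes.c_double), ctypes.c_longlong,
)
c_roll_mean = functype(addr)

x = np.zeros(5)
y = np.ones(5)
# outside of Numba the raw data pointers have to be passed explicitly
c_roll_mean(x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), len(x),
            y.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), len(y))
print(x)  # [1. 0. 0. 0. 0.]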

Another option here is to use https://numba.readthedocs.io/en/stable/user/withobjmode.html#numba.objmode, which reacquires the GIL and makes the call to whatever is in the block using the CPython APIs.
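
A minimal sketch of that objmode route, assuming the Python-level roll_mean is importable as a regular callable (the py_roll_mean stand-in below is just illustrative):

import numpy as np
from numba import njit, objmode

def py_roll_mean(values, start, end, minp):
    # stand-in for a Python-level (e.g. Cython def) rolling-mean function
    return np.cumsum(values)[:len(start)]

@njit
def jitted_caller(values, start, end, minp):
    # nopython work can happen before/after; the objmode block reacquires the
    # GIL and calls ordinary Python, so the type of anything leaving the block
    # has to be declared up front.
    with objmode(out="float64[:]"):
        out = py_roll_mean(values, start, end, minp)
    return out

values = np.arange(5.0)
start = np.array([0, 1, 2], dtype=np.int64)
end = np.array([2, 3, 4], dtype=np.int64)
print(jitted_caller(values, start, end, 1))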

@mroeschke

Awesome, thanks for all this investigation @stuartarchibald! I really appreciate all the detail and examples.

Just for some background: in pandas we have some Cython aggregations (mean, for example) that currently only apply to DataFrame.rolling.mean, and we are exploring whether Numba could share that aggregation across DataFrame.groupby.mean and DataFrame.mean.

While your demo shows this is possible, it appears we would have to somewhat refactor our current Cython aggregations for Numba. This is a possible path forward, but we are also exploring just rewriting our Cython aggregations in Numba as well.

@stuartarchibald

@mroeschke No problem.

I'll raise this at the weekly public Numba meeting to see if anyone else has any ideas for transporting data to cython.

The issue with Numba calling the Cython bindings is that they appear to Numba as opaque function calls, which limits the optimisation possible and also obviously requires a function call to be made. If you were to rewrite the aggregations as Numba jit-compiled functions, you'd likely get the benefit of them taking part in LLVM's optimisation routines, so inlining and IPO would potentially take place, leading to faster code.

I guess this comes down to trade-offs over effort required now, long-term maintenance, and performance.
