Bind to cython from Numba #43
Comments
Ah thanks for the insight @stuartarchibald! Just for context, essentially pandas currently has a:
(They are together in one function today but planned to be split out) So essentially 2) will be replaced with a Numba function and 1) is referenced in the issue that we were thinking needed to be duplicated. Eventually we want 1) to be in Numba as well but during the migration period we might be able to do what you suggested! |
Hi @stuartarchibald, coming back to your original suggestion here (almost 2 years later, wow!) to call Cython from Numba, I had a few clarification questions, as I am not too familiar with accessing C functions
|
@stuartarchibald whenever you have a moment I would greatly appreciate the feedback. Thanks! |
Hi @mroeschke
If you have a potentially runnable example of what you want to do, I can take a look. |
Thanks for the reply @stuartarchibald and sorry for the delay on my end: As an example the
And then the
|
@mroeschke thanks for the example. I've spent a bit of time looking at this and have found the following...

I'm now pretty sure that functions need to be declared with

I am not convinced that it'll be possible to pass NumPy arrays to cython functions from Numba. This is because Numba has an internal representation of a NumPy array that is not the Python object that cython seems to expect. It should be possible to pass the data from the NumPy array through to cython though.

I am also not convinced that it'll be possible to return a NumPy array from cython, again because Numba's internal representation of a NumPy array is not a Python object. There's also refcounting involved, which may get tricky.

However, in the above, it might just be a question of working out how to "spell" in ctypes what it is that cython wants/needs, which might be a question for the cython devs. If you can create a ctypes representation of the types to go into the cython function, it might be possible to wire Numba's representation through.

How to proceed... I'd probably start with a trivial cython function that does something like:
and try and work out how to make that into something via cython that produces a C-like signature, e.g.
then from there work out how to call that with ctypes using NumPy arrays, and finally do that from Numba. An example of the above might be:

    # Cython side, compiled as the `aggregations` module
    ctypedef api double cy_float64_t
    ctypedef api long long int cy_int64_t

    cdef api void roll_mean(cy_float64_t * x_data, cy_int64_t x_len,
                            cy_float64_t * y_data, cy_int64_t y_len):
        with nogil:
            x_data[0] = y_data[0]

    # Python/Numba side
    import ctypes
    from numba.extending import get_cython_function_address
    from numba import njit
    import numpy as np

    addr = get_cython_function_address("aggregations", "roll_mean")
    functype = ctypes.CFUNCTYPE(
        None,  # roll_mean returns void
        ctypes.POINTER(ctypes.c_double),
        ctypes.c_longlong,  # matches cy_int64_t (long long int)
        ctypes.POINTER(ctypes.c_double),
        ctypes.c_longlong,
    )
    roll_mean = functype(addr)

    n = 5
    x = np.zeros(n)
    y = np.ones(n)

    @njit
    def sink(*args):
        # Keeps the arrays referenced after the foreign call
        return args[0]

    @njit
    def foo(x, y):
        # Pass the raw data pointers and lengths, not the array objects
        roll_mean(x.ctypes, len(x), y.ctypes, len(y))
        sink(x, y)

    print(x)
    foo(x, y)
    print(x)

which gives:
Another option here is to use https://numba.readthedocs.io/en/stable/user/withobjmode.html#numba.objmode which reacquires the GIL and makes the call to whatever is in the block using the CPython APIs. |
Awesome, thanks for all this investigation @stuartarchibald! I really appreciate all this detail with examples.

Just for some background, in pandas we have some cython aggregations (mean for example) that currently only apply to

While it appears possible to do this from your demo, we would have to somewhat refactor our current cython aggregations for numba. This is a possible path forward, but we are also exploring just rewriting our cython aggregations in numba. |
@mroeschke No problem. I'll raise this at the weekly public Numba meeting to see if anyone else has any ideas for transporting data to cython. The issue with Numba calling the cython bindings is that they appear as opaque function calls to Numba, which limits the optimisation possible and also obviously requires a function call to be made. If you were to rewrite the aggregations in Numba, I guess this comes down to trade-offs between the effort required now, long-term maintenance and performance. |
@mroeschke @jreback I'm not really sure where to put this, so it's going here. Feel free to close instantly/move the issue elsewhere!
RE this comment pandas-dev#28987 (comment). I'm not entirely sure of the context as the pandas code base is not familiar, however I thought I'd mention that Numba can actually call Cython exported functions (docs). Numba does this a lot already to get the BLAS bindings from SciPy, and https://github.com/numba/numba-scipy also does this to bind to the scipy.special functions. I'm wondering if this might help with the code duplication hinted at in the above?

One thing to note though would be that Numba can call the cython functions, but the contents of the functions cannot take part in the optimisation passes (like inlining) that are performed by LLVM. So, depending on how hot the functions are, maintaining two implementations may yield better performance.