Refactored nanobind so that it works with Py_LIMITED_API
There is an ongoing effort to refactor CPython internals to improve its
performance. This has serious consequences for tools like nanobind,
which rely on various CPython implementation details that are now
subject to change.

This commit changes nanobind so that it can (optionally) work via the
Py_LIMITED_API, which means that it treats all CPython data structures
as fully opaque and only accesses them through an official API/ABI with
long-term stability.

This requires a change that is pending for inclusion into Python 3.11
(issue 93012).
wjakob committed May 26, 2022
1 parent 865a8cf commit 0b3548a
Showing 32 changed files with 1,753 additions and 1,021 deletions.
39 changes: 28 additions & 11 deletions README.md
@@ -142,16 +142,21 @@ long-standing performance issues in _pybind11_:
pointer chasing compared to _pybind11_). The per-instance overhead for
wrapping a C++ type into a Python object shrinks by 2.3x. (_pybind11_: 56
bytes, _nanobind_: 24 bytes.)

- C++ function binding information is now co-located with the Python function
object (less pointer chasing).

- C++ type binding information is now co-located with the Python type object
(less pointer chasing, fewer hashtable lookups).

- _nanobind_ internally replaces `std::unordered_map` with a more efficient
hash table ([tsl::robin_map](https://github.com/Tessil/robin-map), which is
included as a git submodule).

- function calls from/to Python are realized using [PEP 590 vector
calls](https://www.python.org/dev/peps/pep-0590), which gives a nice speed
boost. The main function dispatch loop no longer allocates heap memory.

- _pybind11_ was designed as a header-only library, which is generally a good
thing because it simplifies the compilation workflow. However, one major
downside of this is that a large amount of redundant code has to be compiled
@@ -160,15 +165,18 @@ long-standing performance issues in _pybind11_:
support library (`libnanobind`) and links it against the binding code to
avoid redundant compilation. When using the CMake `nanobind_add_module()`
function, this all happens transparently.

- `#include <pybind11/pybind11.h>` pulls in a large portion of the STL (about
2.1 MiB of headers with Clang and libc++). _nanobind_ minimizes STL usage to
avoid this problem. Type casters even for basic types like `std::string`
require an explicit opt-in by including an extra header file (e.g. `#include
<nanobind/stl/string.h>`).

- _pybind11_ is dependent on *link time optimization* (LTO) to produce
reasonably-sized bindings, which makes linking a build time bottleneck. With
_nanobind_'s split into a precompiled core library and minimal
metatemplating, LTO is no longer important.

- _nanobind_ maintains efficient internal data structures for lifetime
management (needed for `nb::keep_alive`, `nb::rv_policy::reference_internal`,
the `std::shared_ptr` interface, etc.). With these changes, it is no longer
@@ -180,6 +188,18 @@ long-standing performance issues in _pybind11_:
Besides performance improvements, _nanobind_ includes several quality-of-life
improvements for developers:

- _nanobind_ has [greatly
improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
support for exchanging CPU/GPU/TPU/.. tensor data structures with modern
array programming frameworks.

- _nanobind_ can target Python's [stable ABI
interface](https://docs.python.org/3/c-api/stable.html) starting with Python
3.12. This means that extension modules will eventually be compatible with
any future version of Python without having to compile separate binaries per
version. That vision is still far out, however: it will require Python 3.12+
to be widely deployed.

- When the Python interpreter shuts down, _nanobind_ reports instance, type,
and function leaks related to bindings, which is useful for tracking down
reference counting issues.
@@ -195,11 +215,6 @@ improvements for developers:
- _nanobind_ docstrings have improved out-of-the-box compatibility with tools
like [Sphinx](https://www.sphinx-doc.org/en/master/).

- _nanobind_ has [greatly
improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
support for exchanging tensor data structures with modern array programming
frameworks.

### Dependencies

_nanobind_ depends on recent versions of everything:
@@ -419,24 +434,26 @@ changes are detailed below.


- **Supplemental type data**: _nanobind_ can store supplemental data along
with registered types. This information is co-located with the Python type
object. An example use of this fairly advanced feature are libraries that
register large numbers of different types (e.g. flavors of tensors). A
single generically implemented function can then query this supplemental
information to handle each type slightly differently.
with registered types. This fairly advanced feature is useful, for example,
in libraries that register large numbers of different types (e.g. flavors of
tensors). A single generically implemented function can then query this
supplemental information to handle each type slightly differently.

```cpp
struct Supplement {
... // should be a POD (plain old data) type
};

// Register a new type Test, and reserve space for sizeof(Supplement)
nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>())
nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>(), nb::is_final());

/// Mutable reference to 'Supplement' portion in Python type object
Supplement &supplement = nb::type_supplement<Supplement>(cls);
```

The supplement is not propagated to subclasses created within Python.
Such types should therefore be created with `nb::is_final()`.

- **Low-level interface**: _nanobind_ exposes a low-level interface to
provide fine-grained control over the sequence of steps that instantiates a
Python object wrapping a C++ instance. Like the above point, this is useful
51 changes: 38 additions & 13 deletions cmake/nanobind-config.cmake
@@ -80,15 +80,15 @@ function (nanobuild_build_library TARGET_NAME TARGET_TYPE)
${NB_DIR}/include/nanobind/stl/vector.h
${NB_DIR}/include/nanobind/stl/list.h

${NB_DIR}/src/internals.h
${NB_DIR}/src/buffer.h
${NB_DIR}/src/internals.cpp
${NB_DIR}/src/common.cpp
${NB_DIR}/src/tensor.cpp
${NB_DIR}/src/nb_internals.h
${NB_DIR}/src/nb_internals.cpp
${NB_DIR}/src/nb_func.cpp
${NB_DIR}/src/nb_type.cpp
${NB_DIR}/src/nb_enum.cpp
${NB_DIR}/src/common.cpp
${NB_DIR}/src/error.cpp
${NB_DIR}/src/tensor.cpp
${NB_DIR}/src/trampoline.cpp
${NB_DIR}/src/implicit.cpp
)
@@ -161,8 +161,12 @@ function(nanobind_disable_stack_protector name)
endfunction()

function(nanobind_extension name)
set_target_properties(${name} PROPERTIES
PREFIX "" SUFFIX "${NB_SUFFIX}")
set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX "${NB_SUFFIX}")
endfunction()

function(nanobind_extension_abi3 name)
get_filename_component(ext "${NB_SUFFIX}" LAST_EXT)
set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX ".abi3${ext}")
endfunction()

function (nanobind_cpp17 name)
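
The suffix rewrite performed by `nanobind_extension_abi3()` above can be restated outside CMake. The following is only a sketch in C++ of the same string manipulation (keep the last extension of `NB_SUFFIX`, prepend `.abi3`); the function name `abi3_suffix` is hypothetical and not part of nanobind.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical C++ rendering of what nanobind_extension_abi3() computes in
// CMake: keep only the last extension of the interpreter-specific suffix
// (the LAST_EXT mode of get_filename_component) and prepend ".abi3".
std::string abi3_suffix(const std::string &nb_suffix) {
    std::size_t dot = nb_suffix.rfind('.');
    // No dot at all means there is no extension to preserve.
    std::string ext =
        (dot == std::string::npos) ? std::string() : nb_suffix.substr(dot);
    return ".abi3" + ext;
}
```

For the suffix quoted in `docs/cmake.md`, `abi3_suffix(".cpython-39-darwin.so")` yields `.abi3.so`, so the interpreter version tag is dropped while the platform's library extension is preserved.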
@@ -187,23 +191,44 @@ function (nanobind_headers name)
endfunction()

function(nanobind_add_module name)
cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")
cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;STABLE_ABI;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")

Python_add_library(${name} MODULE ${ARG_UNPARSED_ARGUMENTS})

nanobind_cpp17(${name})
nanobind_extension(${name})
nanobind_msvc(${name})
nanobind_headers(${name})

if (ARG_NB_STATIC)
nanobuild_build_library(nanobind-static STATIC)
target_link_libraries(${name} PRIVATE nanobind-static)
# Limited API interface only supported in Python >= 3.12
if ((Python_VERSION_MAJOR EQUAL 3) AND (Python_VERSION_MINOR LESS 12))
set(ARG_STABLE_ABI OFF)
endif()

if (ARG_STABLE_ABI)
if (ARG_NB_STATIC)
nanobuild_build_library(nanobind-static-abi3 STATIC)
set(libname nanobind-static-abi3)
else()
nanobuild_build_library(nanobind-abi3 SHARED)
set(libname nanobind-abi3)
endif()

target_compile_definitions(${libname} PUBLIC -DPy_LIMITED_API=0x030C0000)
nanobind_extension_abi3(${name})
else()
nanobuild_build_library(nanobind SHARED)
target_link_libraries(${name} PRIVATE nanobind)
if (ARG_NB_STATIC)
nanobuild_build_library(nanobind-static STATIC)
set(libname nanobind-static)
else()
nanobuild_build_library(nanobind SHARED)
set(libname nanobind)
endif()

nanobind_extension(${name})
endif()

target_link_libraries(${name} PRIVATE ${libname})

if (NOT ARG_PROTECT_STACK)
nanobind_disable_stack_protector(${name})
endif()
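
The constant `0x030C0000` passed to the compile definition above uses CPython's version-hex layout (major version in the top byte, minor version in the next), so it denotes Python 3.12 and matches the `Python_VERSION_MINOR LESS 12` guard. A minimal sketch of that decoding follows; the helper names are hypothetical and belong to neither nanobind nor CPython.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers that decode the version-hex layout used by
// PY_VERSION_HEX and Py_LIMITED_API:
// bits 31..24 = major, bits 23..16 = minor, bits 15..8 = micro.
constexpr unsigned hex_major(std::uint32_t v) { return (v >> 24) & 0xFFu; }
constexpr unsigned hex_minor(std::uint32_t v) { return (v >> 16) & 0xFFu; }
constexpr unsigned hex_micro(std::uint32_t v) { return (v >> 8) & 0xFFu; }

// Mirrors the CMake guard: STABLE_ABI is switched off only when the
// interpreter is a Python 3 release older than 3.12.
constexpr bool stable_abi_allowed(unsigned major, unsigned minor) {
    return !(major == 3 && minor < 12);
}

static_assert(hex_major(0x030C0000u) == 3, "major version is 3");
static_assert(hex_minor(0x030C0000u) == 12, "minor version is 0x0C == 12");
```

With this reading, a Python 3.11 interpreter silently falls back to the regular (non-stable-ABI) build even when `STABLE_ABI` is requested.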
7 changes: 7 additions & 0 deletions docs/cmake.md
@@ -59,6 +59,13 @@ it performs the following steps to produce efficient bindings.
- It appends the library suffix (e.g., `.cpython-39-darwin.so`) based on
information provided by CMake's `FindPython` module.

- When requested via the optional `STABLE_ABI` parameter, and when your
version of Python is sufficiently recent (3.12+), the implementation
will build a [stable ABI](https://docs.python.org/3/c-api/stable.html)
extension module with a different suffix (e.g., `.abi3.so`). This comes at a
performance cost since _nanobind_ can no longer access the internals of
various data structures directly.

- It statically or dynamically links against `libnanobind` depending on the
value of the `NB_SHARED` parameter of the CMake project. Note that
`NB_SHARED` is not an input of the `nanobind_add_module()` function. Rather,
8 changes: 4 additions & 4 deletions include/nanobind/nb_accessor.h
@@ -120,13 +120,13 @@ struct num_item_list {
using key_type = Py_ssize_t;

NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
*cache = PyList_GET_ITEM(obj, index);
*cache = NB_LIST_GET_ITEM(obj, index);
}

NB_INLINE static void set(PyObject *obj, Py_ssize_t index, PyObject *v) {
PyObject *old = PyList_GET_ITEM(obj, index);
PyObject *old = NB_LIST_GET_ITEM(obj, index);
Py_INCREF(v);
PyList_SET_ITEM(obj, index, v);
NB_LIST_SET_ITEM(obj, index, v);
Py_DECREF(old);
}
};
@@ -136,7 +136,7 @@ struct num_item_tuple {
using key_type = Py_ssize_t;

NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
*cache = PyTuple_GET_ITEM(obj, index);
*cache = NB_TUPLE_GET_ITEM(obj, index);
}

template <typename...Ts> static void set(Ts...) {
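
The hunks above swap CPython macros such as `PyList_GET_ITEM` for `NB_`-prefixed names, but this excerpt does not show how those names are defined. One plausible arrangement is a compile-time switch between a fast accessor that reads the object layout directly and an opaque function call. The sketch below illustrates that pattern with a toy container; every `Toy`/`TOY_` name is hypothetical, and nothing here is taken from nanobind's actual headers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an interpreter object; NOT a CPython type.
struct ToyList {
    std::vector<int> items;
};

// "Stable" accessor: an out-of-line style function that hides the layout and
// performs a bounds check, analogous in spirit to PyList_GetItem().
inline const int *toy_list_get_item(const ToyList &l, std::size_t i) {
    return i < l.items.size() ? &l.items[i] : nullptr;
}

// "Fast" accessor: reaches into the struct directly without checks,
// analogous in spirit to the PyList_GET_ITEM() macro, which is not part of
// the limited API.
#define TOY_LIST_GET_ITEM_FAST(l, i) (&(l).items[(i)])

// Call sites use a single name; a build flag picks the implementation.
#if defined(TOY_LIMITED_API)
#  define TOY_LIST_GET_ITEM(l, i) toy_list_get_item((l), (i))
#else
#  define TOY_LIST_GET_ITEM(l, i) TOY_LIST_GET_ITEM_FAST((l), (i))
#endif
```

The call sites stay identical in both configurations, which is what lets the diff above be a mechanical rename. The cost of the opaque path is the extra call and check on every access.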
10 changes: 6 additions & 4 deletions include/nanobind/nb_attr.h
@@ -51,14 +51,16 @@ struct is_method {};
struct is_implicit {};
struct is_operator {};
struct is_arithmetic {};
struct is_final { };
struct is_enum {
bool is_signed;
};

template <size_t /* Nurse */, size_t /* Patient */> struct keep_alive {};
template <typename T> struct supplement {};
struct type_callback {
type_callback(void (*value)(PyTypeObject *) noexcept) : value(value) {}
void (*value)(PyTypeObject *) noexcept;
type_callback(void (*value)(PyType_Slot **) noexcept) : value(value) {}
void (*value)(PyType_Slot **) noexcept;
};
struct raw_doc {
const char *value;
@@ -94,7 +96,7 @@ enum class func_flags : uint32_t {
is_implicit = (1 << 12),
/// Is this function an arithmetic operator?
is_operator = (1 << 13),
/// When the function is GCed, do we need to call func_data::free?
/// When the function is GCed, do we need to call func_data_prelim::free?
has_free = (1 << 14),
/// Should the func_new() call return a new reference?
return_ref = (1 << 15),
@@ -110,7 +112,7 @@ struct arg_data {
bool none;
};

template <size_t Size> struct func_data {
template <size_t Size> struct func_data_prelim {
// A small amount of space to capture data used by the function/closure
void *capture[3];
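
In this file, `type_callback` changes its argument from `PyTypeObject *` to `PyType_Slot **`. This is presumably because a type object is opaque under the limited API and has to be described by a slot table before it is created. A callback with this signature can append entries through the pointer-to-pointer and leave the cursor advanced. The sketch below shows that idiom with toy types; `ToySlot`, `TOY_SLOT_REPR`, `add_custom_slots`, and `build_slots` are all hypothetical names, not nanobind or CPython API.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for CPython's PyType_Slot { int slot; void *pfunc; }.
struct ToySlot {
    int slot;
    void *pfunc;
};

// Hypothetical slot ID and payload, for illustration only.
constexpr int TOY_SLOT_REPR = 1;
static int toy_payload = 0;

// A callback in the style of type_callback: it receives a pointer to the
// write cursor, appends one entry, and leaves the cursor advanced.
void add_custom_slots(ToySlot **cursor) noexcept {
    *(*cursor)++ = ToySlot{ TOY_SLOT_REPR, &toy_payload };
}

// Builds a zero-terminated slot table, giving the callback a chance to add
// entries before the sentinel is written. Returns the number of entries.
std::size_t build_slots(ToySlot *table, void (*cb)(ToySlot **) noexcept) {
    ToySlot *cursor = table;
    if (cb)
        cb(&cursor);
    *cursor = ToySlot{ 0, nullptr }; // sentinel terminating the table
    return (std::size_t) (cursor - table);
}
```

Because the callback only ever sees the cursor, it can add slots but cannot inspect or patch a finished type object, which fits an API where that object's layout is hidden.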

15 changes: 7 additions & 8 deletions include/nanobind/nb_call.h
@@ -31,15 +31,14 @@ template <typename T>
NB_INLINE void call_analyze(size_t &nargs, size_t &nkwargs, const T &value) {
using D = std::decay_t<T>;

if constexpr (std::is_same_v<D, arg_v>) {
if constexpr (std::is_same_v<D, arg_v>)
nkwargs++;
} else if constexpr (std::is_same_v<D, args_proxy>) {
else if constexpr (std::is_same_v<D, args_proxy>)
nargs += len(value);
} else if constexpr (std::is_same_v<D, kwargs_proxy>) {
else if constexpr (std::is_same_v<D, kwargs_proxy>)
nkwargs += len(value);
} else {
else
nargs += 1;
}

(void) nargs; (void) nkwargs; (void) value;
}
@@ -53,7 +52,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,

if constexpr (std::is_same_v<D, arg_v>) {
args[kwargs_offset + nkwargs] = value.value.release().ptr();
PyTuple_SET_ITEM(kwnames, nkwargs++,
NB_TUPLE_SET_ITEM(kwnames, nkwargs++,
PyUnicode_InternFromString(value.name));
} else if constexpr (std::is_same_v<D, args_proxy>) {
for (size_t i = 0, l = len(value); i < l; ++i)
@@ -65,7 +64,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
while (PyDict_Next(value.ptr(), &pos, &key, &entry)) {
Py_INCREF(key); Py_INCREF(entry);
args[kwargs_offset + nkwargs] = entry;
PyTuple_SET_ITEM(kwnames, nkwargs++, key);
NB_TUPLE_SET_ITEM(kwnames, nkwargs++, key);
}
} else {
args[nargs++] =
@@ -88,7 +87,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
args[0] = nullptr; \
args_p = args + 1; \
} \
nargs |= PY_VECTORCALL_ARGUMENTS_OFFSET; \
nargs |= NB_VECTORCALL_ARGUMENTS_OFFSET; \
return steal(obj_vectorcall(base, args_p, nargs, kwnames, method_call))

template <typename Derived>
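
`call_analyze()` above picks a counting rule per argument type with `if constexpr`, so only the matching branch is instantiated for a given `T`. The following standalone sketch reproduces that dispatch with toy argument categories; `Keyword` and `Unpacked` are hypothetical stand-ins for `arg_v` and `args_proxy`, and the code requires C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Toy argument categories; hypothetical stand-ins for arg_v / args_proxy.
struct Keyword { const char *name; int value; };
struct Unpacked { std::vector<int> values; };

// Counts how many positional and keyword slots a single argument needs.
// Discarded branches are never instantiated, so 'value.values' is only
// required to exist when T is Unpacked.
template <typename T>
void analyze(std::size_t &nargs, std::size_t &nkwargs, const T &value) {
    using D = std::decay_t<T>;

    if constexpr (std::is_same_v<D, Keyword>)
        nkwargs++;
    else if constexpr (std::is_same_v<D, Unpacked>)
        nargs += value.values.size();
    else
        nargs += 1;

    // Silence unused-parameter warnings in branches that ignore an argument.
    (void) nargs; (void) nkwargs; (void) value;
}
```

Running `analyze` over a plain value, a `Keyword`, and an `Unpacked` holding three elements gives four positional slots and one keyword slot, which is the information needed to size the argument array before filling it.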
2 changes: 1 addition & 1 deletion include/nanobind/nb_cast.h
@@ -318,7 +318,7 @@ tuple make_tuple(Args &&...args) {
size_t nargs = 0;
PyObject *o = result.ptr();

(PyTuple_SET_ITEM(o, nargs++,
(NB_TUPLE_SET_ITEM(o, nargs++,
detail::make_caster<Args>::from_cpp(
(detail::forward_t<Args>) args,
detail::infer_policy<Args>(policy), nullptr).ptr()),
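
`make_tuple()` fills the tuple with a fold expression over the comma operator, incrementing `nargs` once per pack element. The sketch below isolates that idiom using a `std::array` in place of a Python tuple; `make_array` is a hypothetical helper and the code requires C++17.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Stores each pack element at the next index. The comma operator evaluates
// its operands left to right, so element i of the pack lands in slot i.
template <typename... Args>
std::array<long, sizeof...(Args)> make_array(Args... args) {
    std::array<long, sizeof...(Args)> result{};
    std::size_t n = 0;

    // Unary fold over ',': one assignment per argument, in pack order.
    ((result[n++] = static_cast<long>(args)), ...);

    (void) n; // unused when the pack is empty
    return result;
}
```

The same shape works for any per-element "store at index" operation, which is why swapping `PyTuple_SET_ITEM` for `NB_TUPLE_SET_ITEM` in the hunk above needs no other change.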
28 changes: 20 additions & 8 deletions include/nanobind/nb_class.h
@@ -57,14 +57,19 @@ enum class type_flags : uint32_t {
is_arithmetic = (1 << 15),

/// This type provides a type callback
has_type_callback = (1 << 16)
has_type_callback = (1 << 16),

/// This type does not permit subclassing from Python
is_final = (1 << 17),

/// This type stores supplemental data (see nb::supplement)
has_supplement = (1 << 18)
};

struct type_data {
uint32_t size : 24;
uint32_t size;
uint32_t align : 8;
uint32_t flags : 20;
uint32_t supplement : 12;
uint32_t flags : 24;
const char *name;
const char *doc;
PyObject *scope;
@@ -77,10 +82,11 @@ struct type_data {
void (*move)(void *, void *) noexcept;
const std::type_info **implicit;
bool (**implicit_py)(PyTypeObject *, PyObject *, cleanup_list *) noexcept;
void (*type_callback)(PyTypeObject *) noexcept;
void (*type_callback)(PyType_Slot **) noexcept;
void *supplement;
};

static_assert(sizeof(type_data) == 8 + sizeof(void *) * 13);
static_assert(sizeof(type_data) == 8 + sizeof(void *) * 14);

NB_INLINE void type_extra_apply(type_data &t, const handle &h) {
t.flags |= (uint32_t) type_flags::has_base_py;
@@ -104,14 +110,20 @@ NB_INLINE void type_extra_apply(type_data &t, is_enum e) {
t.flags |= (uint32_t) type_flags::is_unsigned_enum;
}

NB_INLINE void type_extra_apply(type_data &t, is_final) {
t.flags |= (uint32_t) type_flags::is_final;
}

NB_INLINE void type_extra_apply(type_data &t, is_arithmetic) {
t.flags |= (uint32_t) type_flags::is_arithmetic;
}

template <typename T>
NB_INLINE void type_extra_apply(type_data &t, supplement<T>) {
static_assert(sizeof(T) <= 0xFF, "Supplement is too big!");
t.supplement += sizeof(T);
static_assert(std::is_trivially_default_constructible_v<T>,
"The supplement type must be a POD (plain old data) type");
t.flags |= (uint32_t) type_flags::has_supplement;
t.supplement = (void *) malloc(sizeof(T));
}

template <typename... Args> struct init {
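
The `type_data` changes above widen `flags` to 24 bits, drop the 12-bit `supplement` counter in favor of a pointer, and keep the leading fields at 8 bytes, which is what the updated `static_assert` checks. The sketch below models only that header and the flag updates done by `type_extra_apply()`. All `toy_` names are hypothetical, and the size check assumes the bit-field packing used by common compilers (bit-field layout is implementation-defined).

```cpp
#include <cassert>
#include <cstdint>

// Two of the flag bits introduced in this commit, reproduced for the sketch.
enum class toy_flags : std::uint32_t {
    is_final       = (1u << 17),
    has_supplement = (1u << 18)
};

// Models the leading fields of type_data after the change.
struct toy_header {
    std::uint32_t size;       // full 32 bits for the instance size
    std::uint32_t align : 8;  // alignment fits in one byte
    std::uint32_t flags : 24; // room for flag bits 0..23
};

// On common ABIs the two bit-fields share one 32-bit unit: 4 + 4 bytes.
static_assert(sizeof(toy_header) == 8, "header expected to occupy 8 bytes");

// Sets a flag the same way type_extra_apply() does: OR in the enum value.
inline void set_flag(toy_header &h, toy_flags f) {
    h.flags |= (std::uint32_t) f;
}

inline bool has_flag(const toy_header &h, toy_flags f) {
    return (h.flags & (std::uint32_t) f) != 0;
}
```

Bits 17 and 18 both fit inside the 24-bit field, so `is_final` and `has_supplement` can be set independently without touching `size` or `align`.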