Refactored nanobind so that it works with Py_LIMITED_API

There is an ongoing effort to refactor CPython internals to improve its performance. This has serious consequences for tools like nanobind, which rely on various CPython implementation details that are now subject to change. This commit changes nanobind so that it can (optionally) work via the Py_LIMITED_API, which means that it treats all CPython data structures as fully opaque and only accesses them through an official API/ABI with long-term stability. This requires a change that is pending for inclusion into Python 3.11 (issue 93012).
wjakob · May 26, 2022 · 0b3548a
1 parent 865a8cf
Showing 32 changed files with 1,753 additions and 1,021 deletions.
diff --git a/README.md b/README.md
@@ -142,16 +142,21 @@ long-standing performance issues in _pybind11_:
   pointer chasing compared to _pybind11_). The per-instance overhead for
   wrapping a C++ type into a Python object shrinks by 2.3x. (_pybind11_: 56
   bytes, _nanobind_: 24 bytes.)
+
 - C++ function binding information is now co-located with the Python function
   object (less pointer chasing).
+
 - C++ type binding information is now co-located with the Python type object
   (less pointer chasing, fewer hashtable lookups).
+
 - _nanobind_ internally replaces `std::unordered_map` with a more efficient
   hash table ([tsl::robin_map](https://github.com/Tessil/robin-map), which is
   included as a git submodule).
+
 - function calls from/to Python are realized using [PEP 590 vector
   calls](https://www.python.org/dev/peps/pep-0590), which gives a nice speed
   boost. The main function dispatch loop no longer allocates heap memory.
+
 - _pybind11_ was designed as a header-only library, which is generally a good
   thing because it simplifies the compilation workflow. However, one major
   downside of this is that a large amount of redundant code has to be compiled
@@ -160,15 +165,18 @@ long-standing performance issues in _pybind11_:
   support library (`libnanobind`) and links it against the binding code to
   avoid redundant compilation. When using the CMake `nanobind_add_module()`
   function, this all happens transparently.
+
 - `#include <pybind11/pybind11.h>` pulls in a large portion of the STL (about
   2.1 MiB of headers with Clang and libc++). _nanobind_ minimizes STL usage to
   avoid this problem. Type casters even for basic types like `std::string`
   require an explicit opt-in by including an extra header file (e.g. `#include
   <nanobind/stl/string.h>`).
+
 - _pybind11_ is dependent on *link time optimization* (LTO) to produce
   reasonably-sized bindings, which makes linking a build time bottleneck. With
   _nanobind_'s split into a precompiled core library and minimal
   metatemplating, LTO is no longer important.
+
 - _nanobind_ maintains efficient internal data structures for lifetime
   management (needed for `nb::keep_alive`, `nb::rv_policy::reference_internal`,
   the `std::shared_ptr` interface, etc.). With these changes, it is no longer
@@ -180,6 +188,18 @@ long-standing performance issues in _pybind11_:
 Besides performance improvements, _nanobind_ includes several quality-of-life
 improvements for developers:
 
+- _nanobind_ has [greatly
+  improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
+  support for exchanging CPU/GPU/TPU/.. tensor data structures with modern
+  array programming frameworks.
+
+- _nanobind_ can target Python's [stable ABI
+  interface](https://docs.python.org/3/c-api/stable.html) starting with Python
+  3.12. This means that extension modules will eventually be compatible with
+  any future version of Python without having to compile separate binaries per
+  version. That vision is still far out, however: it will require Python 3.12+
+  to be widely deployed.
+
 - When the python interpreter shuts down, _nanobind_ reports instance, type,
   and function leaks related to bindings, which is useful for tracking down
   reference counting issues.
@@ -195,11 +215,6 @@ improvements for developers:
 - _nanobind_ docstrings have improved out-of-the-box compatibility with tools
   like [Sphinx](https://www.sphinx-doc.org/en/master/).
 
-- _nanobind_ has [greatly
-  improved](https://github.com/wjakob/nanobind/blob/master/docs/tensor.md)
-  support for exchanging tensor data structures with modern array programming
-  frameworks.
-
 ### Dependencies
 
 _nanobind_ depends on recent versions of everything:
@@ -419,24 +434,26 @@ changes are detailed below.
 
 
   - **Supplemental type data**: _nanobind_ can store supplemental data along
-    with registered types. This information is co-located with the Python type
-    object. An example use of this fairly advanced feature are libraries that
-    register large numbers of different types (e.g. flavors of tensors). A
-    single generically implemented function can then query this supplemental
-    information to handle each type slightly differently.
+    with registered types. An example use of this fairly advanced feature are
+    libraries that register large numbers of different types (e.g. flavors of
+    tensors). A single generically implemented function can then query this
+    supplemental information to handle each type slightly differently.
 
     ```cpp
     struct Supplement {
         ... // should be a POD (plain old data) type
     };
 
     // Register a new type Test, and reserve space for sizeof(Supplement)
-    nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>())
+    nb::class_<Test> cls(m, "Test", nb::supplement<Supplement>(), nb::is_final());
 
     /// Mutable reference to 'Supplement' portion in Python type object
     Supplement &supplement = nb::type_supplement<Supplement>(cls);
     ```
 
+    The supplement is not propagated to subclasses created within Python.
+    Such types should therefore be created with `nb::is_final()`.
+
   - **Low-level interface**: _nanobind_ exposes a low-level interface to
     provide fine-grained control over the sequence of steps that instantiates a
     Python object wrapping a C++ instance. Like the above point, this is useful
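
The README change above pairs `nb::supplement<T>()` with `nb::is_final()`. The following is a minimal sketch, not nanobind's actual implementation, of how such annotation tags can be folded into a flags word by overload resolution. `type_data_sketch`, `apply`, `make_type_data`, `has_flag`, and `ExampleSupplement` are hypothetical names used for illustration only; the flag bit positions are copied from the `nb_class.h` hunk in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <type_traits>

// Hypothetical, simplified stand-ins for the annotation tags in nb_attr.h
struct is_final {};
template <typename T> struct supplement {};

// Flag bits as they appear in this commit's nb_class.h hunk
enum class type_flags : uint32_t {
    is_final       = (1u << 17),
    has_supplement = (1u << 18)
};

// Hypothetical reduced version of type_data: only the fields used here
struct type_data_sketch {
    uint32_t flags = 0;
    void *supplement = nullptr;
};

// Example supplement payload (assumed to be trivially constructible)
struct ExampleSupplement { int tag; };

// One overload per annotation; the tag type selects the overload
inline void apply(type_data_sketch &t, is_final) {
    t.flags |= (uint32_t) type_flags::is_final;
}

template <typename T>
inline void apply(type_data_sketch &t, supplement<T>) {
    static_assert(std::is_trivially_default_constructible_v<T>,
                  "The supplement type must be trivially constructible");
    assert(t.supplement == nullptr); // at most one supplement per type
    t.flags |= (uint32_t) type_flags::has_supplement;
    t.supplement = std::malloc(sizeof(T)); // caller releases with std::free
}

// Fold every annotation into the record (C++17 fold expression)
template <typename... Extra>
type_data_sketch make_type_data(Extra... extra) {
    type_data_sketch t;
    (apply(t, extra), ...);
    return t;
}

inline bool has_flag(const type_data_sketch &t, type_flags f) {
    return (t.flags & (uint32_t) f) != 0;
}
```

Calling `make_type_data(supplement<ExampleSupplement>(), is_final())` sets both bits and allocates storage for the supplement; calling it with no arguments leaves `flags` at zero.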

diff --git a/cmake/nanobind-config.cmake b/cmake/nanobind-config.cmake
@@ -80,15 +80,15 @@ function (nanobuild_build_library TARGET_NAME TARGET_TYPE)
     ${NB_DIR}/include/nanobind/stl/vector.h
     ${NB_DIR}/include/nanobind/stl/list.h
 
-    ${NB_DIR}/src/internals.h
     ${NB_DIR}/src/buffer.h
-    ${NB_DIR}/src/internals.cpp
-    ${NB_DIR}/src/common.cpp
-    ${NB_DIR}/src/tensor.cpp
+    ${NB_DIR}/src/nb_internals.h
+    ${NB_DIR}/src/nb_internals.cpp
     ${NB_DIR}/src/nb_func.cpp
     ${NB_DIR}/src/nb_type.cpp
     ${NB_DIR}/src/nb_enum.cpp
+    ${NB_DIR}/src/common.cpp
     ${NB_DIR}/src/error.cpp
+    ${NB_DIR}/src/tensor.cpp
     ${NB_DIR}/src/trampoline.cpp
     ${NB_DIR}/src/implicit.cpp
   )
@@ -161,8 +161,12 @@ function(nanobind_disable_stack_protector name)
 endfunction()
 
 function(nanobind_extension name)
-  set_target_properties(${name} PROPERTIES
-    PREFIX "" SUFFIX "${NB_SUFFIX}")
+  set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX "${NB_SUFFIX}")
+endfunction()
+
+function(nanobind_extension_abi3 name)
+  get_filename_component(ext "${NB_SUFFIX}" LAST_EXT)
+  set_target_properties(${name} PROPERTIES PREFIX "" SUFFIX ".abi3${ext}")
 endfunction()
 
 function (nanobind_cpp17 name)
@@ -187,23 +191,44 @@ function (nanobind_headers name)
 endfunction()
 
 function(nanobind_add_module name)
-  cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")
+  cmake_parse_arguments(PARSE_ARGV 1 ARG "NOMINSIZE;STABLE_ABI;NOSTRIP;NB_STATIC;NB_SHARED;PROTECT_STACK;LTO" "" "")
 
   Python_add_library(${name} MODULE ${ARG_UNPARSED_ARGUMENTS})
 
   nanobind_cpp17(${name})
-  nanobind_extension(${name})
   nanobind_msvc(${name})
   nanobind_headers(${name})
 
-  if (ARG_NB_STATIC)
-    nanobuild_build_library(nanobind-static STATIC)
-    target_link_libraries(${name} PRIVATE nanobind-static)
+  # Limited API interface only supported in Python >= 3.12
+  if ((Python_VERSION_MAJOR EQUAL 3) AND (Python_VERSION_MINOR LESS 12))
+    set(ARG_STABLE_ABI OFF)
+  endif()
+
+  if (ARG_STABLE_ABI)
+    if (ARG_NB_STATIC)
+      nanobuild_build_library(nanobind-static-abi3 STATIC)
+      set(libname nanobind-static-abi3)
+    else()
+      nanobuild_build_library(nanobind-abi3 SHARED)
+      set(libname nanobind-abi3)
+    endif()
+
+    target_compile_definitions(${libname} PUBLIC -DPy_LIMITED_API=0x030C0000)
+    nanobind_extension_abi3(${name})
   else()
-    nanobuild_build_library(nanobind SHARED)
-    target_link_libraries(${name} PRIVATE nanobind)
+    if (ARG_NB_STATIC)
+      nanobuild_build_library(nanobind-static STATIC)
+      set(libname nanobind-static)
+    else()
+      nanobuild_build_library(nanobind SHARED)
+      set(libname nanobind)
+    endif()
+
+    nanobind_extension(${name})
   endif()
 
+  target_link_libraries(${name} PRIVATE ${libname})
+
   if (NOT ARG_PROTECT_STACK)
     nanobind_disable_stack_protector(${name})
   endif()
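
The CMake logic above makes three small decisions: whether the stable ABI may be used at all, which version constant to define, and which file suffix to emit. The sketch below restates them in C++ purely for illustration. The helper names (`use_stable_abi`, `decode_version`, `last_ext`, `abi3_suffix`) are hypothetical, and the decoding assumes the usual CPython hex version layout of major, minor, and micro in the top three bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical record for a decoded version constant such as 0x030C0000
struct py_version_sketch {
    uint32_t major_version;
    uint32_t minor_version;
    uint32_t micro_version;
};

// Decode the top three bytes: major, minor, micro
constexpr py_version_sketch decode_version(uint32_t hex) {
    return { (hex >> 24) & 0xFFu, (hex >> 16) & 0xFFu, (hex >> 8) & 0xFFu };
}

// Mirrors the CMake guard: the request is ignored on Python 3.x with x < 12
constexpr bool use_stable_abi(bool requested, uint32_t major_version,
                              uint32_t minor_version) {
    if (major_version == 3 && minor_version < 12)
        return false;
    return requested;
}

// Analogue of get_filename_component(... LAST_EXT): text from the last '.'
inline std::string last_ext(const std::string &suffix) {
    std::string::size_type pos = suffix.rfind('.');
    return pos == std::string::npos ? std::string() : suffix.substr(pos);
}

// Analogue of nanobind_extension_abi3(): ".abi3" plus the last extension
inline std::string abi3_suffix(const std::string &nb_suffix) {
    assert(!nb_suffix.empty());
    return ".abi3" + last_ext(nb_suffix);
}
```

With these definitions, `0x030C0000` decodes to version 3.12.0, and a suffix such as `.cpython-39-darwin.so` maps to `.abi3.so`, matching the example given in `docs/cmake.md`.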

diff --git a/docs/cmake.md b/docs/cmake.md
@@ -59,6 +59,13 @@ it performs the following steps to produce efficient bindings.
 - It appends the library suffix (e.g., `.cpython-39-darwin.so`) based on
   information provided by CMake's `FindPython` module.
 
+- When requested via the optional `STABLE_ABI` parameter, and when your
+  version of Python is sufficiently recent (3.12+), the implementation
+  will build a [stable ABI](https://docs.python.org/3/c-api/stable.html)
+  extension module with a different suffix (e.g., `.abi3.so`). This comes at a
+  performance cost since _nanobind_ can no longer access the internals of
+  various data structures directly.
+
 - It statically or dynamically links against `libnanobind` depending on the
   value of the `NB_SHARED` parameter of the CMake project. Note that
   `NB_SHARED` is not an input of the `nanobind_add_module()` function. Rather,

diff --git a/include/nanobind/nb_accessor.h b/include/nanobind/nb_accessor.h
@@ -120,13 +120,13 @@ struct num_item_list {
     using key_type = Py_ssize_t;
 
     NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
-        *cache = PyList_GET_ITEM(obj, index);
+        *cache = NB_LIST_GET_ITEM(obj, index);
     }
 
     NB_INLINE static void set(PyObject *obj, Py_ssize_t index, PyObject *v) {
-        PyObject *old = PyList_GET_ITEM(obj, index);
+        PyObject *old = NB_LIST_GET_ITEM(obj, index);
         Py_INCREF(v);
-        PyList_SET_ITEM(obj, index, v);
+        NB_LIST_SET_ITEM(obj, index, v);
         Py_DECREF(old);
     }
 };
@@ -136,7 +136,7 @@ struct num_item_tuple {
     using key_type = Py_ssize_t;
 
     NB_INLINE static void get(PyObject *obj, Py_ssize_t index, PyObject **cache) {
-        *cache = PyTuple_GET_ITEM(obj, index);
+        *cache = NB_TUPLE_GET_ITEM(obj, index);
     }
 
     template <typename...Ts> static void set(Ts...) {
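
The hunks above replace direct CPython macros such as `PyList_GET_ITEM` with `NB_*` wrappers. The definitions of those wrappers are not shown in this part of the diff, so the following is only a sketch of the general pattern, assuming the wrappers choose between direct field access and an opaque function call at compile time. Everything here (`mock_list`, `mock_list_get_item`, `SKETCH_LIMITED_API`, `SKETCH_LIST_GET_ITEM`, and so on) is a hypothetical stand-in and not part of nanobind or CPython.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a list object; NOT the real PyListObject
struct mock_list {
    std::vector<int *> items;
};

// "Opaque" accessors: the kind of function-call API that stays usable
// when the object layout is hidden from the caller
inline int *mock_list_get_item(mock_list *l, std::size_t i) {
    return i < l->items.size() ? l->items[i] : nullptr;
}

inline void mock_list_set_item(mock_list *l, std::size_t i, int *v) {
    if (i < l->items.size())
        l->items[i] = v;
}

// SKETCH_LIMITED_API plays the role of Py_LIMITED_API in this sketch
#if defined(SKETCH_LIMITED_API)
#  define SKETCH_LIST_GET_ITEM(l, i)    mock_list_get_item((l), (i))
#  define SKETCH_LIST_SET_ITEM(l, i, v) mock_list_set_item((l), (i), (v))
#else
#  define SKETCH_LIST_GET_ITEM(l, i)    ((l)->items[(i)])
#  define SKETCH_LIST_SET_ITEM(l, i, v) ((l)->items[(i)] = (v))
#endif

// Call sites are written once against the wrapper macros, in the same
// get-then-set order as num_item_list::set in the hunk above
inline int *swap_item(mock_list *l, std::size_t i, int *v) {
    assert(i < l->items.size());
    int *old = SKETCH_LIST_GET_ITEM(l, i);
    SKETCH_LIST_SET_ITEM(l, i, v);
    return old;
}
```

The design point is that call sites do not change between build modes; only the macro definitions do, which is consistent with the performance trade-off described in `docs/cmake.md`.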

diff --git a/include/nanobind/nb_attr.h b/include/nanobind/nb_attr.h
@@ -51,14 +51,16 @@ struct is_method {};
 struct is_implicit {};
 struct is_operator {};
 struct is_arithmetic {};
+struct is_final { };
 struct is_enum {
     bool is_signed;
 };
+
 template <size_t /* Nurse */, size_t /* Patient */> struct keep_alive {};
 template <typename T> struct supplement {};
 struct type_callback {
-    type_callback(void (*value)(PyTypeObject *) noexcept) : value(value) {}
-    void (*value)(PyTypeObject *) noexcept;
+    type_callback(void (*value)(PyType_Slot **) noexcept) : value(value) {}
+    void (*value)(PyType_Slot **) noexcept;
 };
 struct raw_doc {
     const char *value;
@@ -94,7 +96,7 @@ enum class func_flags : uint32_t {
     is_implicit = (1 << 12),
     /// Is this function an arithmetic operator?
     is_operator = (1 << 13),
-    /// When the function is GCed, do we need to call func_data::free?
+    /// When the function is GCed, do we need to call func_data_prelim::free?
     has_free = (1 << 14),
     /// Should the func_new() call return a new reference?
     return_ref = (1 << 15),
@@ -110,7 +112,7 @@ struct arg_data {
     bool none;
 };
 
-template <size_t Size> struct func_data {
+template <size_t Size> struct func_data_prelim {
     // A small amount of space to capture data used by the function/closure
     void *capture[3];
 

diff --git a/include/nanobind/nb_call.h b/include/nanobind/nb_call.h
@@ -31,15 +31,14 @@ template <typename T>
 NB_INLINE void call_analyze(size_t &nargs, size_t &nkwargs, const T &value) {
     using D = std::decay_t<T>;
 
-    if constexpr (std::is_same_v<D, arg_v>) {
+    if constexpr (std::is_same_v<D, arg_v>)
         nkwargs++;
-    } else if constexpr (std::is_same_v<D, args_proxy>) {
+    else if constexpr (std::is_same_v<D, args_proxy>)
         nargs += len(value);
-    } else if constexpr (std::is_same_v<D, kwargs_proxy>) {
+    else if constexpr (std::is_same_v<D, kwargs_proxy>)
         nkwargs += len(value);
-    } else {
+    else
         nargs += 1;
-    }
 
     (void) nargs; (void) nkwargs; (void) value;
 }
@@ -53,7 +52,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
 
     if constexpr (std::is_same_v<D, arg_v>) {
         args[kwargs_offset + nkwargs] = value.value.release().ptr();
-        PyTuple_SET_ITEM(kwnames, nkwargs++,
+        NB_TUPLE_SET_ITEM(kwnames, nkwargs++,
                          PyUnicode_InternFromString(value.name));
     } else if constexpr (std::is_same_v<D, args_proxy>) {
         for (size_t i = 0, l = len(value); i < l; ++i)
@@ -65,7 +64,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
         while (PyDict_Next(value.ptr(), &pos, &key, &entry)) {
             Py_INCREF(key); Py_INCREF(entry);
             args[kwargs_offset + nkwargs] = entry;
-            PyTuple_SET_ITEM(kwnames, nkwargs++, key);
+            NB_TUPLE_SET_ITEM(kwnames, nkwargs++, key);
         }
     } else {
         args[nargs++] =
@@ -88,7 +87,7 @@ NB_INLINE void call_init(PyObject **args, PyObject *kwnames, size_t &nargs,
         args[0] = nullptr;                                                     \
         args_p = args + 1;                                                     \
     }                                                                          \
-    nargs |= PY_VECTORCALL_ARGUMENTS_OFFSET;                                   \
+    nargs |= NB_VECTORCALL_ARGUMENTS_OFFSET;                                   \
     return steal(obj_vectorcall(base, args_p, nargs, kwnames, method_call))
 
 template <typename Derived>

diff --git a/include/nanobind/nb_cast.h b/include/nanobind/nb_cast.h
@@ -318,7 +318,7 @@ tuple make_tuple(Args &&...args) {
     size_t nargs = 0;
     PyObject *o = result.ptr();
 
-    (PyTuple_SET_ITEM(o, nargs++,
+    (NB_TUPLE_SET_ITEM(o, nargs++,
                       detail::make_caster<Args>::from_cpp(
                           (detail::forward_t<Args>) args,
                           detail::infer_policy<Args>(policy), nullptr).ptr()),

diff --git a/include/nanobind/nb_class.h b/include/nanobind/nb_class.h
@@ -57,14 +57,19 @@ enum class type_flags : uint32_t {
     is_arithmetic            = (1 << 15),
 
     /// Is the 'type_callback' field of the type_data structure set?
-    has_type_callback        = (1 << 16)
+    has_type_callback        = (1 << 16),
+
+    /// This type does not permit subclassing from Python
+    is_final                 = (1 << 17),
+
+    /// This type has supplemental data attached (see nb::supplement<T>)
+    has_supplement           = (1 << 18)
 };
 
 struct type_data {
-    uint32_t size : 24;
+    uint32_t size;
     uint32_t align : 8;
-    uint32_t flags : 20;
-    uint32_t supplement : 12;
+    uint32_t flags : 24;
     const char *name;
     const char *doc;
     PyObject *scope;
@@ -77,10 +82,11 @@ struct type_data {
     void (*move)(void *, void *) noexcept;
     const std::type_info **implicit;
     bool (**implicit_py)(PyTypeObject *, PyObject *, cleanup_list *) noexcept;
-    void (*type_callback)(PyTypeObject *) noexcept;
+    void (*type_callback)(PyType_Slot **) noexcept;
+    void *supplement;
 };
 
-static_assert(sizeof(type_data) == 8 + sizeof(void *) * 13);
+static_assert(sizeof(type_data) == 8 + sizeof(void *) * 14);
 
 NB_INLINE void type_extra_apply(type_data &t, const handle &h) {
     t.flags |= (uint32_t) type_flags::has_base_py;
@@ -104,14 +110,20 @@ NB_INLINE void type_extra_apply(type_data &t, is_enum e) {
         t.flags |= (uint32_t) type_flags::is_unsigned_enum;
 }
 
+NB_INLINE void type_extra_apply(type_data &t, is_final) {
+    t.flags |= (uint32_t) type_flags::is_final;
+}
+
 NB_INLINE void type_extra_apply(type_data &t, is_arithmetic) {
     t.flags |= (uint32_t) type_flags::is_arithmetic;
 }
 
 template <typename T>
 NB_INLINE void type_extra_apply(type_data &t, supplement<T>) {
-    static_assert(sizeof(T) <= 0xFF, "Supplement is too big!");
-    t.supplement += sizeof(T);
+    static_assert(std::is_trivially_default_constructible_v<T>,
+                  "The supplement type must be a POD (plain old data) type");
+    t.flags |= (uint32_t) type_flags::has_supplement;
+    t.supplement = (void *) malloc(sizeof(T));
 }
 
 template <typename... Args> struct init {