FYI, I polished the benchmarks #20

jblespiau · 2022-04-07T07:57:57Z

jblespiau
Apr 7, 2022

Hi,

I think good benchmarks are key to any project, and they were seriously lacking for pybind11.

I made some small improvements to the benchmarks you wrote and moved them to https://github.com/jblespiau/pybind11_benchmarks so that anyone can just clone and run them (your version assumed OSX, and you had not explained how you had installed everything).

I also have other benchmarks on my computer that use the raw Python C API, and I plan to add them so we can compare the pybind11 and nanobind wrappers against those versions.

This is mostly an FYI, in case you would like to contribute to it whenever you improve your benchmarks.

wjakob · 2022-04-07T15:27:44Z

wjakob
Apr 7, 2022
Maintainer

Neat, thanks for this! Many performance improvements made in this project should be portable to pybind11, but be warned that doing so will be a lot of work ;-).


wjakob · 2022-04-07T15:40:10Z

wjakob
Apr 7, 2022
Maintainer

By the way, what do you mean by "caching the type look-up into a static variable is a very large optimization which should be done first"?


jblespiau · 2022-04-07T19:08:24Z

jblespiau
Apr 7, 2022
Author

Currently, to cast from a C++ type to a Python type, the look-up is dynamic and is performed on every cast. There is also a second look-up to check whether a given object has already been bound, so as not to create a new one (this should probably be opt-in, but that's another question).

When we convert a std::vector<T> into a Python list, the unordered_map that maps the std::type_info of T to the Python class to use is accessed once per element, i.e. vector_size times.

I had a change in a local client at some point. Here is what I could find from some demo code (it's a demo: it does not handle all cases, and I haven't pushed it to the pybind11 internals):
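To make the cost concrete, here is a minimal sketch that does not use pybind11 at all. `TypeRecord`, `Registry`, `RegisterType`, `DynamicLookup`, `CachedLookup`, and the two `Convert*` functions are hypothetical names invented for this illustration. It only counts hash-map look-ups: the dynamic version hits the map once per element, while the cached version hits it once per type. It assumes T is registered before the first cached look-up, since a failed look-up would otherwise be cached as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a binding library's per-type record.
struct TypeRecord {
  std::string python_name;
};

// Hypothetical global registry, meant to be filled once at start-up.
std::unordered_map<std::type_index, TypeRecord>& Registry() {
  static std::unordered_map<std::type_index, TypeRecord> registry;
  return registry;
}

// Counts hash-map look-ups so the two strategies can be compared.
std::size_t& LookupCount() {
  static std::size_t count = 0;
  return count;
}

template <typename T>
void RegisterType(std::string python_name) {
  bool inserted = Registry()
                      .emplace(std::type_index(typeid(T)),
                               TypeRecord{std::move(python_name)})
                      .second;
  assert(inserted && "each type should be registered only once");
  (void)inserted;
}

// Dynamic strategy: one hash-map access per call.
const TypeRecord* DynamicLookup(const std::type_info& ti) {
  ++LookupCount();
  auto it = Registry().find(std::type_index(ti));
  return it == Registry().end() ? nullptr : &it->second;
}

// Cached strategy: the map is accessed only the first time this function is
// called for a given T. Pointers to unordered_map elements stay valid across
// rehashing, so caching the pointer is safe as long as entries are not erased.
template <typename T>
const TypeRecord* CachedLookup() {
  static const TypeRecord* record = DynamicLookup(typeid(T));
  return record;
}

// Simulates converting a vector element by element with a look-up per element.
template <typename T>
std::size_t ConvertDynamic(const std::vector<T>& v) {
  std::size_t converted = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (DynamicLookup(typeid(T)) != nullptr) ++converted;
  }
  return converted;
}

// Same loop, but every element reuses the cached record.
template <typename T>
std::size_t ConvertCached(const std::vector<T>& v) {
  std::size_t converted = 0;
  for (std::size_t i = 0; i < v.size(); ++i) {
    if (CachedLookup<T>() != nullptr) ++converted;
  }
  return converted;
}
```

For a vector of n elements, `ConvertDynamic` performs n map accesses and `ConvertCached` performs at most one, which is the whole point of the static variable.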

template <typename T>
PyTypeObject* PyTypeFor() {
  static PyTypeObject* expected_type = []() {
    const std::type_info& ti = typeid(T);
    detail::type_info* internal_ti =
        detail::get_global_type_info(std::type_index(ti));
    return internal_ti->type;
  }();
  return expected_type;
}

template <typename T>
bool fast_type_check(handle h) {
  // Exact-type check only: compares the object's Python type against the
  // cached type for T, so subclasses of T are not accepted.
  return Py_TYPE(h.ptr()) == PyTypeFor<T>();
}

...
template <typename T>
object MakeFastObject(std::unique_ptr<T> arg) {
  PyTypeObject* py_type = PyTypeFor<T>();

  // See type_caster_generic for similar code.
  object inst = reinterpret_steal<object>(detail::make_new_instance(py_type));
  auto instance = reinterpret_cast<detail::instance*>(inst.ptr());

  void*& valueptr = detail::values_and_holders(instance).begin()->value_ptr();

  auto& holder = detail::values_and_holders(instance)
                     .begin()
                     ->holder<std::unique_ptr<T>>();

  // Transfer ownership of the C++ object from `arg` to the instance's holder.
  auto old_ptr = arg.release();
  holder.reset(old_ptr);
  valueptr = old_ptr;
  instance->owned = true;

  return inst;
}

This was several times faster. In my mind, there is no need for a dynamic look-up: types are registered once and for all at start-up and do not change afterwards.

I don't know if this is precise enough or whether you see what I am trying to say. Maybe I missed some reason why we cannot store the result of the look-up in a static variable.

1 reply

wjakob Apr 8, 2022
Maintainer

Aha -- nanobind does something like this already (https://github.com/wjakob/nanobind/blob/master/src/nb_type.cpp#L439). Since the C++ RTTI data is co-located with the Python type object, you can quickly check type equivalence without having to call PyType_IsSubtype.

Honestly, if you are looking for that level of performance, you will have an easier time starting with nanobind and adding the features you need, rather than changing pybind11 (which has all sorts of extra complications like holders and multiple inheritance).
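As a rough illustration of the co-location idea (this is not nanobind's actual code; `BoundType`, `Instance`, `IsExactly`, `IsInstance`, `Animal`, and `Dog` are hypothetical names), the sketch below stores a pointer to the C++ `std::type_info` next to the type record. The common exact-match case then becomes a single comparison, and the walk up the inheritance chain is only needed as a fallback.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical example classes used only for the demonstration.
struct Animal {};
struct Dog : Animal {};

// Hypothetical stand-in for a Python type object that keeps the C++ RTTI
// data right next to it.
struct BoundType {
  const char* name;
  const std::type_info* cpp_type;  // co-located C++ RTTI
  const BoundType* base;           // single-inheritance parent, or nullptr
};

// Hypothetical stand-in for a Python object: it only knows its type.
struct Instance {
  const BoundType* type;
};

// Fast path: one type_info comparison, no traversal of the hierarchy.
template <typename T>
bool IsExactly(const Instance& obj) {
  assert(obj.type != nullptr);
  return *obj.type->cpp_type == typeid(T);
}

// Slow path: walks the base chain, playing the role of a subtype check.
template <typename T>
bool IsInstance(const Instance& obj) {
  if (IsExactly<T>(obj)) return true;
  for (const BoundType* t = obj.type->base; t != nullptr; t = t->base) {
    if (*t->cpp_type == typeid(T)) return true;
  }
  return false;
}
```

With a `Dog` type whose `base` points at the `Animal` type, `IsExactly<Dog>` succeeds on the fast path for a `Dog` instance, while `IsInstance<Animal>` has to fall back to the walk.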

jblespiau · 2022-04-08T17:54:54Z

jblespiau
Apr 8, 2022
Author

In practice, what we did was to use the C API, because the performance of pybind11 was too low:

https://github.com/tensorflow/tensorflow/blob/e70e3196b7f130b6c5bd9e73a8411a41869b5770/tensorflow/compiler/xla/python/py_buffer.cc#L458-L489 and we have IsInstance and co here

As this is within large systems (TensorFlow, JAX, and plenty of others) that are interoperable, we cannot easily change how the wrappers are written. So even though changing pybind11 is harder, it may be a more realistic approach than breaking interfaces that people are using.

So I sympathize with the idea, but it's unclear to me what the best approach is. I am trying to get closer to pybind11 governance and introduce some changes there. I think that we can get very good performance (I don't remember the exact numbers, but with the pybind11 object data layout, I had very close performance for casting back and forth).

It's hard to know when to start from scratch versus improving things in place, but I believe that we (the whole computer science community) probably start from scratch more often than is optimal.

Thanks for the pointers and the discussion! I hope to get more time soon to keep improving these benchmarks.

