
C++11 for Python

Date: draft 2018/5/7, 2018/5/20

C++11 is a giant stride in the history of C++. After 8 years of work, many new features were added to the language. Most, if not all, had been longed for for years. They give the language a new life.

But even with all the improvements in C++11, it's still C++. In many cases the complex language is overkill. That's why wrappers are commonplace: keep the cycle-sensitive core in C++, and write the rest in a higher-level language like Python. The new standard introduces many features, but I find four of them particularly useful in a C++-Python hybrid environment: smart pointers, move semantics, lambda expressions, and variadic templates.

Smart Pointers

Smart pointers are the key to making C++ friendly to Python. Specifically, :cpp:class:`std::shared_ptr` provides a compatibility layer to the resource management model in Python. Python uses the reference count in :c:type:`PyObject`, and C++ shared pointers work in the same way, but provide many more features. Therefore, although there are other smart pointers, the only ones we really care about here are shared pointers. And we don't need to code up the interfacing ourselves. Boost.Python and pybind11 do that for us.

Automatic destruction

Apart from its construction and destruction, a shared pointer works much like a raw (standard) pointer. See an example:

Let's start with the function :cpp:func:`use_raw_pointer`, showing how we use a raw pointer. First, allocate and initialize the object:

Series * series_ptr = new Series(10, 2);

Second, do what we want with the object:

series_ptr->sum(); // call member function
// OUT: Series::sum() = 65

When we finish using the object, we need to free the resource. A raw pointer needs manual treatment:

// remember to delete the object or we leak memory
std::cout << "before explicit deletion, Series::count = "
          << Series::count << std::endl;
// OUT: before explicit deletion, Series::count = 1
delete series_ptr;
std::cout << "after the resource is manually freed, Series::count = "
          << Series::count << std::endl;
// OUT: after the resource is manually freed, Series::count = 0

If we don't delete it, we can never access it again after leaving the function. Only the OS can reclaim the memory, when the program finishes.

A shared pointer can do this for us. In the function :cpp:func:`use_shared_pointer`, we don't manually delete the object:

std::shared_ptr<Series> series_sptr(new Series(10, 3));
series_sptr->sum(); // call member function
// OUT: Series::sum() = 75
// note shared_ptr handles deletion for series_sptr

:cpp:class:`std::shared_ptr` deletes the object when the last shared pointer owning it goes out of scope. Outside :cpp:func:`use_shared_pointer`, we see that the object has been deleted:

std::cout << "no memory leak: Series::count = "
          << Series::count << std::endl;
// OUT: no memory leak: Series::count = 0
return 0;
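
The two functions can be put together in a self-contained sketch. The Series class below is a hypothetical stand-in, not the original one: we assume it holds the integers lead, lead+1, ... and keeps a static count of live instances, which is consistent with the sums and counts printed above. The functions return the sum instead of printing it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-in for the Series class used in the text.
class Series {
public:
  static int count; // number of live Series objects

  Series(std::size_t size, int lead) : m_data(size) {
    // fill with lead, lead+1, ..., lead+size-1
    std::iota(m_data.begin(), m_data.end(), lead);
    ++count;
  }
  ~Series() { --count; }

  // no copies, so that count stays accurate in this sketch
  Series(Series const &) = delete;
  Series & operator=(Series const &) = delete;

  int sum() const { return std::accumulate(m_data.begin(), m_data.end(), 0); }

private:
  std::vector<int> m_data;
};

int Series::count = 0;

// Raw pointer: we must remember to delete the object ourselves.
int use_raw_pointer() {
  Series * series_ptr = new Series(10, 2);
  int const total = series_ptr->sum();
  delete series_ptr; // omit this line and the object leaks
  return total;
}

// Shared pointer: the object is deleted when series_sptr goes out of scope.
int use_shared_pointer() {
  std::shared_ptr<Series> series_sptr(new Series(10, 3));
  return series_sptr->sum();
}
```

After either function returns, Series::count is back to 0, but only the raw-pointer version depends on us remembering the delete.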

Resource ownership

A shared pointer is capable of automatic deletion of unused objects because it tracks shared ownership. The last owner of the pointer is responsible for freeing the object. Reference counting is a common technique to implement it.

The use of ownership wasn't obvious in the previous example, since the shared pointer was used locally. Let's see another example that returns a shared pointer:

In the function :cpp:func:`make_shared_pointer`, we create a shared pointer and return it:

return std::shared_ptr<Series>(new Series(size, lead));

In :cpp:func:`use_shared_pointer`, we take and use it:

void use_shared_pointer(const std::shared_ptr<Series> & series_sptr) {
  series_sptr->sum(); // call member function
  // OUT: Series::sum() = 65
}

But this time, the object isn't destructed at the end of the function. Outside the function we still see the object alive, because in :cpp:func:`main` we still own the shared pointer:

// now, shared_ptr
// the object is still alive
std::cout << "Series::count = " << Series::count << std::endl;
// OUT: Series::count = 1

The object gets deleted when we say it's not used anymore, by setting the shared pointer to null:

// reset the pointer
series_sptr = nullptr;
std::cout << "no memory leak: Series::count = "
          << Series::count << std::endl;
// OUT: no memory leak: Series::count = 0

This starts to show the power of a shared pointer. It frees programmers from the tedious book-keeping for pointers. A shared pointer clearly defines when an object should be destructed, and does it automatically. The interface is a drop-in replacement of a raw pointer. Thus, when resources need to be shared, we usually think of a shared pointer.
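
To make the ownership tracking visible, here is a small sketch that watches :cpp:func:`use_count` as owners come and go. The trimmed-down Series and the helper names owners_seen_by and demo_ownership are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, trimmed-down Series that only counts live instances.
struct Series {
  static int count;
  explicit Series(std::size_t size) : data(size) { ++count; }
  ~Series() { --count; }
  Series(Series const &) = delete;
  Series & operator=(Series const &) = delete;
  std::vector<int> data;
};
int Series::count = 0;

std::shared_ptr<Series> make_shared_pointer(std::size_t size) {
  return std::shared_ptr<Series>(new Series(size));
}

// Taking the shared pointer by const reference does not add an owner.
long owners_seen_by(std::shared_ptr<Series> const & sp) {
  return sp.use_count();
}

void demo_ownership() {
  auto first = make_shared_pointer(10);
  assert(first.use_count() == 1 && Series::count == 1);
  assert(owners_seen_by(first) == 1);

  auto second = first; // copying the pointer adds a second owner
  assert(first.use_count() == 2);

  first = nullptr; // one owner gives up; the object stays alive
  assert(Series::count == 1 && second.use_count() == 1);

  second = nullptr; // the last owner gives up; the object is destroyed
  assert(Series::count == 0);
}
```

The object lives exactly as long as at least one shared pointer owns it, regardless of which scope created it.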

But keep in mind that the convenience comes at a cost, although we aren't discussing it here.

Enable from this

Pointers are used both from outside and inside of a class. But when we want to use a shared pointer from inside the object the pointer points to, can we just create a new :cpp:class:`std::shared_ptr`?

class Series {
public:
  std::shared_ptr<Series> get_this_bad() {
    return std::shared_ptr<Series>(this); // wrong: a second, independent owner
  }
};

No! When you create the bad shared pointer, it looks fine. But after it is destructed, you will get a double free:

std::shared_ptr<Series> sp1(new Series(10, 2));
assert(sp1.use_count() == 1);
auto sp2 = sp1->get_this_bad();
assert(sp2.use_count() == 1); // this isn't 2 and is wrong
sp2 = nullptr;
assert(Series::count == 0); // uhoh, Series object is destructed
sp1 = nullptr; // double free!  This gets you segfault if you are lucky

We need :cpp:class:`std::enable_shared_from_this` and the helper function :cpp:func:`shared_from_this` it provides. To use a shared pointer from inside the object it points to, the class needs to be derived from :cpp:class:`std::enable_shared_from_this`. Note it's a class template and you should provide the derived class as the template argument:

By using :cpp:func:`shared_from_this`, we get a correct reference count from inside the class.
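
A minimal sketch of such a class follows. The member name get_this_good and the helper count_owners are made up for this illustration, and the class is stripped of its data.

```cpp
#include <cassert>
#include <memory>

// Simplified Series; the derived class itself is the template argument.
class Series : public std::enable_shared_from_this<Series> {
public:
  std::shared_ptr<Series> get_this_good() {
    // shares ownership with the shared pointers that already manage *this
    return shared_from_this();
  }
};

long count_owners() {
  auto sp1 = std::make_shared<Series>();
  auto sp2 = sp1->get_this_good();
  return sp1.use_count(); // both sp1 and sp2 own the same object
}
```

Note that :cpp:func:`shared_from_this` only works on an object that is already managed by a shared pointer. Calling it on a stack object or a raw new'ed object is an error (undefined behavior before C++17, a :cpp:class:`std::bad_weak_ptr` exception since).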

Ensure to Share

The risk of double free doesn't only appear when one creates a shared pointer from within the object. It's easy to make a similar mistake when one first uses a shared pointer:

Series * p1 = new Series(3, 7);
std::shared_ptr<Series> sp1(p1);
// wrong! Double free when both pointers are destructed
std::shared_ptr<Series> sp2(p1);

To prevent this mistake, we can hide the constructor of :cpp:class:`Series`, so that no one can get a raw pointer from a newly constructed object:

There are two key points. First, we make the constructor private:

// private constructor
Series(size_t size, int lead) : m_data(size) {
  for (size_t it=0; it<size; it++) { m_data[it] = lead+it; }
}

Second, we provide a static factory method to construct the object and return the shared pointer managing it:

// factory method to construct the object
// and put it in the shared pointer
static std::shared_ptr<Series> make(size_t size, int lead) {
  return std::shared_ptr<Series>(new Series(size, lead));
}

Because the class cannot be constructed from outside, the factory method is the only way to create a new instance, so every instance must be managed by a shared pointer.

Hold on, didn't we miss something? Copy constructor!

Series o2(*sp1); // uhoh, we forgot copy construction!
// OUT: Series::sum() = 65

Let's say we don't want the object to be copyable. For a resource object holding a lot of memory, that's not uncommon. Instead of being allowed to copy the object, users are forced to use the idiom of transferring ownership:

// no copy, no move
Series(Series const & ) = delete;
Series(Series       &&) = delete;
Series & operator=(Series const & ) = delete;
Series & operator=(Series       &&) = delete;

That's it. We have a class totally managed by shared pointers. I can add one more comment about performance. The reference count of a shared pointer requires atomic operations, which are not free. The cost is especially significant when multiple threads are in use. Synchronization aside, the reference counter needs to be dynamically allocated. The pointed-to instance itself needs to be on the heap as well. That makes two allocations. This is why shared pointers are a performance killer for small objects. But even for large objects, we hope to reduce the allocation overhead.

:cpp:func:`std::make_shared` can help. It makes a single allocation for both the object and the reference counter. The use is simple:

// factory method to construct the object
// and put it in the shared pointer
static std::shared_ptr<Series> make(size_t size, int lead) {
  return std::make_shared<Series>(size, lead, ctor_passkey());
}

What is that :cpp:class:`ctor_passkey`? It's there because :cpp:func:`std::make_shared` cannot work with a private constructor! But we want no one but the class itself to access the constructor. That :cpp:class:`ctor_passkey` is the solution:

  struct ctor_passkey {}; // private nested type
  // public constructor, but it needs a ctor_passkey to be called
  Series(size_t size, int lead, ctor_passkey const &) : m_data(size) {
    for (size_t it=0; it<size; it++) { m_data[it] = lead+it; }
  }
  static std::shared_ptr<Series> make(size_t size, int lead) {
    return std::make_shared<Series>(size, lead, ctor_passkey());
  }

Since :cpp:class:`ctor_passkey` is a private nested type, code outside the class cannot name it to call the constructor. Our system isn't compromised. (And in practice there is no additional overhead. The compiler typically optimizes away the :cpp:class:`ctor_passkey` object since it's not used at all.)
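
Putting the pieces of this section together gives roughly the following class. It is a sketch under assumptions: the access specifiers and the sum member are reconstructed for illustration, not copied from the original source.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

class Series {
private:
  struct ctor_passkey {}; // private: outside code cannot name this type

public:
  // Public so that std::make_shared can call it, but it is only usable
  // by code that can produce a ctor_passkey, that is, by Series itself.
  Series(std::size_t size, int lead, ctor_passkey const &) : m_data(size) {
    for (std::size_t it = 0; it < size; ++it) {
      m_data[it] = lead + static_cast<int>(it);
    }
  }

  // no copy, no move
  Series(Series const &) = delete;
  Series(Series &&) = delete;
  Series & operator=(Series const &) = delete;
  Series & operator=(Series &&) = delete;

  // the only way to obtain a Series: already wrapped in a shared pointer
  static std::shared_ptr<Series> make(std::size_t size, int lead) {
    return std::make_shared<Series>(size, lead, ctor_passkey());
  }

  int sum() const {
    int total = 0;
    for (int v : m_data) { total += v; }
    return total;
  }

private:
  std::vector<int> m_data;
};
```

A caller writes auto sp = Series::make(10, 2); and never sees a raw pointer, so the double-ownership mistake shown earlier cannot be spelled out.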

Move Semantics

High-performance number-crunching code needs large arrays as memory buffers. When using large arrays, we don't want to copy them frequently. For example, it's challenging to fit a :math:`50,000 \times 50,000` double-precision dense matrix (about 20 GB) into memory, let alone copy it.

Before C++11, there were cases in which an unnecessary copy could not be avoided:

Calling the :cpp:func:`extend` function constructs two new :cpp:class:`Storage` instances, although only the first is necessary. The first is copied from stor1 so that we aren't changing stor1:

Storage stor2 = extend(stor1);
// OUT: Storage(this=0x7ffc355e3580)::Storage(const &): costly copy

But when :cpp:func:`extend` returns, a second copy is performed to prepare a temporary object for returning:

The return value is not eligible for copy elision (return value optimization, RVO), because it is not a local variable but the function's argument. The compiler must call a constructor for it, although it is nothing more than a temporary whose resources could be moved to whoever needs them at the caller.

C++11 move semantics provides a solution. By adding a move constructor (a constructor that takes an rvalue reference), we let the compiler treat the return value more efficiently:

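Storage(Storage && other)
: data(other.data), size(other.size), reserved(other.reserved) {
  // this is much faster: steal the buffer instead of copying it
  std::printf("Storage(this=%p)::Storage(&&): cheaper move\n",
              static_cast<void *>(this));
  other.size = 0;
  other.reserved = 0;
  other.data = nullptr; // the moved-from object no longer owns the buffer
}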

For the argument of :cpp:func:`extend`, the copy constructor is still used to copy the data to a new instance. But when the function returns, the move constructor is used to move the data from the copy-constructed temporary to the stor2 variable at the caller.

In the above example, it may be unclear how the compiler knows that the move constructor should be used. When a function returns a local variable, the standard requires the compiler to either elide the copy or treat the returned object as an rvalue. Because :cpp:func:`extend` returns its function argument, RVO doesn't engage. The function then effectively works like:

Thus when a move constructor of :cpp:class:`Storage` is available, it is called for the return value.
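
The whole sequence can be checked with a small sketch. The Storage below is hypothetical: it counts copies and moves in static members instead of printing, and it drops the reserved field. We assume, as the text suggests, that :cpp:func:`extend` takes its argument by value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical Storage that counts copies and moves instead of printing.
struct Storage {
  static int copies;
  static int moves;

  double * data;
  std::size_t size;

  explicit Storage(std::size_t n) : data(new double[n]()), size(n) {}
  ~Storage() { delete[] data; }

  Storage(Storage const & other)
    : data(new double[other.size]), size(other.size) {
    std::copy(other.data, other.data + size, data); // costly copy
    ++copies;
  }
  Storage(Storage && other) noexcept : data(other.data), size(other.size) {
    other.data = nullptr; // cheaper move: just steal the buffer
    other.size = 0;
    ++moves;
  }
  Storage & operator=(Storage const &) = delete;
  Storage & operator=(Storage &&) = delete;
};
int Storage::copies = 0;
int Storage::moves = 0;

// Passing by value copies the caller's object into the parameter stor.
Storage extend(Storage stor) {
  // ... grow stor here ...
  return stor; // a parameter: not elided, but treated as an rvalue
}

void demo_move() {
  Storage stor1(4);
  Storage stor2 = extend(stor1);
  assert(Storage::copies == 1); // stor1 -> parameter
  assert(Storage::moves >= 1);  // parameter -> return value
  assert(stor1.size == 4 && stor2.size == 4);
}
```

The return statement behaves as if it were written return std::move(stor);. Only one costly copy remains, the one that protects stor1 from modification. The exact number of moves can depend on how much elision the compiler performs for the returned temporary, which is why the sketch only checks for at least one.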


Lambda Expression

The C++ lambda expression is syntactic sugar that makes writing C++ feel almost like writing Python.

Although lambda expressions offer much more than nested functions, their simplest use is exactly that: defining a function inside another function. This is especially useful when writing wrapper code for high-level languages.
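
As a small illustration of the nested-function use, the sketch below defines a helper inside a function and captures a local variable, much like a Python closure. The function name sum_of_scaled is made up for this example.

```cpp
#include <cassert>
#include <vector>

int sum_of_scaled(std::vector<int> const & values, int factor) {
  // A lambda used like a Python nested function; it captures factor
  // from the enclosing scope by value.
  auto scale = [factor](int v) { return v * factor; };

  int total = 0;
  for (int v : values) { total += scale(v); }
  return total;
}
```

The helper scale is visible only inside sum_of_scaled, so it doesn't pollute the surrounding namespace.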

Variadic Template

This is a versatile addition to C++. My interest in it is the ability to write generic wrapping code cleanly. For example, the C++11 standard doesn't include the function template :cpp:func:`std::make_unique` (an oversight that was corrected in C++14), but we can easily implement it using a variadic template:

No matter how many arguments the wrapped constructor takes or what their types are, the variadic template forwards them faithfully. It provides type safety without incurring any runtime overhead, because the expansion is done at compile time.
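
One possible implementation is sketched below. It is named make_unique_11 here (a made-up name) to avoid clashing with the C++14 version, it handles single objects only and not arrays, and Record is a hypothetical class used just to exercise it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// A C++11 stand-in for std::make_unique (single objects only, no arrays).
template <typename T, typename... Args>
std::unique_ptr<T> make_unique_11(Args &&... args) {
  // perfect-forward every argument to the constructor of T
  return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

// Hypothetical class with a two-argument constructor.
struct Record {
  Record(std::string name, int value)
    : name(std::move(name)), value(value) {}
  std::string name;
  int value;
};
```

A call such as make_unique_11<Record>("pi", 3) expands at compile time into a plain new Record("pi", 3) wrapped in a :cpp:class:`std::unique_ptr`.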

The above example shows how to wrap C++ functions, and the same technique applies when one wraps C/C++ code for Python. For a dynamic language like Python, a dynamic-static type translation layer needs to sit in between.