# Modern C++

1. Copy elision / return value optimization
    1. Make a tracker class for copy construction
    2. Show the copy elision in action
    3. Inspect the assembly
2. Move semantics and copy elision
    1. Forced move is a bad idea
3. Data concatenation
    1. Style 1: return `vector`
    2. Style 2: use output `vector`
    3. Style 3: use a class for both return and output argument
4. Variadic template
5. Perfect forwarding
6. Lambda expression
    1. Keep a lambda in a local variable
    2. Difference between `auto` and `std::function`
7. Closure
    1. Comments on functional style

# Copy elision / return value optimization

Copy elision is one of only two forms of optimization (the other is allocation elision and extension) that are allowed to change observable side effects.

Copy elision on a return statement is also called return value optimization (RVO) when the returned object is an unnamed temporary, or named return value optimization (NRVO) when it is a named local variable.

In the following example (`03_elision/01_copy.cpp`), even though no optimization flag is used, the copy is optimized out.

## Make a tracker class for copy construction

To show that the copy constructor is not called, we make a helper class:

```cpp
class IsCopied
{
public:
    static IsCopied & instance()
    {
        static IsCopied inst;
        return inst;
    }

    IsCopied & on() { m_status = true; return *this; }
    operator bool() const { return m_status; }
    ~IsCopied() = default;

private:
    IsCopied() : m_status(false) {}
    IsCopied(IsCopied const & ) = delete;
    IsCopied(IsCopied       &&) = delete;
    IsCopied & operator=(IsCopied const & ) = delete;
    IsCopied & operator=(IsCopied       &&) = delete;
    bool m_status;
};
```

If the copy constructor of the class of interest is called, it will set the status to true.

## Show the copy elision in action

In the testing program, we will check for the copy status:

```cpp
class Data
{
public:
    constexpr const static size_t NELEM = 1024*8;
    Data()
    {
        std::cout << "Data constructed @" << this << std::endl;
    }
    Data(Data const & other)
    {
        copy_from(other);
        std::cout << "Data copied to @" << this << " from @" << &other << std::endl;
    }
    ~Data()
    {
        std::cout << "Data destructed @" << this << std::endl;
    }
    void copy_from(Data const & other)
    {
        for (size_t it=0; it < NELEM; ++it)
        {
            m_buffer[it] = other.m_buffer[it];
        }
        IsCopied::instance().on();
    }
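    // Accessors used by manipulate_with_reference().
    size_t size() const { return NELEM; }
    int   operator[](size_t it) const { return m_buffer[it]; }
    int & operator[](size_t it)       { return m_buffer[it]; }
private: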
    // A lot of data that we don't want to reconstruct.
    int m_buffer[NELEM];
};

void manipulate_with_reference(Data & data, int value)
{
    std::cout << "Manipulate with reference: " << &data << std::endl;

    for (size_t it=0; it < data.size(); ++it)
    {
        data[it] = value + it;
    }
    // In a real consumer function we will do much more meaningful operations.

    // However, we cannot destruct an object passed in with a reference.
}

Data worker1()
{
    Data data;

    // Manipulate the Data object.
    manipulate_with_reference(data, 3);

    return data;
}

Data worker2()
{
    Data data = worker1();

    // Manipulate the Data object, again.
    manipulate_with_reference(data, 8);

    return data;
}

int main(int argc, char ** argv)
{
    std::cout
        << (bool(IsCopied::instance()) ? "Something" : "Nothing")
        << " is copied" << std::endl;
    Data data = worker2();
    std::cout
        << (bool(IsCopied::instance()) ? "Something" : "Nothing")
        << " is copied" << std::endl;
}
```

While running it, we will see that the copy constructor is not called:

In [1]:
!make -C 03_elision clean ; make -C 03_elision OPT= 01_copy
!03_elision/01_copy

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g      01_copy.cpp   -o 01_copy
Nothing is copied
Data constructed @0x7ffee0222f90
Manipulate with reference: 0x7ffee0222f90
Manipulate with reference: 0x7ffee0222f90
Nothing is copied
Data destructed @0x7ffee0222f90
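
The two flavors of elision can be told apart with a much smaller sketch. The `Counted` type and the two factory functions below are hypothetical names introduced only for this illustration; they are not part of `01_copy.cpp`. Returning an unnamed temporary is elided by rule since C++17, while returning a named local is only permitted to be elided, and falls back to a move (never a copy) when the compiler declines.

```cpp
#include <iostream>

// Hypothetical helper type that counts how often it is copied or moved.
struct Counted
{
    static int copies;
    static int moves;
    Counted() = default;
    Counted(Counted const &) { ++copies; }
    Counted(Counted &&) noexcept { ++moves; }
};
int Counted::copies = 0;
int Counted::moves = 0;

// RVO: the returned object is an unnamed temporary.  Since C++17 it is
// constructed directly in the caller's storage.
Counted make_unnamed() { return Counted(); }

// NRVO: the returned object is a named local.  Elision is allowed but not
// required; without it the move constructor is selected.
Counted make_named()
{
    Counted ret;
    return ret;
}

// Print the counters so that the two cases can be compared by eye.
void report(char const * label)
{
    std::cout << label << ": copies=" << Counted::copies
              << " moves=" << Counted::moves << std::endl;
}
```

Calling `make_unnamed()` and `make_named()` from `main()` and then `report()` shows the counters. The copy counter should stay at zero in both cases.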


## Inspect the assembly

Let us take a look at the generated code for where the object is allocated, using the following procedure:

1. Inspect the symbol table of the binary, to learn what functions we should read.
2. Take a look at `worker1()` and `worker2()`.  The copy constructor is not called there.
3. Check the entry point `main()` and `Data` constructor.

### Symbol table

In [2]:
!make -C 03_elision clean ; make -C 03_elision OPT= 01_copy
!r2 -Aqc "e scr.color=0 ; afl" 03_elision/01_copy

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g      01_copy.cpp   -o 01_copy
0x100000e50    6 314          entry0
0x100001cde    1 6            sym.imp.___chkstk_darwin
0x100001c60    1 6            fcn.100001c60
0x100001c6c    1 6            fcn.100001c6c
0x100001cd2    1 6            fcn.100001cd2
0x100000c60    1 29           sym.std::__1::basic_ostream_char_std::__1::char_traits_char__::operator___std::__1::basic_ostream_char_std::__1::char_traits_char_______std::__1::basic_ostream_char_std::__1::char_traits_char
0x100000dd0    3 66   -> 85   sym.worker2
0x100000d10    3 66   -> 85   sym.worker1
0x100000d90    1 27           sym.func.100000d90
0x100001060    1 79           sym.Data::Data
0x100001cb4    1 6            sym.std::__1::basic_ostream_char_std::__1::char_traits_char__::operator___voidconst
0x100000b70    4 160          sym.manipulate_with_reference_Data__int
0x100000d69    1 19           loc.100000d69
0x100000db0    1 27           sym.func.100000db0
0

### `worker1()` and `worker2()`

In [3]:
!r2 -Aqc "e scr.color=0 ; s sym.worker1 ; pdf" 03_elision/01_copy

            ;-- func.100000d10:
/ (fcn) sym.worker1 85
|   sym.worker1 (int32_t arg1);
|           ; var int32_t var_28h @ rbp-0x28
|           ; var int32_t var_20h @ rbp-0x20
|           ; var int32_t var_1h @ rbp-0x1
|           ; arg int32_t arg1 @ rdi
|           ; CALL XREF from sym.worker2 (0x100000de7)
|           0x100000d10      55             push rbp
|           0x100000d11      4889e5         mov rbp, rsp
|           0x100000d14      4883ec30       sub rsp, 0x30              ; '0'
|           0x100000d18      4889f8         mov rax, rdi               ; arg1
|           0x100000d1b      c645ff00       mov byte [var_1h], 0
|           0x100000d1f      48897de0       mov qword [var_20h], rdi   ; arg1
|           0x100000d23      488945d8       mov qword [var_28h], rax
|           0x100000d27      e864000000     call sym.func.100000d90
|           0x100000d2c      be03000000     mov esi, 3
|           0x100000d31      488b7de0       mov rdi, qword [var_20h]
|           0x10000

In [4]:
!r2 -Aqc "e scr.color=0 ; s sym.worker2 ; pdf" 03_elision/01_copy

            ;-- func.100000dd0:
/ (fcn) sym.worker2 85
|   sym.worker2 (int32_t arg1);
|           ; var int32_t var_28h @ rbp-0x28
|           ; var int32_t var_20h @ rbp-0x20
|           ; var int32_t var_1h @ rbp-0x1
|           ; arg int32_t arg1 @ rdi
|           ; CALL XREF from entry0 (0x100000ed2)
|           0x100000dd0      55             push rbp
|           0x100000dd1      4889e5         mov rbp, rsp
|           0x100000dd4      4883ec30       sub rsp, 0x30              ; '0'
|           0x100000dd8      4889f8         mov rax, rdi               ; arg1
|           0x100000ddb      c645ff00       mov byte [var_1h], 0
|           0x100000ddf      48897de0       mov qword [var_20h], rdi   ; arg1
|           0x100000de3      488945d8       mov qword [var_28h], rax
|           0x100000de7      e824ffffff     call sym.worker1
|           0x100000dec      be08000000     mov esi, 8
|           0x100000df1      488b7de0       mov rdi, qword [var_20h]
|           0x100000df5      e8

### `main()` and constructor

In [5]:
!r2 -Aqc "e scr.color=0 ; s entry0 ; pdf" 03_elision/01_copy

            ;-- main:
            ;-- _main:
            ;-- func.100000e50:
            ;-- rip:
/ (fcn) entry0 314
|   entry0 (int32_t arg1, int32_t arg2);
|           ; var int32_t var_8058h @ rbp-0x8058
|           ; var int32_t var_8050h @ rbp-0x8050
|           ; var char *var_8048h @ rbp-0x8048
|           ; var char *var_8039h @ rbp-0x8039
|           ; var int32_t var_8038h @ rbp-0x8038
|           ; var int32_t var_8030h @ rbp-0x8030
|           ; var int32_t var_8028h @ rbp-0x8028
|           ; var int32_t var_8010h @ rbp-0x8010
|           ; var int32_t var_10h @ rbp-0x10
|           ; var int32_t var_4h @ rbp-0x4
|           ; arg int32_t arg1 @ rdi
|           ; arg int32_t arg2 @ rsi
|           0x100000e50      55             push rbp
|           0x100000e51      4889e5         mov rbp, rsp
|           0x100000e54      b860800000     mov eax, 0x8060
|           0x100000e59      e8800e0000     call sym.imp.___chkstk_darwin
|           0x100000e5e      4829c4         sub 

In [6]:
!r2 -Aqc "e scr.color=0 ; s sym.Data::Data ; pdf" 03_elision/01_copy

            ;-- func.100001060:
/ (fcn) sym.Data::Data 79
|   sym.Data::Data (int32_t arg1);
|           ; var int32_t var_18h @ rbp-0x18
|           ; var int32_t var_10h @ rbp-0x10
|           ; var int32_t var_8h @ rbp-0x8
|           ; arg int32_t arg1 @ rdi
|           ; CALL XREF from sym.func.100000d90 (0x100000da0)
|           0x100001060      55             push rbp
|           0x100001061      4889e5         mov rbp, rsp
|           0x100001064      4883ec20       sub rsp, 0x20
|           0x100001068      488b05990f00.  mov rax, qword [reloc._ZNSt3__14coutE] ; [0x100002008:8]=0
|           0x10000106f      48897df8       mov qword [var_8h], rdi    ; arg1
|           0x100001073      488b7df8       mov rdi, qword [var_8h]
|           0x100001077      48897df0       mov qword [var_10h], rdi
|           0x10000107b      4889c7         mov rdi, rax
|           0x10000107e      488d35700e00.  lea rsi, str.Data_constructed ; 0x100001ef5 ; "Data constructed @"
|           0x1000010

# Move semantics and copy elision

Move semantics greatly helps us to avoid copying expensive resources.  To take advantage of that, our `Data` class should be changed to use dynamic allocation (`03_elision/02_move.cpp`):

```cpp
class Data
{

public:

    constexpr const static size_t NELEM = 1024*8;

    Data()
    {
        m_buffer = new int[NELEM];
        std::cout << "Data constructed @" << this << std::endl;
    }

    Data(Data const & other)
    {
        m_buffer = new int[NELEM];
        copy_from(other);
        std::cout << "Data copied to @" << this << " from @" << &other << std::endl;
    }

    Data & operator=(Data const & other)
    {
        if (nullptr == m_buffer) { m_buffer = new int[NELEM]; }
        copy_from(other);
        std::cout << "Data copy assigned to @" << this << " from @" << &other << std::endl;
        return *this;
    }

    Data(Data && other)
    {
        m_buffer = other.m_buffer;
        other.m_buffer = nullptr;
        std::cout << "Data moved to @" << this << " from @" << &other << std::endl;
        Status::instance().set_moved();
    }

    Data & operator=(Data && other)
    {
        if (m_buffer) { delete[] m_buffer; }
        m_buffer = other.m_buffer;
        other.m_buffer = nullptr;
        std::cout << "Data move assigned to @" << this << " from @" << &other << std::endl;
        Status::instance().set_moved();
        return *this;
    }

    ~Data()
    {
        if (m_buffer) { delete[] m_buffer; }
        std::cout << "Data destructed @" << this << std::endl;
    }

    // copy_from() and the m_buffer member are omitted from this excerpt.
};
```

## Forced move is a bad idea

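Although move semantics avoids copying the expensive buffer in the `Data` class, it cannot avoid constructing a second `Data` object: the move constructor still runs and the moved-from object still has to be destructed.  Copy elision (RVO and NRVO), on the other hand, avoids the second `Data` object altogether.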

```cpp
Data worker1()
{
    Data data;

    // Manipulate the Data object.
    manipulate_with_reference(data, 3);

    return data;
}

Data worker2()
{
    Data data = worker1();

    // Manipulate the Data object, again.
    manipulate_with_reference(data, 8);

#ifdef FORCEMOVE
    // Explicit move semantics destroys copy elision.
    return std::move(data);
#else
    return data;
#endif
}

int main(int argc, char ** argv)
{
    std::cout
        << "Status:"
        << (bool(Status::instance().is_copied()) ? " copied" : " uncopied")
        << (bool(Status::instance().is_moved()) ? " moved" : " unmoved")
        << std::endl;
    Data data = worker2();
    std::cout
        << "Status:"
        << (bool(Status::instance().is_copied()) ? " copied" : " uncopied")
        << (bool(Status::instance().is_moved()) ? " moved" : " unmoved")
        << std::endl;
}
```

## Compiler does copy elision

In [7]:
# Let compiler perform copy elision / NRVO.
!make -C 03_elision clean ; make -C 03_elision OPT= 02_move
!03_elision/02_move

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g      02_move.cpp   -o 02_move
Status: uncopied unmoved
Data constructed @0x7ffee4aebf88
Manipulate with reference: 0x7ffee4aebf88
Manipulate with reference: 0x7ffee4aebf88
Status: uncopied unmoved
Data destructed @0x7ffee4aebf88


## Forced move incurs more operations

In [8]:
# See what happens with forced move.
!make -C 03_elision clean ; make -C 03_elision FLAGS=-DFORCEMOVE OPT= 02_move
!03_elision/02_move

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g  -DFORCEMOVE    02_move.cpp   -o 02_move
Status: uncopied unmoved
Data constructed @0x7ffee6facee8
Manipulate with reference: 0x7ffee6facee8
Manipulate with reference: 0x7ffee6facee8
Data moved to @0x7ffee6facf88 from @0x7ffee6facee8
Data destructed @0x7ffee6facee8
Status: uncopied moved
Data destructed @0x7ffee6facf88
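
The same effect can be reproduced without the large buffer. The following is a minimal sketch with hypothetical names (`Tracked`, `by_name`, `by_forced_move`, `count_moves`) that only counts move constructions. It assumes nothing about `02_move.cpp` beyond the two return styles shown above.

```cpp
#include <utility>

// Hypothetical type that counts move constructions.
struct Tracked
{
    static int moves;
    Tracked() = default;
    Tracked(Tracked const &) = default;
    Tracked(Tracked &&) noexcept { ++moves; }
};
int Tracked::moves = 0;

Tracked by_name()
{
    Tracked ret;
    return ret; // a plain name: eligible for NRVO
}

Tracked by_forced_move()
{
    Tracked ret;
    // std::move(ret) is no longer the name of a local object, so the compiler
    // must call the move constructor.  GCC and Clang can flag this line with
    // -Wpessimizing-move.
    return std::move(ret);
}

// Reset the counter, call the factory, and return the number of moves.
int count_moves(Tracked (*factory)())
{
    Tracked::moves = 0;
    Tracked t = factory();
    (void)t;
    return Tracked::moves;
}
```

`count_moves(by_forced_move)` reports at least one move, while `count_moves(by_name)` reports none when the compiler applies NRVO.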


# Data concatenation

Because of copy elision, it is better for readability in C++ to write:

```cpp
std::vector<int> worker_return();
```

than

```cpp
void worker_argument(std::vector<int> & output /* output argument */);
```

The difference shows in the consumer code:

```cpp
// It reads clearly that the worker produces new result.
std::vector<int> result = worker_return();

// It takes a second to understand that the worker is using result as a buffer
// for output.
std::vector<int> result;
worker_argument(result);

/*
 * The result is pre-populated before sending to the worker.  From the
 * following lines we can't know how the worker will use result.
 *
 * By reading the worker signature we know that result may be used for output.
 * We can only be sure that result is used for output after reading the full
 * implementation of the worker.
 *
 * The worker may or may not expect the output argument to be pre-populated.
 * Either way, it has to use a runtime check to enforce its expectation.
 */
std::vector<int> result(100);
std::fill(result.begin(), result.end(), 7);
worker_argument(result);
```

The ambiguity is a productivity killer.  (Runtime performance is another story.)

## Style 1: return `vector`

The first style returns a vector from inner and appends it in outer.  It is easier to read and test.  The inner worker:

```cpp
std::vector<Data> inner1(size_t start, size_t len)
{
    std::cout << "** inner1 begins with " << start << std::endl;
    std::vector<Data> ret;
    for (size_t it=0; it < len; ++it)
    {
        Data data(start+it);
        ret.emplace_back(std::move(data));
    }
    return ret;
}
```

The outer worker:

```cpp
void outer1(size_t len)
{
    std::cout << "* outer1 begins" << std::endl;
    std::vector<Data> vec;
    for (size_t it=0; it < len; ++it)
    {
        std::cout << std::endl;
        std::cout << "* outer1 loop it=" << it << " begins" << std::endl;
        std::vector<Data> subvec = inner1(vec.size(), it+1);
        std::cout << "* outer1 obtained inner1 at " << vec.size() << std::endl;
        vec.insert(
            vec.end()
          , std::make_move_iterator(subvec.begin())
          , std::make_move_iterator(subvec.end())
        );
        std::cout << "* outer1 inserted subvec.size()=" << subvec.size() << std::endl;
    }
    std::cout << "* outer1 result.size() = " << vec.size() << std::endl << std::endl;
}
```
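
The `std::make_move_iterator` call is what turns the range insertion into a sequence of moves. As a minimal sketch, the hypothetical helpers below (`append_by_move` and `make_values` are names made up for this illustration) use `std::unique_ptr<int>`, which cannot be copied at all, so the code compiles only because the elements are moved.

```cpp
#include <iterator>
#include <memory>
#include <vector>

// Append every element of src to dst by moving, mirroring the vec.insert()
// call in outer1().
void append_by_move(std::vector<std::unique_ptr<int>> & dst,
                    std::vector<std::unique_ptr<int>> & src)
{
    dst.insert(
        dst.end()
      , std::make_move_iterator(src.begin())
      , std::make_move_iterator(src.end())
    );
}

// Build a vector holding the values start, start+1, ..., start+len-1.
std::vector<std::unique_ptr<int>> make_values(int start, int len)
{
    std::vector<std::unique_ptr<int>> ret;
    for (int it = 0; it < len; ++it)
    {
        ret.push_back(std::make_unique<int>(start + it));
    }
    return ret;
}
```

After `append_by_move(dst, src)`, `src` keeps its size, but every element in it is a moved-from (null) pointer. This is why `outer1` can still print `subvec.size()` after the insertion.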

In [9]:
# Style 1 result
!make -C 03_elision clean ; make -C 03_elision FLAGS="-DOTYPE=1" 03_accumulate
!03_elision/03_accumulate

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g -O3 -DOTYPE=1    03_accumulate.cpp   -o 03_accumulate
* outer1 begins

* outer1 loop it=0 begins
** inner1 begins with 0
Data #0 constructed @0x7ffee92cdec8
Data #0 moved to @0x7fed62405960 from @0x7ffee92cdec8
Data #0 destructed @0x7ffee92cdec8
* outer1 obtained inner1 at 0
Data #0 moved to @0x7fed62405970 from @0x7fed62405960
* outer1 inserted subvec.size()=1
Data #0 destructed @0x7fed62405960

* outer1 loop it=1 begins
** inner1 begins with 1
Data #1 constructed @0x7ffee92cdec8
Data #1 moved to @0x7fed62405960 from @0x7ffee92cdec8
Data #1 destructed @0x7ffee92cdec8
Data #2 constructed @0x7ffee92cdec8
Data #2 moved to @0x7fed62405990 from @0x7ffee92cdec8
Data #1 copied to @0x7fed62405980 from @0x7fed62405960
Data #1 destructed @0x7fed62405960
Data #2 destructed @0x7ffee92cdec8
* outer1 obtained inner1 at 1
Data #1 moved to @0x7fed624059b0 from @0x7fed62405980
Data #2 moved to @0x7fed624059c0 from @0x7fed62405990
Data

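The unwanted copies come from `std::vector` reallocation: when the vector grows, it moves its existing elements only if their move constructor is declared `noexcept`, and copies them otherwise.  To turn the copies into moves, we should mark the move constructor with `noexcept`: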

```cpp
Data(Data && other) noexcept
{
    m_serial = other.m_serial;
    m_buffer = other.m_buffer;
    other.m_buffer = nullptr;
    std::cout << "Data #" << m_serial << " moved to @" << this << " from @" << &other << std::endl;
}
```
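
The rule can be checked in isolation. The sketch below uses a hypothetical `Element` template whose boolean parameter switches `noexcept` on the move constructor, and a hypothetical `copies_during_growth` helper that counts the copy constructions caused by growing a vector. Neither name comes from `03_accumulate.cpp`.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical element type; NoThrowMove selects whether the move constructor
// is declared noexcept.
template < bool NoThrowMove >
struct Element
{
    static int copies;
    Element() = default;
    Element(Element const &) { ++copies; }
    Element(Element &&) noexcept(NoThrowMove) {}
};
template < bool NoThrowMove > int Element<NoThrowMove>::copies = 0;

// Grow a vector one element at a time so that it has to reallocate, and
// return how many copy constructions the growth caused.
template < bool NoThrowMove >
int copies_during_growth(std::size_t len)
{
    Element<NoThrowMove>::copies = 0;
    std::vector<Element<NoThrowMove>> vec;
    for (std::size_t it = 0; it < len; ++it)
    {
        vec.emplace_back(); // the new element is constructed in place
    }
    return Element<NoThrowMove>::copies;
}

// The trait that the standard library consults (through std::move_if_noexcept).
static_assert(std::is_nothrow_move_constructible<Element<true>>::value, "");
static_assert(!std::is_nothrow_move_constructible<Element<false>>::value, "");
```

We expect `copies_during_growth<true>(100)` to return zero and `copies_during_growth<false>(100)` to return a positive number, because only the `noexcept` variant is relocated by moving.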

In [10]:
# Unwanted copies are turned into moves.
!make -C 03_elision clean ; make -C 03_elision FLAGS="-DMOVENOEXCEPT -DOTYPE=1" 03_accumulate
!03_elision/03_accumulate

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g -O3 -DMOVENOEXCEPT -DOTYPE=1    03_accumulate.cpp   -o 03_accumulate
* outer1 begins

* outer1 loop it=0 begins
** inner1 begins with 0
Data #0 constructed @0x7ffeef204ec8
Data #0 moved to @0x7fc5eb405960 from @0x7ffeef204ec8
Data #0 destructed @0x7ffeef204ec8
* outer1 obtained inner1 at 0
Data #0 moved to @0x7fc5eb405970 from @0x7fc5eb405960
* outer1 inserted subvec.size()=1
Data #0 destructed @0x7fc5eb405960

* outer1 loop it=1 begins
** inner1 begins with 1
Data #1 constructed @0x7ffeef204ec8
Data #1 moved to @0x7fc5eb405960 from @0x7ffeef204ec8
Data #1 destructed @0x7ffeef204ec8
Data #2 constructed @0x7ffeef204ec8
Data #2 moved to @0x7fc5eb405990 from @0x7ffeef204ec8
Data #1 moved to @0x7fc5eb405980 from @0x7fc5eb405960
Data #1 destructed @0x7fc5eb405960
Data #2 destructed @0x7ffeef204ec8
* outer1 obtained inner1 at 1
Data #1 moved to @0x7fc5eb4059b0 from @0x7fc5eb405980
Data #2 moved to @0x7fc5eb4059c0 from @0x7fc

## Style 2: use output `vector`

The second style uses an output argument which is passed from outer to inner.  The inner worker:

```cpp
void inner2(size_t start, size_t len, std::vector<Data> & result /* for output */)
{
    std::cout << "** inner2 begins with " << start << std::endl;
    for (size_t it=0; it < len; ++it)
    {
        Data data(start+it);
        result.emplace_back(std::move(data));
    }
}
```

The outer worker:

```cpp
void outer2(size_t len)
{
    std::cout << "* outer2 begins" << std::endl;
    std::vector<Data> vec;
    for (size_t it=0; it < len; ++it)
    {
        std::cout << std::endl;
        std::cout << "* outer2 loop it=" << it << " begins" << std::endl;
        inner2(vec.size(), it+1, vec);
    }
    std::cout << "* outer2 result.size() = " << vec.size() << std::endl << std::endl;
}
```

The intermediate vector is gone, which saves quite a number of moves.  The price we pay is lower testability.

In [11]:
# The output argument results in fewer moves.
!make -C 03_elision clean ; make -C 03_elision FLAGS="-DMOVENOEXCEPT -DOTYPE=2" 03_accumulate
!03_elision/03_accumulate

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g -O3 -DMOVENOEXCEPT -DOTYPE=2    03_accumulate.cpp   -o 03_accumulate
* outer2 begins

* outer2 loop it=0 begins
** inner2 begins with 0
Data #0 constructed @0x7ffeeaf7eed8
Data #0 moved to @0x7fd4f5c05960 from @0x7ffeeaf7eed8
Data #0 destructed @0x7ffeeaf7eed8

* outer2 loop it=1 begins
** inner2 begins with 1
Data #1 constructed @0x7ffeeaf7eed8
Data #1 moved to @0x7fd4f5c05980 from @0x7ffeeaf7eed8
Data #0 moved to @0x7fd4f5c05970 from @0x7fd4f5c05960
Data #0 destructed @0x7fd4f5c05960
Data #1 destructed @0x7ffeeaf7eed8
Data #2 constructed @0x7ffeeaf7eed8
Data #2 moved to @0x7fd4f5c059b0 from @0x7ffeeaf7eed8
Data #1 moved to @0x7fd4f5c059a0 from @0x7fd4f5c05980
Data #0 moved to @0x7fd4f5c05990 from @0x7fd4f5c05970
Data #1 destructed @0x7fd4f5c05980
Data #0 destructed @0x7fd4f5c05970
Data #2 destructed @0x7ffeeaf7eed8

* outer2 loop it=2 begins
** inner2 begins with 3
Data #3 constructed @0x7ffeeaf7eed8
Data #3 moved to

## Style 3: use a class for both return and output argument

The third style uses a class so that it supports both returning a vector and taking the vector as an output argument.  The class is:

```cpp
struct Accumulator
{

public:
    // This can be called if consumers want the sub-operation one by one, and
    // make the code more testable. But it isn't really used in the example.
    std::vector<Data> inner1(size_t start, size_t len)
    {
        std::cout << "** Accumulator::inner1 begins with " << start << std::endl;
        std::vector<Data> ret;
        ret.reserve(len);
        inner2(start, len, ret);
        return ret;
    }

private:
    void inner2(size_t start, size_t len, std::vector<Data> & ret)
    {
        std::cout << "** Accumulator::inner2 begins with " << start << std::endl;
        for (size_t it=0; it < len; ++it)
        {
            Data data(start+it);
            ret.emplace_back(std::move(data));
        }
    }

public:
    // This is used when batch operation is in demand.
    void outer(size_t len)
    {
        std::cout << "* Accumulator::outer begins" << std::endl;
        result.reserve(len*(len+1)/2);
        for (size_t it=0; it < len; ++it)
        {
            std::cout << std::endl;
            std::cout << "* Accumulator::outer loop it=" << it << " begins" << std::endl;
            inner2(result.size(), it+1, result);
        }
        std::cout << "* Accumulator::outer result.size() = " << result.size() << std::endl << std::endl;
    }

public:
    std::vector<Data> result;

}; /* end struct Accumulator */
```

Although `Accumulator::outer` still calls the function `Accumulator::inner2` that takes an output argument, we also have the function `Accumulator::inner1` that wraps around `Accumulator::inner2` and makes it testable.

To further save unwanted movements, we pre-calculate the number of elements to be populated in the vector and reserve the space.
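
The effect of the reservation can be measured with a minimal sketch. `Item` and `moves_in_batches` are hypothetical names. Unlike `Accumulator::inner2`, the sketch constructs each element in place, so every move it counts is caused by reallocation alone.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical move-only element that counts move constructions.
struct Item
{
    static int moves;
    explicit Item(std::size_t serial) : m_serial(serial) {}
    Item(Item const &) = delete;
    Item(Item && other) noexcept : m_serial(other.m_serial) { ++moves; }
    std::size_t m_serial;
};
int Item::moves = 0;

// Mimic Accumulator::outer(): batch number it appends it+1 elements, so len
// batches produce 1 + 2 + ... + len = len*(len+1)/2 elements in total.
int moves_in_batches(std::size_t len, bool use_reserve)
{
    Item::moves = 0;
    std::vector<Item> result;
    if (use_reserve) { result.reserve(len * (len + 1) / 2); }
    for (std::size_t it = 0; it < len; ++it)
    {
        for (std::size_t jt = 0; jt <= it; ++jt)
        {
            result.emplace_back(result.size()); // constructed in place
        }
    }
    return Item::moves;
}
```

With the reservation the vector never reallocates, so `moves_in_batches(4, true)` returns zero, while `moves_in_batches(4, false)` returns a positive count.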

In [12]:
# The class-style implementation is also enhanced with reserve.
!make -C 03_elision clean ; make -C 03_elision FLAGS="-DMOVENOEXCEPT -DOTYPE=3" 03_accumulate
!03_elision/03_accumulate

rm -rf *.o *.dSYM/ 01_copy 02_move 03_accumulate
g++  -std=c++17 -g -O3 -DMOVENOEXCEPT -DOTYPE=3    03_accumulate.cpp   -o 03_accumulate
* Accumulator::outer begins

* Accumulator::outer loop it=0 begins
** Accumulator::inner2 begins with 0
Data #0 constructed @0x7ffee4baeec8
Data #0 moved to @0x7fbdaa405960 from @0x7ffee4baeec8
Data #0 destructed @0x7ffee4baeec8

* Accumulator::outer loop it=1 begins
** Accumulator::inner2 begins with 1
Data #1 constructed @0x7ffee4baeec8
Data #1 moved to @0x7fbdaa405970 from @0x7ffee4baeec8
Data #1 destructed @0x7ffee4baeec8
Data #2 constructed @0x7ffee4baeec8
Data #2 moved to @0x7fbdaa405980 from @0x7ffee4baeec8
Data #2 destructed @0x7ffee4baeec8

* Accumulator::outer loop it=2 begins
** Accumulator::inner2 begins with 3
Data #3 constructed @0x7ffee4baeec8
Data #3 moved to @0x7fbdaa405990 from @0x7ffee4baeec8
Data #3 destructed @0x7ffee4baeec8
Data #4 constructed @0x7ffee4baeec8
Data #4 moved to @0x7fbdaa4059a0 from @0x7ffee4baeec8
Data #4 destructe

The evolution of the three styles demonstrates how one may develop sophisticated code from a standalone helper to an optimized class library.

# Variadic template

A variadic template allows us to capture any number of template arguments in a function template.  Assume we have two constructors for `Data`:

```cpp
Data(size_t serial, ctor_passkey const &)
  : m_serial(serial)
{
    m_buffer = new int[NELEM];
    initialize(0);
    std::cout << "Data #" << m_serial << " constructed @" << this
              << "(serial=" << m_serial << ")" << std::endl;
}

Data(size_t serial, int base, ctor_passkey const &)
  : m_serial(serial+base)
{
    m_buffer = new int[NELEM];
    initialize(0);
    std::cout << "Data #" << m_serial << " constructed @" << this
              << "(serial=" << m_serial << ")"
              << "(base=" << base << ")" << std::endl;
}
```

We will need two factory methods for them:

```cpp
static std::shared_ptr<Data> create(size_t serial)
{
    return std::make_shared<Data>(serial, ctor_passkey());
}

static std::shared_ptr<Data> create(size_t serial, int base)
{
    return std::make_shared<Data>(serial, base, ctor_passkey());
}
```

Adding a factory function for every constructor is tedious, although forgetting one is not much of an issue because the compiler will complain.  Let's assume we forgot to add the second factory overload and see what happens.

In [13]:
!make -C 04_template clean ; make -C 04_template FLAGS=-DUSE_CREATE 01_factory

rm -rf *.o *.dSYM/ 01_factory
g++  -std=c++17 -g -O3 -m64 -DUSE_CREATE    01_factory.cpp   -o 01_factory
01_factory.cpp:142:37: error: too many arguments to function call, expected
      single argument 'serial', have 2 arguments
            data = Data::create(it, base);
                   ~~~~~~~~~~~~     ^~~~
01_factory.cpp:22:5: note: 'create' declared here
    static std::shared_ptr<Data> create(size_t serial)
    ^
1 error generated.
make: *** [01_factory] Error 1


A variadic template lets us fold the two overloads into one function template, and it also covers every new public constructor that may be added in the future.

```cpp
template < typename ... Args >
static std::shared_ptr<Data> make(Args && ... args)
{
    // Forget about the 'forward' for now. It will be discussed later.
    return std::make_shared<Data>(std::forward<Args>(args) ..., ctor_passkey());
}
```
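
To see the factory in a complete, compilable form, here is a minimal sketch built around a hypothetical `Widget` class. The definition of `ctor_passkey` is not shown in these notes, so the private empty struct below is an assumption about how such a passkey may look, not the definition used in `01_factory.cpp`.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

class Widget
{
private:
    // Assumed passkey: code outside Widget cannot name this type, so the
    // public constructors below are meant to be reached through make() only.
    struct ctor_passkey {};

public:
    Widget(std::size_t serial, ctor_passkey const &) : m_serial(serial) {}
    Widget(std::size_t serial, int base, ctor_passkey const &) : m_serial(serial + base) {}

    // One variadic factory covers every constructor that ends with a passkey.
    template < typename ... Args >
    static std::shared_ptr<Widget> make(Args && ... args)
    {
        return std::make_shared<Widget>(std::forward<Args>(args) ..., ctor_passkey());
    }

    std::size_t serial() const { return m_serial; }

private:
    std::size_t m_serial;
};
```

`Widget::make(1)` selects the first constructor and `Widget::make(1, 2)` selects the second one. A call whose arguments match no constructor is rejected at compile time.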

Run the following code:

```cpp
void outer1(size_t len)
{
    std::cout << "* outer1 begins" << std::endl;
    std::vector<std::shared_ptr<Data>> vec;
    for (size_t it=0; it < len; ++it)
    {
        std::cout << std::endl;
        std::cout << "* outer1 loop it=" << it << " begins" << std::endl;
        std::vector<std::shared_ptr<Data>> subvec = inner1(vec.size(), it+1);
        std::cout << "* outer1 obtained inner1 at " << vec.size() << std::endl;
        vec.insert(
            vec.end()
          , std::make_move_iterator(subvec.begin())
          , std::make_move_iterator(subvec.end())
        );
        std::cout << "* outer1 inserted subvec.size()=" << subvec.size() << std::endl;
    }
    std::cout << "* outer1 result.size() = " << vec.size() << std::endl << std::endl;

    std::cout << "* outer1 end" << std::endl << std::endl;
}

std::vector<std::shared_ptr<Data>> inner1(size_t base, size_t len)
{
    std::cout << "** inner1 begins with " << base << std::endl;
    std::vector<std::shared_ptr<Data>> ret;
    for (size_t it=0; it < len; ++it)
    {
        std::shared_ptr<Data> data;
        if (0 == base)
        {
            data = Data::make(it);
        }
        else
        {
            data = Data::make(it, base);
        }
        ret.emplace_back(data);
    }
    return ret;
}
```

In [14]:
!make -C 04_template clean ; make -C 04_template 01_factory
!04_template/01_factory

rm -rf *.o *.dSYM/ 01_factory
g++  -std=c++17 -g -O3 -m64     01_factory.cpp   -o 01_factory
* outer1 begins

* outer1 loop it=0 begins
** inner1 begins with 0
Data #0 constructed @0x7fd0b7405978(serial=0)
* outer1 obtained inner1 at 0
* outer1 inserted subvec.size()=1

* outer1 loop it=1 begins
** inner1 begins with 1
Data #1 constructed @0x7fd0b74059d8(serial=1)(base=1)
Data #2 constructed @0x7fd0b7405a18(serial=2)(base=1)
* outer1 obtained inner1 at 1
* outer1 inserted subvec.size()=2

* outer1 loop it=2 begins
** inner1 begins with 3
Data #3 constructed @0x7fd0b7405aa8(serial=3)(base=3)
Data #4 constructed @0x7fd0b7405ae8(serial=4)(base=3)
Data #5 constructed @0x7fd0b7405b28(serial=5)(base=3)
* outer1 obtained inner1 at 3
* outer1 inserted subvec.size()=3
* outer1 result.size() = 6

* outer1 end

Data #5 destructed @0x7fd0b7405b28
Data #4 destructed @0x7fd0b7405ae8
Data #3 destructed @0x7fd0b7405aa8
Data #2 destructed @0x7fd0b7405a18
Data #1 destructed @0x7fd0b74059d8
Data #0 destr

# Perfect forwarding

In the previous section we used `std::forward`, which enables perfect forwarding:

```cpp
template < typename ... Args >
static std::shared_ptr<Data> make(Args && ... args)
{
    return std::make_shared<Data>(std::forward<Args>(args) ..., ctor_passkey());
}
```

Although the template is named `forward`, it doesn't forward anything.  Like `std::move`, it serves as a cast to rvalue reference.  The difference is that:

1. `std::move` unconditionally casts the input to rvalue reference.
2. `std::forward` casts to rvalue reference only when the argument was passed in as an rvalue.

When we write `Data &&`, it is an rvalue reference.  When we write `T &&` with `T` being a deduced template parameter, it can bind to either an lvalue or an rvalue, so it is also called a universal reference (the standard calls it a forwarding reference).  The rule of thumb is that when `T` is a deduced type (`auto &&` falls into this category too), `T &&` is a universal reference rather than a strict rvalue reference.
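
The difference between the two casts can be made visible with a pair of overloads that report what they receive. All names in the following sketch (`sink`, `relay_forward`, `relay_plain`, `relay_move`) are hypothetical and exist only for this illustration.

```cpp
#include <string>
#include <utility>

// Two overloads that report which kind of reference they received.
inline char const * sink(std::string const &) { return "lvalue"; }
inline char const * sink(std::string &&) { return "rvalue"; }

// T && is a universal reference here because T is deduced.
template < typename T >
char const * relay_forward(T && arg)
{
    return sink(std::forward<T>(arg)); // keeps the caller's value category
}

template < typename T >
char const * relay_plain(T && arg)
{
    return sink(arg); // a named parameter is always an lvalue
}

template < typename T >
char const * relay_move(T && arg)
{
    return sink(std::move(arg)); // always an rvalue, even for a caller's lvalue
}
```

Given `std::string s`, `relay_forward(s)` reaches the lvalue overload and `relay_forward(std::string("text"))` reaches the rvalue overload. `relay_plain` always reaches the lvalue overload, and `relay_move` always reaches the rvalue overload.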

So `std::forward<Args>(args)` preserves the reference type of the arguments, and the pattern is called perfect forwarding.  Because the arguments of the `Data` constructors were all fundamental types, it didn't matter whether or not we used perfect forwarding.  To demonstrate how it works, we add two wrappers:

```cpp
// Proxy to copy and move constructor.
Data(Data const &  other, ctor_passkey const &) : Data(std::forward<Data const &>(other)) {}
Data(Data       && other, ctor_passkey const &) : Data(std::forward<Data &&>(other)) {}
```

And we use a slightly different `outer`:

```cpp
void outer1(size_t len)
{
    std::cout << "* outer1 begins" << std::endl;
    std::vector<std::shared_ptr<Data>> vec;
    for (size_t it=0; it < len; ++it)
    {
        std::cout << std::endl;
        std::cout << "* outer1 loop it=" << it << " begins" << std::endl;
        std::vector<std::shared_ptr<Data>> subvec = inner1(vec.size(), it+1);
        std::cout << "* outer1 obtained inner1 at " << vec.size() << std::endl;
        vec.insert(
            vec.end()
          , std::make_move_iterator(subvec.begin())
          , std::make_move_iterator(subvec.end())
        );
        std::cout << "* outer1 inserted subvec.size()=" << subvec.size() << std::endl;
    }
    std::cout << "* outer1 result.size() = " << vec.size() << std::endl << std::endl;

    // Exercise the perfect forwarding.
    vec.emplace_back(Data::make(*vec[0]));
    vec.emplace_back(Data::make(std::move(*vec[1])));

    std::cout << "* outer1 end" << std::endl << std::endl;
}
```

In [15]:
# Perfect forwarding dispatches to the correct constructors.
!make -C 04_template clean ; make -C 04_template FLAGS=-DSHOW_PERFECT_FORWARD 01_factory
!04_template/01_factory

rm -rf *.o *.dSYM/ 01_factory
g++  -std=c++17 -g -O3 -m64 -DSHOW_PERFECT_FORWARD    01_factory.cpp   -o 01_factory
* outer1 begins

* outer1 loop it=0 begins
** inner1 begins with 0
Data #0 constructed @0x7f9e60c05978(serial=0)
* outer1 obtained inner1 at 0
* outer1 inserted subvec.size()=1

* outer1 loop it=1 begins
** inner1 begins with 1
Data #1 constructed @0x7f9e60c059d8(serial=1)(base=1)
Data #2 constructed @0x7f9e60c05a18(serial=2)(base=1)
* outer1 obtained inner1 at 1
* outer1 inserted subvec.size()=2

* outer1 loop it=2 begins
** inner1 begins with 3
Data #3 constructed @0x7f9e60c05aa8(serial=3)(base=3)
Data #4 constructed @0x7f9e60c05ae8(serial=4)(base=3)
Data #5 constructed @0x7f9e60c05b28(serial=5)(base=3)
* outer1 obtained inner1 at 3
* outer1 inserted subvec.size()=3
* outer1 result.size() = 6

Data #0 copied to @0x7f9e60c05b68 from @0x7f9e60c05978
Data #1 moved to @0x7f9e60c05a58 from @0x7f9e60c059d8
* outer1 end

Data #1 destructed @0x7f9e60c05a58
Data #0 destructed @0x

# Lambda expression

A C++ lambda expression is a shorthand for writing an anonymous function.  The syntax (when no variable is captured) is:

```cpp
[] (/* arguments */) { /* body */ }
```

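It works essentially like a functor, i.e., an object of a class that defines `operator()`.  The two calls to `std::count_if` in the following example give the same result: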

```cpp
struct Functor
{
    bool operator()(int v)
    {
        return 0 == v % 23;
    }
}; /* end struct Functor */

int main(int argc, char ** argv)
{
    std::vector<int> data(63712);
    for (size_t i=0 ; i<data.size(); ++i) { data[i] = i;}

    std::cout
        << "Number divisible by 23 (count by functor): "
        << std::count_if(data.begin(), data.end(), Functor())
        << std::endl;

    std::cout
        << "Number divisible by 23 (count by lambda): "
        << std::count_if(data.begin(), data.end(), [](int v){ return 0 == v%23; })
        << std::endl;

    return 0;
}
```

In [16]:
!make -C 05_lambda clean ; make -C 05_lambda 01_lambda
!05_lambda/01_lambda

rm -rf *.o *.dSYM/ 01_lambda 02_stored 03_closure
g++  -std=c++17 -g -O3 -m64     01_lambda.cpp   -o 01_lambda
Number divisible by 23 (count by functor): 2771
Number divisible by 23 (count by lambda): 2771


## Keep a lambda in a local variable

A lambda is an anonymous function, but we can give it a 'name' by assigning it to a variable.  There are two choices for the type of the variable: `auto` or `std::function`.

```cpp
int main(int argc, char ** argv)
{
    std::vector<int> data(63712);
    for (size_t i=0 ; i<data.size(); ++i) { data[i] = i;}

    std::cout
        << "Number divisible by 23 (count by lambda inline): "
        << std::count_if(data.begin(), data.end(), [](int v){ return 0 == v%23; })
        << std::endl;

    auto condition = [](int v){ return 0 == v%23; };

    std::cout
        << "Number divisible by 23 (count by lambda in auto): "
        << std::count_if(data.begin(), data.end(), condition)
        << std::endl;

    std::function<bool (int)> condition_function = [](int v){ return 0 == v%23; };

    std::cout
        << "Number divisible by 23 (count by lambda in std::function): "
        << std::count_if(data.begin(), data.end(), condition_function)
        << std::endl;

    return 0;
}
```

In [17]:
!make -C 05_lambda clean ; make -C 05_lambda 02_stored
!05_lambda/02_stored

rm -rf *.o *.dSYM/ 01_lambda 02_stored 03_closure
g++  -std=c++17 -g -O3 -m64     02_stored.cpp   -o 02_stored
Number divisible by 23 (count by lambda inline): 2771
Number divisible by 23 (count by lambda in auto): 2771
Number divisible by 23 (count by lambda in std::function): 2771


## Difference between `auto` and `std::function`

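Although both `auto` and `std::function` can hold a lambda, the two ways are not the same.  A lambda expression creates an object of a unique, unnamed class type that works like a functor, and `auto` deduces exactly that type.  A `std::function` is a type-erased wrapper: it is more versatile, but it takes more memory and usually adds an indirection to each call.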

A `std::function` can hold the following kinds of targets (callables): free functions, member functions, functors, lambda expressions, and bind expressions.

```cpp
std::cout
    << std::endl
    << "The differences between lambda and std::function"
    << std::endl;
std::cout
    << "type name of lambda: "
    << typeid(condition).name() << std::endl;
std::cout
    << "type name of std::function: "
    << typeid(condition_function).name() << std::endl;

std::cout
    << "size of lambda: "
    << sizeof(condition) << std::endl;
std::cout
    << "size of std::function: "
    << sizeof(condition_function) << std::endl;
```

In [18]:
!make -C 05_lambda clean ; make -C 05_lambda FLAGS=-DSHOW_DIFF 02_stored
!05_lambda/02_stored

rm -rf *.o *.dSYM/ 01_lambda 02_stored 03_closure
g++  -std=c++17 -g -O3 -m64 -DSHOW_DIFF    02_stored.cpp   -o 02_stored
Number divisible by 23 (count by lambda inline): 2771
Number divisible by 23 (count by lambda in auto): 2771
Number divisible by 23 (count by lambda in std::function): 2771

The differences between lambda and std::function
type name of lambda: Z4mainE3$_1
type name of std::function: NSt3__18functionIFbiEEE
size of lambda: 1
size of std::function: 48


In [19]:
!c++filt "Z4mainE3$_1"

Z4mainE3


In [20]:
!c++filt NSt3__18functionIFbiEEE

std::__1::function<bool (int)>


# Closure

So far our lambda expressions have not captured any local variables.  When a lambda expression does, we call it a closure.

We must tell the compiler how the lambda expression captures the variables it uses.  Otherwise the compilation fails.

```cpp
int main(int argc, char ** argv)
{
    std::vector<int> data(63712);
    for (size_t i=0 ; i<data.size(); ++i) { data[i] = i;}

    int divisor = 23;

#if WRONG_CAPTURE
    std::cout
        << "Count (wrong capture): "
        << std::count_if(data.begin(), data.end(), [](int v){ return 0 == v%divisor; })
        << " (divisor: " << divisor << ")"
        << std::endl;
#endif

    return 0;
}
```

In [21]:
!make -C 05_lambda clean ; make -C 05_lambda FLAGS=-DWRONG_CAPTURE 03_closure

rm -rf *.o *.dSYM/ 01_lambda 02_stored 03_closure
g++  -std=c++17 -g -O3 -m64 -DWRONG_CAPTURE    03_closure.cpp   -o 03_closure
03_closure.cpp:16:77: error: variable 'divisor' cannot be implicitly captured in
      a lambda with no capture-default specified
  ...data.end(), [](int v){ return 0 == v%divisor; })
                                          ^
03_closure.cpp:11:9: note: 'divisor' declared here
    int divisor = 23;
        ^
03_closure.cpp:16:52: note: lambda expression begins here
        << std::count_if(data.begin(), data.end(), [](int v){ return 0 =...
                                                   ^
1 error generated.
make: *** [03_closure] Error 1


We may explicitly tell the compiler that we want `divisor` to be captured by the lambda expression by value:

```cpp
int divisor = 23;

std::cout
    << "Count (lambda explicitly capture by value): "
    << std::count_if(data.begin(), data.end(), [divisor](int v){ return 0 == v%divisor; })
    << " (divisor: " << divisor << ")"
    << std::endl;
```

Use `=` to implicitly capture by value:

```cpp
std::cout
    << "Count (lambda implicitly capture by value): "
    << std::count_if(data.begin(), data.end(), [=](int v){ return 0 == v%divisor; })
    << " (divisor: " << divisor << ")"
    << std::endl;
```

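Use `&` to capture by reference.  In the following example the lambda assigns to `divisor`, and the change is visible outside the lambda: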

```cpp
std::cout
    << "Count (lambda explicitly capture by reference): "
    << std::count_if(data.begin(), data.end(), [&divisor](int v){ divisor = 10; return 0 == v%divisor; })
    << " (divisor: " << divisor << ")"
    << std::endl;
```

A standalone `&` in the capture list (`[&]`) makes capture by reference the default for every local variable used in the lambda body.

The execution results:

In [22]:
!make -C 05_lambda clean ; make -C 05_lambda 03_closure
!05_lambda/03_closure

rm -rf *.o *.dSYM/ 01_lambda 02_stored 03_closure
g++  -std=c++17 -g -O3 -m64     03_closure.cpp   -o 03_closure
Count (lambda explicitly capture by value): 2771 (divisor: 23)
Count (lambda implicitly capture by value): 2771 (divisor: 23)
Count (lambda explicitly capture by reference): 6372 (divisor: 10)


## Comments on functional style

Lambda expressions and closures enable a functional style of programming.  As shown in the `std::count_if` example, they are a convenient tool for reducing the lines of code.  The functional style generally makes the code look cleaner and easier to maintain.  That buys us time to do more important things or to optimize performance hotspots.

But there are times when we cannot entrust the optimization to the compiler.  Lambda expressions are not the easiest place to add intrinsics or inline assembly.

When working on a container object equipped with a proper iterator interface, I go with the functional style.  The lambda expression may help avoid expensive intermediate buffers.  It works well at least for the initial prototype.

# Exercises

1. Measure and compare the performance of using an output vector versus returning a new vector.

# References

1. [Copy elision](https://en.cppreference.com/w/cpp/language/copy_elision) at cppreference.com.
1. [Lambda expressions](https://en.cppreference.com/w/cpp/language/lambda) at cppreference.com.
1. [C++ Lambdas Under The Hood](https://web.mst.edu/~nmjxv3/articles/lambdas.html)