C and C plus plus

C++11 features

Move constructor and move assignment operator

http://blog.smartbear.com/c-plus-plus/c11-tutorial-introducing-the-move-constructor-and-the-move-assignment-operator/

class and typename

Stanley Lippman’s answer: https://blogs.msdn.microsoft.com/slippman/2004/08/11/why-c-supports-both-class-and-typename-for-type-parameters/

rvalue etc.

The best article is: http://thbecker.net/articles/rvalue_references/section_01.html
Good for the history : https://blog.smartbear.com/development/c11-tutorial-explaining-the-ever-elusive-lvalues-and-rvalues/
Start from example programs: https://eli.thegreenplace.net/2011/12/15/understanding-lvalues-and-rvalues-in-c-and-c/
Good for 5 types of values: https://www.justsoftwaresolutions.co.uk/cplusplus/core-c++-lvalues-and-rvalues.html

Graph generated from:

digraph values {
  "C++11 values" -> lvalues, rvalues;
  lvalues -> glvalues;
  rvalues -> xvalues, prvalues;
  xvalues -> glvalues;
}

using dot:

$ dot -Tpng rvalue.dot > rvalue.png

and then view it using eog:

$ eog rvalue.png

./images/rvalue.png

smart pointers

https://www.codeproject.com/Articles/541067/Cplusplus-Smart-Pointers

addr2line vs. ASLR vs. pie

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=860394

PIE must be disabled; otherwise the addresses are position independent.

$ g++ -no-pie -g

Also, when ASLR is enabled, the load address of a PIE executable is randomized.

ASLR can be disabled globally with sysctl:

$ sudo sysctl -w kernel.randomize_va_space=0

Or with a boot argument:

norandmaps

Or (preferably) locally for the process you are interested in:

$ setarch `uname -m` -R /some/program

Often it is convenient to use the above command with /bin/bash so as to quickly create a non-randomized environment for yourself.

Even with ASLR disabled, a PIE executable is still loaded at a fixed address, but that address is still not the one we want.

You can view the loaded offsets using the dynamic loader:

$ LD_TRACE_PRELINKING=1 /some/program | grep '=>'

Why excessive virtual memory?

The typical symptom is that the virtual memory size listed by `top` is much larger than expected. For example, VM is 18G and RES is 4G; after subtracting the stack space used by each thread (which is malloc-ed), about 8G remains unexplained. On x64, glibc malloc may create up to eight arenas per core, each reserving 64M, so 16 cores * 8 * 64M = 8G. pmap and cat /proc/<pid>/smaps show many 65404K blocks; on one x64 box there were 124 of them.

See https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en and http://udrepper.livejournal.com/20948.html

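Optimizing integer division and modulo

http://www.hackersdelight.org/magic.htm Division can be optimized into a multiplication and a shift. The assembly generated by compilers contains similar optimizations. https://github.com/ridiculousfish/libdivide/blob/master/README.txt A library (header only) for optimizing division.

Intel's idiv instruction computes the quotient and the remainder in one go. If a division and a modulo are adjacent, the compiler may merge them into a single idiv.

Compilers usually fold constant expressions into constants, but an expression that uses a constant array may not be folded (possibly because the constant array is defined in another file).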

Stack unwind

http://blog.reverberate.org/2013/05/deep-wizardry-stack-unwinding.html Fairly easy to follow, but it lacks many x64 details.

A link given in the article above: https://blogs.oracle.com/eschrock/entry/debugging_on_amd64_part_one

operator()

#include <cassert>
#include <cstdio>
#include <iostream>

using namespace std;

class Foo
{
public:
  Foo() { cout << "Foo()" << endl; }
  ~Foo() { cout << "~Foo()" << endl; }
  void operator()() { cout << "operator()" << endl; }
  bool operator()(int x) { cout << "operator(x)" << endl;  return true; }
};

int main(void)
{
  int i = 0;

  for (i = 0; i < 100; i++)
  {
    Foo foo;
    cout << i << endl;
    foo();
    (void) foo(1);
  }
  return 0;
}

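Decorator pattern

https://sourcemaking.com/design_patterns/decorator https://sourcemaking.com/design_patterns/decorator/cpp/2

The decorator pattern adds extra features to an object while keeping its interface unchanged. It also avoids relying on inheritance to add those features (because different objects need different features added).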

Recurring template (CRTP)

http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Curiously_Recurring_Template_Pattern http://www.codeproject.com/Tips/537606/Cplusplus-Prefer-Curiously-Recurring-Template-Patt

Curiously Recurring Template Pattern: static polymorphism that avoids the vtable overhead. Specialize a base class using the derived class as a template argument.

template <class Derived>
struct base
{
    void interface()
    {
        // ...
        static_cast<Derived*>(this)->implementation();
        // ...
    }
 
    static void static_interface()
    {
        // ...
        Derived::static_implementation();
        // ...
    }
 
    // The default implementation may be (if exists) or should be (otherwise) 
    // overridden by inheriting in derived classes (see below)
    void implementation();
    static void static_implementation();
};
 
// The Curiously Recurring Template Pattern (CRTP)
struct derived_1 : base<derived_1>
{
    // This class uses base variant of implementation
    //void implementation();
 
    // ... and overrides static_implementation
    static void static_implementation();
};
 
struct derived_2 : base<derived_2>
{
    // This class overrides implementation
    void implementation();
 
    // ... and uses base variant of static_implementation
    //static void static_implementation();
};

By default, pthread_mutex_lock/trylock/unlock are not recursive (not reentrant)

pthread_mutex_init accepts attributes; pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; uses the default attributes.

From man page:

If the mutex type is PTHREAD_MUTEX_DEFAULT, attempting to recursively lock the mutex results
in undefined behavior. Attempting to unlock the mutex if it was not locked by the calling
thread results in undefined behavior. Attempting to unlock the mutex if it is not locked
results in undefined behavior.

Example:

// g++ -Wall -pthread -g pthread_mutext.cc
#include <cassert>
#include <iostream>
#include <pthread.h>

using namespace std;

int main()
{
    int rc1, rc2, rc3, rc4;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    rc1 = pthread_mutex_lock(&mutex);
    // default: it will be blocked at next line.               <== 1
    rc2 = pthread_mutex_lock(&mutex);
    rc3 = pthread_mutex_unlock(&mutex);
    // if the line after <== 1 is removed, it will return 0    <== 2
    // at next line, too.
    rc4 = pthread_mutex_unlock(&mutex);
    cout << rc1 << endl;
    cout << rc2 << endl;
    cout << rc3 << endl;
    cout << rc4 << endl;
    return 0;
}

$ ./a.out 
(hanging)

(gdb) bt
#0  0x00007f3471f7d59d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f3471f79179 in _L_lock_814 () from /lib64/libpthread.so.0
#2  0x00007f3471f79048 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000400a5b in main () at pthread_mutext.cc:13
(gdb) fr 3
#3  0x0000000000400a5b in main () at pthread_mutext.cc:13
13	    rc2 = pthread_mutex_lock(&mutex);

MySQL changes the default attributes, for example:

/* Define mutex types, see my_thr_init.c */
#define MY_MUTEX_INIT_SLOW   NULL
#ifdef PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP
extern pthread_mutexattr_t my_fast_mutexattr;
#define MY_MUTEX_INIT_FAST &my_fast_mutexattr
#else
#define MY_MUTEX_INIT_FAST   NULL
#endif
#ifdef PTHREAD_ERRORCHECK_MUTEX_INITIALIZER_NP
extern pthread_mutexattr_t my_errorcheck_mutexattr;
#define MY_MUTEX_INIT_ERRCHK &my_errorcheck_mutexattr
#else
#define MY_MUTEX_INIT_ERRCHK   NULL
#endif

and:

mysql_mutex_init(key_THR_LOCK_malloc, &THR_LOCK_malloc, MY_MUTEX_INIT_FAST);
mysql_mutex_init(key_THR_LOCK_open, &THR_LOCK_open, MY_MUTEX_INIT_FAST);
mysql_mutex_init(key_THR_LOCK_charset, &THR_LOCK_charset, MY_MUTEX_INIT_FAST);
mysql_mutex_init(key_THR_LOCK_threads, &THR_LOCK_threads, MY_MUTEX_INIT_FAST);

Return value of snprintf

On error it returns a negative number; on success it returns n < size; on truncation it returns n >= size.

From man page:

Return value
    Upon successful return, these functions return the number of characters printed (not
    including the trailing '\0' used to end output to strings).

    The functions snprintf() and vsnprintf() do not write more than size bytes (including the
    trailing '\0'). If the output was truncated due to this limit then the return value is the
    number of characters (not including the trailing '\0') which would have been written to the
    final string if enough space had been available. Thus, a return value of size or more means
    that the output was truncated. (See also below under NOTES.)

    If an output error is encountered, a negative value is returned.

static variables in header files

static const int v = 12345;

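Every .cpp file that includes the header gets its own instance, and each instance has a different address! The symbols are visible with nm or gdb. The correct approach is to define the variable directly in a .cpp file.
If the variable is defined but no code takes its address, -O2 optimizes the symbols away, and in the end they no longer show up in nm.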

class A
{
public:
    static const int AVar = 54321;
    void foo() { }
};

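A static const integral member of a class can be given its value inside the class in a header. Old versions of g++ do not support this.
Such a member has no address unless it is also defined outside the class, so &A::AVar or passing it to a reference parameter fails at link time: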

static.cc:(.text+0xcb): undefined reference to `A::AVar'

Do not call virtual functions in constructors and destructors

http://www.bkjia.com/cjjc/497412.html https://msdn.microsoft.com/en-us/magazine/cc163897.aspx

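During construction, the base class part is created first and the derived class part does not exist yet, so the base class cannot reach the derived class's virtual function. Symmetrically, during destruction the derived class part is destroyed first, so the base class cannot reach the derived class's virtual function either.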

Pure Virtual Function Called

http://www.artima.com/cppsource/pure_virtual.html

Design by Contract (assertion)

http://www.state-machine.com/resources/samek0308.pdf

State machine

http://www.state-machine.com/resources/articles.php

x64 calling convention

In the System V AMD64 ABI, the first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, while XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6 and XMM7 are used for floating point arguments. For system calls, R10 is used instead of RCX. As in the Microsoft x64 calling convention, additional arguments are passed on the stack and the return value is stored in RAX.

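Why are the holes that pahole prints for a class so large?

The large hole is the space occupied by the base class subobject; there is no real hole of that size.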

RDTSC

https://www.ccsl.carleton.ca/~jamuir/rdtscpm1.pdf http://www.mcs.anl.gov/~kazutomo/rdtsc.html

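Address of TLS variables

http://www.akkadia.org/drepper/tls.pdf (ELF format support in the ABI) https://www.technovelty.org/linux/debugging-__thead-variables-from-coredumps.html

It is unclear why p buf prints the first 4 bytes stored in the __thread variable. Only p &buf prints the actual address of the variable.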

Unaligned memory from the allocator makes a pthread_spinlock_t straddle a cache line, causing a deadlock

$ find . -name '*pthread*lock*' | grep spin | grep x86_64
./sysdeps/x86_64/nptl/pthread_spin_unlock.S
./sysdeps/x86_64/nptl/pthread_spin_lock.S
./sysdeps/x86_64/nptl/pthread_spin_trylock.S

http://cs61.seas.harvard.edu/cs61wiki/images/6/6d/Cs61-2013-l25-scribe1.pdf
lock incl(%eax) -- atomic increment!

8.1.4 Effects of a LOCK Operation on Internal Processor Caches
For the Intel486 and Pentium processors, the LOCK# signal is always asserted on the bus during a LOCK operation,
even if the area of memory being locked is cached in the processor.
For the P6 and more recent processor families, if the area of memory being locked during a LOCK operation is
cached in the processor that is performing the LOCK operation as write-back memory and is completely contained
in a cache line, the processor may not assert the LOCK# signal on the bus. Instead, it will modify the memory location
internally and allow it’s cache coherency mechanism to ensure that the operation is carried out atomically. This
operation is called “cache locking.” The cache coherency mechanism automatically prevents two or more processors
that have cached the same area of memory from simultaneously modifying data in that area.

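The value of the spin lock defaults to 1. lock decrements the value from 1 to 0, and unlock sets it directly from 0 back to 1. lock is a decl instruction with a lock prefix; unlock is a movl without a lock prefix.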

Output of $ objdump -S /lib64/libpthread-2.12.so:
0000003854c0c110 <pthread_spin_lock>:
  3854c0c110:   f0 ff 0f                lock decl (%rdi)                                  <===
  3854c0c113:   75 0b                   jne    3854c0c120 <pthread_spin_lock+0x10>
  3854c0c115:   31 c0                   xor    %eax,%eax
  3854c0c117:   c3                      retq
  3854c0c118:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  3854c0c11f:   00
  3854c0c120:   f3 90                   pause
  3854c0c122:   83 3f 00                cmpl   $0x0,(%rdi)
  3854c0c125:   7f e9                   jg     3854c0c110 <pthread_spin_lock>
  3854c0c127:   eb f7                   jmp    3854c0c120 <pthread_spin_lock+0x10>
  3854c0c129:   90                      nop
  3854c0c12a:   90                      nop
  3854c0c12b:   90                      nop
  3854c0c12c:   90                      nop
  3854c0c12d:   90                      nop
  3854c0c12e:   90                      nop
  3854c0c12f:   90                      nop

./sysdeps/x86_64/nptl/pthread_spin_lock.S
	.globl	pthread_spin_lock
	.type	pthread_spin_lock,@function
	.align	16
pthread_spin_lock:
1:	LOCK
	decl	0(%rdi)                                                                    <=== 
	jne	2f
	xor	%eax, %eax
	ret

	.align	16
2:	rep
	nop
	cmpl	$0, 0(%rdi)
	jg	1b
	jmp	2b
	.size	pthread_spin_lock,.-pthread_spin_lock

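The code of pthread_spin_unlock was not found in the objdump output. Was it optimized away somehow? Possibly not: the source below defines pthread_spin_init as an alias of pthread_spin_unlock, so the disassembly may simply be listed under that name.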

Platform-specific code: unlock stores 1 to the lock without a lock prefix

./sysdeps/x86_64/nptl/pthread_spin_unlock.S
	.globl	pthread_spin_unlock
	.type	pthread_spin_unlock,@function
	.align	16
pthread_spin_unlock:
	movl	$1, (%rdi)
	xorl	%eax, %eax
	retq
	.size	pthread_spin_unlock,.-pthread_spin_unlock

	/* The implementation of pthread_spin_init is identical.  */
	.globl	pthread_spin_init
pthread_spin_init = pthread_spin_unlock

Platform-independent code: ./nptl/pthread_spin_unlock.c

int
pthread_spin_unlock (pthread_spinlock_t *lock)
{
  atomic_full_barrier ();
  *lock = 0;
  return 0;
}

This is presumably only a compiler barrier. A CPU barrier would presumably need an mfence (see http://patchwork.ozlabs.org/patch/404751/)?

#define atomic_full_barrier() __asm ("" ::: "memory")

source code: sandbox/thread_local.cc

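The importance of alignment

The Intel manuals state the alignment requirements of the different instructions explicitly.
According to man malloc, the returned address is aligned; testing on 64-bit shows 16-byte alignment.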
http://www.alexonlinux.com/aligned-vs-unaligned-memory-access

LD_PRELOAD以及malloc的重载

http://elinux.org/images/b/b5/Elc2013_Kobayashi.pdf https://scaryreasoner.wordpress.com/2007/11/17/using-ld_preload-libraries-and-glibc-backtrace-function-for-debugging/

localtime_r is not signal-safe

http://forums.fedoraforum.org/showthread.php?t=187375 http://www.gnu.org/software/libc/manual/html_node/Nonreentrancy.html#Nonreentrancy

#include <signal.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define UNSAFE

void handler(int signum)
{
  char result[100];
  time_t now;
  struct tm time1;

  now = time(NULL);
  localtime_r(&now, &time1);
  strftime(result, 100, "%T", &time1);
  printf("At %s, user pressed Ctrl-C\n", result);
}

int main (void)
{
  time_t now;
  struct tm ltime;

  if (signal(SIGINT, handler) == SIG_IGN)
    signal(SIGINT, SIG_IGN);

  now = time(NULL);
  while(1) {
#ifdef UNSAFE
    localtime_r(&now, &ltime);
#endif
  }

  return 0;
}

gcc __attribute__((constructor))

5.24 Declaring Attributes of Functions
https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes

__STDC_LIMIT_MACROS, __STDC_CONSTANT_MACROS

http://stackoverflow.com/questions/986426/what-do-stdc-limit-macros-and-stdc-constant-macros-mean https://sourceware.org/bugzilla/show_bug.cgi?id=15366

Symbol versioning

https://blog.blahgeek.com/glibc-and-symbol-versioning/

Created by Wenliang Zhang.
