Data-Structures-in-the-Real-World

course 'Data Structures in the Real World' in freshman year, summer semester

PA2

SIMD
调整块大小
调整编译选项

case1 的测试结果 (7.31)：

2020011000@w3:~$ perf stat -e task-clock,cycles,instructions,cache-references,c
ache-misses,L1-dcache-loads,L1-dcache-load-misses,L1-icache-loads,L1-icache-loa
d-misses,branches,branch-misses -p 96751

 Performance counter stats for process id '96751':

          8,172.32 msec task-clock                #    0.817 CPUs utilized
    33,195,436,584      cycles                    #    4.062 GHz
    19,059,250,564      instructions              #    0.57  insn per cycle
     1,105,811,470      cache-references          #  135.312 M/sec
       362,929,433      cache-misses              #   32.820 % of all cache refs
     9,665,183,751      L1-dcache-loads           # 1182.673 M/sec
       497,489,637      L1-dcache-load-misses     #    5.15% of all L1-dcache hits
     8,097,317,073      L1-icache-loads           #  990.823 M/sec
        19,058,979      L1-icache-load-misses     #    0.24% of all L1-icache hits
     3,921,389,854      branches                  #  479.838 M/sec
       316,166,733      branch-misses             #    8.06% of all branches
      10.000844821 seconds time elapsed

2020011000@w3:~$

最终结果 (8.1)：5.1 4.0 4.9 2.0 5.4 1.1 5.4

助教的测试结果：5.69 3.69 5.06 2.0 5.61 1.52 5.96

8.2 测试结果：4.55 3.25 4.35 1.76 4.55 1.10 4.71

作业报告见 /codes/homework projects/pa2/report/report.pdf.

PA3

Cache line size: 64
L2 cache size: 512K

试试无锁的 long long 修改和有锁的 SIMD 修改哪个比较快

无锁的 SIMD 修改：lock cmpxchg16b （似乎需要手写内嵌汇编）

(project: other projects/concorrent programming 2)

每个线程 1e8 次 inc 操作，计数器不同

串行：<< 1s

NumThreads	atomic add (in the same block)	shared_mutex (in the same block)	atomic add	shared_mutex
1	0.56s	2.26s	0.55s	2.0s
2	1.7s	25s	0.56s	2.05s
4	3.4s	55s	0.59s	2.1s
8	8s	90s	0.61s	2.15s
16	21s	130s	0.62s	2.4s
32	32s	140s	0.66s	2.6s

（前两个程序中计数器的内存地址是相邻的，因此会对内存的 fetch 有影响）

看来锁是挺慢的；两种方法的 scalability 看起来差不多

Other projects

B Tree vs. std::set

$10^7$ 次查询

2020011000@w3:~/dsrw/other projects/b_tree_test$ pidof rbt_main
14844
2020011000@w3:~/dsrw/other projects/b_tree_test$ perf stat -p 14844 -e task-clock,cycles,instructions,cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses,L1-icache-loads,L1-icache-load-misses,branches,branch-misses

 Performance counter stats for process id '14844':

         10,069.68 msec task-clock                #    0.915 CPUs utilized
    41,658,655,712      cycles                    #    4.137 GHz      (49.99%)
     2,785,254,190      instructions              #    0.07  insn per cycle      (49.99%)
       679,708,309      cache-references          #   67.500 M/sec      (49.99%)
       285,332,547      cache-misses              #   41.979 % of all cache refs      (49.99%)
     1,564,298,610      L1-dcache-loads           #  155.347 M/sec      (50.00%)
       278,295,887      L1-dcache-load-misses     #   17.79% of all L1-dcache hits    (50.01%)
        21,004,188      L1-icache-loads           #    2.086 M/sec      (50.01%)
         1,137,718      L1-icache-load-misses     #    5.42% of all L1-icache hits    (50.01%)
       845,487,346      branches                  #   83.964 M/sec      (50.01%)
       131,509,984      branch-misses             #   15.55% of all branches      (50.00%)

      11.000841391 seconds time elapsed

2020011000@w3:~/dsrw/other projects/b_tree_test$ pidof bt_main
14850
2020011000@w3:~/dsrw/other projects/b_tree_test$ perf stat -p 14850 -e task-clock,cycles,instructions,cache-references,cache-misses,L1-dcache-loads,L1-dcache-load-misses,L1-icache-loads,L1-icache-load-misses,branches,branch-misses

 Performance counter stats for process id '14850':

          4,721.74 msec task-clock                #    0.674 CPUs utilized
    19,521,646,691      cycles                    #    4.134 GHz      (50.01%)
     2,850,202,004      instructions              #    0.15  insn per cycle      (50.06%)
     1,038,002,536      cache-references          #  219.835 M/sec      (50.06%)
       398,697,584      cache-misses              #   38.410 % of all cache refs      (50.06%)
     1,312,079,827      L1-dcache-loads           #  277.881 M/sec      (50.06%)
       438,963,900      L1-dcache-load-misses     #   33.46% of all L1-dcache hits    (49.99%)
        27,522,187      L1-icache-loads           #    5.829 M/sec      (49.94%)
         1,264,367      L1-icache-load-misses     #    4.59% of all L1-icache hits    (49.94%)
       837,967,913      branches                  #  177.470 M/sec      (49.94%)
       110,206,403      branch-misses             #   13.15% of all branches      (49.94%)

       7.000584882 seconds time elapsed

2020011000@w3:~/dsrw/other projects/b_tree_test$

在调整了块大小之后，当阶数 m=111 时能跑出 2.270s，比红黑树快了 4.435x.

 Performance counter stats for process id '15813':

          2,270.28 msec task-clock                #    0.757 CPUs utilized          
     9,469,385,284      cycles                    #    4.171 GHz                      (49.83%)
     9,102,713,936      instructions              #    0.96  insn per cycle           (49.83%)
       444,582,408      cache-references          #  195.827 M/sec                    (49.83%)
       142,199,531      cache-misses              #   31.985 % of all cache refs      (49.86%)
     1,536,226,932      L1-dcache-loads           #  676.669 M/sec                    (50.04%)
       218,731,554      L1-dcache-load-misses     #   14.24% of all L1-dcache hits    (50.17%)
        18,004,353      L1-icache-loads           #    7.930 M/sec                    (50.17%)
           476,000      L1-icache-load-misses     #    2.64% of all L1-icache hits    (50.17%)
     2,600,702,488      branches                  # 1145.544 M/sec                    (50.14%)
        50,667,378      branch-misses             #    1.95% of all branches          (49.96%)

       3.000377804 seconds time elapsed

详细结果见目录下的 results 文件夹.

并行计算-统计质数个数

统计 $[1, {10}^6]$ 内质数个数，切成 #threads 段分别统计.

线程数	运行时间(us)	效率
1	8748868	100%
2	5560359	78.67%
3	3912365	74.54%
4	3049737	71.72%
5	2498937	70.02%
6	2147586	67.90%
7	1945037	64.26%
8	1763587	62.01%
9	1561146	62.27%
10	1661819	52.65%

Name		Name	Last commit message	Last commit date
Latest commit History 83 Commits
homework projects		homework projects
other projects		other projects
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data-Structures-in-the-Real-World

PA2

PA3

Other projects

B Tree vs. std::set

并行计算-统计质数个数

About

Releases

Packages

Languages

ywwywwyww/Data-Structures-in-the-Real-World

Folders and files

Latest commit

History

Repository files navigation

Data-Structures-in-the-Real-World

PA2

PA3

Other projects

B Tree vs. std::set

并行计算-统计质数个数

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages