## CoMD

Classical Molecular Dynamics algorithms and workloads

**Links:**  
[Doxygen Documentation](http://exmatex.github.io/CoMD/doxygen-mpi/index.html)  
[github](https://github.com/exmatex/CoMD)  
[Lab Home Page](http://www.exmatex.org/comd.html)  

### Serial Run
(Haswell) Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz  
`divsd` latency: 10-20 cycles  
`mulsd` latency: 5 cycles  
`movsd` latency: 3 cycles  
L1 data cache: 32 kB, 8-way, 64 sets, 64 B lines, **4-cycle latency**, per core  
L2 cache: 256 kB, 8-way, 512 sets, 64 B lines, **12-cycle latency**, per core
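
The set counts above follow from capacity / (associativity * line size). A minimal C sketch of that arithmetic; `cache_sets` is a hypothetical helper name, not part of CoMD:

```c
#include <assert.h>

/* Sets in a set-associative cache: capacity / (ways * line size).
   cache_sets is a hypothetical helper used only for illustration. */
static unsigned cache_sets(unsigned capacity_bytes, unsigned ways,
                           unsigned line_bytes)
{
    assert(ways > 0 && line_bytes > 0);
    /* Each set holds `ways` lines of `line_bytes` bytes. */
    return capacity_bytes / (ways * line_bytes);
}
```

With the Haswell figures, `cache_sets(32 * 1024, 8, 64)` gives 64 and `cache_sets(256 * 1024, 8, 64)` gives 512, matching the listed set counts.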
  
CPUTIME: 8.94e+06  
94.2% of CPU time is in `ljForce(SimFlat* s)` in `ljForce.c`  
76.7% of total time is in the innermost loop at line 189, `for (int jOff=MAXATOMS*jBox,ij=0; ij<nJBox; ij++,jOff++)`

```c 
|169|for (int iBox=0; iBox<s->boxes->nLocalBoxes; iBox++)
       (...)
|175|  for (int jTmp=0; jTmp<nNbrBoxes; jTmp++)
         (...)
|185|    for (int iOff=iBox*MAXATOMS,ii=0; ii<nIBox; ii++,iOff++)
           (...)                       
           
        /* |  CPUTIME  | Instructions per Cycle | L1 Data Cache Miss Rate | */
        /* |-----------|------------------------|-------------------------| */
        /* |-  for (int jOff=MAXATOMS*jBox,ij=0; ij<nJBox; ij++,jOff++)  -| */
        /* | 76.7% CPU |    1.01 Ins / Cycle    | 0.6% L1 Data Cache Miss | */
        
|189|      for (int jOff=MAXATOMS*jBox,ij=0; ij<nJBox; ij++,jOff++)
|190|      {                                                        
|191|        real_t dr[3];
|192|        int jId = s->atoms->gid[jOff];  
|193|        if (jBox < s->boxes->nLocalBoxes && jId <= iId )      //  5.9% CPUTIME
|194|          continue; // don't double count local-local pairs.
|195|        real_t r2 = 0.0;
|196|        for (int m=0; m<3; m++)
|197|        {
|198|          dr[m] = s->atoms->r[iOff][m]-s->atoms->r[jOff][m];  //  6.4% CPUTIME

        /* |-------------------  r2+=dr[m]*dr[m];  -----------------------| */
        /* | 11.7% CPU |    1.38 Ins / Cycle    | 1.6% L1 Data Cache Miss | */
        
|199|          r2+=dr[m]*dr[m];                                     
|200|        }
|201|
|202|        if ( r2 > rCut2) continue;                            //  6.8% CPUTIME
|203|
|204|        // Important note:
|205|        // from this point on r actually refers to 1.0/r
|206|        r2 = 1.0/r2;

        /* |---------------  real_t r6 = s6 * (r2*r2*r2);  ---------------| */
        /* | 12.8% CPU |    0.68 Ins / Cycle    | 0.2% L1 Data Cache Miss | */
        /*                     NOTE: Dependency Chain                       */
        
|207|        real_t r6 = s6 * (r2*r2*r2);                           
|208|        real_t eLocal = r6 * (r6 - 1.0) - eShift;             //  5.4% CPUTIME
|209|        s->atoms->U[iOff] += 0.5*eLocal;                      //  6.6% CPUTIME
|210|        s->atoms->U[jOff] += 0.5*eLocal;
|211|
|212|        // calculate energy contribution based on whether
|213|        // the neighbor box is local or remote
|214|        if (jBox < s->boxes->nLocalBoxes)
|215|          ePot += eLocal;
|216|        else
|217|          ePot += 0.5 * eLocal;
|218|
|219|        // different formulation to avoid sqrt computation
|220|        real_t fr = - 4.0*epsilon*r6*r2*(12.0*r6 - 6.0);
|221|        for (int m=0; m<3; m++)
|222|        {
|223|          s->atoms->f[iOff][m] -= dr[m]*fr;
|224|          s->atoms->f[jOff][m] += dr[m]*fr;                         
|225|        }
|226|      }// loop over atoms in jBox 
|227|    } // loop over atoms in iBox
|228|  } // loop over neighbor boxes
|229|} // loop over local boxes in system
```
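
The dependency-chain note on line 207 can be made concrete. In the assembly listing at the end of these notes, the `vdivsd` for line 206 is followed by three dependent `vmulsd` instructions (`r2*r2`, then `*r2`, then `s6*`). The sketch below shows one possible reassociation; it is not the CoMD code. `s6*r2` and `r2*r2` do not depend on each other, so only two multiplies remain on the chain after the divide. `r6_as_written`, `r6_reassociated`, and `chain_latency` are hypothetical names, and the latency model only adds the `divsd` and `mulsd` latencies listed at the top, ignoring the rest of the loop. Because reassociation can change floating-point results, the compiler does not do this on its own without a flag such as `-ffast-math`.

```c
#include <assert.h>

/* Line 207 as written: s6 * (r2*r2*r2). After the divide on line 206,
   r2*r2, then (r2*r2)*r2, then s6*(...) must execute one after another. */
static double r6_as_written(double s6, double r2inv)
{
    return s6 * (r2inv * r2inv * r2inv);
}

/* Hypothetical reassociation: a and b do not depend on each other, so the
   chain after the divide is two multiplies deep instead of three. The result
   may differ in the last bits because FP multiplication is not associative. */
static double r6_reassociated(double s6, double r2inv)
{
    double a = s6 * r2inv;
    double b = r2inv * r2inv;
    return a * b;
}

/* Rough critical-path estimate in cycles: one divide followed by n_mul
   dependent multiplies. Ignores loads, the r2 sum, and the cutoff compare. */
static int chain_latency(int div_cycles, int mul_cycles, int n_mul)
{
    assert(div_cycles >= 0 && mul_cycles >= 0 && n_mul >= 0);
    return div_cycles + n_mul * mul_cycles;
}
```

With `divsd` at 10-20 cycles and `mulsd` at 5 cycles, `chain_latency` gives 25-35 cycles for the code as written (three multiplies) versus 20-30 cycles with two multiplies on the chain.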

 ---



Compiler: clang 5.0.1  
Flags: `-std=c99 -DDOUBLE -g -march=native -O3`

IPC at line 199: 3.21e+09 instructions / 2.32e+09 cycles = 1.38 instructions per cycle

### CPI
`ljForce() | Loop at 189`: 0.99 cycles per instruction  
`line 199`: 0.72 cycles per instruction  
`line 207`: 1.48 cycles per instruction
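
IPC and CPI are reciprocals, so these values can be cross-checked against the per-line IPC annotations in the listing. A small C sketch; the helper names are hypothetical:

```c
#include <assert.h>

/* IPC from raw counts: retired instructions divided by cycles. */
static double ipc_from_counts(double instructions, double cycles)
{
    assert(cycles > 0.0);
    return instructions / cycles;
}

/* CPI and IPC are reciprocals of each other. */
static double ipc_from_cpi(double cpi)
{
    assert(cpi > 0.0);
    return 1.0 / cpi;
}
```

`ipc_from_counts(3.21e9, 2.32e9)` is about 1.38, and `ipc_from_cpi` maps 0.99, 0.72, and 1.48 CPI to about 1.01, 1.39, and 0.68 IPC. The 1.39 versus 1.38 gap at line 199 comes from the CPI being rounded to two digits (2.32e+09 / 3.21e+09 is about 0.723).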

#### Issue Cycles
| Scope | Full-issue cycles | % cycles issuing max instructions | No-issue cycles | % cycles issuing no instructions | Total cycles |
|---|---|---|---|---|---|
| `ljForce()`, loop at line 189 | 4.51e+09 | 28.5% | 1.60e+07 | 0.1% | 1.58e+10 |
| Line 199 | 6.90e+08 | 29.7% | 2.00e+06 | 0.1% | 2.32e+09 |
| Line 207 | 1.67e+09 | 66.0% | 6.00e+06 | 0.2% | 2.53e+09 |
#### Retire Cycles
| Scope | Full-retire cycles | % cycles retiring max instructions | No-retire cycles | % cycles retiring no instructions | Total cycles |
|---|---|---|---|---|---|
| `ljForce()`, loop at line 189 | 4.11e+09 | 26.0% | 4.32e+09 | 27.3% | 1.58e+10 |
| Line 199 | 8.40e+08 | 36.2% | 5.56e+08 | 24.0% | 2.32e+09 |
| Line 207 | 6.58e+08 | 26.0% | 8.16e+08 | 32.3% | 2.53e+09 |
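
Each percentage in the Issue Cycles and Retire Cycles figures above is the cycle count for that state divided by total cycles. Cycles that are neither full nor empty issue or retire some, but not the maximum number, of instructions; the sketch below also computes that remainder. The function names are hypothetical.

```c
#include <assert.h>

/* Percentage of total cycles spent in one state (full issue, no retire, ...). */
static double pct_of_cycles(double state_cycles, double total_cycles)
{
    assert(total_cycles > 0.0);
    return 100.0 * state_cycles / total_cycles;
}

/* Remainder: cycles that issue (or retire) more than zero but fewer than
   the maximum number of instructions. */
static double pct_partial(double max_cycles, double none_cycles,
                          double total_cycles)
{
    return 100.0 - pct_of_cycles(max_cycles, total_cycles)
                 - pct_of_cycles(none_cycles, total_cycles);
}
```

For the loop at line 189, `pct_of_cycles(4.51e9, 1.58e10)` gives about 28.5% full issue, and `pct_partial(4.11e9, 4.32e9, 1.58e10)` puts roughly 46.6% of its cycles in partial retirement.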

### Memory
#### Data Cache 
| Scope | L1 data cache misses | L1 miss rate | L2 data cache misses | L2 miss rate | Load/store instructions |
|---|---|---|---|---|---|
| `ljForce()`, loop at line 189 | 4.60e+07 | 0.6% | 1.55e+07 | 33.7% | 7.12e+09 |
| Line 199 | 2.00e+07 | 1.6% | 6.72e+06 | 33.6% | 1.22e+09 |
| Line 207 | 2.00e+06 | 0.2% | 6.78e+05 | 33.9% | 1.24e+09 |
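
The two miss rates above use different denominators: the L1 rate is per load/store instruction, while dividing L2 misses by L1 data-cache misses (the accesses that reach L2) reproduces the reported L2 rate. Assuming those definitions, a C sketch with hypothetical helper names:

```c
#include <assert.h>

/* L1 data-cache miss rate: L1 misses per load/store instruction. */
static double l1_miss_rate(double l1_misses, double mem_instructions)
{
    assert(mem_instructions > 0.0);
    return l1_misses / mem_instructions;
}

/* L2 miss rate as reported above: L2 misses per L1 data-cache miss,
   i.e. per access that reached L2, not per load/store instruction. */
static double l2_miss_rate(double l2_misses, double l1_misses)
{
    assert(l1_misses > 0.0);
    return l2_misses / l1_misses;
}
```

For the loop at line 189, `l1_miss_rate(4.60e7, 7.12e9)` is about 0.0065 (0.6%) and `l2_miss_rate(1.55e7, 4.60e7)` is about 0.337 (33.7%). Because the L2 rate is conditioned on an L1 miss, only about 0.2% of the loop's load/store instructions miss in both levels (1.55e+07 / 7.12e+09).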

L2 miss rate at line 207: 6.78e+05 / 2.00e+06 = 0.339 (33.9%)



---

### Memory Density 
1.67e+10 load instructions  
2.88e+09 store instructions  
4.02e+10 total cycles  
(1.67e+10 + 0.288e+10) / 4.02e+10 = 0.487  
Memory density: 48.7%
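
The memory-density figure divides all load and store instructions by total cycles, as in the calculation above. A minimal sketch with a hypothetical helper name:

```c
#include <assert.h>

/* Memory density as computed above: (loads + stores) per cycle. */
static double memory_density(double loads, double stores, double cycles)
{
    assert(cycles > 0.0);
    return (loads + stores) / cycles;
}
```

`memory_density(1.67e10, 2.88e9, 4.02e10)` evaluates to about 0.487. Since the denominator is cycles rather than instructions, this is memory operations per cycle, not the fraction of instructions that touch memory.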

---  

`(8.94e+06 * 1e-6) * 72.6` = 649.04

----

```
| Num Of   |                    Ports pressure in cycles                         |      |
|  Uops    |  0  - DV    |  1   |  2  -  D    |  3  -  D    |  4   |  5   |  6   |  7   |
-----------------------------------------------------------------------------------------
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov rdx, qword ptr [rbx+0x18]
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov r9, qword ptr [rbx+0x20]
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov r13d, dword ptr [rdx+0xc]
|   1*     |             |      |             |             |      |      |      || cmp edi, r13d
|   0*F    |             |      |             |             |      |      |      || jnl 0x10
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov rdx, qword ptr [r9+0x8]
|   2      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || cmp dword ptr [rdx+rax*1], r10d
|   0*F    |             |      |             |             |      |      |      || jle 0x12c
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov rdx, qword ptr [r9+0x18]
|   1      |             |      |             |             |      | 1.0  |      || lea r8, ptr [rsi+rsi*2]
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || vmovsd xmm0, qword ptr [rdx+r8*8]
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vsubsd xmm1, xmm0, qword ptr [rdx+rbp*8-0x8]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm0, xmm1, xmm1
|   1      | 0.5         | 0.5  |             |             |      |      |      || vaddsd xmm2, xmm0, xmm11
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vmovupd xmm0, xmmword ptr [rdx+r8*8+0x8]
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vsubpd xmm0, xmm0, xmmword ptr [rdx+rbp*8]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulpd xmm3, xmm0, xmm0
|   1      | 0.5         | 0.5  |             |             |      |      |      || vaddsd xmm2, xmm2, xmm3
|   1      |             |      |             |             |      | 1.0  |      || vpermilpd xmm3, xmm3, 0x1
|   1      | 0.5         | 0.5  |             |             |      |      |      || vaddsd xmm2, xmm2, xmm3
|   1      | 1.0         |      |             |             |      |      |      || vucomisd xmm2, xmm7
|   1      |             |      |             |             |      |      | 1.0  || jnbe 0xe7
|   1      | 1.0     4.0 |      |             |             |      |      |      || vdivsd xmm3, xmm12, xmm2
|   1      |             | 1.0  |             |             |      |      |      || vmulsd xmm2, xmm3, xmm3
|   1      |             | 1.0  |             |             |      |      |      || vmulsd xmm2, xmm3, xmm2
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm4, xmm8, xmm2
|   1      | 0.5         | 0.5  |             |             |      |      |      || vaddsd xmm2, xmm4, xmm13
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm2, xmm4, xmm2
|   1      | 0.5         | 0.5  |             |             |      |      |      || vsubsd xmm2, xmm2, xmm9
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm5, xmm2, xmm14
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov rdx, qword ptr [r9+0x30]
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vaddsd xmm6, xmm5, qword ptr [rdx+rsi*8]
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      || vmovsd qword ptr [rdx+rsi*8], xmm6
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vaddsd xmm6, xmm5, qword ptr [rdx+rax*2]
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      || vmovsd qword ptr [rdx+rax*2], xmm6
|   1*     |             |      |             |             |      |      |      || cmp edi, r13d
|   0*F    |             |      |             |             |      |      |      || jl 0x6
|   1*     |             |      |             |             |      |      |      || vmovapd xmm2, xmm5
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm5, xmm10, xmm4
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm3, xmm3, xmm5
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm4, xmm4, xmm15
|   2^     | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vaddsd xmm4, xmm3, qword ptr [rip+0x8963]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm3, xmm3, xmm4
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || mov rdx, qword ptr [r9+0x28]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm1, xmm3, xmm1
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || vmovsd xmm4, qword ptr [rdx+r8*8]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vsubsd xmm4, xmm4, xmm1
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      || vmovsd qword ptr [rdx+r8*8], xmm4
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vaddsd xmm1, xmm1, qword ptr [rdx+rbp*8-0x8]
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      |      || vmovsd qword ptr [rdx+rbp*8-0x8], xmm1
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm1, xmm3, xmm0
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vmovsd xmm4, qword ptr [rdx+r8*8+0x8]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vsubsd xmm4, xmm4, xmm1
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      |      || vmovsd qword ptr [rdx+r8*8+0x8], xmm4
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vaddsd xmm1, xmm1, qword ptr [rdx+rbp*8]
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      || vmovsd qword ptr [rdx+rbp*8], xmm1
|   1      |             |      |             |             |      | 1.0  |      || vpermilpd xmm0, xmm0, 0x1
|   1      | 0.5         | 0.5  |             |             |      |      |      || vmulsd xmm0, xmm3, xmm0
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vmovsd xmm1, qword ptr [rdx+r8*8+0x10]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vsubsd xmm1, xmm1, xmm0
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      |      || vmovsd qword ptr [rdx+r8*8+0x10], xmm1
|   2      | 0.5         | 0.5  | 0.5     0.5 | 0.5     0.5 |      |      |      |      || vaddsd xmm0, xmm0, qword ptr [rdx+rbp*8+0x8]
|   2      |             |      | 0.5         | 0.5         | 1.0  |      |      |      || vmovsd qword ptr [rdx+rbp*8+0x8], xmm0
|   1      |             |      | 0.5     0.5 | 0.5     0.5 |      |      |      || vmovsd xmm0, qword ptr [rsp+0x8]
|   1      | 0.5         | 0.5  |             |             |      |      |      || vaddsd xmm0, xmm0, xmm2
|   2^     |             |      | 0.5         | 0.5         | 1.0  |      |      || vmovsd qword ptr [rsp+0x8], xmm0
Total Num Of Uops: 81

```