## Series 3 - Solution

### Exercise 1 (Theoretical roofline).

We consider a Fidis node (`--constraint E5v4`): Intel Xeon E5-2690 v4, Broadwell microarchitecture.

- 1. Memory bandwidth: 19.2 [GBytes/s]
- 2. Peak performance: 41.6 [Gflops/s] in double precision (DP)
  - (a) 2.6 [GHz]
  - (b) 2 Floating point ports
  - (c) 2 Operations per cycle (FMA)
  - (d) 256 bit vector size (4 double precision fp)

Ridge point: 41.6 / 19.2 = 2.17 [flops/Byte]
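
The peak figure is the product of the four factors (a) to (d), and the ridge point is the peak divided by the memory bandwidth. A minimal Python sketch of that arithmetic follows; the function and parameter names are illustrative and not part of the exercise.

```python
def peak_gflops(freq_ghz, fp_ports, ops_per_cycle, vector_lanes):
    """Theoretical peak of one core in Gflops/s."""
    return freq_ghz * fp_ports * ops_per_cycle * vector_lanes


def ridge_point(peak, bandwidth_gbytes):
    """Arithmetic intensity (flops/Byte) at which the two roofs meet."""
    return peak / bandwidth_gbytes


# Values taken from the list above.
peak = peak_gflops(freq_ghz=2.6, fp_ports=2, ops_per_cycle=2, vector_lanes=4)
ridge = ridge_point(peak, bandwidth_gbytes=19.2)

print(f"peak  = {peak:.1f} Gflops/s")
print(f"ridge = {ridge:.2f} flops/Byte")
```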

### Exercise 2 (Measured roofline).

- Stream : 16.3 [GBytes/s]
- Dgemm : 35.8 [Gflops/s]
- Ridge point : 2.2 [flops/Byte]
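
With the two measured roofs, the attainable performance of a kernel at a given arithmetic intensity is the smaller of the compute roof and the memory roof. The sketch below assumes the Stream and Dgemm figures are in GBytes/s and Gflops/s; the function name and the sample intensities 0.5 and 8.0 are arbitrary illustrations.

```python
def attainable_gflops(intensity, peak, bandwidth):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak, bandwidth * intensity)


STREAM_GBYTES = 16.3  # measured bandwidth (Stream), taken as GBytes/s
DGEMM_GFLOPS = 35.8   # measured peak (Dgemm), taken as Gflops/s

ridge = DGEMM_GFLOPS / STREAM_GBYTES

# Sample intensities below, at, and above the ridge point.
for intensity in (0.5, ridge, 8.0):
    bound = attainable_gflops(intensity, DGEMM_GFLOPS, STREAM_GBYTES)
    regime = "memory-bound" if intensity < ridge else "compute-bound"
    print(f"I = {intensity:.2f} flops/Byte -> {bound:.2f} Gflops/s ({regime})")
```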

### Exercise 3 (Jacobi stencil).

For 28 cores:

- 1. Memory bandwidth. Theoretical: 76.8 [GBytes/s], measured: 121.8 [GBytes/s]
- 2. Peak performance. Theoretical: 1164.8 [Gflops/s], measured: 646.2 [Gflops/s]
- 3. Ridge point. Theoretical: 15.1 [flops/Byte], measured: 5.3 [flops/Byte]
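
The theoretical 28-core peak equals 28 times the single-core peak of 41.6 [Gflops/s] from Exercise 1, and each ridge point is the corresponding peak divided by the corresponding bandwidth. The sketch below reproduces that arithmetic; the bandwidth values are taken as given, and the dictionary layout and names are illustrative only.

```python
CORES = 28
PEAK_PER_CORE = 41.6  # Gflops/s, single-core DP peak from Exercise 1

# Bandwidth in GBytes/s, peak in Gflops/s, as listed above.
theoretical = {"bandwidth": 76.8, "peak": CORES * PEAK_PER_CORE}
measured = {"bandwidth": 121.8, "peak": 646.2}


def ridge_point(roof):
    """Ridge point in flops/Byte for a dict with 'peak' and 'bandwidth'."""
    return roof["peak"] / roof["bandwidth"]


for name, roof in (("theoretical", theoretical), ("measured", measured)):
    print(f"{name}: peak = {roof['peak']:.1f} Gflops/s, "
          f"ridge = {ridge_point(roof):.2f} flops/Byte")

# Fraction of the theoretical peak reached by the measured value.
print(f"measured / theoretical peak = {measured['peak'] / theoretical['peak']:.2f}")
```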

### Exercise 4 (If you finish early...).

- 1. 4 [flops] / (5 $\cdot$ 8 [Bytes]) = 0.1 [flops/Byte]
- 2. 16.3 [GBytes/s] $\cdot$ 0.1 [flops/Byte] = 1.63 [Gflops/s]
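
At such a low arithmetic intensity the kernel sits on the memory roof, so its bound is simply bandwidth times intensity. The sketch below redoes both steps, assuming the 5 values moved per update are 8-byte doubles and that 16.3 is the measured Stream bandwidth in GBytes/s; the function and variable names are illustrative.

```python
BYTES_PER_DOUBLE = 8


def arithmetic_intensity(flops, values_moved, bytes_per_value=BYTES_PER_DOUBLE):
    """Flops per Byte of memory traffic for one kernel update."""
    return flops / (values_moved * bytes_per_value)


# Counts from item 1: 4 flops and 5 double-precision values per update.
intensity = arithmetic_intensity(flops=4, values_moved=5)

STREAM_GBYTES = 16.3  # assumed: measured Stream bandwidth in GBytes/s
bound = STREAM_GBYTES * intensity  # memory-bound performance in Gflops/s

print(f"intensity = {intensity:.2f} flops/Byte")
print(f"bound     = {bound:.2f} Gflops/s")
```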