Formulas:

***FLOPs***

***IOPs***
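Assuming the usual peak-throughput model, in which every core issues the chosen vector instruction at its maximum rate on every cycle, the two formulas take the following form. Here $N_{\text{cores}}$ and $f_{\text{clock}}$ are notation introduced for this sketch.

$$\text{FLOPs/s} = N_{\text{cores}} \times f_{\text{clock}} \times \frac{\text{FMA instructions}}{\text{cycle}} \times \frac{\text{FLOPs}}{\text{FMA instruction}}$$

$$\text{IOPs/s} = N_{\text{cores}} \times f_{\text{clock}} \times \frac{\text{PADDB instructions}}{\text{cycle}} \times \frac{\text{integer ops}}{\text{PADDB instruction}}$$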

Use the FMA (fused multiply-add) instruction, operating on AVX registers.

Instructions per cycle: this number comes from the reciprocal (inverse) throughput column, which reads 0.5 cycles per instruction. Inverting it gives 1 / 0.5 = 2, i.e., 2 FMA instructions per cycle.

Operations per instruction: AVX2 registers are 256 bits wide and single-precision floats are 32 bits, so each register holds 8 floats. An FMA performs both a multiply and an add on all 8 floats, so one instruction amounts to 8 + 8 = 16 floating-point operations.
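The two FMA factors multiply to a per-core figure. The following minimal Python sketch shows that arithmetic. It assumes one core issues FMAs back to back with no other bottleneck. The function name `fma_flops_per_cycle` and the constants are illustrative only.

```python
# Per-core peak single-precision FLOPs per cycle from FMA on 256-bit registers.
# Assumption: one core issues FMAs back to back with no other bottleneck.
REGISTER_BITS = 256         # AVX2 (YMM) register width
FLOAT_BITS = 32             # single-precision float
FMA_RECIP_THROUGHPUT = 0.5  # cycles per FMA instruction (reciprocal throughput)


def fma_flops_per_cycle(recip_throughput: float = FMA_RECIP_THROUGHPUT) -> float:
    """Return peak FLOPs per cycle for one core issuing back-to-back FMAs."""
    instructions_per_cycle = 1 / recip_throughput  # 1 / 0.5 = 2
    lanes = REGISTER_BITS // FLOAT_BITS            # 8 floats per register
    flops_per_instruction = 2 * lanes              # multiply + add per lane = 16
    return instructions_per_cycle * flops_per_instruction


print(fma_flops_per_cycle())  # 32.0
```

So each core can reach 2 × 16 = 32 single-precision FLOPs per cycle under these assumptions.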

Use the PADDB instruction (packed add of 8-bit integers), in its 256-bit AVX2 form VPADDB.

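Instructions per cycle: as with FMA, this comes from the value in the reciprocal (inverse) throughput column. Inverting that value gives the number of PADDB instructions per cycle.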

Operations per instruction: AVX2 registers are 256 bits wide and the integers here are 8 bits long, so each register holds 32 integers. One instruction therefore performs 32 integer additions.
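The integer case follows the same pattern. In this sketch the reciprocal throughput is a parameter and is not hard-coded, because it depends on the value read from the table for the target CPU. The function name `paddb_iops_per_cycle` is hypothetical.

```python
# Per-core peak integer ops per cycle from (V)PADDB on 256-bit registers.
REGISTER_BITS = 256  # AVX2 (YMM) register width
INT_BITS = 8         # PADDB adds 8-bit integers


def paddb_iops_per_cycle(recip_throughput: float) -> float:
    """Peak integer ops per cycle for one core issuing back-to-back VPADDBs.

    recip_throughput is the reciprocal-throughput value (cycles per
    instruction) read from the instruction table for the target CPU.
    """
    instructions_per_cycle = 1 / recip_throughput
    ops_per_instruction = REGISTER_BITS // INT_BITS  # 32 byte-wise additions
    return instructions_per_cycle * ops_per_instruction
```

For example, `paddb_iops_per_cycle(0.5)` would give 64 integer operations per cycle. Substitute the actual table value for the CPU being analyzed.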

1. My laptop computer contains an AMD Ryzen 7 4800H CPU based on AMD’s Zen 2 architecture
Efficiency:
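For the Ryzen 7 4800H above, the peak and efficiency can be computed as in this minimal Python sketch. The inputs are assumptions: 8 cores running at the chip's 2.9 GHz base clock, and the 32 FLOPs per cycle per core derived from the FMA factors. Efficiency is taken to mean measured throughput divided by theoretical peak. The sustained all-core clock under AVX load can differ from the base clock, so substituting it changes the peak. The function names are illustrative only.

```python
# Theoretical peak and efficiency; inputs below are stated assumptions.


def peak_ops_per_second(cores: int, clock_hz: float, ops_per_cycle_per_core: float) -> float:
    """Theoretical peak: every core retires ops_per_cycle_per_core every cycle."""
    return cores * clock_hz * ops_per_cycle_per_core


def efficiency(measured_ops_per_second: float, peak: float) -> float:
    """Fraction of the theoretical peak actually achieved (0 to 1)."""
    return measured_ops_per_second / peak


# Assumed inputs for the Ryzen 7 4800H: 8 cores at the 2.9 GHz base clock,
# 32 single-precision FLOPs per cycle per core (2 FMAs x 16 FLOPs each).
CORES = 8
BASE_CLOCK_HZ = 2.9e9
FLOPS_PER_CYCLE = 32

peak_flops = peak_ops_per_second(CORES, BASE_CLOCK_HZ, FLOPS_PER_CYCLE)
print(f"peak: {peak_flops / 1e9:.1f} GFLOPS")  # peak: 742.4 GFLOPS
```

Passing a benchmark's measured FLOPs/s together with `peak_flops` to `efficiency` gives the achieved fraction. The same two functions apply to IOPs with the per-cycle integer figure.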

Efficiency: