Hüseyin Tuğrul BÜYÜKIŞIK edited this page Apr 18, 2022 · 1 revision

Welcome to the VectorizedKernel wiki!

VectorizedKernel offers scalar-like programming for working on multiple work-items in parallel. It requires a user-supplied lambda function that performs some computation on factory-generated (scalar-like) variables. It runs the same lambda for multiple simd-wide (16, 32, 64, etc.) groups of (so-called) threads, and the remaining tail is computed with a simd width of 1.
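The split into full simd-wide groups plus a width-1 tail can be pictured with a small standalone sketch. This is not the VectorizedKernel API: `runInGroups` and `groupWidths` are hypothetical names made up for illustration, and the real library may schedule its groups differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of VectorizedKernel.
// Calls body(firstId, width) once per full group of `simd` work-items,
// then once per leftover work-item with width 1 (the tail).
template <typename Body>
void runInGroups(std::size_t n, std::size_t simd, Body body) {
    if (simd == 0) simd = 1;  // guard against a zero width
    std::size_t id = 0;
    for (; id + simd <= n; id += simd) body(id, simd);  // full groups
    for (; id < n; ++id) body(id, std::size_t{1});      // tail, one item at a time
}

// Records the width used by each call when covering n work-items.
std::vector<std::size_t> groupWidths(std::size_t n, std::size_t simd) {
    std::vector<std::size_t> widths;
    runInGroups(n, simd, [&](std::size_t /*firstId*/, std::size_t width) {
        widths.push_back(width);
    });
    return widths;
}
```

With 10 work-items and a simd width of 4, `groupWidths(10, 4)` yields two groups of width 4 followed by two tail calls of width 1.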

For every thread (work-item), it computes each scalar-like variable in synchronization with all other work-items in the same simd group. Every operation, such as add, mul, fusedMultiplyAdd, readFrom, and writeTo, is computed in a lock-step manner, similar to OpenCL and CUDA.
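As a rough illustration of lock-step execution, the sketch below models a scalar-like variable as one value per work-item and applies each operation to all lanes before the next operation starts. The `Lanes` type and the free functions `mul`, `add`, and `axpb` are assumptions made for this example only. They are not the library's factory-generated variables or its method names.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a scalar-like variable: one float per work-item (lane).
template <std::size_t Simd>
struct Lanes {
    std::array<float, Simd> v;
};

// Multiplies lane by lane; every lane is finished before the function returns.
template <std::size_t Simd>
Lanes<Simd> mul(const Lanes<Simd>& a, const Lanes<Simd>& b) {
    Lanes<Simd> r{};
    for (std::size_t i = 0; i < Simd; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

// Adds lane by lane, again across the whole group at once.
template <std::size_t Simd>
Lanes<Simd> add(const Lanes<Simd>& a, const Lanes<Simd>& b) {
    Lanes<Simd> r{};
    for (std::size_t i = 0; i < Simd; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Computes a * x + b for every lane: the multiply completes on all lanes
// before the add begins, which is the lock-step ordering described above.
template <std::size_t Simd>
Lanes<Simd> axpb(const Lanes<Simd>& a, const Lanes<Simd>& x, const Lanes<Simd>& b) {
    return add(mul(a, x), b);
}
```

The user-visible code reads like scalar arithmetic, while each call touches every work-item of the group. The fixed-size inner loops are the kind a compiler can typically turn into SIMD instructions.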