Home

Welcome to the VectorizedKernel wiki!

VectorizedKernel offers scalar-like programming for processing multiple work-items in parallel. It requires a user lambda function that performs its computation on factory-generated, scalar-like variables. The same lambda runs for multiple simd-wide groups of (so-called) threads, where simd = 16, 32, 64, etc., and the remaining tail is computed with a simd width of 1.
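The split into simd-wide groups plus a width-1 tail can be pictured with the following minimal sketch. It is not the library's code: `runInGroups` and `groupWidths` are hypothetical names used only for illustration, and the real launch mechanism may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (NOT part of the VectorizedKernel API):
// calls body(firstIndex, width) once per full simd-wide group,
// then once per leftover work-item with width 1.
template <typename Body>
void runInGroups(std::size_t n, std::size_t simd, Body body) {
    assert(simd > 0);
    std::size_t i = 0;
    // Full groups: each call covers `simd` consecutive work-items.
    for (; i + simd <= n; i += simd) {
        body(i, simd);
    }
    // Tail: the remaining work-items are handled one at a time.
    for (; i < n; ++i) {
        body(i, std::size_t{1});
    }
}

// Illustration only: records the width used by each launched group.
std::vector<std::size_t> groupWidths(std::size_t n, std::size_t simd) {
    std::vector<std::size_t> widths;
    runInGroups(n, simd, [&](std::size_t /*first*/, std::size_t width) {
        widths.push_back(width);
    });
    return widths;
}
```

With 35 work-items and simd = 16, this sketch produces two groups of width 16 followed by three groups of width 1.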

For every thread (work-item), each scalar-like variable is computed in synchronization with all other work-items in the same simd group. Every operation (add, mul, fusedMultiplyAdd, readFrom, writeTo, etc.) is computed in lock-step, similar to OpenCL and CUDA.
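As a rough illustration of lock-step execution, the sketch below models a scalar-like variable as a fixed-size array of lanes, where each operation finishes for every lane before the next operation begins. The `Lanes` type, the `kernel` function, and the method signatures are assumptions made for this example; only the operation names are borrowed from the text above, and the library's actual types and signatures may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar-like variable (NOT the library's real type):
// one value per work-item in a simd group.
template <typename T, std::size_t Simd>
struct Lanes {
    std::array<T, Simd> v{};

    // Each method loops over all lanes, so an operation completes
    // for the whole group before the next operation starts.
    void readFrom(const T* src) {
        for (std::size_t i = 0; i < Simd; ++i) v[i] = src[i];
    }
    void writeTo(T* dst) const {
        for (std::size_t i = 0; i < Simd; ++i) dst[i] = v[i];
    }
    Lanes add(const Lanes& o) const {
        Lanes r;
        for (std::size_t i = 0; i < Simd; ++i) r.v[i] = v[i] + o.v[i];
        return r;
    }
    Lanes mul(const Lanes& o) const {
        Lanes r;
        for (std::size_t i = 0; i < Simd; ++i) r.v[i] = v[i] * o.v[i];
        return r;
    }
};

// Hypothetical kernel body written in scalar-like style: out = a * b + a.
template <std::size_t Simd>
void kernel(const float* a, const float* b, float* out) {
    Lanes<float, Simd> x, y;
    x.readFrom(a);                 // all lanes load
    y.readFrom(b);                 // all lanes load
    x.mul(y).add(x).writeTo(out);  // mul, then add, then store, lane-wide
}
```

Because every method is a short loop over a compile-time lane count, a compiler has a good chance of turning each operation into vector instructions, which is the general idea behind writing scalar-looking code over lane arrays.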
