The HPC toolbox: fused matrix multiplication, convolution, data-parallel strided tensor primitives, OpenMP facilities, SIMD, JIT Assembler, CPU detection, state-of-the-art vectorized BLAS for floats and integers
deep-learning
assembler
parallel
openmp
jit
simd
matrix-multiplication
high-performance-computing
blas
convolution
tensor
compiler-optimization
gemm
runtime-cpu-detection
Updated Jan 4, 2024 - Nim