@wichtounet wichtounet released this Jan 9, 2018 · 88 commits to master since this release

  • Feature Support for embeddings and embedding gradients
  • Feature Support for merging matrices together
  • Feature Support for bias_batch_var_2d
  • Feature Support for dropout masks
  • Feature Support for normalization
  • Performance Vectorize hyperbolic functions
  • Performance Advanced GPU pattern detection
  • Performance Asynchronous GPU computation
  • GPU Support for uniform and normal random generators
  • GPU Support for shuffle operations
  • Bug Fix fast_dyn_matrix with bool
  • Bug Fix possible stack overflow with fast matrix and aliasing
  • Bug Correctly handle aliasing in assignable (sub_view for instance)
  • Bug Fix small compilation bug with sub_matrix
  • Bug Fix CPU/GPU consistency bug with iterators
  • Bug Fix bug with GPU convolution flipping

@wichtounet wichtounet released this Oct 1, 2017 · 252 commits to master since this release

  • Feature GPU support for basic expressions (such as c = 1.0 * b + d + e - 1.0)
  • Feature GPU Support for unary and binary operators
  • Feature Support for convolutions for matrices of different data types
  • Feature Support for log2 / log10
  • Feature Automatic selection of algorithms by default
  • Feature Support for categorical cross entropy loss and error
  • Feature Improve support for complex numbers and etl::complex
  • Performance Improved performance of using parallel BLAS
  • Misc Full cleanup of the traits
  • Misc Use of variable templates (C++14) for the traits
  • Misc Improved support for clang
  • Misc Reduced compilation time for non-tests / non-benchmark code
  • Misc Reduced duration of the tests
  • Misc Preliminary C++17 if constexpr support
  • Bug Fix bug in the GEMM kernel for CM = CM * CM
  • Bug Vectorization bug for binary operations with different data types
  • Bug GPU memory was not correctly handled when std::move is used
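
For readers unfamiliar with the categorical cross entropy loss listed above, the following is a minimal standalone sketch in plain C++. It does not use the library's API: the function name `categorical_cross_entropy`, the nested `std::vector` layout and the choice to average over the batch are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the library's API): mean categorical cross entropy
// over a batch. `labels` holds one-hot rows, `predictions` holds rows of
// class probabilities with the same shape.
double categorical_cross_entropy(const std::vector<std::vector<double>>& labels,
                                 const std::vector<std::vector<double>>& predictions) {
    assert(!labels.empty() && labels.size() == predictions.size());

    const double eps = 1e-12; // small offset that guards against log(0)
    double total = 0.0;

    for (std::size_t i = 0; i < labels.size(); ++i) {
        assert(labels[i].size() == predictions[i].size());
        for (std::size_t j = 0; j < labels[i].size(); ++j) {
            // Only the entry of the true class contributes, since labels are one-hot
            total -= labels[i][j] * std::log(predictions[i][j] + eps);
        }
    }

    // Average over the number of samples in the batch (an assumed convention)
    return total / static_cast<double>(labels.size());
}
```

Calling it with one-hot labels and predicted probabilities returns a single scalar: lower values mean the predicted distribution puts more mass on the true classes.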

@wichtounet wichtounet released this Aug 9, 2017 · 732 commits to master since this release

  • Performance Better dispatching for alignment
  • Performance Much faster multiplications between matrices of different storage orders (row-major and column-major)
  • Performance Highly improved performance of multiplications with transpose
  • Performance Vectorization of signed integer operations
  • Performance Faster CPU convolutions
  • Performance Better parallelization of convolutions
  • Performance Much better GEMM/GEMV/GEVM kernels (when BLAS not available)
  • Performance Reduced overhead for 3D/4D matrices access by indices
  • Performance Use of non-temporal stores for large matrices
  • Performance Forced alignment of matrices
  • Performance Force basic padding of vectors
  • Performance Better thread reuse
  • Performance Faster dot product
  • Performance Faster batched outer product
  • Performance Better usage of FMA
  • Performance SSE/AVX double-precision exponentiation
  • Performance Much faster Pooling for various dimensions
  • Feature Sub matrices in 2D, 3D and 4D
  • Feature Helpers for Machine Learning
  • Feature Comparison operators and functions equal, not_equal, almost_equal
  • Feature Logical operators for boolean containers
  • Feature Shuffle and noise can now operate on custom random engines
  • Feature Pooling with stride is now supported
  • Feature Custom fast and dyn matrices support
  • Feature Slice views of matrices and vectors
  • Feature Deeper pooling support
  • Feature bias_add (2D and 4D) (Machine Learning)
  • Feature bias_batch_mean (2D and 4D) (Machine Learning)
  • Feature Transposed convolution
  • GPU Better usage of contexts
  • GPU Pooling and Upsample support
  • GPU batch_outer support
  • GPU sigmoid, ReLU and their derivatives
  • GPU Memory pool handling
  • GPU Avoid a lot of temporaries
  • Misc Reduced duplications in the code base
  • Misc Simplifications of the iterators to DMA expressions
  • Misc Faster compilation of the test cases
  • Misc Generalized SSE/AVX versions into VEC versions
  • Misc Complete review of temporary expressions
  • Bug Lots of small fixes
  • Bug Transpose on GPU was not working on column major matrix
  • Bug 4D Pooling
  • Bug Q/R Decomposition

@wichtounet wichtounet released this Aug 9, 2017 · 2498 commits to master since this release

Initial version (was rolling released before) with the following main features:

  • Smart Expression Templates
  • Matrix and vector (runtime-sized and compile-time-sized)
  • Simple element-wise operations
  • Reductions (sum, mean, max, ...)
  • Unary operations (sigmoid, log, exp, abs, ...)
  • Matrix multiplication
  • Convolution (1D and 2D and higher variations)
  • Max Pooling
  • Fast Fourier Transform
  • Use of SSE/AVX to speed up operations
  • Use of BLAS/MKL/CUBLAS/CUFFT/CUDNN libraries to speed up operations
  • Symmetric matrix adapter (experimental)
  • Sparse matrix (experimental)
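
As a rough illustration of the expression template idea named at the top of this list, here is a minimal standalone sketch in C++. The types `vec` and `add_expr` are hypothetical and are not the library's classes. The sketch only shows the general technique: `operator+` records its operands without computing anything, and the sum is evaluated in a single loop at assignment time, with no temporary vectors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node representing "lhs + rhs"; it stores references only
template <typename L, typename R>
struct add_expr {
    const L& lhs;
    const R& rhs;

    // Elements are computed lazily, on access
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

// Hypothetical runtime-sized vector
struct vec {
    std::vector<double> data;

    explicit vec(std::size_t n, double v = 0.0) : data(n, v) {}

    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning an expression evaluates the whole tree in one pass
    template <typename L, typename R>
    vec& operator=(const add_expr<L, R>& e) {
        assert(e.size() == size());
        for (std::size_t i = 0; i < size(); ++i) {
            data[i] = e[i];
        }
        return *this;
    }
};

// Building an expression performs no arithmetic
inline add_expr<vec, vec> operator+(const vec& a, const vec& b) {
    return {a, b};
}

// Allows chaining such as a + b + d
template <typename L, typename R>
add_expr<add_expr<L, R>, vec> operator+(const add_expr<L, R>& a, const vec& b) {
    return {a, b};
}
```

With these definitions, a statement such as `c = a + b + d;` runs one loop over the elements. The intermediate nodes hold references to temporaries, so in this simplified sketch an expression must be assigned within the same statement that builds it.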