timholy/AxisAlgorithms.jl

Efficient filtering and linear algebra routines for multidimensional arrays

AxisAlgorithms

AxisAlgorithms is a collection of filtering and linear algebra algorithms for multidimensional arrays. For algorithms that would typically apply along the columns of a matrix, you can instead pick an arbitrary axis (dimension).

All functions come in two variants: a `!` version that writes into pre-allocated output (passed as the first argument) and a non-`!` version that allocates its output. Only the `!` versions are described below.

Tridiagonal and Woodbury inversion

If `F` is an LU-factorization of a tridiagonal matrix, or a Woodbury matrix created from such a factorization, then `A_ldiv_B_md!(dest, F, src, axis)` computes `F\b` (that is, solves `F*x = b`) for each 1-dimensional slice `b` of `src` along dimension `axis`. Unlike many linear algebra algorithms, this one is safe to use as a mutating algorithm with `dest=src`. The tridiagonal case does not create temporaries, and it has excellent cache behavior.
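To make the slice-wise meaning concrete, here is a minimal reference sketch for Julia 1.x that uses only the `LinearAlgebra` standard library. The helper `solve_along` is hypothetical and not part of AxisAlgorithms; it allocates freely and only shows which result `A_ldiv_B_md!` is described as producing, not how the package computes it.

```julia
using LinearAlgebra

# Hypothetical helper (not part of AxisAlgorithms): solve F*x = b for every
# 1-dimensional slice b of `src` along dimension `axis`, allocating the result.
solve_along(F, src::AbstractArray, axis::Integer) =
    mapslices(b -> F \ b, src; dims=axis)

n = 4
# A diagonally dominant tridiagonal matrix, so the solve is well conditioned.
T = Tridiagonal(fill(-1.0, n - 1), fill(4.0, n), fill(-1.0, n - 1))
F = lu(T)

src = rand(3, n, 2)            # slices along dimension 2 have length n
ref = solve_along(F, src, 2)   # same size as src

# Each slice of the result is the solution for the matching slice of src.
@assert ref[1, :, 1] ≈ T \ src[1, :, 1]
```

Given the description above, `A_ldiv_B_md!(dest, F, src, 2)` should fill `dest` with the same values as `ref`, without the per-slice allocations this sketch makes.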

Matrix multiplication

Apply a matrix `M` to every 1-dimensional slice along a particular dimension, computing `M*b` for each slice `b`. You have two algorithms to choose from:

• `A_mul_B_perm!(dest, M, src, axis)` uses `permutedims` and standard BLAS-accelerated routines; it allocates temporary storage.
• `A_mul_B_md!(dest, M, src, axis)` is a non-allocating naive routine. It also has optimized implementations for sparse `M` and for 2x2 matrices.
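The two strategies can be sketched in plain Julia 1.x as follows. Both functions, `mul_along_perm` and `mul_along_naive!`, are hypothetical reference implementations written for illustration. They are not the package's code, and unlike the package routines they make no attempt at specialization.

```julia
# Strategy 1: move `axis` to the front, do one matrix multiply, move it back.
function mul_along_perm(M::AbstractMatrix, src::AbstractArray, axis::Integer)
    perm = [axis; setdiff(1:ndims(src), axis)]   # axis first, the rest in order
    P = permutedims(src, perm)                   # temporary copy
    sz = size(P)
    Q = M * reshape(P, sz[1], :)                 # one ordinary matrix product
    out = reshape(Q, size(M, 1), sz[2:end]...)
    return permutedims(out, invperm(perm))       # restore the original layout
end

# Strategy 2: naive loops over the array in place, no temporaries.
# This sketch requires `dest` and `src` to be different arrays.
function mul_along_naive!(dest, M::AbstractMatrix, src, axis::Integer)
    fill!(dest, zero(eltype(dest)))
    Rpre  = CartesianIndices(size(src)[1:axis-1])    # dimensions before `axis`
    Rpost = CartesianIndices(size(src)[axis+1:end])  # dimensions after `axis`
    # The last loop variable is innermost, so memory is walked in column-major order.
    for Ipost in Rpost, j in axes(M, 2), i in axes(M, 1), Ipre in Rpre
        dest[Ipre, i, Ipost] += M[i, j] * src[Ipre, j, Ipost]
    end
    return dest
end

M = rand(5, 5)
src = rand(4, 5, 2)
dest = similar(src)
mul_along_naive!(dest, M, src, 2)

# Both strategies agree, and each slice equals M times the matching src slice.
@assert dest ≈ mul_along_perm(M, src, 2)
@assert dest[1, :, 1] ≈ M * src[1, :, 1]
```

The permutation version pays for two array copies but hands the bulk of the work to a single matrix product. The loop version touches each element in place, which is where the cache-behavior trade-off comes from.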

In general it is very difficult to get efficient cache behavior for multidimensional multiplication, and often using `A_mul_B_perm!` is the best strategy. However, there are cases where `A_mul_B_md!` is faster. It's a good idea to time both and see which works better for your case.
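A rough way to compare the two on your own data is sketched below. It assumes AxisAlgorithms is installed and that both functions accept the argument order given above. The array sizes are arbitrary placeholders, so substitute shapes that match your workload.

```julia
using AxisAlgorithms   # assumed to be installed

M = rand(64, 64)
src = rand(32, 64, 16)       # placeholder sizes; use your real shapes
axis = 2
dest_perm = similar(src)
dest_md = similar(src)

# Call each routine once first so that compilation time is not measured.
A_mul_B_perm!(dest_perm, M, src, axis)
A_mul_B_md!(dest_md, M, src, axis)

t_perm = @elapsed A_mul_B_perm!(dest_perm, M, src, axis)
t_md   = @elapsed A_mul_B_md!(dest_md, M, src, axis)

println("A_mul_B_perm!: ", t_perm, " s")
println("A_mul_B_md!:   ", t_md, " s")
```

A single `@elapsed` measurement is noisy, so repeat the timing several times and vary `axis` before drawing conclusions.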