`Composed` improvements #46

torfjelde · 2019-09-15T04:10:24Z

I've re-visted some ideas I had earlier on but didn't have quite the knowledge nor time to explore. As a result I think I've come up with some really nice improvements to the Composed bijector.

The changes suggested here already exists on a local branch of mine and tests are passing without issues. So if people seem happy with this I'll make a PR asap.

Nested compositions is bleh

We're currently not doing anything to "flatten" the compositions, e.g.

julia> using Bijectors

julia> b = Bijectors.Exp()
Bijectors.Exp()                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                               
julia> map(typeof, (b ∘ b ∘ b).ts)
(Bijectors.Exp, Composed{Tuple{Bijectors.Exp,Bijectors.Exp}})

Notice that it we get nested compositions here.

Personally I'd like ∘ to encourage a flat structure, so we instead get:

julia> map(typeof, (b ∘ b ∘ b).ts)
(Bijectors.Exp, Bijectors.Exp, Bijectors.Exp)

The above was accomplished by simply introducing the following lines:

∘(b1::Bijector, b2::Bijector) = composel(b2, b1)  # <= already exists
∘(b1::Composed, b2::Bijector) = composel(b2, b1.ts...)       # new 
∘(b1::Bijector, b2::Composed) = composel(b2.ts..., b1)       # new
∘(b1::Composed, b2::Composed) = composel(b2.ts..., b1.ts...) # new

If the user really wants to use nested compositions rather than this flat structure, they can call composel and/or composer explicitly. I struggle to see a case where you'd like to group bijectors together but wouldn't already have access to this grouping outside of the composition but I'd still like to allow both possibilities. This solution does exactly that.

Other than just being "nice", it has two advantages over nesting:

Dispatching on different compositions is easier since you only need to consider Composed{Tuple{B1, B2, B3}} rather than all combinations of nestedness.
Helps with performance

Recursive implementations ain't so good

Currently we're using a recursive implementation for the following methods for Composed:

(cb::Composed)(x)
forward(cb::Composed, x)
logabsdetjac(cb::Composed, x)

For small compositions, this isn't really an issue but still sub-optimal. Especially in forward and logabsdetjac where we need intermediate values to split apart the forward result. This can't be done recursively without memory allocation for these intermediate values.

One alternative, which is further improved by encouraging a flat structure as described earlier, is to unroll the recursion using @generated:

@generated function forward_generated(cb::Composed{T}, x) where {T<:Tuple}
    expr = Expr(:block)
    push!(expr.args, :((y, logjac) = forward(cb.ts[1], x)))
    for i = 2:length(T.parameters)
        push!(expr.args, :(res = forward(cb.ts[$i], y)))
        push!(expr.args, :(y = res.rv))
        push!(expr.args, :(logjac += res.logabsdetjac))
    end
    push!(expr.args, :(return (rv = y, logabsdetjac = logjac)))

    return expr
end

For the code

b = Bijectors.Exp()
forward(b ∘ b ∘ b, randn())

we get the generated code

quote
    (y, logjac) = forward(cb.ts[1], x)
    res = forward(cb.ts[2], y)
    y = res.rv
    logjac += res.logabsdetjac
    res = forward(cb.ts[3], y)
    y = res.rv
    logjac += res.logabsdetjac
    return (rv = y, logabsdetjac = logjac)
end

Comparing this to the current recursive implementation to new forward_generated:

julia> using Revise, Bijectors, BenchmarkTools

julia> using Bijectors: Log, Exp

julia> b = Bijectors.Exp()
Exp()

julia> cb = inv(b) ∘ b
Composed{Tuple{Exp,Log}}((Exp(), Log()))

julia> x = randn()
1.59358111703819

julia> @btime Bijectors.forward($cb, $x)
  28.426 ns (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)

julia> @btime Bijectors.forward_gen($cb, $x)
  28.788 ns (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)

So not much gained here. But if we consider deeply nested compositions the difference becomes quite singificant:

julia> cb = foldl(∘, [inv(b) ∘ b for i = 1:50])
Composed{Tuple{Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log,Exp,Log}}((Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log(), Exp(), Log()))

julia> @btime Bijectors.forward($cb, $x)
  431.207 μs (201 allocations: 6.25 KiB)
(rv = 1.59358111703819, logabsdetjac = 0.0)

julia> @btime Bijectors.forward_gen($cb, $x)
  1.934 μs (0 allocations: 0 bytes)
(rv = 1.59358111703819, logabsdetjac = 0.0)

I've also consider @inline for the recursive implementation but for large cases compile-times are quite bad. So I think it's pretty clear which is the winner:)

Neat composition rules for extra neatness

# For extra neat-ness for a lot of bijectors:)
∘(b1::B, b2::Inversed{B}) where {B<:Bijector} = Identity()
∘(b1::Inversed{B}, b2::B) where {B<:Bijector} = Identity()

# Can also implement on case-by-case basis where we don't use `Inversed`
macro inverses(B1, B2)
    expr = Expr(:block)
    push!(expr.args, :(Base.:∘(b1::$B1, b2::$B2) = Identity()))
    push!(expr.args, :(Base.:∘(b1::$B2, b2::$B1) = Identity()))
    return expr
end

@inverses Log Exp  # super-easy to declare the inverses

# And these
∘(::Identity, ::Identity) = Identity()
∘(::Identity, b::Bijector) = b
∘(b::Bijector, ::Identity) = b

then

julia> using Bijectors: Log, Exp

julia> Log() ∘ Exp()
Bijectors.Identity()

EDIT: forgot to use @btime forward($cb, $x), etc. as @mohamed82008 pointed out.

The text was updated successfully, but these errors were encountered:

mohamed82008 · 2019-09-15T07:06:27Z

When you time with @btime use $, e.g. @btime Bijectors.forward($cb, $x)

xukai92 · 2019-09-15T21:32:54Z

Looks great to me.

BTW can someone explain a bit on the benchmark time difference with and without $?

willtebbutt · 2019-09-16T15:22:44Z

Nice work @torfjelde . Could you provide some benchmarks for vector-valued transformations? I would imagine that these are where we're most likely to see deeply nested structures, so it would be interesting to know if the performance gains persist here or not.

torfjelde · 2019-09-16T18:50:50Z

It does but actual transformation-cost of course reduces the difference:

julia> using Bijectors: PlanarLayer

julia> b = PlanarLayer(10);

julia> x = randn(10);

julia> @btime Bijectors.forward($cb, $x)
  2.912 μs (47 allocations: 4.63 KiB)
(rv = [-0.226818, -5.05146, -1.45391, 2.3923, 2.71439, 1.76895, -0.988026, 1.12392, 3.00605, -2.77136], logabsdetjac = -0.019665352458356412)

julia> @btime Bijectors.forward_gen($cb, $x)
  2.922 μs (47 allocations: 4.63 KiB)
(rv = [-0.226818, -5.05146, -1.45391, 2.3923, 2.71439, 1.76895, -0.988026, 1.12392, 3.00605, -2.77136], logabsdetjac = -0.019665352458356412)

julia> @btime Bijectors.forward_it($cb, $x)
  3.305 μs (49 allocations: 4.73 KiB)
(rv = [-0.226818, -5.05146, -1.45391, 2.3923, 2.71439, 1.76895, -0.988026, 1.12392, 3.00605, -2.77136], logabsdetjac = -0.019665352458356412)

and for deep ones where recursion is no longer inlined:

julia> cb = foldl(∘, [b for i = 1:50]);

julia> @btime Bijectors.forward($cb, $x)
  116.340 μs (1299 allocations: 128.05 KiB)
(rv = [0.56373, -8.38962, -2.83878, 3.42109, 4.44716, 2.13511, -0.910896, 2.02499, 4.82258, -3.82649], logabsdetjac = -168.86605036964167)

julia> @btime Bijectors.forward_gen($cb, $x)
  70.403 μs (1151 allocations: 114.88 KiB)
(rv = [0.56373, -8.38962, -2.83878, 3.42109, 4.44716, 2.13511, -0.910896, 2.02499, 4.82258, -3.82649], logabsdetjac = -168.86605036964167)

julia> @btime Bijectors.forward_it($cb, $x)
  75.723 μs (1153 allocations: 115.84 KiB)
(rv = [0.56373, -8.38962, -2.83878, 3.42109, 4.44716, 2.13511, -0.910896, 2.02499, 4.82258, -3.82649], logabsdetjac = -168.86605036964167)

A further test, also including an extreme case of Stacked from #36 (which also uses @generated for forward to get type-stability)

julia> using Bijectors, BenchmarkTools

julia> using Bijectors: Logit

julia> D = 100
100

julia> sb = Stacked(tuple([Logit(0.0, 1.0) for i = 1:D]...));

julia> bs = [
           ("PlanarLayer($D)", PlanarLayer(D)),
           ("RadialLayer($D)", RadialLayer(D)),
           ("Stacked{Logit ∘ inv(Logit), $D}", sb ∘ inv(sb))
       ];

julia> x = randn(D);

julia> for (name, b) in bs
           @info "$(name): b ∘ b"
           cb = b ∘ b
           @btime Bijectors.forward($cb, $x)
           @btime Bijectors.forward_gen($cb, $x)
           @btime Bijectors.forward_it($cb, $x)

           N = 50
           @info "$(name): b ∘ b ∘ ... ∘ b ($N times)"
           cb = foldl(∘, [b for i = 1:N])
           @btime Bijectors.forward($cb, $x)
           @btime Bijectors.forward_gen($cb, $x)
           @btime Bijectors.forward_it($cb, $x)
       end
[ Info: PlanarLayer(100): b ∘ b
  4.746 μs (47 allocations: 14.69 KiB)
  4.758 μs (47 allocations: 14.69 KiB)
  5.091 μs (49 allocations: 14.80 KiB)
[ Info: PlanarLayer(100): b ∘ b ∘ ... ∘ b (50 times)
  162.648 μs (1299 allocations: 379.61 KiB)
  116.781 μs (1151 allocations: 366.44 KiB)
  119.156 μs (1153 allocations: 367.41 KiB)
[ Info: RadialLayer(100): b ∘ b
  3.263 μs (35 allocations: 8.53 KiB)
  3.287 μs (35 allocations: 8.53 KiB)
  3.642 μs (37 allocations: 8.64 KiB)
[ Info: RadialLayer(100): b ∘ b ∘ ... ∘ b (50 times)
  131.128 μs (999 allocations: 225.70 KiB)
  82.406 μs (851 allocations: 212.53 KiB)
  85.703 μs (853 allocations: 213.50 KiB)
[ Info: Stacked{Logit ∘ inv(Logit), 100}: b ∘ b
  69.882 μs (2009 allocations: 194.28 KiB)
  70.363 μs (2009 allocations: 194.28 KiB)
  82.146 μs (2028 allocations: 260.78 KiB)
[ Info: Stacked{Logit ∘ inv(Logit), 100}: b ∘ b ∘ ... ∘ b (50 times)
  9.063 ms (55549 allocations: 35.88 MiB)
  1.830 ms (50201 allocations: 4.74 MiB)
  7.016 ms (50700 allocations: 35.88 MiB)

torfjelde · 2019-09-16T18:56:54Z

Also, we'll still keep the "simple" iterative implementation. So we have two different ways of constructing a Composed; using a Tuple or Array. Using Tuple gives type-stable implementations using @generated, using Array gives unstable iterative implementations.

@kai

* converted types of vals into instances * added TruncatedBijector as suggested in #40 * improved compositions as discussed in #46 * added a bunch of type-stability testing throughout * added a bunch of type-stability tests and fixed some instabilities * fixed type-stability issue with inv for Stacked * dropped generated inv for Stacked in favour of using map * added type-stable implementations of inv for Stacked and Composed * forget to reverse order previously * improved docstring for Composed * restructuring of project structure * forgot a file in last commit * combined forward and inverse jacobians for ADBijector * fixed typo thanks to @kai * fixed Scale custom adjoints and added proper testing for these * Operations involving Composed with array now returns Composed using array, etc. * now throwing error when composing stable and unstable Composed

torfjelde · 2020-02-19T13:37:12Z

Closed by #52

torfjelde changed the title ~~Composed design~~ Composed improvements Sep 15, 2019

torfjelde added a commit to torfjelde/Bijectors.jl that referenced this issue Nov 7, 2019

improved compositions as discussed in TuringLang#46

8958fc5

torfjelde mentioned this issue Nov 11, 2019

Improvements to Composed and TruncatedBijector fixing #40 #52

Merged

torfjelde closed this as completed Feb 19, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`Composed` improvements #46

`Composed` improvements #46

torfjelde commented Sep 15, 2019 •

edited

Loading

mohamed82008 commented Sep 15, 2019

xukai92 commented Sep 15, 2019

willtebbutt commented Sep 16, 2019

torfjelde commented Sep 16, 2019

torfjelde commented Sep 16, 2019

torfjelde commented Feb 19, 2020

Composed improvements #46

Composed improvements #46

Comments

torfjelde commented Sep 15, 2019 • edited Loading

Nested compositions is bleh

Recursive implementations ain't so good

Neat composition rules for extra neatness

mohamed82008 commented Sep 15, 2019

xukai92 commented Sep 15, 2019

willtebbutt commented Sep 16, 2019

torfjelde commented Sep 16, 2019

torfjelde commented Sep 16, 2019

torfjelde commented Feb 19, 2020

`Composed` improvements #46

`Composed` improvements #46

torfjelde commented Sep 15, 2019 •

edited

Loading