Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New interface #6

Closed
yebai opened this issue Sep 28, 2018 · 25 comments
Closed

New interface #6

yebai opened this issue Sep 28, 2018 · 25 comments
Labels
discussion enhancement New feature or request

Comments

@yebai
Copy link
Member

yebai commented Sep 28, 2018

logpdf(t::Transform, l::Distribution, x)

@willtebbutt
Copy link
Member

I was thinking something along the lines of

struct Transformed{D<:Distribution, T} <: Distribution
    d::D
    t::T
end
logpdf(::Transformed{D, T}, ::Real) = ...
rand(::Transformed{D, T}) = ...

(this would need to be specialised for Univariate, Multivariate etc)

@yebai yebai added discussion enhancement New feature or request labels Sep 28, 2018
@torfjelde
Copy link
Member

Hey! I made an attempt at implementing automatic differentiable variational inference (ADVI) for Turing, which requires transforming the distribution to a distribution with real support, i.e. can use Bijectors.jl for this!

While doing this I was thinking a bit about the interface and implemented the following as a "wrapper" for Bijectors.jl.

using Distributions, Bijectors
import Distributions: logpdf, rand

# Issue: this is not enforcing `VariateForm` of `D` and `Transformed{D}` to be the same
struct Transformed{D} <: Distribution{VariateForm, Continuous} where D <: Distribution
    d::D
end

# size
Base.size(td::Transformed) = size(td.d)
Base.length(td::Transformed) = length(td.d)

# logp
logpdf(d::Transformed{D} where D <: Distribution{Univariate}, x::T where T <: Real) = logpdf_with_trans(d.d, x, true)
logpdf(d::Transformed{D} where D <: Distribution{Multivariate}, x::AbstractMatrix{T} where T <: Real) = logpdf_with_trans(d.d, x, true)

# rand
rand(td::Transformed, n::Int64) = link(td.d, rand(td.d, n))
rand(d::Transformed{D} where D <: Distribution{Univariate}) = first(rand(d, 1))
rand(d::Transformed{D} where D <: Distribution{Multivariate}) = rand(d, 1)

Important: vectorized logpdf does not work, and results in the following error

ERROR: MethodError: no method matching iterate(::Transformed{MvNormal{Float64,PDMats.PDMat{Float64,Array{Float64,2}},Array{Float64,1}}})

If we could fix the above issue, maybe this would be a nice "ad-hoc" implementation. Comparing to the above suggestion which to me seems like it would require a more thorough rewrite of Bijectors.jl (though I might be thoroughly wrong!).

I haven't investigated this to a satisfactory degree, but I was encouraged on #turing to share anyways to initiate a discussion.

@cpfiffer
Copy link
Member

cpfiffer commented Apr 7, 2019

Can you implement an iterate function for the multivariate subtype case? Not sure what that would look like, exactly.

@torfjelde
Copy link
Member

torfjelde commented Apr 7, 2019

Ah, sorry I ran it on the wrong distribution; meant to use a Univariate distribution. Result is the same.

p = Transformed(Normal(0.0, 1.0))

logpdf(p, rand(p))         # <= this works
logpdf.(p, rand(p, 10))    # <= results in the following error

ERROR: MethodError: no method matching iterate(::Transformed{Normal{Float64}})

@torfjelde
Copy link
Member

I also think that it would be fairly easy to transition from the Transformed{D} I suggested to the more general Transformed{D, T} suggested by @willtebbutt when you get to that. For simple use-cases the interface wouldn't change, e.g. the below code would work in both cases

p = Transformed(Exponential(1.0))

if we simply implement a constructor similar to

Transformed(d::D <: PositiveDistribution) = Transformed(d,  PositiveDistributionTransformation())

behind the scenes. This way you could use the simpler interface while transitioning to the more general approach.

Also, maybe I misunderstood your question @cpfiffer. Did you question how to implement iterate(...) for a Multivariate in particular or any Distribution? I'm honestly a bit confused by documentation for Distributions.jl. If I follow https://juliastats.github.io/Distributions.jl/latest/extends.html#Create-a-Multivariate-Distribution-1 and try to implement Distributions._rand! I get errors saying no sampler exists, but at the same time isa(p_transformed, Sampleable) returns true. Not sure if I'm missing something here or this is an actual bug in Distributions.jl.

@cpfiffer
Copy link
Member

cpfiffer commented Apr 9, 2019

Took a second look at this. I thought you needed to overload iterate, but the fix is simpler than that. Looks like really all you need is two additional overloads of logpdf and to remove the broadcast operator on logpdf(p, rand(p,10)):

logpdf(d::Transformed{D} where D <: Distribution{Univariate}, x::Vector{T} where T <: Real) = logpdf_with_trans(d.d, x, true)
logpdf(d::Transformed{D} where D <: Distribution{Multivariate}, x::Vector{AbstractMatrix{T}} where T <: Real) = logpdf_with_trans(d.d, x, true)

This works now:

p = Transformed(Normal(0.0, 1.0))
logpdf(p, rand(p,10))
10-element Array{Float64,1}:
 -1.0758450227893368
 -1.0166693096930153
 -0.9469445907722853
 -2.3735502039370373
 -0.924499243891403 
 -1.2068563756279285
 -1.363355793193587 
 -1.6091214288686937
 -1.0012221537005328
 -0.9931074723392034

Here's the whole MWE again:

using Distributions, Bijectors
import Distributions: logpdf, rand

# Issue: this is not enforcing `VariateForm` of `D` and `Transformed{D}` to be the same
struct Transformed{D} <: Distribution{VariateForm, Continuous} where D <: Distribution
    d::D
end

# size
Base.size(td::Transformed) = size(td.d)
Base.length(td::Transformed) = length(td.d)

# logp
logpdf(d::Transformed{D} where D <: Distribution{Univariate}, x::T where T <: Real) = logpdf_with_trans(d.d, x, true)
logpdf(d::Transformed{D} where D <: Distribution{Multivariate}, x::AbstractMatrix{T} where T <: Real) = logpdf_with_trans(d.d, x, true)
logpdf(d::Transformed{D} where D <: Distribution{Univariate}, x::Vector{T} where T <: Real) = logpdf_with_trans(d.d, x, true)
logpdf(d::Transformed{D} where D <: Distribution{Multivariate}, x::Vector{AbstractMatrix{T}} where T <: Real) = logpdf_with_trans(d.d, x, true)

# rand
rand(td::Transformed, n::Int64) = link(td.d, rand(td.d, n))
rand(d::Transformed{D} where D <: Distribution{Univariate}) = first(rand(d, 1))
rand(d::Transformed{D} where D <: Distribution{Multivariate}) = rand(d, 1)

p = Transformed(Normal(0.0, 1.0))

logpdf(p, rand(p))         # <= this works
logpdf(p, rand(p,10))    # <= this works now

@torfjelde
Copy link
Member

torfjelde commented Apr 9, 2019

Nice!
The only potential (future) issue is that I believe logpdf(p, rand(p, 10)) is supposed to be deprecated for Univariate. At least if I run

p = Normal(0.0, 1.0)
logpdf(p, rand(p, 10))

I get
Warning: `logpdf(d::UnivariateDistribution, X::AbstractArray)` is deprecated, use `logpdf.(d, X)` instead.
which can be verified at https://github.com/JuliaStats/Distributions.jl/blob/master/src/deprecates.jl#L39. That's why I was trying to make it work with vectorization, though I'm still not able to..

Also, do we need logpdf for Vector{AbstractMatrix{T}}?

And regarding overloading iterate; the weird thing is no Distribution in Distributions.jl implements iterate. Might be related to Distributions.jl/issues/579 and Distributions.jl/issues/846. I'll look a bit more into it tomorrow.

@torfjelde
Copy link
Member

torfjelde commented Apr 17, 2019

Got it working.

using Distributions, Bijectors
import Distributions: logpdf, rand

# Issue: this is not enforcing `VariateForm` of `D` and `Transformed{D}` to be the same
struct Transformed{D} <: Distribution{VariateForm, Continuous} where D <: Distribution
    d::D
end

# size
Base.size(td::Transformed) = size(td.d)
Base.length(td::Transformed) = length(td.d)

# logp
logpdf(d::Transformed{D} where D <: Distribution{Univariate}, x::T where T <: Real) = logpdf_with_trans(d.d, x, true)
logpdf(d::Transformed{D} where D <: Distribution{Multivariate}, x::AbstractMatrix{T} where T <: Real) = logpdf_with_trans(d.d, x, true)

# rand
rand(td::Transformed, n::Int64) = link(td.d, rand(td.d, n))
rand(d::Transformed{D} where D <: Distribution{Univariate}) = first(rand(d, 1))
rand(d::Transformed{D} where D <: Distribution{Multivariate}) = rand(d, 1)

# makes it possible to vectorize operations on `Transformed` for `UnivariateDistributions`
Broadcast.broadcastable(d::Transformed{D} where D <: UnivariateDistribution) = Ref(d)

The issue was that the Transformed{D} didn't inherit the implementation of Broadcast.broadcastable for UnivariateDistribution from Distributions.jl/common.jl#L100 due to the unspecified VariateForm in Transformed{D}. But now the implementation works fine for the case where it failed before;

p = Transformed(Normal(0.0, 1.0))

logpdf(p, rand(p))         # <= this works
logpdf.(p, rand(p, 10))    # <= this also works now

Though it might be a good idea to consider maybe making explicit types for Univariate and Multivariate, as this ensures issues like the above does not happen and instead sane behavior will be "inherited" from Distributions.jl. E.g. vectorization, user-specified rng, pdf implementation using logpdf. It also makes it easier to stay up-to-date with Distributions.jl.
An example of such an implementation could be:

using Distributions, Bijectors
import Random: AbstractRNG
import Distributions: logpdf, rand, rand!, _rand!, _logpdf

struct UnivariateTransformed <: Distribution{Univariate, Continuous}
    d::UnivariateDistribution
end

struct MultivariateTransformed <: Distribution{Multivariate, Continuous}
    d::MultivariateDistribution
end

# constructors
transformed(d::UnivariateDistribution) = UnivariateTransformed(d)
transformed(d::MultivariateDistribution) = MultivariateTransformed(d)

# size
Base.length(td::MultivariateTransformed) = length(td.d)

# logp
logpdf(d::UnivariateTransformed, x::T where T <: Real) = logpdf_with_trans(d.d, x, true)
_logpdf(d::MultivariateTransformed, x::AbstractVector{T} where T <: Real) = logpdf_with_trans(d.d, x, true)

# rand
rand(rng::AbstractRNG, td::UnivariateTransformed) = first(rand(td.d))
_rand!(rng::AbstractRNG, td::MultivariateTransformed, x::AbstractVector{T} where T <: Real) = begin
    rand!(rng, td.d, x)
     y = link(td.d, x)
    copyto!(x, y)
end

From a user-perspective the interface is still "nice":

p = transformed(Normal(0.0, 1.0))
logpdf.(p, rand(p, 10)) # <= this just works now

EDIT: fixed the _rand function

@xukai92
Copy link
Member

xukai92 commented Jul 7, 2019

I think we need another level of abstraction on transformation as well, for the purpose of making this package compatible with possible incoming normalzing flows.

struct InvertibleTransformation end

logabsdetjacob(it::InvertibleTransformation, x) # this returns the logabsdetjacob
forward(it::InvertibleTransformation, x) # this returns forward and the corresponding logabsdetjacob
inverse(it::InvertibleTransformation, x) # this returns inverse and the corresponding logabsdetjacob

This is inline with #6 (comment), in which a transformed distribution consists a base distributin and an invertiable transformation (for which the logdetjacob is also computable).

Edit: logdetjacob -> logabsdetjacob (suggested by @willtebbutt
#6 (comment))

@willtebbutt
Copy link
Member

willtebbutt commented Jul 7, 2019

Would suggest logdetjacob be called logabsdetjacob, but other than that @xukai92 's proposal sounds reasonable to me. Also maybe extend inv rather than create a new function inverse?

edit: actually, perhaps inverse should be reverse or backward to mirror the forward function?

@xukai92
Copy link
Member

xukai92 commented Jul 7, 2019

Yes this should be logabsdetjacob!

I like the term inverse as we are evaluating using the inverse function.

@torfjelde
Copy link
Member

torfjelde commented Jul 7, 2019

I did some work on such an interface earlier this week. But then I wrote a simple test which I'm having issues with because of (what seems like) numerical inaccuracies leading to wrong results. Should I submit a WIP PR so you can all see what it looks like?

EDIT: I'm using AD to compute the jacbobian from link, which is probably why I'm not qetting quite the correct results. This is just a "placeholder" implementation instead of implementing each of the different logabsdetjac functions manually.

@willtebbutt
Copy link
Member

I like the term inverse as we are evaluating using the inverse function.

Yes, this is reasonable, and I could certainly live with it. My issue is with the use of the name inv(t, x) as a shorthand for what we really mean: evaluate_inverse_function(t, x) i.e. I would prefer to have inv be reserved unary methods that return the inverse of the thing that they're applied to.

An alternative interface might be:

abstract type InvertibleTransformation end

logabsdetjacob(it::InvertibleTransformation, x) # this returns the logabsdetjacob
forward(it::InvertibleTransformation, x) # applies the invertible transformation to a data point
inv(it::InvertibleTransformation) # returns another invertible transformation - the inverse of `it`

This way, if we want to compute the inverse, we would write

forward(inv(it), x)

to get the inverse transformation evaluated at x, and the negation of the logabsdet of the jacobian of it at x.

Maybe I'm being overly pedantic and, as I said, I could definitely live with your proposal above @xukai92

@xukai92
Copy link
Member

xukai92 commented Jul 8, 2019

This looks great! I guess there are two options to implement the inverse transform as well.

One is to implement a one to one corresponding transforms, say for A we also have InverseA and define inv(a::A) = InverseA(...) and inv(ia::InverseA) = A(...).

Or we can define a wrapper called Inversed and define inv(it:: InvertibleTransformation) = Inversed(it) and inv(iit::Inversed{T})::T = iit. original. And for each it::IT we define forward(it::IT, x) and forward(iit::Inversed{IT}, x).

I feel the second option is more neat. How do you think? @willtebbutt

@xukai92
Copy link
Member

xukai92 commented Jul 8, 2019

@torfjelde Re: #6 (comment)

We could make the AD way as the callback functions for logabsdetjacob. The main reason we need a designated abstraction is that when the diemension is high, using AD to compute this term is computationally intractable. This is the case if you are working with natrue images where the transfomraiton is parametrized by neural networks.

@xukai92
Copy link
Member

xukai92 commented Jul 8, 2019

@willtebbutt I have a working example for the proposed abstraction, see below.

# Abstraction
abstract type AbstractInvertibleTransformation end

logabsdetjacob(t::T1, x::T2) where {T1<:AbstractInvertibleTransformation,T2} = 
    error("`logabsdetjacob(t::$T1, x::$T2)` is not implemented.")
forward(t::T1, x::T2) where {T1<:AbstractInvertibleTransformation,T2} = 
    error("`forward(t::$T1, x::$T2)` is not implemented.")

struct Inversed{T<:AbstractInvertibleTransformation} <: AbstractInvertibleTransformation
    original::T
end

inv(t::T) where {T<:AbstractInvertibleTransformation} = Inversed(t)
inv(it::Inversed{T}) where {T<:AbstractInvertibleTransformation} = it.original

logabsdetjacob(it::T1, y::T2) where {T<:AbstractInvertibleTransformation,T1<:Inversed{T},T2} = 
    error("`logabsdetjacob(it::$T1, y::$T2)` is not implemented.")
forward(it::T1, y::T2) where {T<:AbstractInvertibleTransformation,T1<:Inversed{T},T2} = 
    error("`forward(it::$T1, y::$T2)` is not implemented.")

# Demo: logit transformation
using StatsFuns: logit, logistic

struct Logit{T<:Real} <: AbstractInvertibleTransformation
    a::T
    b::T
end

logabsdetjacob(t::Logit{<:Real}, x::Real) = log((x - t.a) * (t.b - x) / (t.b - t.a))

forward(t::Logit, x::Real) = (rv=logit((x - t.a) / (t.b - t.a)), logabsdetjacob=-logabsdetjacob(t, x))

function forward(it::Inversed{Logit{T}}, y::Real) where {T<:Real}
    t = it.original
    x = (t.b - t.a) * logistic(y) + t.a
    return (rv=x, logabsdetjacob=logabsdetjacob(t, x))
end

# Simple demo
using Distributions, Bijectors

a, b = 1, 3
d = Truncated(Normal(0, 1), a, b)
t = Logit(a, b)
y = randn()
it = inv(t)
itres = forward(it, y)
@info itres
@info "This implementation" logpdf(d, itres.rv) + itres.logabsdetjacob
@info "Bijectors.jl" logpdf_with_trans(d, itres.rv, true)

The demo outputs:

┌ Info: (rv = 1.930013312433704, logabsdetjacob = -0.6980573521827962)
└ @ Main In[5]:7
┌ Info: This implementation
│   logpdf(d, itres.rv) + itres.logabsdetjacob = -1.6299051579255193
└ @ Main In[5]:8
┌ Info: Bijectors.jl
│   logpdf_with_trans(d, itres.rv, true) = -1.6299051579255193
└ @ Main In[5]:9

Comments are welcome!

@torfjelde torfjelde mentioned this issue Jul 8, 2019
6 tasks
@willtebbutt
Copy link
Member

willtebbutt commented Jul 9, 2019

@xukai92 glad you like the suggestion, the proposed implementation LGTM. One thing I would suggest is that instead of implementing stuff for Inversed{Logit}, we just implement Logistic <: AbstractInvertibleTransformation, and make

inv(t::Logit) = Logistic(some_function_of_parameters_of_t)

and make the inverse of Logistic a Logit.

@xukai92
Copy link
Member

xukai92 commented Jul 9, 2019

Yes I was thinking of this approach as well. My issues with it is are

  • For a lot of the transformations used in the normalzing flow. Only the forward has a name and we would need to make up a name for each inverse. E.g. for AffineCoupling we would need to create InversedAffineCoupling. If this is case I'd rather have Inversed{AffineCoupling} to be more consistent.
  • We need to implement two extra inv :p

In fact, we can still overload the inv function for specific transformations to do what you proposed anyway :)

@willtebbutt
Copy link
Member

Good point - I really like your Inversed type for those situations in which the inverse isn't really something that really has a name, as you suggest, and we just add the inverse type where the inverse has a well-established name in its own right.

@torfjelde
Copy link
Member

I've now tried to incorporate the above comments and Flows.jl into my previous proposal #27; any feedback would be nice:)

@xukai92
Copy link
Member

xukai92 commented Aug 29, 2019

Closed by PR #27.

@xukai92 xukai92 closed this as completed Aug 29, 2019
@xukai92
Copy link
Member

xukai92 commented Aug 29, 2019

@JuliaRegistrator register()

@JuliaRegistrator
Copy link

Registration pull request created: JuliaRegistries/General/3061

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if Julia TagBot is installed, or can be done manually through the github interface, or via:

git tag -a v0.4.0 -m "<description of version>" 6ea2c373ac573295332c48557bbae7c3182edbc4
git push origin v0.4.0

@yebai
Copy link
Member Author

yebai commented Sep 3, 2019

@JuliaRegistrator register()

@JuliaRegistrator
Copy link

Registration pull request updated: JuliaRegistries/General/3061

After the above pull request is merged, it is recommended that a tag is created on this repository for the registered package version.

This will be done automatically if Julia TagBot is installed, or can be done manually through the github interface, or via:

git tag -a v0.4.0 -m "<description of version>" 635a71a868a496f90fe9edea90fdf37f947c792d
git push origin v0.4.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants