This repository has been archived by the owner on May 18, 2022. It is now read-only.

merge into NNlib and CUDA? #32

Closed
CarloLucibello opened this issue Dec 21, 2020 · 8 comments

Comments

CarloLucibello commented Dec 21, 2020

Hi,
in FluxML/Flux.jl#1431 there was some talk about making the primitives defined here more widely available in the ecosystem. To do this, the Zygote and CUDA dependencies should be dropped, because they would be an unnecessary and heavy payload for other packages. That suggests the following steps:

  1. Replace the Zygote and ZygoteRules adjoint definitions with ChainRules ones
  2. Move the CPU implementations to NNlib.jl
  3. Move the GPU kernels to CUDA.jl (which already depends on NNlib.jl), if @maleadt is willing to accept them

@yuehhua does this plan make sense?

cc @dfdx @jeremiedb @chengchingwen
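
For context, the primitives under discussion can be sketched roughly as follows. This is a hedged, simplified illustration with hypothetical vector-only signatures; the actual ScatterNNlib methods are more general (multi-dimensional, multiple reduction ops):

```julia
# Simplified sketch of scatter_add! and gather semantics
# (hypothetical signatures, vectors only, for illustration).
function scatter_add!(ys::AbstractVector, us::AbstractVector, xs::AbstractVector{<:Integer})
    # Accumulate us[i] into ys at the index given by xs[i].
    for i in eachindex(us, xs)
        ys[xs[i]] += us[i]
    end
    return ys
end

# gather is the adjoint access pattern: read ys at the indices xs.
gather(ys::AbstractVector, xs::AbstractVector{<:Integer}) = ys[xs]

ys = zeros(Int, 3)
scatter_add!(ys, [10, 20, 30], [1, 1, 3])  # ys == [30, 0, 30]
gather(ys, [1, 1, 3])                      # == [30, 30, 30]
```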

dfdx commented Dec 22, 2020

Currently, NNlib doesn't depend on ChainRules, so it may be better to move the adjoint definitions to Flux directly.
I also suggest splitting pullbacks into separate forward and reverse functions, so they can be used by libraries with other gradient-calculation schemes (e.g. Yota). For example, instead of this:

@adjoint function scatter_add!(ys::AbstractArray, us::AbstractArray, xs::AbstractArray)
    ys_ = copy(ys)                          # avoid mutating the caller's array
    scatter_add!(ys_, us, xs)
    ys_, Δ -> (Δ, gather(Δ, xs), nothing)   # gradients for ys and us; none for the indices xs
end

have this (assuming I understood the semantics of @adjoint correctly):

∇scatter_add_ys!(Δ, xs) = Δ
∇scatter_add_us!(Δ, xs) = gather(Δ, xs)

function rrule(::typeof(scatter_add!), ys, us, xs)
    ys_ = copy(ys)
    scatter_add!(ys_, us, xs)
    # A ChainRules pullback also returns a tangent for the function itself,
    # and the index argument xs is non-differentiable.
    ys_, Δ -> (NoTangent(), ∇scatter_add_ys!(Δ, xs), ∇scatter_add_us!(Δ, xs), NoTangent())
end

If this looks good to everyone, I can try it out in NNlib / CUDA / Flux during this or next weekend.
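
The payoff of the split can be sketched as follows (a hedged illustration with hypothetical simplified signatures): both styles of AD consume the same two standalone reverse functions, so nothing is locked inside a closure:

```julia
# Hedged sketch of why splitting helps: the same two reverse functions
# serve both pullback-based and tape-based consumers.
gather(ys, xs) = ys[xs]
∇scatter_add_ys(Δ, xs) = Δ               # destination gradient passes Δ through
∇scatter_add_us(Δ, xs) = gather(Δ, xs)   # source gradient reads Δ at the indices xs

Δ  = [1.0, 2.0, 3.0]
xs = [3, 1]

# Pullback style (Zygote/ChainRules): compose the pieces into one closure.
pb = Δ -> (∇scatter_add_ys(Δ, xs), ∇scatter_add_us(Δ, xs), nothing)
pb(Δ)  # == ([1.0, 2.0, 3.0], [3.0, 1.0], nothing)

# Tape style (e.g. Yota): record the two calls as ordinary graph nodes,
# with no opaque closure to peer into.
∇scatter_add_us(Δ, xs)  # == [3.0, 1.0]
```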

yuehhua (Owner) commented Dec 23, 2020

It makes sense to me. If there is anything I can help with, just let me know.
So currently, as @dfdx suggested, we put the adjoint definitions directly in Flux, and the ChainRules definitions go to NNlib and CUDA separately?
And we also separate the forward and reverse functions?
I will generalize the scatter operations to every dimension as well.

@chengchingwen commented:

@dfdx Some gradient functions need intermediate values from the forward pass. If we split the definitions, some backward functions will need an extra argument to receive those values instead of recomputing them.
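
One way to handle that is to compute the intermediate once in the forward pass and thread it through the reverse function as an explicit argument. A hedged sketch (softmax is a hypothetical stand-in here, not one of ScatterNNlib's ops; its gradient needs the forward output):

```julia
# Hypothetical example of the pattern described above: softmax's
# reverse pass needs the forward output y, so the split reverse
# function takes y as an explicit extra argument instead of
# recomputing it.
softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))

# Standalone reverse function: receives the saved forward output y.
∇softmax(Δ, y) = y .* (Δ .- sum(Δ .* y))

function rrule_softmax(x)
    y = softmax(x)             # intermediate value computed once here
    y, Δ -> ∇softmax(Δ, y)     # pullback closure captures y; a tape-based
                               # AD would store y on the tape explicitly
end
```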

@CarloLucibello (Author) commented:

The only correction to the comments above is that NNlib's rules are currently being moved to NNlib itself FluxML/NNlib.jl#242, so scatter's rules should go there as well.

@yuehhua it would be nice if you could file the PR to NNlib yourself so that you preserve authorship.

dfdx commented Dec 23, 2020

> @dfdx Some gradient functions need intermediate values from the forward pass. If we split the definitions, some backward functions will need an extra argument to receive those values instead of recomputing them.

True, and in general some refactoring of the forward-pass functions may be needed. But in ScatterNNlib all the adjoint definitions follow the same, easy-to-split pattern.

All in all, it looks much easier to have separate forward- and reverse-pass functions and combine them into a pullback than to have only a pullback and try to extract the forward and reverse passes from it. This is essentially the reason Yota.jl (and perhaps any non-pullback-based library) still doesn't use ChainRules.jl.

(I hope that doesn't sound like a selfish argument :))

yuehhua (Owner) commented Jul 2, 2021

All migrations are complete! Thank you everyone.

@yuehhua yuehhua closed this as completed Jul 2, 2021
@CarloLucibello (Author) commented:

Amazing and relentless work, thanks @yuehhua!
