Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowdown in package load times #456

Closed
IanButterworth opened this issue Apr 16, 2020 · 12 comments · Fixed by #497
Closed

Slowdown in package load times #456

IanButterworth opened this issue Apr 16, 2020 · 12 comments · Fixed by #497
Labels
latency Slow loading or latency to execution

Comments

@IanButterworth
Copy link
Contributor

IanButterworth commented Apr 16, 2020

I'm seeing a strong slowdown in package load times that seems to be related to Revise being loaded.
If I do a single @time using LsqFit, the result of ~6 seconds actually prints in ~14 seconds.
I'm illustrating this below using a @time @time approach, which might be flawed(?)

With Revise loaded (autoloaded)

julia> @time @time using LsqFit
  5.924649 seconds (14.27 M allocations: 722.932 MiB, 5.58% gc time)
 12.519490 seconds (29.66 M allocations: 1.445 GiB, 5.27% gc time)

Without Revise loaded

julia> @time @time using LsqFit
  4.529166 seconds (8.39 M allocations: 449.575 MiB, 3.31% gc time)
  4.550707 seconds (8.43 M allocations: 451.830 MiB, 3.30% gc time)

In this case, LsqFit isn't a dev-ed package, so I'm wondering whether a fix to this would be to skip tracking non-deved packages?

Julia 1.4.0
Revise v2.6.1

cc. @MasonProtter because we were talking about it on slack

@IanButterworth
Copy link
Contributor Author

I hacked together #457 to disable tracking for packages residing in .julia/packages.

julia> @time @time using LsqFit
  5.803122 seconds (14.24 M allocations: 719.565 MiB, 2.17% gc time)
  6.161489 seconds (15.09 M allocations: 762.200 MiB, 2.40% gc time)

Might that be a viable solution? .julia/packages should be read-only wrt. code, I believe?

@timholy
Copy link
Owner

timholy commented Apr 17, 2020

Interesting. With Revise I get

julia> @time @time using LsqFit
  5.202688 seconds (13.00 M allocations: 664.292 MiB, 3.53% gc time)
  7.190375 seconds (22.60 M allocations: 1.113 GiB, 4.99% gc time)

and without

julia> @time @time using LsqFit
  5.405149 seconds (9.71 M allocations: 513.312 MiB, 3.22% gc time)
  5.433042 seconds (9.75 M allocations: 515.567 MiB, 3.20% gc time)

So there's something to it but for me it's a much smaller effect.

I'll discuss your proposed solution in #457.

@IanButterworth
Copy link
Contributor Author

I tried setting JL_DEBUG_METHOD_INVALIDATION=1 in julia src/gf.c and rebuilding, and after initial precomp, auto-loading Revise in a blank julia instance (not in Juno) gives this full output which shows 13 points of invalidation.

@timholy
Copy link
Owner

timholy commented May 2, 2020

Fascinating. I didn't know about this trick. Let me see if I'm interpreting this correctly: starting with the first case, _tryrequire_from_serialized is being recompiled because CodeTracking defines a new PkgId method.

My attempts to analyze why this happens have been eye-opening. The new method adds to the end of the MethodTable (position 6), whereas the method for modules is at position 4. The PkgId(M) call on line 704 can be dispatched at compile time (it knows M is a module), and so the new method should not force recompilation. In contrast, the one at line 718 is not inferrable. You might think this shouldn't require recompilation (I was guessing it would do runtime dispatch), but check this out: from code_typed(Base._tryrequire_from_serialized, (Base.PkgId, UInt64, String)), I get

│    %152 = (isa)(%149, Module)::Bool
└───        goto #56 if not %152
55 ─ %154 = π (%149, Module)
│    %155 = invoke %151(%154::Module)::Base.PkgId
└───        goto #61
56 ─ %157 = (isa)(%149, CodeTracking.PkgFiles)::Bool
└───        goto #58 if not %157
57 ─ %159 = π (%149, CodeTracking.PkgFiles)
│    %160 = Base.getfield(%159, :id)::Base.PkgId
└───        goto #61
58 ─ %162 = (isa)(%149, Revise.PkgData)::Bool
└───        goto #60 if not %162
59 ─ %164 = π (%149, Revise.PkgData)
│    %165 = Base.getfield(%164, :info)::CodeTracking.PkgFiles
│    %166 = Base.getfield(%165, :id)::Base.PkgId
└───        goto #61
60 ─ %168 = Base.PkgId(%149)::Base.PkgId
└───        goto #61
61 ┄ %170 = φ (#55 => %155, #57 => %160, #59 => %166, #60 => %168)::Base.PkgId

So it's union-splitting! Naturally if you change the method table, anything that has been union-split (here across all 1-argument methods) is going to have to change.

So the first recompilation will be fixed by annotating M::Module.

Obviously this is laborious. Do you know how one should decide priorities? A huge list of functions seems to be invalidated by OrderedCollections.sort!, mostly stemming from this _print method. I am pretty sure it ultimately stems from the typedef of Iterators.Pairs, which is an AbstractDict for which the return type of keys cannot be inferred to be a KeySet. Whew.

@IanButterworth
Copy link
Contributor Author

I was unaware too before the trial above. It was a recommendation from @maleadt.

I've been trying it out on a few packages, like trying to figure out why Flux load has got slow on MacOS.
As for prioritization, I've only been looking at lines associated with each invalidation, but I'm sure that's suboptimal..

I suspect as you've found, solving each may mostly be a laborious effort by someone deep in the knowledge of a package, but easier reporting of the invalidations and dumps seems like it would be helpful. Given it requires a c code level change to activate, I'm wondering about getting some official invalidation-snooping binaries built, and setting it up as a github action to return the output as a text dump. Perhaps a more formatted report in the future

@timholy
Copy link
Owner

timholy commented May 3, 2020

Serious progress in JuliaLang/julia#35714... from 13 sources of invalidation to 7. But the big prize is the logging stuff and that's still not fixed.

@IanButterworth
Copy link
Contributor Author

This is awesome. :)

@timholy timholy added the latency Slow loading or latency to execution label May 25, 2020
@timholy
Copy link
Owner

timholy commented Sep 9, 2020

Now on Julia 1.6 the gap is much larger. Without Revise:

julia> @time @time using LsqFit
  1.451346 seconds (3.75 M allocations: 259.499 MiB)
  1.469372 seconds (3.78 M allocations: 261.634 MiB)

With Revise (revise3 branch):

julia> @time @time using LsqFit
  7.703635 seconds (15.09 M allocations: 900.068 MiB, 3.59% gc time)
  7.722227 seconds (15.13 M allocations: 902.205 MiB, 3.59% gc time)

Profiling shows that the difference is entirely due to @require statements in LsqFit or its dependencies: those force a lot of Revise's internal code to be compiled. My efforts to reduce latency have not yet considered @require.

Probably the best solution is to do something "cheap" initially and then do the expensive thing only if the contents of the require block get modified by the user.

@IanButterworth
Copy link
Contributor Author

Ach.. It feels rather cruel that given the long battle you've fought and won since this issue, that it turns out to be somewhat of an edge case..

@timholy
Copy link
Owner

timholy commented Sep 9, 2020

No worries. I'm pretty pleased about making Revise look worse when 95% of the reason is that Base got much better.

Besides...on a local branch I now have

julia> @time @time using LsqFit
  2.235738 seconds (5.13 M allocations: 355.079 MiB, 4.50% gc time)
  2.255638 seconds (5.16 M allocations: 357.196 MiB, 4.46% gc time)

with Revise#revise3 (plus local modifications). There's one hitch, though: when your @require block has one or more include statements in it, currently Revise will track changes in those included files even if you haven't made any changes to the file containing the @require statement. This requires some sort of parsing of the @require expression. To make sure it's really robust (for example, for fn in ("file1.jl", "file2.jl") include(fn) end constructions), Revise uses the interpreter for that task, and intercepts include statements immediately before executing them:

Revise.jl/src/lowered.jl

Lines 400 to 405 in 850d9f2

elseif skip_include && (f === modinclude || f === Base.include || f === Core.include)
# include calls need to be managed carefully from several standpoints, including
# path management and parsing new expressions
add_includes!(methodinfo, mod, @lookup(frame, stmt.args[2]))
assign_this!(frame, nothing) # FIXME: the file might return something different from `nothing`
pc = next_or_nothing!(frame)

There's really no way to do it as reliably without using the interpreter. And there's no way to use the interpreter without paying a big compile-time cost. I could scan the expression and look for :call expressions to :include, which would have very little latency, but it wouldn't be as robust.

Thoughts? With the imminent release of Revise 3 I can break some stuff, but this could be a regression if not managed carefully.

@timholy
Copy link
Owner

timholy commented Sep 9, 2020

Oh wait, it's worse than that...if it has any include statements, fast loading is doomed: Revise can be efficient if there's a source cache in the *.ji precompile file; if not, then Revise has to defensively parse the file before there are any changes so that it can detect changes when they arrive.

But I can scan for any include statement, or any eval which might disguise one. That will trigger the penalty for some packages, but most will be OK.

@timholy
Copy link
Owner

timholy commented Sep 9, 2020

I'm going to declare that this will be (mostly) fixed by #497. It's getting to be quite a long list of issues that closes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
latency Slow loading or latency to execution
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants