Also make sure allocHeapClosure updates profiling counters with the memory allocated.
- Move array representation knowledge into SMRep.
- Separate out low-level heap-object allocation so that we can reuse it from doNewArrayOp.
- Remove card-table initialisation; we can safely ignore the card table for newly allocated arrays.
I'd like to be able to pack together non-pointer fields that are less than a word in size, and this is a necessary prerequisite.
This results in a 46% runtime decrease when allocating an array of 16 unit elements on a 64-bit machine. In order to allow newArray# to have both an inline and an out-of-line implementation, cgOpApp is refactored slightly. The new implementation of cgOpApp should make it easier to add other primops with both inline and out-of-line implementations in the future.
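For reference, the benchmarked shape (a small array of unit elements with a statically known size) looks roughly like this at the primop level. This is only a sketch, assuming the GHC.Exts and GHC.ST modules from base; unitArraySize is a hypothetical name, and whether the allocation is actually compiled inline depends on the GHC version and optimisation flags.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Int (I#), newArray#, sizeofMutableArray#)
import GHC.ST (ST (ST), runST)

-- Allocate a 16-element array filled with () and report its size.
-- The size is a literal, which is the case an inline implementation
-- of newArray# can take advantage of.
unitArraySize :: Int
unitArraySize = runST (ST $ \s0 ->
  case newArray# 16# () s0 of
    (# s1, marr #) -> (# s1, I# (sizeofMutableArray# marr) #))
```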
Nowadays SetLevels floats case expressions as well as let-bindings, and case expressions bind type variables. We need to clone all such floated binders, to avoid accidental name capture. But I'd forgotten to substitute for the cloned type variables, causing #8714. (In the olden days only Ids were cloned, from let-bindings.)

This patch fixes the bug and does quite a bit of clean-up refactoring as well, by putting the context level in the LvlEnv.

There is no effect on performance, except that nofib 'rewrite' improves allocations by 3%. On investigation I think it was a fluke to do with loop-cutting in big letrec nests. But at least it's a fluke in the right direction.

        Program           Size    Allocs   Runtime   Elapsed  TotalMem
--------------------------------------------------------------------------------
            Min          -0.4%     -3.0%    -19.4%    -19.4%    -26.7%
            Max          -0.0%     +0.0%    +17.9%    +17.9%      0.0%
 Geometric Mean          -0.1%     -0.0%     -0.7%     -0.7%     -0.4%
Yet another small way in which polymorphic kinds need a bit of care. See Note [Unify kinds in deriving] in TcDeriv.
The issue here is described in Note [Binding scoped type variables] in TcPat.

When implementing this fix I was able to make things quite a bit simpler:
 * The type variables in a type signature now never unify with each other, and so can be straightforward skolems.
 * We only need the SigTv stuff for signatures in patterns, and for kind variables.
When deriving Functor, Foldable, Traversable, we need only look at the way that the last type argument is treated. It's fine for there to be existentials etc, provided they don't affect the last type argument. See Note [Check that the type variable is truly universal] in TcDeriv.
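As an illustration of the rule above, here is a small sketch of a type with an existential that does not touch the last type argument. The names Tagged and describe are hypothetical, and standalone deriving is used here as an assumption about the most portable way to request the instances.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}
{-# LANGUAGE ExistentialQuantification, StandaloneDeriving #-}

-- 'tag' is existential and constrained, but neither the field nor the
-- constraint mentions the last type argument 'a'.
data Tagged a = forall tag. Show tag => Tagged tag a

-- All three instances only need to look at how 'a' is used.
deriving instance Functor Tagged
deriving instance Foldable Tagged
deriving instance Traversable Tagged

-- Render both the hidden tag and the payload.
describe :: Show a => Tagged a -> String
describe (Tagged t x) = show t ++ ": " ++ show x
```

By contrast, a constructor such as `Ord a => a -> T a` constrains the last type argument itself, so it is not expected to be derivable.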
This fixes #8824.
Because of GADTs and casts we were getting binders whose demand annotation was more deeply nested than made sense for its type. See Note [Trimming a demand to a type], in Demand.lhs, which I reproduce here:

   Note [Trimming a demand to a type]
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   Consider this:

     f :: a -> Bool
     f x = case ... of
             A g1 -> case (x |> g1) of (p,q) -> ...
             B    -> error "urk"

   where A,B are the constructors of a GADT. We'll get a U(U,U) demand
   on x from the A branch, but that's a stupid demand for x itself, which
   has type 'a'. Indeed we get ASSERTs going off (notably in
   splitUseProdDmd, Trac #8569).

   Bottom line: we really don't want to have a binder whose demand is more
   deeply-nested than its type. There are various ways to tackle this.
   When processing (x |> g1), we could "trim" the incoming demand U(U,U)
   to match x's type. But I'm currently doing so just at the moment when
   we pin a demand on a binder, in DmdAnal.findBndrDmd.
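The trimming idea in the Note can be sketched on a toy model. This is not the representation used in Demand.lhs: the types Ty and Use and the function trimUse are hypothetical stand-ins, assuming only that a product demand makes sense for a product type of matching arity.

```haskell
-- Toy types, NOT GHC's Type or Demand.
data Ty
  = TyVarTy String   -- a type variable such as 'a'
  | TupleTy [Ty]     -- a product type such as (p, q)
  deriving (Eq, Show)

data Use
  = U                -- used, nothing known about components
  | UProd [Use]      -- used as a product, with one use per component
  deriving (Eq, Show)

-- Trim a use so that it is never more deeply nested than the type.
-- A product use survives only against a product type of the same
-- arity; everything else collapses to plain U.
trimUse :: Use -> Ty -> Use
trimUse (UProd us) (TupleTy tys)
  | length us == length tys = UProd (zipWith trimUse us tys)
trimUse _ _ = U
```

In this model the U(U,U) demand on x from the example corresponds to `UProd [U, U]`, and trimming it against `TyVarTy "a"` gives `U`.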
This patch improves the call arity analysis in various ways. Most importantly, it enriches the analysis result information so that when looking at a call, we do not have to make an arbitrary choice about which side to take the information from. Instead we can combine the results in a way that does not lose valuable information.

To do so, besides the incoming arities, we remember "what can be called with what", i.e. an undirected graph between the (interesting) free variables of an expression. Of course this makes combining the results a bit more tricky (especially with mutual recursion), but it is still doable.

The actual implementation of the graph structure is abstracted away in a module of its own (UnVarGraph.hs). The implementation is geared towards efficiently representing the graphs that we need (which can contain large complete and large complete bipartite graphs, which would be huge in other representations). If someone feels like designing data structures: there is surely some speed-up to be obtained by improving that data structure.

Additionally, the analysis now takes into account that if a RHS stays a thunk, then its calls happen only once, even if the variable the RHS is bound to is evaluated multiple times, or is part of a recursive group.
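To show why complete and complete bipartite graphs can be stored compactly, here is a toy sketch of a graph kept as a union of such parts. It is not the code in UnVarGraph.hs; every name below is hypothetical, and plain lists are used only to keep the example within base.

```haskell
-- Toy co-call graph: a union of parts, each described by its vertices only.
data Part v
  = Complete [v]       -- every vertex is adjacent to every vertex (self-loops included)
  | Bipartite [v] [v]  -- every left vertex is adjacent to every right vertex

newtype Graph v = Graph [Part v]

completeGraph :: [v] -> Graph v
completeGraph vs = Graph [Complete vs]

completeBipartite :: [v] -> [v] -> Graph v
completeBipartite ls rs = Graph [Bipartite ls rs]

-- Union just concatenates the parts; no edges are ever enumerated.
unionGraph :: Graph v -> Graph v -> Graph v
unionGraph (Graph a) (Graph b) = Graph (a ++ b)

-- Two variables "can be called with" each other if some part covers the pair.
hasEdge :: Eq v => Graph v -> v -> v -> Bool
hasEdge (Graph parts) x y = any covers parts
  where
    covers (Complete vs)     = x `elem` vs && y `elem` vs
    covers (Bipartite ls rs) = (x `elem` ls && y `elem` rs)
                            || (y `elem` ls && x `elem` rs)

-- Number of vertices stored, as a rough measure of representation size.
storedSize :: Graph v -> Int
storedSize (Graph parts) = sum (map size parts)
  where
    size (Complete vs)     = length vs
    size (Bipartite ls rs) = length ls + length rs
```

A complete graph on n vertices costs n stored vertices here, where an explicit edge list would need on the order of n^2 entries.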
When building a binary distribution with TAR_COMP=xz, using the -9e flag (extremely high compression) results in substantial savings: for the Mavericks builds, bzip2 comes in at about 120 MB, while xz at level 9 comes in at about 60 MB - a huge reduction!

This of course takes significantly longer - but it does not affect decompression speed for end users, so it's certainly worth it.

Signed-off-by: Austin Seipp <firstname.lastname@example.org>