This repository has been archived by the owner on Nov 24, 2022. It is now read-only.

[WIP] Add some parallelization to ahc-ld and ahc-link #622

Draft
wants to merge 1 commit into master

Conversation


@gkaracha gkaracha commented Apr 27, 2020

This is WIP. TODOs as described in #621:

  • When loading archives and object files in ahc-ld, we can parallelize the deserialization of each object file. All object files are converted to ByteStrings first, either via direct reading or ArchiveEntry, then the deserialization can be performed in parallel.
  • After the gc-sections pass is run, the shrunk AsteriusModule should be fully evaluated, and this can be done in parallel as well.
  • In the binaryen backend, we can parallelize the marshaling of different data segments and functions. binaryen will transparently switch to a new allocator when it notices it's allocating an IR node on a different thread, so we should ensure each Haskell worker thread is pinned using forkOn.
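
As a rough illustration of the pinning mentioned in the last bullet (a hypothetical helper sketch, not code from this PR):

```haskell
import Control.Concurrent
import Control.Monad

-- Hypothetical sketch of capability pinning: forkOn fixes each worker
-- to one capability, so binaryen keeps seeing the same thread (and
-- thus the same allocator) for a given worker index.
pinnedWorkers :: Int -> (Int -> IO ()) -> IO ()
pinnedWorkers n body = do
  dones <- replicateM n newEmptyMVar
  forM_ (zip [0 ..] dones) $ \(i, done) ->
    forkOn i (body i >> putMVar done ())
  mapM_ takeMVar dones -- wait for all pinned workers to finish
```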

@gkaracha gkaracha linked an issue Apr 28, 2020 that may be closed by this pull request
asterius/app/ahc-ld.hs (outdated review thread, resolved)
asterius/src/Asterius/Internals/Parallel.hs (outdated review thread, resolved)
@@ -45,9 +46,13 @@ data LinkTask
 loadTheWorld :: LinkTask -> IO AsteriusCachedModule
 loadTheWorld LinkTask {..} = do
   ncu <- newNameCacheUpdater
-  lib <- mconcat <$> for linkLibs (loadAr ncu)
   objs <- rights <$> for linkObjs (tryGetFile ncu)
+  lib <- parallelFor 1 linkLibs (loadAr ncu) -- TODO: Parameterize
Member

loadAr is already parallelized, and here we use parallelFor again, which creates nested parallelism. This will still run, but overall efficiency will be reduced.

We need to either:

  • Refactor loadAr, so that it only returns the ArchiveEntrys, and after we collect all the ArchiveEntrys, parallelize their deserialization (along with objs)
  • Improve our parallelization framework so that it supports nested parallelism; each job can spawn new jobs to be evaluated somewhere else.
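
The first option is essentially a flat two-phase scheme; as a rough illustration (hypothetical names and types, not the actual asterius code):

```haskell
import Control.Concurrent
import Control.Monad

-- Phase 1 (done by the caller): gather all raw entries sequentially,
-- e.g. ArchiveEntry payloads plus directly-read object files.
-- Phase 2 (below): deserialize everything in one flat parallel pass,
-- so parallelFor is never invoked from inside another parallelFor.
flatDecode :: Int -> [raw] -> (raw -> a) -> IO [a]
flatDecode n entries decode = do
  boxes <- forM entries (const newEmptyMVar)
  forM_ (zip3 [0 ..] entries boxes) $ \(i, e, box) ->
    forkOn (i `mod` n) (putMVar box $! decode e) -- force on the worker
  mapM takeMVar boxes                            -- results in input order
```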

Member Author

I went with the first approach:

Refactor loadAr, so that it only returns the ArchiveEntrys, and after we collect all the ArchiveEntrys, parallelize their deserialization (along with objs)

but I don't completely understand what you mean by the second bullet:

Improve our parallelization framework so that it supports nested parallelism; each job can spawn new jobs to be evaluated somewhere else.

Nested parallelism is already allowed, right? I understand that nested calls would pin threads to the same capabilities (hence slower), but it'd still work. What is the alternative solution you mean here?

Member
The reason will be displayed to describe this comment to others. Learn more.

Ah, the second approach elaborated is:

  • We have a P monad which you can lift IO actions into. It carries the context (thread pool, task queue) needed for all parallelism
  • There's a fork :: P a -> P (Future a) method which spawns a P a action for parallel execution. The P a action itself can fork new jobs, which is what gives us nested parallelism.
  • There's a join :: Future a -> P a method which blocks until the result is available.

Going through all the trouble of creating the P monad abstraction has an advantage: we can enforce the number of worker threads to match CPU core count all the time. Directly calling parallelFor inside parallelFor would be much less efficient.
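
For the record, a minimal sketch of such a P monad could look like this (entirely hypothetical API, not code from this PR; a real version would also have to handle the case where every worker blocks in join on jobs still sitting in the queue):

```haskell
import Control.Concurrent
import Control.Monad

-- A hypothetical P monad: a function from the shared task queue to IO.
newtype P a = P { unP :: Chan (IO ()) -> IO a }

instance Functor P where
  fmap f (P g) = P (fmap f . g)

instance Applicative P where
  pure x = P (const (pure x))
  P f <*> P x = P (\q -> f q <*> x q)

instance Monad P where
  P x >>= k = P (\q -> x q >>= \a -> unP (k a) q)

-- Lift an IO action into P.
liftP :: IO a -> P a
liftP = P . const

newtype Future a = Future (MVar a)

-- Enqueue a job; some pinned worker will run it and fill the MVar.
-- The job receives the queue itself, so it can fork further jobs:
-- this is the nested parallelism described above.
fork :: P a -> P (Future a)
fork (P job) = P $ \q -> do
  box <- newEmptyMVar
  writeChan q (job q >>= putMVar box)
  pure (Future box)

-- Block until a forked job's result is available.
joinF :: Future a -> P a
joinF (Future box) = P (const (takeMVar box))

-- Run with exactly n worker threads, pinned one per capability, so the
-- worker count can be held at the CPU core count at all times.
runP :: Int -> P a -> IO a
runP n (P act) = do
  q <- newChan
  forM_ [0 .. n - 1] $ \i -> forkOn i (forever (join (readChan q)))
  act q
```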

asterius/src/Asterius/Internals/Parallel.hs (outdated review thread, resolved)
asterius/src/Asterius/Internals/Parallel.hs (outdated review thread, resolved)
@gkaracha
Member Author

@TerrorJack, I think it is ready for reviewing now. I skipped parallelization of data segments in the binaryen backend as we discussed so that it doesn't backfire on us. I have also left a couple of TODOs in the comments I'd like you to have a look at if you can.

@gkaracha gkaracha marked this pull request as ready for review April 30, 2020 10:47
asterius/src/Asterius/Types.hs (outdated review thread, resolved)
@@ -103,6 +103,11 @@ parseTask args = case err_msgs of
in if i >= 0 && i <= 2
then t {shrinkLevel = i}
else error "Shrink level must be [0..2]",
str_opt "thread-pool-size" $ \s t ->
Member

Don't forget to pass the --thread-pool-size option to ahc-ld as well. ahc-ld is called as the linker executable by ahc, so search for --optl in this module and see how other ahc-ld options are being passed.
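
For illustration, the forwarding could be as simple as prefixing each linker-bound flag (hypothetical helper; the exact spelling in the module may differ):

```haskell
-- ahc drives GHC, and GHC invokes ahc-ld as the linker, so flags meant
-- for ahc-ld must be wrapped in --optl for GHC to pass them through.
mkLinkerOpt :: Int -> [String]
mkLinkerOpt poolSize =
  map ("--optl" <>) ["--thread-pool-size=" <> show poolSize]
```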

Member Author

Good catch! Should be ok now.

@TerrorJack
Member

We have to hold this one for a bit longer; I gave this a try on my build server, and the overall clock time of linking pandoc has actually increased with --thread-pool-size=8 compared to 1. So we need to investigate further:

  • Are we picking the right places for parallelization?
  • Are we implementing parallelFor correctly?

Merely reading the flamegraph doesn't seem to help much either. I think we also need to investigate the eventlogs with tools like ThreadScope.

@gkaracha
Member Author

gkaracha commented May 5, 2020

@TerrorJack I took your advice and tried out threadscope:

  • Are we implementing parallelFor correctly?

The implementation of parallelFor was actually not OK; laziness got in the way. Hopefully this is now fixed (see the latest commit 06efdb9).
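
For illustration, the kind of fix involved (a hypothetical reconstruction, not the actual commit): a naive parallelFor can write unevaluated thunks into the result MVars, deferring all the real work to the single consuming thread; deep-forcing each result on its worker thread avoids that.

```haskell
import Control.Concurrent
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Control.Monad

-- parallelFor sketch: n pinned workers, each deep-forcing its result
-- *before* putMVar, so the work really happens on the worker thread.
parallelFor :: NFData b => Int -> [a] -> (a -> IO b) -> IO [b]
parallelFor n xs f = do
  boxes <- forM xs (const newEmptyMVar)
  forM_ (zip3 [0 ..] xs boxes) $ \(i, x, box) ->
    forkOn (i `mod` n) (f x >>= evaluate . force >>= putMVar box)
  mapM takeMVar boxes
```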

  • Are we picking the right places for parallelization?

Not sure about that, but if you could have another go at linking pandoc we might find out more about this.

@TerrorJack
Member

Nice finding about the missed laziness in parallelFor. Do you have updated numbers for the Cabal test case locally, for thread pool sizes 1, 2, 4, 8?

@gkaracha
Member Author

@TerrorJack some updates on this one:

Inspired by your fix for #663 (0085dcf), I tried giving up on parallelization in loadTheWorld (56a5aeb), which basically combines all the object files into one eagerly. Indeed, this took the OOM errors away, but it also showed no benefit whatsoever from parallelizing the rest. For

import Distribution.Simple
main = defaultMain

wall clock time only seems to be getting worse:

$ time stack build --exec "ahc-link --thread-pool-size=1 --input-hs perf-test/Setup.hs"
real    0m46,915s
user    3m56,870s
sys     0m2,098s

$ time stack build --exec "ahc-link --thread-pool-size=2 --input-hs perf-test/Setup.hs"
real    0m53,966s
user    4m2,502s
sys     0m2,822s

$ time stack build --exec "ahc-link --thread-pool-size=4 --input-hs perf-test/Setup.hs"
real    0m48,266s
user    4m2,818s
sys     0m2,243s

$ time stack build --exec "ahc-link --thread-pool-size=8 --input-hs perf-test/Setup.hs"
real    0m48,725s
user    4m8,747s
sys     0m3,086s

@gkaracha gkaracha marked this pull request as draft May 27, 2020 09:19
@TerrorJack
Member

@gkaracha Thanks for the update! Indeed, it seems that adding parallelization will only pay off after we further cut down the memory usage of the single-threaded case.

@gkaracha gkaracha changed the title Add some parallelization to ahc-ld and ahc-link [WIP] Add some parallelization to ahc-ld and ahc-link Jul 13, 2020
@gkaracha gkaracha self-assigned this Aug 11, 2020
Successfully merging this pull request may close these issues.

Implement parallel ahc-ld/ahc-link
2 participants