Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[async] Switch to llvm::SmallVector<llvm::SmallSet> for edges #1936

Merged
merged 2 commits into from
Oct 10, 2020

Conversation

k-ye
Copy link
Member

@k-ye k-ye commented Oct 10, 2020

According to #1927, llvm::SmallVector<llvm::SmallSet> seems to be the most performant.

Here's the profiling data for async_mgpcg.py:

  • Before this change: 13.418 s + 18.605 s + 20.120 s
     54.404  s synchronize                   [8 x   6.800  s]
         11.333  s 20.83%  optimize_listgen      [75 x 151.111 ms]
              ... ...
              8.523  s 75.21%  rebuild_graph         [44 x 193.713 ms]
                  6.988  s 81.98%  insert_task           [397621 x  17.574 us]
                      0.180  s  2.57%  get_task_meta         [397621 x 451.757 ns]
                      2.874  s 41.14%  insert_task meta->input_states [1149493 x   2.501 us]
                          2.069  s 71.96%  insert_edge           [1149493 x   1.800 us]
                          0.806  s 28.04%  [unaccounted]
                      2.053  s 29.38%  insert_task meta->output_states [409654 x   5.011 us]
                          1.286  s 62.63%  insert_edge           [1063555 x   1.209 us]
                          0.767  s 37.37%  [unaccounted]
                      1.007  s 14.41%  insert_task latest_state_readers_ [1149493 x 876.057 ns]
                      0.874  s 12.50%  [unaccounted]
                  0.005  s  0.06%  reid_nodes            [44 x 108.058 us]
                  0.002  s  0.03%  reid_pending_nodes    [44 x  53.108 us]
                  1.529  s 17.93%  [unaccounted]
              ... ...
          ... ...
          9.339  s 17.17%  demote_activation     [71 x 131.542 ms]
              ... ...
              8.576  s 91.82%  rebuild_graph         [40 x 214.396 ms]
                  7.055  s 82.27%  insert_task           [393579 x  17.925 us]
                      0.177  s  2.51%  get_task_meta         [393579 x 449.787 ns]
                      2.950  s 41.81%  insert_task meta->input_states [1178512 x   2.503 us]
                          2.123  s 71.96%  insert_edge           [1178512 x   1.801 us]
                          0.827  s 28.04%  [unaccounted]
                      2.018  s 28.61%  insert_task meta->output_states [406848 x   4.960 us]
                          1.379  s 68.33%  insert_edge           [1120853 x   1.230 us]
                          0.639  s 31.67%  [unaccounted]
                      1.031  s 14.61%  insert_task latest_state_readers_ [1178512 x 874.833 ns]
                      0.879  s 12.46%  [unaccounted]
                  0.004  s  0.05%  reid_nodes            [40 x 107.372 us]
                  0.002  s  0.02%  reid_pending_nodes    [40 x  49.353 us]
                  1.515  s 17.66%  [unaccounted]
              0.419  s  4.48%  [unaccounted]
  • After this change: 11.009 s + 14.249 s + 14.891 s
42.394  s synchronize                   [8 x   5.299  s]
          7.644  s 18.03%  optimize_listgen      [75 x 101.920 ms]
              ... ...
              5.227  s 68.38%  rebuild_graph         [44 x 118.793 ms]
                  4.890  s 93.55%  insert_task           [397621 x  12.298 us]
                      0.185  s  3.79%  get_task_meta         [397621 x 466.267 ns]
                      1.459  s 29.84%  insert_task meta->input_states [1149493 x   1.269 us]
                          0.653  s 44.79%  insert_edge           [1149493 x 568.482 ns]
                          0.806  s 55.21%  [unaccounted]
                      1.285  s 26.27%  insert_task meta->output_states [409654 x   3.136 us]
                          0.494  s 38.47%  insert_edge           [1063555 x 464.718 ns]
                          0.790  s 61.53%  [unaccounted]
                      1.048  s 21.44%  insert_task latest_state_readers_ [1149493 x 912.139 ns]
                      0.912  s 18.65%  [unaccounted]
                  0.005  s  0.09%  reid_nodes            [44 x 108.621 us]
                  0.002  s  0.04%  reid_pending_nodes    [44 x  48.941 us]
                  0.330  s  6.31%  [unaccounted]
              ... ...
          ... ...
          6.117  s 14.43%  demote_activation     [73 x  83.791 ms]
              ... ...
              5.414  s 88.52%  rebuild_graph         [42 x 128.916 ms]
                  5.079  s 93.81%  insert_task           [415641 x  12.220 us]
                      0.189  s  3.72%  get_task_meta         [415641 x 454.920 ns]
                      1.564  s 30.80%  insert_task meta->input_states [1244758 x   1.257 us]
                          0.701  s 44.82%  insert_edge           [1244758 x 563.253 ns]
                          0.863  s 55.18%  [unaccounted]
                      1.232  s 24.26%  insert_task meta->output_states [429700 x   2.867 us]
                          0.550  s 44.61%  insert_edge           [1184969 x 463.830 ns]
                          0.682  s 55.39%  [unaccounted]
                      1.126  s 22.17%  insert_task latest_state_readers_ [1244758 x 904.753 ns]
                      0.968  s 19.05%  [unaccounted]
                  0.005  s  0.09%  reid_nodes            [42 x 121.593 us]
                  0.002  s  0.04%  reid_pending_nodes    [42 x  47.105 us]
                  0.328  s  6.06%  [unaccounted]
              0.337  s  5.52%  [unaccounted]

Unfortunately, it could be due to the recent macOS 10.15.7 upgrade, the GUI has become much slower on my end recently. I wasn't able to perceive improvements just by looking at some of the examples...

Related issue = #742

[Click here for the format server]


@codecov
Copy link

codecov bot commented Oct 10, 2020

Codecov Report

Merging #1936 into master will increase coverage by 0.92%.
The diff coverage is 16.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1936      +/-   ##
==========================================
+ Coverage   42.80%   43.72%   +0.92%     
==========================================
  Files          45       45              
  Lines        6413     6207     -206     
  Branches     1101     1103       +2     
==========================================
- Hits         2745     2714      -31     
+ Misses       3498     3322     -176     
- Partials      170      171       +1     
Impacted Files Coverage Δ
python/taichi/misc/gui.py 8.68% <16.66%> (ø)
...d/taichi-dev/taichi/python/taichi/misc/__init__.py
...ild/taichi-dev/taichi/python/taichi/code_format.py
.../taichi-dev/taichi/python/taichi/make_changelog.py
...build/taichi-dev/taichi/python/taichi/lang/meta.py
...build/taichi-dev/taichi/python/taichi/lang/expr.py
...aichi-dev/taichi/python/taichi/lang/transformer.py
...-dev/taichi/python/taichi/lang/kernel_arguments.py
.../build/taichi-dev/taichi/python/taichi/diagnose.py
...ld/taichi-dev/taichi/python/taichi/tools/np2ply.py
... and 81 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4fa0d60...567b599. Read the comment docs.

Copy link
Collaborator

@xumingkuan xumingkuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thank you so much!

Maybe the next step is to refactor

std::unordered_map<AsyncState, Node *> latest_state_owner_;
std::unordered_map<AsyncState, std::unordered_set<Node *>>
    latest_state_readers_;

-- I believe they take up most of the [unaccounted] time in insert_task.

Copy link
Member

@yuanming-hu yuanming-hu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thank you so much! It's nice to see insert_edge is now 3x faster :-)

@yuanming-hu yuanming-hu merged commit a655fd7 into taichi-dev:master Oct 10, 2020
@yuanming-hu yuanming-hu mentioned this pull request Oct 10, 2020
@k-ye k-ye deleted the llvm-vec-set branch October 11, 2020 07:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants