Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix performance and metadata representation issues in roledb #1005

Closed
lukpueh opened this issue Mar 19, 2020 · 3 comments
Closed

Fix performance and metadata representation issues in roledb #1005

lukpueh opened this issue Mar 19, 2020 · 3 comments

Comments

@lukpueh
Copy link
Member

lukpueh commented Mar 19, 2020

Update on 2020-04-08:
The particular performance issue described below as Problem 1 has been resolved in #1012 by applying the below proposed Quick fix (a), however, the underlying Problem 2 remains.

I suggest to keep this issue around for context, and proceed with below outlined Long-term fixes (c) and (d).


As reported by @woodruffw (thanks!) and reproduced in the profiling experiment below, repository tool's delegate_hashed_bins method has massive performance issues when delegating to many bins (e.g. 16K), for the following reasons:

Problem 1:
delegate_hashed_bins calls delegate for each new bin. delegate then adds the new delegated role to roledb, but also adds delegation infos to the steadily growing delegating role in roledb in each iteration.

Problem 2:
Operations on roledb are extremely expensive, because they perform two deepcopys of a roleinfo dictionary, like so:

1. roleinfo = roledb.get_roleinfo() # deepcopying roleinfo out of roledb
2. modify roledinfo
3. roledb.update_roleinfo(roleinfo) # deepcopying roleinfo back into roleb

Furthermore, update_roleinfo calls tuf.formats.ROLEDB_SCHEMA.check_match for the passed roleinfo, which recursively iterates over all its elements to check for some criteria.


I propose the following fixes:

  • Quick fix (a):

Change delegate_hashed_bins to update the delegating role only once in the roledb, instead of once for every bin. This means that we can't call delegate as it is right now, but would need to replicate or, ideally, factor out common functionality.

  • Intermediate fix (b):

Stop deepcopying and instead update roledb by reference. The deepcopys are an unnecessary overhead throughout the codebase (see profiling experiment). This seems quite feasible in most cases, but it is also risky. It might be worth skipping the intermediate fix, and move from quick fix directly to the long-term fix.

Also see secondary objective "Has only one internal structure ..." of #846 ")

@lukpueh
Copy link
Member Author

lukpueh commented Mar 19, 2020

I was able to reproduce and profile the issue using below script. Generating 16K hashed bins took ~15 hours on my MBP (early 2015), of which ~3.5 hours (12355.858 seconds) were CPU time.

Applying fix (a) form above, the CPU time dropped to ~80 seconds, with fix (b) and (d) it dropped to ~60 seconds.

pip install securesystemslib[crypto,pynacl] tuf
curl https://gist.githubusercontent.com/lukpueh/724bd1d7b477f201a9f199b037d85747/raw/profile_tuf_bins.py -o profile_tuf_bins.py
python -m cProfile -o stats profile_tuf_bins.py
>>> import pstats, os, glob
>>> from pstats import SortKey
>>> newest_stats = os.path.join(sorted(glob.glob("tuf_stats_*")).pop(), "stats")
>>> p = pstats.Stats(newest_stats)
>>> p.sort_stats(SortKey.CUMULATIVE).print_stats("tuf|securesystemslib|copy", 20)
# Thu Mar 19 10:46:32 2020    tuf_stats__1584550063__bins_16384/stats
#
#          31214279022 function calls (25298241328 primitive calls) in 12355.858 seconds
#
#    Ordered by: cumulative time
#    List reduced from 945 to 393 due to restriction <'tuf|securesystemslib|copy'>
#    List reduced from 393 to 20 due to restriction <20>
#
#    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
#         1    0.036    0.036 12355.858 12355.858 profile_tuf_bins.py:29(main)
#         1   53.294   53.294 12289.763 12289.763 /Users/lukp/tuf/tuf/tuf/repository_tool.py:2421(delegate_hashed_bins)
#     16385    1.074    0.000 12233.417    0.747 /Users/lukp/tuf/tuf/tuf/repository_tool.py:2164(delegate)
# 4300458036/147547 4273.850    0.000 10081.901    0.068 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:128(deepcopy)
# 269189355/147547 1024.742    0.000 10081.119    0.068 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:225(_deepcopy_dict)
# 537608476/573712 1916.648    0.000 10069.452    0.018 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:199(_deepcopy_list)
#     32787   76.144    0.002 7181.414    0.219 /Users/lukp/tuf/tuf/tuf/roledb.py:319(update_roleinfo)
#     81949   19.111    0.000 5053.164    0.062 /Users/lukp/tuf/tuf/tuf/roledb.py:683(get_roleinfo)
# 134799701/245935  385.711    0.000 2067.982    0.008 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:754(check_match)
# 403661172/786789  164.846    0.000 2063.374    0.003 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:701(check_match)
# 269172973/540895  471.488    0.000 2062.263    0.004 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:523(check_match)
# 671981865  328.549    0.000  742.563    0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:960(check_match)
# 806797831  418.819    0.000  606.990    0.000 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:241(_keep_alive)
# 3493660202  386.532    0.000  386.532    0.000 /Users/lukp/.pyenv/versions/3.8.0/lib/python3.8/copy.py:182(_deepcopy_atomic)
# 134652058   80.054    0.000  114.281    0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:590(check_match)
#         1    0.377    0.377   65.143   65.143 /Users/lukp/tuf/tuf/tuf/repository_tool.py:220(writeall)
#     16389    0.485    0.000   63.732    0.004 /Users/lukp/tuf/tuf/tuf/repository_lib.py:94(_generate_and_write_metadata)
# 135203458   40.774    0.000   54.002    0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:202(check_match)
# 134455360   33.074    0.000   46.199    0.000 /Users/lukp/.ve/tuf-dev/lib/python3.8/site-packages/securesystemslib/schema.py:455(check_match)
#     16389    0.934    0.000   39.196    0.002 /Users/lukp/tuf/tuf/tuf/repository_lib.py:1643(write_metadata_file)
#

@woodruffw
Copy link
Contributor

Thanks a ton for digging into this!

joshuagl added a commit to joshuagl/tuf that referenced this issue Apr 2, 2020
Due to the performance overhead of deepcopy(), as used extensively in
roledb, the delegate function is rather slow. This is especially
noticeable when we have a large number_of_bins when calling
delegate_hashed_bins.

In order to be able to easily reduce the number of deepcopy() operations
we remove direct calls to delegate() and instead use the newly added
helper functions to replicate the behaviour, only with a single call
update to the roledb.

This improves the performance of a 16k bins delegation from a 1hr 24min
operation on my laptop to 33s.

Ideally once Issue theupdateframework#1005 has been properly fixed this commit can be
reverted and we can once again just call delegate() here.

Signed-off-by: Joshua Lock <jlock@vmware.com>
joshuagl added a commit to joshuagl/tuf that referenced this issue Apr 3, 2020
Due to the performance overhead of deepcopy(), as used extensively in
roledb, the delegate function is rather slow. This is especially
noticeable when we have a large number_of_bins when calling
delegate_hashed_bins.

In order to be able to easily reduce the number of deepcopy() operations
we remove direct calls to delegate() and instead use the newly added
helper functions to replicate the behaviour, only with a single call
update to the roledb.

This improves the performance of a 16k bins delegation from a 1hr 24min
operation on my laptop to 33s.

Ideally once Issue theupdateframework#1005 has been properly fixed this commit can be
reverted and we can once again just call delegate() here.

Signed-off-by: Joshua Lock <jlock@vmware.com>
@lukpueh lukpueh changed the title Fix performance issues in delegate_hashed_bins Fix performance and metadata representation issues in roledb Apr 8, 2020
@joshuagl joshuagl added this to the Refactor milestone Jul 7, 2020
@joshuagl joshuagl removed this from the Refactor milestone Sep 8, 2020
@joshuagl joshuagl removed the refactor label Sep 10, 2020
@jku
Copy link
Member

jku commented Feb 16, 2022

Closing this issue as it was filed against (what is now known as) the legacy codebase: issue seems to not be relevant anymore. Please re-open or file a new issue if you feel that the issue is revelant to current python-tuf.

More details

Current source code (and upcoming 1.0 release) only contains the modern components

  • a low-level Metadata API (tuf.api) and
  • tuf.ngclient that implements the client workflow,

Legacy components (e.g. tuf.client, tuf.repository_tool, tuf.repository_lib as well as the repo and client scripts) are no longer included. See announcement and API reference for more details.

@jku jku closed this as completed Feb 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants