Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide API to handle hashed bin delegations math #1952

Closed
jku opened this issue Apr 13, 2022 · 10 comments
Closed

Provide API to handle hashed bin delegations math #1952

jku opened this issue Apr 13, 2022 · 10 comments
Labels
backlog Issues to address with priority for current development goals repository Related to the repository implementation

Comments

@jku
Copy link
Member

jku commented Apr 13, 2022

@kairoaraujo has been implementing a repository in Warehouse and currently has to do some awful bit math to manage the hashed bins:
https://github.com/pypa/warehouse/pull/10870/files#diff-0f3efbecd517bebb4023401a03c16f756d3180a8763bbf7fab03a54876d74ae9

We should ensure Metadata API provides ways to do that. @MVrachev is adding succinct has bins in #1909 -- that's definitely related but let's see if this gets fixed there... The use cases so far are at least:

  • generator to produce all bin names/hash prefixes based on number of bins
  • function to lookup a delegated bin with target path argument, currently for non-succinct hash delegations
@jku jku mentioned this issue Apr 13, 2022
3 tasks
@jku
Copy link
Member Author

jku commented Apr 13, 2022

Possibly we only provide the functionality in a SuccinctHashDelegations class: succinct delegations are the sensible way to use hash delegations anyway and if someone has metadata with hashed_bin_prefixes and does not want to use succinct_hash_delegations, we could just tell them to instantiate a SuccinctHashDelegations in the code anyway to do quick lookups or to generate bin names etc.

@lukpueh
Copy link
Member

lukpueh commented Apr 19, 2022

I agree that succinct hash delegation is the way to go, as it reduces metadata size and makes the math a lot easier. So I am a bit against adding API to support non-succinct hash delegation. (see related discussion in theupdateframework/taps#132)

I suggest address this feature request with @1909 for succinct hash delegation only.

@ivanayov ivanayov added repository Related to the repository implementation backlog Issues to address with priority for current development goals labels Apr 20, 2022
@lukpueh
Copy link
Member

lukpueh commented Apr 21, 2022

Note from a conversation we had yesterday:

The API should at least provide a generator that yields all bin role names. For non-succinct hash bins this could look something like this:

def generate_hash_bins() -> Iterator[Tuple[str, List[str]]]:
"""Returns generator for bin names and hash prefixes per bin."""
# Iterate over the total number of hash prefixes in 'bin size'-steps to
# generate bin names and a list of hash prefixes served by each bin.
for low in range(0, NUMBER_OF_PREFIXES, BIN_SIZE):
high = low + BIN_SIZE - 1
bin_name = _bin_name(low, high)
hash_prefixes = []
for prefix in range(low, low + BIN_SIZE):
hash_prefixes.append(f"{prefix:0{PREFIX_LEN}x}")
yield bin_name, hash_prefixes

Fur succinct hash bins this should be a lot easier, as there are no hex prefix ranges.

@lukpueh
Copy link
Member

lukpueh commented Apr 21, 2022

Speaking of awful bit math, here's an alternative implementation of above generator using the PREFIX_BIT_LEN from TAP 15 as input parameter.

import math

PREFIX_BIT_LEN = 3 # exemplary value from TAP 15
NUMBER_OF_BINS = 2**PREFIX_BIT_LEN

for bin_idx in range(NUMBER_OF_BINS):
    # Zero left-fill the bit representation of the bin index to the width of the prefix
    # length to get the bit prefix that is served by this bin
    bit_prefix = f"{bin_idx:0{PREFIX_BIT_LEN}b}"

    # Ceil the the prefix length to fit full bytes to determine hex prefix ranges
    total_bit_len = math.ceil(PREFIX_BIT_LEN / 8) * 8

    # Right-fill the bit prefix with zeros (low) and ones (high) to the width of the
    # total bit length, to get the bounds of the hex prefix range
    low = int(f"{bit_prefix:0<{total_bit_len}s}", 2)
    high = int(f"{bit_prefix:1<{total_bit_len}s}", 2)

    hex_prefixes = []
    # Iterate from low to high and store hex representation of the prefix.
    for prefix in range(low, high + 1):
        hex_prefixes.append(f"{prefix:x}")

    # yield  _bin_name(low, high), hex_prefixes

Maybe this approach makes it easier if we want to maintain API for non-succinct and succinct hash bin delegation alongside each other. cc @MVrachev @kairoaraujo

@jku
Copy link
Member Author

jku commented Apr 21, 2022

Maybe this approach makes it easier if we want to maintain API for non-succinct and succinct hash bin delegation alongside each other.

I assume if we design the SuccinctDelegation object properly we can just have a single implementation? The inputs required for all of the math operations are always just the SuccinctDelegation fields so makes sense to use that, I think.

EDIT: I suppose the list of hex prefixes is indeed something that's not needed for the succinct case, hmmm...

@jku
Copy link
Member Author

jku commented Apr 21, 2022

This is a bit unrelated but I'm not sure if I ever linked to this one: rolename lookup without "binary as string of ones and zeros":
https://gist.github.com/jku/bccc355d0daea56449375013fa5243c9

@joshuagl
Copy link
Member

Was this taken care of in #2010 @MVrachev ?

@lukpueh
Copy link
Member

lukpueh commented Jul 20, 2022

Depends on whether the feature requests API for non-succinct hash bin delegation.

IMO, now that we have succinct hash bin delegation, there is no good reason to use the non-succinct form, so let's not encourage its usage with new API and do close this issue. :)

@MVrachev
Copy link
Collaborator

Was this taken care of in #2010 @MVrachev ?

No, we didn't add any API for non-succinct hash bin delegation.
I agree with @lukpueh here and think we should close this issue.

@jku
Copy link
Member Author

jku commented Aug 3, 2022

👍
Currently adding/removing a target to a repository still needs a bit of code (see vmware-archive/repository-editor-for-tuf#45 for an example) but there does not seem to be anything clearly missing from the delegations API...

@jku jku closed this as completed Aug 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backlog Issues to address with priority for current development goals repository Related to the repository implementation
Projects
None yet
Development

No branches or pull requests

5 participants