Skip to content

Comparison between bytes and string in defining a frozenset throws exception #1236

@chaofanhan

Description

@chaofanhan

https://github.com/python-hyper/hyper-h2/blob/3b0b241d79f5a9ff9382bbc038f84862e0d80abf/src/h2/utilities.py#L20-L26.

Hi, when a python process runs with a flag -bb, the above part of code will throw exception and make h2 not work. May I ask why we define both bytes and string in the frozenset? Is it possible to use only bytes or string? Frozenset will compare keys for deduplication.

Activity

stroeder

stroeder commented on Feb 7, 2022

@stroeder

I agree this should be fixed, especially this module is used in many other stacks. Sets with mixed string types seem totally broken to me.

Maybe a custom set-class instead of frozenset?

stroeder

stroeder commented on Feb 7, 2022

@stroeder

I tried to fix this but I give up for now due to lack of time. This seems really seriously broken! This whole module package serves as a good example why you should have typing.

IMO the devs have to decide where in the call-stack to decode the lower protocol data and refactor everything else above that. Especially remove the really strange kludges like h2.utility._custom_startswith().

straz

straz commented on Jan 5, 2023

@straz

Even without -bb, this throws noisy warnings:

[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:20)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:29)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:39)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:46)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:55)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:60)

These warnings should at least be less noisy. Perhaps using warnings.simplefilter? reference

pquentin

pquentin commented on Nov 15, 2023

@pquentin

This also affects urllib3 who runs tests with -bb to catch issues, but will have to stop to adopt h2.

BYK

BYK commented on Oct 31, 2024

@BYK
Contributor

So I want to fix this with a PR. It seems like this is done for efficiency reasons that said I also looked at hyper/hpack and it looks like we already need to convert everything to bytes before encoding. That said the _to_bytes() helper there converts everything into a string first so, we always go through that conversion. That means there's no reason to keep headers in bytes in h2 land for efficiency so we can convert everything into strings as a first step and continue life with string comparisons and string-sets.

Does that sound plausible? If yes, PR incoming; if no, please help me understand why :)

BYK

BYK commented on Oct 31, 2024

@BYK
Contributor

Pinging @Kriechi to get attention as I think otherwise this will just be missed.

Kriechi

Kriechi commented on Nov 9, 2024

@Kriechi
Member

@BYK I'm not familiar enough with this part of the code base to recommend such a larger refactoring. I think there is an implicit API expectation to the users of the h2 library that they can pass in both bytes and str headers, so we need to retain that backward-compatibility, until we cut a major release (which I don't see happening in the near future).

To focus on the issue itself: would separating the bytes vs. str checks solve the exception throwing? First check which class the header key and value are, and then compare them against the frozenset for its correct type? I could also see a custom frozenset implementation or class that wraps this behaviour into something like a type-agnostic compare function that does not throw an exception like the current code does.

BYK

BYK commented on Nov 12, 2024

@BYK
Contributor

@Kriechi sorry for the delay and not being clear enough. My intention is keeping the h2 API unchanged. I'll list what I have in mind here and then create a sample PR to demonstrate that tomorrow:

  1. We do allow str or bytes as header names or values or in a mixed fashion in h2.
  2. h2 employs some mixed sets (containing values of type str and bytes) to identify some special headers
  3. This triggers Python's bytes-str comparison as it just checks each item in the set
  4. Splitting these sets, determining the data type, and then using the appropriate one is proposed earlier but it comes with a notable performance penalty. Just determining the data type essentially doubles the time it takes to make these comparisons. Given that these are done multiple times for each request, I think this is not the best way forward.
  5. I have noticed that we don't "normalize" the data type of the headers and just pass them down to hpack for transport
  6. In hpack it normalizes all headers by casting them to a str first and then to bytes. So we pay the price of this conversion there already and there's no reason for h2 to keep the headers dict types as whatever is passed to it.
  7. My proposal (the PR I'm going to create) is then removing all bytes variants from these sets and converting all headers to str in one go in h2. This will not only make things easier and faster on h2 side. It should not have any additional overhead downstream in hpack as it already casts things to str and str -> str conversion should be free.

Hope I did a better job explaining here but if not I'll be following up with the PR tomorrow anyway.

added a commit that references this issue on Nov 13, 2024
c95e738
BYK

BYK commented on Nov 13, 2024

@BYK
Contributor

@Kriechi okay the PR is up -- please see my note regarding formatting. Happy to work on that part if review turns out to be daunting.

Btw. I had to go the other way around and use bytes for everything instead of converting everything to str. We need an upstream patch to hpack to avoid that unnecessary bytes -> str -> bytes dance.

added a commit that references this issue on Nov 14, 2024
605f0ea
added a commit that references this issue on Nov 23, 2024
5841683
added a commit that references this issue on Dec 18, 2024
c71e301
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      Participants

      @pquentin@Kriechi@BYK@straz@stroeder

      Issue actions

        Comparison between bytes and string in defining a frozenset throws exception · Issue #1236 · python-hyper/h2