You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, when a python process runs with a flag -bb, the above part of code will throw exception and make h2 not work. May I ask why we define both bytes and string in the frozenset? Is it possible to use only bytes or string? Frozenset will compare keys for deduplication.
I tried to fix this but I give up for now due to lack of time. This seems really seriously broken! This whole module package serves as a good example why you should have typing.
IMO the devs have to decide where in the call-stack to decode the lower protocol data and refactor everything else above that. Especially remove the really strange kludges like h2.utility._custom_startswith().
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:20)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:29)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:39)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:46)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:55)
[BytesWarning] Comparison between bytes and string (.../h2/utilities.py:60)
These warnings should at least be less noisy. Perhaps using warnings.simplefilter? reference
So I want to fix this with a PR. It seems like this is done for efficiency reasons that said I also looked at hyper/hpack and it looks like we already need to convert everything to bytes before encoding. That said the _to_bytes() helper there converts everything into a string first so, we always go through that conversion. That means there's no reason to keep headers in bytes in h2 land for efficiency so we can convert everything into strings as a first step and continue life with string comparisons and string-sets.
Does that sound plausible? If yes, PR incoming; if no, please help me understand why :)
@BYK I'm not familiar enough with this part of the code base to recommend such a larger refactoring. I think there is an implicit API expectation to the users of the h2 library that they can pass in both bytes and str headers, so we need to retain that backward-compatibility, until we cut a major release (which I don't see happening in the near future).
To focus on the issue itself: would separating the bytes vs. str checks solve the exception throwing? First check which class the header key and value are, and then compare them against the frozenset for its correct type? I could also see a custom frozenset implementation or class that wraps this behaviour into something like a type-agnostic compare function that does not throw an exception like the current code does.
@Kriechi sorry for the delay and not being clear enough. My intention is keeping the h2 API unchanged. I'll list what I have in mind here and then create a sample PR to demonstrate that tomorrow:
We do allow str or bytes as header names or values or in a mixed fashion in h2.
h2 employs some mixed sets (containing values of type str and bytes) to identify some special headers
This triggers Python's bytes-str comparison as it just checks each item in the set
Splitting these sets, determining the data type, and then using the appropriate one is proposed earlier but it comes with a notable performance penalty. Just determining the data type essentially doubles the time it takes to make these comparisons. Given that these are done multiple times for each request, I think this is not the best way forward.
I have noticed that we don't "normalize" the data type of the headers and just pass them down to hpack for transport
In hpack it normalizes all headers by casting them to a str first and then to bytes. So we pay the price of this conversion there already and there's no reason for h2 to keep the headers dict types as whatever is passed to it.
My proposal (the PR I'm going to create) is then removing all bytes variants from these sets and converting all headers to str in one go in h2. This will not only make things easier and faster on h2 side. It should not have any additional overhead downstream in hpack as it already casts things to str and str -> str conversion should be free.
Hope I did a better job explaining here but if not I'll be following up with the PR tomorrow anyway.
@Kriechi okay the PR is up -- please see my note regarding formatting. Happy to work on that part if review turns out to be daunting.
Btw. I had to go the other way around and use bytes for everything instead of converting everything to str. We need an upstream patch to hpack to avoid that unnecessary bytes -> str -> bytes dance.
Activity
stroeder commentedon Feb 7, 2022
I agree this should be fixed, especially this module is used in many other stacks. Sets with mixed string types seem totally broken to me.
Maybe a custom set-class instead of frozenset?
stroeder commentedon Feb 7, 2022
I tried to fix this but I give up for now due to lack of time. This seems really seriously broken! This whole module package serves as a good example why you should have typing.
IMO the devs have to decide where in the call-stack to decode the lower protocol data and refactor everything else above that. Especially remove the really strange kludges like
h2.utility._custom_startswith()
.straz commentedon Jan 5, 2023
Even without
-bb
, this throws noisy warnings:These warnings should at least be less noisy. Perhaps using
warnings.simplefilter
? referencepquentin commentedon Nov 15, 2023
This also affects urllib3 who runs tests with
-bb
to catch issues, but will have to stop to adopt h2.h2
lib home-assistant/core#115379BYK commentedon Oct 31, 2024
So I want to fix this with a PR. It seems like this is done for efficiency reasons that said I also looked at hyper/hpack and it looks like we already need to convert everything to
bytes
before encoding. That said the_to_bytes()
helper there converts everything into a string first so, we always go through that conversion. That means there's no reason to keepheaders
inbytes
inh2
land for efficiency so we can convert everything into strings as a first step and continue life with string comparisons and string-sets.Does that sound plausible? If yes, PR incoming; if no, please help me understand why :)
BYK commentedon Oct 31, 2024
Pinging @Kriechi to get attention as I think otherwise this will just be missed.
Kriechi commentedon Nov 9, 2024
@BYK I'm not familiar enough with this part of the code base to recommend such a larger refactoring. I think there is an implicit API expectation to the users of the h2 library that they can pass in both
bytes
andstr
headers, so we need to retain that backward-compatibility, until we cut a major release (which I don't see happening in the near future).To focus on the issue itself: would separating the bytes vs. str checks solve the exception throwing? First check which class the header key and value are, and then compare them against the frozenset for its correct type? I could also see a custom
frozenset
implementation or class that wraps this behaviour into something like a type-agnostic compare function that does not throw an exception like the current code does.BYK commentedon Nov 12, 2024
@Kriechi sorry for the delay and not being clear enough. My intention is keeping the
h2
API unchanged. I'll list what I have in mind here and then create a sample PR to demonstrate that tomorrow:str
orbytes
as header names or values or in a mixed fashion inh2
.h2
employs some mixed sets (containing values of typestr
andbytes
) to identify some special headershpack
for transporthpack
it normalizes all headers by casting them to astr
first and then tobytes
. So we pay the price of this conversion there already and there's no reason forh2
to keep the headers dict types as whatever is passed to it.bytes
variants from these sets and converting all headers tostr
in one go inh2
. This will not only make things easier and faster onh2
side. It should not have any additional overhead downstream inhpack
as it already casts things tostr
andstr
->str
conversion should be free.Hope I did a better job explaining here but if not I'll be following up with the PR tomorrow anyway.
fix: No more BytesWarnings
BYK commentedon Nov 13, 2024
@Kriechi okay the PR is up -- please see my note regarding formatting. Happy to work on that part if review turns out to be daunting.
Btw. I had to go the other way around and use
bytes
for everything instead of converting everything tostr
. We need an upstream patch tohpack
to avoid that unnecessarybytes
->str
->bytes
dance.fix: No more BytesWarnings
fix: No more BytesWarnings
fix: No more BytesWarnings