Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

+ not encoded in query string when using PHP_QUERY_RFC3986 #109

Closed
bennnjamin opened this issue Apr 28, 2023 · 15 comments
Closed

+ not encoded in query string when using PHP_QUERY_RFC3986 #109

bennnjamin opened this issue Apr 28, 2023 · 15 comments

Comments

@bennnjamin
Copy link

bennnjamin commented Apr 28, 2023

Bug Report

Information Description
Package uri-components
Version 2.4.1
PHP version 8.1.2
OS Platform Ubuntu

Summary

I am trying to send phone numbers as query string parameters to communicate with a REST API and the request is failing because + is not encoded to %2B.

Standalone code, or other way to reproduce the problem

$queryPairs = [
    ["phone_number", "+18001231234"],
    ["special_characetrs", " /+t!q_"]
];
$queryString = QueryString::build($queryPairs, '&', PHP_QUERY_RFC1738);
print("League\URI PHP_QUERY_RFC1738 $queryString\n");
$queryString = QueryString::build($queryPairs, '&', PHP_QUERY_RFC3986);
print("League\URI PHP_QUERY_RFC3986 $queryString\n");

Expected result

The + character should be percent-encoded.

Actual result

League\URI PHP_QUERY_RFC1738 phone_number=%2B18001231234&special_characetrs=+%2F%2Bt%21q_
League\URI PHP_QUERY_RFC3986 phone_number=+18001231234&special_characetrs=%20/+t!q_
@nyamsprod
Copy link
Member

This is expected behaviour the + sign should not be encoded in RFC3986 if the encoding is not necessary which is the case in your example.

@bennnjamin
Copy link
Author

It's possible the REST API is not RFC compliant or using some old RFC so I may need to find a work around to interoperate with it in that case.

@nyamsprod
Copy link
Member

Encoding characters in URL is tricky because depending on the consuming client and its implementor and the year it was implemented and the language it gets complex and overly complicated indeed. And the amount of different RFC does not help either

@trowski
Copy link

trowski commented Apr 29, 2023

Thinking about this further, when + is part of a value , I believe it should be encoded to %2B when using RFC3986 rules. Spaces in the value become +. This is why http_build_query is encoding the + in the phone numbers in the example in the linked issue.

echo 'RFC1738: ', http_build_query(['key' => '+ '], encoding_type: PHP_QUERY_RFC1738), "\n";
echo 'RFC3986: ', http_build_query(['key' => '+ '], encoding_type: PHP_QUERY_RFC3986), "\n";

Output:

RFC1738: key=%2B+
RFC3986: key=%2B%20

@kelunik
Copy link
Contributor

kelunik commented Aug 11, 2023

@nyamsprod I believe this is a bug, as @trowski already mentioned. We switched to PHP_QUERY_RFC1718 now, see amphp/http-client#333.

@trowski
Copy link

trowski commented Aug 11, 2023

From my understanding, any reserved characters in RFC3986 should be encoded unless they are known to have semantic meaning in a URL schema. This may require passing another parameter designating which characters to exclude from encoding. Something that can maybe be changed before v7?

@nyamsprod
Copy link
Member

@trowski v7 is already out ... since yesterday. Let's see if can fix that issue and backport it into v6 if needed.

@trowski
Copy link

trowski commented Aug 11, 2023

v7 is already out ... since yesterday.

Oh, apologies, I should have checked. I'd consider this a bugfix anyway, and adding a parameter with a default should be no issue. Backporting to v6 would be fantastic.

@nyamsprod
Copy link
Member

@

v7 is already out ... since yesterday.

Oh, apologies, I should have checked. I'd consider this a bugfix anyway, and adding a parameter with a default should be no issue. Backporting to v6 would be fantastic.

So do I this is just a fix ... just need to be applied correctly

nyamsprod added a commit that referenced this issue Aug 11, 2023
@nyamsprod nyamsprod reopened this Aug 11, 2023
@nyamsprod
Copy link
Member

@trowski I already landed the fix for v7 ... let me know if this is ok for you

82a9925

nyamsprod added a commit that referenced this issue Aug 12, 2023
nyamsprod added a commit that referenced this issue Aug 12, 2023
nyamsprod added a commit that referenced this issue Aug 12, 2023
@nyamsprod
Copy link
Member

The fix has already been shipped on V2 it will be shipped in V7 later today

@trowski
Copy link

trowski commented Aug 14, 2023

Sorry for the delay, my weekend was rather busy. I'm not sure if the issue is limited to just +, I think all the reserved characters are supposed to be encoded (unless they have meaning in a particular URI scheme, which they usually don't for HTTP). I didn't have time to do more research to confirm this though.

Encoding + will fix the ambiguity between RFC1738 and 3986 when decoding, which was the main issue we experienced.

@nyamsprod
Copy link
Member

@trowski no problem

These are just fix to improve encoding so I can still release a new fix if needed and that's also why I did not release the fix yet for v7.

I stumble on this article https://www.456bereastreet.com/archive/201008/what_characters_are_allowed_unencoded_in_query_strings/ which is using the same logic that I did for not converting the + character. But indeed if we start encoding + we should then to be coherent encode all sub-delims characters.

@nyamsprod
Copy link
Member

I just tried the implementation from Guzzle and it behave like the package did before the fix

(string) Uri::withQueryValue(new Uri('https://example.com'), 'key', " /+t!q_");
// displays "https://example.com?key=%20/+t!q_"

sub-delims characters are not encoded.

@nyamsprod
Copy link
Member

7.1.0 is released with the fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants