Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTTP/Webdav encoding fixes - Fixes ticket #15365 #5584

Merged
merged 3 commits into from
Nov 4, 2014

Conversation

arnova
Copy link
Member

@arnova arnova commented Oct 27, 2014

No description provided.

@arnova
Copy link
Member Author

arnova commented Oct 27, 2014

@mkortstiege : Please review/comment, not sure whether the HTTPDirectory-change is the best approach but hacking our CURL class seemed even worse.

@arnova
Copy link
Member Author

arnova commented Oct 27, 2014

jenkins build this please

@mkortstiege
Copy link
Member

Should be OK as long as the converted URL is still valid and not causing further troubles. @topfs2 your button.

@arnova
Copy link
Member Author

arnova commented Oct 27, 2014

@mkortstiege : Tested this with Apache. It e.g. gives a filename called "test&.mkv" and the change in the last commit (the other are real fixes we need anyway for e.g. webdav as well) will change it into "test%26.mkv" which we'll use interally. Also requesting "test%26.mkv" from the Apache server seems to work find, no idea why Apache does this btw.

Note that I'm unable to figure out (according to the RFCs) how URLs like http://host/path/test&.mkv;option=value should be parsed/handled, if allowed at all, as that's what the actual problem is.

@mkortstiege
Copy link
Member

Win32 build error is unrelated. @topfs2 what are your feelings about this?

@wsnipex
Copy link
Member

wsnipex commented Oct 28, 2014

& is a delimiter and therefore a reserved char, see 2.2 in http://www.ietf.org/rfc/rfc3986.txt

@arnova
Copy link
Member Author

arnova commented Oct 28, 2014

@wsnipex : Actually the real fix for the last commit should be that we stop considering ; as an option separator for URLs (in .e.g. our URL class). I also read that rfc and the other relevant ones and they clearly state that for http AT LEAST ; is NOT an option separator. So the real fix should be arnova@7382539

@elupus / @Montellese : Please comment

@Montellese
Copy link
Member

Does anyone know why we consider ; as an option separator in HTTP(S) urls? Doesn't make much sense to me.

@arnova
Copy link
Member Author

arnova commented Oct 28, 2014

@Montellese : My point exactly, me neither. Same goes for # btw. I checked GIT history for URL.cpp but it seems it has been there for a very long time but I'm pretty sure it's wrong (according to RFC), causing problems like this.

@mkortstiege
Copy link
Member

Agreed. I'd say remove what's not specified in the RFC.

@Montellese
Copy link
Member

Just looked through RFC 3986, specifically http://tools.ietf.org/html/rfc3986#section-2.2, and both # and ; are listed as delimiters. # is a fragment delimiter and ; a sub-delimiter.

@arnova
Copy link
Member Author

arnova commented Oct 28, 2014

@mkortstiege: Sure, can you and/or @Montellese double check the RFCs, just in case?

@arnova
Copy link
Member Author

arnova commented Oct 28, 2014

@Montellese: Yeah ; is a sub-delimiter, which afaik means this is NOT allowed http://host/path/file;option=val but http://host/path/file?option=val;option2=val is, right?

EDIT: Bottomline is, afaik, we should first separate path from options with ? and process options including ; and # as separators.

@Montellese
Copy link
Member

OK I took a closer look and this is how I understand it. The path component of a URI ends with the first ? or # and everything after that is called query component. In the query segment any of the sub-delimiters (including & and ; but not #) can be used to separater options. A # signals the end of the query component.

@arnova
Copy link
Member Author

arnova commented Oct 29, 2014

@Montellese : Please check http://www.ietf.org/rfc/rfc2396.txt section 3.3, if I read it correctly it seems http://host/path/file;option is also used. Or am I misreading it?

@Montellese
Copy link
Member

It's written very unclearly IMO. But there's the following example: http://a/b/c/g;x=1/y where x=1 is an option of the path component g. So guess it's possible after all.

@wsnipex
Copy link
Member

wsnipex commented Oct 29, 2014

note that rfc3986 obsoletes RFC2396

@Montellese
Copy link
Member

http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ has a pretty good description of URL structures. So basically ; can be used as a separator between a path component and parameters but they can't be used to separate a path (which can include parameters) from the query part which must always be done using a ?.

Now the question for me is: What's the difference between parameters and options in the query part? The author of above article states that parameters are very uncommon but they are still valid.

@wsnipex
Copy link
Member

wsnipex commented Oct 29, 2014

also see https://stackoverflow.com/questions/2163803/what-is-the-semicolon-reserved-for-in-urls/2163885#2163885 for a pretty good explanation of ";". In short: semicolon is reserved to delimit sub-segments, but can be used % encoded as well to be part of the segment name.

@arnova
Copy link
Member Author

arnova commented Oct 29, 2014

@Montellese : Hehehe, just read the exact same article this morning. So the only thing I'm wondering about then is whether Apache's encoding in paths of & to & amp ; is valid at all?

@Montellese
Copy link
Member

Probably not since both & and ; are reserved characters and they must always be encoded (using the % encoding) if they aren't meant to be used for their special purpose.

Are we parsing a HTML site in this use case? Because in HTML encoding & as & is valid but we would have to decode that first and then look at the URL.

@arnova
Copy link
Member Author

arnova commented Oct 29, 2014

@Montellese : To be clear I meant Apache's encoding of & amp ; in the href part like in test & file.avi. How should we handle it? Browsers handle it correctly, we do not. I see 2 options, the one in this PR or have our URL class detect & amp ; (and the like) to distinguish it from the ; delimiter.

@Montellese
Copy link
Member

@arnova: Do you happen to have an example HTML page?

IMO CURL shouldn't have to recognise HTML encodings. The & should already have been translated to an & before it is passed to CURL. CURL should only have to handle % based encodings.

@arnova
Copy link
Member Author

arnova commented Oct 29, 2014

@Montellese : http://www.eld.leidenuniv.nl/~arnova/public/

Btw. if you're statement is correct then the fix in the PR is the only correct one.

@Montellese
Copy link
Member

Well I'm not 100% sure but what you get is HTML which uses &<xxx>; based encoding so you decode that. Then you extract the URLs which use %<xx> based encoding so you split the URL into its components and then you decode each component.

@arnova
Copy link
Member Author

arnova commented Oct 31, 2014

@Montellese : We only need to decode the href part which is exactly as it is now in this PR. So think it's sane the way it is.

@Montellese
Copy link
Member

Probably yeah.

@arnova
Copy link
Member Author

arnova commented Nov 3, 2014

@topfs2 : You ok with this?

@topfs2 topfs2 added Approved Type: Fix non-breaking change which fixes an issue labels Nov 3, 2014
@topfs2
Copy link
Contributor

topfs2 commented Nov 3, 2014

I trust you guys. Merge at your convenience.

@MartijnKaijser MartijnKaijser modified the milestone: Helix 14.0-alpha5 Nov 3, 2014
arnova added a commit that referenced this pull request Nov 4, 2014
HTTP/Webdav encoding fixes - Fixes ticket #15365
@arnova arnova merged commit fd4a610 into xbmc:master Nov 4, 2014
@MartijnKaijser MartijnKaijser added this to the Helix 14.0-beta2 milestone Nov 4, 2014
@arnova arnova deleted the http_dav_encoding_fixes branch November 7, 2014 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Fix non-breaking change which fixes an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants