HTTP/Webdav encoding fixes - Fixes ticket #15365 #5584
@mkortstiege : Please review/comment, not sure whether the HTTPDirectory-change is the best approach but hacking our CURL class seemed even worse. |
jenkins build this please |
Should be OK as long as the converted URL is still valid and not causing further troubles. @topfs2 your button. |
@mkortstiege : Tested this with Apache. It gives e.g. a filename called "test&.mkv", and the change in the last commit (the others are real fixes we need anyway, e.g. for webdav as well) will change it into "test%26.mkv", which we'll use internally. Requesting "test%26.mkv" from the Apache server also seems to work fine; no idea why Apache does this btw. Note that I'm unable to figure out (according to the RFCs) how URLs like http://host/path/test&.mkv;option=value should be parsed/handled, if allowed at all, as that's what the actual problem is. |
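The "test&.mkv" to "test%26.mkv" conversion described above is plain percent-encoding. A minimal sketch in Python (not the project's C++ code; the variable names are made up for illustration) shows that the round trip is lossless:

```python
from urllib.parse import quote, unquote

# File name as it appears in the directory listing (taken from the comment above).
name = "test&.mkv"

# Percent-encode every character outside the unreserved set.
# safe="" means that even "/" would be encoded, which suits a single file name.
encoded = quote(name, safe="")

# Decoding restores the original name, so nothing is lost by storing the encoded form.
decoded = unquote(encoded)

print(encoded)  # test%26.mkv
print(decoded)  # test&.mkv
```

Because "&" is encoded as "%26", the name no longer contains a character that a URL parser could mistake for a delimiter.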
Win32 build error is unrelated. @topfs2 what are your feelings about this? |
& is a delimiter and therefore a reserved char, see 2.2 in http://www.ietf.org/rfc/rfc3986.txt |
@wsnipex : Actually the real fix for the last commit should be that we stop considering ; as an option separator for URLs (e.g. in our URL class). I also read that RFC and the other relevant ones, and they clearly state that, AT LEAST for http, ; is NOT an option separator. So the real fix should be arnova@7382539. @elupus / @Montellese : Please comment |
Does anyone know why we consider |
@Montellese : My point exactly, me neither. Same goes for # btw. I checked the Git history for URL.cpp and it seems to have been there for a very long time, but I'm pretty sure it's wrong (according to the RFC) and causes problems like this. |
Agreed. I'd say remove what's not specified in the RFC. |
Just looked through RFC 3986, specifically http://tools.ietf.org/html/rfc3986#section-2.2, and both |
@mkortstiege: Sure, can you and/or @Montellese double check the RFCs, just in case? |
@Montellese: Yeah, ; is a sub-delimiter, which afaik means this is NOT allowed: http://host/path/file;option=val but http://host/path/file?option=val;option2=val is, right? EDIT: Bottom line is, afaik, we should first separate path from options with ? and then process the options, including ; and # as separators. |
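The "split at ? first, then split the options" idea can be sketched as follows. This is an illustration in Python using the standard `urlsplit`, not the project's URL class; the helper name `split_path_and_options` is hypothetical. Note that `urlsplit` treats # as the start of the fragment rather than as an option separator, so this sketch only splits options on ; and &.

```python
import re
from urllib.parse import urlsplit

def split_path_and_options(url):
    """Return (path, options): the path ends at the first '?',
    and only the query part is split into options."""
    parts = urlsplit(url)
    # Split the query on ';' or '&' and drop empty pieces.
    options = [opt for opt in re.split(r"[;&]", parts.query) if opt]
    return parts.path, options

# The ';' after '?' separates options.
print(split_path_and_options("http://host/path/file?option=val;option2=val"))

# Without a '?', '&' and ';' stay part of the path and no options are found.
print(split_path_and_options("http://host/path/test&.mkv;option=value"))
```

With this ordering, a file name such as "test&.mkv" is never cut in two, because the separators are only interpreted inside the query component.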
OK I took a closer look and this is how I understand it. The path component of a URI ends with the first |
@Montellese : Please check http://www.ietf.org/rfc/rfc2396.txt section 3.3, if I read it correctly it seems http://host/path/file;option is also used. Or am I misreading it? |
It's written very unclearly IMO. But there's the following example: |
Note that RFC 3986 obsoletes RFC 2396. |
http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ has a pretty good description of URL structures. So basically Now the question for me is: What's the difference between |
also see https://stackoverflow.com/questions/2163803/what-is-the-semicolon-reserved-for-in-urls/2163885#2163885 for a pretty good explanation of ";". In short: semicolon is reserved to delimit sub-segments, but can be used % encoded as well to be part of the segment name. |
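To make the distinction concrete, here is a small sketch (Python, hypothetical helper name `split_segment`) of how a single path segment could be handled under that reading: a literal ; delimits sub-segments, while %3B is just part of the name. The key point is to split before percent-decoding.

```python
from urllib.parse import unquote

def split_segment(segment):
    """Split one path segment into (name, params).

    Splitting happens on the raw segment, BEFORE percent-decoding,
    so an encoded ';' (%3B) is never treated as a delimiter.
    """
    name, *params = segment.split(";")
    return unquote(name), [unquote(p) for p in params]

# Literal ';' delimits a sub-segment.
print(split_segment("file;option=val"))    # ('file', ['option=val'])

# Encoded ';' is part of the segment name.
print(split_segment("file%3Boption=val"))  # ('file;option=val', [])
```

Decoding first and splitting afterwards would make the two inputs indistinguishable, which is the kind of ambiguity discussed in this thread.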
@Montellese : Hehehe, just read the exact same article this morning. So the only thing I'm wondering about then is whether Apache's encoding in paths of & to &amp; is valid at all? |
Probably not since both Are we parsing a HTML site in this use case? Because in HTML encoding |
@Montellese : To be clear I meant Apache's encoding of &amp; in the href part like in test & file.avi. How should we handle it? Browsers handle it correctly, we do not. I see 2 options: the one in this PR, or have our URL class detect &amp; (and the like) to distinguish it from the ; delimiter. |
@arnova: Do you happen to have an example HTML page? IMO CURL shouldn't have to recognise HTML encodings. The |
@Montellese : http://www.eld.leidenuniv.nl/~arnova/public/ Btw. if your statement is correct then the fix in the PR is the only correct one. |
…. Apache) transcode into %-URL encoding. Fixes xbmc#15365
602de1c to a7be6ef
Well I'm not 100% sure but what you get is HTML which uses |
@Montellese : We only need to decode the href part, which is exactly what this PR does now. So I think it's sane the way it is. |
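As a rough illustration of "decode the href, then use percent-encoding", the sketch below uses Python's standard `HTMLParser`, which replaces entity references such as &amp; in attribute values before handing them to the handler. This is not the PR's C++ code; the class name `HrefCollector` and the sample listing are made up, and it assumes the href values in the listing are not already percent-encoded (otherwise "%" would be encoded a second time).

```python
from html.parser import HTMLParser
from urllib.parse import quote

class HrefCollector(HTMLParser):
    """Collect href values from <a> tags and percent-encode them."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value is not None:
                # value is already HTML-decoded here ("&amp;" became "&").
                # Keep "/" as a path separator, encode everything else reserved.
                self.hrefs.append(quote(value, safe="/"))

# Hypothetical one-line directory listing, similar to what a server might emit.
listing = '<a href="test&amp;.mkv">test&amp;.mkv</a>'

collector = HrefCollector()
collector.feed(listing)
print(collector.hrefs)  # ['test%26.mkv']
```

Only the href attribute is touched; the link text is ignored, so HTML decoding never leaks into the URL parser itself.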
Probably yeah. |
@topfs2 : You ok with this? |
I trust you guys. Merge at your convenience. |