URLs passed to wxFileSystemHandler::OpenFile() sometimes contain double slashes #13096
Issue migrated from trac ticket # 13096
component: wxHtml | priority: normal
2011-03-28 19:17:36: ajk (Andrew Kroll) created the issue
In absence of a mime type and extension on the URI, or extension only for local files, wxHtml is unable to properly guess the mime type or content. It defaults to text/plain.
Sometimes this also causes the HTTP headers sent by the server to be rendered within the text.
To reproduce, simply remove the extension from an HTML file, and try to load it.
It should be trivial to detect the type from the content, perhaps with a filter that works out the content type and sets it.
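As a rough illustration of the kind of content-based detection suggested here, the following is a minimal sketch in Python. It is not wxHtml code: the helper name `sniff_mime_type`, the 512-byte window, and the list of markers are all assumptions made for the example, and a heuristic like this will misclassify some documents.

```python
# Hypothetical helper, not part of wxWidgets or wxPython.
HTML_MARKERS = (b"<!doctype html", b"<html", b"<head", b"<body")
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_mime_type(data: bytes, default: str = "text/plain") -> str:
    """Guess a MIME type from the first bytes of a document."""
    # Drop a UTF-8 byte order mark, if any, before inspecting the content.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    # Only look at the start of the document (512 bytes is an arbitrary choice).
    head = data[:512].lstrip().lower()
    if head.startswith(HTML_MARKERS):
        return "text/html"
    # Fall back to the default that the report says wxHtml uses.
    return default


guess = sniff_mime_type(b"  <!DOCTYPE html>\n<html><body>hi</body></html>")
```

A document that opens with a comment or an XML declaration would already defeat this check, which gives some idea of how fragile such guessing can be.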
2011-03-29 17:04:20: @vslavik changed status from new to closed
2011-03-29 17:04:20: @vslavik set resolution to wontfix
2011-03-29 17:04:20: @vslavik changed type from defect to enhancement
2011-03-29 17:04:20: @vslavik commented
Replying to [#13096 ajk]:
This is expected behavior, it doesn't know the content-type.
Actually, it's rather non-trivial and fragile. But if you think it's trivial, then such a (trivial) patch from you would be gladly accepted. For now, I'm closing it as wontfix, though.
2011-04-02 17:51:27: @vadz changed priority from high to normal
2011-04-02 17:51:27: @vadz changed status from reopened to infoneeded_new
2011-04-02 17:51:27: @vadz commented
Sorry, I understand absolutely nothing here. What is the problem exactly, once again, please? What do you mean by "bad URL formation"? I see some extra slashes in your comments but I have no idea where they come from nor why they are a problem.
Please explain from the beginning because this is very unclear right now.
2011-04-03 08:32:24: ajk (Andrew Kroll) changed status from infoneeded_new to new
2011-04-03 08:32:24: ajk (Andrew Kroll) commented
The extra slashes come from the parser, or at least that is my guess. What happens is that wx attempts to take the previous URL and tack the new target onto it.
This fails, sometimes producing one gigantic URL, which in turn wreaks havoc, such as images and pages that do not load. I'm fairly sure the problem is that the URI scheme is not parsed properly and the code gets confused. I've not looked at the code yet (I don't care much for C++) but perhaps I can dig into it and find out.
I'm pretty close to just writing my own parser to replace what is there, but I don't want to if I do not have to. What is already in wx is (more or less) fine, and I can work around a lot of the deficiencies... like the whole capitalization strangeness, which is pretty easy to get around... again, a whole different story there too.
In short, at first I thought it was something in the mime parsing.
Run the demo as instructed in the code, then compare it to the wxPython demo and use the same URL, and you will visually see what breaks on the original demo, and on stdout, you will see the actions that I take in attempts to correct it in Python code.
2011-04-03 11:39:15: @vadz changed status from new to infoneeded_new
2011-04-03 11:39:15: @vadz commented
Replying to [comment:7 ajk]:
Which parser are you speaking about here?
I don't understand much in the demo code so I don't understand what it is supposed to show. Please make the simplest possible example (which I could incidentally easily transpose to C++ unlike the current one) because I still don't understand what the problem actually is. Also please open separate tickets for separate problems instead of darkly hinting at their existence in this discussion, I really don't have time/energy to chase mysterious clues and would strongly prefer clear, concise explanations.
Thanks in advance.
2011-04-03 18:39:06: ajk (Andrew Kroll) changed status from infoneeded_new to new
2011-04-03 18:39:06: ajk (Andrew Kroll) changed type from enhancement to defect
2011-04-03 18:39:06: ajk (Andrew Kroll) commented
Since you have not run the demo (OK, you possibly read it, etc.), I'll try to reinforce/describe what is happening.
when it was passed to the filesystem handler.
An extra '/' was added, and should not be there. Because it is there, the web server rejects the URL with a 404. That causes a broken image.
On some occasions you might get something like this:
where the correct path should be
Since I do have a little time today, I will attempt to locate the area where this problem is, but no promises.
I'm also changing the type back to defect, as it is obviously a bug.
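To make the symptom concrete, here is a minimal sketch in Python of how an extra slash can appear when a link target is simply appended to the directory of the previous URL. This is an illustration only: the ticket does not confirm that wx does this, `naive_join` is a hypothetical function, and the example.com URLs are made up. The standard library's `urljoin` is shown for comparison.

```python
from urllib.parse import urljoin


def naive_join(base: str, target: str) -> str:
    """Hypothetical joiner: directory of the base URL + '/' + target."""
    directory = base.rsplit("/", 1)[0]
    return directory + "/" + target


# Made-up URLs for illustration; they do not come from the ticket.
base = "http://example.com/dir/page.html"
target = "/images/logo.png"  # begins with '/', so it is relative to the server root

naive = naive_join(base, target)
resolved = urljoin(base, target)

print(naive)     # http://example.com/dir//images/logo.png
print(resolved)  # http://example.com/images/logo.png
```

The naive result has both the doubled slash and the wrong directory, so a server may well answer it with a 404.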
2011-04-04 13:01:38: @vadz changed title from wxHtml not able to guess mime type because of bad URL formation to URLs passed to wxFileSystemHandler::OpenFile() sometimes contain double slashes
2011-04-04 13:01:38: @vadz commented
So do you mean that URLs passed to
The disappearance of CGI suffix from the URL would be much more serious but how can this be reproduced?
2011-04-06 04:23:31: ajk (Andrew Kroll) commented
Here's the rather interesting part... if the non-decoded part were sent, instead of the attempt to re-encode, it would work straight away... Take a look at this sample that I got today while doing surfing tests...
This made me just pop my eyeballs out and do a --- ** WTF!?! ** --- Why is it modifying anything at all?
Concerning the above examples/descriptions...
The first case will 404 on some web servers.
In the second case it's more like something gets confused with the path part, and just tacks on at the end WITH the double slash (that's how I detect it). It's rare, but I have seen it happen, and the result is confusion.
I've looked inside the code and indeed there are some places where '/' is appended; perhaps someone thought that '//' is an escape, and it clearly isn't. The '\' case would be an escape. In either case, depending on the scheme, nothing gets added... and if the URL passed begins with a /, then it's absolute.
That's about as far as I got with it. Hope my descriptions help you nail this long-standing and very annoying bug. If you still need more examples, test cases, etc, please let me know and I'll see what I can do.
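The detection mentioned above (looking for a double slash in the path) can be sketched as follows. This is an assumption about what such a workaround might look like, not the code from the attached demo; `has_double_slash` and `collapse_slashes` are hypothetical names, and the URLs are made up.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def has_double_slash(url: str) -> bool:
    """True if the path component (not the 'scheme://' part) contains '//'."""
    return "//" in urlsplit(url).path


def collapse_slashes(url: str) -> str:
    """Collapse runs of '/' in the path only; leave query and fragment alone."""
    parts = urlsplit(url)
    clean_path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=clean_path))


# Made-up URL for illustration.
suspect = "http://example.com/dir//images/logo.png?next=a//b"
if has_double_slash(suspect):
    suspect = collapse_slashes(suspect)
```

Collapsing slashes only hides the symptom: if the target was joined to the wrong directory in the first place, the cleaned URL can still point to the wrong resource.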
2011-04-06 13:39:44: @vslavik commented
Replying to [comment:11 ajk]:
Yes, we could use an example that clearly demonstrates the bug. As in, smallest possible piece of code (e.g. a patch against samples/minimal, but a few lines of code in a comment would do; a series of wxFileSystem instance calls leading to a wrong result would be best) that is self-contained and sufficient to reproduce the bug. It's still unclear to me what the problem is from your comments or the attached Python code.