URLs passed to wxFileSystemHandler::OpenFile() sometimes contain double slashes #13096

Open
wxtrac opened this issue Mar 28, 2011 · 13 comments

@wxtrac wxtrac commented Mar 28, 2011

Issue migrated from Trac ticket #13096

component: wxHtml | priority: normal

2011-03-28 19:17:36: ajk (Andrew Kroll) created the issue


In the absence of a MIME type and an extension on the URI (or of an extension alone, for local files), wxHtml is unable to properly guess the MIME type of the content. It defaults to text/plain.

Sometimes this also causes the HTTP headers sent by the server to be rendered within the text.

To reproduce, simply remove the extension from an HTML file and try to load it.
Instead of being rendered as HTML, it will be rendered as plain text.

It should be trivial to detect the type from the content, perhaps with a filter that figures out the content type and sets it.
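To make the suggestion concrete, here is a minimal content-sniffing sketch in Python (the language of the attached demo). `sniff_mime_type` and its list of HTML markers are hypothetical and not part of wxHtml or wxPython; as the maintainer's reply points out, real-world sniffing is considerably more fragile than this.

```python
def sniff_mime_type(data: bytes, default: str = "text/plain") -> str:
    """Guess a MIME type from the first bytes of a document.

    Hypothetical helper: it only recognises a few HTML markers and
    otherwise falls back to `default`, mirroring wxHtml's text/plain.
    """
    # Only look at the beginning, skipping a UTF-8 BOM and leading whitespace.
    head = data[:512]
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    head = head.lstrip().lower()
    html_markers = (b"<!doctype html", b"<html", b"<head", b"<body")
    if head.startswith(html_markers):
        return "text/html"
    return default
```

For example, `sniff_mime_type(b"<!DOCTYPE html><html></html>")` returns `"text/html"`, while `sniff_mime_type(b"hello world")` returns `"text/plain"`. A page that merely starts with a comment or an XML declaration would already defeat this check, which illustrates why the approach is fragile.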

@wxtrac wxtrac commented Mar 29, 2011

2011-03-29 17:04:20: @vslavik changed status from new to closed

2011-03-29 17:04:20: @vslavik set resolution to wontfix

2011-03-29 17:04:20: @vslavik changed type from defect to enhancement

2011-03-29 17:04:20: @vslavik commented

Replying to [#13096 ajk]:

> To reproduce, simply remove the extension from an HTML file, and try to load it.
> instead of rendering it as HTML, it will render as plain text.

This is expected behavior: it doesn't know the content type.

> It should be trivial

Actually, it's rather non-trivial and fragile. But if you think it's trivial, then such a (trivial) patch from you would be gladly accepted. For now, I'm closing it as wontfix, though.

@wxtrac wxtrac commented Mar 29, 2011

2011-03-29 23:34:11: ajk (Andrew Kroll) commented


Actually, I have narrowed it down and do have a fix.
I will upload a description of the bug and the workaround.
The bug should be reported upstream to the wxWidgets developers so that it can be fixed.

@wxtrac wxtrac commented Mar 30, 2011

2011-03-30 00:39:26: ajk (Andrew Kroll) uploaded file fixhttp.py (6.9 KiB)

@wxtrac wxtrac commented Mar 30, 2011

2011-03-30 00:49:41: ajk (Andrew Kroll) commented


A note on the fix demo:

Run the original demo and compare the output if the commentary inside the source code is not clear enough.

I promise you will see the difference!

@wxtrac wxtrac commented Apr 2, 2011

2011-04-02 17:46:18: ajk (Andrew Kroll) changed status from closed to reopened

2011-04-02 17:46:18: ajk (Andrew Kroll) changed resolution from wontfix to (empty)

@wxtrac wxtrac commented Apr 2, 2011

2011-04-02 17:47:44: ajk (Andrew Kroll) changed title from wxHtml not able to guess mime type, plus other problems. to wxHtml not able to guess mime type because of bad URL formation

@wxtrac wxtrac commented Apr 2, 2011

2011-04-02 17:51:27: @vadz changed priority from high to normal

2011-04-02 17:51:27: @vadz changed status from reopened to infoneeded_new

2011-04-02 17:51:27: @vadz commented

Sorry, I understand absolutely nothing here. What exactly is the problem, once again, please? What do you mean by "bad URL formation"? I see some extra slashes in your comments, but I have no idea where they come from or why they are a problem.

Please explain from the beginning because this is very unclear right now.

@wxtrac wxtrac commented Apr 3, 2011

2011-04-03 08:32:24: ajk (Andrew Kroll) changed status from infoneeded_new to new

2011-04-03 08:32:24: ajk (Andrew Kroll) commented

The extra slashes come from the parser, my guess, at least. What happens is that wx takes the previous URL and tacks the new target on to it, relative to the old URL.

This fails, sometimes producing one gigantic URL, which in turn wreaks havoc: images and pages do not load. I'm fairly sure the problem is that the URI scheme is not parsed properly and the code gets confused. I haven't looked at the code yet (I don't care much for C++), but perhaps I can dig into it and find out.

I'm pretty close to just writing my own parser to replace what is there, but I don't want to if I don't have to. What is already in wx is (more or less) fine, and I can work around a lot of the deficiencies, like the whole capitalization strangeness, which is pretty easy to get around (again, a whole different story).

In short, at first I thought it was something in the MIME parsing.
Well, it partly is, but that's another story altogether, and with corrected URLs the issue goes away.

Run the demo as instructed in the code, then compare it with the wxPython demo using the same URL. You will visually see what breaks in the original demo, and on stdout you will see the actions I take in Python code in an attempt to correct it.
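The "take the previous URL and tack on the new target" step described in this comment is what RFC 3986 (section 5.2) calls reference resolution. The Python sketch below contrasts `urllib.parse.urljoin`, which implements that algorithm, with a hypothetical `naive_join` that simply appends the target to the base directory. `naive_join` is an assumption about the kind of joining that goes wrong, not a transcription of wx's code; the base URI is the one used in the RFC's own examples.

```python
from urllib.parse import urljoin


def naive_join(base: str, target: str) -> str:
    """Hypothetical 'tack it on' join: drop the last path segment, append target."""
    return base.rsplit("/", 1)[0] + "/" + target


base = "http://a/b/c/d;p?q"  # base URI from RFC 3986, section 5.4.1

# Compare standards-based resolution with naive concatenation.
for ref in ["g", "/g", "//g", "?y", "../g"]:
    print(ref, urljoin(base, ref), naive_join(base, ref))
```

For the plain relative reference `g` both approaches agree on `http://a/b/c/g`. For the absolute-path reference `/g`, however, the naive join yields `http://a/b/c//g`, the same doubled-slash pattern reported in this ticket, while `urljoin` gives `http://a/g`.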

@wxtrac wxtrac commented Apr 3, 2011

2011-04-03 11:39:15: @vadz changed status from new to infoneeded_new

2011-04-03 11:39:15: @vadz commented

Replying to [comment:7 ajk]:

> The extra slashes come from the parser, my guess, at least.

Which parser are you talking about here?

> Run the demo as instructed in the code

I don't understand much of the demo code, so I don't understand what it is supposed to show. Please make the simplest possible example (which I could also easily transpose to C++, unlike the current one), because I still don't understand what the problem actually is. Also, please open separate tickets for separate problems instead of darkly hinting at their existence in this discussion. I really don't have the time or energy to chase mysterious clues and would strongly prefer clear, concise explanations.

Thanks in advance.

@wxtrac wxtrac commented Apr 3, 2011

2011-04-03 18:39:06: ajk (Andrew Kroll) changed status from infoneeded_new to new

2011-04-03 18:39:06: ajk (Andrew Kroll) changed type from enhancement to defect

2011-04-03 18:39:06: ajk (Andrew Kroll) commented

Since you have not run the demo (though you have possibly read it), I'll try to describe more precisely what is happening.

'http://trac.wxwidgets.org/chrome/site/logo9.jpg'

became

'http://trac.wxwidgets.org//chrome/site/logo9.jpg'

when it was passed to the filesystem handler.

An extra '/' was added that should not be there. Because of it, the web server rejects the URL with a 404, which results in a broken image.
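Until the joining itself is fixed, one possible client-side workaround for the trac logo case is to collapse runs of slashes in the path component only, leaving the `//` after the scheme and anything in the query untouched. This is a sketch in the spirit of the Python corrections described for fixhttp.py (whose actual contents are not reproduced here), and `collapse_path_slashes` is a hypothetical name. Note that RFC 3986 treats `/a//b` and `/a/b` as different paths, so this only helps with servers that expect the collapsed form.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url: str) -> str:
    """Replace runs of '/' in the path with a single '/'.

    Scheme, authority, query and fragment are left exactly as they are.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit(parts._replace(path=path))
```

Applied to `'http://trac.wxwidgets.org//chrome/site/logo9.jpg'`, it returns `'http://trac.wxwidgets.org/chrome/site/logo9.jpg'`.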

On some occasions you might get something like this:

'http://foo.domain.org/some/path/?some.cgi//some/different/path/someimage.jpg'

where the correct path should be

'http://foo.domain.org/some/different/path/someimage.jpg'
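Assuming the image reference in that page was the absolute path `/some/different/path/someimage.jpg` (the page source isn't shown, so this is a guess), RFC 3986 resolution replaces the whole path and drops the base URL's query, which gives the expected URL. Appending the reference to the full base URL instead reproduces the broken form above character for character. Python's `urllib.parse.urljoin` is used here as a stand-in for a correct resolver.

```python
from urllib.parse import urljoin

base = "http://foo.domain.org/some/path/?some.cgi"
ref = "/some/different/path/someimage.jpg"  # assumed value of the <img src>

# Standards-based resolution: an absolute path replaces path and query.
resolved = urljoin(base, ref)

# Naive concatenation keeps the query and doubles the slash.
appended = base + "/" + ref

print(resolved)
print(appended)
```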

Since I do have a little time today, I will attempt to locate where in the code this problem is, but no promises.

I'm also changing the type back to defect, as it is obviously a bug.

@wxtrac wxtrac commented Apr 4, 2011

2011-04-04 13:01:38: @vadz changed title from wxHtml not able to guess mime type because of bad URL formation to URLs passed to wxFileSystemHandler::OpenFile() sometimes contain double slashes

2011-04-04 13:01:38: @vadz commented

So do you mean that URLs passed to wxFileSystemHandler::OpenFile() have doubled slashes, is that the problem? If so, I agree that it's a bug, although it doesn't seem to be a very serious one, to be honest, as most web servers seem to just ignore consecutive slashes (in particular, your wx logo URL above works just fine).

The disappearance of the CGI suffix from the URL would be much more serious, but how can it be reproduced?

@wxtrac wxtrac commented Apr 6, 2011

2011-04-06 04:23:31: ajk (Andrew Kroll) commented


Here's the rather interesting part: if the non-decoded part were sent, instead of attempting to re-encode it, it would work straight away. Take a look at this sample that I got today while doing surfing tests:

Image: 'http://darcs.haskell.org/darcsweb/darcs.png'

In the filesystem handler it was passed:

'http://darcs.haskell.org/cgi-bin/../darcsweb/darcs.png'

This made my eyeballs pop out: WTF?! Why is it modifying anything at all?
This is the URI the web server sent, so why are we modifying it at all at the OnOpeningURL stage? I think one could call that part a bug too, but let's concentrate on the first, worse one at hand here.
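The `cgi-bin/../` form names the same resource once dot segments are removed, which RFC 3986 describes in section 5.2.4 (and as syntax-based normalization in section 6.2.2.3). The sketch below is a simplified, stack-based version of that step, restricted to absolute paths; `normalize_dot_segments` is a hypothetical name, and this thread does not establish whether wx or the page itself introduced the `..`.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_dot_segments(url: str) -> str:
    """Remove '.' and '..' segments from the path of an absolute URL.

    Simplified take on RFC 3986 section 5.2.4; assumes the path is
    empty or starts with '/'.
    """
    parts = urlsplit(url)
    segments = parts.path.split("/")
    output = []
    for seg in segments:
        if seg == "..":
            # Never pop the leading empty segment that represents the root '/'.
            if len(output) > 1:
                output.pop()
        elif seg != ".":
            output.append(seg)
    # A trailing '.' or '..' still denotes a directory, so keep the final '/'.
    if segments[-1] in (".", ".."):
        output.append("")
    return urlunsplit(parts._replace(path="/".join(output)))
```

With the sample above, `normalize_dot_segments('http://darcs.haskell.org/cgi-bin/../darcsweb/darcs.png')` returns `'http://darcs.haskell.org/darcsweb/darcs.png'`, i.e. the image URL as the page presented it.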

Concerning the examples/descriptions in my previous comment:

In the first case, some web servers will return a 404.

In the second case, it's more like something gets confused with the path part and just tacks the new path on at the end, WITH the double slash (that's how I detect it). It's rare, but I have seen it happen, and the result is confusion.

I've looked inside the code, and indeed there are some places where '/' is appended; perhaps someone thought that '//' is an escape, and it clearly isn't. The '\' case would be an escape. In either case, depending on the scheme, nothing should get added, and if the URL passed begins with a /, then it's absolute.
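The rule of thumb at the end of that paragraph can be made a little more precise with the reference forms from RFC 3986 (section 4.2): a reference with a scheme is already absolute, one starting with `//` replaces everything but the scheme, one starting with a single `/` replaces the whole path, a bare `?query` keeps the base path, and anything else is merged with the base directory. `classify_reference` is a hypothetical helper for illustration only.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ':'
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def classify_reference(ref: str) -> str:
    """Return which kind of URI reference `ref` is, per RFC 3986."""
    if _SCHEME.match(ref):
        return "absolute-uri"      # use as-is, nothing is joined
    if ref.startswith("//"):
        return "network-path"      # keep only the base scheme
    if ref.startswith("/"):
        return "absolute-path"     # keep base scheme and authority
    if ref == "" or ref.startswith("#"):
        return "same-document"     # keep the whole base URL
    if ref.startswith("?"):
        return "query-only"        # keep base path, replace the query
    return "relative-path"         # merge with the base directory
```

Under this classification, `/chrome/site/logo9.jpg` is an absolute path, so nothing from the base path (and no extra `/`) should ever be prepended to it.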

That's about as far as I got with it. Hope my descriptions help you nail this long-standing and very annoying bug. If you still need more examples, test cases, etc, please let me know and I'll see what I can do.

@wxtrac wxtrac commented Apr 6, 2011

2011-04-06 13:39:44: @vslavik commented


Replying to [comment:11 ajk]:

> That's about as far as I got with it. Hope my descriptions help you nail this long-standing and very annoying bug. If you still need more examples, test cases, etc, please let me know and I'll see what I can do.

Yes, we could use an example that clearly demonstrates the bug. That is, the smallest possible piece of code (e.g. a patch against samples/minimal, but a few lines of code in a comment would do; a series of wxFileSystem instance calls leading to a wrong result would be best) that is self-contained and sufficient to reproduce the bug. It's still unclear to me what the problem is from your comments or the attached Python code.
