Add redirect_to_index_html boolean to [client] #21
If True, when the web API has to display a directory node, it first checks whether that directory contains an index.html file. If so, it redirects to it. If not, it shows the folder view. Useful for public gateways: some folders you want browsable, but if you have a blog with "draft" posts, index.html protects the folder from browsing.
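The behavior described above can be sketched roughly as follows. This is only an illustration, not the patch itself: the function name `directory_response` and the idea of passing in a set of child names are assumptions made for this example; only the option name `redirect_to_index_html` comes from the pull request title.

```python
def directory_response(children, redirect_to_index_html):
    """Decide how to answer a GET for a directory node.

    `children` is a set of child names (hypothetical input; a real gateway
    would read these from the directory node). Returns a (kind, target) pair.
    """
    # Redirect only when the option is on AND the directory has an index.html.
    if redirect_to_index_html and "index.html" in children:
        return ("redirect", "index.html")
    # Otherwise fall back to the usual folder view.
    return ("listing", None)


if __name__ == "__main__":
    print(directory_response({"index.html", "draft.html"}, True))
    print(directory_response({"a.png", "b.png"}, True))
    print(directory_response({"index.html"}, False))
```

Note that the outcome depends entirely on whether a child named index.html exists, which is the property the reviewers discuss below.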
If I understand correctly, this applies to the /uri/ namespace, but it changes the semantics of existing links to directories, potentially breaking them. If it were to apply only to the /file/ namespace, there wouldn't be a compatibility problem, because /file/ doesn't show the directory listing page for URLs ending in a slash, IIRC.
I'm not sure what I think of this approach. I guess there is a tension between exposing the files and dirs "like a filesystem does", i.e., don't mangle any bytes, don't hide things or move things around, versus "like a web server does", i.e., well, all sorts of features depending on how you've configured your web server. :-) I use an nginx front-end (configured with the help of lafs-rpg), and there are several features I've added to my nginx front-end that I wouldn't want to see included in the base tahoe-lafs WUI/WAPI, such as redirecting "https://zooko.com/ANYTHING" to "https://zooko.com/uri/URI:DIR2-RO:d23ekhh2b4xashf53ycrfoynkq:y4vpazbrt2beddyhgwcch4sduhnmmefdotlyelojxg4tyzllhb4a/ANYTHING".

So if you didn't have this feature, what could you do instead? Well, I guess you would keep all the drafts in a separate directory and mv them into this directory when you were ready to make them readable. That feels more consistent with the LAFS access control design to me, because you're expressing your policy about what should be readable by whom in terms of which files are in which directories.

By contrast, with this feature of "index.html overrides directory listing", if you remove the index.html you thereby expose an entire directory's worth of files to reading that weren't exposed before. And going the other direction, if you create a file whose name is "index.html" (possibly because someone tricked you, by choosing "index.html" for their filename when you let them choose the filename, and you didn't think of them choosing "index.html"), then people who have only the readcap to that directory lose access to all those other files.

So, yeah, I just talked myself into being -1 on this design. Sorry, thedod! What if we talk more about some kind of separation of concerns?

Currently, the separation I have on zooko.com (as thedod also has, I know), with nginx doing all sorts of site-specific massaging of the data and the LAFS gateway just, well, being LAFSish -- that separation of concerns suits me pretty well. What's unsatisfying about it so that you thought to write this patch? Maybe deployment issues -- you have to install and configure two things (nginx and tahoe-lafs) instead of just one? Maybe something else?

Maybe Tahoe-LAFS could have a "web server" interface separate from the WAPI. There are some other tickets where we've been saying "Hm, yeah, in some cases this needs to act just like a pure data filesystem over RESTful HTTP -- you send read requests (GET requests) and you get a stream of bytes back. But in other cases, it needs to act like a specific sort of web server -- perhaps wrapping the content that it sends back in an iframe, or even trying to inject Secure EcmaScript prologue into the beginning of each file, etc...".
I agree with zooko that this feature is not suitable to prevent browsing, from a security perspective. In tahoe, a CAP is necessary and sufficient to access file contents. Therefore, these two URIs convey the same read authority:
Put another way, if I have only the first URL, I can just remove "index.html" and then guess and check many filenames appended to that URL.

However, there's a usability reason why, if you are running a website, you do not want to expose the tahoe gateway's directory browsing HTML. In that case, it makes sense that you might prefer custom index.html files which give your site a consistent feel.

This feature is similar to a web server configuration for "automatic directory listing". In both cases it's not a secure access control mechanism, and in both cases it may be desirable to turn it off (or otherwise handle directory URLs specially) for site usability/design. The tahoe gateway is like a web server which has this option hard-coded to on.
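To make the "remove index.html and guess" point concrete, here is a small sketch using only the Python standard library. The host `gateway.example`, the cap string `URI:DIR2-RO:EXAMPLECAP`, and both function names are placeholders invented for this illustration; nothing here touches a real gateway.

```python
import posixpath
from urllib.parse import quote, urlsplit, urlunsplit


def parent_directory_url(url):
    """Strip the last path segment (e.g. index.html) to get the directory URL."""
    parts = urlsplit(url)
    parent = posixpath.dirname(parts.path) + "/"
    return urlunsplit((parts.scheme, parts.netloc, parent, "", ""))


def candidate_urls(index_url, guesses):
    """Build URLs for guessed sibling filenames in the same directory."""
    base = parent_directory_url(index_url)
    return [base + quote(name) for name in guesses]


if __name__ == "__main__":
    # Placeholder URL: the cap embedded in the path is what grants read access.
    url = "https://gateway.example/uri/URI:DIR2-RO:EXAMPLECAP/blog/index.html"
    print(parent_directory_url(url))
    for candidate in candidate_urls(url, ["draft-1.html", "secret.txt"]):
        print(candidate)
```

Every derived URL still carries the same cap, which is why hiding the listing does not reduce what a holder of the first URL can read.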
Here are some usability issues I've brainstormed, but haven't thought carefully about:
If the web content is well designed, it could simply never link to bare directory URLs and only link to its own index.html files. But any user can still point at a bare directory URL that looks like it's authored by the same source as the rest of the web site content, and in that case the default tahoe directory listing gives "the site" an inconsistent feel or presentation that makes it less desirable as a web platform.
On 14/11/12 23:38, zooko wrote:
Also, it doesn't prevent an attacker from reading the directory contents by stripping
David-Sarah Hopwood ⚥
The reason I need this feature is to enable draft posts like this one. OTOH, in other cases I do want users to be able to browse the folder (e.g. the "screenshots" folder, or stuff I share dropbox-style with friends).

I think there should be a distinction between the grid and the wapi client: we have a single grid, but (at least in my case) two kinds of users: "the public" and "team" (e.g. me and my wife) with write permissions and all that. Each needs different wapi behavior.

If we compare the behavior to a static web server serving from IMHO, the fact that what I see on localhost is different than what the public sees on my site is not "inconsistent behavior". It's two different applications that share the same storage backend.
thedod: thank you for the clear explanation of the use case. I agree that it is a useful distinction. How should it be implemented, though?

To carry on with your analogy of {{{http://myhost/}}} compared to {{{file:///var/www/}}}, we usually use separate tools (a web server and web clients vs. a filesystem and tools like "cp" and "mv" and operating system GUI filesystem browsers, etc.). It kind of seems to me that there ought to be separate tools for you and your wife to access your files and directories vs. for exposing to the public. I had mostly conceived of nginx serving the latter role.

Why are you extending the tahoe-lafs wui/wapi instead of writing nginx config files to add this functionality? (My guess is that doing it in nginx config files is awkward or impossible, but you probably know a lot more about that than I do, so you tell me.)

What if there were a "web server" listening on a different port? So, you could for example make that thing listen on port 80, and have the thing with semantics more like the current WAPI listen on another port, like "8765". Maybe features like the one on this pull request would live in a separate section of the {{{tahoe.cfg}}} file. What would that section be called? Something like {{{web_hosting}}}?
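As a rough illustration of what such a split might look like, the sketch below parses a made-up config with Python's standard `configparser`. The `[web_hosting]` section and its `port` key are purely hypothetical (they are only floated as an idea in this thread), `hosting_options` is an invented helper, and placing `redirect_to_index_html` in that section rather than in `[client]` is likewise an assumption for the example.

```python
import configparser

# Hypothetical tahoe.cfg fragment: [web_hosting] does not exist in Tahoe-LAFS;
# it is the section name suggested in the discussion above.
SAMPLE = """
[node]
web.port = tcp:8765:interface=127.0.0.1

[web_hosting]
port = 80
redirect_to_index_html = true
"""


def hosting_options(cfg_text):
    """Return the hypothetical web-hosting settings, or mark them disabled."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    if not parser.has_section("web_hosting"):
        return {"enabled": False}
    return {
        "enabled": True,
        "port": parser.getint("web_hosting", "port"),
        "redirect_to_index_html": parser.getboolean(
            "web_hosting", "redirect_to_index_html", fallback=False
        ),
    }


if __name__ == "__main__":
    print(hosting_options(SAMPLE))
    print(hosting_options("[node]\nnickname = demo\n"))
```

Keeping the hosting options in their own section would leave the existing WAPI semantics untouched when the section is absent.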
I guess I took that direction because [at least at the moment] the way to expose a Tahoe-LAFS storage to the public is through the WAPI (via a tweaked nginx gateway), so when I've found out something that couldn't be done on nginx, I've added the functionality to the WAPI.
The way the functionality is defined [at least at the moment] is that the redirection depends on whether index.html exists (something I don't think nginx has the ability to check, but maybe I'm wrong).
This could be cool for several reasons:
Okay, the notion of a builtin web-server sounds somewhat promising to me. How about this:
I'm closing this pull request, as well as #1858.
Just noticed what @davidsarah wrote:
Tried it... Doh! I guess the whole thing was daft to begin with.
Anyway (with Zooko's help), I think I've figured out how to do this on the nginx side: https://gist.github.com/4106919
I don't know about your current nginx rules. They are rewriting (sending back HTTP 30x redirects) from human-oriented URLs like https://dubiousdod.org/blog/drafts/example-draft-post.html to cap URLs like https://dubiousdod.org/uri/URI:DIR2-RO:r7xnodn7et6d3ex44p77qk4eka:nvca4ivhhm2an3eafzpg7wpppy7osgyxvngng5uriqjv2qkcag6a/Latest/drafts/example-draft-post.html . Is that the right thing to do? That seems wrong, if your goal is to have stricter access controls than the crypto-capability access controls.

It seems to me that if you want to do that, then you should not send an HTTP 30x redirect, but should instead proxy, so that the user's browser URL bar still says "https://dubiousdod.org/blog/drafts/example-draft-post.html", and the content that gets returned to them by nginx is the content that the LAFS gateway loaded from URI:DIR2-RO:r7xnodn7et6d3ex44p77qk4eka:nvca4ivhhm2an3eafzpg7wpppy7osgyxvngng5uriqjv2qkcag6a/Latest/drafts internally.

Let's move this discussion to tahoe-dev! It is no longer really relevant to this pull request, right?
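The difference between redirecting and proxying can be sketched in a few lines of Python. This is a toy model, not nginx configuration: the constants, the gateway address, the placeholder cap `URI:DIR2-RO:EXAMPLECAP`, and both function names are assumptions chosen for illustration. Each function returns what the client sees (status and headers) plus the upstream URL the front-end would fetch, if any.

```python
PUBLIC_PREFIX = "/blog/"
# Placeholder cap path; a real deployment would hold its own read cap here.
CAP_BASE = "/uri/URI:DIR2-RO:EXAMPLECAP/Latest/"
GATEWAY = "http://127.0.0.1:8765"  # hypothetical local LAFS gateway address


def redirect_response(path):
    """30x redirect: the cap URL is sent to the browser in the Location header."""
    if not path.startswith(PUBLIC_PREFIX):
        return (404, {}, None)
    rest = path[len(PUBLIC_PREFIX):]
    return (302, {"Location": CAP_BASE + rest}, None)


def proxy_response(path):
    """Proxy: the cap is used only for the upstream fetch, never shown to the client."""
    if not path.startswith(PUBLIC_PREFIX):
        return (404, {}, None)
    rest = path[len(PUBLIC_PREFIX):]
    return (200, {}, GATEWAY + CAP_BASE + rest)


if __name__ == "__main__":
    request = "/blog/drafts/example-draft-post.html"
    print(redirect_response(request))
    print(proxy_response(request))
```

In the redirect case the client learns the cap and can explore the directory directly; in the proxy case the front-end decides which paths are reachable at all.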