Add redirect_to_index_html boolean to [client] #21

Closed

@thedod thedod commented Nov 14, 2012

If True, when the web API has to display a directory node,
it first looks whether that directory contains an index.html file.
If so - it redirects to it. If not - it shows the folder view.
Useful for public gateways: some folders you want browsable,
but if you have a blog with "draft" posts, index.html protects
the folder from browsing.
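
A minimal sketch of the behaviour described above, not the actual patch. The function name, its arguments, and the return shape are all hypothetical and are not taken from the Tahoe-LAFS source; the sketch only shows the decision between redirecting and listing.

```python
from typing import Iterable, Tuple


def resolve_directory_request(
    dir_url: str,
    child_names: Iterable[str],
    redirect_to_index_html: bool,
) -> Tuple[str, str]:
    """Decide how to answer a request for a directory node.

    Returns ("redirect", target_url) or ("listing", dir_url).
    Hypothetical helper for illustration only.
    """
    # Normalize so that appending a child name yields a valid URL.
    base = dir_url if dir_url.endswith("/") else dir_url + "/"
    # Redirect only when the option is on and the directory has an index.html child.
    if redirect_to_index_html and "index.html" in set(child_names):
        return ("redirect", base + "index.html")
    # Otherwise fall back to the ordinary folder view.
    return ("listing", base)
```

With the option off, or with no index.html child, the folder view is shown exactly as before.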

@daira
Member

daira commented Nov 14, 2012

If I understand correctly this applies to the /uri/ namespace, but it changes the semantics of existing links to directories, potentially breaking them. If it were to only apply to the /file/ namespace, there wouldn't be a compatibility problem because /file/ doesn't show the directory listing page for URLs ending in slash, IIRC.

@zooko
Member

zooko commented Nov 14, 2012

I'm not sure what I think of this approach. I guess there is a tension between exposing the files and dirs "like a filesystem does", i.e., don't mangle any bytes, don't hide things or move things around, versus "like a web server does", i.e., well, all sorts of features depending on how you've configured your web server. :-)

I use an nginx front-end (configured with the help of lafs-rpg), and there are several features I've added to my nginx front-end that I wouldn't want to see included in the base tahoe-lafs WUI/WAPI, such as redirecting "https://zooko.com/ANYTHING" to "https://zooko.com/uri/URI:DIR2-RO:d23ekhh2b4xashf53ycrfoynkq:y4vpazbrt2beddyhgwcch4sduhnmmefdotlyelojxg4tyzllhb4a/ANYTHING".

So if you didn't have this feature, what could you do instead? Well, I guess you would keep all the drafts in a separate directory and mv them into this directory when you were ready to make them readable. That feels more consistent with the LAFS access control design to me, because you're expressing your policy about what should be readable by whom in terms of which files are in which directories. By contrast, with this feature of "index.html overrides directory listing", then if you remove the index.html you thereby expose an entire directory's worth of files to reading that weren't exposed before. And going the other direction, if you create a file whose name is "index.html" (possibly because someone tricked you, by choosing "index.html" for their filename when you let them choose the filename, and you didn't think of them choosing "index.html") then people who have only the readcap to that directory lose access to all those other files.

So, yeah, I just talked myself into being -1 on this design. Sorry, thedod! What if we talk more about some kind of separation of concerns? Currently, the separation I have on zooko.com (which thedod also has, I know), with nginx doing all sorts of site-specific massaging of the data and the LAFS gateway just, well, being LAFSish -- that separation of concerns suits me pretty well.

What's unsatisfying about it so that you thought to write this patch? Maybe deployment issues -- you have to install and configure two things (nginx and tahoe-lafs) instead of just one? Maybe something else?

Maybe Tahoe-LAFS could have a "web server" interface separate from the WAPI. There are some other tickets where we've been saying "Hm, yeah, in some cases this needs to act just like a pure data filesystem over RESTful HTTP -- you send read requests (GET requests) and you get a stream of bytes back. But in other cases, it needs to act like a specific sort of web server -- perhaps wrapping the content that it sends back in an iframe, or even trying to inject Secure EcmaScript prologue into the beginning of each file, etc...".

@nejucomo
Contributor

I agree with zooko that this feature is not suitable to prevent browsing, from a security perspective. In tahoe, a CAP is necessary and sufficient to access file contents. Therefore, these two URIs convey the same read authority:

  • https://<site>/uri/<CAP>/<some path>/index.html
  • https://<site>/uri/<CAP>/<some path>/

Put another way, if I have only the first url, I can just remove "index.html" and then guess and check many filenames appended to that URL.
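
To illustrate (a sketch only; `example.org` and `CAP` are placeholders, and the helper name is hypothetical): deriving the directory URL from the index.html URL is plain string manipulation and needs no extra authority.

```python
def directory_url_for(url: str) -> str:
    """Return the directory URL implied by a URL that ends in /index.html."""
    suffix = "index.html"
    if url.endswith("/" + suffix):
        # Drop the file name but keep the trailing slash.
        return url[: -len(suffix)]
    return url
```

For example, `directory_url_for("https://example.org/uri/CAP/blog/index.html")` yields the same URL without the file name, which is the second form in the list above.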

However, there's a usability reason why, if you are running a website, you do not want to expose the tahoe gateway's directory browsing html. In that case, it makes sense that you might prefer custom index.html files which give your site a consistent feel.

This feature is similar to a web server configuration for "automatic directory listing". In both cases it's not a secure access control mechanism and in both cases it may be desirable to turn it off (or otherwise handle directory URLs specially) for site usability/design. The tahoe gateway is like a webserver which has this option hard-coded to on.

@nejucomo
Contributor

Here are some usability issues I've brainstormed, but haven't thought carefully about:

  • The content is authored by someone who has no relationship to the web interface operator. (In fact, imagine there is more than one content author, each wanting or needing a different directory display, so no single web portal configuration could satisfy them all.)
  • The web content is in a different language, or a different character set, or has some other need that the tahoe gateway cannot easily accommodate.

If the web content is well designed it could simply never link to bare directory URLs and only to its own index.html files, but because the potential still exists for any user to point into something that looks like it's authored by the same source as the rest of the web site content, then the default tahoe directory listing will give "the site" an inconsistent feel or presentation that makes it less desirable as a web platform.

@daira
Member

daira commented Nov 15, 2012

On 14/11/12 23:38, zooko wrote:

> By contrast, with this feature of
> "index.html overrides directory listing", then if you remove the index.html you thereby
> expose an entire directory's worth of files to reading that weren't exposed before. And
> going the other direction, if you /create/ a file whose name is "index.html" (possibly
> because someone tricked you, by choosing "index.html" for their filename when you let them
> choose the filename, and you didn't think of them choosing "index.html") then people who
> have only the readcap to that directory /lose/ access to all those other files.

Also, it doesn't prevent an attacker from reading the directory contents by stripping
DIR2: from the URI.

David-Sarah Hopwood ⚥

@thedod
Author

thedod commented Nov 15, 2012

The reason I need this feature is to enable draft posts like this one. OTOH, in other cases I do want users to be able to browse the folder (e.g. the "screenshots" folder, or stuff I share dropbox-style with friends).

I think there should be a distinction between the grid and the wapi client: we have a single grid, but (at least in my case) two kinds of users: "the public" and "team" (e.g. me and my wife) with write permissions and all that. Each needs different wapi behavior.

If we compare the behavior to a static web server serving from /var/www/, the public interface provides something similar to http://myhost/ while team members can see something similar to file:///var/www/ on myhost (where index.html has no special meaning, and all folders can be browsed). In other words, we already have a setup "in nature" that can provide both behaviors, depending on audience :).

IMHO, the fact that what I see on localhost is different than what the public sees on my site is not "inconsistent behavior". It's two different applications that share the same storage backend.

@zooko
Member

zooko commented Nov 15, 2012

thedod: thank you for the clear explanation of the use case. I agree that it is a useful distinction. How should it be implemented, though? To carry on with your analogy of {{{http://myhost/}}} compared to {{{file:///var/www/}}}, we usually use separate tools (a web server and web clients vs. a filesystem and tools like "cp" and "mv" and operating system GUI filesystem browsers, etc.).

It kind of seems to me that there ought to be separate tools for you and your wife to access your files and directories vs. for exposing to the public. I had mostly conceived of nginx serving the latter role. Why are you extending the tahoe-lafs wui/wapi instead of writing nginx config files to add this functionality?

(My guess is that doing it in nginx config files is awkward or impossible, but you probably know a lot more about that than I do, so you tell me.)

What if there were a "web server" listening on a different port. So, you could for example make that thing listen on port 80, and have the thing with semantics more like the current WAPI listen on another port, like "8765".

Maybe features like the one on this pull request would live in a separate section of the {{{tahoe.cfg}}} file. What would that section be called? Something like {{{web_hosting}}}?

@thedod
Author

thedod commented Nov 15, 2012

> It kind of seems to me that there ought to be separate tools for you and your wife to access your files and directories vs. for exposing to the public.

I guess I took that direction because [at least at the moment] the way to expose Tahoe-LAFS storage to the public is through the WAPI (via a tweaked nginx gateway), so when I found something that couldn't be done in nginx, I added the functionality to the WAPI.

> Why are you extending the tahoe-lafs wui/wapi instead of writing nginx config files to add this functionality?

The way the functionality is defined [at least at the moment] is that the redirection depends on whether index.html exists (something I don't think nginx has the ability to check, but maybe I'm wrong).
Maybe I could change the requirements:

  • If it's a "bare" dircap (I think a regexp is enough in order to decide), always redirect to index.html (and if such a file doesn't exist - that's too bad).
  • In order to enable public directory view in specific cases, the nginx can have a whitelist that doesn't get redirected.
    Such a design would be more efficient (checking whether index.html exists is costly), but would require updating the whitelist (something my wife wouldn't be able to do) whenever we want a new "publicly browsable folder".
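
The two revised requirements can be sketched roughly as follows. This is written in Python for illustration rather than as nginx configuration, and it rests on assumptions: the regexp treats a URL as a directory URL when it is `/uri/<DIR2 cap>` optionally followed by a sub-path ending in a slash, the cap pattern simply mirrors the caps quoted in this thread, and `redirect_target` and `browsable` are hypothetical names.

```python
import re
from typing import Optional, Set

# Assumption: a directory URL is /uri/<DIR2 cap>, optionally followed by a
# sub-path that ends in "/". The cap shape is modelled on caps seen in this thread.
DIR_URL = re.compile(
    r"^/uri/URI:DIR2(?:-RO)?:[a-z2-7]+:[a-z2-7]+(?:/(?:[^/]+/)*)?$"
)


def redirect_target(path: str, browsable: Set[str]) -> Optional[str]:
    """Return the index.html URL to redirect to, or None to pass the request through."""
    if not DIR_URL.match(path):
        return None  # not a directory URL: serve as usual
    normalized = path if path.endswith("/") else path + "/"
    if normalized in browsable:
        return None  # whitelisted: show the folder view
    # No existence check: if index.html is missing, the redirect simply fails.
    return normalized + "index.html"
```

Because nothing is looked up in the grid, the decision is cheap, at the cost of keeping `browsable` up to date by hand.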

> What if there were a "web server" listening on a different port.

This could be cool for several reasons:

  • We can get rid of the nginx
  • We can design it Tahoe-LAFS minded (minimize the possibility of insecure configuration - as opposed to nginx which is too general-purpose for comfort).
  • The server can read its configuration (publicly-browsable white-list, short-permalink rewrites, etc.) from a file inside the storage (we can simply provide its readcap at tahoe.cfg). Editing such a file is something my wife can do, and later on we can even have a web-server config editor as part of the WAPI. OTOH, the writecap of that config file is something we wouldn't want compromised :)

@zooko
Member

zooko commented Nov 15, 2012

Okay, the notion of a builtin web-server sounds somewhat promising to me. How about this:

  • We reject this pull request (I don't know how to do this).
  • thedod can, of course, use a fork of tahoe-lafs with this patch, if needed.
  • We create a ticket on https://tahoe-lafs.org and write down the requirements for such a builtin web-server. (I'm still not sure if it is a good idea or what exactly it would do...)

@thedod
Author

thedod commented Nov 15, 2012

I'm closing this pull request, as well as #1858.
I'll write some draft of the requirements for a public web server (the way I see them) and let you all look at them before we open that ticket.

@thedod thedod closed this Nov 15, 2012
@thedod
Author

thedod commented Nov 15, 2012

Just noticed what @davidsarah wrote:

> Also, it doesn't prevent an attacker from reading the directory contents by stripping DIR2: from the URI.

Tried it... Doh! I guess the whole thing was daft to begin with.
If we want a public server that doesn't allow directory browsing, we shouldn't redirect to (and expose) a cap, but actually maintain the logical path from the "mountpoint cap" (e.g. /blog/drafts/my-secret.html where /blog/ is mapped to some never-exposed cap).
I'll avoid draft posts until this is implemented :)

@thedod
Author

thedod commented Nov 18, 2012

Anyway (with Zooko's help), I think I've figured out how to do this on the nginx side: https://gist.github.com/4106919

@zooko
Member

zooko commented Nov 19, 2012

I don't know about your current nginx rules. They are rewriting (sending back HTTP 30x redirects) from human-oriented URLs like https://dubiousdod.org/blog/drafts/example-draft-post.html to cap URLs like https://dubiousdod.org/uri/URI:DIR2-RO:r7xnodn7et6d3ex44p77qk4eka:nvca4ivhhm2an3eafzpg7wpppy7osgyxvngng5uriqjv2qkcag6a/Latest/drafts/example-draft-post.html . Is that the right thing to do? That seems wrong, if your goal is to have stricter access controls than the crypto-capability access controls. It seems to me that if you want to do that, then you should not send an HTTP 30x redirect, but should instead proxy, so that the user's browser URL bar still says "https://dubiousdod.org/blog/drafts/example-draft-post.html", and the content that gets returned to them by nginx is the content that the LAFS gateway loaded from URI:DIR2-RO:r7xnodn7et6d3ex44p77qk4eka:nvca4ivhhm2an3eafzpg7wpppy7osgyxvngng5uriqjv2qkcag6a/Latest/drafts internally.
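
As a rough sketch of the proxying idea (in Python rather than nginx configuration, and under stated assumptions): the front-end keeps a private table from public mountpoints to caps and computes the gateway URL to fetch internally, so the cap never appears in anything sent to the browser. The function name, the `mounts` table, the shortened cap in the usage note, and the gateway address `http://127.0.0.1:3456` are all illustrative assumptions.

```python
from typing import Dict, Optional
from urllib.parse import quote


def internal_gateway_url(
    public_path: str,
    mounts: Dict[str, str],
    gateway: str = "http://127.0.0.1:3456",  # assumed local gateway address
) -> Optional[str]:
    """Map a public path to the gateway URL the proxy should fetch internally.

    `mounts` maps a public prefix (e.g. "/blog/") to a cap plus optional
    sub-path. Returns None when no mountpoint matches.
    """
    # Try longer prefixes first so that nested mountpoints win.
    for prefix in sorted(mounts, key=len, reverse=True):
        if public_path.startswith(prefix):
            rest = public_path[len(prefix):]
            target = quote(mounts[prefix], safe=":/")
            return f"{gateway}/uri/{target}/{quote(rest)}"
    return None
```

With `mounts = {"/blog/": "URI:DIR2-RO:abc:def/Latest"}`, a request for `/blog/drafts/example-draft-post.html` is fetched from the gateway under that cap, while the browser's URL bar keeps showing the public path.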

Let's move this discussion to tahoe-dev! It is no longer really relevant to this pull request, right?
