Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save private item attachments on protected folder #241

Open
leogermani opened this issue May 21, 2019 · 5 comments

Comments

@leogermani
Copy link
Contributor

commented May 21, 2019

No description provided.

@leogermani leogermani created this issue from a note in tainacan/tainacan (Backlog) May 21, 2019

@leogermani leogermani added the 0.11 label May 21, 2019

@leogermani leogermani self-assigned this May 21, 2019

@leogermani leogermani moved this from Backlog to To Do in tainacan/tainacan Jul 8, 2019

@leogermani

This comment has been minimized.

Copy link
Contributor Author

commented Jul 12, 2019

So here are the results of what I found so far as possible approaches:

Changing directory structure

First of all we thought it would be a good idea to change the uploads directory structure to better organize Tainacan items attachments and documents, so they dont get all mixed up with other uploads under the uploads and year and month folders...

However, it turns out this is not very easy to achieve.

The most used approach is to hook into the wp_handle_upload_prefilter filter to add a filter to upload_dir only when a new file is beeing saved. This is described in https://wordpress.stackexchange.com/questions/195453/different-upload-path-per-file-type?noredirect=1&lq=1 and this approach is implemented by this plugin: http://wordpress.org/extend/plugins/custom-upload-dir/

But the problem is that, inside these hooks, and in any hook in _wp_handle_upload you don't have the information of ID of the post you are attaching the file to. In media_handle_upload function, where you have the ID, you don't have any hooks.

You could do this by getting the post_id from the http request ($_REQUEST['post_id']), but this would not work in all cases, for example in importers. Also, in Tainacan, the document is not attached to the item as the other attachments are, so the post_id would be null anyway... There would have to be some other way to inform this...

So one way to achieve this would be not to use the default WordPress way of handling uploads... of course this comes with a huge downsize. For example, plugins that integrates with external storage services, like amazon s3, would not work out of the box.

Protecting files

There are basically 2 ways we could protect the files:

Create a folder for private files

We could create a separate folder to store private files, but:

  • This falls under the problems described above, of not being able to change the upload folder properly
  • If we managed to find a way, we would have to deal with bulk changes to items permissions. Let's say a collection is set to private and there are already 20.000 items in it. This would have to trigger a backgroud process to move all the files from one folder to another and update the attachment post in the database...

Check permission on every upload before serve it

This would be an approach similar to this plugin: https://wordpress.org/plugins/private-files/

A .htacces file directs all traffic to php, that checks file permission.

But this is a performance killer...

An intermediate solution, if we managed to separate a folder for all uploads related to tainacan items, woud be to add this only to this folder, leaving regular WordPress uploads out of it. But still...

However, this is the easiest way to implement it right now.

Conclusion (so far)

We still have to think better on how to deal with this.

The only way to assure total control and security over the files seems to be not using WordPress native upload workflows.

This would give more security and control, but would make it more difficult for integrations. It's easy to think that Tainacan would go well together with plugins that save the files somewhere else, like amazon S3 or similar services, and this approach would break this.

The easiest way to do this right now is to direct all traffic to php and check permissions before serving static files, at a high performance cost... Maybe we could give it a try and look how fast we could make it...

@leogermani

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

@leogermani

This comment has been minimized.

Copy link
Contributor Author

commented Jul 15, 2019

Gave it a second try and a second thought.

Serving all files from the upload folder via php is just unacceptable. It could easily kill the web server.

I've created a feature branch (named feature/241) and made a proof of concept on changing the upload dir based on the post id. It works, but we will have to start attaching the document file to the Item. Here are few things we must do:

  • Inform post_id when upload the item document
  • Filter default response when trying to get item attachments to remove the document (maybe its not possible. If not, change behavior on plugin and theme)

Then, the first approach is to upload files on a collection_id/item_id folder structure inside the uploads folder.

  • give an option to change the default tainacan upload dir
  • offer a hook that trigger when this folders are created

Finally we will protect the folder of private items or collections. In apache, we will do this by adding a .htaccess file to it. The template_redirect hook will later check for permission and serve the file (as the current POC does). But we leave room to other servers protect the folders in other ways

  • create hook to protect folders and trigger it when needed.
  • protect folders in apache using .htaccess

Let's see how it goes

ps - I was thinking on putting all the private files in a separated folder. Then we would only have to protect one folder. But Im dropping this idea for now, as this would require some background processes if we, for example, made a collection with thousands of items private and we had to move thousands of files around at once. I want to avoid that

leogermani added a commit that referenced this issue Jul 16, 2019

@leogermani leogermani moved this from To Do to In Progress in tainacan/tainacan Jul 17, 2019

leogermani added a commit that referenced this issue Jul 17, 2019

leogermani added a commit that referenced this issue Jul 17, 2019

leogermani added a commit that referenced this issue Jul 17, 2019

leogermani added a commit that referenced this issue Jul 17, 2019

@leogermani

This comment has been minimized.

Copy link
Contributor Author

commented Jul 17, 2019

Ok, at the end I think the implementation is fairly good, a bit different from the last comment.

How it works

Everytime an attachment or a document is uploaded to an item, the file is saved under a special directory inside the uploads folder. There is a base directory for all items files, tainacan-items by default, and, inside this folder a $collection_id/$item_id folder strucure.

The URL for the files will always remain the plain strucutre, for example, site.com/wp-content/uploads/tainacan-items/1234/2345/file.jpg. However, in the file system, collections and items folders can be prefixed, indicating they are protected folders.

So if the item with ID 2345 is private, the folder will be called, for example, _x_2345.

Since the URL and the folder in the filesystem don't match, this would cause a 404 error. But then tainacan will detect this is an attempt to access an item file and will look into the prefixed folders for it.

If it finds, it will check whether current user can read the item and then serve it for authorized users, or leave a 404 response for non authorized users.

Everytime you edit an item or a collection, or even bulk edit items, the folders will be renamed accordingly.

The base folder for items attachments can be changed setting the TAINACAN_ITEMS_UPLOADS_DIR constant in wp-config.php.

The prefix for private folders can be changed setting the TAINACAN_PRIVATE_FOLDER_PREFIX constant in wp-config.php.

Note: This constants must be setted in a fresh install. If they are changed after there are uploads, all the links will break...

Protecting folders

Now you can add a rule to your .htaccess to block access to all prefixed folders inside your uploads directory. This could be done with a rule like this:

RewriteRule ^wp-content/uploads/tainacan-items/.*_x_\d+/.+$ - [F,L]

Even if a person manages to find out the real file path, with the prefixed folder name, he/she will not be able to access it, only via the clean URL that will pass permission check.

This is the basic difference from my last comment. One rule to rule them all. And it should work fine for nginx as well. :)

What is left to do

  • Write tests!

  • Write docs in the wiki

  • Write a CLI command to migrate current structure to the new structure (this is not mandatory, current installations will work normally and start upload new files in the new structure, but old files will stay where they are)

  • Handle documents: Now that documents are attached to items, it must be excluded from attachments lists ... this could be done adding post__not_in parameter to the queries passing the document ID. And also exclude paramenter to the API requests calling for attachments.

  • Also, review the $item->get_attachments() method to exclude the document.

leogermani added a commit that referenced this issue Jul 23, 2019

leogermani added a commit that referenced this issue Jul 24, 2019

leogermani added a commit that referenced this issue Jul 24, 2019

mateuswetah added a commit that referenced this issue Jul 25, 2019

leogermani added a commit to tainacan/tainacan-theme that referenced this issue Jul 25, 2019

leogermani added a commit that referenced this issue Jul 25, 2019

leogermani added a commit that referenced this issue Jul 25, 2019

leogermani added a commit that referenced this issue Jul 25, 2019

mateuswetah added a commit that referenced this issue Jul 25, 2019

leogermani added a commit to tainacan/tainacan-wiki that referenced this issue Jul 25, 2019

@leogermani leogermani moved this from In Progress to Validation in tainacan/tainacan Jul 29, 2019

@vnmedeiros

This comment has been minimized.

Copy link
Contributor

commented Aug 2, 2019

it be interesting to add a notification, on system check eg., about the existence of the rewriterule (.htaccess)

leogermani added a commit that referenced this issue Aug 2, 2019

leogermani added a commit that referenced this issue Aug 5, 2019

leogermani added a commit that referenced this issue Aug 5, 2019

leogermani added a commit that referenced this issue Aug 5, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants
You can’t perform that action at this time.