Where is the uploaded file? #60

Closed
KempWatson opened this issue Aug 24, 2016 · 10 comments

Comments

@KempWatson

KempWatson commented Aug 24, 2016

Another issue (closed) says, regarding uploaded files being renamed to *.bin and *.info: "That's a good use case but a very specialized one. In general, I think, people will need more information than just the original file name and therefore have to read the additional data anyway. So I am sorry, but your suggestion won't make it into tusd."

I believe this is a major shortcoming for a "file uploader" - the uploaded file is essentially not there at all; this is a huge use case, not specialized in the least.

Is there a procedure for transferring "x.jpg" for example, to the server as "x.jpg"? Or, perhaps, a config hook to rename the file upon upload completion?

@kvz
Member

kvz commented Aug 24, 2016

I would have to disagree with you: tus is designed to be deployed at massive scale. What if two users upload a file called x.jpg?

Should:

  • the first one be overwritten or the second one rejected?
  • we change the filename of the second - but now the filename is no longer a representation of the original one?
  • we create a directory per user? But what if one user uploads a file called screenshot.jpg twice, and the two are different images?
  • we create one dir per file? That seems like a very unwieldy workaround

The intended approach for x.jpg is that metadata such as the original filename is stored in your storage layer, and your app presents and offers x.jpg by reading that information back, along with, for instance, the MIME type, file size, and perhaps which user uploaded the file.
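To illustrate the idea above, here is a minimal sketch of an application-side catalog that keys uploads by a unique ID and keeps the original filename as plain metadata. The `UploadMeta` and `Catalog` types and all field names are hypothetical; they are not part of tus or tusd, and a real app would use its own database rather than an in-memory map.

```go
package main

import "fmt"

// UploadMeta is a hypothetical record an application might keep per upload.
type UploadMeta struct {
	ID       string // unique ID assigned by the upload server
	Filename string // original name supplied by the client
	MIMEType string
	Size     int64
	Owner    string
}

// Catalog indexes uploads by unique ID, so equal filenames never collide.
type Catalog struct {
	byID map[string]UploadMeta
}

// NewCatalog returns an empty catalog.
func NewCatalog() *Catalog {
	return &Catalog{byID: make(map[string]UploadMeta)}
}

// Add stores the record; it rejects a reused ID rather than overwriting.
func (c *Catalog) Add(m UploadMeta) error {
	if _, exists := c.byID[m.ID]; exists {
		return fmt.Errorf("upload ID %q already recorded", m.ID)
	}
	c.byID[m.ID] = m
	return nil
}

// FindByName returns every upload whose original filename matches.
func (c *Catalog) FindByName(name string) []UploadMeta {
	var out []UploadMeta
	for _, m := range c.byID {
		if m.Filename == name {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	c := NewCatalog()
	uploads := []UploadMeta{
		{ID: "a1", Filename: "x.jpg", MIMEType: "image/jpeg", Size: 1024, Owner: "alice"},
		{ID: "b2", Filename: "x.jpg", MIMEType: "image/jpeg", Size: 2048, Owner: "bob"},
	}
	for _, u := range uploads {
		if err := c.Add(u); err != nil {
			fmt.Println("add failed:", err)
		}
	}
	// Both uploads named x.jpg are kept; the app decides which one to offer.
	for _, m := range c.FindByName("x.jpg") {
		fmt.Println(m.ID, m.Filename, m.Owner)
	}
}
```

Because the storage key is the ID and not the filename, none of the overwrite or rename questions in the list above arise.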

If your app is not susceptible to the listed concerns and you really must have a folder with the original filenames for your use case, you can still use tusd hooks to rename the file as soon as it has been written to disk. Hooks are separate programs that are handed the metadata and can act on it; they can be written in any language as long as they are executable. But I'd be very careful about the concerns mentioned above when going that route.

Does this make sense or do you think we overlooked something here?

For reference, the original ticket mentioned is #44

@KempWatson
Author

KempWatson commented Aug 24, 2016

Thanks kvz for the quick answer. That's a very valuable use case indeed; it seems essentially an object store like OpenStack Swift, Amazon S3, Caringo Swarm, Go's own Minio, or many KV databases capable of large values, but embeddable and extensible.

In our use case, we are uploading dozens to thousands of medical images, each image 100 GB to 1 TB in size, into controlled-access folders. Other applications need to access the images by their original file names and extensions, and there will be no filename conflicts. Some of the other applications are written in ASP.NET, some in Go.

So far, I've embedded Tusd in a Go wrapper that controls the login and target directory. I'm assuming that without rewriting parts of Tusd (I'm not a fan of forking projects, it's the scourge of modern collaborative deb development...), my next step would be to read the .info file, get the original filename, and rename the .bin file, then delete the .info file. Am I on the right track, or might you suggest an easier/better approach? Your mention of hooks above might be obviated since I'm using Go on the backend.

Ideally, perhaps Tusd could have two modes on upload config, one as now, one with preservation of filenames?

Also, unrelatedly, does Tusd support HTTP/2, and does it use multiple HTTP streams to speed chunk delivery by utilizing more pipe bandwidth? WebSockets for maintained connection? Go's gobs for encoding/decoding speed? Just thoughts if not yet implemented.

Thanks!

@kvz
Member

kvz commented Aug 24, 2016

> it seems essentially an object store like OpenStack Swift, Amazon S3

It's not an object store itself, but we do offer adapters for S3, google cloud files, etc. tus is really only about the transfer, not the storage per se.

> In our use case, we are uploading dozens to thousands of medical images, each image 100 GB to 1 TB in size, into controlled-access folders. Other applications need to access the images by their original file names and extensions, and there will be no filename conflicts. Some of the other applications are written in ASP.NET, some in Go.

Wow that's super interesting. We'd love to cover that in a case study if you're comfortable with that.

> Am I on the right track, or might you suggest an easier/better approach?

If you can, I would avoid running a fork as well. I think hooks are the way to fly. That way you can run a release binary, which will prove helpful if you ever run into an issue. It will be harder for the community to replicate failures in custom builds. And it would be easier to dismiss issues too (not cool, but this is due to a human trait that all open source projects have to endure).

With a hook, you'll get your metadata over STDIN as JSON, like so: https://github.com/tus/tusd/blob/master/.hooks/post-finish

You can use any language there to parse the JSON and move the file to a different location - preserving the original filename - not having to run a fork. For authentication / etc I'd probably run tusd on localhost, and use HAProxy or some other kind of proxy. This also solves the problem of having to run tusd as root if you want it listening on a port <1024.

> Ideally, perhaps Tusd could have two modes on upload config, one as now, one with preservation of filenames?

I'm afraid there is little chance of this happening since the collision of filenames in most cases is so likely it is almost a certainty. Meaning we'd have to support behavior to serve a very small usecase, and people not aware of the issues around this might actually pick this more convenient option and then have files destroyed because of it.

> Also, unrelatedly, does Tusd support HTTP/2, and does it use multiple HTTP streams to speed chunk delivery by utilizing more pipe bandwidth? WebSockets for maintained connection? Go's gobs for encoding/decoding speed? Just thoughts if not yet implemented.

It is compatible. For chunks I refer to our concat extension. Websockets aren't needed as we'll just open many more connections. We'll likely not support Gob as the protocol is intended to be spoken in an interoperable way across many platforms and languages.

@Acconut
Member

Acconut commented Aug 24, 2016

> Also, unrelatedly, does Tusd support HTTP/2

Sadly, @kvz, the answer is not that easy :) First of all, the tus protocol on its own absolutely supports HTTP/2, however for tusd, the story is a bit different. Go 1.6 introduced transparent and seamless support for HTTP/2 (see https://golang.org/pkg/net/http/):

> The http package has transparent support for the HTTP/2 protocol when using HTTPS.

The tusd binary (the one inside cmd/tusd/) currently has no functionality to use TLS and therefore does not support HTTP/2. The tusd package, however, can be mounted on either HTTP or HTTPS listeners and can therefore speak HTTP/2 when configured correctly.

@kvz
Member

kvz commented Aug 25, 2016

Ah, I see. Sorry for getting that part wrong, and thanks for correcting!

Sent from mobile, pardon the brevity.


@joshuadiezmo

@kvz how can I get the file extension?

@Acconut
Member

Acconut commented Oct 11, 2016

@ReverseFlash28 If the uploader supplies the filename using metadata, you may be able to extract the extension from there, although this requires strict validation and cannot be trusted in general. Therefore you may want to detect the file's type by looking for magic numbers (e.g. see the unix file(1) command) and then choose the corresponding extension based on the result.
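A small sketch of the magic-number approach in Go, using the standard library's `http.DetectContentType`, which inspects at most the first 512 bytes. The `extByType` table and `extFor` helper are hypothetical and deliberately tiny; the sniffer only knows common web formats, so specialised formats would need a dedicated detector.

```go
package main

import (
	"fmt"
	"net/http"
)

// extByType is a small illustrative table, not an exhaustive mapping.
var extByType = map[string]string{
	"image/jpeg":      ".jpg",
	"image/png":       ".png",
	"image/gif":       ".gif",
	"application/pdf": ".pdf",
}

// extFor sniffs the leading bytes of a file and returns a matching
// extension, or false when the detected type is not in the table.
func extFor(head []byte) (string, bool) {
	ext, ok := extByType[http.DetectContentType(head)]
	return ext, ok
}

func main() {
	// The PNG signature; in practice, read the first bytes of the upload.
	png := []byte("\x89PNG\r\n\x1a\n")
	ext, ok := extFor(png)
	fmt.Println(ext, ok)

	// Content the table does not cover yields no extension.
	ext, ok = extFor([]byte("plain text"))
	fmt.Println(ext, ok)
}
```

Choosing the extension from the detected type, rather than from client-supplied metadata, avoids trusting the uploader's filename.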

@heri16

heri16 commented Feb 18, 2017

Hi @Acconut , could you provide example wrapper code on how to enable HTTP/2 on tusd over TLS?

@Acconut
Member

Acconut commented Feb 20, 2017

@heri16 What does your setup look like? Do you use the tusd package in a custom Go application or run the tusd binary behind a proxy (such as nginx or Apache)?

@Acconut
Member

Acconut commented Apr 6, 2017

Closing this issue due to a lack of information. Feel free to leave a comment if you want to continue the discussion :)

@Acconut Acconut closed this as completed Apr 6, 2017