Upload using tus but serve files directly from s3 or using nginx without tus server #621

Closed
prolific opened this issue May 29, 2024 · 5 comments · Fixed by #624

@prolific

I am using tus along with uppy client in react app for uploading files which is working great.

Objective:
But I do not want to use tus at all for serving my uploaded files. I want to serve files directly from s3 or from local file system using nginx (whichever datastore I use) without having to pass the request through tus server. I can obviously put both s3 and local file system behind cdn of my choice. I feel the real benefit of tus is while uploading files and not after that. It would be great if there is an easier way to decouple the serving of files part.

Issues:

  1. The URL generated by tus is very different from the file name as well as the path where the file is actually stored. So without the tus server I can't map the URL directly to the actual file path. Tus basically generates a hash for the complete path of the file and then uses that as the file ID.
  2. When I access the files in S3 directly, they are served with the content type application/offset+octet-stream, so the file is downloaded instead of being rendered as an image. I think I can override this using the contentType metadata in the tus client. But is there a way to do this automatically in the tus Node server for all files, based on the original file's MIME type or something similar?
  3. Is there any way I can force all the files to be stored using a custom name along with a unique ID and extension? Example: some-file-name-${unique_id}.jpg. The combination of file name and unique ID can help prevent conflicts between multiple files. Also, the extension can help in identifying the type of file quickly.
  4. Since I want to serve files directly from the respective datastore, can I somehow return a custom URL (the URL of the CDN or S3) once the file is uploaded?
  5. Can I simply disable the generation of metadata files along with the actual file because I want to serve directly from s3 without the head request?

Current Code:

const datastore = new S3Store({
    partSize: 8 * 1024 * 1024, // Each uploaded part will have ~8MiB,
    s3ClientConfig: {
        bucket: process.env.AWS_BUCKET!,
        region: process.env.AWS_REGION!,
        credentials: {
            accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
            secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
        },
    },
});

const tusServer = new TusServer({
    path: options.path,
    respectForwardedHeaders: true,
    datastore: datastore,
    namingFunction(req) {
        const id = crypto.randomUUID();
        const userId = "userid"; // Parse user id from request
        return `users/${userId}/${id}`
    },
    generateUrl(req, { proto, host, path, id }) {
        id = Buffer.from(id, 'utf-8').toString('base64url')
        return `${proto}://${host}${path}/${id}`
    },
    getFileIdFromRequest(req) {
        const reExtractFileID = /([^/]+)\/?$/
        const match = reExtractFileID.exec(req.url as string)

        if (!match || options.path.includes(match[1])) {
            return;
        }

        return Buffer.from(match[1], 'base64url').toString('utf-8')
    },
});

server.addContentTypeParser('application/offset+octet-stream', (req, payload, done) => done(null));
server.all(`${options.path}`, (req, res) => { tusServer.handle(req.raw, res.raw); });
server.all(`${options.path}/*`, (req, res) => { tusServer.handle(req.raw, res.raw); });

Note that I am storing files of each user separately in their own directory.

Would really appreciate any help in this direction. Thanks!

@Murderlon
Member

Hi, it would probably be nice to have storage information in the Upload model, which is passed along in hooks. Here is what tusd has in it:

  // Storage contains information about where the upload is stored. The exact values
  // depend on the storage that is used and are not available in the pre-create hook.
  // This example belongs to the file store. 
  "Storage": {
       // For example, the filestore supplies the absolute file path:
       "Type": "filestore",
       "Path": "/my/upload/directory/14b1c4c77771671a8479bc0444bbc5ce",

       // The S3Store and GCSStore supply the bucket name and object key:
       "Type": "s3store",
       "Bucket": "my-upload-bucket",
       "Key": "my-prefix/14b1c4c77771671a8479bc0444bbc5ce"
  }

Then you can use onUploadFinish to store this information somewhere in order to serve it without tus.
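
Until the upload model carries storage information, one workaround is to derive it yourself from the upload ID. The following is only a minimal sketch: it assumes the S3 store writes the object under the ID returned by namingFunction and keeps the info file next to it with an `.info` suffix, which is worth verifying against the store version you run. `StorageInfo`, `describeS3Storage`, and `s3ObjectUrl` are hypothetical names, loosely mirroring the tusd example above.

```typescript
// Hypothetical shape mirroring the tusd "Storage" example; not part of tus Node.js.
interface StorageInfo {
  type: 's3store'
  bucket: string
  key: string
  infoKey: string
}

// Assumption: the S3 object key equals the upload ID, and the info file
// lives at `${id}.info`. Verify this against the store you actually use.
function describeS3Storage(bucket: string, uploadId: string): StorageInfo {
  return { type: 's3store', bucket, key: uploadId, infoKey: `${uploadId}.info` }
}

// Virtual-hosted-style S3 URL. Each path segment is percent-encoded separately
// so the "/" separators in keys such as users/<userId>/<id> are preserved.
function s3ObjectUrl(info: StorageInfo, region: string): string {
  const path = info.key.split('/').map(encodeURIComponent).join('/')
  return `https://${info.bucket}.s3.${region}.amazonaws.com/${path}`
}
```

Inside onUploadFinish you could call `describeS3Storage(bucket, upload.id)` and persist the result in your own database, then serve from S3 or a CDN without going through the tus server.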

2. When I access the files in s3 directly then they are served using the contentType: application/offset+octet-stream and hence the file is downloaded instead of being rendered as image which I think I can override using the contentType metadata in tus client. But is there a way to do this in tus node server automatically for all files based on original file mimetype or something?

No, content-type MUST be set to application/offset+octet-stream for PATCH requests to the tus server, as per the protocol specification. You should have the original file name in the metadata though.

3. Is there any way I can force all the files to be stored using a custom name along with unique id and extension

Yes with namingFunction, as you already have in your example.

4. Since I want to serve files directly from respective datastore so can i somehow return a custom url (url of cdn or s3) once the file is uploaded

Not immediately as a response from the tus server to a tus client. However, you could alter the metadata server-side to include your CDN URL to make your client aware of it.

5. Can I simply disable the generation of metadata files along with the actual file because I want to serve directly from s3 without the head request?

They are required unfortunately.

@prolific
Author

@Murderlon Thanks for your inputs. I took some time to try out your suggestions:

1.

Then you can use onUploadFinish to store this information somewhere in order to serve it without tus.

This makes sense. But the second upload param I received (with name upload) doesn't contain information regarding storage. Is that something I have to enable myself somewhere?

2.

  1. When I access the files in s3 directly then they are served using the contentType: application/offset+octet-stream and hence the file is downloaded instead of being rendered as image which I think I can override using the contentType metadata in tus client. But is there a way to do this in tus node server automatically for all files based on original file mimetype or something?

No, content-type MUST be set to application/offset+octet-stream for PATCH requests to the tus server, as per the protocol specification. You should have the original file name in the metadata though.

Without a proper content type in S3, all files are downloaded automatically when accessed instead of being rendered as images or something else. This obviously happens because the content type in S3 is binary/octet-stream instead of something like image/png. Is there any possible solution to fix this? This particular content type restriction is kind of forcing the use of tus for serving files. I can understand that tus requires the content type to be binary/octet-stream for resumable uploads, but once the upload is finished it would be great to fix the content type in S3 and for the files stored in the local file store.

3.

  1. Is there any way I can force all the files to be stored using a custom name along with unique id and extension

Yes with namingFunction, as you already have in your example.

This works and makes sense as well.

4.

  1. Since I want to serve files directly from respective datastore so can i somehow return a custom url (url of cdn or s3) once the file is uploaded

Not immediately as a response from the tus server to a tus client. However, you could alter the metadata server-side to include your CDN URL to make your client aware of it.

I am not sure if I fully understand you here. Do you mean to say that while returning the response from server when the upload finishes, I just add the cdn url in the response metadata?

5.

  1. Can I simply disable the generation of metadata files along with the actual file because I want to serve directly from s3 without the head request?

They are required unfortunately.

Okay. Can I instead force the info files to be stored into a separate folder or is it necessary for both the files (main file and info file) to be stored side by side?

Further Question:

  1. As per typescript typings, the second param of namingFunction called metadata is optional. In which scenario can that be undefined? If I want to use the combination of filename and unique id as the name with which the file will be stored then I have to read metadata to get the original filename but if the metadata is undefined in some scenario then that will be a problem.

@Murderlon
Member

Murderlon commented Jun 3, 2024

This makes sense. But the second upload param I received (with name upload) doesn't contain information regarding storage. Is that something I have to enable myself somewhere?

No I meant to say that tusd implements this into the upload model and tus Node.js does not (yet). So that would require a PR first before you have access. I can take a look at that.

Without a proper content type in S3, all files are downloaded automatically when accessed instead of being rendered as images or something else. This obviously happens because the content type in S3 is binary/octet-stream instead of something like image/png. Is there any possible solution to fix this? This particular content type restriction is kind of forcing the use of tus for serving files. I can understand that tus requires the content type to be binary/octet-stream for resumable uploads, but once the upload is finished it would be great to fix the content type in S3 and for the files stored in the local file store.

When we create an upload in S3, we set the ContentType to whatever the content type is inside the metadata. Can you check whether your client sets the content type in the metadata?

if (upload.metadata?.contentType) {
  request.ContentType = upload.metadata.contentType
}

I am not sure if I fully understand you here. Do you mean to say that while returning the response from server when the upload finishes, I just add the cdn url in the response metadata?

You can change the status code, body, and headers in onUploadFinish. This could be used to return your new URL in the body, for which you have to set the status_code to 200 (theoretically not allowed in the protocol, but practically fine). The default response for PATCH is 204 which according to HTTP itself is not allowed to have a body.

const server = new Server({
  // ..
  async onUploadFinish(req, res, upload) {
    const url = await getURLForCDN(req, res, upload)
    // any headers you may want too
    const headers = {}
    return { res, status_code: 200, headers, body: JSON.stringify({ url }) }
  },
})
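
`getURLForCDN` in the snippet above is a placeholder. One way it might look, as a rough sketch: this assumes the CDN sits directly in front of the bucket and that the object key equals the upload ID. `cdnUrlFor`, `finishResponse`, and the `cdnBaseUrl` setting are hypothetical names, not part of the tus server API.

```typescript
// Hypothetical stand-in for the getURLForCDN placeholder.
// cdnBaseUrl is an assumed setting, e.g. the domain of a CDN in front of the bucket.
function cdnUrlFor(cdnBaseUrl: string, uploadId: string): string {
  const base = cdnBaseUrl.replace(/\/+$/, '') // drop trailing slashes
  const path = uploadId.split('/').map(encodeURIComponent).join('/')
  return `${base}/${path}`
}

// Builds the extra fields that onUploadFinish returns alongside `res`.
function finishResponse(cdnBaseUrl: string, uploadId: string) {
  return {
    status_code: 200, // 204 responses must not carry a body
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url: cdnUrlFor(cdnBaseUrl, uploadId) }),
  }
}
```

In the hook this would be used as `return { res, ...finishResponse(cdnBaseUrl, upload.id) }`.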

Okay. Can I instead force the info files to be stored into a separate folder or is it necessary for both the files (main file and info file) to be stored side by side?

For now they are always stored together. The vast majority of use cases don't need them to be separate.

6. As per typescript typings, the second param of namingFunction called metadata is optional. In which scenario can that be undefined? If I want to use the combination of filename and unique id as the name with which the file will be stored then I have to read metadata to get the original filename but if the metadata is undefined in some scenario then that will be a problem.

If the client doesn't send any metadata, you don't have any metadata on the server either. Make sure your client always sends it.
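
Since the client cannot be fully trusted to send metadata, the naming logic can fall back to the bare ID when it is missing. A minimal sketch, assuming the original name arrives under a `filename` metadata key (your client may use a different key, such as `name`); `storageName` is a hypothetical helper:

```typescript
// Hypothetical helper: builds "some-file-name-<id>.<ext>" from the metadata,
// falling back to the bare ID when no usable file name was sent.
function storageName(
  id: string,
  metadata?: Record<string, string | null>
): string {
  const original = metadata?.filename ?? ''
  const dot = original.lastIndexOf('.')
  // Only treat the suffix as an extension when something precedes the dot.
  const stem = dot > 0 ? original.slice(0, dot) : original
  const ext =
    dot > 0 ? original.slice(dot + 1).toLowerCase().replace(/[^a-z0-9]/g, '') : ''
  // Lowercase, collapse unsafe characters to "-", and trim leading/trailing "-".
  const slug = stem
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '')
  const base = slug ? `${slug}-${id}` : id
  return ext ? `${base}.${ext}` : base
}
```

Inside namingFunction this could be combined with the per-user prefix, for example by returning `users/${userId}/${storageName(id, metadata)}` with `id` taken from crypto.randomUUID().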

@prolific
Author

prolific commented Jun 4, 2024

@Murderlon Thanks again for your thorough response.

No I meant to say that tusd implements this into the upload model and tus Node.js does not (yet). So that would require a PR first before you have access. I can take a look at that.

Yeah, that would be really helpful. Thanks.

When we create an upload in S3, we set the ContentType to whatever the content type is inside the metadata. Can you check whether your client sets the content type in the metadata?

Yes, this is what I meant originally:
Can we set this ContentType inside metadata server side based on some logic instead of appending/controlling it client side? Probably in a function called onUploadCreate or maybe some other function where we can modify this metadata before uploading the file to S3?

@Murderlon
Member

Can we set this ContentType inside metadata server side based on some logic instead of appending/controlling it client side? Probably in a function called onUploadCreate or maybe some other function where we can modify this metadata before uploading the file to S3?

Yes you can:

const server = new Server({
  // ..
  async onUploadCreate(req, res, upload) {
    const contentType = await extractContentType(req, res, upload)
    const metadata = { ...upload.metadata, contentType }
    return { res, metadata }
  },
})
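
`extractContentType` above is a placeholder. As a simplified, synchronous illustration of what such server-side logic might do, the sketch below derives the type from the file extension. It assumes the original name is sent under a `filename` metadata key; the lookup table and `contentTypeFromMetadata` are hypothetical, and a real setup would more likely use a MIME library or inspect the file contents.

```typescript
// Small illustrative lookup table; extend it or swap in a MIME library.
const MIME_BY_EXTENSION = new Map<string, string>([
  ['jpg', 'image/jpeg'],
  ['jpeg', 'image/jpeg'],
  ['png', 'image/png'],
  ['gif', 'image/gif'],
  ['webp', 'image/webp'],
  ['pdf', 'application/pdf'],
  ['mp4', 'video/mp4'],
])

// Hypothetical stand-in for extractContentType: looks only at the extension
// of the (assumed) `filename` metadata key and defaults to a generic type.
function contentTypeFromMetadata(
  metadata?: Record<string, string | null>
): string {
  const name = metadata?.filename ?? ''
  const dot = name.lastIndexOf('.')
  const ext = dot >= 0 ? name.slice(dot + 1).toLowerCase() : ''
  return MIME_BY_EXTENSION.get(ext) ?? 'application/octet-stream'
}
```

The result would then be merged into the metadata as in the hook above, e.g. `{ ...upload.metadata, contentType: contentTypeFromMetadata(upload.metadata) }`.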
