
@tus/s3-store: add expiration extension #513

Merged
merged 2 commits into tus:main on Nov 27, 2023

Conversation

@fenos (Collaborator) commented Nov 10, 2023

Implements Expiration Extension for the S3 Store.

A new option is added to S3Options to configure how long an upload URL will remain valid:

new S3Store({
  expirationPeriodInMilliseconds: 1000 * 60 * 60 * 24, // 24h
})

@fenos fenos force-pushed the tus-s3-store/expiration-extension branch 2 times, most recently from a65aa51 to 48dee94 Compare November 14, 2023 11:48
@fenos (Collaborator, Author) commented Nov 14, 2023

I've changed the implementation slightly: instead of relying on the LastModified value returned by S3, which changes whenever we overwrite the metadata file (for example in declareLength for deferred uploads), we add an additional metadata field, created-at.

Additionally, if we configure an expiration for the upload, we can safely set the expiration for incomplete parts so that they are automatically cleaned up once the URL "expires", which fixes #459.
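As a rough illustration of the idea (names here are hypothetical, not the PR's actual API), expiry then becomes a simple comparison against the stored timestamp, unaffected by later rewrites of the metadata file:

```typescript
// Hypothetical helper: decide whether an upload has expired based on a
// `created-at` metadata timestamp instead of S3's mutable LastModified.
// Names and signature are illustrative, not the PR's actual code.
function isUploadExpired(
  createdAtMs: number, // value stored in the `created-at` metadata field
  expirationPeriodMs: number, // e.g. expirationPeriodInMilliseconds from S3Options
  nowMs: number = Date.now()
): boolean {
  return nowMs - createdAtMs > expirationPeriodMs
}
```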

@fenos fenos force-pushed the tus-s3-store/expiration-extension branch 3 times, most recently from 1633870 to 96be736 Compare November 14, 2023 12:13
@Murderlon (Member) left a comment

Thank you!

I'm not sure this works currently, as the store doesn't implement deleteExpired:

cleanUpExpiredUploads(): Promise<number> {
  if (!this.datastore.hasExtension('expiration')) {
    throw ERRORS.UNSUPPORTED_EXPIRATION_EXTENSION
  }
  return this.datastore.deleteExpired()
}

Your trick to let S3 do the work is neat, but it's currently incompatible with the API. It's probably fine to do both, although it should be well documented that running this for the S3 store in particular is not needed, as it works automatically.

Review thread on packages/s3-store/index.ts (outdated, resolved)
@fenos fenos force-pushed the tus-s3-store/expiration-extension branch from 96be736 to 5e0c544 Compare November 22, 2023 17:40
@fenos (Collaborator, Author) commented Nov 22, 2023

@Murderlon @Acconut Unfortunately the Expires option during upload doesn't set the expiration of an object as I thought (it would have been too easy and nice); instead it sets the cache header indicating when the response should expire 😢

For this reason, I've made quite a few changes as follows:

  • Re-introduced the concept of a Tus-Completed tag in order to create a lifecycle policy that automatically cleans up expired files. I'm using putObject instead of putObjectTagging simply because the .info file is quite small, and putObject has better compatibility across S3-compatible servers (like R2).

  • Implemented a deleteExpired method on S3Store, which relies on listMultipartUploads to list in-progress / unfinished uploads. For each expired multipart upload we delete the .info and .part files and abort the upload. One caveat: MinIO doesn't honor the S3 response format for listMultipartUploads and returns an empty list (see Minio s3 not compliant with ListMultipartUploads minio/minio#13246), so this approach doesn't work with MinIO, and I'm not sure how to handle that peculiarity. Other S3-compatible stores such as R2 seem to honour the S3 spec.

  • Added e2e tests for the S3Store against a MinIO server to cover all the expiration use cases. I mock listMultipartUploads for the reason reported above, since I'd prefer to keep proper compatibility with S3.

With this approach, the user can either create a lifecycle policy on S3 that filters by Tus-Completed=false after X days, or run the deleteExpired function in a cron job somewhere; both do the same thing.
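As a rough sketch of the selection step in the cron-job path (the shapes and names below are assumptions mirroring the AWS SDK's ListMultipartUploads response entries, not the PR's actual code):

```typescript
// Hypothetical: only the fields used here are modelled from the
// ListMultipartUploads response entries.
interface MultipartUpload {
  Key?: string
  Initiated?: Date
}

// Return the in-progress uploads that started more than `expirationPeriodMs`
// ago. The caller would then abort each one and delete its .info/.part files.
function selectExpiredUploads(
  uploads: MultipartUpload[],
  expirationPeriodMs: number,
  now: Date = new Date()
): MultipartUpload[] {
  return uploads.filter(
    (u) =>
      u.Initiated !== undefined &&
      now.getTime() - u.Initiated.getTime() > expirationPeriodMs
  )
}
```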

Let me know how it feels now 👍

@fenos fenos force-pushed the tus-s3-store/expiration-extension branch 5 times, most recently from 88421f7 to dc5da78 Compare November 22, 2023 18:27
@Acconut (Member) commented Nov 23, 2023

I am 👍 for using S3 lifecycle configurations to support expiration. That is the most efficient way to implement it, and it scales well for huge buckets, although at the cost of reduced granularity (expiration can only be configured in days, not hours or minutes).

Could you please provide an example for the lifecycle configuration that is intended to be used with your implementation from this PR?

  • Implemented deleteExpired method on S3Store which relies on listMultipartUploads to list in-progress / unfinished uploads.

Why is it necessary to delete these manually? We can use the AbortIncompleteMultipartUpload rule (https://docs.aws.amazon.com/AmazonS3/latest/API/API_LifecycleRule.html#AmazonS3-Type-LifecycleRule-AbortIncompleteMultipartUpload) to remove the incomplete multipart uploads, and then use "normal" expiration to clean up the .info and .part files.
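For illustration, a minimal sketch of such a rule (the ID and the Days value are assumptions; see the linked AWS LifecycleRule docs for the full schema):

```json
{
    "Rules": [
        {
            "ID": "abort-stale-multipart-uploads",
            "Status": "Enabled",
            "Filter": {},
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": 1
            }
        }
    ]
}
```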

@fenos (Collaborator, Author) commented Nov 23, 2023

Hi @Acconut
The idea behind this is to let S3 handle the whole cleanup of all files, including aborting multipart uploads and deleting .info and .part files. This removes the burden of running a cron job that deals with deleting expired .info and .part files.
I believe it's a big win to let S3 do the job, as you mentioned, for big buckets.

The policy would be the following (pretty simple):

{
    "Rules": [
        {
            "Status": "Enabled",
            "Filter": {
                "Tag": {
                    "Key": "Tus-Completed",
                    "Value": "false"
                }
            },
            "Expiration": {
                "Days": 1
            }
        }
    ]
}

The Days value should be equal to or greater than the expiry time set on the tus server.

I think it's okay to reduce the granularity and delete assets every 24h, from my perspective.
In any case, this will be the user's choice: either going with lifecycle policies or setting up a manual cron job.
At least we support both, and we make it easy to offload the job to S3 😃

@Acconut (Member) commented Nov 23, 2023

I agree with you here, but I don't understand why we need the deleteExpired implementation. Is it not possible to let S3 lifecycles clean up the multipart uploads for us? Why do we need to do this manually?

@Murderlon (Member) commented Nov 23, 2023

@Acconut, as mentioned in #513 (review), that's the API we currently have. Not having that method would result in a crash. It's also possible to have a dummy method that does nothing, but why not keep the API consistent across stores for cron jobs?

@Murderlon (Member) left a comment

Thanks a lot for this big effort! Testing locally with MinIO is pretty cool too.

I tried to add docs but I can't push to this PR. Here is a patch you can use with git apply: 0001-Add-docs.patch

Review thread on packages/s3-store/index.ts (outdated, resolved)
Review thread on packages/s3-store/index.ts (resolved)
@Murderlon Murderlon changed the title feat: s3 store expiration extension @tus/s3-store: add expiration extension Nov 23, 2023
@Acconut (Member) commented Nov 23, 2023

as mentioned in #513 (review) that's the API we currently have. Not having that method would result in a crash. It's also possible to have a dummy method which does nothing, but why not keep the API consistent across stores for CRON jobs?

Yes, I am suggesting to just have a dummy method without doing any manual labor. Then the S3 lifecycles can take care of this automatically without any cron job needed.

Maybe I am confused here. Is the cron implementation an alternative to using S3 lifecycles? Or do I need to run the cron implementation next to lifecycles in order to clean up all expired uploads (complete and incomplete)?

@fenos (Collaborator, Author) commented Nov 23, 2023

Maybe I am confused here. Is the cron implementation an alternative to using S3 lifecycles? Or do I need to run the cron implementation next to lifecycles in order to clean up all expired uploads (complete and incomplete)?

To clarify, I think we can say we support both!
It's ultimately the user's choice whether they want to rely on S3 lifecycles or call cleanUpExpiredUploads() in a cron job; it's either one or the other.

I don't think it would cause any issue if the user has both a lifecycle policy and a cron job, but it doesn't really make sense :)

@Acconut (Member) commented Nov 23, 2023

Alright, I thought users were expected to use both methods. Thanks for clarifying this.

However, the current cron implementation is not able to clean up completed uploads, is it? I can imagine that this is also desired by some users.

@Murderlon (Member) commented

Alright, I thought users were expected to use both methods. Thanks for clarifying this.

I clarified this in the docs git patch I shared above.

However, the current cron implementation is not able to clean completed uploads, is it? I can imagine that this is also desired by some users.

According to the spec:

The Server MAY remove unfinished uploads once they expire

So it might be out of scope?

@Acconut (Member) commented Nov 23, 2023

So it might be out of scope?

The handling of completed uploads is not discussed much in the spec because that is mostly out of the scope of tus. That being said, tus servers often implement features that go beyond the content of the specification because those are needed to properly use and run tus servers, e.g. hooks for integrating with the main application. So deciding what is in scope or not is a blurry line :)

I think people will be interested in removing completed uploads as well, since this will trim the bucket and keep it from growing indefinitely. Of course, you can decide that this is not something you want to handle in tus-node-server. However, if I were to implement expiration in tusd, I would handle this case as well.

@fenos (Collaborator, Author) commented Nov 23, 2023

It is not trivial to add per-object expiration (for successful uploads): you have no other way than to iterate over all the bucket's keys and compare the LastModified date in S3 with the date you set as the expiration.

I believe we can add this functionality in another iteration if needed, so as not to increase the scope of this PR too much.
Another PR, another problem to solve 😄

@Murderlon (Member) commented

I'm okay with keeping it only for unfinished uploads for now. We can revisit if the need arises.

Regarding MinIO, I thought about it more, and I think it's probably better to reuse the existing S3 credentials in CI and actually test against S3. This would make it consistent with the other tests, avoid adding Docker to the repository, and let us get rid of the mocking for listMultipartUploads.

What do you think?

@fenos (Collaborator, Author) commented Nov 24, 2023

Yes, agreed. I will use the S3 credentials.

@fenos fenos force-pushed the tus-s3-store/expiration-extension branch from dc5da78 to 8b50456 Compare November 24, 2023 13:40
@fenos fenos force-pushed the tus-s3-store/expiration-extension branch 2 times, most recently from 18c45b3 to 04f72d5 Compare November 24, 2023 13:53
@fenos (Collaborator, Author) commented Nov 24, 2023

@Murderlon I've switched the tests to use the S3 bucket with the credentials provided as env vars.
However, the test that calls deleteExpired() reported 120 files deleted instead of 2 😄; probably there were some unfinished uploads in the bucket.

I'm not sure if this was a one-off or not. Could you try to re-run the tests, please? I don't have the permissions to trigger a re-run.
If it still fails, we'll need to find another way to assert the correctness of the test.

@fenos fenos force-pushed the tus-s3-store/expiration-extension branch 2 times, most recently from 4bb22b1 to c86b1da Compare November 24, 2023 14:26
@fenos (Collaborator, Author) commented Nov 24, 2023

@Murderlon right! I've modified the test slightly to make sure it tests the right thing even if there are unfinished uploads (probably left over from other tests).

The test does the following:

  • Upload 6 MB out of an 11 MB file
  • Call .deleteExpired()
  • Make sure the .info and .part files are still there and not deleted
  • Wait until the files should be expired
  • Re-run .deleteExpired()
  • Assert that those 2 files are gone 👍

@fenos fenos force-pushed the tus-s3-store/expiration-extension branch 8 times, most recently from fe968c5 to e54178c Compare November 24, 2023 15:23
@fenos fenos force-pushed the tus-s3-store/expiration-extension branch from e54178c to c57acd5 Compare November 24, 2023 15:34
@Murderlon (Member) left a comment

Great! Thanks!

Review thread on packages/s3-store/README.md (outdated, resolved)
Co-authored-by: Merlijn Vos <merlijn@soverin.net>
@fenos (Collaborator, Author) commented Nov 27, 2023

I think it's ready to be merged! 🎉 👍

@Murderlon Murderlon merged commit 6fded1e into tus:main Nov 27, 2023
2 checks passed
@fenos fenos deleted the tus-s3-store/expiration-extension branch November 27, 2023 09:31
3 participants