
feat: pre-signed URL for S3 storage #2855

Merged
1 commit merged into usememos:main from feature/presigned-urls on Jan 29, 2024

Conversation

reddec
Contributor

@reddec reddec commented Jan 29, 2024

Adds automatic background refreshing of all external links that belong to the current blob (S3) storage. The feature is disabled by default in order to keep backward compatibility.

The background goroutine is spawned once during startup and periodically re-signs and updates external links that belong to the current S3 storage.
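
For illustration, the mechanism is roughly the following (a minimal sketch; startPresignLoop and refreshExternalLinks are illustrative names, not the actual identifiers from this PR):

import (
	"context"
	"time"
)

// refreshExternalLinks is a stand-in for the PR's actual re-signing logic.
func refreshExternalLinks(ctx context.Context) {
	// Look up resources whose external links point at the current S3 storage
	// and replace those links with freshly pre-signed URLs (omitted here).
}

// startPresignLoop spawns the refresh goroutine once during startup.
func startPresignLoop(ctx context.Context, interval time.Duration) {
	go func() {
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			refreshExternalLinks(ctx)
			select {
			case <-ctx.Done():
				return
			case <-ticker.C:
			}
		}
	}()
}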

fixes #1191

Original idea

The original idea was to sign external links on demand; however, with the current architecture that would require duplicating code in plenty of places. The changes would be quite invasive and, in the end, pointless: I believe the architecture will eventually be updated to provide a more scalable way to plug in storage backends, for example an Upload/Download interface without a hard dependency on external links. There are stubs already, but I don't feel confident enough to change a significant part of the application architecture.

(Screenshots attached: 2024-01-29 17-05-22 and 2024-01-29 17-08-44.)

@reddec reddec requested a review from boojack as a code owner January 29, 2024 09:12
@reddec reddec changed the title Adds automatically background refresh of all external links if they a… feat: pre-signed URL for S3 storage Jan 29, 2024
Collaborator

@boojack boojack left a comment


LGTM, thanks for your contribution.

@boojack boojack merged commit fa17dce into usememos:main Jan 29, 2024
7 checks passed
@reddec reddec deleted the feature/presigned-urls branch January 29, 2024 13:16
@ertuil
Contributor

ertuil commented Jan 29, 2024

I am really happy to have this PR, and I have tried the feature with the latest test Docker image against Aliyun OSS. I set presign to true and restarted the container several times. However, it does not seem to work: the ExternalUrls are not changed (even though I saw a 'links pre-signed' message in my log). I also observed that some resources' updated_ts values in the database have been modified.

I am, again, really happy to have this PR, and I am willing to provide all the help I can if necessary.

Also, I am wondering whether it is possible to stop setting the resource's ACL to public_read once the pre-sign feature works.

@reddec
Contributor Author

reddec commented Jan 30, 2024

@ertuil

Thanks for the feedback.

Could you please help me debug?

  1. Could you please ensure that the storage is marked as default?
  2. Could you please check the logs right after the restart? Forcing an update may require resetting the updated_ts field to some time earlier than 24 hours ago, or just uploading any new file.
  3. Could you please double-check the URL in the UI via the debugger? It should contain a signature after the ? character (see the example below).
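
For reference, a SigV4 pre-signed URL typically carries the signature in its query string, roughly like this (placeholder host, key, and parameter values):

https://my-bucket.s3.example.com/resource/photo.jpg?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...&X-Amz-Date=...&X-Amz-Expires=86400&X-Amz-SignedHeaders=host&X-Amz-Signature=...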

> Also, I am wondering whether it is possible to stop setting the resource's ACL to public_read once the pre-sign feature works.

It's a bit hard for me to speak for Aliyun, but generally speaking, for S3-compatible storage you may mark the bucket as private after this PR.
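
Purely for illustration, generating such a pre-signed GET link with the AWS SDK for Go v2 (an assumption about the toolchain here, not this PR's actual code) could look like this, with placeholder bucket and key:

package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	cfg, err := config.LoadDefaultConfig(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	presigner := s3.NewPresignClient(s3.NewFromConfig(cfg))

	bucket, key := "my-bucket", "resource/photo.jpg" // placeholder values
	req, err := presigner.PresignGetObject(context.Background(), &s3.GetObjectInput{
		Bucket: &bucket,
		Key:    &key,
	}, s3.WithPresignExpires(24*time.Hour))
	if err != nil {
		log.Fatal(err)
	}
	// The returned URL carries the X-Amz-Signature query parameter and
	// grants read access for 24 hours, even on a private bucket.
	fmt.Println(req.URL)
}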

@ertuil
Contributor

ertuil commented Jan 30, 2024

Of course, I am willing to help, and I have made a quick fix in #2860

  1. The storage is marked as default.
  2. I checked the log right after the restart; the updated_ts change turned out to be a false observation.
  3. The URL was not changed.

After examination, I found that in my case the resources' ExternalLink hostname is not part of the endpoint, so the PreSignLink function returns the "pre-signed" link as-is, i.e. the original static one. This is fixed in #2860.

There is a double-layer ACL: the bucket level and the resource level. The bucket ACL can be private, but memos sets the resource's ACL to public_read when it is uploaded to the server. Also in #2860, I disabled this forced public_read, and it works fine in my case.
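
To make the two ACL layers concrete, here is a minimal sketch (assuming an AWS-SDK-for-Go-v2-style client; the function, payload, and flag are illustrative) of uploading with and without the forced canned ACL:

import (
	"context"
	"strings"

	"github.com/aws/aws-sdk-go-v2/service/s3"
	"github.com/aws/aws-sdk-go-v2/service/s3/types"
)

func upload(ctx context.Context, client *s3.Client, bucket, key string, forcePublic bool) error {
	in := &s3.PutObjectInput{
		Bucket: &bucket,
		Key:    &key,
		Body:   strings.NewReader("example payload"), // illustrative payload
	}
	if forcePublic {
		// Forced resource-level ACL: the object becomes world-readable
		// regardless of the bucket-level ACL.
		in.ACL = types.ObjectCannedACLPublicRead
	}
	// Without in.ACL set, the object inherits the bucket's (private) policy,
	// and pre-signed URLs remain the only way to read it.
	_, err := client.PutObject(ctx, in)
	return err
}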

@reddec
Contributor Author

reddec commented Jan 30, 2024

@ertuil I see... Thanks for the fix!

I was thinking that maybe it's time to redesign the storage layer 😅 to avoid so many heuristics.

My idea is to extend/introduce a storage interface like this:

type ResourceProvider interface {
	// Delete removes the blob stored under key.
	Delete(ctx context.Context, key string) error
	// Upload stores payload under key, streaming from the reader.
	Upload(ctx context.Context, key string, payload io.Reader) error
	// Download returns a streaming reader for the blob stored under key.
	Download(ctx context.Context, key string) (io.ReadCloser, error)
}

The table should be extended with a storage_provider field.

In that case, local, WebDAV (hypothetical), S3, or specialized providers like Aliyun would just be implementations of the same interface.

The response should be proxied by the memos instance (e.g. GET /api/v1/blobs/{res-id}).
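
As a sketch of how a backend implementation and the proxying route could look (everything below is hypothetical, not existing memos code; LocalProvider implements the ResourceProvider interface above):

import (
	"context"
	"io"
	"net/http"
	"os"
	"path/filepath"
)

// LocalProvider is a hypothetical local-disk implementation of ResourceProvider.
type LocalProvider struct{ Root string }

func (l LocalProvider) Delete(ctx context.Context, key string) error {
	return os.Remove(filepath.Join(l.Root, filepath.Clean(key)))
}

func (l LocalProvider) Upload(ctx context.Context, key string, payload io.Reader) error {
	f, err := os.Create(filepath.Join(l.Root, filepath.Clean(key)))
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, payload) // streamed to disk, never fully buffered in memory
	return err
}

func (l LocalProvider) Download(ctx context.Context, key string) (io.ReadCloser, error) {
	return os.Open(filepath.Join(l.Root, filepath.Clean(key)))
}

// blobHandler proxies GET /api/v1/blobs/{res-id} through the memos instance.
func blobHandler(p ResourceProvider) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		key := r.PathValue("res-id") // hypothetical routing (Go 1.22 ServeMux style)
		rc, err := p.Download(r.Context(), key)
		if err != nil {
			http.Error(w, "not found", http.StatusNotFound)
			return
		}
		defer rc.Close()
		io.Copy(w, rc) // streamed to the client, avoiding OOM on big files
	}
}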

Pros:

  • Security: all data is proxied by the memos instance; no external public links are exposed.
  • Maintainability: blob storage logic is encapsulated behind a very lean interface without exposing details, so it should be much easier to add more storage backends.
  • Memory optimization: the current implementation reads everything into memory instead of streaming, which may cause OOM for big files.

Cons:

  • No traffic/domain splitting: since memos would serve both blobs and dynamic content, the load will be higher, and browsers may limit the number of parallel requests to the same domain. Realistically speaking, though, this shouldn't be an issue, and at larger scale it can be solved simply by running more parallel memos instances.

The proposal contains plenty of serious changes, so I would like to hear your opinions, as well as get a blessing from @boojack 😄, before starting.

@boojack
Collaborator

boojack commented Jan 30, 2024

@reddec SGTM. We are also planning to migrate the storage v1 API to gRPC; maybe we can implement it together. #2858

@leilei3167

> @reddec SGTM. We are also planning to migrate the storage v1 API to gRPC; maybe we can implement it together. #2858

I'm curious about the decision to implement API v2 in gRPC. Could you please shed some light on the notable benefits compared to the HTTP format used in v1?
