Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove linked addresses from KV #7725

Merged
merged 9 commits into from
May 8, 2024

Conversation

N-o-Z
Copy link
Member

@N-o-Z N-o-Z commented May 6, 2024

Closes #7716

Change Description

Background

Improve performance on write request using Get/Link physical address
Today address are written to KV under the repository partition for validation purposes.
This requires reading, writing and deleting of the linked address kv entry for every write.
As a result we are limited to 500 requests/sec.
To remove this limitation, it was decided to not track physical addresses generated by lakeFS in KV.
Instead, we chose a more lenient approach which only verifies the linked address can be used for a specific repository/branch/path and in a given timeframe

Testing Details

Added/modified unit tests

Breaking Change?

No

Additional info

Introduced a new configuration variable for the secret signing key for the linked address signature. It is required, but for backward compatibility purposes was given a default value which must be modified in production

@N-o-Z N-o-Z added include-changelog PR description should be included in next release changelog area/KV Improvements to the KV store implementation labels May 6, 2024
@N-o-Z N-o-Z requested a review from arielshaqed May 6, 2024 09:46
@N-o-Z N-o-Z self-assigned this May 6, 2024
Copy link

github-actions bot commented May 6, 2024

♻️ PR Preview 63ee494 has been successfully destroyed since this PR has been closed.

🤖 By surge-preview

Copy link

github-actions bot commented May 6, 2024

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

No large changes. Most worrying one is actually that the we encourage confusing the URL signing key with the really secret signing keys.

Also having the signing configuration in blockstore makes sense, so maybe the digest code should live in block/adapter.go?

@@ -631,6 +652,53 @@ func (c *Controller) StsLogin(w http.ResponseWriter, r *http.Request, body apige
writeResponse(w, r, http.StatusOK, responseToken)
}

func (c *Controller) verifyLinkAddress(repository, branch, path, physicalAddress string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I might prefer for this to be on Catalog. Controller is supposed to manage the protocol, while Catalog is supposed to manage... other things. Suppose we decide to add some fake S3 call to the gateway that links - how would we acccess it?

It would also make this function easier to test -- pkg/api is our most slowest most complex "unit" test.

In any case not in this PR, but consider opening an issue.

@@ -631,6 +652,53 @@ func (c *Controller) StsLogin(w http.ResponseWriter, r *http.Request, body apige
writeResponse(w, r, http.StatusOK, responseToken)
}

func (c *Controller) verifyLinkAddress(repository, branch, path, physicalAddress string) error {
address, signature, found := strings.Cut(physicalAddress, ",")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes storage namespace does not contain any colons. But it could!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would you recommend to approach this?
Any set of characters can potential exist in the storage namespace.

@@ -5121,27 +5188,6 @@ func resolvePathList(objects, prefixes *[]string) []catalog.PathRecord {
return pathRecords
}

func NewController(cfg *config.Config, catalog *catalog.Catalog, authenticator auth.Authenticator, authService auth.Service, authenticationService authentication.Service, blockAdapter block.Adapter, metadataManager auth.MetadataManager, migrator Migrator, collector stats.Collector, cloudMetadataProvider cloud.MetadataProvider, actions actionsHandler, auditChecker AuditChecker, logger logging.Logger, sessionStore sessions.Store, pathProvider upload.PathProvider, usageReporter stats.UsageReporterOperations) *Controller {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, looks like you moved it to a better place.

@@ -5325,7 +5371,7 @@ func (c *Controller) GetUsageReportSummary(w http.ResponseWriter, r *http.Reques
writeResponse(w, r, http.StatusOK, response)
}

func (c *Controller) CreateUserExternalPrincipal(w http.ResponseWriter, r *http.Request, body apigen.CreateUserExternalPrincipalJSONRequestBody, userID string, params apigen.CreateUserExternalPrincipalParams) {
func (c *Controller) CreateUserExternalPrincipal(w http.ResponseWriter, r *http.Request, _ apigen.CreateUserExternalPrincipalJSONRequestBody, userID string, params apigen.CreateUserExternalPrincipalParams) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just my fmt

viper.SetDefault(BlockstoreTypeKey, "local")
}

viper.SetDefault("blockstore.encrypt.secret_key", DefaultSecretSigningKey)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let's use a different default here. This secret key considerably less secret, is really cheap to change. By using the same default we encourage people to give it the same value.

return nil
}

func (c *Controller) encryptAddress(logicalAddress string) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (c *Controller) encryptAddress(logicalAddress string) string {
func (c *Controller) signAddress(logicalAddress string) string {


func (c *Controller) getAddressWithSignature(repository, branch, path string) string {
physicalPath := c.PathProvider.NewPath()
return physicalPath + "," + c.encryptAddress(getAddressToSign(repository, branch, path, physicalPath))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"," should be a const somewhere, it will make it clearer that verification uses the same thing.

Comment on lines 667 to 669
h := hmac.New(sha256.New, []byte(c.Config.Blockstore.Encrypt.SecretKey))
h.Write([]byte(stringToVerify))
calculated := h.Sum(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can and should use encryptAddress() (or whatever it's new name will be).

@@ -221,6 +221,9 @@ type Config struct {
} `mapstructure:"ui_config"`
} `mapstructure:"auth"`
Blockstore struct {
Encrypt struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Encrypt struct {
Signing struct {

@arielshaqed
Copy link
Contributor

One more thing!

Deploying this change will invalidate all in-fought presigned URLs. We need to announce that. Please add a changelog entry to that effect.

@N-o-Z N-o-Z requested a review from arielshaqed May 6, 2024 12:40
Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, will continue later

@@ -1,6 +1,11 @@
# Changelog

# v1.20.0
# Unreleased
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request by @arielshaqed

@@ -31,7 +31,7 @@ var (
reShortCommitID = regexp.MustCompile(`[\d|a-f]{16}`)
reChecksum = regexp.MustCompile(`([\d|a-f]{32})|(0x[0-9A-F]{15})`)
reEndpoint = regexp.MustCompile(`https?://\w+(:\d+)?/api/v\d+/`)
rePhysicalAddress = regexp.MustCompile(`/data/[0-9a-v]{20}/[0-9a-v]{20}`)
rePhysicalAddress = regexp.MustCompile(`/data/[0-9a-v]{20}/[0-9a-v]{20}?(,.+)*`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now you have 2 problems....
And one of them is /data/aaaaaaaaaaaaaaaaaaaa/aaaaaaaaaaaaaaaaaaaa,sd,cds,sdc,sdc,sdc,sd,csdc,

t.Fatal(err)
}

t.Run("get and link physical address", func(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIC you don't have an happy flow test now..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I do

@@ -221,6 +221,9 @@ type Config struct {
} `mapstructure:"ui_config"`
} `mapstructure:"auth"`
Blockstore struct {
Signing struct {
SecretKey SecureString `mapstructure:"secret_key" validate:"required"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking change... remove the required part

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its required but has a default per discussion

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It cannot work otherwise

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best PR EVER!!!

@@ -2768,6 +2744,53 @@ func (c *Catalog) listRepositoriesHelper(ctx context.Context) ([]*graveler.Repos
return repos, nil
}

func (c *Catalog) VerifyLinkAddress(repository, branch, path, physicalAddress string) error {
address, signature, found := strings.Cut(physicalAddress, LinkAddressSigningDelimiter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the following storage namespace work?
s3://bucket/s,o,m,e/p,a,t,h/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still fails for some storage namespaces.

It seems to me that we are putting a lot of effort into writing a single incorrect line of code, when we could write <5 correct lines like here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z as Ariel mentioned, the issue still exists

}

func getAddressToSign(repository, branch, path, physicalAddress string) string {
return repository + branch + path + physicalAddress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will these calls return different address to sign?

  1. getAddressToSign("aa", "bb", "cc", "dd")
  2. getAddressToSign("a", "ab", "bc", "cdd")

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Why not just sign the actual generated physical address?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We said that signature should be dependant on repo, branch and path (i.e. you can only use a linked address with a single logical path)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@N-o-Z how so? It's just string concatenation..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, to answer both questions:

  1. If we sign only the physical address, then users can user generated linked addresses of repo1/branch2/path3 in repo4/branch5/path6
  2. Because physical address is always unique, the prefix for the address to sign will always be unique and therefore we will never get to a situation of:
    getAddressToSign("aa", "bb", "cc", "dd")
    getAddressToSign("a", "ab", "bc", "cdd")

However, regarding 2, if it makes more sense I can add some separator between the parts of the signature


jobData := []struct {
name string
interval time.Duration
fn func(context.Context)
}{
{
name: "delete expired link addresses",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: Some tokens will be kept forever in the kv..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we shouldn't invest on the cleanup as this will probably won't make a dent in the DB

Comment on lines 2759 to 2761
h := hmac.New(sha256.New, []byte(c.signingKey))
h.Write([]byte(stringToVerify))
calculated := h.Sum(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract to func?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty please? 🥺

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed that

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I think we can do a bit better on signing the exact string of the physical address, and on precisely extracting the digest. They are both <10 lines of code, combined.

@@ -2768,6 +2744,53 @@ func (c *Catalog) listRepositoriesHelper(ctx context.Context) ([]*graveler.Repos
return repos, nil
}

func (c *Catalog) VerifyLinkAddress(repository, branch, path, physicalAddress string) error {
address, signature, found := strings.Cut(physicalAddress, LinkAddressSigningDelimiter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still fails for some storage namespaces.

It seems to me that we are putting a lot of effort into writing a single incorrect line of code, when we could write <5 correct lines like here.

}

func getAddressToSign(repository, branch, path, physicalAddress string) string {
return repository + branch + path + physicalAddress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Why not just sign the actual generated physical address?

Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the comments were not resolved..

@@ -2768,6 +2744,53 @@ func (c *Catalog) listRepositoriesHelper(ctx context.Context) ([]*graveler.Repos
return repos, nil
}

func (c *Catalog) VerifyLinkAddress(repository, branch, path, physicalAddress string) error {
address, signature, found := strings.Cut(physicalAddress, LinkAddressSigningDelimiter)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@N-o-Z as Ariel mentioned, the issue still exists

Comment on lines 2759 to 2761
h := hmac.New(sha256.New, []byte(c.signingKey))
h.Write([]byte(stringToVerify))
calculated := h.Sum(nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pretty please? 🥺

}

func getAddressToSign(repository, branch, path, physicalAddress string) string {
return repository + branch + path + physicalAddress
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@N-o-Z how so? It's just string concatenation..

@N-o-Z N-o-Z requested a review from itaiad200 May 8, 2024 11:58
Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! But please be sure of the regexp :-/

esti/lakectl_util.go Outdated Show resolved Hide resolved
Copy link
Contributor

@itaiad200 itaiad200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@N-o-Z N-o-Z force-pushed the task/remove-linked-addresses-from-kv-7716 branch from 6edb8c4 to 17f6a4a Compare May 8, 2024 15:19
@N-o-Z N-o-Z enabled auto-merge (squash) May 8, 2024 15:20
@N-o-Z N-o-Z disabled auto-merge May 8, 2024 16:56
@N-o-Z N-o-Z merged commit 686802c into master May 8, 2024
35 of 36 checks passed
@N-o-Z N-o-Z deleted the task/remove-linked-addresses-from-kv-7716 branch May 8, 2024 17:01
emulatorchen pushed a commit to emulatorchen/lakeFS that referenced this pull request May 27, 2024
* Remove linked addresses from KV

* CR Fixes

* Add changelog

* Change delimiter and encoding

* More fixes

* More fixes 2

* More More Fixes

* Update esti/lakectl_util.go

Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>

* Fix azure regex

---------

Co-authored-by: Ariel Shaqed (Scolnicov) <ariels@treeverse.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/KV Improvements to the KV store implementation include-changelog PR description should be included in next release changelog performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improvement: Remove Token Validation from Get&LinkPhysicalAddress
4 participants