Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LakeFSFS - TokenProvider AWS (no login yet) #7604

Merged
merged 10 commits into from
Apr 10, 2024
Merged

Conversation

Isan-Rivkin
Copy link
Contributor

@Isan-Rivkin Isan-Rivkin commented Mar 26, 2024

Closes #7537 (without call to login, waiting on #7578).
No breaking changes. Currently not document and internal only.

tested with release hadoop-lakefs-assembly/0.2.4-RC.0

@Isan-Rivkin Isan-Rivkin self-assigned this Mar 26, 2024
@Isan-Rivkin Isan-Rivkin marked this pull request as draft March 26, 2024 18:39
Copy link

github-actions bot commented Mar 26, 2024

E2E Test Results - DynamoDB Local - Local Block Adapter

13 passed

@Isan-Rivkin
Copy link
Contributor Author

@arielshaqed it's still a draft but please take a look at the draft im sure you will have comments.
If it's not clear then i'll come back and continue.

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANKS!

Very nice. All code here is very good. But I think I am missing several parts, notably I don't understand how refresh can ever **re--**fresh the token, it will only set it once. I think we will need an additional executor / sleeping thread / something that periodically refreshes. So not sure what parts are included in this draft, basically.

AFAICT this requires sigV4 on the AWS endpoint. This is probably cool, but best validate versus requirements. In particular, us-east-1 is always special with regards to sigV4/sigV2.

public static final String TOKEN_AWS_CREDENTIALS_PROVIDER_ACCESS_KEY_SUFFIX = "token.aws.access.key";
public static final String TOKEN_AWS_CREDENTIALS_PROVIDER_SECRET_KEY_SUFFIX = "token.aws.secret.key";
public static final String TOKEN_AWS_CREDENTIALS_PROVIDER_SESSION_TOKEN_KEY_SUFFIX = "token.aws.session.token";
public static final String TOKEN_AWS_CREDENTIALS_PROVIDER_TOKEN_DURATION = "token.aws.sts.duration";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these require documentation somewhere, but especially a configuration option named "*_DURATION": what are the units, etc.?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed duration.
will add documentation before merging this as a final step in additional PR into this one, i just want to be sure about the actual configuration

String authProvider = FSConfiguration.get(conf, scheme, Constants.LAKEFS_AUTH_PROVIDER_KEY_SUFFIX, LakeFSClient.BASIC_AUTH);
// TODO(isan) here we will add support for other auth providers
if (authProvider != BASIC_AUTH) {
throw new IOException("lakeFS auth provider not implemented");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Include authProvider so that user has an idea what they tried to use and is not implemented.

import java.util.Map;
import com.amazonaws.util.json.JSONObject;

class GeneratePresignGetCallerIdentityRequest extends AmazonWebServiceRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AmazonWebServiceRequest includes many more methods, and I don't understand how they interact with the methods here. Say for instance I call #setRequestCredentialsProvider. What happens?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

THANK YOU this is unnecessary, removed it and adopted the code.

Request<GeneratePresignGetCallerIdentityRequest> presignRequest(GeneratePresignGetCallerIdentityRequest r) throws IOException;
}

class GetCallerIdentityV4Presigner extends AWS4Signer implements STSGetCallerIdentitPresigner{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
class GetCallerIdentityV4Presigner extends AWS4Signer implements STSGetCallerIdentitPresigner{
class GetCallerIdentityV4Presigner extends AWS4Signer implements STSGetCallerIdentityPresigner{

Comment on lines 52 to 55
/**
* Seconds in a week, which is the max expiration time Sig-v4 accepts
*/
private final static long MAX_EXPIRATION_TIME_IN_SECONDS = 60 * 60 * 24 * 7;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we validate this? Doesn't a lower level / AWS server side validate this for us?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

}
}

interface STSGetCallerIdentitPresigner {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
interface STSGetCallerIdentitPresigner {
interface STSGetCallerIdentityPresigner {

import java.util.Map;
import com.amazonaws.util.json.JSONObject;

class GeneratePresignGetCallerIdentityRequest extends AmazonWebServiceRequest {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java no longer needs each class in a file named for that class?! Yippee!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does unless the class is not declared as public 👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

private void newToken() throws Exception {
String token = this.newPresignedGetCallerIdentityToken();
// call lakeFS to exchange the token for a lakeFS token
this.lakeFSAuthToken = "lakefs jwt 123";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes added comment on that (will not merge this PR before #7578)

}

@Override
public void refresh() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand, it looks like this will never refresh! The first time it calls newToken it sets lakeFSAuthToken. And every time after that !needsNewToken, so it does nothing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see very clearly see why it's confusing sorry for that.

  1. Regardless of implementation which we can discuss - the goal of this function is to create a new token (with new TTL) if possible.
  2. I'm trying to make a standard for calling refresh token, mimicking how AWSCredentialProvider did that.
  3. I'm open for discussion on removing it for the initial release - e.g no refresh will be delivered.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So please document what it does: it clearly confuses at least one reader of the code.

@Isan-Rivkin
Copy link
Contributor Author

@arielshaqed please re-review.
I have addressed all your comments.
This PR will require additional look before final approve it depends on missing functionality PR #7578 being merged.
Once that merged I'll use the calls in newToken().

Copy link
Contributor

@arielshaqed arielshaqed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat!

How was this tested? Can we incorporate it into an Esti test, or into the contract test? I worry about regressing.

@@ -287,7 +287,7 @@ To export to S3:
<dependency>
<groupId>io.lakefs</groupId>
<artifactId>sdk</artifactId>
<version>1.12.0</version>
<version>1.14.0</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why +2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1.14 + contains all external principal API's.
So it will end up 1.14.x with the login API merged.

HttpBasicAuth basicAuth = (HttpBasicAuth) apiClient.getAuthentication(BASIC_AUTH);
basicAuth.setUsername(accessKey);
basicAuth.setPassword(secretKey);
}else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
}else {
} else {

int stsExpirationInSeconds;
ApiClient lakeFSApi;

AWSLakeFSTokenProvider() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very odd constructor. It does nothing except change the scope. Can we get rid of it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a default constructor I'm using it because the child class(es) TemporaryAWSCredentialsLakeFSTokenProvider initiates it and they also do some work before initiating it.

}

@Override
public void refresh() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So please document what it does: it clearly confuses at least one reader of the code.

@@ -0,0 +1,38 @@
package io.lakefs.auth;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this is a value type. Could we use org.immutables.value?

AWSSessionCredentials awsCreds = GetCallerIdentityV4PresignerTest.newMockAWSCreds();
GetCallerIdentityV4Presigner stsPresigner = new GetCallerIdentityV4Presigner();
GeneratePresignGetCallerIdentityRequest stsReq = new GeneratePresignGetCallerIdentityRequest(
new URI("https://sts.amazonaws.com"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be configurable. Think AWS in China, govclouds, various regional endpoints, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what you mean? this is configurable, it's just a test Im initiating the address, I could put other addresses.

GeneratePresignGetCallerIdentityRequest stsReq = new GeneratePresignGetCallerIdentityRequest(
new URI("https://sts.amazonaws.com"),
awsCreds,
new HashMap<String,String>(){{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use e.g. ImmutableMap?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dependency that exists only in tests we discussed and decided not to introduce into lakeFSFS.

Comment on lines 53 to 54
+"Action=GetCallerIdentity&"
+ "X-Amz-Algorithm=AWS4-HMAC-SHA256&"+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please position "+" signs consistently.

"Version=2011-06-15&"+
"X-Amz-SignedHeaders=host%3Bx-lakefs-server-id&"+
"X-Amz-Security-Token=[^&]+&"+
"X-Amz-Credential="+ awsCreds.getAWSAccessKeyId()+"%2F\\d{8}%2Fus-east-1%2Fsts%2Faws4_request&"+
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally you would extract the credential to a group using (\w+), match, and then test equality to the AWS access key ID. The current code might behave weirdly if the access key ID is somehow invalid and contains strange characters. We can work around that, or we can make an effort to explain why it cannot happen, or we can make the effort to quote the access key ID -- but it will still be simplest simply not to use a regexp to do that.

I know it's just a test. OTOH, "code returned a bad access key" would be a really good error message :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with specific check for that param and a message


Pattern compiledPattern = Pattern.compile(pattern);
Matcher matcher = compiledPattern.matcher(url.toString());
Assert.assertEquals(String.format("URL %s does not match \npattern: %s",url, pattern), true, matcher.matches());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unit test verifies that the code returns all query parameters in a particular order. But I do not think that this should be part of the requirement for what we are testing.

So I would really prefer to parse the HRL, and then verify that the fields are correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the test to check and compare each parameter one by one as suggested, thanks!

@Isan-Rivkin Isan-Rivkin marked this pull request as ready for review April 7, 2024 09:16
@Isan-Rivkin
Copy link
Contributor Author

Thanks @arielshaqed addressed all your comments.
Merging this PR and waiting for #7578 to be merged.
Once it's merged I'll create additional PR with call to that API in the LakeFSClient and documentation accordingly.

@Isan-Rivkin Isan-Rivkin added the include-changelog PR description should be included in next release changelog label Apr 7, 2024
@Isan-Rivkin Isan-Rivkin changed the title LakeFSFS - TokenProvider AWS LakeFSFS - TokenProvider AWS (no login yet) Apr 7, 2024
@Isan-Rivkin Isan-Rivkin merged commit 99a60cd into master Apr 10, 2024
36 checks passed
@Isan-Rivkin Isan-Rivkin deleted the feature/m1-lakefsfs-iam branch April 10, 2024 09:16
emulatorchen pushed a commit to emulatorchen/lakeFS that referenced this pull request May 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
include-changelog PR description should be included in next release changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lakeFSFS - support External principal login with IAM role
2 participants