Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stepfunctions-tasks): bedrock createModelCustomizationJob integration #31913

Open
wants to merge 89 commits into
base: main
Choose a base branch
from

Conversation

badmintoncryer
Copy link
Contributor

This PR was previously created and passed the community review, but the maintainer review stopped midway, and it was eventually closed. There shouldn’t be any issues with the content, so I am submitting the PR again.

Issue # (if applicable)

Closes #29042

Reason for this change

AWS stepfunctions support optimized integration with AWS bedrock.
Currently, only invokeModel is supported by CDK, but I would like createModelCustomizationJob to be supported in the same manner.

Description of changes

I've added CreatemodelCustomizationJob class.

const taskConfig = {
  baseModel: model,
  clientRequestToken: 'MyToken',
  customizationType: CustomizationType.FINE_TUNING,
  kmsKey,
  customModelName: 'MyCustomModel',
  customModelTags: [{ key: 'key1', value: 'value1' }],
  hyperParameters: {
    batchSize: '10',
  },
  jobName: 'MyCustomizationJob',
  jobTags: [{ key: 'key2', value: 'value2' }],
  outputDataS3Uri: outputBucket.s3UrlForObject(),
  trainingDataS3Uri: trainingBucket.s3UrlForObject(),
  validationDataS3Uri: [validationBucket.s3UrlForObject()],
  vpcConfig: {
    securityGroups: [new ec2.SecurityGroup(stack, 'SecurityGroup', { vpc })],
    subnets: vpc.isolatedSubnets,
  },
};

const task1 = new BedrockCreateModelCustomizationJob(stack, 'CreateModelCustomizationJob1', taskConfig);

const chain = sfn.Chain
  .start(new sfn.Pass(stack, 'Start'))
  .next(task1)
  .next(new sfn.Pass(stack, 'Done'));

new sfn.StateMachine(stack, 'StateMachine', {
  definitionBody: sfn.DefinitionBody.fromChainable(chain),
  timeout: cdk.Duration.seconds(30),
});

Description of how you validated changes

I've added both unit and integ tests.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@badmintoncryer
Copy link
Contributor Author

@gracelu0 Thank you for your kindness check. I've addressed all of your comments.

Copy link
Contributor

@gracelu0 gracelu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this! I left some comments to improve the interface design for extensibility, and we will need to check the policy updates with security reviewer (so there may be some additional comments to address).

bedrock.FoundationModelIdentifier.AMAZON_TITAN_TEXT_G1_EXPRESS_V1,
);

const task = new tasks.BedrockCreateModelCustomizationJob(this, 'CreateModelCustomizationJob2', {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of all the // optional comments can we have a // required comment to point out the required properties?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

*
* @see https://docs.aws.amazon.com/bedrock/latest/userguide/encryption-custom-job.html
*
* @default - no encryption
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from the linked docs, By default, Amazon Bedrock encrypts custom models with AWS owned keys. - can we specify that as the default here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

*
* @default - no prefix
*/
readonly prefix?: string;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this prefix field for? since I see for the s3Uri property the pattern is ^s3://[a-z0-9][-.a-z0-9]{1,61}(?:/[-!_*'().a-z0-9A-Z]+(?:/[-!_*'().a-z0-9A-Z]+)*)?/?$ so the prefix is expected to be s3://.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I apologize for the confusing naming. This prefix represents the absolute path from the root within the bucket, and is passed to the s3UrlForObject() method as follows.

        OutputDataConfig: {
          S3Uri: this.props.outputData.bucket.s3UrlForObject(this.props.outputData.path),
        },

I've changed it from "prefix" to "path" and added some explanations. How does this look?

*/
readonly role?: iam.IRole;

/**
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I like the simplified BucketConfiguration interface here, I see that TrainingDataConfig already differs from ValidationDataConfig and OutputDataConfig with additional prop invocationLogsConfig.

To avoid making breaking changes in the future in case new sub-properties are added, can we make a base interface BucketConfiguration (maybe called DataBucketConfiguration) and create interfaces extending it for each of outputData, trainingData and validationData ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your nice suggestion. I defined each interfaces like that.

*
* @see https://docs.aws.amazon.com/bedrock/latest/APIReference/API_Validator.html
*/
readonly validationData: BucketConfiguration[];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like validationDataConfig is not required, so we need to update this and also the validation to allow minimum of 0 validators

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

S3Uri: this.props.outputData.bucket.s3UrlForObject(this.props.outputData.prefix),
},
RoleArn: this._role.roleArn,
TrainingDataConfig: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's okay if we don't include it in this PR, but we should ensure the contract allows us to add this in the future (see my other comment about extending the interface)

return this.props.role;
}
const role = new iam.Role(this, 'BedrockRole', {
assumedBy: new iam.ServicePrincipal('bedrock.amazonaws.com'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From https://docs.aws.amazon.com/bedrock/latest/userguide/model-customization-iam-role.html#model-customization-iam-role-trust it mentions the option to restrict the scope using Condition:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "account-id"
                },
                "ArnEquals": {
                    "aws:SourceArn": "arn:aws:bedrock:us-east-1:account-id:model-customization-job/*"
                }
            }
        }
    ] 
}

Can we add this to adhere to PoLP and reduce the permissions scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added Condition section!

const role = new iam.Role(this, 'BedrockRole', {
      assumedBy: new iam.ServicePrincipal('bedrock.amazonaws.com', {
        conditions: {
          StringEquals: {
            'aws:SourceAccount': account,
          },
          ArnEquals: {
            'aws:SourceArn': Stack.of(this).formatArn({
              service: 'bedrock',
              resource: 'model-customization-job',
              resourceName: '*',
            }),
          },
        },
      }),

@gracelu0 gracelu0 added the needs-security-review Related to feature or issues that needs security review label Jan 16, 2025
@aws-cdk-automation aws-cdk-automation removed the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Jan 16, 2025
@gracelu0 gracelu0 removed their assignment Jan 30, 2025
@mergify mergify bot dismissed gracelu0’s stale review February 6, 2025 14:00

Pull request has been modified.


private validatePattern(name: string, pattern: RegExp, value?: string): void {
if (value !== undefined && !Token.isUnresolved(value) && !pattern.test(value)) {
throw new ValidationError(`${name} must match the pattern ${pattern.toString()}, got: ${value}.`, this);

Check failure

Code scanning / CodeQL

Polynomial regular expression used on uncontrolled data High

This
regular expression
that depends on
library input
may run slow on strings starting with '0' and with many repetitions of '-'.
@aaythapa aaythapa self-assigned this Feb 10, 2025
@aaythapa aaythapa removed their assignment Feb 20, 2025
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This review is outdated)

@aws-cdk-automation aws-cdk-automation dismissed their stale review February 22, 2025 00:29

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.


/**
* The S3 bucket configuration where the validation data is stored.
* If you don't provide a validation dataset, specify the evaluation percentage by the `Evaluation percentage` hyperparameter.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This information is described in management console.

スクリーンショット 2025-02-22 9 28 22

@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Feb 25, 2025
@badmintoncryer
Copy link
Contributor Author

@gracelu0 I'm really sorry for my late response. I've addressed all of your comments. Could you please reconfirm again?

@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: a979625
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
distinguished-contributor [Pilot] contributed 50+ PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. needs-security-review Related to feature or issues that needs security review p2 pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stepfunctions-tasks: support for bedrock createModelCustomizationJob task
5 participants