Skip to content

feat(core): add include property specifying particular files for assets #33132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 38 commits into from

Conversation

go-to-k
Copy link
Contributor

@go-to-k go-to-k commented Jan 24, 2025

Issue # (if applicable)

Closes #26107.

Reason for this change

Currently the option for Assets etc. has the exclude only for filtering files. Also I had submitted the PR that adds include before, but closed it because the exclude can cover situations for staging particular files by using negation patterns such as * and !filenames. (see: #26107 (comment)).

But I encountered the situation that it is difficult to express our requirements with exclude only.

The example of directories and files is the following (y: required, x: not required).

- main.py          y
- dummy-1          x
- dummy-2          x
- dummy-3          x
- dummy-...        x
- subdir           
  - examples       x    
    - test.sh      x
    - test.py      x
    - README.md    x
  - a.py           y
  - b.py           y
  - binary         y
  - my.json        y

Now for the situation, we may use the following pattern with exclude only.

exclude: ['*', '!main.py', '!subdir', '!subdir/**/*', 'subdir/examples/*'],

However, there are some confusing points if the double negative is desired:

  • Negation (!) and positive are mixed and difficult to understand.
  • It does not work if you change the order in which you specify them (i.e. the double negative no longer works).
    • e.g. ['*', 'subdir/examples/*', '!main.py', '!subdir', '!subdir/**/*']
    • -> It includes ‘subdir/examples/*’ that should be excluded.

In such cases, the include option makes it simple to achieve the following. (This is very easy to understand and you don't have to mind the order.)

include: ['main.py', 'subdir', 'subdir/**/*'],
exclude: ['subdir/examples/*'],

The include property is also present in other modules that filter particular files, such as cdk watch, codepipeline Git triggers and BucketDeployment for S3, so it isn't unnatural.

In particular, it should be good for developers to be able to match the filtering settings of cdk watch with hotswap and the settings of the asset files added in this PR.

Description of changes

  • Add an include parameter to interface FileOptions as well as exclude.
  • Check for matching include in copyDirectory, where the actual file search is done.
  • If the same file is specified in include and exclude, exclude takes precedence.

Describe any new or updated permissions being added

Description of how you validated changes

Both unit and integ tests.

Checklist


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@github-actions github-actions bot added p2 distinguished-contributor [Pilot] contributed 50+ PRs to the CDK labels Jan 24, 2025
@aws-cdk-automation aws-cdk-automation requested a review from a team January 24, 2025 12:21
Copy link
Collaborator

@aws-cdk-automation aws-cdk-automation left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This review is outdated)

comments

comments

comments

rm comments
Copy link

codecov bot commented Jan 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.04%. Comparing base (74cbe27) to head (1669ace).
Report is 24 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #33132      +/-   ##
==========================================
+ Coverage   83.98%   84.04%   +0.06%     
==========================================
  Files         120      121       +1     
  Lines        6976     7003      +27     
  Branches     1178     1187       +9     
==========================================
+ Hits         5859     5886      +27     
  Misses       1005     1005              
  Partials      112      112              
Flag Coverage Δ
suite.unit 84.04% <100.00%> (+0.06%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
packages/aws-cdk ∅ <ø> (∅)
packages/aws-cdk-lib/core 84.04% <100.00%> (+0.06%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@aws-cdk-automation
Copy link
Collaborator

This PR has been in the CHANGES REQUESTED state for 3 weeks, and looks abandoned. Note that PRs with failing linting check or builds are not reviewed, please ensure your build is passing

To prevent automatic closure:

  • Resume work on the PR
  • OR request an exemption by adding a comment containing 'Exemption Request' with justification e.x "Exemption Request: "
  • OR request clarification by adding a comment containing 'Clarification Request' with a question e.x "Clarification Request: "

This PR will automatically close in 7 days if no action is taken.

@kaizencc
Copy link
Contributor

hi @go-to-k, what's the status on this PR? i don't want to rush you but the automation may eventually close your PR as abandoned...

@go-to-k
Copy link
Contributor Author

go-to-k commented Feb 18, 2025

hi @kaizencc . Thanks for letting me know!
I've been working on other PRs and put this one off. I'm going to do it later so I'll update the branch so it doesn't get closed.

@aws-cdk-automation aws-cdk-automation dismissed their stale review February 18, 2025 00:41

✅ Updated pull request passes all PRLinter validations. Dismissing previous PRLinter review.

// eslint-disable-next-line @typescript-eslint/no-require-imports
const minimatch = require('minimatch');

export function matchIncludePatterns(patterns: string[], absoluteRootPath: string, absoluteFilePath: string): boolean {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +89 to +91
* If matching the symlink but not its target, it is included (i.e. the `include` patterns match
* the symlink path itself, regardless of its target). This is the same behavior even if `follow`
* is not `SymlinkFollowMode.NEVER`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match the current exclude behavior

@go-to-k go-to-k marked this pull request as ready for review February 23, 2025 13:45
@go-to-k go-to-k requested a review from a team as a code owner February 23, 2025 13:45
@aws-cdk-automation aws-cdk-automation added the pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. label Feb 23, 2025
Comment on lines +39 to +42
const destDirPath = path.dirname(destFilePath);
if (!fs.existsSync(destDirPath)) {
fs.mkdirSync(destDirPath, { recursive: true });
}
Copy link
Contributor Author

@go-to-k go-to-k Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory creation code was added because:

  1. Previously, parent directories were already created by the isDirectory() processing before copying files/symlinks (*1).
  2. With the include option to copy only specific files, directories that don't match the include patterns are not created (*2). Without this behavior, we would end up creating unnecessary directories along the traversal path when we finally reach files that aren't included. As a result, we need to explicitly create parent directories for included files/symlinks, particularly to handle cases where those parent directories aren't included.

*1

    if (stat && stat.isDirectory()) {
      fs.mkdirSync(destFilePath);
      copyDirectory(sourceFilePath, destFilePath, options, rootDir);
      stat = undefined;
    }

*2

    if (stat && stat.isDirectory()) {
      if (included) {
        fs.mkdirSync(destFilePath, { recursive: true });
      }
      copyDirectory(sourceFilePath, destFilePath, options, rootDir);
      stat = undefined;

@go-to-k
Copy link
Contributor Author

go-to-k commented Feb 23, 2025

@kaizencc

I've completed the necessary implementation, so I'm ready for review.

@kaizencc kaizencc self-assigned this Apr 8, 2025
@shikha372 shikha372 added the skip-abstractions-board signal to automated workflow to skip adding to project board label Apr 10, 2025
@aws-cdk-automation
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: AutoBuildv2Project1C6BFA3F-wQm2hXv2jqQv
  • Commit ID: 75b2790
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@tmokmss
Copy link
Contributor

tmokmss commented Jun 28, 2025

Hi @go-to-k ! I appreciate the effort put into identifying and addressing a real pain point with file filtering in CDK. As a reviewer of the last PR, I still have some concerns about the current implementation approach that I'd like to share:

I see some potential issues with adding include to the core API:

  1. Modal behavior: Adding include creates different default behaviors - not specifying it means "include everything" while adding any include pattern suddenly means "exclude everything else". This modal behavior could lead to confusing outcomes and increased cognitive load for developers.

  2. Fixed priority limitations

    • While making exclude always take precedence over include does address ordering issues, it creates a rather opinionated approach that limits flexibility for certain use cases. Consider a case where you want to:
      • Exclude an entire directory with exclude: ['some/directory/**']
      • But still include one specific file with include: ['some/directory/special.txt']
    • Because the Asset is one the most fundamental constructs in CDK, we might want to avoid adding a prop that only works in certain scenarios. After all, the ordering issue is unavoidable for this use case (aws cli s3 module also has this doc).
  3. Unnecessary forced decisions

    • Adding both include and exclude options forces users to make additional decisions:
      • "Should I use include or exclude for this case?"
      • "Which has precedence in CDK specifically?" (users will need to remember or look up CDK-specific rules)
      • "What happens if I use both?"
    • This creates unnecessary complexity when the existing pattern with negations can already handle all use cases, albeit with some verbosity (+ following the very similar practices with .dockerignore or .gitignore.)

Instead of modifying the core API, have you considered creating an extension class for your project? This would provide the same functionality with more simplified code. (Yes, it does not directly work with Lambda Code.fromAsset, but you can create a wrapper easily with the same approach):

class YourAsset extends Asset {
  constructor(scope: Construct, id: string, props: {
    include: string[];  // New include option
  } & AssetProps) {
    // Transform include patterns into exclude negation patterns.
    // This only works when exclude always takes precedence over include, but hides the ordering concerns.
    const transformedExclude = [
      '*', 
      ...props.include.map(pattern => `!${pattern}`), 
      ...(props.exclude ?? [])
    ]
 
    super(scope, id, {
      ...props,
      exclude: transformedExclude
    });
  }
}

Although I don't think this wrapper should live in the aws-cdk-lib core module, it requires less code and offerrs the syntactic convenience you're looking for. What do you think?

@go-to-k
Copy link
Contributor Author

go-to-k commented Jun 29, 2025

Hi @tmokmss . Thanks for your comments!

I understand your concerns, but let me provide some replies:

1. Modal behavior

Are you arguing for the unnecessariness of include filtering itself? Or are you arguing that when introducing this feature, the default behavior should be changed?

If it's the former case, this discussion would not be limited to just this PR, but would concern include/exclude filtering itself. Include/exclude filtering has been introduced in "cdk watch" and other features in CDK, and it should be a common and natural feature that has been adopted in various software outside of CDK as well. Also, "cdk watch" in particular is a feature deeply related to assets, and I think it would be a great benefit to be able to align asset filtering settings with "cdk watch" settings.

If it's the latter case, I would appreciate it if you could share any good suggestions you might have. (However, "cdk watch" has the same logic, so I think it would be good to match "cdk watch".)

2. Fixed priority limitations

This same limitation exists in cdk watch as well, which uses the identical priority order (exclude takes precedence over include). Since cdk watch is already widely used and accepted by the community, I don't think this is as significant an issue as suggested.

Maintaining consistency with the existing cdk watch behavior actually seems more beneficial than introducing a different filtering logic for assets.

Also, in that case, using exclude without include would be good even if adding include.

3. Unnecessary forced decisions

Regarding questions about precedence and behavior when using both options, these are standard documentation challenges that can be resolved through clear API documentation and examples.

Choosing between include and exclude is a trade-off with flexibility. I think it's a good thing to give users the option to write more simply with include.

Also, .dockerignore and .gitignore have the constraint of expressing all filtering logic within a single file format, whereas include and exclude properties can be designed as clearly separated configuration options, so the conditions are different.


Instead of modifying the core API, have you considered creating an extension class for your project?

Thank you for the good example. However, if I only consider my project, I can handle it in any way. It's already possible to handle it with exclude alone. Rather, I think it should be introduced into the CDK library.

@tmokmss
Copy link
Contributor

tmokmss commented Jun 30, 2025

@go-to-k Thanks for your detailed response. I have a few additional thoughts:

  1. Limited reusability between assets and watch: Most CDK projects contain multiple assets, each with their own filtering needs. This means we can't simply reuse the same include/exclude settings between individual assets and the watch configuration anyway, which weakens the consistency argument. Plus, not all CDK users uses or are familiar with cdk watch, where I'm not sure how many users are actually happy with this change.

  2. Default behavior switch: The modal behavior issue is specifically introduced by this PR (isn't it?). With exclude-only filtering, there's no mode switch to worry about - it works consistently regardless of whether or not you specify an exclude/include, which I prefer :)

While I don't think the feature is completely unnecessary, I question whether the benefits outweigh the added complexity and potential confusion. This is ultimately a maintainer decision, but I'd ask them to consider if this feature - which works well only for specific use cases - belongs in such a fundamental construct. I acknowledge your point about cdk watch having the same limitations, but that's a separate consideration.

I appreciate your dedication to improving the CDK developer experience. These are just my thoughts as a reviewer to help the maintainers make an informed decision. Thanks!

@go-to-k
Copy link
Contributor Author

go-to-k commented Jun 30, 2025

@tmokmss

I appreciate your response.
I understand what you are saying, and it is true that I can't reveal the unassailable advantages of this feature right now. For that reason, I'm going to close this PR for now.
There is still the issue for this feature request left and I will work on it again if any further discussion arises or opinions on the superiority of the feature are raised.

@go-to-k go-to-k closed this Jun 30, 2025
@go-to-k go-to-k deleted the asset-include branch June 30, 2025 07:27
Copy link
Contributor

Comments on closed issues and PRs are hard for our team to see.
If you need help, please open a new issue that references this one.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 30, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
distinguished-contributor [Pilot] contributed 50+ PRs to the CDK effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p2 pr/needs-community-review This PR needs a review from a Trusted Community Member or Core Team Member. skip-abstractions-board signal to automated workflow to skip adding to project board
Projects
None yet
Development

Successfully merging this pull request may close these issues.

core: support specifying included files for Code.from_asset
5 participants