CDK needs garbage collection for assets in the cdk.out directory #2869

Open
nathanpeck opened this issue Jun 13, 2019 · 37 comments
Labels
@aws-cdk/assets Related to the @aws-cdk/assets package effort/medium Medium work item – several days of effort feature-request A feature should be added or improved. p1 package/tools Related to AWS CDK Tools or CLI

Comments

@nathanpeck
Member

Each time I run a CDK deploy I get a new asset in the assets directory, and they seem to accumulate forever. Each asset folder is around 100 MB for me, so this quickly adds up to many GB of data. Here is a screenshot of it accumulating assets again after the last time I cleaned it out manually.

[Screenshot: Screen Shot 2019-06-13 at 2 26 47 PM]

Ideally I would like a CDK configuration that would cause CDK to automatically garbage collect older asset files it no longer needs so I don't have to do it manually.

@nathanpeck nathanpeck added the feature-request A feature should be added or improved. label Jun 13, 2019
@NGL321 NGL321 added the needs-triage This issue or PR still needs to be triaged. label Jun 17, 2019
@eladb eladb removed their assignment Jul 31, 2019
@RomainMuller
Contributor

RomainMuller commented Aug 26, 2019

Duplicated by #3749

@SomayaB SomayaB added the @aws-cdk/assets Related to the @aws-cdk/assets package label Oct 11, 2019
@SomayaB
Contributor

SomayaB commented Oct 11, 2019

Seems to be related to #1332

@SomayaB
Contributor

SomayaB commented Oct 11, 2019

Hi @nathanpeck, thanks for submitting a feature request! This seems like a reasonable and helpful ask. We will look into this and someone will update this issue when there is movement.

@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Oct 11, 2019
@SomayaB SomayaB assigned eladb and unassigned shivlaks Nov 4, 2019
@eladb eladb added the package/tools Related to AWS CDK Tools or CLI label Jan 23, 2020
@eladb eladb added effort/medium Medium work item – several days of effort and removed @aws-cdk/assets Related to the @aws-cdk/assets package labels Jan 23, 2020
@eladb eladb removed their assignment Jan 23, 2020
@eladb eladb removed the effort/medium Medium work item – several days of effort label Jan 23, 2020
@eladb
Contributor

eladb commented Feb 4, 2020

I think that if users do cdk deploy we should actually emit the cdk.out directory under /tmp instead of the project directory. When users deploy, cdk.out is just an intermediate artifact rather than a build artifact.

P.S. it should be something like /tmp/cdk.out.xxxx where xxxx is the hash of the project path (in order to allow multiple projects to co-exist on the same machine).
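
A minimal sketch of how such a per-project directory name could be derived, assuming a truncated SHA-256 of the absolute project path is an acceptable hash. The helper name `tempOutdirFor` and the 8-character hash length are illustrative assumptions, not anything the CDK provides:

```typescript
import * as crypto from "crypto";
import * as os from "os";
import * as path from "path";

// Hypothetical helper (not a CDK API): map a project path to a stable
// directory name under the system temp directory.
function tempOutdirFor(projectPath: string): string {
  const absolute = path.resolve(projectPath);
  // The same project path always yields the same 8-character hash prefix,
  // so repeated runs reuse one directory per project.
  const hash = crypto.createHash("sha256").update(absolute).digest("hex").slice(0, 8);
  return path.join(os.tmpdir(), `cdk.out.${hash}`);
}
```

Because the name depends only on the project path, two projects on the same machine would get different directories while each project keeps reusing its own.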

@SomayaB SomayaB added the @aws-cdk/assets Related to the @aws-cdk/assets package label Feb 4, 2020
@shivlaks shivlaks added the effort/medium Medium work item – several days of effort label Feb 5, 2020
@nathanpeck
Member Author

@eladb I do worry that would reduce the visibility of the folder. Particularly in cases where I have multiple projects and for some reason my stacks aren't generating as expected I would hate to have to figure out which of the outputs inside my tmp folder is the right one.

While it is tempting to piggyback on the existing tmp cleanup behavior, I don't think it would be good for users of CDK, because it would end up being a hidden cache that is harder to clear when needed.

@eladb
Contributor

eladb commented Feb 5, 2020

If you do cdk synth output will still go to ./cdk.out which will give you visibility into exactly what's going to be used during deployment.

I am not sure I understand why you think putting intermediate (temporary) build artifacts is not a good use case for /tmp. Isn't that what /tmp is all about?

@SomayaB SomayaB added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 5, 2020
@nathanpeck
Member Author

@eladb I don't think of the build artifacts as temporary.

For example, if I compile with GCC I would expect my C++ files to turn into object files in a local path, not in the /tmp folder.

Or if I compile TypeScript I expect the resulting JavaScript to end up in the local directory, not in /tmp.

From that perspective I see CDK to CloudFormation / assets as just another type of transformation, where I expect the resulting product to be local, not remotely cached.

I'm not strictly opinionated on this, but it just feels somewhat strange to me if cdk.out is located in a different folder outside of my project.

@SomayaB SomayaB removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 7 days. label Feb 17, 2020
@plumdog
Contributor

plumdog commented Mar 3, 2020

I found this issue from a different direction: I have some tests for my CDK code, and each time I run them it builds a new asset directory and puts it in /tmp, a new one for each test case. The assets for me happened to be 100s of MB, and soon my /tmp device was full.

I think I would expect that - by default - assets for test runs were deleted after the test run had completed, regardless of where they are stored.
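
Until something like that exists, a test suite could do the cleanup itself. The following is only a sketch: `withTempOutdir` is a hypothetical helper, not a CDK API, and it assumes the test passes the directory to the app explicitly (for example through the `outdir` property of `App`) rather than letting the CDK pick one:

```typescript
import * as fs from "fs";
import * as os from "os";
import * as path from "path";

// Hypothetical test helper: run `body` with a fresh output directory and
// delete that directory afterwards, even if `body` throws.
// Note: `body` must be synchronous; an async body would outlive the cleanup.
function withTempOutdir<T>(body: (outdir: string) => T): T {
  const outdir = fs.mkdtempSync(path.join(os.tmpdir(), "cdk-test-"));
  try {
    return body(outdir);
  } finally {
    fs.rmSync(outdir, { recursive: true, force: true });
  }
}
```

Each test case would then wrap its synthesis in `withTempOutdir((outdir) => { ... })`, so nothing is left behind in /tmp once the case finishes.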

@0xdevalias
Contributor

In the interim.. is it ok to just manually clear out anything in this folder (or even the whole folder)? I've left it building up for now as I wasn't sure if they were required somewhere down the line/for cdk diff support/etc.

@NGL321 NGL321 assigned rix0rrr and unassigned shivlaks Jan 25, 2021
@lprhodes

I have another use case for more control over the asset directories.

I'm using CDK with SAM CLI and I'm trying to use tsc-watch to re-run cdk synth after detecting changes to TypeScript. Because a new asset directory is created each time, SAM needs to be restarted.

The workaround I'm about to implement is to get the existing asset directory name, delete it, then rename the new asset directory to the old one after cdk synth. There's the possibility that SAM will keep a pointer to the original directory which is moved to trash but we shall see!
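
A rough sketch of that delete-then-rename step, assuming both directory paths are already known. `replaceAssetDir` is a hypothetical name, and whether SAM notices the swap is exactly the open question mentioned above:

```typescript
import * as fs from "fs";

// Hypothetical helper: drop the previous asset directory and move the freshly
// synthesized one into its place, so the path a watcher holds stays the same.
function replaceAssetDir(oldDir: string, newDir: string): void {
  fs.rmSync(oldDir, { recursive: true, force: true });
  fs.renameSync(newDir, oldDir);
}
```

This only swaps directories on disk; it does not update any references to the new hash inside the synthesized templates.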

@eladb eladb unassigned rix0rrr and eladb Feb 25, 2021
@jbvsmo

jbvsmo commented Mar 26, 2021

Please don't move cdk.out to /tmp as people who never reboot will have that thing blowing up as well. Also it is not safe when deploying multiple projects since the erase solution above would remove anything inside /tmp

I had literally over 100 asset.XXXXX directories each weighing 85MB and since those have tons of small files it took a few minutes to delete the 9GB of data.

Why isn't all that being just deleted right after deploy (or before deploy so we keep last one)?
If I would like to keep the data, I could explicitly ask for it.

@leantorres73

I think cdk synth should clean the folder and create it again

@acomagu
Contributor

acomagu commented Sep 2, 2021

How about automatic cleanup based on the creation date?

For example, configure cdk.json like:

{
  "app": "bin/synth",
  "autoCleanOutdatedAssetsBefore": "3days" // Assets created more than 3 days ago are deleted automatically (when running `cdk synth`, etc.)
}
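
As an illustration of what such a cleanup might do internally, here is a sketch that deletes `asset.*` entries older than a cutoff. `pruneOldAssets` and `maxAgeMs` are hypothetical names, not existing CDK APIs or cdk.json options, and the sketch uses modification time as a stand-in for creation date because creation time is not reliably available on every filesystem:

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical age-based cleanup: remove `asset.*` entries in `outdir` whose
// modification time is more than `maxAgeMs` milliseconds before `now`.
function pruneOldAssets(outdir: string, maxAgeMs: number, now: number = Date.now()): string[] {
  const removed: string[] = [];
  if (!fs.existsSync(outdir)) return removed;
  for (const entry of fs.readdirSync(outdir)) {
    // Only touch staged assets; templates and manifests are left alone.
    if (!entry.startsWith("asset.")) continue;
    const fullPath = path.join(outdir, entry);
    const ageMs = now - fs.lstatSync(fullPath).mtimeMs;
    if (ageMs > maxAgeMs) {
      fs.rmSync(fullPath, { recursive: true, force: true });
      removed.push(entry);
    }
  }
  return removed;
}
```

With a three-day cutoff this would be called as `pruneOldAssets("cdk.out", 3 * 24 * 60 * 60 * 1000)`.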

@mjsztainbok

This is problematic with CDK tests, as every test run creates a new directory in /tmp, and when writing tests it fills up the hard disk quite quickly.

@jtnz

jtnz commented Mar 16, 2022

I've run out of space (aka memory on Linux) in /tmp many times because of the /tmp/cdk.out* dirs.

Never had a problem around cdk.out in the project root, but I haven't been doing much cdk synth locally (we use pipelines).

@dougperkes

+1 to finding a solution for this. I just had to clean up ~70GB of files from my cdk.out directory in my project.

@lazinessdevs

Why not just delete the cdk.out folder before each synth or deploy?

@skinny85
Contributor

skinny85 commented Sep 5, 2022

> Why not just delete the cdk.out folder before each synth or deploy?

Because all Assets would have to be re-staged on every synth that way (the ZIP files re-zipped, etc.), making it even slower than it is now.

@mrgrain
Contributor

mrgrain commented Sep 6, 2022

> I've run out of space (aka memory on Linux) in /tmp many times because of the /tmp/cdk.out* dirs.

I'm surprised by this. Is there no OS level garbage collection for /tmp in your distribution?

@jankatins

> I'm surprised by this. Is there no OS level garbage collection for /tmp in your distribution?

/tmp is a ramdisk (at least on my Linux systems), so it is gone after a restart/logout. But if you restart only once in a blue moon, running out of space will happen...

@mrgrain
Contributor

mrgrain commented Sep 6, 2022

> > I'm surprised by this. Is there no OS level garbage collection for /tmp in your distribution?
>
> /tmp is a ramdisk (at least on my Linux systems), so it is gone after a restart/logout. But if you restart only once in a blue moon, running out of space will happen...

Thanks for clarifying this. 👍🏻

@ryanwilliams83

I'm using C# and the DockerImageFunction construct and I just stumbled across 45GB of assets in cdk.out

My Program.cs now has the following:

    public static void Main(string[] args)
    {
        // Wipe and recreate cdk.out before synthesis so stale assets do not pile up.
        if (Directory.Exists(@"cdk.out"))
        {
            Console.Error.WriteLine(@"Erasing cdk.out/");
            Directory.Delete(@"cdk.out", true);
            Console.Error.WriteLine(@"Erased cdk.out/");

            Console.Error.WriteLine(@"Creating cdk.out/");
            Directory.CreateDirectory(@"cdk.out");
            Console.Error.WriteLine(@"Created cdk.out/");
        }

        var app = new App();
        ...

@nathanpeck
Member Author

Be warned that if you delete your cdk.out folder every time, it will make CDK much slower, because CDK will not be able to reuse previously prepared assets and will have to prepare them from scratch each time. Ideally you have some process that only cleans up asset files that are older than a specific cutoff date, or once the size gets over a threshold. That way your day-to-day usage of CDK stays fast and you stop accumulating GB of data.
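
The size-threshold variant could look roughly like the sketch below, which deletes the oldest `asset.*` entries first until the rest fit under a limit. `sizeOf`, `pruneToSize`, and `maxBytes` are hypothetical names, and the oldest-first policy based on modification time is an assumption, not existing CDK behavior:

```typescript
import * as fs from "fs";
import * as path from "path";

// Recursively sum file sizes under `target` (a file or a directory).
function sizeOf(target: string): number {
  const stat = fs.lstatSync(target);
  if (!stat.isDirectory()) return stat.size;
  return fs.readdirSync(target)
    .reduce((total, name) => total + sizeOf(path.join(target, name)), 0);
}

// Hypothetical size-based cleanup: remove the oldest `asset.*` entries in
// `outdir` until the remaining ones total at most `maxBytes`.
function pruneToSize(outdir: string, maxBytes: number): string[] {
  const assets = fs.readdirSync(outdir)
    .filter((name) => name.startsWith("asset."))
    .map((name) => {
      const fullPath = path.join(outdir, name);
      return { name, fullPath, size: sizeOf(fullPath), mtimeMs: fs.lstatSync(fullPath).mtimeMs };
    })
    .sort((a, b) => a.mtimeMs - b.mtimeMs); // oldest first

  let total = assets.reduce((sum, asset) => sum + asset.size, 0);
  const removed: string[] = [];
  for (const asset of assets) {
    if (total <= maxBytes) break;
    fs.rmSync(asset.fullPath, { recursive: true, force: true });
    total -= asset.size;
    removed.push(asset.name);
  }
  return removed;
}
```

Recently used assets stay in place, so the next synth can still reuse them, while the directory as a whole stops growing without bound.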

@wz2b

wz2b commented Nov 13, 2022

I'm not sure of the issues hierarchy here, but everyone should probably be aware of a parallel discussion going on in aws/aws-cdk-rfcs#64 (opened in 2018).

I feel like clearing out cdk.out had better be an okay thing to do, because I build from multiple development locations, so they aren't going to be in sync depending on whether I'm working from home or my office.

Deleting things out of the staging bucket is a little scarier to me. Issues related to scaling and rollback have been raised, but I am not enough of an expert to know whether or not those are legitimate concerns.

I think it should be okay to clear out the staging bucket after you successfully deploy, but I'm not confident enough to try it on a production project. The biggest item in the staging bucket looks like it might be part of the cdk itself (maybe put there by cdk bootstrap?)

I think all this means two things:

  • The feature request in aws/aws-cdk-rfcs#64 (Garbage Collection for Assets) is really important; we should let the developers know we care about this
  • I think some temporary workaround is required, and I think we need an expert to tell us what's safe to remove from the staging bucket and what is not. Just doing it by date seems problematic.

@dmeehan1968

I work on my project in a Dropbox folder, and regularly use xattr -w com.dropbox.ignored 1 node_modules to prevent that directory being synced to Dropbox. I do the same with cdk.out, so any process that deletes the folder also removes the extended attribute and can lead to the files syncing to Dropbox without me realising (until I run out of Dropbox space).

The ability to move the artefacts to a directory outside the current working directory/tree (and outside of dropbox) is ideal, and I can always create a soft link for convenience from the cwd which isn’t synced.

Perpetual growth of the cdk.out directory is, IMHO, just lazy design. I appreciate that there are intermediate assets that might add extra cost to repeated synth/deploy cycles and these should be documented.

@integralla

I'll add one more suggestion to the pile...

I'd like the CDK Toolkit to provide a clean command that would serve as a standardized way to clean up the local resources that are created by running other toolkit commands such as synth.

With a clean command in place, developers can add a process to an appropriate phase of their build life cycle, based on their specific project needs. For example, with a JVM project using Apache Maven, the exec-maven-plugin could be used to execute the command (I do something similar today with a shell script).

Of course, the templates provided for use with the init command could also provide a sensible default.

@j-murata

My CDK project is an npm package, and I utilize npm pre scripts to remove the cdk.out directory before executing the cdk command.

package.json
{
  "scripts": {
    "cdk": "cdk",
    "precdk": "shx rm -rf cdk.out"
  }
}

I use shx to make it work cross-platform.

Then run npm scripts as follows:

$ npm run cdk -- diff
$ npm run cdk -- deploy

If the environment in which the cdk command is executed is limited, the easiest solution may be to define a shell alias for cdk.

I hope this is of some help.

@huantbui

huantbui commented Sep 8, 2023

> I have another use case for more control over the asset directories.
>
> I'm using CDK with SAM CLI and I'm trying to use tsc-watch to re-run cdk synth after detecting changes to TypeScript. Because a new asset directory is created each time, SAM needs to be restarted.
>
> The workaround I'm about to implement is to get the existing asset directory name, delete it, then rename the new asset directory to the old one after cdk synth. There's the possibility that SAM will keep a pointer to the original directory which is moved to trash but we shall see!

@lprhodes I figured out a solution for the cdk.out/asset.* hash folder. Since aws-cdk offers NodejsFunctionProps.bundling.commandHooks, you can have it generate a small utility sh script and run that instead of re-running the CDK every time, which is time-consuming.

sample code:

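// Part of the commandHooks object passed to NodejsFunctionProps.bundling.
// Assumes `join` is imported from "path" and `fileName` (the Lambda entry file) is defined elsewhere.
afterBundling(inputDir: string, outputDir: string): string[] {
  const outFile = join(outputDir, "index.js");
  const scriptPath = join(inputDir, "..", ".scripts");
  const shFile = fileName.replace(".ts", ".sh");
  return [
    `mkdir -p ${scriptPath}`,
    // Write an esbuild watch command into a script that can be run outside the CDK.
    `echo esbuild ${inputDir}/${fileName} --outfile=${outFile} --watch --bundle --target=node18 --platform=node > ${scriptPath}/${shFile}`,
  ];
},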

And then in my package.json > scripts, I have "watch:lambda": "sh .scripts/<file_name>.sh"

When you run that script, esbuild runs in watch mode, recompiles your changes, and writes the output to the cdk.out/asset.* folder path (thanks to the commandHooks outputDir)...

Hope that helps! I was able to code my lambdas in typescript and re-run the lambda without costing so much time for the cdk re-runs.


@davidjmemmett

I've been working on a CDK project over the last couple of days and suddenly my IDE stopped working. I had a poke around today and found this:

$ find ./cdk.out -type f|wc -l
8638859
$ du -sh ./cdk.out
164G	./cdk.out/

For now I'm going to obliterate the cdk.out directory every so often, but this really shouldn't build up so quickly.

@yonelacort

yonelacort commented Jan 14, 2025

To see the total size that all cdk.out directories under the current path are taking up, you can run:

find . -type d -name "cdk.out" -prune -exec du -sk {} + | awk '{total += $1} END {print total/1024 " MB"}'

Then, from the relevant path, you can delete the assets and synth files within those cdk.out directories by running the following command (-prune stops find from descending into directories it is about to delete):

find . -type d -name "cdk.out" -prune -exec rm -rf {} +
