
ObjectStorageProvider: add write back support #1677


Open — wants to merge 2 commits into main

Conversation

@rressi-at-globus commented May 9, 2025

Description

When we successfully restore a binary package from cloud storage, we should cache it locally to avoid downloading it again.

This feature is active when both of the following conditions hold:

  • a binary cache on the file system is configured with write permission (`default,write`, `default,readwrite`, `files,...,write`, `files,...,readwrite`)
  • a binary cache on cloud storage is configured with read permission (`x-aws,...,read`, `x-aws,...,readwrite`, `x-cos,...,read`, `x-cos,...,readwrite`, `x-gcs,...,read`, `x-gcs,...,readwrite`)
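As a concrete sketch, a configuration that satisfies both conditions could look like this (the cache path and bucket name are made up for illustration):

```shell
# Hypothetical setup: a writable filesystem cache plus a readable S3 cache.
# With this change, archives restored from the S3 source are kept in the
# filesystem cache instead of being deleted after extraction.
export VCPKG_BINARY_SOURCES="clear;files,/opt/vcpkg-cache,readwrite;x-aws,s3://my-bucket/cache/,read"
```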

To cache the package, the restored zip is simply moved into the local cache directory instead of being deleted.
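The mechanism described above amounts to a single rename at the end of a successful restore. A minimal sketch with hypothetical paths (the real code operates on vcpkg's internal download and cache directories):

```shell
# Hypothetical paths; the tool performs the equivalent of this rename
# internally after successfully restoring an archive from cloud storage.
restored_zip=/tmp/demo-restore/abc123.zip   # archive just fetched from the bucket
local_cache=/tmp/demo-cache                 # binary cache with write permission

mkdir -p "$(dirname "$restored_zip")" "$local_cache"
echo "package bits" > "$restored_zip"       # stand-in for the downloaded archive

# One rename instead of copy+delete: no extra I/O, and on the same
# filesystem it is a single atomic operation.
mv "$restored_zip" "$local_cache/$(basename "$restored_zip")"
```

Note that `mv` is only atomic when source and destination live on the same filesystem; across filesystems it degrades to copy+delete.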

Design

```mermaid
flowchart TD
    subgraph Dev["Developer Machine"]
        A[Checked-out C++ Code]
        B["VCPKG Build Process"]
        C["Local Binary Cache (.zip files)"]
    end

    subgraph AWS["Build of the external component (CMake)"]
        D[GitHub checkout]
        E[CMake config]
        F[CMake build]
    end

    A --> B
    B -->|Check| C
    C -->|Hit| BDone[Use from Local Cache]
    C -->|Miss| FCheck[Check S3 Cache]
    FCheck -->|Hit| FDownload[Download from S3]
    FDownload --> B
    FDownload -.-> CUpdate
    FCheck -->|Miss| D
    D --> E
    E --> F
    F --> CUpdate["Update Local Cache"]
    CUpdate --> B
    BDone --> B

    style BDone fill:#ccffcc,stroke:#333
    style FDownload fill:#ccffcc,stroke:#333
    style CUpdate fill:#ccffcc,stroke:#333
    linkStyle 6 stroke:red,stroke-width:3px,stroke-dasharray: 5, 5
```

Rationale

Our developers compile on their own machines in our offices around the world, and they often work from home as well. To speed up their builds, we store precompiled externals as archives on AWS S3.

For externals we have an internal solution that uses the local hard drive as a first-layer cache and S3 as the second layer. When something is not found locally, it is downloaded and then kept on the local hard drive for a while.

Apart from reducing compilation time, this also helps a lot in keeping egress costs under control:

[image: egress costs chart]

This is an example of a month where costs were higher than usual because, for a few days, the feature was not working after an upgrade of our tool.

This small PR implements a similar strategy in VCPKG with a few lines of code.

@Osyotr (Contributor) commented May 9, 2025

Related: #1406

@rressi-at-globus force-pushed the binary_caching_preserve_zips_from_cloud_storage branch 5 times, most recently from 4649e4f to e13b440, May 9, 2025 19:10
@rressi-at-globus force-pushed the binary_caching_preserve_zips_from_cloud_storage branch from e13b440 to 676e48f, May 9, 2025 19:21
@rressi-at-globus changed the title from "ObjectStorageProvider: cache locally restored Zips" to "ObjectStorageProvider: cache locally binary cache files fetched from the cloud storage buckets." May 16, 2025
@rressi-at-globus changed the title to "ObjectStorageProvider: keep locally binary cache archives coming from cloud storage." May 16, 2025
@rressi-at-globus (Author)

There are three PRs trying to add write-back support to this tool.

This is a must-have feature in many circumstances.

What distinguishes this solution is the following:

  • it is in a mergeable state.
  • it is very small.
  • it doesn't require much testing (it is already working very nicely on our side).
  • but it is limited to cloud storage backends (which also reduces the surface to be tested).

What about taking this in now, and then working toward a more robust solution based on the other proposals in the mid to long term?

In the end, this is the most efficient solution possible on the backends we are targeting:

  • it just moves/renames the packages into the local binary cache after having used them successfully.

No extra copies, just one fast, atomic I/O operation.

@rressi-at-globus (Author)

@BillyONeal what do you think about my last comment?

@rressi-at-globus changed the title to "ObjectStorageProvider: add write back support" Jun 18, 2025
2 participants