Get Remote Storage URL for files/directories added directly from S3 #3714

@AratiNagmal

The dvc get --show-url command returns the remote storage location of files/directories that were added to a DVC repo from a local machine.
However, this feature is currently not available for files/directories that were added directly from S3 using dvc add s3://....

Scenario:
My source data lives in the S3 bucket s3://data_bucket/. I configured another S3 bucket (s3://dvc_bucket) as the DVC remote.

The first time, I add a file to the DVC repo:

dvc add s3://data_bucket/dataset1.csv -f dataset1.csv.dvc

We might later receive new data at a different location in the S3 bucket, for example s3://data_bucket/dir1/dataset1.csv. We then add the new version of the data to the DVC repo:

dvc add s3://data_bucket/dir1/dataset1.csv

Once the data is added to DVC, the source data (i.e. files/directories on s3://data_bucket) might be cleared.

So I am looking for a way to get the remote storage URL, i.e. the location in s3://dvc_bucket, for the files added from s3://data_bucket, so that this URL can be used in later steps of our pipeline without worrying about the source data location.
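
To illustrate the kind of lookup being requested, here is a minimal sketch that builds such a URL by hand. It rests on assumptions not stated above: that the remote uses a content-addressed layout in which the first two characters of the md5 recorded in the .dvc file form a directory and the remaining characters form the object name, and that the data has actually been stored in s3://dvc_bucket. The layout may differ between DVC versions, the function name remote_object_url is hypothetical (it is not part of DVC), and the md5 value is purely illustrative.

```python
def remote_object_url(remote_base: str, md5: str) -> str:
    """Build the URL of a content-addressed object under remote_base.

    ASSUMPTION: objects are laid out as <remote>/<md5[:2]>/<md5[2:]>.
    This is a hypothetical helper, not a DVC API, and the real layout
    may differ depending on the DVC version in use.
    """
    checksum = md5.strip()
    if len(checksum) < 3:
        raise ValueError("checksum is too short to split into prefix and name")
    # Drop any trailing slash so the joined URL has single separators.
    base = remote_base.rstrip("/")
    return f"{base}/{checksum[:2]}/{checksum[2:]}"


if __name__ == "__main__":
    # Illustrative md5 only; in practice it would be read from dataset1.csv.dvc.
    print(remote_object_url("s3://dvc_bucket", "d41d8cd98f00b204e9800998ecf8427e"))
```

A built-in equivalent would avoid hard-coding the layout, which is why the request is for dvc get --show-url to cover this case rather than for a manual workaround.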

More context:

https://discordapp.com/channels/485586884165107732/485596304961962003/705308604257009766

Thank you!
