Description
The dvc get --show-url command returns the remote storage location of files/directories that were added to a DVC repo from the local machine.
However, this feature is currently not available for files/directories that were added directly from S3 using dvc add s3://....
Scenario:
My source data lives in the S3 bucket s3://data_bucket/. I configured another S3 bucket (s3://dvc_bucket) as the DVC remote.
The first time, I add a file to the DVC repo:
dvc add s3://data_bucket/dataset1.csv -f dataset1.csv.dvc
Later, we might receive new data at a different location in the S3 bucket, for example s3://data_bucket/dir1/dataset1.csv. We then add the new version of the data to the DVC repo:
dvc add s3://data_bucket/dir1/dataset1.csv
Once the data has been added to DVC, the source data (i.e. the files/directories in s3://data_bucket) might be deleted.
So I am looking for a way to get the remote storage URL, i.e. the location in s3://dvc_bucket, for files added from s3://data_bucket, so that this URL can be used in later steps of our pipeline without worrying about the source data location.
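To illustrate the kind of URL I would like to get back, here is a rough sketch in Python. It is not an existing DVC API: remote_object_url is a hypothetical helper, the hash value is made up, and the layout (first two hash characters as a directory, the rest as the object name) is an assumption that may differ between DVC versions. Whether the object is actually stored in s3://dvc_bucket for data added from S3 is exactly what this issue is asking about.

```python
def remote_object_url(remote_url: str, file_hash: str) -> str:
    """Build an object URL under a content-addressed remote.

    Assumption (not confirmed by DVC for this case): objects are laid out as
    <remote>/<first two hash chars>/<remaining hash chars>.
    """
    if len(file_hash) < 3:
        raise ValueError("hash is too short to split into prefix and remainder")
    # Strip a trailing slash so the result has no doubled separators.
    return f"{remote_url.rstrip('/')}/{file_hash[:2]}/{file_hash[2:]}"


# Hypothetical hash, standing in for the value recorded in dataset1.csv.dvc.
example_hash = "0123456789abcdef0123456789abcdef"
print(remote_object_url("s3://dvc_bucket", example_hash))
```

Ideally dvc get --show-url would return such a location directly, so that nobody has to rely on the internal layout of the remote.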
More context:
https://discordapp.com/channels/485586884165107732/485596304961962003/705308604257009766
Thank you!