Closed
Labels
fs: s3 (Related to the S3 filesystem)
Bug Report
Description
Here is a GitHub Action that ran:
dvc import-url s3://tripdata/202505-citibike-tripdata.zip s3/tripdata/202505-citibike-tripdata.zip
# Importing 's3://tripdata/202505-citibike-tripdata.zip' -> 's3/tripdata/202505-citibike-tripdata.zip'
dvc import-url s3://tripdata/JC-202505-citibike-tripdata.csv.zip s3/tripdata/JC-202505-citibike-tripdata.csv.zip
# Importing 's3://tripdata/JC-202505-citibike-tripdata.csv.zip' -> 's3/tripdata/JC-202505-citibike-tripdata.csv.zip'
dvc push
# 2 files pushed
However, the first imported file (s3/tripdata/202505-citibike-tripdata.zip) ended up truncated in my S3 remote cache.
I backed up the truncated blob with a .bad suffix, and then manually fixed the blob in the cache (with aws s3 cp, dvc add, dvc push):
aws s3 ls s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e
# 2025-06-06 19:44:58 844607858 880ca091cc946d563ea4b115ec443e
# 2025-06-06 19:39:50 838860800 880ca091cc946d563ea4b115ec443e.bad
Verifying that 9e/880ca091cc946d563ea4b115ec443e.bad is a prefix of the full blob:
aws s3 cp s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e.bad - | md5sum
# ef7b7328a690dfdc9858c2da4cad9f41 -
bad_size="$(aws s3 ls s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e.bad | awk '{print $3}')"; echo $bad_size
# 838860800
aws s3 cp s3://ctbk/.dvc/files/md5/9e/880ca091cc946d563ea4b115ec443e - 2>/dev/null | head -c "$bad_size" | md5sum
# ef7b7328a690dfdc9858c2da4cad9f41 -
Reproduce
I'm guessing it was a transient issue in my GHA run. I haven't tried to reproduce it.
I'm not sure which one failed here:
- It could be that import-url failed, and dvc push happily pushed the truncated blob.
- Or import-url may have been fine, but push silently failed to complete.
Expected
If import-url or push fails to import or push a full file, the command should exit non-zero, and some errors should be logged.
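In the meantime, a workflow could guard against this kind of truncation itself. Below is a minimal sketch, assuming the remote uses the files/md5/<first two hex chars>/<remaining chars> layout seen in the paths above. The helper names oid_from_cache_path and verify_blob are hypothetical, not part of DVC.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DVC): check a downloaded blob against
# the MD5 encoded in its cache path.

# Rebuild the full MD5 from a cache path of the form .../files/md5/9e/880ca0...
oid_from_cache_path() {
  local path="$1"
  local rest="${path##*/}"     # last path component: hash minus first two chars
  local dir="${path%/*}"       # everything before the last component
  local prefix="${dir##*/}"    # two-character directory name
  printf '%s%s\n' "$prefix" "$rest"
}

# Return non-zero (and log to stderr) if the file's MD5 differs from the expected one.
verify_blob() {
  local file="$1" expected="$2"
  local actual
  actual="$(md5sum "$file" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "md5 mismatch for $file: expected $expected, got $actual" >&2
    return 1
  fi
}
```

A job step might download the blob to a temporary file with aws s3 cp and then call verify_blob "$tmp" "$(oid_from_cache_path "$key")", so that a truncated upload fails the run instead of passing silently.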
Environment information
You can see everything in the GHA:
- ubuntu-latest
- pip install output shows dvc-3.59.2, etc.