perf: copy cache operations very slow on network drive #3084

@rxxg

Description

Execute the following PowerShell script on a Windows network drive:

mkdir repo
cd repo
git init --quiet
dvc init -q
dvc config cache.type copy
measure-command { fsutil file createnew data 134217728 }
# Probably less than 1 second
measure-command { dvc add data }
# Around 30 seconds on our setup

Digging into the problem a little, it seems that the cache copy operations end up in shutil.copyfileobj, which reads the file to be copied into memory in 16 KB chunks before writing each chunk out again. For the 128 MB file above that means 8192 separate read/write pairs, each of which can incur a network round trip. Unless network latency is very low, this is always going to be a performance killer.

The situation might be better with Python 3.8 (#3033), but it would be good to ease the pain until DVC supports that version, and even then some users will not be able to upgrade straight away.
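
As a possible stopgap, here is a minimal sketch of copying with a larger buffer. It relies only on the optional `length` argument of `shutil.copyfileobj`. The function name `copy_with_large_buffer` and the 1 MiB buffer size are assumptions chosen for illustration. This is not DVC's actual code, and the best buffer size probably depends on the network.

```python
import shutil

# Hypothetical buffer size for illustration; tune it for your network.
LARGE_BUFSIZE = 1024 * 1024  # 1 MiB instead of the 16 KB chunks described above


def copy_with_large_buffer(src, dst, bufsize=LARGE_BUFSIZE):
    """Copy the file at src to dst, moving bufsize bytes per read/write."""
    with open(src, "rb") as fsrc, open(dst, "wb") as fdst:
        # The third argument sets the chunk size used by copyfileobj.
        shutil.copyfileobj(fsrc, fdst, bufsize)
```

With a 1 MiB buffer, the 128 MB file from the script would be copied in 128 chunks rather than 8192. Whether that closes the gap on a given network drive would need to be measured.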
