This repo contains tools to efficiently copy, remove and link large filesets.
- rcp is a tool for copying files; similar to cp but generally MUCH faster when dealing with a large number of files. Inspired by tools like dsync (1) and pcp (2).
- rrm is a tool for removing files. Basic usage is equivalent to rm -rf.
- rlink allows hard-linking files. A common pattern is to also provide --update <path> that overrides any paths in src to instead be copied over from there.
- rcmp is a tool for comparing filesets. Currently, it only supports comparing metadata (no content checking). Returns error code 1 if there are differences, 2 if there were errors.
> rcp <foo> <bar> --progress --summary
Roughly equivalent to: cp -R --update=none <foo> <bar>.
> rcp <foo> <bar> --preserve --progress --summary --overwrite
Roughly equivalent to: cp -pR <foo> <bar>.
The progress bar is sent to stderr while log messages go to stdout. This allows us to redirect stdout to a file to preserve the tool output while still viewing the interactive progress bar. This works for all RCP tools.
> rcp <foo> <bar> --progress --summary > copy.log
> rrm <bar> --progress --summary
Roughly equivalent to: rm -rf <bar>.
> rlink <foo> <bar> --progress --summary
Roughly equivalent to: cp -p --link <foo> <bar>.
> rlink <foo> --update <bar> <baz> --update-exclusive --progress --summary
Using --update-exclusive means that if a file is present in <foo> but not in <bar>, it will be ignored.
Roughly equivalent to: rsync -a --link-dest=<foo> <bar> <baz>.
> rcmp <foo> <bar> --progress --summary --log compare.log
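The exit codes documented for rcmp (1 for differences, 2 for errors) lend themselves to scripting. The following is a minimal sketch, not part of the tools themselves: compare_filesets is a hypothetical helper, the RCMP variable is an assumption added so the command can be substituted, and treating exit code 0 as "no differences" is inferred rather than stated above.

```shell
# Hypothetical wrapper around rcmp that maps its exit code to a message.
# RCMP may be set to an alternative command; it defaults to rcmp.
compare_filesets() {
    status=0
    "${RCMP:-rcmp}" "$@" || status=$?
    case "$status" in
        0) echo "no differences" ;;          # assumed meaning of exit code 0
        1) echo "differences found" ;;       # documented: 1 = differences
        2) echo "errors during comparison" ;; # documented: 2 = errors
        *) echo "unexpected exit code: $status" ;;
    esac
    return "$status"
}
```

A call such as compare_filesets <foo> <bar> --log compare.log passes all arguments through to rcmp unchanged.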
All tools are available via nixpkgs under the rcp package name.
The following command will install all the tools on your system:
> nix-env -iA nixpkgs.rcp
Starting with release v0.10.1, .deb and .rpm packages are available as part of each release.
The copy semantics for RCP tools differ slightly from how e.g. the cp tool works. This is because of the ambiguity in the result of a cp operation that we wanted to avoid.
Specifically, the result of cp foo/x bar/x depends on whether bar/x is a directory. If so, the resulting path will be bar/x/x (which is usually undesired), otherwise it will be bar/x.
To avoid this confusion, RCP tools:
- will NOT overwrite data by default (use --overwrite to change)
- assume that a path WITHOUT a trailing slash is the final name of the destination, and
- treat a path ending in a slash as a directory into which we want to copy the sources (without renaming)
The following examples illustrate this (these rules apply to both rcp and rlink):
- rcp A/B C/D - copy A/B into C/ and name it D; if C/D exists, fail immediately
- rcp A/B C/D/ - copy B into D WITHOUT renaming, i.e., the resulting path will be C/D/B; if C/D/B exists, fail immediately
Using rcp it's also possible to copy multiple sources into a single destination, but the destination MUST have a trailing slash (/):
- rcp A B C D/ - copy A, B and C into D WITHOUT renaming, i.e., the resulting paths will be D/A, D/B and D/C; if any of them exists, fail immediately
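The trailing-slash rule above can be expressed as a small path computation. The sketch below is only an illustration of the documented rule, not the tools' actual implementation; resolve_dest is a hypothetical helper name, and it only computes the resulting path without touching the filesystem.

```shell
# Hypothetical helper: print the path a source would end up at,
# following the trailing-slash rule described above.
resolve_dest() {
    src=$1
    dst=$2
    case "$dst" in
        # Destination ends in a slash: copy INTO it, keeping the source name.
        */) printf '%s%s\n' "$dst" "$(basename "$src")" ;;
        # No trailing slash: the destination is the final name.
        *)  printf '%s\n' "$dst" ;;
    esac
}
```

For example, resolve_dest A/B C/D yields C/D, while resolve_dest A/B C/D/ yields C/D/B.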
- set --ops-throttle to reduce the maximum number of operations per second
  - useful if you want to avoid interfering with other work on the storage / host
- set --max-open-files to reduce the maximum number of open files
  - RCP tools will automatically adjust the maximum based on the system limits; however, this setting can be used if there are additional constraints
- rcp tools will log non-terminal errors and continue
- to fail immediately on any error use the --fail-early flag
Log messages
- sent to stdout
- by default only errors are logged
- verbosity controlled using -v/-vv/-vvv for INFO/DEBUG/TRACE and -q/--quiet to disable
Progress
- sent to stderr (both ProgressBar and TextUpdates)
- by default disabled
- enabled using -p/--progress with optional --progress-type=... override
Summary
- sent to stdout
- by default disabled
- enabled using --summary
rcp tools will not overwrite pre-existing data unless used with the --overwrite flag.
The rcp tools now use the tracing crate for logging and support sending data to the tokio-console subscriber.
To enable the console-subscriber, set the environment variable RCP_TOKIO_TRACING_CONSOLE_ENABLED=1 (or true, in any letter case).
By default port 6669 is used (the tokio-console default), but this can be changed by setting RCP_TOKIO_TRACING_CONSOLE_SERVER_PORT=1234.
The trace events are retained for 60s. This can be modified by setting RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS=120.
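Put together, a session with the console subscriber enabled might be set up as sketched below. The values are the example values from the text above; the commented-out commands are placeholders, and connecting tokio-console to 127.0.0.1 assumes it runs on the same host as the tool.

```shell
# Enable the tokio-console subscriber for RCP tools started from this shell.
export RCP_TOKIO_TRACING_CONSOLE_ENABLED=1
# Listen on a non-default port (the default is 6669).
export RCP_TOKIO_TRACING_CONSOLE_SERVER_PORT=1234
# Retain trace events for 120s instead of the default 60s.
export RCP_TOKIO_TRACING_CONSOLE_RETENTION_SECONDS=120

# Then run a tool as usual, for example:
#   rcp <foo> <bar> --progress --summary
# and, in another terminal, attach the console (assumes the same host):
#   tokio-console http://127.0.0.1:1234
```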