dvc add --external fails on hdfs directory #4332

@huskykurt

Description

Bug Report

dvc add --external hdfs://... fails when the target is a directory.

Please provide information about your setup

dvc add --external hdfs://...

Output of dvc version:

$ dvc version
DVC version: 1.3.1 (pip)
---------------------------------
Platform: Python 3.8.2 on Linux-3.10.0-1127.el7.x86_64-x86_64-with-glibc2.17
Supports: hdfs, http, https
Cache types: hardlink, symlink
Repo: dvc, git

Additional Information (if any):

If applicable, please also provide a --verbose output of the command, e.g. dvc add --verbose.

2020-08-04 17:26:41,289 DEBUG: fetched: [(3,)]
Adding...SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/hanmail/connex/opt/hadoop-2.6.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hanmail/connex/opt/hadoop-2.6.0/share/hadoop/httpfs/tomcat/webapps/webhdfs/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/hanmail/connex/opt/hadoop-2.6.0/share/hadoop/kms/tomcat/webapps/kms/WEB-INF/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Adding...
2020-08-04 17:26:44,920 DEBUG: fetched: [(4,)]
2020-08-04 17:26:44,922 ERROR: hdfs command 'hadoop fs -checksum hdfs://search-hammer-analyzer1.dakao.io:9000/output/kurt/dvc_test_source/test1/' finished with non-zero return code 1': b"checksum: `hdfs://search-hammer-analyzer1.dakao.io:9000/output/kurt/dvc_test_source/test1': Is a directory\n"
------------------------------------------------------------
Traceback (most recent call last):
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/command/add.py", line 17, in run
    self.repo.add(
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/repo/__init__.py", line 34, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/repo/add.py", line 90, in add
    stage.save()
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/stage/__init__.py", line 380, in save
    self.save_outs()
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/stage/__init__.py", line 391, in save_outs
    out.save()
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/output/base.py", line 279, in save
    if not self.changed():
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/output/base.py", line 221, in changed
    status = self.status()
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/output/base.py", line 218, in status
    return self.workspace_status()
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/output/base.py", line 206, in workspace_status
    if self.changed_checksum():
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/output/base.py", line 194, in changed_checksum
    return self.checksum != self.get_checksum()
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/output/base.py", line 180, in get_checksum
    return self.tree.get_hash(self.path_info)
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/tree/base.py", line 268, in get_hash
    hash_ = self.get_file_hash(path_info)
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/tree/hdfs.py", line 167, in get_file_hash
    stdout = self.hadoop_fs(
  File "/hanmail/.pyenv/versions/3.8.2/envs/dvc/lib/python3.8/site-packages/dvc/tree/hdfs.py", line 155, in hadoop_fs
    raise RemoteCmdError(self.scheme, cmd, p.returncode, err)
dvc.tree.base.RemoteCmdError: hdfs command 'hadoop fs -checksum hdfs://.../test1/' finished with non-zero return code 1': b"checksum: `hdfs://.../test1': Is a directory\n"
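The traceback shows `get_file_hash()` in `dvc/tree/hdfs.py` shelling out to `hadoop fs -checksum`, which only accepts files; handed a directory, it exits non-zero with "Is a directory". Hashing a directory instead means walking it and aggregating per-file checksums, roughly the way DVC builds a `.dir` manifest for local outputs. A minimal local sketch of that aggregation (hypothetical helper names, not DVC's actual code):

```python
import hashlib
import os


def file_md5(path):
    """MD5 of a single file's contents (what a per-file checksum call covers)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def dir_hash(root):
    """Aggregate per-file hashes into a single directory hash.

    Hashes every file under root, then hashes the sorted
    (relative path, file hash) listing so the result is stable
    regardless of traversal order.
    """
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            entries.append((rel, file_md5(full)))
    entries.sort()
    listing = "".join(f"{rel}:{md5}\n" for rel, md5 in entries)
    return hashlib.md5(listing.encode()).hexdigest()
```

On HDFS the per-file step would be the `hadoop fs -checksum` call (or a listing via the HDFS client), applied file by file rather than to the directory itself.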

Metadata

Labels

bug (Did we break something?)
p2-medium (Medium priority, should be done, but less important)
research
