Skip to content

Conversation

@ghost
Copy link

@ghost ghost commented Oct 17, 2019

Fix: #1654

Setup:

#!/usr/bin/env bash

set -x

export AWS_ACCESS_KEY_ID='testing'
export AWS_SECRET_ACCESS_KEY='testing'
export AWS_SECURITY_TOKEN='testing'
export AWS_SESSION_TOKEN='testing'

moto_server s3 &> /dev/null &

# File structure:
#       dvc-temp
#       ├── data
#       │  ├── alice
#       │  ├── alpha
#       │  └── subdir
#       │     ├── 1
#       │     ├── 2
#       │     └── 3
#       ├── empty_dir
#       ├── empty_file
#       └── foo

python -c '
import boto3

session = boto3.session.Session()
s3 = session.client("s3", endpoint_url="http://localhost:5000")
s3.create_bucket(Bucket="dvc-temp")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/cache/")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/empty_dir/")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/empty_file", Body=b"")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/foo", Body=b"foo")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/data/alice", Body=b"alice")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/data/alpha", Body=b"alpha")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/data/subdir/1", Body=b"1")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/data/subdir/2", Body=b"2")
s3.put_object(Bucket="dvc-temp", Key="fix-1654/data/subdir/3", Body=b"3")
'

repository=$(mktemp -d)
cd $repository
dvc init --no-scm
dvc remote add s3 s3://dvc-temp/fix-1654
dvc remote modify s3 endpointurl http://localhost:5000

dvc remote add cache remote://s3/cache
dvc config cache.s3 cache

dvc run -d remote://s3/data 'echo something'

Notes:

  • head_object doesn't work with directories, only empty ones
  • list_objects doesn't work with empty directories
  • aws s3 cp doesn't work with directories, you need to use --recursive
  • dvc run -d remote:// doesn't look up if cache is setup (as -o), but directories uses cache

TODO:

  • Change walk implementation to walk_files
  • Write unit tests for new implemented S3 methods
  • Address review comments

Copy link
Contributor

@Suor Suor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, there is no tests here.

@ghost ghost changed the title [WIP] s3: support for directories s3: support for directories Oct 23, 2019


def test_isdir(s3, remote):
assert remote.isdir(remote.path_info / "data")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could totally use parametrize here instead of copypasting. 🙂

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Forgot about parametrize 👌

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, parametrize is 1 and a half second slower 😓

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MrOutis this is because all the fixtures are rebuilt every time. Use for loop inside single test instead. Moreover, I would reuse the same list for fixture and tests as far as that is possible.

@efiop
Copy link
Contributor

efiop commented Oct 24, 2019

@MrOutis Tests are still failing on py2 due to unicode error. Just a heads up 🙂

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@efiop efiop merged commit 27b63e2 into treeverse:master Oct 25, 2019
@ghost ghost deleted the fix-1654 branch October 25, 2019 01:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants