dvc pull: failed to pull data from the Azure Blob in GitHub Actions #6899

@hosszubalazs

Bug Report

Description

dvc pull fails in GitHub Actions with an authentication error when accessing Azure Blob Storage. The same workflow worked yesterday, and dvc pull still works correctly on my dev machine.
In GitHub Actions I receive the following error message:
ERROR: failed to pull data from the cloud - Authentication to Azure Blob Storage via account key failed. Learn more about configuration settings at <https://man.dvc.org/remote/modify>: unable to connect to account for Unable to create async transport. Please check aiohttp is installed.

Reproduce

GitHub Actions config to reproduce

on: [push]
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: '3.8' 
          architecture: 'x64' 
      - uses: iterative/setup-cml@v1
      - name: PiP Install
        run: pip install -r requirements.txt
      - name: Train model
        run: |
          dvc doctor
          dvc pull

Expected

dvc pull succeeds, downloading my output files configured in the dvc pipeline file. Running dvc repro after this also succeeds, skipping all stages since the latest output is available.

Environment information

The issue only reproduces in GitHub Actions. It seemingly started today without any change on my side: the same auth setup worked yesterday in Actions, where I had several successful pipeline runs and PRs.
Azure Blob Storage is configured in .dvc/config with account_name and account_key. I can verify that it works locally by deleting .dvc/cache and running dvc pull: the cache is recreated correctly. Rerunning dvc pull locally results in Everything is up to date. dvc push also works.

Output of dvc doctor:

# In GitHub Actions where the problem occurs
$ dvc doctor
DVC version: 2.8.2 (pip)
---------------------------------
Platform: Python 3.8.12 on Linux-5.8.0-1042-azure-x86_64-with-glibc2.2.5
Supports:
	azure (adlfs = 2021.10.0, knack = 0.8.2, azure-identity = 1.7.0),
	gdrive (pydrive2 = 1.10.0),
	webhdfs (fsspec = 2021.10.1),
	http (aiohttp = 3.8.0, aiohttp-retry = 2.4.6),
	https (aiohttp = 3.8.0, aiohttp-retry = 2.4.6)
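
Since the error asks to check that aiohttp is installed while dvc doctor reports aiohttp = 3.8.0, it may help to compare the exact package versions between the CI runner and the dev machine. The following is a minimal sketch for that comparison, not part of DVC. The helper names `installed_versions` and `import_error` are hypothetical, and the package list is an assumption based on the dvc doctor output plus azure-core and azure-storage-blob, which adlfs depends on.

```python
import importlib
from importlib import metadata  # in the standard library since Python 3.8


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def import_error(module_name):
    """Return None if the module imports cleanly, else a short error string."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # report any import-time failure, not only ImportError
        return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    # Assumed list of packages worth comparing; adjust to your requirements.txt.
    packages = [
        "dvc",
        "adlfs",
        "fsspec",
        "aiohttp",
        "aiohttp-retry",
        "azure-identity",
        "azure-core",
        "azure-storage-blob",
    ]
    for name, version in installed_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
    # Check that aiohttp itself can be imported in this environment.
    print("import aiohttp:", import_error("aiohttp") or "ok")
```

Running this script both locally and as an extra step in the workflow, then diffing the two outputs, should show whether a dependency resolved to a different version in GitHub Actions than on the dev machine.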

Additional Information (if any):

Labels

A: data-sync (Related to dvc get/fetch/import/pull/push)
fs: azure (Related to the Azure filesystem)
upstream (Issues which need to be resolved in an upstream dependency)