Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

coerce dataarg PR causing regression in utils.prepare_meta #116

Open
matthewfeickert opened this issue Feb 7, 2022 · 4 comments · May be fixed by #117
Open

coerce dataarg PR causing regression in utils.prepare_meta #116

matthewfeickert opened this issue Feb 7, 2022 · 4 comments · May be fixed by #117
Assignees
Labels

Comments

@matthewfeickert
Copy link
Member

PR #92 introduced a regression that is hitting recast-atlas and by proxy is also hitting reana-workflow-engine-yadage (c.f. reanahub/reana-workflow-engine-yadage#220 and reanahub/reana-workflow-engine-yadage#219) when it got into patch release 0.20.2.

This is showing up in calls to the steering API where metadir=None

ys = YadageSteering.create(
metadir=metadir,

def create(cls, **kwargs):
dataopts = kwargs.get("dataopts") or {}
if kwargs["dataarg"].startswith("local:"):
dataarg = kwargs["dataarg"].split(":", 1)[1]
metadir = kwargs.get("metadir")
metadir = metadir or "{}/_yadage/".format(dataarg)
if dataopts.get("overwrite") and os.path.exists(metadir):
shutil.rmtree(metadir)
else:
metadir = kwargs["metadir"]
accept_metadir = kwargs.pop("accept_metadir", False)
kw = copy.deepcopy(kwargs)
kw["metadir"] = metadir
prepare_meta(
metadir, accept_metadir
) # meta must be here because data model might store stuff here

where prepare_meta is passing a metadir of None

yadage/yadage/utils.py

Lines 222 to 229 in 333fedc

def prepare_meta(metadir, accept=False):
"""
prepare workflow meta-data directory
:param metadir: the meta-data directory name
:param accept: whether to accept an existing metadata directory
"""
if os.path.exists(metadir):

Minimal failing example

(base) feickert@ThinkPad-X1:/tmp$ pyenv virtualenv 3.9.6 venv
(base) feickert@ThinkPad-X1:/tmp$ pyenv activate venv
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip --quiet install --upgrade pip setuptools wheel
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip --quiet install 'yadage[viz]==0.20.2'
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip --quiet install recast-atlas[local]
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip list | grep 'adage\|recast\|pydot'
adage              0.10.3
pydot              1.4.2
pydotplus          2.0.2
recast-atlas       0.1.9
yadage             0.20.2
yadage-schemas     0.10.7
(venv) feickert@ThinkPad-X1:/tmp$ python -m pip list
Package            Version
------------------ ---------
adage              0.10.3
attrs              21.4.0
certifi            2021.10.8
charset-normalizer 2.0.11
checksumdir        1.2.0
click              8.0.3
decorator          5.1.1
glob2              0.7
idna               3.3
jq                 1.2.2
jsonpath-rw        1.4.0
jsonpointer        2.2
jsonref            0.2
jsonschema         4.4.0
mock               4.0.3
networkx           2.6.3
packtivity         0.14.24
pip                22.0.3
ply                3.11
psutil             5.9.0
pydot              1.4.2
pydotplus          2.0.2
pygraphviz         1.7
pyparsing          3.0.7
pyrsistent         0.18.1
PyYAML             6.0
recast-atlas       0.1.9
requests           2.27.1
setuptools         60.8.1
six                1.16.0
urllib3            1.26.8
wheel              0.37.1
yadage             0.20.2
yadage-schemas     0.10.7
(venv) feickert@ThinkPad-X1:/tmp$ recast run testing/busyboxtest --backend local --tag hello
2022-02-07 11:45:57,129 | packtivity.asyncback |   INFO | configured pool size to 12
2022-02-07 11:45:57,141 | recastatlas.subcomma |  ERROR | caught exception
Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/local.py", line 21, in run_workflow
    run_workflow(**spec)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/steering_api.py", line 19, in run_workflow
    with steering_ctx(*args, **kwargs):
  File "/home/feickert/.pyenv/versions/3.9.6/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/steering_api.py", line 89, in steering_ctx
    ys = YadageSteering.create(
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/steering_object.py", line 61, in create
    prepare_meta(
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadage/utils.py", line 229, in prepare_meta
    if os.path.exists(metadir):
  File "/home/feickert/.pyenv/versions/3.9.6/lib/python3.9/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/subcommands/run.py", line 56, in run
    run_sync(name, spec, backend=backend)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/__init__.py", line 77, in run_sync
    BACKENDS[backend].run_workflow(name, spec)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/local.py", line 23, in run_workflow
    raise FailedRunException
recastatlas.exceptions.FailedRunException
Error: Workflow failed

Example of bug location

There was a very small number of changes to yadage src between v0.20.1 and v0.20.2 (c.f.: v0.20.1...v0.20.2) and if I locally revert PR #92 and install this version in the same working Python virtual environment as above then things pass

(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ pwd
/home/feickert/Code/GitHub/yadage/yadage
(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ git log | head -n 14
commit b83c5470d8dc6762d1971b5dd704ac4a3488841f
Author: Matthew Feickert <matthew.feickert@cern.ch>
Date:   Mon Feb 7 11:35:48 2022 -0600

    Revert "coerce dataarg (#92)"
    
    This reverts commit b03a637f2afa53f5967472e5df7d17b229905e0a.

commit 333fedcc57db752d398bf1f1a63c2c166788768f
Author: Matthew Feickert <matthew.feickert@cern.ch>
Date:   Thu Feb 3 14:30:13 2022 -0600

    Bump version: 0.20.1 → 0.20.2

(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ python -m pip --quiet install --upgrade -e .
(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ python -m pip show yadage
Name: yadage
Version: 0.20.2
Summary: yadage - YAML based adage
Home-page: https://github.com/yadage/yadage
Author: Lukas Heinrich
Author-email: lukas.heinrich@cern.ch
License: UNKNOWN
Location: /home/feickert/Code/GitHub/yadage/yadage
Requires: adage, checksumdir, click, glob2, jq, jsonpath_rw, jsonpointer, jsonref, jsonschema, packtivity, psutil, pyyaml, requests, yadage-schemas
Required-by:
(venv) feickert@ThinkPad-X1:/tmp$ cd /tmp/
(venv) feickert@ThinkPad-X1:/tmp$ recast run testing/busyboxtest --backend local --tag hello
2022-02-07 12:03:09,834 | packtivity.asyncback |   INFO | configured pool size to 12
2022-02-07 12:03:10,246 |      yadage.creators |   INFO | no initialization data
2022-02-07 12:03:10,246 |    adage.pollingexec |   INFO | preparing adage coroutine.
2022-02-07 12:03:10,246 |                adage |   INFO | starting state loop.
2022-02-07 12:03:10,321 |     yadage.wflowview |   INFO | added </hello:0|defined|unknown>
2022-02-07 12:03:11,011 |     yadage.wflowview |   INFO | added </world:0|defined|unknown>
2022-02-07 12:03:11,880 |    adage.pollingexec |   INFO | submitting nodes [</hello:0|defined|known>]
2022-02-07 12:03:11,881 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 0 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2022-02-07 12:03:11,882 |      pack.hello.step |   INFO | starting file logging for topic: step
2022-02-07 12:03:13,733 |           adage.node |   INFO | node ready </hello:0|success|known>
2022-02-07 12:03:13,734 |    adage.pollingexec |   INFO | submitting nodes [</world:0|defined|known>]
2022-02-07 12:03:13,735 |      pack.world.step |   INFO | starting file logging for topic: step
2022-02-07 12:03:15,650 |           adage.node |   INFO | node ready </world:0|success|known>
2022-02-07 12:03:15,671 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2022-02-07 12:03:15,671 | adage.controllerutil |   INFO | no nodes can be run anymore and no rules are applicable
2022-02-07 12:03:15,671 |                adage |   INFO | unsubmittable: 0 | submitted: 0 | successful: 2 | failed: 0 | total: 2 | open rules: 0 | applied rules: 2
2022-02-07 12:03:18,196 |                adage |   INFO | adage state loop done.
2022-02-07 12:03:18,197 |                adage |   INFO | execution valid. (in terms of execution order)
2022-02-07 12:03:18,197 |                adage |   INFO | workflow completed successfully.
2022-02-07 12:03:18,197 |  yadage.steering_api |   INFO | done. dumping workflow to disk.
2022-02-07 12:03:18,198 |  yadage.steering_api |   INFO | visualizing workflow.
2022-02-07 12:03:18,545 | recastatlas.subcomma |   INFO | RECAST run finished.

RECAST result testing/busyboxtest recast-hello:
--------------
[]
@matthewfeickert
Copy link
Member Author

@lukasheinrich PR #92 was back in 2019, so I'm not expecting you to remember the specifics, but if we can iterate on this that would be good.

@matthewfeickert
Copy link
Member Author

It might be worth noting that PR #92 touches the source code in a small number of places

(venv) feickert@ThinkPad-X1:~/Code/GitHub/yadage/yadage$ git show b03a637 -- yadage/ | cat
commit b03a637f2afa53f5967472e5df7d17b229905e0a
Author: Lukas <lukas.heinrich@gmail.com>
Date:   Fri Nov 15 03:20:25 2019 +0100

    coerce dataarg (#92)
    
    * coerce data arg

diff --git a/yadage/manualcli.py b/yadage/manualcli.py
index 30f5bd2..c8922af 100755
--- a/yadage/manualcli.py
+++ b/yadage/manualcli.py
@@ -87,6 +87,7 @@ def init(
     if os.path.exists("input.yml") and not initfiles:
         initfiles = ("input.yml",)
     initdata = utils.getinit_data(initfiles, parameter)
+    dataarg = utils.coerce_data_arg(dataarg)
     dataopts = utils.options_from_eqdelimstring(dataopt)
 
     ys = YadageSteering.create(
@@ -534,6 +535,7 @@ def add(
     verbosity,
 ):
     parameter = utils.options_from_eqdelimstring(parameter)
+    dataarg = utils.coerce_data_arg(dataarg)
     dataopts = utils.options_from_eqdelimstring(dataopts)
     handle_common_options(verbosity)
     ys = handle_connection_options(
diff --git a/yadage/state_providers/__init__.py b/yadage/state_providers/__init__.py
index ea0840a..a2c65bd 100644
--- a/yadage/state_providers/__init__.py
+++ b/yadage/state_providers/__init__.py
@@ -80,8 +80,6 @@ def fromenv_provider(dataarg, dataopts):
 
 def state_provider_from_string(dataarg, dataopts=None):
     dataopts = dataopts or {}
-    if len(dataarg.split(":", 1)) == 1:
-        dataarg = "local:" + dataarg
     for k in providersetup_handlers.keys():
         if dataarg.startswith(k):
             return providersetup_handlers[k](dataarg, dataopts)
diff --git a/yadage/steering.py b/yadage/steering.py
index fd3a7e9..043e720 100644
--- a/yadage/steering.py
+++ b/yadage/steering.py
@@ -185,6 +185,7 @@ def main(
         enable_plugins(plugins.split(","))
 
     initdata = utils.getinit_data(initfiles, parameter)
+    dataarg = utils.coerce_data_arg(dataarg)
     dataopts = utils.options_from_eqdelimstring(dataopt)
     backendopts = utils.options_from_eqdelimstring(backendopt)
     modelopts = utils.options_from_eqdelimstring(modelopt)
diff --git a/yadage/steering_object.py b/yadage/steering_object.py
index f671c06..7d5fe49 100644
--- a/yadage/steering_object.py
+++ b/yadage/steering_object.py
@@ -45,10 +45,9 @@ class YadageSteering(object):
 
     @classmethod
     def create(cls, **kwargs):
-        dataarg = kwargs["dataarg"]
         dataopts = kwargs.get("dataopts") or {}
-        is_local_data = len(dataarg.split(":", 1)) == 1
-        if is_local_data:
+        if kwargs["dataarg"].startswith("local:"):
+            dataarg = kwargs['dataarg'].split(':',1)[1]
             metadir = kwargs.get("metadir")
             metadir = metadir or "{}/_yadage/".format(dataarg)
             if dataopts.get("overwrite") and os.path.exists(metadir):
diff --git a/yadage/utils.py b/yadage/utils.py
index 1859174..9c00c48 100644
--- a/yadage/utils.py
+++ b/yadage/utils.py
@@ -247,3 +247,9 @@ def pointerize(data, asref=False, stepid=None):
         )
 
     return data.asrefs(callback=callback)
+
+
+def coerce_data_arg(dataarg):
+    if len(dataarg.split(":", 1)) == 1:
+        dataarg = "local:" + dataarg
+    return dataarg

@matthewfeickert
Copy link
Member Author

Looking at the following diff applied to HEAD (333fedc)

$ git diff
diff --git a/yadage/steering_object.py b/yadage/steering_object.py
index 2ece5d0..93ee610 100644
--- a/yadage/steering_object.py
+++ b/yadage/steering_object.py
@@ -7,7 +7,7 @@ import shutil
 from .serialize import snapshot
 from .wflowstate import load_model_fromstring
 from .controllers import setup_controller
-from .utils import setupbackend_fromstring, prepare_meta
+from .utils import setupbackend_fromstring, prepare_meta, coerce_data_arg
 from .creators import handlers as creators
 
 log = logging.getLogger(__name__)
@@ -46,21 +46,25 @@ class YadageSteering(object):
     @classmethod
     def create(cls, **kwargs):
         dataopts = kwargs.get("dataopts") or {}
+        # kwargs["dataarg"] = coerce_data_arg(kwargs["dataarg"])
         if kwargs["dataarg"].startswith("local:"):
             dataarg = kwargs["dataarg"].split(":", 1)[1]
-            metadir = kwargs.get("metadir")
-            metadir = metadir or "{}/_yadage/".format(dataarg)
-            if dataopts.get("overwrite") and os.path.exists(metadir):
-                shutil.rmtree(metadir)
-        else:
-            metadir = kwargs["metadir"]
+
+        metadir = (
+            kwargs["metadir"]
+            if kwargs["metadir"] is not None
+            else f"{kwargs['dataarg']}/_yadage/"
+        )
+        if dataopts.get("overwrite") and os.path.exists(metadir):
+            shutil.rmtree(metadir)
+
         accept_metadir = kwargs.pop("accept_metadir", False)
 
         kw = copy.deepcopy(kwargs)
         kw["metadir"] = metadir
-        prepare_meta(
-            metadir, accept_metadir
-        )  # meta must be here because data model might store stuff here
+        # meta must be here because data model might store stuff here
+        prepare_meta(metadir, accept_metadir)
         ctrl = creators["local"](**kw)
         prepare_meta(metadir, accept=True)  # Hack in case creator deletes meta
         return cls(metadir, ctrl)

if the

        # kwargs["dataarg"] = coerce_data_arg(kwargs["dataarg"])

is uncommented so that you'd force local and end up with kw of

(Pdb) kw
{'metadir': 'local:recast-hello/_yadage/', 'dataarg': 'local:recast-hello', 'dataopts': {}, 'wflowopts': None, 'workflow_json': None, 'workflow': 'rootflow.yml', 'toplevel': 'github:yadage/yadage-workflows:testing/busybox-flow', 'schemadir': '/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadageschemas', 'validate': True, 'initdata': {}, 'modelsetup': 'inmem', 'modelopts': None, 'controller': 'frommodel', 'ctrlopts': None}

then things will run. Though if not (don't force it), then

(Pdb) kw
{'metadir': 'recast-hello/_yadage/', 'dataarg': 'recast-hello', 'dataopts': {}, 'wflowopts': None, 'workflow_json': None, 'workflow': 'rootflow.yml', 'toplevel': 'github:yadage/yadage-workflows:testing/busybox-flow', 'schemadir': '/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/yadageschemas', 'validate': True, 'initdata': {}, 'modelsetup': 'inmem', 'modelopts': None, 'controller': 'frommodel', 'ctrlopts': None}

and an error RE: dataopts will occur

Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/local.py", line 21, in run_workflow
    run_workflow(**spec)
  File "/home/feickert/Code/GitHub/yadage/yadage/yadage/steering_api.py", line 19, in run_workflow
    with steering_ctx(*args, **kwargs):
  File "/home/feickert/.pyenv/versions/3.9.6/lib/python3.9/contextlib.py", line 117, in __enter__
    return next(self.gen)
  File "/home/feickert/Code/GitHub/yadage/yadage/yadage/steering_api.py", line 89, in steering_ctx
    ys = YadageSteering.create(
  File "/home/feickert/Code/GitHub/yadage/yadage/yadage/steering_object.py", line 67, in create
    prepare_meta(metadir, accept_metadir)
  File "/home/feickert/Code/GitHub/yadage/yadage/yadage/creators.py", line 47, in local_workflows
    rootprovider = state_provider_from_string(dataarg, dataopts)
  File "/home/feickert/Code/GitHub/yadage/yadage/yadage/state_providers/__init__.py", line 86, in state_provider_from_string
    raise RuntimeError("unknown data type %s %s" % (dataarg, dataopts))
RuntimeError: unknown data type recast-hello {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/subcommands/run.py", line 56, in run
    run_sync(name, spec, backend=backend)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/__init__.py", line 77, in run_sync
    BACKENDS[backend].run_workflow(name, spec)
  File "/home/feickert/.pyenv/versions/venv/lib/python3.9/site-packages/recastatlas/backends/local.py", line 23, in run_workflow
    raise FailedRunException
recastatlas.exceptions.FailedRunException
Error: Workflow failed

@matthewfeickert
Copy link
Member Author

matthewfeickert commented Feb 17, 2022

Okay, so @lukasheinrich thinks that this isn't actually a bug but just an accidental (in that this was added 2 years ago before any release was planned) API breaking change that needs to get propagated up through recast-atlas and reana-workflow-engine-yadage. He thinks that using something like

from yadage.utils import coerce_data_arg

...

class LocalBackend:
    def run_workflow(self, name, spec):
        spec["dataarg"] = coerce_data_arg(spec["dataarg"])
...

in recastatlas/backends/local.py (or better yet during the spec creation) should fix things there and that a similar change in reana_workflow_engine_yadage/cli.py (which could either be using from yadage.utils import coerce_data_arg or just

    with steering_ctx(
        dataarg=f"local:{workflow_workspace}",
...

)

What I'm going to propose is this:

  1. Add a HEAD of dependencies CI workflow recast-hep/recast-atlas#97 gets added for easier testing.
  2. We make a branch in yadage where we revert PR coerce dataarg #92 so there isn't an API breaking change in it (that should be the only thing that could cause one given the small number of things that happened between v0.20.1 and v0.20.2), and then make yadage release v0.20.3 off of that branch so that we can get the visualization fixes. That branch is just a release branch for v0.20.X bug fixes into the future.
  3. We test v0.20.3 against recast-atlas and reana-workflow-engine-yadage and once that is working we make a PR that is the equivalent of setup: Set compatible releases for 'yadage' extra to be support 'yadage[viz]' reanahub/reana-commons#333.
  4. We then release yadage v0.21.0 which is just the same state as v0.20.2 was, but now we know it has an API breaking change so we can propagate out the effects to recast-atlas and reana-workflow-engine-yadage in a tested and controlled manner.

Thoughts and feedback welcome from @lukasheinrich and @mvidalgarcia on this plan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants