Define metadata tree root #26

Closed
AloisMahdal opened this issue May 15, 2018 · 17 comments

@AloisMahdal
Contributor

This is probably more of a design issue, so I understand if there is some resistance, but I think I can justify it.

It seems that fmf does not have a concept of a whole item tree but rather tends to just discover items in all subdirectories. This is nice and simple, but it creates the problem that items don't really have "identity" and the meta-data actually depends on where I ask.

For example, I have a repository like this:

.
└── demo
    ├── foo
    │   └── main.fmf
    └── main.fmf

In the top main.fmf, I have

project: demo
maintainer: joe

and in the foo/main.fmf

description: The Foo

Now I get different results about foo depending on where I ask:

$ fmf
fmf-tree/demo/foo
description: The Foo
maintainer: joe
project: demo
$ cd demo
$ fmf
demo/foo
description: The Foo
maintainer: joe
project: demo
$ cd foo
$ fmf
foo
description: The Foo

If I have inheritance, doesn't it imply that I already treat my items as a tree? If so, shouldn't they always be resolved as part of the same tree?

Or, from another angle: why can't I "ask" for the complete set of meta-data when I'm in the directory of a leaf item?
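
To make the difference concrete, here is a minimal sketch of top-down inheritance over an in-memory copy of the example above. It is not fmf's actual code: `FILES` and `resolve` are made-up names, and the merge is assumed to be a plain dictionary update in which children override parents.

```python
from pathlib import PurePosixPath

# In-memory stand-in for the main.fmf files from the example above.
FILES = {
    "demo": {"project": "demo", "maintainer": "joe"},
    "demo/foo": {"description": "The Foo"},
}


def resolve(start, item):
    """Merge data from every directory between start and item (inclusive).

    Hypothetical helper: children override parents via dict.update().
    """
    start, item = PurePosixPath(start), PurePosixPath(item)
    # Directories from start down to item, parents first.
    chain = [p for p in reversed([item, *item.parents])
             if p == start or start in p.parents]
    data = {}
    for directory in chain:
        data.update(FILES.get(str(directory), {}))
    return data


from_top = resolve("demo", "demo/foo")       # asked from demo/
from_leaf = resolve("demo/foo", "demo/foo")  # asked from demo/foo/
```

Here `from_top` carries `project` and `maintainer` in addition to `description`, while `from_leaf` only has `description`, which is the same discrepancy as in the shell session above.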

@AloisMahdal
Contributor Author

AloisMahdal commented May 15, 2018

A more concrete example is my jats-demo test suite (with a structure very similar to the example above).

The thing is that I want to implement a test runner that is aware of fmf, so before it launches a test, it populates the environment with some of the meta-data (e.g. test deps, version, id). A user of my tool is supposed to clone the test suite and run a test, which could look like this:

git clone ...
cd jats-demo
jattool runtest src/foo

and jattool will get all the data, etc. But what if the user does this:

git clone ...
cd jats-demo
cd src
jattool runtest foo

or

cd src/foo
jattool runtest .

OK, so what I need to do is make sure that when jattool internally calls fmf, it makes sure it does so only from the top dir.

Fair enough.

Anyway, I wonder if this shouldn't rather be fmf's responsibility?

@AloisMahdal
Contributor Author

A possible solution would be to introduce a root directory called eg. .fmf.

Behavior of fmf would then be somewhat similar to git:

  • When called, it would always look "upwards" for a directory named .fmf.

  • If that was not found, it would simply fail.

  • If that was found, it would include all main.fmf's on the way from the root to the directory where it was called.

  • Then it would actually start looking into subdirs, etc.
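
A minimal sketch of the upward search described in the list above, assuming the root is marked by a directory named `.fmf` and that every `main.fmf` between the root and the starting directory takes part in inheritance. The function name `metadata_chain` is hypothetical and not part of fmf.

```python
from pathlib import Path


def metadata_chain(start):
    """Return (root, main.fmf files from the tree root down to start).

    Walks upwards from start until a directory containing `.fmf` is
    found; raises if the filesystem root is reached first.
    """
    current = Path(start).resolve()
    chain = []
    while True:
        candidate = current / "main.fmf"
        if candidate.is_file():
            chain.append(candidate)
        if (current / ".fmf").is_dir():
            # Root found: report files root-first, as they would be merged.
            return current, list(reversed(chain))
        if current.parent == current:
            raise FileNotFoundError(f"no .fmf directory above {start}")
        current = current.parent
```

Called from `demo/foo` in the first example (with `demo/.fmf` added), this would return `demo` as the root and the two files `demo/main.fmf` and `demo/foo/main.fmf`, in that order.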

@jkrysl
Contributor

jkrysl commented May 16, 2018

A possible solution would be to introduce a root directory called eg. .fmf.

For my use case this change would mean I have to rewrite quite a big part of my infrastructure, as I would have to change the tree to have a .fmf directory in the root of my tests. I really do not want that...

What I am doing is having this 'root' directory specified in my tests and building the FMF tree from this directory. This ensures I always have the complete tree. Then I have a parameter to restrict running tests to only some subdirectory, but this is already outside of FMF.

Also for debugging reasons I'm calling FMF from different subdirectories to see where I am getting the data from. This would not be possible with your change.

@AloisMahdal
Contributor Author

Having to specify the meta-data location in my tests seems to kinda beat the point of having fmf; at least it would not be very independent. I mean, every time I move a test within the suite (eg. to a sub-dir) or move the whole suite, I'd have to update the test or I'd get incorrect data (which might or might not present itself immediately).

About the debugging: that use case could be fixed by introducing an envvar that would override the root: this is also what git already does. Oh and also I think there was another PR/issue that addressed this.

@jkrysl
Contributor

jkrysl commented May 22, 2018

I like the idea with the envvar for debugging. Given that, mostly mimicking what git does seems reasonable, including the .fmf directory, which we can use to store some fmf configuration in the future - configuration like the attribute specifications that are being discussed.

@psss
Collaborator

psss commented May 22, 2018

Thanks for the idea, Alois. I see the use case. And the git-like approach with the .fmf directory seems fine to me. However there are still some questions to be answered:

The Tree class always expects to receive the root directory of the metadata tree. What the fmf command line tool does is that it uses the first argument or defaults to the current directory if no path is given. In both cases it prints all leaf objects available in the metadata tree.

I can imagine we change the default behaviour of the fmf command and instead of using the current working directory it would search for the .fmf directory or the top-most main.fmf file in the current directory structure.

Would that cover your use case? Note that you always get all objects, that the directory structure does not have to map 1:1 to the metadata tree structure, and that you can also directly use the Python API for implementing your custom scenarios.

@psss self-assigned this May 22, 2018
@psss added the enhancement label May 22, 2018
@AloisMahdal
Contributor Author

I can imagine we change the default behaviour of the fmf command and instead of using the current working directory it would search for the .fmf directory or the top-most main.fmf file in the current directory structure.

But what is the "current directory structure"?

My idea (git-like) was that in case no arguments (or envvar overrides) were given, fmf would look for:

  • .fmf in $PWD,
  • .fmf in $PWD/..,
  • etc, up to / or filesystem boundary.
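
The lookup order above, including the envvar override and the filesystem boundary, could be sketched like this. The variable name `FMF_ROOT` is purely an assumption made for the example (no such setting is defined in this thread), and comparing `st_dev` of a directory and its parent is just one common way to notice a mount point.

```python
import os
from pathlib import Path

ROOT_ENVVAR = "FMF_ROOT"  # hypothetical name, not an existing fmf setting


def discover_root(start=".", environ=os.environ):
    """Find the tree root: envvar override first, then search upwards."""
    override = environ.get(ROOT_ENVVAR)
    if override:
        return Path(override)
    current = Path(start).resolve()
    device = current.stat().st_dev
    while True:
        if (current / ".fmf").is_dir():
            return current
        parent = current.parent
        # Stop at "/" or when the parent lives on another filesystem.
        if parent == current or parent.stat().st_dev != device:
            raise FileNotFoundError("not inside an fmf tree")
        current = parent
```

Passing `environ` explicitly keeps the function easy to exercise without touching the real environment.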

I guess that problem with looking for main.fmf upwards is that you might have scenarios like this:

 .
├── bar
│   ├── baz
│   │   └── main.fmf
│   └── main.fmf
└── foo
    └── main.fmf

ie. you do make use of main.fmf and inheritance, you just don't have an obvious "root main.fmf". Now if you call fmf from bar/ or foo/, you get a different set of items, but once you add "root main.fmf", the behavior changes.

Another case could be when you have a set of independent projects:

.
└── storage
    ├── proj1
    │   └── main.fmf
    ├── proj2
    │   └── main.fmf
    └── proj3
        └── main.fmf

At this point, what if somebody accidentally created main.fmf in the storage dir (or anywhere upper in the hierarchy)?

.fmf dir IMO solves both of those cases, plus allows for individual leaves to have the same identity (ie. relpath from root vs. from $PWD)---which is my original point.

I guess the greater principle is, that if you want to have inheritance and integrity, the scope should be specified more explicitly and in a way that it's safely within the control of the project.

@AloisMahdal
Contributor Author

@jkrysl ...

[...] the .fmf directory, which we can use to store some fmf configurations in the future - configurations like attribute specifications that is being discussed

Yep, I think first candidate could be some format version file, ie. a version encapsulating version of whole fmf specification -- similar to debian/compat. Including "something" would avoid the inconvenience caused by the fact that git does not store empty dirs. (And IMHO the thing about versions is that if you have good versioning strategy it's never too soon to start versioning.)

@jscotka
Collaborator

jscotka commented May 23, 2018

I've created a separate issue with a very similar concept, #29, where the existence of .fmf is implied by the first existing *.fmf file in the tree. I did it this way in my PoC for metadata and it worked well.

But as @AloisMahdal mentioned two comments back, using an explicit .fmf dir or file helps to precisely identify the root and avoid unwanted behaviour. It could lead to another issue, though, similar to git submodules: if I clone some tests whose metadata have a .fmf dir into a path that already holds fmf metadata with its own .fmf root item, I get behaviour like git submodules.

From that PoV I'm a little bit against the idea of a special .fmf dir that marks the root, and suggest using the first main.fmf or somename.fmf file as the root for items.

Otherwise, regarding the concept of these metadata as it is defined and implemented by the fmf tooling: fmf does not care about a clean tree structure but uses a list of trees, which is fine and I'm familiar with it. But we should be aware of that.

@jkrysl
Contributor

jkrysl commented May 23, 2018

@jscotka I think the issue with .fmf dir approach you mentioned can be worked around by choosing the right way to look for this directory if we do not like it. If fmf looks top-down (starts at '/'), importing submodules means the .fmf directory of the submodule gets ignored and the whole tree becomes a branch of the 'parent' tree.
But this approach brings another issue: it could break the sanity of the tree, for example attribute types could now mismatch. So this should only be used at the user's own discretion...

@jscotka
Collaborator

jscotka commented May 23, 2018

I like this idea with .fmf after discussion, just another example with fmf tooling

$ pwd
/home/jscotka/git/fmf

Command

fmf --brief . --format "{} -- {}\n" --value root --value name

Actual result

/home/jscotka/git -- fmf/xxx/a/b/c
/home/jscotka/git -- fmf/examples/deep/one/two/three
/home/jscotka/git -- fmf/examples/merge/parent/child
/home/jscotka/git -- fmf/examples/child/son/grandson
/home/jscotka/git -- fmf/examples/touch
/home/jscotka/git -- fmf/examples/wget/download/test
/home/jscotka/git -- fmf/examples/wget/download/requirements/no-clobber
/home/jscotka/git -- fmf/examples/wget/download/requirements/server-response
/home/jscotka/git -- fmf/examples/wget/download/requirements/output-document
/home/jscotka/git -- fmf/examples/wget/download/requirements/quota
/home/jscotka/git -- fmf/examples/wget/download/requirements/bind-address
/home/jscotka/git -- fmf/examples/wget/download/requirements/spider
/home/jscotka/git -- fmf/examples/wget/download/requirements/tries
/home/jscotka/git -- fmf/examples/wget/download/requirements/continue
/home/jscotka/git -- fmf/examples/wget/download/requirements/timestamping
/home/jscotka/git -- fmf/examples/wget/download/requirements/get-file
/home/jscotka/git -- fmf/examples/wget/download/requirements/progress
/home/jscotka/git -- fmf/examples/wget/recursion/fast
/home/jscotka/git -- fmf/examples/wget/recursion/deep
/home/jscotka/git -- fmf/examples/wget/requirements/download/output-document-file
/home/jscotka/git -- fmf/examples/wget/requirements/download/output-document-pipe
/home/jscotka/git -- fmf/examples/wget/requirements/upload/post-file
/home/jscotka/git -- fmf/examples/wget/requirements/upload/post-data
/home/jscotka/git -- fmf/examples/wget/requirements/protocols/ftp
/home/jscotka/git -- fmf/examples/wget/requirements/protocols/http
/home/jscotka/git -- fmf/examples/wget/requirements/protocols/https
/home/jscotka/git -- fmf/examples/wget/protocols/ftp
/home/jscotka/git -- fmf/examples/wget/protocols/http
/home/jscotka/git -- fmf/examples/wget/protocols/https
/home/jscotka/git -- fmf/examples/scatter/object

Expected result

Similar to the output of another, slightly ugly command:
fmf --brief . --format "{} -- {}\n" --value 'os.path.dirname(os.path.dirname(sources[0])) if "main" in os.path.basename(sources[0]) else os.path.dirname(sources[0])' --value '(root + "/" + name).replace((os.path.dirname(os.path.dirname(sources[0])) if "main" in os.path.basename(sources[0]) else os.path.dirname(sources[0])) + "/","")'

/home/jscotka/git/fmf -- xxx/a/b/c
/home/jscotka/git/fmf/examples -- deep/one/two/three
/home/jscotka/git/fmf/examples/merge -- parent/child
/home/jscotka/git/fmf/examples -- child/son/grandson
/home/jscotka/git/fmf/examples -- touch
/home/jscotka/git/fmf/examples -- wget/download/test
/home/jscotka/git/fmf/examples -- wget/download/requirements/no-clobber
/home/jscotka/git/fmf/examples -- wget/download/requirements/server-response
/home/jscotka/git/fmf/examples -- wget/download/requirements/output-document
/home/jscotka/git/fmf/examples -- wget/download/requirements/quota
/home/jscotka/git/fmf/examples -- wget/download/requirements/bind-address
/home/jscotka/git/fmf/examples -- wget/download/requirements/spider
/home/jscotka/git/fmf/examples -- wget/download/requirements/tries
/home/jscotka/git/fmf/examples -- wget/download/requirements/continue
/home/jscotka/git/fmf/examples -- wget/download/requirements/timestamping
/home/jscotka/git/fmf/examples -- wget/download/requirements/get-file
/home/jscotka/git/fmf/examples -- wget/download/requirements/progress
/home/jscotka/git/fmf/examples -- wget/recursion/fast
/home/jscotka/git/fmf/examples -- wget/recursion/deep
/home/jscotka/git/fmf/examples -- wget/requirements/download/output-document-file
/home/jscotka/git/fmf/examples -- wget/requirements/download/output-document-pipe
/home/jscotka/git/fmf/examples -- wget/requirements/upload/post-file
/home/jscotka/git/fmf/examples -- wget/requirements/upload/post-data
/home/jscotka/git/fmf/examples -- wget/requirements/protocols/ftp
/home/jscotka/git/fmf/examples -- wget/requirements/protocols/http
/home/jscotka/git/fmf/examples -- wget/requirements/protocols/https
/home/jscotka/git/fmf/examples -- wget/protocols/ftp
/home/jscotka/git/fmf/examples -- wget/protocols/http
/home/jscotka/git/fmf/examples -- wget/protocols/https
/home/jscotka/git/fmf/examples -- scatter/object

So both commands produce the same path from root + name, but the second one gives you a cleaner notion of what the root element is.
It means that it creates a list of trees with various root elements.

It would also help and lead to the same behavior as using fmf with more directories, like fmf path1 path2.
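
For readability, the two long --value expressions from the second command can be written as one small function. This is only a restatement of those expressions; the function name `split_by_first_source` is made up, and the `sources` path in the usage line is an assumed example, since the real contents of `sources` are not shown above.

```python
import os


def split_by_first_source(root, name, sources):
    """Split root + name at the directory holding the first source file.

    Mirrors the long --value expressions above in readable form.
    """
    first = sources[0]
    tree_root = os.path.dirname(first)
    if "main" in os.path.basename(first):
        # main.fmf names the directory it sits in, so the tree root
        # is one level further up.
        tree_root = os.path.dirname(tree_root)
    relative = (root + "/" + name).replace(tree_root + "/", "")
    return tree_root, relative


# Assumed source path, for illustration only.
result = split_by_first_source(
    "/home/jscotka/git",
    "fmf/examples/wget/download/test",
    ["/home/jscotka/git/fmf/examples/wget/main.fmf"],
)
```

With that assumed source, `result` is the pair `/home/jscotka/git/fmf/examples` and `wget/download/test`, matching the corresponding line of the expected output.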

@AloisMahdal
Contributor Author

About the "first main.fmf", it boils down to identifying which one is the first, as each strategy has its own problems; I think I've shown them above.

I put on my designer hat and tried to think this through: Can't really put my finger on it, but my intuition tells me that using the same element (main.fmf) to both insert data with cascade inheritance and delimit scope of that cascading is something that can't be done.


About the subdirectory, I think the intuitive solution would be: just treat it as if it was a completely different tree. For example:

/
└── blueproj
    ├── .fmf
    ├── bar
    │   └── redproj
    │       ├── .fmf
    │       ├── main.fmf        # red
    │       ├── quux
    │       │   └── main.fmf    # red
    │       └── qux
    │           └── main.fmf    # red
    ├── foo
    │   └── main.fmf    # blue
    └── main.fmf        # blue

there's actually no conflict:

  • both "blueproj" and "redproj" have well-defined roots,

  • data don't get mixed up,

  • the behavior of fmf in the topmost dir could be:

    • (A) throw error because we're not in fmf structure, but if called from blueproj, just list the blue data.

    • (B) look into subdirs and list items from both blueproj and redproj (a-la find) but don't ever mix them up (ie. the "red" fmf's don't inherit anything from the blue ones)

    • (C) find the blueproj/.fmf and just list blue data and ignore the red (maybe just mention the fact it exists as verbose stderr) sort-of like git-submodules.

(A) seems to me as the safest and most intuitive one.

Looking into subdirs (C) might look useful but it's not worth the extra complexity (if the user is interested in collecting data from more than one project, it's trivial to do it explicitly by find -type d -name .fmf)

(Note that if the version file was included, this could work pretty smoothly even if blue and red are based on different version of the fmf specification itself: /usr/bin/fmf would just have to decide if it supports it.)
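
Keeping the blue and red data apart while walking a tree could look roughly like the sketch below: a directory walk that does not descend into any subdirectory carrying its own `.fmf`. This only illustrates the "don't mix them up" part, it is not how fmf is implemented, and the name `own_metadata_files` is hypothetical.

```python
import os


def own_metadata_files(root):
    """Yield main.fmf paths under root, skipping nested trees."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .fmf itself and any subdirectory that is a root of its own.
        dirnames[:] = sorted(
            d for d in dirnames
            if d != ".fmf"
            and not os.path.isdir(os.path.join(dirpath, d, ".fmf"))
        )
        if "main.fmf" in filenames:
            yield os.path.join(dirpath, "main.fmf")
```

Run on blueproj from the example, this yields only the two "blue" files; run on bar/redproj, it yields the three "red" ones.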

@psss changed the title from "Inherit from upper in the tree" to "Define metadata tree root" May 24, 2018
@psss
Collaborator

psss commented May 24, 2018

Thanks for the nice example, Alois. I agree with your choice (A). This nicely limits the scope to a single metadata tree. We can always combine multiple trees using additional tooling (like find). It seems to me that directory discovery should be out-of-scope for fmf (do one thing and do it well).

So far I'm not convinced that we need to nest metadata trees. But if there is a clear use case we can definitely support ignoring nested tree (well-defined by the .fmf directory) or merging the data as you described above.

Regarding the implementation I would suggest modifying the Tree class to accept the directory path to be explored. This does not have to be the root (top) directory of the metadata tree but it should be under the metadata tree root, which would be automatically detected. Otherwise an error would be thrown.

Command line tool can keep current behaviour: Either it receives desired path(s) as argument or defaults to the current working directory. In both cases discovery of the .fmf directory would be done. So this would not be backward-compatible.

The .fmf directory would contain a config file which would be in YAML format and would contain at least name and version. Name would serve as the top level identifier (first part of the object names, currently detected from the root directory name).

We should probably also have an easy way to create the config. Something similar to git init? Which brings another question: Shall fmf support subcommands like fmf init or shall we use something like fmf --init?

@psss
Collaborator

psss commented May 24, 2018

@jscotka, regarding your example: For now I would suggest to prevent tree nesting and rather handle metadata trees individually. That is to have a forest of metadata trees as separate directories which may be stored in a common parent directory or scattered across the filesystem as needed. If the .fmf directory is not detected an error should be thrown.

@jscotka
Collaborator

jscotka commented May 24, 2018

@psss I'm fine with this solution.
I've just shown that it is relatively easy to detect the first source file, use it as the root element and produce this via the fmf command, and I understand that this could lead to unexpected behaviour.
btw, changing the name via config could be very tricky; the current dir name is a much cleaner solution.
See this example:

You have a dir structure like

.
└── wget
    └── tests
        ├── .fmf
        │   └── config
        ├── Sanity
        │   └── simple
        │       ├── main.fmf
        │       └── runtest.sh
        └── wget-upstream
            ├── .fmf
            │   └── config
            └── testsuite1
                ├── main.fmf
                └── script.py

and in config you redefine the first name segment as name=wget, which then generates a name like wget/Sanity/simple while the real dir path is tests/Sanity/simple.
So the triple root + name + data['test'] (or alternatively, as we've discussed, root + data.get('path', name) + data['test']) will produce a bad path location. The only way to find the path in this case is to use sources[-1] as the path location, or to replace wget with tests when locating the test.
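
A small sketch of that mismatch, using the root + name + data['test'] convention mentioned above. The absolute location `/src/wget` and the test file name `runtest.sh` as the value of data['test'] are assumptions made for the example, and `locate_test` is a hypothetical helper.

```python
import posixpath


def locate_test(root, name, test):
    """Naive lookup: join the tree's parent directory, object name and test."""
    return posixpath.join(root, name, test)


root = "/src/wget"  # parent of the tree directory "tests" (assumed location)

# Name derived from the directory: matches the real file.
by_dirname = locate_test(root, "tests/Sanity/simple", "runtest.sh")

# First segment renamed to "wget" by config: points to a non-existent path.
by_config = locate_test(root, "wget/Sanity/simple", "runtest.sh")
```

The second result contains `wget/wget/Sanity/simple`, a directory that does not exist in the structure above.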

summary

  • pros
    • I'm fine with .fmf as the identifier of the root element (but without other features, the first *.fmf is also fine for finding the root)
    • I'm for ignoring any other .fmf dir in a subtree but still appending its elements to this tree (if I download, for example, an upstream testsuite to wget/tests/wget-upstream, I'm still able to insert that tree into the basic tree, use the metadata there and possibly reference a path like wget/tests/wget-upstream/testsuite1/ in internal test metadata too). So include the metadata, just ignore the nested .fmf dir and use only the one at the top.
  • cons
    • I'm against using it for name configuration -> it leads to nontrivial complexity with paths and needs semantics for how to solve it
  • I would like to see a tool like fmf-init, because something like fmf init completely changes the behavior from what it is nowadays, but maybe now is the right time to do it, and then for example have commands like:
    • fmf init - initialize and create minimal configs (it is more or less my issue about templating and transforming, Tranformation and templating tool #8)
    • fmf show - basic show of metadata trees
    • fmf format - instead of the --format parameter; expects a format string and appropriate --values

@AloisMahdal
Contributor Author

Nice.

Regarding .fmf/config; I'm not against it, but OTOH,

  • It was intentional in my suggestion above to keep 'version' in a simple ASCII file. The format version is the fundamental thing that defines everything else (file names, serialization formats, keys...) so it's good practice to make sure everyone can read it; even tools that have yet to decide what to do (whether to call fmf at all, let alone a yaml parser).

  • Also I don't see the need for the name key, at least not right away. If nested trees are going to be ignored by default, fmf can just use relpath like it does now; there should be no confusion IIUC. (Note that git also does not have this concept.)

Regarding the CLI, IMHO the fmf binary does not necessarily have to have this init function; at least not for now. I do agree with @jscotka that meta-command style might be better (I imagine fmf init, fmf ls, fmf show [--format FMT] [ITEM].. fmf find...), but let's keep that a separate question/PR (I understand that fmf is in an early phase so I'm ok with things moving, so from my POV there's no need to hurry.)

@psss
Collaborator

psss commented May 29, 2018

I see your point with the plain text version file. This seems fine/reasonable. Regarding the name, I think we need to re-think a bit the concept of the identifiers. Especially if we consider merging multiple metadata trees in the future we somehow need to be able to match the naming. One of the options could be to just forget about the first segment of the identifier and call the top object as /. But that's another issue. Let's keep the scope of this issue to defining the root. To sum up the proposal:

  • Directory called .fmf will define the metadata tree root (it will be required)
  • It will contain plain text file version with fmf format version number (integer)
  • Class Tree will accept directory path anywhere in the metadata tree
  • Tree root discovery will be done from bottom to top
  • An error is thrown if there is no .fmf directory found
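
The version-file part of this proposal could be checked along these lines. This is a sketch under assumptions: the supported set `{1}` is invented for the example (the thread does not fix a number), and `read_format_version` is a hypothetical helper rather than part of the Tree class.

```python
from pathlib import Path

SUPPORTED_VERSIONS = {1}  # assumed; the thread does not fix a number


def read_format_version(root):
    """Read the integer format version from <root>/.fmf/version."""
    version_file = Path(root) / ".fmf" / "version"
    try:
        version = int(version_file.read_text().strip())
    except FileNotFoundError:
        raise ValueError(f"{root} is not an fmf tree root") from None
    except ValueError:
        raise ValueError(f"invalid version in {version_file}") from None
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported fmf format version {version}")
    return version
```

Because the file is plain text holding a single integer, a wrapper script can make the same check without a YAML parser, which is the point made earlier about keeping the version readable by everyone.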

I've created a new issue #32 to add support for subcommands.
