Skip to content

Add support for global metadata#67

Merged
sebastiankreutzer merged 9 commits intotudasc:develfrom
sebastiankreutzer:feat/global_md
Aug 7, 2025
Merged

Add support for global metadata#67
sebastiankreutzer merged 9 commits intotudasc:develfrom
sebastiankreutzer:feat/global_md

Conversation

@sebastiankreutzer
Copy link
Member

@sebastiankreutzer sebastiankreutzer commented Aug 5, 2025

This patch adds support for attaching global metadata to a call graph.
This is metadata that doesn't belong to one specific node, but to the graph as a whole.

The implementation requires some slight changes to the v4 file format. Despite this, @pearzt and I came to the conclusion that it is preferable to not increase the format minor version, since v4 has not had an official release.
Inside the graph lib, metadata management is now implemented in a mixin class that is used by both the CgNode and Callgraph classes.

As an initial use case, I extended the getMain() utility function to first query for an entryFunction metadata entry.
If such metadata is available, the explicitly provided node is used instead of trying to find the main function by name.
This is particularly useful for Fortran, since there is no notion of a special main function.

The following is an example call graph using the new entryFunction metadata:

{
  "_CG": {
    "meta": {
      "entryFunction": "0"
    },
    "nodes": {
      "0": {
        "callees": null,
        "functionName": "thisIsTheMainFunction",
        "hasBody": true,
        "meta": {},
        "origin": "main.cpp"
      }
    }
  },
  "_MetaCG": {
    "generator": {
      "name": "MetaCG",
      "sha": "4ad73ebf7a108aa388d232ee4824bada13feaa4c",
      "version": "0.8"
    },
    "version": "4.0"
  }
}

Alongside the implementation, this patch includes both unit and integration tests.

@sebastiankreutzer
Copy link
Member Author

The MetaData class needed a minor adjustment to support merging of global metadata.
The merge function has been changed by wrapping the MergeAction in a std::optional.
This way, the metadata object knows if the merge is associated with a CgNode or is at a graph level.

Copy link
Member

@jplehr jplehr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the changes are a good direction, but I disagree that this is file format v4.
This is a breaking change.

So, I would prefer to bump the version to v5 instead. What does a versioned file format help us, if we can have two files in a particular format version that are incompatible with each other / the reader that says "I can handle this version of the file format"?

MetadataMixin(MetadataMixin&&) noexcept = default;
MetadataMixin& operator=(MetadataMixin&&) noexcept = default;

~MetadataMixin() = default;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be marked virtual?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left it non-virtual on purpose, because the mixin class is not supposed to be used for dynamic polymorphism.
I.e. you would never call something like:

MetadataMixin* base = new Callgraph();
delete base;

@pearzt
Copy link
Member

pearzt commented Aug 6, 2025

It is indeed a breaking change to the file format. Still, I would be fine with just calling this v4, as v4 has not been part of a release (or even the master branch), yet. I guess it comes down to the question whether we consider a file format to be final just because it has been merged into devel. IMO, a file format would be allowed to evolve until it is merged into master. Trying to keep compatibility to every revision that has ever been on devel seems over the top to me.

Anyhow, I'm fine with calling it v5 if that's what we agree on.

@jplehr
Copy link
Member

jplehr commented Aug 6, 2025

Since people are also working on devel or using devel this is a breaking change IMHO.
So, even in the case that this has not been part of the release, I think we should just not have two different implementations of the same format version.

If implementation burden is the problem, then we should fix that instead of having incompatible versions of the same version.

@sebastiankreutzer
Copy link
Member Author

I disagree with this somewhat strongly.
What is the point of having a devel branch if every commit is required to be a self-contained mini-release?
We also don't bump up cgcollector's version on every fix or added feature, even though the output may change.
IMO it should be clear to users that unreleased development versions are subject to change.

One reason I didn't add this to the initial V4 PR is that I didn't want to put even more stuff into this one patch.
Of course, you could argue that we could have thought about this extension before and future proved the initial format proposal, e.g. by adding the nodes entry from the get-go.
But there will always be things that you can't foresee and may want/need to change in hindsight.
I worry that bumping up the format version every time that happens will end up leaving us with 20 different versions, where 18 of those are not supported because they were never intended to be used in stable production.

@jplehr
Copy link
Member

jplehr commented Aug 6, 2025

The reasoning for it being that in a perfect world, a user hands a file to the library and the library knows how to parse it. That is not the case with such changes, because now it depends on the point in time you built the library.

I completely understand the question about why have a devel branch then and I don't think you can always foresee what you need. So, yes, you need a moving target.

The motivation with the file format versions was initially more along the lines of something rather internal. So, we can bump the file format version as often as we want. This may / will lead to jumps in the file format version between releases, e.g., going from 3 to 6 or whatever. This is motivated with some painful experiences in "the early days" where things simply broke because of the evolving format and one could no longer reproduce / use the tools.
That likely has evolved into something more "standardese" in nature, or at least is perceived as such, and hence only needs to be "final" or "stable" for a release -- which is different from the original motivation. Not wrong of course, but different.
Given that it is used across tools as a command line argument is not necessarily by design, but due to the fact that historically not all components were based on the graph lib and so it needed to be specified explicitly. That probably added to the perception of it being a "public definition".

In any case, if the motivation of the file format version being something rather internal is perceived as wrong and it should instead be something that is only meaningful in releases, then OK. I won't oppose that direction.
I still want to point out the potential annoyances this can bring with files generated between such changes on devel that may simply not be apparent initially.

@jplehr
Copy link
Member

jplehr commented Aug 7, 2025

We agreed in an offline discussion that going forward, the file format will only be considered "fix" in release versions.

@sebastiankreutzer sebastiankreutzer merged commit 9e0a0c1 into tudasc:devel Aug 7, 2025
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants