MRM3: Machine Readable ML Model Metadata

This project helps manage and analyze machine learning model metadata by storing it in a knowledge graph. When ML models are trained, we collect key details such as how well the model performs, how much energy it uses, what hardware it needs, and what data it was trained on. Organizing this information as a graph makes it easy to find the right model for a specific need, compare models, and understand their environmental impact. The project provides schemas that define how to collect this metadata in a standard format, and a script that stores it in a Neo4j graph database, where the resulting knowledge graph can be queried to answer questions about the models. The use cases it enables are described below.

Installation

1. Clone the repository:

git clone https://github.com/sensorlab/MRM3.git
cd MRM3

2. Install dependencies:

pip install -r requirements.txt

3. Configure environment variables:

cp .env.template .env

Edit .env with your Neo4j credentials and the desired configuration settings (see the Usage section below for how to deploy a Neo4j instance with Docker).

The following environment variables can be configured in your .env file:

# Neo4j Configuration
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password-here

# Application Settings
JSON_DIR=./localization_model_metadata_dataset
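The .env file consists of simple KEY=VALUE lines. How json_to_KG.py actually reads it is not documented here (it may use a library such as python-dotenv). As a standard-library-only sketch, the hypothetical helpers parse_env and load_settings below read the file and fall back to the defaults shown above for every setting except the password:

```python
from pathlib import Path

# Hypothetical defaults mirroring the example .env above.
# NEO4J_PASSWORD deliberately has no default.
DEFAULTS = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USER": "neo4j",
    "JSON_DIR": "./localization_model_metadata_dataset",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first "=" only
        # Drop optional surrounding quotes, e.g. NEO4J_PASSWORD="secret"
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def load_settings(path: str = ".env") -> dict[str, str]:
    """Return DEFAULTS overridden by the values in `path`, if the file exists."""
    settings = dict(DEFAULTS)
    env_file = Path(path)
    if env_file.is_file():
        settings.update(parse_env(env_file.read_text(encoding="utf-8")))
    if not settings.get("NEO4J_PASSWORD"):
        raise KeyError("NEO4J_PASSWORD must be set in the .env file")
    return settings
```

Calling load_settings() from the repository root would then return a dict with all four keys, or raise KeyError if the password was never filled in.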

Usage

1. (Optional) Deploy a Neo4j instance using Docker, using the same password as NEO4J_PASSWORD in your .env file:

docker run --name neo4j -p 7474:7474 -p 7687:7687 -d -e NEO4J_AUTH=neo4j/your-password-here neo4j:latest

2. Place your ML model metadata JSON files in the directory configured by JSON_DIR.
   - (Optional) If you don't have any model metadata yet, you can use the example localization model metadata JSON files provided in localization_model_metadata_dataset/compressed.zip. Extract the archive into JSON_DIR (by default ./localization_model_metadata_dataset) and proceed to step 4.
3. Ensure your JSON files conform to the schema defined in json_schema/model_card_schema.json. You can validate them with the json_schema/schema_validator.py script.
4. Run the script to insert the metadata from the JSON files into the Neo4j knowledge graph:

python json_to_KG.py
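The internals of json_to_KG.py are not described in this README. As a hypothetical sketch of the general approach, each metadata file could be loaded from JSON_DIR and turned into parameterized Cypher MERGE statements that create the nodes and relationships listed under Graph Structure. The JSON keys "model", "dataset" and "architecture" used here are assumptions for illustration, not the fields of model_card_schema.json:

```python
import json
from pathlib import Path


def load_model_cards(json_dir: str) -> list[dict]:
    """Load every *.json file in json_dir (non-recursive), in file-name order."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(json_dir).glob("*.json"))
    ]


def card_to_statements(card: dict) -> list[tuple[str, dict]]:
    """Turn one model card into (Cypher query, parameters) pairs.

    The keys "model", "dataset" and "architecture" are HYPOTHETICAL; they are
    not taken from json_schema/model_card_schema.json.
    """
    model = card["model"]
    statements = [
        # MERGE reuses an existing node, so re-running the import does not
        # create duplicate Model nodes.
        (
            "MERGE (m:Model {name: $name}) SET m.version = $version",
            {"name": model["name"], "version": model.get("version")},
        )
    ]
    if "dataset" in card:
        statements.append((
            "MATCH (m:Model {name: $model}) "
            "MERGE (d:Dataset {name: $dataset}) "
            "MERGE (m)-[:TRAINED_ON]->(d)",
            {"model": model["name"], "dataset": card["dataset"]["name"]},
        ))
    if "architecture" in card:
        statements.append((
            "MATCH (m:Model {name: $model}) "
            "MERGE (a:ModelArchitecture {type: $type}) "
            "MERGE (m)-[:UTILIZES]->(a)",
            {"model": model["name"], "type": card["architecture"]["type"]},
        ))
    return statements
```

With the official neo4j Python driver, each (query, parameters) pair could then be executed inside a session with session.run(query, parameters). The remaining node and relationship types would follow the same MERGE pattern.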

Graph Structure

The knowledge graph implements the following structure:

Node Types

Model: Core node containing model name, version, description, and author information
Dataset: Training data information including name, size, version, and date
Service: Defines the service the model provides (e.g., localization, text generation)
ProblemType: Specifies the ML problem type (regression, classification, etc.)
ModelArchitecture: Details about the model's architecture
Device: Hardware specifications for training/inference
ModelTraining: Training metrics and parameters
ModelInference: Inference performance data
Parameters: Training parameters including optimizer and split type
Hyperparameters: Model hyperparameters such as learning rate and batch size

Relationships

TRAINED_ON: Connects Model to Dataset
PROVIDES: Links Model to Service
UTILIZES: Associates Model with ModelArchitecture
RUNS_ON: Connects ModelTraining/ModelInference to Device
TRAINS_ON: Links ModelTraining to Model
SOLUTION_FOR: Connects Service to ProblemType
CONTAINS: Links ModelTraining to Parameters
USES: Connects ModelTraining to Hyperparameters
INFERENCE_ON: Links ModelInference to Model
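The relationship list above can be captured as (source label, relationship type, target label) triples, which is handy for checking an edge before writing it to the graph. A minimal sketch; the names ONTOLOGY, is_valid_edge and outgoing are hypothetical and not part of the repository, and the edge directions are taken from the list above:

```python
# (source label, relationship type, target label), as listed under Relationships.
ONTOLOGY: frozenset[tuple[str, str, str]] = frozenset({
    ("Model", "TRAINED_ON", "Dataset"),
    ("Model", "PROVIDES", "Service"),
    ("Model", "UTILIZES", "ModelArchitecture"),
    ("ModelTraining", "RUNS_ON", "Device"),
    ("ModelInference", "RUNS_ON", "Device"),
    ("ModelTraining", "TRAINS_ON", "Model"),
    ("Service", "SOLUTION_FOR", "ProblemType"),
    ("ModelTraining", "CONTAINS", "Parameters"),
    ("ModelTraining", "USES", "Hyperparameters"),
    ("ModelInference", "INFERENCE_ON", "Model"),
})

# Every node label that appears at either end of a relationship.
NODE_LABELS = frozenset(label for src, _, dst in ONTOLOGY for label in (src, dst))


def is_valid_edge(source: str, rel: str, target: str) -> bool:
    """Return True if the ontology allows (source)-[rel]->(target)."""
    return (source, rel, target) in ONTOLOGY


def outgoing(label: str) -> list[tuple[str, str]]:
    """List the (relationship, target label) pairs that may leave `label`, sorted."""
    return sorted((rel, dst) for src, rel, dst in ONTOLOGY if src == label)
```

Every one of the ten node types is reachable through at least one relationship, so NODE_LABELS reproduces the Node Types list.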

Ontology Diagram

The following diagram illustrates the ontology of the knowledge graph:

Use Cases

The ML model metadata knowledge graph enables several key applications, ranging from environmental impact optimization to intelligent model deployment and natural-language exploration.

Green Computing Applications

Intelligent selection of ML models based on energy consumption and carbon footprint metrics, via Neo4j queries and, potentially, graph neural networks (GNNs)
Analysis of how model architectures and datasets affect carbon footprint
Workload shifting capabilities to reduce overall environmental impact
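Metric-based model selection can be sketched in a few lines of Python, assuming the per-model training carbon footprint and mae_mean have already been fetched from the graph (for example with the carbon-footprint query under Example Queries). ModelRecord, pareto_front and select_greenest are hypothetical names, and the GNN-based approach mentioned above is not covered:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRecord:
    name: str
    carbon_footprint: float  # e.g. t.carbonFootprint, in whatever unit the graph stores
    mae: float               # e.g. t.mae_mean; lower is better


def pareto_front(records: list[ModelRecord]) -> list[ModelRecord]:
    """Return the models not dominated on (carbon_footprint, mae), cheapest first.

    A model is dominated if another model is no worse on both metrics and
    strictly better on at least one.
    """
    front = []
    for r in records:
        dominated = any(
            o.carbon_footprint <= r.carbon_footprint
            and o.mae <= r.mae
            and (o.carbon_footprint < r.carbon_footprint or o.mae < r.mae)
            for o in records
        )
        if not dominated:
            front.append(r)
    return sorted(front, key=lambda r: r.carbon_footprint)


def select_greenest(records: list[ModelRecord], max_mae: float) -> Optional[ModelRecord]:
    """Pick the lowest-footprint model whose MAE fits the budget (ties broken by MAE)."""
    eligible = [r for r in records if r.mae <= max_mae]
    if not eligible:
        return None
    return min(eligible, key=lambda r: (r.carbon_footprint, r.mae))
```

Returning the Pareto front leaves the accuracy versus footprint trade-off to the user, while select_greenest applies one explicit error budget instead.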

Intelligent ML Model Orchestration

Comprehensive knowledge of training and inference metrics (latency, computational complexity, carbon footprint, accuracy)
Possible integration with MLOps systems for optimized training and deployment on distributed infrastructure
Potential to improve performance on edge devices through better model orchestration, and to improve resource utilization by matching models to infrastructure capabilities

RAG

Natural language interaction through Graph Retrieval Augmented Generation (RAG)
Generation of insights about model relationships and performance patterns
Interactive exploration of model metadata through graph-based queries

Example Queries

1. Find the models with the lowest inference energy consumption, and return their dataset name, architecture, and FLOPs

MATCH (m:Model)-[:TRAINED_ON]->(d:Dataset)
MATCH (m)-[:UTILIZES]->(a:ModelArchitecture)
MATCH (i:ModelInference)-[:INFERENCE_ON]->(m)
RETURN m.name, 
       a.type as architecture,
       d.name as dataset,
       i.energyConsumption,
       i.flops
ORDER BY i.energyConsumption ASC

2. Find the models trained on the lumos5g dataset with the lowest training carbon footprint, together with their mae_mean metric

MATCH (m:Model)-[:TRAINED_ON]->(d:Dataset)
MATCH (m)-[:UTILIZES]->(a:ModelArchitecture)
MATCH (t:ModelTraining)-[:TRAINS_ON]->(m)
WHERE d.name = "lumos5g"
RETURN m.name, a.type as architecture, t.carbonFootprint, t.mae_mean
ORDER BY t.carbonFootprint ASC
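Query 2 hard-codes the dataset name. When running such queries from Python, passing values as Cypher parameters is generally safer than string formatting. A hedged sketch, assuming a recent 5.x release of the official neo4j Python driver (which provides Driver.execute_query); LOWEST_CARBON_QUERY, lowest_carbon_params and run_lowest_carbon are hypothetical names:

```python
from typing import Any

# Query 2 with the dataset name and result count passed as parameters.
LOWEST_CARBON_QUERY = """
MATCH (m:Model)-[:TRAINED_ON]->(d:Dataset)
MATCH (m)-[:UTILIZES]->(a:ModelArchitecture)
MATCH (t:ModelTraining)-[:TRAINS_ON]->(m)
WHERE d.name = $dataset
RETURN m.name AS model, a.type AS architecture,
       t.carbonFootprint AS carbonFootprint, t.mae_mean AS mae_mean
ORDER BY t.carbonFootprint ASC
LIMIT $limit
"""


def lowest_carbon_params(dataset: str, limit: int = 5) -> dict[str, Any]:
    """Build the parameter map, rejecting non-positive limits."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return {"dataset": dataset, "limit": limit}


def run_lowest_carbon(uri: str, user: str, password: str,
                      dataset: str, limit: int = 5) -> list[dict[str, Any]]:
    """Run the query against Neo4j and return each row as a plain dict."""
    # Imported lazily so the helpers above work without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _summary, _keys = driver.execute_query(
            LOWEST_CARBON_QUERY, lowest_carbon_params(dataset, limit)
        )
    return [record.data() for record in records]
```

For example, run_lowest_carbon("bolt://localhost:7687", "neo4j", "your-password-here", "lumos5g") would return up to five rows, each with model, architecture, carbonFootprint and mae_mean keys.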

License

This project is licensed under the BSD 3-Clause License; see the LICENSE file for details.

Citation

Please cite our paper as follows:

@misc{mrm3,
      title={MRM3: Machine Readable ML Model Metadata}, 
      author={Andrej Čop and Blaž Bertalanič and Marko Grobelnik and Carolina Fortuna},
      year={2025},
      eprint={2505.13343},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2505.13343}, 
}

The example JSON dataset is also available on Zenodo (DOI: 10.5281/zenodo.15235417). Please cite it as follows:

@dataset{cop_2025_15235417,
  author       = {Čop, Andrej and Bertalanic, Blaz and Fortuna, Carolina},
  title        = {MRM3: Machine Readable ML Model Metadata},
  month        = apr,
  year         = 2025,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.15235417},
  url          = {https://doi.org/10.5281/zenodo.15235417},
}

Acknowledgments

This work was supported by the Slovenian Research Agency (P20016) and the European Commission NANCY project (No. 101096456).

We would like to acknowledge the SensorLab team for developing the localization models, and Gregor Cerar and Tim Strnad for developing an automated training pipeline.
