GitHub - MobileTeleSystems/data-rentgen: NextGen DataMotion Lineage

What is Data.Rentgen?

Data.Rentgen is a Data Motion Lineage service compatible with the OpenLineage specification.

Currently we support consuming lineage from:

Apache Spark
Apache Airflow
Apache Hive
Apache Flink
dbt

Note: the service is under active development, so it does not have a stable API yet.

Goals

Collect lineage events produced by OpenLineage clients & integrations.
Store operation-grained events for finer detail (unlike Marquez, which is job-grained).
Provide API for fetching both job/run ↔ dataset lineage and dataset ↔ dataset lineage.
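
As a rough illustration of the goals above, the sketch below builds a minimal OpenLineage-style run event and derives both kinds of edges from it. The top-level field names follow the OpenLineage RunEvent schema; every value, and the helper names `job_dataset_edges` and `dataset_dataset_edges`, are hypothetical and are not part of the Data.Rentgen API.

```python
from datetime import datetime, timezone
from uuid import uuid4

# Minimal OpenLineage-style run event. All values are made up for illustration.
event = {
    "eventType": "COMPLETE",
    "eventTime": datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc).isoformat(),
    "producer": "https://example.com/hypothetical-producer",
    # Example schema URL; use the one matching your OpenLineage client version.
    "schemaURL": "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent",
    "run": {"runId": str(uuid4())},
    "job": {"namespace": "example_namespace", "name": "example_job"},
    "inputs": [{"namespace": "postgres://db:5432", "name": "public.orders"}],
    "outputs": [{"namespace": "hdfs://cluster", "name": "/warehouse/orders_agg"}],
}


def dataset_key(dataset):
    """Identify a dataset by its (namespace, name) pair."""
    return (dataset["namespace"], dataset["name"])


def job_dataset_edges(event):
    """Return input -> job and job -> output edges for one event."""
    job = (event["job"]["namespace"], event["job"]["name"])
    edges = [(dataset_key(d), job) for d in event.get("inputs", [])]
    edges += [(job, dataset_key(d)) for d in event.get("outputs", [])]
    return edges


def dataset_dataset_edges(event):
    """Return direct input -> output edges, skipping the job node."""
    return [
        (dataset_key(i), dataset_key(o))
        for i in event.get("inputs", [])
        for o in event.get("outputs", [])
    ]


if __name__ == "__main__":
    for edge in job_dataset_edges(event) + dataset_dataset_edges(event):
        print(edge)
```

Here the dataset ↔ dataset view is simply the job/run ↔ dataset view with the intermediate job node collapsed; a real service may resolve it differently.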

Features

Supports consuming large volumes of lineage events, using Apache Kafka as an event buffer.
Store data in tables partitioned by event timestamp, to speed up lineage graph resolution.
Lineage graph is built with user-specified time boundaries (unlike Marquez, where lineage is built only for the last job run).
Lineage graph can be built with different granularity, e.g. by merging all individual Spark commands into a Spark applicationId or Spark applicationName.
Column-level lineage support.
Authentication support.
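
The time-boundary and granularity features can be pictured with a small in-memory sketch. It is only an assumption about how such a graph could be assembled, not the service's actual implementation: the `Operation` record, the `build_lineage` function and the granularity names `operation`, `run` and `job` are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Operation:
    """One operation-grained lineage record (hypothetical shape)."""
    app_name: str            # e.g. Spark applicationName
    app_id: str              # e.g. Spark applicationId
    name: str                # individual command
    at: datetime             # event timestamp
    inputs: tuple            # dataset names read
    outputs: tuple           # dataset names written


# How to name the graph node an operation is merged into, per granularity.
NODE_KEYS = {
    "operation": lambda op: (op.app_name, op.app_id, op.name),
    "run": lambda op: (op.app_name, op.app_id),
    "job": lambda op: (op.app_name,),
}


def build_lineage(operations, since, until, granularity="operation"):
    """Collect edges for operations with since <= at < until."""
    node_key = NODE_KEYS[granularity]
    edges = set()
    for op in operations:
        if not (since <= op.at < until):
            continue  # outside the user-specified time boundaries
        node = node_key(op)
        edges.update((ds, node) for ds in op.inputs)
        edges.update((node, ds) for ds in op.outputs)
    return edges


if __name__ == "__main__":
    ops = [
        Operation("etl", "app-1", "insert_1", datetime(2024, 1, 1, 10), ("a",), ("b",)),
        Operation("etl", "app-1", "insert_2", datetime(2024, 1, 1, 11), ("b",), ("c",)),
        Operation("etl", "app-0", "insert_1", datetime(2023, 12, 1), ("x",), ("a",)),
    ]
    window = (datetime(2024, 1, 1), datetime(2024, 1, 2))
    for edge in sorted(build_lineage(ops, *window, granularity="run"), key=str):
        print(edge)
```

With `granularity="run"` the two commands of `app-1` collapse into one node, and the December record is excluded because it falls outside the requested window.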

Non-goals

This is not a Data Catalog. DataRentgen doesn't track dataset schema changes, owners and so on. Use DataHub or OpenMetadata instead.
Static Data Lineage like view → table is not supported.

Limitations

OpenLineage has integrations with Trino, Debezium and some other lineage sources. DataRentgen support for them may be added later.
Unlike Marquez, DataRentgen parses only a limited set of the facets sent by OpenLineage and doesn't store custom facets. This may change in the future.
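
To make the facet limitation concrete, here is a minimal sketch of allowlist-style facet filtering. The facet names in `KNOWN_FACETS` are standard OpenLineage facets chosen only as an example; the actual set parsed by DataRentgen is not listed here, and `keep_known_facets` is a hypothetical helper.

```python
# Example allowlist only; not the actual set of facets DataRentgen parses.
KNOWN_FACETS = frozenset({"parent", "sql", "schema", "columnLineage"})


def keep_known_facets(facets, known=KNOWN_FACETS):
    """Return only the facets whose names are in the allowlist."""
    return {name: body for name, body in facets.items() if name in known}


if __name__ == "__main__":
    facets = {
        "sql": {"query": "INSERT INTO b SELECT * FROM a"},
        "myCompany_customFacet": {"team": "analytics"},  # custom facet, dropped
    }
    print(keep_known_facets(facets))
```

Anything outside the allowlist, including custom facets, is dropped rather than stored.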

