[DSIP-79][Task] Add Datavines task to better support data quality #16113

Open
@xxzuo

Description

Search before asking

  • I have searched the existing DSIPs and found no similar DSIP.

Motivation

Datavines is an easy-to-use data quality service platform that supports multiple metrics.
https://github.com/datavane/datavines

  • Datavines supports executing multiple metrics in one job.
  • Datavines provides an execution status dashboard and data quality reports.
  • Datavines supports plug-in extensions for components such as metrics, data sources, error data storage, and execution engines.
  • JDBC engines can be used to execute data quality tasks, so tasks no longer have to rely solely on the Spark engine.

Design Detail

Script Mode

  1. Configure the data quality job in Datavines.
    image

  2. Get the job's configuration script file.

  3. Add a Datavines job node to the workflow and configure it with the script.
    image
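
To make the Script Mode steps concrete, here is a minimal sketch of what the node could do with the pasted script before running it. This is not part of the proposal. It assumes the exported job config script is JSON, and `stage_script` and the file name `datavines_job.json` are hypothetical names chosen for illustration.

```python
import json
import tempfile
from pathlib import Path


def stage_script(script_text: str, work_dir: Path) -> Path:
    """Check the pasted job config script and write it into the task's working directory."""
    if not script_text.strip():
        raise ValueError("script must not be empty")
    try:
        # Assumption: the script exported from Datavines is a JSON document.
        json.loads(script_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"script is not valid JSON: {exc}") from exc

    # Hypothetical file name; the real plugin may choose a different one.
    path = work_dir / "datavines_job.json"
    path.write_text(script_text, encoding="utf-8")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        staged = stage_script('{"name": "demo"}', Path(tmp))
        print(staged.name)
```

Validating the script when the node is saved, rather than at execution time, would surface copy-and-paste mistakes before the workflow runs.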

API Mode

  1. Configure the data quality job in Datavines.
    image

  2. Get the jobId.

  3. Add a Datavines job node to the workflow and configure it with the Datavines API address and the jobId.
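
The two parameters the API Mode node needs can be sketched as a small parameter object. This is only an illustration under stated assumptions: `DatavinesApiParams` is a hypothetical name, and the URL path template is a placeholder, not the actual Datavines endpoint, which this proposal does not specify.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DatavinesApiParams:
    """Parameters entered on the Datavines job node in API Mode."""

    api_address: str  # base address of the Datavines API
    job_id: int       # jobId of the job configured in Datavines
    # HYPOTHETICAL path template; replace with the real Datavines endpoint.
    path_template: str = "/jobs/{job_id}/execute"

    def validate(self) -> None:
        """Reject malformed addresses and non-positive job ids."""
        parsed = urlparse(self.api_address)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"invalid API address: {self.api_address!r}")
        if self.job_id <= 0:
            raise ValueError(f"jobId must be positive, got {self.job_id}")

    def execute_url(self) -> str:
        """Join the base address and the (hypothetical) execute path."""
        base = self.api_address.rstrip("/")
        return base + self.path_template.format(job_id=self.job_id)


if __name__ == "__main__":
    params = DatavinesApiParams("http://localhost:5600/", 42)
    params.validate()
    print(params.execute_url())
```

Keeping the address and jobId in one validated object means the node only has to store two fields, while the job definition itself stays in Datavines.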

Compatibility, Deprecation, and Migration Plan

No response

Test Plan

No response
