Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Data lifecycle image and minor changes #887

Merged
merged 13 commits into from
Jul 4, 2022
Merged
19 changes: 13 additions & 6 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -22,18 +22,25 @@

# Overview

Versatile Data Kit is a framework which enables Data Engineers to develop, deploy, run and manage Data Jobs. **A Data Job is a data processing workload** and can be written in Python, SQL, or both at the same time. A Data Job enables Data Engineers to implement automated pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database or any type of data storage.
Versatile Data Kit (VDK) is an open source framework that enables anybody with basic SQL or Python knowledge to create their own data pipelines.

Versatile Data Kit enables Data Engineers to develop, deploy, run and manage Data Jobs. **A Data Job is a data processing workload** and can be written in Python, SQL, or both at the same time. A Data Job enables Data Engineers to implement automated pull ingestion (E in ELT) and batch data transformation (T in ELT) into a database or any type of data storage.

Versatile Data Kit consists of two main components:

* A **Data SDK**, which provides all tools for the automation of data extraction, transformation and loading, as well as a plugin framework which allows users to extend the framework according to their specific requirements.
* A **Control Service**, which allows users to create, deploy, manage and execute Data Jobs in Kubernetes runtime environment.
* A **Data SDK** provides all tools for the automation of data extraction, transformation, and loading, as well as a plugin framework that allows users to extend the framework according to their specific requirements.
* A **Control Service** allows users to create, deploy, manage and execute Data Jobs in Kubernetes runtime environment.

To help solve common data engineering problems Versatile Data Kit:
* allows ingestion of data from different sources including CSV files, JSON objects, data provided by REST API services, etc.;
* ensures data applications are packaged, versioned and deployed correctly, while dealing with credentials, retries, reconnects, etc.;
* allows ingestion of data from different sources, including CSV files, JSON objects, data provided by REST API services, etc.;
* ensures data applications are packaged, versioned, and deployed correctly while dealing with credentials, retries, reconnects, etc.;
* provides built-in monitoring and smart notification capabilities;
* tracks both code and data modifications and the relations between them enabling engineers to troubleshoot faster as well as providing an easy revert to a stable version.
* tracks both code and data modifications and the relations between them, enabling engineers to troubleshoot faster and providing an easy revert to a stable version.


#### Data Journey and where VDK fits in
![Data Journey](./support/images/versatile-data-kit-data-journey.svg#gh-light-mode-only)
![Data Journey](./support/images/versatile-data-kit-data-journey-dark-mode.svg#gh-dark-mode-only)

# Installation and Getting Started

Expand Down
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
1 change: 1 addition & 0 deletions support/images/versatile-data-kit-data-journey.svg
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.