Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

specs: add architecture.md #2265

Merged
merged 4 commits into from
Jul 27, 2023
Merged

specs: add architecture.md #2265

merged 4 commits into from
Jul 27, 2023

Conversation

antoniivanov
Copy link
Collaborator

@antoniivanov antoniivanov commented Jun 15, 2023

The goal of the architecture file is to provide a comprehensive overview of the VDK's architecture, components, and their relationships, serving as a reference document for developers, contributors, stakeholders, and users to understand the system's structure and make informed decisions.

The diagrams are based on C4 model https://c4model.com

The architecture is based on OnePage documents we use in VMware and we (Yoan (@yonitoo) and I) used the following architecture overviews as an example of good (data related) open source project arch documents: argo, dbt, airbyte, metaflow, dagster, flyte

Closes #858

Signed-off-by: Antoni Ivanov aivanov@vmware.com
Co-authored-by: Yoan Salambashev ysalambashev@vmware.com

@antoniivanov antoniivanov changed the title vdk-specs: add architecture.md specs: add architecture.md Jun 15, 2023
@antoniivanov antoniivanov changed the title specs: add architecture.md specs1: add architecture.md Jun 15, 2023
@antoniivanov antoniivanov changed the title specs1: add architecture.md specs: add architecture.md Jun 15, 2023
@antoniivanov antoniivanov changed the title specs: add architecture.md specs1: add architecture.md Jun 15, 2023
@antoniivanov antoniivanov changed the title specs1: add architecture.md specs: add architecture.md Jun 15, 2023
@dakodakov
Copy link
Collaborator

I believe the following things are missing:

  • "glossary" section at the top of the page - it might contain links to other pages, but I believe it would be beneficial.
  • Out of this document, it's not clear, that the Control Service itself is running in a k8s cluster. I believe we should state that explicitly
  • it's worth clarifying that when a user creates a data job, a data job container image is created for it and deployed as a cron job - this is roughly explained in the builder job/data job sections, however I don't think it's going to be clear to a new/user
  • VDK SDK Component Diagram - how about explaining with a couple of sentences what each rectangle in this diagram does?

@mivanov1988
Copy link
Contributor

I believe the following things are missing:

  • Out of this document, it's not clear, that the Control Service itself is running in a k8s cluster. I believe we should state that explicitly

Including a deployment diagram or a visual representation illustrating a typical Control Service deployment could be advantageous. The inclusion of such a diagram or visual representation will enable the reader to gain a clear understanding of the location of all the components.

@mivanov1988
Copy link
Contributor

mivanov1988 commented Jun 19, 2023

To be honest, the style of the diagram is too unconventional for me.

Copy link
Contributor

@mivanov1988 mivanov1988 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@antoniivanov
Copy link
Collaborator Author

image

I believe the following things are missing:

  • "glossary" section at the top of the page - it might contain links to other pages, but I believe it would be beneficial.
  • Out of this document, it's not clear, that the Control Service itself is running in a k8s cluster. I believe we should state that explicitly
  • it's worth clarifying that when a user creates a data job, a data job container image is created for it and deployed as a cron job - this is roughly explained in the builder job/data job sections, however I don't think it's going to be clear to a new/user
  • VDK SDK Component Diagram - how about explaining with a couple of sentences what each rectangle in this diagram does?

Addressed all parts. I did not add Glossary but I have added links to the Dictionary - https://github.com/vmware/versatile-data-kit/wiki/dictionary in a couple of places.

@antoniivanov
Copy link
Collaborator Author

antoniivanov commented Jul 16, 2023

To be honest, the style of the diagram is too unconventional for me.

https://c4model.com/ is getting more traction so I hope it's going to be familiar more and more people. ALso I find it simple to understand and learn. So I will keep it. I tried to change most diagram slightly to make them slightly more conventional-looking but not too much it is still following https://c4model.com/

@antoniivanov antoniivanov force-pushed the person/aivanov/architecture branch 6 times, most recently from 0c4c46e to 9e7d0c4 Compare July 18, 2023 08:19
@sabadzhiev
Copy link

sabadzhiev commented Jul 18, 2023

#2: The levels explanation between the 'Control Service ' and 'VDK SDK' looks different:

  • Control Service: System context + Container diagram
  • VDK SDK: System context + Component diagram.
    Not sure if this was intentional, but brings me the feeling that the document is not complete.

In the c4 model, a container is a separately runnable/deployable unit (e.g. a separate process space) that executes code or stores data. The whole SDK is a single container so its components are the next level of detail.

specs/architecture/README.md Outdated Show resolved Hide resolved
@antoniivanov antoniivanov force-pushed the person/aivanov/architecture branch 2 times, most recently from 73af79d to 0e14ad9 Compare July 20, 2023 15:53
@antoniivanov antoniivanov force-pushed the person/aivanov/architecture branch 2 times, most recently from 777b295 to 489dd7c Compare July 27, 2023 07:06
@murphp15
Copy link
Collaborator

I'm surprised you don't go into the helm arch deployment.
This would address the comments above about saying the control service runs in k8s.

@antoniivanov
Copy link
Collaborator Author

I'm surprised you don't go into the helm arch deployment. This would address the comments above about saying the control service runs in k8s.

What do you mean ?

antoniivanov and others added 3 commits July 27, 2023 17:42
The goal of the architecture file is to provide a comprehensive overview
of the VDK's architecture, components, and their relationships, serving
as a reference document for developers, contributors, stakeholders, and
users to understand the system's structure and make informed decisions.

The diagrams are based on C4 model https://c4model.com

The architecture is based on OnePage documents and we (Yoan and I) used
the following architecture overviews as an example of good (data
related) open source project arch documents:

- https://argoproj.github.io/argo-workflows/architecture/
- https://docs.getdbt.com/docs/cloud/about-cloud/architecture
- https://airflow.apache.org/
docs/apache-airflow/stable/core-concepts/overview.html
- https://docs.airbyte.com/understanding-airbyte/high-level-view/
- https://docs.metaflow.org/internals/technical-overview
- https://docs.dagster.io/deployment/overview
- https://docs.flyte.org/en/latest/concepts/architecture.html

Closes #858

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
The goal of the architecture file is to provide a comprehensive overview
of the VDK's architecture, components, and their relationships, serving
as a reference document for developers, contributors, stakeholders, and
users to understand the system's structure and make informed decisions.

The diagrams are based on C4 model https://c4model.com

The architecture is based on OnePage documents and we (Yoan and I) used
the following architecture overviews as an example of good (data
related) open source project arch documents:

- https://argoproj.github.io/argo-workflows/architecture/
- https://docs.getdbt.com/docs/cloud/about-cloud/architecture
- https://airflow.apache.org/
docs/apache-airflow/stable/core-concepts/overview.html
- https://docs.airbyte.com/understanding-airbyte/high-level-view/
- https://docs.metaflow.org/internals/technical-overview
- https://docs.dagster.io/deployment/overview
- https://docs.flyte.org/en/latest/concepts/architecture.html

Closes #858

Signed-off-by: Antoni Ivanov <aivanov@vmware.com>
@antoniivanov antoniivanov merged commit a77390a into main Jul 27, 2023
8 checks passed
@antoniivanov antoniivanov deleted the person/aivanov/architecture branch July 27, 2023 16:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an architecture diagram
6 participants