
Getting started workflow toolings research

Antoni Ivanov edited this page Aug 25, 2023 · 19 revisions


Spring Framework

The Spring Framework is one of the most popular frameworks for building enterprise Java applications. The Spring team has invested considerable effort into making the developer experience smooth and efficient. This document provides a brief walkthrough of the getting started workflow with Spring and highlights lessons that can be adopted for VDK.

Getting Started Workflow

Spring Initializr (choose your modules)

The starting point for most Spring projects, Spring Initializr - https://start.spring.io/ - provides a web-based interface for bootstrapping a new Spring application. Users select the desired build tool (Maven/Gradle), language (Java/Kotlin/Groovy), and dependencies/modules (Spring MVC for web apps, Spring Data for data access, Spring Security for authentication, etc.), and Initializr generates a project skeleton for them.

Modularity:

Spring is designed in a modular fashion, which allows developers to pick only what they need. It doesn't lock users into a particular way of doing things and offers flexibility in choosing tools, databases, etc.

In VDK, plugins offer similar modularity, at least in theory.

Spring Boot

https://spring.io/guides/gs/spring-boot/

Spring Boot simplifies the bootstrapping process. It provides conventions for application setup and configuration, reducing the need for boilerplate code. It also offers an embedded server, so developers can run the application immediately without setting up an external server.

In VDK, quickstart-vdk serves the same purpose, at least in theory.

Spring Actuator

Spring Actuator (officially Spring Boot Actuator) is a sub-project of Spring Boot that provides production-ready features for monitoring and managing an application: health, metrics, info, and more. One of its more notable features is a set of endpoints for retrieving operational information, e.g., how the project is configured, which beans are active, metrics about the project, and health info.
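Actuator itself is Spring-specific, but the idea translates to Python. As a rough sketch only, a job could expose a few read-only JSON endpoints using the standard library's http.server. The endpoint paths, APP_INFO and APP_CONFIG contents, and the build_payload/serve names are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical operational data a job could expose.
APP_INFO = {"name": "example-job", "version": "0.1.0"}
APP_CONFIG = {"db_url": "sqlite:///:memory:", "batch_size": 100}


def build_payload(path):
    """Return (HTTP status, JSON-serializable body) for an Actuator-like path."""
    endpoints = {
        "/health": {"status": "UP"},
        "/info": APP_INFO,
        "/config": APP_CONFIG,
    }
    if path in endpoints:
        return 200, endpoints[path]
    return 404, {"error": f"unknown endpoint {path}"}


class OpsHandler(BaseHTTPRequestHandler):
    """Serves build_payload() results as JSON for plain HTTP GET requests."""

    def do_GET(self):
        status, body = build_payload(self.path)
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Block and serve the endpoints on localhost."""
    HTTPServer(("127.0.0.1", port), OpsHandler).serve_forever()
```

With serve() running, a GET on http://127.0.0.1:8080/health returns {"status": "UP"}. Keeping the payload logic in a plain function makes it testable without starting a server.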

Convention Over Configuration

Spring prioritizes conventions. This reduces the amount of boilerplate and configuration code, streamlining the development process. Adopting a similar strategy can make our tool more user-friendly.

Examples

  • Configuration conventions:

    • By default, Spring Boot looks for properties in files named application.properties or application.yml.
    • Profile-specific properties follow the convention application-{profile}.properties.
    • Spring accepts properties from various sources, such as system properties, environment variables, command-line arguments, and property files. It resolves them in a well-defined order, so one source can override another.
    • Spring Boot, specifically, provides conventions for configuring a data source: simply define properties like spring.datasource.url, spring.datasource.username, etc., without any additional configuration class. As long as the required libraries are on the classpath, things just work.
  • By following naming conventions and annotations, Spring automatically detects and registers beans and injects them where needed, reducing manual wiring.

  • Even without any configuration, Spring Boot can set up a database connection automatically (e.g., to an embedded database such as H2), as long as the right dependencies are on the classpath.

  • Spring Boot can automatically start an embedded server with some sensible default settings.

  • JPA is another good example:

    • By default, each entity class in JPA corresponds to a table. The table's name is derived from the class name.
    • Each non-static, non-transient field in an entity class is mapped to a column.
    • By naming methods according to specific patterns (e.g., findById(String id)), Spring Data JPA can infer the database query.
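The last two JPA conventions (table name from class name, query from method name) can be illustrated with a small Python analogue. This is a sketch, not how Spring Data JPA is implemented: it uses snake_case names instead of camelCase, only understands the find_by_<column>[_and_<column>...] pattern, and table_name/derive_query are hypothetical helpers.

```python
import re


def table_name(cls):
    """Derive a table name from a class name, e.g. OrderItem -> order_item."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", cls.__name__).lower()


def derive_query(cls, method_name):
    """Turn a name like 'find_by_order_id_and_status' into a parameterized SQL query."""
    match = re.fullmatch(r"find_by_(\w+)", method_name)
    if not match:
        raise ValueError(f"cannot derive a query from {method_name!r}")
    # Each '_and_' separates one column condition from the next.
    columns = match.group(1).split("_and_")
    where = " AND ".join(f"{column} = ?" for column in columns)
    return f"SELECT * FROM {table_name(cls)} WHERE {where}"


class OrderItem:
    """Hypothetical entity; only its class name matters here."""


print(derive_query(OrderItem, "find_by_order_id_and_status"))
# SELECT * FROM order_item WHERE order_id = ? AND status = ?
```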

Validation and Autocompletion

Integrated development environments (IDEs) like IntelliJ IDEA and Eclipse support validation and autocompletion for Spring configurations.

Environment Profile Management

Spring allows multiple environment configurations (such as 'dev' and 'prod'). This enables the same codebase to behave differently depending on the environment it's running in.
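Profiles combine with the layered property sources mentioned earlier. In simplified form, Spring ranks command-line arguments above environment variables, environment variables above profile-specific files, and those above the base application.properties (the real list has more levels). A minimal Python sketch of that layering with collections.ChainMap, where earlier maps win; the APP_ prefix and function name are assumptions.

```python
import os
from collections import ChainMap


def resolve_properties(defaults, profiles, active_profile, cli_args, env=None, env_prefix="APP_"):
    """Layer property sources so that higher-precedence sources override lower ones."""
    env = os.environ if env is None else env
    # APP_BATCH_SIZE=500 becomes batch_size=500; unrelated variables are ignored.
    env_props = {
        key[len(env_prefix):].lower(): value
        for key, value in env.items()
        if key.startswith(env_prefix)
    }
    return ChainMap(
        cli_args,                          # highest precedence
        env_props,                         # then environment variables
        profiles.get(active_profile, {}),  # then the active profile
        defaults,                          # finally the base defaults
    )


defaults = {"db_url": "sqlite:///dev.db", "log_level": "INFO"}
profiles = {"prod": {"db_url": "postgresql://prod-host/db", "log_level": "WARN"}}
props = resolve_properties(defaults, profiles, "prod", cli_args={"log_level": "DEBUG"},
                           env={"APP_BATCH_SIZE": "500"})
print(props["db_url"], props["log_level"], props["batch_size"])
# postgresql://prod-host/db DEBUG 500
```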

Simple start in IDE without extra plugins

The entry point is a plain Java main method:

	public static void main(String[] args) {
		SpringApplication.run(Application.class, args);
	}

so the application starts from the IDE like any other Java program, without requiring a specific command.

Exception handling

Consistent exception hierarchy

https://docs.spring.io/spring-framework/docs/3.0.0.M4/reference/html/ch11s02.html

Spring provides a consistent exception hierarchy across its modules. For instance, it translates exceptions from various data access technologies (JDBC, JPA, Hibernate, and others) into a common hierarchy rooted at DataAccessException. This allows developers to handle data access exceptions in a consistent manner, regardless of the underlying technology.
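To make exception translation concrete for a Python tool like VDK, here is a minimal sketch using the standard library's sqlite3 module. DataAccessError, DataIntegrityViolation, BadSqlGrammar and execute() are hypothetical names that only mirror Spring's naming, and mapping every OperationalError to a grammar error is a deliberate simplification.

```python
import sqlite3


class DataAccessError(Exception):
    """Root of a hypothetical, technology-agnostic data access hierarchy."""


class DataIntegrityViolation(DataAccessError):
    """A constraint (unique, foreign key, ...) was violated."""


class BadSqlGrammar(DataAccessError):
    """The statement could not be run (simplified: any sqlite3.OperationalError)."""


# Driver-specific exception -> common exception, most specific first.
_TRANSLATIONS = [
    (sqlite3.IntegrityError, DataIntegrityViolation),
    (sqlite3.OperationalError, BadSqlGrammar),
    (sqlite3.Error, DataAccessError),
]


def execute(conn, sql, params=()):
    """Run a statement, re-raising driver errors as DataAccessError subclasses."""
    try:
        return conn.execute(sql, params)
    except sqlite3.Error as exc:
        # sqlite3.Error is the last entry, so some translation always matches.
        target = next(t for source, t in _TRANSLATIONS if isinstance(exc, source))
        raise target(str(exc)) from exc
```

Callers then catch DataAccessError (or a subclass) and never import driver-specific exception types; swapping sqlite3 for another driver would only change the translation table.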

Centralized Exception Handling

https://spring.io/blog/2013/11/01/exception-handling-in-spring-mvc

Spring MVC has @ControllerAdvice and @ExceptionHandler annotations. With these, developers can handle exceptions globally across controllers or locally within a specific controller, respectively. This centralized approach helps in returning standardized error responses and reduces duplicate error-handling code.
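A rough Python analogue of centralized handling: handlers are registered per exception type, and a decorator routes any error to the most specific registered handler by walking the exception class's MRO. exception_handler and handle_errors are hypothetical names; this illustrates the pattern, not how Spring implements @ControllerAdvice.

```python
from functools import wraps

_HANDLERS = {}  # exception type -> handler function


def exception_handler(exc_type):
    """Register a handler for exc_type (loosely analogous to @ExceptionHandler)."""
    def register(func):
        _HANDLERS[exc_type] = func
        return func
    return register


def handle_errors(func):
    """Route exceptions raised by func to the most specific registered handler."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # Walk the MRO so a handler for a base class also covers its subclasses.
            for cls in type(exc).__mro__:
                if cls in _HANDLERS:
                    return _HANDLERS[cls](exc)
            raise
    return wrapper


@exception_handler(ValueError)
def on_value_error(exc):
    return {"status": 400, "error": str(exc)}


@exception_handler(Exception)
def on_any_error(exc):
    return {"status": 500, "error": "internal error"}


@handle_errors
def parse_age(text):
    return {"status": 200, "age": int(text)}
```

parse_age("abc") returns a 400-style response and parse_age(None) falls through to the generic 500 handler, with no try/except in parse_age itself.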

Extensible

While Spring provides its own set of exceptions and mechanisms, it also allows developers to define custom exceptions and handlers. This ensures that the framework can be tailored to specific needs.

Ideas based on Spring

Introduce a VDK Initializer

Like Spring Initializr, provide a GUI or CLI tool that sets up the basic structure of a VDK project, pre-populated with sensible defaults.

Guided Workflow

A step-by-step process, with explanations, can guide new users through setup and deployment.

Config File Alternatives

Allow users to specify configuration in formats more widely adopted, like YAML or TOML. These formats are less error-prone due to their structured nature.

Config Profiles

Like Spring's environment profiles, allow users to define configuration profiles. This way, configuration can be written once and reused across different environments.
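Assuming profiles stay in an INI-style file (like VDK's config.ini today), a low-cost option is configparser's [DEFAULT] section, whose values every other section inherits unless it overrides them. A sketch under that assumption; the profile names, keys, and load_profile are hypothetical.

```python
import configparser

# Values in [DEFAULT] are inherited by every profile section unless overridden.
CONFIG_TEXT = """\
[DEFAULT]
db_url = sqlite:///local.db
batch_size = 100

[dev]
log_level = DEBUG

[prod]
db_url = postgresql://prod-host/warehouse
batch_size = 1000
"""


def load_profile(text, profile):
    """Return the effective settings of one profile as a dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError(f"unknown profile: {profile}")
    return dict(parser[profile])
```

load_profile(CONFIG_TEXT, "dev") keeps the default db_url, while "prod" overrides it, so shared values are written once.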

Runtime Validation

Implement a runtime check for configuration, similar to Spring's. Alert the user if there are unknown or deprecated properties.
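A minimal sketch of such a check: compare the user's keys against a known set, report deprecated keys with their replacement, and suggest the closest known key for typos using difflib. KNOWN_KEYS, DEPRECATED_KEYS, and validate_config are hypothetical.

```python
import difflib

# Hypothetical schema: the keys a job understands, and renamed (deprecated) keys.
KNOWN_KEYS = {"db_url", "batch_size", "log_level", "team"}
DEPRECATED_KEYS = {"db_uri": "db_url"}


def validate_config(config):
    """Return a list of human-readable problems found in config."""
    problems = []
    for key in config:
        if key in DEPRECATED_KEYS:
            problems.append(f"'{key}' is deprecated; use '{DEPRECATED_KEYS[key]}' instead")
        elif key not in KNOWN_KEYS:
            # Suggest the closest known key, if any is similar enough.
            suggestion = difflib.get_close_matches(key, sorted(KNOWN_KEYS), n=1)
            hint = f" (did you mean '{suggestion[0]}'?)" if suggestion else ""
            problems.append(f"unknown property '{key}'{hint}")
    return problems


print(validate_config({"db_uri": "x", "bach_size": 1, "log_level": "INFO"}))
```

Returning a list rather than raising lets the tool decide whether problems are warnings or errors.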

Config/Metrics UI like Spring Actuator

Convention ideas

Directory Structure

By defining a default directory structure for VDK projects, users can quickly set up and understand projects. For example:

/configs: for configuration files
/csv: for CSV files
/data: for input/output data
/steps/python: for defining individual Python tasks
/steps/sql: for defining individual SQL tasks

The exact layout is still open; this is just illustrative.
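A first VDK Initializer could be little more than a function that creates this illustrative layout with placeholder files. A minimal sketch; init_project, the placeholder config contents, and the sample step file name are hypothetical.

```python
from pathlib import Path

# Illustrative layout only; the directory names are not final.
LAYOUT = ["configs", "csv", "data", "steps/python", "steps/sql"]

# Hypothetical placeholder files pre-populated with defaults.
DEFAULT_CONFIG = "[owner]\nteam = my-team\n"
SAMPLE_STEP = (
    "def run(job_input):\n"
    "    # Placeholder first step; replace with real logic.\n"
    "    print('Hello from the first step')\n"
)


def init_project(root):
    """Create the skeleton under root; existing files are left untouched."""
    root = Path(root)
    for sub in LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)
    files = {
        root / "configs" / "config.ini": DEFAULT_CONFIG,
        root / "steps" / "python" / "10_hello.py": SAMPLE_STEP,
    }
    for path, content in files.items():
        if not path.exists():  # never overwrite user edits
            path.write_text(content)
    return root
```

Wrapping init_project in a CLI command or a small web form would give the Initializr-style entry point; making it idempotent means rerunning it is safe.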

Naming conventions

If a job is named "data_cleaning", VDK could automatically look for a configuration file named "data_cleaning.config".

Alternatively, it could look at the "data_cleaning" section in HOME/.vdk/config.

Other similar conventions are possible. Prefer well-known industry conventions over inventing our own.
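A sketch of the two lookup conventions, tried in order: a per-job data_cleaning.config next to the job, then the job's section in HOME/.vdk/config. The lookup order, the assumption that the per-job file holds plain key = value lines without a section header, and find_job_config itself are all hypothetical.

```python
import configparser
from pathlib import Path


def find_job_config(job_name, job_dir, home=None):
    """Resolve a job's config purely from its name (hypothetical lookup order)."""
    home = Path.home() if home is None else Path(home)

    # 1. <job_name>.config next to the job; assumed to contain bare key = value lines.
    per_job_file = Path(job_dir) / f"{job_name}.config"
    if per_job_file.is_file():
        parser = configparser.ConfigParser()
        parser.read_string("[DEFAULT]\n" + per_job_file.read_text())
        return dict(parser.defaults())

    # 2. The [<job_name>] section of a shared file in the user's home directory.
    shared_file = home / ".vdk" / "config"
    if shared_file.is_file():
        parser = configparser.ConfigParser()
        parser.read(shared_file)
        if parser.has_section(job_name):
            return dict(parser[job_name])

    # 3. Nothing found: the caller falls back to defaults.
    return {}
```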

Sensible defaults

If a user doesn't specify certain configuration parameters, VDK should fall back on sensible defaults.
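One simple way to make defaults explicit and discoverable is a dataclass whose fields carry them, with user-supplied string values cast to each field's type. A sketch; JobSettings and its fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class JobSettings:
    # Hypothetical settings; every field has a sensible default.
    batch_size: int = 1000
    log_level: str = "INFO"
    retries: int = 3

    @classmethod
    def from_mapping(cls, values):
        """Build settings from user values, using defaults for anything missing."""
        kwargs = {}
        for field in fields(cls):
            if field.name in values:
                # field.type is the annotated class (int, str) because this module
                # does not use `from __future__ import annotations`.
                kwargs[field.name] = field.type(values[field.name])
        return cls(**kwargs)
```

Unknown keys are silently ignored here; combined with a runtime validation step they could be reported instead.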

Non-CLI entry point option

This already exists for notebooks, but it doesn't provide a pure Python user experience.

For example, something like this would be better:

if __name__ == '__main__':
   StandaloneDataJobFactory.run()
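One way such an entry point could work, as a sketch only: discover the step files in a directory, import each one, and call its run(job_input) function in file-name order. run_job is a hypothetical name, not the proposed StandaloneDataJobFactory API, and job_input is simply whatever object the caller passes in.

```python
import importlib.util
from pathlib import Path


def run_job(steps_dir, job_input=None):
    """Execute run(job_input) of every *.py step in steps_dir, in file-name order."""
    executed = []
    for step_file in sorted(Path(steps_dir).glob("*.py")):
        # Load each step file as an isolated module without touching sys.path.
        spec = importlib.util.spec_from_file_location(step_file.stem, step_file)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        if hasattr(module, "run"):
            module.run(job_input)
            executed.append(step_file.name)
    return executed
```

A job script could end with `if __name__ == "__main__": run_job(Path(__file__).parent / "steps")`, which any IDE can run or debug directly.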

Python frameworks

Django:

Django's settings.py system makes it easy to understand and configure an application. It provides a very structured way of setting up database connections, middleware, installed apps, and many other settings.

Flask

Flask uses object-based configuration: app.config can be loaded from regular Python modules or classes (e.g., via app.config.from_object or app.config.from_pyfile).
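Both frameworks share a simple convention: only UPPERCASE names in a settings module or config class count as settings (Flask's from_object, for example, copies only uppercase attributes). A stdlib sketch of that convention; load_uppercase_settings and the example settings are hypothetical.

```python
import types


def load_uppercase_settings(obj):
    """Collect UPPERCASE attributes from a module or class as settings."""
    return {name: getattr(obj, name) for name in dir(obj) if name.isupper()}


# A settings "module" built in memory for the example.
settings = types.ModuleType("settings")
settings.DEBUG = True
settings.DATABASE_URL = "sqlite:///dev.db"
settings.helper = "ignored: not uppercase"


class ProdConfig:
    """Class-based configuration, as commonly used with Flask."""
    DEBUG = False
    BATCH_SIZE = 1000


print(load_uppercase_settings(settings))
print(load_uppercase_settings(ProdConfig))
```

The convention lets settings files contain helper variables and imports without them leaking into the configuration.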

Configuration libraries

This library allows for separation of the configuration parameters from the code, making it easier to manage. It can pull from environment variables or .ini files, providing type casting and defaults.
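A stdlib-only sketch of the idea described above (reading a value from environment variables with a type cast and a default). The config and to_bool names are hypothetical and are not that library's API.

```python
import os

_MISSING = object()  # sentinel, so None can still be a legitimate default


def to_bool(value):
    """Cast common truthy/falsy strings to bool."""
    lowered = str(value).strip().lower()
    if lowered in {"1", "true", "yes", "on"}:
        return True
    if lowered in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"not a boolean: {value!r}")


def config(key, default=_MISSING, cast=str, environ=None):
    """Read key from the environment and cast it; otherwise return default unchanged."""
    environ = os.environ if environ is None else environ
    if key in environ:
        return cast(environ[key])
    if default is _MISSING:
        raise KeyError(f"{key} is not set and has no default")
    return default
```

For example, config("DEBUG", default=False, cast=to_bool) keeps the code free of hard-coded values while still working when the variable is absent.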

A configuration management tool for Python applications, supporting formats like TOML, YAML, JSON, and others. It allows for environment-specific settings and layered configurations.

Traitlets is a configuration system for Python applications used in the Jupyter ecosystem.

One of the primary features of Traitlets is that it provides dynamic type checking and offers a mechanism to observe and respond to changes in configuration values. Traitlets-based applications often allow configuration to be defined both via configuration files (typically in Python or JSON format) and via command-line arguments.
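The two features named above, type checking on assignment and change observers, can be sketched with a plain Python descriptor. This loosely mimics the behaviour; Typed, Configurable, observe, and ExporterConfig are hypothetical names and not the Traitlets API.

```python
class Typed:
    """Descriptor that type-checks assignments and notifies observers of changes."""

    def __init__(self, type_, default):
        self.type_ = type_
        self.default = default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name} must be {self.type_.__name__}, "
                            f"got {type(value).__name__}")
        old = self.__get__(obj)
        obj.__dict__[self.name] = value
        for callback in obj._observers.get(self.name, []):
            callback(self.name, old, value)


class Configurable:
    def __init__(self):
        self._observers = {}

    def observe(self, name, callback):
        """Call callback(name, old, new) whenever the named setting changes."""
        self._observers.setdefault(name, []).append(callback)


class ExporterConfig(Configurable):
    # Hypothetical settings with their types and defaults.
    batch_size = Typed(int, 100)
    target = Typed(str, "stdout")
```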

OmegaConf is a configuration management library for Python that supports structured and hierarchical configurations. It offers features like variable interpolation, merging of multiple configuration sources, and integration with typed data classes.
