[WIP] Dynamically generated versioned schema #711

patelh · 2020-09-14T18:35:47Z

One of the limitations of maha is how we expect the schema to be defined. This is primarily due to how we defined it in the older versions of maha, which were not open sourced. The schema was predefined and used at runtime. In order to add a new column or derived column, a code change was required. The pro is that every change goes through code review and testing before being pushed to production. The con is that every change has to go through the same process regardless of how simple it may be, e.g. something as simple as adding a new metric column with no specializations. Another limitation of maha is versioning of the schema: in the asynchronous processing use case, a request may be validated and queued using a different version of the schema than the one in effect when a worker picks up the request for processing. This could result in failures if schema changes are not backward compatible (e.g. removal of a column).

One approach to solving both of these problems would be to generate the schema dynamically on startup. This could add a little latency to the startup process, but it would reduce the burden of adding new changes, since they would no longer require code changes. However, we would need to version the changes and support rollback to a prior version. In some circumstances, the change may be complex enough to warrant a predefined hook to be invoked on the builder after dynamically constructing the schema.

A schema may be dynamically constructed using these versioned configurations:

Database Schema Dump
Overrides and derived column configurations on top of table definitions
Predefined hook to be invoked on the builder
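
To make the layering of the three configurations concrete, here is a minimal Python sketch. It is not maha code: `SchemaBuilder`, `VersionedTriple`, `build_schema`, and the `ad_stats` table and its columns are all hypothetical names chosen for illustration, and the real builder would carry far more than column strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# All names below are hypothetical; none of them are part of maha's API.


@dataclass
class SchemaBuilder:
    """Collects columns per table before the schema is finalized."""
    tables: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_column(self, table: str, name: str, definition: str) -> None:
        # A later definition for the same column replaces the earlier one.
        self.tables.setdefault(table, {})[name] = definition


@dataclass
class VersionedTriple:
    """One version each of the three configurations."""
    schema_dump: Dict[str, Dict[str, str]]  # table -> column -> data type
    overrides: Dict[str, Dict[str, str]]    # table -> column -> override or derived expression
    hook: Optional[Callable[[SchemaBuilder], None]] = None


def build_schema(triple: VersionedTriple) -> Dict[str, Dict[str, str]]:
    builder = SchemaBuilder()
    # 1. Start from the raw table definitions in the database schema dump.
    for table, columns in triple.schema_dump.items():
        for name, data_type in columns.items():
            builder.add_column(table, name, data_type)
    # 2. Layer overrides and derived columns on top of the table definitions.
    for table, columns in triple.overrides.items():
        for name, definition in columns.items():
            builder.add_column(table, name, definition)
    # 3. Invoke the predefined hook, if any, for changes too complex for configuration.
    if triple.hook is not None:
        triple.hook(builder)
    return builder.tables


def add_cost_per_click(builder: SchemaBuilder) -> None:
    """Example hook: adds a derived column after the configurations are applied."""
    builder.add_column("ad_stats", "cpc", "spend / clicks")


triple = VersionedTriple(
    schema_dump={"ad_stats": {"clicks": "int", "spend": "decimal"}},
    overrides={"ad_stats": {"spend": "decimal(10,2)", "ctr": "clicks / impressions"}},
    hook=add_cost_per_click,
)
schema = build_schema(triple)
```

The ordering is the design choice worth noting: the dump supplies defaults, the overrides win over the dump, and the hook runs last so it can see and adjust everything built so far.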

In order to support versioning and publishing of updates with rollback we'd need to support the following:

Versioned schema dump, overrides and derived column definitions, and predefined hook libs
Ability to define the current version / latest published version (a versioned triple of the 3 configurations above)
Ability to export a version such that it can be imported into a downstream environment (e.g. staging -> production)
Attach validation tests for each versioned export so they can be used for validation after import
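
The "current version" pointer and rollback could be sketched as follows. This is an illustration under assumptions, not a proposed implementation: `TripleVersion` and `TripleRegistry` are hypothetical names, the registry lives in memory, and rollback is assumed to mean stepping the pointer back one published triple.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical names for illustration; not part of maha.


@dataclass(frozen=True)
class TripleVersion:
    """Identifies one version of each of the three configurations."""
    schema_dump_version: int
    overrides_version: int
    hook_lib_version: int


class TripleRegistry:
    """Tracks the published triples of one environment plus the current pointer."""

    def __init__(self) -> None:
        self._published: List[TripleVersion] = []
        self._current: Optional[int] = None  # index into _published

    def publish(self, triple: TripleVersion) -> int:
        """Append a triple and make it the current version."""
        self._published.append(triple)
        self._current = len(self._published) - 1
        return self._current

    def current(self) -> TripleVersion:
        if self._current is None:
            raise LookupError("no version has been published")
        return self._published[self._current]

    def rollback(self) -> TripleVersion:
        """Move the current pointer back to the previously published triple."""
        if self._current is None or self._current == 0:
            raise LookupError("no prior version to roll back to")
        self._current -= 1
        return self.current()


registry = TripleRegistry()
registry.publish(TripleVersion(1, 1, 1))
registry.publish(TripleVersion(2, 1, 1))  # new schema dump, same overrides and hooks
rolled_back = registry.rollback()
```

Because published triples are never mutated or deleted here, rolling back is only a pointer move, which keeps older versions available to workers that are still processing requests queued against them.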

In order to support the above, we'd need the following:

Define a data model that supports the above requirements
API to allow CRUD on the data model (dev/staging environment)
API to support export/import/validation workflow (dev/staging/prod environment)
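
As a rough sketch of the export/import/validation workflow, the example below bundles a schema version with its validation tests as JSON and checks them on import. The bundle layout, the function names, and the idea that a validation test is a required `[table, column]` pair are all assumptions made for illustration; real validation tests would likely be richer (e.g. sample requests).

```python
import json
from typing import Dict, List

# Hypothetical bundle format and function names; not part of maha.

Schema = Dict[str, Dict[str, str]]  # table -> column -> definition


def export_version(version: int, schema: Schema, required_columns: List[List[str]]) -> str:
    """Serialize a schema version together with its validation tests."""
    return json.dumps(
        {
            "version": version,
            "schema": schema,
            # Each test is a [table, column] pair that must exist after import.
            "validation_tests": required_columns,
        },
        sort_keys=True,
    )


def import_version(bundle: str, environment: Dict[int, Schema]) -> int:
    """Run the attached tests and store the version only if all of them pass."""
    payload = json.loads(bundle)
    schema = payload["schema"]
    failures = [
        f"{table}.{column}"
        for table, column in payload["validation_tests"]
        if column not in schema.get(table, {})
    ]
    if failures:
        raise ValueError("validation failed, missing: " + ", ".join(failures))
    environment[payload["version"]] = schema
    return payload["version"]


# Export from staging, then import into an independently managed production store.
staging_bundle = export_version(
    3,
    {"ad_stats": {"clicks": "int", "spend": "decimal"}},
    [["ad_stats", "clicks"], ["ad_stats", "spend"]],
)
production: Dict[int, Schema] = {}
imported = import_version(staging_bundle, production)
```

Shipping the tests inside the export means the downstream environment needs no extra knowledge to validate what it receives, and a failed validation leaves that environment's existing versions untouched.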

Each environment would have its own independently managed versioned triples. The individual components may be versioned in the dev environment and subsequently pushed to staging and prod.

WIP
