[WIP] Dynamically generated versioned schema #711

Open
patelh opened this issue Sep 14, 2020 · 0 comments


patelh commented Sep 14, 2020

One limitation of maha is how we expect the schema to be defined. This is primarily due to how we defined it in older versions of maha, which were not open sourced. The schema was predefined and used at runtime, so adding a new column or derived column required a code change. The pro is that every change goes through code review and testing before it is pushed to production. The con is that every change has to go through the same process regardless of how simple it may be, e.g. adding a new metric column with no specializations. Another limitation of maha is schema versioning: in the asynchronous processing use case, a request may be validated and queued using a different version of the schema than the one in effect when a worker picks up the request for processing. This could result in failures if schema changes are not backward compatible (e.g. removal of a column).

One approach to solving both of these problems would be to generate the schema dynamically on startup. This could add a little latency to the startup process, but it would reduce the burden of making changes, since they would no longer require code changes. However, we would need to version the changes and support rollback to a prior version. In some circumstances, a change may be complex enough to warrant a predefined hook that is invoked on the builder after the schema is dynamically constructed.

A schema may be dynamically constructed using these versioned configurations:

  1. Database Schema Dump
  2. Overrides and derived column configurations on top of table definitions
  3. Predefined hook to be invoked on the builder
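
As a rough illustration of how the three configurations could be layered at startup, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `SchemaBuilder`, `build_schema`, the dictionary shapes, and the `ad_stats` table do not come from maha, whose real builder API is not shown in this issue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SchemaBuilder:
    """Hypothetical builder, not maha's actual builder API."""
    tables: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_column(self, table: str, name: str, expression: str) -> None:
        # Later calls with the same name overwrite earlier definitions.
        self.tables.setdefault(table, {})[name] = expression


def build_schema(
    schema_dump: Dict[str, List[str]],
    overrides: Dict[str, Dict[str, str]],
    hook: Optional[Callable[[SchemaBuilder], None]] = None,
) -> Dict[str, Dict[str, str]]:
    builder = SchemaBuilder()
    # 1. Start from the database schema dump: each physical column maps to itself.
    for table, columns in schema_dump.items():
        for column in columns:
            builder.add_column(table, column, column)
    # 2. Layer overrides and derived columns on top of the table definitions.
    for table, derived in overrides.items():
        for name, expression in derived.items():
            builder.add_column(table, name, expression)
    # 3. Invoke the predefined hook for changes too complex for configuration.
    if hook is not None:
        hook(builder)
    return builder.tables


# Illustrative inputs (made-up table and column names).
dump = {"ad_stats": ["impressions", "clicks"]}
overrides = {"ad_stats": {"ctr": "clicks / impressions"}}


def add_spend(builder: SchemaBuilder) -> None:
    builder.add_column("ad_stats", "spend_usd", "spend_micros / 1000000")


schema = build_schema(dump, overrides, add_spend)
```

The ordering matters in this sketch: overrides can replace columns from the dump, and the hook runs last so it can adjust anything the configuration produced.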

In order to support versioning and publishing of updates with rollback we'd need to support the following:

  1. Versioned schema dump, overrides and derived column definitions, and predefined hook libs
  2. Ability to define the current version / latest published version (a versioned triple of the 3 configurations above)
  3. Ability to export a version such that it can be imported into a downstream environment (e.g. staging -> production)
  4. Ability to attach validation tests to each versioned export so they can be used for validation after import
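
One possible shape for the versioned triple and the "current version" pointer is sketched below in Python. The names `SchemaVersion` and `VersionRegistry`, and the choice to model the current version as the tail of a publish history, are assumptions for illustration rather than a settled design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SchemaVersion:
    """A versioned triple: one version of each of the three configurations."""
    dump_version: int
    overrides_version: int
    hook_version: int


class VersionRegistry:
    """Hypothetical in-memory registry of published triples."""

    def __init__(self) -> None:
        self._published: List[SchemaVersion] = []  # publish history, oldest first

    def publish(self, version: SchemaVersion) -> None:
        self._published.append(version)

    def current(self) -> Optional[SchemaVersion]:
        # The latest published triple is the current one.
        return self._published[-1] if self._published else None

    def rollback(self) -> SchemaVersion:
        # Drop the latest triple and fall back to the one published before it.
        if len(self._published) < 2:
            raise ValueError("no prior version to roll back to")
        self._published.pop()
        return self._published[-1]


registry = VersionRegistry()
registry.publish(SchemaVersion(dump_version=1, overrides_version=1, hook_version=1))
registry.publish(SchemaVersion(dump_version=2, overrides_version=1, hook_version=1))
rolled_back = registry.rollback()
```

Because a triple is immutable, a rollback only moves the pointer; the individual configuration versions it refers to are never edited in place.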

In order to support the above, we'd need the following:

  1. Define a data model that supports the above requirements
  2. API to allow CRUD on the data model (dev/staging environment)
  3. API to support export/import/validation workflow (dev/staging/prod environment)

Each environment would have its own independently managed versioned triples. The individual components may be versioned in the dev environment and subsequently pushed to staging and prod.
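
The export/import/validation workflow between environments could look roughly like the following Python sketch. The JSON bundle layout, the `VALIDATION_TESTS` table, and the function names are all hypothetical. The sketch also assumes that an imported version is discarded when any attached test fails, which is one possible reading of "validation after import".

```python
import json
from typing import Callable, Dict, List

# Hypothetical named validation tests; an export carries test names, not code.
VALIDATION_TESTS: Dict[str, Callable[[dict], bool]] = {
    "has_ad_stats": lambda configs: "ad_stats" in configs["dump"],
    "ctr_defined": lambda configs: "ctr" in configs["overrides"].get("ad_stats", {}),
}


def export_version(env: Dict[int, dict], version: int, tests: List[str]) -> str:
    """Serialize one version from an environment together with its attached tests."""
    bundle = {"source_version": version, "configs": env[version], "tests": tests}
    return json.dumps(bundle, sort_keys=True)


def import_version(env: Dict[int, dict], payload: str) -> int:
    """Import a bundle under the target environment's own numbering, then validate."""
    bundle = json.loads(payload)
    new_version = max(env, default=0) + 1  # numbering is independent per environment
    env[new_version] = bundle["configs"]
    failed = [name for name in bundle["tests"]
              if not VALIDATION_TESTS[name](env[new_version])]
    if failed:
        del env[new_version]  # assumption: a failed import is discarded
        raise ValueError(f"validation failed: {failed}")
    return new_version


# Made-up staging content; production starts empty.
staging = {3: {"dump": {"ad_stats": ["impressions", "clicks"]},
               "overrides": {"ad_stats": {"ctr": "clicks / impressions"}},
               "hook": "hooks-1.0"}}
production: Dict[int, dict] = {}

payload = export_version(staging, 3, ["has_ad_stats", "ctr_defined"])
prod_version = import_version(production, payload)
```

Note that version 3 in staging becomes version 1 in production here, reflecting the idea that each environment manages its own versions independently.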

WIP
