
Create suite of sanity tests across DBs and types for equivalency #39

@chadlwilson

Description

Context / Goal

Since the tool is designed to help you validate migration of data across different schema types and even (relational) database implementations, and primarily works based on hashed data representing a row in a dataset, we want to have a way to validate that it is actually working correctly for this purpose.

Since there are potential differences in the way drivers handle things such as character encodings, number types, and timestamp/date types, we want to ensure that the hashed data representing a given value in one database is considered hash-identical to that of another. If it is not, there should be a simple SQL-based way to make them identical, or we should probably change our implementation.

Expected Outcome

  • Modify/refactor the MultiDataSourceConnectivityIntegrationTest from Configure integration test tooling to allow testing across DB types #28 so that, instead of just testing R2DBC connectivity via Micronaut, it provides a suite of simple scenarios doing real integration testing, focused on type differences
    • It is likely that the hashing impl in HashedRow is not going to work correctly. An int with value 10 in one DB will likely not be considered equal to a long with value 10 in another DB, and similarly for other types. We will have to decide how values should be canonicalized, and how configurable that needs to be.
    • Should a string of "10" be considered equal to a bigint of 10?
  • Run simple, but real reconciliations that focus on ensuring that hashed values from one DB of a given type are equivalent to that of a different DB
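The canonicalization problem above can be sketched as follows. This is not the actual HashedRow implementation; the names (`canonicalize`, `hashOf`) and the type-tagging scheme are illustrative assumptions. The idea is that all integral and decimal types collapse to one decimal string before hashing, so an int 10 and a long 10 hash identically, while the string "10" is kept distinct (one possible answer to the open question above).

```kotlin
import java.math.BigDecimal
import java.security.MessageDigest

// Hypothetical sketch: canonicalize driver-specific values to a common
// string form before hashing. A "num:"/"str:" tag keeps numbers and
// strings distinct; dropping the tag would make "10" equal to bigint 10.
fun canonicalize(value: Any?): String = when (value) {
    null -> "null:"
    // All integral types collapse to a single decimal representation
    is Byte, is Short, is Int, is Long -> "num:" + (value as Number).toLong()
    // Floating point/decimal normalized via BigDecimal, trailing zeros stripped
    is Float, is Double -> "num:" + BigDecimal(value.toString()).stripTrailingZeros().toPlainString()
    is BigDecimal -> "num:" + value.stripTrailingZeros().toPlainString()
    else -> "str:$value"
}

fun hashOf(vararg columns: Any?): String {
    val digest = MessageDigest.getInstance("SHA-256")
    columns.forEach { digest.update(canonicalize(it).toByteArray(Charsets.UTF_8)) }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

fun main() {
    // An Int 10 and a Long 10 now hash identically across drivers...
    check(hashOf(10) == hashOf(10L))
    // ...and a DOUBLE 10.0 canonicalizes to the same value as well...
    check(hashOf(10.0) == hashOf(10L))
    // ...but the String "10" still differs, pending a decision on cross-type equality
    check(hashOf("10") != hashOf(10L))
}
```

How configurable this needs to be (e.g. whether string/number equivalence should be opt-in per dataset) is exactly the decision flagged above.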

Out of Scope

Additional context / implementation notes

  • At time of writing we are using the Exposed framework in test code to generate schemas for testing with. This may not give us the level of control over data types in the databases that we require, and may need to be re-evaluated.
  • Possibly these tests could run in a matrix style with simple dataset queries on each side, like SELECT id AS MigrationKey, test_type_column FROM testdata, creating a single table with a single test-data column per test
    • Dimension 1: DB (mysql, postgres, mssql)
    • Dimension 2: DB Type under test (CHAR/VARCHAR, INT/INTEGER, BIGINT, NUMERIC/DECIMAL/REAL/FLOAT, DATETIME/DATE/TIME/TIMESTAMP etc)
  • Types
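The matrix idea above can be sketched as a cross-product of database pairs and type columns, each yielding the simple per-test query. The DB list comes from the dimensions above; the column names and the `queryFor` helper are illustrative assumptions, not an existing API.

```kotlin
// Hypothetical matrix of reconciliation scenarios:
// Dimension 1: source/target DB; Dimension 2: column type under test.
val dbs = listOf("mysql", "postgres", "mssql")
val typeColumns = listOf(
    "char_col", "varchar_col", "int_col",
    "bigint_col", "decimal_col", "datetime_col"
)

// Each scenario runs the same trivial query on both sides
fun queryFor(column: String) = "SELECT id AS MigrationKey, $column FROM testdata"

fun main() {
    val scenarios = dbs.flatMap { source ->
        dbs.flatMap { target ->
            typeColumns.map { col -> Triple(source, target, col) }
        }
    }
    // 3 source DBs x 3 target DBs x 6 columns = 54 scenarios
    check(scenarios.size == 54)
    scenarios.take(2).forEach { (src, tgt, col) ->
        println("$src -> $tgt : ${queryFor(col)}")
    }
}
```

Generating the scenarios programmatically like this would keep each test down to one table with one typed column, as described above.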

Metadata

Labels

size:M (medium items), task (General technical task)
