vdk-impala: unify names of templates between trino and impala #787

Merged 1 commit on Apr 6, 2022

@@ -86,23 +86,22 @@ def initialize_job(self, context: JobContext) -> None:
)

context.templates.add_template(
- "scd2", pathlib.Path(get_job_path("load/dimension/scd2"))
+ "scd2_simple", pathlib.Path(get_job_path("load/dimension/scd2"))
)

context.templates.add_template(
"load/fact/snapshot", pathlib.Path(get_job_path("load/fact/snapshot"))
)

context.templates.add_template(
- "snapshot", pathlib.Path(get_job_path("load/fact/snapshot"))
+ "periodic_snapshot", pathlib.Path(get_job_path("load/fact/snapshot"))
)

context.templates.add_template(
"load/versioned", pathlib.Path(get_job_path("load/versioned"))
)

context.templates.add_template(
- "versioned", pathlib.Path(get_job_path("load/versioned"))
+ "scd2", pathlib.Path(get_job_path("load/versioned"))
)

@staticmethod
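The renames in this hunk can be summarized as an old-to-new alias mapping. The following is a minimal sketch, not part of the PR: `RENAMED_ALIASES` and `migrate_template_name` are hypothetical names for a helper a job author might use when updating existing `execute_template` calls. Note that `"scd2"` is reused: it previously pointed at `load/dimension/scd2` and now points at `load/versioned`, so the mapping should be applied exactly once, and only to names taken from pre-change code.

```python
# Hypothetical helper, not part of vdk-impala: maps the short template
# aliases used before this change to their names after it.
RENAMED_ALIASES = {
    "scd2": "scd2_simple",            # still resolves to load/dimension/scd2
    "snapshot": "periodic_snapshot",  # still resolves to load/fact/snapshot
    "versioned": "scd2",              # still resolves to load/versioned
}


def migrate_template_name(old_name: str) -> str:
    """Return the post-change alias for a pre-change template name.

    Names that were not renamed (for example the full "load/..." paths
    that remain registered in this hunk) are returned unchanged.
    """
    return RENAMED_ALIASES.get(old_name, old_name)


if __name__ == "__main__":
    for name in ("scd2", "snapshot", "versioned", "load/versioned"):
        print(f"{name} -> {migrate_template_name(name)}")
```

Running the mapping a second time over already-migrated names would wrongly turn the new `"scd2"` into `"scd2_simple"`, which is why a single pass matters here.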
@@ -8,7 +8,7 @@ In summary, it overwrites the target table with the source data.

### Template Name (template_name):

- - "load/dimension/scd1"
+ - "scd1"

### Template Parameters (template_args):

@@ -41,3 +41,7 @@ def run(job_input):
job_input.execute_template("load/dimension/scd1", template_args)
# . . .
```

+ ### Example
+
+ See a full example of how to use the template in [our example documentation](https://github.com/vmware/versatile-data-kit/wiki/SQL-Data-Processing-templates-examples#overwrite-strategy-slowly-changing-dimension-type-1).
@@ -1,14 +1,16 @@
### Purpose:

Template used to load raw data from a Data Lake to target 'Slowly Changing Dimension Type 2' table in a Data Warehouse.
+ This is a very simple implementation which overrides rows based on the updated timestamp and ID. Its use is generally not recommended.
+ Prefer the ["scd2" template](https://github.com/vmware/versatile-data-kit/wiki/SQL-Data-Processing-templates-examples#versioned-strategy--slowly-changing-dimension-type-2) instead.

### Details:

Explanation of SCD type 2 can be seen here: <https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row>

### Template Name (template_name):

- - "load/dimension/scd2"
+ - "scd2_simple"

### Template Parameters (template_args):

@@ -10,7 +10,7 @@ truncating all present target table records observed after t1.

### Template Name (template_name):

- - "load/fact/snapshot"
+ - "periodic_snapshot"

### Template Parameters (template_args):

@@ -46,3 +46,7 @@ def run(job_input):
job_input.execute_template('load/fact/snapshot', template_args)
# . . .
```

+ ### Example
+
+ See a full example of how to use the template in [our example documentation](https://github.com/vmware/versatile-data-kit/wiki/SQL-Data-Processing-templates-examples#append-strategy-periodic-snapshot-fact).
@@ -9,7 +9,7 @@ Explanation of SCD type 2 can be seen here: <https://en.wikipedia.org/wiki/Slowl

### Template Name (template_name):

- - "load/versioned"
+ - "scd2"

### Template Parameters (template_args):

@@ -18,7 +18,7 @@ Explanation of SCD type 2 can be seen here: <https://en.wikipedia.org/wiki/Slowl
- source_schema - SC Data Lake schema containing the source view.
- source_view - SC Data Lake view where source data is loaded from.
- id_column - Column that holds the natural key of the target table.
- - value_columns - A list of columns from the source that are considered errors. Present both in the source and the target tables.
+ - value_columns - A list of columns from the source that are copied. Present both in the source and the target tables.
- tracked_columns - A sublist of the value columns that are tracked for changes. Present both in the source and the target tables.
- updated_at_column - A column containing the update time of a record. Present in the source table. Optional (default value is "updated_at").
- sk_column - A surrogate key column that is automatically generated in the target table. Optional (default value is "sk").
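The parameter descriptions above imply that every entry of `tracked_columns` must also appear in `value_columns`. A minimal sketch of a pre-flight check a job could run before calling the template; `check_tracked_columns` is a hypothetical helper, and whether the template itself performs such a check is not shown in this diff:

```python
from typing import Sequence


def check_tracked_columns(
    value_columns: Sequence[str], tracked_columns: Sequence[str]
) -> None:
    """Raise ValueError if any tracked column is not also a value column."""
    missing = [c for c in tracked_columns if c not in value_columns]
    if missing:
        raise ValueError(
            f"tracked_columns not present in value_columns: {missing}"
        )
```

Failing early in the job keeps a misconfigured column list from surfacing later as a less readable SQL error.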
@@ -54,3 +54,7 @@ def run(job_input):
job_input.execute_template('load/versioned', template_args)
# . . .
```

+ ### Example
+
+ See a full example of how to use the template in [our example documentation](https://github.com/vmware/versatile-data-kit/wiki/SQL-Data-Processing-templates-examples#versioned-strategy--slowly-changing-dimension-type-2).
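With this change the versioned (SCD type 2) template can be invoked under the short name `"scd2"`. The sketch below shows the call shape using only the parameters listed in this diff; parameters outside the visible hunk are omitted, so the arguments are likely incomplete for a real run. `FakeJobInput` is a hypothetical stand-in that records calls so the snippet runs without VDK installed, and all schema, view, and column values are placeholders.

```python
class FakeJobInput:
    """Hypothetical stand-in for VDK's job_input; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute_template(self, template_name, template_args):
        self.calls.append((template_name, template_args))


def run(job_input):
    # Placeholder values; only parameters visible in this diff are shown.
    template_args = {
        "source_schema": "example_source_schema",
        "source_view": "vw_example_dimension",
        "id_column": "example_id",
        "value_columns": ["name", "state", "cluster"],
        "tracked_columns": ["state"],
        # Optional parameters, set here to their documented defaults.
        "updated_at_column": "updated_at",
        "sk_column": "sk",
    }
    # New short alias introduced by this change (previously "versioned").
    job_input.execute_template("scd2", template_args)


if __name__ == "__main__":
    fake = FakeJobInput()
    run(fake)
    print(fake.calls[0][0])
```

The full path `"load/versioned"` remains registered in the plugin code above, so existing calls that use it should keep working.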