Implementation of unique tests on all Core models #497

Merged 7 commits on Jun 20, 2024
2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -24,7 +24,7 @@ vars:
## The vars directly below enable all models related to the type of healthcare data being used

clinical_enabled: true
-  claims_enabled: true
+  # claims_enabled: true


## The vars directly below enable a single data mart. See the Quickstart
6 changes: 3 additions & 3 deletions integration_tests/docs_generate/dbt_project.yml
@@ -19,9 +19,9 @@ vars:
# Use the vars below to enabled or disable sections of The Tuva Project.

## The vars directly below enable all models related to the type of healthcare data being used
-  #clinical_enabled: true
-  #claims_enabled: true
-  tuva_marts_enabled: true
+  clinical_enabled: true
+  claims_enabled: true
+  #tuva_marts_enabled: true


dispatch:
61 changes: 55 additions & 6 deletions models/core/core_models.yml
@@ -18,7 +18,9 @@ models:
- name: condition_id
description: Unique identifier for each condition in the table.
tests:
-  - unique
+  - unique:
+  config:
+  severity: warn
- name: patient_id
description: Unique identifier for each patient in the dataset.
- name: encounter_id
@@ -98,6 +100,12 @@ models:
tags: core
materialized: table
columns:
- name: eligibility_id
description: Unique identifier for each eligibility row in the table.
tests:
- unique:
config:
severity: warn
- name: patient_id
description: Unique identifier for each patient in the dataset.
- name: member_id
@@ -164,6 +172,10 @@ models:
columns:
- name: encounter_id
description: Unique identifier for each encounter.
tests:
- unique:
config:
severity: warn
- name: patient_id
description: Unique identifier for a patient.
- name: encounter_type
@@ -270,7 +282,11 @@ models:
materialized: table
columns:
- name: lab_result_id
-  description: Unique identifier for the lab test.
+  description: Unique identifier for each lab result.
+  tests:
+  - unique:
+  config:
+  severity: warn
- name: patient_id
description: Unique identifier for each patient.
- name: encounter_id
@@ -368,6 +384,10 @@ models:
columns:
- name: location_id
description: Unique identifier for each location.
tests:
- unique:
config:
severity: warn
- name: npi
description: >
The national provider identifier associated with the location e.g.
@@ -401,6 +421,7 @@ models:
`dbt_utils.pretty_time` as the local time of the `dbt run`
environment. Timezone is configurable via the `tuva_last_run` var.


- name: core__medical_claim
description: >
The medical claim table contains information on services rendered to
@@ -412,10 +433,14 @@ models:
alias: medical_claim
tags: core
materialized: table
-  tests:
-  - unique:
-  column_name: "(claim_id||'-'||claim_line_number)"
   columns:
+  - name: medical_claim_id
+  description: Unique identifier for each row in the table.
+  tests:
+  - unique:
+  config:
+  severity: error

- name: claim_id
description: Unique identifier for each claim.
- name: claim_line_number
@@ -592,6 +617,10 @@ models:
columns:
- name: medication_id
description: Unique identifier for each medication in the table.
tests:
- unique:
config:
severity: warn
- name: patient_id
description: Unique identifier for each patient in the dataset.
- name: encounter_id
@@ -672,6 +701,10 @@ models:
columns:
- name: observation_id
description: Unique identifier for each observation in the dataset.
tests:
- unique:
config:
severity: warn
- name: patient_id
description: Unique identifier for each patient in the dataset.
- name: encounter_id
@@ -734,6 +767,10 @@ models:
columns:
- name: patient_id
description: Unique identifier for each person across all datasets.
tests:
- unique:
config:
severity: error
- name: sex
description: The gender of the patient.
meta:
@@ -793,6 +830,12 @@ models:
tags: core
materialized: table
columns:
- name: pharmacy_claim_id
description: Unique identifier for each row in the table.
tests:
- unique:
config:
severity: error
- name: claim_id
description: Unique identifier for each claim.
- name: claim_line_number
@@ -874,6 +917,10 @@ models:
columns:
- name: practitioner_id
description: Unique ID for the provider.
tests:
- unique:
config:
severity: warn
- name: npi
description: NPI for the provider.
meta:
@@ -914,7 +961,9 @@ models:
- name: procedure_id
description: The unique identifier for the performed procedure.
tests:
-  - unique
+  - unique:
+  config:
+  severity: warn
- name: encounter_id
description: >
The encounter_id for the encounter where this procedure was performed.
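
For reference, a rough sketch of the check behind the `unique` tests added in `core_models.yml`, assuming dbt's built-in generic `unique` test and using the `core__medical_claim` model and `medical_claim_id` column from the diff. The exact compiled SQL may differ by dbt version and adapter; this only illustrates the idea.

```sql
-- Sketch only: approximates what the generic `unique` test checks.
-- Every row returned is a key value that appears more than once.
-- With severity: error, returned rows fail the test;
-- with severity: warn, they are reported as a warning instead.
select
      medical_claim_id
    , count(*) as n_records
from {{ ref('core__medical_claim') }}
where medical_claim_id is not null
group by medical_claim_id
having count(*) > 1
```

Running a query like this by hand against a warehouse can help locate the duplicated keys when one of the `warn`-level tests fires.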
27 changes: 26 additions & 1 deletion models/core/staging/core__stg_claims_condition.sql
Review comment (Contributor):

FYI, this won't work on Athena:
cast(unpivot_cte.claim_id||'_'||unpivot_cte.claim_line_number||'_'||unpivot_cte.diagnosis_rank||'_'||unpivot_cte.source_code as {{ dbt.type_string() }} ) as condition_id

Athena requires claim_line_number to be cast explicitly as varchar before it can be concatenated.
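
A minimal sketch of the kind of change this comment asks for, not necessarily what was merged: cast each component to a string before concatenating. It assumes `claim_line_number` (and possibly `diagnosis_rank`) are not already string-typed, and it reuses the `dbt.type_string()` macro already used in this model.

```sql
-- Sketch only: explicit string casts on every concatenated component,
-- so engines that refuse to concatenate non-string values accept it.
-- Assumption: claim_line_number and diagnosis_rank may be numeric.
select distinct
    cast(unpivot_cte.claim_id as {{ dbt.type_string() }} )
        || '_' || cast(unpivot_cte.claim_line_number as {{ dbt.type_string() }} )
        || '_' || cast(unpivot_cte.diagnosis_rank as {{ dbt.type_string() }} )
        || '_' || cast(unpivot_cte.source_code as {{ dbt.type_string() }} )
    as condition_id
from unpivot_cte
```

Casting values that are already strings is redundant on most warehouses, but it keeps the expression portable across adapters.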

@@ -13,6 +13,7 @@ with unpivot_cte as (

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -32,6 +33,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -51,6 +53,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -70,6 +73,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -89,6 +93,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -108,6 +113,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -127,6 +133,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -146,6 +153,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -165,6 +173,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -184,6 +193,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -203,6 +213,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -222,6 +233,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -241,6 +253,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -260,6 +273,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -279,6 +293,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -298,6 +313,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -317,6 +333,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -336,6 +353,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -355,6 +373,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -374,6 +393,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -393,6 +413,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -412,6 +433,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -431,6 +453,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -450,6 +473,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -469,6 +493,7 @@ union all

select
claim_id
, claim_line_number
, patient_id
, coalesce(admission_date
, claim_start_date
@@ -487,7 +512,7 @@ where diagnosis_code_25 is not null
)

select distinct
-  cast(unpivot_cte.data_source||'_'||unpivot_cte.claim_id||'_'||unpivot_cte.diagnosis_rank||'_'||unpivot_cte.source_code as {{ dbt.type_string() }} ) as condition_id
+  cast(unpivot_cte.claim_id||'_'||unpivot_cte.claim_line_number||'_'||unpivot_cte.diagnosis_rank||'_'||unpivot_cte.source_code as {{ dbt.type_string() }} ) as condition_id
, cast(unpivot_cte.patient_id as {{ dbt.type_string() }} ) as patient_id
, cast(coalesce(ap.encounter_id, ed.encounter_id) as {{ dbt.type_string() }} ) as encounter_id
, cast(unpivot_cte.claim_id as {{ dbt.type_string() }} ) as claim_id
4 changes: 3 additions & 1 deletion models/core/staging/core__stg_claims_eligibility.sql
Review comment (Contributor):

We'll have to cast these as varchar instead of date for Athena:

   cast(patient_id as {{ dbt.type_string() }} ) || '-' || cast(enrollment_start_date as date ) || '-' || cast(enrollment_end_date as date )
        || '-' ||  cast(payer as {{ dbt.type_string() }} ) || '-' || cast(plan as {{ dbt.type_string() }} ) as eligibility_id
   , cast(patient_id as {{ dbt.type_string() }} ) as patient_id


@@ -13,7 +13,9 @@


select
-  cast(patient_id as {{ dbt.type_string() }} ) as patient_id
+  cast(patient_id as {{ dbt.type_string() }} ) || '-' || cast(enrollment_start_date as {{ dbt.type_string() }} ) || '-' || cast(enrollment_end_date as {{ dbt.type_string() }} )
+  || '-' || cast(payer as {{ dbt.type_string() }} ) || '-' || cast(plan as {{ dbt.type_string() }} ) as eligibility_id
+  , cast(patient_id as {{ dbt.type_string() }} ) as patient_id
, cast(member_id as {{ dbt.type_string() }} ) as member_id
, cast(subscriber_id as {{ dbt.type_string() }} ) as subscriber_id
, cast(birth_date as date) as birth_date