Cohort Extraction
The cohort extraction module is the primary tool for building patient-level analytic files. It identifies patients matching diagnosis/procedure criteria, applies inclusion/exclusion filters, and exports the resulting claim files.

    from medicaid_utils.filters.patients.cohort_extraction import extract_cohort

    extract_cohort(
        state="AL",
        lst_year=[2016, 2017, 2018],
        dct_diag_proc_codes=dct_codes,
        dct_filters=dct_filters,
        lst_types_to_export=["ip", "ot", "ps"],
        dct_data_paths=dct_paths,
        cms_format="TAF",
    )

Use ICD-9 and/or ICD-10 codes with inclusion and exclusion logic. Codes are matched by prefix: "250" matches "2500", "25000", "25002", etc.
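As a rough illustration of the prefix rule, the sketch below uses plain `str.startswith`. The helper names `matches_prefix` and `claim_has_condition` are hypothetical and not part of medicaid-utils, and the way inclusion and exclusion are combined (a claim qualifies when any of its codes matches an inclusion prefix and none matches an exclusion prefix) is an assumption made for this example; the library's internal logic may differ.

```python
def matches_prefix(code, prefixes):
    """Return True if `code` starts with any prefix in `prefixes`."""
    return any(code.startswith(prefix) for prefix in prefixes)


def claim_has_condition(codes, incl, excl):
    """Assumed rule: at least one inclusion match and no exclusion match."""
    has_incl = any(matches_prefix(code, incl) for code in codes)
    has_excl = any(matches_prefix(code, excl) for code in codes)
    return has_incl and not has_excl


icd9_incl = ["250"]
icd9_excl = ["25001", "25003", "25011", "25013"]

print(matches_prefix("25000", icd9_incl))  # True
print(matches_prefix("2510", icd9_incl))   # False
print(claim_has_condition(["25000", "4019"], icd9_incl, icd9_excl))  # True
print(claim_has_condition(["25001"], icd9_incl, icd9_excl))          # False
```

Because matching is by prefix, a short inclusion prefix such as "250" also captures the longer codes listed under exclusion, which is why the exclusion list is needed to carve them back out.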

    dct_codes = {
        "diag_codes": {
            "diabetes_t2": {
                "incl": {
                    9: ["250"],   # ICD-9 prefix
                    10: ["E11"],  # ICD-10 prefix
                },
                "excl": {
                    9: ["25001", "25003", "25011", "25013"],  # Odd 5th digits = Type 1
                    10: ["E10"],  # Exclude Type 1
                },
            },
        },
        "proc_codes": {},
    }

Procedure codes are keyed by procedure coding system:

    dct_codes = {
        "diag_codes": {},
        "proc_codes": {
            "methadone": {
                7: [  # ICD-10-PCS (system code 7)
                    "HZ81ZZZ", "HZ84ZZZ", "HZ85ZZZ", "HZ86ZZZ",
                ],
            },
        },
    }

Common procedure system codes:

- `1`: CPT/HCPCS
- `6`: ICD-9-CM procedure
- `7`: ICD-10-PCS
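A single condition can presumably carry codes from more than one coding system. The sketch below extends the methadone entry with a CPT/HCPCS list and prints a short summary as a sanity check before running an extraction. `PROC_SYSTEM_LABELS` and `describe_proc_codes` are hypothetical helpers written for this example, not library functions, and `H0020` is included only as an illustrative HCPCS entry; check any code list against your own study definition.

```python
# Labels taken from the list of common procedure system codes above.
PROC_SYSTEM_LABELS = {1: "CPT/HCPCS", 6: "ICD-9-CM procedure", 7: "ICD-10-PCS"}


def describe_proc_codes(proc_codes):
    """Summarise how many codes each condition defines per coding system."""
    lines = []
    for condition, by_system in proc_codes.items():
        for system, codes in by_system.items():
            label = PROC_SYSTEM_LABELS.get(system, f"unknown system {system}")
            lines.append(f"{condition}: {len(codes)} {label} code(s)")
    return lines


proc_codes = {
    "methadone": {
        7: ["HZ81ZZZ", "HZ84ZZZ", "HZ85ZZZ", "HZ86ZZZ"],  # ICD-10-PCS
        1: ["H0020"],  # CPT/HCPCS (illustrative entry, an assumption)
    },
}

for line in describe_proc_codes(proc_codes):
    print(line)
```

A summary like this makes it easier to spot a list filed under the wrong system key.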
Filters control which claims and patients are included:

    dct_filters = {
        "cohort": {
            "ip": {
                "missing_dob": 0,  # Exclude missing DOB
                "range_numeric_age_prncpl_proc": (18, 64),  # Age 18-64
            },
            "ot": {
                "missing_dob": 0,
                "range_numeric_age_srvc_bgn": (18, 64),
            },
        },
        "export": {},
    }

| Type | Example | Description |
|---|---|---|
| Column value | `"missing_dob": 0` | Keep rows where column equals value |
| Numeric range | `"range_numeric_age_srvc_bgn": (18, 64)` | Keep rows where column is within range (inclusive) |
| Date range | `"range_date_srvc_bgn_date": ("20160101", "20181231")` | Keep rows where date is within range |
| Exclusion | `"excl_female": 1` | Exclude patients with positive exclusion flag |
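To make the four filter types in the table concrete, here is a minimal pure-Python sketch of how such keys could be interpreted against a single row. This is not the library's implementation: `keep_row` is a hypothetical helper, rows are plain dicts rather than dataframes, the column name is assumed to be whatever follows the `range_numeric_` or `range_date_` prefix, dates are assumed to be `YYYYMMDD` strings, and date bounds are assumed to be inclusive.

```python
from datetime import datetime


def _to_date(value):
    """Parse a YYYYMMDD string into a date."""
    return datetime.strptime(value, "%Y%m%d").date()


def keep_row(row, dct_filter):
    """Return True if `row` (a dict) passes every entry in `dct_filter`."""
    for key, value in dct_filter.items():
        if key.startswith("range_numeric_"):
            column = key[len("range_numeric_"):]
            low, high = value
            if not (low <= row[column] <= high):  # inclusive bounds
                return False
        elif key.startswith("range_date_"):
            column = key[len("range_date_"):]
            start, end = _to_date(value[0]), _to_date(value[1])
            if not (start <= _to_date(row[column]) <= end):  # assumed inclusive
                return False
        elif key.startswith("excl_"):
            # Assumption: a filter value of 1 drops rows whose flag is 1.
            if value == 1 and row[key] == 1:
                return False
        elif row[key] != value:  # plain column-value filter
            return False
    return True


filters = {
    "missing_dob": 0,
    "range_numeric_age_srvc_bgn": (18, 64),
    "range_date_srvc_bgn_date": ("20160101", "20181231"),
    "excl_female": 1,
}
rows = [
    {"missing_dob": 0, "age_srvc_bgn": 30, "srvc_bgn_date": "20170515", "excl_female": 0},
    {"missing_dob": 1, "age_srvc_bgn": 30, "srvc_bgn_date": "20170515", "excl_female": 0},
    {"missing_dob": 0, "age_srvc_bgn": 70, "srvc_bgn_date": "20170515", "excl_female": 0},
    {"missing_dob": 0, "age_srvc_bgn": 30, "srvc_bgn_date": "20190101", "excl_female": 0},
    {"missing_dob": 0, "age_srvc_bgn": 30, "srvc_bgn_date": "20170515", "excl_female": 1},
    {"missing_dob": 0, "age_srvc_bgn": 64, "srvc_bgn_date": "20181231", "excl_female": 0},
]
kept = [i for i, row in enumerate(rows) if keep_row(row, filters)]
print(kept)  # [0, 5]
```

Only the first and last rows survive: the others fail the missing-DOB, age, date, and exclusion checks respectively.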

After extraction, the export folder contains:

- `cohort_{STATE}.csv`: patient-level file with condition flags, inclusion indicator, and date of birth
- `cohort_{STATE}_{YEAR}.csv`: year-specific patient file
- `cohort_exclusions_{TYPE}_{STATE}_{YEAR}.parquet`: filter statistics
- Exported claim files in the requested format (CSV or Parquet)
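The naming patterns above can be turned into a quick checklist of paths to look for after a run. The helper below is a hypothetical sketch, not a library function. It assumes that `{TYPE}` is the lowercase claim type (for example `ip`) and that one exclusions file is written per claim type and year; both points should be confirmed against an actual export folder.

```python
from pathlib import PurePosixPath


def expected_outputs(export_folder, state, years, claim_types):
    """List the cohort file paths implied by the documented naming patterns."""
    folder = PurePosixPath(export_folder)
    paths = [folder / f"cohort_{state}.csv"]
    for year in years:
        paths.append(folder / f"cohort_{state}_{year}.csv")
        for claim_type in claim_types:
            # Assumption: one exclusions file per claim type and year.
            paths.append(
                folder / f"cohort_exclusions_{claim_type}_{state}_{year}.parquet"
            )
    return [str(path) for path in paths]


for path in expected_outputs("/output/cohort/AL/", "AL", [2016, 2017], ["ip", "ot"]):
    print(path)
```

The exported claim files are left out because their names are not specified here.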

    for state in ["AL", "IL", "CA", "NY", "TX"]:
        extract_cohort(
            state=state,
            lst_year=[2016, 2017, 2018],
            dct_diag_proc_codes=dct_codes,
            dct_filters=dct_filters,
            lst_types_to_export=["ip", "ot", "ps"],
            dct_data_paths={
                "source_root": "/data/cms/",
                "export_folder": f"/output/cohort/{state}/",
            },
            cms_format="TAF",
        )

- Common Recipes: More code examples
- Risk Adjustment Algorithms: Apply after cohort extraction
medicaid-utils | Documentation | PyPI | GitHub | MIT License | Research Computing Group, Biostatistics Laboratory, The University of Chicago