MAX vs TAF
medicaid-utils supports both Medicaid file formats published by CMS. Understanding their differences is essential for working with Medicaid claims data.
| Feature | MAX (Medicaid Analytic eXtract) | TAF (T-MSIS Analytic Files) |
|---|---|---|
| Years available | 1999–2015 | 2014–present |
| Diagnosis coding | Primarily ICD-9-CM | Primarily ICD-10-CM |
| File structure | Single flat file per claim type | Multiple sub-files per claim type |
| Beneficiary ID | `BENE_MSIS`, `BENE_ID`, or `MSIS_ID` | `BENE_MSIS`, `BENE_ID`, or `MSIS_ID` |
| Raw CMS claim types | IP, OT, RX, PS, CC | IP, OT, LT, RX, DE (person summary) |
| Supported in medicaid-utils | IP, OT, PS, CC | IP, OT, LT, RX, PS |
| Diagnosis columns | `DIAG_CD_1` – `DIAG_CD_9` | `DGNS_CD_1` – `DGNS_CD_12` |
| Procedure columns | `PRCDR_CD_1` – `PRCDR_CD_6` | `PRCDR_CD_1` – `PRCDR_CD_6`, `LINE_PRCDR_CD` |
| Date columns | `SRVC_BGN_DT`, `ADMSN_DT` | `SRVC_BGN_DT`, `ADMSN_DT` |
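The diagnosis column ranges in the table can be turned into column-name lists so that downstream code does not hard-code either naming scheme. The following is a minimal sketch in plain Python, not part of medicaid-utils: `DIAG_COLUMNS` and `collect_diag_codes` are hypothetical names, and a claim row is represented as an ordinary dict for illustration.

```python
from typing import Dict, List, Optional

# Diagnosis column names per format, taken from the comparison table.
DIAG_COLUMNS: Dict[str, List[str]] = {
    "MAX": [f"DIAG_CD_{i}" for i in range(1, 10)],  # DIAG_CD_1 .. DIAG_CD_9
    "TAF": [f"DGNS_CD_{i}" for i in range(1, 13)],  # DGNS_CD_1 .. DGNS_CD_12
}


def collect_diag_codes(record: Dict[str, Optional[str]], cms_format: str) -> List[str]:
    """Return the non-empty diagnosis codes of one claim row, in column order.

    Hypothetical helper: `record` is a plain dict standing in for a claim row.
    """
    return [record[col] for col in DIAG_COLUMNS[cms_format] if record.get(col)]


max_claim = {"DIAG_CD_1": "4019", "DIAG_CD_2": "2724", "DIAG_CD_3": None}
taf_claim = {"DGNS_CD_1": "I10", "DGNS_CD_2": ""}

print(collect_diag_codes(max_claim, "MAX"))
print(collect_diag_codes(taf_claim, "TAF"))
```

Keeping the names in one lookup keyed by format means the same calling code works for both MAX and TAF rows.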
MAX exposes a single DataFrame through `.df`:

```python
from medicaid_utils.preprocessing import max_ip

ip = max_ip.MAXIP(year=2012, state="WY", data_root="/data/cms")
df = ip.df  # Single Dask DataFrame
```

TAF exposes multiple sub-file DataFrames through `.dct_files`:
```python
from medicaid_utils.preprocessing import taf_ip

ip = taf_ip.TAFIP(year=2019, state="AL", data_root="/data/cms")
df_base = ip.dct_files["base"]            # Header/base records
df_line = ip.dct_files["line"]            # Line-level detail
df_dx = ip.dct_files["base_diag_codes"]   # Diagnosis codes
df_ndc = ip.dct_files["line_ndc_codes"]   # NDC codes
```

Most functions accept a `cms_format` parameter:
```python
# MAX (after constructing LST_DIAG_CD from DIAG_CD_* columns)
score(ip.df, lst_diag_col_name="LST_DIAG_CD", cms_format="MAX")

# TAF (after calling ip.gather_bene_level_diag_ndc_codes())
score(ip.dct_files["base_diag_codes"], lst_diag_col_name="LST_DIAG_CD", cms_format="TAF")
```

The `extract_cohort` function handles format differences internally:
```python
# Just change cms_format; the rest of the API is the same
extract_cohort(state="WY", lst_year=[2012], cms_format="MAX", ...)
extract_cohort(state="AL", lst_year=[2019], cms_format="TAF", ...)
```

Each TAF claim type is split into sub-files:
| Suffix | Description | Dict Key |
|---|---|---|
| `h` (e.g., `iph`) | Header/base | `"base"` |
| `l` (e.g., `ipl`) | Line-level detail | `"line"` |
| `occr` (e.g., `ipoccr`) | Occurrence codes | `"occurrence_code"` |
| `dx` (e.g., `ipdx`) | Diagnosis codes | `"base_diag_codes"` |
| `ndc` (e.g., `ipndc`) | NDC codes | `"line_ndc_codes"` |
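The suffix-to-key mapping in the table can be written down as a small lookup. The sketch below only composes names the way the table's `ip` examples do. `TAF_SUBFILE_KEYS` and `subfile_names` are hypothetical names rather than medicaid-utils API, and whether every claim type actually ships every sub-file is not stated here, so treat the output as illustrative.

```python
from typing import Dict

# Sub-file suffix -> dct_files key, copied from the table.
TAF_SUBFILE_KEYS: Dict[str, str] = {
    "h": "base",
    "l": "line",
    "occr": "occurrence_code",
    "dx": "base_diag_codes",
    "ndc": "line_ndc_codes",
}


def subfile_names(claim_type: str) -> Dict[str, str]:
    """Map each dct_files key to a sub-file name, e.g. 'ip' + 'h' -> 'iph'.

    Hypothetical helper: pure string composition, no file access.
    """
    prefix = claim_type.lower()
    return {key: prefix + suffix for suffix, key in TAF_SUBFILE_KEYS.items()}


print(subfile_names("IP"))
```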
Which format to use depends on the study:

- ICD-9 studies (pre-October 2015): Use MAX data
- ICD-10 studies (post-October 2015): Use TAF data
- Cross-era studies: Use both, with ICD-9 and ICD-10 code mappings in your `dct_diag_proc_codes`
- Pharmacy studies: TAF preferred (medicaid-utils implements TAF RX preprocessing via `TAFRX`; MAX RX data exists in CMS but is not yet supported in the package)
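The ICD-9 and ICD-10 guidance above can be expressed as a date check against the US switch to ICD-10-CM on 1 October 2015. This is a rule-of-thumb sketch only: `formats_for_window` is a hypothetical helper, and it ignores the 2014–2015 overlap in which both formats exist, as well as any state-level differences in data availability.

```python
from datetime import date
from typing import List

# ICD-10-CM took effect in the US on 1 October 2015.
ICD10_START = date(2015, 10, 1)


def formats_for_window(start: date, end: date) -> List[str]:
    """Suggest which CMS format(s) to load for a study window.

    Hypothetical helper: MAX for the ICD-9 era, TAF for the ICD-10 era,
    and both when the window straddles the transition.
    """
    formats = []
    if start < ICD10_START:
        formats.append("MAX")
    if end >= ICD10_START:
        formats.append("TAF")
    return formats


print(formats_for_window(date(2012, 1, 1), date(2012, 12, 31)))
print(formats_for_window(date(2014, 1, 1), date(2017, 12, 31)))
print(formats_for_window(date(2019, 1, 1), date(2019, 12, 31)))
```

A window that returns both formats is a cross-era study and needs ICD-9 and ICD-10 mappings side by side.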
`BENE_MSIS` is a composite identifier constructed by medicaid-utils (not a raw CMS column). It applies to both MAX and TAF:

```
BENE_MSIS = STATE_CD + "-" + HAS_BENE + "-" + (BENE_ID or MSIS_ID)
```

- `BENE_ID`: CMS-assigned, intended to be unique across states and years
- `MSIS_ID`: State-assigned, unique only within a state and year
- `HAS_BENE`: 1 if `BENE_ID` exists, 0 otherwise (falls back to `MSIS_ID`)

Example: `"AL-1-123456789"` (Alabama, has `BENE_ID`, ID is 123456789)
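The composition rule can be sketched as a small function. This illustrates the formula rather than the package's own implementation: `build_bene_msis` is a hypothetical name, and treating `None` or an empty string as a missing `BENE_ID` is an assumption.

```python
from typing import Optional


def build_bene_msis(state_cd: str, bene_id: Optional[str], msis_id: str) -> str:
    """Compose STATE_CD-HAS_BENE-ID as described above.

    Assumption: a BENE_ID of None or "" counts as missing, in which case
    HAS_BENE is 0 and MSIS_ID is used instead.
    """
    has_bene = 1 if bene_id else 0
    person_id = bene_id if has_bene else msis_id
    return f"{state_cd}-{has_bene}-{person_id}"


print(build_bene_msis("AL", "123456789", "A0001"))
print(build_bene_msis("AL", None, "A0001"))
```

Embedding `HAS_BENE` in the key keeps a state-assigned `MSIS_ID` from colliding with a CMS-assigned `BENE_ID` that has the same digits.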
The index_col parameter on all claim classes accepts any of the three IDs: "BENE_MSIS", "BENE_ID", or "MSIS_ID". The default is "BENE_MSIS".
See Glossary for the complete column name mapping between MAX and TAF.
medicaid-utils | Documentation | PyPI | GitHub | MIT License | Research Computing Group, Biostatistics Laboratory, The University of Chicago