MAX vs TAF
Manu Murugesan edited this page Mar 13, 2026 · 6 revisions
medicaid-utils supports both Medicaid file formats published by CMS. Understanding their differences is essential for working with Medicaid claims data.
| Feature | MAX (Medicaid Analytic eXtract) | TAF (T-MSIS Analytic Files) |
|---|---|---|
| Years available | 1999–2015 | 2014–present |
| Diagnosis coding | Primarily ICD-9-CM | Primarily ICD-10-CM |
| File structure | Single flat file per claim type | Multiple sub-files per claim type |
| Beneficiary ID | `MSIS_ID` | `BENE_MSIS` (or `MSIS_ID`) |
| Claim types | IP, OT, PS, CC | IP, OT, LT, RX, DE (person summary) |
| Diagnosis columns | `DIAG_CD_1` – `DIAG_CD_9` | `DGNS_CD_1` – `DGNS_CD_12` |
| Procedure columns | `PRCDR_CD_1` – `PRCDR_CD_6` | `PRCDR_CD_1` – `PRCDR_CD_6`, `LINE_PRCDR_CD` |
| Date columns | `SRVC_BGN_DT`, `ADMSN_DT` | `SRVC_BGN_DT`, `ADMSN_DT` |
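The diagnosis column differences in the table can be captured in a small lookup. The following is a minimal sketch, not part of medicaid-utils: the helper name `diag_columns` and the `DIAG_COLUMN_SPEC` dictionary are hypothetical, and the prefixes and counts are taken directly from the table.

```python
# Hypothetical helper (not part of medicaid-utils): builds the diagnosis
# column names listed in the comparison table for each format.
DIAG_COLUMN_SPEC = {
    "MAX": ("DIAG_CD_", 9),   # DIAG_CD_1 to DIAG_CD_9
    "TAF": ("DGNS_CD_", 12),  # DGNS_CD_1 to DGNS_CD_12
}


def diag_columns(cms_format: str) -> list[str]:
    """Return the diagnosis column names for 'MAX' or 'TAF'."""
    try:
        prefix, count = DIAG_COLUMN_SPEC[cms_format.upper()]
    except KeyError:
        raise ValueError(f"Unknown cms_format: {cms_format!r}") from None
    return [f"{prefix}{i}" for i in range(1, count + 1)]


print(diag_columns("MAX")[:2])   # ['DIAG_CD_1', 'DIAG_CD_2']
print(len(diag_columns("TAF")))  # 12
```

A list like this can be used to select only the diagnosis columns that exist in a given DataFrame before applying code that is written once for both formats.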
MAX: a single DataFrame, accessible via `.df`:

```python
from medicaid_utils.preprocessing import max_ip

ip = max_ip.MAXIP(year=2012, state="WY", data_root="/data/cms")
df = ip.df  # Single Dask DataFrame
```

TAF: multiple sub-file DataFrames, stored in `.dct_files`:

```python
from medicaid_utils.preprocessing import taf_ip

ip = taf_ip.TAFIP(year=2019, state="AL", data_root="/data/cms")
df_base = ip.dct_files["base"]  # Header/base records
df_line = ip.dct_files["line"]  # Line-level detail
df_dx = ip.dct_files["dx"]      # Diagnosis codes
df_ndc = ip.dct_files["ndc"]    # NDC codes
```

Most functions accept a `cms_format` parameter:

```python
# MAX
score(ip.df, lst_diag_col_name="LST_DIAG_CD", cms_format="MAX")

# TAF
score(ip.dct_files["base"], lst_diag_col_name="LST_DIAG_CD", cms_format="TAF")
```

The `extract_cohort` function handles format differences internally:

```python
# Only cms_format changes; the rest of the API is the same
extract_cohort(state="WY", lst_year=[2012], cms_format="MAX", ...)
extract_cohort(state="AL", lst_year=[2019], cms_format="TAF", ...)
```

Each TAF claim type is split into sub-files:
| Suffix | Description | Dict Key |
|---|---|---|
| `h` (e.g., `iph`) | Header/base | `"base"` |
| `l` (e.g., `ipl`) | Line-level detail | `"line"` |
| `occr` (e.g., `ipoccr`) | Occurrence codes | `"occr"` |
| `dx` (e.g., `ipdx`) | Diagnosis codes | `"dx"` |
| `ndc` (e.g., `ipndc`) | NDC codes | `"ndc"` |
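The suffix-to-key relationship in the table can be written as a plain dictionary. This is an illustrative sketch only: `TAF_SUBFILE_KEYS` and `subfile_key` are hypothetical names, not part of medicaid-utils, and the file-stem convention (claim type followed by suffix, as in `ipdx`) is assumed from the examples in the table.

```python
# Suffix-to-key mapping copied from the sub-file table.
# Hypothetical names; medicaid-utils may organize this differently.
TAF_SUBFILE_KEYS = {
    "h": "base",
    "l": "line",
    "occr": "occr",
    "dx": "dx",
    "ndc": "ndc",
}


def subfile_key(file_stem: str, claim_type: str) -> str:
    """Map a TAF file stem such as 'ipdx' to its dict key ('dx')."""
    stem, prefix = file_stem.lower(), claim_type.lower()
    if not stem.startswith(prefix):
        raise ValueError(f"{file_stem!r} does not start with {claim_type!r}")
    suffix = stem[len(prefix):]
    if suffix not in TAF_SUBFILE_KEYS:
        raise ValueError(f"Unknown TAF sub-file suffix: {suffix!r}")
    return TAF_SUBFILE_KEYS[suffix]


print(subfile_key("iph", "ip"))     # base
print(subfile_key("ipoccr", "ip"))  # occr
```

Only the header and line suffixes are renamed (`h` to `"base"`, `l` to `"line"`); the remaining suffixes are used unchanged as dict keys.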
- ICD-9 studies (service dates before October 2015): Use MAX data
- ICD-10 studies (service dates from October 2015 onward): Use TAF data
- Cross-era studies: Use both, with ICD-9 and ICD-10 code mappings in your `dct_diag_proc_codes`
- Pharmacy studies: TAF only (MAX does not have a dedicated RX file type)
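The selection rules in this list can be sketched as two small helpers. Both function names are hypothetical and not part of medicaid-utils. The cutoff is the October 1, 2015 U.S. transition from ICD-9-CM to ICD-10-CM, and the year ranges are the ones given in the comparison table; availability for a particular state may differ.

```python
from datetime import date

# U.S. transition date from ICD-9-CM to ICD-10-CM
ICD10_START = date(2015, 10, 1)


def icd_version(service_date: date) -> int:
    """Return 9 or 10 depending on which code set applies to the date."""
    return 10 if service_date >= ICD10_START else 9


def formats_for_year(year: int) -> list[str]:
    """List the formats covering a year (MAX: 1999-2015, TAF: 2014 onward)."""
    formats = []
    if 1999 <= year <= 2015:
        formats.append("MAX")
    if year >= 2014:
        formats.append("TAF")
    return formats


print(icd_version(date(2015, 9, 30)))  # 9
print(icd_version(date(2015, 10, 1)))  # 10
print(formats_for_year(2014))          # ['MAX', 'TAF']
print(formats_for_year(2019))          # ['TAF']
```

For the overlap years 2014 and 2015, both formats are listed, which is where a cross-era study needs both ICD-9 and ICD-10 code mappings.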
See Glossary for the complete column name mapping between MAX and TAF.
medicaid-utils | Documentation | PyPI | GitHub | MIT License | Research Computing Group, Biostatistics Laboratory, The University of Chicago