
MAX vs TAF

Manu Murugesan edited this page Mar 13, 2026 · 6 revisions

MAX vs TAF: Understanding the Two CMS File Formats

medicaid-utils supports both Medicaid file formats published by CMS. Understanding their differences is essential for working with Medicaid claims data.

Overview

| Feature | MAX (Medicaid Analytic eXtract) | TAF (T-MSIS Analytic Files) |
|---|---|---|
| Years available | 1999–2015 | 2014–present |
| Diagnosis coding | Primarily ICD-9-CM | Primarily ICD-10-CM |
| File structure | Single flat file per claim type | Multiple sub-files per claim type |
| Beneficiary ID | MSIS_ID | BENE_MSIS (or MSIS_ID) |
| Claim types | IP, OT, PS, CC | IP, OT, LT, RX, DE (person summary) |
| Diagnosis columns | DIAG_CD_1 to DIAG_CD_9 | DGNS_CD_1 to DGNS_CD_12 |
| Procedure columns | PRCDR_CD_1 to PRCDR_CD_6 | PRCDR_CD_1 to PRCDR_CD_6, LINE_PRCDR_CD |
| Date columns | SRVC_BGN_DT, ADMSN_DT | SRVC_BGN_DT, ADMSN_DT |
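For cross-era work it can help to rename MAX columns to their TAF counterparts before combining data. The sketch below builds such a mapping from the column ranges in the overview table. It assumes diagnosis columns correspond by position (DIAG_CD_1 to DGNS_CD_1, and so on), and `max_to_taf_columns` is a hypothetical helper, not part of medicaid-utils.

```python
def max_to_taf_columns() -> dict[str, str]:
    """Map MAX column names to TAF names (hypothetical helper based on the overview table)."""
    mapping = {}
    # Assumption: MAX DIAG_CD_1..DIAG_CD_9 line up positionally with the first nine
    # TAF DGNS_CD_ columns; TAF DGNS_CD_10..DGNS_CD_12 have no MAX counterpart.
    for i in range(1, 10):
        mapping[f"DIAG_CD_{i}"] = f"DGNS_CD_{i}"
    # Procedure and date columns share names across formats, so they map to themselves.
    for i in range(1, 7):
        mapping[f"PRCDR_CD_{i}"] = f"PRCDR_CD_{i}"
    for col in ("SRVC_BGN_DT", "ADMSN_DT"):
        mapping[col] = col
    return mapping

mapping = max_to_taf_columns()
print(mapping["DIAG_CD_3"])  # DGNS_CD_3
```

A mapping like this could then be passed to `rename(columns=...)` on a MAX DataFrame, which both pandas and Dask DataFrames support.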

Key Differences in Code

Accessing DataFrames

MAX: a single DataFrame, accessible via .df:

```python
from medicaid_utils.preprocessing import max_ip

ip = max_ip.MAXIP(year=2012, state="WY", data_root="/data/cms")
df = ip.df  # Single Dask DataFrame
```

TAF: multiple sub-file DataFrames, stored in .dct_files:

```python
from medicaid_utils.preprocessing import taf_ip

ip = taf_ip.TAFIP(year=2019, state="AL", data_root="/data/cms")
df_base = ip.dct_files["base"]   # Header/base records
df_line = ip.dct_files["line"]   # Line-level detail
df_dx   = ip.dct_files["dx"]     # Diagnosis codes
df_ndc  = ip.dct_files["ndc"]    # NDC codes
```
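Code that needs to work with either format has to pick the right DataFrame. A minimal sketch, relying only on the `.df` and `.dct_files["base"]` attributes shown above; `claims_frame` and the two stand-in classes are hypothetical, and plain strings stand in for Dask DataFrames so the example runs without CMS data.

```python
class FakeMAXClaim:
    """Hypothetical stand-in for a MAX object: one DataFrame in .df."""
    def __init__(self, df):
        self.df = df

class FakeTAFClaim:
    """Hypothetical stand-in for a TAF object: sub-file DataFrames in .dct_files."""
    def __init__(self, dct_files):
        self.dct_files = dct_files

def claims_frame(claim, cms_format: str):
    """Return the claim-level DataFrame for either format (hypothetical helper)."""
    if cms_format == "MAX":
        return claim.df
    if cms_format == "TAF":
        # TAF header/base records are the claim-level sub-file.
        return claim.dct_files["base"]
    raise ValueError(f"cms_format must be 'MAX' or 'TAF', got {cms_format!r}")

max_claim = FakeMAXClaim(df="max-frame")
taf_claim = FakeTAFClaim(dct_files={"base": "taf-base", "line": "taf-line"})
print(claims_frame(max_claim, "MAX"))  # max-frame
print(claims_frame(taf_claim, "TAF"))  # taf-base
```

Raising on an unrecognized format string keeps a typo such as `"max"` from silently falling through to the wrong branch.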

Specifying Format

Most medicaid-utils functions accept a cms_format parameter, set to "MAX" or "TAF":

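```python
# `score` is an example of a function that takes cms_format (its import is not shown);
# `ip` is the MAXIP or TAFIP object created above.

# MAX
score(ip.df, lst_diag_col_name="LST_DIAG_CD", cms_format="MAX")

# TAF
score(ip.dct_files["base"], lst_diag_col_name="LST_DIAG_CD", cms_format="TAF")
```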
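Judging by its name, `LST_DIAG_CD` holds each claim's diagnosis codes as a list. The sketch below shows one way such a list could be assembled from the wide diagnosis columns of each format, using the column ranges from the overview table. It assumes blank or missing codes should be dropped, and `build_diag_list` is a hypothetical helper, not the medicaid-utils implementation.

```python
def build_diag_list(row: dict, cms_format: str) -> list[str]:
    """Collect non-blank diagnosis codes from one claim row (hypothetical helper)."""
    # Column prefix and count follow the overview table.
    if cms_format == "MAX":
        cols = [f"DIAG_CD_{i}" for i in range(1, 10)]
    elif cms_format == "TAF":
        cols = [f"DGNS_CD_{i}" for i in range(1, 13)]
    else:
        raise ValueError(f"unknown cms_format: {cms_format!r}")
    codes = []
    for col in cols:
        value = row.get(col)
        if value is None:
            continue  # column absent or missing on this claim
        code = str(value).strip()
        if code:  # assumption: blank strings mean "no code"
            codes.append(code)
    return codes

max_row = {"DIAG_CD_1": "25000", "DIAG_CD_2": " ", "DIAG_CD_3": "4019"}
taf_row = {"DGNS_CD_1": "E119", "DGNS_CD_12": "I10"}
print(build_diag_list(max_row, "MAX"))  # ['25000', '4019']
print(build_diag_list(taf_row, "TAF"))  # ['E119', 'I10']
```

Once both formats produce the same list column, downstream code only needs the column name, which is why a function like `score` can take `lst_diag_col_name` plus `cms_format` rather than a list of wide columns.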

Cohort Extraction

The extract_cohort function handles format differences internally:

```python
# Just change cms_format; the rest of the API is the same
extract_cohort(state="WY", lst_year=[2012], cms_format="MAX", ...)
extract_cohort(state="AL", lst_year=[2019], cms_format="TAF", ...)
```
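Because the two formats cover overlapping years (MAX 1999–2015, TAF 2014–present, per the overview table), a multi-year study may need to split its year list by format before calling `extract_cohort` once per format. A minimal sketch; `split_years_by_format` is hypothetical, and it assumes the overlap years 2014–2015 go to MAX unless `prefer_taf_in_overlap` is set.

```python
MAX_YEARS = range(1999, 2016)  # MAX: 1999-2015 inclusive, per the overview table
TAF_FIRST_YEAR = 2014          # TAF: 2014-present

def split_years_by_format(years, prefer_taf_in_overlap=False):
    """Group study years by CMS format (hypothetical helper)."""
    groups = {"MAX": [], "TAF": []}
    for year in sorted(years):
        in_max = year in MAX_YEARS
        in_taf = year >= TAF_FIRST_YEAR
        if in_max and in_taf:
            # Assumption: overlap years default to MAX unless told otherwise.
            groups["TAF" if prefer_taf_in_overlap else "MAX"].append(year)
        elif in_max:
            groups["MAX"].append(year)
        elif in_taf:
            groups["TAF"].append(year)
        else:
            raise ValueError(f"no MAX or TAF data for {year}")
    return groups

groups = split_years_by_format([2012, 2014, 2015, 2019])
print(groups)  # {'MAX': [2012, 2014, 2015], 'TAF': [2019]}
```

Each non-empty group could then be passed as `lst_year` alongside the matching `cms_format`.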

TAF Sub-File Types

Each TAF claim type is split into sub-files:

| Suffix | Description | Dict key |
|---|---|---|
| h (e.g., iph) | Header/base | "base" |
| l (e.g., ipl) | Line-level detail | "line" |
| occr (e.g., ipoccr) | Occurrence codes | "occr" |
| dx (e.g., ipdx) | Diagnosis codes | "dx" |
| ndc (e.g., ipndc) | NDC codes | "ndc" |
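The sub-file names follow a simple pattern: the claim-type prefix plus a suffix. The sketch below derives them for a given claim type using only the suffix and key pairs in the table above. `taf_subfile_names` is a hypothetical helper; actual file names on disk may carry extra parts (year, state, extension) not shown here, and not every claim type necessarily has every sub-file.

```python
# Suffix for each dct_files key, taken from the TAF sub-file table.
SUBFILE_SUFFIXES = {
    "base": "h",
    "line": "l",
    "occr": "occr",
    "dx": "dx",
    "ndc": "ndc",
}

def taf_subfile_names(claim_type: str) -> dict[str, str]:
    """Map each dct_files key to its sub-file stem, e.g. 'base' -> 'iph' (hypothetical)."""
    prefix = claim_type.lower()
    return {key: prefix + suffix for key, suffix in SUBFILE_SUFFIXES.items()}

print(taf_subfile_names("IP"))
# {'base': 'iph', 'line': 'ipl', 'occr': 'ipoccr', 'dx': 'ipdx', 'ndc': 'ipndc'}
```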

Which Format Should I Use?

  • ICD-9 studies (services before October 1, 2015): Use MAX data
  • ICD-10 studies (services on or after October 1, 2015): Use TAF data
  • Cross-era studies: Use both, with ICD-9 and ICD-10 code mappings in your dct_diag_proc_codes
  • Pharmacy studies: Use TAF data (RX is a supported claim type for TAF but not for MAX; see the overview table)
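For cross-era studies, each condition needs both an ICD-9 and an ICD-10 code list, applied according to the claim's coding era. The sketch below illustrates that idea with prefix matching on the service date. The nested dictionary layout (condition, then ICD version, then code prefixes) is an assumption for illustration and may not match the structure medicaid-utils expects for `dct_diag_proc_codes`; `has_condition` is hypothetical, and the diabetes prefixes (ICD-9-CM 250, ICD-10-CM E10/E11) are examples only.

```python
from datetime import date

ICD10_START = date(2015, 10, 1)  # US ICD-10-CM transition date

# Hypothetical layout: condition -> ICD version -> list of code prefixes.
dct_diag_proc_codes = {
    "diabetes": {9: ["250"], 10: ["E10", "E11"]},
}

def has_condition(diag_codes, service_date, condition, codes=dct_diag_proc_codes):
    """Return True if any diagnosis code matches the condition for the claim's ICD era."""
    version = 10 if service_date >= ICD10_START else 9
    prefixes = tuple(codes[condition][version])
    # str.startswith accepts a tuple and matches any of its prefixes.
    return any(code.startswith(prefixes) for code in diag_codes)

print(has_condition(["25000"], date(2012, 5, 1), "diabetes"))  # True
print(has_condition(["E119"], date(2019, 3, 2), "diabetes"))   # True
print(has_condition(["E119"], date(2012, 5, 1), "diabetes"))   # False
```

Choosing the code list by service date rather than by file format matters near the transition, since a claim's coding era is set by when the service occurred.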

Column Name Reference

See Glossary for the complete column name mapping between MAX and TAF.
