In [59]:
from pathlib import Path
import duckdb

# Assumes the notebook runs from a subdirectory (e.g. notebooks/) of the project root.
PROJECT_ROOT = Path.cwd().parent
MART_DIR = PROJECT_ROOT / 'data' / 'mart'

path_drug_year = MART_DIR / 'mart_drug_year_2026-02-01.parquet'
path_prescriber_drug_year = MART_DIR / 'mart_prescriber_drug_year_2026-02-01.parquet'
path_prescriber_year = MART_DIR / 'mart_prescriber_year_2026-02-01.parquet'

con = duckdb.connect()  # in-memory DuckDB database

# Register each Parquet mart as a view so it can be queried by name.
# "create or replace" keeps the cell safe to re-run.
for view_name, path in [
    ('mart_drug_year', path_drug_year),
    ('mart_prescriber_drug_year', path_prescriber_drug_year),
    ('mart_prescriber_year', path_prescriber_year),
]:
    con.execute(
        f"""
        create or replace view {view_name} as
        select *
        from read_parquet('{path}')
        """
    )


1. Mart: mart_drug_year

1.1. Business question
- Which drugs account for the largest share of Medicare Part D spending in 2023?

1.2. Why this matters
Knowing which drugs dominate spending helps to:
- focus price negotiations and rebate strategies on the most impactful drugs;
- make informed formulary policy decisions (which drugs to prioritize or restrict);
- assess how dependent the overall budget is on a relatively small set of drugs.

1.3. Data and grain
For this analysis, I use:
- grain: generic_name, year
- filter: year = 2023

Rationale for this choice:
- the data are already aggregated across all prescribers at the drug × year level;
- the mart has no prescriber dimension, which this question does not need;
- each drug's share of the overall budget is straightforward to compute.

1.4. Metrics
Core metrics:
- total_drug_cost - total spending on the drug in 2023;
- cost_share = total_drug_cost / SUM(total_drug_cost) OVER () - the drug's share of total Part D spending.

Optional extensions:
- cumulative share (e.g., for the top-N drugs).
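A minimal sketch of both metrics in plain Python, on a few hypothetical drug totals (the names and amounts are made up, not mart values). `cost_share` mirrors `total_drug_cost / SUM(total_drug_cost) OVER ()`, and the cumulative share is a running sum of shares after sorting drugs by cost in descending order.

```python
from __future__ import annotations

from itertools import accumulate

# Hypothetical 2023 totals per drug (USD); illustrative only, not mart data.
drug_costs = {
    "drug_a": 500.0,
    "drug_b": 300.0,
    "drug_c": 100.0,
    "drug_d": 60.0,
    "drug_e": 40.0,
}


def cost_shares(costs: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (drug, cost_share, cumulative_share), sorted by cost descending.

    cost_share = cost / sum(all costs), the same quantity as the SQL window
    total_drug_cost / SUM(total_drug_cost) OVER ().
    """
    total = sum(costs.values())
    ranked = sorted(costs.items(), key=lambda kv: kv[1], reverse=True)
    shares = [cost / total for _, cost in ranked]
    cumulative = list(accumulate(shares))  # running sum of shares
    return [(name, s, c) for (name, _), s, c in zip(ranked, shares, cumulative)]


for name, share, cum in cost_shares(drug_costs):
    print(f"{name}: share={share:.2%}, cumulative={cum:.2%}")
```

Inside DuckDB, the same cumulative share could be written as `sum(total_drug_cost) over (order by total_drug_cost desc rows between unbounded preceding and current row) / sum(total_drug_cost) over ()`; the top-N share is then the cumulative share at row N.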

In [75]:
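# Share of 2023 drug spending held by the top 10 and top 50 drugs, and by the remaining tail.
# Note: row_number() breaks ties in total_drug_cost arbitrarily.

con.execute(
    """
        with ranked as (
            select
                generic_name,
                total_drug_cost,
                row_number() over (order by total_drug_cost desc) as rn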
            from mart_drug_year
            where year = 2023
        ),
        agg as (
            select
                sum(total_drug_cost) as total_cost,
                sum(case when rn <= 10 then total_drug_cost else 0 end) as top_10_cost,
                sum(case when rn <= 50 then total_drug_cost else 0 end) as top_50_cost
            from ranked
        )
        select
            round(top_10_cost * 100.0 / total_cost, 2) as top_10_share,
            round(top_50_cost * 100.0 / total_cost, 2) as top_50_share,
            round((total_cost - top_50_cost) * 100.0 / total_cost, 2) as tail_share
        from agg;
    """
).df()

   top_10_share  top_50_share  tail_share
0         32.45         60.96       39.04


1.5. Results (Key figures)

Based on the mart_drug_year data mart for 2023:
- The top 10 drugs by spending account for approximately **32.45%** of the total budget.
- The top 50 drugs account for approximately **60.96%** of the total budget.
- All other drugs (the long tail) account for the remaining **39.04%** of total spending.

These results indicate a high concentration of spending in a relatively small set of drugs.

1.6. Insight

Medicare Part D drug spending is highly concentrated: a limited set of drugs
accounts for a substantial share of total expenditures.

This follows a classic long-tail pattern:
- the top 50 drugs generate the majority of spending (about 61%), and the top 10
  alone account for roughly one-third;
- the large number of remaining drugs in the long tail collectively contribute
  the other ~39% of the total budget.

From a business perspective, this concentration creates clear leverage points:
- a small set of drugs should be prioritized for pricing, rebate, and formulary
  negotiations;
- monitoring price and utilization trends for these drugs can deliver outsized
  impact on overall cost control;
- effective management of this limited group of high-impact drugs is likely to
  yield the greatest budgetary benefits.

2. Mart: mart_prescriber_drug_year

2.1. Business question
- How concentrated are drug expenditures across prescribers, and to what extent
  do a small number of prescribers drive total drug spending?

2.2. Why this matters
Understanding prescriber-level cost concentration is important for:
- identifying whether drug spending is driven by a limited number of prescribers
- detecting potential outliers or high-impact prescribing patterns
- supporting targeted cost-control, compliance, and policy interventions

If a small group of prescribers accounts for a disproportionate share of spending,
this creates clear leverage points for managing overall program costs.

2.3. Data and grain
This mart contains aggregated Medicare Part D data for 2023
at the **prescriber × drug × year** level.

Each row represents total annual metrics for a specific drug
prescribed by a specific prescriber.

Grain:
- npi (prescriber identifier)
- generic_name
- year

The grain is fixed: **one row = one prescriber × one drug × one year**.
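Since every metric in this section depends on that grain, it can be asserted before aggregating. A minimal sketch: count rows per key and report any key that occurs more than once. The key columns (`npi`, `generic_name`, `year`) follow the names used in the queries in this notebook; the rows below are hypothetical.

```python
from collections import Counter

# Hypothetical rows: (npi, generic_name, year, total_drug_cost).
rows = [
    ("1111111111", "drug_a", 2023, 1200.0),
    ("1111111111", "drug_b", 2023, 80.0),
    ("2222222222", "drug_a", 2023, 950.0),
    ("2222222222", "drug_a", 2023, 10.0),  # duplicate key: violates the grain
]


def grain_violations(records, key=lambda r: (r[0], r[1], r[2])):
    """Return {key: count} for every grain key that appears more than once."""
    counts = Counter(key(r) for r in records)
    return {k: n for k, n in counts.items() if n > 1}


print(grain_violations(rows))
```

Against the DuckDB view, the equivalent check is `select npi, generic_name, year, count(*) from mart_prescriber_drug_year group by npi, generic_name, year having count(*) > 1`, which should return no rows if the grain holds.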

2.4. Metrics
- total_drug_cost — total annual cost of prescriptions
- total_claims — total number of claims
- total_day_supply — total number of days of therapy supplied

All metrics are aggregated at the prescriber–drug–year level.

In [74]:
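# Ranks prescribers *within each drug*, then sums each drug's own top-10 / top-50
# prescribers across all drugs. The result is the pooled share of spending held by
# every drug's top prescribers, not by 10 or 50 individual prescribers overall.
con.execute(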
    """
        with ranked as (
            select
                generic_name,
                npi,
                total_drug_cost,
                row_number() over (
                    partition by generic_name
                    order by total_drug_cost desc
                ) as rn
            from mart_prescriber_drug_year
            where year = 2023
        ),

        agg as (
            select
                sum(total_drug_cost) as total_cost,
                sum(case when rn <= 10 then total_drug_cost else 0 end) as top_10_cost,
                sum(case when rn <= 50 then total_drug_cost else 0 end) as top_50_cost
            from ranked
        )

        select
            round(total_cost / 1e9, 2) as total_cost_billion,
            round(top_10_cost * 100.0 / total_cost, 2) as top_10_prescriber_pct,
            round(top_50_cost * 100.0 / total_cost, 2) as top_50_prescriber_pct
        from agg;
    """
).df()

   total_cost_billion  top_10_prescriber_pct  top_50_prescriber_pct
0              212.69                   3.28                    8.6


2.5. Results (Key Figures)
- Total prescriber–drug expenditures in 2023 amounted to **$212.69 billion**
- The top 10 prescribers of each drug, pooled across all drugs, account for **3.28%** of total drug spending
- The top 50 prescribers of each drug, pooled across all drugs, account for **8.6%** of total drug spending

2.6. Insight
At the system level, drug spending is broadly distributed across prescribers:
even when each drug's own top 10 prescribers are pooled across all drugs, they
account for only 3.28% of total spending.

Within individual drugs, however, spending may still be concentrated among a
relatively small number of prescribers. The pooled figure does not measure this
directly, so it has to be checked drug by drug; where it holds, prescriber-driven
cost concentration is primarily **drug-specific rather than system-wide**.

As a result, targeted interventions focused on key prescribers for specific
high-impact drugs are likely to be more effective than broad, system-wide
prescriber controls.
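The query in this section pools each drug's top prescribers into one number, so it cannot show how concentrated any single drug is. A minimal sketch of a per-drug view: for each drug, the share of that drug's spending held by its own top k prescribers. The rows are hypothetical and assume one row per prescriber per drug (the mart's grain); in SQL the same figure would come from grouping the ranked rows by `generic_name` instead of collapsing them into a single row.

```python
from collections import defaultdict

# Hypothetical (generic_name, npi, total_drug_cost) rows for a single year.
rows = [
    ("drug_a", "npi_1", 900.0),
    ("drug_a", "npi_2", 50.0),
    ("drug_a", "npi_3", 50.0),
    ("drug_b", "npi_1", 100.0),
    ("drug_b", "npi_2", 100.0),
    ("drug_b", "npi_3", 100.0),
    ("drug_b", "npi_4", 100.0),
]


def top_k_share_by_drug(records, k):
    """For each drug, return the share of its total cost held by its top-k prescribers."""
    costs_by_drug = defaultdict(list)
    for drug, _npi, cost in records:
        costs_by_drug[drug].append(cost)
    shares = {}
    for drug, costs in costs_by_drug.items():
        total = sum(costs)
        top = sum(sorted(costs, reverse=True)[:k])  # k largest prescriber costs
        shares[drug] = top / total if total else 0.0
    return shares


print(top_k_share_by_drug(rows, k=1))
```

Looking at the distribution of these per-drug shares (for example, how many high-cost drugs have a top-10 share above a chosen threshold) would show directly whether within-drug concentration is common.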

3. Mart: mart_prescriber_year

3.1. Business question
How concentrated are Medicare Part D expenditures across prescribers overall,
and to what extent do a small number of prescribers drive total program spending?

3.2. Why this matters
Prescriber-level concentration analysis helps:
- identify whether overall drug spending is driven by a limited number of prescribers
- detect high-impact prescribers contributing disproportionately to total costs
- support targeted oversight, compliance, and cost-management initiatives

If spending is highly concentrated, system-wide cost control measures may be less effective
than focused interventions aimed at a small group of prescribers.

3.3. Data and grain
This mart contains aggregated Medicare Part D data for 2023
at the **prescriber × year** level.

Each row represents total annual prescribing metrics
for a single prescriber across all drugs.

Grain:
- npi (prescriber identifier)
- year

The grain is fixed: **one row = one prescriber × one year**.

3.4. Metrics
- total_drug_cost — total annual cost of all prescriptions written by the prescriber
- total_claims — total number of claims across all drugs
- total_day_supply — total number of days of therapy supplied

All metrics are aggregated at the prescriber-year level.
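Because this mart should be a roll-up of the prescriber × drug × year mart, the two should agree on each prescriber's annual total (both queries report $212.69 billion in total for 2023). A minimal sketch of that reconciliation, assuming the roll-up is a plain sum of `total_drug_cost` over drugs; all rows are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical prescriber x drug x year rows: (npi, generic_name, year, total_drug_cost).
prescriber_drug_year = [
    ("npi_1", "drug_a", 2023, 900.0),
    ("npi_1", "drug_b", 2023, 100.0),
    ("npi_2", "drug_a", 2023, 50.0),
]
# Hypothetical prescriber x year rows: (npi, year, total_drug_cost).
prescriber_year = [
    ("npi_1", 2023, 1000.0),
    ("npi_2", 2023, 55.0),  # deliberately inconsistent for the example
]


def reconcile(detail, rollup, tol=0.01):
    """Return {(npi, year): (summed_detail, reported_rollup)} wherever they differ by more than tol.

    Keys missing on either side are treated as 0.0, so they are flagged too.
    """
    sums = defaultdict(float)
    for npi, _drug, year, cost in detail:
        sums[(npi, year)] += cost
    reported = {(npi, year): cost for npi, year, cost in rollup}
    mismatches = {}
    for key in sums.keys() | reported.keys():
        computed = sums.get(key, 0.0)
        total = reported.get(key, 0.0)
        if not math.isclose(computed, total, abs_tol=tol):
            mismatches[key] = (computed, total)
    return mismatches


print(reconcile(prescriber_drug_year, prescriber_year))
```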

In [79]:
con.execute(
    """
        with ranked as (
            select
                npi,
                total_drug_cost,
                row_number() over (
                    order by total_drug_cost desc
                ) as rn
            from mart_prescriber_year
            where year = 2023
        ),

        agg as (
            select
                sum(total_drug_cost) as total_cost,
                sum(case when rn <= 10 then total_drug_cost else 0 end) as top_10_cost,
                sum(case when rn <= 50 then total_drug_cost else 0 end) as top_50_cost
            from ranked
        )

        select
            round(total_cost / 1e9, 2) as total_cost_billion,
            round(top_10_cost * 100.0 / total_cost, 2) as top_10_prescriber_pct,
            round(top_50_cost * 100.0 / total_cost, 2) as top_50_prescriber_pct
        from agg;
    """
).df()


   total_cost_billion  top_10_prescriber_pct  top_50_prescriber_pct
0              212.69                   0.39                    1.0


3.5. Results (Key Figures)
- Total Medicare Part D spending across all prescribers in 2023 amounted to **$212.69 billion**
- The top 10 prescribers account for **0.39%** of total spending
- The top 50 prescribers account for **1.0%** of total spending

3.6. Insight
At the system-wide level, Medicare Part D spending is broadly distributed across
prescribers, with no evidence of strong prescriber-level concentration.

Even the highest-spending prescribers account for only a very small share of total
program expenditures, indicating that overall spending is not driven by a limited
set of individual prescribers.

This suggests that broad, system-wide prescriber controls are unlikely to be
effective on their own, and that cost management efforts should instead focus on
drug-level drivers and utilization patterns.
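A fixed top-10 or top-50 cut presumably covers a tiny fraction of all prescribers in this mart, so its share of spending is small almost by construction. A percentile-based cut, which is not part of the queries above, compares groups of known relative size: the share of spending held by the top p% of prescribers. A minimal sketch on hypothetical values:

```python
import math


def top_percent_share(costs, pct):
    """Share of total cost held by the top `pct` percent of entities (always at least one)."""
    if not costs:
        return 0.0
    ranked = sorted(costs, reverse=True)
    n_top = max(1, math.ceil(len(ranked) * pct / 100))  # size of the top group
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0


# Hypothetical per-prescriber annual costs: one very large prescriber and 99 small ones.
costs = [10_000.0] + [100.0] * 99

for pct in (1, 10):
    print(f"top {pct}% of prescribers hold {top_percent_share(costs, pct):.1%} of spending")
```

In DuckDB, `ntile(100) over (order by total_drug_cost desc)` assigns each prescriber to a percentile bucket; the spending share of bucket 1 is then the top-1% figure.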

**Cross-Mart Summary: Where Are the Real Cost Drivers?**

Across all three marts, different concentration patterns emerge:

- At the drug level, spending is highly concentrated among a small number of drugs.
- Within individual drugs, expenditures may be driven by a limited number of prescribers (measured above only in pooled form).
- At the system-wide prescriber level, concentration is relatively low.

This indicates that Medicare Part D cost drivers are primarily **drug-specific**
rather than **prescriber-wide**.

As a result, cost management strategies are likely to be more effective when
focused on high-cost drugs and their key prescribers, rather than on broad,
system-wide prescriber interventions.

4. Deep dive: top high-cost drugs

4.1. Business question
What differentiates the highest-cost drugs from the rest of the market,
and are high total costs driven primarily by utilization volume
or by higher cost per claim?

4.2. Why this matters
Understanding whether high drug spending is driven by utilization
or by cost per claim is critical for effective cost-control strategies.

If costs are driven mainly by high utilization, volume management and access controls
may be effective. If costs are driven by high cost per claim, pricing,
rebate negotiations, and formulary decisions become more important.

4.3. Data and grain
This analysis uses the 'mart_drug_year' mart at the **drug × year** grain.
Each row represents aggregated annual metrics for a single drug.

Grain:
- generic_name
- year

4.4. Metrics
- total_drug_cost - total annual drug spending
- total_claim_count - total number of claims
- cost_per_claim - average cost per claim, derived as total_drug_cost / total_claim_count
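Because cost per claim is total_drug_cost / total_claim_count, total cost factors exactly as claims × cost per claim, and the cost gap between two drugs splits additively on a log scale: ln(C_a / C_b) = ln(N_a / N_b) + ln(p_a / p_b), where N is the claim count and p the cost per claim. A minimal sketch of this volume-versus-price decomposition, with hypothetical numbers:

```python
import math


def decompose_cost_gap(cost_a, claims_a, cost_b, claims_b):
    """Split ln(cost_a / cost_b) into a volume term and a cost-per-claim term.

    Uses total_cost = claims * cost_per_claim, so
    ln(C_a / C_b) = ln(N_a / N_b) + ln(p_a / p_b) holds exactly.
    """
    p_a = cost_a / claims_a  # cost per claim, drug a
    p_b = cost_b / claims_b  # cost per claim, drug b
    volume_term = math.log(claims_a / claims_b)
    price_term = math.log(p_a / p_b)
    total_gap = math.log(cost_a / cost_b)
    return total_gap, volume_term, price_term


# Hypothetical: drug a has 100x the claims of drug b but half its cost per claim.
gap, vol, price = decompose_cost_gap(
    cost_a=5_000_000.0, claims_a=10_000, cost_b=100_000.0, claims_b=100
)
print(f"total={gap:.3f}, volume={vol:.3f}, cost_per_claim={price:.3f}")
```

A volume term larger than the total gap, paired with a negative cost-per-claim term, means the higher spending comes entirely from utilization; the conclusion depends on which comparison drug or baseline is chosen.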

In [102]:
# Top 5 drugs by 2023 spending: combined cost and share of the total, plus how their
# cost per claim compares with the market baseline defined in the `baseline` CTE.
con.execute(
    """
        with total as (
            select
                sum(total_drug_cost) as total_cost
            from mart_drug_year
            where year = 2023
        ),
        -- Baseline: unweighted mean of per-drug cost per claim (each drug counts equally).
        baseline as (
            select
                avg(total_drug_cost / nullif(total_claim_count, 0)) as avg_cost_per_claim
            from mart_drug_year
            where year = 2023
        ),
        top5_raw as (
            select
                generic_name,
                total_drug_cost,
                total_claim_count
            from mart_drug_year
            where year = 2023
            order by total_drug_cost desc
            limit 5
        ),
        top5_enriched as (
            select
                t.generic_name,
                t.total_drug_cost,
                t.total_claim_count,
                t.total_drug_cost / nullif(t.total_claim_count, 0) as cost_per_claim,
                b.avg_cost_per_claim,
                -- Expressed in percent: 100 means the same cost per claim as the baseline.
                t.total_drug_cost * 100.0 / nullif(t.total_claim_count, 0) / b.avg_cost_per_claim as cost_vs_market_ratio
            from top5_raw t
            cross join baseline b
        )
        select
            round(sum(total_drug_cost) / 1e9, 2) as top5_cost_billion,
            round(sum(total_drug_cost) * 100.0 / any_value(total_cost), 2) as top5_share_pct,
            round(min(cost_per_claim), 2) as min_cost_per_claim,
            round(max(cost_per_claim), 2) as max_cost_per_claim,
            round(min(cost_vs_market_ratio), 2) as min_cost_vs_market,
            round(max(cost_vs_market_ratio), 2) as max_cost_vs_market
        from top5_enriched
        cross join total;
    """
).df()

   top5_cost_billion  top5_share_pct  min_cost_per_claim  max_cost_per_claim  min_cost_vs_market  max_cost_vs_market
0              47.27           22.23              862.56             1374.05               16.64                26.5


4.5. Results (Key Figures)
- The top 5 drugs account for **$47.27 billion** in total Medicare Part D spending in 2023.
- These drugs represent **22.23%** of total drug spending.
- Cost per claim for the top 5 drugs ranges from **$862.56** to **$1,374.05**.
- Measured against the market baseline (the unweighted average of per-drug cost per claim), these drugs' cost per claim is only **16.64% to 26.5%** of that average.

Overall, the top drugs combine a large share of total spending with a cost per claim
well below the unweighted average across all drugs.
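The `baseline` CTE in the query above averages per-drug cost per claim with equal weight per drug, so a specialty drug with a few hundred claims counts as much as a drug with millions. A claim-weighted average (total cost divided by total claims) describes the typical claim instead, and the two can diverge sharply. A minimal sketch with hypothetical drugs:

```python
# Hypothetical drugs: name -> (total_drug_cost, total_claim_count).
drugs = {
    "high_volume_brand": (50_000_000.0, 50_000),  # $1,000 per claim
    "cheap_generic": (2_000_000.0, 200_000),      # $10 per claim
    "specialty_a": (6_000_000.0, 200),            # $30,000 per claim
    "specialty_b": (4_000_000.0, 100),            # $40,000 per claim
}


def unweighted_avg_cost_per_claim(d):
    """Mean of per-drug cost per claim, each drug weighted equally (like avg() in `baseline`)."""
    ratios = [cost / claims for cost, claims in d.values() if claims]  # skip zero claims, like nullif
    return sum(ratios) / len(ratios)


def claim_weighted_avg_cost_per_claim(d):
    """Total cost divided by total claims: the cost of the average claim."""
    total_cost = sum(cost for cost, _ in d.values())
    total_claims = sum(claims for _, claims in d.values())
    return total_cost / total_claims


print(f"unweighted:     {unweighted_avg_cost_per_claim(drugs):,.2f}")
print(f"claim-weighted: {claim_weighted_avg_cost_per_claim(drugs):,.2f}")
```

In this toy case the high-volume brand sits far below the unweighted average yet well above the claim-weighted one. In the notebook's query, the claim-weighted variant would replace the `avg(...)` in `baseline` with `sum(total_drug_cost) / nullif(sum(total_claim_count), 0)`; which baseline fits depends on whether the question is about the typical drug or the typical claim.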

4.6. Insight
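Despite accounting for more than one-fifth of total Medicare Part D spending,
the top 5 drugs are not high-cost on a per-claim basis relative to the average drug.

Their cost per claim is substantially below the unweighted per-drug average,
indicating that high total spending is driven primarily by very high utilization
rather than elevated prices per prescription. Because that average gives every
drug equal weight regardless of claim volume, comparing against a claim-weighted
baseline (total cost / total claims) would be a useful check on this conclusion.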

This suggests that cost containment efforts for these drugs may be more effective
if focused on utilization management rather than pricing or rebate-based interventions.

**Executive Summary**

This project analyzes Medicare Part D spending patterns for 2023 using a
multi-level data mart architecture.

The analysis shows that overall spending is highly concentrated at the drug level,
while prescriber-level concentration is relatively low at the system level.
However, within individual drugs, a small number of prescribers may drive a
disproportionate share of costs.

In addition, a deep dive into the highest-spending drugs indicates that their
elevated total costs are primarily driven by very high utilization rather than
by high cost per claim relative to the average drug.

Together, these findings suggest that Medicare Part D cost drivers are largely
drug-specific, and that targeted, drug-focused and utilization-based
interventions are likely to be more effective than broad, system-wide
prescriber controls.