# Preflight checks before Power BI export
from pyspark.sql import functions as F

def show(q):
    print("\n" + "-"*88 + "\n" + q.strip() + "\n")
    display(spark.sql(q))

# 1) Coverage & schema sanity (Bronze)
show("DESCRIBE TABLE fx_impact.bronze_ecb_fx_rates")
show("""
SELECT MIN(date) AS min_d, MAX(date) AS max_d, COUNT(*) AS rows
FROM fx_impact.bronze_ecb_fx_rates
""")
show("""
SELECT MIN(period_date) AS min_m, MAX(period_date) AS max_m, COUNT(*) AS rows
FROM fx_impact.bronze_comtrade_imports
""")

# 2) Silver integrity
show("""
SELECT month, cmdCode, COUNT(*) AS c
FROM fx_impact.silver_monthly_fact
GROUP BY month, cmdCode
HAVING COUNT(*) > 1
""")
show("""
SELECT SUM(CASE WHEN fx_missing_flag=1 THEN 1 ELSE 0 END) AS missing_fx_rows,
       COUNT(*) AS total_rows
FROM fx_impact.silver_monthly_fact
""")

# 3) Gold reconciliation
show("""
WITH m AS (
  SELECT month, SUM(import_eur) AS sum_eur
  FROM fx_impact.gold_monthly_metrics
  GROUP BY month
)
SELECT t.month, t.total_import_eur, m.sum_eur,
       (t.total_import_eur - m.sum_eur) AS diff
FROM fx_impact.gold_monthly_totals t
JOIN m USING (month)
WHERE ABS(t.total_import_eur - m.sum_eur) > 1e-6
ORDER BY month
""")
show("""
SELECT month, SUM(share_of_total_eur) AS sum_share
FROM fx_impact.gold_monthly_metrics
GROUP BY month
HAVING ABS(SUM(share_of_total_eur) - 1.0) > 1e-6
""")

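# 4) Last complete month vs latest data
# Note: last_day() returns a month-end date, while gold `month` values look like
# month-start dates (e.g. 2025-05-01), so compare the two by month, not by exact date.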
show("SELECT last_day(add_months(current_date(), -1)) AS last_complete_month")
show("SELECT MAX(month) AS latest_gold_month FROM fx_impact.gold_monthly_totals")

# 5) COVID tables sanity (optional)
show("SELECT * FROM fx_impact.gold_period_summary")
show("SELECT COUNT(*) AS rows_by_cmd FROM fx_impact.gold_covid_period_kpis_by_cmd")


## How to read results
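- The duplicates query (Silver, grouped by `month` and `cmdCode`) should return no rows.
- `missing_fx_rows` should be 0, or very close to it.
- The reconciliation and share checks should return no rows.
- Compare `last_complete_month` with `latest_gold_month` to decide whether the export notebook should be limited to complete months.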
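
The pass criteria above can be collapsed into a single go/no-go check. The following is a minimal sketch in plain Python, not part of the original notebook: the function name `preflight_failures` and the `max_missing_fx` tolerance are assumptions made for illustration, and each argument is simply a count read off the matching query's result.

```python
def preflight_failures(duplicate_keys, missing_fx_rows,
                       reconciliation_mismatches, share_mismatches,
                       max_missing_fx=0):
    """Return human-readable failures; an empty list means every check passed.

    Each argument is a count taken from the matching preflight query.
    max_missing_fx is an assumed tolerance, not a value from the notebook.
    """
    failures = []
    if duplicate_keys > 0:
        failures.append(f"silver: {duplicate_keys} duplicated (month, cmdCode) keys")
    if missing_fx_rows > max_missing_fx:
        failures.append(f"silver: {missing_fx_rows} rows with fx_missing_flag = 1")
    if reconciliation_mismatches > 0:
        failures.append(f"gold: {reconciliation_mismatches} months where totals do not reconcile")
    if share_mismatches > 0:
        failures.append(f"gold: {share_mismatches} months where shares do not sum to 1")
    return failures


# Example: a clean run produces no failures.
print(preflight_failures(0, 0, 0, 0))  # []
```

In the notebook itself, the counts for the three "should return no rows" checks could come from `spark.sql(query).count()`, and `missing_fx_rows` from the single row returned by the Silver summary query.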

All green. Here’s the read:

* **ECB Bronze**: 2018-01-02 → 2025-08-19, 5,862 rows. Starting on the 2nd is expected, since 1 January is a holiday. The `date` column is typed `DATE` ✅
* **Comtrade Bronze**: 2018-01-01 → 2025-05-01, **445 rows** = 89 months × 5 HS codes ✅
* **Silver**: no duplicates; **`missing_fx_rows` = 0** ✅
* **Gold**: totals reconcile; shares sum to 1; latest gold month = **2025-05-01** (Comtrade only up to May) ✅
* **COVID summary**: the During period is about -1.49% vs. Pre, and the Post period is about +26.89% vs. Pre. Looks sensible. ✅
* **By-cmd rows**: 5 commodities as expected ✅
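
The Comtrade row count above can be verified by hand. This short sketch counts the months from 2018-01 through 2025-05 inclusive and multiplies by the 5 HS codes; the helper name `months_inclusive` is an assumption made for illustration.

```python
from datetime import date


def months_inclusive(start, end):
    """Number of calendar months from start's month through end's month, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


months = months_inclusive(date(2018, 1, 1), date(2025, 5, 1))
expected_rows = months * 5  # 5 HS codes, one row per code per month

print(months, expected_rows)  # 89 445
```

A count below `expected_rows` would point to a missing month or commodity, while a count above it would suggest duplicates.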

## What this means

You’re clear to move to Power BI. The only nuance is **freshness**: FX is current to 2025-08-19, but Comtrade stops at **May 2025**. So export through **May 2025** (a complete month) and leave June, July, and August out of the visuals.
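
The cutoff rule described here (export only through the earlier of the latest gold month and the last complete calendar month) can be sketched as follows. This is an illustrative sketch in plain Python, not the export notebook's actual logic: `export_cutoff` and its helpers are hypothetical names, and months are assumed to be represented as month-start dates, as in `2025-05-01`.

```python
from datetime import date


def month_start(d):
    """Normalize any date to the first day of its month."""
    return d.replace(day=1)


def previous_month_start(today):
    """First day of the month before today's month (the last complete month)."""
    first = today.replace(day=1)
    if first.month == 1:
        return date(first.year - 1, 12, 1)
    return date(first.year, first.month - 1, 1)


def export_cutoff(latest_gold_month, today):
    """Latest month that is safe to export: the data must exist and the month must be complete."""
    return min(month_start(latest_gold_month), previous_month_start(today))


# Gold data ends in May 2025; the run date of 2025-08-19 is assumed for the example.
print(export_cutoff(date(2025, 5, 1), date(2025, 8, 19)))  # 2025-05-01
```

Taking the minimum covers both cases: when the source data lags (as here), the data limit wins, and when data for the current, still-incomplete month has already arrived, the calendar limit wins.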
