# Layer: Gold (Business)
**Project:** Lean Logistics Data Pipeline\
**Business Domain:** E-commerce (Olist Dataset)\
**Table Name:** `dm_products`

---
## 📑 Notebook Information
| Version | Date | Author | Summary of Changes |
| :--- | :--- | :--- | :--- |
| v1.0 | 2026-02-20 | Tássia Marchito | Initial creation of Product Dimension (`dm_products`). |
| v1.1 | 2026-02-20 | Tássia Marchito | Added column comments, table tags, and enforced PK constraint. |

---
## 🎯 Objectives
This notebook creates the Product Dimension by left-joining the refined products table with its category translations, so products without a translation are kept with a null category.
* **Dimensional Modeling:** Implementing the `dm_` prefix for analytical dimensions.
* **Enrichment:** Joining `tb_products` with `tb_product_category_name_translation` from the Silver layer.
* **Standardization:** Providing English category names as the primary descriptor for business users.
* **Metadata Management:** Adding detailed comments to each column for business clarity.
* **Governance:** Applying table-level tags, setting `cd_product_id` to `NOT NULL`, and declaring it as the Primary Key (PK).
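
Databricks documents primary key constraints as informational rather than enforced, so it may be worth confirming that `cd_product_id` is unique before declaring the key. The following is a minimal sketch, assuming a live `spark` session is passed in; the helper names `duplicate_key_query` and `assert_unique_key` are hypothetical and not part of this notebook.

```python
def duplicate_key_query(table: str, key: str) -> str:
    """Build a query that returns every key value appearing more than once."""
    return (
        f"SELECT {key}, COUNT(*) AS n_rows "
        f"FROM {table} GROUP BY {key} HAVING COUNT(*) > 1"
    )


def assert_unique_key(spark, table: str, key: str) -> None:
    """Raise ValueError if `key` is duplicated in `table` (hypothetical helper)."""
    # Fetch at most one offending row; that is enough to fail fast.
    duplicates = spark.sql(duplicate_key_query(table, key)).limit(1).collect()
    if duplicates:
        raise ValueError(
            f"{table}.{key} has duplicate values, e.g. {duplicates[0][0]!r}"
        )


# Assumed usage inside the notebook, after the table has been written:
# assert_unique_key(spark, "cat_tm_services_gold.db_logistics.dm_products", "cd_product_id")
```

Running the check after the write and before `ADD CONSTRAINT` keeps a `RELY` primary key from being declared on data that does not satisfy it.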

In [0]:
from pyspark.sql.functions import col, current_timestamp

# 1. Path definitions
source_products = "cat_tm_services_silver.db_logistics.tb_products"
source_translation = "cat_tm_services_silver.db_logistics.tb_product_category_name_translation"
target_table = "cat_tm_services_gold.db_logistics.dm_products"

# 2. Transformation and creation
df_products = spark.read.table(source_products)
df_translation = spark.read.table(source_translation)

dm_products = df_products.join(df_translation, "nm_product_category_name", "left").select(
    col("cd_product_id"),
    col("nm_product_category_name_english").alias("ds_product_category"),
    col("vl_product_weight_g"),
    col("vl_product_length_cm"),
    col("vl_product_height_cm"),
    col("vl_product_width_cm")
).withColumn("ts_gold_at", current_timestamp())

# Write the table first, so the ALTER TABLE statements below have a target
dm_products.write.format("delta").mode("overwrite").option("overwriteSchema", "true").saveAsTable(target_table)

# 3. Apply metadata (direct SQL for better compatibility with the UI)
print(f"Applying Governance to {target_table}...")

# TAGS (try running the three tags as separate statements if the error persists)
# In current Databricks, this is the syntax that populates the 'Tags' column in the UI
spark.sql(f"ALTER TABLE {target_table} SET TAGS ('quality' = 'gold', 'domain' = 'logistics', 'type' = 'dimension')")

# COLUMN COMMENTS
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN cd_product_id COMMENT 'Unique identifier for the product (MD5 Hash)'")
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN ds_product_category COMMENT 'Product category name translated to English'")
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN vl_product_weight_g COMMENT 'Product weight measured in grams'")
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN vl_product_length_cm COMMENT 'Product length measured in centimeters'")
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN vl_product_height_cm COMMENT 'Product height measured in centimeters'")
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN vl_product_width_cm COMMENT 'Product width measured in centimeters'")
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN ts_gold_at COMMENT 'Timestamp of Gold layer processing'")

# CONSTRAINTS
spark.sql(f"ALTER TABLE {target_table} ALTER COLUMN cd_product_id SET NOT NULL")
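try:
    spark.sql(f"ALTER TABLE {target_table} ADD CONSTRAINT pk_dm_products PRIMARY KEY(cd_product_id) RELY")
except Exception as e:
    # The statement can fail, e.g. if the constraint already exists from a previous run.
    # Report the reason instead of silently swallowing every error.
    print(f"Primary key constraint not added: {e}")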

print("✅ Process complete. Please REFRESH your browser to see tags in Catalog Explorer.")
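
The seven column-comment statements in the cell above differ only in column name and comment text, so they could be generated from a single mapping. The following is a minimal sketch under that assumption; `build_comment_statements` is a hypothetical helper, and it rejects comments containing single quotes or backslashes rather than guessing at SQL escaping rules.

```python
from typing import Dict, List


def build_comment_statements(table: str, comments: Dict[str, str]) -> List[str]:
    """Return one ALTER TABLE ... COMMENT statement per column (hypothetical helper)."""
    statements = []
    for column, text in comments.items():
        # Refuse characters that would need escaping inside the SQL string literal.
        if "'" in text or "\\" in text:
            raise ValueError(f"Unsupported character in comment for {column}")
        statements.append(
            f"ALTER TABLE {table} ALTER COLUMN {column} COMMENT '{text}'"
        )
    return statements


# Same column names and comment texts as in the notebook cell.
column_comments = {
    "cd_product_id": "Unique identifier for the product (MD5 Hash)",
    "ds_product_category": "Product category name translated to English",
    "vl_product_weight_g": "Product weight measured in grams",
    "vl_product_length_cm": "Product length measured in centimeters",
    "vl_product_height_cm": "Product height measured in centimeters",
    "vl_product_width_cm": "Product width measured in centimeters",
    "ts_gold_at": "Timestamp of Gold layer processing",
}

statements = build_comment_statements(
    "cat_tm_services_gold.db_logistics.dm_products", column_comments
)
# In the notebook, each statement would then be passed to spark.sql(...).
```

Keeping the comments in one dictionary also makes it easier to review the business descriptions in a single place when columns are added or renamed.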