# Notebook 00 | Bronze Ingestion

**Objective:**  
Read raw CSV files from DBFS and write them as Delta tables in the Bronze layer.


## 1. Configuration

- Define the path variables  
- Set the write mode

In [0]:
# DBFS paths
raw_base_path    = "/FileStore/tables"    # where the raw CSVs live
bronze_base_path = "/FileStore/bronze"    # where the Delta tables are written

# Write mode: "overwrite" replaces the existing table contents on each run
write_mode = "overwrite"

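## 2. Function to clean column names

Trims and lowercases each column name, replaces spaces, hyphens, and slashes with underscores, and drops parentheses. By default, Delta Lake rejects column names that contain characters such as spaces and parentheses, so this step has to run before the write.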


In [0]:
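def clean_column_names(df):
    """Return df with trimmed, lowercased, underscore-separated column names."""
    new_cols = [
        name.strip().lower()
            .replace(" ", "_")   # spaces become underscores
            .replace("(", "")    # parentheses are dropped
            .replace(")", "")
            .replace("-", "_")   # hyphens and slashes become underscores
            .replace("/", "_")
        for name in df.columns
    ]
    # toDF renames all columns positionally in a single call
    return df.toDF(*new_cols)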
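
To see what the rename chain does without a Spark session, the same string operations can be applied to plain header strings. This is only a minimal sketch: `clean_name` is a hypothetical helper that mirrors the chain inside `clean_column_names`, and the sample headers are assumed for illustration, not taken from the actual CSV files.

```python
def clean_name(name: str) -> str:
    """Apply the same string operations as clean_column_names to one header."""
    return (
        name.strip().lower()
            .replace(" ", "_")
            .replace("(", "")
            .replace(")", "")
            .replace("-", "_")
            .replace("/", "_")
    )

# Assumed sample headers, for illustration only
sample_headers = [" Order Number ", "Unit Price (USD)", "Open-Date/Time"]
cleaned = [clean_name(h) for h in sample_headers]
print(cleaned)
```

Characters outside this list (dots or commas, for example) pass through unchanged, so headers containing them would need extra handling.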

## 3. Reading the raw CSV files

In [0]:
# Read the sales CSV
df_sales = (
    spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv(f"{raw_base_path}/Sales.csv")
)

# Read the products CSV
df_products = (
    spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv(f"{raw_base_path}/Products.csv")
)

# Read the stores CSV
df_stores = (
    spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv(f"{raw_base_path}/Stores.csv")
)

# Read the exchange rates CSV
df_exchange = (
    spark.read
         .option("header", True)
         .option("inferSchema", True)
         .csv(f"{raw_base_path}/Exchange_Rates.csv")
)
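
The four reads above differ only in the file name, so they could also be driven by a single mapping. The following is a sketch under assumptions: `RAW_FILES`, `raw_csv_paths`, and `read_raw_csvs` are hypothetical names that do not appear in the notebook, and `spark` is passed in explicitly so the path logic can be checked without a cluster.

```python
# Hypothetical mapping from a logical table name to its raw CSV file
RAW_FILES = {
    "sales": "Sales.csv",
    "products": "Products.csv",
    "stores": "Stores.csv",
    "exchange_rates": "Exchange_Rates.csv",
}

def raw_csv_paths(raw_base_path, files=RAW_FILES):
    """Build the full path of each raw CSV under raw_base_path."""
    return {name: f"{raw_base_path}/{fname}" for name, fname in files.items()}

def read_raw_csvs(spark, raw_base_path, files=RAW_FILES):
    """Read every CSV with the same options used in the cell above."""
    return {
        name: (
            spark.read
                 .option("header", True)
                 .option("inferSchema", True)
                 .csv(path)
        )
        for name, path in raw_csv_paths(raw_base_path, files).items()
    }

paths = raw_csv_paths("/FileStore/tables")
print(paths["exchange_rates"])
```

In the notebook this would be called as `read_raw_csvs(spark, raw_base_path)`, returning a dictionary of DataFrames keyed by table name.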


## 4. Standardizing column names

In [0]:
df_sales = clean_column_names(df_sales)
df_products = clean_column_names(df_products)
df_stores = clean_column_names(df_stores)
df_exchange = clean_column_names(df_exchange)

## 5. Writing to Delta (Bronze layer)


In [0]:
df_sales.write.format("delta").mode(write_mode).save(f"{bronze_base_path}/sales")
df_products.write.format("delta").mode(write_mode).save(f"{bronze_base_path}/products")
df_stores.write.format("delta").mode(write_mode).save(f"{bronze_base_path}/stores")
df_exchange.write.format("delta").mode(write_mode).save(f"{bronze_base_path}/exchange_rates")

## 6. Registering the tables in the SQL catalog (to simplify later access)


In [0]:
%sql
CREATE DATABASE IF NOT EXISTS bronze;
USE bronze;

CREATE TABLE IF NOT EXISTS sales_bronze
USING DELTA
LOCATION '/FileStore/bronze/sales';

CREATE TABLE IF NOT EXISTS products_bronze
USING DELTA
LOCATION '/FileStore/bronze/products';

CREATE TABLE IF NOT EXISTS stores_bronze
USING DELTA
LOCATION '/FileStore/bronze/stores';

CREATE TABLE IF NOT EXISTS exchange_rates_bronze
USING DELTA
LOCATION '/FileStore/bronze/exchange_rates';
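
The SQL cell above repeats the Bronze path as a literal in every statement, so it can drift from `bronze_base_path` if that variable changes. One possible alternative, sketched here under assumptions, is to generate the statements in Python and run them with `spark.sql`. The names `create_table_statements` and `register_tables` are hypothetical and not part of the notebook.

```python
def create_table_statements(bronze_base_path, tables, database="bronze"):
    """Build the CREATE statements for the database and each Bronze table."""
    statements = [f"CREATE DATABASE IF NOT EXISTS {database}"]
    for name in tables:
        statements.append(
            f"CREATE TABLE IF NOT EXISTS {database}.{name}_bronze "
            f"USING DELTA LOCATION '{bronze_base_path}/{name}'"
        )
    return statements

def register_tables(spark, statements):
    """Run each statement; spark.sql executes one statement per call."""
    for stmt in statements:
        spark.sql(stmt)

statements = create_table_statements(
    "/FileStore/bronze",
    ["sales", "products", "stores", "exchange_rates"],
)
for stmt in statements:
    print(stmt)
```

In the notebook, `register_tables(spark, create_table_statements(bronze_base_path, [...]))` would register the same four tables as the SQL cell.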

## 7. Quick validation (previewing the tables)

In [0]:
%sql
SELECT * FROM bronze.sales_bronze LIMIT 5;

order_number,line_item,order_date,delivery_date,customerkey,storekey,productkey,quantity,currency_code
366000,1,2016-01-01,,265598,10,1304,1,CAD
366001,1,2016-01-01,2016-01-13,1269051,0,1048,2,USD
366001,2,2016-01-01,2016-01-13,1269051,0,2007,1,USD
366002,1,2016-01-01,2016-01-12,266019,0,1106,7,CAD
366002,2,2016-01-01,2016-01-12,266019,0,373,1,CAD


In [0]:
%sql
SELECT * FROM bronze.products_bronze LIMIT 5;

productkey,product_name,brand,color,unit_cost_usd,unit_price_usd,subcategorykey,subcategory,categorykey,category
1,Contoso 512MB MP3 Player E51 Silver,Contoso,Silver,$6.62,$12.99,101,MP4&MP3,1,Audio
2,Contoso 512MB MP3 Player E51 Blue,Contoso,Blue,$6.62,$12.99,101,MP4&MP3,1,Audio
3,Contoso 1G MP3 Player E100 White,Contoso,White,$7.40,$14.52,101,MP4&MP3,1,Audio
4,Contoso 2G MP3 Player E200 Silver,Contoso,Silver,$11.00,$21.57,101,MP4&MP3,1,Audio
5,Contoso 2G MP3 Player E200 Red,Contoso,Red,$11.00,$21.57,101,MP4&MP3,1,Audio


In [0]:
%sql
SELECT * FROM bronze.stores_bronze

storekey,country,state,square_meters,open_date
1,Australia,Australian Capital Territory,595.0,2008-01-01
2,Australia,Northern Territory,665.0,2008-01-12
3,Australia,South Australia,2000.0,2012-01-07
4,Australia,Tasmania,2000.0,2010-01-01
5,Australia,Victoria,2000.0,2015-12-09
6,Australia,Western Australia,2000.0,2010-01-01
7,Canada,New Brunswick,1105.0,2007-05-07
8,Canada,Newfoundland and Labrador,2105.0,2014-07-02
9,Canada,Northwest Territories,1500.0,2005-03-04
10,Canada,Nunavut,1210.0,2015-04-04


In [0]:
%sql
SELECT * FROM bronze.exchange_rates_bronze

date,currency,exchange
2015-01-01,USD,1.0
2015-01-01,CAD,1.1583
2015-01-01,AUD,1.2214
2015-01-01,EUR,0.8237
2015-01-01,GBP,0.6415
2015-01-02,USD,1.0
2015-01-02,CAD,1.1682
2015-01-02,AUD,1.2323
2015-01-02,EUR,0.8304
2015-01-02,GBP,0.6477


## 8. Next Steps

- Use the tables registered in the Bronze layer to build the Silver layer  
- Join the tables, normalize currencies, and compute metrics in USD  
- Handle nulls and inconsistencies, then write Silver as a Delta table ready for aggregations  
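
The Products preview shows `unit_cost_usd` and `unit_price_usd` stored as strings such as `$12.99`, so any USD metric needs them parsed into numbers first. The sketch below illustrates that cleanup in plain Python. `parse_usd` is a hypothetical helper, the quantity is made up for illustration, and the thousands-separator handling is an assumption because the preview shows no such values. In the Silver notebook the equivalent logic would likely be written with Spark column expressions.

```python
from decimal import Decimal

def parse_usd(text: str) -> Decimal:
    """Convert a string such as "$12.99" into a Decimal amount."""
    # Strip whitespace, the dollar sign, and (assumed) thousands separators
    return Decimal(text.strip().replace("$", "").replace(",", ""))

unit_price = parse_usd("$12.99")   # unit_price_usd of productkey 1 in the preview
quantity = 2                       # hypothetical quantity, for illustration only
line_revenue_usd = unit_price * quantity
print(line_revenue_usd)
```

`Decimal` is used instead of `float` so that monetary values keep their exact two-decimal representation.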
