# Conclusion

This MVP delivered a **complete end-to-end data engineering pipeline** in Databricks (Bronze → Silver → Gold) using public, aggregated PIX data from the Central Bank of Brazil.

The project covered the full lifecycle of a data pipeline, including raw data ingestion, standardization and scope enforcement in the Silver layer, analytical modeling through a star schema in the Gold layer, and reproducible data quality validation.

---

## Business Questions Coverage

Of the **9 proposed business questions**, **7 were answered** using the available dataset for the 2023–2024 period.

These questions focused on:
- temporal evolution of PIX usage,
- differences between payer and receiver profiles,
- behavioral patterns by age group,
- regional segmentation,
- transaction nature and purpose,
- and regional concentration of PIX activity.

All answers were supported by **query-based analytical evidence** generated from the Gold layer and documented in a dedicated analysis notebook.

---

## Limitations of the Dataset

Two business questions could not be answered within the scope of this MVP:

- **Income level of PIX users**  
  The dataset does not include income information or reliable proxies. Addressing this question would require enrichment with external socioeconomic data sources (e.g., IBGE datasets).

- **Essential vs non-essential expenses**  
  Although transaction nature and purpose are available, the dataset does not provide an official or standardized classification distinguishing essential from non-essential expenses. Answering this question would require an explicit semantic classification layer, documented as an analytical assumption.

These limitations reflect **constraints of the source data**, not shortcomings of the pipeline design or implementation.
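For the second limitation, an explicit semantic classification layer could be as simple as a documented lookup table. The sketch below is only an illustration: the purpose labels, the class names, and the `classify_expense` helper are hypothetical and do not come from the dataset or from any official taxonomy. The real purpose values would have to replace them.

```python
# Hypothetical mapping: these purpose labels and classes are illustrative
# assumptions, not values from the dataset or an official classification.
EXPENSE_CLASS_BY_PURPOSE = {
    "utility_bill": "essential",
    "groceries": "essential",
    "entertainment": "non_essential",
}


def classify_expense(purpose, mapping=EXPENSE_CLASS_BY_PURPOSE):
    """Return the assumed class for a purpose, or 'unclassified' if unmapped."""
    # Normalize whitespace and case so lookups do not depend on formatting.
    return mapping.get(purpose.strip().lower(), "unclassified")
```

Returning `"unclassified"` for unmapped purposes, rather than guessing a class, keeps the analytical assumption visible in any downstream aggregate.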

---

## Technical Learnings

A key technical takeaway from this MVP was the importance of **clear contracts between pipeline layers**, particularly:

- enforcing analytical scope and data types in the Silver layer,
- modeling facts and dimensions with explicit grain in the Gold layer,
- and validating assumptions through deterministic data quality checks.

This approach reduced ambiguity, simplified analysis, and increased confidence in the analytical outputs.
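The three contracts listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual notebook code: the column names, the `SILVER_CONTRACT` mapping, and the helper functions are hypothetical, and the scope dates assume the 2023–2024 period covers both calendar years in full.

```python
from collections import Counter
from datetime import date

# Assumption: the analytical scope is the full 2023-2024 period.
SCOPE_START = date(2023, 1, 1)
SCOPE_END = date(2024, 12, 31)

# Hypothetical Silver contract: column name -> function casting to the expected type.
SILVER_CONTRACT = {
    "reference_month": date.fromisoformat,
    "region": str,
    "transaction_count": int,
    "total_value": float,
}


def to_silver(raw_rows):
    """Cast raw rows to the contract types and keep only in-scope months."""
    silver = []
    for row in raw_rows:
        typed = {col: cast(row[col]) for col, cast in SILVER_CONTRACT.items()}
        if SCOPE_START <= typed["reference_month"] <= SCOPE_END:
            silver.append(typed)
    return silver


def grain_violations(rows, grain):
    """Return the grain keys that appear more than once (should be empty)."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def run_quality_checks(rows, grain):
    """Deterministic checks: the same input always yields the same result."""
    return {
        "grain_is_unique": not grain_violations(rows, grain),
        "no_negative_counts": all(r["transaction_count"] >= 0 for r in rows),
        "no_negative_values": all(r["total_value"] >= 0 for r in rows),
    }


if __name__ == "__main__":
    # Made-up sample rows; the second one falls outside the assumed scope.
    raw = [
        {"reference_month": "2023-01-01", "region": "Sudeste",
         "transaction_count": "120", "total_value": "4500.50"},
        {"reference_month": "2022-12-01", "region": "Sul",
         "transaction_count": "80", "total_value": "1999.90"},
    ]
    silver = to_silver(raw)
    print(run_quality_checks(silver, grain=("reference_month", "region")))
```

Declaring the grain as an explicit tuple of columns turns "one row per month and region" from an implicit convention into a testable property of the fact table.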

---

## Possible Extensions

Future iterations of this project could explore:
- automated data quality monitoring,
- richer metadata and lineage documentation,
- and enrichment with complementary public datasets to expand analytical scope.

These extensions could be introduced incrementally without changing the core pipeline structure established in this MVP.

---

## Final Note

This MVP prioritizes **correctness, transparency, and reproducibility** over complexity.

All insights are derived from **aggregated monthly data**, not transaction-level records, and should be interpreted accordingly. The project demonstrates how a well-structured data engineering pipeline can support analytical exploration while making its limitations explicit.
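One practical consequence of working with aggregates is that per-transaction statistics can only be derived as ratios of totals. The sketch below assumes each aggregated row carries a total value and a transaction count. The column names `total_value` and `transaction_count` and both helpers are hypothetical.

```python
def average_ticket(total_value, transaction_count):
    """Mean value per transaction for one aggregated row; None if there are none."""
    if transaction_count == 0:
        return None
    return total_value / transaction_count


def combined_average_ticket(rows):
    """Average across rows as sum of values over sum of counts (count-weighted)."""
    total_value = sum(r["total_value"] for r in rows)
    total_count = sum(r["transaction_count"] for r in rows)
    return average_ticket(total_value, total_count)
```

For two made-up rows with totals of 100.0 over 1 transaction and 100.0 over 4 transactions, the per-row averages are 100.0 and 25.0. Their simple mean is 62.5, while `combined_average_ticket` returns 40.0, the actual mean over all 5 transactions. Medians and distributions of individual transaction values cannot be recovered from monthly aggregates at all.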
