# Platform Architecture Overview

## Data Intelligence and Delta Lake
![Data Intelligence and Delta Lake](./images/Data_Intelligence_and_Delta_Lake.png)

**Delta Lake** is a file-based, open source storage format that provides:
- ACID transaction guarantees
- Scalable data and metadata handling: Spark scales out all metadata processing, so metadata for petabyte-scale tables can be handled
- Audit history and time travel: the transaction log records the details of every transaction
- Schema enforcement and schema evolution: data with a mismatched schema is rejected on insert, and the table schema can be changed explicitly
- Support for deletes, updates and merges
- Unified streaming and batch data processing
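
To make the transaction-log idea concrete, here is a minimal sketch in plain Python. It is a toy model, not Delta Lake's actual implementation: the `ToyTable` class and its methods are hypothetical names invented for illustration. It shows how an append-only log of commits can give audit history, time travel and schema enforcement together.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """Toy in-memory model of a versioned table (illustrative only)."""
    schema: dict                              # column name -> Python type
    log: list = field(default_factory=list)   # one entry per committed transaction

    def append(self, rows):
        """Commit rows as one transaction; reject any row whose schema does not match."""
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(f"schema mismatch: {sorted(row)}")
            for name, expected in self.schema.items():
                if not isinstance(row[name], expected):
                    raise ValueError(f"column {name!r} expects {expected.__name__}")
        # Every row passed validation: record the commit as a single log entry.
        self.log.append({"version": len(self.log), "operation": "APPEND", "rows": list(rows)})
        return len(self.log) - 1

    def read(self, version=None):
        """Replay the log up to `version` (inclusive); None means the latest version."""
        last = len(self.log) - 1 if version is None else version
        return [row for entry in self.log[: last + 1] for row in entry["rows"]]

    def history(self):
        """Audit view: (version, operation, row count) for every commit."""
        return [(e["version"], e["operation"], len(e["rows"])) for e in self.log]
```

Calling `append` twice and then `read(version=0)` returns only the rows from the first commit, which is the essence of time travel. A row with a wrong column type raises `ValueError` and leaves the log unchanged.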

## Data Intelligence and Unity Catalog

![Data Intelligence and Unity Catalog](./images/Data_Intelligence_and_Unity_Catalog.png)

**Unity Catalog:**
1. Unified view of data and AI estate:

![Unity_Catalog_1_Unified_View](./images/Unity_Catalog_1_Unified_View.png)
    
  - **Discover, classify and organize** structured and unstructured data as well as notebooks, ML models, ML features and arbitrary files in one place
  - Leverage **data federation** to register and query data from external data sources without ingestion, expanding analysis capabilities
  - Drive better data understanding and faster insights with efficient **tag-based search**

2. Single permission model for data and AI:

![Unity_Catalog_2_Single_Permission_Model](./images/Unity_Catalog_2_Single_Permission_Model.png)
  
  - Secure data estate with a **unified interface** for managing access policies across all data and AI assets
  - Enable **fine-grained access controls** on rows and columns for enhanced security
  - Access data securely from diverse computing platforms using **open interfaces**

3. AI-driven monitoring and reporting

![Unity_Catalog_3_AI_Driven](./images/Unity_Catalog_3_AI_Driven.png)

  - Receive **proactive alerts** for quality issues and errors in data and ML model pipelines
  - Access **real-time data lineage** down to the column level for efficient root cause analysis and error debugging
  - Utilize **auto-generated** dashboards to easily share data and ML quality reports
  - Gain a clear **end-to-end view** of data flow and consumption, ensuring compliance and audit readiness

4. Open data sharing and collaboration
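
The fine-grained access controls on rows and columns mentioned in item 2 can be pictured with a small sketch. This is plain Python written only to illustrate the concept. It is not Unity Catalog's API, and `apply_policy`, `region_filter` and `mask_email` are hypothetical names.

```python
def apply_policy(rows, user_groups, row_filter, column_masks):
    """Return the rows a user may see, with protected columns masked.

    row_filter:   function(row, user_groups) -> bool (hypothetical policy hook)
    column_masks: dict of column name -> function(value, user_groups) -> value
    """
    visible = []
    for row in rows:
        if not row_filter(row, user_groups):
            continue  # row-level control: drop rows the user may not see
        masked = dict(row)  # copy so the underlying data is never modified
        for column, mask in column_masks.items():
            if column in masked:
                masked[column] = mask(masked[column], user_groups)  # column-level control
        visible.append(masked)
    return visible


def region_filter(row, user_groups):
    # Admins see every row; other users see only rows for regions they belong to.
    return "admins" in user_groups or row["region"] in user_groups


def mask_email(value, user_groups):
    # Only admins see the raw e-mail address.
    return value if "admins" in user_groups else "***"
```

With this policy, a user in the `emea` group sees only EMEA rows with the e-mail column replaced by `***`, while an admin sees all rows unmasked. The point is that one policy definition governs every read path.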

# Data Governance
**Data Governance:** the principles, practices and tools used to manage an organization's data assets. It helps with:
  - Understanding who has access to what data
  - Auditing access
  - Understanding how data is used in business
  - Aligning with business strategy to ensure data compliance, security, quality and visibility

## Key Elements
**Data Governance** has 8 components:

![Data_Governance_Key_Elements](./images/Data_Governance_Key_Elements.png)

## Data Governance Complexity

![Data_Governance_Complex](./images/Data_Governance_Complex.png)

|Issue                                  |Consequences                                     |How Unity Catalog reduces the complexity |
|---------------------------------------|-------------------------------------------------|-----------------------------------------|
|Fragmented view of the data estate     |Reduced pace of innovation                       |Unified view of the data estate          |
|Multiple tools for access management   |Increased data breach risk, operational expenses |Single permission model for data and AI  |
|Incomplete monitoring & visibility     |Non-compliance risk, reputation harm             |AI-powered monitoring & reporting        |
|Lack of cross-platform data sharing    |Costly data sharing, untapped monetization       |Delta Sharing built into the platform    |



## Data Governance With/Without Intelligence

## AI Helps

### Find and Discover Data & AI Assets

![AI_Find_Discover_Data_AI_Assets](./images/AI_Find_Discover_Data_AI_Assets.png)

### Enhance Data Documentation

![AI_Enhance_Data_Documentation](./images/AI_Enhance_Data_Documentation.png)

### Automate Lineage for Workloads

![AI_Automate_Lineage_Workloads](./images/AI_Automate_Lineage_Workloads.png)

### Monitor & Observe

![AI_Monitor_Observe.png](./images/AI_Monitor_Observe.png)


## Delta Sharing 
![Delta_Sharing_Icon.png](./images/Delta_Sharing_Icon.png)
- **Open cross-platform sharing:**
  - Easily share data in Parquet and Delta formats; recipients do not need to set up new ingestion processes to consume it
  - Has **native integration** with Power BI, Tableau, Spark, pandas and Java
- **Share live data without copying it:**
  - Data stays in the Provider's data lake, so it remains reliable and the Recipient always reads the most current version
- **Centralized administration & governance:**
  - Data is governed, tracked and audited in one place
  - Data can be monitored at the table, partition and version level
- **Marketplace for data products:**
  - Build and package data products through a marketplace for distribution anywhere
- **Privacy-safe data clean rooms:**
  - Collaboration between data Provider and Recipient is hosted in a secure environment
- **Avoid vendor lock-in:** seamless data sharing across clouds, regions and platforms without replication
- **Share more than just data:** notebooks, ML models, dashboards
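
On the Recipient side, the pandas integration listed above can be sketched with the open source `delta-sharing` Python connector. This is a minimal sketch, assuming the connector is installed and the Provider has issued a profile file. The helper names `shared_table_url` and `load_shared_table`, and any share, schema and table names passed to them, are hypothetical placeholders.

```python
def shared_table_url(profile_path, share, schema, table):
    """Build the '<profile>#<share>.<schema>.<table>' address the connector expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path, share, schema, table):
    """Read a shared table from the Provider into a pandas DataFrame."""
    # Imported lazily so the URL helper works without the third-party package.
    import delta_sharing  # pip install delta-sharing

    return delta_sharing.load_as_pandas(
        shared_table_url(profile_path, share, schema, table)
    )
```

A call such as `load_shared_table("config.share", "sales", "retail", "orders")` would read the Provider's current data directly, with no ingestion pipeline on the Recipient side.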

## Databricks Marketplace

![Marketplace_Provider_Consumer.png](./images/Marketplace_Provider_Consumer.png)
- It is an open marketplace for all data, analytics and AI, powered by **Delta Sharing**
- It provides more than data:

![Marketplace_Provide_More_Than_Data.png](./images/Marketplace_Provide_More_Than_Data.png)


## Databricks Clean Room
- Different business units of an organization can run computations on joined data
- The environment is set up securely: data owners share datasets, and collaborators run mutually approved computations in a single space
- Because clean rooms are backed by Delta Sharing, business units don't need to replicate data into the clean room

![Databricks_Clean_Room.png](./images/Databricks_Clean_Room.png)

# Security, Reliability and Performance