# Explore the Microsoft Fabric Lakehouse
**A Lakehouse** presents as a database and is built on top of a **data lake** using **`Delta` format tables**.
- Lakehouses combine the SQL-based analytical capabilities of a **`relational data warehouse`** and the flexibility and scalability of a **`data lake`**. 
- Lakehouses store all data formats and can be used with various analytics tools and programming languages.

<img src="../images/01_Get started with Microsoft Fabric/02/lakehouse-components.png" alt="Lakehouse Components" style="border: 2px solid black; border-radius: 10px;">

## Benefits
- Lakehouses use **`Spark`** and **`SQL engines`** to process large-scale data and support ML or predictive modeling analytics.
- Lakehouse data is organized in a **`schema-on-read`** format, which means you define the schema as needed rather than having a predefined schema.
- Lakehouses support **`ACID`** (Atomicity, Consistency, Isolation, Durability) **transactions** through **Delta Lake formatted tables** for data consistency and integrity.
- Lakehouses are a **`single location`** for data engineers, data scientists, and data analysts to access and use data.
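
To make the schema-on-read idea from the list above concrete, here is a minimal sketch using only the Python standard library. It is not Fabric code: the column names and sample rows are hypothetical, and the point is only that the raw data carries no types until a reader supplies a schema.

```python
import csv
import io
from datetime import date

# Raw data as it might land in the lake: plain text with no types attached.
RAW_CSV = """order_id,order_date,amount
1001,2024-01-15,19.99
1002,2024-01-16,5.50
"""

# The schema is declared by the reader at read time (hypothetical columns).
SCHEMA = {
    "order_id": int,
    "order_date": date.fromisoformat,
    "amount": float,
}


def read_with_schema(raw_text, schema):
    """Parse raw CSV text and cast each listed column using the given schema."""
    rows = csv.DictReader(io.StringIO(raw_text))
    return [{col: cast(row[col]) for col, cast in schema.items()} for row in rows]


# Two readers can apply different schemas to the same stored data.
orders = read_with_schema(RAW_CSV, SCHEMA)
order_ids_only = read_with_schema(RAW_CSV, {"order_id": str})
total = sum(order["amount"] for order in orders)
```

The same file serves both readers because nothing about the stored data had to change; only the schema applied on read differs.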

## Microsoft Fabric lakehouses
In Microsoft Fabric, you can:
1. Create a lakehouse in any **premium tier workspace**. 
2. Load data in any common format from various sources, including local files, databases, and APIs.
    - Data ingestion can also be automated using **Data Factory Pipelines** or **Dataflows (Gen2)** in Microsoft Fabric.
      - **Data Factory Pipelines** can be used to orchestrate Spark, Dataflow, and other activities, enabling you to implement complex data transformation processes.
      - **Dataflows (Gen2)** are based on **Power Query**, a tool familiar to data analysts who use Excel or Power BI. It provides a visual representation of transformations as an alternative to traditional programming.
3. Create Fabric **shortcuts** to data in external sources, such as **Azure Data Lake Storage Gen2** or a **Microsoft OneLake** location outside of the lakehouse's own storage.
    - The **Lakehouse Explorer** enables you to browse files, folders, shortcuts, and tables, and to view their contents within the Fabric platform.
4. Use **Notebooks** or **Dataflows (Gen2)** to explore and transform the data.
5. Query the data using SQL, use it to train machine learning models, perform real-time intelligence, or develop reports in Power BI.
6. Apply **data governance policies** to your Lakehouse, such as data classification and access control.

# Work with Microsoft Fabric Lakehouses

## Create and explore a lakehouse
You create and configure a new Lakehouse in the Data Engineering workload. Each lakehouse produces **three named items** in the Fabric-enabled workspace:
- **Lakehouse:** is the lakehouse **`storage` and `metadata`**, where you interact with files, folders, and table data.
- **Semantic model (default):** is an **automatically created `semantic model`** based on the tables in the lakehouse.
  - **Power BI reports** can be built from the semantic model.
- **SQL analytics endpoint:** is a **read-only** SQL analytics endpoint through which you can **_connect_** and **_query_** data with **Transact-SQL**.

<img src="../images/01_Get started with Microsoft Fabric/02/lakehouse-items.png" alt="Three Lakehouse items" style="border: 2px solid black; border-radius: 10px;">

You can work with the data in the lakehouse in two modes:
- **Lakehouse:** enables you to **_add_** and **_interact_** with tables, files, and folders in the Lakehouse.
- **SQL analytics endpoint:** enables you to **use SQL to _query_** the tables in the lakehouse and manage its relational semantic model.

<img src="../images/01_Get started with Microsoft Fabric/02/explorer-modes.png" alt="Two Lakehouse Explorer modes" style="border: 2px solid black; border-radius: 10px;">

## Ingest data into a lakehouse
There are many ways to load data into a Fabric lakehouse, including:
- **Upload:** Upload **local files or folders** to the lakehouse. You can then explore and process the file data, and load the results into tables.
- **Dataflows (Gen2):** Import and transform data from **multiple sources** using **`Power Query Online`**, and load it directly into a table.
- **Notebooks:** Use notebooks in Fabric to ingest and transform data, and load it into tables or files.
- **Data Factory pipelines:** Copy data and orchestrate data processing **activities**, loading the results into tables or files.
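
As an illustration of the notebook option above, the following is a minimal PySpark sketch that reads an uploaded CSV file and saves it as a Delta table. It assumes the code runs in a Fabric notebook attached to the lakehouse, where a `SparkSession` named `spark` is already available; the file path `Files/raw/sales.csv` and the table name `sales` are hypothetical.

```python
def load_csv_to_delta_table(spark, source_path, table_name, mode="overwrite"):
    """Read a CSV file from the lakehouse Files area and save it as a Delta table.

    `spark` is a SparkSession; Fabric notebooks provide one named `spark`.
    """
    # Read the raw file, using the first row as column names and letting
    # Spark infer the column types from the data.
    df = spark.read.csv(source_path, header=True, inferSchema=True)

    # Persist the result as a table in Delta format.
    df.write.format("delta").mode(mode).saveAsTable(table_name)
    return df


# Example call from a notebook attached to the lakehouse (hypothetical names):
# load_csv_to_delta_table(spark, "Files/raw/sales.csv", "sales")
```

Passing `mode="append"` instead of the default would add rows to an existing table rather than replace it.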

## Access data using shortcuts
**Shortcuts** enable you to **_integrate_** data into your lakehouse while keeping it stored in external storage.
- Shortcuts are useful when you need to source data that's in a **`different storage account`** or even a **`different cloud provider`**. 
  - Within your Lakehouse you can create shortcuts that point to **_different storage accounts_** and other Fabric items like data warehouses, KQL databases, and other Lakehouses.
- Shortcuts can be created in both **Lakehouses** and **KQL databases**, and appear as a folder in the lake. This allows Spark, SQL, Real-Time Intelligence, and Analysis Services to all use shortcuts when querying data.
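
Because a shortcut appears as a folder, code can address shortcut data by path like any other lakehouse content. The sketch below builds a OneLake `abfss://` URI for such a path. The workspace, lakehouse, and shortcut names are hypothetical, and the helper `onelake_uri` is an illustrative function, not part of any Fabric SDK.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_uri(workspace, lakehouse, relative_path):
    """Build an abfss:// URI for a path inside a lakehouse in OneLake.

    `relative_path` starts at the lakehouse root, for example
    "Files/<shortcut name>/..." or "Tables/<table name>".
    """
    relative_path = relative_path.strip("/")
    return f"abfss://{workspace}@{ONELAKE_HOST}/{lakehouse}.Lakehouse/{relative_path}"


# A hypothetical shortcut named "external_sales" under the Files folder.
uri = onelake_uri("SalesWorkspace", "SalesLakehouse", "Files/external_sales/2024/orders.csv")
```

In a notebook whose default lakehouse contains the shortcut, the shorter relative form (`Files/external_sales/2024/orders.csv`) can be passed to Spark readers directly.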

Source data **permissions and credentials** are all managed by **`OneLake`**. 
- When data is accessed through a shortcut to another OneLake location, the **identity** of the calling user is used to **_authorize access_** to the data in the target path of the shortcut.
- The user **_must have permissions_** in the target location to read the data.

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> For more information on how to use shortcuts, see [OneLake shortcuts documentation](https://learn.microsoft.com/en-us/fabric/onelake/onelake-shortcuts) in the Microsoft Fabric documentation.

# Explore and transform data in a lakehouse

## Explore and Transform
After loading data into the lakehouse, you can use various tools and techniques to explore and transform it, including:
- **Apache Spark:** Each Fabric lakehouse can use **Spark pools** through **`Notebooks`** or **`Spark Job Definitions`** to process data in files and tables in the lakehouse using Scala, PySpark, or Spark SQL.
  - **Notebooks:** **Interactive coding interfaces** in which you can use code to read, transform, and write data directly to the lakehouse as tables and/or files.
  - **Spark job definitions:** On-demand or scheduled scripts that use the **Spark engine to process data** in the lakehouse.
- **SQL analytics endpoint:** Run **Transact-SQL** statements to query, filter, aggregate, and otherwise explore data in lakehouse tables.
- **Dataflows (Gen2):** In addition to using a dataflow to ingest data into the lakehouse, you can **_create_ a dataflow** to perform subsequent transformations through **Power Query**, and optionally land transformed data back to the Lakehouse.
- **Data pipelines:** Orchestrate **complex data transformation logic** that operates on data in the lakehouse through a sequence of **activities** (such as dataflows, Spark jobs, and other control flow logic).
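
As a small example of the Spark option above, the following sketch runs a Spark SQL aggregation against a lakehouse table from a notebook. It assumes a `SparkSession` named `spark`, as Fabric notebooks provide, and a hypothetical `sales` table with `product`, `quantity`, and `unit_price` columns.

```python
def top_products_by_revenue(spark, table_name="sales", limit=10):
    """Return a DataFrame of the highest-revenue products in a lakehouse table.

    The table and column names are hypothetical; adjust them to your data.
    """
    query = f"""
        SELECT product, SUM(quantity * unit_price) AS revenue
        FROM {table_name}
        GROUP BY product
        ORDER BY revenue DESC
        LIMIT {int(limit)}
    """
    return spark.sql(query)


# Example call from a notebook attached to the lakehouse:
# top_products_by_revenue(spark, limit=5).show()
```

The same aggregation could be written with the DataFrame API; Spark SQL is used here because it reads much like a query you might run against the SQL analytics endpoint.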

## Analyze and Visualize
The data in your lakehouse tables is included in a **`semantic model`** that defines a **`relational model`** for your data.
- You can edit this semantic model (or create other semantic models), defining custom measures, hierarchies, aggregations, and other elements of a semantic model. 
- You can then use the semantic model as the **source for a `Power BI report`** that enables you to visualize and analyze the data.

By combining the data visualization capabilities of Power BI with the centralized storage and tabular schema of a data lakehouse, you can implement an end-to-end analytics solution on a single platform.