# Describe the significance of scalable models

## What is enterprise or large-scale data?
Before we talk through scalability, let's define what we're talking about. You'll see throughout the module that we refer to **_enterprise-scale_** or **_large-scale_** data rather than big data. In this module, enterprise-scale or large-scale data refers to tables with a large number of records or rows. Power BI, used with tools like Azure Synapse Analytics, can analyze massive datasets, in the range of trillions of rows or petabytes of data.

If you've worked with enterprise data before, it may help to know that Power BI is the next generation of Analysis Services. Analysis Services and Power BI datasets share the same underlying technology, the [VertiPaq engine](https://learn.microsoft.com/en-us/analysis-services/analysis-services-overview?view=asallproducts-allversions).

## What is scalability and why is it important?
Scalability in this context refers to building data models that can handle growth in the volume of data. A data model that ingests thousands of rows of data may grow to millions of rows over time, and the model must be designed to accommodate such growth. Plan for your data to grow, change, or both, because either one increases complexity.

Scalability must be at the forefront in enterprise solutions to ensure:
- Flexibility - models need to be able to accommodate change
- Data growth - models must be able to handle an increase in data volume with acceptable report performance
- Reduced complexity - models built with scalability in mind will be less complex and easier to manage

## How do I design for scalability?
The best approach to building scalable Power BI data models will always be building with data modeling best practices in mind.

Beyond the data model, [Power BI Premium](https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-what-is) was designed specifically for enterprise deployments. Premium capacity offers greater storage capacity and allows for larger individual datasets depending on the [SKU](https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-what-is#size-considerations). Enabling the Premium-only large dataset storage format allows data to grow beyond the Power BI Desktop (.pbix) file size limitations.

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> Are you planning a Power BI enterprise deployment? Read the [Power BI enterprise deployment whitepaper](https://aka.ms/PBIEnterpriseDeploymentWP) for a full list of enterprise deployment considerations.

Another important consideration in designing for scalability using Power BI Premium is [choosing the right capacity](https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-what-is#capacity-nodes). You'll need to work with your Power BI administrator to determine which Power BI Premium licensing SKU is available to you. If you're having performance issues in Premium capacity, work first to optimize your model, and then work with your Power BI administrator to [monitor Power BI Premium capacities](https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-premium-monitor-capacity).

At the most basic level, it's important to understand that Premium capacities require sufficient memory for processing. You'll need about twice the dataset's size in RAM to process a data model refresh. For example, if you have a 40-GB dataset, you'll need **_at least_** 80 GB of memory available. A 40-GB dataset would be best supported by a P3/A6 capacity, which has 100 GB of memory.
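The sizing rule above can be sketched as a small calculation. This is only an illustration, not an official sizing tool: the function names are hypothetical, the factor of two is the rule stated above, and only the 100 GB figure for P3/A6 comes from this module. The 25 GB and 50 GB figures for P1/A4 and P2/A5 are assumptions that should be verified against the current capacity documentation.

```python
from typing import Optional

# Memory per capacity in GB. Only the P3/A6 value is stated in this module;
# the P1/A4 and P2/A5 values are assumptions to verify against current docs.
CAPACITY_MEMORY_GB = {
    "P1/A4": 25,   # assumption
    "P2/A5": 50,   # assumption
    "P3/A6": 100,  # stated in this module
}


def refresh_memory_needed_gb(dataset_gb: float) -> float:
    """Rule from the text: a refresh needs about twice the dataset size."""
    return 2 * dataset_gb


def smallest_capacity_for(dataset_gb: float) -> Optional[str]:
    """Return the smallest listed capacity with enough memory, or None."""
    needed = refresh_memory_needed_gb(dataset_gb)
    by_memory = sorted(CAPACITY_MEMORY_GB.items(), key=lambda item: item[1])
    for name, memory_gb in by_memory:
        if memory_gb >= needed:
            return name
    return None


if __name__ == "__main__":
    print(refresh_memory_needed_gb(40))  # 80
    print(smallest_capacity_for(40))     # P3/A6
```

A result of `None` means that none of the listed capacities has enough memory, which is a signal to reduce the model size or to discuss larger capacities with your Power BI administrator.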

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> Review [Power BI license types and capabilities](https://learn.microsoft.com/en-us/power-bi/enterprise/service-admin-licensing-organization#license-types-and-capabilities). If you're not sure which license type your organization has, check with the Power BI administrator.

# Implement Power BI data modeling best practices

## Choose the correct Power BI model framework
Choosing the correct Power BI model framework is at the heart of building any scalable solution.

The first place to start with your Power BI data model is [import mode](https://learn.microsoft.com/en-us/power-bi/connect-data/service-dataset-modes-understand#import-mode). Import mode offers you the most options, design flexibility, and delivers fast performance.

Use [DirectQuery](https://learn.microsoft.com/en-us/power-bi/connect-data/service-dataset-modes-understand#directquery-mode) when your data source stores large volumes of data and/or your report needs to deliver near real-time data.

Finally, use a [composite model](https://learn.microsoft.com/en-us/power-bi/connect-data/service-dataset-modes-understand#composite-mode) when you need to:
- Boost the query performance of a DirectQuery model.
- Deliver near real-time query results from an import model.
- Extend a Power BI dataset (or Azure Analysis Services model) with other data.

Composite models combine data from more than one DirectQuery source or combine DirectQuery with import data.

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> Review the [Choose a Power BI model framework module](https://learn.microsoft.com/en-us/training/modules/choose-power-bi-model-framework/) for more information on using import, DirectQuery, or composite models.

## Implement data modeling best practices
There are some basic principles to abide by when building any data model. These principles become even more important as data begins to grow.

Most importantly, you want to do as much data preparation work as possible **before data reaches Power BI**, as far upstream as possible. For example, if you have the opportunity to transform data in the data warehouse, that's where it should be done. Transformation at the source produces consistency for any other solutions built on that data and ensures that your Power BI model doesn't need to do any extra processing. This step is critically important and may require working with your data engineer or other members of the data team.

### Best practices for import mode:
- Always **start with import mode** if you can.
- **Only bring in data you need**.
  - Remove unnecessary rows and columns.
  - Only process what is absolutely necessary (tables/partitions) given the business requirements.
- **Avoid wide tables**.
  - Use a [star schema](https://learn.microsoft.com/en-us/power-bi/guidance/star-schema) in Power BI.
    - If your source is a beautifully modeled data warehouse, you're a step ahead.
    - Big data is often in wide flat tables. Take advantage of dimensional models for their performance benefits.
    - Power BI supports multiple fact tables with different dimensionality and different granularities – you don’t have to put everything into one large table.
- **Pre-aggregate data before loading it to the model** where possible.
- **Reduce the usage of calculated columns**.
  - Data transformations requiring additional columns should be done as close to the source as possible.
- **Avoid high cardinality columns**.
  - Consider breaking a datetime column into two columns, one for date and one for time.
- **Use appropriate data types**.
  - Use integers instead of strings for ID columns.
  - Use surrogate keys for ID columns if necessary.
- **Limit the use of bi-directional filters** on relationships.
- **Disable [auto date/time](https://learn.microsoft.com/en-us/power-bi/guidance/auto-date-time)**.
  - Connect to a date table at the source or create your own date table.
- Disable attribute hierarchies for non-attribute columns.
- If querying a relational database, **query database views rather than tables**.
  - A view provides an abstraction layer to manage columns, and relates back to the first consideration: pushing transformations as close to the source as possible.
  - Views shouldn't contain logic. They should only contain a SELECT statement from a table.
- **Consider partitioning and incremental refresh** to avoid loading data you don’t need.
- Check to ensure [query folding](https://learn.microsoft.com/en-us/power-query/power-query-folding) is achieved.
  - If query folding isn't possible, you have another opportunity to work with the data engineer to move transformation upstream.
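
To see why splitting a datetime column reduces cardinality, compare the number of distinct values before and after the split. The sketch below uses made-up minute-level timestamps in plain Python rather than Power BI itself, and the variable names are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical sample: one timestamp per minute for 30 days.
start = datetime(2024, 1, 1)
timestamps = [start + timedelta(minutes=i) for i in range(30 * 24 * 60)]

# Cardinality of the original datetime column.
datetime_cardinality = len(set(timestamps))

# Cardinality after splitting into a date column and a time column.
date_cardinality = len({ts.date() for ts in timestamps})
time_cardinality = len({ts.time() for ts in timestamps})

print(datetime_cardinality)  # 43200
print(date_cardinality)      # 30
print(time_cardinality)      # 1440
```

The single column holds 43,200 distinct values, while the two split columns hold 30 and 1,440. Columns with fewer distinct values generally compress better in a columnar engine.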

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note">  Learn more about [techniques to help reduce the data loaded into import models](https://learn.microsoft.com/en-us/power-bi/guidance/import-modeling-data-reduction).

### Best practices specific to DirectQuery mode:
- Set relationships to enforce integrity using the [Assume referential integrity property](https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-assume-referential-integrity) on relationships.
  - The Assume Referential Integrity setting on relationships enables queries to use INNER JOIN statements rather than OUTER JOIN.
- **Limit the use of bi-directional filters** on relationships.
  - Use them only when necessary.
- Limit the **complexity of DAX calculations**.
  - Because query folding occurs by default in DirectQuery, more complex DAX measures mean added complexity at the source, which leads to slow queries.
  - The need for complex DAX also leads back to the key principle of applying transformations as far upstream as possible. You may need to work with the data engineer to apply transformations at the source.
- **Avoid the use of calculated columns**.
  - Transformations requiring additional columns should be done as far upstream as possible, particularly when using DirectQuery.
- **Avoid relationships on calculated columns**.
- **Avoid relationships on Unique Identifier columns**.
- Use **dual storage mode** for dimensions related to fact tables that are in DirectQuery.
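
The effect of the Assume referential integrity setting can be illustrated with a small stand-in database. This sketch uses SQLite from the Python standard library in place of a real DirectQuery source; the table names and the hand-written queries are hypothetical and are not the SQL that Power BI generates. It shows that INNER JOIN and OUTER JOIN return the same totals while every fact row has a matching dimension row, and that they diverge once an orphan row appears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO product VALUES (1, 'Bike'), (2, 'Helmet');
    INSERT INTO sales VALUES (10, 1, 500.0), (11, 1, 450.0), (12, 2, 40.0);
    """
)

QUERY = """
    SELECT p.name, SUM(s.amount)
    FROM sales AS s
    {join} product AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY p.name
"""


def totals(join_clause):
    """Run the aggregate query with the given join type."""
    return conn.execute(QUERY.format(join=join_clause)).fetchall()


# Every sales row matches a product, so both join types agree.
inner_clean = totals("INNER JOIN")
outer_clean = totals("LEFT OUTER JOIN")
print(inner_clean == outer_clean)  # True

# Add an orphan sale (product 99 does not exist): integrity no longer holds.
conn.execute("INSERT INTO sales VALUES (13, 99, 5.0)")
inner_orphan = totals("INNER JOIN")
outer_orphan = totals("LEFT OUTER JOIN")
print(len(inner_orphan), len(outer_orphan))  # 2 3
conn.close()
```

The second comparison is the reason to enable the setting only when the source data really does satisfy referential integrity: with an INNER JOIN, the orphan row silently drops out of the result.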

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> Refer to the [DirectQuery model guidance](https://learn.microsoft.com/en-us/power-bi/guidance/directquery-model-guidance) for a complete list of considerations in developing DirectQuery models.

There's also a tool you can use while developing tabular models that alerts you to modeling missteps and to changes that would improve model design and performance. The [Best Practice Analyzer within Tabular Editor](https://powerbi.microsoft.com/blog/best-practice-rules-to-improve-your-models-performance/) was designed to help you build models that adhere to modeling best practices.

# Configure large datasets
Power BI datasets store data in a highly compressed, in-memory cache for optimized query performance. Enterprise deployment of an analytics solution using Power BI will likely require [Power BI Premium](https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-what-is). With the **_large dataset storage format_** enabled, dataset size is limited only by the capacity size or by a maximum size set by the administrator. Without it, datasets in Power BI Premium are limited to 10 GB after compression.

Large datasets can be enabled for all Premium P SKUs, Embedded A SKUs, and with Premium Per User (PPU). The large dataset size limit in Premium is comparable to Azure Analysis Services, in terms of data model size limitations.

The large dataset feature brings the Power BI dataset cache sizes to parity with Azure Analysis Services model sizes. The large dataset feature enables consolidation of tabular models from SQL Server Analysis Services and Azure Analysis Services on one common platform based on Power BI Premium.

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> To use large dataset storage format, the dataset must be stored in a workspace that is assigned to a Premium capacity.

The large dataset storage format supports fast user interactivity and allows data to grow beyond the 10-GB limit. It can also improve XMLA write operation performance, even for datasets that aren't large.

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> Datasets enabled for large models can't be downloaded as a Power BI Desktop (.pbix) file from the Power BI service. Read more about [.pbix download limitations](https://learn.microsoft.com/en-us/power-bi/create-reports/service-export-to-pbix#limitations-when-downloading-a-pbix-from-a-dataset).

## Enable large dataset storage format
To take advantage of the large dataset storage format, you must enable it in the Power BI service. You can enable it for a single dataset or for all datasets created in a workspace.

### Enable large dataset storage format for a single dataset
In the dataset settings in the Power BI service, set the **Large dataset storage format** toggle to **On** and select **Apply**.

<img src="../images/05_Work with semantic models in Microsoft Fabric/01/enable-large-dataset.png" alt="Screenshot of the large dataset storage format option in the dataset settings in the Power B I service, with the toggle in the on position." style="border: 2px solid black; border-radius: 10px;">

### Enable large dataset storage format for all datasets created in a workspace
You can set the default storage format for all datasets created in a workspace in the workspace settings. In the settings, select **_Premium_**, and select **_Large dataset storage format_** as the **_Default storage format_**.

<img src="../images/05_Work with semantic models in Microsoft Fabric/01/default-storage-format.png" alt="Screenshot of the premium workspace settings in the Power B I service, with the cursor over the large dataset storage format option." style="border: 2px solid black; border-radius: 10px;">

Large dataset storage format for a workspace can also be enabled using PowerShell.
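
For scripted scenarios, the same change can be requested through the Power BI REST API. The sketch below is written in Python rather than PowerShell and rests on assumptions that should be checked against the REST API reference: that the Update Dataset In Group endpoint accepts a `targetStorageMode` of `PremiumFiles`, and that you already hold a valid access token. It targets one dataset at a time rather than the workspace default, and it only builds the request without sending it. The function name and placeholder IDs are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.powerbi.com/v1.0/myorg"


def build_large_format_request(
    group_id: str, dataset_id: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request asking for large storage format."""
    # Assumption: "PremiumFiles" is the value that selects large dataset storage.
    body = json.dumps({"targetStorageMode": "PremiumFiles"}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_ROOT}/groups/{group_id}/datasets/{dataset_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder IDs and token, for illustration only.
    request = build_large_format_request("<workspace-id>", "<dataset-id>", "<token>")
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would send the request. Keeping construction separate from sending makes it possible to inspect the URL and body before anything changes in the service.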

<img src="https://files.training.databricks.com/images/icon_note_32.png" alt="Note"> See [Configure large datasets](https://learn.microsoft.com/en-us/power-bi/enterprise/service-premium-large-models) to learn more about large models in Power BI Premium including information on checking dataset size, dataset eviction, considerations, and limitations.