
---

## 📘 Section 6: Analytics and Reporting

### 🧪 Lab 6.8: Build a Readmissions Dashboard

**Objective:** Design and build an interactive Power BI dashboard to analyze hospital readmissions, incorporating key performance indicators (KPIs) and visualizations. Implement Row-Level Security (RLS) to restrict data access based on user roles.

**Scenario:** As a BI analyst at HealthForward Inc., you are tasked with creating a dashboard that provides insights into patient readmission patterns. The dashboard needs to be accessible to clinic managers, who should only see data for their respective clinics.

**Assumed Prerequisites:**
* Access to Power BI Desktop and/or the Power BI service within Microsoft Fabric.
* The Gold layer tables created in Lab 4.8 are available in your `HealthDataLH_YourName` Lakehouse:
    * `Fact_Encounter`
    * `Dim_Patient`
    * `Dim_Date`
* For this lab, we'll assume `Dim_Patient` has an `AssignedClinicID` column (e.g., 'ClinicA', 'ClinicB', 'ClinicC'). You may need to add this to your `Dim_Patient` sample data generation and Gold layer notebook if you are running these sequentially.
* We also need a way to identify readmissions. Let's assume a column `DaysToNextAdmission` was added to the `Fact_Encounter` table in the Gold layer (if it wasn't, we will create it as a DAX calculated column). For simplicity, a readmission is defined as a subsequent hospital admission for the same patient within 30 days of the discharge of a previous admission.
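
To make the readmission rule concrete, here is a minimal plain-Python sketch of how `DaysToNextAdmission` could be derived, handy for checking a few rows by hand. It is not the Lab 4.8 Spark code: the encounter tuples `(encounter_id, patient_key, admission_date, discharge_date)` and the function names are hypothetical, and "no subsequent admission" is represented as `None` (NULL in the Lakehouse).

```python
from datetime import date
from typing import Optional

# Hypothetical encounter record: (encounter_id, patient_key, admission_date, discharge_date)
Encounter = tuple[str, int, date, date]


def days_to_next_admission(encounters: list[Encounter]) -> dict[str, Optional[int]]:
    """Days from each encounter's discharge to the same patient's next admission.

    Only admissions strictly after the discharge date count, matching the DAX
    calculated column in this lab. None means "no subsequent admission".
    """
    result: dict[str, Optional[int]] = {}
    for enc_id, patient, _, discharge in encounters:
        later_admissions = [
            admitted
            for _, other_patient, admitted, _ in encounters
            if other_patient == patient and admitted > discharge
        ]
        result[enc_id] = (min(later_admissions) - discharge).days if later_admissions else None
    return result


def is_readmission_index(days: Optional[int], window: int = 30) -> bool:
    """True if a next admission exists and falls within `window` days of discharge."""
    return days is not None and 0 <= days <= window


if __name__ == "__main__":
    sample = [
        ("E1", 1, date(2024, 1, 1), date(2024, 1, 5)),
        ("E2", 1, date(2024, 1, 20), date(2024, 1, 22)),
        ("E3", 2, date(2024, 2, 1), date(2024, 2, 4)),
    ]
    for enc_id, days in days_to_next_admission(sample).items():
        print(enc_id, days, is_readmission_index(days))
```

For example, a discharge on 2024-01-05 followed by an admission on 2024-01-20 yields 15 days and is flagged; an encounter with no later admission yields `None` and is never flagged. In Spark, the same result would more typically come from a window partitioned by patient rather than this quadratic scan.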

**Adding `AssignedClinicID` to `Dim_Patient` (Conceptual Update for Lab 4.8):**
If you were to update Lab 4.8's `Create_Dim_Patient_Gold_YourName` notebook, you would add `AssignedClinicID` to the `Bronze_Patients_Raw` sample data and propagate it. For example:

```python
# In Bronze_Patients_Raw sample data generation (Lab 4.8).
# "..." stands for the fields already defined in Lab 4.8.
Row(PatientSourceID="PATID12345", ..., ZipCode="90210", AssignedClinicID="ClinicA", SourceSystem="EHR_SystemA", ...)
Row(PatientSourceID="PATID67890", ..., ZipCode="90211", AssignedClinicID="ClinicB", SourceSystem="EHR_SystemA", ...)
# ...and so on.

# Then, in the Dim_Patient creation step (Lab 4.8), select it:
# ...
    col("AssignedClinicID"),  # Add this
    col("SourceSystem").alias("PatientSourceSystem")
# ...
# Ensure it's in the final select for final_dim_patient_df
```

**Lab Tasks (UI Steps in Power BI & DAX):**

1.  **Connect to Data and Model in Power BI:**
    * **Step 1.1: Open Power BI Desktop or navigate to your Fabric Workspace and create a new Power BI Report.**
    * **Step 1.2: Get Data.**
        * In Power BI Desktop: Click "Get Data".
        * In the Fabric service: Create a new Power BI report. You will be prompted to pick a semantic model (formerly called a dataset); if you don't have one, create a new one.
        * Select **"Microsoft Fabric"** as the source.
        * Choose **"Lakehouses"**. Sign in if prompted.
        * Select your workspace and the `HealthDataLH_YourName` Lakehouse.
        * Connect to the **SQL analytics endpoint** of the Lakehouse.
    * **Step 1.3: Select Tables.**
        * In the Navigator window, select the following tables:
            * `Fact_Encounter`
            * `Dim_Patient`
            * `Dim_Date`
        * Click **"Load"** and choose a storage mode. Through the SQL analytics endpoint, **DirectQuery** sends queries to the Lakehouse on each interaction, so visuals reflect current data; **Import** copies the data into the model for faster visuals at the cost of scheduled refreshes. If you instead build a semantic model directly on the Lakehouse tables, it uses **Direct Lake** mode, which is generally the preferred option in Fabric.
    * **Step 1.4: Model View - Relationships.**
        * Go to the "Model" view in Power BI.
        * Verify that relationships are correctly established (or create them):
            * `Fact_Encounter[PatientKey]` to `Dim_Patient[PatientKey]` (Many-to-One, single direction from Dim to Fact)
            * `Fact_Encounter[AdmissionDateKey]` to `Dim_Date[DateKey]` (Many-to-One)
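            * `Fact_Encounter[DischargeDateKey]` to `Dim_Date[DateKey]` (Many-to-One, **inactive**). Power BI allows only one active relationship between the same two tables, so while the AdmissionDateKey relationship is active this one must stay inactive; activate it inside individual measures with `USERELATIONSHIP()`, or add a second date table if you need both date roles at once. For simplicity, this lab uses only the AdmissionDateKey relationship for date filtering.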
    * **Step 1.5: Create Readmission Measures (DAX).**
        * In the "Report" view, right-click on the `Fact_Encounter` table in the "Data" pane and select "New measure".
        * **Measure 1: Count of Admissions**
            ```dax
            Total Admissions = COUNTROWS(Fact_Encounter)
            ```
        * **Measure 2: Identify Readmissions (Simplified)**
            *(This measure relies on a `DaysToNextAdmission` column in `Fact_Encounter`. For this lab, assume it was calculated during the Gold ETL process in Spark and holds the number of days from this encounter's discharge to the patient's *next* admission; -1 or NULL means there is no subsequent admission. Computing it in DAX instead is considerably more complex.)*
            ```dax
            Readmissions (within 30 days) =
            CALCULATE(
                COUNTROWS(Fact_Encounter),
                NOT ISBLANK(Fact_Encounter[DaysToNextAdmission]), // NULL arrives as BLANK, which DAX compares as 0
                Fact_Encounter[DaysToNextAdmission] >= 0,          // excludes the -1 "no next admission" marker
                Fact_Encounter[DaysToNextAdmission] <= 30
            )
            ```
            *If `DaysToNextAdmission` was not produced by the Gold ETL, you can create it as a calculated column in Power BI instead. Computing it in Spark is preferable: this cross-row calculation can be slow on large tables, and DirectQuery models only allow calculated columns that reference values in the same row, so this version needs Import mode.*
            ```dax
            // Calculated column on Fact_Encounter (right-click the table in the Data pane > New column).
            DaysToNextAdmission =
            VAR CurrentPatientKey = Fact_Encounter[PatientKey]
            VAR CurrentDischargeDate = Fact_Encounter[DischargeDate] // assumes a date-only column
            VAR NextAdmissionDate =
                MINX(
                    FILTER(
                        Fact_Encounter,
                        Fact_Encounter[PatientKey] = CurrentPatientKey
                            && Fact_Encounter[AdmissionDate] > CurrentDischargeDate // assumes a date-only column
                    ),
                    Fact_Encounter[AdmissionDate]
                )
            RETURN
                IF(
                    NOT ISBLANK(NextAdmissionDate),
                    DATEDIFF(CurrentDischargeDate, NextAdmissionDate, DAY),
                    BLANK() // no subsequent admission for this patient
                )
            ```
        * **Measure 3: Readmission Rate**
            ```dax
            Readmission Rate =
            DIVIDE(
                [Readmissions (within 30 days)],
                [Total Admissions], // Or a different denominator like total eligible discharges
                0
            )
            ```
            *Note: The denominator for readmission rate often considers "eligible discharges" rather than all admissions. For simplicity, we use Total Admissions here.*

        * **Measure 4: Average Length of Stay (LOS)**
            ```dax
            Average LOS = AVERAGE(Fact_Encounter[LengthOfStayInDays])
            ```

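        * **Measure 5: Average LOS for Readmitted Patients** (LOS of the index admissions that were followed by a readmission within 30 days)
            ```dax
            Average LOS (Readmitted) =
            CALCULATE(
                AVERAGE(Fact_Encounter[LengthOfStayInDays]),
                FILTER(
                    Fact_Encounter,
                    NOT ISBLANK(Fact_Encounter[DaysToNextAdmission]) // BLANK compares as 0 and would pass the >= 0 test
                        && Fact_Encounter[DaysToNextAdmission] >= 0
                        && Fact_Encounter[DaysToNextAdmission] <= 30
                )
            )
            ```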
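
Before trusting the measures, it can help to recompute them on a small sample outside Power BI. The sketch below is plain Python over hypothetical `(LengthOfStayInDays, DaysToNextAdmission)` pairs, with `None` standing in for a NULL/BLANK value; it cross-checks the intended logic and does not reproduce the DAX engine. A missing `DaysToNextAdmission` is excluded explicitly because in DAX a BLANK compares as 0, so a `>= 0` and `<= 30` test alone would count it as a readmission.

```python
from statistics import mean
from typing import Optional

# Hypothetical Fact_Encounter row: (LengthOfStayInDays, DaysToNextAdmission or None)
FactRow = tuple[float, Optional[int]]


def readmission_kpis(rows: list[FactRow], window: int = 30) -> dict[str, Optional[float]]:
    """Recompute the lab's five measures on sample rows to cross-check the DAX results."""
    # Index admissions followed by a readmission; None (BLANK) is excluded explicitly.
    readmitted_los = [
        los for los, days in rows if days is not None and 0 <= days <= window
    ]
    total = len(rows)
    return {
        "Total Admissions": total,
        "Readmissions (within 30 days)": len(readmitted_los),
        # DIVIDE(numerator, denominator, 0) returns 0 when the denominator is 0.
        "Readmission Rate": len(readmitted_los) / total if total else 0.0,
        # AVERAGE over no rows returns BLANK in DAX; None plays that role here.
        "Average LOS": mean(los for los, _ in rows) if rows else None,
        "Average LOS (Readmitted)": mean(readmitted_los) if readmitted_los else None,
    }


if __name__ == "__main__":
    sample = [(4, 15), (2, 39), (3, None), (5, -1), (6, 30)]
    for name, value in readmission_kpis(sample).items():
        print(f"{name}: {value}")
```

Comparing these numbers with the card visuals for the same filtered rows is a quick way to catch a definition mismatch before the dashboard is shared.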

---
2.  **Design the Readmissions Dashboard:**
    * **Step 2.1: Add a Title.**
        * Insert a Text Box: "Hospital Readmissions Analysis".
    * **Step 2.2: Add KPI Cards.**
        * **Card 1:** "Readmission Rate" (Use the `Readmission Rate` measure, format as percentage).
        * **Card 2:** "Total Readmissions" (Use the `Readmissions (within 30 days)` measure).
        * **Card 3:** "Average LOS (Readmitted)" (Use the `Average LOS (Readmitted)` measure).
        * **Card 4:** "Total Admissions" (Use the `Total Admissions` measure).
    * **Step 2.3: Add Visualizations.**
        * **Visual 1: Readmission Rate Over Time (Line Chart)**
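            * Axis: `Dim_Date[FullDate]` hierarchy (Year, Month), or another year-and-month column. `Dim_Date[MonthName]` on its own merges the same month across different years and sorts alphabetically unless its *Sort by column* is set to a month-number column.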
            * Values: `Readmission Rate`
        * **Visual 2: Readmissions by Primary Diagnosis (Bar Chart)**
            * Axis: `Fact_Encounter[PrimaryDiagnosisCode]` (Consider creating a `Dim_Diagnosis` for descriptions)
            * Values: `Readmissions (within 30 days)`
        * **Visual 3: Readmissions by Patient Age Group (Column Chart)**
            * Create an Age Group column in `Dim_Patient` (e.g., using DAX for a calculated column or in Power Query):
                ```dax
                // Calculated Column in Dim_Patient
                Patient Age Group =
                SWITCH(
                    TRUE(),
                    ISBLANK(Dim_Patient[Age]), "Unknown", // check first: a BLANK age compares as 0 and would land in "0-17"
                    Dim_Patient[Age] < 18, "0-17",
                    Dim_Patient[Age] >= 18 && Dim_Patient[Age] <= 44, "18-44",
                    Dim_Patient[Age] >= 45 && Dim_Patient[Age] <= 64, "45-64",
                    Dim_Patient[Age] >= 65 && Dim_Patient[Age] <= 74, "65-74",
                    Dim_Patient[Age] >= 75, "75+",
                    "Unknown"
                )
                ```
            * Axis: `Dim_Patient[Patient Age Group]`
            * Values: `Readmissions (within 30 days)`
        * **Visual 4: Readmissions by Assigned Clinic ID (Pie Chart or Treemap)**
            * Legend/Group: `Dim_Patient[AssignedClinicID]`
            * Values: `Readmissions (within 30 days)`
        * **Visual 5: Table with Patient Details (for drill-through or detailed view - optional for main dashboard)**
            * Columns: `Dim_Patient[FullName]`, `Dim_Patient[PatientMRN]`, `Fact_Encounter[AdmissionDate]`, `Fact_Encounter[DischargeDate]`, `Fact_Encounter[DaysToNextAdmission]`, `Dim_Patient[AssignedClinicID]`
    * **Step 2.4: Add Slicers.**
        * Slicer 1: `Dim_Date[FullDate]` (for date range selection).
        * Slicer 2: `Dim_Patient[AssignedClinicID]` (useful for administrators who can see all clinics; for users in the Clinic Manager role, RLS limits this slicer to their own clinic).
        * Slicer 3: `Dim_Patient[Patient Age Group]`.
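
The choice of axis for Visual 1 matters because the rate is computed per group. This hypothetical plain-Python sketch groups `(admission_date, readmitted_within_30_days)` rows by `(year, month)`, which keeps the same month in different years apart and sorts chronologically; grouping by a month name alone would pool January 2023 with January 2024.

```python
from collections import defaultdict
from datetime import date


def monthly_readmission_rate(rows: list[tuple[date, bool]]) -> dict[tuple[int, int], float]:
    """Readmission rate per (year, month) of admission, in chronological order.

    Each row is (admission_date, followed_by_readmission_within_30_days).
    """
    admissions: dict[tuple[int, int], int] = defaultdict(int)
    readmissions: dict[tuple[int, int], int] = defaultdict(int)
    for admitted, readmitted in rows:
        key = (admitted.year, admitted.month)  # year-month, not the month name alone
        admissions[key] += 1
        readmissions[key] += int(readmitted)
    # Sorting (year, month) tuples yields chronological order.
    return {key: readmissions[key] / admissions[key] for key in sorted(admissions)}


if __name__ == "__main__":
    sample = [
        (date(2023, 1, 10), True),
        (date(2023, 1, 20), False),
        (date(2024, 1, 5), False),
    ]
    print(monthly_readmission_rate(sample))
```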

---
3.  **Implement Row-Level Security (RLS):**
    * **Step 3.1: Define Roles.**
        * In Power BI Desktop, go to the "Modeling" tab.
        * Click **"Manage roles"**.
    * **Step 3.2: Create "Clinic Manager" Role.**
        * In the "Manage roles" dialog, click **"Create"**.
        * Name the role: `Clinic Manager`.
    * **Step 3.3: Define DAX Filter for the Role.**
        * Select the `Clinic Manager` role.
        * In the "Tables" section, find `Dim_Patient`.
        * In the "Table filter DAX expression" box for `Dim_Patient` (in newer versions of Power BI Desktop, select **Switch to DAX editor** first), enter the following DAX expression:
            ```dax
            // USERPRINCIPALNAME() returns the signed-in user's UPN (typically an email address),
            // so the expression must map that UPN to a clinic ID. Three common options:

            // Option 1: Fixed value, for testing a single clinic (replace "ClinicA" as needed).
            // [AssignedClinicID] = "ClinicA"

            // Option 2: Derive the clinic from the UPN domain,
            // e.g. 'user@ClinicA.healthforward.com' -> "ClinicA".
            // VAR Upn = USERPRINCIPALNAME()
            // VAR Domain = MID(Upn, FIND("@", Upn) + 1, LEN(Upn))
            // RETURN [AssignedClinicID] = LEFT(Domain, FIND(".", Domain) - 1)

            // Option 3 (most robust): a mapping table (UserEmail, ClinicID) in the model,
            // used to filter Dim_Patient (see Discussion Question 3).

            // Used in this lab: compare the UPN directly with the clinic ID.
            // When testing with "View as", enter "ClinicA", "ClinicB", etc. as the user name.
            [AssignedClinicID] = USERPRINCIPALNAME()
            ```
            *Instructor Note: Explain that `USERPRINCIPALNAME()` returns the user principal name (UPN) of the signed-in user, which is usually their email address, so the DAX expression needs to map this UPN to a `ClinicID`. For lab simplicity, we compare `USERPRINCIPALNAME()` directly against `AssignedClinicID`, which works when you type a clinic ID as the user name in "View as roles". A real-world setup uses a mapping table or logic that extracts the relevant attribute from the UPN.*
    * **Step 3.4: Save Roles.** Click "Save".
    * **Step 3.5: Test the Role.**
        * On the "Modeling" tab, click **"View as"**.
        * In the "View as roles" dialog:
            * Check the box for **"Clinic Manager"**.
            * Select **"Other user"** and, in its text box (this value is what `USERPRINCIPALNAME()` returns while testing), type `ClinicA` (or another valid ClinicID from your `Dim_Patient` data).
            * Click **"OK"**.
        * The report should now be filtered to show data only for "ClinicA". Test with other clinic IDs.
        * To stop viewing as the role, click "Stop viewing" in the yellow banner at the top.
    * **Step 3.6: Publish to Power BI Service (Fabric Workspace).**
        * Save your Power BI Desktop file (`.pbix`).
        * Publish the report to your Fabric workspace (e.g., `DEV_Analytics_YourName`).
    * **Step 3.7: Configure RLS in Power BI Service.**
        * Open your Fabric workspace.
        * Find the **semantic model** (called a dataset in older versions of the service) associated with your report, not the report itself.
        * Click the ellipsis (...) next to the semantic model name and select **"Security"**.
        * You should see the `Clinic Manager` role.
        * Add members (users or security groups) to this role, for example `manager_clinicA@healthforward.com`. Members must be Microsoft Entra users or groups, so you cannot add "ClinicA" itself as a member. When this user signs in, `USERPRINCIPALNAME()` returns `manager_clinicA@healthforward.com`, which matches no `AssignedClinicID` value, so the lab rule `[AssignedClinicID] = USERPRINCIPALNAME()` would show them no rows.
        * *The direct comparison is a lab shortcut for testing with "View as". In production, add a mapping table (e.g. `UserSecurity` with `UserEmail` and `AssignedClinicID`) to the model and use it for RLS, or derive the clinic from the UPN as in Option 2. Also note that RLS only restricts users with the Viewer role in the workspace; Admins, Members, and Contributors see all data.*
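
The mapping-table approach can be modeled in a few lines to reason about who sees what. This is a plain-Python sketch, not Power BI's RLS engine: `USER_ACCESS`, the email addresses, and the helper functions are hypothetical. A user with several rows sees all of their clinics, a user with no row sees nothing, and matching is case-insensitive, as DAX text comparisons are. `clinic_from_upn_domain` mirrors the domain-parsing idea of Option 2.

```python
# Hypothetical UserAccess mapping table: one row per (UserEmail, AllowedClinicID).
USER_ACCESS = [
    ("manager_clinicA@healthforward.com", "ClinicA"),
    ("regional_lead@healthforward.com", "ClinicA"),
    ("regional_lead@healthforward.com", "ClinicB"),
]


def allowed_clinics(upn: str, user_access: list[tuple[str, str]]) -> set[str]:
    """Clinic IDs mapped to this UPN; matching is case-insensitive, like DAX text comparison."""
    return {
        clinic.casefold()
        for email, clinic in user_access
        if email.casefold() == upn.casefold()
    }


def visible_patients(
    patients: list[tuple[int, str]], upn: str, user_access: list[tuple[str, str]]
) -> list[tuple[int, str]]:
    """Rows of hypothetical (PatientKey, AssignedClinicID) tuples the user may see."""
    allowed = allowed_clinics(upn, user_access)
    return [p for p in patients if p[1].casefold() in allowed]


def clinic_from_upn_domain(upn: str) -> str:
    """Option 2 idea: 'user@ClinicA.healthforward.com' -> 'ClinicA'."""
    domain = upn.split("@", 1)[1]
    return domain.split(".", 1)[0]
```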

---
**Discussion Questions:**
1.  **What are the benefits of using DirectQuery or Direct Lake mode for Power BI reports connected to a Fabric Lakehouse?**
    * **Fresh Data Without Import Refreshes:** In DirectQuery mode, Power BI sends queries to the Lakehouse's SQL analytics endpoint each time a visual is rendered, so the report reflects the latest data without scheduled refreshes of an imported dataset. Direct Lake reads the Delta tables in OneLake directly and picks up new data when the model is reframed, which can happen automatically when the tables change.
    * **Handles Large Datasets:** The data is not copied into the model by a refresh; it stays in OneLake, and Direct Lake loads only the columns a query needs into memory, on demand. This allows analysis over very large volumes of data.
    * **Single Source of Truth:** Data isn't duplicated. Everyone accesses the same underlying data in OneLake, reducing inconsistencies.
    * **Performance with Direct Lake:** Direct Lake reads Delta Parquet files from OneLake into memory for querying, offering performance comparable to Import mode while keeping the data freshness of DirectQuery.

2.  **If clinic managers dispute the readmission numbers, what steps would you take within Power BI and Fabric to trace the data and validate the calculations?**
    * **Review DAX Measures:** Double-check the DAX formulas for `Readmissions (within 30 days)` and `Readmission Rate` for logical errors or misunderstandings of the readmission definition. Use DAX Studio or Power BI's performance analyzer to examine the queries generated.
    * **Validate Input Data in Power BI:** Create a table visual in Power BI displaying the raw `Fact_Encounter` data for a few disputed cases. Include `PatientKey`, `AdmissionDate`, `DischargeDate`, and the `DaysToNextAdmission` column (or the columns used to calculate it). Manually verify if the readmission flag/calculation is correct for these sample cases.
    * **Trace Back to Gold Layer Table in Lakehouse:** Use the Lakehouse SQL endpoint to query the `Fact_Encounter` table directly (e.g., `SELECT * FROM HealthDataLH_YourName.dbo.Fact_Encounter WHERE PatientKey = 'some_disputed_patient_key' ORDER BY AdmissionDate`). Verify the underlying data matches what Power BI is showing and that the `DaysToNextAdmission` column was calculated correctly in the ETL.
    * **Trace Back to Silver/Bronze Layers:** If discrepancies are found in the Gold layer, trace further back to the `Silver_Unified_Encounters` table and then to the Bronze tables (`Bronze_HL7_Encounters_Raw`, `Bronze_FHIR_Encounters_Parsed`) to see how the raw source data was transformed. This involves checking the Spark notebooks or Dataflows used for ETL.
    * **Check Lineage in Purview:** Use Microsoft Purview (if configured) to visualize the data lineage from the source systems through Bronze, Silver, Gold, and into the Power BI dataset to understand transformations and dependencies.
    * **Data Profiling:** Use Fabric tools (e.g., in Notebooks or Dataflows) to profile the source data for anomalies, missing values, or data quality issues that might affect the readmission calculation.
    * **Clarify Definitions:** Reconfirm the exact business definition of "readmission" with stakeholders (e.g., what types of admissions count, exclusion criteria) and ensure the logic aligns.

3.  **Beyond `USERPRINCIPALNAME()`, what other DAX functions or techniques could be used in RLS to handle more complex security scenarios (e.g., user belongs to multiple departments, hierarchical roles)?**
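    * **`USERNAME()`:** Similar to `USERPRINCIPALNAME()`, but in Power BI Desktop it returns the user in `DOMAIN\username` format, while in the Power BI service it returns the UPN. `USERPRINCIPALNAME()` behaves consistently in both, so it is usually the safer choice for RLS.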
    * **`CUSTOMDATA()`:** Allows passing a custom string value when the user connects (often used with embedded analytics or specific connection configurations). This string can contain role information, department IDs, etc.
    * **Security/Mapping Tables:** The most common and robust approach is to include security tables in your Power BI model.
        * **User-to-Attribute Table:** A table mapping users to the attributes they can see (e.g., `UserAccess[UserEmail]`, `UserAccess[AllowedClinicID]`). If each user maps to exactly one clinic, the RLS rule on `Dim_Patient` can be `[AssignedClinicID] = LOOKUPVALUE(UserAccess[AllowedClinicID], UserAccess[UserEmail], USERPRINCIPALNAME())`. Alternatively, put the rule `[UserEmail] = USERPRINCIPALNAME()` on `UserAccess` and let relationships (typically through a clinic dimension table) propagate the filter to `Dim_Patient`.
        * A mapping table also handles users who belong to multiple departments or clinics (one row per user and clinic). `LOOKUPVALUE()` no longer works in that case, because it returns an error when several different values match; use relationship filtering or the `IN` pattern below instead.
    * **`IN` operator or `CONTAINSROW()` with a table:** If a user can see multiple clinics, the RLS DAX expression can check whether `Dim_Patient[AssignedClinicID]` is in the list of clinics the current user is allowed to see (derived from the security table). `IN` is shorthand for `CONTAINSROW()`.
        * Example: `Dim_Patient[AssignedClinicID] IN CALCULATETABLE(VALUES(UserAccess[AllowedClinicID]), UserAccess[UserEmail] = USERPRINCIPALNAME())`
    * **Path Functions for Hierarchies:** If you have organizational or geographical hierarchies (e.g., Region Manager > Clinic Manager), DAX path functions (`PATH()`, `PATHCONTAINS()`, `PATHITEM()`, etc.) can be used on a dimension table that defines the hierarchy. RLS rules can then filter based on whether a user's position in the hierarchy (or an entity they manage) is within the path of an employee or department.
        * For example, a role on `Dim_Employee` might be `PATHCONTAINS(Dim_Employee[EmployeePath], LOOKUPVALUE(Dim_Employee[EmployeeID], Dim_Employee[Email], USERPRINCIPALNAME()))`.
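    * **Role-Playing Dimensions & `USERELATIONSHIP()`:** RLS filters propagate through active relationships, so security paths should be built on active relationships. `USERELATIONSHIP()` activates an inactive relationship inside `CALCULATE`, but the DAX documentation notes it cannot be used when RLS is defined for the table in which the measure is included, so test such designs carefully with "View as".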
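
DAX's `PATH()` builds a pipe-delimited chain of IDs from the top of a parent-child hierarchy down to each row, and `PATHCONTAINS()` tests whether an ID appears in that chain. The plain-Python sketch below imitates that behavior on a hypothetical `{employee_id: manager_id}` mapping to show which employees a manager would see under a `PATHCONTAINS`-style rule; it illustrates the idea and is not the DAX implementation.

```python
from typing import Optional


def build_path(employee_id: int, manager_of: dict[int, Optional[int]]) -> str:
    """Pipe-delimited IDs from the top of the hierarchy down to employee_id, like DAX PATH()."""
    chain: list[str] = []
    current: Optional[int] = employee_id
    seen: set[int] = set()
    while current is not None:
        if current in seen:
            raise ValueError(f"Cycle in hierarchy at employee {current}")
        seen.add(current)
        chain.append(str(current))
        current = manager_of.get(current)  # None (or a missing key) marks the top of the hierarchy
    return "|".join(reversed(chain))


def path_contains(path: str, item: int) -> bool:
    """Like DAX PATHCONTAINS(): is item one of the IDs in the path?"""
    return str(item) in path.split("|")


def visible_employees(viewer_id: int, manager_of: dict[int, Optional[int]]) -> list[int]:
    """Employees whose path includes the viewer, i.e. the viewer and everyone below them."""
    return sorted(e for e in manager_of if path_contains(build_path(e, manager_of), viewer_id))
```

Splitting on `|` before comparing matters: a plain substring check would wrongly treat employee 4 as part of a path like `1|49`.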

---
*(Continuing to Lab 7.8)*