# Visualizing with Power BI

In [2]:
%run "./Includes/Classroom-Setup"

## Querying Data
This lesson uses the `people-10m` data set, which is stored in Parquet format.
In practice, the data could just as well come in CSV, JSON, or another format.

The data is fictitious; in particular, the Social Security numbers are fake.

In [4]:
%fs ls /mnt/training/dataframes/people-10m.parquet


In [5]:
peopleDF = spark.read.parquet("/mnt/training/dataframes/people-10m.parquet")
display(peopleDF)

Take a look at the schema with the `printSchema` method. It shows each field's name and type, and whether the column is nullable (the default is true).

In [7]:
peopleDF.printSchema()

## Reference tables using DirectQuery in Power BI

Now that we have access to the data, we can start building the charts in Power BI. Power BI will allow us to quickly create charts and share them with other users.
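Power BI's Navigator lists tables rather than notebook variables, so the DataFrame has to be available as a table before the `people10m` table used later in this lesson can be selected. The classroom setup may already create it; if it does not, a minimal sketch along these lines could register it. The helper name `save_as_table` is hypothetical, and the default table name is an assumption taken from the Navigator step in this lesson.

```python
def save_as_table(df, table_name="people10m"):
    """Persist a Spark DataFrame as a table so BI tools can find it by name.

    `table_name` defaults to the name used later in this lesson (an assumption).
    """
    (df.write
       .mode("overwrite")          # replace the table if it already exists
       .saveAsTable(table_name))   # register the table in the metastore
    return table_name
```

In this notebook the call would be `save_as_table(peopleDF)`.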

## Connect Power BI Desktop to your Databricks cluster

Complete the following steps to get started:

* Download and install [Power BI Desktop](https://powerbi.microsoft.com/desktop/)
* In your Databricks workspace, go to Clusters and select the cluster you want to connect to
* On the cluster page, scroll down and select the **JDBC/ODBC** tab, then copy the JDBC URL

![Cluster JDBC URL](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/databricks-cluster-jdbc-url.png)

* Now, you need to modify the JDBC URL to construct the JDBC server address that you will use to set up your Spark cluster connection in Power BI Desktop
* In the JDBC URL:
  * Replace `jdbc:hive2` with `https`
  * Remove everything in the path between the port number and `sql`, retaining the components indicated by the boxes in the image below

![Parsed Cluster JDBC URL](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/databricks-cluster-jdbc-url-parsed.png)

  * In our example, the server address would be: `https://eastus2.azuredatabricks.net:443/sql/protocolv1/o/8821428530515879/0515-194607-bails931`
  
* Copy your server address
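The URL rewrite described above can also be sketched in Python. The helper name `jdbc_url_to_server_address` is hypothetical, and the sample JDBC URL (in particular the `default;transportMode=http;ssl=true;httpPath=` segment) is an assumption made for illustration; compare it against the URL shown on your own cluster page.

```python
def jdbc_url_to_server_address(jdbc_url):
    """Turn a cluster JDBC URL into the server address Power BI expects."""
    prefix = "jdbc:hive2://"
    if not jdbc_url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:hive2://'")

    # Split "host:port" from everything that follows the first slash.
    host_port, _, tail = jdbc_url[len(prefix):].partition("/")

    # Drop everything between the port number and the "sql/..." path.
    sql_index = tail.find("sql/")
    if sql_index == -1:
        raise ValueError("no 'sql/' path found in the JDBC URL")

    # Keep the path only; discard any trailing ';key=value' options.
    path = tail[sql_index:].split(";", 1)[0]
    return "https://" + host_port + "/" + path


# Assumed sample input, modeled on the example address in this lesson.
example = (
    "jdbc:hive2://eastus2.azuredatabricks.net:443/default;"
    "transportMode=http;ssl=true;"
    "httpPath=sql/protocolv1/o/8821428530515879/0515-194607-bails931"
)
print(jdbc_url_to_server_address(example))
# https://eastus2.azuredatabricks.net:443/sql/protocolv1/o/8821428530515879/0515-194607-bails931
```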

## Configure and make the connection in Power BI Desktop

* From Power BI Desktop, select the dropdown under **Get Data**, then **More...**. In the Get Data window, select Other, then Spark
![GetData](https://github.com/Microsoft/MCW-Big-data-and-visualization/raw/master/Hands-on%20lab/media/image178.png)
* Select Continue on the Preview connector dialog

![Get Data](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/power-bi-get-data-dialog.png)

* Enter the Databricks server address you created above into the Server field
* Set the protocol to HTTP
* Select DirectQuery as the data connectivity mode, then select OK

![Power BI Desktop Spark Connection](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/power-bi-spark-connection.png)

* Before you can enter credentials on the next screen, you need to create an access token in Databricks
* In your Databricks workspace, select the Account icon in the top right corner, then select User settings from the menu.
  
![Account menu](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/databricks-user-menu.png)

* On the User Settings page, select Generate New Token, enter "Power BI Desktop" in the comment, and select Generate
  
![Generate token](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/databricks-generate-token.png)

* Copy the generated token, and save it as you will need it more than once below. **NOTE**: You will not be able to access the token in Databricks once you close the Generate token dialog, so be sure to save this value to a text editor or another location you can access during this lab.

* Back in Power BI Desktop, enter "token" for the user name, and paste the access token you copied from Databricks into the password field.

![Power BI Desktop Spark Connection Login](https://raw.githubusercontent.com/solliancenet/Databricks-Labs/master/Labs/Lab02/images/power-bi-spark-connection-login.png)

* After authenticating, continue to the next step and check the box next to the `people10m` table, then select Load.

![Navigator](https://databricksdemostore.blob.core.windows.net/images/07/PowerBINavigator.png)

## Create the Power BI Report

Once the data finishes loading, the fields appear in the Fields pane on the right side of the Power BI Desktop window.

![Fields](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-Fields.png)

From the Visualizations area, next to Fields, select the Stacked Column Chart icon to add it to the report design surface.

![Stacked Column Chart icon](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-Visualization.png)

With the Stacked Column Chart still selected, drag the `gender` field to the Axis field under Visualizations. Next, drag the `salary` field to the Value field under Visualizations.

![Stacked column chart fields](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-stackedColumn.png)

You should now see a Stacked Column Chart that looks similar to the following (resize the chart if necessary):

![Stacked column chart](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-Graph1.png)
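To cross-check the chart, the same aggregation can be computed back in the notebook. This sketch assumes Power BI is summarizing `salary` with a Sum aggregation; if the Value field is set to a different aggregation, the numbers will differ. The helper name `salary_by_gender` is hypothetical.

```python
def salary_by_gender(df):
    """Total salary per gender, mirroring a column chart with gender on the axis."""
    # groupBy(...).sum(...) yields one row per gender with a `sum(salary)` column.
    return df.groupBy("gender").sum("salary")
```

In this notebook the result could be viewed with `display(salary_by_gender(peopleDF))`.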

Deselect the Stacked Column Chart visualization by clicking the white space next to the chart in the report area.

From the Visualizations area, select the Pie icon to add a Pie Chart visual to the report's design surface.

![Pie chart icon](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-Pie.png)

With the Pie Chart still selected, drag the `gender` field and drop it into the Details field under Visualizations.

Next, drag the `salary` field over and drop it into the Values field.

![Pie chart fields](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-Graph2.png)

You should see a pie chart that looks similar to the following:

![pie](https://databricksdemostore.blob.core.windows.net/images/07/PowerBI-Pie2.png)

You can save the report by choosing Save from the File menu and entering a name and location for the file.

![savereport](https://github.com/Microsoft/MCW-Big-data-and-visualization/raw/master/Hands-on%20lab/media/image197.png)

## Conclusion

In this lab, you have learned how to:

* Set up a DirectQuery connection in Power BI to Spark
* Create visualizations in Power BI directly from the Spark data

## Next Steps

Start the next lesson, [Matplotlib]($./04-Matplotlib).