# Lesson 4.8: Data Analysis with PowerBI

### Lesson Duration: 3 hours

> Purpose: This lesson familiarizes students with the PowerBI interface, its main features, and its terminology. We will also start some preliminary data analysis with the tool. A workbook accompanying the lesson is in this folder.

---

### Setup

- All previous setup
- PowerBI installed
- Access to the merged data file created from the A/B test data with Python in the last lesson (share a copy if necessary)

### Learning Objectives

After this lesson, students will be able to:

- Navigate the PowerBI interface confidently
- Differentiate between the report, data, and model views
- Identify possible transformation steps and add, remove, and reorder them
- Decide when and how to use Power Query features
- Get more data and detect relationships between tables
- Apply filters to reports
- Edit and add more detail to the visualizations
- Work with aggregation types
- Review visualizations to gain insights about the case study

### Lesson 1 key concepts

> :clock10: 20 min

- Discuss different views available in PowerBI
  - Get data `finalMergedFile.csv`: recent sources are available from the landing page
  - Load stages
  - 3 views (navigate from top left) - report, data, model
  - explore the toolbars and menus 
  - discuss with the class what data tasks could be accomplished in each view 
  - from reports or model view > Transform data 
- Transform Data (power query)
  - Review and discuss the steps that have already been applied by selecting each step and clicking on the cog, then review the query window
  - Delete the latest step, then revert the deletion by closing the transformation menu and discarding changes
  - Add a new transformation step: change the DateTime column to the datetime data type by clicking on the data type icon above the DateTime column. Note that the step results in errors on a large number of rows.
  - This happens because the dates use US locale formatting, i.e. M/D/YY rather than D/M/YY. Remove this step because it causes errors
  - Click on the data type symbol to see the options again and select the **Using locale...** option
  - Select data type Date/Time and locale English (United States) to match the data format
  - After creating the step, rename it appropriately and ensure it is the last step of the transformations (show how to reorder, and discuss with the class the impact of moving this step earlier: the wider concept of **upstream / downstream dependencies**)
  - Draw attention to the query window for that step
  - Add one more transformation step (client age to whole number) and rename that step
  - Use this opportunity to explore the transformation options and hold a Q&A with the class: why and when would you use these options?
  - Class Q&A: why is it helpful to rename the transformations?
  - Close & Apply (and other options) 
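
For instructors who want to show the locale problem outside PowerBI, here is a minimal Python sketch using only the standard library. The sample values are hypothetical and are not taken from the lesson file. It parses the same strings day-first (D/M/YY) and month-first (M/D/YY), to show that the wrong order either fails or silently returns a different date.

```python
from datetime import datetime

# Hypothetical sample values written in US order (month/day/two-digit year).
samples = ["3/25/17", "12/31/17", "4/5/17"]


def try_parse(value, fmt):
    """Return a datetime, or None when the value does not fit the format."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


# Parse every value twice: once assuming D/M/YY, once assuming M/D/YY.
day_first = [try_parse(s, "%d/%m/%y") for s in samples]
month_first = [try_parse(s, "%m/%d/%y") for s in samples]

for raw, d, m in zip(samples, day_first, month_first):
    print(raw, "| day-first:", d, "| month-first:", m)
```

Values with a day above 12 cannot be read day-first and come back as `None`, which corresponds to the error rows in the Power Query preview. A value such as `4/5/17` parses under both formats but yields two different dates, so an error-free result does not by itself confirm the right locale.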

# 4.08 Activity 1

- get data `files_for_activities/finalMergedFile.csv` into PowerBI
- Explore the PowerBI report, data and model views.
- add a powerquery transformation step to the **DateTime** column as the instructor has shown you (if not already done in class), for the appropriate locale 
- rename the step for transparency
- add a powerquery transformation step to filter out rows for customers with fewer than 5 years of **Tenure**
- rename the step for transparency
- using **Replace Values** on the transformation ribbon, replace the X gender values with U 
- rename the step for transparency
- review the steps, selecting them in order to see the transformation queries and incremental changes
- close and apply those transformations
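
The idea of named, ordered transformation steps can also be sketched in Python. This is an illustration only: the rows, the column names (`tenure_years`, `gender`) and the step names are assumptions, not the real columns of `finalMergedFile.csv`. Each step receives the output of the previous one, much like the applied steps list in the query editor.

```python
# Hypothetical rows; column names are assumptions for illustration.
rows = [
    {"client_id": 1, "tenure_years": 3, "gender": "M"},
    {"client_id": 2, "tenure_years": 12, "gender": "X"},
    {"client_id": 3, "tenure_years": 5, "gender": "F"},
    {"client_id": 4, "tenure_years": 8, "gender": "X"},
]


def filter_tenure_at_least_5(table):
    """Keep only rows with 5 or more years of tenure."""
    return [r for r in table if r["tenure_years"] >= 5]


def replace_gender_x_with_u(table):
    """Return new rows where gender 'X' is replaced by 'U'."""
    return [{**r, "gender": "U" if r["gender"] == "X" else r["gender"]} for r in table]


# Descriptive step names make the pipeline readable, as in the renamed steps.
applied_steps = [
    ("Filtered Tenure >= 5 Years", filter_tenure_at_least_5),
    ("Replaced Gender X with U", replace_gender_x_with_u),
]

result = rows
for name, step in applied_steps:
    result = step(result)  # each step works on the previous step's output
    print(f"{name}: {len(result)} rows")
```

The source rows are never modified. Each step builds a new table, which is why a step can be deleted or reordered without touching the original file.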

### Solution (images)

![activities solution image - tenure](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-remove_tenure_4.PNG)

![activities solution image - gender](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-replace_values_gender.PNG)

### Lesson 2 key concepts

> :clock10: 20 min


- Get more data sources - demonstrate adding the new source 'df_final_experiment_clients.csv' 
- Auto detect relationships between tables/sources
- navigation - how to show data for each source in Data view by selecting the sources under Fields on the right side
- Model view and properties - cardinality (notes below) 
- create summary table showing the number of clients (count distinct) in each source for class discussion - why are the numbers different? 
- Remove relationship if not needed 

**Cardinality**

This recalls what the students have learnt in SQL. The cardinality of relationships between sources has been detected and can be seen in the model view by clicking on the relationship and choosing Properties.
  - the star indicates many: in this case there are many records with that client ID
  - the 1 indicates one record: in this case, each transactional record in the merged table (visitID, date_time) has only one client ID
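
To make the star and the 1 concrete, the following Python sketch checks how often each client ID appears in two small, hypothetical tables. The table contents and the helper names `side` and `distinct_count` are assumptions for illustration. It also prints the distinct client count per table, which is the figure used in the class discussion above.

```python
from collections import Counter

# Hypothetical data: one row per client in the clients table,
# one row per web event in the merged table.
clients_table = [{"client_id": 101}, {"client_id": 102}, {"client_id": 103}]
merged_table = [
    {"client_id": 101, "visit_id": "v1"},
    {"client_id": 101, "visit_id": "v2"},
    {"client_id": 102, "visit_id": "v3"},
]


def side(table, key):
    """Return '1' if every key value appears once, otherwise '*' (many)."""
    counts = Counter(row[key] for row in table)
    return "1" if max(counts.values()) == 1 else "*"


def distinct_count(table, key):
    """Count the distinct values of a column."""
    return len({row[key] for row in table})


cardinality = f'{side(clients_table, "client_id")}:{side(merged_table, "client_id")}'
print("cardinality:", cardinality)
print("distinct clients in clients table:", distinct_count(clients_table, "client_id"))
print("distinct clients in merged table:", distinct_count(merged_table, "client_id"))
```

In this made-up data, client 103 has no web events, so the two distinct counts differ. That is one possible reason for different numbers between sources. The real files may differ for other reasons.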

# 4.08 Activity 2

- Continuing with the data source `files_for_activities/finalMergedFile.csv` in PowerBI
- Get more data - as you saw in the lesson, bring in the `df_final_experiment_clients.csv` file
- Explore the model view
- Get more data - adding `df_final_web_data_pt_1.csv` - add a transformation step which retains only the **Confirm** process step 
- Close and apply
- Review the model view - how do the relationships differ between the tables? Has a relationship between the two recently added sources been detected?
- If no relationship exists, go into *Manage Relationships* > *New relationship* > *Add relationship* and create a 1-to-many relationship from client in `df_final_experiment_clients.csv` to client in `df_final_web_data_pt_1.csv`
- Remove the relationship detected earlier between `df_final_experiment_clients.csv` and `finalMergedFile.csv`, but retain the relationship between the two raw data sources
- Create a summary table showing the total number of clients cross referenced to the total number of clients who reached the confirm step in the experiment (test or control) using count distinct in each case
- Rename the measures for this visual only to make it clearer
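
The logic behind the summary table can be sketched in Python as a sanity check on the numbers PowerBI shows. The rows below are hypothetical, and the column names `variation` and `process_step` are assumed rather than confirmed against the files. The sketch keeps only the confirm events, then counts distinct clients and distinct confirming clients per variation through the one-to-many link on `client_id`.

```python
# Hypothetical "one" side: one row per client.
clients = [
    {"client_id": 1, "variation": "Test"},
    {"client_id": 2, "variation": "Control"},
    {"client_id": 3, "variation": "Test"},
]

# Hypothetical "many" side: several web events per client.
web_events = [
    {"client_id": 1, "process_step": "start"},
    {"client_id": 1, "process_step": "confirm"},
    {"client_id": 1, "process_step": "confirm"},
    {"client_id": 2, "process_step": "start"},
    {"client_id": 3, "process_step": "confirm"},
]

# Equivalent of the transformation step that retains only the confirm step.
confirm_only = [e for e in web_events if e["process_step"] == "confirm"]
confirmed_ids = {e["client_id"] for e in confirm_only}

# Collect distinct client ids per variation, overall and confirmed.
summary = {}
for c in clients:
    row = summary.setdefault(c["variation"], {"clients": set(), "confirmed": set()})
    row["clients"].add(c["client_id"])
    if c["client_id"] in confirmed_ids:
        row["confirmed"].add(c["client_id"])

counts = {v: (len(s["clients"]), len(s["confirmed"])) for v, s in summary.items()}
for variation, (total, confirmed) in counts.items():
    print(variation, "| clients:", total, "| reached confirm:", confirmed)
```

Using sets gives the count distinct behaviour: client 1 confirms twice but is counted once.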

![activities solution image 1](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-only_confirms.png)

![activities solution image 2](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-modelled_relationships.png)

![activities solution image 3](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-clients_confirm_table.png)

### Lesson 3 key concepts

> :clock10: 20 min

- Create an area plot ready for filtering, with:
  - count of distinct client id on the primary Y axis
  - count of distinct visit id on the secondary Y axis
  - process step on the X axis
- demonstrate how to sync the axes using the formatting menus of Y and secondary Y and set them to the same limit
- fixing the axes will also make it easier to see the result of the filtering
- discuss what the plot reveals 
- use the formatting menu to amend the plot colors (Lines>Colors) to be contrasting 
- demonstrate the Filters pane 
- add filter Variation to page as Basic filter
- add second filter `clnt_tenure_yr` to page as Advanced filter
- rename field `clnt_tenure_yr` to Client Tenure (Years) to update the filter appearance
- as an example, select only clients with more than 10 years of tenure
- explore the other filtering options in this pane 
- show or hide filter pane 
- update the title of the page to show that filters have been applied 
- class discussion - what insights have we gathered from these plots ? 
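
As a rough analogy for what the page filters do to the plot, the sketch below applies a basic filter (pick values from a list) and an advanced filter (a condition) to a few hypothetical rows. It then recomputes the two distinct counts per process step. The data and the helper names are assumptions, and PowerBI's own filter engine is not being reproduced here.

```python
from collections import defaultdict

# Hypothetical rows at web-event level.
rows = [
    {"client_id": 1, "visit_id": "a", "process_step": "start", "variation": "Test", "clnt_tenure_yr": 12},
    {"client_id": 1, "visit_id": "b", "process_step": "start", "variation": "Test", "clnt_tenure_yr": 12},
    {"client_id": 1, "visit_id": "b", "process_step": "confirm", "variation": "Test", "clnt_tenure_yr": 12},
    {"client_id": 2, "visit_id": "c", "process_step": "start", "variation": "Control", "clnt_tenure_yr": 15},
    {"client_id": 3, "visit_id": "d", "process_step": "start", "variation": "Test", "clnt_tenure_yr": 4},
]


def basic_filter(table, column, allowed):
    """Basic filter: keep rows whose value is in the ticked list."""
    return [r for r in table if r[column] in allowed]


def advanced_filter(table, column, predicate):
    """Advanced filter: keep rows whose value satisfies a condition."""
    return [r for r in table if predicate(r[column])]


def distinct_by_step(table):
    """Return {process_step: (distinct clients, distinct visits)}."""
    clients, visits = defaultdict(set), defaultdict(set)
    for r in table:
        clients[r["process_step"]].add(r["client_id"])
        visits[r["process_step"]].add(r["visit_id"])
    return {step: (len(clients[step]), len(visits[step])) for step in clients}


unfiltered = distinct_by_step(rows)
test_only = basic_filter(rows, "variation", {"Test"})
filtered = distinct_by_step(advanced_filter(test_only, "clnt_tenure_yr", lambda y: y > 10))

print("unfiltered:", unfiltered)
print("Test, tenure > 10:", filtered)
```

Comparing the two dictionaries mirrors what the class sees when the filters are toggled: the same measures, recomputed on fewer rows.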

**Images from PowerBI**

![basic_filter](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-basic_filter.png)

![tenure_filter](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-tenure_filter.png)

# 4.08 Activity 3

- Create a tree map plot on a new page of your workbook
- Tree map will display client balance by gender  
- Select balance first then gender to create the view 
- Add a basic filter for variation 
- Add an advanced filter for client age > 80 

### Solution

![activities solution image - treemap](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-basic_treemap.png)

### Lesson 4 key concepts

> :clock10: 20 min

- On a new page, create a scatter plot as a way of demonstrating adding detail to visualizations
- X axis - client tenure months
- Y axis - balance 
- Values - Month (from Date hierarchy) 
- Legend - also Month 
- rename columns as appropriate or use the formatting pane to rename the axis 
- ask the class what other changes they want to the view from the formatting pane 
- Do the aggregations look correct? What do the axes tell us?
- change the aggregation of the values on X and Y axes as appropriate (suggest Average), because right now we are looking at the totals per month
- add more detail to Values for the plot - Day in place of Month (keep month on Legend to see the color), Number of Accounts on Size, Process Step on Play Axis
- Review visualizations after each change to discuss which insights can be taken about the case study
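
The difference between the default Sum and the suggested Average can be shown with a few lines of Python. The balances and month labels below are hypothetical. The point is that a monthly total grows with the number of rows in that month, while the average does not.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: March has two records, April has one.
rows = [
    {"month": "March", "balance": 100.0},
    {"month": "March", "balance": 300.0},
    {"month": "April", "balance": 200.0},
]

# Group the balances by month.
by_month = defaultdict(list)
for r in rows:
    by_month[r["month"]].append(r["balance"])

# The same groups under two aggregation types.
total = {m: sum(values) for m, values in by_month.items()}
average = {m: mean(values) for m, values in by_month.items()}

print("sum per month:", total)
print("average per month:", average)
```

With Sum, March looks twice as large as April only because it has twice as many rows. With Average, both months show the same typical balance.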

# 4.08 Activity 4

- Following on from the previous activity 
- Break the treemap down in more detail by adding variation to details
- Check your variation filter to select both variations 
- Remove variation detail
- Break this plot down by client age (details)
- Change your filter to see the changes in the plot 
- Set the filters to client age > 80 and variation = test
- Add the count distinct of clients to the tooltip 
- Change the aggregation type of balance in the tree map and consider what each plot shows you about these clients

![activities solution image - treemap activity](https://education-team-2020.s3.eu-west-1.amazonaws.com/data-analytics/4.8-images/4.8-treemap_activity.png)

# Lab | Analytics with PowerBI

Refer to the data source [Marketing_Customer_Analysis.csv](https://github.com/ironhack-labs/lab-analytics-with-powerbi/tree/master/files_for_lab) for this lab

### Instructions 

1. Using Power Query, transform the "Effective_to_date" column to the date format MM/DD/YYYY and convert "Total_claim_amount" to currency with 2 decimal places.
2. Plot a bar chart showing the following aggregations by gender. (Keep same colors for each gender on all charts)
    - Average Monthly Premium 
    - Sum of Total Claim amount 
    - Number of open policies 
3. Using an appropriate chart, show which policy type has the largest number of open complaints and filter this by marital status.
4. Use maps to show which states have generated the most revenue. Highlight the number of clients in these states on the tooltip and use response as a filter.
5. Use a Treemap to show the number of clients in each education level and filter to clients with college degrees
6. Use tables to show the: 
    - Sales Channel by average Lifetime_values 
    - Marital status by number of policies 
7. Now let's analyze these tables further by creating the following crosstabs (Matrix): 
    - Sales Channel by Average lifetime value and number of clients per sales channel
    - Marital status with number of open policies by gender

### Solution

https://github.com/ironhack-edu/data-bootcamp/blob/b2b-ey/01-lesson_plans/04-unit_adv_data_processing_KNN_powerbi/4.08_powerbi_basics_loading_data_gui_features/files_for_lesson_and_activities/data_analytics_with_powerbi_solutions.pbix