# Lesson 4.7: Intro to Business Intelligence tools and PowerBI

### Lesson Duration: 3 hours

> Purpose: This lesson introduces business intelligence (BI) and the BI tool _PowerBI_. It also emphasizes that business knowledge is critical for business analysts and data analysts. We will discuss `KPI`s and metrics, and walk through the business case study (`A/B Testing`) that we will explore with PowerBI.

---

### Setup

- All previous installations
- PowerBI installation (set aside 10-15 minutes of class time to make sure everyone has it; attendees should have completed the optional PowerBI installation lab before this lesson)

### Learning Objectives

After this lesson, students will be able to:

- Explain what business intelligence/business analysis is
- Define and design some `KPI`s and metrics for business success tracking
- Interpret the use and benefits of `A/B Testing` to make more informed product design decisions
- Use PowerBI as a BI tool to get more insights as a team. That includes:
  - Connect PowerBI with different data sources
  - Perform transformation steps on the data with PowerBI (using Power Query)
  - Navigate the PowerBI interface with confidence
  - Explore the different features available in PowerBI and choose them according to the project needs

---

### Lesson 1 key concepts

> :clock10: 20 min

- Introduction to Business Intelligence
- Available Business Intelligence tools
- `KPI`s and metrics

**Description: Intro to BI**

**Business Intelligence** is an application of data analysis that helps companies make data-driven decisions. Using Business Intelligence technologies, we can provide insights into the past and present performance of a business, as well as make predictions or recommendations for the future. A key difference between Business Intelligence and data analysis is that those insights are meant to be actionable and geared towards helping a business make decisions.


<br>

**Description: BI Tools**

- Business Intelligence tools are designed to make sense of the huge quantities of data that organizations accumulate over time. The BI tools analyze this information and present it as actionable information that can guide decision making.

- Business Intelligence software is a large, heterogeneous category. Not all tools in the category can be meaningfully compared to each other. There are several types of BI tools, the most prominent of which are _Full-Stack Business Intelligence Tools_ and _Data Visualization Tools_.

  Some examples:
  - Tableau
  - Power BI
  - SAP Business Intelligence
  - Domo
  - IBM Cognos Analytics
  - Qlik Sense


<br>


**Description: KPIs and Metrics**

A **metric** is a quantifiable measure that tells us how well a company is doing at achieving its business objectives. Since companies can only improve the things they measure, choosing the most appropriate metrics is a crucial part of an analyst's job. Many companies use the term `KPI` (Key Performance Indicator) interchangeably with metric, although strictly speaking a KPI is a metric tied to a key business objective. Examples of metrics are the amount of money spent per sale and the percentage of site visitors who buy a product.
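The two example metrics above can be computed in a few lines. This is a minimal sketch with made-up numbers; the variable names (`marketing_spend`, `visitors`, `buyers`) are illustrative assumptions and do not come from the case study.

```python
# Hypothetical monthly figures, for illustration only
marketing_spend = 12_000.0   # money spent to generate sales
visitors = 8_000             # customers who visited the site
buyers = 320                 # customers who bought (assuming one sale per buyer)

# Metric 1: amount of money spent per sale
cost_per_sale = marketing_spend / buyers

# Metric 2: percentage of visitors who buy (conversion rate)
conversion_rate = 100 * buyers / visitors

print(f"Cost per sale: {cost_per_sale:.2f}")
print(f"Conversion rate: {conversion_rate:.1f}%")
```

Both metrics are ratios, which makes them comparable across months with different traffic.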

# 4.07 Activity 1

Think of possible `KPI`s in the context of Ironhack:

- KPIs for the organization?
- KPIs for your class?
- KPIs for you as an individual student?

### Solution / Example:
_`KPI`s for the organization:_

- Number/percentage of students asking for information
- Number/percentage of students enrolled
- Number/percentage of students hired in the 6 months after graduation
- ...

_`KPI`s for your class:_

- Number of delayed labs
- Kahoot points
- Number of katas done

_`KPI`s for you as an individual student:_

- Time to complete lab
- Kahoot points
- Number of katas done
- ...


### Lesson 2 key concepts

> :clock10: 20 min

- Introduction to PowerBI
- Introduction to a business problem - [**Slides**](https://docs.google.com/presentation/d/1bhXVQKmpkPgVUzIVOD8d8Ud7lG1O4wejByoRyTyjKJM/edit?usp=sharing)

**Description: Intro to PowerBI**

- **PowerBI** is one of the most popular and intuitive applications for business intelligence and report sharing. Its interface will feel familiar to users of Microsoft Excel, which makes it easy to connect to data sources, create plots, and perform basic analysis tasks to discover insights that can be shared within an organization. In this lesson, we will learn about the different features of PowerBI and how it can be a valuable addition to your analytics toolkit. We will also cover how to load data into PowerBI and what kinds of data sources can be connected.

<br>

**Description: PowerBI features**
      
PowerBI has several features that make it a great tool for data analysis. In this section, we will highlight some of the most important ones:

- _Data Transformations_: Using a familiar Excel-like interface, end users can create multi-source data models and apply transformations, including calculated columns, aggregations, and pivoting. They can also use the DAX language to create more complex calculations.
- _Variety of Visualization Types_: PowerBI offers many visualization types, from bar charts, line charts, and scatter plots to area charts, bubble charts, and treemaps. It also supports maps, so you can visualize your data geographically.
- _Report Sharing_: You can combine several visualizations into reports, add commentary, and, with PowerBI enterprise tools, create dashboards to share the insights. Everyone with an Office 365 account can use PowerBI Desktop at no additional cost, allowing them to develop and share filterable workbooks with colleagues.
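For readers coming from pandas, the three transformation types named above (calculated columns, aggregations, pivoting) correspond to familiar operations. The sketch below is only an analogy on an invented `sales` table; it is not how PowerBI implements these features, and every column name is hypothetical.

```python
import pandas as pd

# Toy data, invented for illustration
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "units": [10, 15, 8, 12],
    "unit_price": [2.0, 2.0, 3.0, 3.0],
})

# Calculated column: a new field derived from existing ones
sales["revenue"] = sales["units"] * sales["unit_price"]

# Aggregation: total revenue per region
revenue_by_region = sales.groupby("region")["revenue"].sum()

# Pivoting: regions as rows, quarters as columns
revenue_pivot = sales.pivot(index="region", columns="quarter", values="revenue")

print(revenue_by_region)
print(revenue_pivot)
```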


<br>

**Description: Intro to Business Problem**

- [**Slides**](https://docs.google.com/presentation/d/1bhXVQKmpkPgVUzIVOD8d8Ud7lG1O4wejByoRyTyjKJM/edit?usp=sharing)

### Lesson 3 key concepts

> :clock10: 20 min

- Introduction to `A/B Testing`
- Data preparation for importing into PowerBI 

> [**Slides**](https://docs.google.com/presentation/d/1bhXVQKmpkPgVUzIVOD8d8Ud7lG1O4wejByoRyTyjKJM/edit?usp=sharing)

**Introduction to `A/B Testing`**

**Data preparation for importing into PowerBI**

In [1]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)

In [2]:
data1 = pd.read_csv('./ab_testing_case_study_docs/df_final_demo.txt')
print(data1.shape)

(70609, 9)


In [3]:
data1.head()

| | client_id | clnt_tenure_yr | clnt_tenure_mnth | clnt_age | gendr | num_accts | bal | calls_6_mnth | logons_6_mnth |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 836976 | 6.0 | 73.0 | 60.5 | U | 2.0 | 45105.3 | 6.0 | 9.0 |
| 1 | 2304905 | 7.0 | 94.0 | 58.0 | U | 2.0 | 110860.3 | 6.0 | 9.0 |
| 2 | 1439522 | 5.0 | 64.0 | 32.0 | U | 2.0 | 52467.79 | 6.0 | 9.0 |
| 3 | 1562045 | 16.0 | 198.0 | 49.0 | M | 2.0 | 67454.65 | 3.0 | 6.0 |
| 4 | 5126305 | 12.0 | 145.0 | 33.0 | F | 2.0 | 103671.75 | 0.0 | 3.0 |


In [4]:
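print(data1['client_id'].nunique())     # Equals the row count if every client ID is unique
print(data1.isna().sum())               # Number of missing values per column
print(data1[data1['clnt_age'].isna()])  # Rows where the age is missing
data1 = data1.dropna()                  # Drop rows that contain any missing value
data1.to_csv('df_final_demo.csv')       # Export the cleaned data to a CSV file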

In [5]:
data2a = pd.read_csv('./ab_testing_case_study_docs/df_final_web_data_pt_1.txt')
data2b = pd.read_csv('./ab_testing_case_study_docs/df_final_web_data_pt_2.txt')

In [6]:
data2a.head()

| | client_id | visitor_id | visit_id | process_step | date_time |
|---|---|---|---|---|---|
| 0 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:27:07 |
| 1 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:26:51 |
| 2 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:19:22 |
| 3 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:19:13 |
| 4 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:18:04 |


In [7]:
data2b.head()

| | client_id | visitor_id | visit_id | process_step | date_time |
|---|---|---|---|---|---|
| 0 | 763412 | 601952081_10457207388 | 397475557_40440946728_419634 | confirm | 2017-06-06 08:56:00 |
| 1 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | confirm | 2017-06-01 11:59:27 |
| 2 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | step_3 | 2017-06-01 11:58:48 |
| 3 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | step_2 | 2017-06-01 11:58:08 |
| 4 | 6019349 | 442094451_91531546617 | 154620534_35331068705_522317 | step_1 | 2017-06-01 11:57:58 |


In [8]:
data2a.isna().sum()

client_id       0
visitor_id      0
visit_id        0
process_step    0
date_time       0
dtype: int64

In [9]:
data2b.isna().sum()

client_id       0
visitor_id      0
visit_id        0
process_step    0
date_time       0
dtype: int64

In [10]:
data2a.to_csv('df_final_web_data_pt_1.csv')
data2b.to_csv('df_final_web_data_pt_2.csv')

In [11]:
data2 = pd.concat([data2a, data2b], axis=0)
data2.to_csv('df_final_web_data.csv')

In [12]:
data2.shape

(755405, 5)

# 4.07 Activity 3

- Think of metrics for our `A/B experiment`.
- Think about how to present the results of the experiment visually.

## Solution

- Metrics:
  - Total number of customers who confirm the transaction
  - Percentage of customers per process step in both variation groups
- Visual presentation:
  - Line plot
  - Bar plot
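The two metrics above can be sketched in pandas as follows. The `events` frame is invented for illustration; its column names mirror the case-study files (`client_id`, `process_step`, and the `Variation` column from the experiment-clients file), but the rows are made up and the real data may need extra cleaning first.

```python
import pandas as pd

# Toy event log, invented for illustration only
events = pd.DataFrame({
    "client_id":    [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "process_step": ["step_1", "step_2", "confirm", "step_1", "step_2",
                     "step_1", "step_2", "confirm", "step_1"],
    "Variation":    ["Test", "Test", "Test", "Test", "Test",
                     "Control", "Control", "Control", "Control"],
})

# Metric 1: number of distinct clients who reached the confirm step, per group
confirmed = (
    events[events["process_step"] == "confirm"]
    .groupby("Variation")["client_id"]
    .nunique()
)

# Metric 2: percentage of each group's clients who reached each step
clients_per_step = (
    events.groupby(["Variation", "process_step"])["client_id"]
    .nunique()
    .unstack(fill_value=0)
)
group_sizes = events.groupby("Variation")["client_id"].nunique()
pct_per_step = clients_per_step.div(group_sizes, axis=0) * 100

print(confirmed)
print(pct_per_step.round(1))
```

Counting distinct clients rather than rows avoids inflating the numbers when a client repeats a step, as happens in the sample rows shown earlier.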

### Lesson 4 key concepts

> :clock10: 20 min

- Data preparation for PowerBI
- Importing data into PowerBI

**Data Preparation**

- Merging the two files

In [12]:
data1.head()
data2.head()

| | client_id | visitor_id | visit_id | process_step | date_time |
|---|---|---|---|---|---|
| 0 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:27:07 |
| 1 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:26:51 |
| 2 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:19:22 |
| 3 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:19:13 |
| 4 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:18:04 |


In [13]:
data = data2.merge(data1, on='client_id')  # inner join (the default) on the shared key
print(data.shape)
data.head()

(449704, 13)


| | client_id | visitor_id | visit_id | process_step | date_time | clnt_tenure_yr | clnt_tenure_mnth | clnt_age | gendr | num_accts | bal | calls_6_mnth | logons_6_mnth |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:27:07 | 5.0 | 64.0 | 79.0 | U | 2.0 | 189023.86 | 1.0 | 4.0 |
| 1 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:26:51 | 5.0 | 64.0 | 79.0 | U | 2.0 | 189023.86 | 1.0 | 4.0 |
| 2 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:19:22 | 5.0 | 64.0 | 79.0 | U | 2.0 | 189023.86 | 1.0 | 4.0 |
| 3 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_2 | 2017-04-17 15:19:13 | 5.0 | 64.0 | 79.0 | U | 2.0 | 189023.86 | 1.0 | 4.0 |
| 4 | 9988021 | 580560515_7732621733 | 781255054_21935453173_531117 | step_3 | 2017-04-17 15:18:04 | 5.0 | 64.0 | 79.0 | U | 2.0 | 189023.86 | 1.0 | 4.0 |


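Note: The original `data2` file had ~755k rows, but the merged file has only ~450k. `merge` performs an inner join by default, so many rows in `data2` belong to clients that have no matching row in `data1` (after its rows with missing values were dropped), and those rows are discarded.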
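One way to see which rows an inner merge discards is a left merge with `indicator=True`. The sketch below uses two tiny invented frames, `web` and `demo`, as hypothetical stand-ins for `data2` and `data1`; the values are made up.

```python
import pandas as pd

# Toy stand-ins for data2 (web events) and data1 (client demographics)
web = pd.DataFrame({
    "client_id": [1, 1, 2, 3],
    "process_step": ["step_1", "confirm", "step_1", "step_1"],
})
demo = pd.DataFrame({"client_id": [1, 3, 4], "clnt_age": [30.0, 45.5, 60.0]})

# A left merge keeps every web row; indicator=True adds a `_merge` column
checked = web.merge(demo, on="client_id", how="left", indicator=True)

# Rows marked 'left_only' have no demographic record
# and would vanish in an inner merge
unmatched = checked[checked["_merge"] == "left_only"]
inner_rows = len(web.merge(demo, on="client_id"))

print(unmatched["client_id"].unique())
print(len(web), "rows before,", inner_rows, "rows after the inner merge")
```

The same pattern applied to the real frames would list the clients whose web events disappear in the merge.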

In [14]:
data3 = pd.read_csv('./ab_testing_case_study_docs/df_final_experiment_clients.txt')
data3.head()

| | client_id | Variation |
|---|---|---|
| 0 | 9988021 | Test |
| 1 | 8320017 | Test |
| 2 | 4033851 | Control |
| 3 | 1982004 | Test |
| 4 | 9294070 | Control |


In [15]:
data3.isna().sum()
data3 = data3.dropna()

In [16]:
# Checking for unique clients
print(data3.shape)
data3['client_id'].nunique()

(50500, 2)


50500

In [17]:
data = data.merge(data3, on='client_id')  # inner join (the default) on the shared key
data3.to_csv('df_final_experiment_clients.csv')

In [18]:
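# As you can see, the row count drops again, from ~450k to ~320k.
# The inner merge keeps only rows whose client_id also appears in data3,
# so rows for clients without an entry in data3 are dropped.
# As an exercise, find the clients that appear in one table but not in the other.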
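A possible approach to that exercise is a set difference on the client IDs. This is a sketch on invented data: `merged` and `experiment` are hypothetical stand-ins for `data` and `data3`.

```python
import pandas as pd

# Toy stand-ins: `merged` plays the role of `data`, `experiment` the role of `data3`
merged = pd.DataFrame({"client_id": [1, 1, 2, 3, 3]})
experiment = pd.DataFrame({
    "client_id": [1, 3, 5],
    "Variation": ["Test", "Control", "Test"],
})

merged_ids = set(merged["client_id"])
experiment_ids = set(experiment["client_id"])

# Clients with rows in the merged data but no experiment assignment
not_in_experiment = merged_ids - experiment_ids

# Clients assigned to the experiment but absent from the merged data
no_activity = experiment_ids - merged_ids

print(sorted(not_in_experiment), sorted(no_activity))
```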

In [19]:
data['date_time'] = pd.to_datetime(data['date_time'])
data.to_csv('finalMergedFile.csv')
data.shape

(321195, 14)

In [20]:
df = data.copy()  # we will use df to analyze data with python later

In [21]:
df.dtypes

client_id                    int64
visitor_id                  object
visit_id                    object
process_step                object
date_time           datetime64[ns]
clnt_tenure_yr             float64
clnt_tenure_mnth           float64
clnt_age                   float64
gendr                       object
num_accts                  float64
bal                        float64
calls_6_mnth               float64
logons_6_mnth              float64
Variation                   object
dtype: object

Note: We will use the final merged file (`finalMergedFile.csv`) as the data source in PowerBI's Get Data dialog (type: Text/CSV).
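Keep in mind that a CSV file stores only text, so the dtypes listed above are not carried into the file. Any tool that reads it, PowerBI included, has to detect the column types again. The sketch below illustrates this with pandas alone, using an in-memory buffer and an invented two-row frame named `sample`.

```python
import io
import pandas as pd

# Toy frame with a datetime column, standing in for the merged data
sample = pd.DataFrame({
    "client_id": [9988021, 763412],
    "date_time": pd.to_datetime(["2017-04-17 15:27:07", "2017-06-06 08:56:00"]),
})

# Write to an in-memory CSV instead of a file on disk
buffer = io.StringIO()
sample.to_csv(buffer, index=False)

# Reading back without hints: the timestamp comes back as plain text
buffer.seek(0)
plain = pd.read_csv(buffer)

# Reading back with parse_dates restores the datetime type
buffer.seek(0)
parsed = pd.read_csv(buffer, parse_dates=["date_time"])

print(plain.dtypes)
print(parsed.dtypes)
```

This is why it is worth previewing the column types in PowerBI before loading the data.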

Steps:

- Launch PowerBI
- Get data
- Common data sources: select a single data source (the merged CSV)
- Preview the data and transform column types
- Rename columns
- Load data
- Return to the data query afterwards, if needed
- The modes of the user interface: report, data, model
- Fields and visualizations: expand/collapse; show step by step how to create a simple plot (bar or scatter) with X, Y, and legend
- Create a table by selecting fields, without using the visualizations pane
- Filters: apply one filter

# 4.07 Activity 4

Load `files_for_activities/finalMergedFile.csv` into PowerBI using Get Data.

- Explore the fields and field types. Are they the same as those in your Pandas dataframe?
- Create a simple stacked bar chart with gender on the color legend, test variation on the Y axis, and count of visitor ID on the X axis.
- What insight can you gain from the plot? 
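To cross-check the chart, the same counts can be computed in pandas. The sketch below uses an invented frame `df_toy` with the merged file's column names (`visitor_id`, `Variation`, `gendr`), and it assumes the chart uses a plain row count of visitor ID rather than a distinct count.

```python
import pandas as pd

# Toy rows, invented for illustration; column names follow the merged file
df_toy = pd.DataFrame({
    "visitor_id": ["a", "b", "c", "d", "e", "f"],
    "Variation":  ["Test", "Test", "Test", "Control", "Control", "Control"],
    "gendr":      ["F", "M", "M", "F", "F", "U"],
})

# Count of visitor_id rows per variation (rows) and gender (columns)
counts = (
    df_toy.groupby(["Variation", "gendr"])["visitor_id"]
    .count()
    .unstack(fill_value=0)
)
print(counts)
```

If matplotlib is installed, `counts.plot(kind="barh", stacked=True)` draws a comparable stacked bar chart.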


# Lab | Getting started with PowerBI

Refer to the `files_for_lab/we_fn_use_c_marketing_customer_value_analysis.csv` dataset.

### Instructions
1. Load the dataset into your PowerBI workbook.
2. Go to Transform Data. Review the data values in the fields and the column data types.
3. Change the decimal fields customer lifetime value and claim amount to fixed decimals (2 decimal places) by adding a transform step. Bonus: rename the steps as 'reduce decimals <fieldname>' for transparency.
4. Review the column names to confirm they are correct and clear before saving and loading the changes to your workbook.
5. Select the first bar chart in the visualizations pane and add Number of Policies and Coverage to show the number of policies per **Coverage type**. Note that the title now matches the purpose of the chart. Rename the tab 'bar chart'.
6. On a new report page, create a table without using the visualizations pane by selecting Coverage, Number of Policies, and Number of Open Complaints. Rename the tab 'table'.
7. Save your PowerBI workbook as `unit_4_lab.pbix`. Push it to GitHub and provide the URL to complete the lab.
