# [2021 Week 14 | Tableau: Can You Recommend Profitable Return Customer Bundles?](http://www.workout-wednesday.com/2021w14tab/)

>**Table of contents:**
>
>&ensp;&ensp;[Introduction](#Introduction) <br>
>&ensp;&ensp;[Instructions](#Instructions) <br>
>&ensp;&ensp;[Workings](#Workings) <br>
>&ensp;&ensp;[Results](#Results)


### Introduction
The task is to visualize the product sub-categories that customers most often bought together on their second order. Only the second order of each customer is considered. <br>
The results are shown as a bar chart and a treemap, and the user can switch between the two chart types.
    
The [dataset](https://data.world/cmack624/superstore-20204) used is the superstore dataset for Tableau 2021.4.

For reference, the solutions of the bar chart and treemap provided in the [challenge's page](http://www.workout-wednesday.com/2021w14tab/) are shown in Figure 1 and Figure 2 below, respectively.

![bar_chart](https://drive.google.com/uc?export=view&id=1OvQNqWZzPBWpknABH_Qv8PNdQzYZpVsi)
<p style="text-align: center;">Figure 1: The provided bar chart solution</p>

![tree_map](https://drive.google.com/uc?export=view&id=1pujwPe_JEup8WDutukLeZo8lv8tYL6hp)
<p style="text-align: center;">Figure 2: The provided treemap solution</p>

### Instructions
- Extract the second order of each customer from the purchase history given in the dataset.

- Include a list of items for the user to select.

- Compute the order counts, net sales, profit and profit percentage of the items that were bought together with the item selected by the user.

- Include the values computed in the hover text.

- Include a toggle for the user to switch between bar chart and treemap.

- Allow the user to distinguish whether the items bought together are profitable using different colours.

### Workings

In [1]:
import pandas as pd
import warnings
warnings.filterwarnings("ignore")

In [3]:
csv_path = 'https://raw.githubusercontent.com/ywjet/Data-Visualization/main/Data/2021%20Week%2014_Tableau_Can%20You%20Recommend%20Profitable%20Return%20Customer%20Bundles.csv'

df = pd.read_csv(csv_path, usecols=['Order Date','Customer ID','Sub-Category','Sales','Quantity','Profit'])
df.head()

| | Order Date | Customer ID | Sub-Category | Sales | Quantity | Profit |
|---|---|---|---|---|---|---|
| 0 | 11/8/2019 | CG-12520 | Bookcases | 261.96 | 2 | 41.9136 |
| 1 | 11/8/2019 | CG-12520 | Chairs | 731.94 | 3 | 219.582 |
| 2 | 6/12/2019 | DV-13045 | Labels | 14.62 | 2 | 6.8714 |
| 3 | 10/11/2018 | SO-20335 | Tables | 957.5775 | 5 | -383.031 |
| 4 | 10/11/2018 | SO-20335 | Storage | 22.368 | 2 | 2.5164 |


In [4]:
# change the order date column to datetime
df['Order Date'] = pd.to_datetime(df['Order Date'], dayfirst=False)

# count the missing values in each column
for col in df.columns:
    print(col+':', df[col].isnull().sum())

df.head()

Order Date: 0
Customer ID: 0
Sub-Category: 0
Sales: 0
Quantity: 0
Profit: 0


| | Order Date | Customer ID | Sub-Category | Sales | Quantity | Profit |
|---|---|---|---|---|---|---|
| 0 | 2019-11-08 | CG-12520 | Bookcases | 261.96 | 2 | 41.9136 |
| 1 | 2019-11-08 | CG-12520 | Chairs | 731.94 | 3 | 219.582 |
| 2 | 2019-06-12 | DV-13045 | Labels | 14.62 | 2 | 6.8714 |
| 3 | 2018-10-11 | SO-20335 | Tables | 957.5775 | 5 | -383.031 |
| 4 | 2018-10-11 | SO-20335 | Storage | 22.368 | 2 | 2.5164 |


In [5]:
# get the first order date of each customer 
t = pd.DataFrame(df.groupby(['Customer ID'])['Order Date'].min())

dff = df.copy()
# label the first order of each customer
dff['First Order'] = 0
for customer in t.index:
    first = dff[(dff['Customer ID'] == customer) & (dff['Order Date'] == t['Order Date'][customer])].index
    # assign through .loc; chained indexing (dff[col][index] = ...) may write to a copy
    dff.loc[first, 'First Order'] = 1

# drop the first order
dff.drop(dff[dff['First Order'] == 1].index, inplace=True)

# look for the second order
dff.rename(columns={'First Order': 'Second Order'}, inplace=True)

# get the second order date of each customer
t = pd.DataFrame(dff.groupby(['Customer ID'])['Order Date'].min())

# label the second order of each customer
for customer in t.index:
    second = dff[(dff['Customer ID'] == customer) & (dff['Order Date'] == t['Order Date'][customer])].index
    # assign through .loc; chained indexing (dff[col][index] = ...) may write to a copy
    dff.loc[second, 'Second Order'] = 1

# we need only the second order
second_order = dff[dff['Second Order'] == 1]
second_order.head()

| | Order Date | Customer ID | Sub-Category | Sales | Quantity | Profit | Second Order |
|---|---|---|---|---|---|---|---|
| 0 | 2019-11-08 | CG-12520 | Bookcases | 261.96 | 2 | 41.9136 | 1 |
| 1 | 2019-11-08 | CG-12520 | Chairs | 731.94 | 3 | 219.582 | 1 |
| 35 | 2019-12-08 | GH-14485 | Phones | 1097.544 | 7 | 123.4737 | 1 |
| 36 | 2019-12-08 | GH-14485 | Furnishings | 190.92 | 5 | -147.963 | 1 |
| 46 | 2017-10-20 | PO-18865 | Storage | 211.96 | 4 | 8.4784 | 1 |
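The two labelling loops above can also be expressed without explicit iteration. The following is a minimal sketch on a toy purchase history (the `toy` frame and its values are hypothetical, not taken from the Superstore data). It ranks the order dates within each customer and keeps the rows ranked second. Like the cell above, it assumes that all rows sharing a customer and an order date belong to the same order.

```python
import pandas as pd

# Toy purchase history (hypothetical values for illustration only)
toy = pd.DataFrame({
    'Customer ID': ['A', 'A', 'A', 'A', 'B', 'B', 'C'],
    'Order Date': pd.to_datetime(['2021-01-05', '2021-02-10', '2021-02-10',
                                  '2021-03-01', '2021-01-20', '2021-04-02',
                                  '2021-05-15']),
    'Sub-Category': ['Paper', 'Chairs', 'Phones', 'Art',
                     'Labels', 'Binders', 'Tables'],
})

# Dense rank of the order dates within each customer:
# 1 = first order date, 2 = second order date, and so on.
order_rank = toy.groupby('Customer ID')['Order Date'].rank(method='dense')

# Keep only the rows that fall on each customer's second order date.
# Customer C has a single order, so it drops out entirely.
toy_second = toy[order_rank == 2]
print(toy_second)
```

Using `method='dense'` matters here: rows that share a date receive the same rank, so a multi-item second order is kept whole.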


In [6]:
# return the item quantity that the customers purchased during their second order
baskets = second_order.groupby(['Customer ID','Sub-Category'])['Quantity'].sum()
baskets = baskets.unstack().reset_index().fillna(0).set_index('Customer ID')
baskets.head()

| Customer ID | Accessories | Appliances | Art | Binders | Bookcases | Chairs | Copiers | Envelopes | Fasteners | Furnishings | Labels | Machines | Paper | Phones | Storage | Supplies | Tables |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AA-10315 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| AA-10375 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| AA-10480 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 12.0 | 0.0 | 4.0 | 0.0 | 0.0 |
| AA-10645 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0.0 | 0.0 | 0.0 |
| AB-10015 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0.0 | 0.0 |


We are only interested in whether a customer purchased a product (made an order), not in the quantity bought. <br>

If the customer made a purchase (`Quantity > 0`), the `order_counts` function below changes the `Quantity` to `1`, and leaves it at `0` otherwise. <br>
This flag records that the customer ordered the item, which is all we need. <br>
Eventually, we'll include the number of orders in the dashboard.

In [7]:
def order_counts(x):
    if x > 0:
        return 1
    else:
        return 0
    
counts = baskets.applymap(order_counts)
counts.head()

| Customer ID | Accessories | Appliances | Art | Binders | Bookcases | Chairs | Copiers | Envelopes | Fasteners | Furnishings | Labels | Machines | Paper | Phones | Storage | Supplies | Tables |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AA-10315 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| AA-10375 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| AA-10480 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| AA-10645 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| AB-10015 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
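The same quantity-to-flag conversion can be written as a single vectorized comparison instead of an element-wise function. A minimal sketch, assuming a small stand-in for `baskets` (the `toy_baskets` frame and its values are made up for illustration):

```python
import pandas as pd

# Small stand-in for the `baskets` table: quantities per customer and sub-category
toy_baskets = pd.DataFrame(
    {'Paper': [3.0, 0.0, 12.0],
     'Phones': [0.0, 2.0, 0.0],
     'Storage': [0.0, 0.0, 4.0]},
    index=pd.Index(['AA-10315', 'AA-10645', 'AA-10480'], name='Customer ID'),
)

# A boolean mask of "bought at least one", cast to 0/1 integers
toy_counts = (toy_baskets > 0).astype(int)
print(toy_counts)
```

The comparison is evaluated on whole columns at once, so it avoids calling a Python function for every cell.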


In [8]:
# function to prepare the data for visualization
def tailor_data(item):
    orders = counts.loc[counts[counts[item] > 0].index]
    orders = orders.drop(columns=item)

    sales = second_order.groupby(['Customer ID','Sub-Category'])['Sales'].sum()
    sales = sales.unstack().reset_index().fillna(0).set_index('Customer ID')
    sales = sales.loc[counts[counts[item] > 0].index]
    sales = sales.drop(columns=item)

    profits = second_order.groupby(['Customer ID','Sub-Category'])['Profit'].sum()
    profits = profits.unstack().reset_index().fillna(0).set_index('Customer ID')
    profits = profits.loc[counts[counts[item] > 0].index]
    profits = profits.drop(columns=item)

    data = pd.DataFrame({'Orders': orders.sum(),
                         'Net Sales': round(sales.sum()),
                         'Profit': round(profits.sum()),
                         'Profit %': round(profits.sum() / sales.sum() * 100).dropna()
                        }).sort_values('Orders', ascending=False)
    data.drop(data[data['Orders'] == 0].index, inplace=True)
    data = data.astype(int)
    data['Profitable'] = ['Profitable' if profit > 0 else 'Unprofitable' for profit in data['Profit']]
    
    return data
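To make the logic of `tailor_data` easier to follow, here is a simplified sketch of the same idea on a toy table. The function name `co_purchase_summary`, the `toy_orders` frame and its values are all hypothetical. It works on the long-format rows directly rather than on pivoted tables, and it skips the rounding and the `Profitable` label.

```python
import pandas as pd

# Toy second-order rows (hypothetical values for illustration only)
toy_orders = pd.DataFrame({
    'Customer ID': ['A', 'A', 'B', 'B', 'C', 'D'],
    'Sub-Category': ['Paper', 'Chairs', 'Paper', 'Phones', 'Chairs', 'Paper'],
    'Sales': [20.0, 300.0, 10.0, 500.0, 150.0, 5.0],
    'Profit': [8.0, -30.0, 4.0, 100.0, 15.0, 2.0],
})

def co_purchase_summary(rows, item):
    # Customers whose order contains the selected item
    buyers = rows.loc[rows['Sub-Category'] == item, 'Customer ID'].unique()

    # Everything else those customers bought in the same order
    others = rows[rows['Customer ID'].isin(buyers)
                  & (rows['Sub-Category'] != item)]

    # One row per co-purchased sub-category
    summary = others.groupby('Sub-Category').agg(
        Orders=('Customer ID', 'nunique'),
        NetSales=('Sales', 'sum'),
        Profit=('Profit', 'sum'),
    ).rename(columns={'NetSales': 'Net Sales'})

    summary['Profit %'] = summary['Profit'] / summary['Net Sales'] * 100
    return summary.sort_values('Orders', ascending=False)

summary = co_purchase_summary(toy_orders, 'Paper')
print(summary)
```

Customer C bought Chairs without Paper, so that row does not contribute to the Chairs figures when Paper is the selected item.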

The following parts define the functions to create the bar chart and treemap.

In [9]:
import plotly
import plotly.express as px
import plotly.graph_objects as go

In [10]:
# hover text
part1 = "There were <b>%{value}</b> orders where a customer who ordered <b>"
part2 = """
</b> also placed an order for <b>%{label}</b> on their second purchase with Superstore Inc. <br>
<br> Net Sales: $%{customdata[0]}
<br> Profit: $%{customdata[1]}
<br> Profit %: %{customdata[2]}%
<extra></extra>
"""

def bar_chart(item):
    data = tailor_data(item)
    
    fig = px.bar(data,
                 x='Orders', y=data.index,
                 orientation='h', text='Orders',
                 custom_data=['Net Sales', 'Profit', 'Profit %'],
                 color='Profitable',
                 color_discrete_map={'Unprofitable':'sandybrown',
                                     'Profitable':'lightgray'}
                )

    fig.update_traces(
        hovertemplate=part1 + item + part2,
        
        textposition='outside'
    )

    fig.update_layout(
        hoverlabel=dict(
            bgcolor='white'
        ),
        
        legend=dict(
            orientation="h",
            yanchor="bottom", y=1,
            xanchor="left", x=0,
            title=None
        ),
        
        yaxis={'categoryorder':'total ascending', 'title':None},
        
        xaxis={'visible':False},
        
        plot_bgcolor='white'
    )
    
    return fig

def tree_map(item):
    data = tailor_data(item)
    data = data.reset_index()
    
    title='On a second purchase, when customers purchased <b>{}</b>, what else did they buy?'
    title += '<br><b style="font-size:70%;color:lightgray;">Profitable</b> | '
    title += '<b style="font-size:70%;color:sandybrown;">Unprofitable</b>'
    
    fig = px.treemap(data,
                     path=['Sub-Category'],
                     custom_data=['Net Sales', 'Profit', 'Profit %'],
                     values='Orders',
                     color='Profitable',
                     color_discrete_map={'Unprofitable':'sandybrown',
                                         'Profitable':'lightgray'},
                     title=title.format(item)
                    )
    
    fig.data[0].textinfo='label+value'
    
    hovertext=part1 + item + part2
    
    fig.update_traces(
        hovertemplate=hovertext
    )
    
    fig.update_layout(
        hoverlabel=dict(
            bgcolor='white'
        )
    )
    
    return fig

The following parts create the Dash app for the visualization.

In [11]:
import dash
# on Dash 2.0+, these two modules are also available as `from dash import dcc, html`
import dash_core_components as dcc
import dash_html_components as html
import dash_bootstrap_components as dbc
from dash.dependencies import Input, Output

In [12]:
################################################### options ####################################################

# bar chart <-> tree map
chart_type = dbc.FormGroup(
    [
        dbc.Label('Bar Chart ↔️ Tree Map'),
        dbc.Checklist(
            options=[
                {'label':'', 'value':'treemap'} # value is ['treemap'] when switched on, empty otherwise (bar chart)
            ],
            style={'marginLeft':'60px'},
            switch=True,
            id='chart-type'
        ),
        html.Hr()        
    ]
)

# select a sub-category
item_selections = dbc.FormGroup(
    [
        dbc.Label('Select a Sub-Category'),
        dbc.RadioItems(
            options=[
                {'label': item, 'value': item} for item in baskets.columns
            ],
            value='Accessories',
            id='item-selected'
        )
        
    ]
)

###################################### card component holding the options ######################################

card = dbc.Card(
        dbc.CardBody(
            [
                chart_type,
                item_selections
            ]
        )
)

In [13]:
app = dash.Dash(__name__, external_stylesheets=[dbc.themes.FLATLY])

app.layout = html.Div([
    html.Br(),
    
    html.H4('#WorkoutWednesday W14 | Can You Recommend Profitable Return Customer Product Bundles?', 
            style={'textAlign':'center'}),
    
    html.Br(),
    
    dbc.Row(
        [
            dbc.Col(dcc.Graph(id='graph'),width=8),
            
            dbc.Col(card,width=2)
        ],
        justify='around')
])

@app.callback(
    Output('graph', 'figure'),
    [Input('chart-type', 'value'),
     Input('item-selected','value')]
)
def update_graph(chart, item):
    
    if chart == ['treemap']:
        fig = tree_map(item)
    else:
        fig = bar_chart(item)
    
    fig.update_layout(
        height=600,
        width=1000
    )
    
    return fig

if __name__ == '__main__':
#     app.run_server(debug=True, use_reloader=False)
    app.run_server(debug=False)  

Dash is running on http://127.0.0.1:8050/

 * Serving Flask app '__main__' (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: off


 * Running on http://127.0.0.1:8050/ (Press CTRL+C to quit)
127.0.0.1 - - [27/Sep/2021 16:21:59] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [27/Sep/2021 16:22:01] "GET /_favicon.ico?v=1.21.0 HTTP/1.1" 200 -
127.0.0.1 - - [27/Sep/2021 16:22:01] "GET /_dash-layout HTTP/1.1" 200 -
127.0.0.1 - - [27/Sep/2021 16:22:01] "GET /_dash-dependencies HTTP/1.1" 200 -
127.0.0.1 - - [27/Sep/2021 16:22:01] "GET /_dash-component-suites/dash_core_components/async-graph.js HTTP/1.1" 200 -
127.0.0.1 - - [27/Sep/2021 16:22:02] "GET /_dash-component-suites/dash_core_components/async-plotlyjs.js HTTP/1.1" 200 -
127.0.0.1 - - [27/Sep/2021 16:22:05] "POST /_dash-update-component HTTP/1.1" 200 -
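The branch inside `update_graph` depends on what the switch reports. A checklist passes a list of the selected option values to the callback, and the value may be `None` before the user touches the switch, since no initial `value` is set in the layout. The helper below is a hypothetical, framework-free sketch of that decision (`pick_chart` is not part of the app above):

```python
def pick_chart(switch_value):
    """Return 'treemap' when the switch is on, otherwise 'bar'."""
    # switch_value is expected to be a list of selected option values;
    # None and [] both mean the switch is off.
    if switch_value and 'treemap' in switch_value:
        return 'treemap'
    return 'bar'

print(pick_chart(None))         # switch never touched
print(pick_chart([]))           # switch turned off again
print(pick_chart(['treemap']))  # switch turned on
```

Testing for membership rather than comparing against the exact list `['treemap']` keeps the check valid if more options are added to the checklist later.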


<hr>

### Results

Figure 3 and Figure 4 below show the solution.<br>
![solution1](https://drive.google.com/uc?export=view&id=1DBiOkHuTEfi34e46U1I2kGvRwz96RLrs)
<p style="text-align: center;">Figure 3: The bar chart solution</p>

![solution2](https://drive.google.com/uc?export=view&id=1KU-mrv2gUifecCrxXDEXL_bvIQaObb7E)
<p style="text-align: center;">Figure 4: The treemap solution</p>