# Use Case

## Definition & Variable

### KPIs

**Share of Wallet, SoW**

$$\text{Share of Wallet} = \sum  \text{Revenue vodafone} / \sum \text{Addressable Market} $$

**Whitespace**

$$\text{Whitespace} = \sum \text{Addressable Market} -  \sum \text{Revenue vodafone} $$

- *Addressable Market*: total available market for a given customer in a given country, for the Mobility product
- Note: we drop the $\sum$ when the KPI is computed at the customer level
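
As a worked illustration of the two KPIs, here is a minimal sketch on made-up numbers. The frame `toy` and its column names are assumptions for the example, not columns of the real dataset. It shows the customer-level version (no sum) and the aggregate version (ratio of sums).

```python
import pandas as pd

# Hypothetical customer-level data: revenue and addressable market (AM)
toy = pd.DataFrame({
    "customer": ["a", "b", "c"],
    "revenue": [20.0, 30.0, 50.0],
    "addressable_market": [100.0, 60.0, 240.0],
})

# Customer level: the sums are dropped
toy["SoW"] = toy["revenue"] / toy["addressable_market"]
toy["whitespace"] = toy["addressable_market"] - toy["revenue"]

# Aggregate level: ratio of sums, which differs from the mean of the ratios
sow_total = toy["revenue"].sum() / toy["addressable_market"].sum()
whitespace_total = toy["addressable_market"].sum() - toy["revenue"].sum()

print(sow_total, whitespace_total)
```

Note that the aggregate Share of Wallet is a ratio of sums, so it generally differs from the average of the customer-level ratios.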

## Variables

In the table below, we summarize the data by country. For each country in the dataset, we compute:

- `nb_customers`: Number of active customers in country `c`
- `nb_industries`: Number of industries in country `c` 
- `sum_revenue`: Sum of revenue in country `c`
- `rank_revenue`: Worldwide rank of country `c` (descending order) in terms of revenue
- `sum_AM`: Total Addressable Market (i.e. market size for the customers) in country `c`
- `rank_AM`: Worldwide rank of country `c` (descending order) in terms of potential
- `sum_whitespace`: Sum of whitespace (sum_AM - sum_revenue) in country `c`
- `SoW`: Share of Wallet (sum_revenue / sum_AM) in country `c`
- `penetration_rate`: Penetration rate (sum_AM / sum_revenue) in country `c`
- `rank_penetration`: Worldwide rank of country `c` (descending order) in terms of penetration
- `avg_spent`: Average spend at the customer level in country `c`
- `avg_AM`: Average Addressable market at the customer level in country `c`
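
The country-level variables above can be sketched with a single named aggregation. This is a minimal example on invented data; the column names `country`, `customer`, `industry`, `revenue` and `AM` are placeholders, not the notebook's real columns.

```python
import pandas as pd

# Hypothetical customer-level rows
toy = pd.DataFrame({
    "country": ["FR", "FR", "FR", "DE", "DE"],
    "customer": ["a", "b", "c", "d", "e"],
    "industry": ["retail", "retail", "energy", "energy", "bank"],
    "revenue": [10.0, 30.0, 60.0, 40.0, 10.0],
    "AM": [50.0, 50.0, 100.0, 200.0, 50.0],
})

by_country = (
    toy.groupby("country")
    .agg(
        nb_customers=("customer", "nunique"),
        nb_industries=("industry", "nunique"),
        sum_revenue=("revenue", "sum"),
        sum_AM=("AM", "sum"),
        avg_spent=("revenue", "mean"),
        avg_AM=("AM", "mean"),
    )
    .assign(
        # Rank 1 goes to the largest value
        rank_revenue=lambda x: x["sum_revenue"].rank(ascending=False),
        rank_AM=lambda x: x["sum_AM"].rank(ascending=False),
        sum_whitespace=lambda x: x["sum_AM"] - x["sum_revenue"],
        SoW=lambda x: x["sum_revenue"] / x["sum_AM"],
    )
)
print(by_country)
```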

The next set of variables describes the empirical distribution of revenue, Addressable Market and whitespace. By default, the table compares the top 10% of customers with the remaining 90%.

- `rank_customers`: Descending rank of the customer in terms of revenue, i.e. the largest customer in country `c` has rank 1
- `revenue_cumsum_perc`: Cumulative share of revenue (descending order)
- `AM_cumsum_perc`: Cumulative share of Addressable Market (descending order)
- `whitespace_top`: Cumulative whitespace of the top 10% of customers
- `whitespace_bottom`: Cumulative whitespace of the bottom 90% of customers
- `bottom_top_ratio`: `whitespace_bottom / whitespace_top`. A value larger than 1 indicates that the bottom 90% holds more potential than the top 10%
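
To make the top/bottom comparison concrete, the sketch below splits ten invented customers of a single country into the top 10% and the remaining 90% by revenue, then compares their whitespace. The data and the names `toy` and `top_share` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical customers of one country
toy = pd.DataFrame({
    "revenue": [50.0, 20.0, 10.0, 8.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.5],
    "AM": [80.0, 60.0, 40.0, 30.0, 30.0, 20.0, 20.0, 10.0, 5.0, 5.0],
})
top_share = 0.1

# Largest customer first, then cut after the top 10% (at least one customer)
ranked = toy.sort_values("revenue", ascending=False).reset_index(drop=True)
n_top = max(1, int(round(len(ranked) * top_share)))
top, bottom = ranked.iloc[:n_top], ranked.iloc[n_top:]

whitespace_top = (top["AM"] - top["revenue"]).sum()
whitespace_bottom = (bottom["AM"] - bottom["revenue"]).sum()
bottom_top_ratio = whitespace_bottom / whitespace_top
print(whitespace_top, whitespace_bottom, bottom_top_ratio)
```

In this toy case the ratio is well above 1, so most of the untapped potential sits with the smaller customers.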

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import researchpy as rp
#import functions.country_report as vd
cm = sns.light_palette("green", as_cmap=True)
# Warnings are silenced for the presentation
import warnings
warnings.filterwarnings('ignore')

# Agenda

- Definition & Variables
    - KPI
    - Dataset overview
- Worldwide description of the market
    - Top 3 countries 
    - Top 3 partners
    - Worldwide revenue
    - Worldwide Whitespace
- French Market
    - Brief words about French market
    - French market sectors opportunities
    - Co-integrated market analysis
- French customers analysis
    - Target customers with opportunities
    - Which team to leverage

Set the index and drop the unused reference columns. The filter that would remove prospects from the database is left commented out.

In [None]:
df_final = pd.read_csv('dataPandasClass_UseCase.gz',
                       compression = 'gzip')
index = ['ID',
         'Country_name',
         'IncomeGroup',
         'Languages',
         'English',
         'French',
         'Relationship',
         'Region',
         'industry', 
        'country_ref']

df_final = df_final.set_index(index).drop(columns = ['A_reference',
                                                     'B_reference',
                                                     'C_reference',
                                                     'D_reference'])
#.loc[lambda x: (x['TARGET_reference'] > 0)
    #]
df_final.head()

In [None]:
list(df_final)

## Dataset overview

### Create table 1

- How many customers + prospects in each region of the world and by market (relationship)

Step 1: create a table with customers

In [None]:
custom = (df_final
          .loc[lambda x: x['TARGET_reference'] > 0]
          .reset_index('ID')
          .groupby(['Region', 'Relationship'])['ID']
          .count()
          .unstack(-1, fill_value=0)
          )

Step 2: create a table with prospects

In [None]:
prospect = (df_final
            .loc[lambda x: x['TARGET_reference'] <= 0]
            .reset_index('ID')
            .groupby(['Region', 'Relationship'])['ID']
            .count()
            .unstack(-1, fill_value=0)
            )
prospect

Step 3: All customers + prospects

In [None]:
all_ = (df_final
            .reset_index('ID')
            .groupby(['Region', 'Relationship'])['ID']
            .count()
            .unstack(-1, fill_value=0)
            )
all_

Step 4: Merge tables

In [None]:
df_customers = pd.concat([all_, prospect, custom], axis = 1)
df_customers.head()

Step 5: Create percentage of customers

In [None]:
df_customers_ = df_customers.assign(
    perc_First=lambda x: x.iloc[:, 6] / x.iloc[:, 0],
    perc_Second=lambda x: x.iloc[:, 7] / x.iloc[:, 1],
    perc_Third=lambda x: x.iloc[:, 8] / x.iloc[:, 2]
)
df_customers_.head()

Step 6: Rebuild the column labels as a MultiIndex

In [None]:
# Labels follow the concat order: all, prospects, customers, then percentages
columns=[
    ('All','Third'),
    ('All','Second'),
    ('All', 'First'),
    ('Prospects','Third_prosp'),
    ('Prospects','Second_prosp'),
    ('Prospects', 'First_prosp'),
    ('Customers','Third_cust'),
    ('Customers','Second_cust'),
    ('Customers', 'First_cust'),
    ('perc','Third_perc'),
    ('perc','Second_perc'),
    ('perc', 'First_perc')]
df_customers_.columns=pd.MultiIndex.from_tuples(columns)
df_customers_
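
Relabeling columns by position is easy to get out of sync with the concat order. As a possible alternative, here is a sketch on invented counts that passes a dictionary to `pd.concat`, so each block carries its own top-level label. The frames `all_toy`, `cust_toy` and `prosp_toy` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical counts by region (rows) and relationship (columns)
idx = pd.Index(["Americas", "Europe"], name="Region")
all_toy = pd.DataFrame({"First": [10, 20], "Second": [4, 8]}, index=idx)
cust_toy = pd.DataFrame({"First": [6, 5], "Second": [1, 8]}, index=idx)
prosp_toy = all_toy - cust_toy

# Dictionary keys become the outer level of the column MultiIndex
table = pd.concat(
    {"All": all_toy, "Customers": cust_toy, "Prospects": prosp_toy},
    axis=1,
)

# Share of customers, computed by label rather than by position
perc = table["Customers"] / table["All"]
table = pd.concat([table, pd.concat({"perc": perc}, axis=1)], axis=1)
print(table)
```

Selecting `table["Customers"]` by label keeps the percentage correct even if the blocks are concatenated in a different order.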

In [None]:
df_customers_.iloc[:, -3:].fillna(0).style.format("{:.2%}")

In [None]:
(df_final
 .loc[lambda x: 
      (x['TARGET_reference'] > 0) 
     & (x.index.get_level_values('Region').isin(['Americas']))
     ]
)

## Worldwide description of the market

Objective:

- Create a table with the following variables:
    - 'nb_customers',
    - 'nb_industries',
    - 'sum_revenue',
    - 'rank_revenue',
    - 'sum_AM',
    - 'rank_AM',
    - 'sum_whitespace',
    - 'rank_whitespace',
    - 'SoW',
    - 'avg_spent',
    - 'avg_AM',
    - 'rank_customers',
    - 'revenue_cumsum_perc',
    - 'AM_cumsum_perc',
    - 'whitespace_top',
    - 'whitespace_bottom',
    - 'bottom_top_ratio'
    
The outcomes can be viewed from this [link](https://1drv.ms/x/s!AkDhd3h9fJNWhSqO80XCllh7qV2r?e=TdRcJr)

### Compute basic stat

In [None]:
df_final.head()

In [None]:
# Index levels used to aggregate TARGET_reference and TARGET_potential
grouping = ['Country_name',
            'IncomeGroup',
            'country_ref',
            'Relationship']

In [None]:
# Move ID and industry out of the index so customers and industries can be counted without a merge
df_agg0 = (df_final
          .reset_index(['ID', 'industry'])
         )
df_agg0.head()

In [None]:
# Aggregate by group; string names avoid the deprecation warning raised for np.sum/np.mean
df_agg1 = (df_agg0
           .groupby(grouping)
           .agg(
               sum_reference=('TARGET_reference', 'sum'),
               sum_potential=('TARGET_potential', 'sum'),
               avg_referencet=('TARGET_reference', 'mean'),
               avg_potential=('TARGET_potential', 'mean'),
               nb_customers=('ID', 'nunique'),
               nb_industries=('industry', 'nunique')
           )
           )
df_agg1.head()

In [None]:
df_agg1.shape

Create the ranks

In [None]:
df_agg1.sort_values(by='sum_reference').head()

In [None]:
df_agg1.assign(
    rank_revenue=lambda x: x['sum_reference'].rank(ascending=False),
    rank_potential=lambda x: x['sum_potential'].rank(ascending=False),
).sort_values(by='rank_revenue', ascending=True)
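
A short sketch of how `rank(ascending=False)` behaves, including ties, on invented revenue figures. By default pandas assigns tied values the average of their ranks; `method="min"` is shown as one alternative.

```python
import pandas as pd

# Hypothetical revenue per country, with a tie between FR and IT
revenue = pd.Series({"FR": 100.0, "DE": 250.0, "IT": 100.0, "ES": 40.0})

rank_avg = revenue.rank(ascending=False)                # default method="average"
rank_min = revenue.rank(ascending=False, method="min")  # ties share the lowest rank
print(rank_avg.to_dict())
print(rank_min.to_dict())
```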

In [None]:
df_agg = (df_final
          .reset_index(['ID', 'industry'])
          .groupby(level=grouping)
          .agg(
              sum_reference=('TARGET_reference', 'sum'),
              sum_potential=('TARGET_potential', 'sum'),
              avg_referencet=('TARGET_reference', 'mean'),
              avg_potential=('TARGET_potential', 'mean'),
              nb_customers=('ID', 'nunique'),
              nb_industries=('industry', 'nunique')
          )
          .sort_values(by='sum_reference')
          .assign(
              rank_revenue=lambda x:
              x['sum_reference'].rank(ascending=False),
              rank_AM=lambda x:
              x['sum_potential'].rank(ascending=False),
          )
          .sort_values(by='rank_revenue', ascending=True)
         )
df_agg.head()

### Compute and merge cumulated revenue/potential

In [None]:
def percentage_cum(df, grouping):
    """
    Compute the cumulative distribution of revenue and potential.

    Args:
        df: A dataframe with the following variables:
            - TARGET_reference (revenue)
            - TARGET_potential (addressable market)
            and the variables to group (in the index preferably).
            Rows must already be sorted by descending revenue.
        grouping: Variables in index to group
        Note, industry and customer ID should be in index
        Only countries with positive revenue are included and when potential
        is larger than Vodafone revenue

    Returns:
        The input dataframe with, among others:
        - rank_customers: Position of customer n in terms of revenue within
        its group (starts at 1)
        - rank_customers_perc: rank_customers divided by the number of
        customers in the group
        - revenue_perc: Percentage of the revenue of customer i in total revenue
        - revenue_cumsum: cumulative sum of revenue of the first n customers
        - revenue_cumsum_perc: cumulative share of revenue of the first n
        customers (bottom 0, max 1)
        - potential_perc: Percentage of the potential of customer i in total
        potential
        - potential_cumsum: cumulative sum of potential of the first n customers
        - AM_cumsum_perc: cumulative share of potential of the first n
        customers (bottom 0, max 1)
        - whitespace_top: cumulative whitespace of the first n customers
        - whitespace_bottom: whitespace of the remaining customers
        - bottom_top_ratio: whitespace_bottom / whitespace_top

    """

    df_ = (df
           .groupby(level=grouping)
           .agg(
               rank_customers=('TARGET_reference', 'cumcount'),
               revenue_cumsum=('TARGET_reference', 'cumsum'),
               potential_cumsum=('TARGET_potential', 'cumsum')
           )
           )

    # cumcount starts at 0, so shift the rank to start at 1
    df_['rank_customers'] = df_['rank_customers'] + 1
    df_['total_sum_TARGET'] = (df['TARGET_reference']
                               .groupby(level=grouping)
                               .transform('sum'))
    df_['total_sum_potential'] = (df['TARGET_potential']
                                  .groupby(level=grouping)
                                  .transform('sum'))
    df_['total_customers'] = (df_['rank_customers']
                              .groupby(level=grouping)
                              .transform('max'))

    df_1 = (df_
            .merge(df,
                   left_index=True,
                   right_index=True)
            .assign(
                revenue_perc=lambda x: x['TARGET_reference'] /
                x['total_sum_TARGET'],
                potential_perc=lambda x: x['TARGET_potential'] /
                x['total_sum_potential'],
                rank_customers_perc=lambda x: x['rank_customers'] /
                x['total_customers'],
            )
            )

    df_1['revenue_cumsum_perc'] = (df_1['revenue_perc']
                                   .groupby(level=grouping)
                                   .cumsum())
    df_1['AM_cumsum_perc'] = (df_1['potential_perc']
                              .groupby(level=grouping)
                              .cumsum())
    df_1['whitespace_top'] = df_1['potential_cumsum'] - df_1['revenue_cumsum']
    df_1['whitespace_bottom'] = (df_1['total_sum_potential'] -
                                 df_1['potential_cumsum']) - \
    (df_1['total_sum_TARGET'] -df_1['revenue_cumsum'])

    df_1['bottom_top_ratio'] = df_1['whitespace_bottom'] / \
        df_1['whitespace_top']
    return df_1
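
The core idea of the function, ranking customers within a group and accumulating their share of revenue, can be sketched on invented data with plain `groupby` calls. This is a simplified illustration under assumed column names (`country`, `revenue`), not a replacement for `percentage_cum`.

```python
import pandas as pd

# Hypothetical customers in two countries
toy = pd.DataFrame({
    "country": ["FR", "FR", "FR", "DE", "DE"],
    "revenue": [60.0, 30.0, 10.0, 40.0, 10.0],
})

# Largest customer first within each country
toy = toy.sort_values(["country", "revenue"], ascending=[True, False])
g = toy.groupby("country")["revenue"]

toy["rank_customers"] = g.cumcount() + 1          # cumcount starts at 0
toy["revenue_cumsum"] = g.cumsum()                # running revenue total
toy["revenue_cumsum_perc"] = toy["revenue_cumsum"] / g.transform("sum")
print(toy)
```

The last customer of each group always reaches a cumulative share of 1, which is a quick sanity check on the sort order and the grouping.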

In [None]:
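### Share of customers that defines the "top" group
topN = 0.1
# Sort customers by revenue within each group; the guard keeps the cell idempotent
if 'TARGET_reference' not in grouping:
    grouping.append('TARGET_reference')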
reorder = ['nb_customers',
               'nb_industries',
               'sum_revenue',
               'rank_revenue',
               'sum_AM',
               'rank_AM',
               'sum_whitespace',
               'rank_whitespace',
               'SoW',
               'avg_spent',
               'avg_AM',
               'rank_customers',
               'revenue_cumsum_perc',
               'AM_cumsum_perc',
               'whitespace_top',
               'whitespace_bottom',
               'bottom_top_ratio']

In [None]:
df_ = (df_final
           .reset_index(['Languages', 'English', 'French', 'Region'])
           .sort_values(by=grouping, ascending=False)
           .groupby(level='Country_name')
           .apply(
               lambda x: percentage_cum(x, grouping[:-1]),
           )
           .assign(temp_top=lambda x: np.where(
               np.around(x['total_customers'] * topN) < 1,
               1,
               np.around(x['total_customers'] * topN)
           )
           )
           .loc[lambda x: (x['rank_customers'] <= x['temp_top'])]
           .reset_index(['ID', 'industry'], drop=True)
           #.reset_index()
           .groupby(level=grouping[:-1])
           .apply(
               lambda x: x.loc[lambda x: (
                  x['rank_customers_perc'] == x['rank_customers_perc'].max())]
           )
           .reset_index(level=[4, 5, 6, 7], drop=True)
           .reindex(columns=['rank_customers',
                             'rank_customers_perc',
                             'revenue_cumsum_perc',
                             'AM_cumsum_perc',
                             'whitespace_top',
                             'whitespace_bottom',
                             'bottom_top_ratio'
                             ])
           )
df_.head()
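
The `temp_top` rule keeps at least one customer per group even when 10% of the group rounds down to zero. A minimal sketch of that cutoff on invented group sizes follows; `top_share` and `total_customers` are placeholder names. Note that `np.around` rounds exact halves to the nearest even value.

```python
import numpy as np

top_share = 0.1
# Hypothetical number of customers per group
total_customers = np.array([3, 8, 12, 47, 200])

# Size of the "top" group: rounded share, floored at one customer
n_top = np.maximum(1, np.around(total_customers * top_share)).astype(int)
print(n_top)
```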

In [None]:
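top_ = (df_agg.merge(df_,
                     left_index=True,
                     right_index=True)
        .sort_values(by='sum_reference')
        # Map the aggregate names to the names listed in `reorder`
        .rename(columns={'sum_reference': 'sum_revenue', 'sum_potential': 'sum_AM',
                         'avg_referencet': 'avg_spent', 'avg_potential': 'avg_AM'})
        .assign(sum_whitespace=lambda x: x['sum_AM'] - x['sum_revenue'],
                SoW=lambda x: x['sum_revenue'] / x['sum_AM'],
                rank_whitespace=lambda x:
                x['sum_whitespace'].rank(ascending=False))
        .reindex(columns=reorder)
       )
top_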

In [None]:
top_.to_excel('test.xlsx')

In [None]:
def format_row_wise(styler, formatterA, formatterB=None, to_exclude=()):
    """
    Apply one formatter per row of a Styler.

    formatterA formats every column not listed in to_exclude; formatterB,
    if given, formats the columns whose second level is in to_exclude.

    Thanks to
    https://stackoverflow.com/questions/52783419/format-pandas-dataframe-row-wise
    """
    for row, row_formatter in formatterA.items():
        row_num = styler.index.get_loc(row)

        for col_num, col_name in enumerate(styler.columns):
            if col_name not in to_exclude:
                # _display_funcs is a private Styler attribute
                styler._display_funcs[(row_num, col_num)] = row_formatter
    if formatterB is not None:
        for row, row_formatter in formatterB.items():
            row_num = styler.index.get_loc(row)

            for col_num, col_name in enumerate(styler.columns):
                if col_name[1] in to_exclude:
                    styler._display_funcs[(row_num, col_num)] = row_formatter
    return styler
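
Because `_display_funcs` is a private attribute that may change between pandas versions, one possible fallback is to format each row into strings before display. The sketch below is an assumption-laden alternative, not the approach used in this notebook; `format_rows`, `summary` and `row_formats` are hypothetical names.

```python
import pandas as pd

# Hypothetical transposed summary: one row per metric, one column per country
summary = pd.DataFrame(
    {"FR": [1200000.0, 0.254], "DE": [800000.0, 0.1]},
    index=["sum_revenue", "SoW"],
)
row_formats = {
    "sum_revenue": lambda v: f"€{v:,.0f}",
    "SoW": lambda v: f"{v:.2%}",
}


def format_rows(df, formats):
    """Return an object-dtype copy of df with one formatter applied per row."""
    out = df.astype(object)
    for label, fmt in formats.items():
        out.loc[label] = df.loc[label].map(fmt)
    return out


formatted = format_rows(summary, row_formats)
print(formatted)
```

The trade-off is that the result holds strings, so it suits final display only and not further computation.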

In [None]:
n = 3
top_3 = (top_
         .droplevel(level=1)
         .sort_values(by='rank_revenue')
         .reindex(columns=['nb_customers',
                           'sum_revenue',
                           'sum_AM',
                           'rank_AM',
                           'SoW',
                           'revenue_cumsum_perc',
                           'AM_cumsum_perc'])
         .head(n)
         .reset_index(['country_ref'], drop=True)
         .T
         )

formatters = {"sum_revenue": lambda x: f"€{x:,.0f}",
              "sum_AM": lambda x: f"€{x:,.0f}",
              "SoW": lambda x: f"{x:,.2%}",
              "revenue_cumsum_perc": lambda x: f"{x:,.2%}",
              "AM_cumsum_perc": lambda x: f"{x:,.2%}"
              }
styler = format_row_wise(top_3.style, formatters)
styler

In [None]:
# Only columns present in `reorder` can be styled
(top_
 .style
 .bar(subset=['sum_revenue',
              'sum_AM',
              'sum_whitespace',
              'avg_spent',
              'avg_AM',
              'rank_AM',
              'rank_whitespace',
              'whitespace_top',
              'whitespace_bottom'],
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format("{:.1%}", subset=['SoW',
                           'revenue_cumsum_perc',
                           'AM_cumsum_perc'])
 .format('€{0:,.0f}', subset=['sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'avg_spent',
                              'avg_AM',
                              'whitespace_top',
                              'whitespace_bottom'])
 )

### Worldwide map

In [None]:
import plotly.express as px

In [None]:
fig = px.choropleth(top_.reset_index(),
                    locations="country_ref",
                    color="sum_revenue",
                    hover_name="Relationship",
                    title = 'Worldwide operating revenues')

#fig.layout.autosize = True
fig.layout.width = 800
fig.layout.height = 600

fig.show()