# Use Case

## Definition & Variable

### KPIs

**Share of Wallet, SoW**

$$\text{Share of Wallet} = \sum  \text{Revenue vodafone} / \sum \text{Addressable Market} $$

**Whitespace**

$$\text{Whitespace} = \sum \text{Addressable Market} -  \sum \text{Revenue vodafone} $$

- *Addressable Market*: total available market for a given customer in a given country, for the Mobility product
- Note: the $\sum$ is dropped when the KPI is computed at the customer level
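
As a minimal sketch of the two KPIs, the snippet below aggregates a toy table by country. The column names `revenue` and `addressable_market` and all values are illustrative assumptions; in the dataset used later, `TARGET_reference` and `TARGET_potential` appear to play these roles.

```python
import pandas as pd

# Toy data: column names and values are illustrative assumptions.
toy = pd.DataFrame({
    "country": ["A", "A", "B"],
    "revenue": [20.0, 30.0, 10.0],
    "addressable_market": [100.0, 100.0, 40.0],
})

# Sum both quantities per country, then derive the KPIs from the sums.
kpi = toy.groupby("country")[["revenue", "addressable_market"]].sum()
kpi["whitespace"] = kpi["addressable_market"] - kpi["revenue"]
kpi["SoW"] = kpi["revenue"] / kpi["addressable_market"]
print(kpi)
```

Computing the ratio from the sums, rather than averaging customer-level ratios, matches the formula above.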

## Definition

In the table below, we summarize the data by country. For each country in the dataset, we compute the:

- `nb_customers`: Number of active customers in country `c`
- `nb_industries`: Number of industries in country `c` 
- `sum_revenue`: Sum of revenue in country `c`
- `rank_revenue`: Worldwide rank of country `c` (descending order) in terms of revenue
- `sum_AM`: Total Addressable market (i.e. market size for the customers) in country `c`
- `rank_AM`: Worldwide rank of country `c` (descending order) in terms of potential
- `sum_whitespace`: Sum of whitespace (`sum_AM - sum_revenue`) in country `c`
- `SoW`: Share of Wallet (`sum_revenue / sum_AM`) in country `c`
- `penetration_rate`: Penetration rate (`sum_AM / sum_revenue`) in country `c`
- `rank_penetration`: Worldwide rank of country `c` (descending order) in terms of penetration
- `avg_spent`: Average spend at the customer level in country `c`
- `avg_AM`: Average Addressable market at the customer level in country `c`

The next set of variables describes the empirical distribution of revenue, Addressable market and whitespace. By default, the table compares the top 10% of customers with the remaining 90%.

- `rank_customers`: Descending rank of the customer in terms of revenue, i.e. the largest customer in country `c` has rank 1
- `revenue_cumsum_perc`: Cumulative share of revenue (customers in descending order)
- `AM_cumsum_perc`: Cumulative share of Addressable market (customers in descending order)
- `whitespace_top`: Cumulative whitespace of the top 10% of customers
- `whitespace_bottom`: Cumulative whitespace of the bottom 90% of customers
- `bottom_top_ratio`: `whitespace_bottom / whitespace_top`. A value larger than 1 indicates that the bottom 90% hold more potential than the top 10%
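
The top/bottom split can be sketched on a single toy country as follows. The helper `bottom_top_ratio` and the numbers are hypothetical; the sketch only assumes, as the notebook does later, that the top group holds `round(n * 0.1)` customers with a minimum of one.

```python
import numpy as np
import pandas as pd

def bottom_top_ratio(revenue, market, top_share=0.1):
    """Whitespace of the bottom customers divided by that of the top ones.

    Customers are ranked by revenue (descending); the top group holds
    round(n * top_share) customers, with a minimum of one.
    """
    df = pd.DataFrame({"revenue": revenue, "market": market})
    df = df.sort_values("revenue", ascending=False)
    n_top = max(1, int(np.around(len(df) * top_share)))
    whitespace = df["market"] - df["revenue"]
    top = whitespace.iloc[:n_top].sum()
    bottom = whitespace.iloc[n_top:].sum()
    return bottom / top

# Ten hypothetical customers: one large account and nine small ones.
revenue = [90, 10, 8, 6, 5, 4, 3, 2, 1, 1]
market = [100, 30, 20, 20, 15, 14, 13, 12, 11, 11]
ratio = bottom_top_ratio(revenue, market)
print(ratio)
```

Here the single top customer has a whitespace of 10 while the other nine add up to 106, so the ratio is well above 1 and most of the remaining potential sits with the smaller accounts.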

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import researchpy as rp
#import functions.country_report as vd
cm = sns.light_palette("green", as_cmap=True)
### can ignore the warning for the presentation
import warnings
warnings.filterwarnings('ignore')


# Agenda

- Definition & Variables
    - KPI
    - Dataset overview
- Worldwide description of the market
    - Top 3 countries 
    - Top 3 partners
    - Worldwide revenue
    - Worldwide Whitespace
- French Market
    - Brief words about French market
    - French market sectors opportunities
    - Co-integrated market analysis
- French customers analysis
    - Target customers with opportunities
    - Which team to leverage

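Set the index and drop the `A_reference` to `D_reference` columns. The filter that would remove prospects (`TARGET_reference > 0`) is commented out, so prospects remain in `df_final`.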

In [2]:
df_final = pd.read_csv('dataPandasClass_UseCase.gz',
                       compression = 'gzip')
index = ['ID',
         'Country_name',
         'IncomeGroup',
         'Languages',
         'English',
         'French',
         'Relationship',
         'Region',
         'industry', 
        'country_ref']

df_final = df_final.set_index(index).drop(columns = ['A_reference',
                                                     'B_reference',
                                                     'C_reference',
                                                     'D_reference'])
#.loc[lambda x: (x['TARGET_reference'] > 0)
    #]
df_final.head()

ID,Country_name,IncomeGroup,Languages,English,French,Relationship,Region,industry,country_ref,TARGET_reference,TARGET_potential
Customer 88,Chad,Low income,"French, Arabic",0,0,Third,Middle East & Africa,Indu_G,TCD,0.0,390.0
Customer 20,Chad,Low income,"French, Arabic",0,0,Third,Middle East & Africa,Indu_H,TCD,0.0,234.0
Customer 523,Mali,Low income,French,0,0,Third,Middle East & Africa,Indu_K,MLI,0.0,404.857143
Customer 1086,Mali,Low income,French,0,0,Third,Middle East & Africa,Indu_C,MLI,0.0,21211.167857
Customer 20,Mali,Low income,French,0,0,Third,Middle East & Africa,Indu_H,MLI,0.0,234.0


In [3]:
list(df_final)

['TARGET_reference', 'TARGET_potential']

## Dataset overview

### Create table 1

- How many customers + prospects in each region of the world and by market (relationship)

Step 1: create a table with customers

In [21]:
custom = (df_final
          .loc[lambda x: x['TARGET_reference'] > 0]
          .reset_index('ID')
          .groupby(['Region', 'Relationship'])['ID']
          .count()
          .unstack(-1, fill_value=0)
          )

Step 2: create a table with prospects

In [22]:
prospect = (df_final
            .loc[lambda x: x['TARGET_reference'] <= 0]
            .reset_index('ID')
            .groupby(['Region', 'Relationship'])['ID']
            .count()
            .unstack(-1, fill_value=0)
            )
prospect

Region,First,Second,Third
Americas,0,826,1982
Asia Pacific,907,1287,1116
Central Europe,569,1020,301
Middle East & Africa,78,121,222
Northern Europe,1063,1808,1
Southern Europe,627,18,2


Step 3: All customers + prospects

In [23]:
all_ = (df_final
            .reset_index('ID')
            .groupby(['Region', 'Relationship'])['ID']
            .count()
            .unstack(-1, fill_value=0)
            )
all_

Region,First,Second,Third
Americas,0,826,1987
Asia Pacific,909,1517,1363
Central Europe,2285,2188,804
Middle East & Africa,1177,189,222
Northern Europe,1928,3035,1
Southern Europe,3092,18,2


Step 4: Merge tables

In [32]:
df_customers = pd.concat([all_, prospect, custom], axis = 1)
df_customers.head()

Region,First,Second,Third,First,Second,Third,First,Second,Third
Americas,0,826,1987,0,826,1982,0,0,5
Asia Pacific,909,1517,1363,907,1287,1116,2,230,247
Central Europe,2285,2188,804,569,1020,301,1716,1168,503
Middle East & Africa,1177,189,222,78,121,222,1099,68,0
Northern Europe,1928,3035,1,1063,1808,1,865,1227,0


Step 5: Compute the percentage of customers

In [34]:
df_customers_ = df_customers.assign(
    perc_First=lambda x: x.iloc[:, 6] / x.iloc[:, 0],
    perc_Second=lambda x: x.iloc[:, 7] / x.iloc[:, 1],
    perc_Third=lambda x: x.iloc[:, 8] / x.iloc[:, 2]
)
df_customers_.head()

Region,First,Second,Third,First,Second,Third,First,Second,Third,perc_First,perc_Second,perc_Third
Americas,0,826,1987,0,826,1982,0,0,5,,0.0,0.002516
Asia Pacific,909,1517,1363,907,1287,1116,2,230,247,0.0022,0.151615,0.181218
Central Europe,2285,2188,804,569,1020,301,1716,1168,503,0.750985,0.533821,0.625622
Middle East & Africa,1177,189,222,78,121,222,1099,68,0,0.93373,0.359788,0.0
Northern Europe,1928,3035,1,1063,1808,1,865,1227,0,0.448651,0.404283,0.0


Step 6: Relabel the columns with a MultiIndex that records their origin

In [40]:
# Labels must follow the concat order: all_, prospect, custom, then the
# three perc_* columns, each ordered First, Second, Third.
columns=[
    ('All', 'First'),
    ('All', 'Second'),
    ('All', 'Third'),
    ('Prospects', 'First_prosp'),
    ('Prospects', 'Second_prosp'),
    ('Prospects', 'Third_prosp'),
    ('Customers', 'First_cust'),
    ('Customers', 'Second_cust'),
    ('Customers', 'Third_cust'),
    ('perc', 'First_perc'),
    ('perc', 'Second_perc'),
    ('perc', 'Third_perc')]
df_customers_.columns=pd.MultiIndex.from_tuples(columns)
df_customers_

,All,All,All,Prospects,Prospects,Prospects,Customers,Customers,Customers,perc,perc,perc
Region,First,Second,Third,First_prosp,Second_prosp,Third_prosp,First_cust,Second_cust,Third_cust,First_perc,Second_perc,Third_perc
Americas,0,826,1987,0,826,1982,0,0,5,,0.0,0.002516
Asia Pacific,909,1517,1363,907,1287,1116,2,230,247,0.0022,0.151615,0.181218
Central Europe,2285,2188,804,569,1020,301,1716,1168,503,0.750985,0.533821,0.625622
Middle East & Africa,1177,189,222,78,121,222,1099,68,0,0.93373,0.359788,0.0
Northern Europe,1928,3035,1,1063,1808,1,865,1227,0,0.448651,0.404283,0.0
Southern Europe,3092,18,2,627,18,2,2465,0,0,0.797219,0.0,0.0


In [46]:
df_customers_.iloc[:, -3:].fillna(0).style.format("{:.2%}")

,perc,perc,perc
Region,First_perc,Second_perc,Third_perc
Americas,0.00%,0.00%,0.25%
Asia Pacific,0.22%,15.16%,18.12%
Central Europe,75.10%,53.38%,62.56%
Middle East & Africa,93.37%,35.98%,0.00%
Northern Europe,44.87%,40.43%,0.00%
Southern Europe,79.72%,0.00%,0.00%
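
Relabelling columns by position, as in the steps above, is easy to get wrong because the labels must match the concat order exactly. A possible alternative, sketched here on toy stand-ins for `all_`, `prospect` and `custom` (all counts are made up), is to let `pd.concat` build the outer column level through its `keys` argument:

```python
import pandas as pd

# Toy stand-ins for the `all_`, `prospect` and `custom` tables (made-up counts).
idx = pd.Index(["Region X", "Region Y"], name="Region")
cols = pd.Index(["First", "Second", "Third"], name="Relationship")
all_toy = pd.DataFrame([[10, 4, 0], [8, 2, 6]], index=idx, columns=cols)
custom_toy = pd.DataFrame([[5, 1, 0], [2, 2, 3]], index=idx, columns=cols)
prospect_toy = all_toy - custom_toy

# `keys` adds an outer column level, so no positional relabelling is needed.
summary = pd.concat(
    [all_toy, prospect_toy, custom_toy],
    axis=1,
    keys=["All", "Prospects", "Customers"],
)

# Share of customers among all accounts, aligned on the Relationship labels.
perc = summary["Customers"] / summary["All"]
perc.columns = pd.MultiIndex.from_product([["perc"], perc.columns])
summary = pd.concat([summary, perc], axis=1)
print(summary)
```

Because the division is aligned on labels rather than on `iloc` positions, reordering the input tables does not change which columns are compared.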


In [47]:
(df_final
 .loc[lambda x: 
      (x['TARGET_reference'] > 0) 
     & (x.index.get_level_values('Region').isin(['Americas']))
     ]
)

ID,Country_name,IncomeGroup,Languages,English,French,Relationship,Region,industry,country_ref,TARGET_reference,TARGET_potential
Customer 109,United States,High income,English,1,0,Third,Americas,Indu_L,USA,5190.03,2609238.0
Customer 999,United States,High income,English,1,0,Third,Americas,Indu_L,USA,186.141429,987806.1
Customer 27,United States,High income,English,1,0,Third,Americas,Indu_L,USA,101909.824286,101909.8
Customer 23,United States,High income,English,1,0,Third,Americas,Indu_G,USA,154984.894286,1797973.0
Customer 10,United States,High income,English,1,0,Third,Americas,Indu_E,USA,43984.428571,9080424.0


## Worldwide description of the market

Objective:

- Create a table with the following variables:
    - 'nb_customers',
    - 'nb_industries',
    - 'sum_revenue',
    - 'rank_revenue',
    - 'sum_AM',
    - 'rank_AM',
    - 'sum_whitespace',
    - 'rank_whitespace',
    - 'SoW',
    - 'avg_spent',
    - 'avg_AM',
    - 'rank_customers',
    - 'revenue_cumsum_perc',
    - 'AM_cumsum_perc',
    - 'whitespace_top',
    - 'whitespace_bottom',
    - 'bottom_top_ratio'
    
The outcomes can be viewed from this [link](https://1drv.ms/x/s!AkDhd3h9fJNWhSqO80XCllh7qV2r?e=TdRcJr)

### Compute basic stat

In [57]:
df_final.head()

ID,Country_name,IncomeGroup,Languages,English,French,Relationship,Region,industry,country_ref,TARGET_reference,TARGET_potential
Customer 88,Chad,Low income,"French, Arabic",0,0,Third,Middle East & Africa,Indu_G,TCD,0.0,390.0
Customer 20,Chad,Low income,"French, Arabic",0,0,Third,Middle East & Africa,Indu_H,TCD,0.0,234.0
Customer 523,Mali,Low income,French,0,0,Third,Middle East & Africa,Indu_K,MLI,0.0,404.857143
Customer 1086,Mali,Low income,French,0,0,Third,Middle East & Africa,Indu_C,MLI,0.0,21211.167857
Customer 20,Mali,Low income,French,0,0,Third,Middle East & Africa,Indu_H,MLI,0.0,234.0


In [60]:
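# Grouping keys for the country-level aggregation
# (value columns: TARGET_reference, TARGET_potential)
grouping = ['Country_name',
            'IncomeGroup',
            'country_ref',
            'Relationship']

# NOTE: df_agg is only built in cell In [79] below, so this preview
# fails on a fresh top-to-bottom run; execute that cell first.
df_agg.head()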

Country_name,IncomeGroup,country_ref,Relationship,sum_reference,sum_potential,avg_referencet,avg_potential,nb_customers,nb_industries,rank_revenue,rank_AM
China,Upper middle income,CHN,Third,-10000.0,155334500.0,-28.901734,448943.632147,346,15,130.0,7.0
Hong Kong,High income,HKG,Second,-1874.932857,128390700.0,-6.996018,479069.801006,268,16,129.0,11.0
Afghanistan,Low income,AFG,Third,0.0,63619.7,0.0,10603.283221,6,4,85.5,97.0
Nigeria,Lower middle income,NGA,Second,0.0,1410319.0,0.0,20147.417984,70,13,85.5,62.0
Niger,Low income,NER,Third,0.0,133911.7,0.0,66955.827145,2,2,85.5,92.0


In [63]:
### Move ID and industry out of the index so that unique customers and industries can be counted without a merge
df_agg0 = (df_final
          .reset_index(['ID', 'industry'])
         )
df_agg0.head()

Country_name,IncomeGroup,Languages,English,French,Relationship,Region,country_ref,ID,industry,TARGET_reference,TARGET_potential
Chad,Low income,"French, Arabic",0,0,Third,Middle East & Africa,TCD,Customer 88,Indu_G,0.0,390.0
Chad,Low income,"French, Arabic",0,0,Third,Middle East & Africa,TCD,Customer 20,Indu_H,0.0,234.0
Mali,Low income,French,0,0,Third,Middle East & Africa,MLI,Customer 523,Indu_K,0.0,404.857143
Mali,Low income,French,0,0,Third,Middle East & Africa,MLI,Customer 1086,Indu_C,0.0,21211.167857
Mali,Low income,French,0,0,Third,Middle East & Africa,MLI,Customer 20,Indu_H,0.0,234.0


In [66]:
# group
df_agg1 = (df_agg0
           .groupby(grouping)
           .agg(
               sum_reference=('TARGET_reference', 'sum'),
               sum_potential=('TARGET_potential', 'sum'),
               avg_referencet=('TARGET_reference', 'mean'),
               avg_potential=('TARGET_potential', 'mean'),
               nb_customers=('ID', 'nunique'),
               nb_industries=('industry', 'nunique')
           )
           )
df_agg1.head()

Country_name,IncomeGroup,country_ref,Relationship,sum_reference,sum_potential,avg_referencet,avg_potential,nb_customers,nb_industries
Afghanistan,Low income,AFG,Third,0.0,63619.7,0.0,10603.283221,6,4
Albania,Upper middle income,ALB,First,152090.173783,241416.0,2493.281537,3957.639794,61,13
Angola,Lower middle income,AGO,Third,0.0,177914.4,0.0,12708.172521,14,7
Argentina,Upper middle income,ARG,Second,0.0,13868690.0,0.0,91845.607357,151,16
Armenia,Upper middle income,ARM,Second,0.0,630655.3,0.0,630655.251824,1,1


In [68]:
df_agg1.shape

(130, 6)

Create the ranks

In [67]:
df_agg1.sort_values(by='sum_reference').head()

Country_name,IncomeGroup,country_ref,Relationship,sum_reference,sum_potential,avg_referencet,avg_potential,nb_customers,nb_industries
China,Upper middle income,CHN,Third,-10000.0,155334500.0,-28.901734,448943.632147,346,15
Hong Kong,High income,HKG,Second,-1874.932857,128390700.0,-6.996018,479069.801006,268,16
Afghanistan,Low income,AFG,Third,0.0,63619.7,0.0,10603.283221,6,4
Nigeria,Lower middle income,NGA,Second,0.0,1410319.0,0.0,20147.417984,70,13
Niger,Low income,NER,Third,0.0,133911.7,0.0,66955.827145,2,2


In [74]:
df_agg1.assign(
    rank_revenue=lambda x: x['sum_reference'].rank(ascending=False),
    rank_potential=lambda x: x['sum_potential'].rank(ascending=False),
).sort_values(by='rank_revenue', ascending=True)

Country_name,IncomeGroup,country_ref,Relationship,sum_reference,sum_potential,avg_referencet,avg_potential,nb_customers,nb_industries,rank_revenue,rank_potential
Germany,High income,DEU,First,3.833645e+07,2.910431e+08,39238.940998,297894.683234,977,17,1.0,4.0
United Kingdom,High income,GBR,First,3.246468e+07,3.005992e+08,34030.066844,315093.457198,954,16,2.0,3.0
Italy,High income,ITA,First,2.300417e+07,1.299793e+08,24603.392467,139015.290347,935,16,3.0,10.0
Switzerland,High income,CHE,Second,1.737817e+07,4.362785e+08,29207.015960,733241.106905,595,16,4.0,2.0
South Africa,Upper middle income,ZAF,First,1.447401e+07,2.614501e+07,26126.365212,47193.159323,554,17,5.0,29.0
...,...,...,...,...,...,...,...,...,...,...,...
Guatemala,Upper middle income,GTM,Third,0.000000e+00,1.921534e+05,0.000000,38430.689366,5,3,85.5,87.0
Kazakhstan,Upper middle income,KAZ,Third,0.000000e+00,5.662627e+04,0.000000,11325.253072,5,4,85.5,99.0
Lithuania,High income,LTU,Second,0.000000e+00,1.537410e+06,0.000000,33421.962584,46,11,85.5,61.0
Hong Kong,High income,HKG,Second,-1.874933e+03,1.283907e+08,-6.996018,479069.801006,268,16,129.0,11.0
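
The rank of 85.5 shared by the zero-revenue countries above is what `Series.rank` produces with its default `method='average'`: tied values receive the mean of the positions they span. A small sketch with made-up values, showing `method='min'` as one alternative if integer ranks are preferred:

```python
import pandas as pd

# Made-up revenues with three countries tied at zero.
revenue = pd.Series([300.0, 0.0, 120.0, 0.0, 0.0],
                    index=["A", "B", "C", "D", "E"])

# Default: tied values get the average of the positions they occupy.
avg_rank = revenue.rank(ascending=False)
# Alternative: tied values all get the best (lowest) position.
min_rank = revenue.rank(ascending=False, method="min")

print(pd.DataFrame({"revenue": revenue, "average": avg_rank, "min": min_rank}))
```

In this toy case the three tied countries occupy positions 3 to 5, so they are ranked 4.0 with the default and 3.0 with `method='min'`.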


In [79]:
df_agg = (df_final
          .reset_index(['ID', 'industry'])
          .groupby(level=grouping)
          .agg(
              sum_reference=('TARGET_reference', 'sum'),
              sum_potential=('TARGET_potential', 'sum'),
              avg_referencet=('TARGET_reference', 'mean'),
              avg_potential=('TARGET_potential', 'mean'),
              nb_customers=('ID', 'nunique'),
              nb_industries=('industry', 'nunique')
          )
          .sort_values(by='sum_reference')
          .assign(
              rank_revenue=lambda x:
              x['sum_reference'].rank(ascending=False),
              rank_AM=lambda x:
              x['sum_potential'].rank(ascending=False),
          )
          .sort_values(by='rank_revenue', ascending=True)
         )
df_agg.head()

Country_name,IncomeGroup,country_ref,Relationship,sum_reference,sum_potential,avg_referencet,avg_potential,nb_customers,nb_industries,rank_revenue,rank_AM
Germany,High income,DEU,First,38336450.0,291043100.0,39238.940998,297894.683234,977,17,1.0,4.0
United Kingdom,High income,GBR,First,32464680.0,300599200.0,34030.066844,315093.457198,954,16,2.0,3.0
Italy,High income,ITA,First,23004170.0,129979300.0,24603.392467,139015.290347,935,16,3.0,10.0
Switzerland,High income,CHE,Second,17378170.0,436278500.0,29207.01596,733241.106905,595,16,4.0,2.0
South Africa,Upper middle income,ZAF,First,14474010.0,26145010.0,26126.365212,47193.159323,554,17,5.0,29.0


### Compute and merge cumulated revenue/potential

In [80]:
def percentage_cum(df, grouping):
    """
    Compute the cumulative distribution of revenue and potential.

    Args:
        df: A dataframe with the following variables:
            - TARGET_reference (revenue)
            - TARGET_potential (Addressable market)
            and the grouping variables in the index. Rows must already be
            sorted by revenue in descending order within each group.
        grouping: Index levels to group by
        Note: no rows are filtered here; any restriction (for example to
        customers with positive revenue) must be applied beforehand.

    Returns:
        The input columns plus:
        - rank_customers: position of customer n in terms of revenue among
        all customers of the group (starts at 1)
        - rank_customers_perc: rank_customers divided by the number of
        customers in the group
        - revenue_perc: share of customer i in total revenue
        - revenue_cumsum: cumulative sum of revenue over the first n customers
        - revenue_cumsum_perc: cumulative share of revenue over the first n
        customers (bottom 0, max 1)
        - potential_perc: share of customer i in total potential
        - potential_cumsum: cumulative sum of potential over the first n
        customers
        - AM_cumsum_perc: cumulative share of potential over the first n
        customers (bottom 0, max 1)
        - whitespace_top: whitespace of the first n customers
        - whitespace_bottom: whitespace of the remaining customers
        - bottom_top_ratio: whitespace_bottom / whitespace_top

    """

    df_ = (df
           .groupby(level=grouping)
           .agg(
               rank_customers=('TARGET_reference', 'cumcount'),
               revenue_cumsum=('TARGET_reference', 'cumsum'),
               potential_cumsum=('TARGET_potential', 'cumsum')
           )
           )

    # cumcount starts at 0, so shift the ranks to start at 1
    df_['rank_customers'] = df_['rank_customers'] + 1
    df_['total_sum_TARGET'] = (df['TARGET_reference']
                               .groupby(level=grouping)
                               .transform('sum')
                               )
    df_['total_sum_potential'] = (df['TARGET_potential']
                                  .groupby(level=grouping)
                                  .transform('sum')
                                  )
    df_['total_customers'] = (df_['rank_customers']
                              .groupby(level=grouping)
                              .transform('max')
                              )

    df_1 = (df_
            .merge(df,
                   left_index=True,
                   right_index=True)
            .assign(
                revenue_perc=lambda x: x['TARGET_reference'] /
                x['total_sum_TARGET'],
                potential_perc=lambda x: x['TARGET_potential'] /
                x['total_sum_potential'],
                rank_customers_perc=lambda x: x['rank_customers'] /
                x['total_customers'],
            )
            )

    df_1['revenue_cumsum_perc'] = (df_1['revenue_perc']
                                   .groupby(level=grouping)
                                   .cumsum()
                                   )
    df_1['AM_cumsum_perc'] = (df_1['potential_perc']
                              .groupby(level=grouping)
                              .cumsum()
                              )
    # Whitespace held by the first n customers vs. the remaining ones
    df_1['whitespace_top'] = df_1['potential_cumsum'] - df_1['revenue_cumsum']
    df_1['whitespace_bottom'] = (
        (df_1['total_sum_potential'] - df_1['potential_cumsum'])
        - (df_1['total_sum_TARGET'] - df_1['revenue_cumsum'])
    )

    df_1['bottom_top_ratio'] = (df_1['whitespace_bottom']
                                / df_1['whitespace_top'])
    return df_1
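
The core of `percentage_cum` is a per-group rank and cumulative share. A minimal sketch of that idea on toy data, assuming rows are already sorted by revenue in descending order within each group (the `country` and `revenue` columns and their values are hypothetical):

```python
import pandas as pd

# Toy data, already sorted by revenue (descending) within each country.
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "revenue": [50.0, 30.0, 20.0, 8.0, 2.0],
})

g = toy.groupby("country")["revenue"]
# cumcount starts at 0, so add 1 to obtain a rank starting at 1.
toy["rank_customers"] = g.cumcount() + 1
# Running total divided by the group total gives the cumulative share.
toy["revenue_cumsum_perc"] = g.cumsum() / g.transform("sum")
print(toy)
```

The function above instead sums the individual shares cumulatively, which gives the same result up to floating-point rounding.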

In [81]:
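# Share of customers counted as the "top" group
topN = 0.1
# Append the revenue column as the last sort key; `grouping[:-1]` below
# recovers the original grouping levels. Run this cell only once.
grouping.extend(['TARGET_reference'])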
reorder = ['nb_customers',
               'nb_industries',
               'sum_revenue',
               'rank_revenue',
               'sum_AM',
               'rank_AM',
               'sum_whitespace',
               'rank_whitespace',
               'SoW',
               'avg_spent',
               'avg_AM',
               'rank_customers',
               'revenue_cumsum_perc',
               'AM_cumsum_perc',
               'whitespace_top',
               'whitespace_bottom',
               'bottom_top_ratio']

In [82]:
df_ = (df_final
           .reset_index(['Languages', 'English', 'French', 'Region'])
           .sort_values(by=grouping, ascending=False)
           .groupby(level='Country_name')
           .apply(
               lambda x: percentage_cum(x, grouping[:-1]),
           )
           # Size of the top group: topN of the customers, at least one
           .assign(temp_top=lambda x: np.where(
               np.around(x['total_customers'] * topN) < 1,
               1,
               np.around(x['total_customers'] * topN)
           )
           )
           .loc[lambda x: (x['rank_customers'] <= x['temp_top'])]
           .reset_index(['ID', 'industry'], drop=True)
           #.reset_index()
           .groupby(level=grouping[:-1])
           .apply(
               lambda x: x.loc[lambda x: (
                  x['rank_customers_perc'] == x['rank_customers_perc'].max())]
           )
           .reset_index(level=[4, 5, 6, 7], drop=True)
           .reindex(columns=['rank_customers',
                             'rank_customers_perc',
                             'revenue_cumsum_perc',
                             'AM_cumsum_perc',
                             'whitespace_top',
                             'whitespace_bottom',
                             'bottom_top_ratio'
                             ])
           )
df_.head()

Country_name,IncomeGroup,country_ref,Relationship,rank_customers,rank_customers_perc,revenue_cumsum_perc,AM_cumsum_perc,whitespace_top,whitespace_bottom,bottom_top_ratio
Afghanistan,Low income,AFG,Third,1,0.166667,,0.212159,13497.47,50122.23,3.713455
Albania,Upper middle income,ALB,First,6,0.098361,0.536329,0.337883,0.0,89325.85,inf
Angola,Lower middle income,AGO,Third,1,0.071429,,0.054021,9611.114,168303.3,17.511321
Argentina,Upper middle income,ARG,Second,15,0.099338,,0.21597,2995221.0,10873470.0,3.630271
Armenia,Upper middle income,ARM,Second,1,1.0,,1.0,630655.3,0.0,0.0


In [84]:
top_ = (df_agg.merge(df_,
                     left_index=True,
                     right_index=True)
        .sort_values(by='sum_reference')
        .reindex(columns=reorder)
       )
top_

Country_name,IncomeGroup,country_ref,Relationship,nb_customers,nb_industries,sum_revenue,rank_revenue,sum_AM,rank_AM,sum_whitespace,rank_whitespace,SoW,avg_spent,avg_AM,rank_customers,revenue_cumsum_perc,AM_cumsum_perc,whitespace_top,whitespace_bottom,bottom_top_ratio
China,Upper middle income,CHN,Third,346,15,,130.0,,7.0,,,,,,35,-0.000000,0.083950,1.304035e+07,1.423041e+08,10.912604
Hong Kong,High income,HKG,Second,268,16,,129.0,,11.0,,,,,,27,-0.000000,0.047627,6.114852e+06,1.222777e+08,19.996842
Sri Lanka,Upper middle income,LKA,Third,2,2,,85.5,,80.0,,,,,,1,,0.700892,2.147551e+05,9.164730e+04,0.426753
Nigeria,Lower middle income,NGA,Second,70,13,,85.5,,62.0,,,,,,7,,0.027745,3.912974e+04,1.371190e+06,35.042129
Trinidad and Tobago,High income,TTO,Third,5,3,,85.5,,82.0,,,,,,1,,0.021963,5.503287e+03,2.450646e+05,44.530577
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
South Africa,Upper middle income,ZAF,First,554,17,,5.0,,29.0,,,,,,55,0.683550,0.684679,8.007240e+06,3.663764e+06,0.457556
Switzerland,High income,CHE,Second,595,16,,4.0,,2.0,,,,,,60,0.682253,0.319515,1.275412e+08,2.913590e+08,2.284430
Italy,High income,ITA,First,935,16,,3.0,,10.0,,,,,,94,0.662864,0.436856,4.153362e+07,6.544150e+07,1.575627
United Kingdom,High income,GBR,First,954,16,,2.0,,3.0,,,,,,95,0.728354,0.297193,6.569028e+07,2.024442e+08,3.081798
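
The empty columns in the output above (`sum_revenue`, `sum_AM`, `sum_whitespace`, `rank_whitespace`, `SoW`, `avg_spent`, `avg_AM`) are what `reindex(columns=...)` returns for labels that are absent from the frame: `df_agg` names its columns `sum_reference`, `sum_potential`, `avg_referencet` and `avg_potential`. One possible fix, sketched on a toy frame and assuming that `sum_reference` holds revenue and `sum_potential` the Addressable market, is to rename and derive the missing columns before reindexing:

```python
import pandas as pd

# Toy stand-in for `df_agg` (made-up numbers).
agg_toy = pd.DataFrame(
    {"sum_reference": [40.0, 10.0], "sum_potential": [100.0, 200.0],
     "avg_referencet": [4.0, 1.0], "avg_potential": [10.0, 20.0]},
    index=["X", "Y"],
)

fixed = (agg_toy
         # Map the aggregation names onto the names used in `reorder`.
         .rename(columns={"sum_reference": "sum_revenue",
                          "sum_potential": "sum_AM",
                          "avg_referencet": "avg_spent",
                          "avg_potential": "avg_AM"})
         # Derive the KPIs that the aggregation does not compute.
         .assign(sum_whitespace=lambda x: x["sum_AM"] - x["sum_revenue"],
                 rank_whitespace=lambda x:
                 x["sum_whitespace"].rank(ascending=False),
                 SoW=lambda x: x["sum_revenue"] / x["sum_AM"])
         )
print(fixed)
```

Applied to `df_agg` before the merge, the same `rename` and `assign` calls would let `reindex(columns=reorder)` pick up populated columns instead of creating empty ones.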


In [None]:
top_.to_excel('test.xlsx')

In [None]:
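def format_row_wise(styler, formatterA, formatterB=None, to_exclude=None):
    """
    Apply one formatter per row of a Styler.

    formatterA maps row labels to formatters for every column not listed in
    to_exclude; formatterB (optional) does the same for the columns whose
    second level is listed in to_exclude.
    Relies on the private attribute `Styler._display_funcs`, so it may break
    across pandas versions.

    Thanks to
    https://stackoverflow.com/questions/52783419/format-pandas-dataframe-row-wise
    """
    # Avoid a mutable default argument
    to_exclude = [] if to_exclude is None else to_exclude
    for row, row_formatter in formatterA.items():
        row_num = styler.index.get_loc(row)
        for col_num, col_name in enumerate(styler.columns):
            if col_name not in to_exclude:
                styler._display_funcs[(row_num, col_num)] = row_formatter
    if formatterB is not None:
        for row, row_formatter in formatterB.items():
            row_num = styler.index.get_loc(row)
            for col_num, col_name in enumerate(styler.columns):
                if col_name[1] in to_exclude:
                    styler._display_funcs[(row_num, col_num)] = row_formatter
    return styler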

In [None]:
# Number of countries to show
n = 3
top_3 = (top_
         .droplevel(level=1)
         .sort_values(by='rank_revenue')
         .reindex(columns=['nb_customers',
                           'sum_revenue',
                           'sum_AM',
                           'rank_AM',
                           'SoW',
                           'revenue_cumsum_perc',
                           'AM_cumsum_perc'])
         .head(n)
         .reset_index(['country_ref'], drop=True)
         .T
         )

formatters = {"sum_revenue": lambda x: f"€{x:,.0f}",
              "sum_AM": lambda x: f"€{x:,.0f}",
              "SoW": lambda x: f"{x:,.2%}",
              "revenue_cumsum_perc": lambda x: f"{x:,.2%}",
              "AM_cumsum_perc": lambda x: f"{x:,.2%}"
              }
styler = format_row_wise(top_3.style, formatters)
styler

In [None]:
# 'rank_penetration' and 'penetration_rate' are not columns of top_
# (they are absent from `reorder`), so they are left out of the subsets.
(top_
 .style
 .bar(subset=['sum_revenue',
              'sum_AM',
              'sum_whitespace',
              'avg_spent',
              'avg_AM',
              'rank_AM',
              'rank_whitespace',
              'whitespace_top',
              'whitespace_bottom'],
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format("{:.1%}", subset=['SoW',
                           'revenue_cumsum_perc',
                           'AM_cumsum_perc'])
 .format('€{0:,.0f}', subset=['sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'avg_spent',
                              'avg_AM',
                              'whitespace_top',
                              'whitespace_bottom'])
 )

### Worldwide map

In [None]:
import plotly.express as px

In [None]:
fig = px.choropleth(top_.reset_index(),
                    locations="country_ref",  # ISO-3 country codes
                    color="sum_revenue",
                    hover_name="Relationship",
                    title='Worldwide operating revenues')

# Fixed figure size for the presentation
fig.update_layout(width=800, height=600)

fig.show()