# French Market Analysis

- Presentation Vodafone

![](https://drive.google.com/uc?export=view&id=1gSpJRDs2qT7DwtXr8AorVg3ysdi6734I)


## Agenda

- Definition & Variables
    - KPI
    - Dataset overview
- Worldwide description of the market
    - Top 3 countries 
    - Top 3 partners
    - Worldwide revenue
    - Worldwide Whitespace
- French Market
    - Brief words about French market
    - French market sectors opportunities
    - Co-integrated market analysis
- French customers analysis
    - Target customers with opportunities
    - Which team to leverage

In this presentation, we provide a bottleneck analysis of the French market. We first describe the performance of the customers in France. We then show the potential contribution of the partner and Vodafone countries to the French market. Finally, we recommend which customers to target and explain why.

The updated dataset provides information on the revenue generated by customers operating in one or more countries. In this presentation, we focus on the mobility product in both partner and Vodafone countries. The dataset also gives the Addressable Market (TAM) for each customer. For example, customer 1 operates in Germany (a Vodafone country), generates 1M of revenue and has a TAM of 2M. The Share of Wallet for this customer is 0.5, with a whitespace of 1M.


# Definition & Variables

## KPIs

**Share of Wallet, SoW**

$$\text{Share of Wallet} = \sum  \text{Revenue vodafone} / \sum \text{Addressable Market} $$

**Whitespace**

$$\text{Whitespace} = \sum \text{Addressable Market} -  \sum \text{Revenue vodafone} $$

- *Addressable Market*: total available market for a given customer, in a given country, for the Mobility product
- Note: we drop the $\sum$ when the KPI is computed at the customer level
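
As a minimal sketch of the two KPIs, the snippet below reproduces the single-customer example from the introduction (1M of revenue, 2M of Addressable Market). The helper names `share_of_wallet` and `whitespace` are illustrative and are not part of the notebook.

```python
def share_of_wallet(revenue, addressable_market):
    """Sum of Vodafone revenue divided by the sum of Addressable Market."""
    return sum(revenue) / sum(addressable_market)


def whitespace(revenue, addressable_market):
    """Sum of Addressable Market minus the sum of Vodafone revenue."""
    return sum(addressable_market) - sum(revenue)


# Customer 1 from the introduction: 1M of revenue, 2M of Addressable Market
revenue = [1_000_000]
addressable_market = [2_000_000]

sow = share_of_wallet(revenue, addressable_market)
ws = whitespace(revenue, addressable_market)
print(f"SoW: {sow:.0%}, whitespace: {ws:,}")
```

Passing several customers in the two lists gives the aggregated version of the KPIs, which is why the sums disappear at the customer level.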

In [2]:
import pandas as pd
import numpy as np
import plotly.express as px
import seaborn as sns
import matplotlib.pyplot as plt
import researchpy as rp
from Fast_connectCloud import connector
from GoogleDrivePy.google_drive import connect_drive
from GoogleDrivePy.google_platform import connect_cloud_platform
# Local helper module: the later cells that call `vd.` need this import
#import functions.country_report as vd

# Green colour map for the styled tables
cm = sns.light_palette("green", as_cmap=True)

# Warnings can be ignored for the presentation
import warnings
warnings.filterwarnings('ignore')

In [3]:
%load_ext autoreload
%autoreload 2

In [5]:
gs = connector.open_connection(online_connection=False,
                               path_credential='/Users/Thomas/Google Drive/Projects/Data_science/Google_code_n_Oauth/Client_Oauth/Google_auth/')

service_gd = gs.connect_remote(engine='GS')
service_gcp = gs.connect_remote(engine='GCP')

gdr = connect_drive.connect_drive(service_gd['GoogleDrive'])

project = 'valid-pagoda-132423'
gcp = connect_cloud_platform.connect_console(project=project,
                                             service_account=service_gcp['GoogleCloudP'])

Service Google Drive and Docs, Sheet are now connected. 
Service Google Drive is stored as <googleapiclient.discovery.Resource object at 0x1a21f5d7d0> and accessible with "drive" 
Service Google Doc is stored as <googleapiclient.discovery.Resource object at 0x1a220be990> and accessible with "doc" 
Service Google Sheet is stored as <googleapiclient.discovery.Resource object at 0x1a2219a9d0>and accessible with "sheet"
Service account storage and Bigquery are now connected. 
Service account storage is stored as <google.cloud.storage.client.Client object at 0x1a2219ae10> and accessible with "Storage_account" 
Service account Bigquery is stored as <google.cloud.bigquery.client.Client object at 0x1a2219af90> and accessible with "bigquery_account"


In [6]:
project = 'valid-pagoda-132423'
query = (
    "SELECT Customer_Name, Country_name, IncomeGroup, WB, Languages, "
    "English, French, Partnership, "
    "Inbound_Region, industry, Mobility_vodafone, Mobility_potential "
    "FROM Business.Vodafone "

)
index = ['Customer_Name',
         'Country_name',
         'IncomeGroup',
         'WB',
         'Languages',
         'English',
         'French',
         'Partnership',
         'Inbound_Region',
         'industry']
df_final = (gcp
            .upload_data_from_bigquery(query=query,
                                       location='US')
            .set_index(index)
            .assign(
                Whitespace=lambda x: x['Mobility_potential'] - x['Mobility_vodafone'])
            )

In [18]:
columns = [('Effective', 'Non_Vodafone'),
           ('Effective', 'Partner'),
           ('Effective', 'Vodafone')]

eff = (df_final
       .loc[lambda x: (x['Mobility_vodafone'] > 0) &
            (x['Mobility_potential'] > x['Mobility_vodafone'])
            ]
       .reset_index('Country_name')
       .groupby(level=[6, 7])['Country_name']
       .nunique()
       .unstack(-1, fill_value=0)
       .T
       )
eff

| Inbound_Region | Non_Vodafone | Partner | Vodafone |
|---|---|---|---|
| Americas | 1 | 0 | 0 |
| Asia Pacific | 1 | 1 | 1 |
| Central Europe | 3 | 8 | 4 |
| Middle East & Africa | 0 | 1 | 6 |
| Northern Europe | 0 | 6 | 2 |
| Southern Europe | 0 | 0 | 7 |


In [7]:
#### Make Table 1

columns = [('Effective', 'Non_Vodafone'),
           ('Effective', 'Partner'),
           ('Effective', 'Vodafone')]

eff = (df_final
       .loc[lambda x: (x['Mobility_vodafone'] > 0) &
            (x['Mobility_potential'] > x['Mobility_vodafone'])
            ]
       .reset_index('Country_name')
       .groupby(level=[6, 7])['Country_name']
       .nunique()
       .unstack(-1, fill_value=0)
       .T
       )

# eff.columns=pd.MultiIndex.from_tuples(columns)
columns = [('All', 'Non_Vodafone'),
           ('All', 'Partner'),
           ('All', 'Vodafone')]
# Distribution of country by partnership and region
all_ = (df_final
        .reset_index('Country_name')
        .groupby(level=[6, 7])['Country_name']
        .nunique()
        .unstack(-1, fill_value=0)
        .T
        .merge(eff, left_index=True,
               right_index=True,
               suffixes=('', '_e'))
        .assign(
            Non_Vod_change=lambda x: x.iloc[:, 3] - x.iloc[:,0],
            Part_change=lambda x: x.iloc[:, 4] - x.iloc[:, 1],
            Vod_change=lambda x: x.iloc[:, 5] - x.iloc[:, 2],
            Non_Vod_ptot=lambda x: x.iloc[:,3]/(x.iloc[:,0]),
            Part_ptot=lambda x: x.iloc[:,4]/(x.iloc[:, 1]),
            Vod_ptot=lambda x: x.iloc[:,5]/(x.iloc[:, 2]),
            
        )
        )
columns=[
    ('All','Non_Vodafone'),
    ('All','Partner'),
    ('All', 'Vodafone'),
    ('Effective','Non_Vodafone_e'),
    ('Effective','Partner_e'),
    ('Effective', 'Vodafone_e'),
    ('Diff','Non_Vod_change'),
    ('Diff','Part_change'),
    ('Diff', 'Vod_change'),
    ('perc','Non_Vod_ptot'),
    ('perc','Part_ptot'),
    ('perc', 'Vod_ptot')]
all_.columns=pd.MultiIndex.from_tuples(columns)
all_

| Inbound_Region | All / Non_Vodafone | All / Partner | All / Vodafone | Effective / Non_Vodafone_e | Effective / Partner_e | Effective / Vodafone_e | Diff / Non_Vod_change | Diff / Part_change | Diff / Vod_change | perc / Non_Vod_ptot | perc / Part_ptot | perc / Vod_ptot |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Americas | 16 | 5 | 0 | 1 | 0 | 0 | -15 | -5 | 0 | 0.0625 | 0.0 | NaN |
| Asia Pacific | 12 | 5 | 2 | 1 | 1 | 1 | -11 | -4 | -1 | 0.083333 | 0.2 | 0.5 |
| Central Europe | 6 | 10 | 4 | 3 | 8 | 4 | -3 | -2 | 0 | 0.5 | 0.8 | 1.0 |
| Middle East & Africa | 38 | 5 | 8 | 0 | 1 | 6 | -38 | -4 | -2 | 0.0 | 0.2 | 0.75 |
| Northern Europe | 1 | 7 | 3 | 0 | 6 | 2 | -1 | -1 | -1 | 0.0 | 0.857143 | 0.666667 |
| Southern Europe | 1 | 1 | 7 | 0 | 0 | 7 | -1 | -1 | 0 | 0.0 | 0.0 | 1.0 |


### Dataset overview

**Region/Countries**

- Number of Countries: 130
- Among the 130 countries, 41 have at least one customer with a positive revenue and an Addressable Market larger than the revenue
    - 16 non-Vodafone countries are available in the Americas, but only 6% of them (1 country) are active
    - 80% and 86% of the partner countries in Central and Northern Europe, respectively, are included in the analysis
    - All the Central and Southern Europe Vodafone countries are included
    - 67% of the Northern Europe Vodafone countries have a positive revenue and a TAM larger than the revenue


In [8]:
all_.iloc[:, -3:].fillna(0).style.format("{:.0%}")

| Inbound_Region | perc / Non_Vod_ptot | perc / Part_ptot | perc / Vod_ptot |
|---|---|---|---|
| Americas | 6% | 0% | 0% |
| Asia Pacific | 8% | 20% | 50% |
| Central Europe | 50% | 80% | 100% |
| Middle East & Africa | 0% | 20% | 75% |
| Northern Europe | 0% | 86% | 67% |
| Southern Europe | 0% | 0% | 100% |


In [9]:
all_.iloc[:, :-3].style.bar(subset=[('Diff',
                        'Non_Vod_change'),
                       ('Diff',
                        'Part_change'),
                       ('Diff',
                        'Vod_change'),
                      ],
               align='mid', color=['#d65f5f', '#5fba7d'])

| Inbound_Region | All / Non_Vodafone | All / Partner | All / Vodafone | Effective / Non_Vodafone_e | Effective / Partner_e | Effective / Vodafone_e | Diff / Non_Vod_change | Diff / Part_change | Diff / Vod_change |
|---|---|---|---|---|---|---|---|---|---|
| Americas | 16 | 5 | 0 | 1 | 0 | 0 | -15 | -5 | 0 |
| Asia Pacific | 12 | 5 | 2 | 1 | 1 | 1 | -11 | -4 | -1 |
| Central Europe | 6 | 10 | 4 | 3 | 8 | 4 | -3 | -2 | 0 |
| Middle East & Africa | 38 | 5 | 8 | 0 | 1 | 6 | -38 | -4 | -2 |
| Northern Europe | 1 | 7 | 3 | 0 | 6 | 2 | -1 | -1 | -1 |
| Southern Europe | 1 | 1 | 7 | 0 | 0 | 7 | -1 | -1 | 0 |


In [10]:
df_final.index.get_level_values('Country_name').nunique()

130

In [11]:
### Make table 2
#### Total customers
tc = (df_final
 #.loc[lambda x: #(x['Mobility_vodafone'] > 0) &
                 #(x['Mobility_potential'] < x['Mobility_vodafone']) 
     #& (x.index.get_level_values('Customer_Name'))
 #    ]
 .sort_values(by = ['Inbound_Region',
 'Customer_Name'])
 .reset_index()
 .reindex(columns = ['Customer_Name',
                     'Inbound_Region'
                    ]
         )
 .groupby('Inbound_Region')
 .nunique()
 .drop(columns = 'Inbound_Region')
 .sort_values(by = 'Customer_Name')
 .rename(columns = {'Customer_Name':'total_customer'})
 #.unstack()
)
### Total active customers
tac = (df_final
 .loc[lambda x: (x['Mobility_vodafone'] > 0) 
                # &(x['Mobility_potential'] < x['Mobility_vodafone']) 
     #& (x.index.get_level_values('Customer_Name'))
     ]
 .sort_values(by = ['Inbound_Region',
 'Customer_Name'])
 .reset_index()
 .reindex(columns = ['Customer_Name',
                     'Inbound_Region'
                    ]
         )
 .groupby('Inbound_Region')
 .nunique()
 .drop(columns = 'Inbound_Region')
 .sort_values(by = 'Customer_Name')
 .rename(columns = {'Customer_Name':'total_active_customer'})
)
#### Total customers with AM > Revenue
tec = (df_final
 .loc[lambda x: (x['Mobility_vodafone'] > 0) &
                (x['Mobility_potential'] > x['Mobility_vodafone']) 
     ]
 .sort_values(by = ['Inbound_Region',
 'Customer_Name'])
 .reset_index()
 .reindex(columns = ['Customer_Name',
                     'Inbound_Region'
                    ]
         )
 .groupby('Inbound_Region')
 .nunique()
 .drop(columns = 'Inbound_Region')
 .sort_values(by = 'Customer_Name')
 .rename(columns = {'Customer_Name':'total_effective_customer'})
 #.unstack()
)
#### Final list countries
teco = (df_final
 .loc[lambda x: (x['Mobility_vodafone'] > 0) &
                (x['Mobility_potential'] > x['Mobility_vodafone']) 
     ]
 .sort_values(by = ['Inbound_Region',
 'Country_name'])
 .reset_index()
 .reindex(columns = ['Country_name',
                     'Inbound_Region'
                    ]
         )
 .groupby('Inbound_Region')
 .nunique()
 .drop(columns = 'Inbound_Region')
 .sort_values(by = 'Country_name')
 .rename(columns = {'Country_name':'total_effective_country'})
 #.unstack()
)

**Customers**

- Number of Customers: 1327
    - Among them, 1189 are active (i.e. revenue > 0)
    - 71% of the customers have a revenue larger than the AM in at least one country

In [12]:
pd.concat([tc, tac, tec, teco],
          axis = 1, sort=False)#.to_csv("active_cust_region.csv")

| Inbound_Region | total_customer | total_active_customer | total_effective_customer | total_effective_country |
|---|---|---|---|---|
| Middle East & Africa | 734 | 640 | 450 | 7 |
| Asia Pacific | 961 | 390 | 232 | 3 |
| Americas | 986 | 5 | 4 | 1 |
| Southern Europe | 1085 | 977 | 845 | 7 |
| Central Europe | 1099 | 969 | 796 | 15 |
| Northern Europe | 1115 | 879 | 777 | 8 |


In [13]:
df_final.index.get_level_values('Customer_Name').nunique()

1327

In [14]:
(df_final
 .groupby(level = 0)['Mobility_vodafone']
 .sum()
 .loc[lambda x : x>0]
 .index
 .nunique()
)

1189

In [15]:
### Percentage customer with revenue > AM
(df_final
 .loc[lambda x: #(x['Mobility_vodafone'] > 0) &
                 (x['Mobility_potential'] < x['Mobility_vodafone']) 
     #& (x.index.get_level_values('Customer_Name'))
     ]
 .sort_values(by = ['Customer_Name'])
 .reset_index()
 .reindex(columns = ['Customer_Name',
                     #'Country_name'
                    ]
         )
 .drop_duplicates(subset = ['Customer_Name', 
                           # 'Country_name'
                           ]
                 )
 .shape[0]/ df_final.index.get_level_values('Customer_Name').nunique()
)

0.7121326299924642

In [16]:
(df_final
 .loc[lambda x: (x['Mobility_vodafone'] > 0) &
                (x['Mobility_potential'] > x['Mobility_vodafone']) &
      (x.index.get_level_values('Inbound_Region').isin(['Americas']))
     ]
)

| Customer_Name | Country_name | IncomeGroup | WB | Languages | English | French | Partnership | Inbound_Region | industry | Mobility_vodafone | Mobility_potential | Whitespace |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Customer 109 | United States | High income | USA | English | 1 | 0 | Non_Vodafone | Americas | Healthcare | 5190.03 | 2609238.0 | 2604048.0 |
| Customer 999 | United States | High income | USA | English | 1 | 0 | Non_Vodafone | Americas | Healthcare | 186.141429 | 987806.1 | 987620.0 |
| Customer 23 | United States | High income | USA | English | 1 | 0 | Non_Vodafone | Americas | Technology | 154984.894286 | 1797973.0 | 1642988.0 |
| Customer 10 | United States | High income | USA | English | 1 | 0 | Non_Vodafone | Americas | Manufacturing | 43984.428571 | 9080424.0 | 9036440.0 |


In [None]:
#sns.distplot((df_final
# .groupby(level = 0)['Mobility_vodafone']
# .sum()
# .loc[lambda x : x>0]
#))

In [None]:
#sns.distplot((df_final
# .groupby(level = 0)['Mobility_potential']
# .sum()
# .loc[lambda x : x>0]
#)
#            )

### Definition

In the table below, we summarize the data by country. For each country in the dataset, we compute the:

- `nb_customers`: Number of active customers in country `c`
- `nb_industries`: Number of industries in country `c` 
- `sum_revenue`: Sum of revenue in country `c`
- `rank_revenue`: Worldwide rank of country `c` (descending order) in terms of revenue
- `sum_AM`: Total Addressable Market (i.e. market size for the customers) in country `c`
- `rank_AM`: Worldwide rank of country `c` (descending order) in terms of potential
- `sum_whitespace`: Sum of whitespace (sum_AM - sum_revenue) in country `c`
- `SoW`: Share of Wallet (sum_revenue / sum_AM) in country `c`
- `penetration_rate`: Penetration rate (sum_AM / sum_revenue) in country `c`
- `rank_penetration`: Worldwide rank of country `c` (descending order) in terms of penetration
- `avg_spent`: Average spend at the customer level in country `c`
- `avg_AM`: Average Addressable market at the customer level in country `c`

The next set of variables focuses on the empirical distribution of revenue, Addressable Market and whitespace. By default, the table compares the top 10% with the remaining 90%.

- `rank_customers`: Descending rank of the customer in terms of revenue, i.e. the largest customer in country `c` has rank 1
- `revenue_cumsum_perc`: Total cumulated revenue (descending order) 
- `AM_cumsum_perc`: Total cumulated Addressable market (descending order) 
- `whitespace_top`: Total cumulated whitespace of the top 10% customers
- `whitespace_bottom`: Total cumulated whitespace of the bottom 90% customers
- `bottom_top_ratio`: whitespace_bottom/whitespace_top. If larger than 1, it indicates bottom 90% has larger potential than top 10%
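
The distribution variables above can be sketched as follows for a single country. This is only an illustration: the ten customers and their values are made up, and the notebook's own helper may compute these columns differently.

```python
import pandas as pd

# Hypothetical customers of one country (values are made up)
customers = pd.DataFrame({
    "customer": [f"c{i}" for i in range(1, 11)],
    "revenue": [500, 90, 80, 70, 60, 50, 40, 30, 20, 10],
    "AM": [1000, 400, 300, 300, 200, 200, 100, 100, 50, 50],
})

top_share = 0.1  # compare the top 10% with the remaining 90%

ranked = (customers
          .assign(whitespace=lambda x: x["AM"] - x["revenue"],
                  # rank 1 = customer with the largest revenue
                  rank_customers=lambda x: x["revenue"]
                  .rank(ascending=False, method="first")
                  .astype(int))
          .sort_values("rank_customers")
          .assign(revenue_cumsum_perc=lambda x: x["revenue"].cumsum() / x["revenue"].sum(),
                  AM_cumsum_perc=lambda x: x["AM"].cumsum() / x["AM"].sum()))

# Split the ranked customers into the top slice and the rest
n_top = max(1, int(round(len(ranked) * top_share)))
whitespace_top = ranked.head(n_top)["whitespace"].sum()
whitespace_bottom = ranked.iloc[n_top:]["whitespace"].sum()
bottom_top_ratio = whitespace_bottom / whitespace_top

print(whitespace_top, whitespace_bottom, bottom_top_ratio)
```

In this toy data the ratio is above 1, so the bottom 90% holds more whitespace than the top 10%.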


# Worldwide description of the market

**Germany is the largest market in the world, with an enormous Addressable Market**

## Top 3 countries 

- Germany, the UK and Italy are the top three countries in terms of revenue.
- Germany has a sizeable Addressable Market, more than twice as large as Italy's
- Italy has captured a larger share of its Addressable Market than Germany or the UK
- The top 10% of the customers in Germany represent 68% of the total sales and 54% of the TAM.
- The top 10% of the customers in the UK, by contrast, account for 58% of the total revenue and 30% of the TAM


<span style="font-size:smaller;">*Note*: we exclude Slovak Republic since there is only one customer</span>

In [None]:
world_data = (vd.country_aggregation(df=df_final,
                                    grouping=['Country_name',
                                              'IncomeGroup',
                                              'WB',
                                              'Partnership'],
                                    exclude_country=['Slovak Republic'],
                                    slice_country=None,
                                    topN=.1)
             .droplevel(level = 1)
             )
vd.tableStyteGlobal(df = world_data, n =3)

- **total_customer**: Unique number of customers
- **total_active_customer**: Unique number of active customers (revenue > 0)
- **total_effective_customer**: Unique number of active customers (revenue > 0 & revenue < AM)
- **total_effective_country**: Unique number of active countries (revenue > 0 & revenue < AM)

## Partner countries

**Switzerland is the country to target, with an Addressable Market of about €296M and a mere 3% Share of Wallet**

- The top three partner countries are Switzerland, France and Belgium
- Switzerland has the biggest Addressable Market in the world: €295,663,000, twice as large as France's
- Swiss customers have addressed 3% of their market, while Belgium has reached 15%
- Belgium is the sixth largest country in revenue but the 10th in terms of Addressable Market
- France has a distribution that is very skewed toward the top 10% of customers.
    - These customers account for 75% of the revenue and 62% of the TAM

In [None]:
vd.tableStyteGlobal(df = world_data.sort_values(by = 'rank_revenue').xs('Partner', level = 'Partnership'),
                    n =3)

In [None]:
### Europe Revenue
a = world_data.sort_values(by = 'sum_revenue', ascending = False).loc[
    lambda x: 
               (~x.index.get_level_values('Country_name').isin(
                  ['South Africa', 'new Zealand']))
                  ].head(10)['sum_revenue'].sum()
'€{0:,.0f}'.format(a)

In [None]:
a

## Worldwide revenue

**Europe has a Revenue of €83,519,125**


In [None]:
fig = px.choropleth(world_data.reset_index(),
                    locations="WB",
                    color="sum_revenue",
                    hover_name="Partnership",
                    title = 'World wide operating revenues')

#fig.layout.autosize = True
fig.layout.width = 800
fig.layout.height = 600

fig.show()

In [None]:
fig = px.choropleth(world_data.reset_index(),
                    locations="WB",
                    color="rank_revenue",
                    hover_name="Partnership",
                    title = 'World wide operating revenues, country ranking')#.show()

#fig.layout.autosize = True
fig.layout.width = 900
fig.layout.height = 600

fig.show()

## Worldwide Whitespace

**Europe has a whitespace of €1,058,114,117**

- The largest potential for Vodafone is in Europe. 
    - The top ten countries (excluding Russia, Singapore and Turkey) are all located in Europe, with a total whitespace of **€1,058,114,117**
- Revenue (absolute value or ranking) is positively, though only moderately, correlated with the whitespace (absolute value or ranking)
    - The Pearson correlation is equal to 40% (36% for the Whitespace)
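
For reference, a Pearson correlation on values and on rankings can be computed as in the sketch below. The five data points are hypothetical and do not come from the dataset; only the pandas calls are meant to be illustrative.

```python
import pandas as pd

# Hypothetical country-level figures (not taken from the dataset)
countries = pd.DataFrame({
    "sum_revenue": [10.0, 20.0, 30.0, 40.0, 50.0],
    "sum_whitespace": [120.0, 80.0, 200.0, 150.0, 260.0],
})

# Pearson correlation on the absolute values (pandas' default method)
pearson_values = countries["sum_revenue"].corr(countries["sum_whitespace"])

# Pearson correlation on the descending rankings
pearson_ranks = (countries["sum_revenue"].rank(ascending=False)
                 .corr(countries["sum_whitespace"].rank(ascending=False)))

print(f"values: {pearson_values:.2f}, ranks: {pearson_ranks:.2f}")
```

A Pearson correlation computed on ranks is the same quantity as the Spearman rank correlation, so it is less sensitive to a few very large countries than the correlation on absolute values.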

In [None]:
a = world_data.loc[lambda x: 
               (x['rank_whitespace'] <= 12) 
                &
              (~x.index.get_level_values('Country_name').isin(
                  [
                      #'The Russian Federation', 
                   'Singapore', 
                   'Turkey']
                   )
                   )
                  ]['sum_whitespace']#.sum()
#'€{0:,.0f}'.format(a)

In [None]:
a

In [None]:
df_final.loc[lambda x: (x['Mobility_vodafone'] > 0)
                & (x['Mobility_potential'] > x['Mobility_vodafone'])].corr()

In [None]:
fig = px.choropleth(world_data.reset_index(),
                    locations="WB",
                    color="rank_whitespace",
                    hover_name="Partnership",
                    title = 'Vodafone Whitespace within countries, country ranking')#.show()

#fig.layout.autosize = True
fig.layout.width = 900
fig.layout.height = 600

fig.show()

# French Market 

**The French market has a large potential (€155,506,155) and is the second largest partner country**

## Brief words about French market

- The French market has 342 active customers spread across 14 industries.
- Vodafone captures 4.4% of these customers' Addressable Market
- France ranks 5th in the world in terms of revenue and is the second largest partner market behind Switzerland.
- It ranks 4th in the world for its Addressable Market.
- The average spend per customer is about €19,850, with an average Addressable Market of €454,696.
- The top 10% of the customers (34 customers) generate 75% of the revenue and account for 62% of the Addressable Market.
   - In absolute value, the top 10% have a whitespace of €91,228,746 while the bottom 90% have a whitespace of €57,488,543

In comparison, the Swiss market is less skewed. The top 10% generate 60% of the revenue and hold around 37% of the total potential. The potential of the remaining customers is 1.7 times larger than that of the top 10%.
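
The bottom-to-top ratios behind this comparison can be checked with a few lines of arithmetic. The French figures are the ones quoted above; for Switzerland, the sketch assumes that the 37% is the top 10%'s share of the potential, so the remaining customers hold 63%.

```python
# France: whitespace figures quoted above
whitespace_top_fr = 91_228_746      # top 10% (34 customers)
whitespace_bottom_fr = 57_488_543   # bottom 90%
ratio_fr = whitespace_bottom_fr / whitespace_top_fr

# Switzerland: assumed reading, top 10% hold 37% of the potential
top_share_ch = 0.37
ratio_ch = (1 - top_share_ch) / top_share_ch

print(f"France: {ratio_fr:.2f}, Switzerland: {ratio_ch:.2f}")
```

A ratio below 1 for France and above 1 for Switzerland is what "less skewed" means here: in France most of the whitespace sits with the top customers, while in Switzerland it sits with the remaining 90%.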

In [None]:
bottom =(vd.customer_TopBottomCustomers(df = df_final,
                              country = ['France'],
                              slice_='Bottom_customers',
                              describe = True,
                              begin = None,
                              end= None)
        #.sort_values(by='rank_revenue')
        .reindex(columns=['sum_revenue',
                         'sum_AM',
                         'SoW'
                         ]) 
)

top =(vd.customer_TopBottomCustomers(df = df_final,
                              country = ['France'],
                              slice_='Top_customers',
                              describe = True,
                              begin = None,
                              end= None)
        #.sort_values(by='rank_revenue')
        .reindex(columns=['sum_revenue',
                         'sum_AM',
                         'SoW'
                         ]) 
)

for n in ['Top', 'Bottom']:
    columns=[(n,'sum_revenue'),
             (n,'sum_AM'),
             (n, 'SoW')]
    if n == 'Top':
        top.columns=pd.MultiIndex.from_tuples(columns)
    else:
        bottom.columns=pd.MultiIndex.from_tuples(columns)
topbottom = pd.concat([top, bottom], axis = 1)

In [None]:
vd.tableStyteGlobal(topbottom.iloc[:, :3],n=None, method_=2)

**The French market is highly skewed toward its best customers**

- The revenue generated by the top customers ranges from a minimum of €35,327 to a maximum of €784,864.
- The median is around €90,000
- The Addressable Market of the top customers is below €1.3M in 75% of the cases.
    - The Share of Wallet is also high for the top customers
- Half of the 308 bottom customers have a revenue no larger than €2K. 77 customers have a revenue higher than €7K, with a maximum of €35,319
- 77 customers have an Addressable Market between €62,625 and €19,297,185, while the Addressable Market of half of the 308 customers does not exceed €22,204



In [None]:
vd.tableStyteGlobal(df = topbottom, n=None, method_=2)

In [None]:
vd.goStyle((
    vd.customer_TopBottomCustomers(df = df_final,
                              country = ['France'],
                              slice_='Bottom_customers',
                              describe = False,
                              begin = None,
                              end= None)
    .reindex(columns = ['sum_revenue', 'rank_customers', 
                       'sum_AM', 'rank_AM', 'sum_whitespace',
                       'SoW'])
    ),
        global_ = True)

## French market sectors opportunities

As mentioned earlier, the French market has 342 customers spread across 14 industries.

A few industries stand out for their potential:

- `Media & Entertainment`
- `Construction`
- `Transportation & Logistics`
- `Energy and utilities`
- `Consumer Goods`
- `Financial Services`
- `Manufacturing`

In [None]:
indu = vd.country_aggregation(df=df_final,
                              grouping=['Country_name', 'industry'],
                              exclude_country=None,
                              slice_country=['France'],
                              styling=False)
vd.goStyle((indu
            .reindex(columns=['nb_customers',
                              'sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'SoW',
                              'rank_customers',
                              'whitespace_top',
                              'whitespace_bottom'])

            ), global_=True
           )

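In [None]:
# Total revenue (€) of the seven industries listed above
5362 + 115950 + 392795 + 552975 + 703034 + 1106430 + 1323289

In [None]:
# Total Addressable Market (€) of the same seven industries
922200 + 2098781 + 40237184 + 44234466 + 14591129 + 22462724 + 11204690

In [None]:
# Combined Share of Wallet of the seven industries, in %
(4199835/135751174) *100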
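
The three cells above can also be written with named figures, which makes it easier to cross-check the per-industry Share of Wallet quoted in the sections below. This is a sketch under one assumption: the terms of the two sums are taken to follow the order of the industry list, which matches the revenue and Addressable Market figures quoted later for several of the industries.

```python
# (revenue, Addressable Market) in euros, taken from the two sums above.
# Assumption: the terms follow the order of the industry list.
industries = {
    "Media & Entertainment": (5_362, 922_200),
    "Construction": (115_950, 2_098_781),
    "Transportation & Logistics": (392_795, 40_237_184),
    "Energy and utilities": (552_975, 44_234_466),
    "Consumer Goods": (703_034, 14_591_129),
    "Financial Services": (1_106_430, 22_462_724),
    "Manufacturing": (1_323_289, 11_204_690),
}

total_revenue = sum(rev for rev, _ in industries.values())
total_am = sum(am for _, am in industries.values())

# Share of Wallet per industry: revenue / Addressable Market
sow = {name: rev / am for name, (rev, am) in industries.items()}

print(f"{total_revenue:,} / {total_am:,} = {total_revenue / total_am:.2%}")
for name, value in sow.items():
    print(f"{name}: {value:.2%}")
```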

###  `Media & Entertainment`

**`Media & Entertainment` has a promising customer with €849,611.39 AM** 

- `Media & Entertainment` has 4 customers.
- The Addressable Market for `Media & Entertainment` is concentrated in one player only, customer 360.
    - This customer accounts for 92% of the TAM, with a value of €849,611.
    - The revenue generated by customer 360 is almost negligible (€3,231).

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Media & Entertainment'],
                  index_minim = True)

### `Construction`

**`Construction` has a total potential of €2,098,781 with one customer worth €1,808,810**

- Vodafone does not have a large pool of customers in Construction
    - One customer accounts for 44% of the revenue and 86% of the TAM, with a whitespace of €1,757,791
    - The other customers can be ignored in this industry.

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Construction'],
                  index_minim = True)

### `Transportation & Logistics`

**The industry to target, with one of the largest TAMs in the world**


- `Transportation & Logistics` in France has only 11 customers but a TAM among the largest in the world.
- These 11 customers have one of the lowest Shares of Wallet of all industries.
- In fact, only `Media & Entertainment` does worse than `Transportation & Logistics`.
- The average TAM for `Transportation & Logistics` is €3,657,925
    - The market is in fact concentrated among 3 customers (670, 390 and 753).
    - Customer 753 alone accounts for 54% of the revenue and 48% of the potential.
    - These top three customers have a total whitespace of €39,897,744
    
![](https://drive.google.com/uc?export=view&id=1_Y0FHkfAwZ0uJikdVj-uQzJ3bt4SO0cB)

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Transportation & Logistics'],
                  index_minim = True)

In [None]:
vd.goStyle((indu
            .reindex(columns=['nb_customers',
                              'sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'SoW',
                              'rank_customers',
                              'whitespace_top',
                              'whitespace_bottom'])

            ), global_=True
           )

### `Energy and utilities`

**Two customers are responsible for 81.3% of the revenue and 97.2% of the AM**

- The TAM is €44,234,465, among the largest in the world, while the Share of Wallet is about 1.25%.
    - The total revenue is merely €552,975.
- Two customers own almost all the Addressable Market.
    - Customers 454 and 808 have 81% of the revenue and 97% of the TAM.
    - More precisely, customer 808 alone accounts for €42,006,220 of the TAM.

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Energy & Utilities'],
                  index_minim = True)

In [None]:
vd.goStyle((indu
            .reindex(columns=['nb_customers',
                              'sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'SoW',
                              'rank_customers',
                              'whitespace_top',
                              'whitespace_bottom'])

            ), global_=True
           )

### `Consumer Goods`

**It is one of the most represented industries in France, with 69 customers**

- `Consumer Goods` has generated €703,034 of revenue for a total Addressable Market of €14,591,129, with a reasonable SoW (4.8%, i.e. the national average)
- Three customers generate 60% of the revenue but account for only 30% of the TAM.
    - The remaining 66 customers can address €2,204,956, which is 3.6 times larger than the top 3.
    - The average AM in this industry is €114,578
- Customers 927, 1007, 329, 627 and 270 look promising

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Consumer Goods'],
                  index_minim = True)

In [None]:
vd.goStyle((indu
            .reindex(columns=['nb_customers',
                              'sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'SoW',
                              'rank_customers',
                              'whitespace_top',
                              'whitespace_bottom'])

            ), global_=True
           )

### `Financial Services`

**`Financial Services` is a crucial industry for Vodafone, and its second largest**

- 29 customers generate €1,106,430 of revenue, with €21,356,294 of whitespace left.
    - The SoW is in line with the national average (4.9%)
- The average AM is the highest in France (excluding the two industries with outliers): €774,577
- 3 customers account for 72.8% of the revenue and 77.2% of the TAM.
    - In absolute value, these three customers have an extra €16,528,887 of whitespace.
- The top 5 customers all have about €1M of AM, and 4 of them already generate more than €100K of revenue

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Financial Services'],
                  index_minim = True)

In [None]:
vd.goStyle((indu
            .reindex(columns=['nb_customers',
                              'sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'SoW',
                              'rank_customers',
                              'whitespace_top',
                              'whitespace_bottom'])

            ), global_=True
           )

### `Manufacturing`

**The largest industry, with a very high SoW: 12%**

- `Manufacturing` has €1,323,289 of revenue and €11,204,690 of Addressable Market
    - Its TAM is half that of `Financial Services`
- 7 customers account for 81% of the revenue but as little as 30% of the Addressable Market (€3,233,139 against €6,648,262)
- Customers 754, 213 and 245 have more than €1M of Addressable Market

In [None]:
vd.choose_customer(df = df_final,
                   country = ['France'],
                   sector = ['Manufacturing'],
                  index_minim = True)

In [None]:
vd.goStyle((indu
            .reindex(columns=['nb_customers',
                              'sum_revenue',
                              'sum_AM',
                              'sum_whitespace',
                              'SoW',
                              'rank_customers',
                              'whitespace_top',
                              'whitespace_bottom'])

            ), global_=True
           )

## Co-integrated market analysis

**What other countries can help the French team?**

- Focus on the performance of a customer in France, knowing that it may also operate in other countries.
    - For instance, there are 168 customers in Switzerland and 342 in France.
    - Among the 168 customers in the Swiss market, 85 have a footprint in both France and Switzerland.
- Calculate the performance of those 85 customers in France, knowing they also operate in Switzerland (i.e. a partner country).
- Focus on the potential in France and leverage the Swiss team's knowledge of those customers.
- Extend this example to all 13 partner countries where the 342 French customers also operate
    - Look at the distribution of revenue, market share, potential and so on for the customers with a footprint in both France and a partner market.

- Focus on **Belgium, Switzerland and Austria**
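
The footprint logic described above can be sketched with plain pandas. This is only an illustration under assumptions: the long-format table, the `revenue` and `addressable_market` columns and the helper `footprint_in_both` are hypothetical, and the actual `vd.country_a_potential_operating_b` may work differently.

```python
import pandas as pd

# Hypothetical long-format data: one row per (customer, country). Values are made up.
df = pd.DataFrame(
    {
        "Customer_Name": ["C1", "C1", "C2", "C3", "C3", "C4"],
        "Country_name": ["France", "Switzerland", "France",
                         "France", "Belgium", "Switzerland"],
        "revenue": [100.0, 300.0, 50.0, 200.0, 150.0, 80.0],
        "addressable_market": [1000.0, 600.0, 500.0, 800.0, 300.0, 400.0],
    }
)


def footprint_in_both(df: pd.DataFrame, country_a: str, country_b: str) -> pd.DataFrame:
    """Return the country_a rows of customers that also operate in country_b."""
    in_a = set(df.loc[df["Country_name"] == country_a, "Customer_Name"])
    in_b = set(df.loc[df["Country_name"] == country_b, "Customer_Name"])
    shared = in_a & in_b  # customers present in both markets
    mask = (df["Country_name"] == country_a) & df["Customer_Name"].isin(shared)
    return df.loc[mask]


shared_fr_ch = footprint_in_both(df, "France", "Switzerland")
# Share of Wallet in France, restricted to customers that also operate in Switzerland.
sow_fr = shared_fr_ch["revenue"].sum() / shared_fr_ch["addressable_market"].sum()
print(shared_fr_ch, sow_fr)
```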

In [None]:
french_customers_a_b_partner = vd.country_a_potential_operating_b(df=df_final,
                                                                  grouping=["country__B",
                                                                            "WB__B",
                                                                            "Partnership__B",
                                                                            "IncomeGroup__B"],
                                                                  country='France',
                                                                  exclude_country=None,
                                                                  partnership='Partner',
                                                                  styling=False,
                                                                  customer_only=False)

vd.goStyle((french_customers_a_b_partner
            .loc[lambda x: ~
                 x.index.get_level_values('country__B').isin(['France'])]
            .sort_values(by='sum_revenue_A', ascending = False)
            .drop(columns=['rank_revenue_A',
                           'rank_potential_A',
                           'rank_whitespace_A',
                           'rank_penetration_A',
                           'rank_revenue_B',
                           'rank_potential_B',
                           'rank_whitespace_B',
                           'sum_whitespace_A',
                           'penetration_rate_A',
                           'penetration_rate_B',
                           'rank_penetration_rate_B',
                           'bottom_top_ratio',
                           'avg_spent_A',
                           'avg_AM_A',
                           'rank_customers',
                           ])
            .reset_index(['WB__B', 'Partnership__B'], drop = True)
            ), global_=False)

In [None]:
fig = px.choropleth(
    (french_customers_a_b_partner
     .loc[lambda x: ~ 
                 x.index.get_level_values('country__B').isin(['France'])]
     .reset_index()
    ),
                    locations="WB__B",
                    color="sum_revenue_A",
                    hover_name="nb_customers_B",
                    # Adjacent string literals avoid embedding the indentation in the title
                    title = ('Worldwide operating revenues from customers operating in France, '
                             'with footprint in partners'))

#fig.layout.autosize = True
fig.layout.width = 900
fig.layout.height = 600

fig.show()

- The 17 customers with a footprint in both France and Croatia show an Addressable Market share difference of 40 percentage points.
- In Croatia, the SoW for these 17 customers is about half of their TAM. The bottom 90% (16 customers) have a whitespace of €3,706,489 in France.
- An even more striking picture comes from Belgium.
    - Customers in both France and Belgium show an Addressable Market share difference of about 18 percentage points.
    - The top 10% (11 customers) own 69% of their Addressable Market and gather 66% of the potential. However, the bottom 90% still represents a whitespace of €2.6M
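
The "difference in percentage points" quoted above compares the Share of Wallet of the same customers in two countries. A minimal sketch, assuming the gap is defined as SoW in the partner country minus SoW in France (the direction of the subtraction, the function names and the numbers are assumptions for illustration):

```python
def sow(revenue: float, addressable_market: float) -> float:
    """Share of Wallet: revenue divided by Addressable Market."""
    return revenue / addressable_market


def sow_gap_pp(rev_a: float, am_a: float, rev_b: float, am_b: float) -> float:
    """SoW in country B minus SoW in country A, expressed in percentage points."""
    return 100.0 * (sow(rev_b, am_b) - sow(rev_a, am_a))


# Made-up example: 4% SoW in country A versus 22% in country B.
gap = sow_gap_pp(rev_a=40.0, am_a=1000.0, rev_b=110.0, am_b=500.0)
print(f"{gap:.1f} percentage points")
```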

In [None]:
vd.plot_marketshare_diff(df=(french_customers_a_b_partner
                             .loc[lambda x: ~ 
                                  x.index.get_level_values('country__B').isin(['France'])]),
                         partnership='Partner')

In [None]:
vd.goStyle((french_customers_a_b_partner
 .loc[lambda x :x.index.get_level_values('country__B').isin([
                                                             'Belgium',
                                                            'Austria',
                                                             #'France',
                                                            'Switzerland'])]
), global_ = False)

### `Belgium`

- Belgium is by far the largest partner country for France.
    - 33% of the French customers also operate in Belgium.
    - These 113 customers generate €3,319,750 of sales in France (€3,411,901 in Belgium)
    - Their Addressable Market in France is about €80,397,269
    - The share of wallet is 4% in France.
    - Among these 113 customers, the top 10 percent (11 customers) generate 69% of the sales and hold 66% of the Addressable Market, or €50,820,943 of whitespace.
    - All of these 11 customers have a potential larger than €1M.
        - If we exclude customer 808 (potential of €42,006,220), we are left with customers 1201, 754 and 942, with a whitespace of respectively €5,375,628, €2,615,758 and €2,360,387.

In [None]:
df_model.loc[lambda x: x.index.get_level_values('Customer_Name').isin(['Customer 808'])][tokepp]

In [None]:
vd.choose_customer_A_B(df = df_final,
                    countryA = 'France',
                    countryB= ['Belgium'],
                    partner = 'Partner',
                    index_minim = True, 
                       head = 10
                    )

In [None]:
### 113 French-Belgian customers
vd.choose_customer_A_B(df = df_final,
                    countryA = 'France',
                    countryB= ['Belgium'],
                    partner = 'Partner',
                    index_minim = True, 
                       head = 113
                    )

In [None]:
vd.goStyle((french_customers_a_b_partner
 .loc[lambda x :x.index.get_level_values('country__B').isin([
                                                             'Belgium',
                                                             'France'
 ])]
            .drop(columns=['rank_revenue_A',
                           'rank_potential_A',
                           'rank_whitespace_A',
                           'rank_penetration_A',
                           'rank_revenue_B',
                           'rank_potential_B',
                           'rank_whitespace_B',
                           'sum_whitespace_A',
                           'penetration_rate_A',
                           'penetration_rate_B',
                           'rank_penetration_rate_B',
                           'bottom_top_ratio',
                           'avg_spent_A',
                           'avg_AM_A',
                           'rank_customers',
                           ])
            .reset_index(['Partnership__B', 'IncomeGroup__B'], drop = True)
), global_ = False)

- Customer 808 is one of the largest customers in France, and in no other country does it generate higher sales than in France.

- Customer 1201, however, is a large customer in Belgium.
- In fact, Belgium is its third-largest market, behind France and Germany. It is in the top 15% of customers in Belgium and has almost reached its full potential (about 86%)

### `Switzerland`

- Switzerland is the second-largest partner country for France.
    - 25% of the French customers also operate in Switzerland.
    - These 85 customers generate €2,478,763 of sales in France (€6,352,928 in Switzerland)
    - Their Addressable Market is about €21,833,802.
    - The share of wallet is 11% (higher than the Swiss team's).
    - Among these 85 customers, the top 10 percent (8 customers) generate 67% of the sales and hold 30% of the Addressable Market, or €4,905,616 of whitespace.
    - 5 of these top customers have a potential larger than €1M, among them customers 942, 200 and 46 with a whitespace of respectively €2,360,387, €1,588,472 and €1,874,344


In [None]:
### 85 French-Swiss customers
vd.choose_customer_A_B(df = df_final,
                    countryA = 'France',
                    countryB= ['Switzerland'],
                    partner = 'Partner',
                    index_minim = True,
                        head = 113
                    )

In [None]:
(vd.country_B_team_contact(df = df_final,
                          customer = ["Customer 942"],
                          countryA='France',
                          countryB=None,
                          styling=False)
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                 'potential_rank_A', 'potential_rank_B',
                 #'Mobility_potential_A',
                  'nb_customers_A',
                 'nb_customers_B', 'potential_rank_perc_A',
                 'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                 'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
              'Inbound_Region',
              'Customer_Name',
              'Partnership__B'],
             drop = True)
 .style
                   .bar(subset=[
                       #'Mobility_vodafone_A',
                       'Mobility_vodafone_B',
                       'SoW_B'
                       #'Mobility_potential_A'
                   ],
                       align='mid',
                       color=['#d65f5f', '#5fba7d'])
                   .format('€{0:,.0f}', subset=[
                       'Mobility_vodafone_A',
                       'Mobility_vodafone_B',
                       'Mobility_potential_A'
                   ])
                   .format("{:.1%}", subset=['revenue_rank_perc_A',
                                             'revenue_rank_perc_B',
                                             'revenue_perc_rank_B',
                                             'potential_rank_perc_A',
                                             'penetration_rate_B',
                                             'potential_rank_perc_B',
                                             'SoW_A',
                                             'SoW_B'])
                   )
 

In [None]:
vd.goStyle((french_customers_a_b_partner
 .loc[lambda x :x.index.get_level_values('country__B').isin([
                                                             'Switzerland',
                                                             'France'
 ])]
), global_ = False)

The French team can leverage the Swiss team's knowledge of customer 942. This customer's total revenue in Switzerland is about €117,003.34 (11% of the Addressable Market), while in France it is merely €29,870.43.

In [None]:
### 85 french-swiss customers
#vd.choose_customer_A_B(df = df_final,
#                    countryA = 'France',
#                    countryB= ['Switzerland'],
#                    partner = 'Partner')

### `Austria`

- Austria is the third-largest partner country for France.
    - 75 customers generate €2,163,478 of sales in France (€1,331,694 in Austria)
    - Their Addressable Market is about €15,058,456.
    - The share of wallet is 14% (half that of the Austrian market).
    - Among these 75 customers, the top 10 percent (8 customers) generate 75% of the sales and hold 42.1% of the Addressable Market, or €4,710,339 of whitespace.
    - 5 of these 8 customers have a potential larger than €1M, among them customers 200, 81 and 244 with a whitespace of respectively €1,588,472, €1,458,244 and €1,271,207

In [None]:
### 75 French-Austrian customers
vd.choose_customer_A_B(df = df_final,
                    countryA = 'France',
                    countryB= ['Austria'],
                    partner = 'Partner',
                    index_minim = True,
                    head = 113
                    )

In [None]:
(vd.country_B_team_contact(df = df_final,
                          customer = ["Customer 200"],
                          countryA='France',
                          countryB=None,
                          styling=False)
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                 'potential_rank_A', 'potential_rank_B',
                 #'Mobility_potential_A',
                  'nb_customers_A',
                 'nb_customers_B', 'potential_rank_perc_A',
                 'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                 'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
              'Inbound_Region',
              'Customer_Name',
              'Partnership__B'],
             drop = True)
 .style
                   .bar(subset=[
                       #'Mobility_vodafone_A',
                       'Mobility_vodafone_B',
                       'SoW_B'
                       #'Mobility_potential_A'
                   ],
                       align='mid',
                       color=['#d65f5f', '#5fba7d'])
                   .format('€{0:,.0f}', subset=[
                       'Mobility_vodafone_A',
                       'Mobility_vodafone_B',
                       'Mobility_potential_A'
                   ])
                   .format("{:.1%}", subset=['revenue_rank_perc_A',
                                             'revenue_rank_perc_B',
                                             'revenue_perc_rank_B',
                                             'potential_rank_perc_A',
                                             'penetration_rate_B',
                                             'potential_rank_perc_B',
                                             'SoW_A',
                                             'SoW_B'])
                   )
 

# French customers analysis

**Predict the revenue from the potential to flag aberrant customers**

- Estimate the expected revenue, knowing the potential.
- The strategy aims at detecting the observations that do not fit the rest of the dataset or, put differently, the points that the model considers outliers.
- Using a statistical methodology to detect the outliers, we can flag the customers who actually influence the model predictions.
    - We use the residual to detect abnormal predictions: observations whose residual is more than 1 standard deviation away from the mean residual (0) are flagged.
- The model also predicts the revenue of a customer given its potential. We add the confidence interval of the predictions.
- Our idea is to fit a linear model between the potential and the revenue, controlling for the characteristics of the customers and industries within a country.
    - Then, we extract the revenue predictions for France
    - The data points that are far from the model predictions (i.e. from what the revenue should be given the performance of a given customer and the performance of the industry in a given country) are the ones we flag

If the performance were linear, we would expect a quasi-linear trend between the Mobility potential ranking and the revenue ranking. Concretely, if customer A has the 50th largest potential in France, we would expect its revenue to rank around the 50th position. The figure below tells a slightly different story. On average, we see a positive and proportional line, but some observations are far from the predicted line, especially for the bottom customers. We note an elbow shape around the 250th customer.

The model captures this effect to surface potential customers that do not fit the model's expectation. One advantage of using a multivariate model is that we can control for the performance within a customer and between industry-country pairs.
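
The flagging rule described above can be sketched on synthetic data. This is a simplified illustration, not the notebook's model: it uses a single predictor (log potential) with no customer or industry controls, the data are randomly generated, and the residual is taken as observed minus predicted so that negative values mean the customer earns less than expected.

```python
import numpy as np

# Synthetic data (made up): log revenue roughly linear in log potential.
rng = np.random.default_rng(0)
ln_potential = rng.uniform(8, 14, size=50)
ln_revenue = 0.8 * ln_potential - 1.0 + rng.normal(0, 0.3, size=50)
ln_revenue[:3] -= 2.5  # three customers earning far less than expected

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones_like(ln_potential), ln_potential])
beta, *_ = np.linalg.lstsq(X, ln_revenue, rcond=None)
fitted = X @ beta
residual = ln_revenue - fitted  # observed minus predicted

# Flag customers whose residual is at least 1 sd below the mean residual.
threshold = residual.mean() - residual.std(ddof=1)
flagged = np.flatnonzero(residual <= threshold)

# Back-transform the log prediction to a revenue level for the flagged customers.
predicted_revenue = np.exp(fitted[flagged])
print(flagged, predicted_revenue.round(0))
```

A further filter on the back-transformed prediction (for example a minimum predicted revenue) can then be applied to keep only customers worth pursuing.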

## Reminder

- Residual:  The difference between the predicted value (based on the regression equation) and the actual, observed value.

- Outlier:  In linear regression, an outlier is an observation with large residual.  In other words, it is an observation whose dependent-variable value is unusual given its value on the predictor variables.  An outlier may indicate a sample peculiarity or may indicate a data entry error or other problem.

- Leverage:  An observation with an extreme value on a predictor variable is a point with high leverage.  Leverage is a measure of how far an independent variable deviates from its mean.  High leverage points can have a great amount of effect on the estimate of regression coefficients.

- Influence:  An observation is said to be influential if removing the observation substantially changes the estimate of the regression coefficients.  Influence can be thought of as the product of leverage and outlierness. 

- Cook’s distance (or Cook’s D): A measure that combines the information of leverage and residual of the observation. 
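
To make these definitions concrete, here is a minimal NumPy sketch that computes residuals, leverage and Cook's distance for a simple regression. The data are made up, the residual is taken as observed minus predicted, and the formulas are the standard textbook ones rather than anything specific to the `vd` helpers.

```python
import numpy as np

# Made-up data: a clean line plus one point far out in x and off the line.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 15.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 5.0])

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
n, p = X.shape

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residual = y - X @ beta  # observed minus predicted

# Leverage: diagonal of the hat matrix H = X (X'X)^-1 X'.
hat = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(hat)

# Cook's distance combines the size of the residual with the leverage.
mse = residual @ residual / (n - p)
cooks_d = residual**2 / (p * mse) * leverage / (1.0 - leverage) ** 2

most_influential = int(np.argmax(cooks_d))
print(leverage.round(3), cooks_d.round(3), most_influential)
```

In this toy example the last observation has both the highest leverage and the largest Cook's distance, which is the "leverage times outlierness" idea in the Influence bullet.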

## Prepare Data

In [None]:
project = 'valid-pagoda-132423'
query = (
    "SELECT Customer_Name, Country_name, IncomeGroup, WB, Languages, "
    "English, French, Partnership, "
    "Inbound_Region, industry, Fixed_vodafone, "
    " Cloud_Hosting_vodafone, IoT_vodafone, "
    "Unified_Comms_vodafone, Mobility_vodafone, Mobility_potential "
    "FROM Business.Vodafone "
    "WHERE Mobility_vodafone > 0 AND Mobility_potential > Mobility_vodafone "
    
)
index = ['Customer_Name',
         'Country_name',
         'IncomeGroup',
         'WB',
         'Languages',
         'English',
         'French',
         'Partnership',
         'Inbound_Region',
         'industry']
df_iden = (gcp
            .upload_data_from_bigquery(query=query,
                                       location='US')
            .set_index(index)
            #.assign(
            #    Whitespace=lambda x: x['Mobility_potential'] - x['Mobility_vodafone'])
            )

In [None]:
df_model = vd.prepare_df_model(df = df_iden,
                               country = 'France',
                               var_to_remove = 'SoW',
                               quantile = 0)
df_model.head()

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm

test = (df_model
        .xs('France', level='Country_name')
        .assign(
            rank_revenue_=lambda x: x['ln_revenue'].rank(ascending=False),
            rank_potential_=lambda x: x['ln_potential'].rank(ascending=False)
        )
        .reindex(columns=['ln_revenue', 'ln_potential',
                              'rank_revenue_', 'rank_potential_'])
        )

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 8), dpi=80)

fig.suptitle('Linear relationship between Potential and Revenue', fontsize=16)
#### plot 1
results = sm.OLS(test['rank_revenue_'],
                 sm.add_constant(test['rank_potential_'])
                ).fit()

ax1.scatter(test['rank_potential_'], test['rank_revenue_'])
X_plot = np.linspace(np.min(test['rank_potential_']),
                     np.max(test['rank_potential_']),100)
ax1.plot(X_plot, X_plot*results.params.iloc[1] + results.params.iloc[0])
ax1.set_xlabel('Rank of potential')
ax1.set_ylabel('Rank of revenue')

#### plot 2
results = sm.OLS(test['ln_revenue'],
                 sm.add_constant(test['ln_potential'])
                ).fit()

ax2.scatter(test['ln_potential'], test['ln_revenue'])
X_plot = np.linspace(np.min(test['ln_potential']),
                     np.max(test['ln_potential']),100)
ax2.plot(X_plot, X_plot*results.params.iloc[1] + results.params.iloc[0])
ax2.set_xlabel('Log of potential')
ax2.set_ylabel('Log of revenue')

plt.show()

A first look at the relationship between the potential and the revenue shows that the model will highlight customers whose revenue lies between €2,980 and €162,754.

## Bottom customers

**The top customers among those who generate more than €10K have a large SoW**

- We already know that Vodafone's French team hunts only the top customers.
- The French market is highly skewed toward the top customers.
- Looking at the demography of the top 10% vs the bottom 90%, we can tell the following:
    - The median revenue for the top 10% of the customers is €91k, while for the bottom customers it is €2k.
    - A large number of customers with potential have revenue above €2k
- The model will focus on those customers which are not in the top 10%.
    - Customer 51 is the largest customer from the bottom 90% of the distribution. It generates €35,319.30 of revenue and has an Addressable Market of €114,095.23.
    - Its Addressable Market is among the largest in France (top 10%). This customer, however, has already reached 30% of its Addressable Market.
- Most of the top customers in the bottom distribution already have a large SoW.
- A manual check shows the following customers can be flagged:
    - Customers 670, 81, 227 and 244

In [None]:
vd.goStyle((
    vd.customer_TopBottomCustomers(df = df_final,
                              country = ['France'],
                              slice_='Bottom_customers',
                              describe = False,
                              begin = 0,
                              end= 30)
.reindex(columns = ['sum_revenue', 'rank_customers', 
                       'sum_AM', 'rank_AM', 
                        #'sum_whitespace',
                       'SoW'])
    .reset_index(['Country_name', 'IncomeGroup', 'Partnership','Inbound_Region'], drop = True)
    ),
        global_ = True)

In [None]:
vd.goStyle((
    vd.customer_TopBottomCustomers(df = df_final,
                              country = ['France'],
                              slice_='Bottom_customers',
                              describe = False,
                              begin = 0,
                              end= 30)
    .loc[lambda x:x.index.get_level_values('Customer_Name').isin(["Customer 670", "Customer 81",
                                                                  "Customer 627", "Customer 244"])]
    .reindex(columns = ['sum_revenue', 'rank_customers', 
                       'sum_AM', 'rank_AM', 
                        #'sum_whitespace',
                       'SoW'])
    .reset_index(['Country_name', 'IncomeGroup', 'Partnership','Inbound_Region'], drop = True)
    ),
        global_ = True)

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 244'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

(vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 244'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)
  .drop(columns = ['rank_customers_A', 'rank_customers_B',
                 'potential_rank_A', 'potential_rank_B',
                 'nb_customers_A', 'nb_customers_B',
                   'potential_rank_perc_A','potential_rank_perc_B',
                   'revenue_rank_perc_A',
                   'revenue_rank_perc_B',
                  ])
 .reset_index(['IncomeGroup__B',
               'industry',
              'Inbound_Region',
              'Customer_Name'],
             drop = True)
 .head(5)
 .style
                   .bar(subset=[
                       #'Mobility_vodafone_A',
                       'Mobility_vodafone_B',
                       #'Mobility_potential_A'
                   ],
                       align='mid',
                       color=['#d65f5f', '#5fba7d'])
                   .format('€{0:,.0f}', subset=[
                       'Mobility_vodafone_A',
                       'Mobility_vodafone_B',
                       'Mobility_potential_A'
                   ])
                   .format("{:.1%}", subset=[#'revenue_rank_perc_A',
                                             #'revenue_perc_rank_B',
                                             #'revenue_rank_perc_B',
                                             'potential_rank_perc_A',
                                             'penetration_rate_B',
                                             'potential_rank_perc_B',
                                             'SoW_A',
                                             'SoW_B'])
                   )

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 81'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 81'],
                                 countryA='France',
                                 countryB=None,
                                 styling=True)

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 227'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 627'],
                                 countryA='France',
                                 countryB=None,
                                 styling=True)

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 244'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 244'],
                                 countryA='France',
                                 countryB=None,
                                 styling=True)

### Model

In [None]:
manual_selection = [
    'Customer 754',
     'Customer 213',
     'Customer 245',
     'Customer 927',
     'Customer 1007',
     'Customer 329',
     'Customer 627',
     'Customer 270',
     'Customer 454',
     'Customer 808',
     'Customer 942',
     'Customer 81',
     'Customer 627',
     'Customer 244',
    
]

In [None]:
df_model.shape

In [None]:
df_model.head()

In [None]:
customers = (vd.compute_OLS_model(df= df_model,
                                  print_ = True,
                                  method = 'log',
                                  revenue_minimum_pred = 15000)
             
            )
#customers.summary()
#list_cust = customers.index.get_level_values('Customer_Name')
customers

In [None]:
criterion_res = customers['Residual'].mean() - customers['Residual'].std()
revenue_minimum_pred = 15000
customers.loc[lambda x:
            (x["Residual"] <= criterion_res)
            & (np.exp(x["Predicted\nValue"]) >
               revenue_minimum_pred)
            ].index

In [None]:
### Same as scikit
customers.predict(df_model.head(9))


## Target customers with opportunities

**A potential of €253,489 can be leveraged from 7 customers**

- The model picked 7 customers whose predicted revenue exceeds their actual revenue and is above €15K
    - We only go after customers worth the effort
- If the model predictions are correct, those 7 customers can bring about €253,489 more revenue to Vodafone in total, with at least €80,634 (lower bound of the confidence interval).
- The customers we picked by hand can increase the sales by €158,880, with at least €52,602

In [None]:
(customers
 .sort_values(by ='revenue_vodafone_pred',
              ascending = False)
 .reindex(columns  = [
     'industry',
     'revenue_vodafone',
     'potential_vodafone',
     'revenue_vodafone_pred',
     'revenue_vodafone_pred_lbd',
     'revenue_vodafone_pred_upb',
     'SoW',
     #'Residual',
     #'rank_mobility',
     #'rank_revenue'
 ])
 .style
 .format('€{0:,.0f}', subset=
     [
     'revenue_vodafone',
     'potential_vodafone',
     'revenue_vodafone_pred',
     'revenue_vodafone_pred_lbd',
     'revenue_vodafone_pred_upb'
               ]
        )
 .format("{:.1%}", subset=['SoW'])
)

In [None]:
(customers
 .sort_values(by ='revenue_vodafone_pred',
              ascending = False)
 .reindex(columns  = [
     'industry',
     'revenue_vodafone',
     'potential_vodafone',
     'revenue_vodafone_pred',
     'revenue_vodafone_pred_lbd',
     'revenue_vodafone_pred_upb',
     'SoW',
     'Residual',
     'rank_mobility',
     'rank_revenue'
 ])
 .sum()
 .to_frame()
 .T
 .assign(
     net_gain = lambda x :x['revenue_vodafone_pred'] - 
     x['revenue_vodafone'],
     lower_net_gain = lambda x :x['revenue_vodafone_pred_lbd'] - 
     x['revenue_vodafone'],
     #highest_net_gain = lambda x :x['revenue_vodafone_pred_lbd'] - 
     #x['revenue_vodafone_pred_upb'],
 )
 .reindex( columns = ['net_gain',
                      'lower_net_gain',
                     #'highest_net_gain'
                     ])
 .style
 .format('€{0:,.0f}', subset=
     ['net_gain',
                      'lower_net_gain',
                     #'highest_net_gain'
     ]
)
)

In [None]:
#for i in manual_selection:
#    if i in list_cust:
#        print(i)

In [None]:
(customers
 .loc[lambda x: (x.index.get_level_values('Customer_Name').isin(manual_selection))
      & (x['Residual'] < 0 )]
 .sort_values(by ='revenue_vodafone_pred',
              ascending = False)
)

In [None]:
(customers
 .loc[lambda x: (x.index.get_level_values('Customer_Name').isin(manual_selection))
      #& (x['Residual'] < 0 )
     ]
 .sort_values(by ='revenue_vodafone_pred',
              ascending = False)
)

In [None]:
(customers
 .loc[lambda x: (x.index.get_level_values('Customer_Name').isin(manual_selection))
      & (x['Residual'] < 0 )]
 .sort_values(by ='revenue_vodafone_pred',
              ascending = False)
 .reindex(columns  = [
     'industry',
     'revenue_vodafone',
     'potential_vodafone',
     'revenue_vodafone_pred',
     'revenue_vodafone_pred_lbd',
     'revenue_vodafone_pred_upb',
     'SoW',
     'Residual',
     'rank_mobility',
     'rank_revenue'
 ])
 .sum()
 .to_frame()
 .T
 .assign(
 net_gain = lambda x :x['revenue_vodafone_pred'] - 
     x['revenue_vodafone'],
     lower_net_gain = lambda x :x['revenue_vodafone_pred_lbd'] - 
     x['revenue_vodafone'],
     #highest_net_gain = lambda x :x['revenue_vodafone_pred_lbd'] - 
     #x['revenue_vodafone_pred_upb'],
 )
 .reindex( columns = ['net_gain',
                      'lower_net_gain',
                     #'highest_net_gain'
                     ])
 .style
 .format('€{0:,.0f}', subset=
     ['net_gain',
                      'lower_net_gain',
                     #'highest_net_gain'
     ]
)
      )

In [None]:
(customers
 .loc[lambda x: (x.index.get_level_values('Customer_Name').isin(
 ['Customer 81', 'Customer 56', 'Customer 213','Customer 372']
 ))
      & (x['Residual'] < 0 )]
 .sort_values(by ='revenue_vodafone_pred',
              ascending = False)
  .reindex(columns  = [
     'industry',
     'revenue_vodafone',
     'potential_vodafone',
     'revenue_vodafone_pred',
     'revenue_vodafone_pred_lbd',
     'revenue_vodafone_pred_upb',
     'SoW',
     #'Residual',
     #'rank_mobility',
     #'rank_revenue'
 ])
 .style
 .format('€{0:,.0f}', subset=
     [
     'revenue_vodafone',
     'potential_vodafone',
     'revenue_vodafone_pred',
     'revenue_vodafone_pred_lbd',
     'revenue_vodafone_pred_upb'
               ]
        )
 .format("{:.1%}", subset=['SoW'])
)

## Which team to leverage

- Pick 4 customers from the preceding list
    - Customer 81: 
        - Revenue: €24,110
        - Potential: €1,482,354
        - Predicted: €166,997
    - Customer 56: 
        - Revenue: €2,669
        - Potential: €1,482,354
        - Predicted: €30,840
    - Customer 213: 
        - Revenue: €5,157
        - Potential: €1,140,399
        - Predicted: €21,150
    - Customer 372: 
        - Revenue: €2,652
        - Potential: €1,140,399
        - Predicted: €20,480
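
As a quick cross-check of the list above, the sketch below computes the uplift (predicted minus current revenue) for each of the four customers. The figures are copied from the list; the names `customers_fr` and `uplift` are illustrative and not part of the notebook's own code.

```python
# Figures copied from the list above (euros, French market).
customers_fr = {
    "Customer 81": {"revenue": 24_110, "predicted": 166_997},
    "Customer 56": {"revenue": 2_669, "predicted": 30_840},
    "Customer 213": {"revenue": 5_157, "predicted": 21_150},
    "Customer 372": {"revenue": 2_652, "predicted": 20_480},
}


def uplift(figures):
    """Predicted revenue minus current revenue (hypothetical helper)."""
    return figures["predicted"] - figures["revenue"]


uplifts = {name: uplift(figures) for name, figures in customers_fr.items()}
total_uplift = sum(uplifts.values())

# Print the customers from the largest to the smallest uplift.
for name, gain in sorted(uplifts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: €{gain:,}")
print(f"Total: €{total_uplift:,}")
```

Customer 81 accounts for most of the combined uplift, which is why it is discussed in the most detail.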

#### Customer 109 ?

**€22,711 revenue in France vs €142,106 in the UK for a SoW of 11.4% and 86.2% respectively**

- Revenue in France for customer 109 is very low, €37.23, but the potential is very large, capping at €1,874,822.98. 
- The performance of Customer 46 in the UK is much stronger than in France, with revenue reaching €598,524.64. 
- This customer is one of the best performing in the UK (rank 4). 

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 109'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

# Reuse `test` instead of calling the helper a second time
view = (test
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                  'potential_rank_A', 'potential_rank_B',
                  'Mobility_potential_A', 'nb_customers_A',
                  'nb_customers_B', 'potential_rank_perc_A',
                  'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                  'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
               'Inbound_Region',
               'Customer_Name'],
              drop = True)
)

# Styler raises a KeyError on missing labels in recent pandas versions,
# so keep only the columns that are still present after the drop
def present(cols):
    return [c for c in cols if c in view.columns]

(view
 .style
 .bar(subset=present(['Mobility_vodafone_B',
                      'Mobility_potential_A',
                      'SoW_B']),
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format('€{0:,.0f}', subset=present(['Mobility_vodafone_A',
                                      'Mobility_vodafone_B',
                                      'Mobility_potential_A']))
 .format("{:.1%}", subset=present(['revenue_rank_perc_A',
                                   'revenue_perc_rank_B',
                                   'revenue_rank_perc_B',
                                   'potential_rank_perc_A',
                                   'penetration_rate_B',
                                   'potential_rank_perc_B',
                                   'SoW_A',
                                   'SoW_B']))
)

#### Customer 81

**€24,110 revenue in France with a potential of €1,482,354**

- Revenue in France for customer 81 is reasonable at €24,110, and the potential is very large, capping at €1,482,354. 
- Customer 81 generates eleven times more revenue in Belgium than in France. 
    - In Belgium, customer 81 belongs to the top 1%
    - Customer 81 is also very far from its full potential in Belgium, while in Germany it has almost reached it.
    - Switzerland also manages customer 81 very well, in terms of both revenue and SoW 
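
To put the French figures in terms of the KPIs defined at the start of the presentation, the sketch below applies the SoW and whitespace formulas to customer 81. The revenue and potential come from this section and the predicted revenue from the "Which team to leverage" list; the function names are illustrative only.

```python
def share_of_wallet(revenue, addressable_market):
    """SoW = Vodafone revenue / addressable market (customer level)."""
    return revenue / addressable_market


def whitespace(revenue, addressable_market):
    """Whitespace = addressable market - Vodafone revenue (customer level)."""
    return addressable_market - revenue


# Customer 81 in France, in euros.
revenue_fr = 24_110
potential_fr = 1_482_354
predicted_fr = 166_997  # model prediction quoted in the customer list

sow_now = share_of_wallet(revenue_fr, potential_fr)
sow_pred = share_of_wallet(predicted_fr, potential_fr)
ws_now = whitespace(revenue_fr, potential_fr)

print(f"Current SoW:   {sow_now:.1%}")
print(f"Predicted SoW: {sow_pred:.1%}")
print(f"Whitespace:    €{ws_now:,}")
```

Even at the predicted revenue, the SoW in France would remain far below 100%, so most of the addressable market would still be whitespace.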

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 81'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

# Reuse `test` instead of calling the helper a second time
view = (test
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                  'potential_rank_A', 'potential_rank_B',
                  'Mobility_potential_A', 'nb_customers_A',
                  'nb_customers_B', 'potential_rank_perc_A',
                  'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                  'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
               'Inbound_Region',
               'Customer_Name'],
              drop = True)
)

# Styler raises a KeyError on missing labels in recent pandas versions,
# so keep only the columns that are still present after the drop
def present(cols):
    return [c for c in cols if c in view.columns]

(view
 .style
 .bar(subset=present(['Mobility_vodafone_A',
                      'Mobility_vodafone_B',
                      'Mobility_potential_A']),
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format('€{0:,.0f}', subset=present(['Mobility_vodafone_A',
                                      'Mobility_vodafone_B',
                                      'Mobility_potential_A']))
 .format("{:.1%}", subset=present(['revenue_rank_perc_A',
                                   'revenue_perc_rank_B',
                                   'revenue_rank_perc_B',
                                   'potential_rank_perc_A',
                                   'penetration_rate_B',
                                   'potential_rank_perc_B',
                                   'SoW_A',
                                   'SoW_B']))
)

In [None]:
df_model.head()

In [None]:
df_model.loc[lambda x: x.index.get_level_values('Customer_Name').isin(['Customer 81'])][tokepp]

In [None]:
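# Average of the numeric columns for French customers in Financial Services
# (numeric_only avoids a TypeError on text columns such as `industry`)
df_model.loc[lambda x: 
             (x.index.get_level_values('Country_name').isin(['France'])) &
             (x['industry'].isin(['Financial Services']))
            ].mean(numeric_only=True)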

In [None]:
import numpy as np

# Back-transform a log value to levels (presumably a mean `ln_revenue`)
np.exp(8.887459)

In [None]:
# Average of the numeric columns across all countries for Customer 81
df_model.loc[lambda x: x.index.get_level_values('Customer_Name').isin(['Customer 81'])].mean(numeric_only=True)

In [None]:
vd.figure_test(df = test,
                 A='Mobility_vodafone_A',
                 B = 'Mobility_vodafone_B',
                 title = 'Vodafone Revenue Difference by Operating Countries for Customer 81',
                 save =True)

In [None]:
vd.figure_test(df = test,
               A = 'revenue_rank_perc_A',
               B= 'revenue_rank_perc_B',
               title = 'Vodafone Revenue ranking Difference by Operating Countries for Customer 81')

In [None]:
vd.figure_test(df = test,
               A = 'SoW_A',
               B= 'SoW_B',
               title = 'Vodafone SoW Difference by Operating Countries for Customer 81')

#### Customer 56

**€2,669 revenue in France vs €89,846 in Italy for a SoW of 0.9% and 25.5% respectively**

- The French market is one of the lowest for customer 56, with only €2,669, compared with €89,846 in Italy, €67,630 in the UK, €67,512 in Belgium and €63,866 in Switzerland. 
    - The potential in France is not very large compared with the other high-income countries. 
    - However, the SoW in the comparable markets is significantly higher than in France
    - Belgium and Switzerland have a SoW of 9% and 10% respectively, so a similar pattern should be expected for France
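
The last bullet can be turned into a rough order of magnitude: holding the French addressable market fixed, revenue scales proportionally with SoW. The sketch below uses the rounded SoW figures quoted in the text, so the results are indicative only; `revenue_at_target_sow` is a hypothetical helper, not part of the notebook's code.

```python
def revenue_at_target_sow(revenue, current_sow, target_sow):
    """Revenue implied by a target SoW, holding the addressable market fixed.

    The addressable market is revenue / current_sow, so the implied
    revenue is addressable market * target_sow.
    """
    return revenue * target_sow / current_sow


revenue_fr = 2_669  # customer 56, France, in euros
sow_fr = 0.009      # 0.9%, rounded figure from the text

# SoW observed in the comparable markets (rounded figures from the text).
targets = {"Belgium": 0.09, "Switzerland": 0.10}

scenarios = {
    country: revenue_at_target_sow(revenue_fr, sow_fr, target)
    for country, target in targets.items()
}

for country, implied in scenarios.items():
    print(f"France at the SoW of {country}: €{implied:,.0f}")
```

Under these assumptions, matching the Belgian or Swiss SoW would put French revenue in the €27k to €30k range, close to the €30,840 predicted for customer 56.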


In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 56'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

# Reuse `test` instead of calling the helper a second time
view = (test
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                  'potential_rank_A', 'potential_rank_B',
                  'Mobility_potential_A', 'nb_customers_A',
                  'nb_customers_B', 'potential_rank_perc_A',
                  'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                  'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
               'Inbound_Region',
               'Customer_Name'],
              drop = True)
)

# Styler raises a KeyError on missing labels in recent pandas versions,
# so keep only the columns that are still present after the drop
def present(cols):
    return [c for c in cols if c in view.columns]

(view
 .style
 .bar(subset=present(['Mobility_vodafone_B',
                      'Mobility_potential_A',
                      'SoW_B']),
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format('€{0:,.0f}', subset=present(['Mobility_vodafone_A',
                                      'Mobility_vodafone_B',
                                      'Mobility_potential_A']))
 .format("{:.1%}", subset=present(['revenue_rank_perc_A',
                                   'revenue_perc_rank_B',
                                   'revenue_rank_perc_B',
                                   'potential_rank_perc_A',
                                   'penetration_rate_B',
                                   'potential_rank_perc_B',
                                   'SoW_A',
                                   'SoW_B']))
)

In [None]:
list(df_model)

tokepp = ['Customer_Name',
 'Country_name',
 'industry',
 'Mobility_vodafone',
 'Mobility_potential',
 'Whitespace',
 'ln_revenue',
 'ln_potential',
 'SoW',
 'rank_mobility',
 'rank_revenue',
 'revenue_norm',
 'potential_norm',
 'revenue_std',
 'potential_std',
 'FE_c_i']

In [None]:
df_model.loc[lambda x: x.index.get_level_values('Customer_Name').isin(['Customer 56'])][tokepp]

In [None]:
vd.figure_test(df = test,
               A = 'SoW_A',
               B= 'SoW_B',
               title = 'Vodafone SoW Difference by Operating Countries for Customer 56')

#### Customer 372

**€2,652 revenue in France vs €24,803 in Switzerland for a SoW of 0.8% and 15.8% respectively**

- The French market is one of the lowest for customer 372, with only €2,652, compared with €127,069 in Italy, €29,522 in the UK, and €24,803 in Switzerland. 
    - The potential in France is not very large compared with the other high-income countries. 
    - However, the SoW in the comparable markets is significantly higher than in France
    - Switzerland has a SoW of 16%, so a similar pattern should be expected for France

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 372'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

# Reuse `test` instead of calling the helper a second time
view = (test
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                  'potential_rank_A', 'potential_rank_B',
                  'Mobility_potential_A', 'nb_customers_A',
                  'nb_customers_B', 'potential_rank_perc_A',
                  'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                  'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
               'Inbound_Region',
               'Customer_Name'],
              drop = True)
)

# Styler raises a KeyError on missing labels in recent pandas versions,
# so keep only the columns that are still present after the drop
def present(cols):
    return [c for c in cols if c in view.columns]

(view
 .style
 .bar(subset=present(['Mobility_vodafone_A',
                      'Mobility_vodafone_B',
                      'Mobility_potential_A']),
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format('€{0:,.0f}', subset=present(['Mobility_vodafone_A',
                                      'Mobility_vodafone_B',
                                      'Mobility_potential_A']))
 .format("{:.1%}", subset=present(['revenue_rank_perc_A',
                                   'revenue_perc_rank_B',
                                   'revenue_rank_perc_B',
                                   'potential_rank_perc_A',
                                   'penetration_rate_B',
                                   'potential_rank_perc_B',
                                   'SoW_A',
                                   'SoW_B']))
)

In [None]:
# SoW of customer 372 in Switzerland: revenue / addressable market
24802.879899 / 156611.451069

In [None]:
df_model.loc[lambda x: x.index.get_level_values('Customer_Name').isin(['Customer 372'])][tokepp]

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 372'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 372'],
                                 countryA='France',
                                 countryB=None,
                                 styling=True)

In [None]:
vd.figure_test(df = test,
               A = 'SoW_A',
               B= 'SoW_B',
               title = 'Vodafone SoW Difference by Operating Countries for Customer 372',
              save = True)

#### Customer 213

**€5,157 revenue in France while the model predicts a revenue of €21,150**

- The revenue from the French market for customer 213 is similar to Belgium (€5,157 vs €5,017). 
    - Belgium reached a SoW of 10%, compared with 0.5% in France. 
    - France has the lowest SoW for customer 213 but a very large addressable market
    - The model expects more revenue from customer 213 given the performance of the manufacturing industry and the customer's performance in the other countries.

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 213'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 213'],
                                 countryA='France',
                                 countryB=None,
                                 styling=True)

In [None]:
test = vd.country_B_team_contact(df=df_final,
                                 customer=['Customer 213'],
                                 countryA='France',
                                 countryB=None,
                                 styling=False)

# Reuse `test` instead of calling the helper a second time
view = (test
 .drop(columns = ['rank_customers_A', 'rank_customers_B',
                  'potential_rank_A', 'potential_rank_B',
                  'Mobility_potential_A', 'nb_customers_A',
                  'nb_customers_B', 'potential_rank_perc_A',
                  'potential_rank_perc_B',
                  'revenue_rank_perc_A',
                  'revenue_rank_perc_B'])
 .reset_index(['IncomeGroup__B',
               'industry',
               'Inbound_Region',
               'Customer_Name'],
              drop = True)
)

# Styler raises a KeyError on missing labels in recent pandas versions,
# so keep only the columns that are still present after the drop
def present(cols):
    return [c for c in cols if c in view.columns]

(view
 .style
 .bar(subset=present(['Mobility_vodafone_A',
                      'Mobility_vodafone_B',
                      'Mobility_potential_A']),
      align='mid',
      color=['#d65f5f', '#5fba7d'])
 .format('€{0:,.0f}', subset=present(['Mobility_vodafone_A',
                                      'Mobility_vodafone_B',
                                      'Mobility_potential_A']))
 .format("{:.1%}", subset=present(['revenue_rank_perc_A',
                                   'revenue_perc_rank_B',
                                   'revenue_rank_perc_B',
                                   'potential_rank_perc_A',
                                   'penetration_rate_B',
                                   'potential_rank_perc_B',
                                   'SoW_A',
                                   'SoW_B']))
)

In [None]:
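# Addressable market implied for customer 213 in France: revenue / SoW
5157.201429 / 0.004522

In [None]:
# SoW (in %) implied by the predicted revenue of €21,150
(21150 / 1140469) * 100

In [None]:
# Current SoW in France, in %
0.004522 * 100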

In [None]:
df_model.loc[lambda x: x.index.get_level_values('Customer_Name').isin(['Customer 213'])][tokepp]

In [None]:
vd.figure_test(df = test,
               A = 'SoW_A',
               B= 'SoW_B',
               title = 'Vodafone SoW Difference by Operating Countries for Customer 213',
              save = True)

In [None]:
customers