## INTENTION OF THIS WORKBOOK

This next pass will:
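1. Gather a dataframe covering the five-year Ad Age data range.

2. Fix the revenue discrepancy: Ad Age reports revenue in thousands of dollars, so the raw figures need scaling to whole dollars.

3. Add a new 'AGENCY-TIER' column, typing each agency into one of five tiers (BIGHOLD, CONSULTANT, MIDMARKET, CONTENDERS, INDY). These roll up into four groups: Supernova (Big Hold + Consultant), Global (Midmarket), Burgeoning (Contenders), and Independents.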

~~4. Simple CAGR on the tiers over 5 years~~
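Step 1 might be sketched as follows. This assumes, hypothetically, that each year's CSV is loaded into its own DataFrame with an `AGENCY-COMPANY` column and a `<year>-REVENUE` column like the 2015 file has; the helper `stack_years` is illustrative, and the other years' file names are not specified here.

```python
import pandas as pd

def stack_years(frames):
    """Stack per-year Ad Age tables into one long DataFrame (hypothetical helper).

    `frames` maps a year (int) to a DataFrame with an AGENCY-COMPANY column and
    a '<year>-REVENUE' column (assumed naming, modeled on the 2015 file).
    Returns columns AGENCY-COMPANY, YEAR, REVENUE.
    """
    parts = []
    for year, df in frames.items():
        part = df[['AGENCY-COMPANY', f'{year}-REVENUE']].rename(
            columns={f'{year}-REVENUE': 'REVENUE'})
        parts.append(part.assign(YEAR=year))
    stacked = pd.concat(parts, ignore_index=True)
    return stacked[['AGENCY-COMPANY', 'YEAR', 'REVENUE']]

# Tiny in-memory example; real data would come from one pd.read_csv per year.
y2014 = pd.DataFrame({'AGENCY-COMPANY': ['A [X]', 'B'], '2014-REVENUE': [100, 50]})
y2015 = pd.DataFrame({'AGENCY-COMPANY': ['A [X]', 'B'], '2015-REVENUE': [110, 60]})
long_df = stack_years({2014: y2014, 2015: y2015})

# One row per agency, one column per year
wide = long_df.pivot(index='AGENCY-COMPANY', columns='YEAR', values='REVENUE')
print(wide)
```

Keeping the long format makes later per-tier, per-year grouping a single `groupby`; the wide pivot is handy for eyeballing year-over-year changes.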

In [1]:
import pandas as pd
import re

In [2]:
raw='/Users/xavier/Documents/src/dataviz/AgencyRevenueModels/adage-data/adage-900_2015.csv'
adage = pd.read_csv(raw)
adage

,AGENCY-COMPANY,HEADQUARTERS,2015-REVENUE,% CHG
0,Epsilon [Alliance Data Systems Corp.],"Irving, Texas",1855045,3.5
1,Accenture Interactive [Accenture],New York,1231596,70.8
2,Deloitte Digital* [Deloitte],New York,865200,
3,IBM Interactive Experience* [IBM Corp.],"Armonk, N.Y.",796800,
4,Acxiom Corp.*,"Little Rock, Ark.",765299,7.0
5,Razorfish Global* [Publicis],New York,651665,-1.3
6,PwC Digital Services [PwC],New York,624000,
7,BBDO Worldwide* [Omnicom],New York,602798,4.5
8,SapientNitro* [Publicis],Boston,585000,1.3
9,McCann* [Interpublic],New York,561860,15.9


In [3]:
adage.dtypes

AGENCY-COMPANY     object
HEADQUARTERS       object
2015-REVENUE        int64
% CHG             float64
dtype: object

In [4]:
adage.shape

(915, 4)

In [5]:
# Split an AGENCY-COMPANY value into (agency name, parent owner).
# Values look like 'BBDO Worldwide* [Omnicom]'; a value with no bracketed
# parent is treated as an independent agency.
owned = re.compile(r"(.+?)\s*\[(.*)\]")

def indySubstituteSimple(x):
    o = owned.search(x)
    if o:
        return o.group(1).strip(), o.group(2).strip()
    return x.strip(), "Independent"
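A vectorized alternative to mapping `indySubstituteSimple` row by row is `Series.str.extract` with named groups. This is a sketch checked only against the sample strings below; the pattern is an assumption, not the workbook's, so verify it against all 915 rows before swapping it in.

```python
import pandas as pd

# One regex, two named groups: the agency name, then an optional bracketed
# parent. Rows without brackets get NaN in 'owner' (assumed pattern).
pattern = r'^(?P<name>.+?)\s*(?:\[(?P<owner>.*)\])?$'

companies = pd.Series([
    'Epsilon [Alliance Data Systems Corp.]',
    'Acxiom Corp.*',
    'BBDO Worldwide* [Omnicom]',
])
parts = companies.str.extract(pattern)
parts['owner'] = parts['owner'].fillna('Independent')
print(parts)
```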

In [6]:
# Create two new Columns that take the two outputs of the function
adage['AGENCY-NAME'], adage['AGENCY-OWNER'] = zip(*adage['AGENCY-COMPANY'].map(indySubstituteSimple))
adage

,AGENCY-COMPANY,HEADQUARTERS,2015-REVENUE,% CHG,AGENCY-NAME,AGENCY-OWNER
0,Epsilon [Alliance Data Systems Corp.],"Irving, Texas",1855045,3.5,Epsilon,Alliance Data Systems Corp.
1,Accenture Interactive [Accenture],New York,1231596,70.8,Accenture Interactive,Accenture
2,Deloitte Digital* [Deloitte],New York,865200,,Deloitte Digital*,Deloitte
3,IBM Interactive Experience* [IBM Corp.],"Armonk, N.Y.",796800,,IBM Interactive Experience*,IBM Corp.
4,Acxiom Corp.*,"Little Rock, Ark.",765299,7.0,Acxiom Corp.*,Independent
5,Razorfish Global* [Publicis],New York,651665,-1.3,Razorfish Global*,Publicis
6,PwC Digital Services [PwC],New York,624000,,PwC Digital Services,PwC
7,BBDO Worldwide* [Omnicom],New York,602798,4.5,BBDO Worldwide*,Omnicom
8,SapientNitro* [Publicis],Boston,585000,1.3,SapientNitro*,Publicis
9,McCann* [Interpublic],New York,561860,15.9,McCann*,Interpublic
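The `zip(*...)` in In [6] works because `map` yields one `(name, owner)` tuple per row, and unpacking those tuples into `zip` transposes them into one sequence of names and one of owners. A plain-Python illustration using values from the table above:

```python
# Each row's function output is a (name, owner) pair
pairs = [('Epsilon', 'Alliance Data Systems Corp.'),
         ('Acxiom Corp.*', 'Independent'),
         ('McCann*', 'Interpublic')]

# zip(*pairs) regroups the pairs column-wise: all names, then all owners
names, owners = zip(*pairs)
print(names)   # ('Epsilon', 'Acxiom Corp.*', 'McCann*')
print(owners)  # ('Alliance Data Systems Corp.', 'Independent', 'Interpublic')
```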


In [7]:
# Drop any parenthetical qualifier after the owner's name, e.g. 'Omnicom (...)' -> 'Omnicom',
# then strip the trailing whitespace that the removal leaves behind
adage['AGENCY-OWNER'] = (adage['AGENCY-OWNER']
                         .str.replace(r'\([^)]*\)', '', regex=True)
                         .str.rstrip())
adage

,AGENCY-COMPANY,HEADQUARTERS,2015-REVENUE,% CHG,AGENCY-NAME,AGENCY-OWNER
0,Epsilon [Alliance Data Systems Corp.],"Irving, Texas",1855045,3.5,Epsilon,Alliance Data Systems Corp.
1,Accenture Interactive [Accenture],New York,1231596,70.8,Accenture Interactive,Accenture
2,Deloitte Digital* [Deloitte],New York,865200,,Deloitte Digital*,Deloitte
3,IBM Interactive Experience* [IBM Corp.],"Armonk, N.Y.",796800,,IBM Interactive Experience*,IBM Corp.
4,Acxiom Corp.*,"Little Rock, Ark.",765299,7.0,Acxiom Corp.*,Independent
5,Razorfish Global* [Publicis],New York,651665,-1.3,Razorfish Global*,Publicis
6,PwC Digital Services [PwC],New York,624000,,PwC Digital Services,PwC
7,BBDO Worldwide* [Omnicom],New York,602798,4.5,BBDO Worldwide*,Omnicom
8,SapientNitro* [Publicis],Boston,585000,1.3,SapientNitro*,Publicis
9,McCann* [Interpublic],New York,561860,15.9,McCann*,Interpublic


In [8]:
# Ad Age reports revenue in thousands of dollars; convert to whole dollars.
# Note: this cell is not idempotent; re-running it multiplies by 1000 again.
adage['2015-REVENUE'] = adage['2015-REVENUE'] * 1000
# Remove the asterisk after some agency names, e.g. 'BBDO Worldwide*'
adage['AGENCY-NAME'] = adage['AGENCY-NAME'].str.replace('*', '', regex=False)
adage

,AGENCY-COMPANY,HEADQUARTERS,2015-REVENUE,% CHG,AGENCY-NAME,AGENCY-OWNER
0,Epsilon [Alliance Data Systems Corp.],"Irving, Texas",1855045000,3.5,Epsilon,Alliance Data Systems Corp.
1,Accenture Interactive [Accenture],New York,1231596000,70.8,Accenture Interactive,Accenture
2,Deloitte Digital* [Deloitte],New York,865200000,,Deloitte Digital,Deloitte
3,IBM Interactive Experience* [IBM Corp.],"Armonk, N.Y.",796800000,,IBM Interactive Experience,IBM Corp.
4,Acxiom Corp.*,"Little Rock, Ark.",765299000,7.0,Acxiom Corp.,Independent
5,Razorfish Global* [Publicis],New York,651665000,-1.3,Razorfish Global,Publicis
6,PwC Digital Services [PwC],New York,624000000,,PwC Digital Services,PwC
7,BBDO Worldwide* [Omnicom],New York,602798000,4.5,BBDO Worldwide,Omnicom
8,SapientNitro* [Publicis],Boston,585000000,1.3,SapientNitro,Publicis
9,McCann* [Interpublic],New York,561860000,15.9,McCann,Interpublic
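Because In [8] rescales `2015-REVENUE` in place, running the cell twice silently inflates revenue by another factor of 1000. One way around that, sketched here with a hypothetical `2015-REVENUE-USD` column and helper name, is to leave the raw thousands untouched and derive dollars into a new column:

```python
import pandas as pd

df = pd.DataFrame({
    'AGENCY-NAME': ['Epsilon', 'McCann*'],
    '2015-REVENUE': [1855045, 561860],   # Ad Age figures, in thousands
})

def add_dollar_revenue(frame):
    """Return a copy with a hypothetical '2015-REVENUE-USD' column in dollars.

    The source column is never modified, so calling this twice yields the
    same result instead of multiplying by 1000 again.
    """
    out = frame.copy()
    out['2015-REVENUE-USD'] = out['2015-REVENUE'] * 1000
    out['AGENCY-NAME'] = out['AGENCY-NAME'].str.rstrip('*')
    return out

once = add_dollar_revenue(df)
twice = add_dollar_revenue(add_dollar_revenue(df))
print(once)
```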


In [9]:
adage.groupby('AGENCY-OWNER').size()

AGENCY-OWNER
Accenture                              1
Acosta                                 1
Advance Publications                   1
Advantage Solutions                    1
Alliance Data Systems Corp.            2
Asatsu-DK                              1
BlueFocus Communication Group          3
Cheil Worldwide                        3
Creston                                1
DJE Holdings                           2
Deloitte                               1
Dentsu                                13
Eastport Holdings                      4
Engine Group                           2
Experian                               1
Hakuhodo DY Holdings                   1
Havas                                  5
Hearst Corp.                           1
Huntsworth                             2
IBM Corp.                              1
ICF International                      1
Independent                          634
Interpublic                           43
M&C Saatchi                            1
MDC

In [10]:
# adage.groupby('AGENCY-OWNER').size().to_csv('/Users/xavier/Documents/src/dataviz/AgencyRevenueModels/adage-data/adage-900_4tier_2015.csv', encoding='utf-8')

In [11]:
# Lists that define the four named tiers of marketing holding companies;
# any owner not listed here (including 'Independent') is typed INDY.
agency_holding_companies = ['Omnicom','Interpublic','WPP','Publicis','Dentsu','Havas']
consultant_holding_companies = ['Alliance Data Systems Corp.','Accenture','Advance Publications','Deloitte','Experian','IBM Corp.','PwC']
midmarket_holding_companies = ['MDC Partners','Project WorldWide','BlueFocus Communication Group','Cheil Worldwide','Next Fifteen Communications Group','Huntsworth','Hakuhodo DY Holdings']
contender_holding_companies = ['DJE Holdings','Engine Group','Asatsu-DK','ASM','BlueFocus Communication Group','Creston','FullSix Group','Hearst Corp.','Iris Worldwide','Klick Inc.','Marc USA','Matomy Media Group','Meredith Corp.','Mother Holdings','TMP Worldwide','Viad Corp.']
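Note that 'BlueFocus Communication Group' appears in both the midmarket and contender lists. A quick check like the sketch below surfaces such overlaps before tiering; the helper name is hypothetical and the contender list is excerpted to keep the example short.

```python
from collections import defaultdict

def overlapping_owners(tiers):
    """Return {owner: [tier, ...]} for owners listed in more than one tier."""
    seen = defaultdict(list)
    for tier, owners in tiers.items():
        for owner in owners:
            seen[owner].append(tier)
    return {owner: t for owner, t in seen.items() if len(t) > 1}

tiers = {
    'MIDMARKET': ['MDC Partners', 'Project WorldWide', 'BlueFocus Communication Group',
                  'Cheil Worldwide', 'Next Fifteen Communications Group',
                  'Huntsworth', 'Hakuhodo DY Holdings'],
    # Excerpt of contender_holding_companies
    'CONTENDERS': ['DJE Holdings', 'Engine Group', 'Asatsu-DK', 'ASM',
                   'BlueFocus Communication Group', 'Creston', 'FullSix Group'],
}
print(overlapping_owners(tiers))
# {'BlueFocus Communication Group': ['MIDMARKET', 'CONTENDERS']}
```

Since `typer` checks the midmarket list before the contender list, BlueFocus ends up MIDMARKET; the contender entry is never reached.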

In [12]:
adage['AGENCY-OWNER'].isin(agency_holding_companies)

0      False
1      False
2      False
3      False
4      False
5       True
6      False
7       True
8       True
9       True
10     False
11      True
12      True
13      True
14     False
15     False
16      True
17      True
18      True
19      True
20     False
21      True
22     False
23      True
24      True
25      True
26      True
27      True
28      True
29     False
       ...  
885    False
886    False
887    False
888    False
889    False
890    False
891    False
892    False
893    False
894    False
895    False
896    False
897    False
898    False
899    False
900    False
901    False
902    False
903    False
904    False
905    False
906    False
907    False
908    False
909    False
910    False
911    False
912    False
913    False
914    False
Name: AGENCY-OWNER, dtype: bool
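The same `isin` check can list owners that have a parent but appear in no tier list, such as Acosta or Eastport Holdings in the owner counts above. These fall through to INDY, which is why the INDY tier count (644) exceeds the 'Independent' owner count (634). A sketch with a hypothetical helper and toy data:

```python
import pandas as pd

def unlisted_parents(owners, *tier_lists):
    """Owners that are not 'Independent' and not in any tier list.

    These are typed INDY even though they have a parent company.
    """
    listed = set().union(*tier_lists)
    mask = ~owners.isin(listed) & (owners != 'Independent')
    return sorted(owners[mask].unique())

owners = pd.Series(['Omnicom', 'Independent', 'Acosta', 'Eastport Holdings',
                    'Accenture', 'Eastport Holdings'])
print(unlisted_parents(owners, ['Omnicom'], ['Accenture']))
# ['Acosta', 'Eastport Holdings']
```

In the workbook this would be called as `unlisted_parents(adage['AGENCY-OWNER'], agency_holding_companies, consultant_holding_companies, midmarket_holding_companies, contender_holding_companies)`.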

In [13]:
def typer(item):
    """Map an AGENCY-OWNER value to its tier label.

    The lists are checked in order (agency, consultant, midmarket, contender),
    so an owner that appears in more than one list gets the first match.
    Any owner not listed, including 'Independent', is typed INDY.
    """
    if item in agency_holding_companies:
        return "BIGHOLD"
    elif item in consultant_holding_companies:
        return "CONSULTANT"
    elif item in midmarket_holding_companies:
        return "MIDMARKET"
    elif item in contender_holding_companies:
        return "CONTENDERS"
    else:
        return "INDY"
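An equivalent, vectorized approach builds a single owner-to-tier dictionary and uses `Series.map`. To mirror the `if/elif` precedence, the sketch inserts the lowest-priority list first so that higher-priority lists overwrite it. The lists here are excerpts; `tier_of` is a hypothetical name.

```python
import pandas as pd

# Excerpts of the workbook's tier lists
agency_holding_companies = ['Omnicom', 'Interpublic', 'WPP', 'Publicis', 'Dentsu', 'Havas']
consultant_holding_companies = ['Accenture', 'Deloitte', 'IBM Corp.', 'PwC']
midmarket_holding_companies = ['MDC Partners', 'BlueFocus Communication Group']
contender_holding_companies = ['Engine Group', 'BlueFocus Communication Group']

# Lowest priority first; later updates overwrite, matching typer()'s order
tier_of = {}
for tier, owners in [('CONTENDERS', contender_holding_companies),
                     ('MIDMARKET', midmarket_holding_companies),
                     ('CONSULTANT', consultant_holding_companies),
                     ('BIGHOLD', agency_holding_companies)]:
    tier_of.update(dict.fromkeys(owners, tier))

owners = pd.Series(['Publicis', 'BlueFocus Communication Group', 'Independent', 'PwC'])
# Unmapped owners come back as NaN and default to INDY
tiers = owners.map(tier_of).fillna('INDY')
print(tiers.tolist())  # ['BIGHOLD', 'MIDMARKET', 'INDY', 'CONSULTANT']
```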

In [14]:
typer('Accenture')

'CONSULTANT'

In [15]:
typer('Viad Corp.')

'CONTENDERS'

In [16]:
adage['AGENCY-TIER'] = adage['AGENCY-OWNER'].apply(typer)

In [17]:
adage

,AGENCY-COMPANY,HEADQUARTERS,2015-REVENUE,% CHG,AGENCY-NAME,AGENCY-OWNER,AGENCY-TIER
0,Epsilon [Alliance Data Systems Corp.],"Irving, Texas",1855045000,3.5,Epsilon,Alliance Data Systems Corp.,CONSULTANT
1,Accenture Interactive [Accenture],New York,1231596000,70.8,Accenture Interactive,Accenture,CONSULTANT
2,Deloitte Digital* [Deloitte],New York,865200000,,Deloitte Digital,Deloitte,CONSULTANT
3,IBM Interactive Experience* [IBM Corp.],"Armonk, N.Y.",796800000,,IBM Interactive Experience,IBM Corp.,CONSULTANT
4,Acxiom Corp.*,"Little Rock, Ark.",765299000,7.0,Acxiom Corp.,Independent,INDY
5,Razorfish Global* [Publicis],New York,651665000,-1.3,Razorfish Global,Publicis,BIGHOLD
6,PwC Digital Services [PwC],New York,624000000,,PwC Digital Services,PwC,CONSULTANT
7,BBDO Worldwide* [Omnicom],New York,602798000,4.5,BBDO Worldwide,Omnicom,BIGHOLD
8,SapientNitro* [Publicis],Boston,585000000,1.3,SapientNitro,Publicis,BIGHOLD
9,McCann* [Interpublic],New York,561860000,15.9,McCann,Interpublic,BIGHOLD


In [18]:
adage.groupby('AGENCY-TIER').size()

AGENCY-TIER
BIGHOLD       213
CONSULTANT      8
CONTENDERS     11
INDY          644
MIDMARKET      39
dtype: int64
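The intentions list groups the five code tiers into four named groups, with Supernova covering both Big Hold and Consultant. A sketch of that roll-up, assuming the mapping below (the `TIER_GROUP` name is hypothetical) and using the counts from the output above:

```python
import pandas as pd

# Assumed mapping from the code labels to the named groups
TIER_GROUP = {
    'BIGHOLD': 'Supernova',
    'CONSULTANT': 'Supernova',
    'MIDMARKET': 'Global',
    'CONTENDERS': 'Burgeoning',
    'INDY': 'Independents',
}

# Agency counts per tier, copied from the groupby output above
counts = pd.Series({'BIGHOLD': 213, 'CONSULTANT': 8, 'CONTENDERS': 11,
                    'INDY': 644, 'MIDMARKET': 39})

# Passing a dict to groupby maps each index label to its group
by_group = counts.groupby(TIER_GROUP).sum()
print(by_group)
```

On the full table the same idea would read `adage.groupby(adage['AGENCY-TIER'].map(TIER_GROUP))['2015-REVENUE'].sum()`.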

In [19]:
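# Select the numeric columns explicitly; recent pandas versions may otherwise
# also try to sum (concatenate) the text columns
adage.groupby('AGENCY-TIER')[['2015-REVENUE', '% CHG']].sum()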

AGENCY-TIER,2015-REVENUE,% CHG
BIGHOLD,23800927000,1503.8
CONSULTANT,6030425000,79.7
CONTENDERS,1255154000,96.9
INDY,14118269000,6309.7
MIDMARKET,1548381000,618.9
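Summing `% CHG` across agencies does not produce a meaningful growth figure. A sketch of a more informative per-tier summary using named aggregation, run here on five rows from the cleaned table; the output column names are illustrative.

```python
import pandas as pd

# A few rows from the cleaned table above (revenue already in dollars)
df = pd.DataFrame({
    'AGENCY-TIER': ['CONSULTANT', 'CONSULTANT', 'INDY', 'BIGHOLD', 'BIGHOLD'],
    '2015-REVENUE': [1855045000, 1231596000, 765299000, 651665000, 602798000],
    '% CHG': [3.5, 70.8, 7.0, -1.3, 4.5],
})

summary = df.groupby('AGENCY-TIER').agg(
    agencies=('2015-REVENUE', 'count'),
    revenue=('2015-REVENUE', 'sum'),
    median_chg=('% CHG', 'median'),   # NaN changes (e.g. Deloitte Digital) are skipped
)
# Each tier's share of total revenue
summary['revenue_share'] = summary['revenue'] / summary['revenue'].sum()
print(summary)
```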


In [20]:
## adage.to_csv('/Users/xavier/Documents/src/dataviz/AgencyRevenueModels/adage-data/adage-900_cleaned_2015.csv', encoding='utf-8')