# Adwords

In this notebook, I work with AdWords data:
- Load the datasets.
- Concatenate them into one main dataset.
- Clean the data (rename columns, select columns, etc.).
- Select specific references and their data from the main dataset.
- Split the main dataframe into multiple dataframes (one dataframe per reference).
- Merge these dataframes with "datatime_df".
- Save the dataframes as CSV files.

First of all, I import the required libraries

In [55]:
from datetime import datetime
import numpy as np
import pandas as pd
import os

I set the working directory to the folder containing the AdWords dataset.

In [56]:
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/original_data"
os.chdir(path)
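Changing the working directory with `os.chdir` affects every relative path used later in the session, and this notebook switches directories several times. A minimal alternative sketch builds explicit paths with `pathlib` instead. It assumes the input files keep the `TFM_Datos_*.csv` naming; `list_adwords_files` is a hypothetical helper, and the demo runs on a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path


def list_adwords_files(data_dir):
    """Return the TFM_Datos_*.csv files in data_dir, sorted by name.

    Hypothetical helper: the glob pattern assumes the file naming
    used in this notebook.
    """
    return sorted(Path(data_dir).glob("TFM_Datos_*.csv"))


# Demo: a temporary directory with empty placeholder files
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["TFM_Datos_2018_19.csv", "TFM_Datos_2017_18_1.csv",
                 "TFM_Datos_2017_18_2.csv", "notes.txt"]:
        (Path(tmp_dir) / name).touch()
    found = [p.name for p in list_adwords_files(tmp_dir)]

print(found)
# ['TFM_Datos_2017_18_1.csv', 'TFM_Datos_2017_18_2.csv', 'TFM_Datos_2018_19.csv']
```

With the real folder, each returned path can be passed straight to `pd.read_csv(path, sep=";")` without changing directory.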

I open the AdWords data, which is split into three files covering different time periods:
- TFM_Datos_2017_18_1.csv
- TFM_Datos_2017_18_2.csv
- TFM_Datos_2018_19.csv

In [57]:
#file 1 (on_bad_lines="error" replaces error_bad_lines=True, which was
#deprecated in pandas 1.3 and removed in pandas 2.0)
df_2017_18_1 = pd.read_csv("TFM_Datos_2017_18_1.csv", sep=";", on_bad_lines="error")

#file 2
df_2017_18_2 = pd.read_csv("TFM_Datos_2017_18_2.csv", sep=";", on_bad_lines="error")

#file 3
df_2018_19 = pd.read_csv("TFM_Datos_2018_19.csv", sep=";", on_bad_lines="error")
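The exports use semicolons as field separators and commas as decimal marks (a CPC of 0.19 is stored as "0,19", as the converted values further down show). A minimal sketch, on two made-up rows in the same layout, of letting `read_csv` parse the decimal commas directly through its `decimal` parameter. This could replace a manual comma-to-dot conversion, assuming no numeric column also uses a thousands separator. The `CTR` column keeps its "%" sign, so it still loads as text.

```python
import io

import pandas as pd

# Two made-up rows in the same layout as the AdWords exports
raw = (
    "ID de producto;Día;CPC medio;CTR;Coste\n"
    "110275;19/11/2017;0;0,00 %;0\n"
    "60067;07/10/2017;0,14;1,16 %;0,27\n"
)

# decimal="," turns "0,14" into 0.14 while parsing
sample = pd.read_csv(io.StringIO(raw), sep=";", decimal=",")
print(sample.dtypes)
```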


I take a first look at each dataset.

In [58]:
df_2017_18_1.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,110275,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,19/11/2017,10,19,EUR,14,0,"0,00 %",0,0,0,0,0,0,0
1,110290,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,24/11/2017,10,19,EUR,10,0,"0,00 %",0,0,0,0,0,0,0
2,60067,SHOP - Baño - Armarios de baño - Cajoneras y e...,940399152,baño,armarios de baño,07/10/2017,40,17,EUR,173,2,"1,16 %",14,27,0,0,0,0,0


In [59]:
df_2017_18_2.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,29836,Shop_Muebles_Banquetas y bancos_Y18_W12,1317684940,cocina y comedor,orden en la cocina,01/06/2018,30,2,EUR,2,0,"0,00 %",0,0,0,0,0,0,0
1,101626,SHOP - Decoración - Todas las categorías,868125290,decoración para tu casa,jarrones y floreros,15/02/2018,60,1,EUR,1,0,"0,00 %",0,0,0,0,0,0,0
2,102065,SHOP - Organización y Almacenamiento,868200625,organización y almacenamiento,perchas y colgadores,19/02/2018,30,1,EUR,3,0,"0,00 %",0,0,0,0,0,0,0


In [60]:
df_2018_19.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,123624,Shop_Cocina y comedor_N1_Y18_W37,1559515409,cocina y comedor,tazas de café y mugs,12/10/2018,10,15,EUR,1,0,"0,00 %",0,0,0,0,0,0,0
1,107553,Shop_Smart_Lámparas_Y18_W45,1623248621,lámparas e iluminación,lámparas de techo,31/01/2019,40,1,EUR,2,0,"0,00 %",0,0,0,0,0,0,0
2,87810,Shop_Navidad_Estrellas de navidad_Y18_W12,1323457572,decoración de navidad,estrellas de navidad,17/10/2018,20,2,EUR,1,0,"0,00 %",0,0,0,0,0,0,0


In [61]:
df_2017_18_1.info()
df_2017_18_2.info()
df_2018_19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 568972 entries, 0 to 568971
Data columns (total 19 columns):
ID de producto                                     568972 non-null int64
Campaña                                            568972 non-null object
ID de la campaña                                   568972 non-null int64
Tipo de producto (primer nivel)                    568972 non-null object
Tipo de producto (segundo nivel)                   568972 non-null object
Día                                                568972 non-null object
Etiqueta personalizada 1                           568972 non-null object
CPC máximo predeterminado del grupo de anuncios    568972 non-null object
Moneda                                             568972 non-null object
Impresiones                                        568972 non-null int64
Clics                                              568972 non-null int64
CTR                                                568972 non-null object
CPC m

In [62]:
df_2017_18_1.shape

(568972, 19)

In [63]:
df_2017_18_2.shape

(579076, 19)

In [64]:
df_2018_19.shape

(447539, 19)

Now, I am going to concatenate all the data into one main dataframe

In [81]:
df_adwords = pd.concat([df_2017_18_1, df_2017_18_2, df_2018_19])
df_adwords.head(5)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,110275,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,19/11/2017,10,19,EUR,14,0,"0,00 %",0,0,0,0,0,0,0
1,110290,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,24/11/2017,10,19,EUR,10,0,"0,00 %",0,0,0,0,0,0,0
2,60067,SHOP - Baño - Armarios de baño - Cajoneras y e...,940399152,baño,armarios de baño,07/10/2017,40,17,EUR,173,2,"1,16 %",14,27,0,0,0,0,0
3,91514,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,27/12/2017,50,1,EUR,5,0,"0,00 %",0,0,0,0,0,0,0
4,49949,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,01/12/2017,30,1,EUR,1,0,"0,00 %",0,0,0,0,0,0,0


In [82]:
df_adwords.shape

(1595587, 19)

In [83]:
df_adwords.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1595587 entries, 0 to 447538
Data columns (total 19 columns):
ID de producto                                     1595587 non-null int64
Campaña                                            1595587 non-null object
ID de la campaña                                   1595587 non-null int64
Tipo de producto (primer nivel)                    1595587 non-null object
Tipo de producto (segundo nivel)                   1595587 non-null object
Día                                                1595587 non-null object
Etiqueta personalizada 1                           1595587 non-null object
CPC máximo predeterminado del grupo de anuncios    1595587 non-null object
Moneda                                             1595587 non-null object
Impresiones                                        1595587 non-null int64
Clics                                              1595587 non-null int64
CTR                                                1595587 non-null
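The concatenated row count matches the three inputs: 568972 + 579076 + 447539 = 1595587. The index, however, runs from 0 to 447538 because `pd.concat` keeps each file's own row labels, so labels repeat across files. A small sketch on toy frames shows the difference `ignore_index=True` makes. It is optional here, since the later steps filter and group by column values rather than by index label.

```python
import pandas as pd

# Three small stand-ins for the three yearly files
parts = [
    pd.DataFrame({"Reference": ["1", "2", "3"]}),
    pd.DataFrame({"Reference": ["4", "5"]}),
    pd.DataFrame({"Reference": ["6"]}),
]

stacked = pd.concat(parts)                        # keeps each part's labels
renumbered = pd.concat(parts, ignore_index=True)  # fresh labels 0..n-1

# The row counts add up either way
assert len(stacked) == len(renumbered) == sum(len(p) for p in parts)

print(stacked.index.tolist())     # [0, 1, 2, 0, 1, 0]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4, 5]
```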

Next, I clean (wrangle) this dataframe.

First, I rename the columns to shorter, mostly English names.

In [84]:
df_adwords = df_adwords.rename(
    index=str,
    columns={"ID de producto": "Reference", "Tipo de producto (primer nivel)": "CatN1",
             "Tipo de producto (segundo nivel)": "CatN2", "Día": "Date",
             "Etiqueta personalizada 1": "Cat_Price",
             "CPC máximo predeterminado del grupo de anuncios": "CPC_max",
             "CPC medio": "CPC_medio", "Impresiones": "Impressions", "Coste": "Cost",
             "Conversiones": "Conversions", "Todas las conversiones": "All_Conversions",
             "Valor de conv.": "Ads_Income", "Valor de todas las conversiones": "Ads_Income_All"})

In [85]:
df_adwords.head(5)

Unnamed: 0,Reference,Campaña,ID de la campaña,CatN1,CatN2,Date,Cat_Price,CPC_max,Moneda,Impressions,Clics,CTR,CPC_medio,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All,Valor conv./coste
0,110275,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,19/11/2017,10,19,EUR,14,0,"0,00 %",0,0,0,0,0,0,0
1,110290,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,24/11/2017,10,19,EUR,10,0,"0,00 %",0,0,0,0,0,0,0
2,60067,SHOP - Baño - Armarios de baño - Cajoneras y e...,940399152,baño,armarios de baño,07/10/2017,40,17,EUR,173,2,"1,16 %",14,27,0,0,0,0,0
3,91514,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,27/12/2017,50,1,EUR,5,0,"0,00 %",0,0,0,0,0,0,0
4,49949,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,01/12/2017,30,1,EUR,1,0,"0,00 %",0,0,0,0,0,0,0


Then, I keep only the columns of interest for this study.

In [86]:
column_interest_adwords = ["Reference", "CatN1", "CatN2", "Date", "Cat_Price", "CPC_max", "CPC_medio", "Impressions", 
                           "Clics", "Cost", "Conversions", "All_Conversions", "Ads_Income", "Ads_Income_All"]
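#copy() makes df_adwords an independent dataframe, avoiding SettingWithCopyWarning when columns are modified below
df_adwords = df_adwords[column_interest_adwords].copy()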

In [87]:
df_adwords.head(5)

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
0,110275,navidad,árbol de navidad,19/11/2017,10,19,0,14,0,0,0,0,0,0
1,110290,navidad,árbol de navidad,24/11/2017,10,19,0,10,0,0,0,0,0,0
2,60067,baño,armarios de baño,07/10/2017,40,17,14,173,2,27,0,0,0,0
3,91514,decoración,adornos y objetos decorativos,27/12/2017,50,1,0,5,0,0,0,0,0,0
4,49949,decoración,adornos y objetos decorativos,01/12/2017,30,1,0,1,0,0,0,0,0,0


In [88]:
df_adwords.shape

(1595587, 14)

After that, I convert some columns to the dtypes needed for the analysis: "Reference" to string, "Date" to datetime, and the decimal-comma numeric columns to float.

In [90]:
#Store references as strings so they can be used as dictionary keys and list items later
df_adwords["Reference"] = df_adwords["Reference"].astype(str)

#Parse the day-first dates in the "Date" column
df_adwords["Date"] = pd.to_datetime(df_adwords["Date"], format="%d/%m/%Y")

#Convert decimal-comma strings (e.g. "0,19") to floats
float_columns = ["CPC_max", "CPC_medio", "Cost", "Conversions", "All_Conversions",
                 "Ads_Income", "Ads_Income_All"]
for col in float_columns:
    df_adwords[col] = df_adwords[col].astype(str).str.replace(",", ".", regex=False).astype(float)
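A minimal sketch of the two conversions on made-up values: day-first dates parsed with an explicit format (so "07/10/2017" is read as 7 October, not 10 July), and decimal-comma strings turned into floats. The comma-to-dot replacement assumes the values contain no thousands separator; a value such as "1.234,56" would become "1.234.56" and fail to convert.

```python
import pandas as pd

conv = pd.DataFrame({
    "Date": ["19/11/2017", "07/10/2017"],
    "CPC_max": ["0,19", "0,17"],
    "Cost": ["0", "0,27"],
})

# Day-first dates: the explicit format leaves no room for month/day swaps
conv["Date"] = pd.to_datetime(conv["Date"], format="%d/%m/%Y")

# Decimal comma -> decimal point, then cast; regex=False treats "," literally
for col in ["CPC_max", "Cost"]:
    conv[col] = conv[col].str.replace(",", ".", regex=False).astype(float)

print(conv)
```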

In [92]:
df_adwords.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1595587 entries, 0 to 447538
Data columns (total 14 columns):
Reference          1595587 non-null object
CatN1              1595587 non-null object
CatN2              1595587 non-null object
Date               1595587 non-null datetime64[ns]
Cat_Price          1595587 non-null object
CPC_max            1595587 non-null float64
CPC_medio          1595587 non-null float64
Impressions        1595587 non-null int64
Clics              1595587 non-null int64
Cost               1595587 non-null float64
Conversions        1595587 non-null float64
All_Conversions    1595587 non-null float64
Ads_Income         1595587 non-null float64
Ads_Income_All     1595587 non-null float64
dtypes: datetime64[ns](1), float64(7), int64(2), object(4)
memory usage: 182.6+ MB


In [94]:
df_adwords.head(5)

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
0,110275,navidad,árbol de navidad,2017-11-19,10,0.19,0.0,14,0,0.0,0.0,0.0,0.0,0.0
1,110290,navidad,árbol de navidad,2017-11-24,10,0.19,0.0,10,0,0.0,0.0,0.0,0.0,0.0
2,60067,baño,armarios de baño,2017-10-07,40,0.17,0.14,173,2,0.27,0.0,0.0,0.0,0.0
3,91514,decoración,adornos y objetos decorativos,2017-12-27,50,0.01,0.0,5,0,0.0,0.0,0.0,0.0,0.0
4,49949,decoración,adornos y objetos decorativos,2017-12-01,30,0.01,0.0,1,0,0.0,0.0,0.0,0.0,0.0


After all the data wrangling steps, I select the specific references and their data from "df_adwords".

I open the CSV file "top_50_reference" to merge it with "df_adwords".

In [None]:
I merge both files

In [None]:
I delete NaN values.
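The merge and clean-up cells here are empty. A minimal sketch of what they could look like, assuming the top-50 file has a "Reference" column; `top_50` and `ads` are hypothetical toy names, not the notebook's data. A left merge from the top-50 list leaves NaN for references with no AdWords rows, and `dropna` then removes them; an inner merge (`how="inner"`) would do both in one step. The key dtypes must match first, since pandas refuses to merge an int64 key with a string (object) key. Note that `dropna` drops a row if any column is NaN, so its `subset` parameter may be needed if other columns can legitimately be empty.

```python
import pandas as pd

# Hypothetical toy data: "ads" stands in for df_adwords after wrangling
# (references stored as strings) and "top_50" for the top-50 file
ads = pd.DataFrame({
    "Reference": ["110275", "60067", "91514"],
    "Clics": [0, 2, 0],
})
top_50 = pd.DataFrame({"Reference": [60067, 91514, 999999]})

# Match the key dtype first: df_adwords stores references as strings
top_50["Reference"] = top_50["Reference"].astype(str)

# Left merge keeps every top-50 reference; those without AdWords rows get NaN
merged = top_50.merge(ads, on="Reference", how="left")

# Remove the references that found no AdWords data
merged = merged.dropna()
print(merged)
```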

Next, I split the main dataframe into one dataframe per reference.

In [96]:
#Split df_adwords into one dataframe per reference in a single groupby pass;
#filtering with a boolean mask once per reference would rescan all rows every time
DataFrameDict = {ref: group for ref, group in df_adwords.groupby("Reference")}
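Building one dataframe per reference with a separate boolean mask per reference scans all ~1.6 million rows once for every reference, so the cost grows with (number of references × number of rows). `groupby` partitions the rows in a single pass. A small sketch on toy data (`ads` and `frames_by_ref` are hypothetical names) checks that both approaches produce the same sub-dataframes, including the original row labels.

```python
import pandas as pd

ads = pd.DataFrame({
    "Reference": ["4623", "101318", "4623", "60067", "101318"],
    "Clics": [1, 0, 3, 2, 5],
})

# One pass: each group is the sub-dataframe for one reference,
# with the original row labels kept
frames_by_ref = {ref: group for ref, group in ads.groupby("Reference")}

# Same result as filtering once per reference with a boolean mask
for ref, frame in frames_by_ref.items():
    assert frame.equals(ads[ads["Reference"] == ref])

print(frames_by_ref["4623"])
```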

I use one reference ID as an example to check that the split works.

In [None]:
DataFrameDict['4623']

Next, I load the CSV file "references_url.csv" and use its IDs as filters to pick out the required per-reference dataframes.

I set the working directory to the folder that contains the reference file ("references_url.csv").

In [None]:
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets"
os.chdir(path)

I open that file and keep only the column of interest.

In [None]:
df_reference_url = pd.read_csv("references_url.csv", sep='\t')
column_interest_reference = ["Reference"]
df_reference_url = df_reference_url[column_interest_reference]

I check the data

In [None]:
df_reference_url.head(5)

I build a list of reference IDs from this dataset, converting "Reference" to string as I did before so that the IDs match the keys of DataFrameDict.

In [None]:
lista = df_reference_url["Reference"].apply(str)
c = lista.values.tolist()
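Before looking up each ID in DataFrameDict, it can help to know which listed references have no AdWords rows at all, since a plain dictionary lookup raises `KeyError` for them. `Series.isin` can also restrict the main dataframe to the listed references, which shrinks the data before splitting. A minimal sketch on toy data; `ads` and `wanted` are hypothetical stand-ins for `df_adwords` and the list `c`.

```python
import pandas as pd

# Hypothetical toy data standing in for df_adwords and the list of IDs
ads = pd.DataFrame({
    "Reference": ["4623", "101318", "60067", "4623"],
    "Clics": [1, 0, 2, 3],
})
wanted = ["101318", "4623", "555"]

# Rows whose reference appears in the list
subset = ads[ads["Reference"].isin(wanted)]

# Listed references with no AdWords rows (a dictionary lookup would raise KeyError)
missing = sorted(set(wanted) - set(ads["Reference"]))

print(len(subset), missing)  # 3 ['555']
```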

I write a for loop that:

- Looks up each reference from the list in DataFrameDict.
- Sets the directory in which to store the data.
- Saves one CSV file per reference.

In [None]:
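path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/transform_data"
os.chdir(path)

for ref in c:
    #Skip references that have no AdWords data (they are not keys of DataFrameDict)
    if ref not in DataFrameDict:
        continue
    df_test = DataFrameDict[ref]
    #Save one file per reference, named after the reference ID
    df_test.to_csv(ref)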
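A self-contained sketch of the save step on toy data, writing into a temporary directory with `os.path.join` instead of `os.chdir` and then reading one file back. As in the notebook, the files are named after the reference ID with no extension. Reading back with `index_col=0` restores the index that `to_csv` writes as the first column (without it, that column comes back as "Unnamed: 0"), and `dtype={"Reference": str}` keeps the IDs as strings. The names `ads` and `out_dir` are hypothetical.

```python
import os
import tempfile

import pandas as pd

ads = pd.DataFrame({
    "Reference": ["4623", "101318", "4623"],
    "Clics": [1, 0, 3],
})

with tempfile.TemporaryDirectory() as out_dir:
    # One CSV per reference, named after the reference ID (no extension)
    for ref, frame in ads.groupby("Reference"):
        frame.to_csv(os.path.join(out_dir, ref))

    saved = sorted(os.listdir(out_dir))

    # Read one file back before the temporary directory is deleted
    reloaded = pd.read_csv(os.path.join(out_dir, "4623"), index_col=0,
                           dtype={"Reference": str})

print(saved)                       # ['101318', '4623']
print(reloaded["Clics"].tolist())  # [1, 3]
```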

I check that it worked by reloading one of the saved files.

In [None]:
#index_col=0 restores the index that to_csv wrote as the first column
df_test_ads = pd.read_csv("101318", sep=",", index_col=0, on_bad_lines="error")

In [None]:
df_test_ads.head(5)

In [None]:
df_test_ads.shape

In [None]:
df_test_ads.info()