# Adwords

In this notebook, I work with Adwords data:
    - Load the datasets.
    - Concatenate these datasets into one main dataset.
    - Clean the data (rename, select columns, etc.).
    - Select specific references and their data from the main dataset.
    - Split the main dataframe into multiple dataframes (one dataframe per reference).
    - Merge the multiple dataframes with "datatime_df".
    - Save the multiple dataframes as csv files.

First of all, I import the required libraries

In [1]:
from datetime import datetime
import numpy as np
import pandas as pd
import os

I set the working directory to the folder that contains the "Adwords" dataset

In [34]:
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/original_data"
os.chdir(path)

I open the Adwords data. It is split into three files that cover different time periods:
    - TFM_Datos_2017_18_1.csv
    - TFM_Datos_2017_18_2.csv
    - TFM_Datos_2018_19.csv

In [35]:
#file 1
df_2017_18_1 = pd.read_csv("TFM_Datos_2017_18_1.csv", sep=";")

#file 2
df_2017_18_2 = pd.read_csv("TFM_Datos_2017_18_2.csv", sep=";")

#file 3
df_2018_19 = pd.read_csv("TFM_Datos_2018_19.csv", sep=";")


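The three reads above follow the same pattern, so they could also be written as a loop. The sketch below is only an illustration: it uses small in-memory samples instead of the real files, and the names `samples` and `frames` are hypothetical.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the three semicolon-separated files
samples = {
    "2017_18_1": "ID de producto;Día;Clics\n61745;02/10/2017;0\n88579;23/02/2018;0\n",
    "2017_18_2": "ID de producto;Día;Clics\n89022;31/05/2018;0\n",
    "2018_19": "ID de producto;Día;Clics\n127156;20/02/2019;0\n",
}

# Read every sample with the same options and keep the dataframes by period
frames = {
    period: pd.read_csv(io.StringIO(text), sep=";")
    for period, text in samples.items()
}

for period, frame in frames.items():
    print(period, frame.shape)
```
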

I check the first rows and the structure of each dataframe

In [36]:
df_2017_18_1.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,...,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Tasa de conversión,Coste/conv.,Coste/todas las conversiones,Valor conv./coste,Valor de todas las conversiones/coste
0,61745,SHOP - Campaña Base - Inventario Completo,210135538,cocina y comedor,vajilla y servicio de mesa,02/10/2017,20,15,EUR,3.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
1,88579,SHOP - Campaña Base - Inventario Completo,210135538,decoración de navidad,estrellas de navidad,23/02/2018,10,15,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
2,101618,SHOP - Disfraces,757342015,decoración,adornos y objetos decorativos,19/12/2017,50,18,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0


In [37]:
df_2017_18_2.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,...,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Tasa de conversión,Coste/conv.,Coste/todas las conversiones,Valor conv./coste,Valor de todas las conversiones/coste
0,89022,Shop_Textil hogar_Alfombras de bambú_Y18_W12,1318772108,textil hogar,alfombras de bambú,31/05/2018,50,15,EUR,6.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
1,49451,Shop_Textil hogar_Cojines_Y18_W12,1318841021,textil hogar,cojines,30/03/2018,20,2,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
2,105375,Shop_Novedades_Y18_W39,1578291814,muebles,taburetes,28/09/2018,150,2,EUR,18.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0


In [38]:
df_2018_19.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,...,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Tasa de conversión,Coste/conv.,Coste/todas las conversiones,Valor conv./coste,Valor de todas las conversiones/coste
0,127156,Shop_Smart_N1_Téxtil_Hogar_Y19_W5,1695676698,textil hogar,alfombras de vinilo,20/02/2019,200,1,EUR,153.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
1,104268,Shop_Decoración_N1_Y18_W37,1559890739,decoración para tu casa,figuras decorativas,16/10/2018,40,15,EUR,9.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
2,123872,Shop_Smart_N1_Téxtil_Hogar_Y19_W5,1695676698,textil hogar,edredones,25/03/2019,80,1,EUR,1.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0


In [39]:
df_2017_18_1.info()
df_2017_18_2.info()
df_2018_19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 576480 entries, 0 to 576479
Data columns (total 21 columns):
ID de producto                                     576480 non-null int64
Campaña                                            576480 non-null object
ID de la campaña                                   576480 non-null int64
Tipo de producto (primer nivel)                    576480 non-null object
Tipo de producto (segundo nivel)                   576480 non-null object
Día                                                576480 non-null object
Etiqueta personalizada 1                           576480 non-null object
CPC máximo predeterminado del grupo de anuncios    576480 non-null object
Moneda                                             576480 non-null object
Impresiones                                        576480 non-null float64
Clics                                              576480 non-null int64
CTR                                                576480 non-null object
CPC

In [40]:
df_2017_18_1.shape

(576480, 21)

In [41]:
df_2017_18_2.shape

(571568, 21)

In [42]:
df_2018_19.shape

(447539, 21)

Now, I am going to concatenate all the data into one main dataframe

In [47]:
df_adwords = pd.concat([df_2017_18_1, df_2017_18_2, df_2018_19])
df_adwords.head(5)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,...,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Tasa de conversión,Coste/conv.,Coste/todas las conversiones,Valor conv./coste,Valor de todas las conversiones/coste
0,61745,SHOP - Campaña Base - Inventario Completo,210135538,cocina y comedor,vajilla y servicio de mesa,02/10/2017,20,15,EUR,3.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
1,88579,SHOP - Campaña Base - Inventario Completo,210135538,decoración de navidad,estrellas de navidad,23/02/2018,10,15,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
2,101618,SHOP - Disfraces,757342015,decoración,adornos y objetos decorativos,19/12/2017,50,18,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
3,102193,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,26/12/2017,40,1,EUR,13.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
4,102154,SHOP - Organización y Almacenamiento - Percheros,939196756,organización y almacenamiento,percheros,06/02/2018,100,17,EUR,806.0,...,"1,49 %",25,295,0,0,"0,00 %",0,0,0,0


In [48]:
df_adwords.shape

(1595587, 21)

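As a quick sanity check, the row count of the concatenated dataframe should equal the sum of the row counts of the three inputs. A minimal sketch using the shapes reported above:

```python
# Row counts reported by .shape for the three input dataframes
rows_per_file = [576480, 571568, 447539]

# Row count reported by .shape for the concatenated dataframe
rows_concatenated = 1595587

total = sum(rows_per_file)
print(total == rows_concatenated)  # True
```
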
In [49]:
df_adwords.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1595587 entries, 0 to 447538
Data columns (total 21 columns):
ID de producto                                     1595587 non-null int64
Campaña                                            1595587 non-null object
ID de la campaña                                   1595587 non-null int64
Tipo de producto (primer nivel)                    1595587 non-null object
Tipo de producto (segundo nivel)                   1595587 non-null object
Día                                                1595587 non-null object
Etiqueta personalizada 1                           1595587 non-null object
CPC máximo predeterminado del grupo de anuncios    1595587 non-null object
Moneda                                             1595587 non-null object
Impresiones                                        1595587 non-null float64
Clics                                              1595587 non-null int64
CTR                                                1595587 non-nu

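The index reported above runs from 0 to 447538 for 1595587 entries, so index labels repeat across the three inputs: `pd.concat` keeps each input's own index by default. If unique row labels are wanted, `ignore_index=True` is one option. A small sketch with hypothetical toy dataframes `a` and `b`:

```python
import pandas as pd

a = pd.DataFrame({"Clics": [0, 12]})
b = pd.DataFrame({"Clics": [3]})

# Default behaviour: each input keeps its own index labels
kept = pd.concat([a, b])

# Alternative: build a fresh 0..n-1 index for the result
fresh = pd.concat([a, b], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0]
print(fresh.index.tolist())  # [0, 1, 2]
```
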
I start with the data wrangling of this dataframe

Firstly, I rename the columns

In [50]:
df_adwords = df_adwords.rename(index=str, 
            columns={"ID de producto":"Reference", "Tipo de producto (primer nivel)": "CatN1",
                     "Tipo de producto (segundo nivel)":"CatN2", "Día":"Date", "Etiqueta personalizada 1": "Cat_Price", 
                     "CPC máximo predeterminado del grupo de anuncios": "CPC_max", "CPC medio": "CPC_medio", 
                     "Impresiones":"Impresions", "Coste":"Cost", "Conversiones": "Conversions",
                    "Todas las conversiones": "All_Conversions", "Valor conv./coste": "ROAS", 
                    "Valor de todas las conversiones/coste": "ROAS_All"})

In [51]:
df_adwords.head(5)

Unnamed: 0,Reference,Campaña,ID de la campaña,CatN1,CatN2,Date,Cat_Price,CPC_max,Moneda,Impresions,...,CTR,CPC_medio,Cost,Conversions,All_Conversions,Tasa de conversión,Coste/conv.,Coste/todas las conversiones,ROAS,ROAS_All
0,61745,SHOP - Campaña Base - Inventario Completo,210135538,cocina y comedor,vajilla y servicio de mesa,02/10/2017,20,15,EUR,3.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
1,88579,SHOP - Campaña Base - Inventario Completo,210135538,decoración de navidad,estrellas de navidad,23/02/2018,10,15,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
2,101618,SHOP - Disfraces,757342015,decoración,adornos y objetos decorativos,19/12/2017,50,18,EUR,2.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
3,102193,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,26/12/2017,40,1,EUR,13.0,...,"0,00 %",0,0,0,0,"0,00 %",0,0,0,0
4,102154,SHOP - Organización y Almacenamiento - Percheros,939196756,organización y almacenamiento,percheros,06/02/2018,100,17,EUR,806.0,...,"1,49 %",25,295,0,0,"0,00 %",0,0,0,0


Then, I select the columns of interest for this study

In [52]:
column_interest_adwords = ["Reference", "CatN1", "CatN2", "Date", "Cat_Price", "CPC_max", "CPC_medio", "Impresions", 
                           "Clics", "Cost", "Conversions", "All_Conversions", "ROAS", "ROAS_All"]
df_adwords = df_adwords[column_interest_adwords].copy()

In [53]:
df_adwords.head(5)

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impresions,Clics,Cost,Conversions,All_Conversions,ROAS,ROAS_All
0,61745,cocina y comedor,vajilla y servicio de mesa,02/10/2017,20,15,0,3.0,0,0,0,0,0,0
1,88579,decoración de navidad,estrellas de navidad,23/02/2018,10,15,0,2.0,0,0,0,0,0,0
2,101618,decoración,adornos y objetos decorativos,19/12/2017,50,18,0,2.0,0,0,0,0,0,0
3,102193,decoración,adornos y objetos decorativos,26/12/2017,40,1,0,13.0,0,0,0,0,0,0
4,102154,organización y almacenamiento,percheros,06/02/2018,100,17,25,806.0,12,295,0,0,0,0


In [54]:
df_adwords.shape

(1595587, 14)

After that, I change the dtype of some columns so that they can be used correctly

In [None]:
#Store reference ids as strings
df_adwords["Reference"] = df_adwords["Reference"].apply(str)

#Parse dates (day/month/year)
df_adwords["Date"] = pd.to_datetime(df_adwords["Date"], format = "%d/%m/%Y")

#Cost: replace the decimal comma with a dot and convert to float
df_adwords["Cost"] = df_adwords["Cost"].astype(str).str.replace(",", ".", regex=False).astype(float)

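The raw export uses a comma as decimal separator and writes percentages such as "1,49 %". The sketch below shows one possible way to turn such strings into floats. The helper name `to_float` and the sample values are assumptions for illustration, and thousands separators are not handled.

```python
import pandas as pd


def to_float(series):
    """Convert strings like "1,49 %" or "2,95" to floats (hypothetical helper)."""
    cleaned = (
        series.astype(str)
        .str.replace("%", "", regex=False)   # drop the percent sign, if any
        .str.strip()                         # drop surrounding whitespace
        .str.replace(",", ".", regex=False)  # decimal comma -> decimal point
    )
    return cleaned.astype(float)


# Hypothetical sample values in the same format as the export
sample = pd.DataFrame({"CTR": ["0,00 %", "1,49 %"], "Cost": ["0", "2,95"]})
sample["CTR"] = to_float(sample["CTR"])
sample["Cost"] = to_float(sample["Cost"])
print(sample.dtypes)
```
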
In [None]:
df_adwords.info()

After the data wrangling steps, I select the specific references and their data from "df_adwords"

First, I split the dataframe into one dataframe per reference

In [None]:
#I create the list of unique reference ids
references = df_adwords['Reference'].unique().tolist()

#I create a dictionary to store one dataframe per reference
DataFrameDict = {elem : pd.DataFrame() for elem in references}

#I loop over the keys and save the rows that match each reference
for key in DataFrameDict.keys():
    DataFrameDict[key] = df_adwords[df_adwords.Reference == key]

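The same split can also be obtained in a single pass with `groupby`, which avoids filtering the full dataframe once per reference. This is an alternative sketch on a toy dataframe, not the code used in this notebook; `toy` and `frames_by_reference` are hypothetical names.

```python
import pandas as pd

# Toy data with the same "Reference" column as the main dataframe
toy = pd.DataFrame({
    "Reference": ["4623", "101318", "4623"],
    "Clics": [1, 0, 5],
})

# One dataframe per reference, keyed by the reference id
frames_by_reference = {ref: group for ref, group in toy.groupby("Reference")}

print(sorted(frames_by_reference))                     # ['101318', '4623']
print(frames_by_reference["4623"]["Clics"].tolist())   # [1, 5]
```
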
I use a reference id as an example to check that the split works

In [None]:
DataFrameDict['4623']

Next, I load the csv file "references_url.csv" and use its id values as filters to create the required dataframes per reference.

I set the directory to open the reference file

In [None]:
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets"
os.chdir(path)

I open that file and keep the column of interest

In [None]:
df_reference_url = pd.read_csv("references_url.csv", sep='\t')
column_interest_reference = ["Reference"]
df_reference_url = df_reference_url[column_interest_reference]

I check the data

In [None]:
df_reference_url.head(5)

I create another list from this dataset (I convert the "Reference" column to string, as I did before)

In [None]:
lista = df_reference_url["Reference"].apply(str)
c = lista.values.tolist()

I write a for loop to:

- Read each element of the list and look up its data in DataFrameDict.
- Set the directory to store the data.
- Save a csv file for each reference.

In [None]:
#Directory where the files are stored
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/transform_data"
os.chdir(path)

for file in c:
    #Dataframe of this reference, saved under its id (no file extension)
    df_test = DataFrameDict[file]
    df_test.to_csv(file)

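A variant of this loop could avoid changing the working directory and skip references that have no Adwords rows, which would otherwise raise a `KeyError`. The sketch below writes to a temporary directory; `frames`, `wanted` and `out_dir` are hypothetical stand-ins for `DataFrameDict`, `c` and the output folder. Unlike the loop above, it adds a `.csv` extension and does not write the index.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-ins for DataFrameDict and the list of wanted references
frames = {"4623": pd.DataFrame({"Clics": [1, 5]})}
wanted = ["4623", "999999"]  # the second id has no data

out_dir = Path(tempfile.mkdtemp())
saved = []
for ref in wanted:
    frame = frames.get(ref)
    if frame is None:
        continue  # no rows for this reference, nothing to save
    target = out_dir / f"{ref}.csv"
    frame.to_csv(target, index=False)
    saved.append(target.name)

print(saved)  # ['4623.csv']
```
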
I check that it works well

In [None]:
df_test_ads = pd.read_csv("101318", sep=",")

In [None]:
df_test_ads.head(5)

In [None]:
df_test_ads.shape

In [None]:
df_test_ads.info()
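
Because `to_csv` writes the index as the first column by default, reading a file back adds an unnamed column unless `index_col=0` is passed. Likewise, `Reference` and `Date` are not restored as string and datetime unless `read_csv` is told so. A round-trip sketch on toy data, kept in memory and using hypothetical values:

```python
import io

import pandas as pd

# Toy dataframe with the dtypes used in this notebook
original = pd.DataFrame({
    "Reference": ["101318", "101318"],
    "Date": pd.to_datetime(["2017-10-02", "2017-10-03"]),
    "Clics": [0, 3],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
original.to_csv(buffer)
buffer.seek(0)

# Restore the index, keep ids as strings and parse the dates again
restored = pd.read_csv(
    buffer,
    index_col=0,
    dtype={"Reference": str},
    parse_dates=["Date"],
)
print(restored.dtypes)
```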