# Adwords

In this notebook, I am going to work with adwords data:
    - Load datasets.
    - Concatenate these datasets into one main dataset
    - Clean data (rename, select columns, etc.)
    - Select specific references and their data from previous dataset.
    - Split the main dataframe into multiple dataframes (a dataframe per reference)
    - Merge multiples dataframes with "datatime_df"
    - Save multiples dataframes correctly as csv files

First of all, I import the required libraries

In [55]:
from datetime import datetime
import numpy as np
import pandas as pd
import os

I set the directory to open "Adwords" dataset

In [56]:
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/original_data"
os.chdir(path)

I open adwords data. I have to open three files according to different time periods:
    - TFM_Datos_2017_18_1.csv
    - TFM_Datos_2017_18_2.csv
    - TFM_Datos_2018_19.csv

In [57]:
#file 1
df_2017_18_1 = pd.read_csv("TFM_Datos_2017_18_1.csv", sep=";", error_bad_lines=True)

#file 2
df_2017_18_2 = pd.read_csv("TFM_Datos_2017_18_2.csv", sep=";", error_bad_lines=True)

#file 3
df_2018_19 = pd.read_csv("TFM_Datos_2018_19.csv", sep=";", error_bad_lines=True)

  interactivity=interactivity, compiler=compiler, result=result)


I check all the data

In [58]:
df_2017_18_1.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,110275,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,19/11/2017,10,19,EUR,14,0,"0,00 %",0,0,0,0,0,0,0
1,110290,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,24/11/2017,10,19,EUR,10,0,"0,00 %",0,0,0,0,0,0,0
2,60067,SHOP - Baño - Armarios de baño - Cajoneras y e...,940399152,baño,armarios de baño,07/10/2017,40,17,EUR,173,2,"1,16 %",14,27,0,0,0,0,0


In [59]:
df_2017_18_2.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,29836,Shop_Muebles_Banquetas y bancos_Y18_W12,1317684940,cocina y comedor,orden en la cocina,01/06/2018,30,2,EUR,2,0,"0,00 %",0,0,0,0,0,0,0
1,101626,SHOP - Decoración - Todas las categorías,868125290,decoración para tu casa,jarrones y floreros,15/02/2018,60,1,EUR,1,0,"0,00 %",0,0,0,0,0,0,0
2,102065,SHOP - Organización y Almacenamiento,868200625,organización y almacenamiento,perchas y colgadores,19/02/2018,30,1,EUR,3,0,"0,00 %",0,0,0,0,0,0,0


In [60]:
df_2018_19.head(3)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,123624,Shop_Cocina y comedor_N1_Y18_W37,1559515409,cocina y comedor,tazas de café y mugs,12/10/2018,10,15,EUR,1,0,"0,00 %",0,0,0,0,0,0,0
1,107553,Shop_Smart_Lámparas_Y18_W45,1623248621,lámparas e iluminación,lámparas de techo,31/01/2019,40,1,EUR,2,0,"0,00 %",0,0,0,0,0,0,0
2,87810,Shop_Navidad_Estrellas de navidad_Y18_W12,1323457572,decoración de navidad,estrellas de navidad,17/10/2018,20,2,EUR,1,0,"0,00 %",0,0,0,0,0,0,0


In [61]:
df_2017_18_1.info()
df_2017_18_2.info()
df_2018_19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 568972 entries, 0 to 568971
Data columns (total 19 columns):
ID de producto                                     568972 non-null int64
Campaña                                            568972 non-null object
ID de la campaña                                   568972 non-null int64
Tipo de producto (primer nivel)                    568972 non-null object
Tipo de producto (segundo nivel)                   568972 non-null object
Día                                                568972 non-null object
Etiqueta personalizada 1                           568972 non-null object
CPC máximo predeterminado del grupo de anuncios    568972 non-null object
Moneda                                             568972 non-null object
Impresiones                                        568972 non-null int64
Clics                                              568972 non-null int64
CTR                                                568972 non-null object
CPC m

In [62]:
df_2017_18_1.shape

(568972, 19)

In [63]:
df_2017_18_2.shape

(579076, 19)

In [64]:
df_2018_19.shape

(447539, 19)

Now, I am going to concatenate all the data into one main dataframe

In [81]:
df_adwords = pd.concat([df_2017_18_1, df_2017_18_2, df_2018_19])
df_adwords.head(5)

Unnamed: 0,ID de producto,Campaña,ID de la campaña,Tipo de producto (primer nivel),Tipo de producto (segundo nivel),Día,Etiqueta personalizada 1,CPC máximo predeterminado del grupo de anuncios,Moneda,Impresiones,Clics,CTR,CPC medio,Coste,Conversiones,Todas las conversiones,Valor de conv.,Valor de todas las conversiones,Valor conv./coste
0,110275,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,19/11/2017,10,19,EUR,14,0,"0,00 %",0,0,0,0,0,0,0
1,110290,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,24/11/2017,10,19,EUR,10,0,"0,00 %",0,0,0,0,0,0,0
2,60067,SHOP - Baño - Armarios de baño - Cajoneras y e...,940399152,baño,armarios de baño,07/10/2017,40,17,EUR,173,2,"1,16 %",14,27,0,0,0,0,0
3,91514,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,27/12/2017,50,1,EUR,5,0,"0,00 %",0,0,0,0,0,0,0
4,49949,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,01/12/2017,30,1,EUR,1,0,"0,00 %",0,0,0,0,0,0,0


In [82]:
df_adwords.shape

(1595587, 19)

In [83]:
df_adwords.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1595587 entries, 0 to 447538
Data columns (total 19 columns):
ID de producto                                     1595587 non-null int64
Campaña                                            1595587 non-null object
ID de la campaña                                   1595587 non-null int64
Tipo de producto (primer nivel)                    1595587 non-null object
Tipo de producto (segundo nivel)                   1595587 non-null object
Día                                                1595587 non-null object
Etiqueta personalizada 1                           1595587 non-null object
CPC máximo predeterminado del grupo de anuncios    1595587 non-null object
Moneda                                             1595587 non-null object
Impresiones                                        1595587 non-null int64
Clics                                              1595587 non-null int64
CTR                                                1595587 non-null

I start with the data wrangling of this dataframe

Firstly, I rename the columns

In [84]:
df_adwords = df_adwords.rename(index=str, 
            columns={"ID de producto":"Reference", "Tipo de producto (primer nivel)": "CatN1",
                     "Tipo de producto (segundo nivel)":"CatN2", "Día":"Date", "Etiqueta personalizada 1": "Cat_Price", 
                     "CPC máximo predeterminado del grupo de anuncios": "CPC_max", "CPC medio": "CPC_medio", 
                     "Impresiones":"Impressions", "Coste":"Cost", "Conversiones": "Conversions",
                    "Todas las conversiones": "All_Conversions", "Valor de conv.": "Ads_Income", 
                    "Valor de todas las conversiones": "Ads_Income_All"})

In [85]:
df_adwords.head(5)

Unnamed: 0,Reference,Campaña,ID de la campaña,CatN1,CatN2,Date,Cat_Price,CPC_max,Moneda,Impressions,Clics,CTR,CPC_medio,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All,Valor conv./coste
0,110275,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,19/11/2017,10,19,EUR,14,0,"0,00 %",0,0,0,0,0,0,0
1,110290,SHOP - Navidad - Todas las Categorías,871212134,navidad,árbol de navidad,24/11/2017,10,19,EUR,10,0,"0,00 %",0,0,0,0,0,0,0
2,60067,SHOP - Baño - Armarios de baño - Cajoneras y e...,940399152,baño,armarios de baño,07/10/2017,40,17,EUR,173,2,"1,16 %",14,27,0,0,0,0,0
3,91514,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,27/12/2017,50,1,EUR,5,0,"0,00 %",0,0,0,0,0,0,0
4,49949,SHOP - Decoración - Todas las categorías,868125290,decoración,adornos y objetos decorativos,01/12/2017,30,1,EUR,1,0,"0,00 %",0,0,0,0,0,0,0


Then, I select the columns of our interest for this study

In [86]:
column_interest_adwords = ["Reference", "CatN1", "CatN2", "Date", "Cat_Price", "CPC_max", "CPC_medio", "Impressions", 
                           "Clics", "Cost", "Conversions", "All_Conversions", "Ads_Income", "Ads_Income_All"]
df_adwords = df_adwords[column_interest_adwords]

In [87]:
df_adwords.head(5)

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
0,110275,navidad,árbol de navidad,19/11/2017,10,19,0,14,0,0,0,0,0,0
1,110290,navidad,árbol de navidad,24/11/2017,10,19,0,10,0,0,0,0,0,0
2,60067,baño,armarios de baño,07/10/2017,40,17,14,173,2,27,0,0,0,0
3,91514,decoración,adornos y objetos decorativos,27/12/2017,50,1,0,5,0,0,0,0,0,0
4,49949,decoración,adornos y objetos decorativos,01/12/2017,30,1,0,1,0,0,0,0,0,0


In [88]:
df_adwords.shape

(1595587, 14)

After that, I change dtype from some columns in order to work correctly

In [90]:
#Change reference value from int to string in order to save references id as list later
df_adwords["Reference"] = df_adwords["Reference"].apply(str)

#Parse time to "Date" column
df_adwords["Date"] = pd.to_datetime(df_adwords["Date"], format = "%d/%m/%Y")

#Change to float type
#float_columns = ["CPC_medio", "Cost", "Conversions", "All_Conversions", "Ads_Income", "Ads_Income_All"]
df_adwords["CPC_max"] = df_adwords["CPC_max"].str.replace(",",".").astype(float)
df_adwords["CPC_medio"] = df_adwords["CPC_medio"].str.replace(",",".").astype(float)
df_adwords["Cost"] = df_adwords["Cost"].str.replace(",",".").astype(float)
df_adwords["Conversions"] = df_adwords["Conversions"].str.replace(",",".").astype(float)
df_adwords["All_Conversions"] = df_adwords["All_Conversions"].str.replace(",",".").astype(float)
df_adwords["Ads_Income"] = df_adwords["Ads_Income"].str.replace(",",".").astype(float)
df_adwords["Ads_Income_All"] = df_adwords["Ads_Income_All"].str.replace(",",".").astype(float)

In [92]:
df_adwords.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1595587 entries, 0 to 447538
Data columns (total 14 columns):
Reference          1595587 non-null object
CatN1              1595587 non-null object
CatN2              1595587 non-null object
Date               1595587 non-null datetime64[ns]
Cat_Price          1595587 non-null object
CPC_max            1595587 non-null float64
CPC_medio          1595587 non-null float64
Impressions        1595587 non-null int64
Clics              1595587 non-null int64
Cost               1595587 non-null float64
Conversions        1595587 non-null float64
All_Conversions    1595587 non-null float64
Ads_Income         1595587 non-null float64
Ads_Income_All     1595587 non-null float64
dtypes: datetime64[ns](1), float64(7), int64(2), object(4)
memory usage: 182.6+ MB


In [94]:
df_adwords.head(5)

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
0,110275,navidad,árbol de navidad,2017-11-19,10,0.19,0.0,14,0,0.0,0.0,0.0,0.0,0.0
1,110290,navidad,árbol de navidad,2017-11-24,10,0.19,0.0,10,0,0.0,0.0,0.0,0.0,0.0
2,60067,baño,armarios de baño,2017-10-07,40,0.17,0.14,173,2,0.27,0.0,0.0,0.0,0.0
3,91514,decoración,adornos y objetos decorativos,2017-12-27,50,0.01,0.0,5,0,0.0,0.0,0.0,0.0,0.0
4,49949,decoración,adornos y objetos decorativos,2017-12-01,30,0.01,0.0,1,0,0.0,0.0,0.0,0.0,0.0


After all data wrangling steps, I continue selecting the specific references and their data from "df_adwords"

I open csv file "top_50_reference" to merge with "df_adwords"

In [110]:
#Set directory to open the file
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets"
os.chdir(path)

#Open the file
df_top_50_references = pd.read_csv("top_50_references.csv", sep="\t")

#Select the column that I want to use to merge
df_top_50_references = df_top_50_references[["Reference"]]

#Change from int to string (both columns with the same dtype)
df_top_50_references["Reference"] = df_top_50_references["Reference"].apply(str)

In [111]:
df_top_50_references.head(5)

Unnamed: 0,Reference
0,8141600
1,82462
2,82466
3,8146000
4,122539


I merge both dataframes

In [112]:
merge_adwords = df_adwords.merge(df_top_50_references,
    how="right",
    left_on=["Reference"],
    right_on=["Reference"])

In [119]:
merge_adwords.head(5)

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
0,49809,decoración,adornos y objetos decorativos,2017-12-22,80,0.01,0.25,659,12,3.05,0.0,0.0,0.0,0.0
1,49809,decoración,adornos y objetos decorativos,2018-01-09,50,0.01,0.15,884,19,2.8,0.0,3.0,0.0,0.0
2,49809,decoración,adornos y objetos decorativos,2017-11-13,80,0.01,0.16,677,24,3.84,0.0,0.0,0.0,0.0
3,49809,decoración,adornos y objetos decorativos,2017-11-11,80,0.01,0.19,570,15,2.9,0.0,0.0,0.0,0.0
4,49809,decoración,adornos y objetos decorativos,2017-12-31,80,0.01,0.28,209,6,1.65,0.0,0.0,0.0,0.0


¿What is the shape for the new dataframe?

In [113]:
merge_adwords.shape

(18784, 14)

In [114]:
df_adwords.shape

(1595587, 14)

In [117]:
round(df_adwords.shape[0]/merge_adwords.shape[0],0)

85.0

The new dataframe is 85x times smaller in number of rows (both maintain the same colums)

After simplifying the dataset, I is less hard to the computer splitting dataframes into different dataframes

In [121]:
#Create unique list of names
references = merge_adwords["Reference"].unique().tolist()

#Create a dataframe dictionary to store your data frames
DataFrameDict = {elem : pd.DataFrame for elem in references}

#Create a loop for to read all rows and save according to the key
for key in DataFrameDict.keys():
    DataFrameDict[key] = merge_adwords[:][merge_adwords.Reference == key]

I use a reference id as example to check its functionality

In [122]:
DataFrameDict["101318"]

Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
9759,101318,muebles,sillería y tapicería,2017-10-06,80,0.24,0.00,30,0,0.00,0.00,0.00,0.00,0.00
9760,101318,muebles,sillería y tapicería,2017-10-13,80,0.24,0.20,82,2,0.39,0.00,0.00,0.00,0.00
9761,101318,muebles,sillas,2018-02-01,60,0.24,0.00,2,0,0.00,0.00,0.00,0.00,0.00
9762,101318,muebles,sillería y tapicería,2017-10-15,80,0.24,0.19,101,1,0.19,0.00,0.00,0.00,0.00
9763,101318,muebles,sillería y tapicería,2017-11-05,80,0.24,0.20,197,6,1.21,0.00,0.00,0.00,0.00
9764,101318,muebles,sillería y tapicería,2017-11-19,80,0.24,0.19,253,4,0.77,0.00,0.00,0.00,0.00
9765,101318,muebles,sillería y tapicería,2017-10-03,80,0.24,0.00,32,0,0.00,0.00,0.00,0.00,0.00
9766,101318,muebles,sillería y tapicería,2017-10-07,80,0.24,0.00,29,0,0.00,0.00,0.00,0.00,0.00
9767,101318,muebles,sillería y tapicería,2017-11-06,80,0.24,0.00,140,0,0.00,0.00,0.00,0.00,0.00
9768,101318,muebles,sillería y tapicería,2017-10-25,80,0.24,0.20,11,1,0.20,0.00,0.00,0.00,0.00


Later, I load top the csv file "references_url.csv" to use its id values as filters to create the required dataframes per reference.

Then, I create another list using last dataset (I use "Reference" column as I have also made before)

In [127]:
lista = df_top_50_references["Reference"].apply(str)
c = lista.values.tolist()

I make a for loop to:

- Read each component of the list and check its data from DataFrameDict.
- Set the directory to storage the data
- Save a csv file for each reference

In [129]:
for i in range(len(c)):
    df_test = DataFrameDict[c[i]]
    file = c[i]
    
    path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/transform_data"
    os.chdir(path)
    
    df_test.to_csv(file)

I check it work well

In [130]:
df_test_ads = pd.read_csv("101318", sep=",", error_bad_lines=True)

In [131]:
df_test_ads.head(5)

Unnamed: 0.1,Unnamed: 0,Reference,CatN1,CatN2,Date,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All
0,9759,101318,muebles,sillería y tapicería,2017-10-06,80,0.24,0.0,30,0,0.0,0.0,0.0,0.0,0.0
1,9760,101318,muebles,sillería y tapicería,2017-10-13,80,0.24,0.2,82,2,0.39,0.0,0.0,0.0,0.0
2,9761,101318,muebles,sillas,2018-02-01,60,0.24,0.0,2,0,0.0,0.0,0.0,0.0,0.0
3,9762,101318,muebles,sillería y tapicería,2017-10-15,80,0.24,0.19,101,1,0.19,0.0,0.0,0.0,0.0
4,9763,101318,muebles,sillería y tapicería,2017-11-05,80,0.24,0.2,197,6,1.21,0.0,0.0,0.0,0.0


In [132]:
df_test_ads.shape

(323, 15)

In [134]:
df_test_ads.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 323 entries, 0 to 322
Data columns (total 15 columns):
Unnamed: 0         323 non-null int64
Reference          323 non-null int64
CatN1              323 non-null object
CatN2              323 non-null object
Date               323 non-null object
Cat_Price          323 non-null int64
CPC_max            323 non-null float64
CPC_medio          323 non-null float64
Impressions        323 non-null int64
Clics              323 non-null int64
Cost               323 non-null float64
Conversions        323 non-null float64
All_Conversions    323 non-null float64
Ads_Income         323 non-null float64
Ads_Income_All     323 non-null float64
dtypes: float64(7), int64(5), object(3)
memory usage: 37.9+ KB


Later, I create function "ads_missing_values" to be able to achieve following goals:

- Open some files that I will work
- Make the required data wrangling in this dataframes
- Work specificly in each reference
- Return a dataframe with the main data in each case

In [188]:
def ads_missing_values(path1, path2, file):
    
    #set directory and open datetime_df file
    os.chdir(path1)
    df_datetime = pd.read_csv("datetime_df.csv")
    
    #selecto column interest and parse data to datetime
    column_interest_datetime = ["Dates"]
    df_datetime = df_datetime[column_interest_datetime]
    df_datetime["Dates"] = pd.to_datetime(df_datetime["Dates"], format = "%Y-%m-%d")
    
        
    #set directory to open the reference file
    os.chdir(path2)
    
    #open the csv file for each reference
    df_ads_ref = pd.read_csv(c[i])
        
    #data wranggling of this data        
    df_ads_ref["Date"] = pd.to_datetime(df_ads_ref["Date"], format = "%Y-%m-%d")
    
        
    #merge with df_datetime
    merge_bw = df_ads_ref.merge(df_datetime,
        how="right",
        left_on=["Date"],
        right_on=["Dates"])
        
    #fill NaN values
    merge_bw["Reference"].fillna(method='ffill', inplace = True)
    merge_bw["CatN1"].fillna(method='ffill', inplace = True)
    merge_bw["CatN2"].fillna(method='ffill', inplace = True)
    merge_bw["Cat_Price"].fillna(method='ffill', inplace = True)
    merge_bw.fillna(0, inplace = True)
    
    #sort values by Dates
    merge_bw = merge_bw.sort_values(by="Dates")
    
    #select required columns
    merge_bw = merge_bw.drop(["Date", "Unnamed: 0"], axis=1)
    
    #drop duplicates in "Dates"
    merge_bw["Dates"] = merge_bw["Dates"].drop_duplicates(keep='first')

        
    return merge_bw


I apply the previous function using a for loop

In [189]:
path_datetime = "/home/miguel/my_project_python/my_project_env/tfm/datasets"
path_transform_data = path_datetime + "/adwords/transform_data"

for i in range(len(c)):
    df_test = ads_missing_values(path_datetime, path_transform_data, c[i])
    file = c[i]
    
    path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/end_data"
    os.chdir(path)
    
    df_test.to_csv(file)

Finally, I check the data obtained

In [190]:
df_122335 = pd.read_csv("122335", sep=",", error_bad_lines=True)
df_122335.head(5)

Unnamed: 0.1,Unnamed: 0,Reference,CatN1,CatN2,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All,Dates
0,553,122335.0,muebles,mesitas de noche,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2017-09-02
1,554,122335.0,muebles,mesitas de noche,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2017-09-03
2,555,122335.0,muebles,mesitas de noche,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2017-09-04
3,556,122335.0,muebles,mesitas de noche,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2017-09-05
4,557,122335.0,muebles,mesitas de noche,100.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2017-09-06


In [191]:
df_122335.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 601 entries, 0 to 600
Data columns (total 15 columns):
Unnamed: 0         601 non-null int64
Reference          601 non-null float64
CatN1              601 non-null object
CatN2              601 non-null object
Cat_Price          601 non-null float64
CPC_max            601 non-null float64
CPC_medio          601 non-null float64
Impressions        601 non-null float64
Clics              601 non-null float64
Cost               601 non-null float64
Conversions        601 non-null float64
All_Conversions    601 non-null float64
Ads_Income         601 non-null float64
Ads_Income_All     601 non-null float64
Dates              576 non-null object
dtypes: float64(11), int64(1), object(3)
memory usage: 70.5+ KB


In [192]:
df_122335.shape

(601, 15)

In [193]:
df_122335["CPC_medio_1w"] = df_122335["CPC_medio"].transform(lambda x: x.rolling(7, 1).mean())

In [194]:
df_122335.tail(10)

Unnamed: 0.1,Unnamed: 0,Reference,CatN1,CatN2,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All,Dates,CPC_medio_1w
591,496,122335.0,muebles,mesitas de noche,100.0,0.3,0.36,1000.0,6.0,2.18,0.0,0.0,0.0,0.0,2019-03-22,0.435714
592,495,122335.0,muebles,mesitas de noche,100.0,0.3,0.51,717.0,9.0,4.63,0.0,0.0,0.0,0.0,2019-03-23,0.45
593,429,122335.0,muebles,mesitas de noche,100.0,0.3,0.51,895.0,5.0,2.55,1.0,2.0,145.8,291.6,2019-03-24,0.455714
594,542,122335.0,muebles,mesitas de noche,100.0,0.3,0.58,865.0,4.0,2.32,1.0,2.0,251.9,503.8,2019-03-25,0.468571
595,493,122335.0,muebles,mesitas de noche,100.0,0.3,0.42,760.0,6.0,2.51,0.0,0.0,0.0,0.0,2019-03-26,0.482857
596,525,122335.0,muebles,mesitas de noche,100.0,0.3,0.53,605.0,9.0,4.78,0.0,0.0,0.0,0.0,2019-03-27,0.491429
597,475,122335.0,muebles,mesitas de noche,100.0,0.3,0.27,495.0,2.0,0.54,0.0,0.0,0.0,0.0,2019-03-28,0.454286
598,477,122335.0,muebles,mesitas de noche,100.0,0.3,0.49,795.0,5.0,2.46,0.0,0.0,0.0,0.0,2019-03-29,0.472857
599,473,122335.0,muebles,mesitas de noche,100.0,0.3,0.36,534.0,4.0,1.42,0.0,0.0,0.0,0.0,2019-03-30,0.451429
600,446,122335.0,muebles,mesitas de noche,100.0,0.3,0.3,366.0,1.0,0.3,0.0,0.0,0.0,0.0,2019-03-31,0.421429


In [None]:
df['moving'] = df.groupby('object')['value'].transform(lambda x: x.rolling(10, 1).mean())

In [195]:
def rolling_data(columns, file, path1):
    
    #set directory to open the reference file
    os.chdir(path1)
    
    #open the csv file for each reference
    df_rolling = pd.read_csv(file)
    
    #for loop to apply columns
    for i in range(len(columns)):
        column = columns[i]
    
    #columns created with rolling
        column1 = columns[i]+"1w"
        column2 = columns[i]+"1m"
        column3 = columns[i]+"3m"
        column4 = columns[i]+"6m"
        column5 = columns[i]+"1y"
    
    #apply rolling in each column
        df_rolling[column1] = round(df_rolling[column].transform(lambda x: x.rolling(7, 1).mean()),2)
        df_rolling[column2] = round(df_rolling[column].transform(lambda x: x.rolling(28, 1).mean()),2)
        df_rolling[column3] = round(df_rolling[column].transform(lambda x: x.rolling(91, 1).mean()),2)
        df_rolling[column4] = round(df_rolling[column].transform(lambda x: x.rolling(182, 1).mean()),2)
        df_rolling[column5] = round(df_rolling[column].transform(lambda x: x.rolling(364, 1).mean()),2)
        
    #filter date period
        df_rolling = df_rolling[df_rolling["Dates"] > "2018-09-30"]
        
    return df_rolling

In [196]:
columns = ["CPC_max", "CPC_medio", "Impressions", "Clics"]
path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords"
path1 =  path + "/end_data"
path2 =  path + "/test"

for i in range(len(c)):
    #df_test = DataFrameDict[c[i]]
    df_test = rolling_data(columns, c[i], path1)
    file = c[i]
    
    #path = "/home/miguel/my_project_python/my_project_env/tfm/datasets/adwords/end_data"
    os.chdir(path2)
    
    df_test.to_csv(file)

In [197]:
df_122335.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 601 entries, 0 to 600
Data columns (total 16 columns):
Unnamed: 0         601 non-null int64
Reference          601 non-null float64
CatN1              601 non-null object
CatN2              601 non-null object
Cat_Price          601 non-null float64
CPC_max            601 non-null float64
CPC_medio          601 non-null float64
Impressions        601 non-null float64
Clics              601 non-null float64
Cost               601 non-null float64
Conversions        601 non-null float64
All_Conversions    601 non-null float64
Ads_Income         601 non-null float64
Ads_Income_All     601 non-null float64
Dates              576 non-null object
CPC_medio_1w       601 non-null float64
dtypes: float64(12), int64(1), object(3)
memory usage: 75.2+ KB


In [198]:
pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows',500)

In [199]:
df_122335 = pd.read_csv("122335", sep=",", error_bad_lines=True)
df_122335

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Reference,CatN1,CatN2,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,Cost,Conversions,All_Conversions,Ads_Income,Ads_Income_All,Dates,CPC_max1w,CPC_max1m,CPC_max3m,CPC_max6m,CPC_max1y,CPC_medio1w,CPC_medio1m,CPC_medio3m,CPC_medio6m,CPC_medio1y,Impressions1w,Impressions1m,Impressions3m,Impressions6m,Impressions1y,Clics1w,Clics1m,Clics3m,Clics6m,Clics1y
0,414,460,122335.0,muebles,mesitas de noche,100.0,0.29,0.29,1312.0,6.0,1.74,0.0,1.0,0.0,0.0,2018-10-01,0.29,0.29,0.29,0.25,0.25,0.29,0.29,0.29,0.29,0.29,1312.0,1312.0,1312.0,1312.0,1312.0,6.0,6.0,6.0,6.0,6.0
1,415,471,122335.0,muebles,mesitas de noche,100.0,0.29,0.15,1055.0,1.0,0.15,0.0,0.0,0.0,0.0,2018-10-02,0.29,0.29,0.29,0.25,0.25,0.22,0.22,0.22,0.22,0.22,1183.5,1183.5,1183.5,1183.5,1183.5,3.5,3.5,3.5,3.5,3.5
2,416,459,122335.0,muebles,mesitas de noche,100.0,0.29,0.0,889.0,0.0,0.0,0.0,0.0,0.0,0.0,2018-10-03,0.29,0.29,0.29,0.25,0.25,0.15,0.15,0.15,0.15,0.15,1085.33,1085.33,1085.33,1085.33,1085.33,2.33,2.33,2.33,2.33,2.33
3,417,499,122335.0,muebles,mesitas de noche,100.0,0.29,0.37,1048.0,1.0,0.37,0.0,0.0,0.0,0.0,2018-10-04,0.29,0.29,0.29,0.25,0.25,0.2,0.2,0.2,0.2,0.2,1076.0,1076.0,1076.0,1076.0,1076.0,2.0,2.0,2.0,2.0,2.0
4,418,381,122335.0,muebles,mesitas de noche,100.0,0.29,0.15,835.0,3.0,0.44,1.0,3.0,156.42,156.42,2018-10-05,0.29,0.29,0.29,0.25,0.25,0.19,0.19,0.19,0.19,0.19,1027.8,1027.8,1027.8,1027.8,1027.8,2.2,2.2,2.2,2.2,2.2
5,419,423,122335.0,muebles,mesitas de noche,100.0,0.29,0.26,825.0,5.0,1.29,0.0,1.0,0.0,0.0,2018-10-06,0.29,0.29,0.29,0.25,0.25,0.2,0.2,0.2,0.2,0.2,994.0,994.0,994.0,994.0,994.0,2.67,2.67,2.67,2.67,2.67
6,420,494,122335.0,muebles,mesitas de noche,100.0,0.29,0.19,1117.0,5.0,0.97,0.0,0.0,0.0,0.0,2018-10-07,0.29,0.29,0.29,0.25,0.25,0.2,0.2,0.2,0.2,0.2,1011.57,1011.57,1011.57,1011.57,1011.57,3.0,3.0,3.0,3.0,3.0
7,421,526,122335.0,muebles,mesitas de noche,100.0,0.29,0.18,780.0,3.0,0.53,0.0,0.0,0.0,0.0,2018-10-08,0.29,0.29,0.29,0.25,0.25,0.19,0.2,0.2,0.2,0.2,935.57,982.62,982.62,982.62,982.62,2.57,3.0,3.0,3.0,3.0
8,422,552,122335.0,muebles,mesitas de noche,100.0,0.29,0.17,600.0,3.0,0.51,0.0,0.0,0.0,0.0,2018-10-09,0.29,0.29,0.29,0.25,0.25,0.19,0.2,0.2,0.2,0.2,870.57,940.11,940.11,940.11,940.11,2.86,3.0,3.0,3.0,3.0
9,423,539,122335.0,muebles,mesitas de noche,100.0,0.29,0.2,458.0,3.0,0.6,0.0,0.0,0.0,0.0,2018-10-10,0.29,0.29,0.29,0.25,0.25,0.22,0.2,0.2,0.2,0.2,809.0,891.9,891.9,891.9,891.9,3.29,3.0,3.0,3.0,3.0


In [200]:
df_122335.shape

(182, 36)

In [170]:
df_101318 = pd.read_csv("101318", sep=",", error_bad_lines=True)
df_101318.tail(5)

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Reference,CatN1,CatN2,Cat_Price,CPC_max,CPC_medio,Impressions,Clics,...,Impressions1w,Impressions1m,Impressions3m,Impressions6m,Impressions1y,Clics1w,Clics1m,Clics3m,Clics6m,Clics1y
576,576,576,101318.0,muebles,sillas,80.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1134.1,2389.86,1451.44,0.0,0.0,7.58,15.86,9.69
577,577,577,101318.0,muebles,sillas,80.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1077.36,2389.86,1450.78,0.0,0.0,7.18,15.86,9.68
578,578,578,101318.0,muebles,sillas,80.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1010.84,2389.79,1450.13,0.0,0.0,6.74,15.86,9.68
579,579,579,101318.0,muebles,sillas,80.0,0.0,0.0,0.0,0.0,...,0.0,0.0,942.21,2389.33,1449.5,0.0,0.0,6.31,15.86,9.66
580,580,580,101318.0,muebles,sillas,80.0,0.0,0.0,0.0,0.0,...,0.0,0.0,908.88,2381.65,1448.73,0.0,0.0,6.03,15.78,9.65
