# Cleaning II

In the last notebook, [CleaningI](CleaningI.ipynb), we merged the two data files for respondents who completed both surveys and then replaced the column names from `data1` with their descriptions. In this notebook we complete the following tasks:
- **Translations for `data2`**
- **Delete unwanted columns**

## Translations for `data2`

The Google Translate API is not the best translator, although in the [Problem Statement notebook](ProblemStatementStakeholdersKPIs.ipynb) it was useful for getting an idea of what the data was measuring. This time we do a little more preprocessing before running the data through a different translator API.

In [1]:
# Import pandas to handle the data
import pandas as pd

# Load master_data and show the columns
master_data = pd.read_csv("data/master_data.csv")
master_data.columns.values.tolist()

['Unnamed: 0',
 'Age of the respondent. (Numeric)                                                              ',
 'Gender of the respondent. (Categorical)                                                       ',
 'Country of residence of the respondent. (Categorical)                                         ',
 'Number of times the respondent has used psychedelics. (Numeric)                               ',
 'Number of times the respondent has used other psychoactive drugs. (Numeric)                   ',
 'Number of times the respondent has used psychedelics in the past year. (Numeric)              ',
 'Number of times the respondent has used other psychoactive drugs in the past year. (Numeric)  ',
 'Number of times the respondent has used psychedelics in the past month. (Numeric)             ',
 'Number of times the respondent has used other psychoactive drugs in the past month. (Numeric) ',
 'Number of times the respondent has used psychedelics in the past week. (Numeric)            

Many of the column names begin with a number, so we remove that number first. The loop splits each column name at its first underscore and, if the part before the underscore is a number, drops that part. Some columns are still labelled only with a number after this step because they had no description on the Kaggle site, so the same loop deletes them as well.

In [5]:
# Strip numeric prefixes from column names and delete columns labelled only with numbers
for entry in master_data.columns.tolist():
    # Split at the first underscore: prefix, separator (empty if there is none), remainder
    prefix, sep, rest = entry.partition("_")
    # Drop a leading numeric prefix; names without one are kept whole
    new_name = rest if (sep and prefix.isnumeric()) else entry
    if new_name.isnumeric():
        # What remains is only a number, so the column has no description: delete it
        del master_data[entry]
    elif new_name != entry:
        master_data.rename(columns={entry: new_name}, inplace=True)
master_data

Unnamed: 0.1,Unnamed: 0,Age of the respondent. (Numeric),Gender of the respondent. (Categorical),Country of residence of the respondent. (Categorical),Number of times the respondent has used psychedelics. (Numeric),Number of times the respondent has used other psychoactive drugs. (Numeric),Number of times the respondent has used psychedelics in the past year. (Numeric),Number of times the respondent has used other psychoactive drugs in the past year. (Numeric),Number of times the respondent has used psychedelics in the past month. (Numeric),Number of times the respondent has used other psychoactive drugs in the past month. (Numeric),...,usualmente_encuentro_cosas_de_que_reirme,confianza_en_mi_pasar_tiempos_dificiles,en_emergencias_pueden_confiar_en_mi,ver_situacion_varios_puntos_de_vista,mi_vida_tiene_sentido,no_insito_en_cosas_que_no_puedo_hacer_nada,encuentro_salida_en_situacion_dificil,tengo_energia_para_lo_que_tengo_que_hacer,siento_comodo_si_hay_gente_a_la_que_no_le_agrado,login_disclaimer_fork
0,0,4,5,5,1,5,3,5,1,5,...,6,6,7,7,7,7,6,7,6,2
1,1,4,3,5,3,4,3,4,4,4,...,6,7,7,5,5,6,5,4,6,2
2,2,5,5,5,3,3,5,5,1,1,...,6,3,7,6,6,2,6,4,6,2
3,3,4,4,5,4,4,3,5,3,3,...,4,2,6,6,3,3,6,3,2,2
4,4,5,3,2,2,5,5,5,4,4,...,7,7,7,6,7,5,6,5,7,2
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7085,7085,5,3,5,2,5,1,5,5,3,...,7,7,7,5,6,2,6,4,2,2
7086,7086,4,5,5,5,2,4,4,5,1,...,5,2,7,6,1,7,6,2,4,2
7087,7087,4,2,3,3,3,4,5,5,3,...,7,4,7,7,3,4,6,4,4,2
7088,7088,2,3,3,3,3,4,5,4,4,...,4,5,7,7,6,4,5,4,1,2
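
To check the renaming rule on a small case, here is a minimal sketch that applies it to a plain list of names rather than to `master_data`. The helper `strip_numeric_prefix` and the numeric prefixes in `toy_columns` are made up for illustration; only the two Spanish/English stems are taken from the output above.

```python
def strip_numeric_prefix(name):
    """Return name without a leading '<digits>_' prefix, if it has one."""
    prefix, sep, rest = name.partition("_")
    return rest if (sep and prefix.isnumeric()) else name


# Hypothetical column names: the prefixes 12 and 7 are invented for this example
toy_columns = [
    "12_mi_vida_tiene_sentido",  # numeric prefix followed by a description
    "7_45",                      # nothing but numbers: no description
    "Age of the respondent.",    # no underscore at all
    "login_disclaimer_fork",     # underscores, but no numeric prefix
]

new_names = {col: strip_numeric_prefix(col) for col in toy_columns}
# Columns whose label is still only a number are the ones to delete
to_drop = [col for col, new in new_names.items() if new.isnumeric()]
kept = [new for col, new in new_names.items() if col not in to_drop]

print(to_drop)
print(kept)
```

With a real DataFrame, the same two results could be applied in one step with `drop(columns=to_drop)` followed by `rename(columns=new_names)`, which avoids modifying the frame while looping over its columns.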


In [None]:
master_data