In this notebook, we merge two datasets: `data_whole.RData`, which contains the political data, and `participants.tsv`, which contains the demographic data. We match participants on their AOMIC ID and keep only those who are present in both datasets. We then drop all variables except identity-based social ideology, issue-based social ideology, political interest, age, gender, education, and background socio-economic status.

In the following cell, we load dplyr, select only the variables of interest in `data_whole.RData`, and save the resulting subset as `poli_data.csv`.

In [52]:
# load libraries
#install.packages("dplyr")
library(dplyr)

# set working dir
setwd("/home/c13572687/Documents/scripts_and_data/csv_data/")
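The cell above only loads dplyr and sets the working directory; the selection step described in the text is not shown. A minimal sketch of what that step could look like follows, using base R column indexing so that it runs on its own. The object name inside the `.RData` file, the toy values, the `unrelated_item` column, and the temporary directory are all assumptions made for illustration, not the original code.

```r
# Toy stand-in for data_whole.RData; the object name `data_whole` and all
# values here are assumptions made for illustration
tmp_dir <- tempdir()
data_whole <- data.frame(
  participant_id  = c("sub-0001", "sub-0002"),
  social.ideology = c(4, 2),
  social.identity = c(3, 5),
  unrelated_item  = c(1, 0)
)
save(data_whole, file = file.path(tmp_dir, "data_whole.RData"))
rm(data_whole)

# load() restores the saved objects and returns their names
loaded_names <- load(file.path(tmp_dir, "data_whole.RData"))

# keep the ID plus the variables of interest, then write them to CSV
keep_cols <- c("participant_id", "social.ideology", "social.identity")
poli_toy <- get(loaded_names[1])[, keep_cols]
write.csv(poli_toy, file.path(tmp_dir, "poli_data.csv"), row.names = FALSE)
```

Keeping `participant_id` in the subset matters here, because the matching step later in the notebook relies on that column being present in `poli_data.csv`.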

Here, we match up our political data (`poli_data.csv`) with our demographic data (`participants.tsv`).

In [53]:
poli_data = read.csv("poli_data.csv", sep = ',')
data_whole = read.csv("participants.tsv", sep = '\t')

# keeping only participants that are present in both the poli data and the AOMIC data
shared_ids = intersect(poli_data$participant_id, data_whole$participant_id)

poli_data = filter(poli_data, participant_id %in% shared_ids)
data_whole = filter(data_whole, participant_id %in% shared_ids)

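# making sure all participants match up, element-wise;
# cbind() below pairs rows by position, so the order must be identical
table(data_whole$participant_id == poli_data$participant_id)
stopifnot(nrow(data_whole) == nrow(poli_data),
          all(data_whole$participant_id == poli_data$participant_id))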

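# keeping only the political variables of interest; participant_id is dropped
# here so that it is not duplicated when the two data frames are combined
poli_data <- poli_data %>% select(social.ideology, social.ideology.scale,
                                  social.identity, social.identity.scale,
                                  political.interest, political.interest.scale)

# keeping only the demographic variables of interest, then combining column-wise
data_whole <- data_whole %>% select(participant_id, age, sex, education_level, background_SES)
all_data = cbind(data_whole, poli_data)

# saving the merged data set
write.csv(all_data, "all_data.csv")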
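
Because `cbind()` pairs rows by position, the merge above depends on both files listing participants in the same order. As an alternative, base R's `merge()` performs an inner join on `participant_id`, which both drops unmatched participants and pairs rows by ID. The sketch below illustrates this on toy data; the data frames `poli_toy` and `demo_toy` and all their values are made up for illustration and are not the notebook's data.

```r
# Toy stand-ins for poli_data and data_whole (IDs and values are made up)
poli_toy <- data.frame(
  participant_id  = c("sub-0003", "sub-0001", "sub-0004"),
  social.ideology = c(2, 5, 3)
)
demo_toy <- data.frame(
  participant_id = c("sub-0001", "sub-0002", "sub-0003"),
  age            = c(22, 25, 21)
)

# Inner join on participant_id: keeps only IDs present in both frames
# and pairs rows by ID, regardless of their order in either file
merged_toy <- merge(demo_toy, poli_toy, by = "participant_id")
print(merged_toy)
```

Here `merged_toy` contains only the two IDs that occur in both frames, even though the rows of `poli_toy` are in a different order than those of `demo_toy`.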