---
title: "Working with Allbus 2010 ego-centered network data using egor"
author: "Till Krenz"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Working with Allbus 2010 ego-centered network data using egor}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5
)
```
***Note: The data used in this vignette is simulated based on the original Allbus
2010 SPSS data by GESIS. The dataset simulates 100 respondents and does not resemble any
actual Allbus respondents. Each variable is randomly generated based on the range
of the original variable; covariation between variables is disregarded. The
data's sole purpose is to demonstrate how to work with the Allbus data technically
using egor and R. No analytical conclusions should be drawn from this data!
The code in this vignette works with the original Allbus 2010 data, which can be
acquired [here](https://www.gesis.org/en/allbus/allbus-home).***

## The Allbus 2010: ego-centered network data

The Allbus 2010 splits the respondents into two groups. Each group was presented
with a different name generator.

- Allbus name generator: the generated alters are called "Freunde" (German for "friends") in the data (max. 3 persons, "spent time with in private, not living in same household")
- GSS name generator: these alters are called "Kontakte" (German for "contacts") in the dataset (max. 5 persons, "discussed important matters")

For more information, please consult the Allbus documentation.

## Load packages and data

```{r message=FALSE, warning=FALSE}
library(egor)
library(purrr)
library(haven)
```
In addition to *egor*, this vignette uses the *haven* package to import the SPSS
file of the Allbus 2010 and the *[purrr](https://purrr.tidyverse.org/)* package, which provides enhanced functional programming tools. The *purrr* functions used in
this vignette are the _map*()_ functions, which work similarly to
base R's *lapply()*.
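
To make that similarity concrete, the sketch below extracts a `label` attribute from every column of a small data frame using only base R. The data frame `toy` and its labels are invented for this example and are not part of the Allbus data; `vapply()` stands in for the role that *map_chr()* plays later in this vignette.

```r
# Hypothetical toy data with SPSS-style "label" attributes (not Allbus data)
toy <- data.frame(V1 = 1:3, V2 = c(2, 1, 2))
attr(toy$V1, "label") <- "RESPONDENT ID"
attr(toy$V2, "label") <- "SPLIT"

# lapply() returns a list with one element per column
labels_list <- lapply(toy, function(col) attr(col, "label"))

# vapply() returns a named character vector, comparable to map_chr()
labels_chr <- vapply(toy, function(col) attr(col, "label"), character(1))

labels_chr
```

Both `vapply()` and *map_chr()* insist on exactly one string per column, so a column without a label would raise an error rather than be skipped silently.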
Importing the original Allbus data with *haven* would look like this:
```
raw_data <- read_sav("ZA4610_A10.SAV")
```
For the purpose of this vignette, we load a simulated dataset instead.
```{r}
data("allbus_2010_simulated")
raw_data <- allbus_2010_simulated
```
The Allbus variable names are quite technical, ranging from V1 to V981. Fortunately,
the *haven* data import preserves the SPSS variable labels, which describe each
variable in more detail. We are going to convert these labels into a format
that allows us to use them as variable names.

The code below extracts all variable labels, replaces punctuation characters in
the labels with spaces, and then substitutes each run of whitespace with a single underscore.
```{r}
var_labels <- map_chr(raw_data, ~attr(., "label"))
var_labels <- gsub("[,\\.:;><?+()-]", " ", var_labels)
var_labels <- gsub("\\s+", "_", trimws(var_labels))
```
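
The two `gsub()` steps can be traced on a few example strings. This is a minimal sketch: the raw labels in `raw_labels` are invented for illustration and may not match the wording of the real Allbus labels.

```r
# Hypothetical raw labels, invented for illustration
raw_labels <- c("IDENTIFIKATIONSNUMMER DES BEFRAGTEN",
                "FRAGEBOGENSPLIT <F020>",
                "ANZ. GENANNTER NETZWERKPERS., SPLIT 1")

# Step 1: replace each listed punctuation character with a space
step1 <- gsub("[,\\.:;><?+()-]", " ", raw_labels)

# Step 2: trim both ends, then turn each run of whitespace into one underscore
clean_labels <- gsub("\\s+", "_", trimws(step1))

clean_labels
```

Trimming before the second substitution matters: without `trimws()`, a label ending in punctuation would end in an underscore.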
The variable labels for the ego-centered network module need some special treatment,
so that egor can give useful names to the alter variables. We delete the
leading part of those labels, which is shared by all variables in each split.
```{r}
var_labels <- gsub("FREUND_IN_._", "", var_labels)
var_labels <- gsub("^KONTAKT_._", "", var_labels)
```
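
The effect of the two patterns can be sketched on a handful of made-up names. The vector `cleaned` is hypothetical; it only assumes that alter variables start with a split prefix followed by a single alter letter.

```r
# Hypothetical cleaned labels, invented for illustration
cleaned <- c("FREUND_IN_A_GESCHLECHT",
             "FREUND_IN_B_GESCHLECHT",
             "KONTAKT_A_GESCHLECHT",
             "KENNEN_SICH_KONTAKT_A_B")

# "." matches exactly one character, here the alter letter (A, B, ...)
stripped <- gsub("FREUND_IN_._", "", cleaned)

# "^" anchors the match to the start of the string, so a name that merely
# contains "KONTAKT_A_" further along is left untouched
stripped <- gsub("^KONTAKT_._", "", stripped)

stripped
```

After this step the attribute names no longer identify the alter they belong to, so several of them are identical.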
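Now we can use the cleaned-up variable labels as *names* for our data. Because several labels are now identical, *make.unique()* appends a running number to each repeated label, which is why names such as `GESCHLECHT3` appear later in this vignette.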
```{r}
names(raw_data) <- make.unique(var_labels, sep = "")
```
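
As a small illustration of how *make.unique()* numbers repeated names, consider the sketch below. The input vector `dup_names` is made up and much shorter than the real set of Allbus labels.

```r
# Hypothetical names after prefix removal: each attribute repeats once per alter
dup_names <- c("GESCHLECHT", "WO_GEBOREN",
               "GESCHLECHT", "WO_GEBOREN",
               "GESCHLECHT")

# The first occurrence stays as is; later ones get 1, 2, ... appended
unique_names <- make.unique(dup_names, sep = "")

unique_names
```

With `sep = ""` the number is attached directly to the name, so the second and third `GESCHLECHT` become `GESCHLECHT1` and `GESCHLECHT2`.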
## Convert the data to egor objects

We are going to create two separate egor objects, one for each split, starting with the "Freunde" split.
First we keep only the respondents from split 1.
```{r}
split_freunde <-
  raw_data %>%
  filter(FRAGEBOGENSPLIT_F020 == 1)
```
Now we use the *onefile_to_egor()* function to convert the data to an egor object.
This function needs a few arguments to locate the alter data and the
alter-alter tie data in the dataset.
```{r}
e_freunde <- onefile_to_egor(
  egos = split_freunde,
  ID.vars = list(ego = "IDENTIFIKATIONSNUMMER_DES_BEFRAGTEN"),
  netsize = split_freunde$ANZ_GENANNTER_NETZWERKPERS_SPLIT_1,
  attr.start.col = "GESCHLECHT",
  attr.end.col = "SPANNUNGEN_KONFLIKTE2",
  aa.first.var = "KENNEN_SICH_A_B",
  max.alters = 3)
```
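
The `max.alters` argument also determines how many alter-alter tie columns there can be, assuming one column per unordered pair of alters. The sketch below only counts and labels those pairs in base R. The labels are illustrative, and the exact column order that *onefile_to_egor()* expects should be checked in its documentation.

```r
# Number of possible alter-alter pairs for each split's maximum network size
n_pairs_freunde  <- choose(3, 2)
n_pairs_kontakte <- choose(5, 2)

# Illustrative pair labels for three alters A, B and C
pair_labels <- combn(LETTERS[1:3], 2, FUN = paste, collapse = "_")

pair_labels
```

Under this assumption, the tie columns would start at the variable given in `aa.first.var` and cover three pairs for the "Freunde" split and ten pairs for the "Kontakte" split.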
The *onefile_to_egor()* function prints some messages during the conversion, which
are meant to help us identify problems in case something goes wrong.

We also see a NOTE telling us that we need to filter out invalid alter-alter
ties. In this case those are ties with a weight of 2, since Allbus codes
non-existent ties as 2 here.
```{r}
attr(raw_data$KENNEN_SICH_A_B, "labels")
```
"KENNEN SICH NICHT" means "don't know each other" in German.

We can filter the alter-alter ties using the *activate()* and *filter()*
functions.
```{r}
e_freunde <-
  e_freunde %>%
  activate(aatie) %>%
  filter(weight != 2) %>%
  activate(ego)
```
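
The logic of the weight filter can be tried on a plain data frame. This is a sketch under stated assumptions: the `aaties` data and its column names are invented, and base R's `subset()` is used in place of the *filter()* call on the egor object.

```r
# Hypothetical alter-alter ties for one ego; weight 2 stands for "no tie"
aaties <- data.frame(
  from   = c("A", "A", "B"),
  to     = c("B", "C", "C"),
  weight = c(1, 2, NA)
)

# Keep only rows where the comparison is TRUE;
# rows with a missing weight are dropped as well
valid_ties <- subset(aaties, weight != 2)

valid_ties
```

Rows with a missing weight disappear here because the comparison yields `NA` instead of `TRUE`; *dplyr*'s `filter()` treats such rows the same way.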
Next we repeat the same steps for split 2. Here we need to filter out the weight
value 3 from the alter-alter ties and adjust some arguments according to the
position of the data in the dataset and the maximum number of alters that the
respondents were allowed to nominate.
```{r}
split_kontakte <-
  raw_data %>%
  filter(FRAGEBOGENSPLIT_F020 == 2)

e_kontakte <- onefile_to_egor(
  egos = split_kontakte,
  ID.vars = list(ego = "IDENTIFIKATIONSNUMMER_DES_BEFRAGTEN"),
  netsize = split_kontakte$ANZ_GENANNTER_NETZWERKPERS_SPLIT_2,
  attr.start.col = "GESCHLECHT3",
  attr.end.col = "SPANNUNGEN_KONFLIKTE7",
  aa.first.var = "KENNEN_SICH_KONTAKT_A_B",
  max.alters = 5)

e_kontakte <-
  e_kontakte %>%
  activate(aatie) %>%
  filter(weight != 3) %>%
  activate(ego)
```
## Visualize and analyze

Now we can visualize and analyze the Allbus data. A few demonstrations follow.
For an overview of available options, please see the main vignette of egor "Using `egor` to analyse ego-centered network data".
```{r}
plot(e_freunde, ego_no = 4, x_dim = 2, y_dim = 1)
plot(e_kontakte, ego_no = 4, x_dim = 2, y_dim = 1)
```
```{r}
e_freunde <-
  e_freunde %>%
  activate(alter) %>%
  mutate(WO_GEBOREN = droplevels(as_factor(WO_GEBOREN)),
         KONTAKTE = droplevels(as_factor(KONTAKTE)))

plot_egograms(e_freunde,
              ego_no = 4,
              x_dim = 1,
              y_dim = 1,
              venn_var = "KONTAKTE",
              pie_var = "WO_GEBOREN")

e_kontakte <-
  e_kontakte %>%
  activate(alter) %>%
  mutate(WO_GEBOREN = droplevels(as_factor(WO_GEBOREN)),
         KONTAKTE = droplevels(as_factor(KONTAKTE)))

plot_egograms(e_kontakte,
              ego_no = 4,
              x_dim = 1,
              y_dim = 1,
              venn_var = "KONTAKTE",
              pie_var = "WO_GEBOREN")
```