In [18]:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import numpy as np
import pandas as pd
import networkx as nx
import pyreadstat

pd.options.mode.chained_assignment = None

# Importing & Preparing Data

In [19]:
PATH = 'data/addhealth_data/w1network_fmt_data.sas7bdat'

In [20]:
df = pd.read_sas(PATH)
columns = list(df.columns)
print(columns)
print(df.shape)

['FMTNAME', 'START', 'END', 'LABEL', 'MIN', 'MAX', 'DEFAULT', 'LENGTH', 'FUZZ', 'PREFIX', 'MULT', 'FILL', 'NOEDIT', 'TYPE', 'SEXCL', 'EEXCL', 'HLO', 'DECSEP', 'DIG3SEP', 'DATATYPE', 'LANGUAGE']
(12, 21)


In [21]:
network_df = pd.read_sas("data/addhealth_data/w1network.sas7bdat")
columns = list(network_df.columns)
print(columns[:10])
print(network_df.shape)


['AID', 'SIZE', 'IDGX2', 'ODGX2', 'NOUTNOM', 'TAB113', 'BCENT10X', 'REACH', 'REACH3', 'IGDMEAN']
(6504, 439)


The Add Health study of adolescent health and well-being uses a clustered design that collected social network data from 90,118 students in 145 schools across 80 communities. This design supports analysis of both individual and school-level networks, particularly peer and friendship networks. In the friendship section of the in-school questionnaire, students nominate up to ten friends and record each friend's identification number. These nominations make it possible to reconstruct each respondent's extended friendship network and the broader social structure of the school, which helps in understanding the social contexts that influence adolescent health.
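
To make the nomination structure concrete, here is a minimal sketch of how friendship nominations can be turned into a directed graph with `networkx`. The IDs and the `nominations` dictionary are made up for illustration; they are not Add Health records, and the real nomination data may be laid out differently.

```python
import networkx as nx

# Hypothetical nominations: each key is a respondent ID, each value lists
# the friends that respondent named (made-up IDs, not Add Health data).
nominations = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": [],
}

G = nx.DiGraph()
G.add_nodes_from(nominations)  # keep respondents who named no one
for ego, friends in nominations.items():
    # An edge ego -> alter means "ego nominated alter as a friend".
    G.add_edges_from((ego, alter) for alter in friends)

print(G.number_of_nodes(), G.number_of_edges())
print(sorted(G.successors("C")))    # friends that C nominated
print(sorted(G.predecessors("A")))  # students who nominated A
```

A directed graph is used because nominations are not necessarily mutual: the toy respondent "C" names "D", but "D" names no one.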

- In-Degree and Out-Degree (IDGX2, ODGX2): The number of friendship nominations the respondent receives (in-degree, indicating popularity) and sends (out-degree, indicating sociability).
- Bonacich Centrality (BCENT10X): Assesses an individual's centrality in the network, indicating influence or prominence.
- Reach (REACH, REACH3): The number of alters the respondent can reach through chains of nominations, in any number of steps (REACH) or within three steps (REACH3), indicating network spread.
- Mean Distance to Reachable Alters (IGDMEAN): Calculates the average social distance to reachable peers.
- Proximity Prestige (PRXPREST): Reflects the respondent's status based on how many peers can reach the respondent through chains of nominations and how few steps they need to do so.
- Best Friend Reciprocity (BMFRECIP, BMFRECBF, BFFRECIP, BFFRECBF): Indicates whether best friend nominations are mutual.
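- Network Density (ESDEN, ERDEN, ESRDEN): The share of possible ties that are actually present in the respondent's ego network, reported for the send, receive, and combined send-and-receive networks.
- Network Size (NES, NER, NESR): The number of people in the respondent's send, receive, and combined send-and-receive ego networks.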
- Network Heterogeneity (EHSGRD, EHRGRD, EHGRD): Measures diversity in the network based on grade, race, and other demographics.
- Saliency Index (SS37, SS38, etc.): Indicates the importance of certain characteristics like grade or race within the network.
- Freeman Segregation Index (SEG1S3, SEG1RCE5, etc.): Measures the degree of segregation within the network based on grade, race, or gender.
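
Several of the measures listed above can be illustrated on a toy graph. The sketch below uses standard `networkx` calls on made-up IDs and is only an approximation of the ideas: the precomputed Add Health variables may follow somewhat different definitions (for example, in how the ego's own ties enter the density calculation).

```python
import networkx as nx

# Toy directed friendship network (made-up IDs): u -> v means u nominated v.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "A"), ("C", "A"), ("C", "D")])

ego = "A"
in_degree = G.in_degree(ego)    # nominations received
out_degree = G.out_degree(ego)  # nominations sent

# Shortest-path distances from the ego along outgoing nominations.
dist = nx.single_source_shortest_path_length(G, ego)
dist.pop(ego)  # drop the ego itself (distance 0)

reach = len(dist)                                  # reachable in any number of steps
reach3 = sum(1 for d in dist.values() if d <= 3)   # reachable within three steps
mean_distance = sum(dist.values()) / reach if reach else float("nan")

# Density of the "send" ego network: the ego plus the alters the ego nominated.
send_net = G.subgraph([ego, *G.successors(ego)])
send_density = nx.density(send_net)  # present ties / possible directed ties

# Reciprocity of a single tie: did B nominate A back?
mutual_with_b = G.has_edge("A", "B") and G.has_edge("B", "A")

print(in_degree, out_degree, reach, reach3)
print(round(mean_distance, 3), round(send_density, 3), mutual_with_b)
```

Here the ego "A" reaches "B" and "C" in one step and "D" in two, so the mean distance is 4/3, and four of the six possible directed ties among "A", "B", and "C" are present.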