# Cluster Risk Analysis of Longitudinal Health Data

Victoria McCray, Muran Huang, Saleha Farooqui, Madison Hill

DS3000

# Problem

Preventing illness and identifying risk factors are crucial to improving health and wellness outcomes. Longitudinal data consists of repeated measures collected from the same participants over time, and correlational analysis of those measures can be used to understand change. Analysis of longitudinal data can be used to evaluate risk factors, predict clinical outcomes, and simulate measures for a given population (Solomon et al. 2021). This project may employ cluster analysis to evaluate the risk level of selected clinical outcomes or diagnoses for a group of women in the US who were followed annually.

# Data Source

The Study of Women's Health Across the Nation (SWAN) Biospecimen Repository provides a dataset of 14 clinical visits for nearly 3,000 women in the US. Each participant (row) is characterized by demographic information such as age and race/ethnicity. During each visit, biological, clinical, and psychological measures are also recorded for each participant, giving over 700 measures (columns) per visit. To explore the data, we loaded one clinical visit, Visit 5 of 14, into a pandas DataFrame.

Because the dataset has over 700 measures collected at each clinical visit for every woman, cleaning and organizing the data for this project may require us to reduce the dimensionality of these features. The recorded measures vary considerably, and some additional processing is needed to decide how features should be grouped. For example, biological measures collected during visits cover nutrition, metabolism, sleep hygiene, menstrual activity, and more. Psychosocial measures cover energy levels, income, experiences of discrimination, sexual behaviors, and more. However, the measures are not organized by theme in their current format, so additional steps are needed to group the columns thematically. Some measures may also need to be grouped or excluded based on null values. For example, in the dataframe below, many cells have missing data, and those values should be dropped from further processing.
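One possible way to handle the missing values described above is to drop columns that are mostly empty before any grouping or clustering. The following is a minimal sketch, not the project's final method: the helper `drop_sparse_columns`, the 50% threshold, and the toy values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_null_frac: float = 0.5) -> pd.DataFrame:
    """Keep only the columns whose share of missing values is at most max_null_frac."""
    null_frac = df.isna().mean()  # per-column fraction of NaN cells
    keep = null_frac[null_frac <= max_null_frac].index
    return df.loc[:, keep]


# Toy frame standing in for a small slice of the visit data (illustrative values only)
toy_df = pd.DataFrame({
    "SWANID": [1, 2, 3, 4],
    "AGE5": [57.0, 56.0, 53.0, 56.0],
    "DAYBLE5": [np.nan, np.nan, np.nan, 1807.0],  # 75% missing, so it is dropped
    "CHOLRES5": [211.0, 191.0, np.nan, 248.0],    # 25% missing, so it is kept
})

dense_df = drop_sparse_columns(toy_df, max_null_frac=0.5)
print(list(dense_df.columns))
```

The threshold is a tuning choice. A stricter value removes more columns but leaves fewer gaps to impute later.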

The data dictionary describes each collected measure and gives the variable name used for it in the data file.

data dictionary: https://www.icpsr.umich.edu/web/ICPSR/studies/30501/variables#

In [1]:
import pandas as pd

In [2]:
pd.set_option('display.max_columns', 100)

In [3]:
# The data was downloaded as a .sav file and read in using .read_spss()
visit5_df = pd.read_spss('30501-0001-Data.sav')
visit5_df


|      |  SWANID |   VISIT | INTDAY5 |   AGE5 |  LANGINT5 |                          RACE | PREGNAN5 | PREVBLO5 | EATDRIN5 | STRTPER5 | DAYBLE5 |                 BLDRWAT5 |             BLDDRAW5 | SPEDAY5 | ANTICO15 | ACOATW15 | ANTICO25 | ACOATW25 | HEART15 | HARTTW15 | HEART25 | HARTTW25 | CHOLST15 | CHOLTW15 | CHOLST25 | CHOLTW25 | BP15 | BPTW15 | BP25 | BPTW25 | DIURET15 | DIURTW15 | DIURET25 | DIURTW25 | THYROI15 | THYRTW15 | THYROI25 | THYRTW25 | INSULN15 | INSUTW15 | INSULN25 | INSUTW25 | NERVS15 | NERVTW15 | NERVS25 | NERVTW25 | STEROI15 | STERTW15 | STEROI25 | STERTW25 | ... | EFPVEGS5 | EFPVEGF5 | EFPGRNS5 | EFPGRNF5 | EFPMTSV5 | EFPMTFQ5 | EFPDARS5 | EFPDARF5 | EFPFVSV5 | EFPFVFQ5 | ADD1XWK5 | NUMADDS5 | NSKIP5 | EXCLUDE5 | HRMDAY5 | CYCDAY5 | DHAS5 |  FSH5 | SHBG5 |   T5 | E2AVE5 | FLGCV5 | FLGDIF5 | CVRDAY5 | FLAGSER5 | CHOLRES5 | TRIGRES5 | LDLRESU5 | HDLRESU5 | GLUCRES5 | INSURES5 | FACRESU5 | FIBRESU5 | PAIRESU5 | TPARESU5 | LPARESU5 | LPA1RES5 | APOARES5 | APOBRES5 | CRPRESU5 | FLGCVRV5 | SPSCDAY5 | SPSCTIM5 |     SPSCMOD5 | HPSCDAY5 | HPSCTIM5 |     HPSCMOD5 | SPBMDT5 | HPBMDT5 | BMDFLG5 |   |
|-----:|--------:|--------:|--------:|-------:|----------:|------------------------------:|---------:|---------:|---------:|---------:|--------:|-------------------------:|---------------------:|--------:|---------:|---------:|---------:|---------:|--------:|---------:|--------:|---------:|---------:|---------:|---------:|---------:|-----:|-------:|-----:|-------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|--------:|---------:|--------:|---------:|---------:|---------:|---------:|---------:|----:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|-------:|---------:|--------:|--------:|------:|------:|------:|-----:|-------:|-------:|--------:|--------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|---------:|-------------:|---------:|---------:|-------------:|--------:|--------:|--------:|---|
|    0 | 10046.0 |      05 |  1967.0 |   57.0 |   English |      Chinese/Chinese American |       No |       No |       No |       No |     NaN |     Yes, as per protocol |                  Yes |  1967.0 |      Yes |      Yes |       No |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |  Yes |    Yes |   No |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |     Yes |      Yes |      No |      NaN |       No |      NaN |      NaN |      NaN | ... | 0.055714 |      0.1 | 0.000000 |      0.0 | 0.000000 |      0.0 |      0.0 |      0.0 | 0.055714 |      0.1 |      NaN |      NaN |    0.0 |      NaN |  1967.0 |     NaN | 172.7 |  24.0 |  33.5 | 39.0 |  45.15 |     No |      No |  1967.0 |       No |    211.0 |    255.0 |    108.0 |     52.0 |    101.0 |     11.5 |    166.0 |    308.0 |     35.6 |      9.5 |      2.0 |     50.0 |    191.0 |    136.0 |    7.800 |       No |   1967.0 |  0:09:52 | 2000 machine |   1967.0 |  0:09:47 | 2000 machine | 1.18920 | 1.01770 |      No |   |
|    1 | 10056.0 |      05 |  1841.0 |   56.0 |   English | Caucasian/ White Non-Hispanic |       No |       No |       No |       No |     NaN |     Yes, as per protocol |                  Yes |  1841.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... | 0.167143 |      0.1 | 0.174118 |      0.1 | 0.126157 |      0.1 |      0.0 |      0.0 | 0.167143 |      0.1 |      NaN |      NaN |    0.0 |      NaN |  1841.0 |     NaN |  92.3 | 116.4 | 107.8 | 27.3 |  11.60 |     No |      No |  1841.0 |       No |    191.0 |     71.0 |     96.0 |     81.0 |     90.0 |      5.1 |    110.0 |    226.0 |      3.0 |      3.5 |     71.0 |     82.0 |    215.0 |     85.0 |    0.700 |       No |   1930.0 |  0:10:58 | 4500 machine |   1930.0 |  0:10:55 | 4500 machine | 0.83910 | 0.81550 |      No |   |
|    2 | 10126.0 |      05 |  1819.0 |   53.0 |   English |        Black/African American |       No |      NaN |       No |       No |     NaN | Yes, menses too variable |                  Yes |  1821.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |    0.0 |      NaN |  1821.0 |     NaN |  88.8 |  89.1 |  48.3 | 36.4 |  20.35 |     No |      No |  1821.0 |       No |    186.0 |     85.0 |    118.0 |     51.0 |     90.0 |      7.6 |    101.0 |    292.0 |      4.5 |      3.8 |     35.0 |     47.0 |    137.0 |    102.0 |    1.400 |       No |      NaN |        . |          NaN |      NaN |        . |          NaN |     NaN |     NaN |     NaN |   |
|    3 | 10153.0 |      05 |  1820.0 |   56.0 |   English |    Japanese/Japanese American |       No |       No |       No |       No |     NaN |     Yes, as per protocol |                  Yes |  1820.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |  Yes |    Yes |   No |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... | 1.621429 |      0.5 | 0.025882 |      0.0 | 0.031667 |      0.1 |      0.0 |      0.0 | 1.621429 |      0.5 |      NaN |      NaN |    0.0 |      NaN |  1820.0 |     NaN | 129.9 |  67.9 |  22.7 | 45.1 |  19.45 |     No |      No |  1820.0 |       No |    248.0 |    161.0 |    157.0 |     59.0 |     83.0 |     21.2 |    115.0 |    300.0 |     21.3 |      8.8 |     86.0 |     45.0 |    169.0 |    152.0 |    2.300 |       No |   1835.0 |  0:09:23 | 4500 machine |   1835.0 |  0:09:21 | 4500 machine | 1.01120 | 0.96700 |      No |   |
|    4 | 10196.0 |      05 |  1881.0 |   51.0 |   English |      Chinese/Chinese American |       No |       No |       No |       No |     NaN |     Yes, as per protocol |                  Yes |  1881.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |      Yes |      Yes |       No |      NaN |       No |      NaN |      NaN |      NaN |     Yes |      Yes |      No |      NaN |       No |      NaN |      NaN |      NaN | ... | 0.074286 |      0.3 | 0.000000 |      0.0 | 0.105882 |      0.1 |      0.0 |      0.0 | 0.074286 |      0.3 |      NaN |      NaN |    0.0 |      NaN |  1881.0 |     NaN |  98.1 |  84.1 |  48.4 | 19.9 |  12.00 |     No |      No |  1881.0 |       No |    207.0 |    127.0 |    116.0 |     66.0 |     86.0 |     12.6 |    126.0 |    251.0 |     17.8 |     12.5 |      9.0 |     68.0 |    182.0 |    115.0 |    0.400 |       No |   1881.0 |  0:10:36 | 2000 machine |   1881.0 |  0:10:19 | 2000 machine | 0.93800 | 0.78710 |      No |   |
|  ... |     ... |     ... |     ... |    ... |       ... |                           ... |      ... |      ... |      ... |      ... |     ... |                      ... |                  ... |     ... |      ... |      ... |      ... |      ... |     ... |      ... |     ... |      ... |      ... |      ... |      ... |      ... |  ... |    ... |  ... |    ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |     ... |      ... |     ... |      ... |      ... |      ... |      ... |      ... | ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |    ... |      ... |     ... |     ... |   ... |   ... |   ... |  ... |    ... |    ... |     ... |     ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |      ... |          ... |      ... |      ... |          ... |     ... |     ... |     ... |   |
| 2612 | 99809.0 |      05 |  1820.0 |   48.0 |   English | Caucasian/ White Non-Hispanic |       No |      NaN |       No |      Yes |  1807.0 |     Yes, as per protocol |                  Yes |  1808.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |    0.0 |      NaN |  1808.0 |     2.0 |  30.9 |  20.1 |  82.0 | 11.9 |  15.45 |     No |      No |  1808.0 |       No |    164.0 |    110.0 |     83.0 |     59.0 |     86.0 |      9.6 |    131.0 |    256.0 |      9.4 |      6.7 |      6.0 |     65.0 |    185.0 |     84.0 |    6.200 |       No |   1820.0 |  0:08:51 | 4500 machine |   1820.0 |  0:08:47 | 4500 machine | 1.17736 | 1.00035 |      No |   |
| 2613 | 99888.0 |      05 |  1848.0 |   54.0 |   English |    Japanese/Japanese American |       No |       No |       No |       No |     NaN |     Yes, as per protocol |                  Yes |  1848.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... | 0.040000 |      0.0 | 0.050588 |      0.1 | 0.014333 |      0.1 |      0.0 |      0.0 | 0.040000 |      0.0 |      NaN |      NaN |    0.0 |      NaN |  1848.0 |     NaN | 142.8 |  34.7 |  69.4 | 31.1 |  24.45 |     No |      No |  1848.0 |       No |    193.0 |    209.0 |    114.0 |     37.0 |    100.0 |      8.5 |    163.0 |    262.0 |     18.4 |      8.2 |      2.0 |     33.0 |    125.0 |    128.0 |    1.600 |       No |   1918.0 |  0:11:54 | 4500 machine |   1918.0 |  0:11:50 | 4500 machine | 0.91460 | 0.84900 |      No |   |
| 2614 | 99898.0 |      05 |  1798.0 |   50.0 |   English | Caucasian/ White Non-Hispanic |       No |      NaN |       No |      Yes |  1865.0 |     Yes, as per protocol |                  Yes |  1868.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |    0.0 |      NaN |  1868.0 |     4.0 |  71.5 |  29.9 |  19.4 | 40.6 |  15.85 |     No |      No |  1868.0 |       No |    196.0 |    234.0 |     95.0 |     54.0 |     87.0 |     19.8 |    138.0 |    234.0 |    193.0 |      6.5 |     57.0 |     50.0 |    166.0 |    108.0 |    1.800 |       No |      NaN |        . |          NaN |      NaN |        . |          NaN |     NaN |     NaN |     NaN |   |
| 2615 | 99962.0 |      05 |  1834.0 |   52.0 | Cantonese |      Chinese/Chinese American |       No |       No |       No |       No |     NaN |     Yes, as per protocol |                  Yes |  1834.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |   No |    NaN |  NaN |    NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN | ... | 0.422857 |      0.8 | 0.000000 |      0.0 | 0.136471 |      0.2 |      0.0 |      0.0 | 0.422857 |      0.8 |      NaN |      NaN |    0.0 |      NaN |  1834.0 |     NaN | 233.6 | 117.4 |  52.5 | 44.8 |  10.85 |     No |      No |  1834.0 |       No |    250.0 |    109.0 |    151.0 |     77.0 |     94.0 |     10.1 |    110.0 |    251.0 |     28.3 |      5.5 |     16.0 |     77.0 |    177.0 |    117.0 |    0.117 |       No |   1834.0 |  0:11:08 | 2000 machine |   1834.0 |  0:10:56 | 2000 machine | 0.92350 | 0.81760 |      No |   |
| 2616 | 99992.0 |      05 |  2016.0 |   48.0 |   Spanish |                      Hispanic |       No |      NaN |       No |      Yes |  2100.0 |     Yes, as per protocol |                  Yes |  2103.0 |       No |      NaN |      NaN |      NaN |      No |      NaN |     NaN |      NaN |       No |      NaN |      NaN |      NaN |  Yes |    Yes |  Yes |    Yes |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |       No |      NaN |      NaN |      NaN |     Yes |      Yes |      No |      NaN |       No |      NaN |      NaN |      NaN | ... |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |      NaN |    0.0 |      NaN |  2103.0 |     4.0 | 154.1 |  12.6 |  24.3 | 31.4 |  58.65 |     No |      No |  2103.0 |       No |    219.0 |    124.0 |    138.0 |     56.0 |    107.0 |     11.2 |      NaN |      NaN |      NaN |      NaN |     13.0 |     60.0 |    193.0 |    135.0 |    1.900 |       No |      NaN |        . |          NaN |      NaN |        . |          NaN |     NaN |     NaN |     NaN |   |

In [5]:
# list of all collected measures per subject
col_measures = list(visit5_df.columns)
num_measures = len(col_measures) # 703 measures
col_measures


['SWANID',
 'VISIT',
 'INTDAY5',
 'AGE5',
 'LANGINT5',
 'RACE',
 'PREGNAN5',
 'PREVBLO5',
 'EATDRIN5',
 'STRTPER5',
 'DAYBLE5',
 'BLDRWAT5',
 'BLDDRAW5',
 'SPEDAY5',
 'ANTICO15',
 'ACOATW15',
 'ANTICO25',
 'ACOATW25',
 'HEART15',
 'HARTTW15',
 'HEART25',
 'HARTTW25',
 'CHOLST15',
 'CHOLTW15',
 'CHOLST25',
 'CHOLTW25',
 'BP15',
 'BPTW15',
 'BP25',
 'BPTW25',
 'DIURET15',
 'DIURTW15',
 'DIURET25',
 'DIURTW25',
 'THYROI15',
 'THYRTW15',
 'THYROI25',
 'THYRTW25',
 'INSULN15',
 'INSUTW15',
 'INSULN25',
 'INSUTW25',
 'NERVS15',
 'NERVTW15',
 'NERVS25',
 'NERVTW25',
 'STEROI15',
 'STERTW15',
 'STEROI25',
 'STERTW25',
 'FERTIL15',
 'FRTLTW15',
 'FERTIL25',
 'FRTLTW25',
 'BCP15',
 'BCPTWI15',
 'BCP25',
 'BCPTWI25',
 'BCREAS5',
 'BCRES_S5',
 'ESTROG15',
 'ESTRTW15',
 'ESTROG25',
 'ESTRTW25',
 'ESTRDA15',
 'ESTRDA25',
 'ESTRNJ15',
 'EINJTW15',
 'ESTRNJ25',
 'EINJTW25',
 'COMBIN15',
 'COMBTW15',
 'COMBIN25',
 'COMBTW25',
 'PROGES15',
 'PROGTW15',
 'PROGES25',
 'PROGTW25',
 'PROGDA15',
 'PROGDA25',

# Solution and Impact

The longitudinal data from this source may be used to cluster feature dimensions of health and clinical outcomes, including risk factors for stress, disease or disorder prognosis, immune activity, pregnancy, and more. We may be able to analyze correlated repeated measures and assess change over time. Outputs of this project may include risk analyses of conditions and clinical outcomes, categorical clustering of potential risk factors, and visualizations of change in various health measures over time.
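To assess change over time, the per-visit files would first need to be combined into one long table. The sketch below assumes that each visit file appends its visit number to most variable names (as `AGE5` and `CHOLRES5` suggest for Visit 5) and that `SWANID`, `VISIT`, and `RACE` carry no suffix. The helpers `strip_visit_suffix` and `stack_visits` are hypothetical, and the Visit 6 values are invented toy data.

```python
import re
from typing import Dict

import pandas as pd

# Columns assumed to carry no visit suffix
ID_COLS = {"SWANID", "VISIT", "RACE"}


def strip_visit_suffix(name: str, visit: int) -> str:
    """Remove a trailing visit number from a column name, e.g. 'AGE5' -> 'AGE'."""
    if name in ID_COLS:
        return name
    return re.sub(rf"{visit}$", "", name)


def stack_visits(frames: Dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-visit frames into one long frame with shared column names."""
    harmonized = []
    for visit, df in frames.items():
        renamed = df.rename(columns=lambda c: strip_visit_suffix(c, visit))
        harmonized.append(renamed.assign(VISIT=visit))
    return pd.concat(harmonized, ignore_index=True)


# Toy per-visit frames (illustrative values only)
visit5 = pd.DataFrame({"SWANID": [1, 2], "AGE5": [57.0, 56.0], "CHOLRES5": [211.0, 191.0]})
visit6 = pd.DataFrame({"SWANID": [1, 2], "AGE6": [58.0, 57.0], "CHOLRES6": [205.0, 199.0]})

long_df = stack_visits({5: visit5, 6: visit6})

# Change in one measure between each participant's first and last visit
change = (
    long_df.sort_values("VISIT")
    .groupby("SWANID")["CHOLRES"]
    .agg(lambda s: s.iloc[-1] - s.iloc[0])
)
print(change)
```

With the data in this long format, one row per participant per visit, repeated measures can be plotted or modeled against `VISIT` directly.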


The analysis may be used to identify participants' risk of developing particular outcomes or prognoses and to group participants by that risk level. Risk analysis provides a valuable means of parsing covariance and correlations among repeated measures. Moreover, identifying individuals' risk levels can aid healthcare systems in providing preventive care for many conditions. The proposed analysis may provide a comprehensive grouping of biospecimen, physical, and psychosocial features, which may paint a more holistic picture of health and wellness over time.
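As a rough illustration of how participants might be grouped, the sketch below runs k-means on a few numeric lab measures after median imputation and standardization. It is only one candidate approach: the choice of k-means, the two clusters, and the synthetic values are assumptions, and the column names are borrowed from the Visit 5 table purely as labels.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: two well-separated groups of 50 "participants" each
rng = np.random.default_rng(0)
low = rng.normal(loc=[180.0, 90.0, 1.0], scale=[5.0, 3.0, 0.2], size=(50, 3))
high = rng.normal(loc=[250.0, 110.0, 6.0], scale=[5.0, 3.0, 0.5], size=(50, 3))
features = pd.DataFrame(
    np.vstack([low, high]),
    columns=["CHOLRES5", "GLUCRES5", "CRPRESU5"],
)
features.iloc[0, 0] = np.nan  # simulate a missing lab value

# Impute missing values, put features on a common scale, then cluster
pipeline = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
labels = pipeline.fit_predict(features)

# Per-cluster feature means help interpret what each group represents
summary = features.assign(cluster=labels).groupby("cluster").mean()
print(summary)
```

Cluster labels are arbitrary integers, not risk levels. Interpreting a cluster as higher or lower risk would require comparing its feature means against clinical outcomes.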

# References

Solomon, D. H., Santacroce, L., Colvin, A., Lian, Y., Ruppert, K., & Yoshida, K. (2021). The relationship between 19‐year trends in medication use and changes in physical function among women in the mid‐life: A Study of Women's Health Across the Nation pharmacoepidemiology study. Pharmacoepidemiology and Drug Safety.
