#### Supplemental Materials 
For my systematic review on latent variable modeling approaches, here are some additional plots and data that might be of interest. If you discover any issues, please contact zoe.sandle@donders.ru.nl.

In [26]:
# read in data and activate the environment
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import statsmodels.formula.api as smf
import plotly.graph_objects as go

#setting wd to where the data is
import os
os.chdir('C:/Users/U727148/Desktop/DATA/REVIEW/')
df = pd.read_excel('updated_from_cleaned_20250423.xlsx')
df_sankey_models_transformed = pd.read_excel('dfsankeymodels_updated_20250424.xlsx')
LCA_outcomes = pd.read_excel('LCA_outcomes.xlsx')
df_sankey_cups = pd.read_excel('dfsankeycu.xlsx')
df_sankey_beh = pd.read_excel('dfsankey_behavioral_variables.xlsx')
df_sankey_cog = pd.read_excel('df_sankey_cog.xlsx')
df_reports = pd.read_excel('df_reports.xlsx')

#### Sankey Plots 

In this first plot, the first column represents the general model each study used, and the second column includes additional specifications (e.g. for factor analysis, whether it was exploratory or confirmatory, or which type of rotation was used). Follow-up analyses are shown in the third column. Hovering over a node shows its incoming and outgoing flow counts; hovering over a connection shows the first author and year. Each node can be moved around for readability or bundled using box or lasso select.

Variables have been exploded, meaning that if a study assessed more than one variable for each of the nodes, its name will show up in multiple parts of the graph. 
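
As a minimal sketch of what "exploded" means here, the toy example below splits a multi-valued cell and gives each value its own row with `DataFrame.explode`. The study names, the `"; "` separator, and the values are invented for illustration; only the column names `nameyear` and `gm` are taken from the data used in this notebook.

```python
import pandas as pd

# Toy data (hypothetical): one row per study, several models in one cell.
studies = pd.DataFrame({
    "nameyear": ["Author A 2020", "Author B 2021"],
    "gm": ["factor analysis; latent class analysis", "growth model"],
})

# Split the multi-valued cell into a list, then give each value its own row.
exploded = (
    studies.assign(gm=studies["gm"].str.split("; "))
           .explode("gm")
           .reset_index(drop=True)
)
print(exploded)
```

After exploding, "Author A 2020" appears on two rows, which is why the same study can show up in several parts of a graph.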

In [27]:
# 1. Create unique labels
labels = list(pd.concat([
    df_sankey_models_transformed['gm'], 
    df_sankey_models_transformed['mit'], 
    df_sankey_models_transformed['oa']
]).unique())

# 2. Map labels to indices
label_map = {label: idx for idx, label in enumerate(labels)}

# 3. Define sources, targets, and values
sources = df_sankey_models_transformed['gm'].map(label_map).tolist() + \
          df_sankey_models_transformed['mit'].map(label_map).tolist()

targets = df_sankey_models_transformed['mit'].map(label_map).tolist() + \
          df_sankey_models_transformed['oa'].map(label_map).tolist()

values = df_sankey_models_transformed['gm_count'].tolist() + \
         df_sankey_models_transformed['oa_count'].tolist()

# 4. Generate pastel colors
import random

def pastel_color():
    r = lambda: random.randint(100, 255)
    return f'rgba({r()},{r()},{r()},0.6)'

label_to_color = {label: pastel_color() for label in labels}
node_color_list = [label_to_color[label] for label in labels]

# 5. Set link colors by source node's color (first half) and middle node (second half)
link_colors = [label_to_color[df_sankey_models_transformed['gm'].iloc[i]] 
               for i in range(len(df_sankey_models_transformed))] + \
              [label_to_color[df_sankey_models_transformed['mit'].iloc[i]] 
               for i in range(len(df_sankey_models_transformed))]

# 7. Create the figure
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.9),
        label=labels,
        color=node_color_list,
        align="left"
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
        customdata=df_sankey_models_transformed['nameyear'].tolist() * 2,
        hovertemplate='%{customdata}<extra></extra>'
    )
)])

fig.update_layout(
    title_text="Sankey Diagram for model types",
    font_size=16.5,
    width=1500,
    height=800,
    hovermode='x'
)
fig.show()

fig.write_html("C:/Users/U727148/Latent_Variable_Supplement/sankey_models_plot.html", include_plotlyjs="cdn")

#### Variable Plots

In these plots, I am modeling the flow of usage from (1) the variable, to (2) the scale used, to (3) the type of assessment (self and other report, experimental, observational). This is done for the psychopathic traits/CU traits variable, the behavioral variable, and the cognitive variable (for the brain variables, there was not enough data available).
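
The cells in this section take their link values from precomputed count columns. As an alternative sketch, and not the approach used in this notebook, link weights between two adjacent columns could be derived by counting rows per (source, target) pair. The helper `count_links` and the toy rows are hypothetical; the column names follow `df_sankey_cups`.

```python
import pandas as pd

# Toy stand-in for df_sankey_cups (values are invented for illustration).
toy = pd.DataFrame({
    "p_or_cu": ["CU traits", "CU traits", "psychopathic traits"],
    "scale": ["ICU", "ICU", "PCL-R"],
    "reporting_type": ["self report", "other report", "interview"],
})

def count_links(df, source_col, target_col):
    """Count how many rows connect each source value to each target value."""
    return (
        df.groupby([source_col, target_col])
          .size()
          .reset_index(name="value")
          .rename(columns={source_col: "source", target_col: "target"})
    )

# Stack the links for variable -> scale and scale -> reporting type.
links = pd.concat(
    [count_links(toy, "p_or_cu", "scale"),
     count_links(toy, "scale", "reporting_type")],
    ignore_index=True,
)
print(links)
```

The trade-off is that aggregated links no longer carry one hover label per study, so the per-row approach in the cells here is the better fit when the first author and year should stay visible.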


In [28]:
# 1. Create unique labels
labels = list(pd.concat([
    df_sankey_cups['p_or_cu'], 
    df_sankey_cups['scale'], 
    df_sankey_cups['reporting_type']
]).unique())

# 2. Map labels to indices
label_map = {label: idx for idx, label in enumerate(labels)}

# 3. Define sources, targets, and values
sources = df_sankey_cups['p_or_cu'].map(label_map).tolist() + \
          df_sankey_cups['scale'].map(label_map).tolist()

targets = df_sankey_cups['scale'].map(label_map).tolist() + \
          df_sankey_cups['reporting_type'].map(label_map).tolist()

values = df_sankey_cups['pcu_count'].tolist() + \
         df_sankey_cups['reporttype_count'].tolist()

# 4. Generate pastel colors
import random

def pastel_color():
    r = lambda: random.randint(100, 255)
    return f'rgba({r()},{r()},{r()},0.6)'

label_to_color = {label: pastel_color() for label in labels}
node_color_list = [label_to_color[label] for label in labels]

# 5. Set link colors by source node's color (first half) and middle node (second half)
link_colors = [label_to_color[df_sankey_cups['p_or_cu'].iloc[i]] 
               for i in range(len(df_sankey_cups))] + \
              [label_to_color[df_sankey_cups['scale'].iloc[i]] 
               for i in range(len(df_sankey_cups))]

# 7. Create the figure
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.9),
        label=labels,
        color=node_color_list,
        align="left"
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
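        # Hover text must come from the same dataframe as the links
        # (assumes df_sankey_cups has a 'nameyear' column like the other frames).
        customdata=df_sankey_cups['nameyear'].tolist() * 2,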
        hovertemplate='%{customdata}<extra></extra>'
    )
)])

fig.update_layout(
    title_text="Sankey Diagram for psychopathic/CU traits, scales, and assessment types",
    font_size=16.5,
    width=1500,
    height=800,
    hovermode='x'
)
fig.show()

fig.write_html("C:/Users/U727148/Latent_Variable_Supplement/sankey_pcu_plot.html", include_plotlyjs="cdn")

In [29]:
#adding count vars to the behavioral variables based on how many times they appear in the data
#we add a new column to the df_sankey_beh dataframe that counts the number of times each behavioral variable appears in the data
df_sankey_beh['b_count'] = df_sankey_beh['behavioral_variable'].map(df_sankey_beh['behavioral_variable'].value_counts())
df_sankey_beh['ss_count'] = df_sankey_beh['scale_summarized'].map(df_sankey_beh['scale_summarized'].value_counts())
df_sankey_beh['a_count'] = df_sankey_beh['behavioral_assessment'].map(df_sankey_beh['behavioral_assessment'].value_counts())
df_sankey_beh['at_count'] = df_sankey_beh['behavioral_assessment_type'].map(df_sankey_beh['behavioral_assessment_type'].value_counts())



#### Behavioral Variables

In this plot, we see the behavioral variables, the scale they were investigated with, and reporting types. Because of the diversity of scales, I also added a node where I summarize the type of scale for easier viewing between the scale and the reporting type. 
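
Because this plot has four node columns instead of three, the label and index bookkeeping grows accordingly. The sketch below shows one way that bookkeeping could be generalized to any number of columns. The function `sankey_inputs` is a hypothetical helper that is not part of this notebook, and the toy values are invented.

```python
import pandas as pd

def sankey_inputs(df, columns):
    """Return (labels, sources, targets) for consecutive column pairs."""
    # Unique labels across all node columns, in order of first appearance.
    labels = list(pd.concat([df[c] for c in columns]).unique())
    label_map = {label: idx for idx, label in enumerate(labels)}
    sources, targets = [], []
    # One block of links per adjacent pair: col0->col1, col1->col2, ...
    for left, right in zip(columns[:-1], columns[1:]):
        sources += df[left].map(label_map).tolist()
        targets += df[right].map(label_map).tolist()
    return labels, sources, targets

# Toy data with invented values; column names follow df_sankey_beh.
toy = pd.DataFrame({
    "behavioral_variable": ["aggression", "rule breaking"],
    "behavioral_assessment": ["scale A", "scale B"],
    "scale_summarized": ["questionnaire", "questionnaire"],
    "behavioral_assessment_type": ["self report", "other report"],
})
labels, sources, targets = sankey_inputs(toy, list(toy.columns))
print(labels, sources, targets)
```

As in the cells of this notebook, a label that occurs in more than one column maps to a single node, so labels need to be distinct across columns if the nodes should stay separate.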

In [30]:
#Sankey plot for behavioral variables
# 1. Create unique labels
labels = list(pd.concat([
    df_sankey_beh['behavioral_variable'], 
    df_sankey_beh['behavioral_assessment'], 
    df_sankey_beh['scale_summarized'],
    df_sankey_beh['behavioral_assessment_type'],
]).unique())

# 2. Map labels to indices
label_map = {label: idx for idx, label in enumerate(labels)}

# 3. Define sources, targets, and values
sources = df_sankey_beh['behavioral_variable'].map(label_map).tolist() + \
          df_sankey_beh['behavioral_assessment'].map(label_map).tolist() + \
          df_sankey_beh['scale_summarized'].map(label_map).tolist() 

targets = df_sankey_beh['behavioral_assessment'].map(label_map).tolist() + \
          df_sankey_beh['scale_summarized'].map(label_map).tolist() + \
          df_sankey_beh['behavioral_assessment_type'].map(label_map).tolist()

values = df_sankey_beh['b_count'].tolist() + \
         df_sankey_beh['ss_count'].tolist() + \
         df_sankey_beh['at_count'].tolist()

# 4. Generate pastel colors
import random

def pastel_color():
    r = lambda: random.randint(100, 255)
    return f'rgba({r()},{r()},{r()},0.6)'

label_to_color = {label: pastel_color() for label in labels}
node_color_list = [label_to_color[label] for label in labels]

# 5. Set link colors by source node's color (first) and middle nodes (second) and last node
link_colors = [label_to_color[df_sankey_beh['behavioral_variable'].iloc[i]] 
               for i in range(len(df_sankey_beh))] + \
              [label_to_color[df_sankey_beh['scale_summarized'].iloc[i]] 
               for i in range(len(df_sankey_beh))] + \
              [label_to_color[df_sankey_beh['behavioral_assessment_type'].iloc[i]] 
               for i in range(len(df_sankey_beh))]

# 7. Create the figure
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.9),
        label=labels,
        color=node_color_list,
        align="left"
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
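        # Three link segments, so the hover text is repeated three times
        # (assumes df_sankey_beh has a 'nameyear' column like the other frames).
        customdata=df_sankey_beh['nameyear'].tolist() * 3,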
        hovertemplate='%{customdata}<extra></extra>'
    )
)])

fig.update_layout(
    title_text="Sankey Diagram for variables, scales, and assessment types",
    font_size=16.5,
    width=1700,
    height=1200,
    hovermode='x'
)
fig.show()

fig.write_html("C:/Users/U727148/Latent_Variable_Supplement/sankey_behavior_plot.html", include_plotlyjs="cdn")

#### Cognitive variables

This is an overview plot for the studies that used cognitive variables (only 40.5% of studies did). The variable is on the left, the name of the scale or test used is in the middle, and the type of report is on the right.

In [31]:
#Making the sankey plot for cognitive variables
# 1. Create unique labels
labels = list(pd.concat([
    df_sankey_cog['cv'], 
    df_sankey_cog['ca'], 
    df_sankey_cog['cat']
]).unique())

# 2. Map labels to indices
label_map = {label: idx for idx, label in enumerate(labels)}

# 3. Define sources, targets, and values
sources = df_sankey_cog['cv'].map(label_map).tolist() + \
          df_sankey_cog['ca'].map(label_map).tolist()

targets = df_sankey_cog['ca'].map(label_map).tolist() + \
            df_sankey_cog['cat'].map(label_map).tolist()

values = df_sankey_cog['c_count'].tolist() + \
            df_sankey_cog['cat_count'].tolist()

# 4. Generate pastel colors
import random

def pastel_color():
    r = lambda: random.randint(100, 255)
    return f'rgba({r()},{r()},{r()},0.6)'

label_to_color = {label: pastel_color() for label in labels}
node_color_list = [label_to_color[label] for label in labels]

# 5. Set link colors by source node's color (first half) and middle node (second half)
link_colors = [label_to_color[df_sankey_cog['cv'].iloc[i]] 
               for i in range(len(df_sankey_cog))] + \
              [label_to_color[df_sankey_cog['ca'].iloc[i]] 
               for i in range(len(df_sankey_cog))]

# 7. Create the figure
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.9),
        label=labels,
        color=node_color_list,
        align="left"
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
        customdata=df_sankey_cog['nameyear'].tolist() * 2,
        hovertemplate='%{customdata}<extra></extra>'
    )
)])

fig.update_layout(
    title_text="Sankey Diagram for cognitive variables, assessments, and assessment types",
    font_size=16.5,
    width=1700,
    height=1200,
    hovermode='x'
)
fig.show()

fig.write_html("C:/Users/U727148/Latent_Variable_Supplement/sankey_cog_plot.html", include_plotlyjs="cdn")

#### Reporting types plot

In this plot, we can see how the reporting types feed into each other across all the variables (answering the question of whether studies that use only self report for one variable are more likely to use self report for the other variables as well). In contrast to the other dataframes, the reporting types are not exploded, meaning that studies using multiple reporting types are not fed into multiple measures. The left column shows the P/CU trait reporting types, the middle column the behavioral ones, and the right column the cognitive ones. For readability, some variable names have been abbreviated (in the behavioral column: self report = s, other report = o; in the cognitive column: self report = self, other report = other).
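
The same question can also be put in numbers with a row-normalized cross-tabulation. The sketch below uses invented rows as a stand-in for `df_reports`; only the column names and the `s`/`o` abbreviations come from this notebook.

```python
import pandas as pd

# Toy stand-in for df_reports (invented rows, abbreviations as described above).
toy = pd.DataFrame({
    "assessment_type": ["self report", "self report", "self report", "other report"],
    "behavioral_assessment_type": ["s", "s", "o", "o"],
})

# Row-normalized cross-tabulation: each row sums to 1.
shares = pd.crosstab(
    toy["assessment_type"],
    toy["behavioral_assessment_type"],
    normalize="index",
)
print(shares)
```

Each row then reads as "of the studies with this P/CU reporting type, what share used each behavioral reporting type".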

In [34]:
# 1. Create unique labels
labels = list(pd.concat([
    df_reports['assessment_type'], 
    df_reports['cognitive_assessment_type'], 
    df_reports['behavioral_assessment_type'],
]).unique())

# 2. Map labels to indices
label_map = {label: idx for idx, label in enumerate(labels)}

# 3. Define sources, targets, and values
sources = df_reports['assessment_type'].map(label_map).tolist() + \
          df_reports['behavioral_assessment_type'].map(label_map).tolist()

targets = df_reports['behavioral_assessment_type'].map(label_map).tolist() + \
            df_reports['cognitive_assessment_type'].map(label_map).tolist()

values = df_reports['assessment_type_count'].tolist() + \
            df_reports['cognitive_assessment_type_count'].tolist()

# 4. Generate pastel colors
import random

def pastel_color():
    r = lambda: random.randint(100, 255)
    return f'rgba({r()},{r()},{r()},0.6)'

label_to_color = {label: pastel_color() for label in labels}
node_color_list = [label_to_color[label] for label in labels]

# 5. Set link colors by source node's color (first half) and middle node (second half)
link_colors = [label_to_color[df_reports['assessment_type'].iloc[i]] 
               for i in range(len(df_reports))] + \
              [label_to_color[df_reports['behavioral_assessment_type'].iloc[i]] 
               for i in range(len(df_reports))]

# 7. Create the figure
fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=15,
        thickness=20,
        line=dict(color="black", width=0.9),
        label=labels,
        color=node_color_list,
        align="left"
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
        color=link_colors,
        customdata=df_reports['nameyear'].tolist() * 2,
        hovertemplate='%{customdata}<extra></extra>'
    )
)])

fig.update_layout(
    title_text="Reporting types for P/CU traits, behavioral assessments, and cognitive assessments",
    font_size=16.5,
    width=1700,
    height=1200,
    hovermode='x'
)
fig.show()

fig.write_html("C:/Users/U727148/Latent_Variable_Supplement/sankey_reports_plot.html", include_plotlyjs="cdn")

In [42]:
df_reports_corr = df_reports[['assessment_type_count', 'cognitive_assessment_type_count', 'behavioral_assessment_type_count']].copy()
corr = df_reports_corr.corr()
corr

| | assessment_type_count | cognitive_assessment_type_count | behavioral_assessment_type_count |
|---|---|---|---|
| assessment_type_count | 1.0 | 0.107188 | 0.27632 |
| cognitive_assessment_type_count | 0.107188 | 1.0 | 0.372502 |
| behavioral_assessment_type_count | 0.27632 | 0.372502 | 1.0 |


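So we also see that the number of assessment types used for the behavioral variable correlates weakly with the number used for the P/CU variable (r ≈ 0.28) and moderately with the number used for the cognitive variable (r ≈ 0.37). The P/CU and cognitive counts are nearly uncorrelated (r ≈ 0.11).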
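
Since these are count variables, a rank-based correlation may be a useful robustness check alongside the Pearson coefficients above. This is a sketch on invented numbers, not a result from the review data. It computes Spearman's rho as the Pearson correlation of the ranks.

```python
import pandas as pd

# Toy counts (invented); column names follow df_reports.
toy = pd.DataFrame({
    "assessment_type_count": [1, 2, 3, 4, 5],
    "behavioral_assessment_type_count": [1, 4, 9, 16, 25],
})

pearson = toy.corr()           # linear association
spearman = toy.rank().corr()   # Pearson on ranks, i.e. Spearman's rho

print(pearson)
print(spearman)
```

In this toy case the relationship is monotone but not linear, so the rank-based coefficient is 1 while the Pearson coefficient stays below 1.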

#### Outcomes of Latent Class Analysis Overview

Here is a simple table with an overview of Latent Class Analysis/Latent Profile Analysis solutions. It lists the study identifiers (name of the first author and year), followed by which variables made up the predictor(s) and how they were measured, the outcomes, the names and distribution of the classes, and some additional explanations.

In [33]:
LCA_outcomes

| | nameyear | region | age_mean | sample size | gender distr. (% f) | number_of_predictors | institutional_sample | psychopathic or CU trait | predictor (Scale) | Unnamed: 9 | predictor (additional variables) | outcome variables | model | solution (no. of groups and distribution) | group names | comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ciesenski 2024 | north america | 35.24 | 504 | 55.2 | 7 | 0 | psychopathic traits | Psychopathy Checklist: Revised (PCL-R) | self report | anger; hostility; emotion liability; emotion i... | psychiatric comorbidity, aggression, self-harm... | latent class analysis | 4 | 36.11% low emotion dysregulation and impulsivi... | mixture salsa/qualitative. CU-traits formed it... |
| 1 | Neumann 2024 | north america | 16.7 | 409 | 19.4 | 1 | 1 | CU traits | Proposed Specifiers for Conduct Disorder Scale... | other report | | Externalizing Behavior, Criminality, Behaviora... | latent profile analysis | 4 | 34% general group; 31% externalizing group; 24... | Facets of the PSCD are: grandiose-manipulative... |
| 2 | Roy 2023 | north america | 29.9 | 2570 | 0.0 | 1 | 1 | psychopathic traits | Psychopathy Checklist: Revised (PCL-R) | self report | | Behavioral Inhibition/Activation, Criminal His... | latent profile analysis | 4 | 40% externalizing, 36% psychopathic, 18% gener... | In a forensic sample, these four different gro... |
| 3 | Brislin 2024 | north america | 9.92 | 11552 | 45.3 | 2 | 0 | CU traits | ABCD cu trait scale (SDQ/CBCL) | other report | impulsivity (BIS/BAS, UPPS) | Externalizing Behavior | latent profile analysis | 3 | 35.89% low impulsivity but high CU traits, 52.... | groups differ significantly on externalizing b... |
| 4 | Shields 2024 | north america | 9.81 | 342 | 53.22 | 1 | 0 | CU traits | Inventory of Callous-Unemotional Traits (ICU),... | other report | | aggression, CU-traits | latent profile analysis | 0 | no solution | results suggest that reactive aggression is un... |
| 5 | Hare 2022 | south america | 51.227 | 411 | 0.0 | 1 | 1 | psychopathic traits | Psychopathy Checklist: Revised (PCL-R) | interview | | Criminal Record | latent profile analysis | 4 | 21% Prototypic Psychopathy; 26.2% Callous-Conn... | Study compared community, "normal" offenders, ... |
| 6 | Colins 2022 | europe | 16.2 | 302 | 100.0 | 1 | 1 | psychopathic traits | Antisocial Process Screening Device (APSD) | self report | | Criminal Record | latent profile analysis | 2 | 74.2% low scoring, 25.8% high scoring | Study examined likelihood of future criminalit... |
| 7 | Gong 2022 | asia | 45.43 | 279 | 100.0 | 1 | 1 | psychopathic traits | Levenson Self Report Psychopathy Rating scale ... | self report | | Aggression | latent profile analysis | 3 | 66.7% medium psychopathy, 27.6% low psychopath... | None of the groups were high on calloussness i... |
| 8 | Voulgaridou 2022 | europe | 14.04 | 2207 | 52.8 | 1 | 0 | (CU traits) | Relational Aggression Scale | self report | | CU-traits; hostile attribution bias | latent transition analysis | 3 | 67.5% low; 24.7% medium; 7.8% high | Salsa. The three profiles differed (in expecte... |
| 9 | Willoughby 2022 | north america | 7.2 | 138 | 32.0 | 1 | 0 | (CU traits) | ADHD and ODD behavior | other report | | academic productivity, impairment, and social ... | latent profile analysis | 3 | 17% high ADHD; 49% low ADHD and ODD; 34% high ... | significant differences between groups on acad... |


As a tentative interpretation, it seems to me that three-group solutions mostly result in spectrum-driven solutions (the 'salsa' effect), meaning that there is a low, a middle, and a high group that differ little or not at all in a qualitative way. Four- and five-class solutions mostly result in qualitatively different profiles, usually with one large reference group that is low on CU/psychopathic traits, one group that is consistently high, and two groups that are higher on only affective or only behavioral traits. Where there is variation, it occurs more on the behavioral traits. Outcomes often also differ quantitatively and qualitatively between these groups.
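
The distinction between a spectrum-driven and a qualitatively different solution can be made concrete with a small heuristic. This is an illustrative sketch only, not a method taken from the reviewed studies. It removes each class's overall level from its indicator means and measures how much the remaining profile shapes vary across classes. The function `shape_spread` and all numbers are hypothetical.

```python
import numpy as np

def shape_spread(class_means):
    """Spread of profile shapes after removing each class's overall level."""
    means = np.asarray(class_means, dtype=float)
    # Subtract each class's own mean so only the profile shape remains.
    shapes = means - means.mean(axis=1, keepdims=True)
    # Standard deviation across classes, averaged over indicators.
    return shapes.std(axis=0).mean()

# Invented class-by-indicator means (columns: affective, behavioral).
spectrum = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]                  # low / medium / high
qualitative = [[1.0, 1.0], [3.0, 1.0], [1.0, 3.0], [3.0, 3.0]]   # mixed profiles

print(shape_spread(spectrum))
print(shape_spread(qualitative))
```

A value near zero means the classes differ only in level (parallel profiles), while larger values indicate that some classes are elevated on different indicators than others.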

Another conclusion is that while the different facets of psychopathy/CU traits exist on a spectrum, and psychopathic traits themselves are therefore a spectrum as well, they don't always combine in a neatly linear way. While the consistently high group often differs most from the consistently low group on aggression, groups that are high on only behavioral/CU traits differ from the other groups in quantitative *and* qualitative ways (see e.g. involvement in different criminal activities or different cognitive outcomes). There was only one study in our dataset that predicted offense types and CU traits from cognitive variables (Piehler, 2019). The authors of this study conclude that both cognitively inflexible and emotionally dysregulated youth tend to exhibit more conduct problems and externalizing behaviors, but differ in both *severity* and, crucially, in *type*. They suggest that there might be different developmental pathways leading to disruptive behavior disorders, which may also result in different behavioral subtypes.


We draw some tentative recommendations:

1. When performing LCA/LPA with psychopathic/CU traits as a sole predictor, one should aim for a class solution of >= 4. This gives a more qualitatively meaningful solution. If a solution of 2 or 3 is strongly preferred, then a linear regression might be indicated instead to actually cover the entire spectrum of the scale. 
2. When performing LCA/LPA with multiple predictors, this recommendation can be relaxed slightly, though the statistical indicators may favor a group solution with 4+ groups anyway. 
3. There is currently an underutilization of cognitive variables as predictors.

Considering the wider project and studies utilizing other modeling approaches such as factor analysis and growth analysis (see poster):

4. Generally, person-centered approaches such as LCA/LPA find more (meaningful) group solutions than variable-centered ones (growth models find even more than LPA). This is a good indicator that these approaches may be more adept at capturing the very real heterogeneity in how these variables tend to present (in patients and in the community) and in their outcomes. However, all of these approaches assume linearity. It is quite likely, though (and the data presented here support this idea as well), that the associations are not linear. We should try to find modeling approaches that do not rely on linearity assumptions.