Supplemental Figures for _Uncovering mental structure through data-driven ontology discovery_

Preprint available: https://psyarxiv.com/fvqej/

In [None]:
import pandas as pd
pd.set_option('display.max_rows', 200)  # full option name; the bare 'max_rows' pattern can be ambiguous in newer pandas

from dimensional_structure import DA_plots
from dimensional_structure.EFA_plots import plot_heatmap_factors, plot_factor_correlation
from dimensional_structure.HCA_plots import plot_subbranches, plot_results_dendrogram
from selfregulation.utils.utils import get_recent_dataset
from selfregulation.utils.result_utils import load_results

In [None]:
%matplotlib inline
dataset = get_recent_dataset()
results = load_results(datafile=dataset)

# Exploratory Factor Analysis Results

Below are the loading matrices for the exploratory factor analysis (EFA) solutions for surveys, tasks, and the outcome measures. These matrices are depicted as heatmaps, as well as dataframes with the actual values.

### Survey Exploratory Factor Analysis Loadings

The number of factors (12) was selected using a BIC criterion for exploratory factor analysis. The 66 survey DVs are grouped and ordered by the factor on which each DV has its largest absolute loading. Dotted lines separate the groups derived from this criterion and are used for visualization purposes only.
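The grouping rule described above can be sketched on a toy example. This is a minimal illustration only, not the code used by `plot_heatmap_factors`: the loading values, the DV names, and the helper `order_by_dominant_factor` are all hypothetical, and ordering by descending loading within each group is an assumption.

```python
import pandas as pd

# Hypothetical toy loading matrix: rows are DVs, columns are factors.
loadings = pd.DataFrame(
    [[0.70, 0.10, -0.05],
     [0.20, -0.60, 0.10],
     [-0.80, 0.05, 0.00],
     [0.10, 0.30, 0.55],
     [0.05, 0.45, -0.20]],
    index=["dv_a", "dv_b", "dv_c", "dv_d", "dv_e"],
    columns=["F1", "F2", "F3"],
)


def order_by_dominant_factor(loadings):
    """Group DVs by the factor with the largest absolute loading.

    Hypothetical helper: within each group, DVs are sorted by the
    strength of that loading (descending), which is an assumption.
    """
    abs_loadings = loadings.abs()
    dominant = abs_loadings.idxmax(axis=1)   # factor label per DV
    strength = abs_loadings.max(axis=1)      # size of that loading
    # Rank factors by their column position so groups follow factor order.
    factor_rank = dominant.map({c: i for i, c in enumerate(loadings.columns)})
    order = (
        pd.DataFrame({"rank": factor_rank, "strength": strength})
        .sort_values(["rank", "strength"], ascending=[True, False])
        .index
    )
    return loadings.loc[order], dominant.loc[order]


ordered, groups = order_by_dominant_factor(loadings)
print(ordered)
print(groups)
```

Rows of `ordered` that share a value in `groups` correspond to one block between dotted lines in the heatmap. The sign of the loading is ignored when assigning a DV to a group.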

In [None]:
survey_results = results['survey']
survey_factor_loading = survey_results.EFA.get_loading()

In [None]:
survey_c = survey_results.EFA.results['num_factors']
plot_heatmap_factors(survey_results, survey_c, thresh=0, size=12)

The full loading matrix, as a dataframe:

In [None]:
survey_factor_loading

### Task Exploratory Factor Analysis Loadings

The number of factors (5) was selected using a BIC criterion for exploratory factor analysis. The 130 task DVs are grouped and ordered by the factor on which each DV has its largest absolute loading. Dotted lines separate the groups derived from this criterion and are used for visualization purposes only.

In [None]:
task_results = results['task']
task_factor_loading = task_results.EFA.get_loading()

In [None]:
task_c = task_results.EFA.results['num_factors']
plot_heatmap_factors(task_results, task_c, thresh=0, size=13)

In [None]:
task_factor_loading

### Outcome Exploratory Factor Analysis Loadings

The number of factors (9) was selected using a BIC criterion for exploratory factor analysis. The 55 target measures are grouped and ordered by the factor on which each target measure has its largest absolute loading. Dotted lines separate the groups derived from this criterion and are used for visualization purposes only.

In [None]:
outcome_factor_loading = task_results.DA.get_loading()

In [None]:
outcome_c = task_results.DA.results['num_factors']
DA_plots.plot_heatmap_factors(task_results, outcome_c, thresh=0, size=8, DA=True)

In [None]:
outcome_factor_loading

## Factor Robustness Analyses

# Hierarchical Clustering

Hierarchical clustering was used to order dependent variables based on the similarity of their loading vectors. This produced a dendrogram, which was cut into clusters using the DynamicTreeCut algorithm. Each cluster is plotted separately below so that its constituent DVs can be read.
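The general idea can be sketched with SciPy on a toy loading matrix. This is an illustration under stated assumptions, not the pipeline behind `plot_results_dendrogram`: the loading values are made up, the correlation distance and average linkage are assumed choices, and `fcluster` with a fixed cluster count is a simple stand-in for DynamicTreeCut, which chooses cut points adaptively.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import pdist

# Hypothetical loading vectors: two groups of DVs with distinct profiles.
loadings = pd.DataFrame(
    [[0.80, 0.10, 0.00, -0.10],
     [0.75, 0.15, 0.05, -0.05],
     [0.85, 0.05, -0.05, -0.10],
     [0.00, -0.10, 0.70, 0.20],
     [0.05, -0.05, 0.65, 0.25],
     [-0.05, -0.10, 0.75, 0.15]],
    index=["dv_1", "dv_2", "dv_3", "dv_4", "dv_5", "dv_6"],
    columns=["F1", "F2", "F3", "F4"],
)

# Pairwise distances between loading vectors (metric is an assumption).
distances = pdist(loadings.values, metric="correlation")

# Agglomerative clustering; the linkage matrix encodes the dendrogram.
tree = linkage(distances, method="average")

# Left-to-right leaf order of the dendrogram, used to order the DVs.
leaf_order = leaves_list(tree)

# Stand-in for DynamicTreeCut: cut the tree into a fixed number of clusters.
labels = fcluster(tree, t=2, criterion="maxclust")
clusters = pd.Series(labels, index=loadings.index, name="cluster")

print(loadings.index[leaf_order].tolist())
print(clusters)
```

DVs with similar loading profiles end up adjacent in the leaf order and share a cluster label, which is what the per-cluster plots below display for the real data.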

### Survey Clusters

Below is the survey dendrogram (reproduced from the main manuscript), followed by the 13 clusters plotted separately. The third and fourth clusters, referenced in the main text, together reflect canonical components of "self-control".

In [None]:
_ = plot_results_dendrogram(survey_results, size=20, drop_list=[1, 3, 5, 7, 9, 11])

In [None]:
plot_subbranches(survey_results, size=6)

### Task Clusters

Below is the task dendrogram (reproduced from the main manuscript), followed by the 13 clusters plotted separately. The 8th and 9th clusters, referenced in the main text, divide two groups of "information processing" tasks.

In [None]:
_ = plot_results_dendrogram(task_results, size=20, drop_list=[1,3,5,7,9,11,13,15], double_drop_list=[2,6,10,14])

In [None]:
plot_subbranches(task_results, size=6)

## Cluster Robustness Analyses