# Numerical Anatomical Variability Ratio analyses

## Cortical area

## Summary

This notebook provides an analysis of Numerical Anatomical Variability Ratio coefficient variability for cortical area measurements in neuroimaging data.
It calculates the Numerical-Anatomical Variability Ratio (NAVR) to understand the relationship between processing variability and biological variability.

### Numerical-Anatomical Variability Ratio (NAVR)

The NAVR quantifies the relationship between numerical processing variability and true anatomical variability:

$$\nu_{\text{nav}} = \frac{\sigma_{\text{num}}}{\sigma_{\text{anat}}}$$

where:
- $\sigma_{\text{num}}$ = standard deviation due to numerical variability
- $\sigma_{\text{anat}}$ = standard deviation due to anatomical differences

In [40]:
import pandas as pd
import numpy as np
from pathlib import Path
import os
from IPython.display import display
import warnings

# Configure warnings and display
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

# Analysis configuration
METRIC = "area"
EXPECTED_REPETITIONS = 26
RANDOM_SEED = 42

anonymizer = True
root_dir = Path.cwd()


def anondir(path: Path, prefix=root_dir) -> Path:
    """Anonymize a directory path by replacing user-specific parts with <root>."""
    if not anonymizer:
        return path
    path_str = str(path).replace(str(prefix), "<living-park>")
    return Path(path_str)


print(f"Root directory: {anondir(root_dir)}")
stats_dir = root_dir / "stats_QCed" / "sampled"
if not stats_dir.exists():
    raise FileNotFoundError(f"Stats directory not found: {stats_dir}")
print(f"Stats directory: {anondir(stats_dir)}")

# Create output directories
output_dirs = {
    "main": root_dir / "navr",
    "parquet": root_dir / "navr" / "parquet_all",
    "figures": root_dir / "navr" / "figures_all",
    "csv": root_dir / "navr" / "csv_all",
}

for name, dir_path in output_dirs.items():
    os.makedirs(dir_path, exist_ok=True)

# Extract directory variables for backwards compatibility
output_dir = output_dirs["main"]
parquet_dir = output_dirs["parquet"]
figures_dir = output_dirs["figures"]
csv_dir = output_dirs["csv"]

print("Output directories created successfully")


# Initialize random number generator
rng = np.random.default_rng(RANDOM_SEED)
print(f"Analysis initialized for {METRIC} with random seed {RANDOM_SEED}")

Root directory: <living-park>
Stats directory: <living-park>/stats_QCed/sampled
Output directories created successfully
Analysis initialized for area with random seed 42


In [41]:
regions_template = [
    "bankssts",
    "caudalanteriorcingulate",
    "caudalmiddlefrontal",
    "cuneus",
    "entorhinal",
    "fusiform",
    "inferiorparietal",
    "inferiortemporal",
    "isthmuscingulate",
    "lateraloccipital",
    "lateralorbitofrontal",
    "lingual",
    "medialorbitofrontal",
    "middletemporal",
    "parahippocampal",
    "paracentral",
    "parsopercularis",
    "parsorbitalis",
    "parstriangularis",
    "pericalcarine",
    "postcentral",
    "posteriorcingulate",
    "precentral",
    "precuneus",
    "rostralanteriorcingulate",
    "rostralmiddlefrontal",
    "superiorfrontal",
    "superiorparietal",
    "superiortemporal",
    "supramarginal",
    "frontalpole",
    "temporalpole",
    "transversetemporal",
    "insula",
]

In [42]:
def save_figure(fig, filename, formats=None):
    """
    Save and display a plotly figure in multiple formats.

    Parameters
    ----------
    fig : plotly.graph_objects.Figure
        The plotly figure to save and display
    filename : str or Path
        The base filename (without extension) to save the figure to
    formats : list of str, optional
        List of formats to save ('html', 'pdf', 'png', 'svg')
        If None, only displays the figure

    Notes
    -----
    This function displays the figure and saves it in specified formats.
    For paper-quality figures, recommended formats are ['pdf', 'svg', 'png'].
    """
    # Always display the figure
    display(fig)

    # Save in specified formats
    if formats:
        filename = Path(filename)

        # Ensure parent directory exists
        filename.parent.mkdir(parents=True, exist_ok=True)

        for fmt in formats:
            try:
                if fmt == "html":
                    fig.write_html(f"{filename}.html")
                    print(f"Saved: {filename}.html")
                elif fmt == "pdf":
                    fig.write_image(
                        f"{filename}.pdf", format="pdf", width=1400, height=600, scale=2
                    )
                    print(f"Saved: {filename}.pdf")
                elif fmt == "png":
                    fig.write_image(
                        f"{filename}.png", format="png", width=1400, height=600, scale=2
                    )
                    print(f"Saved: {filename}.png")
                elif fmt == "svg":
                    fig.write_image(
                        f"{filename}.svg", format="svg", width=1400, height=600
                    )
                    print(f"Saved: {filename}.svg")
                else:
                    print(f"Warning: Unsupported format '{fmt}' skipped")
            except Exception as e:
                print(f"Error saving {fmt} format: {e}")


def create_figure_metadata(fig, title, description, analysis_params):
    """
    Create metadata for a figure to ensure reproducibility.

    Parameters
    ----------
    fig : plotly.graph_objects.Figure
        The figure object
    title : str
        Figure title
    description : str
        Detailed description of what the figure shows
    analysis_params : dict
        Parameters used in the analysis

    Returns
    -------
    dict
        Metadata dictionary
    """
    return {
        "title": title,
        "description": description,
        "analysis_parameters": analysis_params,
        "figure_type": type(fig).__name__,
        "creation_timestamp": pd.Timestamp.now().isoformat(),
        "data_shape": getattr(fig, "data", [{}])[0].get("x", []),
        "layout_template": fig.layout.template.layout if fig.layout.template else None,
    }

In [43]:
def assert_number_of_repetitions(df, hemisphere=None, repetitions=26):
    """
    Validate that each subject/region has exactly the expected number of repetitions.

    Parameters
    ----------
    df : pandas.DataFrame
        DataFrame containing subjects, regions, and repetitions
    hemisphere : bool, optional
        Whether to include hemisphere in the grouping (default: None)
    repetitions : int, optional
        Expected number of repetitions per subject/region (default: 26)

    Raises
    ------
    AssertionError
        If any subject/region does not have the expected number of repetitions

    Notes
    -----
    This validation ensures data completeness before statistical analysis.
    The default of 26 repetitions corresponds to MCA (Monte Carlo Arithmetic) runs.
    """
    groups = ["subject_visit", "region"] + (["hemisphere"] if hemisphere else [])
    grouped = df.groupby(groups).count() == repetitions
    assert (
        grouped.all().all()
    ), f"Not all subject_visits/regions have {repetitions} repetitions. Missing data found."

In [44]:
def get_cohort_stat():
    """
    Load and process cohort statistics from CSV file.

    Returns
    -------
    pandas.DataFrame
        Processed cohort data with the following columns:
        - PATNO: Patient number
        - first_visit: Subject-session identifier for first visit
        - second_visit: Subject-session identifier for second visit
        - dx_group: Diagnosis group (PD-non-MCI or HC)
        - SEX: Subject sex
        - AGE_AT_VISIT: Age at visit

    Notes
    -----
    This function:
    1. Loads cohort data from 'cohort_stat.csv'
    2. Creates BIDS-compliant visit identifiers
    3. Removes same-visit pairs
    4. Provides summary statistics of the loaded cohort
    """
    df_cohort = pd.read_csv(root_dir / "cohort" / "longitudinal_cohort_qced.csv")

    columns = [
        "PATNO",
        "first_visit",
        "second_visit",
        "dx_group",
        "SEX",
        "AGE_AT_VISIT",
    ]

    # Remove same-visit pairs (quality control)
    df_cohort = df_cohort[df_cohort["first_visit"] != df_cohort["second_visit"]]

    # Report cohort statistics
    print(
        f"Loaded cohort with {df_cohort.shape[0]} visit pairs from {df_cohort['PATNO'].nunique()} subjects"
    )
    print(
        f"PD-non-MCI: {df_cohort[df_cohort['dx_group'] == 'PD-non-MCI'].shape[0]}, "
        f"HC: {df_cohort[df_cohort['dx_group'] == 'HC'].shape[0]}"
    )

    return df_cohort[columns]


df_cohort = get_cohort_stat()

Loaded cohort with 201 visit pairs from 201 subjects
PD-non-MCI: 112, HC: 89


In [45]:
df: pd.DataFrame = pd.read_parquet(stats_dir / f"{METRIC}.parquet")

In [46]:
def get_population(df, cohort, visit="both", groups=("PD-non-MCI", "HC")):
    """
    Return population

    Parameters
    ----------
    cohort : pandas.DataFrame
        DataFrame containing cohort statistics
    df : pandas.DataFrame
        DataFrame containing subject data
    visit : str, optional
        Which visit to use ("first", "second" or "both", default: "both")
    groups : tuple of str, optional
        Tuple of two group names to split the population into (default: ("PD-non-MCI", "HC"))

    Returns
    -------
    list of numpy.ndarray
        List of arrays, each containing subject identifiers for one group
    """
    if visit not in ["first", "second", "both"]:
        raise ValueError("visit must be 'first', 'second' or 'both'")
    df = df["subject_visit"].drop_duplicates().to_frame()

    if visit == "first":
        visit_col = ["first_visit"]
    elif visit == "second":
        visit_col = ["second_visit"]
    else:
        visit_col = ["first_visit", "second_visit"]
    df_merged = pd.DataFrame()
    for visit in visit_col:
        df_tmp = pd.merge(
            df,
            cohort[["PATNO", "dx_group", visit]],
            left_on="subject_visit",
            right_on=visit,
            how="inner",
        )
        df_merged = pd.concat([df_merged, df_tmp], axis=0)
    df_merged = df_merged.drop_duplicates().reset_index(drop=True)
    if df_merged.shape[0] == 0:
        raise ValueError("No subjects found in the specified visit(s)")
    if df_merged["dx_group"].nunique() < 2:
        raise ValueError("Not enough groups found in the specified visit(s)")
    if not all(g in df_merged["dx_group"].unique() for g in groups):
        raise ValueError(f"Groups {groups} not found in the specified visit(s)")

    groups = [
        df_merged[df_merged["dx_group"] == g]["subject_visit"].unique() for g in groups
    ]
    return groups

In [47]:
def is_bilateral_column(column_name):
    bilateral_indicators = ["Left", "Right", "lh", "rh"]
    return any(indicator in column_name for indicator in bilateral_indicators)


def _preprocess_dataframe(df, metric, hemisphere=None, bilateral=False):
    # Rename hemisphere column if needed
    if hemisphere and "hemi" in df.columns:
        df = df.rename(columns={"hemi": "hemisphere"})

    # Remove unnecessary columns with error handling
    columns_to_drop = [
        "PATNO_id",
        "dx_group",
        "PD_status",
        "image_name",
        "path",
        "rejected_images",
        "input_dir",
    ]
    existing_columns_to_drop = [col for col in columns_to_drop if col in df.columns]
    if existing_columns_to_drop:
        df = df.drop(columns=existing_columns_to_drop)

    # Validate required columns
    required_cols = ["subject_visit"]
    if hemisphere:
        required_cols.append("hemisphere")

    missing_cols = [col for col in required_cols if col not in df.columns]
    if missing_cols:
        raise ValueError(f"Missing required columns: {missing_cols}")

    # Add repetition counter
    group_cols = ["subject_visit"] + (["hemisphere"] if hemisphere else [])
    df["repetition"] = df.groupby(group_cols).cumcount() + 1

    # Reshape data
    id_vars = ["repetition", "subject_visit", "subject", "visit"]
    if hemisphere:
        id_vars.append("hemisphere")

    if bilateral:
        for col in df.columns:
            if is_bilateral_column(col):
                bilateral_col = (
                    col.replace("Left", "")
                    .replace("Right", "")
                    .replace("lh", "")
                    .replace("rh", "")
                    .replace("_", "")
                    .replace("-", "")
                )
                if bilateral_col not in df.columns:
                    df[bilateral_col] = df[col]
                else:
                    df[bilateral_col] += df[col]
                df.drop(columns=[col], inplace=True)

    df = df.melt(id_vars=id_vars, var_name="region", value_name=metric)
    df["region"] = df["region"].str.replace(f"_{metric}", "", regex=False)

    # Convert numeric columns to float64
    numeric_cols = df.select_dtypes(include=["number"]).columns
    for col in numeric_cols:
        df[col] = df[col].astype(np.float64)

    return df


def get_stats_metric(metric, hemisphere=None, bilateral=False):
    """
    Load and validate neuroimaging statistics for a specific metric.

    Parameters
    ----------
    metric : str
        The neuroimaging metric to load (e.g., 'volume', 'area', 'area')
    hemisphere : bool, optional
        Whether to include hemisphere information (default: None)
    bilateral : bool, optional
        Whether to process regions bilaterally (default: False)

    Returns
    -------
    pandas.DataFrame
        Processed DataFrame with validated structure and data types

    Raises
    ------
    FileNotFoundError
        If the metric parquet file doesn't exist
    ValueError
        If required columns are missing or data validation fails
    """
    try:
        file_path = stats_dir / f"{metric}.parquet"
        if not file_path.exists():
            raise FileNotFoundError(f"Metric file not found: {file_path}")

        df = pd.read_parquet(file_path)
        print(f"Loaded {metric} data: {df.shape[0]} rows, {df.shape[1]} columns")

        # Preprocess DataFrame
        df = _preprocess_dataframe(df, metric, hemisphere, bilateral)
        print(f"Preprocessed {metric} data: {df.shape[0]} rows, {df.shape[1]} columns")

        df["PATNO"] = df["subject"].str.replace("sub-", "")

        # Validate data completeness
        assert_number_of_repetitions(
            df, hemisphere=hemisphere, repetitions=EXPECTED_REPETITIONS
        )

        # Check for missing values
        if df[metric].isna().any():
            n_missing = df[metric].isna().sum()
            print(f"Warning: Found {n_missing} missing values in {metric} data")

        if (df[metric] < 0).any():
            raise ValueError(f"{metric} values cannot be negative")

        print(f"Data validation completed successfully for {metric}")
        return df

    except Exception as e:
        print(f"Error loading {metric} data: {e}")
        raise

## Numerical-Anatomical Variability Ratio

In [48]:
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple, Dict, Optional
import numpy as np
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly
from scipy.stats import f


# --------------------------------------------------------------------------------------
# Types & utilities
# --------------------------------------------------------------------------------------

GetStatsMetricFn = Callable[
    ..., pd.DataFrame
]  # signature: get_stats_metric(metric, hemisphere=True, bilateral=False)


@dataclass(frozen=True)
class NavrConfig:
    metric: str
    bilateral: bool = False
    change: bool = False
    csv_dir: Optional[Path] = None
    figures_dir: Optional[Path] = None
    regions_template: Optional[Sequence[str]] = None  # e.g., DKT list
    anondir: Optional[Callable[[Path], str]] = None  # pretty-printer for paths
    title_suffix: str = "Cortical"  # used in plot titles
    timepoint: str = ""
    k: int = 26  # number of MCA repetitions
    alpha: float = 0.05  # CI level


def _csv_filename(
    metric: str, name: str, bilateral: bool, change: bool, timepoint: str
) -> str:
    parts = [f"navr_{name}"]
    if timepoint:
        parts.append(timepoint)
    parts.append(metric)
    if bilateral:
        parts.append("bilateral")
    if change:
        parts.append("longitudinal")
    return "_".join(parts) + ".csv"


def _assert_subjects_match(df: pd.DataFrame) -> None:
    """
    Expectation:
      - Subject IDs (prefix before first '_') MUST match between baseline and follow-up.
      - Visit/session IDs (suffix after first '_') MUST differ between baseline and follow-up.
    """

    # Split "subject_visit" once into subject (prefix) and visit (suffix)
    sb, vb = (
        df["subject_visit_baseline"]
        .astype(str)
        .str.split("_", n=1, expand=True)
        .T.values
    )
    sf, vf = (
        df["subject_visit_followup"]
        .astype(str)
        .str.split("_", n=1, expand=True)
        .T.values
    )

    # Handle missing underscores -> visit becomes NaN
    # (this keeps comparisons well-defined)
    # Vectorized comparisons
    subj_equal = sb == sf
    visit_diff = vb != vf

    subjects_match_all = bool(subj_equal.all())
    visits_mismatch_all = bool(visit_diff.all())

    if not subjects_match_all or not visits_mismatch_all:
        print("Subject/visit consistency check failed between baseline and follow-up.")

        # Build a small diagnostics DataFrame
        diag = df[["subject_visit_baseline", "subject_visit_followup"]].copy()
        diag["subject_match"] = subj_equal
        diag["visit_mismatch"] = visit_diff

        # Rows violating either rule:
        bad_rows = diag[~diag["subject_match"] | ~diag["visit_mismatch"]]
        # Show up to first 20 problematic rows to avoid huge outputs
        print(bad_rows.head(20))

        # Also summarize counts
        total = len(df)
        print(
            f"\nSummary: total={total}, "
            f"subject_mismatch={(~subj_equal).sum()}, "
            f"visit_not_mismatched={(~visit_diff).sum()}"
        )

    assert (
        subjects_match_all
    ), "Subject IDs must match between baseline and follow-up (prefix before '_')."
    assert (
        visits_mismatch_all
    ), "Visit/session IDs must differ between baseline and follow-up (suffix after '_')."


def _add_navr_ci(navr_df: pd.DataFrame, k: int, alpha: float = 0.05) -> pd.DataFrame:
    """
    Add (1 - alpha) CI for NAVR using the F-distribution.

    NAVR = sigma_num / sigma_anat
    df1 = n * (k - 1)
    df2 = k * (n - 1)
    CI_NAVR = [NAVR / sqrt(F_{1 - a/2}), NAVR / sqrt(F_{a/2})]

    Assumes:
        - column 'NAVR' already exists
        - column 'n' (number of subject-visits) is constant within the dataframe
    """
    navr_df = navr_df.copy()

    # assume same n for all rows in this cohort
    n = float(navr_df["n"].iloc[0])

    df1 = n * (k - 1)
    df2 = k * (n - 1)

    F_low = f.ppf(1 - alpha / 2.0, df1, df2)
    F_high = f.ppf(alpha / 2.0, df1, df2)

    navr_df["NAVR_CI_low"] = navr_df["NAVR"] / np.sqrt(F_low)
    navr_df["NAVR_CI_high"] = navr_df["NAVR"] / np.sqrt(F_high)

    return navr_df


# --------------------------------------------------------------------------------------
# Data preparation (shared)
# --------------------------------------------------------------------------------------


def _prepare_metric_df(
    get_stats_metric: GetStatsMetricFn,
    metric: str,
    population,
    bilateral: bool,
    change: bool,
) -> Tuple[pd.DataFrame, str, int]:
    """
    Returns a tidy DataFrame with the column `metric` ready for variability computations.
    Also returns the subject column name and the number of unique subjects (Card).
    """
    df = get_stats_metric(metric, hemisphere=True, bilateral=bilateral)

    if change:
        # population = (baseline_subject_ids, followup_subject_ids)
        baseline, followup = population
        df_baseline = df[df["subject_visit"].isin(baseline)]
        df_followup = df[df["subject_visit"].isin(followup)]
        df = pd.merge(
            df_baseline,
            df_followup,
            on=["PATNO", "region", "repetition", "hemisphere"],
            suffixes=("_baseline", "_followup"),
        )
        _assert_subjects_match(df)

        # relative error follow-up / baseline for the metric
        metric_t1 = df[f"{metric}_baseline"].astype(np.float64)
        metric_t2 = df[f"{metric}_followup"].astype(np.float64)
        df[metric] = (metric_t2 - metric_t1) / metric_t1
        # clean up
        drop_cols = [
            f"{metric}_baseline",
            f"{metric}_followup",
            "subject_visit_followup",
            "subject_visit_baseline",
        ]
        df = df.drop(columns=[c for c in drop_cols if c in df.columns])
        subject_col = "PATNO"
    else:
        # population = list/iterable of subjects
        df = df[df["subject_visit"].isin(population)]
        df[metric] = df[metric].astype(np.float64)
        subject_col = "subject_visit"

    card = df[subject_col].nunique()
    return df, subject_col, card


def _group_variance(
    df: pd.DataFrame,
    metric: str,
    subject_col: str,
    mode: str,
) -> pd.DataFrame:
    """
    Compute variance by the appropriate grouping:
      - Numerical variability: var over repetitions within (region, subject, hemisphere)
      - Anatomical variability: var over subjects within (region, repetition, hemisphere)
    """
    if mode == "numerical":
        grp = ["region", subject_col, "hemisphere"]
    elif mode == "anatomical":
        grp = ["region", "repetition", "hemisphere"]
    else:
        raise ValueError("mode must be 'numerical' or 'anatomical'")

    out = (
        df[grp + [metric]]
        .groupby(grp, dropna=False)
        .var()
        .reset_index()
        .rename(columns={metric: "variance"})
    )
    return out


def _std_over_mean_of_variances(
    var_df: pd.DataFrame, group_cols: List[str]
) -> pd.Series:
    """
    Given a variance dataframe with 'variance' column, compute
    sqrt(mean(variance)) per region[/hemisphere].
    """
    return np.sqrt(var_df.groupby(group_cols, dropna=False)["variance"].mean())


def _navr_df(
    std_num: pd.Series,
    std_anat: pd.Series,
    n: int,
    timepoint_name: str,
) -> pd.DataFrame:
    idx_cols = list(std_num.index.names)
    navr = pd.DataFrame(
        {
            "NAVR": std_num / std_anat,
            "num": std_num,
            "anat": std_anat,
            "n": n,
            "timepoint": timepoint_name,
        }
    ).reset_index()
    # ensure expected columns exist even when bilateral=True (no hemisphere)
    for col in ["region", "hemisphere"]:
        if col not in navr.columns:
            navr[col] = np.nan
    return navr


def compute_variabilities_for_population(
    get_stats_metric: GetStatsMetricFn,
    metric: str,
    population,
    bilateral: bool,
    change: bool,
) -> Tuple[int, pd.DataFrame, int, pd.DataFrame]:
    """
    For a population, compute numerical and anatomical variance tables and cards.
    Returns: (card_num, var_num_df, card_anat, var_anat_df)
    """
    df, subject_col, _ = _prepare_metric_df(
        get_stats_metric, metric, population, bilateral, change
    )
    num_var = _group_variance(df, metric, subject_col, mode="numerical")
    anat_var = _group_variance(df, metric, subject_col, mode="anatomical")
    # Cards (unique subjects)
    card = df[subject_col].nunique()
    return card, num_var, card, anat_var


# --------------------------------------------------------------------------------------
# Public API: pooled variance (generalized and safer)
# --------------------------------------------------------------------------------------


def compute_pooled_variance(
    card_1: int,
    population_1: pd.DataFrame,
    card_2: int,
    population_2: pd.DataFrame,
    value_col: str = "variance",
) -> pd.DataFrame:
    """
    Pooled variance by region (and hemisphere if present) across two variance tables
    with the same index columns and a value column (default 'variance').
    """
    index_cols = [c for c in population_1.columns if c in ("region", "hemisphere")]
    assert population_1[index_cols].equals(
        population_2[index_cols]
    ), "Indices mismatch between populations"

    out = population_1.copy()
    out["pooled_variance"] = (
        (card_1 - 1) * population_1[value_col] + (card_2 - 1) * population_2[value_col]
    ) / (card_1 + card_2 - 2)
    return out


# --------------------------------------------------------------------------------------
# Stats printing
# --------------------------------------------------------------------------------------


def _print_navr_stats(
    navr_df: pd.DataFrame, label: str, regions_template: Optional[Sequence[str]] = None
) -> None:
    mean_all = navr_df["NAVR"].mean()
    std_all = navr_df["NAVR"].std()
    rng = (navr_df["NAVR"].min(), navr_df["NAVR"].max())

    print(f"\n{label} NAVR Statistics:")
    print(f"Mean NAVR: {mean_all:.3f} ± {std_all:.3f}")
    print(f"Range: [{rng[0]:.3f}, {rng[1]:.3f}]")

    if regions_template is not None:
        dkt_mask = navr_df["region"].isin(regions_template)
        if dkt_mask.any():
            mean_dkt = navr_df.loc[dkt_mask, "NAVR"].mean()
            std_dkt = navr_df.loc[dkt_mask, "NAVR"].std()
            print(f"Mean NAVR (template regions): {mean_dkt:.3f} ± {std_dkt:.3f}")


# --------------------------------------------------------------------------------------
# Plotting (split into small, reusable functions)
# --------------------------------------------------------------------------------------


def plot_navr_by_region(
    navr_df: pd.DataFrame, metric: str, title_suffix: str, bilateral: bool
):

    title = f"Numerical-Anatomical Variability Ratio (NAVR) for {title_suffix} {metric.capitalize()}"
    title += f" (at {config.timepoint})" if config.timepoint else ""

    fig = px.scatter(
        navr_df,
        x="region",
        y="NAVR",
        color="timepoint",
        hover_name="region",
        symbol="hemisphere" if not bilateral else None,
        hover_data={"NAVR": ":.3f", "num": ":.2e", "anat": ":.2e"},
        title=title,
    )
    fig.update_traces(marker=dict(size=12, line=dict(width=1, color="white")))
    fig.update_layout(
        xaxis_title="Brain Region",
        yaxis_title="NAVR (σ<sub>numerical</sub>/σ<sub>anatomical</sub>)",
        xaxis_tickangle=45,
        xaxis_ticks="outside",
        yaxis_ticks="outside",
        title_x=0.5,
        font=dict(size=14),
        title_font=dict(size=16),
        height=600,
        width=1400,
        plot_bgcolor="white",
    )
    fig.add_hline(
        y=navr_df["NAVR"].mean(),
        line_dash="dash",
        line_color="gray",
        annotation_text=f"Mean NAVR = {navr_df['NAVR'].mean():.3f}",
        annotation_position="bottom right",
    )
    fig.update_xaxes(showline=True, linewidth=1, linecolor="black")
    fig.update_yaxes(
        showline=True, linewidth=1, linecolor="black", exponentformat="power"
    )
    return fig


def plot_num_vs_anat_subplots(
    navr_df: pd.DataFrame, metric: str, bilateral: bool, title_suffix: str
):
    fig = make_subplots(
        rows=1,
        cols=2,
        subplot_titles=[
            "Numerical Variability (σ<sub>num</sub>)",
            "Anatomical Variability (σ<sub>anat</sub>)",
        ],
        shared_yaxes=True,
        horizontal_spacing=0.08,
    )
    colormap = dict(
        zip(navr_df["timepoint"].unique(), plotly.colors.qualitative.Plotly)
    )
    symbolmap = {"lh": "circle", "rh": "square", np.nan: "diamond"}

    def _add_scatter(data, name, col, timepoint):
        fig.add_scatter(
            x=data["region"],
            y=data[name],
            mode="markers",
            marker=dict(
                size=10,
                symbol=(
                    data["hemisphere"].map(symbolmap, na_action="ignore")
                    if not bilateral
                    else "circle"
                ),
                color=colormap[timepoint],
                line=dict(width=1, color="white"),
            ),
            name=f"{'Numerical' if name=='num' else 'Anatomical'} ({timepoint})",
            showlegend=True,
            row=1,
            col=col,
        )

    if not bilateral and "hemisphere" in navr_df.columns:
        for hemisphere in navr_df["hemisphere"].dropna().unique():
            for timepoint in navr_df["timepoint"].unique():
                data = navr_df[
                    (navr_df["timepoint"] == timepoint)
                    & (navr_df["hemisphere"] == hemisphere)
                ]
                for col, name in enumerate(["num", "anat"], start=1):
                    _add_scatter(data, name, col, timepoint)
    else:
        for timepoint in navr_df["timepoint"].unique():
            data = navr_df[navr_df["timepoint"] == timepoint]
            for col, name in enumerate(["num", "anat"], start=1):
                _add_scatter(data, name, col, timepoint)

    fig.update_xaxes(tickangle=45)
    fig.update_yaxes(
        title_text="Standard Deviation", type="log", exponentformat="power", row=1
    )

    title = (
        f"Numerical vs Anatomical Variability for {title_suffix} {metric.capitalize()}"
    )
    title += f" (at {config.timepoint})" if config.timepoint else ""

    fig.update_layout(
        height=600,
        width=1400,
        title_text=title,
        title_x=0.5,
        font=dict(size=14),
        plot_bgcolor="white",
        xaxis_ticks="outside",
        yaxis_ticks="outside",
    )
    fig.update_xaxes(showline=True, linewidth=1, linecolor="black")
    fig.update_yaxes(
        showline=True, linewidth=1, linecolor="black", exponentformat="power"
    )
    return fig


def plot_num_vs_anat_correlation(navr_df: pd.DataFrame, metric: str, title_suffix: str):
    regions_excluded = ["eTIV", "BrainSegVolNotVent"]
    df = navr_df[~navr_df.region.isin(regions_excluded)].copy()

    title = f"Numerical vs Anatomical Variability Correlation for {title_suffix} {metric.capitalize()}"
    title += f" (at {config.timepoint})" if config.timepoint else ""

    fig = px.scatter(
        df,
        x="num",
        y="anat",
        color="NAVR",
        symbol="timepoint",
        hover_name="region",
        hover_data={"NAVR": ":.3f"},
        color_continuous_scale="viridis",
        title=title,
        trendline="ols",
    )
    fig.update_traces(marker=dict(size=12, line=dict(width=1, color="white")))
    fig.update_layout(
        xaxis_title="Numerical Variability (σ<sub>num</sub>)",
        yaxis_title="Anatomical Variability (σ<sub>anat</sub>)",
        xaxis_ticks="outside",
        yaxis_ticks="outside",
        title_x=0.5,
        font=dict(size=14),
        height=600,
        width=1400,
        plot_bgcolor="white",
        coloraxis_colorbar=dict(title="NAVR"),
    )
    correlation = df["num"].corr(df["anat"])
    fig.add_annotation(
        x=0.05,
        y=0.95,
        xref="paper",
        yref="paper",
        text=f"r = {correlation:.3f}",
        showarrow=False,
        font=dict(size=14),
        bgcolor="white",
        bordercolor="black",
        borderwidth=2,
    )
    fig.update_legends(
        title_text="Timepoint",
        title_font=dict(size=12),
        font=dict(size=12),
        y=1.2,
        x=0.02,
    )
    fig.update_xaxes(
        showline=True, linewidth=1, linecolor="black", exponentformat="power"
    )
    fig.update_yaxes(
        showline=True, linewidth=1, linecolor="black", exponentformat="power"
    )
    return fig


# --------------------------------------------------------------------------------------
# Main driver
# --------------------------------------------------------------------------------------


def generate_navr(
    get_stats_metric: GetStatsMetricFn,
    pop1,
    pop2,
    pop_all,
    pop1_name: str,
    pop2_name: str,
    config: NavrConfig,
):
    """
    End-to-end NAVR generation: compute variabilities, export CSVs, print stats, and return figures.
    - pop1, pop2: baseline / follow-up (or two cohorts)
    - pop_all: union / combined cohort
    """
    metric = config.metric
    bilateral = config.bilateral
    change = config.change
    group_cols = ["region"] if bilateral else ["region", "hemisphere"]

    # Compute numerical & anatomical variances per cohort
    card1, var_num_1, _, var_anat_1 = compute_variabilities_for_population(
        get_stats_metric, metric, pop1, bilateral, change
    )
    card2, var_num_2, _, var_anat_2 = compute_variabilities_for_population(
        get_stats_metric, metric, pop2, bilateral, change
    )
    card_all, var_num_all, _, var_anat_all = compute_variabilities_for_population(
        get_stats_metric, metric, pop_all, bilateral, change
    )

    # std = sqrt(mean(variance) over appropriate grouping)
    std_num_1 = _std_over_mean_of_variances(var_num_1, group_cols)
    std_anat_1 = _std_over_mean_of_variances(var_anat_1, group_cols)
    std_num_2 = _std_over_mean_of_variances(var_num_2, group_cols)
    std_anat_2 = _std_over_mean_of_variances(var_anat_2, group_cols)
    std_num_A = _std_over_mean_of_variances(var_num_all, group_cols)
    std_anat_A = _std_over_mean_of_variances(var_anat_all, group_cols)

    # NAVR dataframes
    n1 = np.concatenate(pop1).shape[0] if change else len(pop1)
    n2 = np.concatenate(pop2).shape[0] if change else len(pop2)
    nA = np.concatenate(pop_all).shape[0] if change else len(pop_all)

    # Insert CI
    k = config.k if hasattr(config, "k") else 26
    alpha = config.alpha if hasattr(config, "alpha") else 0.05

    navr_pop1_df = _navr_df(std_num_1, std_anat_1, n1, pop1_name)
    navr_pop1_df = _add_navr_ci(navr_pop1_df, k=k, alpha=alpha)

    navr_pop2_df = _navr_df(std_num_2, std_anat_2, n2, pop2_name)
    navr_pop2_df = _add_navr_ci(navr_pop2_df, k=k, alpha=alpha)

    navr_all_df = _navr_df(std_num_A, std_anat_A, nA, pop1_name + " & " + pop2_name)
    navr_all_df = _add_navr_ci(navr_all_df, k=k, alpha=alpha)

    # Save CSVs (optional)
    if config.csv_dir is not None:
        config.csv_dir.mkdir(parents=True, exist_ok=True)

        f1 = config.csv_dir / _csv_filename(
            metric, pop1_name, bilateral, change, config.timepoint
        )
        f2 = config.csv_dir / _csv_filename(
            metric, pop2_name, bilateral, change, config.timepoint
        )
        fA = config.csv_dir / _csv_filename(
            metric, pop1_name + "-" + pop2_name, bilateral, change, config.timepoint
        )

        navr_pop1_df.to_csv(f1, index=False)
        navr_pop2_df.to_csv(f2, index=False)
        navr_all_df.to_csv(fA, index=False)

        if config.anondir:
            print(f"NAVR results saved to {config.anondir(f1)}")
            print(f"NAVR results saved to {config.anondir(f2)}")
            print(f"NAVR results saved to {config.anondir(fA)}")
        else:
            print(f"NAVR results saved to {f1}")
            print(f"NAVR results saved to {f2}")
            print(f"NAVR results saved to {fA}")

    # Print stats
    _print_navr_stats(
        navr_pop1_df, label=pop1_name, regions_template=config.regions_template
    )
    _print_navr_stats(
        navr_pop2_df, label=pop2_name, regions_template=config.regions_template
    )
    _print_navr_stats(
        navr_all_df,
        label=f"{pop1_name} & {pop2_name}",
        regions_template=config.regions_template,
    )

    # Combine for plotting
    navr_df = pd.concat(
        [navr_pop1_df, navr_pop2_df, navr_all_df], axis=0, ignore_index=True
    )

    fig1 = plot_navr_by_region(
        navr_df, metric=metric, title_suffix=config.title_suffix, bilateral=bilateral
    )
    fig2 = plot_num_vs_anat_subplots(
        navr_df, metric=metric, bilateral=bilateral, title_suffix=config.title_suffix
    )
    fig3 = plot_num_vs_anat_correlation(
        navr_df, metric=metric, title_suffix=config.title_suffix
    )

    # Display (comment out if running headless)
    display(fig1)
    display(fig2)
    display(fig3)

    if figures_dir is not None:

        f1 = (
            config.figures_dir
            / "scatter"
            / _csv_filename(
                metric, pop1_name + "-" + pop2_name, bilateral, change, config.timepoint
            ).replace(".csv", ".png")
        )
        f2 = (
            config.figures_dir
            / "num_anat"
            / _csv_filename(
                metric, pop1_name + "-" + pop2_name, bilateral, change, config.timepoint
            ).replace(".csv", ".png")
        )
        f3 = (
            config.figures_dir
            / "correlation"
            / _csv_filename(
                metric, pop1_name + "-" + pop2_name, bilateral, change, config.timepoint
            ).replace(".csv", ".png")
        )

        f1.parent.mkdir(parents=True, exist_ok=True)
        f2.parent.mkdir(parents=True, exist_ok=True)
        f3.parent.mkdir(parents=True, exist_ok=True)

        fig1.write_image(f1, height=900, width=1200)
        fig2.write_image(f2, height=900, width=1200)
        fig3.write_image(f3, height=900, width=1200)

        if config.anondir:
            print(f"NAVR figure saved to {config.anondir(f1)}")
            print(f"NAVR figure saved to {config.anondir(f2)}")
            print(f"NAVR figure saved to {config.anondir(f3)}")
        else:
            print(f"NAVR figure saved to {f1}")
            print(f"NAVR figure saved to {f2}")
            print(f"NAVR figure saved to {f3}")

    # Return everything useful for downstream use/tests
    return {
        "navr_df": navr_df,
        "pop1_df": navr_pop1_df,
        "pop2_df": navr_pop2_df,
        "all_df": navr_all_df,
        "fig_navr": fig1,
        "fig_subplots": fig2,
        "fig_corr": fig3,
        "cards": {"pop1": card1, "pop2": card2, "all": card_all},
        "var_tables": {
            "num_pop1": var_num_1,
            "anat_pop1": var_anat_1,
            "num_pop2": var_num_2,
            "anat_pop2": var_anat_2,
            "num_all": var_num_all,
            "anat_all": var_anat_all,
        },
    }

### Baseline - HC

In [49]:
import numpy as np

# Get populations and compute variability
pop_hc_baseline, _ = get_population(df, df_cohort, visit="first")
pop_hc_followup, _ = get_population(df, df_cohort, visit="second")
pop_hc = np.concatenate([pop_hc_baseline, pop_hc_followup])

config = NavrConfig(
    metric=METRIC,
    bilateral=False,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
)

res = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc_baseline,
    pop2=pop_hc_followup,
    pop_all=pop_hc,
    pop1_name="hc_baseline",
    pop2_name="hc_followup",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_baseline_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc_followup_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc_baseline-hc_followup_area.csv

hc_baseline NAVR Statistics:
Mean NAVR: 0.163 ± 0.092
Range: [0.001, 0.423]
Mean NAVR (template regions): 0.175 ± 0.085

hc_followup NAVR Statistics:
Mean NAVR: 0.148 ± 0.092
Range: [0.000, 0.402]
Mean NAVR (template regions): 0.159 ± 0.087

hc_baseline & hc_followup NAVR Statistics:
Mean NAVR: 0.157 ± 0.091
Range: [0.001, 0.408]
Mean NAVR (template

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc_baseline-hc_followup_area.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc_baseline-hc_followup_area.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc_baseline-hc_followup_area.png


![](../../navr/figures_all/scatter/navr_hc_baseline_area.png)
![](../../navr/figures_all/scatter/navr_hc_followup_area.png)
![](../../navr/figures_all/scatter/navr)

![](../../navr/figures_all/scatter/navr_hc_baseline-hc_followup_area.png)
![](../../navr/figures_all/num_anat/navr_hc_baseline-hc_followup_area.png)
![](../../navr/figures_all/correlation/navr_hc_baseline-hc_followup_area.png)

![](../../navr/figures_all/)

### Baseline - PD

In [50]:
import numpy as np

# Get populations and compute variability
_, pop_pd_baseline = get_population(df, df_cohort, visit="first")
_, pop_pd_followup = get_population(df, df_cohort, visit="second")
pop_pd = np.concatenate([pop_pd_baseline, pop_pd_followup])

config = NavrConfig(
    metric=METRIC,
    bilateral=False,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_pd_baseline,
    pop2=pop_pd_followup,
    pop_all=pop_pd,
    pop1_name="pd_baseline",
    pop2_name="pd_followup",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_pd_baseline_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_followup_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_baseline-pd_followup_area.csv

pd_baseline NAVR Statistics:
Mean NAVR: 0.168 ± 0.091
Range: [0.000, 0.421]
Mean NAVR (template regions): 0.181 ± 0.084

pd_followup NAVR Statistics:
Mean NAVR: 0.177 ± 0.093
Range: [0.001, 0.435]
Mean NAVR (template regions): 0.190 ± 0.086

pd_baseline & pd_followup NAVR Statistics:
Mean NAVR: 0.173 ± 0.092
Range: [0.001, 0.426]
Mean NAVR (template

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_pd_baseline-pd_followup_area.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_pd_baseline-pd_followup_area.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_pd_baseline-pd_followup_area.png


![](../../navr/figures_all/scatter/navr_pd_baseline-pd_followup_area.png)
![](../../navr/figures_all/num_anat/navr_pd_baseline-pd_followup_area.png)
![](../../navr/figures_all/correlation/navr_pd_baseline-pd_followup_area.png)

### Baseline HC+PD

In [51]:
pop_hc_baseline, pop_pd_baseline = get_population(df, df_cohort, visit="first")
pop_hc_followup, pop_pd_followup = get_population(df, df_cohort, visit="second")

pop_hc = pop_hc_baseline
pop_pd = pop_pd_baseline
pop_all = np.concatenate([pop_hc_baseline, pop_pd_baseline])

config = NavrConfig(
    metric=METRIC,
    bilateral=False,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
    timepoint="baseline",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc,
    pop2=pop_pd,
    pop_all=pop_all,
    pop1_name="hc",
    pop2_name="pd",
    config=config,
)

pop_hc = pop_hc_followup
pop_pd = pop_pd_followup
pop_all = np.concatenate([pop_hc_followup, pop_pd_followup])

config = NavrConfig(
    metric=METRIC,
    bilateral=False,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
    timepoint="followup",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc,
    pop2=pop_pd,
    pop_all=pop_all,
    pop1_name="hc",
    pop2_name="pd",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_baseline_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_baseline_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc-pd_baseline_area.csv

hc NAVR Statistics:
Mean NAVR: 0.163 ± 0.092
Range: [0.001, 0.423]
Mean NAVR (template regions): 0.175 ± 0.085

pd NAVR Statistics:
Mean NAVR: 0.168 ± 0.091
Range: [0.000, 0.421]
Mean NAVR (template regions): 0.181 ± 0.084

hc & pd NAVR Statistics:
Mean NAVR: 0.164 ± 0.089
Range: [0.001, 0.398]
Mean NAVR (template regions): 0.176 ± 0.082


NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc-pd_baseline_area.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc-pd_baseline_area.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc-pd_baseline_area.png
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_followup_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_followup_area.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc-pd_followup_area.csv

hc NAVR Statistics:
Mean NAVR: 0.148 ± 0.092
Range: [0.000, 0.402]
Mean NAVR (template r

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc-pd_followup_area.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc-pd_followup_area.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc-pd_followup_area.png


![](../../navr/figures_all/scatter/navr_hc-pd_baseline_area.png)
![](../../navr/figures_all/num_anat/navr_hc-pd_baseline_area.png)
![](../../navr/figures_all/correlation/navr_hc-pd_baseline_area.png)

![](../../navr/figures_all/scatter/navr_hc-pd_followup_area.png)
![](../../navr/figures_all/num_anat/navr_hc-pd_followup_area.png)
![](../../navr/figures_all/correlation/navr_hc-pd_followup_area.png)


### Longitudinal

In [52]:
import numpy as np

# Get populations and compute variability
pop_hc_baseline, pop_pd_baseline = get_population(df, df_cohort, visit="first")
pop_hc_followup, pop_pd_followup = get_population(df, df_cohort, visit="second")
pop_hc = [pop_hc_baseline, pop_hc_followup]
pop_pd = [pop_pd_baseline, pop_pd_followup]
pop_all = [
    np.concatenate([pop_hc_baseline, pop_pd_baseline]),
    np.concatenate([pop_hc_followup, pop_pd_followup]),
]

config = NavrConfig(
    metric=METRIC,
    bilateral=False,
    change=True,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc,
    pop2=pop_pd,
    pop_all=pop_all,
    pop1_name="hc",
    pop2_name="pd",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_area_longitudinal.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_area_longitudinal.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc-pd_area_longitudinal.csv

hc NAVR Statistics:
Mean NAVR: 0.587 ± 0.181
Range: [0.004, 0.895]
Mean NAVR (template regions): 0.623 ± 0.133

pd NAVR Statistics:
Mean NAVR: 0.607 ± 0.175
Range: [0.006, 0.891]
Mean NAVR (template regions): 0.645 ± 0.117

hc & pd NAVR Statistics:
Mean NAVR: 0.596 ± 0.174
Range: [0.004, 0.894]
Mean NAVR (template regions): 0.634 ± 0.119


NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc-pd_area_longitudinal.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc-pd_area_longitudinal.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc-pd_area_longitudinal.png


![](../../navr/figures_all/scatter/navr_hc_area_longitudinal.csv.png)
![](../../navr/figures_all/num_anat/navr_hc_area_longitudinal.csv.png)
![](../../navr/figures_all/correlation/navr_hc_area_longitudinal.csv.png)

## Bilateral

### Baseline - HC

In [53]:
import numpy as np

# Get populations and compute variability
pop_hc_baseline, _ = get_population(df, df_cohort, visit="first")
pop_hc_followup, _ = get_population(df, df_cohort, visit="second")
pop_hc = np.concatenate([pop_hc_baseline, pop_hc_followup])

config = NavrConfig(
    metric=METRIC,
    bilateral=True,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc_baseline,
    pop2=pop_hc_followup,
    pop_all=pop_hc,
    pop1_name="hc_baseline",
    pop2_name="hc_followup",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_baseline_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc_followup_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc_baseline-hc_followup_area_bilateral.csv

hc_baseline NAVR Statistics:
Mean NAVR: 0.165 ± 0.090
Range: [0.001, 0.388]
Mean NAVR (template regions): 0.173 ± 0.081

hc_followup NAVR Statistics:
Mean NAVR: 0.148 ± 0.091
Range: [0.000, 0.390]
Mean NAVR (template regions): 0.154 ± 0.082

hc_baseline & hc_followup NAVR Statistics:
Mean NAVR: 0.158 ± 0.089
Range: [0.0

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc_baseline-hc_followup_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc_baseline-hc_followup_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc_baseline-hc_followup_area_bilateral.png


![](../../navr/figures_all/scatter/navr_hc_baseline-hc_followup_area_bilateral.png)
![](../../navr/figures_all/num_anat/navr_hc_baseline-hc_followup_area_bilateral.png)
![](../../navr/figures_all/correlation/navr_hc_baseline-hc_followup_area_bilateral.png)

### Baseline - PD

In [54]:
import numpy as np

# Get populations and compute variability
_, pop_pd_baseline = get_population(df, df_cohort, visit="first")
_, pop_pd_followup = get_population(df, df_cohort, visit="second")
pop_pd = np.concatenate([pop_pd_baseline, pop_pd_followup])

config = NavrConfig(
    metric=METRIC,
    bilateral=True,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_pd_baseline,
    pop2=pop_pd_followup,
    pop_all=pop_pd,
    pop1_name="pd_baseline",
    pop2_name="pd_followup",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_pd_baseline_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_followup_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_baseline-pd_followup_area_bilateral.csv

pd_baseline NAVR Statistics:
Mean NAVR: 0.169 ± 0.091
Range: [0.000, 0.404]
Mean NAVR (template regions): 0.177 ± 0.079

pd_followup NAVR Statistics:
Mean NAVR: 0.179 ± 0.091
Range: [0.001, 0.422]
Mean NAVR (template regions): 0.187 ± 0.078

pd_baseline & pd_followup NAVR Statistics:
Mean NAVR: 0.174 ± 0.090
Range: [0.0

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_pd_baseline-pd_followup_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_pd_baseline-pd_followup_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_pd_baseline-pd_followup_area_bilateral.png


![](../../navr/figures_all/scatter//navr_pd_baseline-pd_followup_area_bilateral.png)
![](../../navr/figures_all/num_anat//navr_pd_baseline-pd_followup_area_bilateral.png)
![](../../navr/figures_all/correlation//navr_pd_baseline-pd_followup_area_bilateral.png)

### Baseline HC+PD

In [55]:
pop_hc_baseline, pop_pd_baseline = get_population(df, df_cohort, visit="first")
pop_hc_followup, pop_pd_followup = get_population(df, df_cohort, visit="second")

pop_hc = pop_hc_baseline
pop_pd = pop_pd_baseline
pop_all = np.concatenate([pop_hc_baseline, pop_pd_baseline])

config = NavrConfig(
    metric=METRIC,
    bilateral=True,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
    timepoint="baseline",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc,
    pop2=pop_pd,
    pop_all=pop_all,
    pop1_name="hc",
    pop2_name="pd",
    config=config,
)

pop_hc = pop_hc_followup
pop_pd = pop_pd_followup
pop_all = np.concatenate([pop_hc_followup, pop_pd_followup])

config = NavrConfig(
    metric=METRIC,
    bilateral=True,
    change=False,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
    timepoint="followup",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc,
    pop2=pop_pd,
    pop_all=pop_all,
    pop1_name="hc",
    pop2_name="pd",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_baseline_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_baseline_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc-pd_baseline_area_bilateral.csv

hc NAVR Statistics:
Mean NAVR: 0.165 ± 0.090
Range: [0.001, 0.388]
Mean NAVR (template regions): 0.173 ± 0.081

pd NAVR Statistics:
Mean NAVR: 0.169 ± 0.091
Range: [0.000, 0.404]
Mean NAVR (template regions): 0.177 ± 0.079

hc & pd NAVR Statistics:
Mean NAVR: 0.165 ± 0.088
Range: [0.001, 0.390]
Mean NAVR (template regions): 0.17

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc-pd_baseline_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc-pd_baseline_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc-pd_baseline_area_bilateral.png
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_followup_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_followup_area_bilateral.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc-pd_followup_area_bilateral.csv

hc NAVR Statistics:
Mean NAV

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc-pd_followup_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc-pd_followup_area_bilateral.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc-pd_followup_area_bilateral.png


![](../../navr/figures_all/scatter/navr_hc-pd_baseline_area_bilateral.png)
![](../../navr/figures_all/num_anat/navr_hc-pd_baseline_area_bilateral.png)
![](../../navr/figures_all/correlation/navr_hc-pd_baseline_area_bilateral.png)

![](../../navr/figures_all/scatter/navr_hc-pd_followup_area_bilateral.png)
![](../../navr/figures_all/num_anat/navr_hc-pd_followup_area_bilateral.png)
![](../../navr/figures_all/correlation/navr_hc-pd_followup_area_bilateral.png)

### Longitudinal

In [56]:
import numpy as np

# Get populations and compute variability
pop_hc_baseline, pop_pd_baseline = get_population(df, df_cohort, visit="first")
pop_hc_followup, pop_pd_followup = get_population(df, df_cohort, visit="second")
pop_hc = [pop_hc_baseline, pop_hc_followup]
pop_pd = [pop_pd_baseline, pop_pd_followup]
pop_all = [
    np.concatenate([pop_hc_baseline, pop_pd_baseline]),
    np.concatenate([pop_hc_followup, pop_pd_followup]),
]

config = NavrConfig(
    metric=METRIC,
    bilateral=True,
    change=True,
    csv_dir=csv_dir,
    figures_dir=figures_dir,
    regions_template=regions_template,
    anondir=anondir,
    title_suffix="Cortical",
)

_ = generate_navr(
    get_stats_metric=get_stats_metric,
    pop1=pop_hc,
    pop2=pop_pd,
    pop_all=pop_all,
    pop1_name="hc",
    pop2_name="pd",
    config=config,
)

Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
Loaded area data: 27768 rows, 46 columns
Preprocessed area data: 1027416 rows, 7 columns
Data validation completed successfully for area
NAVR results saved to <living-park>/navr/csv_all/navr_hc_area_bilateral_longitudinal.csv
NAVR results saved to <living-park>/navr/csv_all/navr_pd_area_bilateral_longitudinal.csv
NAVR results saved to <living-park>/navr/csv_all/navr_hc-pd_area_bilateral_longitudinal.csv

hc NAVR Statistics:
Mean NAVR: 0.585 ± 0.180
Range: [0.004, 0.861]
Mean NAVR (template regions): 0.617 ± 0.129

pd NAVR Statistics:
Mean NAVR: 0.616 ± 0.168
Range: [0.006, 0.861]
Mean NAVR (template regions): 0.653 ± 0.102

hc & pd NAVR Statistics:
Mean NAVR: 0.599 ± 0.166
Range: [0.004, 0.862]
Mean NAVR (template re

NAVR figure saved to <living-park>/navr/figures_all/scatter/navr_hc-pd_area_bilateral_longitudinal.png
NAVR figure saved to <living-park>/navr/figures_all/num_anat/navr_hc-pd_area_bilateral_longitudinal.png
NAVR figure saved to <living-park>/navr/figures_all/correlation/navr_hc-pd_area_bilateral_longitudinal.png


![](../../navr/figures_all/scatter/navr_hc-pd_area_bilateral_longitudinal.png)
![](../../navr/figures_all/num_anat/navr_hc-pd_area_bilateral_longitudinal.png)
![](../../navr/figures_all/correlation/navr_hc-pd_area_bilateral_longitudinal.png)