In [12]:
# Import dependencies
import CopBET_final as CopBET
import pandas as pd
import numpy as np
import os
from scipy.io import loadmat

Load all the necessary dependencies. `scipy.io` is needed only to read the .mat files of the example data.

In [13]:
# Create helper function
def load_mat_roi_data(base_path, atlas_name):
    roi_folder = os.path.join(base_path, "ROIdata", atlas_name)
    data_list = []
    
    # List all .mat files in the atlas directory
    if not os.path.exists(roi_folder):
        raise FileNotFoundError(f"Folder not found: {roi_folder}")
        
    mat_files = [f for f in os.listdir(roi_folder) if f.endswith('.mat')]
    
    for filename in mat_files:
        # Load the .mat file
        file_path = os.path.join(roi_folder, filename)
        mat_contents = loadmat(file_path)
        
        # The .mat files store the ROI time series in the variable 'V_roi'
        if 'V_roi' in mat_contents:
            timeseries = mat_contents['V_roi'] # T x N matrix (time points x regions)
            
            # Parse metadata from filename (e.g., sub-01_ses-LSD_task-rest_run-01_bold.mat)
            parts = filename.split('_')
            data_list.append({
                "data": timeseries.tolist(), # Convert to list for JSON compatibility
                "subject": parts[0],
                "condition": parts[1],
                "session": parts[3], # run-01, run-03, etc.
                "filename": filename
            })
    
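    # Reorder to match the file order used in the MATLAB code. This assumes the
    # four example files arrive in a particular os.listdir order, which is not
    # guaranteed; remove or adapt this line for other data.
    data_list = data_list[:3][::-1] + [data_list[-1]]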
    return pd.DataFrame(data_list)

This helper function loads the .mat files of the LSD example dataset and assembles them into a pandas DataFrame with one row per file (n = 4 here). Each row holds the time series in `data`, along with the subject, condition, session, and file name parsed from the file name. The function depends on the CopBET folder hierarchy, so it may not work for other datasets, but the same logic can be used to assemble an equivalent DataFrame for the computation.
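For data that does not follow the CopBET folder layout, a table with the same columns can be assembled from arrays already in memory. The following is a minimal sketch and not part of CopBET: `build_timeseries_table` is a hypothetical name, the random arrays are placeholders for real T x N ROI time series, and the file names are assumed to follow the `sub-XX_ses-YY_task-ZZ_run-NN_...` pattern parsed above.

```python
import numpy as np
import pandas as pd


def build_timeseries_table(named_arrays):
    """Assemble a table with the same columns as load_mat_roi_data().

    named_arrays maps a file name to a T x N array (time points x regions).
    Hypothetical helper for illustration; not part of CopBET.
    """
    rows = []
    for filename, timeseries in named_arrays.items():
        parts = filename.split("_")
        if len(parts) < 4:
            raise ValueError(f"Unexpected file name format: {filename}")
        rows.append({
            "data": np.asarray(timeseries).tolist(),
            "subject": parts[0],    # e.g. sub-01
            "condition": parts[1],  # e.g. ses-LSD
            "session": parts[3],    # e.g. run-01
            "filename": filename,
        })
    return pd.DataFrame(rows)


# Placeholder data: 5 time points x 3 regions per scan
rng = np.random.default_rng(0)
example = {
    "sub-01_ses-LSD_task-rest_run-01_bold.mat": rng.standard_normal((5, 3)),
    "sub-01_ses-PLCB_task-rest_run-01_bold.mat": rng.standard_normal((5, 3)),
}
tbl_example = build_timeseries_table(example)
print(tbl_example[["subject", "condition", "session"]])
```

Raising on an unexpected file name, rather than skipping the file, makes a naming mismatch visible before any entropy is computed.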

In [14]:
# Define initial variables
data_path = r"/usr/local/CopBET/LSDdata/exampledata"    # Path to data
ext = "bold_shortened"    # Substring of the file name to match
task = "rest"    # Task name ("music" or "rest")
session_types = ["LSD", "PLCB"]

Here the path to the LSD data is defined, along with the file name suffix (`ext`), task, and session types. These variables are combined into the `criteria` that `CopBET.return_target_files()` uses to select the target files.

In [15]:
# Path to CopBET installation and name of directory to save results
CopBET_path = "/usr/local/CopBET"
dir_name = f"CopBET_LSD_data_ses-{'_'.join(session_types)}_task-{task}"    # Name of directory to save results

Define the path to your CopBET installation (very important!) and the directory where you would like the entropy results to be saved.

This wrapper works by passing the Python arguments to the MATLAB engine, running the original CopBET functions, and converting the results back to Python. **This means that you need the original CopBET folder downloaded _and_ a valid MATLAB license.**

You can download the CopBET git repository using the following command.

`git clone https://github.com/anders-s-olsen/CopBET.git`
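Before assembling the computer, it can help to confirm that both prerequisites are in place. The sketch below is an illustration, not part of CopBET: `check_prerequisites` is a hypothetical helper, and it assumes the wrapper reaches MATLAB through the MATLAB Engine API for Python, which installs as the `matlab` package. It only checks that the folder exists and the package is importable; it does not verify the MATLAB license.

```python
import importlib.util
import os


def check_prerequisites(copbet_path):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not os.path.isdir(copbet_path):
        problems.append(f"CopBET folder not found: {copbet_path}")
    # Assumption: the wrapper uses the MATLAB Engine API for Python,
    # which is importable as the `matlab` package.
    if importlib.util.find_spec("matlab") is None:
        problems.append("Python package 'matlab' (MATLAB Engine API) is not importable")
    return problems


for problem in check_prerequisites("/usr/local/CopBET"):
    print("WARNING:", problem)
```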


In [16]:
# Define criteria and parameters
criteria = [ext, task, session_types]    # Criteria to select target files (from file name)
params = {
    "LEiDA_k" : 3,
    "TR" : 2.2,
    "LZ_types" : ['LZ78temporal', 'LZ78spatial', 'LZ76temporal', 'LZ76spatial'],
    "NGSC_workers" : 32,
    "Atlas_path" : "/usr/local/CopBET/Atlases/Schaefer1000_2mm.nii"
}

# Extract the atlas into a numpy-operable format for later entropy functions.
numpy_atlas = CopBET.load_atlas(params["Atlas_path"])
numpy_atlas_4D, atlas_4D_labels = CopBET.convert_atlas_to_4D(numpy_atlas)

Here the criteria used to select target files are defined. Any sublist in `criteria` (like `session_types`) makes `return_target_files()` accept a file name that contains any one of the sublist's elements. More importantly, the parameters for the CopBET computations are set up in the `params` dictionary, so you can change them in one place. The atlas is also loaded from its path and converted with `convert_atlas_to_4D()` for the entropy functions that need it.
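The matching rule can be illustrated with a few lines of plain Python. This is a sketch of the behaviour described above, not the implementation of `return_target_files()`: `matches_criteria` is a hypothetical function, the file names are made up, and it assumes that every top-level criterion must be satisfied.

```python
def matches_criteria(filename, criteria):
    """Return True if filename satisfies every criterion.

    A string criterion must appear in the file name; a list criterion is
    satisfied if at least one of its elements appears.
    Illustration only; not CopBET's implementation.
    """
    for criterion in criteria:
        options = criterion if isinstance(criterion, (list, tuple)) else [criterion]
        if not any(option in filename for option in options):
            return False
    return True


criteria_example = ["bold_shortened", "rest", ["LSD", "PLCB"]]
names = [  # hypothetical file names
    "sub-01_ses-LSD_task-rest_run-01_bold_shortened.nii",
    "sub-01_ses-PLCB_task-music_run-02_bold_shortened.nii",
    "sub-01_ses-LSD_task-rest_run-01_bold.nii",
]
results = [matches_criteria(name, criteria_example) for name in names]
for name, result in zip(names, results):
    print(name, result)
```

Only the first name passes: the second belongs to the music task and the third lacks the `bold_shortened` suffix.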

In [25]:
# Get target files
subjects = [f"{data_path}/{f}/ses-{st}/func" for st in session_types for f in os.listdir(data_path) if f != "ROIdata"]    # All possible subjects
targets = CopBET.return_target_files(subjects, criteria)    # Filtered target files based on criteria

# Reorder targets to match ordering used in MATLAB code. DELETE/IGNORE if using own data.
targets = targets[2:] + targets[:2]

print("Defined initial variables")

Defined initial variables


The `subjects` list holds the path to the `func` folder of every subject directory in the example data, once per session type, skipping the ROIdata folder. You can adjust this line to match the naming conventions of your own files. `targets` uses the aforementioned `return_target_files()` to filter the files in these folders based on the provided criteria.
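When adapting this step to your own data, a slightly more defensive version can skip subjects that lack a given session. The sketch below is an assumption-laden illustration: `list_func_dirs` is a hypothetical helper, and it assumes the `<subject>/ses-<type>/func` layout used above. It sorts the directory entries for reproducibility, so its order may differ from the example cell, whose reordering line is specific to the example data.

```python
import os
import tempfile


def list_func_dirs(data_path, session_types, skip=("ROIdata",)):
    """Return existing <data_path>/<subject>/ses-<type>/func directories."""
    func_dirs = []
    for session_type in session_types:
        for entry in sorted(os.listdir(data_path)):
            if entry in skip:
                continue
            candidate = os.path.join(data_path, entry, f"ses-{session_type}", "func")
            if os.path.isdir(candidate):  # ignore subjects missing this session
                func_dirs.append(candidate)
    return func_dirs


# Demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub-01", "ses-LSD", "func"))
    os.makedirs(os.path.join(root, "sub-01", "ses-PLCB", "func"))
    os.makedirs(os.path.join(root, "ROIdata", "Schaefer1000"))
    demo_dirs = [os.path.relpath(d, root) for d in list_func_dirs(root, ["LSD", "PLCB"])]
print(demo_dirs)
```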

In [26]:
# Create timeseries and filenames tables
roi_data_path = "/usr/local/CopBET/LSDdata/exampledata"
ts_atlas_name = "Schaefer1000"

CopBET.make_directory(dir_name)
tbl_filenames = pd.DataFrame(targets, columns=["targets"])
tbl_timeseries = load_mat_roi_data(base_path=roi_data_path, atlas_name=ts_atlas_name)

This cell creates the directory in which the entropy results are saved, builds a table of file paths that some of the entropy functions need, and uses the helper function described above to load the LSD example data into a DataFrame.

In [27]:
ts_atlas_path = "/usr/local/CopBET/Atlases/Schaefer1000_2mm.nii"
timeseries_save_path = f"{dir_name}/CopBET_LSD_data_ses-{'_'.join(session_types)}_task-{task}_ts.pkl"
# tbl_timeseries = CopBET.get_timeseries(targets, ts_atlas_path, timeseries_save_path).drop(columns="target")

print("Created tables")

Created tables


Leave the `get_timeseries()` line commented out when using the .mat ROI data. Uncomment it if you need to compute your time series from .nii files to create `tbl_timeseries`. `get_timeseries()` saves the table of computed time series as a .pkl file, so the time series need not be recomputed every time you run the code or change a parameter.
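The compute-once, reload-later pattern can be sketched with pandas alone. This is not the implementation of `get_timeseries()`; `load_or_compute` and `expensive_step` are hypothetical names used only to show the idea.

```python
import os
import tempfile

import pandas as pd


def load_or_compute(cache_path, compute_fn):
    """Load a pickled DataFrame if it exists, otherwise compute and cache it."""
    if os.path.exists(cache_path):
        return pd.read_pickle(cache_path)
    table = compute_fn()
    table.to_pickle(cache_path)
    return table


calls = []


def expensive_step():
    calls.append(1)  # record each real computation
    return pd.DataFrame({"data": [[0.1, 0.2], [0.3, 0.4]]})


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "timeseries.pkl")
    first = load_or_compute(cache, expensive_step)   # computes and writes the cache
    second = load_or_compute(cache, expensive_step)  # reads the cache instead
print(len(calls))
```

With a cache like this, delete the .pkl file whenever the inputs change (for example a different atlas), since the cached table is otherwise reused as-is.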

In [28]:
# Assemble computer by defining parameters
computer = CopBET.CopBET_computer(
    matlab_path = CopBET_path,
    df_timeseries = tbl_timeseries,
    df_filenames = tbl_filenames,
    results_path = dir_name,
    overwrite = True,    # optional, default is False
    keepdata = False,    # optional, default is True
    parallel = True,    # optional, default is True
    NRU_specific = False    # optional, default is False
)

print("Assembled computer")

Assembled computer


Here, we finally assemble our computer for the entropy calculations. All the prior code served to assemble the variables needed for this code snippet. `overwrite` determines whether existing .csv files with entropy results are overwritten when entropy is recomputed. `keepdata` determines whether the data input to each function is kept and output with the results. I recommend against setting it to `True` unless you specifically want the time series of each target saved along with each entropy result. `parallel` determines whether parallel workers are used for expensive computations (recommended, especially for DCC). `NRU_specific` is intended to let the original developers bypass user input and use their pre-defined directories (you would probably never set this to `True`).

In [None]:
# Compute entropies
computer.von_Neumann_entropy(savedata=True)
print("von_Neumann_entropy finished")

computer.metastate_series_complexity(savedata=True)
print("metastate_series_complexity finished")

computer.geodesic_entropy(savedata=True)
print("geodesic_entropy finished")

computer.LEiDA_transition_entropy(K=params["LEiDA_k"], savedata=True)
print("LEiDA_transition_entropy finished")

computer.temporal_entropy(TR=params["TR"], savedata=True)
print("temporal_entropy finished")

for lz_type in params["LZ_types"]:
    computer.time_series_complexity(LZtype=lz_type, savedata=True)
    print(f"time_series_complexity ({lz_type}) finished")

computer.NGSC(atlas_path=params["Atlas_path"], num_workers=params["NGSC_workers"], savedata=True)
print("NGSC finished")

computer.degree_distribution_entropy(savedata=True)
print("degree_distribution_entropy finished")

computer.intranetwork_synchrony(atlas=numpy_atlas_4D, savedata=True)
print("intranetwork_synchrony finished")

computer.sample_entropy(atlas=numpy_atlas, compute=True, savedata=True)
print("sample_entropy finished")

computer.diversity_coefficient(savedata=True)
print("diversity_coefficient finished")

computer.DCC_entropy(compute=True, savedata=True)
print("DCC_entropy finished")

Saved DataFrame to CopBET_LSD_data_ses-LSD_PLCB_task-rest/von_Neumann_entropy_CopBET_LSD_data_ses-LSD_PLCB_task-rest.csv
von_Neumann_entropy finished
Concatenating data
running kmeans and LZ calculations
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 32 workers.
Replicate 8, 18 iterations, total sum of distances = 520.219.
Replicate 23, 15 iterations, total sum of distances = 521.23.
Replicate 10, 11 iterations, total sum of distances = 520.616.
Replicate 4, 12 iterations, total sum of distances = 520.127.
Replicate 30, 9 iterations, total sum of distances = 519.749.
Replicate 9, 11 iterations, total sum of distances = 517.844.
Replicate 12, 11 iterations, total sum of distances = 518.375.
Replicate 6, 12 iterations, total sum of distances = 519.46.
Replicate 24, 12 iterations, total sum of distances = 520.45.
Replicate 14, 19 iterations, total sum of distances = 515.33.
Replicate 27, 20 iterations, total sum of distances = 515.857.
R

At last, the entropies are calculated! As you can see, once the computer is assembled, computing all the entropies is very straightforward. The `savedata` parameter of each function controls whether the results are saved as a .csv file in the provided directory `dir_name`. The default is `True`. Feel free to cross-check the entropy values produced by the Python code against those from the MATLAB functions (just note that the order of the files is different, so the order of the entropies is as well).
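Because the file order differs between the two implementations, a quick cross-check has to ignore order. The sketch below is one simple way to do that, assuming the values have already been read into arrays: `same_values_any_order` is a hypothetical helper and the numbers are placeholders, not real entropy results.

```python
import numpy as np


def same_values_any_order(python_values, matlab_values, rtol=1e-6):
    """Compare two sets of results while ignoring the order of the files."""
    a = np.sort(np.asarray(python_values, dtype=float).ravel())
    b = np.sort(np.asarray(matlab_values, dtype=float).ravel())
    return a.shape == b.shape and bool(np.allclose(a, b, rtol=rtol))


# Placeholder numbers, not real entropy results
python_entropy = [0.91, 0.87, 0.95, 0.89]
matlab_entropy = [0.95, 0.89, 0.91, 0.87]
print(same_values_any_order(python_entropy, matlab_entropy))
```

Sorting is a coarse check, since it cannot tell which file produced which value. Matching rows by file name before comparing is stricter when both outputs include the file names.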