# Condensed Matrix to Vector

This notebook builds an efficient way to condense image data into a vector and to restore that vector to a region by time dataframe.

Two functions do this work: `image_to_vector()` and `vector_to_dataframe()`. 
Much of the code and logic for this program comes from the experimentation conducted in Initial_Atlas_Application_2023.6.8.

There are two additional functions that ease the process of loading image data. 
1. A function to access the base filepath
2. A function to load image data from a filepath

At the end of this notebook is a demonstration of every function created here. 
The results match those from Initial_Atlas_Application_2023.6.8.

## Imports

This notebook needs the following packages to work correctly:

- `os` for the filepaths

- `nibabel` to load the images

- `pandas` for dataframes

- `numpy` for finding unique regions

In [1]:
import os
import nibabel as nib
import pandas as pd
import numpy as np

## Functions

In this section are all of the functions that are used in the demonstration. They can be divided into two categories:

1. Image Loading: The two functions relating to loading an image's data from its filepath

2. Vectors: Creating a vector from an image, creating a dataframe from a vector

### Image Loading

Functions relating to loading an image

#### get_base_filepath()

Access the filepath for the base folder of the project

**Input**: None

**Output**: The filepath to the root of the folder

In [2]:
def get_base_filepath():
    '''
    Access the filepath for the base folder of the project
    
    Input: None
    
    Output: The filepath to the root of the folder
    '''
    # The project folder sits two directory levels above the current directory.
    # Resolve it without os.chdir so that repeated calls return the same path
    # and the notebook's working directory is left unchanged.
    base_folder_filepath = os.path.abspath(os.path.join(os.curdir, '..', '..'))
    return base_folder_filepath
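
A helper like this should return the same path however many times it is called, which means it cannot depend on changing the working directory. The following is a minimal `pathlib` sketch of that idea, assuming the notebook sits two levels below the project root. The name `project_root` and its `start` and `levels_up` parameters are hypothetical and are not part of the original notebook.

```python
from pathlib import Path

def project_root(start=None, levels_up=2):
    """Return the directory `levels_up` levels above `start`.

    Hypothetical helper: `start` defaults to the current working directory,
    and the working directory itself is never changed.
    """
    root = Path(start).resolve() if start is not None else Path.cwd().resolve()
    # Step up one directory per iteration
    for _ in range(levels_up):
        root = root.parent
    return root
```

Taking the starting directory as an argument also makes the helper easy to check against a temporary folder.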

#### get_image_data()

Access the floating point data of an image

**Input**: Filepath to the image

**Output:** The image's floating point data

In [3]:
def get_image_data(filepath):
    '''
    Access the floating point data of an image
    
    Input: Filepath to the image
    
    Output: The image's floating point data
    '''
    img = nib.load(filepath)
    data = img.get_fdata()
    return data

### Vectors

Functions relating to vectors. This includes creating a vector and creating a dataframe from a vector

#### image_to_vector()

Create a vector from the region by time matrix of an image using the atlas

**Input:**

- Data for the image to average within each atlas region
    
- Data from the atlas to apply to the image data

**Output:** A vector of the image's region by time matrix

In [8]:
def image_to_vector(image_data, atlas_data):
    '''
    Create a vector from the region by time matrix of an image using the atlas
    
    Input:
        - Data for the image to average within each atlas region
        - Data from the atlas to apply to the image data
    
    Output: A vector of the image's region by time matrix
    '''
    # Number of timepoints is the length of the last axis (247 in the demonstration)
    n_timepoints = image_data.shape[-1]

    # Names for columns and index
    column_names = ['time_' + str(i) for i in range(n_timepoints)]
    region_names = ['region_' + str(region) for region in np.unique(atlas_data.reshape(-1))]
    
    # Dataframe with one row per voxel and one column per timepoint
    # (uses the image_data argument rather than a notebook-level variable)
    df_times = pd.DataFrame(image_data.reshape(-1, n_timepoints), columns = column_names)
    
    # Dataframe containing image with addition of atlas region
    df_full = pd.concat([pd.Series(atlas_data.reshape(-1)), df_times], axis=1)
    df_full = df_full.rename(columns={0:'atlas_region'})
    
    # Dataframe of region vs. time: mean of all voxels in each atlas region
    regions_x_time = df_full.groupby('atlas_region').mean()
    regions_x_time.index = region_names
    
    # Return vector of region x time dataframe
    regions_x_time_vector = regions_x_time.to_numpy().reshape(-1)
    return regions_x_time_vector
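
The averaging step is easier to follow on a toy example. The sketch below uses made-up arrays (`toy_image`, `toy_atlas`) rather than real scan data, and assumes a 4D image with time as the last axis and a 3D atlas on the same voxel grid. Flattening both arrays in the same order keeps each voxel's time series aligned with its atlas label.

```python
import numpy as np
import pandas as pd

# Toy 4D image: a 2 x 2 x 1 voxel grid with 3 timepoints per voxel
toy_image = np.arange(12, dtype=float).reshape(2, 2, 1, 3)
# Toy atlas: one region label per voxel, same voxel grid as the image
toy_atlas = np.array([0, 1, 1, 2]).reshape(2, 2, 1)

# One row per voxel, one column per timepoint
n_timepoints = toy_image.shape[-1]
voxels_x_time = pd.DataFrame(toy_image.reshape(-1, n_timepoints))
labels = pd.Series(toy_atlas.reshape(-1), name='atlas_region')

# Average every voxel time series that shares an atlas label
toy_regions_x_time = voxels_x_time.groupby(labels).mean()

# Flatten the region by time matrix row by row
toy_vector = toy_regions_x_time.to_numpy().reshape(-1)
print(toy_vector)  # [ 0.   1.   2.   4.5  5.5  6.5  9.  10.  11. ]
```

Region 1 covers two voxels, so its row is the mean of their time series; regions 0 and 2 each cover one voxel and pass through unchanged.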

#### vector_to_dataframe()

Create a region by time dataframe from a vector

**Input:** A vector containing the image data (condensed region by time dataframe)

**Output:** A region by time dataframe 

In [5]:
def vector_to_dataframe(image_vector):
    '''
    Create a region by time dataframe from a vector
    
    Input: A vector containing the image data (condensed region by time dataframe)
    
    Output: A region by time dataframe

    Note: Region names are read from the notebook-level variable atlas_data,
    so the atlas must be loaded before this function is called
    '''
    # Names for the index (117 regions for the atlas in the demonstration)
    region_names = ['region_' + str(region) for region in np.unique(atlas_data.reshape(-1))]
    
    # Turn vector into a 2D array with one row per region
    # (247 timepoints per row for the image in the demonstration)
    image_array = image_vector.reshape(len(region_names), -1)
    df_vector = pd.DataFrame(data=image_array)

    # Change columns and index to be more clear
    column_names = ['time_' + str(i) for i in range(image_array.shape[1])]
    df_vector.columns = column_names
    df_vector.index = region_names
    
    return df_vector
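
Restoring the dataframe works because flattening is row by row: element `k` of the vector belongs to row `k // n_times` and column `k % n_times`. The following is a small round-trip sketch on a hypothetical 3 region by 4 timepoint matrix, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-region by 4-timepoint matrix standing in for the real data
matrix = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    index=['region_0', 'region_1', 'region_2'],
    columns=['time_' + str(i) for i in range(4)],
)

# Flatten row by row, as to_numpy().reshape(-1) does
vector = matrix.to_numpy().reshape(-1)

# Reshape back to (regions, timepoints) and reattach the labels
n_regions, n_times = matrix.shape
restored = pd.DataFrame(
    vector.reshape(n_regions, n_times),
    index=matrix.index,
    columns=matrix.columns,
)
```

The vector carries only values, so the region and time labels have to be supplied again when the dataframe is rebuilt.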

## Demonstration

Show how the functions are used on a real image and atlas. 

Confirm that these functions output the correct result.

### Get Image Data

Load the image data and atlas data using the filepaths

In [6]:
# Get filepaths to the image and atlas (os.path.join picks the right separator for the OS)
base_folder_filepath = get_base_filepath()
img_filepath = os.path.join(base_folder_filepath, 'Data', 'Preprocessed_data', 'Brown', '0026001', 'sfnwmrda0026001_session_1_rest_1.nii.gz')
atlas_filepath = os.path.join(base_folder_filepath, 'Data', 'Atlases', 'aal_mask_pad.nii.gz')

# Get the data for the image and atlas
img_data = get_image_data(img_filepath)
atlas_data = get_image_data(atlas_filepath)
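
Before the atlas is applied, it can help to confirm that the two arrays describe the same voxel grid. The sketch below assumes the image is 4D with time as the last axis and the atlas is 3D; the function name `check_atlas_matches_image` is hypothetical and not part of the original notebook.

```python
import numpy as np

def check_atlas_matches_image(image_data, atlas_data):
    """Return (n_regions, n_timepoints) or raise ValueError on a shape mismatch.

    Assumes a 4D image with time as the last axis and a 3D atlas.
    """
    if image_data.ndim != 4:
        raise ValueError(f'expected a 4D image, got {image_data.ndim} dimensions')
    if atlas_data.shape != image_data.shape[:3]:
        raise ValueError(
            f'atlas shape {atlas_data.shape} does not match '
            f'image grid {image_data.shape[:3]}'
        )
    # Sizes that the later reshapes depend on
    n_timepoints = image_data.shape[3]
    n_regions = len(np.unique(atlas_data))
    return n_regions, n_timepoints
```

If those assumptions hold for the data loaded above, `check_atlas_matches_image(img_data, atlas_data)` would be expected to return `(117, 247)`, the region and timepoint counts this notebook works with.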

### Get vector

Get the vector for the region by time matrix created when applying the atlas to the image

In [9]:
# Get vector
img_vector = image_to_vector(img_data, atlas_data)

# View first five items
img_vector[:5]

array([-0.01009854,  0.01349824,  0.03575692,  0.04519964,  0.03926589])

### Get dataframe

Restore the vector from the previous cell to a dataframe

In [10]:
# Get dataframe
img_df = vector_to_dataframe(img_vector)

# View first 5 rows
img_df.head()

| | time_0 | time_1 | time_2 | time_3 | time_4 | time_5 | time_6 | time_7 | time_8 | time_9 | ... | time_237 | time_238 | time_239 | time_240 | time_241 | time_242 | time_243 | time_244 | time_245 | time_246 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| region_0.0 | -0.010099 | 0.013498 | 0.035757 | 0.0452 | 0.039266 | 0.024453 | 0.009799 | -0.000222 | -0.007501 | -0.016107 | ... | -0.102615 | -0.094255 | -0.057972 | -0.009564 | 0.034287 | 0.063249 | 0.074109 | 0.067912 | 0.048045 | 0.02073 |
| region_2001.0 | 1.692365 | 1.563411 | 0.747202 | -0.273904 | -0.974473 | -1.128608 | -0.871923 | -0.498292 | -0.211244 | -0.035654 | ... | 3.54347 | 3.464397 | 1.997852 | 0.070382 | -1.454879 | -2.283037 | -2.631729 | -2.791547 | -2.768334 | -2.328003 |
| region_2002.0 | 0.659177 | 0.639452 | 0.420765 | 0.308812 | 0.456353 | 0.720466 | 0.817896 | 0.623798 | 0.325246 | 0.276293 | ... | 4.31679 | 4.555073 | 2.90681 | 0.384342 | -1.838814 | -3.144503 | -3.596402 | -3.5509 | -3.215132 | -2.548798 |
| region_2101.0 | 0.151973 | 0.418374 | 1.303733 | 2.33776 | 2.914812 | 2.743485 | 2.029023 | 1.249384 | 0.728115 | 0.375654 | ... | -1.850897 | -2.117193 | -2.08932 | -1.978564 | -1.946051 | -1.894787 | -1.548241 | -0.748108 | 0.310491 | 1.147373 |
| region_2102.0 | 0.143743 | 0.354073 | 0.600906 | 0.815291 | 0.900404 | 0.797972 | 0.532047 | 0.195487 | -0.106767 | -0.310437 | ... | -0.734564 | -0.697949 | -0.348394 | 0.165908 | 0.609005 | 0.812228 | 0.764356 | 0.582065 | 0.399574 | 0.271834 |


### Compare results

The first five values of the vector match the first five entries of the `region_0.0` row in the restored dataframe, so in this limited view the vector was correctly restored to a dataframe.

This dataframe is also the same as the one in Initial_Atlas_Application_2023.6.8, where this task was first completed. 
The notebook therefore succeeds in creating a vector from the image's region by time matrix and then restoring that dataframe from the vector.
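
A check over every value, rather than the first few rows, could be sketched as follows. The helper `frames_match` and the small stand-in dataframes are hypothetical; in this notebook the inputs would be `img_df` and the dataframe produced in Initial_Atlas_Application_2023.6.8, which is not loaded here.

```python
import numpy as np
import pandas as pd

def frames_match(restored_df, reference_df, rtol=1e-5, atol=1e-8):
    """Return True when two dataframes share labels and numerically close values."""
    same_labels = (
        restored_df.index.equals(reference_df.index)
        and restored_df.columns.equals(reference_df.columns)
    )
    if not same_labels:
        return False
    # Compare values with a tolerance to allow for floating point noise
    return bool(np.allclose(restored_df.to_numpy(), reference_df.to_numpy(),
                            rtol=rtol, atol=atol))

# Hypothetical stand-ins for the reference and restored dataframes
reference = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                         index=['region_0', 'region_1'],
                         columns=['time_0', 'time_1'])
restored = pd.DataFrame(reference.to_numpy().reshape(-1).reshape(2, 2),
                        index=reference.index,
                        columns=reference.columns)
print(frames_match(restored, reference))  # True
```

Comparing with a tolerance instead of exact equality avoids false mismatches if the reference dataframe was saved to disk and reloaded with rounding.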