<a href="https://colab.research.google.com/github/ImagingDataCommons/CloudSegmentator/blob/main/workflows/TotalSegmentator/Notebooks/postProcessingExtractPerframe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **This notebook converts raw radiomics features from JSON format into a pandas DataFrame. It takes the raw radiomics archives in lz4 format as input, decompresses them, flattens them into a single DataFrame, and writes the result as an lz4-compressed CSV (`rawRadiomics.csv.lz4`)**

### **Installing Packages**

In [None]:
%%capture
import sys
if 'google.colab' in sys.modules:
    !sudo apt-get update \
    && sudo apt-get install -y --no-install-recommends \
    lz4

### **Importing Packages**

In [None]:
import os
import subprocess
import json
import pandas as pd
from pandas import json_normalize

### **Parameters for papermill**

In [None]:
if 'google.colab' in sys.modules:
    !wget -q https://github.com/vkt1414/CloudSegmentator/releases/download/test/pyradiomicsRadiomicsFeatures.tar.lz4
    rawJsonRadiomicsFiles=["pyradiomicsRadiomicsFeatures.tar.lz4"]


### **This cell is used in the cloud workflow, where the file paths are passed to the notebook as a single comma-separated string**

In [None]:
if 'google.colab' not in sys.modules:
    rawJsonRadiomicsFiles=rawJsonRadiomicsFiles.split(',')
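
As a small illustration of the parameter handling above, the sketch below turns a comma-separated string into a list of paths. The helper name `parse_file_list` and the file names are hypothetical, and stripping whitespace and dropping empty entries are assumptions that go beyond the plain `split(',')` used in the cell.

```python
def parse_file_list(param):
    """Split a comma-separated parameter string into a list of paths.

    Hypothetical helper: whitespace stripping and empty-entry removal
    are assumptions, not part of the notebook's own split(',') call.
    """
    return [p.strip() for p in param.split(',') if p.strip()]


# Example with made-up file names, including a stray space and trailing comma
files = parse_file_list("batch_1.tar.lz4, batch_2.tar.lz4,")
print(files)
```

Whether the extra cleanup is wanted depends on how the workflow builds the string; if the paths never contain padding, the plain split is sufficient.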

In [None]:
def flatten_json(seriesInstanceUID, radiomics_file_path):
    """Flatten one raw radiomics JSON file into a DataFrame with one row per organ."""
    # Load the JSON file
    with open(radiomics_file_path, 'r') as f:
        data = json.load(f)

    # Create an empty list to store DataFrames
    df_list = []

    # Iterate over the items in the dictionary and flatten each to a row
    for organ, properties in data.items():
        # Normalize the nested dictionary
        organ_df = json_normalize(properties)
        # Add SeriesInstanceUID
        organ_df['seriesInstanceUID'] = seriesInstanceUID
        # Add the organ name as a column
        organ_df['organ'] = organ
        # Append the result to the list
        df_list.append(organ_df)

    # Concatenate all DataFrames in the list
    df = pd.concat(df_list, ignore_index=True)

    return df
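
To illustrate what `flatten_json` produces, the sketch below applies the same steps to a small in-memory dictionary instead of a file. The organ names, feature keys, and UID are invented for illustration and are not actual pyradiomics output; the only assumption about real files is that they map each organ to a (possibly nested) dictionary of features.

```python
import pandas as pd

# Made-up example data: organ -> nested feature dictionary
sample = {
    "liver": {"shape": {"VoxelVolume": 1500.0}, "firstorder": {"Mean": 42.5}},
    "spleen": {"shape": {"VoxelVolume": 200.0}, "firstorder": {"Mean": 55.0}},
}

rows = []
for organ, properties in sample.items():
    # json_normalize flattens nested keys into dot-separated column names
    organ_df = pd.json_normalize(properties)
    organ_df["seriesInstanceUID"] = "1.2.3.4"  # placeholder UID, not a real series
    organ_df["organ"] = organ
    rows.append(organ_df)

# One row per organ, with shared feature columns
flat = pd.concat(rows, ignore_index=True)
print(flat.columns.tolist())
```

Nested keys become dot-separated column names such as `shape.VoxelVolume`, so every organ ends up as a single row with the same set of feature columns.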

### **Convert radiomics features from JSON to a DataFrame, then write them to a CSV**

In [None]:
# Collect DataFrames across all input archives so that a later archive
# does not overwrite the output of an earlier one
all_dataframes = []

for rawJsonRadiomicsFile in rawJsonRadiomicsFiles:

    # Decompress the lz4 archive and extract its contents
    !lz4 -d --rm $rawJsonRadiomicsFile -c | tar  -xvf -

    # The archive is assumed to extract into a 'radiomics' directory in the current working directory
    for dirpath, dirnames, filenames in os.walk('radiomics'):
        # The directory name is the seriesInstanceUID
        seriesInstanceUID = os.path.basename(dirpath)
        for file in filenames:
            if file.endswith('_raw.json.lz4'):
                # Construct the full file path
                file_path = os.path.join(dirpath, file)
                # Decompress the file using the lz4 command
                subprocess.run(['lz4', '-d', '--rm', file_path, file_path[:-4]], check=True)
                # Flatten the JSON file into a DataFrame
                df = flatten_json(seriesInstanceUID, file_path[:-4])
                # Add the DataFrame to the list
                all_dataframes.append(df)
                # Remove the decompressed file
                os.remove(file_path[:-4])

    # Remove the extracted 'radiomics' directory before processing the next archive
    subprocess.run(['rm', '-r', 'radiomics'], check=True)

# Concatenate all DataFrames in the list
final_df = pd.concat(all_dataframes, ignore_index=True)

# Write the combined features to a CSV and compress it with lz4
final_df.to_csv('raw_radiomics.csv', index=False)
!lz4 raw_radiomics.csv rawRadiomics.csv.lz4
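
The directory walk above takes the seriesInstanceUID from the name of the folder that contains each file. The sketch below isolates that step using a temporary directory and plain, uncompressed files, so it runs without the lz4 CLI. The helper `collect_raw_json`, the placeholder UIDs, and the file name `features_raw.json` are hypothetical.

```python
import os
import tempfile


def collect_raw_json(root, suffix="_raw.json"):
    """Return (seriesInstanceUID, file_path) pairs found under root.

    Hypothetical helper: the UID is taken from the name of the directory
    that contains each matching file, as in the loop above.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        uid = os.path.basename(dirpath)
        for name in sorted(filenames):
            if name.endswith(suffix):
                found.append((uid, os.path.join(dirpath, name)))
    return found


# Build a throwaway layout: radiomics/<placeholder UID>/features_raw.json
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.join(tmp, "radiomics")
    for uid in ("1.2.3", "4.5.6"):
        os.makedirs(os.path.join(root, uid))
        with open(os.path.join(root, uid, "features_raw.json"), "w") as f:
            f.write("{}")
    pairs = collect_raw_json(root)
    uids = sorted(uid for uid, _ in pairs)
    print(uids)
```

Deriving the UID from the folder name only works if the archive keeps one folder per series; a file placed directly under `radiomics` would be labelled with the UID `radiomics`.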