## Data Processing

We now have physical data. The radiative transfer code takes a set of parameters (log stellar mass, log dust mass, etc.) and feeds them to a SKIRT model, which produces a .h5 file containing the flux, half-light radius, and Sersic index as functions of wavelength at each angle of inclination. The model considers 20 angles of inclination.

I have six .h5 files, each produced with different values of log stellar mass and log dust mass. My objective here is to merge the data from all of them into a single CSV. I'll then feed that CSV into the model we developed previously, to see whether it can predict the same values as the classical model.

I'll start by importing the modules.

In [217]:
import pandas as pd
import numpy as np

In [218]:
def read_params(filename: str, filepath: str = '../../data/radiative_transfer/input/'):
    """
    Reads parameters from a specified file and returns them as a dictionary.

    The function reads a text file where each line contains a parameter in the format:
    `key = value # optional comment`. The function parses these lines to extract the keys 
    and values, ignoring any text following a '#' as a comment.

    Parameters:
    - filename (str): The name of the parameter file, e.g. 'parameters1.txt'.
    - filepath (str, optional): Path to the directory containing the file.
    Default is '../../data/radiative_transfer/input/'.

    Returns:
    - dict: A dictionary where each key-value pair corresponds to a parameter 
    and its respective value. If a line contains a comma-separated list of values, 
    they are converted to a NumPy array. If the value is a single number (except for 
    the 'theta' parameter), it is converted to a float.

    Note:
    - This function assumes that each parameter is defined only once in the file.
    - The function is designed to handle special cases where the value is a list 
    (converted to a NumPy array) or a single float. The exception is the 'theta' 
    parameter, which is always treated as a NumPy array.
    """

    # Read all lines, closing the file handle when done
    with open(filepath + filename, 'r') as f:
        lines = f.readlines()

    table = {}
    for line in lines:
        # Drop the trailing newline and anything after a '#' comment marker
        content = line.split('\n')[0].split('#')[0]
        parts = [part.strip() for part in content.split('=')]

        # Only lines of the form `key = value` define a parameter
        if len(parts) == 2:
            key, raw_value = parts
            # Split on ',' and strip, so both '1,2' and '1, 2' are accepted
            value = np.array([v.strip() for v in raw_value.split(',')]).astype(float)
            if len(value) == 1 and key != 'theta':
                value = value[0]
            table[key] = value

    return table
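
To make the expected file format concrete, here is a minimal sketch of the same parsing rule applied to an in-memory string instead of a file. The sample text, its values, and the helper name `parse_param_lines` are made up for illustration. Only the `key = value # comment` format and the `logMstar`, `logMdust`, and `theta` keys come from this notebook.

```python
import numpy as np

# Hypothetical sample; the real parameter files are not shown in this notebook.
SAMPLE = """\
# SKIRT run parameters (illustrative values only)
logMstar = 10.5   # log stellar mass
logMdust = 7.5    # log dust mass
theta = 0, 30, 60, 90  # viewing angles [degrees]
"""

def parse_param_lines(text):
    """Parse 'key = value # comment' lines into a dict (illustrative sketch)."""
    params = {}
    for line in text.splitlines():
        body = line.split('#')[0]                  # drop any trailing comment
        parts = [p.strip() for p in body.split('=')]
        if len(parts) != 2:                        # skip blank or comment-only lines
            continue
        key, raw = parts
        values = np.array([float(v) for v in raw.split(',')])
        # Single numbers become floats; 'theta' always stays an array
        if len(values) == 1 and key != 'theta':
            params[key] = float(values[0])
        else:
            params[key] = values
    return params

params = parse_param_lines(SAMPLE)
print(params['logMstar'], params['theta'])
```

Keeping `theta` as an array even when it has a single entry means downstream code can always index it the same way.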

In [219]:
def read_h5_file(filename: str, thetas, log_mstar, log_mdust_over_mstar, filepath: str = '../../data/radiative_transfer/output/'):
    """
    Reads HDF5 files and compiles data into a single DataFrame with additional parameters.

    Parameters:
    - filename (str): The name of the HDF5 file to be read.
    - thetas (array-like): An array of viewing angles corresponding to each entry in the HDF5 file.
    - log_mstar (float): Logarithmic value of stellar mass.
    - log_mdust_over_mstar (float): Logarithmic value of the dust mass over stellar mass ratio.
    - filepath (str, optional): Path to the directory containing the HDF5 file. Defaults to '../../data/radiative_transfer/output/'.

    Returns:
    - pd.DataFrame: A DataFrame containing wavelength, flux, half-light radius, Sersic index, viewing angle, logarithm of stellar mass, and logarithm of dust mass over stellar mass ratio.

    This function iterates over keys in the HDF5 file, extracts relevant data, and compiles it into a comprehensive DataFrame, adding constant parameters for stellar mass and dust mass ratios.
    """
    
    filepath += filename 
    print(filepath)

    # Find the HDF keys; this assumes one key per viewing angle,
    # listed in the same order as `thetas`
    with pd.HDFStore(filepath, 'r') as hdf:
        hdf_keys = hdf.keys()

    frames = []
    for i, key in enumerate(hdf_keys):
        table = pd.read_hdf(filepath, key)  # table for the i-th viewing angle
        frames.append(pd.DataFrame({
            'wvl': table['wvl'].to_numpy(),    # rest-frame wavelength [micron]
            'flux': table['flux'].to_numpy(),  # flux [W/m^2]
            'r': table['r'].to_numpy(),        # half-light radius [kpc]
            'n': table['n'].to_numpy(),        # Sersic index
            'theta': thetas[i],                # viewing angle [degrees]
        }))

    # Concatenate once at the end rather than growing a DataFrame inside the loop
    if frames:
        big_df = pd.concat(frames, ignore_index=True)
    else:
        big_df = pd.DataFrame(columns=['wvl', 'flux', 'r', 'n', 'theta'])

    # These are constant for the whole file, so scalar assignment fills every row
    big_df['log_mstar'] = log_mstar
    big_df['log_mdust_over_mstar'] = log_mdust_over_mstar

    return big_df
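
The core of this function is a "tag each table with its angle, then stack" pattern. Below is a minimal sketch of that pattern on synthetic tables, so it runs without any .h5 file. All values here are invented; only the column names match the ones used in this notebook.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the per-angle tables stored in one .h5 file.
wvl = np.array([0.5, 1.0, 2.0])   # wavelength [micron], illustrative
thetas = [0.0, 45.0, 90.0]        # viewing angles [degrees], illustrative
tables = [
    pd.DataFrame({'wvl': wvl, 'flux': np.ones(3) * k, 'r': 2.0, 'n': 1.0})
    for k in range(1, len(thetas) + 1)
]

# Tag each table with its angle, then stack them into one long table.
frames = [t.assign(theta=theta) for t, theta in zip(tables, thetas)]
long_df = pd.concat(frames, ignore_index=True)

# Per-file constants are filled into every row by scalar assignment.
long_df['log_mstar'] = 10.5
long_df['log_mdust_over_mstar'] = -3.0

print(long_df.shape)
```

Collecting the frames in a list and calling `pd.concat` once avoids repeatedly copying a growing DataFrame inside the loop.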

In [220]:
def read_parameter_files(filenames: list, filepath: str = "../../data/radiative_transfer/input/"):
    """
    Reads multiple parameter files and extracts key information.

    Parameters:
    - filenames (list): A list of filenames for the parameter files to be read.
    - filepath (str, optional): Path to the directory containing the parameter files. Defaults to "../../data/radiative_transfer/input/".

    Returns:
    - tuple: A tuple containing three arrays - list_log_mstar, list_log_mdust_over_mstar, and list_theta. 
      - list_log_mstar (numpy.ndarray): Array of logarithmic stellar mass values.
      - list_log_mdust_over_mstar (numpy.ndarray): Array of logarithmic dust mass over stellar mass ratio values.
      - list_theta (numpy.ndarray): Array of viewing angles, concatenated across all files.

    The function iterates over each file, reads its parameters, and compiles key data into arrays for further processing.
    """

    list_log_mstar = np.array([])
    list_log_mdust = np.array([])
    list_theta = np.array([])

    for filename in filenames:
        table = read_params(filename, filepath)
        list_log_mstar = np.append(list_log_mstar, table['logMstar'])
        list_log_mdust = np.append(list_log_mdust, table['logMdust'])
        list_theta = np.append(list_theta, table['theta'])

    list_log_mdust_over_mstar = list_log_mdust - list_log_mstar

    return list_log_mstar, list_log_mdust_over_mstar, list_theta
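
The subtraction `list_log_mdust - list_log_mstar` gives the log of the dust-to-stellar mass ratio because log(a/b) = log(a) - log(b). Here is a quick numerical check of that identity. The mass values are illustrative, and base-10 logs are my assumption (the identity holds for any base).

```python
import numpy as np

# Illustrative values only; base-10 logarithms are assumed.
log_mstar = np.array([10.0, 10.5, 11.0])
log_mdust = np.array([7.0, 7.0, 8.5])

# Ratio computed in log space, as in read_parameter_files
log_ratio = log_mdust - log_mstar

# Same ratio computed from the linear masses, then logged
direct = np.log10(10.0**log_mdust / 10.0**log_mstar)

print(np.allclose(log_ratio, direct))
```

Working in log space also avoids forming the very large linear masses explicitly.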


In [221]:
# Obtain the log stellar mass, log dust-to-stellar mass ratio, and viewing angles for each run
parameter_files = [f"parameters{i}.txt" for i in range(1, 7)]
h5_files = [f"data{i}.h5" for i in range(1, 7)]
list_log_mstar, list_log_mdust_over_mstar, list_theta = read_parameter_files(parameter_files)

In [222]:
columns = ["theta", "wvl", "flux", "r", "n", "log_mstar", "log_mdust_over_mstar"]
outputs = []
for i, h5_file in enumerate(h5_files):
    # list_theta holds the angles of all six runs back to back; read_h5_file indexes it
    # from the start, which assumes every run uses the same angles
    output = read_h5_file(h5_file, list_theta, list_log_mstar[i], list_log_mdust_over_mstar[i])
    # output.to_csv(f"../../data/radiative_transfer/output/data{i+1}.csv", index=False)
    outputs.append(output)

# Concatenate once and fix the column order before writing
final_df = pd.concat(outputs, ignore_index=True)[columns]
final_df.to_csv("../../data/radiative_transfer/output/data.csv", index=False)


../../data/radiative_transfer/output/data1.h5
../../data/radiative_transfer/output/data2.h5
../../data/radiative_transfer/output/data3.h5
../../data/radiative_transfer/output/data4.h5
../../data/radiative_transfer/output/data5.h5
../../data/radiative_transfer/output/data6.h5
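
Before feeding the merged CSV to the model, it may be worth checking that every run contributed the expected number of viewing angles and rows. This is a minimal sketch of such a check on a synthetic table. The helper name `summarize_runs` and all values are hypothetical; only the column names come from this notebook.

```python
import pandas as pd

def summarize_runs(df):
    """Count distinct viewing angles and rows per (log_mstar, ratio) pair."""
    return (
        df.groupby(['log_mstar', 'log_mdust_over_mstar'])
          .agg(n_theta=('theta', 'nunique'), n_rows=('wvl', 'size'))
          .reset_index()
    )

# Synthetic stand-in: 2 runs x 3 angles x 4 wavelengths (values are made up).
rows = [
    {'log_mstar': m, 'log_mdust_over_mstar': d, 'theta': t, 'wvl': w}
    for (m, d) in [(10.0, -3.0), (11.0, -2.5)]
    for t in (0.0, 45.0, 90.0)
    for w in (0.5, 1.0, 2.0, 4.0)
]
summary = summarize_runs(pd.DataFrame(rows))
print(summary)
```

On the real data, applying the same helper to the merged table should give one row per run, provided no two runs share the same pair of mass parameters.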
