# Image Reduction Pipeline Introduction

<b>Description:</b> This pipeline is the first part of a three-step process for performing timing analysis on observations of a time-variable target object. The first step is to reduce the FITS files of the object by debiasing, dark-subtracting, and flat-fielding; this is done in <i>image_reduction_pipeline.ipynb</i> (this file). The second step is to perform aperture photometry on the reduced FITS files and extract the apparent magnitude(s) of the desired object(s); see <i>aperture_photometry_pipeline.ipynb</i>. The third step is to perform the timing analysis on the extracted magnitudes; see <i>timing_analysis_pipeline.ipynb</i>.

<b>This Jupyter Notebook file performs the first step of this process: reduction of the unprocessed FITS files</b>. It debiases, dark-subtracts, and flat-fields all the FITS files of an astronomical object.

<b>Benefits and Limitations:</b> You can run this pipeline on directories containing multiple bandpasses, and the pipeline will automatically sort the FITS files by bandpass and reduce them accordingly. However, the pipeline does not sort FITS files by night, so you will have to feed in your image files, bias files, dark files, and flat files one night at a time. It also does not trim the science files; you will have to trim them yourself before feeding your FITS files into this pipeline.

<b>Output:</b> This pipeline will automatically create local subdirectories within your current directory. Each subdirectory corresponds to a different bandpass.

## How to Run This File

### The command line arguments to run this Python file are written below:
#### python image_reduction_pipeline.py dir_raw_fits dir_bias dir_flats dir_darks=None
<ul>
    <li>The first argument is the name of this python file (image_reduction_pipeline.py).
    <li>The second argument is the PATH of the directory containing all your raw fits files.
    <li>The third argument is the PATH of the directory containing all your bias files.
    <li>The fourth argument is the PATH of the directory containing all your flat files.
    <li>The (optional) fifth argument is the PATH of the directory containing all your dark files.
</ul>
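
The argument layout above could also be expressed with `argparse`, which generates the usage and error messages automatically. This is a minimal sketch rather than the pipeline's actual parsing code; the function name `parse_cli_args` is hypothetical, and the destination names simply mirror the usage line.

```python
import argparse

def parse_cli_args(argv):
    """Parse three required directory paths and one optional darks directory."""
    parser = argparse.ArgumentParser(prog="image_reduction_pipeline.py")
    parser.add_argument("dir_raw_fits", help="directory containing the raw FITS files")
    parser.add_argument("dir_bias", help="directory containing the bias files")
    parser.add_argument("dir_flats", help="directory containing the flat files")
    # nargs="?" makes this positional argument optional; it is None when omitted.
    parser.add_argument("dir_darks", nargs="?", default=None,
                        help="(optional) directory containing the dark files")
    return parser.parse_args(argv)

# Example with hypothetical directory names and no darks directory.
args = parse_cli_args(["raw_fits_dir", "bias_dir", "flats_dir"])
print(args.dir_darks)  # prints: None
```

In a script, `parse_cli_args(sys.argv[1:])` would take the place of indexing `sys.argv` directly.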

### Assumptions Implicit in This Pipeline
<ul>
    <li>This pipeline works assuming that all the headers of your fits files are properly labeled with their bandpasses. 
    <li>This pipeline also assumes that your fits files are directly within the directories fed into the command line, instead of being separated within subdirectories. The advantage of this option is that you can put all the fits files you don't want involved in the image reduction process into a subdirectory of the directories you feed into the command line, and the pipeline will not read in these "undesired" fits files.
    <li>This pipeline assumes that you are only inputting data for one night into the command line (as opposed to inputting data for multiple nights into the command line). This is because the pipeline cannot separate fits files by night when reducing the image files.
</ul>
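
The second assumption follows from how `os.listdir` behaves: it returns only the direct entries of a directory and does not descend into subdirectories. The sketch below builds a throwaway directory tree to show this. The helper `list_fits_in_directory` and the file and directory names are made up for illustration.

```python
import os
import tempfile

def list_fits_in_directory(directory):
    """Return the sorted '.fits' filenames directly inside `directory` (no recursion)."""
    return sorted(name for name in os.listdir(directory) if name.endswith(".fits"))

with tempfile.TemporaryDirectory() as raw_dir:
    # One file sits directly in the directory; another is tucked into a subdirectory.
    open(os.path.join(raw_dir, "target_001.fits"), "w").close()
    os.mkdir(os.path.join(raw_dir, "excluded"))
    open(os.path.join(raw_dir, "excluded", "bad_frame.fits"), "w").close()

    found = list_fits_in_directory(raw_dir)

print(found)  # prints: ['target_001.fits']
```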

## Importing Packages and Setting Initial Conditions

In [1]:
# Importing packages and setting initial conditions
import numpy as np
import warnings
import os
import sys
import shutil
from astropy.io import fits

np.set_printoptions(threshold=np.inf)
#os.chdir("test_dirs_Dec")

## Reading in Files from Command Line Directories

In [2]:
# Just for testing purposes
while len(sys.argv) < 4:
    sys.argv.append("")
sys.argv[1] = "C:\\Users\\baske\\Downloads\\A0620 Data\\test_dirs_Dec\\raw_fits_dir"
sys.argv[2] = "C:\\Users\\baske\\Downloads\\A0620 Data\\test_dirs_Dec\\bias_dir"
sys.argv[3] = "C:\\Users\\baske\\Downloads\\A0620 Data\\test_dirs_Dec\\flats_dir"

Defining a helper function that returns whether a file's extension is '.fits' or not.

In [3]:
def has_fits_file_extension(filename):
    '''A function that returns whether a file's extension is '.fits' or not.'''
    return filename.endswith(".fits")
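
The check above is case-sensitive and accepts only the `.fits` extension. If a dataset mixes `.FITS`, `.fit`, or `.fts` files, a more permissive variant could look like the following sketch. The function name `has_fits_like_extension` and the set of accepted extensions are assumptions, not part of the pipeline.

```python
import os

# Assumed set of extensions to accept; adjust to match your own data.
FITS_EXTENSIONS = {".fits", ".fit", ".fts"}

def has_fits_like_extension(filename):
    """Return True if the filename ends in a FITS-style extension, ignoring case."""
    extension = os.path.splitext(filename)[1]
    return extension.lower() in FITS_EXTENSIONS

names = ["a.fits", "b.FITS", "c.fit", "notes.txt", "fits"]
print([name for name in names if has_fits_like_extension(name)])
# prints: ['a.fits', 'b.FITS', 'c.fit']
```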

Reading in the directories and filenames for the raw fits files, the bias files, the flat files, and (optionally) the dark files.

In [4]:
if len(sys.argv) == 1:
    raise TypeError("image_reduction_pipeline.py missing three positional arguments: 'raw_fits_dir', 'bias_dir', and 'flats_dir'")
if len(sys.argv) == 2:
    raise TypeError("image_reduction_pipeline.py missing two positional arguments: 'bias_dir' and 'flats_dir'")
if len(sys.argv) == 3:
    raise TypeError("image_reduction_pipeline.py missing one positional argument: 'flats_dir'")

# If sys.argv has four indices, that means the command line inputted in the raw_fits_dir, bias_dir, and the flats_dir.
raw_fits_filenames = list(filter(has_fits_file_extension, os.listdir(sys.argv[1])))
bias_filenames = list(filter(has_fits_file_extension, os.listdir(sys.argv[2])))
flat_filenames = list(filter(has_fits_file_extension, os.listdir(sys.argv[3])))

dark_filenames = []
# If sys.argv has five indices, that means the command line inputted in the raw_fits_dir, bias_dir, flats_dir, and the darks_dir.
if len(sys.argv) == 5:
    dark_filenames = list(filter(has_fits_file_extension, os.listdir(sys.argv[4])))
    
# If sys.argv has more than five indices, that means the command line inputted in too many parameters.
if len(sys.argv) > 5:
    raise TypeError(f"image_reduction_pipeline.py takes from 3 to 4 positional arguments, but {len(sys.argv) - 1} were given")

Opening the files in the directories given on the command line and obtaining their data and headers.

In [44]:
raw_fits_headers = []
raw_fits_data = []
for fits_filename in raw_fits_filenames:
    with fits.open(f"{sys.argv[1]}/{fits_filename}") as hdu:
        raw_fits_headers.append(hdu[0].header)
        raw_fits_data.append(hdu[0].data)

bias_headers = []
bias_data = []
for bias_filename in bias_filenames:
    with fits.open(f"{sys.argv[2]}/{bias_filename}") as hdu:
        bias_headers.append(hdu[0].header)
        bias_data.append(hdu[0].data)

flat_headers = []
flat_data = []
for flat_filename in flat_filenames:
    with fits.open(f"{sys.argv[3]}/{flat_filename}") as hdu:
        flat_headers.append(hdu[0].header)
        flat_data.append(hdu[0].data)

dark_headers = []
dark_data = []
for dark_filename in dark_filenames:
    with fits.open(f"{sys.argv[4]}/{dark_filename}") as hdu:
        dark_headers.append(hdu[0].header)
        dark_data.append(hdu[0].data)

## Separating Files by Filter

Generates list of all the filters in the fits files of the raw_fits_dir directory.

In [46]:
raw_fits_filters = np.unique([header["FILTER"] for header in raw_fits_headers])

Separates the data for the raw fits files and the flat files into their respective filters. The bias and dark files are not separated by filter.

In [47]:
raw_fits_dict = {}
#bias_dict = {}
flat_dict = {}
#dark_dict = {}
for bandpass in raw_fits_filters:
    raw_fits_data_per_bandpass = []
    for i in range(len(raw_fits_headers)):
        if raw_fits_headers[i]["FILTER"] == bandpass:
            raw_fits_data_per_bandpass.append([raw_fits_filenames[i], raw_fits_data[i]])
    raw_fits_dict[bandpass] = raw_fits_data_per_bandpass

    # Bias files are not separated by bandpass: they are taken with the shutter closed, so the filter has no effect on them
    #bias_data_per_bandpass = []
    #for i in range(len(bias_headers)):
    #    if bias_headers[i]["FILTER"] == bandpass:
    #        bias_data_per_bandpass.append(bias_data[i])
    #bias_dict[bandpass] = bias_data_per_bandpass
    
    flat_data_per_bandpass = []
    for i in range(len(flat_headers)):
        if flat_headers[i]["FILTER"] == bandpass:
            flat_data_per_bandpass.append([flat_filenames[i], flat_data[i]])
    flat_dict[bandpass] = flat_data_per_bandpass
    
    # Dark files are not separated by bandpass either, since they are also taken with the shutter closed
    #dark_data_per_bandpass = []
    #for i in range(len(dark_headers)):
    #    if dark_headers[i]["FILTER"] == bandpass:
    #        dark_data_per_bandpass.append(dark_data[i])
    #dark_dict[bandpass] = dark_data_per_bandpass
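
The grouping step can be illustrated without any FITS files. The sketch below uses plain dictionaries in place of FITS headers and short strings in place of image arrays; `group_by_filter` is a hypothetical helper, not a function from the pipeline.

```python
from collections import defaultdict

def group_by_filter(filenames, headers, images):
    """Map each FILTER value to a list of [filename, image] pairs."""
    grouped = defaultdict(list)
    for filename, header, image in zip(filenames, headers, images):
        grouped[header["FILTER"]].append([filename, image])
    return dict(grouped)

# Invented stand-ins for the filenames, headers, and data read from disk.
filenames = ["img1.fits", "img2.fits", "img3.fits"]
headers = [{"FILTER": "V"}, {"FILTER": "R"}, {"FILTER": "V"}]
images = ["data1", "data2", "data3"]

grouped = group_by_filter(filenames, headers, images)
print(grouped["V"])  # prints: [['img1.fits', 'data1'], ['img3.fits', 'data3']]
```

Keeping each filename paired with its own image is what later allows the reduced data to be written back under the matching output name.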

## Creating the Bias and Dark Files

Median combining the bias and dark files, and subtracting the per-row median of the median-combined bias from the median-combined dark.

In [48]:
# Bias and dark files are taken with the shutter closed, so they are not separated by bandpass
comb_bias = np.median(np.array(bias_data), axis=0) # Median combines all the bias files
comb_bias_median_per_row = np.median(comb_bias, axis=1) # Takes the median of each row in the median-combined bias file, and puts the list of medians (of each row) into a np.array

# Suppresses the RuntimeWarning that np.median() gives back when no dark_dir is inputted into the command line.
#   In that case dark_data is empty, and comb_dark and the arrays derived from it are NaN.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    comb_dark = np.median(np.array(dark_data), axis=0) # Median combines all the dark files
    bias_subtracted_dark = np.subtract(comb_dark,np.reshape(comb_bias_median_per_row, (len(comb_bias_median_per_row),1))) # Subtracts the median of each row of the combined bias file from each row of the combined dark file
    comb_dark_median_per_row = np.median(bias_subtracted_dark, axis=1) # Takes the median of each row in the debiased combined dark file, and puts the list of medians (of each row) into a np.array
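
To make the two operations above concrete, the sketch below median-combines three small synthetic bias frames and then subtracts the per-row median of the combined frame from another image by broadcasting. All array values are invented for illustration.

```python
import numpy as np

# Three synthetic 2x3 bias frames; the 99.0 mimics a single outlier pixel.
bias_frames = np.array([
    [[10.0, 10.0, 10.0], [20.0, 20.0, 20.0]],
    [[11.0, 11.0, 11.0], [21.0, 21.0, 21.0]],
    [[99.0, 12.0, 12.0], [22.0, 22.0, 22.0]],
])

# Pixel-by-pixel median across the stack of frames (axis 0) rejects the outlier.
comb = np.median(bias_frames, axis=0)

# One median per row of the combined frame.
row_medians = np.median(comb, axis=1)

# Reshaping to a column lets NumPy broadcast each row's median across that row.
image = np.array([[111.0, 112.0, 113.0], [121.0, 122.0, 123.0]])
debiased = image - row_medians[:, np.newaxis]

print(comb.tolist())      # prints: [[11.0, 11.0, 11.0], [21.0, 21.0, 21.0]]
print(debiased.tolist())  # prints: [[100.0, 101.0, 102.0], [100.0, 101.0, 102.0]]
```

Here `row_medians[:, np.newaxis]` plays the same role as the `np.reshape(..., (len(...), 1))` calls in the cell above.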

Writing the combined bias and dark file into current directory.

In [49]:
# Writes combined bias file to current directory.
hdu = fits.PrimaryHDU(comb_bias)
hdu.writeto(f"{os.getcwd()}/BiasComb.fits", overwrite=True)

# If any dark files were fed into the command line, writes the combined dark file to current directory
if len(dark_data) > 0:
    hdu = fits.PrimaryHDU(bias_subtracted_dark)
    hdu.writeto(f"{os.getcwd()}/DarkCombDebiased.fits", overwrite=True)

## Creating the Flat Files

Subtracting both the combined bias and the debiased combined dark from each flat file, normalizing each flat by its mean, and median combining the normalized flats for each bandpass.

In [50]:
combined_flats_dict = {}
for bandpass in list(flat_dict.keys()):
    flat_bandpass_data = []
    for flat_filename, flat_image in flat_dict[bandpass]:
        # Debiases each flat file
        bias_subtracted_flat = np.subtract(flat_image,np.reshape(comb_bias_median_per_row, (len(comb_bias_median_per_row),1))) # Subtracts the median of each row of the combined bias file from each row of each flat file
        
        # If dark files are inputted into the command line, then subtract the debiased combined dark from each debiased flat file. Otherwise, just use the debiased flat file in further image processing steps.
        # Finds the mean of each flat and normalizes the flat by its corresponding mean
        if dark_filenames == []:
            flat_mean = np.mean(bias_subtracted_flat)
            flat_normalized = bias_subtracted_flat / flat_mean
        else:
            dark_and_bias_subtracted_flat = np.subtract(bias_subtracted_flat,np.reshape(comb_dark_median_per_row, (len(comb_dark_median_per_row),1))) # Subtracts the median of each row of the debiased combined dark file from each row of each flat file
            flat_mean = np.mean(dark_and_bias_subtracted_flat)
            flat_normalized = dark_and_bias_subtracted_flat / flat_mean
        flat_bandpass_data.append(flat_normalized)
    combined_flat = np.median(flat_bandpass_data, axis=0)
    combined_flats_dict[bandpass] = combined_flat
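
Dividing each flat by its own mean puts all the flats on a common scale before they are median-combined, so a brighter exposure does not dominate the result. A minimal sketch on invented data, in which the second flat has the same pattern as the first at twice the brightness:

```python
import numpy as np

flat_a = np.array([[90.0, 100.0], [100.0, 110.0]])  # mean of 100
flat_b = 2.0 * flat_a                               # same pattern, mean of 200

# Normalize each flat by its own mean, then median-combine pixel by pixel.
normalized = [flat / np.mean(flat) for flat in (flat_a, flat_b)]
combined_flat = np.median(normalized, axis=0)

print(np.allclose(combined_flat, [[0.9, 1.0], [1.0, 1.1]]))  # prints: True
```

Because both normalized flats are identical, the combined flat keeps the pixel-to-pixel pattern and has a mean of 1.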

Writing the processed flat files into current directory.

In [51]:
# Writes the combined flat file for each bandpass to current directory.
for bandpass in combined_flats_dict.keys():
    hdu = fits.PrimaryHDU(combined_flats_dict[bandpass])
    hdu.writeto(f"{os.getcwd()}/{bandpass}_FlatCombProcessed.fits", overwrite=True)

## Creating the Science Images

Subtracts both the combined bias and the debiased combined dark from each science image, and then flattens each science image by dividing it by the combined flat for its bandpass.

In [52]:
# Debiases, dark-subtracts, and flattens the science files
flattened_sci_dict = {}
for bandpass in list(raw_fits_dict.keys()):
    flattened_sci_per_bandpass_data = []
    for sci_filename, sci_data in raw_fits_dict[bandpass]:
        bias_subtracted_sci = np.subtract(sci_data,np.reshape(comb_bias_median_per_row, (len(comb_bias_median_per_row),1))) # Subtracts the median of each row of the combined bias file from each row of each science file
        # If dark files are inputted into the command line, then subtract the debiased combined dark from each debiased science file. Otherwise, just use the debiased science file in further image processing steps.
        if dark_filenames == []:
            flattened_sci = bias_subtracted_sci / combined_flats_dict[bandpass] # Flattens the science file
        else:
            dark_and_bias_subtracted_sci = np.subtract(bias_subtracted_sci,np.reshape(comb_dark_median_per_row, (len(comb_dark_median_per_row),1))) # Subtracts the median of each row of the debiased combined dark file from each row of each science file
            flattened_sci = dark_and_bias_subtracted_sci / combined_flats_dict[bandpass] # Flattens the science file
        flattened_sci_per_bandpass_data.append([sci_filename, flattened_sci])
    flattened_sci_dict[bandpass] = flattened_sci_per_bandpass_data
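
As a sanity check of the arithmetic, the sketch below builds a synthetic raw frame from a known signal, a per-row bias level, and a pixel sensitivity map, then applies the same debias-and-flatten steps to recover the signal. All values are invented, and darks are left out for brevity.

```python
import numpy as np

true_signal = np.full((2, 3), 500.0)
sensitivity = np.array([[0.8, 1.0, 1.2], [1.2, 1.0, 0.8]])  # stands in for a normalized flat
bias_per_row = np.array([100.0, 105.0])

# What the detector would record: signal scaled by sensitivity, plus the bias level.
raw = true_signal * sensitivity + bias_per_row[:, np.newaxis]

# Reduction: subtract the per-row bias, then divide by the flat.
debiased = raw - bias_per_row[:, np.newaxis]
reduced = debiased / sensitivity

print(np.allclose(reduced, true_signal))  # prints: True
```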

Writing all the processed science files into local directories labeled by bandpass.

In [55]:
for bandpass in list(flattened_sci_dict.keys()):
    bandpass_path_name = os.path.join(os.getcwd(), bandpass)
    # Creates the bandpass subdirectory if it does not already exist
    os.makedirs(bandpass_path_name, exist_ok=True)
    for sci_filename, flattened_sci in flattened_sci_dict[bandpass]:
        src_file_path = f"{sys.argv[1]}/{sci_filename}"
        out_file_path = f"{bandpass_path_name}/proc_{sci_filename}"
        # Copies the raw file so that its header is kept, then replaces its data with the reduced image
        shutil.copy(src_file_path, out_file_path)
        with fits.open(out_file_path, mode="update") as hdu:
            hdu[0].header["WCSNAME"] = "REDUCED"
            hdu[0].data = flattened_sci