# TARDIS Analysis Tool Documentation 📊
This tool analyzes and compares TARDIS regression data across commits. It uses Plotly to produce two kinds of visualization:

1. **Direct Spectrum Comparison**: Compares the luminosity vs wavelength plots across different commits
2. **Residual Analysis**: Shows the fractional differences between spectra from different commits



## Imports 📥

In [None]:
import os
from git import Repo
from tardis_analysis import (
    process_commits, load_h5_data, SPECTRUM_KEYS, PLOTLY_COLORS,
    commits
)
from tardis_analysis.combined_viz import plot_combined_analysis_plotly
import plotly.graph_objects as go
from plotly.subplots import make_subplots

## Configuration Settings ⚙️

The configuration dictionary defines essential parameters for the analysis:

- **Repository Paths**:
  - `tardis_repo`: Path to TARDIS repository
  - `regression_data_repo`: Path to regression data repository

- **Parameters**:
  - `branch`: Target branch for analysis (default: "master")
  - `n`: Number of latest commits to analyze (default: 2); ignored if a `commits` list is provided
  - `target_file`: Path to the HDF5 file containing spectrum solver test data
  - `save_plots`: Boolean flag to enable plot saving (default: True)
  - `gap`: Number of days between consecutive commits; this is an argument to `commits.calculate_commits`, not a key in `config`
  - `info`: Also an argument to `commits.calculate_commits`; if True, it returns a DataFrame with hash, author, and message, otherwise a list of hashes

- **Optional Parameters**:
  - `commits`: List of specific commit hashes; when present, it takes precedence over `n`
  - `output_dir`: Custom output directory (defaults to "comparison_plots" in tardis_repo if None)
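
To make the `n` and `gap` parameters concrete, here is a minimal sketch of gap-based commit selection. It is not the implementation of `commits.calculate_commits`: the function name `select_commits`, the `(hash, datetime)` input format, and the reading of `gap` as a minimum spacing in days are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_commits(history, n, gap_days=0):
    """Pick up to n commits, newest first, spaced at least gap_days apart.

    history: list of (commit_hash, commit_datetime) tuples, sorted newest first.
    This input format is hypothetical; it is not what the real tool consumes.
    """
    selected = []
    last_date = None
    min_gap = timedelta(days=gap_days)
    for commit_hash, commit_date in history:
        if len(selected) >= n:
            break
        # Keep the commit if it is the first one, or old enough relative to the last pick.
        if last_date is None or last_date - commit_date >= min_gap:
            selected.append(commit_hash)
            last_date = commit_date
    return selected


# Hypothetical history: one commit per day, newest first.
history = [(f"commit{i}", datetime(2024, 1, 10) - timedelta(days=i)) for i in range(10)]
print(select_commits(history, n=3, gap_days=3))
```

With `gap_days=0` the sketch simply returns the latest `n` commits, which matches the default behavior described above.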

In [None]:
# commits.calculate_commits(n=10, gap=0, info=True)

In [None]:
config = {
    "tardis_repo": "/home/riddhi/workspace/tardis-main/tardis",
    "regression_data_repo": "/home/riddhi/workspace/tardis-main/tardis-regression-data",
    "branch": "master",
    "n": 3,  # Only used when "commits" is absent or empty
    "target_file": "tardis/spectrum/tests/test_spectrum_solver/test_spectrum_solver/TestSpectrumSolver.h5",
    "output_dir": None,
    # Specific commits to compare; comment this entry out to use the latest n commits instead
    "commits": [
        "300e565e83112528faaa76e970057ffb1b13f743",
        "2a06fdfb60190bbd9b49ff572d78772607138660",
        "2d775dcd1c486227532f537fc41066e942000e56",
    ],
    # "commits": commits.calculate_commits(n=10, gap=0, info=False),  # Uncomment to select n commits with a custom gap
    "save_plots": True
}

if config["output_dir"] is None:
    config["output_dir"] = os.path.join(config["tardis_repo"], "comparison_plots")

## Process Commits ↻

This code processes either specific commits or the latest `n` commits from the TARDIS repository. For each commit, it runs regression tests, generates reference data, and stores the results in the regression data repository. The function returns the processed commit hashes, corresponding regression commit hashes, original HEAD position, and the target file path.

It handles both specific commit inputs (via `commits` parameter) and default behavior (latest `n` commits), ensuring proper repository state management throughout the process.


In [None]:
# Use the explicit commit list when one is provided; otherwise fall back to the latest n commits.
if config.get("commits"):
    selection = {"commits_input": config["commits"]}
else:
    selection = {"n": config["n"]}

processed_commits, regression_commits, original_head, target_file_path = process_commits(
    config["tardis_repo"], config["regression_data_repo"],
    config["branch"], config["target_file"],
    **selection
)

## Load HDF5 Data 📊

This code loads spectrum data from the regression repository for each processed commit. It initializes an empty list `commit_data`, then iterates through each regression commit to load the corresponding HDF5 data. For each commit, it checks out the commit, loads wavelength and luminosity data using `load_h5_data()`, and stores it in the list.
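After all commits are loaded, it hard-resets the regression data repository to `original_head` and checks out the `main` branch. Note that `git reset --hard` discards any uncommitted changes in that repository.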

In [None]:
commit_data = []
regression_repo = Repo(config["regression_data_repo"])

for reg_commit in regression_commits:
    regression_repo.git.checkout(reg_commit)
    commit_data.append(load_h5_data(target_file_path, SPECTRUM_KEYS))

regression_repo.git.reset('--hard', original_head)
regression_repo.git.checkout('main')
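
If loading fails partway through the loop above, the regression data repository is left on whichever commit was checked out last. One way to guard against that is to wrap the loop in a context manager that always restores a known ref. The sketch below is an illustration only: `restore_checkout` and `load_all` are hypothetical helpers, not part of `tardis_analysis`, and they rely only on `repo.git.checkout`, which GitPython's `Repo` provides.

```python
from contextlib import contextmanager


@contextmanager
def restore_checkout(repo, ref):
    """Check out `ref` when the block exits, even if an exception was raised.

    `repo` is expected to behave like a GitPython Repo; only repo.git.checkout is used.
    """
    try:
        yield repo
    finally:
        repo.git.checkout(ref)


def load_all(repo, regression_commits, loader, restore_ref="main"):
    """Check out each commit in turn, call loader(), and restore restore_ref afterwards."""
    data = []
    with restore_checkout(repo, restore_ref):
        for reg_commit in regression_commits:
            repo.git.checkout(reg_commit)
            data.append(loader())
    return data
```

In this notebook, the call could look like `load_all(regression_repo, regression_commits, lambda: load_h5_data(target_file_path, SPECTRUM_KEYS))`.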

## Display Plots 📈

This cell creates the visualization with Plotly and displays it. It sets up a 2x2 subplot grid for spectrum analysis and generates both the direct comparison and the residual plots. The plots are saved in the configured output directory as interactive HTML files. The cell below passes the output directory directly to `plot_combined_analysis_plotly` and does not itself check `config["save_plots"]`.

### Spectrum Plot

The spectrum plot shows the relationship between wavelength ($\lambda$) and luminosity ($L$) for each commit:

$$L(\lambda) = f(\lambda)$$

where:
- $L(\lambda)$ is the luminosity at wavelength $\lambda$
- $f(\lambda)$ represents the spectral energy distribution function

### Residual Plot

The residual plot shows the fractional difference between spectra from different commits:

$$R(\lambda) = \frac{L_2(\lambda) - L_1(\lambda)}{L_1(\lambda)} \times 100\%$$

where:
- $R(\lambda)$ is the residual at wavelength $\lambda$
- $L_1(\lambda)$ is the luminosity from the reference commit
- $L_2(\lambda)$ is the luminosity from the comparison commit
- The result is expressed as a percentage difference
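
The residual formula above can be sketched with NumPy as follows. This is an illustration, not the code used by `plot_combined_analysis_plotly`: the function name `percent_residual` is hypothetical, both spectra are assumed to be sampled on the same wavelength grid, and returning NaN where the reference luminosity is zero is a choice made here to avoid division by zero.

```python
import numpy as np


def percent_residual(reference, comparison):
    """Return (comparison - reference) / reference * 100, with NaN where reference == 0."""
    reference = np.asarray(reference, dtype=float)
    comparison = np.asarray(comparison, dtype=float)
    residual = np.full(reference.shape, np.nan)
    nonzero = reference != 0
    # Only divide where the reference luminosity is nonzero.
    residual[nonzero] = (
        (comparison[nonzero] - reference[nonzero]) / reference[nonzero] * 100.0
    )
    return residual


# Made-up luminosities for L_1 (reference) and L_2 (comparison).
l1 = np.array([2.0, 4.0, 0.0])
l2 = np.array([2.2, 3.0, 1.0])
print(percent_residual(l1, l2))  # approximately [10, -25, nan]
```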

The code ensures the output directory exists before saving and handles both types of plots: luminosity vs wavelength comparisons and fractional residuals across commits.

In [None]:
output_dir = os.path.abspath(config["output_dir"])
fig = plot_combined_analysis_plotly(
    commit_data,
    SPECTRUM_KEYS,
    output_dir,
    commit_hashes=processed_commits
)

fig.show()
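
The cell above leaves directory creation and saving to `plot_combined_analysis_plotly`. If the figure needs to be saved explicitly, for example only when `save_plots` is set, a minimal sketch could look like this. The helper `html_output_path` and the file name `combined_analysis` are hypothetical; `Figure.write_html` is the standard Plotly method for writing an interactive HTML file.

```python
import os


def html_output_path(output_dir, name):
    """Create output_dir if needed and return the path of an HTML file called `name`."""
    output_dir = os.path.abspath(output_dir)
    os.makedirs(output_dir, exist_ok=True)  # no error if the directory already exists
    return os.path.join(output_dir, f"{name}.html")


# Possible use in this notebook (assumes `config` and `fig` from the cells above):
# if config["save_plots"]:
#     fig.write_html(html_output_path(config["output_dir"], "combined_analysis"))
```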