# Visualizing File Changes Across Commits

This notebook analyzes how files differ across multiple tardis-regression-data commits. There are two ways to obtain those commits:

### Method 1: Run pytest on tardis commits and generate falsey regression commits

To select the tardis commits, you have two options:

- Run pytest on the latest n tardis commits
- Run pytest on a single tardis commit hash (str) or a list of hashes

### Method 2: Directly use tardis-regression-data repo commits

To get those commits, you have two options:

- Manually provide a list of multiple tardis-regression-data commits
- Get last n tardis-regression-data commits

### Note:
By default, this notebook runs pytest on the latest n tardis commits and generates falsey regression commits (created only for testing) to analyze the differences between them.

In [None]:
from tardisbase.testing.regression_comparison.run_tests import run_tests
from tardisbase.testing.regression_comparison.visualize_files import MultiCommitCompare
from tardisbase.testing.regression_comparison.util import get_last_n_commits
import pandas as pd

Display Configuration

In [None]:
# Configure pandas display options for better visualization
pd.set_option('display.max_colwidth', None)
pd.set_option('display.width', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

Setup Configuration

In [None]:
# Configuration for the analysis
config = {
    "tardis_repo_path": "/home/riddhi/workspace/tardis-main/tardis",
    "regression_data_repo_path": "/home/riddhi/workspace/tardis-main/tardis-regression-data",
    "branch": "master",
    "n": 3, # Last n commits   
    # "commits": ["300e565e83112528faaa76e970057ffb1b13f743", "2a06fdfb60190bbd9b49ff572d78772607138660", "2d775dcd1c486227532f537fc41066e942000e56"],  # Uncomment for specific commits
    "conda_manager": "conda"
}
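
The two cases under Method 1 expect exactly one of `n` or `commits` to be active in `config`. A minimal sketch of a check for that, assuming a hypothetical helper `commit_selection_mode` that is not part of tardisbase:

```python
def commit_selection_mode(config):
    """Return "latest_n" or "explicit" depending on which key is set.

    Hypothetical helper for illustration only; tardisbase does not provide it.
    """
    has_n = config.get("n") is not None
    has_commits = bool(config.get("commits"))
    # Exactly one of the two keys should be active at a time.
    if has_n == has_commits:
        raise ValueError("Set exactly one of 'n' or 'commits' in config.")
    return "latest_n" if has_n else "explicit"


example = {"branch": "master", "n": 3}
print(commit_selection_mode(example))  # latest_n
```

Running a check like this before `run_tests` surfaces a misconfigured `config` before any environment is created.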

## Method 1: Run pytest on tardis commits to generate falsey regression data commits

### Case 1: Test latest N TARDIS commits
Important notes:
1. Comment out `commits` in config.
2. Set `n` in config.
3. Set `force_recreate` to `True` to recreate the environment every time, even if it already exists.
4. Set `test_path` to either the whole "tardis" module or a specific path such as "tardis/spectrum/tests/test_spectrum_solver.py".
5. Set `default_curr_env` to the path of your current default environment.
6. Set `use_new_envs` to `False` to run everything in the current default environment instead of creating a new environment each time.

In [None]:
processed_commits, regression_commits, original_head = run_tests(
    **config,
    force_recreate=True,
    test_path="tardis/spectrum/tests/test_spectrum_solver.py",
    default_curr_env="/home/riddhigangbhoj/miniforge3/envs/tardis-master",
    use_new_envs=True,
)

### Case 2: Test specific TARDIS commits

Important notes:
1. Comment out `n` in config.
2. `commits_input` takes the list of hashes stored under `commits` in config.
3. If the tardis commits provided are [1,2,3,4], the comparisons are ["2-1","3-2","4-3"] of the respective regression commits, so order the list accordingly.
4. Set `force_recreate` to `True` to recreate the environment every time, even if it already exists.
5. Set `test_path` to either the whole "tardis" module or a specific path such as "tardis/spectrum/tests/test_spectrum_solver.py".
6. Set `default_curr_env` to the path of your current default environment.
7. Set `use_new_envs` to `False` to run everything in the current default environment instead of creating a new environment each time.

In [None]:
# processed_commits, regression_commits, original_head = run_tests(
#     **config,  # already supplies conda_manager; passing it again raises a TypeError
#     commits_input=config["commits"],
#     force_recreate=True,
#     test_path="tardis/spectrum/tests/test_spectrum_solver.py",
#     default_curr_env="/home/riddhigangbhoj/miniforge3/envs/tardis-master",
#     use_new_envs=True
# )

## Method 2: Use existing regression data commits

### A. Manual Commit Selection
Note:
1. No need to run pytest for this.
2. If the commits provided are [1,2,3,4], the comparisons are ["2-1","3-2","4-3"], so order the list accordingly.
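
The pairing rule in the note above can be sketched in a few lines. This only illustrates the ordering and is not taken from tardisbase; `consecutive_pairs` is a hypothetical name:

```python
def consecutive_pairs(commits):
    """Pair each commit with the one listed before it, as (newer, older)."""
    return list(zip(commits[1:], commits[:-1]))


commits = ["1", "2", "3", "4"]
labels = [f"{newer}-{older}" for newer, older in consecutive_pairs(commits)]
print(labels)  # ['2-1', '3-2', '4-3']
```

A list of k commits therefore yields k-1 comparisons, and reordering the list changes which commits are compared with each other.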

In [None]:
# regression_commits = ["66a96a847c873544babb7bf934040c86433a5962",
#                       "d12d869bd2bb2038c9090852ee9ef998959f412d",
#                       "b008a7180440a697ad5b54a9f77b692d4f71b120",
#                       "a2a946a43d710c44bb3b08bcae69359fe13ed032",
#                       "9404dc594563d9457e3ba91fcaa8400cae231801"]

### B. Automatically fetch the most recent N commits from the regression data repository
Note:
1. No need to run pytest for this.
2. Set `n` to the number of recent regression commits you want to fetch.

In [None]:
# regression_commits = get_last_n_commits(n=2, repo_path=config["regression_data_repo_path"])
# regression_commits
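
For readers curious what fetching the last n commits involves, here is a rough sketch built on plain `git log`. It is an assumption about one possible approach, not the implementation of `get_last_n_commits`; the function names are hypothetical, and the order tardisbase returns may differ from the newest-first order shown here:

```python
import subprocess


def parse_hashes(log_output):
    """Split `git log --format=%H` output into a list of hashes."""
    return [line.strip() for line in log_output.splitlines() if line.strip()]


def last_n_commit_hashes(repo_path, n, ref="HEAD"):
    """Return up to n full commit hashes reachable from ref, newest first."""
    result = subprocess.run(
        ["git", "-C", str(repo_path), "log", f"--max-count={n}", "--format=%H", ref],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if git fails
    )
    return parse_hashes(result.stdout)
```

Calling `last_n_commit_hashes(config["regression_data_repo_path"], 2)` would then give a list that can be passed on as `regression_commits`, provided the order matches the comparison you want.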

## Visualize File Changes
Create a visualizer object to analyze file changes across commits.
Note:
1. Uncomment and set `file_extensions` to restrict the analysis to specific file types.
2. Set `compare_function` to either "git_diff" or "cmd_diff":
    - 'git_diff': Uses git's built-in diff functionality to compare files
          directly within the repository.
    - 'cmd_diff': Extracts files to temporary locations and uses the
          system's diff command.
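
To give a feel for what a git-based comparison reports, the sketch below parses the output of `git diff --name-status <old> <new>` into a mapping from file path to change type. It is only an illustration; how `MultiCommitCompare` works internally is not shown in this notebook, and `parse_name_status` is a hypothetical helper:

```python
def parse_name_status(diff_output):
    """Map each changed path to the first letter of its git status code.

    Lines look like "M<TAB>path"; renames look like "R100<TAB>old<TAB>new",
    in which case the new path is used as the key.
    """
    changes = {}
    for line in diff_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        status, path = fields[0], fields[-1]
        changes[path] = status[0]  # "R100" -> "R"
    return changes


sample = "M\tdata/a.h5\nA\tdata/b.h5\nD\tdata/c.h5\nR100\tdata/old.h5\tdata/new.h5\n"
print(parse_name_status(sample))
```

A `cmd_diff`-style comparison would instead write both versions of a file to temporary locations and compare them with the system `diff` command, which does not rely on git's own diff machinery.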

#### Case 1: Direct regression data commits (no TARDIS commits)
Use when you are directly providing regression commits.


In [None]:
# visualizer = MultiCommitCompare(
#     regression_repo_path=config["regression_data_repo_path"],
#     commits=regression_commits,
#     # file_extensions=('.h5', '.hdf5'),  # Uncomment to filter specific files
#     compare_function="git_diff"
# )


#### Case 2: Regression data commits generated from TARDIS commits
Use when you are providing tardis commits.
Note:
1. These regression commits are falsey commits (created only for testing).

In [None]:
visualizer = MultiCommitCompare(
    regression_repo_path=config["regression_data_repo_path"],
    commits=regression_commits,
    tardis_commits=processed_commits,
    tardis_repo_path=config["tardis_repo_path"],
    # file_extensions=('.h5', '.hdf5'),  # Uncomment to filter specific files
    compare_function="git_diff"
)

### Analyze the commits

In [None]:
visualizer.analyze_commits()

### Display the file change matrix

In [None]:
commit_info, legend, matrix = visualizer.get_analysis_results()

In [None]:
commit_info

In [None]:
legend

In [None]:
matrix