# Results
Compare the evaluation results of the experiments and visualize them.

In [None]:
import os
import pickle

import matplotlib.pyplot as plt
import seaborn as sns

import numpy as np
import scipy.stats as stats

Set path to the pickle result files

In [None]:
PATH = "D:/Informatik/Projekte/torch-mask-rcnn-instance-segmentation/output/evaluations"
os.listdir(PATH)

Loading all results
(only from one dataset?)

In [None]:
results = dict()
for cur_file in os.listdir(PATH):
    cur_path = os.path.join(PATH, cur_file)
    cur_name = ".".join(cur_file.split(".")[:-1])

    # Load the result dict from each pickle file
    if cur_path.endswith(".pkl"):
        with open(cur_path, "rb") as file:
            loaded_dict = pickle.load(file)
        # Copy the entries so each experiment gets its own dict
        results[cur_name] = dict()
        for key, value in loaded_dict.items():
            results[cur_name][key] = value
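The loading step can also be written as a small helper. The following is a minimal sketch, not the project's actual loader: the function name `load_results`, the file name `experiment_a.pkl`, and the demo values are assumptions made for illustration. It assumes each pickle file holds a dict that maps a metric name such as "intersection over union" to a list of per-image values.

```python
import pickle
import tempfile
from pathlib import Path


def load_results(folder):
    """Load every .pkl file in `folder` into {file name without suffix: dict}."""
    loaded = {}
    for path in sorted(Path(folder).glob("*.pkl")):
        with path.open("rb") as file:
            loaded[path.stem] = dict(pickle.load(file))
    return loaded


# Hypothetical demo data (made-up numbers, not real evaluation results).
with tempfile.TemporaryDirectory() as tmp:
    demo = {"intersection over union": [0.71, 0.64, 0.80]}
    with open(Path(tmp) / "experiment_a.pkl", "wb") as file:
        pickle.dump(demo, file)
    # Files without the .pkl suffix are skipped by the glob pattern.
    (Path(tmp) / "notes.txt").write_text("ignored")
    demo_results = load_results(tmp)

print(demo_results)
```

Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code while loading.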

### Compare Significance of Change

Test for normal distribution

In [None]:
normality_results = {}
for name, data in results.items():
    iou_values = data.get("intersection over union")

    # Explicit checks so that NumPy arrays work as well as lists
    if iou_values is not None and len(iou_values) > 0:
        shapiro_stat, shapiro_p = stats.shapiro(iou_values)
        normality_results[name] = (shapiro_stat, shapiro_p)
        if shapiro_p > 0.05:
            print(f"{name}: IOU values are consistent with a normal distribution (p-value = {shapiro_p:.3f})")
        else:
            print(f"{name}: IOU values are NOT normally distributed (p-value = {shapiro_p:.3f})")

all_normal = all(p > 0.05 for _, p in normality_results.values())
if all_normal:
    print("\n\n    -> All results are normally distributed, the t-test will be applied.")
else:
    print("\n\n    -> Not all results are normally distributed, the Wilcoxon test will be applied.")
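A paired t-test assumes that the paired differences are roughly normal, which is not quite the same as each sample being normal on its own. The sketch below shows one way to check this per pair. It is an illustration under assumptions: `iou_a` and `iou_b` are synthetic, made-up values standing in for two experiments evaluated on the same images in the same order.

```python
import numpy as np
import scipy.stats as stats

# Made-up per-image IOU values for two hypothetical experiments,
# evaluated on the same images in the same order.
rng = np.random.default_rng(0)
iou_a = np.clip(rng.normal(0.70, 0.08, size=40), 0.0, 1.0)
iou_b = np.clip(iou_a + rng.normal(0.02, 0.03, size=40), 0.0, 1.0)

# Test the paired differences for normality instead of each sample.
differences = iou_a - iou_b
shapiro_stat, shapiro_p = stats.shapiro(differences)

# Pick the paired test based on the normality check of the differences.
if shapiro_p > 0.05:
    _, p_value = stats.ttest_rel(iou_a, iou_b)
    chosen_test = "paired t-test"
else:
    _, p_value = stats.wilcoxon(iou_a, iou_b)
    chosen_test = "Wilcoxon signed-rank test"

print(f"Shapiro-Wilk on differences: p = {shapiro_p:.3f}")
print(f"{chosen_test}: p = {p_value:.3f}")
```

Both tests are paired tests, so they only make sense if the IOU lists of all experiments refer to the same images in the same order.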

Run the pairwise tests

In [None]:
keys = list(results.keys())
num_results = len(keys)
p_values_matrix = np.ones((num_results, num_results))  # initialize with 1; the diagonal is masked in the plot

# pairwise t-test or Wilcoxon test
for i in range(num_results):
    for j in range(i + 1, num_results):
        iou_1 = results[keys[i]]["intersection over union"]
        iou_2 = results[keys[j]]["intersection over union"]
        
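        # Paired t-test if all samples look normal, otherwise Wilcoxon signed-rank test
        if all_normal:
            _, p_value = stats.ttest_rel(iou_1, iou_2)
        else:
            _, p_value = stats.wilcoxon(iou_1, iou_2)

        # Fill both triangles: the plot masks the upper one and shows the lower one
        p_values_matrix[i, j] = p_value
        p_values_matrix[j, i] = p_value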

# Plot the p-values of all pairs
plt.figure(figsize=(10, 8))
mask = np.triu(np.ones_like(p_values_matrix, dtype=bool))  # hide diagonal and upper triangle
sns.heatmap(p_values_matrix, mask=mask, annot=True, fmt=".3f", cmap="coolwarm",
            xticklabels=keys, yticklabels=keys, cbar_kws={"label": "p-value"})

plt.title("Pairwise p-values for IOU differences")
plt.show()

How to interpret the half heatmap of p-values:

1. **Understanding p-values in the Context of Statistical Significance**:
   - Each cell in the heatmap represents the p-value from a pairwise statistical test comparing the Intersection over Union (IOU) results of two experiments.
   - A **p-value** is the probability of observing a difference at least as large as the measured one if the two experiments actually performed the same (the null hypothesis). Low p-values are evidence of a real difference, while high p-values mean the data do not provide enough evidence for one.

2. **Heatmap Color Coding**:
   - **Blue Cells (Low p-values)**: With the `coolwarm` colormap, the lowest p-values are drawn in blue. These cells mark pairs of experiments whose IOU results differ significantly. A p-value < 0.05 means that a difference this large would occur in fewer than 5% of cases if both experiments performed equally, so the experimental modification in that pair likely leads to a real change in IOU performance.
   - **Red Cells (High p-values)**: Cells with higher p-values (e.g., > 0.05) mark pairs of experiments whose difference in IOU is not statistically significant. The observed difference is compatible with random variation, and there is not enough evidence to conclude a true performance difference. Because the color scale adapts to the plotted values, read the annotated numbers rather than relying on color alone.

3. **Diagonal and Masked Elements**:
   - The **diagonal cells** (where each experiment is compared to itself) are masked because such a comparison carries no information; the matrix only holds the placeholder value 1.0 there.
   - The **upper half** of the matrix is masked to simplify the plot, as the p-value of a pair does not depend on the order of the two experiments; each pairwise comparison only needs to be shown once.

4. **Choosing Significance Thresholds**:
   - Typically, a **threshold of 0.05** is used to denote significance, but some researchers apply more stringent thresholds (e.g., 0.01 or 0.001) depending on their confidence requirements.
   - If many cells show p-values below 0.05, it suggests that many experimental adjustments genuinely affect IOU performance. Conversely, if most p-values are above 0.05, the experimental variations may not lead to meaningful differences in IOU performance. Keep in mind that with many pairwise tests, a few p-values below 0.05 are expected by chance alone.

5. **Further Considerations and Practical Implications**:
   - A **cluster of low p-values** for certain experiments suggests that those experiments have systematically different IOU results compared to the others. This helps identify which experimental adjustments are particularly impactful.
   - High p-values between multiple experiments imply that those experimental variations produce similar IOU results, indicating that changing those factors may not significantly impact model performance.
   - **Interpret the direction and magnitude** of significant differences separately from the p-values, as the heatmap itself only tells us whether the difference is statistically significant, not whether it improves or worsens IOU.

By using this heatmap interpretation, you can effectively determine which experiments lead to meaningful changes in performance, guiding future adjustments or fine-tuning based on significant outcomes.
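Because every pair of experiments is tested, the number of tests grows quickly, and some p-values fall below 0.05 by chance. One common remedy is the Holm-Bonferroni correction. The sketch below is an illustration, not part of the original analysis: the helper `holm_adjust` and the small demo matrix are assumptions, and the numbers are made up.

```python
import numpy as np


def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Multiply the k-th smallest p-value by (m - k) for k = 0 .. m - 1.
    scaled = p[order] * (m - np.arange(m))
    # Keep the adjusted values non-decreasing and cap them at 1.
    adjusted_sorted = np.minimum(np.maximum.accumulate(scaled), 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted


# Hypothetical symmetric matrix of pairwise p-values for three experiments.
demo_matrix = np.array([
    [1.00, 0.01, 0.04],
    [0.01, 1.00, 0.50],
    [0.04, 0.50, 1.00],
])

# Adjust each pair once (lower triangle), then mirror to the upper triangle.
rows, cols = np.tril_indices_from(demo_matrix, k=-1)
adjusted_matrix = demo_matrix.copy()
adjusted_matrix[rows, cols] = holm_adjust(demo_matrix[rows, cols])
adjusted_matrix[cols, rows] = adjusted_matrix[rows, cols]

print(adjusted_matrix)
```

The adjusted matrix can be plotted with the same heatmap call as the raw p-values; the 0.05 threshold then applies to the adjusted values.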

### Visualize Results

In [None]:
# Calculate mean IOU for each experiment
mean_results = {name: np.mean(data["intersection over union"]) for name, data in results.items()}

# Plot the mean results in a bar chart using subplots
fig, ax = plt.subplots(figsize=(10, 6))
ax.bar(list(mean_results.keys()), list(mean_results.values()), color="skyblue")
ax.set_xlabel("Experiment")
ax.set_ylabel("Mean IOU")
ax.set_title("Mean IOU Results by Experiment")
# Rotate the existing tick labels instead of re-setting them without fixed ticks
plt.setp(ax.get_xticklabels(), rotation=45, ha="right")

# Show the plot
plt.tight_layout()
plt.show()
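The mean IOU per experiment shows the level of each experiment, but not how two experiments differ image by image. A signed matrix of mean paired differences gives the direction and size of each pairwise change. This is a minimal sketch with made-up values; the experiment names `baseline` and `augmented` are hypothetical, and the lists are assumed to refer to the same images in the same order.

```python
import numpy as np

# Made-up per-image IOU values for two hypothetical experiments.
demo_results = {
    "baseline": [0.60, 0.70, 0.65, 0.72],
    "augmented": [0.66, 0.74, 0.70, 0.75],
}

names = list(demo_results)
num = len(names)
mean_diff = np.zeros((num, num))

for i in range(num):
    for j in range(num):
        iou_i = np.asarray(demo_results[names[i]])
        iou_j = np.asarray(demo_results[names[j]])
        # Positive entry: the row experiment has the higher IOU on average.
        mean_diff[i, j] = np.mean(iou_i - iou_j)

print(names)
print(mean_diff)
```

The matrix is antisymmetric, so a heatmap with a diverging colormap centered at 0 would be a natural way to display it.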

... more visualizations

---