# == **Determination of Optical Density from FIJI analysis** ==




## Dictionary construction and use

In this Colab cell you can upload a text file named "treatment_dict.txt" that contains a valid Python dictionary literal (e.g., {"339": "CTR", "340": "ABC"}).

The cell parses the file safely with ast.literal_eval, normalizes the result to dict[str, str], and then prints how many pairs were loaded plus a short sample.
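
The parse-and-normalize step can be sketched in isolation as follows. This is a minimal illustration, not the cell itself: the helper name `parse_treatment_text` and the sample string are assumptions made for the example.

```python
import ast

def parse_treatment_text(text: str) -> dict[str, str]:
    """Parse a dictionary literal and coerce keys and values to strings.

    Hypothetical helper for illustration; the notebook cell does the same
    steps inline after the upload.
    """
    # literal_eval only accepts Python literals, so arbitrary code is rejected
    data = ast.literal_eval(text.strip())
    if not isinstance(data, dict):
        raise ValueError("Expected a Python dictionary literal.")
    return {str(k): str(v) for k, v in data.items()}

# Mixed integer and string keys are both normalized to strings
example = parse_treatment_text('{339: "CTR", "340": "ABC"}')
print(example)
```

Because every key is converted with `str`, `339` and `"339"` end up as the same key, so a file should not list both forms for one animal.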

### How to create the file ("treatment_dict.txt")
1. Open a plain text editor (e.g., Notepad, VS Code).
2. Paste or create a valid Python dictionary literal with key–value pairs. Examples:
- {"339": "CTR", "340": "ABC"}
- {339: "CTR", 340: "ABC"} (numbers will be coerced to strings)

>Notes: use straight quotes (' or "), no trailing commas after the last item, and one top-level dictionary only.

3. Save the file exactly as "treatment_dict.txt" (UTF-8 encoding recommended).

### How to run

- Run the cell.
- Click “Choose Files”.
- Select the "treatment_dict.txt" file.

At the end, check the printed summary and sample.

In [None]:
from google.colab import files
import ast
from pprint import pprint
import os
import glob

# ---- settings ----
EXPECTED_FILENAME = "treatment_dict.txt"

basename, _ = os.path.splitext(EXPECTED_FILENAME)
for path in glob.glob(f"/content/{basename}*.txt"):
    try:
        os.remove(path)
        # print(f"Removed old file: {path}")  # optional, for debugging
    except FileNotFoundError:
        pass

# 1) Upload your .txt file (must be named treatment_dict.txt)
uploaded = files.upload()
if not uploaded:
    raise ValueError("No file uploaded.")
print(list(uploaded.keys()))

# Enforce the expected filename for clarity/reproducibility
if EXPECTED_FILENAME not in uploaded:
    available = ", ".join(uploaded.keys())
    raise ValueError(
        f"Expected a file named '{EXPECTED_FILENAME}', but got: {available or 'none'}.\n"
        f"Please rename your file to '{EXPECTED_FILENAME}' and try again."
    )

# 2) Read the uploaded file directly from memory (no disk I/O)
fname = EXPECTED_FILENAME
content_bytes = uploaded[fname]
# "utf-8-sig" also accepts files saved with a byte-order mark (e.g., by Notepad)
text = content_bytes.decode("utf-8-sig").strip()

# 3) Convert the Python literal to a dict (safe parse)
data = ast.literal_eval(text)
if not isinstance(data, dict):
    raise ValueError("The file must contain a Python dictionary literal, e.g., {'339':'CTR', ...}")

# 4) Optional normalization (ensures dict[str, str])
treatment_dict: dict[str, str] = {str(k): str(v) for k, v in data.items()}

print(f"Result: {len(treatment_dict)} pairs loaded from '{fname}'. Sample:")
for i, (k, v) in enumerate(treatment_dict.items()):
    print(f"{k} -> {v}")
    if i >= 4:
        break


Saving treatment_dict.txt to treatment_dict.txt
['treatment_dict.txt']
Result: 2 pairs loaded from 'treatment_dict.txt'. Sample:
rato1 -> CTR
Rato1 -> ABC


## Optical density determination
This pipeline lets you batch-process multiple ROI measurement files (tab-separated .csv/tsv with columns Label, Area, IntDen).

For each file, it extracts metadata from the Label (Sample ID, Tissue, Subarea, Imaging channel), maps each Sample to a treatment using the dictionary you loaded earlier (treatment_dict.txt), filters out channel c0, converts the metrics to numeric values, and computes the optical density, rounded to two decimals:

> DO = IntDen / Area

It then writes a processed Excel file for each input file.

It also builds a global concatenated file and a per-sample summary with weighted OD (DOpond = sum(IntDen)/sum(Area)).
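
The difference between the mean of per-ROI DO values and the weighted DOpond can be seen in a small worked example. The two ROI rows below are made-up numbers chosen only to make the arithmetic easy to follow; the column names match those used by the pipeline.

```python
import pandas as pd

# Two hypothetical ROIs from the same sample, with very different areas
rois = pd.DataFrame({
    "sample": ["339", "339"],
    "Area": [100.0, 300.0],
    "IntDen": [1000.0, 1500.0],
})

# Per-ROI optical density: 1000/100 and 1500/300
rois["DO"] = rois["IntDen"] / rois["Area"]

# Per-sample summary: sum the raw measurements, average the per-ROI DO
per_sample = (
    rois.groupby("sample")
    .agg(Area=("Area", "sum"), IntDen=("IntDen", "sum"), DO=("DO", "mean"))
    .reset_index()
)

# Weighted DO: total IntDen over total Area, so large ROIs count more
per_sample["DOpond"] = per_sample["IntDen"] / per_sample["Area"]
print(per_sample)
```

Here the plain mean of DO is (10 + 5) / 2 = 7.5, while DOpond is 2500 / 400 = 6.25, because the larger ROI carries three times the weight of the smaller one.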

Finally, it zips the output folder and triggers a download.

### Prerequisites

You must have run the previous cell so that treatment_dict, a dict[str, str] loaded from "treatment_dict.txt", is defined (e.g., {'339':'CTR','340':'LES',...}).
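
The way a sample name is matched against the dictionary can be sketched as below: an exact, case-insensitive match is tried first, then the first run of digits in the name. The dictionary contents and the function name `lookup_treatment` are assumptions for illustration only.

```python
import re

# Hypothetical dictionary; in the notebook this comes from treatment_dict.txt
treatment_dict = {"339": "CTR", "rat340": "ABC"}

# Index 1: lowercased full keys. Index 2: first run of digits in each key.
by_name = {k.strip().lower(): v for k, v in treatment_dict.items()}
by_digits = {}
for k, v in treatment_dict.items():
    m = re.search(r"\d+", k)
    if m:
        by_digits[m.group(0)] = v

def lookup_treatment(sample):
    """Return the treatment for a sample name, or "Unknown" if nothing matches."""
    if not sample:
        return "Unknown"
    key = str(sample).strip().lower()
    if key in by_name:                      # exact, case-insensitive match
        return by_name[key]
    m = re.search(r"\d+", key)              # fall back to the numeric ID
    if m and m.group(0) in by_digits:
        return by_digits[m.group(0)]
    return "Unknown"

results = [lookup_treatment(s) for s in ["339", "Rat340", "340", "M339", "999", None]]
print(results)
```

Since keys are lowercased before matching, two dictionary keys that differ only in case map to the same entry, and the later one overwrites the earlier one.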

Your input files must be tab-separated with columns Label, Area, IntDen.
The Label must end with ..._c1 or ..._c2 so the script can infer the channel and assign protein names.
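
As a rough sketch of how such a Label is split, assume a label of the form `image.tif:subarea:sample_Tissue_c1`. The example label, the tissue list, and the helper name `parse_label` are hypothetical; the pipeline cell applies the same rules inside its own helper.

```python
import re

def parse_label(label, valid_tissues):
    """Split a Label into sample, Tissue, Subarea and channel (illustrative only)."""
    colon_parts = label.split(":")
    useful = colon_parts[-1]                 # e.g. "339_CA1_c1"
    tokens = useful.split("_")
    has_underscore = len(tokens) >= 2

    # Sample ID is the text before the first underscore
    sample = tokens[0] if has_underscore else None
    # Tissue is the second token, kept only if it is in the accepted list
    tissue = tokens[1] if has_underscore and tokens[1] in valid_tissues else None
    # Subarea is the token between the two colons, when both colons exist
    subarea = colon_parts[1].strip() if len(colon_parts) >= 3 else ""
    # Channel is read from the end of the label
    match = re.search(r"c([12])$", useful)
    channel = f"c{match.group(1)}" if match else None

    return {"sample": sample, "Tissue": tissue, "Subarea": subarea, "channel": channel}

info = parse_label("slice01.tif:SO:339_CA1_c1", {"CA1", "CA3"})
no_channel = parse_label("339_CA3_c0", {"CA1", "CA3"})
print(info)
print(no_channel)
```

A label ending in c0 yields no channel here, which is consistent with the pipeline discarding c0 rows.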

### How to use

1. Run the cell.
2. When prompted, type:
   - An output folder name.
   - The list of valid tissues/structures (Tissue), comma-separated (e.g., CA1,CA3,dDG,vDG).
   - (Optional) valid sub-areas (comma-separated).
   - The protein name shown in green (channel c1).
   - The protein name shown in red (channel c2).
3. Upload one or more tab-separated files when asked.
4. Check the console messages for success/validation.
5. Download the final ZIP with all generated Excel files.

In [None]:
import pandas as pd
import os
import re
from google.colab import files
import shutil

# =====================================================
# 1. Global inputs
# =====================================================

output_folder = input("Enter the name of the folder where the files will be saved: ")
output_dir = f"/content/{output_folder}"
os.makedirs(output_dir, exist_ok=True)

Tissue_input = input("Enter all valid areas/structures (Tissue), comma-separated (e.g., CA1,CA3,dDG,vDG): ")
valid_Tissue_options = [x.strip() for x in Tissue_input.split(",") if x.strip()]

subarea_input = input("Enter valid sub-areas (between the two ':'), comma-separated (or leave empty): ")
valid_subarea_options = [x.strip() for x in subarea_input.split(",") if x.strip()]

protein_c1 = input("Define the protein visualized in green (channel c1): ").strip()
protein_c2 = input("Define the protein visualized in red (channel c2): ").strip()

print("Upload the files (tab-separated .csv/tsv with columns Label, Area, IntDen):")
uploaded = files.upload()
processed_dfs = []

# =====================================================
# 2. Normalize dictionary keys
# =====================================================

trat_by_name = {str(k).strip().lower(): v for k, v in treatment_dict.items()}
trat_by_digits = {}
for k, v in treatment_dict.items():
    m = re.search(r'\d+', str(k))
    if m:
        trat_by_digits[m.group(0)] = v

# =====================================================
# 3. Helper functions
# =====================================================

def extract_label_columns(label):
    parts_colon = label.split(":")
    useful_part = parts_colon[-1]
    out = {}

    # sample
    if "_" in useful_part:
        out["sample"] = useful_part.split("_", 1)[0]
    else:
        out["sample"] = None

    # Tissue
    parts = useful_part.split("_")
    if len(parts) >= 2:
        Tissue_candidate = parts[1]
        if Tissue_candidate in valid_Tissue_options:
            out["Tissue"] = Tissue_candidate
        else:
            out["Tissue"] = None
    else:
        out["Tissue"] = None

    # Subarea (token between '...tif:' and ':sample...')
    subarea_val = ""
    if len(parts_colon) >= 3:
        candidate = parts_colon[1].strip()
        if valid_subarea_options:
            if candidate in valid_subarea_options:
                subarea_val = candidate
        else:
            subarea_val = candidate
    out["Subarea"] = subarea_val

    # channels / protein
    if re.search(r'c1$', useful_part):
        out["chanel1"] = "c1"
        out["chanel2"] = None
        out["PROT"] = protein_c1
    elif re.search(r'c2$', useful_part):
        out["chanel1"] = None
        out["chanel2"] = "c2"
        out["PROT"] = protein_c2
    else:
        out["chanel1"] = None
        out["chanel2"] = None
        out["PROT"] = None

    return pd.Series(out)

def map_treatment_by_mouse(sample_val):
    if not sample_val:
        return "Unknown"
    key = str(sample_val).strip().lower()
    if key in trat_by_name:
        return trat_by_name[key]
    m = re.search(r'\d+', key)
    if m and m.group(0) in trat_by_digits:
        return trat_by_digits[m.group(0)]
    return "Unknown"

# =====================================================
# 4. Process each file
# =====================================================

for filename in uploaded.keys():
    print(f"\nProcessing: {filename}")

    try:
        df = pd.read_csv(filename, sep="\t", encoding="ISO-8859-1", dtype=str)
        print("Read successfully.")
    except Exception as e:
        print(f"Error reading file: {e}")
        continue

    required_columns = {"Label", "Area", "IntDen"}
    if not required_columns.issubset(df.columns):
        print(f"Missing required columns: {required_columns - set(df.columns)}")
        continue

    df_extractions = df["Label"].apply(extract_label_columns)
    df = pd.concat([df, df_extractions], axis=1)
    df["TRAT"] = df["sample"].apply(map_treatment_by_mouse)
    df = df[~df["Label"].str.endswith("c0", na=False)]
    df["Area"] = pd.to_numeric(df["Area"], errors="coerce")
    df["IntDen"] = pd.to_numeric(df["IntDen"], errors="coerce")
    df["DO"] = (df["IntDen"] / df["Area"]).round(2)

    base_name = os.path.splitext(filename)[0]
    output_path = os.path.join(output_dir, base_name + "_processed.xlsx")
    df.to_excel(output_path, index=False)
    print(f"Saved: {output_path}")

    df_final = df[["sample", "Tissue", "Subarea", "PROT", "Area", "IntDen", "DO", "TRAT"]]
    processed_dfs.append(df_final)

# =====================================================
# 5. Global post-processing
# =====================================================

if processed_dfs:
    df_concatenated = pd.concat(processed_dfs, ignore_index=True)
    concat_path = os.path.join(output_dir, "all_concatenated.xlsx")
    df_concatenated.to_excel(concat_path, index=False)
    print(f"\nConcatenated file saved to: {concat_path}")

    df_by_mouse = (
        df_concatenated
        .groupby(["sample", "Tissue", "Subarea", "PROT", "TRAT"])
        .agg({
            "Area": "sum",
            "IntDen": "sum",
            "DO": "mean"
        })
        .reset_index()
    )
    df_by_mouse["DOpond"] = df_by_mouse["IntDen"] / df_by_mouse["Area"]
    by_mouse_path = os.path.join(output_dir, "by_mouse.xlsx")
    df_by_mouse.to_excel(by_mouse_path, index=False)
    print(f"Per-mouse file saved to: {by_mouse_path}")
else:
    print("No valid file was processed.")

# =====================================================
# 6. Final ZIP + download
# =====================================================

zip_path = shutil.make_archive(
    base_name=output_dir,
    format='zip',
    root_dir=output_dir
)
files.download(zip_path)
print(f"\nZIP download started: {os.path.basename(zip_path)}")


## DO and DOpond Visualization

This cell generates **bar and box plots** from the **`by_mouse.xlsx`** file (output of the previous pipeline).

It visualizes **Optical Density (DO)** and **Weighted Optical Density (DOpond)** by experimental group (*TRAT*) and by brain area (*Tissue*) or subarea (*Subarea*, if available).


### Instructions

1. **Upload** your `by_mouse.xlsx` file when prompted.  
2. **Enter** a name for the output folder.  
3. The script will:
   - Load and summarize the data.  
   - Create **colorblind-friendly** bar and box plots.  
   - Display plots inline for review.  
   - Save all `.png` files and a `.zip` archive for download.

You can re-run this cell anytime to adjust colors, styles, or layouts interactively.
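
The summary step behind the bar plots (mean, SD, and n per Tissue and TRAT) can be sketched on a tiny table. The values below are invented for illustration; only the column names follow `by_mouse.xlsx`.

```python
import pandas as pd

# Hypothetical per-mouse values: two animals per group in one tissue
by_mouse = pd.DataFrame({
    "Tissue": ["CA1", "CA1", "CA1", "CA1"],
    "TRAT": ["CTR", "CTR", "ABC", "ABC"],
    "DOpond": [10.0, 12.0, 20.0, 24.0],
})

# One row per (Tissue, TRAT) with mean, sample standard deviation and count
summary = (
    by_mouse.groupby(["Tissue", "TRAT"])["DOpond"]
    .agg(["mean", "std", "count"])
    .rename(columns={"count": "n"})
    .reset_index()
)
print(summary)
```

The `std` column is the sample standard deviation (pandas uses ddof=1 by default), so a group with a single animal gets NaN and its bar is drawn without an error bar.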


In [None]:
import os
from pathlib import Path
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import shutil
from google.colab import files

# 1) Upload
print("Please upload your 'by_mouse.xlsx' file:")
uploaded = files.upload()
if not uploaded:
    raise ValueError("No file uploaded.")
xlsx_name, xlsx_bytes = next(iter(uploaded.items()))
if not xlsx_name.lower().endswith(".xlsx"):
    raise ValueError("Please upload an .xlsx file named by_mouse.xlsx.")
with open("/content/by_mouse.xlsx", "wb") as f:
    f.write(xlsx_bytes)

# 2) Output folder
output_folder = input("Enter the name of the folder where plots will be saved: ")
base_dir = Path(f"/content/{output_folder}").resolve()
plots_dir = base_dir / "plots"
base_dir.mkdir(parents=True, exist_ok=True)
plots_dir.mkdir(parents=True, exist_ok=True)

# 3) Load data
df = pd.read_excel("/content/by_mouse.xlsx")
df.columns = [str(c).strip() for c in df.columns]
required = {"sample", "Tissue", "TRAT", "DO", "DOpond"}
missing = required - set(df.columns)
if missing:
    raise ValueError(f"Missing required columns: {missing}")
has_subarea = "Subarea" in df.columns

# 4) Palette (colorblind-friendly: Okabe–Ito)
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#999999",  # grey
]

group_order = sorted(df["TRAT"].astype(str).unique())
def build_color_map(groups, palette):
    # cycle if there are more groups than colors
    colors = [palette[i % len(palette)] for i in range(len(groups))]
    return {g: c for g, c in zip(groups, colors)}

COLOR_MAP = build_color_map(group_order, OKABE_ITO)

# 5) Helpers
def agg_summary(frame, index_cols, value_col):
    g = (
        frame.groupby(index_cols, dropna=False)[value_col]
        .agg(["mean", "std", "count"])
        .rename(columns={"count": "n"})
        .reset_index()
    )
    return g

def bar_with_err_colored(summary, x_col, hue_col, value_label, title, outfile, color_map):
    cats = summary[x_col].astype(str).unique().tolist()
    hues = [h for h in group_order if h in summary[hue_col].astype(str).unique().tolist()]
    x = np.arange(len(cats))
    width = 0.8 / max(len(hues), 1)

    plt.figure(figsize=(10, 5))
    for i, h in enumerate(hues):
        sub = summary[summary[hue_col].astype(str) == h]
        means, stds = [], []
        for c in cats:
            row = sub[sub[x_col].astype(str) == c]
            means.append(float(row["mean"].values[0]) if not row.empty else np.nan)
            stds.append(float(row["std"].values[0]) if not row.empty else np.nan)
        plt.bar(
            x + i*width - (len(hues)-1)*width/2,
            means,
            width=width,
            yerr=stds,
            capsize=4,
            label=h,
            color=color_map.get(h)
        )

    plt.xticks(x, cats, rotation=0)
    plt.ylabel(value_label)
    plt.title(title)
    plt.legend(title=hue_col, loc="best")
    plt.tight_layout()
    # Save before showing: in notebooks plt.show() can leave an empty figure behind
    plt.savefig(outfile, dpi=150, bbox_inches="tight")
    plt.show()
    plt.close()

def box_per_category_colored(frame, cat_col, hue_col, value_col, title, outfile_stem, color_map):
    for cat in frame[cat_col].astype(str).unique():
        dfc = frame[frame[cat_col].astype(str) == cat]
        labels, groups_data, colors = [], [], []
        for h in group_order:
            vals = pd.to_numeric(
                dfc[dfc[hue_col].astype(str) == h][value_col],
                errors="coerce"
            ).dropna().values
            if vals.size > 0:
                labels.append(h)
                groups_data.append(vals)
                colors.append(color_map[h])
        if not groups_data:
            continue

        plt.figure(figsize=(8, 5))
        bplot = plt.boxplot(groups_data, labels=labels, showfliers=True, patch_artist=True)
        for patch, c in zip(bplot["boxes"], colors):
            patch.set_facecolor(c)
            patch.set_alpha(0.7)
        for element in ["medians", "whiskers", "caps"]:
            for line in bplot[element]:
                line.set_color("black")
                line.set_linewidth(1.2)

        legend_handles = [Patch(facecolor=color_map[g], edgecolor="black", label=g, alpha=0.7) for g in labels]
        plt.legend(handles=legend_handles, title=hue_col, loc="best")
        plt.ylabel(value_col)
        plt.title(f"{title} — {cat_col} = {cat}")
        plt.tight_layout()
        out = plots_dir / f"{outfile_stem}_{cat}.png"
        # Save before showing so the PNG is not written from an emptied figure
        plt.savefig(out, dpi=150, bbox_inches="tight")
        plt.show()
        plt.close()

# 6) Summaries
sum_Tissue_DO  = agg_summary(df, ["Tissue", "TRAT"], "DO")
sum_Tissue_DOp = agg_summary(df, ["Tissue", "TRAT"], "DOpond")
if has_subarea:
    sum_sub_DO  = agg_summary(df, ["Subarea", "TRAT"], "DO")
    sum_sub_DOp = agg_summary(df, ["Subarea", "TRAT"], "DOpond")

# 7) Plots (shown inline and saved)
bar_with_err_colored(sum_Tissue_DO,  "Tissue", "TRAT", "DO",
                     "DO by Tissue and Group (mean ± SD)",
                     plots_dir / "bar_DO_by_Tissue_TRAT.png",
                     COLOR_MAP)

bar_with_err_colored(sum_Tissue_DOp, "Tissue", "TRAT", "DOpond",
                     "Weighted DO (DOpond) by Tissue and Group (mean ± SD)",
                     plots_dir / "bar_DOpond_by_Tissue_TRAT.png",
                     COLOR_MAP)

box_per_category_colored(df, "Tissue", "TRAT", "DO",
                         "DO by Group (boxplot)",
                         "box_DO_by_TRAT_Tissue",
                         COLOR_MAP)

box_per_category_colored(df, "Tissue", "TRAT", "DOpond",
                         "Weighted DO (DOpond) by Group (boxplot)",
                         "box_DOpond_by_TRAT_Tissue",
                         COLOR_MAP)

if has_subarea:
    bar_with_err_colored(sum_sub_DO,  "Subarea", "TRAT", "DO",
                         "DO by Subarea and Group (mean ± SD)",
                         plots_dir / "bar_DO_by_Subarea_TRAT.png",
                         COLOR_MAP)

    bar_with_err_colored(sum_sub_DOp, "Subarea", "TRAT", "DOpond",
                         "Weighted DO (DOpond) by Subarea and Group (mean ± SD)",
                         plots_dir / "bar_DOpond_by_Subarea_TRAT.png",
                         COLOR_MAP)

    box_per_category_colored(df, "Subarea", "TRAT", "DO",
                             "DO by Group (boxplot)",
                             "box_DO_by_TRAT_Subarea",
                             COLOR_MAP)

    box_per_category_colored(df, "Subarea", "TRAT", "DOpond",
                             "Weighted DO (DOpond) by Group (boxplot)",
                             "box_DOpond_by_TRAT_Subarea",
                             COLOR_MAP)

# 8) ZIP download
zip_path = shutil.make_archive(
    base_name=str(plots_dir),
    format="zip",
    root_dir=str(plots_dir)
)
files.download(zip_path)
print(f"ZIP download started: {os.path.basename(zip_path)}")
