# SLEAP-Roots Processing with sleap-vizmo

This notebook demonstrates how to:
1. Load SLEAP files using sleap-io
2. Split multi-video labels into individual files
3. Save files with correct naming for sleap-roots Series
4. Process with MultipleDicotPipeline to get traits for multiple plants
5. Generate CSV files with summary statistics for each series

## Output Files Created:
- **Individual series files**: `{series_name}_all_plants_traits.csv` - Summary statistics for each series
- **Combined summary**: `series_summary_statistics_{timestamp}.csv` - All series statistics in one file
- **Final output**: `final_series_summary_with_metadata_{timestamp}.csv` - Summary statistics + metadata (genotype, replicate, etc.)

Note: These files contain SUMMARY STATISTICS (min/max/mean/median/std/percentiles) aggregated at the series level, not individual plant measurements.
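
To make the distinction concrete, the sketch below collapses one trait measured on several plants into the kind of suffixed columns these files contain (`_min`, `_max`, `_mean`, `_median`, `_std`, `_p5`, `_p25`, `_p75`, `_p95`). The `summarize_trait` helper and the per-plant values are hypothetical, and the percentile and standard-deviation conventions shown are NumPy defaults, which is an assumption rather than a statement about how sleap-roots computes them.

```python
import numpy as np


def summarize_trait(values, trait_name):
    """Collapse per-plant values of one trait into series-level statistics."""
    arr = np.asarray(values, dtype=float)
    stats = {
        f"{trait_name}_min": float(np.min(arr)),
        f"{trait_name}_max": float(np.max(arr)),
        f"{trait_name}_mean": float(np.mean(arr)),
        f"{trait_name}_median": float(np.median(arr)),
        # Assumption: population standard deviation (NumPy default, ddof=0)
        f"{trait_name}_std": float(np.std(arr)),
    }
    # Assumption: linear-interpolation percentiles (NumPy default)
    for p in (5, 25, 75, 95):
        stats[f"{trait_name}_p{p}"] = float(np.percentile(arr, p))
    return stats


# Hypothetical lateral root counts for four plants in one series
example = summarize_trait([11, 14, 16, 20], "lateral_count")
for column, value in example.items():
    print(f"{column}: {value}")
```

One series therefore becomes one row of nine columns per trait, regardless of how many plants it contains.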

In [1]:
# Import required libraries
import sleap_io as sio
import sleap_roots as sr
from sleap_roots.trait_pipelines import MultipleDicotPipeline
from sleap_vizmo.roots_utils import (
    split_labels_by_video,
    save_individual_video_labels,
    validate_series_compatibility,
    create_series_name_from_video
)
from sleap_vizmo.json_utils import (
    ensure_json_serializable,
    save_json,
    validate_json_serializable
)
from sleap_vizmo.sleap_roots_processing import (
    create_expected_count_csv,
    move_output_files_to_directory,
    combine_trait_csvs,
    merge_traits_with_expected_counts,
    create_processing_summary
)
from pathlib import Path
from datetime import datetime
import pandas as pd
import json

### Important: Restart Kernel

If you encounter JSON serialization errors with numpy types (e.g., `TypeError: Object of type int64 is not JSON serializable`), restart the kernel and re-run the import cell so that the current versions of the `sleap_vizmo` modules are loaded.

The `sleap_vizmo` package now includes:
- `json_utils`: Functions to handle numpy/pandas types in JSON
- `sleap_roots_processing`: High-level functions for processing SLEAP-roots data
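
For context on the error above, the standard `json` module cannot encode NumPy scalars such as `int64` or NumPy arrays. The sketch below shows one common workaround using the `default` hook of `json.dumps`. The `to_builtin` helper is hypothetical and is not the `json_utils` implementation.

```python
import json

import numpy as np


def to_builtin(obj):
    """Fallback encoder: convert NumPy objects to plain Python equivalents."""
    if isinstance(obj, np.generic):   # NumPy scalars (int64, float32, bool_, ...)
        return obj.item()
    if isinstance(obj, np.ndarray):   # arrays become nested lists
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Hypothetical payload mixing NumPy and built-in types
payload = {"plant_count": np.int64(8), "lengths": np.array([1.5, 2.0])}
text = json.dumps(payload, default=to_builtin)
print(text)
```

`json.dumps` only calls `default` for objects it cannot encode itself, so built-in values pass through unchanged.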

## 1. Load Test SLEAP Files

In [2]:
# Define paths to test data
test_data_dir = Path("tests/data")
lateral_file = test_data_dir / "lateral_root_MK22_Day14_labels.v002.slp"
primary_file = test_data_dir / "primary_root_MK22_Day14_labels.v003.slp"

# Load the SLEAP files
print("Loading SLEAP files...")
lateral_labels = sio.load_slp(lateral_file)
primary_labels = sio.load_slp(primary_file)

print(f"Lateral labels: {len(lateral_labels)} frames, {len(lateral_labels.videos)} videos")
print(f"Primary labels: {len(primary_labels)} frames, {len(primary_labels.videos)} videos")

Loading SLEAP files...
Lateral labels: 23 frames, 23 videos
Primary labels: 23 frames, 23 videos


## 2. Validate Series Compatibility

In [3]:
# Check if labels are compatible with Series requirements
lateral_compat = validate_series_compatibility(lateral_labels)
primary_compat = validate_series_compatibility(primary_labels)

print("Lateral labels compatibility:")
print(f"  Compatible: {lateral_compat['is_compatible']}")
if lateral_compat['warnings']:
    print(f"  Warnings: {lateral_compat['warnings']}")
if lateral_compat['errors']:
    print(f"  Errors: {lateral_compat['errors']}")

print("\nPrimary labels compatibility:")
print(f"  Compatible: {primary_compat['is_compatible']}")
if primary_compat['warnings']:
    print(f"  Warnings: {primary_compat['warnings']}")
if primary_compat['errors']:
    print(f"  Errors: {primary_compat['errors']}")

Lateral labels compatibility:
  Compatible: True

Primary labels compatibility:
  Compatible: True


## 3. Split Labels by Video and Save with Proper Naming

In [4]:
# Create timestamped output directory
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
output_dir = Path("output") / f"sleap_roots_processing_{timestamp}"
output_dir.mkdir(parents=True, exist_ok=True)

print(f"Output directory: {output_dir}")

# Split labels by video if needed
lateral_split = split_labels_by_video(lateral_labels)
primary_split = split_labels_by_video(primary_labels)

print(f"\nLateral labels split into {len(lateral_split)} video(s)")
print(f"Primary labels split into {len(primary_split)} video(s)")

Output directory: output/sleap_roots_processing_20250804_114505_378481

Lateral labels split into 23 video(s)
Primary labels split into 23 video(s)


In [5]:
# Save individual video labels with proper naming for Series.load
# The naming convention should make it clear which are lateral vs primary

series_data = {}  # Will store series names and their file paths

# Process lateral roots
print("\nSaving lateral root files...")
for video_name, labels in lateral_split.items():
    series_name = create_series_name_from_video(video_name)
    if series_name not in series_data:
        series_data[series_name] = {}
    
    # Save with .lateral suffix to identify root type
    output_path = output_dir / f"{series_name}.lateral.slp"
    labels.save(str(output_path))
    series_data[series_name]['lateral_path'] = str(output_path)
    print(f"  Saved: {output_path.name}")

# Process primary roots
print("\nSaving primary root files...")
for video_name, labels in primary_split.items():
    series_name = create_series_name_from_video(video_name)
    if series_name not in series_data:
        series_data[series_name] = {}
    
    # Save with .primary suffix to identify root type
    output_path = output_dir / f"{series_name}.primary.slp"
    labels.save(str(output_path))
    series_data[series_name]['primary_path'] = str(output_path)
    print(f"  Saved: {output_path.name}")

print(f"\nTotal series to process: {len(series_data)}")


Saving lateral root files...
  Saved: F_Ac_set1_day14_20250527_102755_001.lateral.slp
  Saved: F_Cp_set1_day14_20250527_102755_002.lateral.slp
  Saved: F_De_set1_day14_20250527_102755_003.lateral.slp
  Saved: F_DhA_set1_day14_20250527_102755_004.lateral.slp
  Saved: F_DhD_set1_day14_20250527_102755_005.lateral.slp
  Saved: F_Fo_set1_day14_20250527_102755_006.lateral.slp
  Saved: F_Gr_set1_day14_20250527_102955_007.lateral.slp
  Saved: no_peptide1_set1_day14_20250527_102955_010.lateral.slp
  Saved: no_peptide2_set1_day14_20250527_102955_011.lateral.slp
  Saved: OG_Ac_set2_day14_20250527_103422_014.lateral.slp
  Saved: OG_Cp_set2_day14_20250527_103422_015.lateral.slp
  Saved: OG_De_set2_day14_20250527_103422_016.lateral.slp
  Saved: OG_DhA_set2_day14_20250527_103422_017.lateral.slp
  Saved: OG_DhB_set2_day14_20250527_103422_018.lateral.slp
  Saved: OG_DhD_set2_day14_20250527_103618_019.lateral.slp
  Saved: OG_Fo_set1_day14_20250527_102955_012.lateral.slp
  Saved: OG_Gr_set2_day14_202505
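
The `{series_name}.lateral.slp` / `{series_name}.primary.slp` naming used above is what allows the two files of a series to be paired later. A minimal sketch of that pairing logic follows. `group_slp_paths` is a hypothetical helper written for illustration and is independent of the sleap-roots loader; the file names in the demo are made up.

```python
from collections import defaultdict
from pathlib import Path

ROOT_TYPES = ("primary", "lateral", "crown")


def group_slp_paths(paths):
    """Group `<series_name>.<root_type>.slp` paths by series name."""
    grouped = defaultdict(dict)
    for path in map(Path, paths):
        parts = path.name.rsplit(".", 2)  # [series_name, root_type, "slp"]
        if len(parts) != 3 or parts[2] != "slp" or parts[1] not in ROOT_TYPES:
            continue  # skip files that do not follow the convention
        series_name, root_type, _ = parts
        grouped[series_name][root_type] = path
    return dict(grouped)


# Hypothetical file names following the convention
demo = group_slp_paths([
    "out/plateA_001.lateral.slp",
    "out/plateA_001.primary.slp",
    "out/plateB_002.primary.slp",
    "out/notes.txt",
])
for name, files in demo.items():
    print(name, sorted(files))
```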

## 4. Load Series and Process with MultipleDicotPipeline

In [6]:
# Find all slp files in the folder
all_slps = sr.find_all_slp_paths(output_dir)

# Load the cylinder series using slp paths
all_series = sr.load_series_from_slps(slp_paths=all_slps, h5s=False)
print(f"Loaded {len(all_series)} series")
all_series

Loaded 23 series


[Series(series_name='OG_DhD_set2_day14_20250527_103618_019', h5_path=None, primary_path='output/sleap_roots_processing_20250804_114505_378481/OG_DhD_set2_day14_20250527_103618_019.primary.slp', lateral_path='output/sleap_roots_processing_20250804_114505_378481/OG_DhD_set2_day14_20250527_103618_019.lateral.slp', crown_path=None, primary_labels=Labels(labeled_frames=1, videos=1, skeletons=1, tracks=0, suggestions=0, sessions=0), lateral_labels=Labels(labeled_frames=1, videos=1, skeletons=1, tracks=0, suggestions=0, sessions=0), crown_labels=None, video=None, csv_path=None),
 Series(series_name='OG_Gr_set2_day14_20250527_103618_020', h5_path=None, primary_path='output/sleap_roots_processing_20250804_114505_378481/OG_Gr_set2_day14_20250527_103618_020.primary.slp', lateral_path='output/sleap_roots_processing_20250804_114505_378481/OG_Gr_set2_day14_20250527_103618_020.lateral.slp', crown_path=None, primary_labels=Labels(labeled_frames=1, videos=1, skeletons=1, tracks=0, suggestions=0, sessio

In [7]:
# Create expected count CSV using the utility function
# Returns both the dataframe and the path to the saved CSV
expected_count_df, expected_count_path = create_expected_count_csv(all_series, series_data, output_dir)

# Display the dataframe
display(expected_count_df[['plant_qr_code', 'genotype', 'replicate', 'number_of_plants_cylinder']].head(10))

OG_DhD_set2_day14_20250527_103618_019: 8 plants detected
OG_Gr_set2_day14_20250527_103618_020: 8 plants detected
OG_Ri1_set1_day14_20250527_102955_008: 7 plants detected
F_Ri_set3_day14_20250527_103956_025: 8 plants detected
OG_Fo_set1_day14_20250527_102955_012: 8 plants detected
F_Gr_set1_day14_20250527_102955_007: 7 plants detected
F_Cp_set1_day14_20250527_102755_002: 8 plants detected
OG_Cp_set2_day14_20250527_103422_015: 8 plants detected
no_peptide2_set1_day14_20250527_102955_011: 7 plants detected
F_Ac_set1_day14_20250527_102755_001: 7 plants detected
OG_DhA_set2_day14_20250527_103422_017: 8 plants detected
F_De_set1_day14_20250527_102755_003: 8 plants detected
no_peptide1_set1_day14_20250527_102955_010: 8 plants detected
OG_RiA4_set3_day14_20250527_103956_027: 8 plants detected
OG_Ri2_set1_day14_20250527_102955_009: 8 plants detected
F_DhD_set1_day14_20250527_102755_005: 7 plants detected
OG_DhB_set2_day14_20250527_103422_018: 8 plants detected
F_DhA_set1_day14_20250527_102755_0

| | plant_qr_code | genotype | replicate | number_of_plants_cylinder |
|---|---|---|---|---|
| 0 | OG_DhD_set2_day14_20250527_103618_019 | OG_DhD | 2 | 8 |
| 1 | OG_Gr_set2_day14_20250527_103618_020 | OG_Gr | 2 | 8 |
| 2 | OG_Ri1_set1_day14_20250527_102955_008 | OG_Ri1 | 1 | 7 |
| 3 | F_Ri_set3_day14_20250527_103956_025 | F_Ri | 3 | 8 |
| 4 | OG_Fo_set1_day14_20250527_102955_012 | OG_Fo | 1 | 8 |
| 5 | F_Gr_set1_day14_20250527_102955_007 | F_Gr | 1 | 7 |
| 6 | F_Cp_set1_day14_20250527_102755_002 | F_Cp | 1 | 8 |
| 7 | OG_Cp_set2_day14_20250527_103422_015 | OG_Cp | 2 | 8 |
| 8 | no_peptide2_set1_day14_20250527_102955_011 | no_peptide2 | 1 | 7 |
| 9 | F_Ac_set1_day14_20250527_102755_001 | F_Ac | 1 | 7 |
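
In the table above, `genotype` and `replicate` track the series name: `OG_DhD_set2_day14_...` has genotype `OG_DhD` and replicate `2`. The sketch below shows one way such fields could be parsed. The regular expression is inferred from the names shown in this notebook and is an assumption, not the logic used by `create_expected_count_csv`; `parse_series_name` is a hypothetical helper.

```python
import re

# Assumed layout: <genotype>_set<replicate>_day<day>_<date>_<time>_<index>
SERIES_PATTERN = re.compile(r"^(?P<genotype>.+)_set(?P<replicate>\d+)_day(?P<day>\d+)_")


def parse_series_name(series_name):
    """Extract genotype, replicate, and day from a series name."""
    match = SERIES_PATTERN.match(series_name)
    if match is None:
        raise ValueError(f"Unrecognized series name: {series_name!r}")
    return {
        "genotype": match["genotype"],
        "replicate": int(match["replicate"]),
        "day": int(match["day"]),
    }


print(parse_series_name("OG_DhD_set2_day14_20250527_103618_019"))
print(parse_series_name("no_peptide2_set1_day14_20250527_102955_011"))
```

Raising on names that do not match keeps a malformed file name from silently producing empty metadata.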


In [8]:
# Find all slp files in the folder
all_slps = sr.find_all_slp_paths(output_dir)

# Load the cylinder series using slp paths and the expected count CSV
print(f"\nLoading all series from {len(all_slps)} SLEAP files...")
all_series = sr.load_series_from_slps(slp_paths=all_slps, h5s=False, csv_path=expected_count_path)
print(f"Loaded {len(all_series)} series")
all_series


Loading all series from 46 SLEAP files...
Loaded 23 series


[Series(series_name='OG_DhD_set2_day14_20250527_103618_019', h5_path=None, primary_path='output/sleap_roots_processing_20250804_114505_378481/OG_DhD_set2_day14_20250527_103618_019.primary.slp', lateral_path='output/sleap_roots_processing_20250804_114505_378481/OG_DhD_set2_day14_20250527_103618_019.lateral.slp', crown_path=None, primary_labels=Labels(labeled_frames=1, videos=1, skeletons=1, tracks=0, suggestions=0, sessions=0), lateral_labels=Labels(labeled_frames=1, videos=1, skeletons=1, tracks=0, suggestions=0, sessions=0), crown_labels=None, video=None, csv_path='output/sleap_roots_processing_20250804_114505_378481/expected_plant_counts.csv'),
 Series(series_name='OG_Gr_set2_day14_20250527_103618_020', h5_path=None, primary_path='output/sleap_roots_processing_20250804_114505_378481/OG_Gr_set2_day14_20250527_103618_020.primary.slp', lateral_path='output/sleap_roots_processing_20250804_114505_378481/OG_Gr_set2_day14_20250527_103618_020.lateral.slp', crown_path=None, primary_labels=Lab

### Note on File Output Location

The sleap-roots `compute_multiple_dicots_traits` function writes its CSV and JSON files to the current working directory and does not honor an `output_dir` parameter. As a workaround, we let the files be created in the current directory and move them into our output folder once processing is complete.
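
The move step can be sketched with the standard library alone. `move_matching_files` below is a hypothetical stand-in for `move_output_files_to_directory`, whose actual implementation is not shown in this notebook; the demo runs in a temporary directory so it does not touch real output.

```python
import shutil
import tempfile
from pathlib import Path


def move_matching_files(source_dir, target_dir, patterns):
    """Move files in source_dir that match any glob pattern into target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for pattern in patterns:
        # sorted() materializes the matches before any file is moved
        for path in sorted(source_dir.glob(pattern)):
            if path.is_file():
                destination = target_dir / path.name
                shutil.move(str(path), str(destination))
                moved.append(destination)
    return moved


# Demonstration with made-up files in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "cwd"
    work.mkdir()
    (work / "seriesA_all_plants_traits.csv").write_text("series\nseriesA\n")
    (work / "seriesA_all_plants_traits.json").write_text("{}")
    (work / "unrelated.txt").write_text("left in place")
    moved = move_matching_files(
        work, Path(tmp) / "out", ["*_all_plants_traits.json", "*_all_plants_traits.csv"]
    )
    moved_names = sorted(p.name for p in moved)
    left_behind = sorted(p.name for p in work.iterdir())

print(moved_names, left_behind)
```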

In [9]:
# Initialize MultipleDicotPipeline
pipeline = MultipleDicotPipeline()
print(f"\nUsing pipeline: {pipeline.__class__.__name__}")

# The MultipleDicotPipeline expects an expected count CSV
# Let's use the one we just created
print(f"Using expected count CSV: {output_dir / 'expected_plant_counts.csv'}")

all_series_traits = []
# Process each series in turn; every Series carries the expected count CSV path
try:
    for series in all_series:
        # Compute traits for all plants in this series
        traits = pipeline.compute_multiple_dicots_traits(
            series=series,
            write_json=True,
            json_suffix="_all_plants_traits.json",
            write_csv=True,
            csv_suffix="_all_plants_traits.csv",
        )
        all_series_traits.append(traits)
        
except Exception as e:
    print(f"✗ Error computing traits: {e}")
    import traceback
    traceback.print_exc()

# Move the generated files to the output directory using utility function
file_patterns = ["*_all_plants_traits.json", "*_all_plants_traits.csv"]
moved_files = move_output_files_to_directory(output_dir, file_patterns)


Using pipeline: MultipleDicotPipeline
Using expected count CSV: output/sleap_roots_processing_20250804_114505_378481/expected_plant_counts.csv
Aggregated traits saved to OG_DhD_set2_day14_20250527_103618_019_all_plants_traits.json
Summary statistics saved to OG_DhD_set2_day14_20250527_103618_019_all_plants_traits.csv
Aggregated traits saved to OG_Gr_set2_day14_20250527_103618_020_all_plants_traits.json
Summary statistics saved to OG_Gr_set2_day14_20250527_103618_020_all_plants_traits.csv
Aggregated traits saved to OG_Ri1_set1_day14_20250527_102955_008_all_plants_traits.json
Summary statistics saved to OG_Ri1_set1_day14_20250527_102955_008_all_plants_traits.csv
Aggregated traits saved to F_Ri_set3_day14_20250527_103956_025_all_plants_traits.json
Summary statistics saved to F_Ri_set3_day14_20250527_103956_025_all_plants_traits.csv
Aggregated traits saved to OG_Fo_set1_day14_20250527_102955_012_all_plants_traits.json
Summary statistics saved to OG_Fo_set1_day14_20250527_102955_012_all_pl

## 5. Combine All Traits into Final CSV

In [10]:
# Save all_series_traits as a comprehensive JSON using our utility
all_traits_json_path = output_dir / "all_series_traits.json"

# Use the utility function that handles numpy types
save_json(all_series_traits, all_traits_json_path)
print(f"✅ All series traits saved to JSON: {all_traits_json_path}")

# Check the structure of the traits
print(f"\nNumber of series processed: {len(all_series_traits)}")
if all_series_traits:
    # Inspect the first entry to confirm each series yields a dict of results
    first_trait = all_series_traits[0]
    if isinstance(first_trait, dict):
        print(f"First series has keys: {list(first_trait.keys())[:5]}...")
    print(f"Type of first trait: {type(first_trait)}")

✅ All series traits saved to JSON: output/sleap_roots_processing_20250804_114505_378481/all_series_traits.json

Number of series processed: 23
First series has keys: ['series', 'group', 'qc_fail', 'traits', 'summary_stats']...
Type of first trait: <class 'dict'>


In [11]:
# Combine all individual CSV files using the utility function
# This creates a single CSV with summary statistics for all series
series_summary_df = combine_trait_csvs(output_dir, timestamp=timestamp)

if series_summary_df is not None:
    # Display first few rows
    print("\nFirst 5 rows of series summary dataframe:")
    display(series_summary_df.head())
    
    # Check unique series
    print(f"\nUnique series in data: {series_summary_df['series_name'].nunique()}")
    print("Summary statistics per series (each row = 1 series):")
    print(series_summary_df['series_name'].value_counts().sort_index())
    
    # Note: This dataframe contains SUMMARY STATISTICS (min/max/mean/etc) 
    # for each series, not individual plant measurements
    print("\nNote: Each row represents summary statistics for one series (multiple plants)")
else:
    print("No series summary data available")

Found 23 individual CSV files
  - OG_Ac_set2_day14_20250527_103422_014_all_plants_traits.csv: 1 plants
  - F_Ri_set3_day14_20250527_103956_025_all_plants_traits.csv: 1 plants
  - F_De_set1_day14_20250527_102755_003_all_plants_traits.csv: 1 plants
  - OG_Gr_set2_day14_20250527_103618_020_all_plants_traits.csv: 1 plants
  - OG_Ri2_set1_day14_20250527_102955_009_all_plants_traits.csv: 1 plants
  - OG_DhA_set2_day14_20250527_103422_017_all_plants_traits.csv: 1 plants
  - S_Ri_set2_day14_20250527_103422_013_all_plants_traits.csv: 1 plants
  - F_Cp_set1_day14_20250527_102755_002_all_plants_traits.csv: 1 plants
  - OG_Ri1_set1_day14_20250527_102955_008_all_plants_traits.csv: 1 plants
  - OG_DhB_set2_day14_20250527_103422_018_all_plants_traits.csv: 1 plants
  - OG_Mt_set2_day14_20250527_103618_021_all_plants_traits.csv: 1 plants
  - no_peptide2_set1_day14_20250527_102955_011_all_plants_traits.csv: 1 plants
  - F_Fo_set1_day14_20250527_102755_006_all_plants_traits.csv: 1 plants
  - OG_DhD_set2_

| | series | lateral_count_min | lateral_count_max | lateral_count_mean | lateral_count_median | lateral_count_std | lateral_count_p5 | lateral_count_p25 | lateral_count_p75 | lateral_count_p95 | ... | network_solidity_min | network_solidity_max | network_solidity_mean | network_solidity_median | network_solidity_std | network_solidity_p5 | network_solidity_p25 | network_solidity_p75 | network_solidity_p95 | series_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OG_Ac_set2_day14_20250527_103422_014 | 11.0 | 20.0 | 15.75 | 16.0 | 3.307189 | 11.35 | 12.75 | 18.5 | 20.0 | ... | 0.004725 | 0.01037 | 0.006198 | 0.005849 | 0.001651 | 0.004818 | 0.005407 | 0.006018 | 0.00896 | OG_Ac_set2_day14_20250527_103422_014 |
| 1 | F_Ri_set3_day14_20250527_103956_025 | 5.0 | 12.0 | 9.25 | 10.0 | 2.384848 | 5.35 | 8.25 | 11.0 | 11.65 | ... | 0.005757 | 0.014802 | 0.009497 | 0.00889 | 0.003046 | 0.006193 | 0.007091 | 0.010725 | 0.01446 | F_Ri_set3_day14_20250527_103956_025 |
| 2 | F_De_set1_day14_20250527_102755_003 | 3.0 | 14.0 | 9.25 | 10.0 | 3.072051 | 4.4 | 7.75 | 11.0 | 12.95 | ... | 0.006144 | 0.012018 | 0.009186 | 0.008604 | 0.001962 | 0.006618 | 0.007949 | 0.011132 | 0.011841 | F_De_set1_day14_20250527_102755_003 |
| 3 | OG_Gr_set2_day14_20250527_103618_020 | 4.0 | 12.0 | 8.375 | 9.0 | 2.446298 | 4.7 | 6.75 | 10.0 | 11.3 | ... | 0.004223 | 0.010542 | 0.006203 | 0.006136 | 0.001914 | 0.004259 | 0.004634 | 0.006743 | 0.00923 | OG_Gr_set2_day14_20250527_103618_020 |
| 4 | OG_Ri2_set1_day14_20250527_102955_009 | 8.0 | 17.0 | 12.625 | 13.0 | 2.496873 | 8.7 | 12.25 | 13.25 | 15.95 | ... | 0.004279 | 0.007874 | 0.00537 | 0.005118 | 0.001078 | 0.004378 | 0.00462 | 0.005486 | 0.007247 | OG_Ri2_set1_day14_20250527_102955_009 |



Unique series in data: 23
Summary statistics per series (each row = 1 series):
series_name
F_Ac_set1_day14_20250527_102755_001           1
F_Cp_set1_day14_20250527_102755_002           1
F_De_set1_day14_20250527_102755_003           1
F_DhA_set1_day14_20250527_102755_004          1
F_DhD_set1_day14_20250527_102755_005          1
F_Fo_set1_day14_20250527_102755_006           1
F_Gr_set1_day14_20250527_102955_007           1
F_Ri_set3_day14_20250527_103956_025           1
OG_Ac_set2_day14_20250527_103422_014          1
OG_Cp_set2_day14_20250527_103422_015          1
OG_De_set2_day14_20250527_103422_016          1
OG_DhA_set2_day14_20250527_103422_017         1
OG_DhB_set2_day14_20250527_103422_018         1
OG_DhD_set2_day14_20250527_103618_019         1
OG_Fo_set1_day14_20250527_102955_012          1
OG_Gr_set2_day14_20250527_103618_020          1
OG_Mt_set2_day14_20250527_103618_021          1
OG_Ri1_set1_day14_20250527_102955_008         1
OG_Ri2_set1_day14_20250527_102955_009       
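
The combining step amounts to stacking one-row CSVs and tagging each row with its series. The sketch below illustrates the idea with pandas. `combine_summary_csvs` is a hypothetical helper and not the `combine_trait_csvs` implementation; the series names and values in the demo are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd

SUFFIX = "_all_plants_traits.csv"  # same suffix passed as csv_suffix in this notebook


def combine_summary_csvs(directory):
    """Stack every per-series summary CSV in `directory` into one DataFrame."""
    frames = []
    for csv_path in sorted(Path(directory).glob(f"*{SUFFIX}")):
        frame = pd.read_csv(csv_path)
        # Record which series each row came from, derived from the file name
        frame["series_name"] = csv_path.name[: -len(SUFFIX)]
        frames.append(frame)
    # Mirror the notebook's convention of returning None when nothing is found
    return pd.concat(frames, ignore_index=True) if frames else None


# Demonstration with two made-up one-row CSVs
with tempfile.TemporaryDirectory() as tmp:
    for name, count_min in [("seriesA", 11.0), ("seriesB", 5.0)]:
        pd.DataFrame({"series": [name], "lateral_count_min": [count_min]}).to_csv(
            Path(tmp) / f"{name}{SUFFIX}", index=False
        )
    combined = combine_summary_csvs(tmp)

print(combined)
```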

### 5.1 Merge Traits with Expected Count Metadata

Combine the trait data with the expected count metadata to have all information in one place.

In [12]:
# Merge the series summary statistics with expected counts to have all metadata in one place
if series_summary_df is not None and len(series_summary_df) > 0:
    final_summary_df = merge_traits_with_expected_counts(
        traits_df=series_summary_df,
        expected_count_df=expected_count_df,
        output_dir=output_dir,
        timestamp=timestamp
    )
    
    # Display the first few rows of the merged data
    print("\nFirst 5 rows of final summary with metadata:")
    display(final_summary_df.head())
    
    # Show which columns we have
    print(f"\nFinal summary dataframe columns ({len(final_summary_df.columns)} total):")
    print("Metadata columns:")
    for col in ['series_name', 'plant_qr_code', 'genotype', 'replicate', 
                'number_of_plants_cylinder', 'primary_root_proofread', 'lateral_root_proofread']:
        if col in final_summary_df.columns:
            print(f"  - {col}")
    
    print("\nSummary statistic columns (first 10):")
    stat_cols = [col for col in final_summary_df.columns if col not in 
                  ['series_name', 'plant_qr_code', 'genotype', 'replicate', 
                   'number_of_plants_cylinder', 'primary_root_proofread', 'lateral_root_proofread']]
    for col in stat_cols[:10]:
        print(f"  - {col}")
    if len(stat_cols) > 10:
        print(f"  ... and {len(stat_cols) - 10} more summary statistic columns")
        
    print("\n📌 This is the FINAL OUTPUT: Summary statistics for each series with all metadata")
else:
    print("No series summary data available to merge with expected counts")


✅ Merged traits with metadata saved to: output/sleap_roots_processing_20250804_114505_378481/final_series_summary_with_metadata_20250804_114505_378481.csv
Total rows: 23
Total columns: 331

Plants per series:
  F_Ac_set1_day14_20250527_102755_001: 1 plants (expected: 7)
  F_Cp_set1_day14_20250527_102755_002: 1 plants (expected: 8)
  F_De_set1_day14_20250527_102755_003: 1 plants (expected: 8)
  F_DhA_set1_day14_20250527_102755_004: 1 plants (expected: 8)
  F_DhD_set1_day14_20250527_102755_005: 1 plants (expected: 7)
  F_Fo_set1_day14_20250527_102755_006: 1 plants (expected: 7)
  F_Gr_set1_day14_20250527_102955_007: 1 plants (expected: 7)
  F_Ri_set3_day14_20250527_103956_025: 1 plants (expected: 8)
  OG_Ac_set2_day14_20250527_103422_014: 1 plants (expected: 8)
  OG_Cp_set2_day14_20250527_103422_015: 1 plants (expected: 8)
  OG_De_set2_day14_20250527_103422_016: 1 plants (expected: 8)
  OG_DhA_set2_day14_20250527_103422_017: 1 plants (expected: 8)
  OG_DhB_set2_day14_20250527_103422_018

| | series_name | plant_qr_code | genotype | replicate | number_of_plants_cylinder | primary_root_proofread | lateral_root_proofread | series | lateral_count_min | lateral_count_max | ... | network_solidity_p75 | network_solidity_p95 | path | qc_cylinder | qc_code | Unnamed: 9 | Unnamed: 10 | Unnamed: 11 | Unnamed: 12 | Instructions |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OG_Ac_set2_day14_20250527_103422_014 | OG_Ac_set2_day14_20250527_103422_014 | OG_Ac | 2 | 8 | output/sleap_roots_processing_20250804_114505_... | output/sleap_roots_processing_20250804_114505_... | OG_Ac_set2_day14_20250527_103422_014 | 11.0 | 20.0 | ... | 0.006018 | 0.00896 | output/sleap_roots_processing_20250804_114505_... | 0 |  |  |  |  |  |  |
| 1 | F_Ri_set3_day14_20250527_103956_025 | F_Ri_set3_day14_20250527_103956_025 | F_Ri | 3 | 8 | output/sleap_roots_processing_20250804_114505_... | output/sleap_roots_processing_20250804_114505_... | F_Ri_set3_day14_20250527_103956_025 | 5.0 | 12.0 | ... | 0.010725 | 0.01446 | output/sleap_roots_processing_20250804_114505_... | 0 |  |  |  |  |  |  |
| 2 | F_De_set1_day14_20250527_102755_003 | F_De_set1_day14_20250527_102755_003 | F_De | 1 | 8 | output/sleap_roots_processing_20250804_114505_... | output/sleap_roots_processing_20250804_114505_... | F_De_set1_day14_20250527_102755_003 | 3.0 | 14.0 | ... | 0.011132 | 0.011841 | output/sleap_roots_processing_20250804_114505_... | 0 |  |  |  |  |  |  |
| 3 | OG_Gr_set2_day14_20250527_103618_020 | OG_Gr_set2_day14_20250527_103618_020 | OG_Gr | 2 | 8 | output/sleap_roots_processing_20250804_114505_... | output/sleap_roots_processing_20250804_114505_... | OG_Gr_set2_day14_20250527_103618_020 | 4.0 | 12.0 | ... | 0.006743 | 0.00923 | output/sleap_roots_processing_20250804_114505_... | 0 |  |  |  |  |  |  |
| 4 | OG_Ri2_set1_day14_20250527_102955_009 | OG_Ri2_set1_day14_20250527_102955_009 | OG_Ri2 | 1 | 8 | output/sleap_roots_processing_20250804_114505_... | output/sleap_roots_processing_20250804_114505_... | OG_Ri2_set1_day14_20250527_102955_009 | 8.0 | 17.0 | ... | 0.005486 | 0.007247 | output/sleap_roots_processing_20250804_114505_... | 0 |  |  |  |  |  |  |



Final summary dataframe columns (331 total):
Metadata columns:
  - series_name
  - plant_qr_code
  - genotype
  - replicate
  - number_of_plants_cylinder
  - primary_root_proofread
  - lateral_root_proofread

Summary statistic columns (first 10):
  - series
  - lateral_count_min
  - lateral_count_max
  - lateral_count_mean
  - lateral_count_median
  - lateral_count_std
  - lateral_count_p5
  - lateral_count_p25
  - lateral_count_p75
  - lateral_count_p95
  ... and 314 more summary statistic columns

📌 This is the FINAL OUTPUT: Summary statistics for each series with all metadata
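
The merge itself is a key-based join between the expected-count metadata and the per-series statistics. The sketch below shows one way to express it with `pandas.DataFrame.merge`, joining `plant_qr_code` to `series_name` (the two columns hold the same values in the output above). It is an illustration under that assumption, not the `merge_traits_with_expected_counts` implementation, and the rows are made up.

```python
import pandas as pd

# Made-up series-level statistics (one row per series)
traits = pd.DataFrame({
    "series_name": ["seriesA", "seriesB"],
    "lateral_count_mean": [15.75, 9.25],
})

# Made-up expected-count metadata; seriesC has no computed traits
expected = pd.DataFrame({
    "plant_qr_code": ["seriesA", "seriesB", "seriesC"],
    "genotype": ["gA", "gB", "gC"],
    "replicate": [1, 2, 1],
    "number_of_plants_cylinder": [8, 7, 8],
})

merged = expected.merge(
    traits,
    left_on="plant_qr_code",
    right_on="series_name",
    how="left",               # keep every expected series, even without traits
    validate="one_to_one",    # fail loudly if a series appears twice on either side
    indicator=True,           # adds a `_merge` column describing each row's origin
)

# Series listed in the metadata but missing from the trait summary
missing = merged.loc[merged["_merge"] == "left_only", "plant_qr_code"].tolist()
print(merged)
print("Series without traits:", missing)
```

Using `validate` and `indicator` makes duplicated or unmatched series visible immediately instead of surfacing later as unexpected row counts.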


In [13]:
# Recap of the files written by the preceding cells: the combined CSV is saved by
# combine_trait_csvs() and the merged CSV by merge_traits_with_expected_counts()
print("\n✅ All trait data has been processed and saved:")
print(f"  - Individual series CSVs: {len(all_series)} files")
print(f"  - Combined summary statistics: series_summary_statistics_{timestamp}.csv")
print(f"  - Final output with metadata: final_series_summary_with_metadata_{timestamp}.csv")


✅ All trait data has been processed and saved:
  - Individual series CSVs: 23 files
  - Combined summary statistics: series_summary_statistics_20250804_114505_378481.csv
  - Final output with metadata: final_series_summary_with_metadata_20250804_114505_378481.csv


## 6. Summary and Validation

In [14]:
# Create processing summary using the utility function
summary = create_processing_summary(
    timestamp=timestamp,
    output_dir=output_dir,
    input_files={
        "lateral": lateral_file,
        "primary": primary_file
    },
    all_series=all_series,
    expected_count_df=expected_count_df,
    series_summary_df=series_summary_df if 'series_summary_df' in locals() else None,
    all_traits_json_path=all_traits_json_path if 'all_traits_json_path' in locals() else None,
    series_summary_csv_path=output_dir / f"series_summary_statistics_{timestamp}.csv" if 'series_summary_df' in locals() else None
)

📓 Saved notebook snapshot (after execution): sleap_roots_processing_notebook_after_execution.ipynb
📄 Saved HTML version to: sleap_roots_processing_notebook_after_execution.html

📊 Processing Summary:
  - Series processed: 23
  - Expected plants: 178
  - Series with summary statistics: 23
  - Output directory: output/sleap_roots_processing_20250804_114505_378481
  - All traits JSON: all_series_traits.json
  - Series summary CSV: series_summary_statistics_20250804_114505_378481.csv
  - Summary saved to: processing_summary.json
