### Bartley Lab: Switchgrass Cross-Section Filtering Pipeline ###
**Author: Tianyi Chen, May 2025**

Objective: A program for the Bartley lab at WSU.
This program sifts through a folder of ~750 files (PNG images of switchgrass cross-sections) and extracts the most relevant images, the 4 × and 10 × magnifications. These images will be used to train a computer vision model that detects aerenchyma.

The program will:
• Delete all *.xml files and all 40 × magnification images (files whose names contain ‘40x’) in every sub-folder.
• Copy the remaining 4 × / 10 × PNGs into a single folder named
  ‘switchgrass_cross_sections(4x_10x)’ that sits beside the sub-folders.
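The filtering rule above can be summarized as a small classification function. This is a sketch that assumes magnification is encoded in the file name as "4x", "10x", or "40x", which are the same substrings the notebook code checks for. The name `classify_file` is hypothetical. Unlike the notebook, the sketch lowercases names before checking them.

```python
def classify_file(name: str) -> str:
    """Return 'delete', 'keep', or 'ignore' for a file name (hypothetical helper).

    Assumes magnification appears in the file name as "4x", "10x", or "40x".
    """
    lower = name.lower()
    # XML files and 40x images are removed from the sub-folders.
    if lower.endswith(".xml") or "40x" in lower:
        return "delete"
    # Remaining 4x/10x PNGs are the ones copied to the output folder.
    if lower.endswith(".png") and ("4x" in lower or "10x" in lower):
        return "keep"
    # Anything else (e.g. .DS_Store) is left untouched.
    return "ignore"


for example in ["leaf_4x.png", "leaf_10x.png", "leaf_40x.png", "leaf_4x.xml", ".DS_Store"]:
    print(example, "->", classify_file(example))
```

The order of the checks matters. "40x" is tested before "4x" and "10x" so that 40x images are always deleted, even if a name also happens to contain another tag.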

In [16]:
#import packages
import os
import shutil

In [17]:
# Helper function for folder path validation
def validatepath(folderpath):
    """Raise an error unless folderpath is an existing directory."""
    if not os.path.exists(folderpath):
        raise FileNotFoundError(f"Path does not exist: {folderpath}")
    elif not os.path.isdir(folderpath):
        raise NotADirectoryError(f"Path is not a directory: {folderpath}")
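The same check can also be written with `pathlib`. The following is a sketch with a hypothetical name, `validate_path`. It also returns the resolved absolute path, so later steps could build file paths from it instead of relying on `os.chdir`.

```python
import tempfile
from pathlib import Path


def validate_path(folderpath: str) -> Path:
    """Return folderpath as a resolved Path, or raise if it is not a directory."""
    path = Path(folderpath)
    if not path.exists():
        raise FileNotFoundError(f"Path does not exist: {path}")
    if not path.is_dir():
        raise NotADirectoryError(f"Path is not a directory: {path}")
    return path.resolve()


# Usage: a temporary directory passes, a file inside it is rejected.
with tempfile.TemporaryDirectory() as tmp:
    print(validate_path(tmp).is_dir())
    file_path = Path(tmp) / "image.png"
    file_path.touch()
    try:
        validate_path(str(file_path))
    except NotADirectoryError as err:
        print("Rejected:", err)
```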

In [18]:
# Ask the user for the folder path
folderpath = input("Input folder path:").strip()  # strip leading/trailing whitespace

# Validate the path
validatepath(folderpath)

# Change the working directory to that folder
os.chdir(folderpath)

Input folder path: /Users/tianyichen/Desktop/sg_cs copy
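The example path above contains a space (‘sg_cs copy’), which `input()` handles without trouble. Paths copied from a terminal, however, are sometimes wrapped in quotes, use backslash-escaped spaces, or start with `~`, and `os.path.exists` treats all of these literally. A minimal sketch of a normalizing step follows. The helper `clean_input_path` is hypothetical and assumes POSIX-style paths like the one shown.

```python
import os


def clean_input_path(raw: str) -> str:
    """Normalize a user-typed folder path (hypothetical helper, POSIX paths assumed)."""
    path = raw.strip()
    # Remove one pair of matching surrounding quotes, e.g. '/Users/me/sg_cs copy'.
    if len(path) >= 2 and path[0] == path[-1] and path[0] in "\"'":
        path = path[1:-1]
    # Undo backslash-escaped spaces such as those produced by terminal drag-and-drop.
    path = path.replace("\\ ", " ")
    # Expand "~" to the home directory and return an absolute path.
    return os.path.abspath(os.path.expanduser(path))


print(clean_input_path("  '/Users/me/Desktop/sg_cs copy'  "))
```

The result could then be passed to `validatepath` in place of the raw `input()` string.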


In [19]:
# Create the output folder that all useful cross-sections will be copied to
directory_name = "switchgrass_cross_sections(4x_10x)"
output_path = os.path.join(folderpath, directory_name)
os.makedirs(output_path, exist_ok=True)

# Get the names of the sub-folders
subfolders = os.listdir(folderpath)

# Loop through the sub-folders and process every file in each one
for subfoldername in subfolders:

    # Skip the output folder itself (copying its files onto themselves raises SameFileError)
    if subfoldername == directory_name:
        continue

    subfolder_path = os.path.join(folderpath, subfoldername)

    # Skip anything that is not a folder, such as the hidden .DS_Store file
    if not os.path.isdir(subfolder_path):
        continue

    # Delete all XML files and all 40x images
    for name in os.listdir(subfolder_path):
        if name.endswith(".xml") or "40x" in name:
            file_path = os.path.join(subfolder_path, name)
            if os.path.isfile(file_path):
                os.remove(file_path)

    # Re-list the folder and copy the remaining images to the output folder
    for name in os.listdir(subfolder_path):
        file_path = os.path.join(subfolder_path, name)
        # Copy only PNG files; this skips .DS_Store and any nested folders
        if os.path.isfile(file_path) and name.lower().endswith(".png"):
            shutil.copy(file_path, output_path)
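The loop above deletes files permanently, so it may help to preview its effect first. The sketch below walks the same folder layout without modifying anything. The function name `plan_filtering` is hypothetical and is not part of the original notebook. It reports which files would be deleted, which PNGs would be copied, and whether any copied file names collide across sub-folders. Collisions matter because `shutil.copy` into a single destination folder overwrites a file of the same name that an earlier sub-folder already copied there.

```python
import os
import tempfile
from collections import defaultdict


def plan_filtering(folderpath, output_name="switchgrass_cross_sections(4x_10x)"):
    """Preview the filtering step without touching any files (hypothetical helper)."""
    to_delete, to_copy = [], []
    sources_by_name = defaultdict(list)
    for subfoldername in sorted(os.listdir(folderpath)):
        subfolder_path = os.path.join(folderpath, subfoldername)
        # Same skips as the notebook: the output folder and non-directories.
        if subfoldername == output_name or not os.path.isdir(subfolder_path):
            continue
        for name in sorted(os.listdir(subfolder_path)):
            file_path = os.path.join(subfolder_path, name)
            if name.endswith(".xml") or "40x" in name:
                to_delete.append(file_path)
            elif name.lower().endswith(".png") and os.path.isfile(file_path):
                to_copy.append(file_path)
                sources_by_name[name].append(file_path)
    # A name with more than one source would be overwritten in the output folder.
    collisions = {n: paths for n, paths in sources_by_name.items() if len(paths) > 1}
    return to_delete, to_copy, collisions


# Usage on a throwaway folder tree with two hypothetical sub-folders.
with tempfile.TemporaryDirectory() as root:
    layout = {"plant_A": ["s1_4x.png", "s1_40x.png", "s1.xml"],
              "plant_B": ["s1_4x.png", "s2_10x.png"]}
    for sub, files in layout.items():
        os.mkdir(os.path.join(root, sub))
        for f in files:
            open(os.path.join(root, sub, f), "w").close()
    delete, copy, clashes = plan_filtering(root)
    print(len(delete), "to delete,", len(copy), "to copy")  # 2 to delete, 3 to copy
    print("name collisions:", list(clashes))               # ['s1_4x.png']
```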

In [21]:
# Count the 4x images, the 10x images, and the total number of files in the output folder

# Counters
count_4x = 0
count_10x = 0

total_images = 0

# Loop through images in the new folder
for image_name in os.listdir(directory_name):
    total_images += 1
    if "4x" in image_name:
        count_4x += 1
    elif "10x" in image_name:
        count_10x += 1

# Print the results
print(f"Number of 4x images: {count_4x}")
print(f"Number of 10x images: {count_10x}")
print(f"Total number of images: {total_images}")

Number of 4x images: 1275
Number of 10x images: 734
Total number of images: 2059
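The printed counts do not add up. 1275 + 734 = 2009, which is 50 fewer than the 2059 total, so 50 files in the output folder contain neither "4x" nor "10x" in their names. The following sketch also collects those unmatched names so they can be inspected. The function name `count_magnifications` and the sample file names are hypothetical.

```python
from collections import Counter


def count_magnifications(names):
    """Tally file names by magnification tag; unmatched names go to 'other'."""
    counts = Counter()
    unmatched = []
    for name in names:
        if "4x" in name:
            counts["4x"] += 1
        elif "10x" in name:
            counts["10x"] += 1
        else:
            counts["other"] += 1
            unmatched.append(name)
    return counts, unmatched


# Hypothetical file names; in the notebook you would pass os.listdir(directory_name).
names = ["a_4x.png", "b_10x.png", "c_4x.png", ".DS_Store", "untagged_scan.png"]
counts, unmatched = count_magnifications(names)
print(dict(counts))  # {'4x': 2, '10x': 1, 'other': 2}
print(unmatched)     # ['.DS_Store', 'untagged_scan.png']
```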
