<a target="_blank" href="https://colab.research.google.com/github/umanitoba-meagher-projects/public-experiments/blob/main/jupyter-notebooks/Visualize%20Image%20Information/photo-select-tool.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
# This script displays images inline in the notebook and allows for quick manual selection.
"""
Author: Zhenggang Li & A.V. Ronquillo
Date: May 19, 2024

## Purpose: The script extracts photos from the original files, allows for quick manual selection, categorizes the photos, and saves them into the required folders for future use.
## Note: The author generated this text in part with GPT-4,
OpenAI’s large-scale language-generation model. Upon generating
draft code, the author reviewed, edited, and revised the code
to their own liking and takes ultimate responsibility for
the content of this code.

"""

# Introduction
This notebook defines a function to display images along with their filenames and includes another main function to manage image selection from a directory.

This overall structure is useful for processing large sets of images in manageable chunks, allowing intermittent user control and the option to stop processing.

The functioning script in its entirety is found under Module: Trigger the Main Function.

# Module: Import Python Packages


This module imports `Image`, `display`, and `clear_output` from IPython's display module. These are used to display images and refresh cell output in Jupyter notebooks or other IPython environments. It also imports the `os` module, which provides functions for interacting with the operating system (e.g., path manipulation and directory and file operations), and the `shutil` module, which offers high-level file operations such as copying and moving files.

Additional imports include `sys` for system-specific parameters and functions, `time` for adding delays and managing user interface timing, `requests` for downloading files from the Borealis repository, and `zipfile` for handling compressed archive files.

In [2]:
## Import python packages
from IPython.display import Image, display, clear_output
import os
import shutil
import sys
import time
import requests
import zipfile

# Module: Printing Image Information

This module defines a function `show_image_with_filename` that takes two arguments: `image_path` and `image_number`. This function is designed to display an image in a notebook environment (e.g., Google Colab) and print information about the image.

The call `Image(filename=image_path, width=800)` creates an image object from the provided path and sets its display width to 800 pixels for better visibility. This object is then passed to `display`, which renders it inline in the notebook.

The `print` statement outputs the `image_number` and the image's filename, which `os.path.basename(image_path)` extracts from the provided path. Finally, `sys.stdout.flush()` forces the text to appear immediately.

In [3]:
## Show images in coding surface
def show_image_with_filename(image_path, image_number):
    display(Image(filename=image_path, width=800))  ### Adjust image size to suit the screen
    print(f"{image_number}. Image Filename: {os.path.basename(image_path)}")
    sys.stdout.flush()  # Force output to display
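
The caption logic can be checked without displaying anything. The sketch below isolates the string formatting used in `show_image_with_filename`; the helper name `format_image_label` and the example path are hypothetical, introduced only for illustration.

```python
import os

def format_image_label(image_path, image_number):
    """Build the caption text that show_image_with_filename prints."""
    # os.path.basename drops the directory part and keeps only the filename
    return f"{image_number}. Image Filename: {os.path.basename(image_path)}"

# Hypothetical path, used only to show the resulting caption
print(format_image_label("./sample-folder/example.jpg", 1))
```

Keeping the formatting in a small function like this makes it possible to test the caption separately from the inline display.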

# Module: Borealis Data Repository Integration

The script integrates with the Borealis data repository so that the dataset is downloaded and extracted automatically. It connects to the Borealis API to fetch dataset metadata, downloads files and reports an error if the request fails, extracts ZIP archives, and checks for a nested directory to locate the image files. The data is hosted in the University of Manitoba Dataverse (https://borealisdata.ca/dataverse/manitoba), a research data repository. The images used in this notebook were collected as part of the 'Understanding Animals' project at University of Manitoba Faculty of Architecture, online at Wild Winnipeg and Teaching with Images.

In [4]:
# Borealis API configuration
BOREALIS_SERVER = "https://borealisdata.ca"

def get_public_dataset_info(persistent_id):
    """
    Get information about a public dataset
    """
    url = f"{BOREALIS_SERVER}/api/datasets/:persistentId/"
    params = {"persistentId": persistent_id}

    response = requests.get(url, params=params)

    if response.status_code == 200:
        dataset_info = response.json()
    else:
        print(f"Cannot access dataset: {response.status_code}")
        return None

    # Get a list of files in a public dataset
    files_list = dataset_info['data']['latestVersion']['files']

    # Create an empty list to store file information
    file_info_list = []

    # Iterate through the files list and append file ID and filename to the list
    for file_info in files_list:
        file_id = file_info['dataFile']['id']
        filename = file_info['dataFile']['filename']
        file_info_list.append({"file_id": file_id, "filename": filename})

    return file_info_list

def download_public_file(file_id, save_path="./"):
    """
    Download a specific public file from a dataset by its file ID
    No authentication required
    """
    url = f"{BOREALIS_SERVER}/api/access/datafile/{file_id}"

    response = requests.get(url, stream=True)

    if response.status_code == 200:
        # Determine filename from headers or URL
        filename = None
        if "Content-Disposition" in response.headers:
            cd = response.headers["Content-Disposition"]
            if "filename=" in cd:
                filename = cd.split("filename=")[1].strip('"')

        # Fallback to extracting from URL if header not available or malformed
        if not filename:
             filename = url.split("/")[-1]

        file_path = f"{save_path}/{filename}"

        with open(file_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"SUCCESS: File downloaded to {file_path}")
        return file_path
    else:
        print(f"ERROR: {response.status_code}: File may be restricted or not found")
        return None

def is_zip_file(filepath):
    """
    Checks if a file is a valid zip file.
    """
    return zipfile.is_zipfile(filepath)

def unzip_file(filepath, extract_path="./"):
    """
    Unzips a zip file to a specified path and returns the name of the top-level extracted folder.
    Returns None if not a zip file or extraction fails.
    """
    if is_zip_file(filepath):
        try:
            with zipfile.ZipFile(filepath, 'r') as zip_ref:
                # Get the name of the top-level directory within the zip
                top_level_folder = None
                for file_info in zip_ref.infolist():
                    parts = file_info.filename.split('/')
                    if parts[0] and len(parts) > 1:
                        top_level_folder = parts[0]
                        break

                zip_ref.extractall(extract_path)
                print(f"SUCCESS: Successfully unzipped {filepath} to {extract_path}")
                return top_level_folder

        except Exception as e:
            print(f"ERROR: Error unzipping {filepath}: {e}")
            return None
    else:
        print(f"INFO: {filepath} is not a valid zip file.")
        return None

# Initialize Borealis dataset access and download data
public_doi = "doi:10.5683/SP3/H3HGWF"
print("Borealis dataset initialized for animal notebook data.")

# Download and extract the dataset first
print("Getting dataset information...")
file_list = get_public_dataset_info(public_doi)

if file_list:
    print(f"Found {len(file_list)} files in the dataset")
    # Download the first file (assuming it's a zip file with images)
    first_file = file_list[0]
    print(f"Downloading file: {first_file['filename']}")

    downloaded_file = download_public_file(first_file['file_id'])

    if downloaded_file:
        # Try to unzip if it's a zip file
        extracted_folder = unzip_file(downloaded_file)
        if extracted_folder:
            initial_folder = f"./{extracted_folder}"
            print(f"Images extracted to: {initial_folder}")

            # Check if there's a subfolder that contains the images
            if os.path.exists(initial_folder):
                contents = os.listdir(initial_folder)
                subfolders = [f for f in contents if os.path.isdir(os.path.join(initial_folder, f))]

                if subfolders and len(contents) == 1:
                    # If there's only one item and it's a folder, look inside it
                    actual_folder = os.path.join(initial_folder, subfolders[0])
                    print(f"Found image subfolder: {actual_folder}")
                    folder_path = actual_folder
                else:
                    folder_path = initial_folder
            else:
                folder_path = initial_folder
        else:
            # If not a zip file, use current directory
            folder_path = "./"
    else:
        print("Failed to download file")
        folder_path = "./"
else:
    print("Failed to get dataset information")
    folder_path = "./"

Borealis dataset initialized for animal notebook data.
Getting dataset information...
Found 7 files in the dataset
Downloading file: 100grid-sample-images.zip
SUCCESS: File downloaded to .//965307
SUCCESS: Successfully unzipped .//965307 to ./
Images extracted to: ./100grid-sample-images-20250612T172128Z-1-001
Found image subfolder: ./100grid-sample-images-20250612T172128Z-1-001/100grid-sample-images
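
To see how the top-level folder detection in `unzip_file` behaves without downloading anything, the following sketch builds a small archive in memory and applies the same loop. The helper name `find_top_level_folder` and the archive contents are assumptions made for illustration; they are not part of the dataset.

```python
import io
import zipfile

def find_top_level_folder(zip_ref):
    """Return the first path component of the first nested entry, or None."""
    for file_info in zip_ref.infolist():
        parts = file_info.filename.split('/')
        # A nested entry such as "folder/file.jpg" splits into more than one part
        if parts[0] and len(parts) > 1:
            return parts[0]
    return None

# Build a tiny archive in memory; folder and file names are made up
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr("sample-images/inner/a.jpg", b"placeholder bytes")
    zf.writestr("sample-images/inner/b.jpg", b"placeholder bytes")

with zipfile.ZipFile(buffer, 'r') as zf:
    print(find_top_level_folder(zf))
```

An archive whose files all sit at the root has no nested entries, so the function returns `None`. In the notebook that case sets `folder_path` to the current directory.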


# Module: Batch Processing Function

The script processes images in batches. The `batch_size` parameter defines how many images are processed per batch; here the default is `5`. The `start_index` parameter marks where the batch begins, and the function returns the index at which the next batch should start.

In [5]:
def process_images_batch(start_index=0, batch_size=5):
    """Process a small batch of images starting from start_index"""
    # Use the automatically determined folder path
    file_path = folder_path
    select_dir = './selected-images'

    if not os.path.exists(select_dir):
        os.makedirs(select_dir)

    # Check if the directory exists and has images
    if not os.path.exists(file_path):
        print(f"Error: Directory {file_path} does not exist!")
        return start_index

    images = os.listdir(file_path)
    # Filter to only include image files
    images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]

    if not images:
        print(f"No image files found in {file_path}")
        return start_index

    images.sort(key=lambda x: os.path.getmtime(os.path.join(file_path, x)))
    total_images = len(images)

    if start_index == 0:
        clear_output(wait=True)
        print(f"Found {total_images} images to process")
        print(f"Processing in batches of {batch_size} images")
        time.sleep(1)  # Give user time to see initial message

    # Process this batch
    end_index = min(start_index + batch_size, total_images)
    batch_images = images[start_index:end_index]

    print(f"\n=== Starting batch: images {start_index + 1} to {end_index} ===")
    time.sleep(0.5)

    for i, image in enumerate(batch_images):
        actual_index = start_index + i
        image_path = os.path.join(file_path, image)
        image_number = actual_index + 1

        # Clear previous output and show image
        clear_output(wait=True)
        print(f"BATCH PROGRESS: Image {i+1}/{len(batch_images)} in current batch")
        print(f"OVERALL PROGRESS: Image {image_number}/{total_images} total\n")

        show_image_with_filename(image_path, image_number)

        # Add a small delay to ensure display completes
        time.sleep(0.2)

        # Display the prompt clearly
        print("\n" + "="*50)
        print("OPTIONS:")
        print("  y or ENTER = Select this image")
        print("  n = Skip this image")
        print("  d = Delete this image")
        print("  q = Quit")
        print("="*50)
        sys.stdout.flush()

        # Add another small delay before input
        time.sleep(0.1)

        try:
            choice = input("Your choice: ")
        except KeyboardInterrupt:
            print("\nStopped by user")
            return actual_index
        except Exception as e:
            print(f"Input error: {e}")
            print("Trying again...")
            time.sleep(0.5)
            choice = input("Your choice: ")
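
The batch boundaries come from the `min(start_index + batch_size, total_images)` expression above. A minimal sketch of that arithmetic, using a hypothetical helper `batch_bounds` and made-up filenames, shows how the last batch is shortened when the image count is not a multiple of the batch size:

```python
def batch_bounds(start_index, batch_size, total_images):
    """Return (start, end) slice indices for one batch, clamped to the list length."""
    end_index = min(start_index + batch_size, total_images)
    return start_index, end_index

# 12 hypothetical filenames processed 5 at a time
images = [f"img_{n:02d}.jpg" for n in range(12)]
start = 0
while start < len(images):
    start, end = batch_bounds(start, 5, len(images))
    print(f"images {start + 1} to {end}: {images[start:end]}")
    start = end
```

With 12 images the loop produces batches of 5, 5, and 2.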

# Module: Processing the User Input

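Users are prompted to make a decision about each image viewed. This part of the script interprets the response: the user's input determines whether the image is selected, skipped, or deleted. The cells below are excerpts from the loop body of `process_images_batch`; run on their own, they raise errors such as `NameError` or `SyntaxError` because they depend on the surrounding function.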

`choice.strip()` removes any surrounding whitespace (spaces, tabs, newlines) from the string the user entered. If the result is an empty string (`''`), the user pressed Enter without typing anything, and the script assigns `'y'` to `choice`. In other words, pressing Enter alone defaults to selecting the image.

In [6]:
        ### press ENTER=y
        if choice.strip() == '':
            choice = 'y'

NameError: name 'choice' is not defined
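
The Enter-defaults-to-yes rule can be tried in isolation. This is a minimal sketch with a hypothetical helper `normalize_choice`. It strips and lowercases in one step, following the `.strip().lower()` pattern that `process_images_terminal` uses, so it is slightly more tolerant of stray spaces than the excerpt above.

```python
def normalize_choice(raw):
    """Trim whitespace, lowercase, and treat an empty answer as 'y'."""
    cleaned = raw.strip().lower()
    return cleaned if cleaned else 'y'

# A few sample inputs, including a bare Enter (empty string)
for raw in ["", "   ", "Y", " n\n", "D"]:
    print(repr(raw), "->", normalize_choice(raw))
```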

When the user enters `'q'`, the function prints a quit message and returns `total_images`. Because the main loop runs only while `current_index < total_images`, this return value causes the loop to exit and ends the image selection session.

In [7]:
        elif choice.lower() == 'q':
            print("Quitting...")
            return total_images  # Return total to exit main loop

SyntaxError: invalid syntax (ipython-input-3126931583.py, line 1)
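
The effect of returning `total_images` can be simulated without any images or user input. In this sketch, `run_batches` and its `quit_at` parameter are hypothetical stand-ins for the main loop and for the moment the user presses `q`.

```python
def run_batches(total_images, batch_size, quit_at=None):
    """Simulate the main loop and return the start index of each batch visited."""
    current_index = 0
    visited = []
    while current_index < total_images:
        visited.append(current_index)
        if current_index == quit_at:
            # Same sentinel the 'q' branch returns: the loop condition becomes false
            current_index = total_images
        else:
            current_index = min(current_index + batch_size, total_images)
    return visited

print(run_batches(12, 5))             # no quit: every batch is visited
print(run_batches(12, 5, quit_at=5))  # quit during the second batch
```

Using the total count as the return value avoids a separate flag: the caller's existing loop condition is enough to stop processing.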

`choice.lower()` converts the input to lowercase, and the `elif` checks whether it equals `'d'`. If so, `os.remove(image_path)` deletes the file at `image_path`, permanently removing the image from the file system.

`continue` then ends the current iteration and moves on to the next image in the loop. Because the file has been deleted, there is no need to run the remaining code for this iteration (such as moving the file).

In [8]:
        elif choice.lower() == 'd':
            os.remove(image_path)
            print(f"✗ Deleted {image}")
            time.sleep(0.5)
            continue

SyntaxError: invalid syntax (ipython-input-3745145054.py, line 1)

Similarly, the next `elif` checks whether the lowercase input equals `'n'`. Here `'n'` means the user does not select the image, so the file is left in place: the script prints a skip message, and `continue` moves on to the next image.

In [None]:
        elif choice.lower() == 'n':
            print(f"→ Skipped {image}")
            time.sleep(0.5)
            continue

If the choice is `'y'`, the script moves the image to `select_dir`, the destination directory.

In [None]:
        if choice.lower() == 'y':
            shutil.move(image_path, os.path.join(select_dir, image))
            print(f"✓ Selected {image}")
            time.sleep(0.5)

    # Return the next starting index
    return end_index
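
The three file outcomes (select, skip, delete) can be exercised safely in a temporary directory. The sketch below assumes the choice has already been normalized to `'y'`, `'n'`, or `'d'`. The helper `apply_choice` and the placeholder filenames are hypothetical, and the quit option is left out because it concerns loop control rather than files.

```python
import os
import shutil
import tempfile

def apply_choice(choice, image_path, select_dir):
    """Apply a normalized choice to one file and report the outcome."""
    if choice == 'y':
        os.makedirs(select_dir, exist_ok=True)
        shutil.move(image_path, os.path.join(select_dir, os.path.basename(image_path)))
        return 'selected'
    if choice == 'd':
        os.remove(image_path)  # permanent: the file does not go to a trash folder
        return 'deleted'
    if choice == 'n':
        return 'skipped'
    return 'ignored'  # unrecognized input leaves the file untouched

with tempfile.TemporaryDirectory() as tmp:
    select_dir = os.path.join(tmp, "selected-images")
    results = {}
    for name, choice in [("keep.jpg", 'y'), ("skip.jpg", 'n'), ("drop.jpg", 'd')]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as f:
            f.write(b"placeholder")
        results[name] = apply_choice(choice, path, select_dir)
    print(results)
    print("selected folder:", sorted(os.listdir(select_dir)))
    print("left in place:", sorted(n for n in os.listdir(tmp) if n.endswith(".jpg")))
```

Working in a `tempfile.TemporaryDirectory` means nothing in the real image folder is moved or deleted while experimenting.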

# Module: Main Processing Control Function

The `main` function orchestrates the entire batch processing workflow, handling user interaction between batches and providing clear progress feedback.

In [None]:
def main():
    """Main function that processes images in small batches"""
    current_index = 0
    batch_size = 5  # Small batch size to avoid timeouts

    # Get total number of images
    file_path = folder_path
    if os.path.exists(file_path):
        images = os.listdir(file_path)
        images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]
        total_images = len(images)
    else:
        print("No images directory found!")
        return

    while current_index < total_images:
        current_index = process_images_batch(current_index, batch_size)

        if current_index < total_images:
            # Clear output and show status
            clear_output(wait=True)
            print(f"\n{'='*60}")
            print(f"BATCH COMPLETE: Processed {current_index}/{total_images} images")
            print(f"{'='*60}")
            print(f"Remaining images: {total_images - current_index}")
            print("\nWould you like to continue to the next batch?")
            sys.stdout.flush()
            time.sleep(0.2)  # Give time for display to settle

            try:
                continue_choice = input("Enter 'y' to continue or 'n' to stop: ")
                if continue_choice.lower() != 'y':
                    break
                else:
                    print("Loading next batch...")
                    time.sleep(0.5)  # Brief pause before next batch
            except KeyboardInterrupt:
                print("\nStopped by user")
                break

    clear_output(wait=True)
    print("="*60)
    print("PROCESSING COMPLETE!")
    print("="*60)
    print(f"Total images processed: {current_index}/{total_images}")
    print(f"Selected images are in: ./selected-images/")

# Module: Alternative Processing Methods

The script includes additional processing methods for different use cases. These alternatives provide flexibility for different working environments and user preferences.

Single Image Processing: Use this method when you want to process images one at a time with full control over each step. This is ideal for careful review or when you need to pause frequently. The `process_single_image` function processes one image at a time based on a provided index, allowing for precise control over image review. It displays the specified image, presents user options for selection/deletion/skipping, and returns the next image index for sequential processing.

How to use:

In [None]:
# Start with the first image (index 0)
current_image = 0

# Process one image and get the next index
current_image = process_single_image(current_image)

# Continue processing the next image
current_image = process_single_image(current_image)

# Or use in a loop to process multiple images
for i in range(10):  # Process up to 10 images
    next_image = process_single_image(current_image)
    if next_image == current_image:  # Index did not advance: no more images
        break
    current_image = next_image

Terminal Processing: Use this method in environments where Jupyter image display doesn't work properly, or when running the script outside of a notebook environment. The `process_images_terminal` function provides a complete image processing workflow for non-Jupyter environments where inline image display is unavailable. Instead of showing images directly, it displays image information as text and processes all images sequentially with the same selection options and visual feedback symbols.

How to use:

In [None]:
# Call the terminal processing function
process_images_terminal()

# This will:
# - List all images with their paths
# - Process them sequentially without inline display
# - Show the same selection options (y/n/d/q)
# - Provide text-based feedback for each action

Below is the portion of the script that facilitates alternate processing methods.

In [None]:
def process_single_image(image_index=0):
    """Process a single image by index - useful for step-by-step processing"""
    file_path = folder_path
    select_dir = './selected-images'

    if not os.path.exists(select_dir):
        os.makedirs(select_dir)

    images = os.listdir(file_path)
    images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]
    images.sort(key=lambda x: os.path.getmtime(os.path.join(file_path, x)))

    if image_index >= len(images):
        print(f"No more images! (Total: {len(images)})")
        return image_index

    image = images[image_index]
    image_path = os.path.join(file_path, image)

    # Clear output and show image
    clear_output(wait=True)
    show_image_with_filename(image_path, image_index + 1)

    # Add delay and clear prompt
    time.sleep(0.1)
    print("\n" + "="*50)
    print("OPTIONS: y/ENTER=select, n=skip, d=delete")
    print("="*50)
    sys.stdout.flush()

    choice = input("Your choice: ")

    if choice.strip() == '':
        choice = 'y'

    if choice.lower() == 'd':
        os.remove(image_path)
        print(f"Deleted {image}")
    elif choice.lower() == 'n':
        print(f"Skipped {image}")
    elif choice.lower() == 'y':
        shutil.move(image_path, os.path.join(select_dir, image))
        print(f"Selected {image}")

    return image_index + 1

# Alternative function for non-Jupyter environments
def process_images_terminal():
    """Process images for terminal/non-Jupyter environments"""
    file_path = folder_path
    select_dir = './selected-images'

    if not os.path.exists(select_dir):
        os.makedirs(select_dir)

    images = os.listdir(file_path)
    images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]
    images.sort(key=lambda x: os.path.getmtime(os.path.join(file_path, x)))

    print(f"Found {len(images)} images to process")
    print("Note: Images will be displayed externally")

    for i, image in enumerate(images):
        image_path = os.path.join(file_path, image)
        print(f"\n{'='*50}")
        print(f"Image {i+1}/{len(images)}: {image}")
        print(f"Full path: {image_path}")
        print("OPTIONS: y/ENTER=select, n=skip, d=delete, q=quit")

        choice = input("Your choice: ").strip().lower()

        if choice == '' or choice == 'y':
            shutil.move(image_path, os.path.join(select_dir, image))
            print(f"✓ Selected {image}")
        elif choice == 'n':
            print(f"→ Skipped {image}")
        elif choice == 'd':
            os.remove(image_path)
            print(f"✗ Deleted {image}")
        elif choice == 'q':
            print("Quitting...")
            break

    print("\nProcessing complete!")
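
One behavior worth checking when reviewing by index: `process_single_image` and `process_images_batch` list the folder again on every call, and selected or deleted files leave that folder, so the remaining files shift to lower positions. The sketch below simulates this with plain lists instead of real files. Both function names and the sample filenames are hypothetical, and the snapshot variant is only one possible way to keep positions stable.

```python
def review_with_relisting(filenames, decisions):
    """Simulate index-based review where the folder is re-listed after each decision."""
    folder = list(filenames)
    index, shown = 0, []
    while index < len(folder):
        name = folder[index]
        shown.append(name)
        if decisions.get(name, 'n') in ('y', 'd'):
            folder.remove(name)  # the file leaves the folder, later items shift left
        index += 1               # same increment that process_single_image returns
    return shown

def review_with_snapshot(filenames, decisions):
    """Simulate review over a list taken once, before any file is moved or deleted."""
    folder = list(filenames)
    shown = []
    for name in list(folder):    # iterate over a fixed copy of the listing
        shown.append(name)
        if decisions.get(name, 'n') in ('y', 'd'):
            folder.remove(name)
    return shown

names = ["a.jpg", "b.jpg", "c.jpg", "d.jpg"]
select_all = {name: 'y' for name in names}
print(review_with_relisting(names, select_all))
print(review_with_snapshot(names, select_all))
```

When every image is selected, the re-listing version shows only every other file, while the snapshot version shows all four. When every image is skipped, the two behave identically.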

# Module: Trigger the Main Function

The `if __name__ == '__main__':` guard runs `main()` only when the file is executed directly (as a script or in a notebook cell), not when it is imported as a module. `main()` sets `batch_size`, initializes `current_index`, and loops through the images batch by batch.

In [10]:
# Execute the complete photo selection tool
if __name__ == '__main__':
    main()

PROCESSING COMPLETE!
Total images processed: 0/95
Selected images are in: ./selected-images/


This is what the functioning code looks like in its entirety:

In [9]:
## Import python packages
from IPython.display import Image, display, clear_output
import os
import shutil
import sys
import time

## Show images in coding surface
def show_image_with_filename(image_path, image_number):
    display(Image(filename=image_path, width=800))  ### Adjust image size to suit the screen
    print(f"{image_number}. Image Filename: {os.path.basename(image_path)}")
    sys.stdout.flush()  # Force output to display

# The image directory path is set automatically from the Borealis download below
# Borealis API configuration
import requests
import zipfile

BOREALIS_SERVER = "https://borealisdata.ca"

def get_public_dataset_info(persistent_id):
    """
    Get information about a public dataset
    """
    url = f"{BOREALIS_SERVER}/api/datasets/:persistentId/"
    params = {"persistentId": persistent_id}

    response = requests.get(url, params=params)

    if response.status_code == 200:
        dataset_info = response.json()
    else:
        print(f"Cannot access dataset: {response.status_code}")
        return None
    """
    Get a list of files in a public dataset
    """
    # Access the list of files from the dataset_info dictionary
    files_list = dataset_info['data']['latestVersion']['files']

    # Create an empty list to store file information
    file_info_list = []

    # Iterate through the files list and append file ID and filename to the list
    for file_info in files_list:
        file_id = file_info['dataFile']['id']
        filename = file_info['dataFile']['filename']
        file_info_list.append({"file_id": file_id, "filename": filename})

    return file_info_list

def download_public_file(file_id, save_path="./"):
    """
    Download a specific public file from a dataset by its file ID
    No authentication required
    """
    url = f"{BOREALIS_SERVER}/api/access/datafile/{file_id}"

    response = requests.get(url, stream=True)

    if response.status_code == 200:
        # Determine filename from headers or URL
        filename = None
        if "Content-Disposition" in response.headers:
            cd = response.headers["Content-Disposition"]
            # Try to extract filename from content disposition
            if "filename=" in cd:
                filename = cd.split("filename=")[1].strip('"')

        # Fallback to extracting from URL if header not available or malformed
        if not filename:
             filename = url.split("/")[-1]

        file_path = f"{save_path}/{filename}"

        with open(file_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"SUCCESS: File downloaded to {file_path}")
        return file_path
    else:
        print(f"ERROR: {response.status_code}: File may be restricted or not found")
        return None

def is_zip_file(filepath):
    """
    Checks if a file is a valid zip file.
    """
    return zipfile.is_zipfile(filepath)

def unzip_file(filepath, extract_path="./"):
    """
    Unzips a zip file to a specified path and returns the name of the top-level extracted folder.
    Returns None if not a zip file or extraction fails.
    """
    if is_zip_file(filepath):
        try:
            with zipfile.ZipFile(filepath, 'r') as zip_ref:
                # Get the name of the top-level directory within the zip
                # Assumes there is a single top-level directory
                top_level_folder = None
                for file_info in zip_ref.infolist():
                    parts = file_info.filename.split('/')
                    if parts[0] and len(parts) > 1:
                        top_level_folder = parts[0]
                        break # Assuming the first entry gives the top-level folder

                zip_ref.extractall(extract_path)
                print(f"SUCCESS: Successfully unzipped {filepath} to {extract_path}")
                return top_level_folder

        except Exception as e:
            print(f"ERROR: Error unzipping {filepath}: {e}")
            return None
    else:
        print(f"INFO: {filepath} is not a valid zip file.")
        return None

# Initialize Borealis dataset access
public_doi = "doi:10.5683/SP3/H3HGWF"
print("Borealis dataset initialized for animal notebook data.")

# Download and extract the dataset first
print("Getting dataset information...")
file_list = get_public_dataset_info(public_doi)

if file_list:
    print(f"Found {len(file_list)} files in the dataset")
    # Download the first file (assuming it's a zip file with images)
    first_file = file_list[0]
    print(f"Downloading file: {first_file['filename']}")

    downloaded_file = download_public_file(first_file['file_id'])

    if downloaded_file:
        # Try to unzip if it's a zip file
        extracted_folder = unzip_file(downloaded_file)
        if extracted_folder:
            initial_folder = f"./{extracted_folder}"
            print(f"Images extracted to: {initial_folder}")

            # Check if there's a subfolder that contains the images
            if os.path.exists(initial_folder):
                contents = os.listdir(initial_folder)
                subfolders = [f for f in contents if os.path.isdir(os.path.join(initial_folder, f))]

                if subfolders and len(contents) == 1:
                    # If there's only one item and it's a folder, look inside it
                    actual_folder = os.path.join(initial_folder, subfolders[0])
                    print(f"Found image subfolder: {actual_folder}")
                    folder_path = actual_folder
                else:
                    folder_path = initial_folder
            else:
                folder_path = initial_folder
        else:
            # If not a zip file, use current directory
            folder_path = "./"
    else:
        print("Failed to download file")
        folder_path = "./"
else:
    print("Failed to get dataset information")
    folder_path = "./"

## Process images in batches
def process_images_batch(start_index=0, batch_size=5):
    """Process a small batch of images starting from start_index"""
    # Use the automatically determined folder path
    file_path = folder_path
    select_dir = './selected-images'

    if not os.path.exists(select_dir):
        os.makedirs(select_dir)

    # Check if the directory exists and has images
    if not os.path.exists(file_path):
        print(f"Error: Directory {file_path} does not exist!")
        return start_index

    images = os.listdir(file_path)
    # Filter to only include image files
    images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]

    if not images:
        print(f"No image files found in {file_path}")
        return start_index

    images.sort(key=lambda x: os.path.getmtime(os.path.join(file_path, x)))
    total_images = len(images)

    if start_index == 0:
        clear_output(wait=True)
        print(f"Found {total_images} images to process")
        print(f"Processing in batches of {batch_size} images")
        time.sleep(1)  # Give user time to see initial message

    # Process this batch
    end_index = min(start_index + batch_size, total_images)
    batch_images = images[start_index:end_index]

    print(f"\n=== Starting batch: images {start_index + 1} to {end_index} ===")
    time.sleep(0.5)

    for i, image in enumerate(batch_images):
        actual_index = start_index + i
        image_path = os.path.join(file_path, image)
        image_number = actual_index + 1

        # Clear previous output and show image
        clear_output(wait=True)
        print(f"BATCH PROGRESS: Image {i+1}/{len(batch_images)} in current batch")
        print(f"OVERALL PROGRESS: Image {image_number}/{total_images} total\n")

        show_image_with_filename(image_path, image_number)

        # Add a small delay to ensure display completes
        time.sleep(0.2)

        # Display the prompt clearly
        print("\n" + "="*50)
        print("OPTIONS:")
        print("  y or ENTER = Select this image")
        print("  n = Skip this image")
        print("  d = Delete this image")
        print("  q = Quit")
        print("="*50)
        sys.stdout.flush()

        # Add another small delay before input
        time.sleep(0.1)

        try:
            choice = input("Your choice: ")
        except KeyboardInterrupt:
            print("\nStopped by user")
            return actual_index
        except Exception as e:
            print(f"Input error: {e}")
            print("Trying again...")
            time.sleep(0.5)
            choice = input("Your choice: ")

        # Normalize the input; pressing ENTER counts as 'y'
        choice = choice.strip().lower() or 'y'

        if choice == 'q':
            print("Quitting...")
            return total_images  # Return total to exit main loop
        elif choice == 'd':
            os.remove(image_path)
            print(f"✗ Deleted {image}")
        elif choice == 'n':
            print(f"→ Skipped {image}")
        elif choice == 'y':
            shutil.move(image_path, os.path.join(select_dir, image))
            print(f"✓ Selected {image}")
        else:
            # Unrecognized input: leave the file where it is
            print(f"? Unrecognized choice {choice!r}; skipped {image}")
        time.sleep(0.5)

    # Return the next starting index
    return end_index

def main():
    """Main function that processes images in small batches"""
    current_index = 0
    batch_size = 5  # Small batch size to avoid timeouts

    # Get total number of images
    file_path = folder_path
    if os.path.exists(file_path):
        images = os.listdir(file_path)
        images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]
        total_images = len(images)
    else:
        print("No images directory found!")
        return

    while current_index < total_images:
        current_index = process_images_batch(current_index, batch_size)

        if current_index < total_images:
            # Clear output and show status
            clear_output(wait=True)
            print(f"\n{'='*60}")
            print(f"BATCH COMPLETE: Processed {current_index}/{total_images} images")
            print(f"{'='*60}")
            print(f"Remaining images: {total_images - current_index}")
            print("\nWould you like to continue to the next batch?")
            sys.stdout.flush()
            time.sleep(0.2)  # Give time for display to settle

            try:
                continue_choice = input("Enter 'y' to continue or 'n' to stop: ")
            except KeyboardInterrupt:
                print("\nStopped by user")
                break

            # Ignore surrounding whitespace and letter case in the answer
            if continue_choice.strip().lower() != 'y':
                break
            print("Loading next batch...")
            time.sleep(0.5)  # Brief pause before next batch

    clear_output(wait=True)
    print("="*60)
    print("PROCESSING COMPLETE!")
    print("="*60)
    print(f"Total images processed: {current_index}/{total_images}")
    print("Selected images are in: ./selected-images/")

# Alternative: Process just one image at a time
def process_single_image(image_index=0):
    """Process a single image by index - useful for step-by-step processing"""
    file_path = folder_path
    select_dir = './selected-images'

    os.makedirs(select_dir, exist_ok=True)  # No error if the folder already exists

    images = os.listdir(file_path)
    images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]
    images.sort(key=lambda x: os.path.getmtime(os.path.join(file_path, x)))

    if image_index >= len(images):
        print(f"No more images! (Total: {len(images)})")
        return image_index

    image = images[image_index]
    image_path = os.path.join(file_path, image)

    # Clear output and show image
    clear_output(wait=True)
    show_image_with_filename(image_path, image_index + 1)

    # Add delay and clear prompt
    time.sleep(0.1)
    print("\n" + "="*50)
    print("OPTIONS: y/ENTER=select, n=skip, d=delete")
    print("="*50)
    sys.stdout.flush()

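    # Normalize the input; pressing ENTER counts as 'y'
    choice = input("Your choice: ").strip().lower() or 'y'

    if choice == 'd':
        os.remove(image_path)
        print(f"Deleted {image}")
    elif choice == 'y':
        shutil.move(image_path, os.path.join(select_dir, image))
        print(f"Selected {image}")
    else:
        # 'n' or any unrecognized input leaves the file in place
        print(f"Skipped {image}")
        return image_index + 1

    # The file has left the folder and the list is rebuilt on every call,
    # so the next image now sits at the same index
    return image_index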

# Alternative function for non-Jupyter environments
def process_images_terminal():
    """Process images for terminal/non-Jupyter environments"""
    file_path = folder_path
    select_dir = './selected-images'

    os.makedirs(select_dir, exist_ok=True)  # No error if the folder already exists

    images = os.listdir(file_path)
    images = [img for img in images if img.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff'))]
    images.sort(key=lambda x: os.path.getmtime(os.path.join(file_path, x)))

    print(f"Found {len(images)} images to process")
    print("Note: Images are not displayed here; open each printed path in an external viewer")

    for i, image in enumerate(images):
        image_path = os.path.join(file_path, image)
        print(f"\n{'='*50}")
        print(f"Image {i+1}/{len(images)}: {image}")
        print(f"Full path: {image_path}")
        print("OPTIONS: y/ENTER=select, n=skip, d=delete, q=quit")

        choice = input("Your choice: ").strip().lower()

        if choice in ('', 'y'):
            shutil.move(image_path, os.path.join(select_dir, image))
            print(f"✓ Selected {image}")
        elif choice == 'n':
            print(f"→ Skipped {image}")
        elif choice == 'd':
            os.remove(image_path)
            print(f"✗ Deleted {image}")
        elif choice == 'q':
            print("Quitting...")
            break
        else:
            # Unrecognized input: leave the file where it is
            print(f"? Unrecognized choice {choice!r}; left {image} in place")

    print("\nProcessing complete!")

if __name__ == '__main__':
    main()

PROCESSING COMPLETE!
Total images processed: 95/100
Selected images are in: ./selected-images/
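
The functions in the cell above each repeat the same listing, filtering, and sorting steps, and each handles the typed choice inline. A minimal sketch of how those steps could be shared is shown below. The names `IMAGE_EXTENSIONS`, `list_images`, and `normalize_choice` are hypothetical and introduced here only for illustration; they are not part of the original notebook.

```python
import os

# Same extensions the notebook filters on (assumed to be the full set wanted)
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.gif', '.tiff')


def list_images(folder):
    """Return image filenames in `folder`, oldest first by modification time."""
    names = [
        name for name in os.listdir(folder)
        if name.lower().endswith(IMAGE_EXTENSIONS)
    ]
    names.sort(key=lambda name: os.path.getmtime(os.path.join(folder, name)))
    return names


def normalize_choice(raw):
    """Trim whitespace, lowercase the input, and treat a bare ENTER as 'y'."""
    return raw.strip().lower() or 'y'
```

With helpers like these, each function could call `list_images(folder_path)` once and compare `normalize_choice(input("Your choice: "))` against `'y'`, `'n'`, `'d'`, and `'q'`, so the accepted extensions and input rules would live in one place.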
