<a target="_blank" href="https://colab.research.google.com/github/umanitoba-meagher-projects/public-experiments/blob/main/jupyter-notebooks/Visualize%20Image%20Information/random-interactive-image.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

In [None]:
"""
Author: Ryleigh J. Bruce
Date: June 4, 2024

Purpose: Selecting a random image from a folder and generating an interactive visualization.


Note: The author generated this text in part with GPT-4,
OpenAI’s large-scale language-generation model. Upon generating
draft code, the author reviewed, edited, and revised the code
to their own liking and takes ultimate responsibility for
the content of this code.

"""

# Introduction

This notebook provides a workflow for selecting a random image from a specified directory and generating an interactive visualization of that image using Python. The primary purpose is to quickly sample and review images from large collections, supporting tasks such as quality control, dataset exploration, and visual inspection.

# Critical Uses & Adaptability

## What the Notebook Can Be Used For:

- Randomly sample and visually inspect images from large datasets, supporting initial exploration and identification of dataset characteristics. Random selection is handled in the `interactive_visualization` function, which lists the image files in a directory and picks one with `random.choice`.

- Learn about Python scripting, file operations, and interactive visualization.
  
- Automate repetitive tasks in working with image data and visualization.
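
When a sampled image needs to be revisited later, the selection can be made reproducible. The sketch below is not part of the notebook: it assumes a hypothetical `pick_image` helper, placeholder file names, and a fixed seed passed to `random.Random`, so that the same set of files and the same seed yield the same pick.

```python
import random

def pick_image(image_files, seed=None):
    """Return one file name chosen at random (hypothetical helper).

    Passing a seed makes the choice repeatable; seed=None gives a
    different pick on each call, like an unseeded random.choice.
    """
    rng = random.Random(seed)  # independent generator; global random state is untouched
    return rng.choice(sorted(image_files))  # sort so directory listing order cannot change the result

files = ["deer_003.jpg", "deer_001.jpg", "deer_002.jpg"]  # placeholder names
first = pick_image(files, seed=42)
second = pick_image(list(reversed(files)), seed=42)
print(first == second)  # same seed and same set of files give the same pick
```

Leaving `seed` unset keeps the notebook's original behaviour of showing a different image on each run.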

## How the Notebook Can Be Adapted:

- Substitute the image dataset by changing the DOI in the `setup_dataset` function to access different Borealis datasets, or modify the script to work with local image collections.

- Customize supported file extensions and visualization details in the `interactive_visualization` and `find_images_directory` functions to tailor the workflow to specific project requirements.

## Examples of Variables You Can Change:

- **Dataset DOI**:  
  In the Borealis Data Repository Integration module, change the `public_doi` variable in `setup_dataset()` to access different datasets from Borealis.
  - Example:  
    `public_doi = "doi:10.5683/SP3/YOUR_DATASET_ID"`
- **Supported Image Extensions**:  
  In the `interactive_visualization()` and `find_images_directory()` functions, modify the file extension tuple `('.png', '.jpg', '.jpeg', '.bmp', '.tiff', '.gif')` to include other formats if needed.
- **Visualization Details**:  
  You can customize the Plotly figure in the `interactive_visualization` function, such as changing the title or adding more layout options.
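
One way to make the extension list easier to change is to define it once and share it between both functions. The following is a minimal sketch under that assumption; `IMAGE_EXTENSIONS` and `is_image_file` are illustrative names that do not appear in the notebook, and the added `.tif` extension is only an example.

```python
# Hypothetical shared constant: edit this tuple to support other formats.
IMAGE_EXTENSIONS = ('.png', '.jpg', '.jpeg', '.bmp', '.tiff', '.tif', '.gif')

def is_image_file(filename, extensions=IMAGE_EXTENSIONS):
    """Return True if the file name ends with one of the extensions.

    The comparison is case-insensitive, so 'IMG_01.JPG' also matches.
    """
    return filename.lower().endswith(extensions)

names = ["site_a.JPG", "notes.txt", "scan.tif", "plan.pdf"]  # placeholder names
image_names = [name for name in names if is_image_file(name)]
print(image_names)
```

Both `find_images_directory` and `interactive_visualization` could then call the same helper, so a new format only has to be added in one place.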

## Ideas for Spatial Design Tasks

- **Dataset Exploration**:  
  Quickly preview random samples from large spatial datasets (e.g., aerial imagery, site photos) to assess data quality or variety.
- **Design Precedent Review**:  
  Randomly display images from a collection of precedent studies or reference projects to inspire design discussions.
- **Material/Texture Library Browsing**:  
  Use the notebook to randomly sample from a folder of material or texture images for use in renderings or visualizations.
- **Site Analysis**:  
  Visualize random site photos or drone images to support site inventory and analysis tasks.
- **Presentation Preparation**:  
  Select random images for inclusion in presentations, ensuring a diverse and unbiased selection from your dataset.
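
For tasks such as presentation preparation, where several distinct images are wanted at once, `random.sample` draws without replacement. This sketch assumes a hypothetical `sample_images` helper and placeholder file names; it is an adaptation of the idea, not part of the notebook's workflow.

```python
import random

def sample_images(image_files, k):
    """Return up to k distinct file names chosen at random (hypothetical helper)."""
    k = min(k, len(image_files))  # random.sample raises ValueError if k exceeds the list length
    return random.sample(image_files, k)

files = [f"photo_{i:03d}.jpg" for i in range(20)]  # placeholder names
selection = sample_images(files, 5)
print(selection)
```

Each returned name could then be joined with the directory path and passed to `load_image` in turn.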

*Tip: To see a new random image, re-run the final cell that calls the complete workflow, or call `interactive_visualization(dataset_path)` directly if the dataset is already set up.*

## Module: Borealis Data Repository Integration

This module sets up the connection to the Borealis Data Repository and defines all functions for downloading, extracting, and locating image datasets. The Borealis API allows access to public research datasets without authentication. The data is hosted in the University of Manitoba Dataverse (https://borealisdata.ca/dataverse/manitoba), a research data repository. The images used in this notebook were collected as part of the 'Understanding Animals' project at the University of Manitoba Faculty of Architecture, online at Wild Winnipeg and Teaching with Images.
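
To make the request pattern concrete, the sketch below builds the two kinds of URL that the functions in this module request, without contacting the server. The helper names `dataset_url` and `datafile_url` are illustrative, and the file ID is a placeholder rather than a real Borealis file. The notebook itself lets `requests` assemble the query string, which should be equivalent to the manual encoding shown here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BOREALIS_SERVER = "https://borealisdata.ca"

def dataset_url(persistent_id):
    """Build the dataset metadata URL for a DOI (illustrative helper)."""
    query = urlencode({"persistentId": persistent_id})  # percent-encodes ':' and '/'
    return f"{BOREALIS_SERVER}/api/datasets/:persistentId/?{query}"

def datafile_url(file_id):
    """Build the download URL for a single file ID (illustrative helper)."""
    return f"{BOREALIS_SERVER}/api/access/datafile/{file_id}"

url = dataset_url("doi:10.5683/SP3/H3HGWF")
print(url)
# Decoding the query string recovers the original DOI
print(parse_qs(urlsplit(url).query)["persistentId"][0])
print(datafile_url(12345))  # 12345 is a placeholder ID
```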

In [None]:
import os
import requests
import zipfile

BOREALIS_SERVER = "https://borealisdata.ca"

def get_public_dataset_info(persistent_id):
    """
    Get the list of files in a public dataset.

    Returns a list of {"file_id", "filename"} dictionaries, or None if the
    dataset cannot be accessed.
    """
    url = f"{BOREALIS_SERVER}/api/datasets/:persistentId/"
    params = {"persistentId": persistent_id}

    response = requests.get(url, params=params)

    if response.status_code != 200:
        print(f"Cannot access dataset: {response.status_code}")
        return None

    dataset_info = response.json()

    # Access the list of files from the dataset_info dictionary
    files_list = dataset_info['data']['latestVersion']['files']

    # Create an empty list to store file information
    file_info_list = []

    # Iterate through the files list and append file ID and filename to the list
    for file_info in files_list:
        file_id = file_info['dataFile']['id']
        filename = file_info['dataFile']['filename']
        file_info_list.append({"file_id": file_id, "filename": filename})

    return file_info_list

def download_public_file(file_id, save_path="./"):
    """
    Download a specific public file from a dataset by its file ID
    No authentication required
    """
    url = f"{BOREALIS_SERVER}/api/access/datafile/{file_id}"

    response = requests.get(url, stream=True)

    if response.status_code == 200:
        # Determine filename from headers or URL
        filename = None
        if "Content-Disposition" in response.headers:
            cd = response.headers["Content-Disposition"]
            # Try to extract filename from content disposition
            if "filename=" in cd:
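                # Keep only the filename value, dropping any trailing parameters and quotes
                filename = cd.split("filename=")[1].split(";")[0].strip().strip('"')

        # Fallback to extracting from URL if header not available or malformed
        if not filename:
            filename = url.split("/")[-1]

        file_path = os.path.join(save_path, filename)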

        with open(file_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)

        print(f"SUCCESS: File downloaded to {file_path}")
        return file_path
    else:
        print(f"ERROR: {response.status_code}: File may be restricted or not found")
        return None

def is_zip_file(filepath):
    """
    Checks if a file is a valid zip file.
    """
    return zipfile.is_zipfile(filepath)

def unzip_file(filepath, extract_path="./"):
    """
    Unzips a zip file to a specified path and returns the name of the top-level extracted folder.
    Returns None if not a zip file or extraction fails.
    """
    if is_zip_file(filepath):
        try:
            with zipfile.ZipFile(filepath, 'r') as zip_ref:
                # Get the name of the top-level directory within the zip
                # Assumes there is a single top-level directory
                top_level_folder = None
                for file_info in zip_ref.infolist():
                    parts = file_info.filename.split('/')
                    if parts[0] and len(parts) > 1:
                        top_level_folder = parts[0]
                        break # Assuming the first entry gives the top-level folder

                zip_ref.extractall(extract_path)
                print(f"SUCCESS: Successfully unzipped {filepath} to {extract_path}")
                return top_level_folder

        except Exception as e:
            print(f"ERROR: Error unzipping {filepath}: {e}")
            return None
    else:
        print(f"INFO: {filepath} is not a valid zip file.")
        return None

def find_images_directory(base_path):
    """
    Recursively find the directory containing images
    """
    for root, dirs, files in os.walk(base_path):
        # Check if this directory contains image files
        image_files = [f for f in files if f.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.tiff', '.gif'))]
        if image_files:
            print(f"Found {len(image_files)} images in: {root}")
            return root
    return None

def setup_dataset():
    """
    Download and extract the dataset if it doesn't exist
    """
    # Initialize Borealis dataset access
    public_doi = "doi:10.5683/SP3/H3HGWF"
    print("Getting dataset information...")

    # Get dataset file information
    file_info_list = get_public_dataset_info(public_doi)

    if not file_info_list:
        print("Failed to get dataset information.")
        return None

    print(f"Found {len(file_info_list)} files in dataset.")

    # Find and download the deer_100.zip file (or similar)
    deer_file = None
    for file_info in file_info_list:
        if 'deer' in file_info['filename'].lower() and file_info['filename'].endswith('.zip'):
            deer_file = file_info
            break

    if not deer_file:
        print("Could not find deer dataset file. Available files:")
        for file_info in file_info_list:
            print(f"  - {file_info['filename']}")
        return None

    print(f"Downloading {deer_file['filename']}...")
    downloaded_file = download_public_file(deer_file['file_id'])

    if not downloaded_file:
        print("Failed to download dataset.")
        return None

    # Extract the zip file
    print("Extracting dataset...")
    extracted_folder = unzip_file(downloaded_file)

    if extracted_folder:
        base_path = f'./{extracted_folder}/'
        print(f"Extracted to: {base_path}")

        # Find the actual directory containing images
        images_directory = find_images_directory(base_path)

        if images_directory:
            print(f"Images found in: {images_directory}")
            return images_directory
        else:
            print("No images found in extracted dataset.")
            print("Directory structure:")
            for root, dirs, files in os.walk(base_path):
                level = root.replace(base_path, '').count(os.sep)
                indent = ' ' * 2 * level
                print(f"{indent}{os.path.basename(root)}/")
                subindent = ' ' * 2 * (level + 1)
                for file in files[:5]:  # Show first 5 files
                    print(f"{subindent}{file}")
                if len(files) > 5:
                    print(f"{subindent}... and {len(files) - 5} more files")
            return None
    else:
        print("Failed to extract dataset.")
        return None

# Initialize Borealis dataset access
public_doi = "doi:10.5683/SP3/H3HGWF"
print("Borealis dataset initialized for animal notebook data.")

Borealis dataset initialized for animal notebook data.


The following code block imports Plotly Express, `os`, PIL, NumPy, and `random`. Together they handle file access, image loading, random selection, and interactive display.

In [None]:
import plotly.express as px
import os
from PIL import Image
import numpy as np
import random

## Module: Loading the Image

This code uses the `Image` module from the PIL library imported previously. The `with` block ensures that the file is closed once it is no longer needed, preventing resource leaks.

Within the `with` block the image is converted to a NumPy array, which allows for further image processing. Colour images become three-dimensional arrays (height, width, channels), while grayscale images become two-dimensional arrays (height, width).

In [None]:
def load_image(image_path):
    """Load an image from a file path."""
    with Image.open(image_path) as img:
        return np.array(img)
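
As a quick check of what `load_image` returns, this sketch writes two small synthetic images to a temporary folder and inspects the resulting arrays. The image sizes, colours, and temporary files are assumptions made for illustration; `load_image` is repeated so the block runs on its own.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def load_image(image_path):
    """Load an image from a file path (same as the notebook's function)."""
    with Image.open(image_path) as img:
        return np.array(img)

with tempfile.TemporaryDirectory() as tmp_dir:
    # PIL sizes are (width, height); NumPy shapes are (height, width[, channels])
    rgb_path = os.path.join(tmp_dir, "rgb.png")
    Image.new("RGB", (6, 4), color=(255, 0, 0)).save(rgb_path)
    rgb_array = load_image(rgb_path)

    gray_path = os.path.join(tmp_dir, "gray.png")
    Image.new("L", (6, 4), color=128).save(gray_path)
    gray_array = load_image(gray_path)

print(rgb_array.shape, rgb_array.dtype)  # colour image: three dimensions
print(gray_array.shape)                  # grayscale image: two dimensions
```

`px.imshow` accepts both kinds of array, so the distinction mainly matters when adding further processing steps.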

## Module: Selecting a Random Image for Display

This code block defines the `interactive_visualization` function, which takes the `directory` parameter. This parameter is defined in later code as the path to the folder containing the desired images.

A list called `image_files` is created, and the `os` module is used to obtain a list of all the files in the given directory. Only files that end in common image file extensions are included. The final list contains the names of all the image files in the directory in string format. If the `image_files` list is empty (meaning no files in the directory end in the specified extensions), the function prints a message stating that no images were found, lists the files that are available, and returns without displaying anything.

An image file is selected to display at random using the `random` module imported previously. The image is then displayed using Plotly Express (abbreviated here as `px`). The `title_text` ensures that the title includes the file name of the randomly selected image file. The final line, `fig.show()`, displays the image within the notebook.

In [None]:
def interactive_visualization(directory):
    """
    Display a random image from the specified directory
    """
    # Check if directory exists
    if not os.path.exists(directory):
        print(f"Error: Directory '{directory}' does not exist.")
        return

    # Get the list of image file names that end with the specified extensions (case-insensitive,
    # matching the check in find_images_directory)
    image_files = [file for file in os.listdir(directory) if file.lower().endswith(('.png', '.jpg', '.jpeg', '.bmp', '.tiff', '.gif'))]

    if not image_files:
        print(f"No images found in the directory '{directory}'.")
        print(f"Available files: {os.listdir(directory)}")
        return

    # Select a random image from the list of image files
    random_image_file = random.choice(image_files)
    print(f"Selected image: {random_image_file}")

    # Load the random image
    image_path = os.path.join(directory, random_image_file)
    image = load_image(image_path)

    # Display the image using Plotly
    fig = px.imshow(image)
    fig.update_layout(title_text=f'Randomly Selected Image: {random_image_file}')
    fig.show()

This cell runs the complete workflow by setting up the dataset (downloading and extracting from Borealis), automatically discovering the image directory, and running the interactive visualization.

In [None]:
print("Setting up dataset...")
dataset_path = setup_dataset()

if dataset_path:
    print("Dataset ready. Running interactive visualization...")
    interactive_visualization(dataset_path)
else:
    print("Could not set up dataset. Please check your connection and try again.")
