First we can load the scale image and calculate a conversion factor between pixels and standard units.
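
As a rough sketch of the conversion step, suppose the scale image contains a reference object of known physical length whose length in pixels has already been measured. The function names `pixels_per_unit` and `area_to_units` and the numbers below are illustrative assumptions, not part of the `pytcherplants` package.

```python
def pixels_per_unit(length_px: float, length_units: float) -> float:
    """Conversion factor: how many pixels span one physical unit (e.g. 1 cm)."""
    if length_px <= 0 or length_units <= 0:
        raise ValueError("lengths must be positive")
    return length_px / length_units


def area_to_units(area_px: float, factor: float) -> float:
    """Convert an area in square pixels to square units (the factor is squared)."""
    return area_px / factor ** 2


# hypothetical measurement: a 10 cm ruler spans 400 px in the scale image
factor = pixels_per_unit(400, 10)       # 40 px per cm
area_cm2 = area_to_units(3200, factor)  # 3200 px^2 -> 2 cm^2
```

Note that lengths scale with the factor while areas scale with its square, which matters for the area measurements listed below.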

We can start by looking at a few data points at the beginning and end of the experiment (or at plant death, for those that didn't make it):

- average hue
- growth point count

We can also look at several time series, both per treatment and for individual plants of interest:

- total area
- area per hue bin
- area per RGB cluster
- pitcher count
- average pitcher area

Import dependencies.

In [None]:
from collections import Counter
from collections import OrderedDict
from os.path import join

import pandas as pd
from matplotlib import pyplot as plt
from scipy.cluster.vq import kmeans2

from pytcherplants.plotting import plot_hex_distribution, plot_rgb_distribution, plot_hue_distribution
from pytcherplants.utils import hue_to_rgb_formatted, row_date, row_treatment, row_name

Read in the data from CSV and groom it for easier analysis.

Below we assume one image per plant, with filenames following the format `[date][treatment][name].[extension]`. The date must be in month-day-year order (`%m-%d-%Y` in `strftime` notation). The extensions `.jpg`, `.jpeg`, `.png`, `.tif`, and `.tiff` (case-insensitive) are supported.
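
To make the naming convention concrete, here is a minimal sketch of a filename parser. It is not the implementation behind `row_date`, `row_treatment`, and `row_name`. The function `parse_image_name` is hypothetical, and it assumes the three fields are separated by dots, which the format string above does not specify.

```python
from datetime import datetime
from os.path import basename, splitext

SUPPORTED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.tif', '.tiff'}


def parse_image_name(path: str):
    """Split a filename into (date, treatment, name), or return None if malformed.

    Assumption: fields are dot-separated, e.g. '10-14-2021.Control.Plant1.jpg'.
    """
    stem, extension = splitext(basename(path))
    if extension.lower() not in SUPPORTED_EXTENSIONS:  # case-insensitive check
        return None
    parts = stem.split('.')
    if len(parts) != 3:
        return None
    try:
        date = datetime.strptime(parts[0], '%m-%d-%Y').date()
    except ValueError:  # date field does not match month-day-year
        return None
    return date, parts[1], parts[2]
```

Returning `None` for malformed names mirrors the notebook's approach of dropping rows whose filename cannot be interpreted.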

In [None]:
df = pd.read_csv('data_sarracenia/tabular/masked.colors.csv')

# extract date, treatment and name from image name
df['Date'] = df.apply(row_date, axis=1)
df['Treatment'] = df.apply(row_treatment, axis=1)
df['Name'] = df.apply(row_name, axis=1)

# drop rows with unknowns (malformed filename format)
df.dropna(how='any', inplace=True)

# format HSV columns, convert hue to degrees in [0, 360], create hue bins
divisor = 5  # bin width in degrees (72 equally spaced bins, labeled 0 to 355)
ranges = [k * divisor for k in range(360 // divisor)]  # bin labels, matching the 'Bin' column below
hsv_subset = df[['H', 'S', 'V']].astype(float)
hsv_subset['HH'] = (hsv_subset['H'] * 360).astype(int)  # hue in degrees (assumes H is in [0, 1])
hsv_subset['Bin'] = hsv_subset['HH'] - (hsv_subset['HH'] % divisor)  # lower edge of each hue bin

First we'll look at aggregations over the entire dataset.

In [None]:
title = "Overall"
output_directory = "output_analysis"

Subset the RGB columns, run k-means clustering in RGB-space, and compute proportions of each cluster.

In [None]:
rgb_subset = df[['R', 'G', 'B']].astype(float).values
rgb_centers, rgb_labels = kmeans2(rgb_subset, 25)
label_counts = Counter(rgb_labels.tolist())  # number of rows assigned to each cluster index
# key each cluster by its center's RGB triple; enumerate pairs center i with label i
# (zipping centers with labels would pair each center with the label of an unrelated row)
rgb_counts = {tuple(abs(int(float(v) * 256)) for v in c): label_counts[i] for i, c in enumerate(rgb_centers)}
rgb_total = sum(rgb_counts.values())
rgb_props = {k: (v / rgb_total) for k, v in rgb_counts.items()}

Divide hue into 72 equally spaced bins and compute proportions per bin.

In [None]:
hsv_counts = Counter(hsv_subset['Bin'])
for key in [k for k in ranges if k not in list(hsv_counts.keys())]: hsv_counts[key] = 0  # pad zeroes
for key in [k for k in ranges if 125 < k < 360]: hsv_counts[key] = 0  # remove outliers (non red/green)
hsv_total = sum(hsv_counts.values())
hsv_props = OrderedDict(sorted({k: float(v / hsv_total) for k, v in hsv_counts.items()}.items()))
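
The binning and padding logic can be illustrated without pandas. The following is a minimal standalone sketch on toy data. The helper names `hue_bin` and `hue_proportions` are made up for illustration, hues are assumed to lie in [0, 1), and bins are labeled by their lower edge in degrees (0 to 355). The red/green outlier filter used above is left out.

```python
from collections import Counter, OrderedDict

BIN_WIDTH = 5  # degrees per bin, so 360 / 5 = 72 bins


def hue_bin(hue: float, width: int = BIN_WIDTH) -> int:
    """Map a hue in [0, 1) to the lower edge (in degrees) of its bin."""
    degrees = int(hue * 360)
    return degrees - (degrees % width)


def hue_proportions(hues, width: int = BIN_WIDTH):
    """Return an ordered mapping from bin (degrees) to its share of the pixels."""
    counts = Counter(hue_bin(h, width) for h in hues)
    for edge in range(0, 360, width):  # pad empty bins so every bin appears
        counts.setdefault(edge, 0)
    total = sum(counts.values())
    return OrderedDict((k, counts[k] / total) for k in sorted(counts))


# toy data: three reddish pixels and one greenish pixel
props = hue_proportions([0.0, 0.01, 0.02, 0.30])
```

Padding the empty bins keeps the mapping the same length for every image, so distributions from different plants or dates can be compared bin by bin.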

Visualize a histogram of color clusters coded in hexadecimal.

In [None]:
plot_hex_distribution(rgb_props, f"{title} hex distribution")
plt.xticks(rotation=60)
plt.legend().remove()
plt.savefig(join(output_directory, f"{title}.hex.png"))
plt.clf()
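
For reference, converting an RGB triple to its hexadecimal code is a short formatting step. The helper `rgb_to_hex` below is a hypothetical sketch and may differ from what `plot_hex_distribution` does internally. It clamps each channel to [0, 255], since scaling a unit-range value by 256 can yield 256.

```python
def rgb_to_hex(rgb) -> str:
    """Format an (R, G, B) triple as a '#rrggbb' string, clamping each channel to [0, 255]."""
    r, g, b = (max(0, min(255, int(v))) for v in rgb)
    return f"#{r:02x}{g:02x}{b:02x}"


# hypothetical cluster proportions keyed by RGB triple, shaped like `rgb_props`
example_props = {(34, 139, 34): 0.75, (178, 34, 34): 0.25}
hex_props = {rgb_to_hex(k): v for k, v in example_props.items()}
```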

Visualize RGB clusters in 3D RGB-space, from a few different perspectives.

In [None]:
fig = plot_rgb_distribution(rgb_props, f"{title} RGB distribution")
camera = dict(eye=dict(x=2.5, y=0, z=0))  # look along the x axis
fig.update_layout(scene_camera=camera, title='eye = (x: 2.5, y: 0, z: 0)')
fig.write_image(join(output_directory, title + '.rgb.x.png'))
camera = dict(eye=dict(x=0, y=2.5, z=0))  # look along the y axis
fig.update_layout(scene_camera=camera, title='eye = (x: 0, y: 2.5, z: 0)')
fig.write_image(join(output_directory, title + '.rgb.y.png'))
camera = dict(eye=dict(x=0, y=0, z=2.5))  # look along the z axis
fig.update_layout(scene_camera=camera, title='eye = (x: 0, y: 0, z: 2.5)')
fig.write_image(join(output_directory, title + '.rgb.z.png'))

Visualize hue distribution as a radial bar plot.

In [None]:
# radial bar plot for color distribution
fig = plot_hue_distribution(hsv_props, f"{title} hue distribution")
fig.write_image(join(output_directory, title + '.hue.png'))

TODO: repeat the analysis separately for each combination of treatment and date (maybe also for a few plants individually).

## Overview

Each image is first preprocessed with a Gaussian blur and an adaptive threshold, followed by contour detection and an optional hue filter. For *Sarracenia* we exclude blues and purples, treating these non-red/green hues as outliers. K-means clustering is then used to reduce the image to a small set of average colors by assigning each pixel to its nearest centroid. The averaged pixels are counted, then grouped and analyzed by plant, timestamp, and fertilizer treatment.
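
The assignment-and-count step can be sketched in plain Python. This is an illustration, not the package's implementation: the centroids here are fixed by hand, whereas k-means would estimate them from the data, and the names `nearest_centroid` and `count_by_centroid` are hypothetical.

```python
from collections import Counter


def nearest_centroid(pixel, centroids):
    """Index of the centroid with the smallest squared Euclidean distance to `pixel`."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, centroids[i])),
    )


def count_by_centroid(pixels, centroids):
    """Replace each pixel with its nearest centroid and count pixels per centroid."""
    labels = [nearest_centroid(p, centroids) for p in pixels]
    return Counter(centroids[i] for i in labels)


# toy example: two hand-picked centroids (a green and a red) and four pixels
centroids = [(30, 140, 30), (180, 30, 30)]
pixels = [(28, 150, 35), (175, 40, 25), (40, 130, 20), (33, 138, 31)]
counts = count_by_centroid(pixels, centroids)
```

Dividing each count by the total number of pixels gives the per-cluster proportions that the plots in this notebook visualize.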