# STATS 604: Ripening Fruit
## Team Members: Alex Kagan, Daniele Bracale, Josh Wasserman, Xinhe Wang, and Yash Patel
We present our findings on the ripening of bananas. This report covers our experimental design, the raw data processing pipeline, statistical analyses, and conclusions. It is broken into the following sections:

- Experimental Design
- Data processing
- Permutation test analyses
- Conclusion

Each section below pairs the relevant code and figures with accompanying exposition.

## Introduction

Fruits in general ripen through the conversion of starch to sugars, so we can define the "true" ripeness as the ratio of sugars to starches. Ideally, we’d measure this conversion chemically, but we don’t have access to refractometers. Instead, we use a number of proxies that correlate strongly with this ripeness.

Starch, a complex carbohydrate, is broken down into simple sugars in a process driven by ethylene. The release of ethylene also triggers a number of other, clearly visible changes. Specifically, it leads to the breakdown of pectin, which is responsible for the structural integrity of the banana; this is why bananas grow softer as they ripen. Similarly, it leads to the breakdown of chlorophyll, which gives rise to the characteristic yellow color and eventual browning of bananas. This is, perhaps surprisingly, the same process that changes the color of leaves, although not all of the same pigments are present in banana skins, which is why bananas really only progress from yellow to brown.

So, conceptually, we want to measure the following aspects of ripeness:

- Color
- Firmness
- Sweetness

## Experimental measures
**Color**: The color of a banana characteristically changes in two ways as it ripens: it becomes more yellow and subsequently develops brown spots. We therefore use two color measures. For both, take a picture of the banana on a white piece of paper, with a printed grey "registration" card in frame to serve as a fixed normalization point for white balance in post-processing. After normalizing colors, segment the banana in the image. Run a connected-components analysis to find the brown spots and compute their area in pixels as a proportion of the total banana area. For the remainder of the banana, compute the average RGB value. (Idealized) green in RGB is (0, 255, 0), whereas (idealized) yellow is (255, 255, 0), so the average red channel should serve as a rough proxy for ripeness with respect to color. We split the analysis this way because brown does not have a greater red value than yellow: the red channel rises as the banana yellows and then falls as it browns, so an average taken before segmenting out the brown spots would be skewed.
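
The two color summaries described above can be sketched as follows. This is a minimal illustration on a synthetic image, not our actual pipeline: the function names, the grey target value of 128, and the idea of scaling each channel by the grey card's mean are assumptions made for the example, and the brown-spot mask is supplied directly rather than found by a connected-components analysis.

```python
import numpy as np

def white_balance(img, card_mask, target=128.0):
    """Scale each channel so the grey card averages to `target`.

    `target` is a hypothetical reference value chosen for this sketch.
    """
    img = img.astype(np.float64)
    card_mean = img[card_mask].mean(axis=0)   # per-channel mean over card pixels
    return np.clip(img * (target / card_mean), 0, 255)

def color_summaries(img, banana_mask, spot_mask):
    """Return (brown-spot share of banana area, mean RGB of non-spot banana pixels)."""
    spots = banana_mask & spot_mask
    spot_fraction = spots.sum() / banana_mask.sum()
    clean = banana_mask & ~spot_mask
    mean_rgb = img[clean].mean(axis=0)
    return spot_fraction, mean_rgb

# Synthetic 4x4 image in RGB order (note that cv2.imread returns BGR).
img = np.full((4, 4, 3), 255, dtype=np.uint8)   # white paper
img[0:2, :] = (100, 200, 0)                     # peel, photographed with a color cast
img[0, 0:2] = (50, 50, 0)                       # two brown-spot pixels
img[3, :] = (64, 128, 128)                      # grey card under the same cast

banana_mask = np.zeros((4, 4), dtype=bool)
banana_mask[0:2, :] = True
spot_mask = np.zeros((4, 4), dtype=bool)
spot_mask[0, 0:2] = True
card_mask = np.zeros((4, 4), dtype=bool)
card_mask[3, :] = True

balanced = white_balance(img, card_mask)
spot_fraction, mean_rgb = color_summaries(balanced, banana_mask, spot_mask)
print(spot_fraction, mean_rgb)
```

Here the first component of `mean_rgb` is the red-channel proxy; excluding the spot pixels keeps the darker brown values from pulling it down.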

**Firmness**: Peel the banana and cut a 2" segment from the middle. Place the piece on a white piece of paper and place a fixed weight atop it for a fixed period of time (20 seconds). We take a picture before and after applying the weight and compute the percentage change in the number of banana pixels as a measure of how much the banana got "squished."

## Raw Data
Bananas were exposed to different "accompanying" fruits to observe their effects on the ripening process, with data collected in the following fashion:

1. Take a picture of each banana as described in the "color" portion of the experimental measures section
2. Randomly place N/3 bananas in each of three boxes
3. Place nothing additional in one box, 3 apples in another, and 3 tomatoes in the last
4. Leave these boxes for 5 days, taking a picture each day as described in the "color" portion of the experimental measures section
5. Measure the above listed "experimental measures" on the final day
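
The random placement in step 2 can be sketched as below. This is only an illustration: the box labels, the seed, and the helper name `assign_boxes` are hypothetical, and N = 36 is assumed from the banana indices used in the squish-test code.

```python
import numpy as np

def assign_boxes(n_bananas, boxes=("control", "apple", "tomato"), seed=0):
    """Randomly split banana IDs 1..n_bananas into equally sized boxes."""
    rng = np.random.default_rng(seed)
    ids = rng.permutation(np.arange(1, n_bananas + 1))   # shuffled banana IDs
    groups = np.array_split(ids, len(boxes))             # N/3 bananas per box
    return {box: sorted(int(i) for i in g) for box, g in zip(boxes, groups)}

assignment = assign_boxes(36)
for box, ids in assignment.items():
    print(box, ids)
```

Fixing the seed makes the assignment reproducible, which also makes it easy to regenerate the same group labels later when analysing the data.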

The raw data are available in the corresponding Day_* folders.

## Squish Test
The analysis has two processing stages:
- Segment bananas out from the remainder of the image
- Count the masked pixel area of this segmented section

We perform the first by running an edge detector and then filling the region enclosed by the detected boundary to extract the complete mask of the object; in the code in this section, the fill is a seeded flood fill (`cv2.floodFill`). The choice of edge detector matters: a vanilla Canny edge detector fails because it leaves small gaps in the object boundary, so we use a pretrained HED edge detector instead and close any remaining gaps with a dilation followed by an erosion.
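
To see why gaps in the boundary matter, here is a small pure-Python sketch of a seeded fill on a toy edge map. It is an illustration only, not the OpenCV routine; the helper `flood_fill` is hypothetical and takes its seed as (row, column), whereas `cv2.floodFill` takes (x, y).

```python
from collections import deque
import numpy as np

def flood_fill(edges, seed):
    """Return a boolean mask of pixels 4-connected to `seed` without crossing edges."""
    h, w = edges.shape
    filled = np.zeros((h, w), dtype=bool)
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < h and 0 <= c < w):
            continue                      # outside the image
        if filled[r, c] or edges[r, c]:
            continue                      # already visited, or blocked by an edge
        filled[r, c] = True
        queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return filled

# A closed square outline on a 9x9 grid: the fill stays inside it.
edges = np.zeros((9, 9), dtype=bool)
edges[1, 1:8] = edges[7, 1:8] = True
edges[1:8, 1] = edges[1:8, 7] = True
closed_area = flood_fill(edges, (4, 4)).sum()

# Remove a single boundary pixel: the fill leaks out and covers the background.
gapped = edges.copy()
gapped[1, 4] = False
leaked_area = flood_fill(gapped, (4, 4)).sum()

print(closed_area, leaked_area)  # 25 58
```

A single missing boundary pixel more than doubles the filled area in this toy case, which is why the boundary has to be closed before filling.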

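The final "squish ratio" is the banana's masked area after squishing divided by its masked area before squishing, with each area expressed as a share of the image's pixels:

$$ \text{squish ratio} = \frac{\%\,\text{pixels}_{\text{after}}}{\%\,\text{pixels}_{\text{before}}} $$

The code below uses raw pixel counts, which gives the same ratio whenever the before and after images have the same dimensions.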

In [1]:
import cv2
import os
import numpy as np
from matplotlib import pyplot as plt
import pandas as pd

In [2]:
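def get_mask(hed, seed_point=(300, 250)):
    """Turn an HED edge map into a filled binary mask of the object at `seed_point`."""
    # Binarize: any nonzero edge response becomes an edge pixel.
    t = 0
    mask = hed.copy()
    mask[mask == t] = 0
    mask[mask > t] = 255

    # Discard edges outside a fixed central window (columns 200-400, rows 200-350).
    mask[:, :200] = 0
    mask[:, 400:] = 0
    mask[:200, :] = 0
    mask[350:, :] = 0

    # Close small gaps in the boundary: dilate, then erode (a morphological closing).
    kernel = np.ones((5, 5), np.uint8)
    img_dilation = cv2.dilate(mask, kernel, iterations=10)
    mask = cv2.erode(img_dilation, kernel, iterations=10)

    # Fill the region connected to the seed point, given as (x, y), in place.
    cv2.floodFill(mask, None, seedPoint=seed_point, newVal=255)
    return mask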

In [3]:
import urllib.request

# Download the pretrained HED weights used by the edge detector below.
url = "https://github.com/ashukid/hed-edge-detector/raw/master/hed_pretrained_bsds.caffemodel"
urllib.request.urlretrieve(url, "hed_pretrained_bsds.caffemodel")

('hed_pretrained_bsds.caffemodel', <http.client.HTTPMessage at 0x7fa9f190d0a0>)

In [5]:
special_seeds = {
    "before": {
        2: (300, 300),
        3: (225, 275),
        4: (300, 300),
        5: (225, 275),
        7: (325, 275),
        8: (300, 300),
        9: (225, 275),
        10: (225, 275),
        11: (300, 300),
        15: (225, 225),
        20: (300, 300),
        29: (300, 300),
        36: (225, 275),
    },
    "after": {
        3: (300, 300),
        5: (225, 275),
        7: (350, 275),
        8: (275, 275),
        9: (225, 275),
        10: (225, 275),
        17: (325, 300),
        20: (300, 300),
        28: (300, 300),
        29: (300, 300),
    },
}

root_folder = "../SquishTest_Data"
# Load the pretrained HED network once rather than once per image.
net = cv2.dnn.readNetFromCaffe("../deploy.prototxt", "hed_pretrained_bsds.caffemodel")

for folder in ["before", "after"]:
    for banana_idx in range(1, 37):
        banana_fn = f"banana_{banana_idx}.jpeg"
        img_path = os.path.join(root_folder, folder, banana_fn)
        print(img_path)
        raw = cv2.imread(img_path)

        # Downsample and blur to suppress fine texture before edge detection.
        img = cv2.resize(raw, None, fx=0.15, fy=0.15)
        img = cv2.blur(img, (5, 5))

        # Note: img.shape is (rows, cols, channels), so W holds the row count and
        # H the column count. The crop window and seed points are used with the
        # resulting layout, so the order is left unchanged here.
        W, H, _ = img.shape
        blob = cv2.dnn.blobFromImage(img, scalefactor=1.0, size=(W, H), swapRB=False, crop=False)

        net.setInput(blob)
        hed = net.forward()

        # Rescale the edge probabilities to an 8-bit image.
        hed = cv2.resize(hed[0, 0], (W, H))
        hed = (255 * hed).astype("uint8")

        # Use the hand-picked seed for this image if there is one, else the default.
        seed_point = special_seeds[folder].get(banana_idx, (300, 250))
        mask = get_mask(hed, seed_point=seed_point)

        dest = os.path.join(root_folder, "masked", folder, banana_fn)
        print(f"Writing to: {dest}")
        cv2.imwrite(dest, mask)

../SquishTest_Data/before/banana_1.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_1.jpeg
../SquishTest_Data/before/banana_2.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_2.jpeg
../SquishTest_Data/before/banana_3.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_3.jpeg
../SquishTest_Data/before/banana_4.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_4.jpeg
../SquishTest_Data/before/banana_5.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_5.jpeg
../SquishTest_Data/before/banana_6.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_6.jpeg
../SquishTest_Data/before/banana_7.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_7.jpeg
../SquishTest_Data/before/banana_8.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_8.jpeg
../SquishTest_Data/before/banana_9.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_9.jpeg
../SquishTest_Data/before/banana_10.jpeg
Writing to: ../SquishTest_Data/masked/before/banana_10.jpeg
../Squis

In [6]:
root_folder = "../SquishTest_Data"
squishes = []
for banana_idx in range(1, 37):
    banana_fn = f"banana_{banana_idx}.jpeg"
    # Read each saved mask as grayscale and treat any nonzero pixel as banana.
    before_mask = cv2.imread(os.path.join(root_folder, "masked", "before", banana_fn), cv2.IMREAD_GRAYSCALE) > 0
    after_mask = cv2.imread(os.path.join(root_folder, "masked", "after", banana_fn), cv2.IMREAD_GRAYSCALE) > 0

    # Area is the number of masked pixels.
    before_area = np.sum(before_mask)
    after_area = np.sum(after_mask)

    # Squish ratio: area after the weight divided by area before.
    squishes.append(after_area / before_area)
squishes = np.array(squishes)

# Save one squish ratio per banana, indexed by banana name.
index = [f"banana_{banana_idx}" for banana_idx in range(1, 37)]
df = pd.DataFrame(squishes, columns=["squish"], index=index)
df.to_csv("squish.csv")

### Limitations of the experiment design
Our statistical analysis relied on several important assumptions about the experiment design, most of which were likely violated to some extent. Here, we present only the most important of these assumptions, together with the reasons they could be violated.

- Stable Unit Treatment Value Assumption (SUTVA): bananas in the same bag can influence each other's ripening.
- Equal treatment given to the bananas in each bag: the apples / tomatoes could lie closer to some bananas than to others in the bag and thus have a stronger effect on them.
- RGB measurements were not affected by camera angle / shadow amount / light intensity: since most of the experiment was done manually, we have no guarantee that these characteristics remained constant throughout the whole 5-day period.
- Bananas received equal pressure during squishing: since squishing was done by a person, it is almost impossible to make the pressure uniform.
- No side effects on ripeness when the bananas are taken out of the bag for RGB recording: though likely negligible, the surroundings may have a possibly varying side effect during the recording procedure.

### Further work

In the future, we want to try measuring the ripeness of a banana by its reaction to iodine, as stated in our pre-analysis. Though it could significantly increase the required budget, the reaction to iodine may be a more robust statistic because the color change of a banana when iodine is added is much more apparent than the natural change that we try to capture with a camera.

Among the possible improvements for our current experiment pipeline, we suggest:

- Moving the experiment to a room without any sources of natural light (a basement, for example) to eliminate lighting noise in the RGB measurements
- Putting each banana with its treatment fruit / vegetable in a separate bag to eliminate the effects of bananas on each other, the key source of possible SUTVA violations