<a href="https://colab.research.google.com/github/SzymonNowakowski/Machine-Learning-2024/blob/master/Lab14-putting-it-all-together.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Lab 14 - Putting It All Together


### Author: Szymon Nowakowski


# Introduction
---------------


This is our last class this semester. It responds to a request made by several students to explain the **design process** behind a complete neural network architecture.

This is, frankly, a difficult task, because such a process relies heavily on **intuition and experience**, neither of which can be fully explained. They are acquired through *“dupogodziny”* (Polish slang, literally "butt-hours"): the hours you spend working, trying, failing, adjusting, and learning.

I once attended a drawing school I liked a lot, and their motto was striking:

**Talent nie istnieje** — *Talent does not exist.*

The school owner used to explain this quite eloquently. Every artist's output quality can be described by their **personal Gaussian distribution**: some works of art are better, some worse — their spread is captured by the **standard deviation**, while their **average quality** is given by the **mean** of the distribution.

Talent is responsible only for the initial placement of the expected value of that distribution — sure, some people naturally produce better work *on average*, when they are young and untrained. But the most important factor determining the final quality of the art is **time spent practicing** — the *dupogodziny* you put in every day.

**Every hour of focused practice shifts your Gaussian distribution to the right and makes it narrower.**

With that in mind, in this final class, I'll walk you through the **design and training** process of a neural network I most recently worked on:  
a **regressor for predicting period length in chloroplast grana**.

But the final and most important thought I would like to convey is this:  
***you will benefit most from the time spent practicing.***

This is in fact a very **optimistic statement**.

That's why the homework for the 13th class **will** put you through your own design process of trial and error. Sure, I've provided some initial suggestions — but I am also sure that you'll need to figure out what works for you and what doesn't.

**Put in the hours. That's where learning happens.**





## Acknowledgments


*I would like to express my sincere gratitude to the co-authors of the following work, which underpins much of this class material:*  
**Bukat, A., Bukowicki, M., Bykowski, M., Kuczkowska, K., Nowakowski, S., Śliwińska, A., & Kowalewska, Ł.** (2025). *GRANA: An AI-based tool for accelerating chloroplast grana nanomorphology analysis using hybrid intelligence*. *Plant Physiology*, **kiaf212**. https://doi.org/10.1093/plphys/kiaf212


*In particular, I wish to thank **Łucja Kowalewska**, and members of her team: **Alicja Bukat** and **Michał Bykowski** for their valuable discussions and for providing high-quality visual representations of key concepts, which are included in this course material with their kind permission.*


# Problem Description - Grana
----------------------------

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/from_plant_to_granum.png?raw=1"
       alt="granum image" width="1000" height="600">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/granum_image.png?raw=1"
       alt="granum image" width="600" height="600">
</div>

**Grana** (singular: *granum*) are stacks of **thylakoid membranes** found within the **chloroplasts** of plant cells and some algae. They play a central role in **photosynthesis**, particularly in the light-dependent reactions.



## Structure

- Each **granum** resembles a stack of coins or pancakes.
- A **granum** consists of multiple **thylakoids**, which are membrane-bound compartments.
- Thylakoids are interconnected by **stromal lamellae**, which help maintain the structure and allow for communication between grana.



## Function

Grana are the **site of light-dependent reactions** of photosynthesis:
- These structures absorb **light energy**, produce **ATP**, and reduce **NADP⁺** to **NADPH**.
- Water molecules are split (photolysis), releasing **oxygen** as a by-product.

## Grana Parameters

<div align="center">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_measurements.png?raw=1"
       alt="granum image" width="1000" height="600">

</div>

Manual quantification of grana parameters is both time-consuming and prone to operator bias, which compromises the reproducibility and comparability of results across research groups.

While automated techniques—such as those based on grayscale intensity profiles or Fast Fourier Transform (FFT) analysis—have been proposed, they often fail when applied to non-ideal or low-quality TEM images. In such cases, researchers are forced to revert to manual annotation.

To overcome these limitations, we proposed a robust deep learning–based approach for the automated analysis of grana morphology, capable of handling a wide range of image qualities with improved accuracy and consistency.


# Automated Approach Description
----------------------------



The automated analysis is structured into five components:

1. **Grana detection**: Detect individual grana in transmission electron microscopy (TEM) images (***object detection task***).
2. **Orientation estimation**: Predict the orientation angle of each granum from the corresponding TEM image fragment (***regression task***).
3. **Morphometric analysis**: Derive morphological parameters such as perimeter, height, and diameter based on the detected granum and its orientation.
4. **SRD length estimation**: Estimate the Stacking Repeat Distance (SRD), also known as the *period*, based on the grana structure (***regression task***).
5. **Thylakoid count estimation**: Use the computed SRD and granum height to infer the number of stacked thylakoids.

This automated pipeline enables rapid, objective, and reproducible quantification of grana nanomorphology across large image datasets.
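Step 5 can be sketched in a few lines. This is only an illustration: it assumes the count is the granum height divided by the SRD, rounded to the nearest integer. The function name and the example measurements are hypothetical, and the published tool may apply further corrections.

```python
def estimate_thylakoid_count(granum_height_px: float, srd_px: float) -> int:
    """Rough layer count: how many repeat distances fit into the granum height.

    Assumption (not taken from the paper): plain division followed by rounding.
    """
    if srd_px <= 0:
        raise ValueError("SRD must be positive")
    return round(granum_height_px / srd_px)


# Hypothetical measurements, both in pixels.
layers = estimate_thylakoid_count(granum_height_px=252.0, srd_px=21.0)
print(layers)  # 12
```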

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/nn_analysis.png?raw=1"
       alt="granum image" width="1000" height="600">

</div>

# SRD (Period) Estimation: Dataset and Problem Statement
---

The core objective of this class is the estimation of the **Stacking Repeat Distance (SRD)**, also referred to as the **period**, which quantifies the regular spacing between thylakoid layers within a granum. Accurate period estimation is critical for understanding the structural organization of grana and assessing physiological variations across samples.

## Problem Statement

Given an image of an individual granum—cropped from a transmission electron microscopy (TEM) image and aligned horizontally—the task is to estimate the **average period length** (Stacking Repeat Distance, SRD) in **pixels**.

Ground truth period values are based on manual measurements provided by human experts. However, a simulation involving four independent human annotators revealed a substantial degree of variability, underscoring the subjective and inconsistent nature of manual SRD estimation.

## Dataset

The dataset evolved over the course of our work on the SRD estimation network. The final version used in our experiments comprised **339 granum samples**, which were split into **70% training** and **30% validation** subsets.
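A minimal sketch of such a split is shown below. The text above only states the proportions, so the random (non-stratified) shuffling, the fixed seed, and the rounding down of the training size are all assumptions made for illustration.

```python
import random


def split_indices(n_samples: int, train_fraction: float = 0.7, seed: int = 0):
    """Shuffle sample indices and cut them into training and validation parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed: reproducible split (assumption)
    n_train = int(n_samples * train_fraction)  # rounds down (assumption)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(339)
print(len(train_idx), len(val_idx))  # 237 102
```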

## Image Quality

The quality of the images in the dataset varied considerably. This variation was influenced by multiple factors, including differences in TEM image acquisition protocols and postprocessing procedures. To ensure robustness and applicability of our method across diverse imaging conditions, the dataset also included a subset of microscope images that had been scanned from physical photographic foils.

## Examples

Below you will find some examples of the data:

<table align="center">
  <tr>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-10.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-105.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-11.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-112.png?raw=1" width="250"></td>
  </tr>
  <tr>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-143.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-15.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-18.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-19.png?raw=1" width="250"></td>
  </tr>
  <tr>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-46.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-47.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-89.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/train_img_raw-eff8496-93.png?raw=1" width="250"></td>
  </tr>
  <tr>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/val_img_raw-eff8496-13.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/val_img_raw-eff8496-50.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/val_img_raw-eff8496-61.png?raw=1" width="250"></td>
    <td><img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/grana_examples/val_img_raw-eff8496-84.png?raw=1" width="250"></td>
  </tr>
</table>



## Solution Approaches

Initial attempts to solve the SRD estimation task included fine-tuning a pretrained **ResNet** architecture. However, training failed to converge meaningfully. The model quickly plateaued and consistently predicted a value close to the mean period observed in the training set, indicating that it failed to learn any meaningful features specific to individual inputs.

We also explored a handcrafted approach based on applying the **Fast Fourier Transform (FFT)** to the input image, aiming to capture periodic spatial patterns. However, this also failed, partially due to poor image quality or irregular granum structures.
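To make the Fourier idea concrete, here is a simplified sketch, not the code we actually used. It assumes the image has already been reduced to a one-dimensional vertical intensity profile, and it uses a plain discrete Fourier transform written out by hand instead of an FFT library. The function name and the synthetic profile are hypothetical.

```python
import cmath
import math


def dominant_period(profile):
    """Return the period (in samples) of the strongest non-constant DFT component."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [v - mean for v in profile]  # remove the constant (DC) term
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Hypothetical clean profile: period 40 px, light stripes 12 px wide.
profile = [200.0 if t % 40 < 12 else 50.0 for t in range(480)]
print(dominant_period(profile))  # 40.0
```

On a clean profile like this one the peak is unambiguous. Picking the peak bin only ever returns periods of the form `n / k`, so for this profile the neighbouring candidates are about 43.6 and 36.9 pixels.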



# A Revised Approach
---

## Realizations

At this stage, we had developed a deeper understanding of the image structure and the inherent difficulty of the SRD estimation task. Several key insights emerged:

- We realized that estimating the period based on a **single horizontal stripe** is inherently imprecise. Such an approach limits the measurement resolution to whole pixels, making **sub-pixel accuracy** unattainable. In contrast, human experts are able to measure the thickness of multiple adjacent stripes and average the results, thus achieving sub-pixel precision.

- We also observed that **not all regions within an image contribute equally** to period estimation. The **usefulness of a region depends heavily on its local quality**—some areas may contain clear periodic structures, while others provide no meaningful information due to noise or artifacts.
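The first realization can be illustrated numerically. The sketch below assumes that stripe edges can only be located at whole-pixel positions; the function name and the true period of 23.4 pixels are made up for the example.

```python
def measured_period(true_period: float, n_stripes: int, offset: float = 0.0) -> float:
    """Measure the span of n_stripes periods using whole-pixel edge positions."""
    start = round(offset)
    end = round(offset + n_stripes * true_period)
    return (end - start) / n_stripes


true_period = 23.4
single = measured_period(true_period, n_stripes=1)
averaged = measured_period(true_period, n_stripes=8)
print(single, averaged)  # 23.0 23.375
```

Measuring across eight stripes brings the resolution down from 1 pixel to 1/8 of a pixel, which is what a human annotator does when averaging over several adjacent stripes.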

## Guiding Assumptions

Based on these realizations, we redefined our modeling strategy with the following assumptions:

- The model should be based on a **convolutional neural network (CNN)**, which naturally exploits local patterns and spatial hierarchy.

- The **receptive field of each output neuron should be narrow in width**, allowing it to focus on a small horizontal portion of the image. This constraint helps to ensure that the quality within each receptive field remains relatively uniform.

- At the same time, the **receptive field should be long**, so that each output neuron observes multiple horizontal stripes arranged top-down. This enables the model to make a more reliable and informed estimate of the period by effectively averaging over several features, akin to human annotators.

- In short, this calls for an asymmetric receptive field. The receptive fields should also overlap, to increase the odds that at least one of them captures a good-quality portion of the image.

- I decided to use an attention mechanism. My reasoning was as follows: let each output neuron learn to encode both a period prediction and a quality estimate in the embedding for its receptive field. The attention mechanism can then base the final prediction on the high-quality embeddings only.

- Since the input image dimensions varied across samples, each image was uniformly rescaled to between 450 and 500 pixels to ensure consistency during training and inference.

- Based on the known period lengths observed in TEM images, we aimed to design the network architecture such that each **output neuron's receptive field** would cover approximately **200 pixels in height** and **20–40 pixels in width**.

- In principle, **one such receptive field region should be enough** to predict the period length, provided it is of high quality.

- The position of this high-quality portion within the image is irrelevant, so **we will not be using positional encoding**.

# Neural Network Architecture
------------------------------------


## Convolutional Encoder

This is the network I used:

```python
        self.seq = torch.nn.Sequential(
            torch.nn.Conv2d(1, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.ReLU(),
            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.MaxPool2d((2, 2), stride=(2, 2)),

            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.ReLU(),
            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.MaxPool2d((2, 1), stride=(2, 1)),

            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.ReLU(),
            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.MaxPool2d((2, 1), stride=(2, 1)),

            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.ReLU(),
            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.MaxPool2d((2, 1), stride=(2, 1)),

            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.MaxPool2d((2, 1), stride=(2, 1)),

            torch.nn.Conv2d(32, 32, (3, 3), stride=(1, 1), padding=(0, 0)),
            torch.nn.MaxPool2d((2, 1), stride=(2, 1)),

            torch.nn.Dropout(0.1)
        )
```

## Task

1. Calculate the receptive field size and stride for this network.
2. Calculate the size of output grid for the input image sized 476x476.


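The first dimension (height). Each cell gives the size entering the layer named in the column header, and the rows are filled in from right to left, starting from an output of 1 and 2 neurons respectively. The first cell of the first row is therefore the receptive field size, and the difference between the first cells of the two rows is the stride: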

| Conv (3×3) | Conv (3×3) | Pool (2×2) | Conv (3×3) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Pool (2×1) | Output |
|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:------:|
|  220 |   218  |   216  |  108 |  106    |  104  |   52  |  50 |  48 |  24  |    22  |    20    |    10  |    8   |4  |   2  |   **1** |
|  284 |   282  |   280  |  140 |  138   |  136  |   68  |  66 |  64 |  32  |    30  |    28    |    14  |    12   |6  |   4  |   **2** |

So the receptive field size in this direction is 220 with stride of 64.

The second dimension (width), in which the (2×1) pooling layers leave the size unchanged:

| Conv (3×3) | Conv (3×3) | Pool (2×2) | Conv (3×3) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Pool (2×1) | Conv (3×3) | Pool (2×1) | Output |
|:---------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|:------:|
|  38 |   36  |   34  |  17 |  15    |  13  |   13 |  11 |  9 |  9  |    7  |    5    |    5  |    3   |3  |   1  |   **1** |
|  40 |   38  |   36  |  18 |  16   |  14  |   14  |  12 |  10 |  10  |    8  |  6  |    6  |    4  | 4 |   2  |   **2** |

So the receptive field size in this direction is 38 with stride of 2.
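The same numbers can be obtained without filling in the tables by hand. The sketch below uses the standard recurrence for layers without padding: each layer enlarges the receptive field by `(kernel - 1)` times the current stride, and then multiplies the stride by its own. The helper name and the per-axis layer lists are mine; they simply transcribe the encoder above one axis at a time.

```python
def receptive_field(layers):
    """layers: (kernel, stride) pairs along one axis, listed from input to output."""
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring neurons
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf, jump


CONV = (3, 1)
POOL = (2, 2)
NO_POOL = (1, 1)  # a (2, 1) pooling layer does nothing along the width

height_layers = [CONV, CONV, POOL] * 4 + [CONV, POOL] * 2
width_layers = [CONV, CONV, POOL] + [CONV, CONV, NO_POOL] * 3 + [CONV, NO_POOL] * 2

print(receptive_field(height_layers))  # (220, 64)
print(receptive_field(width_layers))   # (38, 2)
```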

To compute the **output grid size**, we consider how many receptive fields can be slid across the input image of size $476 \times 476$ pixels, given the final receptive field sizes and strides in each direction. The output resolution is calculated as follows:

- **Vertical (height) direction**:

$$
\left\lfloor \frac{476 - 220}{64} \right\rfloor + 1 = \left\lfloor \frac{256}{64} \right\rfloor + 1 = 5
$$

- **Horizontal (width) direction**:

$$
\left\lfloor \frac{476 - 38}{2} \right\rfloor + 1 = \left\lfloor \frac{438}{2} \right\rfloor + 1 = 220
$$

Thus, the output of the convolutional feature extractor is a **$5 \times 220$ grid** of feature vectors, where each vector encodes information from a local receptive field of the input image.

It is also important to note that we had some flexibility in selecting the **receptive field sizes**, **strides**, and the **input image resolution**. These parameters were carefully tuned to ensure **perfect alignment** between the receptive field grid and the image dimensions. Specifically, the following computations:

$$
\frac{476 - 220}{64} + 1 = 5 \quad \text{and} \quad \frac{476 - 38}{2} + 1 = 220
$$

yield integer values **without invoking the floor operation**. This ensures that the output feature grid **fully and uniformly** covers the input image, with no partial or misaligned receptive fields.
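As a cross-check, the output grid can also be computed by pushing the input size through the layers one by one, using the usual no-padding rule `out = (in - kernel) // stride + 1`. The helper name and the layer lists are illustrative; they transcribe the encoder above along each axis.

```python
def output_size(size, layers):
    """Propagate a size along one axis through (kernel, stride) layers, no padding."""
    for kernel, stride in layers:
        size = (size - kernel) // stride + 1
    return size


CONV, POOL, NO_POOL = (3, 1), (2, 2), (1, 1)
height_layers = [CONV, CONV, POOL] * 4 + [CONV, POOL] * 2
width_layers = [CONV, CONV, POOL] + [CONV, CONV, NO_POOL] * 3 + [CONV, NO_POOL] * 2

print(output_size(476, height_layers), output_size(476, width_layers))  # 5 220
print(output_size(477, height_layers))  # 5
```

The second line shows what misalignment looks like: a 477-pixel input yields the same 5 rows, so the extra pixel row is silently ignored by the network.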


## Attention Module



In Class 12, we embedded a sequence of tokens into a multidimensional space, resulting in a structured sequence of embeddings.

Today, we shift our focus from textual sequences to images. While the data modality has changed, the underlying principles remain similar: we will again construct a sequence of embeddings — this time derived from spatial regions of the image — and use attention to process them.

It is important to emphasize that we will be using a **regular attention mechanism with learned energy scores**, not self-attention.

To do this, I use the Convolutional Neural Network shown earlier, which has:
- 1 input channel,
- an input grid of 476x476 pixels,
- and an output feature map arranged as a 5x220 grid with 32 channels.

This architectural choice yields **1100 distinct positions** (5x220), each represented by a 32-dimensional feature vector.

The resulting output can be interpreted as a sequence of **1100 embeddings**, each residing in a **32-dimensional feature space**. This representation is fully compatible with the **attention mechanism**.

It is important to emphasize that attention was not incorporated as an experimental flourish or to increase model complexity arbitrarily. Rather, it was a **deliberate architectural decision** motivated by the structure of the data. Specifically, we hypothesized that the attention mechanism would enable the model to **identify and downweight low-quality regions** of the input image, allowing subsequent components of the network to **focus more effectively on high-quality, informative areas**. In this way, attention serves as a dynamic filter, helping the model to cope with the heterogeneity in image quality across the dataset.

In principle, the attention mechanism can assist the network in selecting the **single most informative output neuron** — the one whose receptive field best captures high-quality structural information.
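The pooling step can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes the energy of each embedding is its dot product with a single query vector, which would be a trainable parameter in the real network but is set by hand here. The toy embeddings are two-dimensional, with the first coordinate playing the role of a period estimate and the second the role of a quality score.

```python
import math


def attention_pool(embeddings, query):
    """Softmax-weighted average of embeddings scored against one query vector."""
    energies = [sum(e * q for e, q in zip(emb, query)) for emb in embeddings]
    top = max(energies)  # subtract the maximum for numerical stability
    exps = [math.exp(v - top) for v in energies]
    total = sum(exps)
    weights = [v / total for v in exps]
    dim = len(embeddings[0])
    pooled = [sum(w * emb[d] for w, emb in zip(weights, embeddings))
              for d in range(dim)]
    return pooled, weights


# Hypothetical embeddings: (period estimate, quality score).
embeddings = [[20.0, 0.0], [35.0, 3.0], [50.0, 0.1]]
query = [0.0, 5.0]  # attends to the quality coordinate only
pooled, weights = attention_pool(embeddings, query)
print([round(w, 3) for w in weights])
```

With a sharp enough query, nearly all of the weight lands on the single high-quality embedding, so the pooled vector is close to that embedding.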

To be fully transparent, I implemented a **two-head attention design**, which was not covered in class. In retrospect, this may have been a misstep. The second attention head was added **without a clearly defined architectural role or theoretical justification**, motivated more by curiosity than by design: a *let's see what happens* experiment. As a result, the module included **two attention queries** instead of the single-query setup discussed and justified in our course.



## Linear Regressor

The **linear regressor** following the attention module operates on a **weighted average of the embeddings**, where the weights are derived from the attention mechanism. This ensures that embeddings corresponding to high-quality receptive fields contribute more significantly to the final representation. In the case of **single-head attention**, this results in a single **32-dimensional feature vector**. In contrast, when using **two attention heads**, the outputs from each head (each 32-dimensional) are concatenated to form a **64-dimensional vector**.

This aggregated feature vector is then passed through a simple **two-layer fully connected network**. The first linear layer projects the input to an **8-dimensional hidden representation**, followed by a second linear layer that maps it to a **scalar output**, representing the model's **final prediction of the period**.

This setup effectively reduces a spatially distributed and variable-quality image into a single, informative scalar — while allowing the network to focus adaptively on the most trustworthy regions of the input.
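The flow of dimensions through the regressor (2 × 32 → 64 → 8 → 1) can be traced with a small sketch. The weights below are random placeholders rather than trained values, and the ReLU between the two linear layers is an assumption, since the description above does not specify an activation.

```python
import random


def linear(x, weights, bias):
    """Apply a fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def regress_period(head_outputs, w1, b1, w2, b2):
    features = [v for head in head_outputs for v in head]  # concatenate the heads
    hidden = [max(0.0, h) for h in linear(features, w1, b1)]  # ReLU (assumption)
    return linear(hidden, w2, b2)[0]


rng = random.Random(0)
heads = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(2)]  # two pooled vectors
w1 = [[rng.gauss(0, 0.1) for _ in range(64)] for _ in range(8)]   # 64 -> 8
b1 = [0.0] * 8
w2 = [[rng.gauss(0, 0.1) for _ in range(8)]]                      # 8 -> 1
b2 = [0.0]
prediction = regress_period(heads, w1, b1, w2, b2)
print(type(prediction).__name__)  # float
```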


# Training Process
-----------------------
***Unfortunately, that setup would not train.***

The issue, I suspected, lay in the fact that we were asking the neural network to learn **multiple objectives simultaneously**, without adequate architectural disentanglement. Specifically, the network was implicitly required to:

1. Predict the **period length** for each receptive field,
2. Estimate the **local image quality** for each receptive field,
3. Encode both the period prediction and quality signal into a **shared embedding**, and
4. Enable the attention mechanism to select **high-quality embeddings** to inform the final prediction.

This entanglement of responsibilities likely overwhelmed the learning dynamics, especially early in training. With no explicit supervision on the quality assessment task, the network had no clear gradient signal guiding it to separate informative from noisy regions. As a result, both the attention weights and the period predictions remained uninformative, and the model failed to converge.

Our design choice was to introduce attention so it can assess local image quality. Observe that in such settings, we need the attention mechanism to synchronize with the embeddings. The guiding force — the gradient — must work simultaneously in two directions:

- It must drive the embeddings to encode increasingly more information about local quality.
- It must drive the attention mechanism to focus more and more on the quality component.

**If one of the components fails to synchronize with the other, it may drive the system toward uselessness and lead to convergence at a suboptimal local minimum.**

To solve this, I adopted a **progressive model growth** training strategy: training was performed in stages. A similar approach — though much simpler, involving only two stages — is suggested in the Class 13 homework assignment.







## Training Period Estimator Stage

**The aim of this training stage was to initiate the network**, focusing primarily on training the Convolutional Encoder using images significantly simpler than actual grana.

Given the uniform quality of the input images, **the Attention Component remained mostly inactive** and transparent throughout this stage.

The training images were generated artificially. The period width was sampled uniformly between 20 and 140 pixels and the light fraction uniformly between 15 and 45 percent; a 476 by 476 image was then filled with alternating light and dark stripes, with the light ones occupying the sampled fraction of each period. The intensity of each pixel in the dark stripes was sampled uniformly between 0 and 100, and the intensity of each pixel in the light stripes uniformly between 156 and 255. The epoch length was the same as for the real dataset.
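A generator along these lines might look as follows. It is a sketch under several assumptions that the description leaves open: the stripes run horizontally, the period is real-valued, intensities are integers, and the pattern always starts with a light stripe at the top row. The function name is hypothetical.

```python
import random


def make_stripe_image(size=476, rng=None):
    """Return (image, period, light_fraction); image is a list of pixel rows."""
    rng = rng or random.Random()
    period = rng.uniform(20.0, 140.0)
    light_fraction = rng.uniform(0.15, 0.45)
    image = []
    for row in range(size):
        # The light stripe occupies the first part of every period (assumption).
        is_light = (row % period) < light_fraction * period
        lo, hi = (156, 255) if is_light else (0, 100)
        image.append([rng.randint(lo, hi) for _ in range(size)])
    return image, period, light_fraction


image, period, light_fraction = make_stripe_image(rng=random.Random(1))
print(len(image), len(image[0]))  # 476 476
```

The sampled `period` serves directly as the regression target, so no manual annotation is needed at this stage.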

Training was stopped after approx. 200 epochs, once the validation prediction quality for this stage was satisfactory.

Training progress (including the validation absolute error of the period) was monitored out of the box with TensorBoard (note that the x-axis is indexed by batch, not by epoch):
<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/val_period_monitoring/c67ea31.png?raw=1"
       alt="granum image" width="1000">

</div>


## Visualising Attention
----------------------
Before we proceed with the description of the next training stages, let me make a short detour.

I would like to introduce a **visualization technique for the attention mechanism** that can be applied in the case of a **regular attention mechanism with learned energy scores** in an image-based model.

Since each output neuron corresponds to a specific location in the input image — determined by its receptive field — the **attention weight assigned to that neuron's embedding** can be **superimposed onto the input image** at the corresponding spatial location.

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/attention_visualisation.png?raw=1"
       alt="granum image" width="600">

</div>
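The mapping from an output grid position to its input region follows from the receptive field sizes (220 × 38) and strides (64 and 2) computed earlier. The sketch below is an illustration with hypothetical function names; to stay fast in pure Python, it paints only the `top_k` largest weights onto the heat map.

```python
RF_HEIGHT, RF_WIDTH = 220, 38
STRIDE_Y, STRIDE_X = 64, 2


def receptive_field_box(row, col):
    """Return (top, left, bottom, right) in input pixels; bottom/right exclusive."""
    top, left = row * STRIDE_Y, col * STRIDE_X
    return top, left, top + RF_HEIGHT, left + RF_WIDTH


def attention_heatmap(weights, size=476, top_k=10):
    """Accumulate the top_k attention weights over their receptive fields."""
    heat = [[0.0] * size for _ in range(size)]
    cells = [(w, r, c) for r, row in enumerate(weights) for c, w in enumerate(row)]
    for w, r, c in sorted(cells, reverse=True)[:top_k]:
        top, left, bottom, right = receptive_field_box(r, c)
        for y in range(top, bottom):
            for x in range(left, right):
                heat[y][x] += w
    return heat


print(receptive_field_box(0, 0))    # (0, 0, 220, 38)
print(receptive_field_box(4, 219))  # (256, 438, 476, 476)
```

The last grid position ends exactly at pixel 476 in both directions, which is the perfect alignment discussed earlier. The resulting heat map can then be blended with the input image, using one colour channel per attention head.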


This allows us to visualise attention in this training stage. As noted above, it was mostly inactive. There are two heads, visualised with red and green hues. Let's see:

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage1/image-96ecb5b-193.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage1/image-96ecb5b-209.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage1/image-96ecb5b-210.png?raw=1"
       alt="granum image" width="1000">

</div>




## Attention Training

The objective of this training stage was to initiate the Attention Component, which, while present, remained effectively unused during the previous stage.

Recall that for the attention to be usable, the gradient must work simultaneously in two directions:

- It must drive the embeddings to encode increasingly more information about local quality.
- It must drive the attention mechanism to focus more and more on the quality component.

The training images were generated artificially, as in the previous stage, with the added inclusion of normal noise. This noise was zero-centered, and its SD varied across input images, with the parameter uniformly sampled between 0.0 and 100.0 prior to image generation. To train the attention mechanism, between 1 and 4 rectangles (matching the receptive field size and position, potentially overlapping) were excluded from the noising process and retained unaltered information about the period length. This approach enabled the network to learn how to use the attention mechanism to select the most informative parts of the image.
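The noising step can be sketched as below. The assumptions here are mine: the clean rectangles are placed on the receptive field grid (size 220 × 38, strides 64 and 2, a 5 × 220 grid), positions are drawn uniformly, and the noisy values are not clipped to the valid intensity range. The function name is hypothetical.

```python
import random

RF_HEIGHT, RF_WIDTH = 220, 38
STRIDE_Y, STRIDE_X = 64, 2
GRID_ROWS, GRID_COLS = 5, 220


def add_noise_with_clean_patches(image, rng):
    """Add zero-mean Gaussian noise everywhere except in 1 to 4 clean rectangles."""
    sd = rng.uniform(0.0, 100.0)  # one noise level per image
    patches = []
    for _ in range(rng.randint(1, 4)):
        top = rng.randrange(GRID_ROWS) * STRIDE_Y
        left = rng.randrange(GRID_COLS) * STRIDE_X
        patches.append((top, left))

    def is_clean(y, x):
        return any(t <= y < t + RF_HEIGHT and l <= x < l + RF_WIDTH
                   for t, l in patches)

    noisy = [
        [v if is_clean(y, x) else v + rng.gauss(0.0, sd) for x, v in enumerate(row)]
        for y, row in enumerate(image)
    ]
    return noisy, patches


flat = [[128.0] * 476 for _ in range(476)]  # placeholder for a stripe image
noisy, patches = add_noise_with_clean_patches(flat, random.Random(0))
print(len(patches))
```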

The validation error improved steadily during training. We stopped after approx. 460 epochs, once the validation prediction quality for this stage was satisfactory.

Training progress (including the validation absolute error of the period) was monitored out of the box with TensorBoard (note that the x-axis is indexed by batch, not by epoch):
<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/val_period_monitoring/5c683e6.png?raw=1"
       alt="granum image" width="1000">

</div>


Recall that there are two attention heads, visualised with red and green hues. Let's see:

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage2/image-afe7697-225.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage2/image-afe7697-232.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage2/image-afe7697-236.png?raw=1"
       alt="granum image" width="1000">

</div>

Clearly, the embeddings learned to encode quality information, while the red attention head simultaneously learned to attend to this information within the embeddings.

The green attention head appears unspecialized, attending to embedding positions in a seemingly random manner.


**It was clearly going in the right direction. However, it was still too early to train on the real grana images - the training wouldn't converge.**




## Artificial Data with Real Masks

The objective of this training stage was to gradually increase the complexity of the data, approaching the difficulty of real grana. To achieve this, normal noise was added to the images generated as in the previous stage, with the noise's SD now uniformly sampled between 0.0 and 20.0, including the regions initially excluded from the noise to train the attention mechanism. Additionally, real masks were subsampled from the training set of real grana and applied to the artificially generated training images, with the area outside the mask set to zero. A similar procedure was applied to the artificial validation set using masks from the real validation set.

Finally, the training images were further subsampled, symmetrically transformed, tilted, and had random amounts of noise, resembling microscopic noise, added. These steps were implemented as part of the image augmentation process.

We stopped the training after approx. 550 epochs, once the validation prediction quality for this stage was satisfactory.

Training progress (including the validation absolute error of the period) was monitored out of the box with TensorBoard (note that the x-axis is indexed by batch, not by epoch):
<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/val_period_monitoring/d4bbc1e.png?raw=1"
       alt="granum image" width="1000">

</div>

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage3/image-252cc8f-187.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage3/image-252cc8f-194.png?raw=1"
       alt="granum image" width="1000">
  
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage3/image-252cc8f-217.png?raw=1"
       alt="granum image" width="1000">
  
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage3/image-252cc8f-232.png?raw=1"
       alt="granum image" width="1000">

</div>

Clearly, the embeddings retained the ability to encode quality-related information, while the red attention head consistently attends to this information within the embeddings.

The behavior of the green attention head, however, appears more complex and is not as easily explained. It may lack a clear specialization or be capturing a different, less interpretable aspect of the input.



## True Grana Training

In this stage the network was trained on the real grana from the training set and validated on the real grana from the validation set.

To enhance the diversity of the training data, the input images were subsampled, symmetrically transformed, tilted, and subjected to randomly generated noise resembling microscopic noise, as part of the image augmentation procedure.
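A minimal sketch of two of these augmentations is shown below, under two assumptions that the text does not confirm: "subsampled" is read as taking a random crop, and "symmetrically transformed" as random horizontal and vertical flips. Tilting and microscope-like noise are omitted, and all function names are hypothetical.

```python
import random

def random_crop(image, height, width, rng):
    """Cut a random height x width window out of a 2D list."""
    top = rng.randint(0, len(image) - height)
    left = rng.randint(0, len(image[0]) - width)
    return [row[left:left + width] for row in image[top:top + height]]

def random_flip(image, rng):
    """Flip vertically and/or horizontally, each with probability 0.5."""
    if rng.random() < 0.5:
        image = image[::-1]                    # vertical flip
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]   # horizontal flip
    return image

def augment(image, crop_size, rng):
    """Random square crop followed by random flips."""
    return random_flip(random_crop(image, crop_size, crop_size, rng), rng)

rng = random.Random(42)
image = [[10 * r + c for c in range(6)] for r in range(5)]  # toy 5 x 6 image
patch = augment(image, 3, rng)
```

Neither a crop nor a flip changes the spacing between layers, so the period label of the original image remains valid for the augmented patch.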

We stopped the training once the validation prediction quality for this stage was satisfactory, which happened after approx. 430 epochs.

Training progress (including the validation period absolute error) was monitored out of the box with TensorBoard. Note that the x-axis is indexed by batch, not by epoch:
<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/val_period_monitoring/ac7356a.png?raw=1"
       alt="granum image" width="1000">

</div>

Unfortunately, I do not have attention visualisations from this stage of training that can be readily presented.


## Loss Functions


To further complicate matters, we required not only a prediction of the actual period size, but also an estimate of the neural network's confidence in that prediction.

Another detour is due.

### Mean Square Error

Mean Square Error (MSE) loss is a very typical choice of a loss function. It models the mean squared distance between the ground truth values $y_i$
and the predictions $\hat y_i$, $i=1, \ldots, n$.

$$
\text{MSE}(y, \hat y) = \frac{\sum_{i=1}^n(y_i - \hat y_i)^2}{n}
$$

The MSE loss can also be derived from the negative log likelihood of the data $y_i$, $i=1, \ldots, n$, under the assumption that each $y_i$ comes from a normal distribution centered at $\hat y_i$, with a covariance matrix proportional to the identity matrix (i.e. all components have the same SD, say 1.0).
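For completeness, here is a short sketch of that derivation, assuming independent observations and a fixed common SD, denoted here by $\sigma$ (a symbol introduced only for this derivation):

$$
\begin{aligned}
-\log \prod_{i=1}^n \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(y_i - \hat y_i)^2}{2\sigma^2}\right)
&= n \log\left(\sqrt{2\pi}\,\sigma\right) + \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \hat y_i)^2 \\
&= \text{const} + \frac{n}{2\sigma^2}\,\text{MSE}(y, \hat y)
\end{aligned}
$$

Since $\sigma$ is fixed, the first term does not depend on $\hat y$, and minimizing the negative log likelihood is equivalent to minimizing the MSE.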

### Gaussian Negative Log Likelihood

The negative log likelihood loss model allows us to go one step further: not only to predict the values (with the use of means of our distribution, i.e. $\hat y_i$), but also to predict *how certain we are that we are correct* with the use of $\hat{\text{sd}}_i$, the prediction of the SD.  

Namely, if we assume that $y_i$ comes from the normal distribution centered at $\hat y_i$ with the standard deviation of $\hat{\text{sd}}_i$ (which may differ from case to case), the resulting negative log likelihood loss is the Gaussian Negative Log Likelihood (GNLL) loss:

$$
\text{GNLL}(y, \hat y, \hat{\text{sd}}) = \frac{1}{n}  \sum_{i=1}^n \left ( \log(\hat{\text{sd}}_i) + \frac{(y_i - \hat y_i)^2}{2 \hat{\text{sd}}_i^2} \right )
$$
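To see how this loss rewards honest uncertainty, here is a small plain-Python sketch. It follows the Gaussian density directly, so the squared residual is divided by $2\,\hat{\text{sd}}_i^2$ and the constant $\frac{1}{2}\log(2\pi)$ is dropped. The function names and the toy numbers are made up for illustration.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)

def gnll(y_true, y_pred, sd_pred):
    """Mean Gaussian negative log likelihood, without the constant 0.5 * log(2 * pi)."""
    total = 0.0
    for y, p, sd in zip(y_true, y_pred, sd_pred):
        if sd <= 0.0:
            raise ValueError("predicted SD must be positive")
        total += math.log(sd) + (y - p) ** 2 / (2.0 * sd ** 2)
    return total / len(y_true)

y_true = [20.0, 22.0, 25.0]
y_pred = [21.0, 22.0, 23.0]   # the last prediction is off by 2.0

# With every SD fixed at 1.0 the log term vanishes and GNLL equals MSE / 2.
unit_sd = gnll(y_true, y_pred, [1.0, 1.0, 1.0])

# Admitting uncertainty on the worst prediction lowers the loss ...
honest = gnll(y_true, y_pred, [1.0, 1.0, 2.0])
# ... while claiming high confidence on that same prediction is penalized.
overconfident = gnll(y_true, y_pred, [1.0, 1.0, 0.5])

print(unit_sd, honest, overconfident)
```

In PyTorch, `torch.nn.GaussianNLLLoss` implements the same idea, but it expects the predicted variance rather than the SD. Whether the project used that class or a custom implementation is not stated here.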



### This Idea Seems Intriguing

If this idea seems interesting to you, you can [read more about the GNLL loss here](https://johaupt.github.io/blog/NN_prediction_uncertainty.html).

### Interpolating Loss

To train the network predicting the mean period value and its SD, we used the GNLL loss function, with one exception: during a transition period covering the first 30 epochs, we used the following interpolating loss function (let $i=0, \ldots, 29$ be the epoch number):

$$
\begin{aligned}
L_T(i, h_1, h_2, y, \hat y, \hat{\text{sd}}) =\ &\cos\left(\frac{i}{30}\frac{\pi}{2}\right) \;\text{MSE}(y, \hat y) + \\
&\sin\left(\frac{i}{30}\frac{\pi}{2}\right) \;\text{GNLL}(y, \hat y, \hat{\text{sd}})
\end{aligned}
$$
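The schedule above can be sketched as a pair of weights that depend only on the epoch number. The helper names below are hypothetical, and the MSE and GNLL values are assumed to be computed elsewhere.

```python
import math

TRANSITION_EPOCHS = 30

def transition_weights(epoch):
    """Return (MSE weight, GNLL weight) for a 0-based epoch number."""
    if epoch >= TRANSITION_EPOCHS:
        return 0.0, 1.0  # after the transition only GNLL is used
    angle = (epoch / TRANSITION_EPOCHS) * (math.pi / 2.0)
    return math.cos(angle), math.sin(angle)

def transition_loss(epoch, mse_value, gnll_value):
    """Blend already-computed MSE and GNLL values according to the schedule."""
    w_mse, w_gnll = transition_weights(epoch)
    return w_mse * mse_value + w_gnll * gnll_value

for epoch in (0, 15, 29, 30):
    w_mse, w_gnll = transition_weights(epoch)
    print(f"epoch {epoch:2d}: MSE weight {w_mse:.3f}, GNLL weight {w_gnll:.3f}")
```

Note that the two weights satisfy $\cos^2 + \sin^2 = 1$ rather than summing to one, so halfway through the transition both losses contribute with a weight of about 0.707.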


## True Grana Training with Confidence Prediction

In this stage the network was trained on the real grana from the training set and validated on the real grana from the validation set.

To build the network for this stage from the previous one, a second copy of the 2-head attention component and the linear regressor was initially cloned from the first and subsequently trained independently. The transition loss $L_T$, shifting from the MSE-based loss to the GNLL-based loss, was used over the first 30 epochs, after which only the GNLL loss was applied.

As before, to improve the diversity of the training data, the images were subsampled, symmetrically transformed, tilted, and infused with randomly generated noise mimicking microscopic noise, all as part of the image augmentation strategy.

We stopped the training once the validation prediction quality for this stage was satisfactory and no longer improving, which happened after 1112 epochs.

The monitoring of training progress — including validation period absolute error, as well as the prediction of standard deviation (not shown here) — was handled out of the box by TensorBoard. The charts made it clear that higher predicted standard deviations corresponded to higher errors in period prediction. Recall that as before, the x-axis in the plots is indexed by batch, not by epoch.

Below is the validation period absolute error chart, regardless of the predicted SD:
<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/val_period_monitoring/dba36a6.png?raw=1"
       alt="granum image" width="1000">

</div>

And the validation period absolute error chart for the first class of the predicted SD ($\hat{\text{sd}}<1.0$):
<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/val_period_monitoring/dba36a6_class1.png?raw=1"
       alt="granum image" width="1000">

</div>
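The two charts correspond to two ways of averaging the same per-sample errors: over all validation samples, and over only those with a low predicted SD. A minimal sketch of that split follows; the function names and numbers are made up for illustration, and only the threshold 1.0 comes from the text.

```python
def mean_abs_error(y_true, y_pred):
    """Mean absolute error over all samples."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def mean_abs_error_below_sd(y_true, y_pred, sd_pred, sd_threshold=1.0):
    """MAE over samples whose predicted SD is below the threshold (None if none)."""
    kept = [(y, p) for y, p, sd in zip(y_true, y_pred, sd_pred) if sd < sd_threshold]
    if not kept:
        return None
    return sum(abs(y - p) for y, p in kept) / len(kept)

# Made-up values, for illustration only.
y_true = [20.0, 22.0, 25.0, 18.0]
y_pred = [20.5, 21.0, 22.0, 18.2]
sd_pred = [0.6, 0.9, 2.5, 0.4]

overall = mean_abs_error(y_true, y_pred)
confident = mean_abs_error_below_sd(y_true, y_pred, sd_pred)
print(overall, confident)
```

If the predicted SD is informative, the error in the confident subset should be lower than the overall error, as it is in this toy example.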

### **Validation subset**

<div align="center">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-60.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-72.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-74.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-77.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-83.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-84.png?raw=1"
       alt="granum image" width="1000">

  
</div>

### **Test subset**

<div align="center">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-31.png?raw=1"
       alt="granum image" width="1000">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/grana/training_stages/stage5/image-eff8496-36.png?raw=1"
       alt="granum image" width="1000">

</div>

We can see that both red heads from the two attention modules retain their specialization and consistently attend to high-quality fragments — though sometimes these are different regions.

At the same time, the green head does appear to attend to something, but it is harder to pinpoint exactly what that is.


## Full (Train+Val) Set True Grana Refinement Training

In this stage, the network was further trained using the real grana from both the training and validation sets combined, with no separate validation set used to monitor training progress.

We stopped the training when the network's training error, which we monitored throughout, began to increase. This occurred after approximately 600 epochs. Prior to this, we saved seven network candidates: four corresponding to local minima in the training error and three at predetermined training intervals, specifically after 0, 200, and 300 epochs. These seven candidates were then evaluated using methods that we do not have time to cover today.
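As an illustration of how such candidates could be picked, the sketch below combines fixed checkpoints with strict local minima of a per-epoch training error curve. This is an assumption about the mechanics, not the procedure actually used: the function names and the error values are hypothetical, and a real, noisy curve would likely need smoothing first.

```python
def local_minima(values):
    """Indices i where values[i] is strictly lower than both neighbours."""
    return [
        i for i in range(1, len(values) - 1)
        if values[i] < values[i - 1] and values[i] < values[i + 1]
    ]

def candidate_epochs(train_error, fixed_epochs):
    """Union of fixed checkpoints and local minima of the training error."""
    fixed = [e for e in fixed_epochs if 0 <= e < len(train_error)]
    return sorted(set(fixed) | set(local_minima(train_error)))

# Made-up per-epoch training error, for illustration only.
train_error = [5.0, 4.0, 4.5, 3.0, 3.2, 2.5, 2.9, 3.5]
print(candidate_epochs(train_error, fixed_epochs=[0, 4]))  # [0, 1, 3, 4, 5]
```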

**This concludes the summary of the training process.**

# Project *Diamonds* Description
-------------------

Throughout all kingdoms of life, intriguing structures have been identified that, for a long time, eluded scientific investigation. The primary reason is that these structures are not directly visible; instead, they are typically observed only through **surface projections** or through the **effects they produce**. One notable example is the vibrant coloration in the wings of certain butterflies — caused not by pigments, but by **light diffraction on their crystal-like internal architecture**.


<div align="center">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/structures.png?raw=1"
       alt="granum image" width="1000">

</div>

Łucja Kowalewska's group is the only research group in Poland working on these structures. They collaborate closely with mathematicians in Potsdam — **Myfanwy Evans** from the *Institut für Mathematik* — and in Perth — **Gerd Schröder-Turk** from *Murdoch University*.

In the current project, the focus will be placed on the discovery of these structures in **prolamellar bodies** (*pl. ciała prolamellarne*) found in the cells of certain plant seedlings.



<div align="center">

  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/prolamellar_body.png?raw=1"
       alt="granum image" width="1000">


</div>


Łucja Kowalewska's group has developed software called **SPIRE (Surface Projection Image Recognition Environment)**, which can generate projections of minimal surfaces corresponding to these structures. In TEM images, we observe actual projections of such surfaces, as captured by transmission electron microscopy. By manually aligning the simulated projections with the experimental ones, it becomes possible to identify key structural parameters.



<div align="center">


  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/projections.png?raw=1"
       alt="granum image" width="1000">
  

  
</div>


Multiple alignments of different cross-sections allow for confirmation of the identified structures.

<div align="center">


  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/transition_biological.png?raw=1"
       alt="granum image" width="1000">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/transition_lines.png?raw=1"
       alt="granum image" width="1000">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/transition_aligned.png?raw=1"
       alt="granum image" width="1000">
  <img src="https://github.com/SzymonNowakowski/Machine-Learning-2024/blob/master/gyroids/transition_artificial.png?raw=1"
       alt="granum image" width="1000">

  
</div>

The focus of the current project is to **replace manual alignment** with **automated parameter discovery using an artificial neural network**.  
The high-level outline of the project is as follows:

1. **Generate a diverse set of surface projections** using *SPIRE*, each with known structural parameters.
2. For each projection, **add TEM-like noise** to simulate realistic TEM images while retaining ground truth structural parameters.
3. *(Optional)* Apply **diffusion-based methods** to generate even more realistic synthetic TEM images with known parameters.  
   A large dataset of real TEM images (with unknown structure) is available to train the diffusion model.
4. **Train a neural network** to estimate structural parameters from the synthetic images generated in steps 2 or 3.
5. **Evaluate the network** on a set of **manually annotated real TEM images** to assess generalization and accuracy.

This project — or a defined subproject — is suitable for a **Master’s thesis** and may be carried out under my supervision.

## Interested? Want to Hear More?

Together with Łucja Kowalewska, we will be giving a talk on **Wednesday, June 11th at 10:15** as part of the **Computational Biology and Bioinformatics (BOB) seminar series**. The seminar will take place in **room 3250**. The biological aspects of the work will be covered in greater detail, and you are welcome to ask questions or speak with Łucja Kowalewska after the seminar.

Also, you may be interested in the following material:
- [SPIRE animated tutorial](https://chloroplast.pl/spire)
- [Research group led by Łucja Kowalewska – more about their work](https://png.biol.uw.edu.pl)
