<a href="https://colab.research.google.com/github/ttb-git/llm-examples/blob/main/qlty_demo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
import numpy as np
import plotly.express as px

In [None]:
!pip install qlty
!pip install einops

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting qlty
  Downloading qlty-0.1.1-py2.py3-none-any.whl (7.7 kB)
Installing collected packages: qlty
Successfully installed qlty-0.1.1
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting einops
  Downloading einops-0.4.1-py3-none-any.whl (28 kB)
Installing collected packages: einops
Successfully installed einops-0.4.1


In [None]:
import qlty
import einops
from qlty import qlty2D
from qlty import cleanup

Make some fake data: 10 single-channel images of shape (128,128), with integer labels drawn from 3 classes.

In [None]:
Nimages=10
Nchannels = 1
NY = 128
NX = 128

Nclasses = 3
shape = (Nimages,Nchannels,NY,NX)
fake_data = torch.normal(mean=torch.zeros(shape), std=1.0)
fake_labels = torch.randint(0,Nclasses,size=shape)


In [None]:
print("Data Shape", fake_data.shape)
print("Label Shape", fake_labels.shape)

Data Shape torch.Size([10, 1, 128, 128])
Label Shape torch.Size([10, 1, 128, 128])


Now assume that the (Nchannels,NY,NX) image is too large to handle for some reason. What we do is build an object that slices the (Nchannels,NY,NX) image into small parts along the last two axes.

The window we will carve out has a shape of (64,64), and we will use a step of (16,16). The window size and step size need to be chosen such that the whole image can be covered without padding.
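As a quick sanity check on these numbers, the sketch below counts how many window positions fit along each axis. It assumes that "covered without padding" means `(size - window)` is an exact multiple of `step`; the helper `windows_per_axis` is hypothetical and not part of qlty.

```python
def windows_per_axis(size: int, window: int, step: int) -> int:
    """Number of window positions along one axis when no padding is used."""
    if (size - window) % step != 0:
        # Assumption: the last window must end exactly at the image edge.
        raise ValueError("window and step do not tile the axis exactly")
    return (size - window) // step + 1


n_images = 10
n_y = windows_per_axis(128, 64, 16)  # positions along Y
n_x = windows_per_axis(128, 64, 16)  # positions along X
n_patches = n_images * n_y * n_x
print(n_y, n_x, n_patches)  # 5 5 250
```

With the window of (64,64) and the step of (16,16) passed to the quilter, each image yields 5 x 5 = 25 patches, so 10 images give 250 patches.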

The border parameter specifies the width, in pixels, of the edge region of each window. Pixels in this region receive a weight of border_weight when overlapping tensors are averaged.
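To make the role of the border concrete, here is a minimal NumPy sketch of a per-window weight mask. It is an illustration under the assumption that interior pixels get weight 1 and border pixels get `border_weight`; the function `border_weight_mask` is hypothetical and may not match how qlty builds its own border tensor.

```python
import numpy as np


def border_weight_mask(window, border, border_weight):
    """Weight mask for one window: border_weight at the edges, 1.0 inside."""
    wy, wx = window
    by, bx = border
    weights = np.full((wy, wx), border_weight, dtype=float)
    # Assumption: everything at least `border` pixels from the edge counts fully.
    weights[by:wy - by, bx:wx - bx] = 1.0
    return weights


w = border_weight_mask(window=(64, 64), border=(8, 8), border_weight=0.01)
print(w.shape, w[0, 0], w[32, 32])
```

Down-weighting the edges means that, where patches overlap, predictions made near the center of a patch dominate the average.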



In [None]:
quilter = qlty2D.NCYXQuilt(Y=NY,
                           X=NX,
                           window=(64,64),
                           step=(16,16),
                           border=(8,8),
                           border_weight=0.01)


Now let us split the data into small patches

In [None]:
smallData,smallLabels = quilter.unstitch_data_pair(fake_data, fake_labels)
print(smallData.shape, smallLabels.shape)

torch.Size([250, 1, 64, 64]) torch.Size([250, 1, 64, 64])
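For intuition, the unstitching step can be sketched in plain NumPy as a sliding window over the last two axes. This is not qlty's implementation; the function `unstitch` and the patch ordering (image first, then Y, then X) are assumptions made for illustration.

```python
import numpy as np


def unstitch(images, window, step):
    """Cut (N, C, Y, X) images into overlapping (C, wy, wx) patches."""
    n, c, ny, nx = images.shape
    wy, wx = window
    sy, sx = step
    patches = [
        images[i, :, y:y + wy, x:x + wx]
        for i in range(n)                      # assumed order: image,
        for y in range(0, ny - wy + 1, sy)     # then Y offset,
        for x in range(0, nx - wx + 1, sx)     # then X offset
    ]
    return np.stack(patches)


images = np.random.default_rng(0).normal(size=(10, 1, 128, 128))
patches = unstitch(images, window=(64, 64), step=(16, 16))
print(patches.shape)  # (250, 1, 64, 64)
```

The resulting shape matches the output above: 10 images times 25 window positions.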


Now that the image size has been reduced, training might be easier. Assume that we have a trained neural network that yields class probabilities for the smallData tensor, in the form of a (250, Nclasses, 64, 64) tensor containing values after a softmax.

In [None]:
fakeSmallPs = torch.normal(torch.zeros(250, Nclasses, 64, 64), 1.0)
fakeSmallPs = torch.nn.Softmax(dim=1)(fakeSmallPs)
print(fakeSmallPs.shape)

torch.Size([250, 3, 64, 64])


Now that we have the overlapping tensors, we would like to stitch them back into a recognizable image size.

In [None]:
stiched_and_averaged_tensor, contrib = quilter.stitch(fakeSmallPs)

In [None]:
print(stiched_and_averaged_tensor.shape, contrib.shape)

torch.Size([10, 3, 128, 128]) torch.Size([128, 128])
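The averaging of overlapping patches can be sketched as a weighted sum divided by the summed weights. The code below is a simplified NumPy illustration, not qlty's implementation: the function `stitch`, the patch ordering, and the interpretation of the second return value as a per-pixel sum of weights are all assumptions.

```python
import numpy as np


def stitch(patches, n_images, image_shape, step, weights):
    """Weighted average of overlapping (C, wy, wx) patches per image."""
    ny, nx = image_shape
    _, c, wy, wx = patches.shape
    sy, sx = step
    total = np.zeros((n_images, c, ny, nx))
    norm = np.zeros((ny, nx))  # summed weight per pixel (same for every image)
    k = 0
    for i in range(n_images):
        for y in range(0, ny - wy + 1, sy):
            for x in range(0, nx - wx + 1, sx):
                total[i, :, y:y + wy, x:x + wx] += patches[k] * weights
                if i == 0:
                    norm[y:y + wy, x:x + wx] += weights
                k += 1
    return total / norm, norm


# Per-window weights: 0.01 in an 8-pixel border, 1.0 in the interior.
weights = np.full((64, 64), 0.01)
weights[8:-8, 8:-8] = 1.0

# Uniform "probabilities" over 3 classes for each of the 250 patches.
patches = np.ones((250, 3, 64, 64)) / 3.0
stitched, norm = stitch(patches, 10, (128, 128), (16, 16), weights)
print(stitched.shape, norm.shape)  # (10, 3, 128, 128) (128, 128)
```

Because every pixel is normalized by its own summed weight, a constant input is reproduced exactly, even in the corners that are covered by a single low-weight border pixel.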


The image below is a heatmap that indicates the number of contributors per pixel.  

In [None]:
px.imshow(contrib).show()

In the case of sparsely labeled images, some patches can end up with no usable data, and there is no need to keep these patches around in a training scenario. qlty.cleanup has some tools to remove them.

In [None]:
sparse_smallLabels = torch.clone(smallLabels)
# the first 10 patches have no usable labels
sparse_smallLabels[0:10,...]=-1

border_tensor = quilter.border_tensor().unsqueeze(dim=0) # ugly, a bug basically, will be fixed soon

In [None]:
help(cleanup.weed_sparse_classification_training_pairs_2D)

Help on function weed_sparse_classification_training_pairs_2D in module qlty.cleanup:

weed_sparse_classification_training_pairs_2D(tensor_in, tensor_out, missing_label, border_tensor)
    After tensors have been unstitched, we want want to be able to remove patches that have no data.
    To this extent, we inspect every patch and remove any that do not contain any data. In additon, we remove
    observations that lie in the border area. For this to work, a border_tensor must be supplied.
    
    The selection is made on the basis of the supplied 'tensor_out' data field.
    
    Parameters
    ----------
    tensor_in: input tensor
    tensor_out: output tensor
    missing_label: missing label flag (typically -1)
    border_tensor: the border tensor, obtained from the NCXYQuilt or NCZYXQuilt class
    
    Returns
    -------
    A new set of tensors that has valid training data.



In [None]:
clean_data, clean_labels = cleanup.weed_sparse_classification_training_pairs_2D(tensor_in=smallData,
                                                                                tensor_out=sparse_smallLabels,
                                                                                missing_label=-1,
                                                                                border_tensor=border_tensor)

In [None]:
print(clean_data.shape, clean_labels.shape)

torch.Size([240, 1, 64, 64]) torch.Size([240, 1, 64, 64])
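The core of the weeding step can be illustrated with a boolean mask over patches. This sketch deliberately ignores the border tensor that the qlty function takes into account; the function `weed_empty_patches` is hypothetical and only keeps patches with at least one label that differs from the missing-label flag.

```python
import numpy as np


def weed_empty_patches(data, labels, missing_label=-1):
    """Drop patches whose labels are all equal to missing_label."""
    # One boolean per patch: True if any pixel carries a usable label.
    keep = (labels != missing_label).reshape(labels.shape[0], -1).any(axis=1)
    return data[keep], labels[keep]


rng = np.random.default_rng(0)
data = rng.normal(size=(250, 1, 64, 64))
labels = rng.integers(0, 3, size=(250, 1, 64, 64))
labels[0:10] = -1  # the first 10 patches have no usable labels

kept_data, kept_labels = weed_empty_patches(data, labels)
print(kept_data.shape, kept_labels.shape)  # (240, 1, 64, 64) (240, 1, 64, 64)
```

As in the output above, the 10 fully unlabeled patches are removed and 240 remain.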


These tools are extended to the 3D case as well - this will be released soon.