# Lloyd-Max quantization

* Minimizes the [MSE](https://en.wikipedia.org/wiki/Mean_squared_error) of the quantization error, i.e., the expected power of the quantization error:
\begin{equation}
 D = \text{E}[(\mathbf{x}-\mathbf{y})^2],
\end{equation}
where $D$ is the distortion generated by the quantizer, $\mathbf{x}$ is the original signal, and $\mathbf{y}$ is the reconstructed signal.
* The PDF (in the analog case) or the histogram (digital signals) is required. The density of quantization bins is higher in those parts of the input dynamic range where the probability of the samples is also higher.
* The quantizer must determine both the decision levels and the representation levels.
* Inside a bin (quantization step), the PDF/histogram is assumed to be constant. *For this reason, the representation level of each bin is placed at its midpoint.*
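
As a minimal sketch of the distortion defined above, the following plain-Python snippet quantizes a few samples with a uniform (non-adaptive) quantizer and estimates the MSE. The helper names `uniform_quantize` and `mse`, the step size, and the sample values are illustrative assumptions, not part of this notebook.

```python
def uniform_quantize(x, step):
    """Map each sample to the midpoint of its bin of width `step` (hypothetical helper)."""
    return [(int(v // step) + 0.5) * step for v in x]

def mse(x, y):
    """Empirical estimate of D = E[(x - y)^2]."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

x = [0, 1, 2, 3, 4, 5, 6, 7]      # assumed toy signal
y = uniform_quantize(x, step=4)   # two bins: [0, 4) and [4, 8)
D = mse(x, y)
print(y, D)
```

Here every sample in $[0, 4)$ is reconstructed as 2 and every sample in $[4, 8)$ as 6, so the squared errors add up to 12 over 8 samples and $D = 1.5$.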


## Adaptive quantization using the PDF
In the continuous case, if $M$ is the number of bins, the distortion can be expressed by
\begin{equation}
D = \sum_{i=1}^{M}\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}(\mathbf{x}-\mathbf{c}_i)^2P(x)dx,
\end{equation}
where $\mathbf{b}_i$ is the upper decision level of the $i$-th bin (so $\mathbf{b}_{i-1}$ is its lower decision level), $\mathbf{c}_i$ is the representation level for the $i$-th bin, and $P(x)=f_\mathbf{x}(x)$ is the probability density at $x$ of the signal $\mathbf{x}$ (considered as a random variable).
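
For digital signals the integral becomes a sum over the histogram. The sketch below evaluates that discrete counterpart of $D$. The function name `distortion`, the 0-based half-open bins `[b[i], b[i+1])`, and the toy histogram are assumptions made for illustration.

```python
def distortion(P, b, c):
    """Discrete D = sum_i sum_{b[i] <= k < b[i+1]} (k - c[i])^2 * P[k] / sum(P).

    P: histogram counts indexed by sample value.
    b: M + 1 integer decision levels; c: M representation levels.
    (Hypothetical helper, 0-based bins.)
    """
    total = sum(P)
    D = 0.0
    for i in range(len(c)):
        for k in range(b[i], b[i + 1]):
            D += (k - c[i]) ** 2 * P[k] / total
    return D

P = [1, 1, 1, 1, 1, 1, 1, 1]                       # assumed flat histogram, values 0..7
D_mid = distortion(P, b=[0, 4, 8], c=[1.5, 5.5])   # levels at the bin centroids
D_off = distortion(P, b=[0, 4, 8], c=[2.0, 6.0])   # levels shifted by 0.5
print(D_mid, D_off)
```

With the same decision levels, placing the representation levels at the centroids (1.5 and 5.5) gives a lower distortion than the shifted levels.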

To minimize $D$ with respect to the representation levels we must solve
\begin{equation}
\frac{\partial D}{\partial \mathbf{c}_i} = 0 = \frac{\partial}{\partial \mathbf{c}_i}\sum_{j=1}^M\int_{\mathbf{b}_{j-1}}^{\mathbf{b}_j}(\mathbf{x}-\mathbf{c}_j)^2P(x)dx
\end{equation}
which boils down to
\begin{equation}
= -\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}2(\mathbf{x}-\mathbf{c}_i)P(x)dx
\end{equation}
because $\mathbf{c}_i$ is only used in one of the bins. We continue
\begin{equation}
= -2\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}\mathbf{x}P(x)dx + 2\mathbf{c}_i\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}P(x)dx.
\end{equation}
Therefore:
\begin{equation}
\mathbf{c}_i = \frac{\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}\mathbf{x}P(x)dx}{\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}P(x)dx},\tag{1}
\end{equation}
i.e., the representation level $\mathbf{c}_i$ for each bin is the centroid of the probability mass in that bin. Notice that, in order to avoid a division by 0, each bin must have non-zero probability mass (at least one sample must belong to each bin).
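
As a quick check of Eq. (1), assume (only for illustration) that $P(x)$ takes a constant value $p>0$ inside the $i$-th bin. Then
\begin{equation}
\mathbf{c}_i = \frac{p\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}\mathbf{x}dx}{p\int_{\mathbf{b}_{i-1}}^{\mathbf{b}_i}dx} = \frac{(\mathbf{b}_i^2-\mathbf{b}_{i-1}^2)/2}{\mathbf{b}_i-\mathbf{b}_{i-1}} = \frac{\mathbf{b}_{i-1}+\mathbf{b}_i}{2},
\end{equation}
i.e., for a locally flat PDF the centroid coincides with the midpoint of the bin.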

Unfortunately, this equation shows that, to find the representation levels $\mathbf{c}_i$, we must first determine the decision levels $\mathbf{b}_i$. To compute them, we can now minimize $D$ with respect to $\mathbf{b}_i$:
\begin{equation}
\frac{\partial D}{\partial \mathbf{b}_i} = 0,
\end{equation}
which, assuming that the bins are small enough for the probability density of $\mathbf{x}$ to be considered constant inside each bin, leads to:
\begin{equation}
\mathbf{b}_i = \frac{\mathbf{c}_i+\mathbf{c}_{i+1}}{2},\tag{2}
\end{equation}
a quite logical result under that assumption.
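
A sketch of the intermediate step, assuming that $P$ is continuous with $P(\mathbf{b}_i)>0$ and holding the representation levels fixed: only the bins $i$ and $i+1$ depend on $\mathbf{b}_i$ (as an upper and as a lower integration limit, respectively), so
\begin{equation}
\frac{\partial D}{\partial \mathbf{b}_i} = (\mathbf{b}_i-\mathbf{c}_i)^2P(\mathbf{b}_i) - (\mathbf{b}_i-\mathbf{c}_{i+1})^2P(\mathbf{b}_i) = 0.
\end{equation}
Dividing by $P(\mathbf{b}_i)$ and using $\mathbf{c}_i\le\mathbf{b}_i\le\mathbf{c}_{i+1}$ gives $\mathbf{b}_i-\mathbf{c}_i=\mathbf{c}_{i+1}-\mathbf{b}_i$, which is Eq. (2).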

## Computation of the representation levels

Unfortunately, Equations (1) and (2) are mutually dependent. However, they can be used to compute $\{\mathbf{c}_k\}_{k=1}^M$ and $\{\mathbf{b}_k\}_{k=0}^M$ using the following iterative algorithm:

0. Define $\epsilon>0$.
1. Initialize $\mathbf{c}_k$ /* centroids */ at random (in increasing order).
2. Let $\mathbf{previous\_b}=\{\mathbf{previous\_b}_k\}_{k=0}^M=0$ /* boundaries */ and compute the initial $\mathbf{b}$ from $\mathbf{c}$ using Eq. (2).
3. While $\max(|\mathbf{previous\_b}-\mathbf{b}|) > \epsilon$:
    1. $\mathbf{previous\_b}\leftarrow \mathbf{b}$.
    2. Update the centroids (representation levels) $\mathbf{c}$ using Eq. (1).
    3. Compute the boundary (decision) levels $\mathbf{b}$ using Eq. (2).
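
One possible arrangement of these steps, written as a plain-Python sketch that works on a histogram. The function name `lloyd_max`, the uniform (instead of random) initialization, the half-open bins `b[i] <= k < b[i+1]`, and the toy histogram are assumptions, not the implementation used later in this notebook.

```python
def lloyd_max(P, M, epsilon=1e-5, max_iters=100):
    """Hypothetical helper: returns (b, c) for histogram counts P and M bins."""
    n = len(P)

    def midpoints(c):
        # Eq. (2): inner boundaries are midpoints of neighboring centroids;
        # the outer boundaries are fixed to the limits of the value range.
        return [0.0] + [0.5 * (c[i] + c[i + 1]) for i in range(M - 1)] + [float(n)]

    # Assumption: start from uniformly spaced centroids.
    c = [(i + 0.5) * n / M for i in range(M)]
    b = midpoints(c)
    for _ in range(max_iters):
        previous_b = b
        # Eq. (1): centroid of the histogram mass inside each bin.
        for i in range(M):
            ks = [k for k in range(n) if b[i] <= k < b[i + 1]]
            mass = sum(P[k] for k in ks)
            if mass > 0:  # an empty bin keeps its previous centroid
                c[i] = sum(k * P[k] for k in ks) / mass
        b = midpoints(c)
        if max(abs(p - q) for p, q in zip(previous_b, b)) <= epsilon:
            break
    return b, c

# Toy histogram: all the mass sits at the values 1, 2 and 9.
P = [0, 5, 5, 0, 0, 0, 0, 0, 0, 10]
b, c = lloyd_max(P, M=2)
print(b, c)
```

For this toy histogram the centroids settle at 1.5 and 9.0 after one update, and the inner boundary moves from 5.0 to their midpoint, 5.25.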

In [None]:
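import numpy as np
from scipy.ndimage import uniform_filter1d
# With size=2 and origin=-1, each output sample is the mean of an input sample
# and its right neighbor, i.e., the midpoint needed by Eq. (2). The last output
# only sees the reflected border, so it is discarded with [:-1].
# Note: integer inputs produce integer (truncated) outputs.
uniform_filter1d([2.0, 8, 0, 4, 1, 9, 9.0, 0], size=2, origin=-1)
uniform_filter1d([0, 128, 256], size=2, origin=-1)[:-1]
print(uniform_filter1d([64, 192], size=2, origin=-1)[:-1])
print(uniform_filter1d([64, 192], size=2, mode="nearest"))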

In [None]:
from scipy.ndimage import center_of_mass
x = np.array([1,2,3,4,5])
center_of_mass(x)

x = np.array([1,1,1,1,1])
center_of_mass(x)[0]

In [None]:
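def compute_boundaries(c):
    """Eq. (2): inner boundaries are the midpoints of neighboring centroids."""
    # c must be a float array; with integers uniform_filter1d truncates the means.
    b = uniform_filter1d(c, size=2, origin=-1)[:-1]
    # Add the outer limits of the 8-bit dynamic range.
    b = np.concatenate(([0], b, [256]))
    return b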

In [None]:
def compute_centroids(b, P, M):
    """Eq. (1): centroid of the histogram mass in each bin [b[i], b[i+1])."""
    values = np.arange(P.size)
    # First integer sample value that falls at or above each boundary.
    edges = np.ceil(b).astype(int)
    c = np.empty(M)
    for i in range(M):
        lo, hi = edges[i], edges[i + 1]
        counts = P[lo:hi]
        total_counts_in_bin = counts.sum()
        if total_counts_in_bin > 0:
            c[i] = np.sum(values[lo:hi] * counts) / total_counts_in_bin
        else:
            # Empty bin: fall back to the midpoint to avoid a division by 0.
            c[i] = 0.5 * (b[i] + b[i + 1])
    return c

In [None]:
def compute_levels(P, epsilon, max_iters, M, min_val, max_val):
    """Alternate Eqs. (2) and (1) until the boundaries move less than epsilon."""
    # Start from a uniform quantizer over [min_val, max_val + 1].
    b = np.linspace(min_val, max_val + 1, M + 1)
    c = 0.5 * (b[1:] + b[:-1])
    for j in range(max_iters):
        prev_b = b
        b = compute_boundaries(c)
        max_abs_error = np.max(np.abs(prev_b - b))
        print("iteration", j, "max_abs_error", max_abs_error)
        # The first pass only rebuilds the initial boundaries, so never stop at j == 0.
        if (j > 0) and (max_abs_error <= epsilon):
            break
        c = compute_centroids(b, P, M)
    return b, c

In [None]:
P = np.ones(256) # Counts for uniform distribution
#P = np.random.randint(low=0, high=2000, size=256)
epsilon = 1e-5
max_iters = 100
min_val = 0
max_val = 255
M = 2
compute_levels(P, epsilon, max_iters, M, min_val, max_val)

## Quantize an image

In [None]:
%%bash
if [ -d "$HOME/repos" ]; then
    echo "\"$HOME/repos\" exists"
else
    mkdir ~/repos
    echo Created $HOME/repos
fi

In [None]:
%%bash
if [ -d "$HOME/repos/image_IO" ]; then
    cd $HOME/repos/image_IO
    echo "$HOME/repos/image_IO ... "
    git pull 
else
    cd $HOME/repos
    git clone https://github.com/vicente-gonzalez-ruiz/image_IO.git
fi

In [None]:
%%bash
if [ -d "$HOME/repos/information_theory" ]; then
    cd $HOME/repos/information_theory
    echo "$HOME/repos/information_theory ... "
    git pull 
else
    cd $HOME/repos
    git clone https://github.com/vicente-gonzalez-ruiz/information_theory.git
fi

In [None]:
!ln -sf ~/repos/image_IO/image_1.py .
!ln -sf ~/repos/image_IO/logging_config.py .

In [None]:
import image_1 as gray_image
import os

In [None]:
home = os.environ["HOME"]
fn = home + "/repos/MRVC/images/lena_bw/"
#fn = home + "/repos/MRVC/images/circle/"
#fn = home + "/repos/MRVC/images/Hommer_bw/"
!ls -l {fn}

# Quantizer selection
#quantizer = quantization.LloydMax_Quantizer

n_clusters = 4  # Number of bins
N_tries = 4  # Number of times K-means is run (if the centroids are init at random)

#N_bins = range(2, 128, 1)
#N_bins = [2, 4, 8, 16, 32] #range(2, 128, 1)
#N_bins = [8]

gray_image.write = gray_image.debug_write

In [None]:
img = gray_image.read(fn, 0)
gray_image.show(img, fn + "000.png")

In [None]:
histogram, bin_edges = np.histogram(img, bins=256, range=(0, 256))
histogram[histogram==0] = 1  # Avoid empty bins (division by 0 in Eq. (1))
print(histogram, bin_edges)
print(len(histogram))

In [None]:
try:
    import matplotlib.pyplot as plt
except ImportError:
    !pip install matplotlib
    import matplotlib
    import matplotlib.pyplot as plt
    import matplotlib.axes as ax
    #plt.rcParams['text.usetex'] = True
    #plt.rcParams['text.latex.preamble'] = [r'\usepackage{amsmath}'] #for \text command
%matplotlib inline

In [None]:
plt.figure()
plt.title("Histogram")
plt.xlabel("Intensity")
plt.ylabel("Count")
plt.plot(bin_edges[0:-1], histogram)

In [None]:
P = histogram
epsilon = 1e-5
max_iters = 100
min_val = 0
max_val = 255
M = 128
boundaries, centroids = compute_levels(P, epsilon, max_iters, M, min_val, max_val)

In [None]:
print(boundaries)

In [None]:
print(centroids)

In [None]:
print(len(centroids))

In [None]:
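# Search only the inner boundaries so that the indexes lie in 0..M-1
# (bin i holds the values b[i] <= v < b[i+1]) and can index `centroids`.
indexes = np.searchsorted(boundaries[1:-1], img, side="right")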
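
One way to check the mapping from sample values to bin indexes is a toy case. The sketch below uses the standard-library `bisect.bisect_right`, which follows the same insertion rule as `np.searchsorted(..., side="right")`, on the inner boundaries only, so that $M+1$ boundaries yield indexes in $0..M-1$. The boundaries, centroids and samples are made-up values.

```python
from bisect import bisect_right

boundaries = [0.0, 63.5, 127.5, 191.5, 256.0]   # assumed M + 1 = 5 decision levels
centroids = [31.5, 95.5, 159.5, 223.5]          # assumed M = 4 representation levels
samples = [0, 63, 64, 127, 128, 255]

inner = boundaries[1:-1]                         # drop the outer limits 0 and 256
indexes = [bisect_right(inner, v) for v in samples]
reconstructed = [centroids[k] for k in indexes]
print(indexes)
print(reconstructed)
```

Every index is a valid position in `centroids`, including those of the extreme samples 0 and 255.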

In [None]:
print(indexes.shape)

In [None]:
print(np.unique(indexes))

In [None]:
print(len(np.unique(indexes)))

In [None]:
gray_image.show_normalized(indexes, fn + "000.png")

In [None]:
quantized_img = np.rint(centroids[indexes]).astype(np.uint8)  # Round instead of truncating

In [None]:
gray_image.show(quantized_img, fn + "000.png")

In [None]:
import LloydMax_quantization as quantization

In [None]:
quantizer = quantization.LloydMax_Quantizer
Q = quantizer(Q_step=4, counts=histogram)
print("decision_levels =", Q.get_decision_levels())
print("representation_levels =", Q.get_representation_levels())

In [None]:
quantized_img, indexes = Q.encode_and_decode(img)

In [None]:
gray_image.show_normalized(indexes, fn + "000.png")

In [None]:
def plot(x, y, xlabel='', ylabel='', title=''):
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_title(title)
    ax.grid()
    ax.xaxis.set_label_text(xlabel)
    ax.yaxis.set_label_text(ylabel)
    ax.plot(x, y)
    plt.show(block=False)

In [None]:
x = np.linspace(0, 255, 500) # Input samples
y, k = Q.encode_and_decode(x)

In [None]:
xlabel = "Input Sample"
ylabel = "Reconstructed Sample"
QSS = 4  # Quantization step size: the Q_step value passed to the quantizer above
title = rf"Lloyd-Max Quantizer ({fn}, $\Delta={QSS}$)"

ax1 = plt.subplot()
counts, bins = np.histogram(img, range(257))
l1 = ax1.bar(bins[:-1] - 0.5, counts, width=1, edgecolor='none')
ax2 = ax1.twinx()
l2, = ax2.plot(x, y, color='m')

plt.legend([l1, l2], ["Histogram", "Lloyd-Max Quantizer"])
ax1.yaxis.set_label_text("Pixel Value Count")
ax2.yaxis.set_label_text("Reconstructed Value")
ax1.xaxis.set_label_text("Input Sample")
plt.show()