# Block Minifloat Verification

### Application to BM

First we identify the initial parameters of both MiniFloat (MF) and Block MiniFloat (BM).

Both use base $\beta = 2$, and for further testing the MF format is $X\langle e, m\rangle = X\langle 4,3\rangle$, where $e$ and $m$ are the numbers of exponent and mantissa bits, respectively. In the equations that follow, $\beta_{MF}$ denotes the exponent bias rather than the base.

The equations below give the value of an MF number. $S$ is the value of the sign bit, and $E$ and $F$ are the unsigned integer values of the exponent and fraction fields.
$$
X\langle e, m\rangle =
\begin{cases}
(-1)^S \times 2^{1-\beta_{MF}} \times (0 + F \times 2^{-m}), & E = 0 \quad \text{(de-normal)}\\
(-1)^S \times 2^{E-\beta_{MF}} \times (1 + F \times 2^{-m}), & \text{otherwise} \quad \text{(normal)}
\end{cases}
$$
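As a sanity check on the two cases, here is a minimal Python sketch of a decoder for a single bit pattern. It assumes the bit layout [sign | exponent | fraction], a bias of $\beta_{MF} = 7$ for $\langle 4,3\rangle$ (the value consistent with the $[2^{-9}, 480]$ range), and that no exponent code is reserved for infinities or NaNs. `decode_minifloat` is a hypothetical helper, not part of any library.

```python
def decode_minifloat(code: int, e: int = 4, m: int = 3, bias: int = 7) -> float:
    """Decode an <e, m> minifloat laid out as [sign | exponent | fraction].

    Assumption: every exponent code, including all ones, encodes a finite value.
    """
    sign = (code >> (e + m)) & 1          # S
    E = (code >> m) & ((1 << e) - 1)      # unsigned exponent field
    F = code & ((1 << m) - 1)             # unsigned fraction field
    if E == 0:
        # De-normal: 2^(1 - bias) * (0 + F * 2^-m)
        magnitude = 2.0 ** (1 - bias) * (F * 2.0 ** -m)
    else:
        # Normal: 2^(E - bias) * (1 + F * 2^-m)
        magnitude = 2.0 ** (E - bias) * (1 + F * 2.0 ** -m)
    return -magnitude if sign else magnitude

print(decode_minifloat(0b0_0000_001))  # smallest subnormal, 2^-9
print(decode_minifloat(0b0_1111_111))  # largest normal, 480.0
print(decode_minifloat(0b1_0111_000))  # -1.0
```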

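From this, with our $\langle 4,3\rangle$ format the exponent field $E$ ranges over $[0,15]$ and the fraction field $F$ over $[0,7]$. Taking the smallest subnormal value and the largest normal value gives a range of $[2^{-9}, 480]$. This range corresponds to a bias of $\beta_{MF} = 7$ with no exponent code reserved for infinities or NaNs: the smallest subnormal is $2^{1-7} \times 2^{-3} = 2^{-9}$ and the largest normal is $2^{15-7} \times (1 + 7 \times 2^{-3}) = 480$.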
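Both endpoints follow directly from the two cases of the value equation. The sketch below computes them exactly with `fractions.Fraction`; `minifloat_extremes` is a hypothetical helper, and it assumes a bias of 7 and that the all-ones exponent code encodes finite values.

```python
from fractions import Fraction

def minifloat_extremes(e: int, m: int, bias: int) -> tuple[Fraction, Fraction]:
    """Return (smallest positive subnormal, largest normal) of an <e, m> minifloat.

    Assumes the all-ones exponent code encodes ordinary finite values (no Inf/NaN).
    """
    # De-normal case with E = 0 and F = 1: 2^(1 - bias) * 2^-m
    smallest_subnormal = Fraction(2) ** (1 - bias) * Fraction(1, 2**m)
    # Normal case with E = 2^e - 1 and F = 2^m - 1: 2^(E - bias) * (2 - 2^-m)
    largest_normal = Fraction(2) ** (2**e - 1 - bias) * (2 - Fraction(1, 2**m))
    return smallest_subnormal, largest_normal

lo, hi = minifloat_extremes(4, 3, bias=7)
print(lo, hi)  # 1/512 480
```

Changing `bias` shifts both endpoints by the same power of two, which is the lever a shared block bias controls.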



### Theorem 1 (Goldberg)
Using a floating-point format with parameters $\beta$ and $p$, and computing differences using $p$ digits, the relative error of the result can be as large as $\beta - 1$. Here $\beta$ is the base and $p$ is the precision (number of significand digits).
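The standard worst case behind Theorem 1 can be reproduced with exact rational arithmetic. Take $x = 1$ and $y = 0.\rho\rho\ldots\rho$ with $p$ digits $\rho = \beta - 1$; aligning $y$ to the exponent of $x$ shifts its last digit off when only $p$ digits are kept. This is a sketch: the helper names are hypothetical, and truncation with no guard digit is the assumption being modelled.

```python
import math
from fractions import Fraction

def p_digit_difference(beta: int, p: int) -> tuple[Fraction, Fraction]:
    """Exact and p-digit results of x - y for x = 1 and y = 0.rr...r (p digits r = beta - 1).

    The p-digit result truncates y after aligning it to x's exponent (no guard digit).
    """
    x = Fraction(1)
    y = 1 - Fraction(1, beta**p)      # 0.rr...r with p digits equals 1 - beta**-p
    exact = x - y                      # beta**-p
    # In x's frame only p - 1 fractional digits fit, so y's last digit is shifted off.
    scale = beta ** (p - 1)
    y_aligned = Fraction(math.floor(y * scale), scale)
    computed = x - y_aligned           # beta**-(p - 1)
    return exact, computed

def relative_error(beta: int, p: int) -> Fraction:
    exact, computed = p_digit_difference(beta, p)
    return abs(computed - exact) / abs(exact)

print(relative_error(10, 3))  # 9
print(relative_error(2, 8))   # 1
```

For $\beta = 10$, $p = 3$ this is $x = 1.00$, $y = 0.999$: the exact difference is $0.001$, but the 3-digit result is $0.01$, a relative error of $9 = \beta - 1$. For base 2 the same construction gives $1$.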

### Proof 1: Block Minifloat

Let $A$ and $B$ be two Block Minifloat tensors with shared exponent biases $\beta_A$ and $\beta_B$, respectively.

The Block Minifloat format uses a shared exponent bias $\beta$ for all values in a block and represents each value as $x_i = m_i \cdot 2^{e_i - \beta}$, where $m_i$ is the significand and $e_i$ is the exponent.
Assume that $\beta_A < \beta_B$. Because every value is scaled by $2^{-\beta}$, a smaller shared bias gives a larger scale, so for comparable element exponents the values in $A$ have larger magnitudes than those in $B$.

We need to show that the relative error in adding two Block Minifloat tensors can be as large as $\beta - 1$, where $\beta$ is the base from Theorem 1, not a shared bias.

For any value $a_i \in A$ with significand $s_A$ and exponent $e_A$, we have:
$$a_i = s_A \cdot 2^{e_A - \beta_A}$$

Similarly, for any value $b_j \in B$, we have:
$$b_j = s_B \cdot 2^{e_B - \beta_B}$$

To add $a_i$ and $b_j$, their exponents need to be aligned. This requires shifting the significand of one of the values.

Exponent Alignment: With $\beta_A < \beta_B$, rewriting $b_j = \left(s_B \cdot 2^{-(\beta_B - \beta_A)}\right) \cdot 2^{e_B - \beta_A}$ expresses it with the bias of $A$, so all values in $B$ must be scaled by $2^{-(\beta_B - \beta_A)}$. This reduces the precision of the values in $B$, because shifting a significand right by $\beta_B - \beta_A$ bits discards its lower bits.

Error Introduction:
The shift introduces rounding errors. The values in $B$ can lose significant precision, and their contribution to the sum can become negligible compared to the values in $A$. This is analogous to Goldberg's Theorem 1, where computing a difference with only $p$ digits shifts the lowest digit of the smaller operand off during alignment; when the two operands are close, that lost digit dominates the cancelled result and the relative error reaches $\beta - 1$.

Relative Error:
Let the exact sum be $S_{exact} = a_i + b_j$, and let the computed sum (with rounding) be $S_{computed}$. The relative error is given by:
$$\text{Relative Error} = \frac{|S_{exact} - S_{computed}|}{|S_{exact}|}$$

When $\beta_B - \beta_A$ is large, the contribution of $b_j$ to the sum is effectively lost, leading to:
$$S_{computed} \approx a_i$$
$$\therefore \text{Relative Error} \approx \frac{|b_j|}{|a_i + b_j|}$$

For $a_i$ and $b_j$ of the same sign, this ratio is always below $1$, and it approaches $1$ as $|a_i| / |b_j| \to 0$, that is, when the dropped term dominates the exact sum. In the continued use case of base $\beta = 2$, this limit equals $\beta - 1$, matching the bound of Theorem 1.
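The alignment step can be sketched with integer significands. This is a minimal illustration under stated assumptions, not a reference Block Minifloat implementation: each element is an integer significand with the hidden bit and $M = 3$ fraction bits, scaled by $2^{e - \beta}$; the smaller-scale operand's significand is right-shifted with truncation (no guard or round bits); and the sum is not renormalized or rounded back to $M$ bits. `value` and `aligned_sum` are hypothetical names.

```python
from fractions import Fraction

M = 3  # fraction bits per element, matching the <4,3> format (assumption)

def value(sig: int, e: int, bias: int) -> Fraction:
    """Exact value of an element with integer significand `sig` (hidden bit included)."""
    return Fraction(sig, 2**M) * Fraction(2) ** (e - bias)

def aligned_sum(sig_hi: int, e_hi: int, bias_hi: int,
                sig_lo: int, e_lo: int, bias_lo: int) -> Fraction:
    """Add the smaller-scale operand to the larger one on the larger one's scale.

    Bits of the smaller operand shifted past the last fraction bit are truncated.
    The result is left unnormalized (it may need more than M + 1 bits).
    """
    shift = (e_hi - bias_hi) - (e_lo - bias_lo)
    if shift < 0:
        raise ValueError("first operand must have the larger scale")
    sig_lo_aligned = sig_lo >> shift  # right shift discards low-order bits
    return value(sig_hi + sig_lo_aligned, e_hi, bias_hi)

# 1.0 with bias 7 plus 1.875 * 2^-2 with bias 9 (shared biases differ by 2)
exact = value(0b1000, 7, 7) + value(0b1111, 7, 9)
computed = aligned_sum(0b1000, 7, 7, 0b1111, 7, 9)
print(exact, computed, abs(exact - computed) / exact)  # 47/32 11/8 3/47

# With a bias gap of 5, every bit of the smaller operand is shifted out.
print(aligned_sum(0b1000, 7, 7, 0b1111, 7, 12))  # 1
```

With a gap of 5 the computed sum equals the larger operand alone, which is the $S_{computed} \approx a_i$ case described in the proof.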

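```python
# Relative error |b_j| / |a_i + b_j| when b_j is dropped and the computed sum is a_i
a = 0.0001
b = 10
rel_error = abs(b) / abs(a + b)
print(rel_error)  # 0.999990000099999
```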
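To see how the ratio behaves beyond a single pair of values, the sketch below sweeps $a$ downward with exact fractions. `dropped_term_error` is a hypothetical helper, and it assumes, as in the proof, that the computed sum keeps $a_i$ and drops $b_j$ entirely.

```python
from fractions import Fraction

def dropped_term_error(a: Fraction, b: Fraction) -> Fraction:
    """Relative error |b| / |a + b| when the computed sum keeps a and drops b."""
    return abs(b) / abs(a + b)

b = Fraction(10)
for k in range(1, 7):
    a = Fraction(1, 10**k)  # a shrinks relative to b
    err = dropped_term_error(a, b)
    # Same sign: the error stays strictly below 1 but approaches it
    print(f"a = 1e-{k}: error = {float(err):.10f}, below 1: {err < 1}")

# Opposite signs: the exact sum cancels, so the ratio can exceed 1
print(dropped_term_error(Fraction(-9), b))  # 10
```

In this simplified model the same-sign ratio never reaches 1, while an opposite-sign pair that cancels is not bounded by it.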


Consider a Block Minifloat tensor as an array of numbers spanning a range of magnitudes set by its shared exponent bias. When two tensors have the same bias, their ranges overlap completely. When the biases differ by one, the ranges only partially overlap, leaving a region of inaccuracy. Not all information is lost, however: values in the overlapping region are representable in both tensors and are thus calculable.
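The overlap can be checked by enumerating both value grids. This sketch assumes both tensors use the $\langle 4,3\rangle$ element format with every exponent code finite, and shared biases of 7 and 8 chosen only for illustration; `positive_grid` is a hypothetical helper. It counts the positive values that are exactly representable under both biases.

```python
from fractions import Fraction

def positive_grid(e: int, m: int, bias: int) -> set[Fraction]:
    """All positive values of an <e, m> element format under a shared bias."""
    grid = set()
    for E in range(2**e):
        for F in range(2**m):
            if E == 0:  # de-normal: 2^(1 - bias) * F * 2^-m
                v = Fraction(2) ** (1 - bias) * Fraction(F, 2**m)
            else:       # normal: 2^(E - bias) * (1 + F * 2^-m)
                v = Fraction(2) ** (E - bias) * (1 + Fraction(F, 2**m))
            if v > 0:
                grid.add(v)
    return grid

grid_a = positive_grid(4, 3, bias=7)
grid_b = positive_grid(4, 3, bias=8)  # shared biases differ by one
shared = grid_a & grid_b
print(len(grid_a), len(grid_b), len(shared))  # 127 127 119
print(min(shared), max(shared))               # 1/512 240
```

Under these assumptions, the values outside the overlap are the top binade of the bias-7 grid (256 to 480) and eight finely spaced values at the bottom of the bias-8 grid that fall between neighbouring bias-7 values.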