# Demo: A Deeper Dive into Numbers and Floating Point Types
We work with numerical data in many tasks in science, technology, engineering, and mathematics (STEM). Thus, let's explore common floating-point data types, how they are represented in memory, and the precision associated with each type.
* __Key (surprising) fact__: Floating-point numbers are, in general, approximations! A floating-point number uses a _fixed number of bits_ (a fixed amount of memory) to store a value, so it can represent only a finite set of rational values rather than the continuum of real numbers. Thus, any real number outside this finite set must be rounded to the nearest floating-point value, which makes the stored value an approximation of the number we intended.
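
To make the key fact concrete, here is a minimal sketch (the variable names are illustrative and not part of the original notebook). It suggests that the decimal literal `0.1` is stored as a nearby rational value, while a sum of powers of two such as `0.75` is stored exactly:

```julia
# The literal 0.1 has no finite binary expansion, so Float64 stores the nearest representable value
stored_tenth = Rational(0.1)               # exact rational value held in memory
tenth_is_exact = (stored_tenth == 1 // 10) # is the stored value exactly 1/10?

# Rounding shows up in arithmetic: compare exact equality (==) with approximate equality (≈)
naive_equal = (0.1 + 0.2 == 0.3)
approx_equal = (0.1 + 0.2 ≈ 0.3)

# Values that are sums of powers of two are stored without rounding
exact_sum = (0.5 + 0.25 == 0.75)

println("0.1 is stored as ", stored_tenth)
println("0.1 == 1/10 exactly? ", tenth_is_exact)
println("0.1 + 0.2 == 0.3 ? ", naive_equal, "; 0.1 + 0.2 ≈ 0.3 ? ", approx_equal)
println("0.5 + 0.25 == 0.75 ? ", exact_sum)
```

In practice this is why floating-point results are usually compared with `≈` (that is, `isapprox`) rather than `==`.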

The typical floating-point number types you will likely encounter in applications are `Float16`, `Float32`, and `Float64`. But what do these numbers mean? For example, what do the `16`, `32`, and `64` mean in `FloatXX`, what precision can these numbers describe, and how are they represented in memory?

To answer these questions, we need to establish a few ground truths. First, on regular, i.e., non-quantum hardware, all floating-point numbers are stored as binary numbers, i.e., numbers written in base `2`. Let's start by looking at the representation of base `b` numbers.

## Base b representation of numbers
A number in base $b$ is represented by a finite sequence of digits $(d_{n}d_{n-1}\dots{d_{1}}d_{0})_{b}$ where each digit $d_{i}$ satisfies $0\leq{d_{i}}<b$. The value (in base 10) of a base $b$ number is the positional sum:
$$
\begin{align*}
\underbrace{(d_{n}d_{n-1}\dots{d_{1}}d_{0})_{b}}_{\text{base b}} = \underbrace{\sum_{i=0}^{n}d_{i}b^{i}}_{\text{value in base 10}}
\end{align*}
$$
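
As a quick sanity check on the positional sum, the sketch below evaluates it for a few small digit sequences. The helper `positional_value` is a hypothetical name introduced only for illustration; it is not part of Julia's Base library.

```julia
# positional_value is a hypothetical helper (not part of Base).
# digits_msb holds the digits written most-significant first, e.g. (1011)_2 -> [1, 0, 1, 1]
function positional_value(digits_msb::Vector{Int}, b::Int)
    @assert all(d -> 0 <= d < b, digits_msb) "each digit must satisfy 0 ≤ d < b"
    value = 0
    for (k, d) in enumerate(reverse(digits_msb))   # k = 1 is the least-significant digit d₀
        value += d * b^(k - 1)                     # the term dᵢ bⁱ with i = k - 1
    end
    return value
end

println("(1011)_2 = ", positional_value([1, 0, 1, 1], 2))
println("(112)_8  = ", positional_value([1, 1, 2], 8))
println("(255)_10 = ", positional_value([2, 5, 5], 10))
```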
Let's use integers for a few examples to understand this expression better (and then we'll move on to floating point numbers). 
Consider an `Int64` number. We know that memory storage on modern non-quantum hardware is binary, i.e., base $b = 2$; thus, all the digits $d_{i}$ must be $0\leq{d}_{i}<2$. 

However, how many digits do we have, i.e., what is the value of $n$? The number of digits is the _word size_, i.e., the `64` in `Int64`, so the digit positions run from $0$ to $n = 63$.
* __Hmmm__. Didn't we already see that? Yes, it's the length of the string output from [the `bitstring(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring)! Let's count the number of zero digits and the number of one digits of a test 64-bit integer using the [`count_zeros(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.count_zeros) and [`count_ones(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.count_ones).

To check the equality condition, we use the [Julia @assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert). If the statement passed to the [@assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) evaluates to `false`, i.e., the number of zeros plus the number of ones does not equal the `wordsize`, then an [AssertionError](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError) is thrown, alerting us that there is an issue.
* __Note__: we use the equality `==` operator (not the assignment operator `=`). There is also the `===` comparison operator in Julia, which determines whether `x` and `y` are identical, in the sense that no program could distinguish them. We'll see this operator later.

In [220]:
let
    wordsize = 64; # default word size
    x = 24; # pick an integer value (Int64 value by default)
    n = count_zeros(x) + count_ones(x) # this counts 0's and 1's (doesn't give any info about position)
    @assert wordsize == n # see https://docs.julialang.org/en/v1/base/base/#Base.@assert
end
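
The same check can be repeated for other integer types. The sketch below is an illustration under the assumption that `8 * sizeof(T)` gives the word size in bits for these primitive types; the variable names are hypothetical. It also contrasts `==` with `===` on a simple pair of values.

```julia
# Check that the bit count matches the word size for several integer types
integer_types = [Int8, Int16, Int32, Int64, UInt8, UInt64]
for T in integer_types
    test_value = T(24)                                # the same test value, stored in type T
    n = count_zeros(test_value) + count_ones(test_value)  # total number of bits
    @assert n == 8 * sizeof(T)                        # sizeof returns bytes; 8 bits per byte
    @assert n == length(bitstring(test_value))        # same as the length of the bit pattern string
    println(rpad(string(T), 6), " => ", n, " bits")
end

# == compares values; === asks whether two objects are indistinguishable
values_equal = (1 == 1.0)    # same numerical value
identical = (1 === 1.0)      # an Int64 and a Float64 can be told apart
println("1 == 1.0 ? ", values_equal, "; 1 === 1.0 ? ", identical)
```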

### Binary
We can get the bit pattern (binary number) by calling [the `bitstring(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring), but the wrinkle is that [`bitstring(...)`](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring) returns the bit pattern as [a `String` type](https://docs.julialang.org/en/v1/manual/strings/#man-strings). We'll have to convert [the `String`](https://docs.julialang.org/en/v1/manual/strings/#man-strings) to an array of `0` and `1`, but more on that later.

The positions of the `0` and `1` values in the binary number give us the number's value. Suppose we get the bit pattern, i.e., the positions of the digits, using [the `bitstring(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring) and save this value in the `s::String` variable (we save the numerical value in `x::Int64`).

In [212]:
s,x = let
    x = 1678; # Int64 value by default
    s = bitstring(x)
    s,x
end

("0000000000000000000000000000000000000000000000000000011010001110", 1678)

For the binary string $s$, from the _right to the left_, we sum powers of 2 (the $b^{i}$ terms in the sum) for positions whose digit is a `1`. Let's make this more concrete. 
* __Hypothesis__: We should be able to _process_ the string $s$, i.e., compute the positional sum of $s$, and get back the value of the integer that generated it. To do this, we'll need to use a few tricks that we haven't discussed yet. Don't worry about how we do it; let's see if our hypothesis is true or false.

To check our hypothesis, we need to do a few things. The first is to convert the bit pattern in `s::String` into an array of numbers (so we can compute the positional sum). The following logic contains a few advanced things, e.g., working with arrays and [`String` and `Char` types](https://docs.julialang.org/en/v1/manual/strings/#man-strings), function [piping `|>`](https://docs.julialang.org/en/v1/manual/functions/#Function-composition-and-piping), etc.; don't worry too much about these now:
* We can convert a String into an array, i.e., an ordered list of [`Char` types](https://docs.julialang.org/en/v1/manual/strings/#man-characters) using the [`collect(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.collect-Tuple{Type,%20Any}). If we have the [`Char`](https://docs.julialang.org/en/v1/manual/strings/#man-characters) version of `1` or `0`, we can then try to convert it into an `Int` using the [`parse(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.parse) in Julia.
* _Aside_: In older languages such as C, strings were natively represented as arrays of characters; in fact, there was no formal string class in C. However, newer languages that include [Unicode character sets](https://en.wikipedia.org/wiki/Unicode) have dedicated [`String` types](https://docs.julialang.org/en/v1/manual/strings/#man-strings) that are more complex. 
* We daisy-chain commands together, i.e., `cmd_1 |> cmd_2`, where the output of `cmd_1` is the input to `cmd_2`, using the [Julia `|>` piping operator](https://docs.julialang.org/en/v1/manual/functions/#Function-composition-and-piping).

In [214]:
bit_pattern_array = bitstring(x) |> collect |> reverse .|> x-> parse(Int,x)

64-element Vector{Int64}:
 0
 1
 1
 1
 0
 0
 0
 1
 0
 1
 1
 0
 0
 ⋮
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
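
As a cross-check on the pipeline, the sketch below compares it with the built-in `digits(...)` method called with `base=2` and `pad=64`. This is only an illustration for a non-negative `x`; the variable names `from_bitstring`, `from_digits`, and `same_pattern` are hypothetical.

```julia
x = 1678
# Same pipeline as the cell above, with the anonymous-function argument renamed to c
# so that it does not shadow the integer x
from_bitstring = bitstring(x) |> collect |> reverse .|> c -> parse(Int, c)

# Built-in alternative: binary digits, least-significant first, zero-padded to 64 entries
from_digits = digits(x, base=2, pad=64)

same_pattern = (from_bitstring == from_digits)
println("Same bit pattern? ", same_pattern)
```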

__Hmmm__. Ok, we can convert the string to an `Array{Int64,1}`, which is good. However, arrays in Julia are `1` based, meaning the first index in the array occurs at index `1`. But all our expressions above are zero-based (first value stored at index `0`). Can we make a zero-based array in Julia?
* __Hack__: Sure! We can fix the `1`-based array issue by copying the `bit_pattern_array` into a dictionary (which we can make `0`-based), called `bit_patten_dictionary::Dict{Int64,Int64}`. This will allow us to start counting from 0 instead of 1.
* __Proper solution__: In addition to the _hack_ (which I use _all the time_), the (potentially) proper solution is to use [an instance of an `OffsetArray` exported by the `OffsetArrays.jl` package](https://github.com/JuliaArrays/OffsetArrays.jl) to fix the `1`-based array issue.

In [233]:
bit_patten_dictionary = let
    bit_patten_dictionary = Dict{Int64,Int64}(); # Declare memory
    for i ∈ eachindex(bit_pattern_array)
        bit_patten_dictionary[i-1] = bit_pattern_array[i] # what are we doing here?
    end
    bit_patten_dictionary; # what is going on here?
end

Dict{Int64, Int64} with 64 entries:
  5  => 0
  56 => 0
  35 => 0
  55 => 0
  60 => 0
  30 => 0
  32 => 0
  6  => 0
  45 => 0
  4  => 0
  13 => 0
  54 => 0
  63 => 0
  62 => 0
  58 => 0
  52 => 0
  12 => 0
  28 => 0
  23 => 0
  41 => 0
  43 => 0
  11 => 0
  36 => 0
  39 => 0
  7  => 1
  ⋮  => ⋮
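
The loop above can also be written as a one-line generator. The sketch below is one possible alternative, not the notebook's method; `zero_based` and `ordered_positions` are hypothetical names, and the array is rebuilt with `digits(...)` so the block runs on its own. It also hints at why the printed dictionary looks scrambled: a `Dict` does not keep its keys in order.

```julia
bit_pattern_array = digits(1678, base=2, pad=64)   # stand-in for the array built earlier

# Generator-based construction: position (i - 1) maps to the digit stored at index i
zero_based = Dict(i - 1 => d for (i, d) in enumerate(bit_pattern_array))

# A Dict has no guaranteed iteration order, so sort the keys whenever order matters
ordered_positions = sort(collect(keys(zero_based)))
println("first position: ", first(ordered_positions), ", last position: ", last(ordered_positions))
println("d₀ = ", zero_based[0], ", d₁ = ", zero_based[1])
```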

Finally, let's compute the positional sum and see what our number is.

In [236]:
let

    b = 2; # What base do we have?
    count = 0; # if this works, when we are finished, this should be our original number
    positions = keys(bit_patten_dictionary) |> collect |> sort; # what is going on here?
    for i ∈ positions
        dᵢ = bit_patten_dictionary[i];
        count+= (dᵢ)*(b^i)
    end
    println("Was your original number $(count)?")
end

Was your original number 1678?
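
The plain positional sum recovers the value because `1678` is non-negative. For a negative `Int64`, Julia stores the value in two's complement, so the top bit carries weight $-2^{63}$ rather than $+2^{63}$. The sketch below illustrates both cases under that assumption; all variable names are hypothetical.

```julia
# Non-negative case: parse the bit string directly as a base-2 integer
s_pos = bitstring(1678)
roundtrip_pos = parse(Int64, s_pos; base=2)

# Negative case: read the 64 bits as an unsigned integer, then reinterpret them as signed
s_neg = bitstring(-1678)
roundtrip_neg = reinterpret(Int64, parse(UInt64, s_neg; base=2))

# Equivalent positional sum in which bit 63 carries weight -2^63 (two's complement).
# Int128 is used so that the intermediate terms cannot overflow.
bits_neg = [parse(Int, c) for c in reverse(s_neg)]   # least-significant bit first
low_bits_sum = sum(Int128(bits_neg[i + 1]) * Int128(2)^i for i in 0:62)
twos_complement_sum = low_bits_sum - Int128(bits_neg[64]) * Int128(2)^63

println(roundtrip_pos, " ", roundtrip_neg, " ", twos_complement_sum)
```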


### Beyond binary numbers
There are many everyday applications for base $b>2$ numbers! Larger bases like decimal (base 10), dozenal (base 12), and sexagesimal (base 60) exist in everyday measurements and commerce. There are also a few others that you may encounter every day, but not realize it:
* Hexadecimal (base 16) compactly encodes binary data such as color codes; for example, Cornell red is `#B31B1B`.
* Base 32 and base 64 encode arbitrary binary data (e-mail attachments, URLs, certificates) as printable characters.

Though higher bases require a more complex digit set, they dramatically shorten the representation of large values.
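
A short sketch of both points, using only `string(...)` and `parse(...)` with the `base` keyword (the variable names are illustrative): the same integer needs fewer digits in base 16 than in base 2, and a hexadecimal color code splits into three 8-bit channels.

```julia
n = 1678
binary_text = string(n, base=2)     # digits of n in base 2, most-significant first
hex_text = string(n, base=16)       # digits of n in base 16
println(length(binary_text), " binary digits vs ", length(hex_text), " hexadecimal digits")

# Decode the color code #B31B1B: each pair of hex digits is one 8-bit channel
color = "B31B1B"
red, green, blue = (parse(Int, color[i:i+1]; base=16) for i in (1, 3, 5))
println("RGB = (", red, ", ", green, ", ", blue, ")")
```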

#### Digits Example
Let's consider an octal (base 8) example. Instead of calling [the `bitstring(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring) (which always returns a base $b=2$ value), let's explore [the `digits(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.digits). The [`digits(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.digits) takes a `number`, a `base`, and a `pad` argument and returns the digits of `number` written in `base`, least-significant digit first, padded with zeros to at least `pad` digits.
* __Octal__: Let's use [the `digits(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.digits) to get the digits of $n = 74$ written in `base = 8`, padded to `16` digits. Save this data in the `bit_pattern_array_octal::Vector{Int64}` variable.

In [338]:
bit_pattern_array_octal = digits(74, base=8, pad=16) # digits of 74 in base 8 (least-significant first), zero-padded to 16 digits

16-element Vector{Int64}:
 2
 1
 1
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0
 0

__Check__: Let's convert the octal number stored in the `bit_pattern_array_octal::Array{Int64,1}` variable back into base 10 by computing the positional sum in base 8.

In [341]:
let
    # initialize -
    bit_pattern_dictionary = Dict{Int64,Int64}();
    b = 8.0; # base of the example (octal, so b = 8)
    wordsize = 16;
    foreach(i -> bit_pattern_dictionary[i-1] = bit_pattern_array_octal[i], eachindex(bit_pattern_array_octal)); # compact syntax for building bit dict

    # loop -
    value = 0.0;
    bitrangearray = range(0,stop=(wordsize-1),step=1) |> collect;
    for i ∈ bitrangearray
         dᵢ = bit_pattern_dictionary[i];
         value += (dᵢ)*(b^i)
    end

    value
end

74.0
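
The same round trip can be sketched with built-in functions and integer arithmetic, so the result comes back as an `Int` instead of the `Float64` value `74.0`. This is an alternative illustration, not the notebook's method; the variable names are hypothetical.

```julia
octal_digits = digits(74, base=8, pad=16)       # least-significant digit first
value_horner = evalpoly(8, octal_digits)        # evaluates Σ dᵢ 8ⁱ using Horner's rule

octal_text = string(74, base=8)                 # most-significant digit first, no padding
value_parsed = parse(Int, octal_text; base=8)   # back to base 10

println("(", octal_text, ")_8 = ", value_horner, " = ", value_parsed)
```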

## Floating point numbers
Now that we have seen how integers are laid out in memory, let's try to understand the memory layout of the floating-point types `Float16`, `Float32`, and `Float64`. However, before we do that, what is the use case for three floating-point types?

Using three floating-point types lets us balance precision and resource usage for different applications: 
* `Float16` (half-precision) minimizes memory footprint at the expense of precision. This type is ideal for large-scale machine learning inference or graphics where fine precision isn’t critical, and we have many values to store in memory.
* `Float32` (single-precision) offers a good compromise of speed and accuracy for most numerical and real-time workloads.
* `Float64` (double-precision) provides the high precision and wide exponent range needed in scientific computing, simulations, and financial modeling, where rounding errors must be tightly controlled.
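
The trade-off can be made concrete with a few built-in queries. The sketch below (illustrative names only) prints the storage size, the machine epsilon `eps(T)`, i.e., the gap between `1` and the next representable value, and the value actually stored for the literal `0.1` in each type.

```julia
float_types = (Float16, Float32, Float64)

# Machine epsilon for each type, collected for later inspection
machine_eps = Dict(T => eps(T) for T in float_types)

for T in float_types
    stored = Float64(T(0.1))   # widening to Float64 is exact, so this shows the stored value
    println(rpad(string(T), 8), ": ", sizeof(T), " bytes, eps = ", machine_eps[T],
            ", largest finite value = ", floatmax(T), ", 0.1 stored as ", stored)
end
```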

Hypothetically, suppose we need more precision than `Float64`. Are there larger types? Yes, specialized types in Julia (and Python) let you go above `Float64`. For example, [the `Quadmath.jl` package](https://github.com/JuliaMath/Quadmath.jl) in Julia implements a `Float128` type if you need precision beyond `Float64`.
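
Without installing any package, Julia's built-in `BigFloat` type gives a feel for what extra precision buys. The sketch below is an illustration that swaps in `BigFloat` for `Float128` (it does not use `Quadmath.jl`), and the variable names are hypothetical.

```julia
third_64 = 1 / 3                                # Float64: 53 significant bits

# BigFloat with 128 significant bits, set only for the duration of the block
third_128 = setprecision(BigFloat, 128) do
    big(1) / big(3)
end

# How far the Float64 value is from the finer approximation of 1/3
rounding_gap = abs(big(third_64) - third_128)

println("Float64 significant bits:  ", precision(Float64))
println("BigFloat significant bits: ", precision(third_128))
println("gap between the two approximations of 1/3: ", Float64(rounding_gap))
```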

While the memory layouts of the three types differ in their details, they all follow the same theme. Thus, let's dig into `Float64` in detail (as this is likely the default for most of our calculations in the course) and leave the other representations as an exercise.

### Float64 memory layout
A 64-bit floating point number $x\in\mathbb{R}$ is represented in memory as:
$$
\begin{align*}
x = (-1)^{s}\times\,1.\underbrace{d_{51}d_{50}\dots{d_{0}}}_{52\,\text{fraction bits}}\times{2^{E-1023}}
\end{align*}
$$
where $s\in\{0,1\}$ denotes the sign-bit (bit at position `63`), the value $1.d_{51}d_{50}\dots{d_{0}}$ denotes the significand given by:
$$
\begin{align*}
\text{significand} = 1 + \sum_{i = 1}^{52}d_{52-i}2^{-i}
\end{align*}
$$
and $E$ is the unsigned 11-bit stored exponent (value of bits $52\rightarrow{62}$) given by: 
$$
\begin{align*}
E = \sum_{i=52}^{62}d_{i}2^{i-52}
\end{align*}
$$

Let's compute the components of this floating-point number and see if we get the original number back. Use [the `bitstring(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring) to generate the bits in the number.

In [373]:
d, x = let

    # initialize -
    bitpattern_dictionary = Dict{Int64,Int64}();
    wordsize = 64; # how big is the word size?
    x = 3.1415926535897; # example 64-bit floating-point number: π truncated to 13 decimal places
    a = bitstring(x) |> reverse |> collect .|> v-> parse(Int64,v) # fancy. Nothing to see here, move along (for now anyway).
    
    # put stuff in the dictionary
    for i ∈ 0:(wordsize-1)
        bitpattern_dictionary[i] = a[i+1];
    end
    bitpattern_dictionary,x
end

(Dict(5 => 0, 56 => 0, 35 => 1, 55 => 0, 60 => 0, 30 => 1, 32 => 1, 6 => 1, 45 => 1, 4 => 0…), 3.1415926535897)

__Sign term__: This is a positive number, so we expect $(-1)^{s} = 1$, i.e., $d_{63} = s = 0$:

In [375]:
S = let
    s = d[63];
    S = (-1)^s
end

1
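
The sign bit can also be inspected without the dictionary. In the sketch below (illustrative names only), the first character of the `bitstring(...)` output is bit `63`, and the built-in `signbit(...)` method reports the same information as a `Bool`.

```julia
x = 3.1415926535897
sign_bit_pos = bitstring(x)[1]     # first character of the string is bit 63
sign_bit_neg = bitstring(-x)[1]    # the same magnitude with the sign flipped

println("signbit(x)  = ", signbit(x), ", leading bit = ", sign_bit_pos)
println("signbit(-x) = ", signbit(-x), ", leading bit = ", sign_bit_neg)

# Negating a Float64 changes only the sign bit; the other 63 bits are identical
only_sign_differs = (bitstring(x)[2:end] == bitstring(-x)[2:end])
```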

__Significand__: We'll compute the significand using the expression above. However, we'll check our computed value using [the `significand(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.Math.significand) to make sure we are correct. We'll store our calculated value in the `calculated_significand_value::Float64` variable:

In [390]:
calculated_significand_value = let

    calculated_significand_value = 0.0;
    b = 2.0; # binary, base = 2
    significand_range_array = range(1,stop=52,step=1) |> collect;

    for i ∈ significand_range_array
        calculated_significand_value += (b^(-i))*d[52-i]
    end
    calculated_significand_value + 1
end

1.57079632679485

_Check_: Let's use [the `@assert` macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert) to check our calculated significand value. If the `==` comparison comes back `false`, [an `AssertionError` is thrown](https://docs.julialang.org/en/v1/base/base/#Core.AssertionError):

In [401]:
@assert significand(x) == calculated_significand_value # compare built-in versus our calculated value

__Exponent__: Finally, let's compute the exponent value $E$.

In [430]:
E = let

    calculated_exponent_value = 0.0;
    b = 2.0; # binary, base = 2
    exponent_bit_range_array = range(52,stop=62, step = 1) |> collect

    for i ∈ exponent_bit_range_array
        dᵢ = d[i]; 
        calculated_exponent_value += dᵢ*(b^(i-52))
    end
    calculated_exponent_value
end

1024.0
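
As a check on the stored exponent, the sketch below reads the 11 exponent bits straight from the `bitstring(...)` output and compares the unbiased value $E - 1023$ with Julia's built-in `exponent(...)` method. The variable names are hypothetical, and the comparison is only meant for normal (non-zero, finite) values such as this one.

```julia
x = 3.1415926535897
exponent_bits = bitstring(x)[2:12]              # characters 2..12 are bits 62 down to 52
E_stored = parse(Int, exponent_bits; base=2)    # unsigned stored exponent
E_unbiased = E_stored - 1023                    # subtract the bias

println("stored exponent E = ", E_stored, ", unbiased exponent = ", E_unbiased)
println("built-in exponent(x) = ", exponent(x))

# ldexp(m, e) computes m * 2^e, so it reassembles x from its significand and exponent
reassembled = ldexp(significand(x), exponent(x))
```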

Put it all together:

In [423]:
let
    x = S*calculated_significand_value*2^(E - 1023)
end

3.1415926535897
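
The dictionary-based walk-through above is meant to mirror the formulas. A more compact alternative, sketched below under the assumption that `x` is a normal `Float64` (not zero, subnormal, `Inf`, or `NaN`), is to view the 64 bits as a `UInt64` and pull out the three fields with shifts and masks. The variable names are illustrative.

```julia
x = 3.1415926535897
bits = reinterpret(UInt64, x)                 # same 64 bits, viewed as an unsigned integer

s = Int(bits >> 63)                           # bit 63: sign
E = Int((bits >> 52) & 0x7ff)                 # bits 52..62: stored (biased) exponent
fraction = bits & 0x000fffffffffffff          # bits 0..51: fraction field

significand_value = 1 + fraction / 2.0^52     # implicit leading 1 plus the fraction
rebuilt = (-1.0)^s * significand_value * 2.0^(E - 1023)

println("s = ", s, ", E = ", E, ", significand = ", significand_value)
println("rebuilt == x ? ", rebuilt == x)
```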