# Demo: Data Types
In a computer program, all the data that we use, manipulate, or generate has a type, which is a member of a type system. A type system provides a formal way to classify and constrain the shapes of data and operations in a program, enabling the detection of mismatches (e.g., passing a string where a number is expected) before or during execution.

* _Modern languages_: Many modern languages, such as [Ruby](https://www.ruby-lang.org/en/) and [Python](https://www.python.org/), use dynamic type systems, where type checks occur during execution. This allows for greater flexibility at the expense of deferred error detection.
* _Classical languages_: In contrast, [classical languages such as the C programming language](https://gcc.gnu.org/) feature a static, compile-time type system: types must be explicitly declared, checks are enforced prior to execution, and the compiler uses type information to generate optimized code and prevent many categories of runtime errors.
* _Is Julia like C or Python_? It's like both! Julia is dynamically typed like [Python](https://www.python.org/) in that type checks happen at run time. However, [the Julia compiler](https://docs.julialang.org/en/v1/devdocs/compiler/) infers the concrete types of the values passed to a function and generates specialized, statically typed machine code for performance, i.e., sort of like [the C programming language](https://gcc.gnu.org/). Thus, although Julia code is compiled, its language semantics remain dynamically typed.
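To make the "like both" point concrete, here is a minimal sketch. The function name `double` is a hypothetical helper introduced only for this illustration. The same untyped definition accepts arguments of different types, and a variable can be rebound to a value of another type at run time.

```julia
# `double` is a hypothetical helper; note that no type annotations are required.
double(x) = 2x

a = double(21)     # called with an Int64 argument
b = double(1.5)    # called with a Float64 argument

println(typeof(a)) # the result type follows the argument type
println(typeof(b))

# Dynamic typing: the same variable can hold values of different types over time.
v = 1
v = "now a String"
println(typeof(v))
```

In the REPL, `@code_typed double(21)` and `@code_typed double(1.5)` can be used to inspect the separate specializations that the compiler generates for each argument type.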

## Primitive Data Types
A primitive data type is a basic, built‐in data type provided by a programming language whose values are represented atomically; that is, primitive types are not composed of other types.

In Julia, primitive types like `Int64`, `Float64`, `Bool`, and `Char` are fixed‐size, _unboxed machine types_ used directly by the compiler. In contrast, built‐in types such as `int`, `float`, and `bool` in Python are heap‐allocated [PyObject instances](https://docs.python.org/3/c-api/structures.html#c.PyObject).

Let's unpack this.
* _Unboxed machine types_? An unboxed type is one whose values are stored directly in memory (e.g., [in registers](https://en.wikipedia.org/wiki/Processor_register) or on the stack) without an extra level of indirection or heap allocation.
* _Stack and heap_? The stack is a region of memory used for storing function call frames, local variables, and control flow information in a last‐in, first‐out manner, with automatic allocation and deallocation as functions are entered and exited. The heap is a separate memory region for dynamically allocated objects whose lifetimes are managed manually or by a garbage collector, allowing arbitrary allocation and deallocation but incurring overhead for bookkeeping.
* _Stack faster than heap_? Accessing stack memory is generally faster because it involves simple pointer arithmetic and a contiguous, cache‐friendly layout without extra metadata lookup. Heap access requires managing allocation metadata and may involve pointer dereferencing and less predictable locality, which can introduce overhead.
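We can ask Julia directly whether a type is a plain, fixed-size bits type and how many bytes it occupies. The short sketch below uses the built-in `isbitstype(...)` and `sizeof(...)` functions; `String` is included only as a contrasting example of a type that is not stored as plain bits.

```julia
# Report whether each primitive type is a plain bits type, and its size in bytes.
for T in (Int64, Float64, Bool, Char)
    println(T, ": isbits = ", isbitstype(T), ", size = ", sizeof(T), " bytes")
end

# A String is variable-length and heap-allocated, so it is not a bits type.
println("String: isbits = ", isbitstype(String))
```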

Ultimately, all primitive data types can be represented as an ordered collection, i.e., an array of binary $\{0,1\}$ values.

### Integer and Boolean Types
An integer type represents whole numbers $x\in\mathbb{Z}$ (positive, negative, and zero). Integers are implemented using a _fixed‐width_ binary form (e.g., 32‐ or 64‐bit), so only a finite range of values can be represented. A Boolean (`Bool`) type represents truth values (`true` and `false`); although a single bit would suffice, Julia stores a `Bool` value in 1$\times$byte, i.e., as an 8-bit integer value.
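A consequence of the fixed-width representation is that integer arithmetic wraps around at the ends of the range. The following sketch (the variable names are illustrative only) shows the smallest and largest `Int64` values, the wrap-around on overflow, and the two's complement bit pattern of a small negative number.

```julia
lo = typemin(Int64)  # smallest representable Int64
hi = typemax(Int64)  # largest representable Int64

# Overflow is not an error for fixed-width integers: the value wraps around.
wrapped = hi + 1

# Negative integers use two's complement; an 8-bit example keeps the string short.
minus_two = bitstring(Int8(-2))

println((lo, hi))
println(wrapped == lo)
println(minus_two)
```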

__Integers__: Let's look at some integers.

In [4]:
x = 2; # select a whole number ... -2, -1, 0, 1, 2, ...

_What type is `x` in Julia?_ We can find the type of something using [the `typeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Core.typeof):

In [6]:
typeof(x) # this returns the type of the argument

Int64

What is the bitstring of $x$? The bitstring gives the literal bit representation of a primitive type. We can get the bitstring of $x$ using [the `bitstring(...)` method in Julia](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring). This method shows us the bit pattern of the integer argument:

In [8]:
bitstring(x) # gives the bit values stored in the memory "boxes"

"0000000000000000000000000000000000000000000000000000000000000010"

__Boolean__: Now let's look at some Boolean types $\mathbb{B} = \left\{\text{true},\text{false}\right\}$. A variable $x\in\mathbb{B}$ can take on either `true` or `false` values. However, we typically model these values as a special type of 8$\times$bit (1$\times$byte) integer value.
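The "Boolean as a small integer" view can be checked directly. This is a minimal sketch with illustrative variable names: a `Bool` occupies one byte, can be reinterpreted as a `UInt8`, and takes part in integer arithmetic.

```julia
size_in_bytes = sizeof(Bool)          # storage size of a Bool, in bytes
as_byte = reinterpret(UInt8, true)    # view the same 8 bits as an unsigned integer
as_int = Int(false)                   # convert a Bool to an Int

# Because true behaves like 1 and false like 0, summing counts the true values.
count_true = sum([true, false, true, true])

println((size_in_bytes, as_byte, as_int, count_true))
```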

In [10]:
flag = true; # the flag variable can take on values of {true | false}

_What type is `flag` in Julia?_ We can find the type of something using [the `typeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Core.typeof):

In [12]:
typeof(flag)

Bool

_What is the bitstring of the `flag` variable?_ Let's use [the `bitstring(...)` method in Julia](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring) to see what our Boolean `flag` variable looks like:

In [14]:
bitstring(flag) # this should be 8xbits wide

"00000001"

### Floating point types
Floating‐point types represent real numbers using a sign bit, an exponent field, and a significand according to [the IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_754). They allow the representation of fractional values and very large or small magnitudes. 
* Julia provides three standard IEEE-754 floating-point types:  `Float16`, `Float32`, and `Float64`. The `Float16` type corresponds to half‐precision (16-bit: 1 sign – 5 exponent – 10 significand); `Float32` represents single‐precision (32-bit: 1 – 8 – 23), and `Float64` represents double‐precision (64-bit: 1 – 11 – 52), respectively. These types trade off range and precision for storage and performance.
* In Python, the built-in floating‐point type is `float`, which is implemented as a 64-bit double-precision number (IEEE 754 double‐precision).

#### Memory layout of floating-point numbers
A 64-bit float $x\in\mathbb{R}$ is represented in memory as:
$$
\begin{align*}
x = (-1)^{s}\times\,1.\underbrace{b_{51}b_{50}\dots{b_{0}}}_{52\,\text{fraction bits}}\times{2^{E-1023}}
\end{align*}
$$
where $b_{i}$ denotes bit $i$ of the 64-bit pattern (bit `0` is the least significant), $s = b_{63}\in\{0,1\}$ denotes the sign-bit (bit `63`), $1.b_{51}b_{50}\dots{b_{0}}$ forms the significand, and $E$ is the unsigned 11$\times$bit stored exponent (value of bits $52\rightarrow{62}$) given by: $E = \sum_{i=0}^{10}b_{52+i}2^{i}$.


On the other hand, a 32-bit number $x\in\mathbb{R}$ is encoded as:
$$
\begin{align*}
x = (-1)^{s}\times\,1.\underbrace{b_{22}b_{21}\dots{b_{0}}}_{23\,\text{fraction bits}}\times{2^{E-127}}
\end{align*}
$$
where $s = b_{31}\in\{0,1\}$ denotes the sign-bit (bit `31`) and $E$ is the unsigned 8$\times$bit stored exponent (value of bits $23\rightarrow{30}$) given by $E = \sum_{i=0}^{7}b_{23+i}2^{i}$.
Notice the difference between the 64- and 32-bit numbers: the number of bits used for the fraction and for the exponent differs, and the location of the sign bit has changed, but otherwise they have the same structural layout in memory.
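To check the 64-bit formula, we can pull the three fields out of the bit pattern and rebuild the number by hand. The function `decode_float64` below is a hypothetical helper written for this demo. It is a sketch that assumes a _normal_ number, so zero, subnormal values, `Inf`, and `NaN` are not handled.

```julia
# Hypothetical helper: rebuild a normal Float64 from its sign, exponent, and fraction fields.
function decode_float64(x::Float64)
    u = reinterpret(UInt64, x)           # the raw 64-bit pattern as an unsigned integer
    s = u >> 63                          # sign bit (bit 63)
    E = Int((u >> 52) & 0x7ff)           # stored exponent (bits 52 to 62)
    f = u & 0x000fffffffffffff           # 52 fraction bits (bits 0 to 51)
    significand = 1 + f / 2^52           # implicit leading 1 plus the fraction
    sign = s == 0 ? 1.0 : -1.0           # (-1)^s
    return sign * ldexp(significand, E - 1023)  # significand times 2^(E - 1023)
end

println(decode_float64(54.13))
println(decode_float64(-0.15625))
```

The same idea carries over to `Float32` by using a `UInt32` pattern, 8 exponent bits, 23 fraction bits, and a bias of 127.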

Let's look at a couple of examples of floating-point numbers, starting with a 64-bit number:

In [16]:
let
    x = 54.13; # in Julia, a floating-point literal is 64-bit (Float64) by default
    bitstring(x)
end

"0100000001001011000100001010001111010111000010100011110101110001"

The same numerical value for $x\in\mathbb{R}$ (just stored in 32 bits) has a different memory layout:

In [18]:
let
    x = 54.13 |> Float32 # we "cast" x so that it is stored as a Float32 (32 bits), not 64 bits!
    bitstring(x) # gives us a string with the bit pattern
end

"01000010010110001000010100011111"

### Character Types
Text on the computer is composed of characters, and characters are just special integers! 
* _What is a character_? A character data type represents a single textual symbol, such as a letter, digit, punctuation mark, or control code. It is usually stored as an integer(!) corresponding to a specific encoding, e.g., traditionally [ASCII](https://en.wikipedia.org/wiki/ASCII) or [Unicode](https://en.wikipedia.org/wiki/Unicode) on modern systems. 
* _What is a character encoding_? A character encoding defines a mapping between textual symbols (letters, digits, punctuation, etc.) and numeric code points (unique integers) so that text can be stored and transmitted as bytes. Examples include [older systems such as ASCII](https://en.wikipedia.org/wiki/ASCII), a 7-bit code whose characters are typically stored in one byte (8 bits) each, and [Unicode encodings like UTF-8 or UTF-16](https://en.wikipedia.org/wiki/Unicode), which represent a much wider range of characters.
* _What about Julia_? In most languages, a character type occupies a fixed number of bytes and can be manipulated individually or as part of Strings (ordered collections of characters). This is also true in Julia. In Julia, [a Char type](https://docs.julialang.org/en/v1/base/strings/#Core.Char) is a 4-byte (32-bit) value that represents a single Unicode character; converting it to an integer, e.g., `UInt32(c)`, gives its Unicode code point.
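The "characters are special integers" idea can be seen by converting back and forth between `Char` and integer values. This is a small sketch with illustrative variable names, using the built-in `Int(...)`, `Char(...)`, `codepoint(...)`, and `ncodeunits(...)` functions.

```julia
letter = 'A'
letter_code = Int(letter)        # the code point of 'A' as an integer
back = Char(letter_code)         # convert the integer back to a character
next_letter = letter + 1         # integer arithmetic on a character

sushi_code = codepoint('🍣')     # the code point as a UInt32
utf8_bytes = ncodeunits("🍣")    # number of bytes this character needs in a UTF-8 String

println((letter_code, back, next_letter, sushi_code, utf8_bytes))
```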

Let's explore [the `Char` type in Julia](https://docs.julialang.org/en/v1/base/strings/#Core.Char) (notice the single quotes):

In [49]:
c = '🍣' # example Unicode character in Julia. See: https://docs.julialang.org/en/v1/manual/unicode-input/

'🍣': Unicode U+1F363 (category So: Symbol, other)

What is the code point (special, unique integer) for the character `c`?

In [62]:
code = UInt32(c) # extract code point as UInt32 (4 x bytes)

0x0001f363

_Hmmm, what?_ That's a strange-looking integer! The `code::UInt32` value is displayed as a hexadecimal number, i.e., a number written in base 16 (Julia prints unsigned integers this way). The giveaway (which is a convention) is the `0x` prefix. We'll dig into these numbers and count in base $b$ later.
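As a quick preview, the hexadecimal value is just another way of writing an ordinary decimal integer. The sketch below (variable names are illustrative) converts the code point to base 10, expands the base-16 digits by hand, and converts back.

```julia
code = 0x0001f363                        # the UInt32 code point from above

decimal = Int(code)                      # the same value written in base 10

# Expand the hex digits 1, f (= 15), 3, 6, 3 as powers of 16.
by_hand = 1 * 16^4 + 15 * 16^3 + 3 * 16^2 + 6 * 16^1 + 3 * 16^0

hex_digits = string(decimal, base = 16)  # back to a base-16 string (no 0x prefix)
parsed = parse(Int, "1f363", base = 16)  # parse a base-16 string into an integer

println((decimal, by_hand, hex_digits, parsed))
```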

Can we see the data that each byte contains? Yes! Let's use [the `reinterpret(...)` method](https://docs.julialang.org/en/v1/base/arrays/#Base.reinterpret) and break the 4 bytes into four 1-byte blocks!

In [90]:
reinterpret(Tuple{UInt8, UInt8, UInt8, UInt8}, code) |> collect

4-element Vector{UInt8}:
 0x63
 0xf3
 0x01
 0x00

This factors the 32-bit (4 x byte) number into four 8-bit (1 x byte) numbers. Notice that we are chopping from right to left (same convention as a bitstring representation of integers and floating point numbers above). Why not the other way? This has to do [with Endianness](https://en.wikipedia.org/wiki/Endianness).
* [Endianness](https://en.wikipedia.org/wiki/Endianness) describes the order in which a computer stores the bytes of a multi‐byte value: in little‐endian, the least significant byte (far right) comes first in memory, whereas in big‐endian, the most significant byte is stored first. 
* On a little-endian machine (like x86/x86-64 and most ARM systems), the least significant byte is stored at the lowest memory address, so when we reinterpret `0x0001F363` as four `UInt8` values, we get: `[0x63,0xF3,0x01,0x00]`.
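We can check the byte order of the machine we are running on. This sketch uses the built-in `Base.ENDIAN_BOM` constant together with `bswap(...)` and `hton(...)`; the variable names are illustrative. The first result depends on the host machine, while the byte-swapped and network-order (big-endian) results do not.

```julia
code = 0x0001f363

# Bytes as they sit in memory on this machine (lowest address first).
bytes_in_memory = collect(reinterpret(UInt8, [code]))

# ENDIAN_BOM equals 0x04030201 on little-endian hosts and 0x01020304 on big-endian hosts.
is_little_endian = Base.ENDIAN_BOM == 0x04030201

swapped = bswap(code)                                    # reverse the byte order of the value
big_endian_bytes = collect(reinterpret(UInt8, [hton(code)]))  # most significant byte first

println(bytes_in_memory)
println(is_little_endian)
println(big_endian_bytes)
```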

Thus, a character can be viewed as our first example of [a collection type](https://en.wikipedia.org/wiki/Collection_(abstract_data_type)): an ordered collection of smaller components, in this case a stack of 1 x byte (8-bit) blocks! Let's dig a little deeper into collection types.

## Collection Types
A collection type is a composite data structure aggregating multiple values, often of the same or related types, into a single container (e.g., tuples, arrays, sets, and dictionaries). It is not a primitive type but a collection of primitive types. Let's look at some simple examples of collections, starting with one that we have already seen (sort of), namely [Tuples](https://docs.julialang.org/en/v1/manual/functions/#Tuples).

### Tuples
A tuple is an immutable, ordered collection of elements that can hold a fixed number of items, potentially of different types. Once created, their size and contents cannot be changed, which makes tuples useful for grouping related values without the overhead of a mutable container.
* In Julia, every tuple is an immutable, composite object with a type that encodes its length and element types (e.g., `Tuple{Int64, Float64}`). A tuple's memory layout is a contiguous block of its fields: if all elements are “isbits” (primitives), the tuple itself is isbits and can be unboxed (often in registers or on the stack). However, if any element is a non‐isbits type (like a String), the tuple’s fields are [pointers to heap‐allocated objects](https://en.wikipedia.org/wiki/Pointer_(computer_programming)), each aligned and stored sequentially.

Let's unpack some of this. First, [a Tuple type](https://docs.julialang.org/en/v1/base/base/#Core.Tuple) is _immutable_, i.e., it can't be changed once it's constructed.

In [304]:
example_tuple = let
    tuple = (18,36.6); # populate with data
end;

What is the type of the `example_tuple` variable? Let's use [the `typeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Core.typeof) to find out.

In [306]:
typeof(example_tuple)

Tuple{Int64, Float64}
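Because the tuple type encodes the element types, Julia knows its memory layout. The sketch below (with illustrative variable names) checks the "isbits" property and the size of a tuple of primitives, and contrasts it with a tuple that contains a `String`.

```julia
t = (Int64(18), 36.6)            # same values as example_tuple
T = typeof(t)

plain_bits = isbitstype(T)       # all fields are primitives, so the tuple is isbits
n_bytes = sizeof(T)              # 8 bytes for the Int64 plus 8 bytes for the Float64

mixed = (18, "text")             # a String field is a reference to a heap object
mixed_bits = isbitstype(typeof(mixed))

println((T, plain_bits, n_bytes, mixed_bits))
```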

Let's try to change a value in the `example_tuple::Tuple{Int64, Float64}` variable. This should blow up, because [Tuples in Julia](https://docs.julialang.org/en/v1/base/base/#Core.Tuple) are immutable.

In [308]:
example_tuple[1] = 6 # this will blow up!

LoadError: MethodError: no method matching setindex!(::Tuple{Int64, Float64}, ::Int64, ::Int64)
The function `setindex!` exists, but no method is defined for this combination of argument types.
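Since a tuple cannot be modified in place, the usual approach is to build a new tuple that carries the changed value. A minimal sketch, with illustrative variable names, that also confirms the failed assignment raises a `MethodError`:

```julia
t = (18, 36.6)

# Attempting in-place modification throws a MethodError, which we catch here.
threw_method_error = try
    t[1] = 6
    false
catch err
    err isa MethodError
end

# Build a new tuple instead: replace the first element and keep the rest.
updated = ntuple(i -> i == 1 ? 6 : t[i], length(t))

println(threw_method_error)
println(updated)
println(t)   # the original tuple is unchanged
```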

In [309]:
bitstring(example_tuple) # We can't get the bitstring directly, a Tuple is not primitive!

LoadError: ArgumentError: Tuple{Int64, Float64} not a primitive type

However, we can get the elements of `example_tuple` and their bit layouts by [indexing into the Tuple](https://docs.julialang.org/en/v1/base/base/#Core.Tuple):

In [311]:
bitstring(example_tuple[2]) # get the bitstring of the second component (the Float64)

"0100000001000010010011001100110011001100110011001100110011001101"

We can see the raw bytes associated with the `example_tuple::Tuple{Int64,Float64}` using [the `reinterpret(...)` method](https://docs.julialang.org/en/v1/base/arrays/#Base.reinterpret):

In [331]:
v = reinterpret(NTuple{16,UInt8}, example_tuple) |> collect # we have 16 8-bit blocks (128 bits in total)

16-element Vector{UInt8}:
 0x12
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0xcd
 0xcc
 0xcc
 0xcc
 0xcc
 0x4c
 0x42
 0x40
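The raw bytes carry all of the tuple's information: the first 8 bytes hold the `Int64` and the last 8 bytes hold the `Float64`. As a sketch (variable names are illustrative, and the byte view here is taken through a one-element array rather than the `NTuple` form used above), we can reassemble both fields from the bytes.

```julia
t = (Int64(18), 36.6)

# View the tuple's 16 bytes in this machine's native byte order.
bytes = collect(reinterpret(UInt8, [t]))

# Reassemble each field from its 8-byte slice.
first_field = reinterpret(Int64, bytes[1:8])[1]
second_field = reinterpret(Float64, bytes[9:16])[1]

println(length(bytes))
println((first_field, second_field))
```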

In [329]:
bitstring(v[1])

"00010010"

### Arrays
An array is a contiguous, fixed-size _ordered_ collection of elements of the same type, allowing constant-time access to each element via integer indices. In most languages, arrays occupy a single block of memory, with the index of the first element typically at 0 or 1, and accessing element $i$ corresponds to computing the base address plus the (zero-based) offset of element $i$ times the element size.
* [Julia’s `Array{T}` type](https://docs.julialang.org/en/v1/base/arrays/#Core.Array-Tuple%7BNothing,%20Any%7D) is a built-in, statically typed, homogeneous container that is 1-indexed and stored [in column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order), allowing the compiler to generate highly optimized, unboxed loops.
* Python’s native lists are heterogeneous and zero-indexed, while [NumPy’s homogeneous arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html) are zero-indexed and row-major but live in a separate C library rather than the core language.

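In Julia and Python, arrays are mutable, i.e., elements can be changed after we populate the array. Containers locate their data through memory addresses; the cell below wraps `example_tuple` in a `Ref` and obtains a raw pointer to it, i.e., the address at which the boxed value is stored.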

In [316]:
let
    p = pointer_from_objref(Ref(example_tuple)) |> p-> Ptr{Tuple{Int,Float64}}(p) 
end

Ptr{Tuple{Int64, Float64}} @0x000000010a2772b0
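The array properties described above (homogeneous element type, 1-based indexing, column-major storage, and mutability) can be seen in a small example. This is a sketch with illustrative variable names.

```julia
A = [1 2 3; 4 5 6]            # a 2×3 Matrix{Int64}

element_type = eltype(A)      # every element has the same type
first_elem = A[1, 1]          # indices start at 1
second_in_memory = A[2]       # linear index 2 is row 2, column 1 (column-major)
memory_order = copy(vec(A))   # the elements in the order they are stored

# Arrays are mutable: elements can be overwritten and vectors can grow.
v = [1, 2, 3]
v[1] = 10
push!(v, 4)

println((element_type, first_elem, second_in_memory))
println(memory_order)
println(v)
```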

### Sets and Dictionaries
A Set is an unordered collection of unique elements that supports fast membership checks, insertions, and removals. A Dictionary (or map) is an associative container that stores key–value pairs, allowing lookup, insertion, and deletion of values based on their unique keys. Together, they provide efficient ways to manage collections of distinct items and to retrieve data by identifiers.
* Julia’s `Set{T}` and `Dict{K,V}` are parametric, homogeneous containers, meaning every element in a `Set` has the same type `T`, and every key–value pair in a `Dict` has types `K` and `V`, allowing the compiler to generate specialized, unboxed code paths.
* In Python, built‐in `set` and `dict` are inherently heterogeneous (each slot holds a generic `object` reference), which introduces an extra level of indirection and dynamic dispatch at runtime. Finally, Python's `dict` preserves insertion order (guaranteed since Python 3.7), whereas Julia's `Dict` does not guarantee any particular iteration order. In Julia, the type parameters `K` and `V` are inferred from the initial entries or can be specified explicitly; concrete type parameters give the best performance.
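Here is a short sketch of both containers in Julia, using illustrative variable names. Note how the type parameters are inferred from the initial contents, and how `Dict{Any,Any}` can be requested explicitly when heterogeneous entries are needed.

```julia
s = Set([3, 1, 3, 2, 1])               # duplicates are dropped
has_two = 2 in s                        # fast membership check

d = Dict("alpha" => 1, "beta" => 2)     # K and V are inferred from the pairs
d["gamma"] = 3                          # insert a new key-value pair
has_gamma = haskey(d, "gamma")

# Heterogeneous keys and values are possible with abstract type parameters.
mixed = Dict{Any,Any}(1 => "one", "two" => 2.0)

println((typeof(s), length(s), has_two))
println((typeof(d), d["beta"], has_gamma))
println(typeof(mixed))
```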

## Custom composite types
A custom composite type is a _user‐defined data structure_ that aggregates multiple fields (possibly of different types) under a single name, enabling encapsulation of related data. In Julia or C, these are declared [using the struct keyword](https://docs.julialang.org/en/v1/manual/types/#Composite-Types) in combination with a list of named fields, while in Python an object is an instance of a class that defines its attributes and methods.
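As a minimal sketch, the two types below, `Point2D` and `Counter`, are hypothetical names invented for this illustration. An immutable `struct` with primitive fields is itself a plain bits type, while a `mutable struct` allows its fields to be updated after construction.

```julia
# Hypothetical immutable composite type with two Float64 fields.
struct Point2D
    x::Float64
    y::Float64
end

# Hypothetical mutable composite type: its field can be changed after construction.
mutable struct Counter
    count::Int
end

p = Point2D(1.0, 2.0)
c = Counter(0)
c.count += 1                      # allowed because Counter is mutable

println((p.x, p.y))
println(c.count)
println((isbitstype(Point2D), sizeof(Point2D), isbitstype(Counter)))
```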