# L1a: Introduction to Data Types
In this lecture, we are going to talk about types. All data in a computer program has a **type**, which is part of a type system that classifies data and constrains operations. Besides standardizing how values are stored and interpreted, a type system enables detection of mismatches (e.g., passing a string where a number is expected) before or during execution.

**Type System Approaches:**
* **Dynamically typed languages** (e.g., [Ruby](https://www.ruby-lang.org/en/) or [Python](https://www.python.org/)) attach types to values and perform checks at runtime, offering flexibility but deferring error detection.
* **Statically typed languages** (e.g., [C](https://gcc.gnu.org/) or [Java](https://www.java.com/en/)) attach types to variables and expressions, typically through explicit declarations, and perform checks at compile time, enabling optimization and early error detection.
* **Julia's hybrid approach** combines ideas from both: it is dynamically typed like Python, but its just-in-time compiler uses type inference to generate specialized machine code, much as a C compiler would. In other words, types are checked at runtime at the language level, while the compiled code is specialized for the concrete argument types it sees.
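
As a minimal sketch of the dynamic side of this picture (the function name `double` is hypothetical, chosen only for illustration), the same untyped definition accepts several argument types, and a mismatch only surfaces as an error when the call runs:

```julia
# `double` is a hypothetical example function; no types are declared.
double(x) = 2 * x

a = double(3)       # called with an integer
b = double(1.5)     # called with a Float64

println(typeof(a))  # Int64 on a 64-bit machine
println(typeof(b))  # Float64

# A variable may be rebound to a value of a different type (dynamic typing).
v = 1
v = "one"

# Type mismatches are detected at runtime, when the call is attempted.
mismatch_detected = try
    double(:three)  # multiplying an integer by a Symbol is not defined
    false
catch err
    err isa MethodError
end
println(mismatch_detected)  # true
```

Behind the scenes, Julia compiles a separate specialization of `double` for each concrete argument type it encounters.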

**What we'll cover:** We'll explore primitive types (numbers, text), collection types (arrays, tuples, sets and dictionaries), and custom composite types, examining their memory representation and manipulation in Julia.

## Course Logistics
Before we get to the type system, let’s review the course policy, procedures, and expectations. Links are posted on Canvas: the course policy and procedures are available [here](https://cornell.box.com/s/y1mxxj9fvbdvuz9gpjcdzkqkfks1dxlh), and the course schedule is available [here](https://cornell.box.com/s/wk8pjbdnjodtppfuksuy3zlxsb8k7rpn). Let's take a look at these resources.
___

## Primitive Data Types
Primitive data types are the basic building blocks provided by programming languages. They're atomic, meaning they're not composed of other types, and represent simple values like numbers, characters, and true/false values.

Let's start with [Integers](https://docs.julialang.org/en/v1/base/numbers/#Core.Int) and [the `Bool` type](https://docs.julialang.org/en/v1/base/numbers/#Core.Bool), which represents a boolean value, either `true` or `false`.

### Integer and Boolean Types
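An __integer (Int)__ type represents whole numbers $x\in\mathbb{Z}$ (positive, negative and zero). Integers are implemented using a _fixed‑width_ binary form (e.g., 32‑ or 64‑bit), so each integer type can only represent a finite range of $\mathbb{Z}$; for example, `Int32` covers $-2^{31}$ to $2^{31}-1$.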

A __boolean (Bool)__ type represents truth values (`true` and `false`); in Julia, a `Bool` conceptually represents a single truth bit, but in memory `sizeof(Bool) == 1` on typical implementations (it's stored as a full byte).

__Integers__: Let's look at some integers.

In [117]:
x = 2 |> Int32; # select a whole number ... -2, -1, 0, 1, 2, ...

_What type is `x` in Julia?_ We can find the type of something using [the `typeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Core.typeof):

In [119]:
typeof(x) # this returns the type of the argument

Int32
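
Because `x` is stored in a fixed 32 bits, only a finite range of whole numbers fits. The short sketch below uses the standard `typemin` and `typemax` functions to show that range and what happens when arithmetic steps past it; the variable names are illustrative only.

```julia
lo = typemin(Int32)
hi = typemax(Int32)
println(lo, " to ", hi)   # -2147483648 to 2147483647

wrapped = hi + Int32(1)   # Int32 + Int32 stays Int32 and wraps around
println(wrapped == lo)    # true

widened = Int64(hi) + 1   # a wider type has room for the result
println(widened)          # 2147483648
```

Julia does not raise an error on integer overflow by default, so choosing a wide enough type is the programmer's responsibility.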

What is the bitstring of $x$? The bitstring gives the literal bit representation of a primitive type. We can get the bitstring of $x$ using [the `bitstring(...)` method in Julia](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring). This method shows us the bit pattern of the integer argument:

In [121]:
bitstring(x) # shows the bit pattern stored in memory

"00000000000000000000000000000010"

__Boolean__: Now let's look at some Boolean types $\mathbb{B} = \left\{\text{true},\text{false}\right\}$. A variable $x\in\mathbb{B}$ can take on either `true` or `false` values. Bool values conceptually represent a single truth bit; in memory they are stored as an 8‑bit (1‑byte) value.

In [129]:
flag = false; # the flag variable can take on values of {true | false}

_What type is `flag` in Julia?_ We can find the type of something using [the `typeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Core.typeof):

In [131]:
typeof(flag)

Bool

_What is the bitstring of the `flag` variable?_ Let's use [the `bitstring(...)` method in Julia](https://docs.julialang.org/en/v1/base/numbers/#Base.bitstring) to see what our Boolean `flag` variable looks like:

In [133]:
bitstring(flag) # this should be 8 bits wide

"00000000"

Let's use [the `sizeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Base.sizeof-Tuple%7BType%7D) to see how many bytes (1 byte = 8 bits) a Boolean value occupies.

In [16]:
sizeof(flag) # number of bytes used to store the Bool

1
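
A few more quick checks, offered as a sketch, show how a `Bool` relates to the integer types: it is stored in one byte whose lowest bit carries the truth value, and it participates in integer arithmetic.

```julia
println(bitstring(true))   # 00000001
println(sizeof(Bool))      # 1
println(Bool <: Integer)   # true: Bool is a subtype of Integer
println(Int(true))         # 1
println(true + true)       # 2
```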

___

### Floating point types
Floating-point types model real numbers using three components according to [the IEEE 754 standard](https://en.wikipedia.org/wiki/IEEE_754): a sign bit, an exponent (the scale), and a significand (the fraction). This allows representation of both fractional values and very large or small magnitudes.

> __Julia versus Python floating point numbers__: Julia provides three standard IEEE-754 floating-point types that trade off precision for storage: `Float16` (half-precision), `Float32` (single-precision), and `Float64` (double-precision). Python's built-in `float` type is always 64-bit double-precision.

Let's look at a couple of examples. First, here's a 64-bit number (Julia's default):

In [19]:
let
    x = 54.13; # default: in Julia, the default floating point number is 64-bit.
    bitstring(x)
end

"0100000001001011000100001010001111010111000010100011110101110001"

The same numerical value stored in 32-bits has a different memory layout:

In [21]:
let
    x = 54.13 |> Float32 # cast to Float32 (single precision), not Float64
    bitstring(x) # gives a string with the bit pattern
end

"01000010010110001000010100011111"

Let's use [the `sizeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Base.sizeof-Tuple%7BType%7D) to see how many bytes (1 byte = 8 bits) a floating point value occupies, this time for a half-precision `Float16`.

In [23]:
let
    x = 54.13 |> Float16 # cast to Float16 (half precision), not Float64
    sizeof(x) # returns number of bytes used to store x
end

2
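
To connect the bit patterns above to the sign, exponent, and significand components, here is a minimal sketch that slices the 64-bit pattern of `54.13` and rebuilds the value. It assumes a normal (non-subnormal, finite) number and the IEEE 754 double-precision layout of 1 sign bit, 11 exponent bits with a bias of 1023, and 52 fraction bits; the variable names are illustrative.

```julia
x = 54.13
bits = bitstring(x)                  # 64 characters, most significant bit first

sign_bit      = bits[1]              # 1 bit
exponent_bits = bits[2:12]           # 11 bits, stored with a bias of 1023
fraction_bits = bits[13:64]          # 52 bits

s = sign_bit == '1' ? -1.0 : 1.0
e = parse(Int64, exponent_bits; base = 2) - 1023
f = parse(Int64, fraction_bits; base = 2) / Int64(2)^52

rebuilt = s * (1 + f) * 2.0^e        # value = sign * (1 + fraction) * 2^exponent
println(e)                           # 5
println(rebuilt == x)                # true

# Narrowing to Float32 keeps fewer fraction bits, so the stored value changes slightly
println(Float64(Float32(x)) == x)    # false
```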

___

### Character Types
Text on computers is composed of characters, and here's the key insight: characters are just special integers! Each character, whether a letter, digit, punctuation mark, or control code, gets stored as an integer corresponding to a specific encoding scheme. Traditional systems used [ASCII](https://en.wikipedia.org/wiki/ASCII) with one byte per character, while modern systems use [Unicode encodings like UTF-8 or UTF-16](https://en.wikipedia.org/wiki/Unicode) to represent a much wider range of characters.

> __Encodings:__ Character encodings define the mapping between textual symbols and numeric code points (unique integers), enabling text to be stored and transmitted as bytes. Julia's [Char type](https://docs.julialang.org/en/v1/base/strings/#Core.Char) follows this pattern: each `Char` is a 4-byte (32-bit) value that represents a single Unicode code point (converting it with `UInt32(c)` or `codepoint(c)` gives the integer), and characters can be manipulated individually or as part of Strings.

Let's explore [the `Char` type in Julia](https://docs.julialang.org/en/v1/base/strings/#Core.Char) (notice the single quotes):

In [26]:
c = '🍣' # example Unicode character in Julia. See: https://docs.julialang.org/en/v1/manual/unicode-input/

'🍣': Unicode U+1F363 (category So: Symbol, other)

What is the code point (special, unique integer) for the character `c`?

In [28]:
code = UInt32(c) # extract code point as UInt32 (4 x bytes)

0x0001f363

_Hmmm, what?_ That's a strange-looking integer! The `code::UInt32` is a [hexadecimal number](https://en.wikipedia.org/wiki/Hexadecimal), i.e., a number written in base 16. The giveaway (which is a convention) is the `0x` prefix. We'll dig into these numbers and examine representations in different bases later.
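
As a quick sanity check that the hexadecimal form is just another way of writing an ordinary integer, the sketch below converts between the base-10 and base-16 views of the same code point using standard Base functions:

```julia
code = UInt32('🍣')

println(code == 127843)                              # true: same integer in base 10
println(string(code, base = 16))                     # 1f363
println(parse(UInt32, "1f363"; base = 16) == code)   # true
println(codepoint('🍣') == code)                     # true
println(Char(0x1f363))                               # 🍣
```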

Can we see the data that each byte contains? Yes! Let's use [the `reinterpret(...)` method](https://docs.julialang.org/en/v1/base/arrays/#Base.reinterpret) and break the 4 bytes into four 1-byte blocks!

In [30]:
reinterpret(Tuple{UInt8, UInt8, UInt8, UInt8}, code) |> collect

4-element Vector{UInt8}:
 0x63
 0xf3
 0x01
 0x00

This splits the 32-bit value into four 8-bit (1-byte) values. Notice that on a little‑endian host the bytes are listed from least significant to most significant, which is the reverse of how `bitstring` prints a value (most significant bit first).
> __Endianness:__ This ordering relates to [Endianness](https://en.wikipedia.org/wiki/Endianness), which describes how computers store the bytes of multi-byte values. In little-endian systems (like most x86/x86-64 and ARM machines), the least significant byte comes first in memory, while big-endian systems store the most significant byte first. So when we reinterpret 0x0001F363 as four UInt8s on a little-endian machine, we get: `[0x63,0xF3,0x01,0x00]`  
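
To make the byte ordering concrete, here is a small sketch that rebuilds the 32-bit value from the little-endian byte list using shifts, which gives the same answer on any host. `bswap` and `Base.ENDIAN_BOM` are standard Base names; the other variable names are illustrative.

```julia
code  = 0x0001f363                   # UInt32 literal
bytes = [0x63, 0xf3, 0x01, 0x00]     # least significant byte first

# Byte i contributes bytes[i] * 256^(i - 1) to the value
rebuilt = sum(UInt32(b) << (8 * (i - 1)) for (i, b) in enumerate(bytes))
println(rebuilt == code)             # true

# bswap reverses the byte order of a value
println(bswap(code) == 0x63f30100)   # true

# Reports the host byte order; little-endian hosts hold 0x04030201
println(Base.ENDIAN_BOM == 0x04030201)
```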

Although `Char` is a primitive type, viewing its 4 bytes as an ordered sequence of 1-byte (8-bit) blocks previews the idea behind collection types: an ordered collection of smaller components. Let's dig deeper into collection types.

___

## Collection Types
A collection type is a composite data structure aggregating multiple values, often of the same or related types, into a single container (e.g., tuples, arrays, sets, and dictionaries). It is not a primitive type but a container built from other values, which may themselves be primitive or composite. Let's look at a few examples of collections, starting with one that we have already seen (sort of), namely [Tuples](https://docs.julialang.org/en/v1/manual/functions/#Tuples).

### Tuples
A tuple is an immutable, ordered collection of elements that can hold a fixed number of items, potentially of different types. Once created, their size and contents cannot be changed, making tuples useful for grouping related values without the overhead of a mutable container.

> **Julia tuple memory layout:** Every tuple in Julia is an immutable composite object with a type that encodes its length and element types (e.g., `Tuple{Int64, Float64}`). The memory layout is a contiguous block of fields: if all elements are "isbits" (e.g., primitives), the tuple itself is isbits and can be stored unboxed (often in registers or on the stack). However, if any element is a non-isbits type (like a String), the corresponding field is stored as a [pointer to a heap-allocated object](https://en.wikipedia.org/wiki/Pointer_(computer_programming)), with the fields aligned and stored sequentially.
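
The `isbits` distinction described above can be checked directly. The following is a small sketch using the standard `isbits`, `isbitstype`, and `sizeof` functions; the variable names `plain` and `mixed` are illustrative.

```julia
plain = (Int64(18), 36.6)    # Tuple{Int64, Float64}
mixed = (Int64(18), "text")  # Tuple{Int64, String}

println(isbits(plain))       # true: every field is stored inline
println(sizeof(plain))       # 16 bytes = 8 (Int64) + 8 (Float64)
println(isbits(mixed))       # false: the String field is a reference
println(isbitstype(Tuple{Int64, Float64}))  # true
```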

Let's explore tuples with a concrete example. Since [Tuple types](https://docs.julialang.org/en/v1/base/base/#Core.Tuple) are immutable, they can't be changed once constructed:

In [34]:
example_tuple = let
    tuple = (18,36.6); # populate with data. Notice not the same type for each element
end;

What is the type of the `example_tuple` variable? Let's use [the `typeof(...)` method](https://docs.julialang.org/en/v1/base/base/#Core.typeof) to find out.

In [36]:
typeof(example_tuple)

Tuple{Int64, Float64}

Let's try to change a value in the `example_tuple::Tuple{Int64, Float64}` variable. This should fail with an error, because [Tuples in Julia](https://docs.julialang.org/en/v1/base/base/#Core.Tuple) are immutable.

> **Try-catch blocks:** The `try-catch` construct allows us to handle errors gracefully instead of crashing the program. 
> Code in the `try` block is executed, and if an error occurs, execution jumps to the `catch` block where we can handle the error (like printing a message) rather than terminating the program.  The program continues executing normally after the `catch` block.

So what happens?

In [38]:
try
    example_tuple[1] = 6 # this will raise an error because tuples are immutable
catch e
    println("expected error: ", e)
end
println("After the try-catch block, the program continues executing normally.")

expected error: MethodError(setindex!, ((18, 36.6), 6, 1), 0x0000000000006862)
After the try-catch block, the program continues executing normally.
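
Since a tuple cannot be modified in place, the usual workaround is to build a new tuple from the old one. The sketch below shows that pattern, and also captures the exception object so that its type can be inspected; the variable names are illustrative.

```julia
t = (18, 36.6)

# Build a new tuple instead of modifying the existing one
updated = (6, t[2])
println(updated)              # (6, 36.6)
println(t)                    # (18, 36.6): the original is untouched

# Capture the exception and check its type
err = try
    t[1] = 6
    nothing
catch e
    e
end
println(err isa MethodError)  # true: no setindex! method exists for tuples
```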


What does the bitstring look like for the `example_tuple::Tuple{Int64, Float64}` variable?

In [40]:
try
    bitstring(example_tuple) # Can't get the bitstring directly; a Tuple is not a primitive type.
catch e
    println("expected error: ", e)
end

expected error: ArgumentError("Tuple{Int64, Float64} not a primitive type")


However, we can get the elements of `example_tuple` and their bit layouts by [indexing into the Tuple](https://docs.julialang.org/en/v1/base/base/#Core.Tuple). For example, let's look at the second element:

In [42]:
bitstring(example_tuple[2]) # get the bitstring of the second element (36.6, a Float64)

"0100000001000010010011001100110011001100110011001100110011001101"

We can see the raw bytes associated with the `example_tuple::Tuple{Int64,Float64}` using [the `reinterpret(...)` method](https://docs.julialang.org/en/v1/base/arrays/#Base.reinterpret). Note: this works because the tuple is composed of `isbits` elements and its total size (16 bytes) matches the size of the target type; the exact byte order and layout you see will reflect host endianness and alignment.

In [44]:
v = reinterpret(NTuple{16,UInt8}, example_tuple) |> collect # we have 16 8-bit blocks (128 bits total)

16-element Vector{UInt8}:
 0x12
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0x00
 0xcd
 0xcc
 0xcc
 0xcc
 0xcc
 0x4c
 0x42
 0x40

In [45]:
bitstring(v[2])

"00000000"

___

### Arrays
An array is a contiguous, ordered collection of elements of the same type, allowing constant-time access to its elements via integer indices. In most languages, arrays occupy a single block of memory, with the address of an element computed as the base address plus the element's zero-based offset times the memory size of each element.

> **Julia vs. Python arrays:** [Julia's `Array{T}` type](https://docs.julialang.org/en/v1/base/arrays/#Core.Array-Tuple%7BNothing,%20Any%7D) is a built-in container whose element type `T` is fixed when the array is created; it is 1-indexed and stored [in column-major order](https://en.wikipedia.org/wiki/Row-_and_column-major_order). Python's native lists are heterogeneous and zero-indexed, while [NumPy's homogeneous arrays](https://numpy.org/doc/stable/reference/generated/numpy.array.html) are zero-indexed and row-major by default (implemented in a separate C library rather than the core language).
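
Both the address arithmetic and the column-major layout can be observed from Julia itself. The sketch below is illustrative only: it uses `pointer` to measure the byte distance between elements (with 1-based indexing, element `i` sits at offset `(i - 1) * sizeof(T)`), and `vec` to show the order in which a matrix is laid out in memory.

```julia
a = zeros(Float64, 10)

# Byte distance between element 3 and element 1
offset = Int(pointer(a, 3) - pointer(a, 1))
println(offset == 2 * sizeof(Float64))   # true: 16 bytes

# Column-major order: the entries of each column are contiguous in memory
A = [1 2 3;
     4 5 6]
println(vec(A))            # [1, 4, 2, 5, 3, 6]
println(A[2] == A[2, 1])   # true: linear index 2 is row 2 of column 1
```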

Arrays in both Julia and Python are mutable, meaning elements can be changed after we populate the array. Let's explore a Julia array:

In [48]:
a = rand(10) # build a 10-element random array

10-element Vector{Float64}:
 0.5061513203579936
 0.479683614839865
 0.3284543643253297
 0.013335161697898834
 0.8062937893197483
 0.4940970380410046
 0.9989148431234357
 0.04104521767896285
 0.4720330647508749
 0.5512561213725563

We access an element of an array by passing its index in square brackets, e.g., `a[3]` returns the third element in Julia (because it is `1`-based):

In [50]:
a[3]

0.3284543643253297

Arrays are __mutable__, i.e., we can change them after we build them. For example:

In [52]:
a[3] = π

π = 3.1415926535897...

In [53]:
a

10-element Vector{Float64}:
 0.5061513203579936
 0.479683614839865
 3.141592653589793
 0.013335161697898834
 0.8062937893197483
 0.4940970380410046
 0.9989148431234357
 0.04104521767896285
 0.4720330647508749
 0.5512561213725563

Arrays in Julia are `1`-based.
> __Note:__ Julia's choice of `1`-based indexing, unlike many widely used programming languages that use `0`-based indexing (e.g., C, Python, and Java), is a somewhat controversial design choice. However, it is a deliberate decision grounded in mathematical consistency (vectors and matrices are conventionally indexed from 1), readability, and alignment with other technical computing languages such as Fortran, MATLAB, and R, which are also `1`-based.

What happens if we try to grab an element that is _outside_ the array?

In [55]:
try
    a[11] # asking for index 11, but the array has only 10 items
catch e
    println("expected error: ", e)
end

expected error: BoundsError([0.5061513203579936, 0.479683614839865, 3.141592653589793, 0.013335161697898834, 0.8062937893197483, 0.4940970380410046, 0.9989148431234357, 0.04104521767896285, 0.4720330647508749, 0.5512561213725563], (11,))
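
Rather than catching a `BoundsError` after the fact, we can ask whether an index is valid first. The following sketch uses standard Base functions (`firstindex`, `lastindex`, `checkbounds`, `get`, and `eachindex`); the array here is a fresh example, not the one from the cells above.

```julia
a = rand(10)

println(firstindex(a), ":", lastindex(a))   # 1:10
println(checkbounds(Bool, a, 11))           # false: index 11 is out of range
println(get(a, 11, nothing) === nothing)    # true: the default is returned instead of an error

# eachindex yields only valid indices, so this can never go out of bounds
total = sum(a[i] for i in eachindex(a))
println(total ≈ sum(a))                     # true
```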


___

### Sets and Dictionaries
A [Set type](https://docs.julialang.org/en/v1/base/collections/#Base.Set) is an unordered collection of unique elements that supports fast membership checks, insertions, and removals. A [Dictionary (or map) is an associative container](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) that stores key–value pairs, allowing lookup, insertion, and deletion of values based on their unique keys.

> **Julia vs. Python collections:** Julia's `Set{T}` and `Dict{K,V}` are parametric containers, meaning every element in a `Set{T}` must be of type `T`, and every key–value pair in a `Dict{K,V}` has key type `K` and value type `V`. The parameters can be any types, including abstract ones such as `Any`, so a `Set{Any}` or `Dict{Any,Any}` can hold mixed elements. In contrast, Python's built-in `set` and `dict` are always heterogeneous (each slot holds a generic `object` reference), so they behave like Julia's `Any`-typed containers.
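
The effect of the type parameters is easy to see in a short sketch: a `Set{Int64}` rejects a `String`, while a `Dict{Any, Any}` accepts mixed keys and values. The variable names are illustrative.

```julia
typed = Set{Int64}()
push!(typed, 1)

rejected = try
    push!(typed, "three")   # a String cannot be converted to Int64
    false
catch err
    err isa MethodError
end
println(rejected)           # true
println(length(typed))      # 1: the set is unchanged

# Abstract parameters such as Any allow mixed contents
mixed = Dict{Any, Any}(1 => "one", "two" => 2.0)
println(mixed["two"])       # 2.0
```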

Let's build a few examples of set and dictionary collection types:

In [58]:
d = let

    d = Dict{Int64, String}(); # creates a dictionary that models text in a file.
    d[1] = "This is the first line in a text file";
    d[2] = "This is the second line in a text file";
    d[3] = "This is the last line in a text file";

    d
end

Dict{Int64, String} with 3 entries:
  2 => "This is the second line in a text file"
  3 => "This is the last line in a text file"
  1 => "This is the first line in a text file"

We can access the values stored in a dictionary by passing in the `key` pointing to a `value`, i.e., to get line `2`, we would:

In [60]:
d[2]

"This is the second line in a text file"

Julia's `Dict` does __not__ guarantee insertion order. For example, we inserted line `1` first, but when the dictionary was printed the order was 2, 3, 1. If you need a map that preserves insertion order, consider `OrderedDict` from the `DataStructures.jl` package. Likewise, there is no notion of order in a set.
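
When the keys have a natural order, as line numbers do, a simple option that needs no extra package is to sort the keys before iterating. The sketch below rebuilds the same dictionary and also shows `haskey` and `get` for safe lookups; the default string `"<missing line>"` is a placeholder chosen for illustration.

```julia
d = Dict{Int64, String}(
    1 => "This is the first line in a text file",
    2 => "This is the second line in a text file",
    3 => "This is the last line in a text file",
)

# Visit the entries in key order, regardless of the internal storage order
for k in sort(collect(keys(d)))
    println(k, " => ", d[k])
end

println(haskey(d, 4))                  # false
println(get(d, 4, "<missing line>"))   # <missing line>
```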

Consider the `s::Set{Char}` example:

In [62]:
s = let

    s = Set{Char}(); # empty at this point
    push!(s, 'a'); # add items to the set using `push!`
    push!(s, 'b');
    push!(s, 'c');
    push!(s, 'd');

    s
end

Set{Char} with 4 elements:
  'a'
  'c'
  'd'
  'b'

We can't access a particular item in the `s::Set{Char}` set by passing in an index (or key) because these concepts don't apply to sets. Instead, we can use [the `pop!(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.pop!) to remove and return an arbitrary element from a set:

In [64]:
pop!(s)

'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

All the typical mathematical operations on sets, such as intersection, union, or membership checks, are implemented in most modern programming languages, including Julia; [see the documentation for operations on sets in Julia](https://docs.julialang.org/en/v1/base/collections/#Set-Like-Collections).
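
As a brief sketch of those operations, using two small example sets (the names `s1` and `s2` are illustrative):

```julia
s1 = Set(['a', 'b', 'c', 'd'])
s2 = Set(['c', 'd', 'e'])

println('a' in s1)                              # true: membership check
println(union(s1, s2) == Set("abcde"))          # true
println(intersect(s1, s2) == Set(['c', 'd']))   # true
println(setdiff(s1, s2) == Set(['a', 'b']))     # true
println(issubset(Set(['c']), s2))               # true
```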

___

## Custom composite types
Custom composite types are user-defined data structures that aggregate multiple fields (possibly of different types) under a single name, enabling encapsulation of related data. Think of them as custom containers that you design to hold exactly the data you need for your specific problem.

> **Language differences:** In Julia, these are declared [using the struct keyword](https://docs.julialang.org/en/v1/manual/types/#Composite-Types) with a list of named fields, similar to C. Python uses classes with attributes and methods, which is a more object-oriented approach.

We'll explore this topic in much greater depth later, but for now, let's build some simple examples to illustrate how composite types work in Julia.

In [68]:
struct MyStudentModel

    # data -
    firstname::String # fields hold the data, they have names and types
    lastname::String
    id::Int64

    MyStudentModel(f,l,id) = new(f,l,id); # constructor
end

Now we can create an instance of our `MyStudentModel` type by calling the constructor:

In [70]:
model = MyStudentModel("Test", "Student", 1234)

MyStudentModel("Test", "Student", 1234)

We access the data stored in our composite type using dot syntax:

In [72]:
model.id # returns the value stored in the id field

1234

Here's a key point: because we used the `struct` keyword without `mutable`, our student model is immutable. Once we build it, we cannot change any of the data stored in the model. Let's see what happens when we try:

In [74]:
try
    model.id = 5678 # we are trying to change an immutable struct.
catch e
    println("expected error: ", e)
end

expected error: ErrorException("setfield!: immutable struct of type MyStudentModel cannot be changed")


Sometimes we need to modify our data after creating it. For these cases, we can create mutable composite types by adding the `mutable` keyword when declaring the struct:

In [76]:
mutable struct MyMutableStudentModel

    # data -
    firstname::String # fields hold the data, they have names and types
    lastname::String
    id::Int64

    MyMutableStudentModel() = new(); # builds an empty model

end

We create mutable composite types the same way as immutable ones, by calling the constructor. However, using an empty constructor (`new()`) creates an instance with uninitialized fields: reading an unassigned reference field such as `firstname::String` throws an `UndefRefError`, while an unassigned `Int64` field holds arbitrary bits. Assign every field before reading it, and prefer constructors that fully initialize fields unless you specifically need an uninitialized instance.

In [78]:
mutable_model = MyMutableStudentModel();
mutable_model.id = 6789;
mutable_model.firstname = "Firstname";
mutable_model.lastname = "Lastname";
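
To see why fully initializing constructors are the safer default, here is a minimal sketch with a hypothetical type `PartialModel` (not part of the course code) that offers both an empty constructor and a complete one:

```julia
# `PartialModel` is a hypothetical type used only for illustration.
mutable struct PartialModel
    firstname::String
    id::Int64
    PartialModel() = new()                             # leaves the fields uninitialized
    PartialModel(firstname, id) = new(firstname, id)   # fully initialized
end

empty_model = PartialModel()
println(isdefined(empty_model, :firstname))   # false: no String assigned yet

read_failed = try
    empty_model.firstname                     # reading an unassigned reference field
    false
catch err
    err isa UndefRefError
end
println(read_failed)                          # true

full_model = PartialModel("Test", 1234)
println(full_model.firstname)                 # Test
```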

___

## Lab
In Lab `L1b`, we will make sure everyone's machines are set up properly for the course. This includes installing Julia, setting up Jupyter notebooks, and ensuring all necessary packages are available.

## Today?
That's a wrap! What are some of the interesting things we discussed today?