# L3a: Understanding Linear Data Structures
In this module, we will explore linear data structures, which are fundamental building blocks in data science. Linear data structures organize data in a sequence, allowing for efficient access and manipulation.

By the end of this module, you will be able to define and demonstrate mastery of the following key concepts:

* __Linear Data Structures__ are collections of elements arranged in a sequential manner, where each element is connected to its predecessor and successor. We'll concentrate on four primary linear data structures: arrays, queues, stacks, and linked lists.
* __Arrays__ are fixed-size collections of elements of the same type, allowing for efficient indexing and iteration. They are commonly used for storing homogeneous data and provide fast access to elements via their indices.
* __Queues, stacks, and linked lists__ are linear data structures that have different access patterns and use cases. Queues follow a first-in-first-out (FIFO) principle, stacks follow a last-in-first-out (LIFO) principle, and linked lists consist of nodes that point to each other, allowing for dynamic memory allocation and efficient insertions and deletions at positions that have already been located.

Linear data structures are essential for organizing and processing data in a sequential manner, enabling efficient algorithms and operations. Understanding these structures is crucial for solving problems in data science, machine learning and artificial intelligence. Let's get started!

___

## Linear versus Non-Linear Data Structures
A data structure is an abstract model for organizing and storing data in memory so that specific operations—such as insertion, deletion, lookup, and traversal—can be performed efficiently. Different data structures offer trade-offs in time complexity and space usage. Choosing the right data structure is essential for optimizing algorithm performance and resource consumption.

Let's consider two classes of data structures: linear and non-linear.

* __Linear Data Structures__ store elements in a sequence, where each element has a single predecessor and a single successor (except for the first and last elements). Arrays keep this sequence in one contiguous block of memory, whereas linked lists connect separately stored nodes. Navigating and indexing are straightforward for a linear data structure because the elements form a single, linear sequence. Examples that we'll explore here include __arrays__, __stacks__, __queues__ and __linked lists__.
* __Non-Linear Data Structures__ store elements in a hierarchical or interconnected manner, where each element can have multiple predecessors and successors. This allows for more complex relationships between elements, but also makes navigation and indexing more complicated. Examples that we'll consider later include trees and graphs.

__TLDR__: Linear data structures arrange elements in a straight sequence for simple, ordered access, whereas non‐linear structures organize elements in branching or networked relationships for more complex connections.

___

## Arrays
An array is a fixed-size, _homogeneous_ collection of elements—each element is the same type—stored in a contiguous memory block. The contiguous memory layout provides _constant-time access_ to the values stored in the array using an index. Consider the 1-dimensional array $\mathbf{a}$, which contains $n = 6$ elements of type `T::Int`:

<img
  src="figs/Fig-Array.svg"
  alt="Array"
  height="200"
  width="400" />

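The name of an array, e.g., $\mathbf{a}\in\mathbb{Z}^{n}$, refers to the memory address of its first element, and each subsequent element is stored `sizeof(T)` bytes after the previous one. Thus, the element at offset $k$ from the start is located at address $\text{base} + k\cdot s$, where $s$ is the size of one element. Computing this address takes one multiplication and one addition, so index-based access takes constant time. We access the elements of the array $\mathbf{a}$ using their index. Different languages have different indexing conventions: for example, Julia and Fortran index from 1, whereas C and Python index from 0.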
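The address arithmetic described above can be checked directly. The following is a minimal Python sketch. Python is used for illustration only and indexes from 0. It assumes CPython's standard `array` module for a contiguous buffer of signed 64-bit integers and uses `ctypes` to read raw memory. The helper `element_address` is a hypothetical name introduced here. Element `i` lives at the base address plus `i * itemsize` bytes. In a 1-based language the offset would be `(i - 1)` elements instead.

```python
import array
import ctypes

# A contiguous, homogeneous array of six signed 64-bit integers ('q' typecode).
a = array.array("q", [10, 20, 30, 40, 50, 60])

base_address, n = a.buffer_info()   # address of a[0] and number of elements
item_size = a.itemsize              # bytes per element, i.e., sizeof(T)


def element_address(i: int) -> int:
    """Hypothetical helper: address of the element at 0-based index i."""
    if not 0 <= i < n:
        raise IndexError("index out of range")
    # One multiply and one add, regardless of n: constant-time access.
    return base_address + i * item_size


# Read each element straight from memory using only the computed address.
for i in range(n):
    value = ctypes.c_longlong.from_address(element_address(i)).value
    print(i, hex(element_address(i)), value)
```

The printed addresses differ from run to run, but consecutive addresses always differ by exactly `item_size` bytes.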

### Higher-dimensional arrays
The array $\mathbf{a}$ is 1-dimensional, like a row or column. Arrays can be multi-dimensional: two-dimensional arrays (matrices) organize data into rows and columns for tables or images, while higher-order arrays (tensors) extend to three or more axes for color channels, time-series, machine learning feature maps, etc.
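Dense multi-dimensional arrays in languages such as C, Julia, and NumPy are stored in a single contiguous block, so each index tuple must be mapped to one offset. The minimal Python sketch below uses 0-based indices, and its function names are hypothetical. It shows the two common layouts. In row-major order, consecutive elements of a row are adjacent; this is the default in C and NumPy. In column-major order, consecutive elements of a column are adjacent; Julia and Fortran use this layout.

```python
def row_major_offset(i: int, j: int, n_cols: int) -> int:
    """Flat offset of element (i, j) when rows are stored one after another."""
    return i * n_cols + j


def column_major_offset(i: int, j: int, n_rows: int) -> int:
    """Flat offset of element (i, j) when columns are stored one after another."""
    return j * n_rows + i


# A 2 x 3 matrix, flattened into one contiguous list in each layout.
n_rows, n_cols = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]
flat_row_major = [matrix[i][j] for i in range(n_rows) for j in range(n_cols)]
flat_col_major = [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]

print(flat_row_major)   # [1, 2, 3, 4, 5, 6]
print(flat_col_major)   # [1, 4, 2, 5, 3, 6]

# Element (1, 2) holds 6 in both layouts, at different flat offsets.
print(flat_row_major[row_major_offset(1, 2, n_cols)])      # 6
print(flat_col_major[column_major_offset(1, 2, n_rows)])   # 6
```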

### Common Uses of 1- and Higher-Dimensional Arrays
* __Structured Datasets as 2D Arrays:__ Tabular data are loaded into a 2D array with rows as observations and columns as features, facilitating slicing into training/test sets, scaling features, and input to machine learning libraries.
* __Vectorized Computation and Transformation:__ Arrays let elementwise arithmetic and linear algebra operations, like dot products and broadcasting, be written without explicit loops. This produces concise code and, in interpreted languages such as Python, often gives large speedups.
* __Multidimensional Input Encoding:__ Images (height × width × channels), time-series windows, and word embeddings are naturally stored in 2D or 3D arrays, plugging seamlessly into various machine learning and artificial intelligence models. 

Now that we understand arrays, let's explore two linear data structures with restricted but interesting access patterns, which are often implemented on top of arrays: stacks and queues.

___

## Stacks
A [stack](https://en.wikipedia.org/wiki/Stack_(abstract_data_type)) is a linear data structure that follows the __Last-In-First-Out (LIFO)__ principle. Elements are added to and removed from the __top of the stack__. Stacks provide constant-time operations for adding and removing elements. Consider the stack `s::Stack{Int64}`:

<img
  src="figs/Fig-Stack.svg"
  alt="Stack"
  height="300"
  width="600" />

When using a stack, we do __not__ have direct access to every element (as we would with an array). Instead, we can only add elements to, and remove elements from, the __top of the stack__.
* __Nomenclature__: We give adding and removing stack elements special names: we _push_ an element onto the stack (add), while we _pop_ an element from the stack (remove).
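A minimal Python sketch of a stack follows. A Python list serves as storage, and its end is treated as the top, because `append` and `pop()` at the end of a list are amortized constant time. The class name `Stack` and the method names `push`, `pop`, and `peek` are our own illustrative choices. They are not the API of the `Stack{Int64}` type referenced above.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """LIFO stack backed by a Python list; the end of the list is the top."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        # Add to the top: amortized O(1) append at the end of the list.
        self._items.append(item)

    def pop(self) -> T:
        # Remove and return the top element: O(1) pop from the end.
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self) -> T:
        # Read the top element without removing it.
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]

    def __len__(self) -> int:
        return len(self._items)


s: Stack[int] = Stack()
for x in (1, 2, 3):
    s.push(x)
print(s.peek())                      # 3
print([s.pop() for _ in range(3)])   # [3, 2, 1]
```

The elements come back in the reverse of the order in which they were pushed. That reversal is exactly the LIFO behavior.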

### Common uses for Stacks
* __Function call stack__: The function call stack is a runtime structure that uses a LIFO stack to manage active functions. Each time a function is invoked, a new stack frame—containing its parameters, local variables, and return address—is pushed onto the call stack. When the function finishes, its frame is popped, restoring the previous execution context.
* __Expression evaluation and syntax parsing__: Stacks can be used to evaluate mathematical expressions, such as those written in postfix (reverse Polish) notation. Operands are pushed onto the stack as they are read. When an operator is read, its operands are popped from the top of the stack, the operator is applied, and the result is pushed back. Once the entire expression has been read, the single value left on the stack is the result.
* __Undo/redo mechanisms__: Many applications that let users undo and redo actions use stacks to track these activities. When a user performs an action, such as typing a letter or moving an object, it is pushed onto an undo stack. To undo, the most recent action is popped from the undo stack and reversed. Applications typically push that undone action onto a second (redo) stack, so redo pops from that stack and reapplies the action.
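The postfix evaluation described above fits in a few lines. This minimal Python sketch assumes space-separated tokens and supports only the four binary operators `+ - * /`. The names `eval_postfix` and `OPERATORS` are illustrative, and a Python list serves as the stack. The first value popped is the right-hand operand, which matters for `-` and `/`.

```python
import operator
from typing import Callable, Dict, List

# Supported binary operators (an assumption made for this sketch).
OPERATORS: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_postfix(expression: str) -> float:
    """Evaluate a space-separated postfix (RPN) expression using a stack."""
    stack: List[float] = []
    for token in expression.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for '{token}'")
            right = stack.pop()   # most recently pushed operand
            left = stack.pop()
            stack.append(OPERATORS[token](left, right))  # push the result back
        else:
            stack.append(float(token))  # operands go straight onto the stack
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


print(eval_postfix("3 4 + 2 *"))        # (3 + 4) * 2 = 14.0
print(eval_postfix("10 2 8 * + 3 -"))   # 10 + 2 * 8 - 3 = 23.0
```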

Stacks are useful for managing data that must be processed in the reverse order of arrival, such as function calls, expression evaluation, and undo/redo operations. Next, let's consider the opposite access pattern: queues.

___

## Queues
A [queue](https://en.wikipedia.org/wiki/Queue_(abstract_data_type)) is a linear data structure that operates on the __First-In-First-Out (FIFO)__ principle: the first element added to the queue is the first one to be removed. Queues implement efficient operations for adding elements to the rear of the queue and removing elements from the front, making them ideal for scenarios like task scheduling. Consider the queue `q::Queue{Int64}`:

<img
  src="figs/Fig-Queue.svg"
  alt="Queue"
  height="300"
  width="600" />

When using a queue, we do __not__ have direct access to every element (as we would with an array). Instead, we can only add elements at the __rear (bottom) of the queue__, and can only remove elements from the __front (top) of the queue__.
* __Nomenclature__: We give adding and removing queue elements special names: we _enqueue_ an element into the queue (add at the rear), while we _dequeue_ an element from the queue (remove from the front).
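In Python, a convenient way to sketch a queue is the standard library's `collections.deque`. Here `append` enqueues at the rear and `popleft` dequeues from the front, and both run in constant time. A plain list is a poor fit because `list.pop(0)` shifts every remaining element and is therefore $\mathcal{O}(n)$. The job names below are made up for illustration.

```python
from collections import deque

q = deque()   # the left end is the front, the right end is the rear

# Enqueue: add to the rear in O(1).
for job in ("job-1", "job-2", "job-3"):
    q.append(job)

# Dequeue: remove from the front in O(1).
first = q.popleft()
print(first)     # job-1
print(list(q))   # ['job-2', 'job-3']
print(q[0])      # job-2 (peek at the front without removing it)
```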

### Common uses of Queues

* __Job scheduling__: Operating systems and other software often use queues to manage tasks such as running programs or printing documents. In a simple first-come, first-served scheme, each task is placed in a queue and processed in the order it was received.
* __Breadth-first search__: In graph theory, the breadth-first search algorithm explores a graph by visiting all the vertices at a given level before moving on to the next level. This can be implemented using a queue data structure, where the vertices at each level are added to the queue in order and processed in that exact order.
* __Message passing__: Queues are often used to pass messages between different parts of a system, such as between a producer and a consumer in a messaging system. The producer adds messages to the rear of the queue, and the consumer removes them from the front. Because the queue buffers messages, the producer and consumer can operate at different speeds, and messages are consumed in the order they were produced.
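Breadth-first search, described in the list above, can be sketched with a queue as follows. This minimal Python version assumes the graph is given as a dictionary that maps each vertex to a list of its neighbors (an adjacency list). The function `bfs_order` and the example graph are made up for illustration. Each vertex is marked as visited when it is enqueued, so it enters the queue at most once.

```python
from collections import deque
from typing import Dict, Hashable, List


def bfs_order(graph: Dict[Hashable, List[Hashable]], start: Hashable) -> List[Hashable]:
    """Return the vertices in the order breadth-first search visits them."""
    visited = {start}
    order = []
    queue = deque([start])          # FIFO frontier of discovered vertices
    while queue:
        vertex = queue.popleft()    # dequeue the oldest discovered vertex
        order.append(vertex)
        for neighbor in graph.get(vertex, []):
            if neighbor not in visited:
                visited.add(neighbor)   # mark on enqueue to avoid duplicates
                queue.append(neighbor)
    return order


# A small example graph (made-up data for illustration).
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}
print(bfs_order(graph, "A"))   # ['A', 'B', 'C', 'D', 'E', 'F']
```

Level by level, the visit order is `A`, then `B` and `C`, then `D` and `E`, then `F`. The FIFO queue guarantees that one level is finished before the next begins.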

Queues are useful for managing data that needs to be processed in the order it was received, such as in job scheduling, breadth-first search, and message passing. Finally, let's consider a more flexible linear data structure: linked lists.

___

## Linked Lists
A linked list is a linear data structure where each element, called a node, is a separate object that stores data values and a reference (link) to the next node in the list.

Consider the linked list `l`, which stores values of type `Int64`:

<img
  src="figs/Fig-LinkedList.svg"
  alt="Linked list"
  height="300"
  width="600" />

There are two main types of linked lists: singly linked lists and doubly linked lists. In a singly linked list, each node has a reference to the next node in the list but not to the previous one. In a doubly linked list, each node has references to both the next and the previous node, so the list can be traversed in either direction.
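A minimal Python sketch of a singly linked list follows. The class and method names (`Node`, `SinglyLinkedList`, `push_front`, `insert_after`, `remove_after`, `find`) are hypothetical and not taken from any library. The sketch illustrates the trade-off discussed in this section. Inserting or removing next to a node we already hold is $\mathcal{O}(1)$, while finding a value requires walking the links in $\mathcal{O}(n)$. A doubly linked list would add a `prev` reference to each node.

```python
from typing import Generic, Iterator, Optional, TypeVar

T = TypeVar("T")


class Node(Generic[T]):
    """A node stores one value and a reference to the next node (or None)."""

    def __init__(self, value: T, next_node: "Optional[Node[T]]" = None) -> None:
        self.value = value
        self.next = next_node


class SinglyLinkedList(Generic[T]):
    def __init__(self) -> None:
        self.head: Optional[Node[T]] = None

    def push_front(self, value: T) -> Node[T]:
        # O(1): the new node points at the old head and becomes the head.
        self.head = Node(value, self.head)
        return self.head

    def insert_after(self, node: Node[T], value: T) -> Node[T]:
        # O(1) once we already hold a reference to `node`.
        node.next = Node(value, node.next)
        return node.next

    def remove_after(self, node: Node[T]) -> None:
        # O(1): relink `node` past its successor.
        if node.next is not None:
            node.next = node.next.next

    def find(self, value: T) -> Optional[Node[T]]:
        # O(n): follow links from the head until the value is found.
        current = self.head
        while current is not None:
            if current.value == value:
                return current
            current = current.next
        return None

    def __iter__(self) -> Iterator[T]:
        # O(n) traversal from head to tail.
        current = self.head
        while current is not None:
            yield current.value
            current = current.next


lst: SinglyLinkedList[int] = SinglyLinkedList()
for x in (30, 20, 10):
    lst.push_front(x)             # list is now 10 -> 20 -> 30
node_20 = lst.find(20)
assert node_20 is not None
lst.insert_after(node_20, 25)     # 10 -> 20 -> 25 -> 30
print(list(lst))                  # [10, 20, 25, 30]
```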

### Common uses of Linked Lists
* __Dynamic data structures__: Linked lists are helpful when a collection changes in size at runtime. Unlike fixed-size arrays, linked lists can grow or shrink one node at a time without reallocating and copying the existing elements. This makes them useful for implementing stacks, queues, and other dynamic data structures.
* __Memory allocation__: Memory managers can use linked lists (often called free lists) to keep track of free and allocated memory blocks. Each block is represented as a node that points to the next block. The allocator can walk the list to find a block large enough for a new request, and it can return a block to the list when that block is freed.
* __Modeling hierarchical and networked data__: Linked nodes are also the building blocks of non-linear structures such as trees and graphs. A tree node can hold references to its children, and a graph can be stored as an adjacency list, in which each vertex keeps a linked list of its neighbors. Such structures can then be updated by relinking nodes rather than moving data in memory.

Linked lists can also be used to implement stacks and queues!
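As a sketch of this idea, the Python class below implements a FIFO queue on a singly linked list by keeping pointers to both ends. The names `LinkedQueue` and `_Node` are hypothetical. Enqueue appends at the tail and dequeue removes from the head, and each takes $\mathcal{O}(1)$ time. A stack is even simpler: pushing and popping at the head alone gives LIFO behavior.

```python
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class _Node(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value
        self.next: "Optional[_Node[T]]" = None


class LinkedQueue(Generic[T]):
    """FIFO queue on a singly linked list with head (front) and tail (rear) pointers."""

    def __init__(self) -> None:
        self._head: Optional[_Node[T]] = None   # front: dequeue here
        self._tail: Optional[_Node[T]] = None   # rear: enqueue here
        self._size = 0

    def enqueue(self, value: T) -> None:
        node = _Node(value)
        if self._tail is None:       # empty queue: node is both front and rear
            self._head = node
        else:
            self._tail.next = node   # link the old rear to the new node
        self._tail = node
        self._size += 1

    def dequeue(self) -> T:
        if self._head is None:
            raise IndexError("dequeue from empty queue")
        node = self._head
        self._head = node.next
        if self._head is None:       # queue became empty: clear the rear too
            self._tail = None
        self._size -= 1
        return node.value

    def __len__(self) -> int:
        return self._size


queue: LinkedQueue[int] = LinkedQueue()
for x in (1, 2, 3):
    queue.enqueue(x)
print(queue.dequeue(), queue.dequeue())   # 1 2
print(len(queue))                         # 1
```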

___

## Summary of worst-case time complexity
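The table below summarizes the _worst-case time complexities_ of the core operations (insertion, deletion, lookup, and complete traversal) for the four fundamental linear data structures we have explored.

__Summary__: Arrays support constant-time $\mathcal{O}(1)$ lookup by index, but inserting or deleting at an arbitrary position takes linear time $\mathcal{O}(n)$ because the later elements must be shifted. Stacks and queues perform their primary operations (push/pop and enqueue/dequeue) in constant time $\mathcal{O}(1)$. For these structures, lookup means reading the top (stack) or front (queue) element, while searching for an arbitrary element still takes $\mathcal{O}(n)$. Linked lists incur $\mathcal{O}(n)$ for lookup, insertion, and deletion in the worst case, because the target position must first be reached by following links from the head. Once the node just before that position is in hand, the insertion or deletion itself takes $\mathcal{O}(1)$.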

| Data Structure | Insertion | Deletion | Lookup | Traversal |
| -------------- | --------- | -------- | ------ | --------- |
| Array          | $\mathcal{O}(n)$      | $\mathcal{O}(n)$     | $\mathcal{O}(1)$   | $\mathcal{O}(n)$      |
| Stack          | $\mathcal{O}(1)$      | $\mathcal{O}(1)$     | $\mathcal{O}(1)$   | $\mathcal{O}(n)$      |
| Queue          | $\mathcal{O}(1)$      | $\mathcal{O}(1)$     | $\mathcal{O}(1)$   | $\mathcal{O}(n)$      |
| Linked List    | $\mathcal{O}(n)$      | $\mathcal{O}(n)$     | $\mathcal{O}(n)$   | $\mathcal{O}(n)$      |
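The $\mathcal{O}(n)$ insertion cost in the Array row comes from shifting elements. The Python sketch below makes this visible by modeling a fixed-capacity array as a list together with a count of used slots. The helper `insert_at` is hypothetical. Inserting at position `index` shifts every later element one slot to the right, so inserting at the front of an array holding $n$ items moves all $n$ of them. Deletion is the mirror image and shifts elements to the left.

```python
from typing import List, Optional


def insert_at(buffer: List[Optional[int]], count: int, index: int, value: int) -> int:
    """Insert `value` at `index` in a fixed-capacity array holding `count` items.

    Returns the number of elements shifted, which is count - index
    (all n elements in the worst case, inserting at the front).
    """
    if count >= len(buffer):
        raise OverflowError("array is full")
    if not 0 <= index <= count:
        raise IndexError("index out of range")
    shifts = 0
    # Move elements one slot to the right, starting from the last used slot.
    for k in range(count, index, -1):
        buffer[k] = buffer[k - 1]
        shifts += 1
    buffer[index] = value
    return shifts


buffer: List[Optional[int]] = [10, 20, 30, 40, 50, None]   # capacity 6, 5 used
print(insert_at(buffer, 5, 0, 5))   # 5 (worst case: insert at the front)
print(buffer)                       # [5, 10, 20, 30, 40, 50]
```

Python's built-in `list.insert` performs the same kind of shift internally, so it is also $\mathcal{O}(n)$.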

___