# Demo: Fun with Iteration Patterns for Arrays, Sets, and Dictionaries
Iteration is the repeated execution of a block of code, typically used to process elements in a collection or to perform repeated actions until a condition is met. 
* Iteration is an _extremely common_ fundamental concept used in nearly all applications. It appears in tasks ranging from simple loops over arrays to complex algorithms in data processing, simulation, and machine learning, making it an essential tool in nearly every programming language and domain.

In this demo, we'll explore the two iteration patterns that you'll likely encounter in your everyday life:
* __For loops__: A for loop is a control structure that repeatedly executes a block of code for each element in a specified sequence or range. It is commonly used to iterate over collections like arrays or lists, or to perform a fixed number of repetitions.
* __While loops__: A while loop is a control structure that repeatedly executes a block of code as long as a specified condition remains true. It is typically used when the number of iterations is not known in advance and depends on dynamic conditions during execution.

## Setup
We set up the computational environment by including the `Include.jl` file and loading any needed resources.
* __Include__: The [include command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/).

In [3]:
include("Include.jl");

## Example 1: Basic for-loop iteration patterns for Ordered Collections
Let's start by looking at the basic structure of [a for-loop.](https://docs.julialang.org/en/v1/base/base/#for) A [for-loop](https://docs.julialang.org/en/v1/base/base/#for) has a header and a body. 
> The _header_ specifies how many times the loop will iterate. The loop `index` is passed into the loop's _body_, where you put your logic. The `index` is always a new variable, even if a variable of the same name exists in the enclosing scope.

In Julia, the [for-loop](https://docs.julialang.org/en/v1/base/base/#for) has its own _local scope_: the loop body can use variables from the enclosing scope, but variables first created inside the loop are not visible after the loop ends. The [local scope of the for loop](https://docs.julialang.org/en/v1/manual/variables-and-scoping/#local-scope) ends with the `end` keyword.
* _What is scope_? In programming, scope refers to the region of code where a variable or identifier is defined and accessible. It determines the visibility and lifetime of variables, helping to manage the namespace and prevent naming conflicts.
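
The scoping rules above can be seen in a minimal sketch. The variable names (`i`, `total`, `squared`, `scope_result`) are illustrative assumptions and are not used elsewhere in this demo.

```julia
# Sketch of for-loop scoping inside a `let` block (all names are illustrative).
scope_result = let
    i = 100      # outer variable that happens to share the loop index's name
    total = 0    # exists before the loop, so the loop body updates this variable

    for i in 1:3          # this `i` is a new variable, separate from the outer `i`
        total += i        # modifies the outer `total`
        squared = i^2     # first created inside the loop; not visible after `end`
    end

    # the outer `i` is untouched, and `squared` is not defined at this point
    (i, total, @isdefined(squared))
end
```

Here `scope_result` is `(100, 6, false)`: the outer `i` keeps its value, `total` accumulates `1 + 2 + 3`, and `squared` does not exist once the loop has ended.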

Let's do a simple example where we create an `Array{Float64,1}` holding random values, iterate over this array using a for loop, and print using the [println function](https://docs.julialang.org/en/v1/base/io-network/#Base.println) each random value.

In [5]:
let
    
    number_of_elements = 5; # how many items do we want?
    random_vector_array = rand(number_of_elements); # create an array of random 64-bit floating-point values
    value = nothing # declare a variable outside the loop.
    
    for i in 1:number_of_elements
        value = random_vector_array[i];
        println("The index i = $(i) and the value = $(random_vector_array[i])");
    end # end of for loop scope
    value # what gets printed?
end

The index i = 1 and the value = 0.09692074281466712
The index i = 2 and the value = 0.22696693672949209
The index i = 3 and the value = 0.5689809711731147
The index i = 4 and the value = 0.37622875034780534
The index i = 5 and the value = 0.5131210428868529


0.5131210428868529

### Eachindex
Another [for-loop](https://docs.julialang.org/en/v1/base/base/#for) pattern is the [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex). We use the [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex) when we don't explicitly know how many elements we have in an ordered collection, but we want to visit all of the elements in order.
* The [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex) is the preferred pattern compared with something like `for i in 1:length(random_vector_array)` when we don't know how many elements are in the `random_vector_array` collection. __Why might this be true?__

Let's revisit the example above, but in this case using [the `eachindex(...)` pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex):

In [7]:
let
    number_of_elements = 5; # how many items do we want?
    random_vector_array = rand(number_of_elements); # create an array of random 64-bit floating-point values
    
    for i ∈ eachindex(random_vector_array)
        value = random_vector_array[i];
        println("The index i = $(i) and the value = $(value)"); # body
    end
end

The index i = 1 and the value = 0.5835681829310801
The index i = 2 and the value = 0.02060619542988773
The index i = 3 and the value = 0.9164350678181891
The index i = 4 and the value = 0.391580567981759
The index i = 5 and the value = 0.9076716832289182
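
One possible answer to the question posed earlier in this section: `eachindex` asks the collection for its own valid indices, so the loop does not hard-code the assumption that indexing starts at 1 and ends at `length(...)`. For a standard `Vector` the two agree, as the sketch below checks. They can differ for array types with custom indexing (for example, those from the OffsetArrays.jl package, which is not used here). All names in the sketch are illustrative.

```julia
v = [10.0, 20.0, 30.0]

indices = eachindex(v)                         # the collection reports its own valid indices
agrees_with_range = (indices == 1:length(v))   # true for a standard 1-based Vector

# eachindex also works for a Matrix, where it yields linear indices
A = [1 2; 3 4]
visited = [A[i] for i in eachindex(A)]         # elements are visited in column-major order
```

For the matrix `A`, `visited` is `[1, 3, 2, 4]`, because Julia stores matrices column by column.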


### Direct iteration
Suppose we don't care about the index $i$, but instead want only the values; we can iterate over the elements of a collection directly. For example, the code block below accesses the values of the `random_vector_array`  directly, but __NOT__ their indexes:

In [5]:
let
    number_of_elements = 5; # how many items do we want?
    random_vector_array = rand(number_of_elements); # create an array of random 64-bit floating-point values
    
    for value ∈ random_vector_array # for value in collection
        println("Only the value = $(value)"); # body
    end
end

Only the value = 0.3488118859929439
Only the value = 0.26202077960057524
Only the value = 0.005767264975062525
Only the value = 0.06663134625716982
Only the value = 0.5941697955217022


### Iterate over other things besides an array?
Sure! We can iterate over other ordered (and unordered) collections, not just arrays! For example, we can iterate over [tuples](https://docs.julialang.org/en/v1/manual/functions/#Tuples), fixed-length _immutable_ ordered containers that can hold any values. 

Let's build a tuple holding some `Int64` types and iterate over these using the [eachindex pattern](https://docs.julialang.org/en/v1/base/arrays/#Base.eachindex).
* The previous examples used the [println function](https://docs.julialang.org/en/v1/base/io-network/#Base.println) to print output [to `stdout`](https://docs.julialang.org/en/v1/base/io-network/#Base.stdout); here, we show another approach that does the same thing, namely using the [@show macro](https://docs.julialang.org/en/v1/base/base/#Base.@show) which prints one or more expressions, and their results, [to `stdout`](https://docs.julialang.org/en/v1/base/io-network/#Base.stdout):

In [11]:
let
    example_tuple = (1,2,3,6,5,4,1,1,1,1)
    
    for i ∈ eachindex(example_tuple) # header: for index in index collection
        @show (i, example_tuple[i]) # body
    end
end

(i, example_tuple[i]) = (1, 1)
(i, example_tuple[i]) = (2, 2)
(i, example_tuple[i]) = (3, 3)
(i, example_tuple[i]) = (4, 6)
(i, example_tuple[i]) = (5, 5)
(i, example_tuple[i]) = (6, 4)
(i, example_tuple[i]) = (7, 1)
(i, example_tuple[i]) = (8, 1)
(i, example_tuple[i]) = (9, 1)
(i, example_tuple[i]) = (10, 1)


## Example 2: Basic for-loop iteration patterns for Sets and Dictionaries
A [Set type](https://docs.julialang.org/en/v1/base/collections/#Base.Set) is an unordered collection of unique elements that supports fast membership checks, insertions, and removals. On the other hand, [a Dictionary (or map) is an associative container](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) that stores key–value pairs, allowing lookup, insertion, and deletion of values based on their unique keys. 
* Julia’s `Set{T}` and `Dict{K,V}` are parametric containers, meaning every element in a `Set` has the same type `T`, and every key–value pair in a `Dict` has types `K` and `V`. However, the elements of a set can be _any_ type `T`, and the keys `K` and values `V` can also be of _any type_.

[Sets](https://docs.julialang.org/en/v1/base/collections/#Base.Set) model a bag of stuff; there is no notion of an index (which can be a little confusing). Thus, we can only access the items directly, and the order in which we visit them is unspecified. However, the _really_ exciting thing about [Sets](https://docs.julialang.org/en/v1/base/collections/#Base.Set) is that their elements are _unique_, i.e., there are no repeated elements. This will be super handy later!

Let's build a `Set{Char}` from an array of characters, and then iterate over the set directly, showing the value at each iteration using [the `@show` macro](https://docs.julialang.org/en/v1/base/base/#Base.@show):

In [13]:
let
    example_set = Set{Char}(['A','B','C','D','R','U','S','T','A']);
    for value ∈ example_set
        @show value
    end
end

value = 'C'
value = 'U'
value = 'D'
value = 'A'
value = 'R'
value = 'S'
value = 'T'
value = 'B'
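
The output above lists eight values even though nine characters were passed in, because the repeated `'A'` is stored only once. A small sketch of the uniqueness and membership behavior follows; the variable names are illustrative, and sorting a collected copy is just one way to get a reproducible order.

```julia
letters = ['A','B','C','D','R','U','S','T','A']   # 9 characters, 'A' appears twice
example_set = Set{Char}(letters)

number_of_unique = length(example_set)   # the repeated 'A' is stored once
has_A = 'A' ∈ example_set                # membership check
has_Z = 'Z' ∈ example_set

# If a reproducible order is needed, sort a collected copy of the set
ordered = sort(collect(example_set))
```

Here `number_of_unique` is `8`, `has_A` is `true`, `has_Z` is `false`, and `ordered` holds the characters in alphabetical order.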


### Dictionary
[A Dictionary is an associative container](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) that stores key–value pairs, allowing lookup, insertion, and deletion of values based on their unique keys. While a dictionary is significantly different from an array, we can still traverse it [using a for-loop](https://docs.julialang.org/en/v1/base/base/#for).

We gathered a daily open-high-low-close `dataset` for each firm in the [S&P500](https://en.wikipedia.org/wiki/S%26P_500) from `01-03-2014` until `12-31-2024`, along with data for a few exchange-traded funds and volatility products during that time. 
* Let's load the `original_dataset` by calling the `MyMarketDataSet()` function and remove firms that do not have the maximum number of trading days. The cleaned dataset $\mathcal{D}$ will be stored in the `dataset` variable.

When we use a [for-loop](https://docs.julialang.org/en/v1/base/base/#for) with a [Dictionary](https://docs.julialang.org/en/v1/base/collections/#Dictionaries), which is an unordered collection of `key=>value` pairs, the iteration variable is a `Pair` holding both the `key` and the `value`. We can destructure it in the loop header, just like a tuple, by writing `(key, value)`.
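
As a minimal sketch of this pattern, consider a small dictionary with made-up tickers and prices (the names `prices`, `total`, `tickers`, and `dict_result` are hypothetical and unrelated to the market dataset loaded below). Each element produced by iterating a `Dict` is a `Pair`, which can be destructured in the loop header or accessed through its `first` and `second` fields.

```julia
dict_result = let
    prices = Dict("AAA" => 10.0, "BBB" => 20.0)   # hypothetical tickers and values

    # Destructure each Pair directly in the loop header
    total = 0.0
    for (ticker, price) ∈ prices
        total += price
    end

    # Or keep the Pair and use its `first` (key) and `second` (value) fields
    tickers = String[]
    for pair ∈ prices
        push!(tickers, pair.first)
    end

    # sort the keys because a Dict does not guarantee an iteration order
    (total, sort(tickers), eltype(prices))
end
```

The result is `(30.0, ["AAA", "BBB"], Pair{String, Float64})`.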

In [15]:
original_dataset = MyMarketDataSet() |> x-> x["dataset"];

In [16]:
original_dataset

Dict{String, DataFrame} with 515 entries:
  "DD"   => [1m2329×8 DataFrame[0m[0m…
  "EMR"  => [1m2767×8 DataFrame[0m[0m…
  "CTAS" => [1m2767×8 DataFrame[0m[0m…
  "HSIC" => [1m2767×8 DataFrame[0m[0m…
  "KIM"  => [1m2767×8 DataFrame[0m[0m…
  "PLD"  => [1m2767×8 DataFrame[0m[0m…
  "IEX"  => [1m2767×8 DataFrame[0m[0m…
  "TPR"  => [1m1803×8 DataFrame[0m[0m…
  "BAC"  => [1m2767×8 DataFrame[0m[0m…
  "CBOE" => [1m2767×8 DataFrame[0m[0m…
  "EXR"  => [1m2767×8 DataFrame[0m[0m…
  "NCLH" => [1m2767×8 DataFrame[0m[0m…
  "CVS"  => [1m2767×8 DataFrame[0m[0m…
  "DRI"  => [1m2767×8 DataFrame[0m[0m…
  "DTE"  => [1m2767×8 DataFrame[0m[0m…
  "ZION" => [1m2767×8 DataFrame[0m[0m…
  "AVY"  => [1m2767×8 DataFrame[0m[0m…
  "EW"   => [1m2767×8 DataFrame[0m[0m…
  "EA"   => [1m2767×8 DataFrame[0m[0m…
  "NWSA" => [1m2767×8 DataFrame[0m[0m…
  "BBWI" => [1m859×8 DataFrame[0m[0m…
  "CAG"  => [1m2767×8 DataFrame[0m[0m…
  "GPC"  => [1m2767×8 DataFrame[0

In [17]:
original_dataset["AAPL"] # Hmmm. we access the elements of dictionary like an array?

| Row | volume | volume_weighted_average_price | open | close | high | low | timestamp | number_of_transactions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | DateTime | Int64 |
| 1 | 3.93215e8 | 19.4749 | 19.745 | 19.3207 | 19.775 | 19.3011 | 2014-01-03T05:00:00 | 148584 |
| 2 | 4.13437e8 | 19.3213 | 19.1946 | 19.4261 | 19.5286 | 19.0571 | 2014-01-06T05:00:00 | 131664 |
| 3 | 3.17731e8 | 19.3329 | 19.44 | 19.2871 | 19.4986 | 19.2116 | 2014-01-07T05:00:00 | 107327 |
| 4 | 2.58747e8 | 19.4038 | 19.2432 | 19.4093 | 19.4843 | 19.2389 | 2014-01-08T05:00:00 | 86874 |
| 5 | 2.79621e8 | 19.2943 | 19.5286 | 19.1614 | 19.5307 | 19.1196 | 2014-01-09T05:00:00 | 93562 |
| 6 | 3.05283e8 | 19.0659 | 19.2796 | 19.0336 | 19.3143 | 18.9682 | 2014-01-10T05:00:00 | 113063 |
| 7 | 3.79443e8 | 19.1801 | 18.9254 | 19.1332 | 19.375 | 18.9243 | 2014-01-13T05:00:00 | 130227 |
| 8 | 3.34937e8 | 19.4033 | 19.2221 | 19.5139 | 19.5261 | 19.2021 | 2014-01-14T05:00:00 | 114856 |
| 9 | 3.9389e8 | 19.9105 | 19.7686 | 19.9057 | 20.0071 | 19.7021 | 2014-01-15T05:00:00 | 136942 |
| 10 | 2.29885e8 | 19.8115 | 19.8179 | 19.7946 | 19.8875 | 19.7029 | 2014-01-16T05:00:00 | 78507 |


#### Clean the data
Not all tickers in our dataset have the maximum number of trading days for various reasons, e.g., acquisition or de-listing events. Let's collect only those tickers with the maximum number of trading days.

First, let's compute the number of records for a firm that we know has the maximum number of trading days, e.g., `AAPL`, and save that value in the `maximum_number_trading_days::Int64` variable:

In [19]:
maximum_number_trading_days = original_dataset["AAPL"] |> nrow; # nrow? (check out: DataFrames.jl)

Now, let's iterate through our data and collect only tickers with `maximum_number_trading_days` records. Save that data in the `dataset::Dict{String,DataFrame}` variable:

In [53]:
dataset = let

    # initialize -
    dataset = Dict{String, DataFrame}();

    # iterate through the dictionary; we can't guarantee a particular order
    for (ticker, data) ∈ original_dataset  # we get each (K, V) pair!
        if (nrow(data) == maximum_number_trading_days)
            dataset[ticker] = data;
        else
            println("Ticker $(ticker) is not full data")
        end
    end
    dataset; # return
end;

Ticker DD is not full data
Ticker TPR is not full data
Ticker BBWI is not full data
Ticker INFO is not full data
Ticker CTVA is not full data
Ticker PEAK is not full data
Ticker SIVB is not full data
Ticker UAA is not full data
Ticker BKNG is not full data
Ticker USL is not full data
Ticker VIAC is not full data
Ticker FB is not full data
Ticker DXC is not full data
Ticker NLSN is not full data
Ticker ATVI is not full data
Ticker PKI is not full data
Ticker WBA is not full data
Ticker VTRS is not full data
Ticker ZBH is not full data
Ticker DOW is not full data
Ticker DISCK is not full data
Ticker EVRG is not full data
Ticker KEYS is not full data
Ticker DXCM is not full data
Ticker AMCR is not full data
Ticker FLT is not full data
Ticker J is not full data
Ticker PBCT is not full data
Ticker DRE is not full data
Ticker LUMN is not full data
Ticker GOOGL is not full data
Ticker NLOK is not full data
Ticker ABMD is not full data
Ticker BIIB is not full data
Ticker VRTX is not full data


For how many firms do we have the full number of trading days? Let's use [the `length(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.length). Notice that this works for dictionaries, in addition to arrays and sets!

In [22]:
length(dataset)

424
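
The cleaning step above relies on knowing that `AAPL` has a complete record. As a hedged alternative sketch, the maximum record count can itself be found by iterating over the dictionary's values, and the filtering can be written with `filter`. The toy dictionary below uses plain vectors as a stand-in for the `DataFrame` values (so `length` plays the role of `nrow`), and every name in it is hypothetical. It assumes at least one entry has the complete number of records.

```julia
toy_result = let
    # hypothetical stand-in for the market data: ticker => vector of records
    toy_dataset = Dict(
        "AAA" => [1.0, 2.0, 3.0],
        "BBB" => [1.0, 2.0],
        "CCC" => [4.0, 5.0, 6.0],
    )

    # find the largest record count by iterating over the values
    max_records = 0
    for records ∈ values(toy_dataset)
        max_records = max(max_records, length(records))
    end

    # keep only the entries with the maximum number of records;
    # for a Dict, the predicate receives each key => value Pair
    cleaned = filter(pair -> length(pair.second) == max_records, toy_dataset)

    (max_records, sort(collect(keys(cleaned))))
end
```

Here `toy_result` is `(3, ["AAA", "CCC"])`: the entry with only two records is dropped.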

## Example 3: While loop iteration Pattern
A for loop is typically used when the number of iterations is known in advance. In contrast, a while loop is used when the number of iterations depends on a condition evaluated before each iteration, making it better suited to dynamic repetition. A while loop also has a header and a body, but they differ from those of a for loop:
> In a while loop, the header contains the loop condition evaluated before each iteration to determine whether the loop should continue. The body is the block of code that executes repeatedly as long as the condition in the header remains true.
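
A minimal sketch of this structure, with illustrative names and an arbitrary starting value: the loop keeps halving a number until it is no longer greater than one, and we count the iterations, which we do not specify in advance.

```julia
while_result = let
    value = 100.0     # illustrative starting value
    iterations = 0

    while value > 1.0      # header: the condition is checked before each iteration
        value /= 2         # body: halve the value
        iterations += 1    # body: count this iteration
    end

    (iterations, value)
end
```

Here `while_result` is `(7, 0.78125)`: after seven halvings the condition `value > 1.0` is false, so the loop stops.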