# Project: Decode Hidden Sums from a Set of Strings
This project gives students practice with many of the types and topics we've been covering: [Strings](https://docs.julialang.org/en/v1/manual/strings/#man-strings), [Characters](https://docs.julialang.org/en/v1/manual/strings/#man-characters), [Arrays](https://docs.julialang.org/en/v1/manual/arrays/#man-arrays-1), [Dictionaries](https://docs.julialang.org/en/v1/base/collections/#Base.Dict), [Loops](https://docs.julialang.org/en/v1/manual/control-flow/#man-loops-1), and writing [functions](https://docs.julialang.org/en/v1/manual/functions/#man-functions). The project is divided into two parts and was inspired by the [2023 Day 1 Advent of Code challenge](https://adventofcode.com/).

* In this project, you will be given a set of strings, each containing a mix of numerical characters (digits) and numbers spelled out as words. Your task is to decode these strings to recover the hidden sum they represent: extract a two-digit code from each string (built from its first and last digit), and compute the sum of these codes.

However, before we get started on the project, we'll do some setup, and you'll develop some useful functions. Let's go!!!

___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.
* The [include command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). 
* In addition to standard Julia libraries, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl); check out its documentation for more information on the functions and types used in this material.

In [3]:
include("Include.jl"); # setup the environment

### Data
Next, we'll load the data for this project using [the `MyStringDecodeChallengeDataset()` method exported by the `VLDataScienceMachineLearningPackage.jl` package](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.MyStringDecodeChallengeDataset). This method returns the project data in [a `NamedTuple` instance](https://docs.julialang.org/en/v1/base/base/#Core.NamedTuple). Let's save this in the `dataset::NamedTuple` variable:

In [5]:
dataset = MyStringDecodeChallengeDataset(); # NamedTuple dataset

The `dataset::NamedTuple` variable holds the project data, using a `key => value` arrangement. However, what are the keys? To get the names of the keys (fields) in `dataset::NamedTuple`, we can use [the `keys(...)` method](https://docs.julialang.org/en/v1/base/arrays/#Base.keys-Tuple%7BAbstractArray%7D):

In [7]:
keys(dataset) # this returns the keys of the data in the dataset::NamedTuple

(:test_part_1, :test_part_2, :production)

We see that the `dataset::NamedTuple` has three keys (fields):
* The field __test_part_1__ points to a `Vector{String}` which contains the testing data for Part 1 of the project. To access this data: `dataset.test_part_1`. The hidden sum for the part 1 test data is `142`.
* The field __test_part_2__ points to a `Vector{String}` which contains the testing data for Part 2 of the project. To access this data: `dataset.test_part_2`. The hidden sum for the part 2 test data is `299`.
* The field __production__ points to a `Vector{String}` which contains the production data for this project. To access this data: `dataset.production`. The hidden sum for the production data is `55172` when counting only digits (Part 1) and `54925` when counting both digits and number words (Part 2).

_How do we access the data?_ We can access the data from the `dataset::NamedTuple` variable using the field names, in combination with either the dot notation or the square bracket notation:

In [9]:
dataset.test_part_1 # dot notation: variable.fieldname returns the value of that field in the dataset

4-element Vector{String}:
 "1abc2"
 "pqr3stu8vwx"
 "a1b2c3d4e5f"
 "treb7uchet"

Alternatively, we can access the values in the `dataset::NamedTuple` using the square bracket notation (we pass in the key as a [special unique String-like type called a `Symbol`](https://docs.julialang.org/en/v1/base/base/#Core.Symbol)):

In [46]:
dataset[:test_part_1] # variable[:fieldname] also works where :fieldname is a Symbol

4-element Vector{String}:
 "1abc2"
 "pqr3stu8vwx"
 "a1b2c3d4e5f"
 "treb7uchet"

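`MyStringDecodeChallengeDataset()` needs the course package, so here is a hypothetical stand-in, `mock_dataset`, built by hand with the same three field names. The two test fields reuse the example lines shown in this notebook; `production` is left empty rather than invented. The sketch checks that dot notation, `Symbol` indexing, and `getfield` all return the same vector:

```julia
# Hypothetical stand-in with the same field names as the project dataset.
mock_dataset = (
    test_part_1 = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"],
    test_part_2 = ["two1nine", "eightwothree", "onetwoneight", "abcone2threexyz",
                   "xtwone3four", "4nineeightseven2", "zoneight234", "7pqrstsixteen"],
    production = String[],  # left empty here; the real lines come from the package
);

@show keys(mock_dataset)   # the field names, as a tuple of Symbols

# Dot notation, Symbol indexing, and getfield return the same Vector{String}.
a = mock_dataset.test_part_1
b = mock_dataset[:test_part_1]
c = getfield(mock_dataset, :test_part_1)
@show a === b && b === c   # true: the same object, not a copy
```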
### Types and Functions
In this section, we'll define some useful composite types and functions for the project. We'll start by defining a mutable type to represent each record in the dataset.

#### Types
We'll model each record (line of text) using the mutable `MyPuzzleRecordModel` type, which has three fields and a fancy constructor that creates and populates instances of `MyPuzzleRecordModel` from a `String`. The constructor takes a string, converts it to a vector of characters, and computes the length of the string (its number of characters).

In [13]:
"""
    MyPuzzleRecordModel

A mutable struct to represent a record in the puzzle dataset.

### Fields:
- `record::String`: The original line of text from the dataset.
- `characters::Array{Char, 1}`: An array of characters representing the record.
- `len::Int64`: The number of characters in the record.

The constructor takes a `record::String` and initializes the fields accordingly, 
converting the string into an array of characters and calculating its length.
"""
mutable struct MyPuzzleRecordModel

    # data -
    record::String # original line of text
    characters::Array{Char, 1} # character array for the record
    len::Int64 # number of characters for the record

    # constructor -
    MyPuzzleRecordModel(record::String) = new(record, collect(record), length(record)); # wow, that is fancy!
end;
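The constructor relies on two Base functions: `collect` turns a `String` into a `Vector{Char}` (an alias of `Array{Char, 1}`), and `length` counts characters rather than bytes. A quick check on one test line, plus a non-ASCII string that the puzzle data does not contain, shows how `length` differs from `ncodeunits`:

```julia
line = "treb7uchet"
chars = collect(line)                  # one Char per character of the string
@show typeof(chars)                    # Vector{Char}
@show length(line) == length(chars)    # true

# length counts characters; ncodeunits counts UTF-8 code units (bytes).
s = "né5"
@show length(s)       # 3
@show ncodeunits(s)   # 4, because 'é' takes two bytes in UTF-8
```

For the ASCII puzzle lines the two counts agree, so `len` is simply the number of characters.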

#### Functions
To build a collection of puzzle record models from the raw text held in a `Vector{String}` collection, we formulate a `build(...)` method:

In [15]:
"""
    build(modeltype::Type{MyPuzzleRecordModel}, data::Vector{String}) -> Vector{MyPuzzleRecordModel}

Builds a vector of populated `MyPuzzleRecordModel` instances from a vector of string records.

### Arguments:
- `modeltype::Type{MyPuzzleRecordModel}`: The type of models we want to build.
- `data::Vector{String}`: A vector of encoded strings, each representing a record.

### Returns:
- `Vector{MyPuzzleRecordModel}`: A vector of `MyPuzzleRecordModel` instances, each initialized with a record from the input data.
"""
function build(modeltype::Type{MyPuzzleRecordModel}, data::Vector{String})::Vector{MyPuzzleRecordModel}

    # initialize -
    models = Vector{modeltype}(); # allocate space for the models

    # TODO: Use a direct iteration pattern and the push! function to populate the models::Vector{modeltype}
    # TODO: for line ∈ data build a MyPuzzleRecordModel model using the provided constructor, 
    # TODO: push the model into the models vector using the push! function.
    for line ∈ data
        push!(models, modeltype(line));
    end

    return models; # return the vector of models to the caller.
end;
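The `for`/`push!` loop in `build(...)` can also be written as an array comprehension, which produces the result vector in one expression. A minimal sketch, using a hypothetical immutable `RecordSketch` type (not part of the project) so the block runs on its own:

```julia
# Hypothetical stand-in for MyPuzzleRecordModel, used only in this sketch.
struct RecordSketch
    record::String
    characters::Vector{Char}
    len::Int64
    RecordSketch(record::String) = new(record, collect(record), length(record))
end

# Same direct-iteration pattern as build(...), expressed as a comprehension.
build_sketch(data::Vector{String})::Vector{RecordSketch} = [RecordSketch(line) for line ∈ data]

sketch_models = build_sketch(["1abc2", "treb7uchet"])
```

`map(RecordSketch, data)` is an equivalent one-liner; the explicit loop in `build(...)` is easier to extend if each record needs extra processing.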

Now that we can build the `MyPuzzleRecordModel` instances, let's formulate the logic required to decode the text in the `decode_part_1(...)` method. 
We'll implement this (and the part 2 methods) using an encapsulation with internal delegation strategy:
* _What_? This approach (which is a personal favorite) defines a public method `decode_part_1(...)` that will be the public entry point for decoding the strings, and it will delegate the actual decoding logic to a private method `_decode_part_1(...)`. This lets us keep the public interface clean while encapsulating the implementation details in the private method.
* _Private methods_? The private method `_decode_part_1(...)` will be defined with a leading underscore (`_`) to indicate that it is intended for internal use only (users should not call this method directly). This is a common Julia convention for marking a method as private. Other languages enforce visibility with access modifiers like `public`, `private`, or `protected`; in Julia, we'll rely on naming to signal the intended visibility of methods (a convention, not an enforced rule).
* _Why_? This encapsulation with an internal delegation strategy is useful because it allows us to change the implementation of the decoding logic without affecting the public interface. It also makes the code more readable and maintainable by separating the concerns of the public interface from the implementation details.
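A minimal sketch of the public-entry / private-worker split on a toy problem (counting vowels) rather than the puzzle, so the structure is visible without the decoding logic. The names `count_vowels` and `_count_vowels` are hypothetical; the public method mirrors the `(total, codes)` return shape used by `decode_part_1(...)`:

```julia
# Private worker: handles a single item. The leading underscore signals "internal use".
_count_vowels(word::String)::Int64 = count(c -> c ∈ "aeiou", word)

# Public entry point: iterates over the inputs and delegates the per-item work.
function count_vowels(words::Vector{String})::Tuple{Int64, Dict{Int64, Int64}}
    counts = Dict{Int64, Int64}()
    for i ∈ eachindex(words)
        counts[i] = _count_vowels(words[i])   # delegate to the private method
    end
    return (sum(values(counts); init = 0), counts)
end

total, counts = count_vowels(["puzzle", "decode"])
```

Swapping in a different `_count_vowels` implementation leaves every caller of `count_vowels` untouched, which is the point of the delegation.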

Implement the `_decode_part_1(...)` method:

In [17]:
# TODO: Implement the _decode_part_1 function to decode a single record.
# TODO: This function processes a record, and computes the numeric value from the characters in the record.
function _decode_part_1(model::MyPuzzleRecordModel)::Int64

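    # for this line, get the characters -
    characters = model.characters;
    digit_chars = filter(isdigit, characters); # keep only the ASCII digits '0' through '9'
    
    # build the two-digit code: first digit, then last digit (the same digit if the line has only one) -
    value = Array{Char, 1}();
    push!(value, digit_chars[1]);
    push!(value, digit_chars[end]);

    # join the characters and parse the value -
    return value |> join |> x-> parse(Int64, x);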
end

"""
    decode_part_1(models::Vector{MyPuzzleRecordModel}) -> Tuple{Int64, Dict{Int64, Int64}}

Decodes a vector of `MyPuzzleRecordModel` instances to compute a total value and a mapping of lines to decoded values 
for numbers listed as digits in the records. The logic for this case is encoded in the `_decode_part_1` function.

### Arguments:
- `models::Vector{MyPuzzleRecordModel}`: A vector of `MyPuzzleRecordModel` instances, each representing a record.

### Returns:
- `Tuple{Int64, Dict{Int64, Int64}}`: A tuple containing:
  - An `Int64` representing the total value of all decoded records.
  - A `Dict{Int64, Int64}` mapping each record index to its decoded value.
"""
function decode_part_1(models::Vector{MyPuzzleRecordModel})::Tuple{Int64, Dict{Int64, Int64}}

    # initialize -
    total = 0;
    codes = Dict{Int64, Int64}();
    
    # main -
    for i ∈ eachindex(models)
        model = models[i];
        codes[i] = _decode_part_1(model); # we call your part 1 inner logic here!

        # total the value -
        total += codes[i];
    end
    
    # Return the overall total, and the codes for each line
    return (total, codes);
end;
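`_decode_part_1` passes the characters it keeps to `parse(Int64, ...)`, so the filter predicate matters. `isnumeric` accepts any character in a Unicode Number category, while `isdigit` accepts only the ASCII digits `0` through `9`. For the ASCII puzzle data both give the same result; this check, on a made-up line containing `½`, shows where they diverge:

```julia
# isnumeric: any Unicode "Number" category; isdigit: only '0' through '9'.
for c in ['7', '½', '²']
    println(repr(c), "  isnumeric=", isnumeric(c), "  isdigit=", isdigit(c))
end

line = "a½b3c9"                                      # hypothetical non-ASCII input
kept_numeric = filter(isnumeric, collect(line))      # keeps '½' as well
kept_digit   = filter(isdigit, collect(line))        # keeps only '3' and '9'

# parse rejects '½', so only the isdigit-filtered characters are safe to parse.
code = parse(Int64, join([kept_digit[1], kept_digit[end]]))
parse_failed = try
    parse(Int64, join([kept_numeric[1], kept_numeric[end]]))
    false
catch err
    err isa ArgumentError
end
```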

and the `_decode_part_2(...)` method:

In [19]:
# TODO: Implement the _decode_part_2 function to decode a single record.
# TODO: This function processes a record, and computes the numeric value from the characters in the record.
function _decode_part_2(model::MyPuzzleRecordModel)::Int

    # initialize -
    record = model.record;
    number_dictionary = Dict("one" => 1, "two" => 2, 
        "three" => 3, "four" => 4, "five" => 5, 
        "six" => 6, "seven" => 7, "eight" => 8, 
        "nine" => 9, "zero" => 0);

    # Replace the number words with digits. For each word, we first wrap it with an extra copy of its own first and
    # last letter (e.g., "eight" -> "eeightt") and then replace the word with its digit. The extra letters keep any
    # overlapping neighbor intact. For example, "eightwo" -> "eeighttwo" -> "e8two", and then
    # "e8two" -> "e8ttwoo" -> "e8t2o". Once all the words are replaced, _decode_part_1 can parse the value!
    for (word, number) in number_dictionary
        if occursin(word, record)
            
            # replace the word with a modified variant -
            first_char = word[1] |> string;
            last_char = word[end] |> string;
            replacement_word = "$(first_char)$(word)$(last_char)";
            record = replace(record, word => replacement_word) |> x -> replace(x, word => number);
        end
    end

    # update the model -
    model.record = record;
    model.characters = collect(record);
    return _decode_part_1(model); # now, we can use the _decode_part_1 function to parse the value -
end


"""
    decode_part_2(models::Vector{MyPuzzleRecordModel}) -> Tuple{Int64, Dict{Int64, Int64}}

Decodes a vector of `MyPuzzleRecordModel` instances to compute a total value and a mapping of lines to decoded values 
for numbers listed as both digits and words in the records. The logic for this case is encoded in the `_decode_part_2` function.

### Arguments:
- `models::Vector{MyPuzzleRecordModel}`: A vector of `MyPuzzleRecordModel` instances, each representing a record.

### Returns:
- `Tuple{Int64, Dict{Int64, Int64}}`: A tuple containing:
  - An `Int64` representing the total value of all decoded records.
  - A `Dict{Int64, Int64}` mapping each record index to its decoded value.
"""
function decode_part_2(models::Vector{MyPuzzleRecordModel})::Tuple{Int64, Dict{Int64, Int64}}

    # initialize -
    total = 0;
    codes = Dict{Int64, Int64}();
    
    # main -
    for i ∈ eachindex(models)
        model = models[i];
        codes[i] = _decode_part_2(model); # we call your part 2 inner logic here!

        # total the value -
        total += codes[i];
    end
    
    # Return the overall total, and the codes for each line
    return (total, codes);
end;
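`_decode_part_2` iterates over a `Dict`, whose iteration order is not guaranteed, so the wrap-then-replace trick must give the same digits whatever order the words are processed in. A standalone check, using a hypothetical `wrap_replace` helper that applies the same two `replace` calls as the loop in `_decode_part_2`, on the overlapping pairs `eightwo` and `twone`:

```julia
# Hypothetical helper mirroring the loop body in _decode_part_2:
# wrap each word with its own first/last letter, then replace the word with its digit.
function wrap_replace(record::String, words)::String
    for (word, number) in words
        wrapped = string(first(word), word, last(word))   # "eight" -> "eeightt"
        record = replace(record, word => wrapped)
        record = replace(record, word => string(number))
    end
    return record
end

# Process the words in both possible orders.
a1 = wrap_replace("eightwo", ["eight" => 8, "two" => 2])
a2 = wrap_replace("eightwo", ["two" => 2, "eight" => 8])
b1 = wrap_replace("twone", ["two" => 2, "one" => 1])
b2 = wrap_replace("twone", ["one" => 1, "two" => 2])

@show filter(isdigit, a1) == filter(isdigit, a2) == "82"
@show filter(isdigit, b1) == filter(isdigit, b2) == "21"
```

The trick works because, among `one` through `nine`, adjacent words overlap by at most one letter, so padding each word with one copy of its first and last letter is enough to keep its neighbor intact.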

## Part 1: Numbers listed as digits
Suppose we are given a text document consisting of lines containing a specific integer code that needs to be recovered. The code in each line can be found by combining the first and the last digit (in that order) that appear in the line, to form a single _two-digit integer_. The sum of these integers across all the lines is the _hidden sum_ that needs to be recovered for the document. 

For example, consider the `test_part_1` document that consists of the following four lines of encoded text:
```
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
```
In this example, the code values of these four lines are `12`, `38`, `15`, and `77`. Adding these together produces `142`, the hidden sum for the document.
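The same rule can be expressed without building an intermediate character array: `findfirst` and `findlast` with the `isdigit` predicate return the indices of the first and last digit. A minimal sketch (the name `line_code` is hypothetical, and it assumes each line has at least one digit, as every example line does) that reproduces the `142` total:

```julia
# Two-digit code for one line: first digit in the tens place, last digit in the ones place.
function line_code(line::AbstractString)::Int64
    i = findfirst(isdigit, line)   # index of the first digit, or nothing
    j = findlast(isdigit, line)    # index of the last digit, or nothing
    i === nothing && throw(ArgumentError("no digit found in \"$line\""))
    return 10 * (line[i] - '0') + (line[j] - '0')   # Char - Char gives an Int
end

lines = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]
codes = line_code.(lines)    # [12, 38, 15, 77]
hidden_sum = sum(codes)      # 142
```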

### Test your implementation
Let's test your part 1 implementation using the `test_part_1` data from the `dataset::NamedTuple` variable. If your implementation is correct, the result should be `142`. This type of _unit test_ will help you verify that your implementation is correct.
* _What_? A unit test is a small, usually automated, test that verifies the correctness of a specific function in isolation, e.g., the `decode_part_1(...)` method. It ensures that a unit of code behaves as expected under defined inputs and conditions.

In the code block below, we will call the `decode_part_1(...)` method with the `test_part_1` data from the `dataset::NamedTuple` variable, and then we will compare the result to the expected value of `142` using [the @assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert):
* _Expectation_: If the result does not match the expected value of `142`, the @assert macro will throw an error, indicating that the test has failed. If the result matches the expected value, the test will pass silently.

Complete the `decode_part_1(...)` method unit test logic in the code block below:

In [22]:
let
    # initialize -
    input_data = dataset.test_part_1; # or dataset.production for the production data
    expected_result = 142; # value we expect to get from the test_part_1 data
    
    # Call the build function to create the models with the test_part_1 data, then pass the models to the decode_part_1 method
    calculated_result = build(MyPuzzleRecordModel, input_data) |> m-> decode_part_1(m);

    # assert the result is as expected -
    @assert calculated_result[1] == expected_result "Part 1: Expected total value to be $(expected_result), but got $(calculated_result[1])";
end
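The Julia manual notes that `@assert` may be disabled at some optimization levels, so it is best treated as a debugging aid. For checks you want to keep, the `Test` standard library provides `@test` and `@testset`, which record passes and failures and print a summary. A minimal sketch, with hypothetical inline helpers standing in for `decode_part_1(...)` so the block runs on its own:

```julia
using Test

# Hypothetical stand-ins for decode_part_1: first/last-digit code per line, summed over lines.
first_last(l) = 10 * (l[findfirst(isdigit, l)] - '0') + (l[findlast(isdigit, l)] - '0')
sketch_total(lines) = sum(first_last, lines)

@testset "Part 1 decoding" begin
    @test sketch_total(["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]) == 142
    @test first_last("treb7uchet") == 77   # a single digit is both first and last
end
```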

Once you have passed the unit test, we can test the part 1 decode logic using the production data from the `dataset::NamedTuple` variable. If your implementation is correct, the result should be `55172`.

Write a unit test for the `decode_part_1(...)` method using the production data from the `dataset::NamedTuple` variable in the code block below:

In [24]:
let

    # initialize -
    input_data = dataset.production; # production data
    expected_result = 55172; # value from the production data
    
    # Call the build function to create the models with the production data, then pass the models to the decode_part_1 method
    calculated_result = build(MyPuzzleRecordModel, input_data) |> m-> decode_part_1(m);

    # assert the result is as expected -
    @assert calculated_result[1] == expected_result "Part 1: Expected total value to be $(expected_result), but got $(calculated_result[1])";
end

## Part 2: Numbers listed as digits and words
Your calculation from Part 1 needs to be expanded to handle encoded lines that contain both digits and words.

Some of the digits are spelled out with letters: `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, and `nine` also count as valid digits. Given this new information, you now need to find the actual first and last digits on each line, considering both numerical digits and number words. The sum of the resulting two-digit codes is the actual hidden sum that needs to be recovered.

For example, consider the following eight lines of encoded text, provided in the `test_part_2` data:
```
two1nine
eightwothree
onetwoneight
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
```
In this example, the code values are `29`, `83`, `18`, `13`, `24`, `42`, `14`, and `76`. Adding these `codes` together produces `299`, the hidden sum. However, there is a wrinkle.

_Interesting wrinkle_: The `2nd`, `3rd`, `5th`, and `7th` lines contain _frameshift_ edge cases, i.e., number words that share a single overlapping letter where one word ends and the next begins, e.g., `eightwo`, `twone`, `oneight`, or `sevenine`. Your code needs to handle these additional _frameshift_ cases.
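One way to see why overlaps need not cause trouble is to scan every position of a line and ask whether a digit or a number word *starts* there; because each position is checked independently, `twone` yields both `2` and `1`. This is a different technique from the wrap-and-replace loop in `_decode_part_2`, sketched here with the hypothetical names `NUMBER_WORDS`, `scan_digits`, and `line_value` as a cross-check that reproduces the `299` total:

```julia
# Valid number words; a word's position in the vector is its value (one = 1, ..., nine = 9).
NUMBER_WORDS = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

# Collect every digit in order, checking each position for a digit or a word starting there.
function scan_digits(line::String)::Vector{Int64}
    found = Int64[]
    for i ∈ eachindex(line)
        if isdigit(line[i])
            push!(found, line[i] - '0')
        else
            k = findfirst(w -> startswith(SubString(line, i), w), NUMBER_WORDS)
            k === nothing || push!(found, k)
        end
    end
    return found
end

function line_value(line::String)::Int64
    d = scan_digits(line)
    return 10 * d[1] + d[end]
end

test_lines = ["two1nine", "eightwothree", "onetwoneight", "abcone2threexyz",
              "xtwone3four", "4nineeightseven2", "zoneight234", "7pqrstsixteen"]
values_2 = line_value.(test_lines)   # [29, 83, 18, 13, 24, 42, 14, 76]
total_2 = sum(values_2)              # 299
```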

### Test your implementation
Let's test your part 2 implementation using the `test_part_2` data from the `dataset::NamedTuple` variable. If your implementation is correct, the result should be `299`. 

In the code block below, we will call the `decode_part_2(...)` method with the `test_part_2` data from the `dataset::NamedTuple` variable, and then we will compare the result to the expected value of `299` using [the @assert macro](https://docs.julialang.org/en/v1/base/base/#Base.@assert):
* _Expectation_: If the result does not match the expected value of `299`, the @assert macro will throw an error, indicating that the test has failed. If the result matches the expected value, the test will pass silently.

Complete the `decode_part_2(...)` method unit test logic in the code block below:

In [27]:
let

    # initialize -
    input_data = dataset.test_part_2; # or dataset.production for the production data
    expected_result = 299; # value we expect to get from the test_part_2 data
    
    # Call the build function to create the models with the test_part_2 data, then pass the models to the decode_part_2 method
    calculated_result = build(MyPuzzleRecordModel, input_data) |> m-> decode_part_2(m);

    # assert the result is as expected -
    @assert calculated_result[1] == expected_result "Part 2: Expected total value to be $(expected_result), but got $(calculated_result[1])";
end

Once you have passed the unit test, we can test the part 2 decode logic using the production data from the `dataset::NamedTuple` variable. If your implementation is correct, the result should be `54925`.

Write a unit test for the `decode_part_2(...)` method using the production data from the `dataset::NamedTuple` variable in the code block below:

In [29]:
let

    # initialize -
    input_data = dataset.production; # production data
    expected_result = 54925; # value from the production data
    
    # Call the build function to create the models with the production data, then pass the models to the decode_part_2 method
    calculated_result = build(MyPuzzleRecordModel, input_data) |> m-> decode_part_2(m);

    # assert the result is as expected -
    @assert calculated_result[1] == expected_result "Part 2: Expected total value to be $(expected_result), but got $(calculated_result[1])";
end