# Project: Decode Hidden Sums from a Set of Strings
This project gives students practice working with many of the types and topics we've been talking about, namely, [Strings](https://docs.julialang.org/en/v1/manual/strings/#man-strings), [Characters](https://docs.julialang.org/en/v1/manual/strings/#man-characters), [Arrays](https://docs.julialang.org/en/v1/manual/arrays/#man-arrays-1), [Dictionaries](https://docs.julialang.org/en/v1/base/collections/#Base.Dict) and [Loops](https://docs.julialang.org/en/v1/manual/control-flow/#man-loops-1) and developing [functions](https://docs.julialang.org/en/v1/manual/functions/#man-functions). The project is divided into two parts and was inspired by the [2023 Day 1 Advent of Code challenge](https://adventofcode.com/).

In this project, you will be given a set of strings, each containing a series of characters. Your task is to decode these strings to find the hidden sums they represent. The strings will contain characters that correspond to numbers as digits and numbers as words, and your goal is to extract these numbers and compute their sum.

However, before we get started on the project, we'll do some setup, and you'll develop some functions that will be useful for the project. Let's go!

___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.
* The [include command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). In addition to standard Julia libraries, we'll also use a few items from [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl).

In [None]:
include("Include.jl"); # setup the environment

### Data
Next, we'll load the data for this project using [the `MyStringDecodeChallengeDataset()` method exported by the `VLDataScienceMachineLearningPackage.jl` package](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/data/#VLDataScienceMachineLearningPackage.MyStringDecodeChallengeDataset). This method returns the project data in [a `NamedTuple` instance](https://docs.julialang.org/en/v1/base/base/#Core.NamedTuple). Let's save it in the `dataset::NamedTuple` variable:

In [None]:
dataset = MyStringDecodeChallengeDataset(); # NamedTuple dataset

The `dataset::NamedTuple` variable holds the project data, using a `key => value` arrangement. However, what are the keys? To get the names of the keys (fields) in `dataset::NamedTuple` we can use [the `keys(...)` method](https://docs.julialang.org/en/v1/base/arrays/#Base.keys-Tuple%7BAbstractArray%7D):

In [None]:
keys(dataset) # this returns the keys of the data in the dataset::NamedTuple

(:test_part_1, :test_part_2, :production)

We see that `dataset::NamedTuple` has three possible keys (fields):
* The field __test_part_1__ points to a `Vector{String}` which contains the testing data for Part 1 of the project. To access this data: `dataset.test_part_1`. The hidden sum for the part 1 test data is `142`.
* The field __test_part_2__ points to a `Vector{String}` which contains the testing data for Part 2 of the project. To access this data: `dataset.test_part_2`. The hidden sum for the part 2 test data is `299`.
* The field __production__ points to a `Vector{String}` which contains the production data for this project. To access this data: `dataset.production`. The hidden sum for the production data is `55172` when only numerical digits are counted (Part 1), and `54925` when both digits and number words are counted (Part 2).
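
To make the field access concrete, here is a minimal sketch using a hypothetical stand-in for the dataset. The `demo` tuple below is an assumption made for illustration: it reuses the three key names listed above, but its contents are made up and much shorter than the real data.

```julia
# Hypothetical stand-in for `dataset`: same key names, made-up contents
demo = (test_part_1 = ["1abc2", "treb7uchet"],
        test_part_2 = ["two1nine"],
        production = String[])

names = keys(demo)                    # tuple of Symbols naming the fields
by_dot = demo.test_part_1             # dot access by field name
by_symbol = demo[:test_part_1]        # equivalent lookup with a Symbol
has_production = haskey(demo, :production)

println(names)
println(by_dot == by_symbol)
```

Dot access and Symbol indexing return the same vector, so either style can be used with the real `dataset`.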

### Types and Functions
We'll model each record (line of text) using the mutable `MyPuzzleRecordModel` type, which has three fields:

In [10]:
mutable struct MyPuzzleRecordModel

    # data -
    record::String # original line of text
    characters::Array{Char, 1} # character array for the record
    len::Int64 # number of characters for the record

    # constructor -
    MyPuzzleRecordModel(record::String) = new(record, collect(record), length(record)); # wow, that is fancy!
end
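
As a quick illustration of what the inner constructor stores, the sketch below applies the same two calls, `collect` and `length`, to a single line of text. The variable names here are placeholders chosen for the example and are not part of the project code.

```julia
# What the constructor computes for one record, shown on a sample line
record = "1abc2"
characters = collect(record)   # Vector{Char} holding each character in order
len = length(record)           # number of characters (not bytes)

println(characters)
println(len)
```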

To build a collection of puzzle record models from the raw text held in `Vector{String}`, we formulate a build method:

In [12]:
function build(modeltype::Type{MyPuzzleRecordModel}, data::Vector{String})::Vector{MyPuzzleRecordModel}

    # initialize -
    models = Vector{modeltype}(); # load up some space

    # process each line of text
    for line ∈ data
        push!(models, modeltype(line));
    end

    return models; # return the vector of models to the caller.
end

build (generic function with 1 method)

Now that we can build the models, let's formulate the logic required to decode the text. The public `decode_part_1(...)` method loops over the models, while your decoding logic goes in the private `_decode_part_1(...)` implementation:

In [14]:
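function _decode_part_1(model::MyPuzzleRecordModel)::Int64

    # for this line, keep only the decimal digit characters (in order of appearance) -
    characters = model.characters;
    found = filter(isdigit, characters); # isdigit matches '0'-'9' only, so parse(...) below cannot fail

    # guard: a line without any digits has no code -
    isempty(found) && throw(ArgumentError("no digits found in record: $(model.record)"));

    # the code is the first digit followed by the last digit -
    value = Array{Char, 1}();
    push!(value, found[1]);
    push!(value, found[end]);

    # join the characters and parse the value -
    return value |> join |> x-> parse(Int64, x);
end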

function decode_part_1(models::Vector{MyPuzzleRecordModel})::Tuple{Int64, Dict{Int64, Int64}}

    # initialize -
    total = 0;
    codes = Dict{Int64, Int64}();
    
    # main -
    for i ∈ eachindex(models)
        model = models[i];
        codes[i] = _decode_part_1(model); # we call your part 1 inner logic here!

        # total the value -
        total += codes[i];
    end
    
    # Return the overall total, and the codes for each line
    return (total, codes);
end;

and the `decode_part_2(...)` and `_decode_part_2(...)` methods:

In [16]:
function _decode_part_2(model::MyPuzzleRecordModel)::Int

    # initialize -
    record = model.record;
    number_dictionary = Dict("one" => 1, "two" => 2, 
        "three" => 3, "four" => 4, "five" => 5, 
        "six" => 6, "seven" => 7, "eight" => 8, 
        "nine" => 9, "zero" => 0);

    # Replace number words with digits while protecting overlapping words (e.g., "eightwo").
    # Step 1: pad each word with a copy of its first and last character, e.g., "eight" becomes "eeightt".
    # Step 2: replace the inner word with its digit, e.g., "eeightt" becomes "e8t".
    # The padding keeps the shared letter, so "eightwo" ends up as "e8t2o" and both numbers survive.
    # Once all the words are replaced, we can use the _decode_part_1 function to parse the value!
    for (word, number) in number_dictionary
        if occursin(word, record)
            
            # replace the word with a modified variant -
            first_char = word[1] |> string;
            last_char = word[end] |> string;
            replacement_word = "$(first_char)$(word)$(last_char)";
            record = replace(record, word => replacement_word) |> x -> replace(x, word => number);
        end
    end

    # update the model (note: this overwrites the original record stored in the model) -
    model.record = record;
    model.characters = collect(record);

    # now, we can use the _decode_part_1 function to parse the value -
    return _decode_part_1(model);
end

function decode_part_2(models::Vector{MyPuzzleRecordModel})::Tuple{Int64, Dict{Int64, Int64}}

    # initialize -
    total = 0;
    codes = Dict{Int64, Int64}();
    
    # main -
    for i ∈ eachindex(models)
        model = models[i];
        codes[i] = _decode_part_2(model); # we call your part 2 inner logic here!

        # total the value -
        total += codes[i];
    end
    
    # Return the overall total, and the codes for each line
    return (total, codes);
end;
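
The padding trick described in the comments of `_decode_part_2(...)` can be traced by hand on the overlapping string `eightwo`. The sketch below is a standalone illustration, and the helper name `pad_and_replace` is hypothetical (it does not appear in the project code).

```julia
# Hypothetical helper: pad `word` with its first and last character, then
# replace the inner word with its digit. The padding preserves shared letters.
function pad_and_replace(record::AbstractString, word::AbstractString, number::Integer)
    padded = string(first(word), word, last(word))    # "eight" becomes "eeightt"
    return replace(replace(record, word => padded), word => string(number))
end

# Replace "eight" first, then "two"
step1 = pad_and_replace("eightwo", "eight", 8)
step2 = pad_and_replace(step1, "two", 2)

# Replace "two" first, then "eight"
alt1 = pad_and_replace("eightwo", "two", 2)
alt2 = pad_and_replace(alt1, "eight", 8)

println(step1, " -> ", step2)
println(alt1, " -> ", alt2)
```

For this example both replacement orders end at the same string, which matters because a `Dict` does not guarantee any particular iteration order.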

## Part 1: Numbers as Digits
Suppose we are given a text document consisting of lines of text, each containing a specific integer code that needs to be recovered. The code in each line can be found by combining the first and the last digit (in that order) that appear in the line, to form a single _two-digit integer_. The sum of these integers across all the lines is the _hidden sum_ that needs to be recovered for the document. 

For example, consider the `test_part_1` document that consists of the following four lines of encoded text:
```
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
```
In this example, the code values of these four lines are `12`, `38`, `15`, and `77`. Adding these together produces `142`, the hidden sum for the document.
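
The arithmetic for this test document can be checked with a short standalone sketch. The helper `first_last_code` is a hypothetical name used only for this illustration; it works directly on strings and does not rely on the `MyPuzzleRecordModel` type.

```julia
# Hypothetical helper: combine the first and last digit of a line into one integer
function first_last_code(line::AbstractString)::Int
    found = filter(isdigit, collect(line))    # keep only '0'-'9', in order
    isempty(found) && throw(ArgumentError("no digits found in \"$line\""))
    return parse(Int, string(found[1], found[end]))
end

lines = ["1abc2", "pqr3stu8vwx", "a1b2c3d4e5f", "treb7uchet"]
codes = first_last_code.(lines)    # broadcast over each line
total = sum(codes)

println(codes)
println(total)
```

Note that a line with a single digit, such as `treb7uchet`, uses that digit as both the first and the last, which gives `77`.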

In [18]:
result = build(MyPuzzleRecordModel, dataset.production) |> m-> decode_part_1(m)

(55172, Dict(719 => 63, 699 => 11, 831 => 83, 319 => 21, 687 => 33, 185 => 79, 823 => 55, 420 => 88, 525 => 33, 365 => 27…))

## Part 2: Numbers as Digits and Words
As it turns out, your calculation from Part 1 needs to be corrected. Some of the digits are spelled out with letters: `one`, `two`, `three`, `four`, `five`, `six`, `seven`, `eight`, and `nine` also count as valid digits. Given this new information, you now need to find the actual first and last digits on each line, where we consider both numerical digits and words. The sum of these integers is the _true hidden sum_ that needs to be recovered.

For example, consider the following eight lines of encoded text, provided in the `test_part_2.txt` file in the `data` directory:
```
two1nine
eightwothree
onetwoneight
abcone2threexyz
xtwone3four
4nineeightseven2
zoneight234
7pqrstsixteen
```
In this example, the code values are `29`, `83`, `18`, `13`, `24`, `42`, `14`, and `76`. Adding these codes together produces `299`, the true hidden sum.
* __Interesting wrinkle__: Several lines (for example, the 3rd and 7th) contain frameshift edge cases, i.e., number words that overlap by a single shared character, e.g., `twone`, `oneight`, or `sevenine`. Your code needs to handle these frameshift cases.
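
One way to see why overlapping words matter is to scan every position of a line and record a digit whenever a numeral or a number word starts there. The sketch below uses this position-scanning approach, which is an alternative technique and not necessarily the one used in `_decode_part_2(...)`. The function names are hypothetical, and only the words `one` through `nine` are assumed to count as digits.

```julia
# Hypothetical helper: collect every digit in a line, reading both numerals and words.
# Each position is checked on its own, so overlapping words such as "twone" yield 2 and 1.
function digits_with_words(line::AbstractString)::Vector{Int}
    words = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
    found = Int[]
    for i in eachindex(line)
        c = line[i]
        if isdigit(c)
            push!(found, parse(Int, c))
        else
            tail = SubString(line, i)    # the rest of the line starting at position i
            for (value, word) in enumerate(words)
                if startswith(tail, word)
                    push!(found, value)
                    break
                end
            end
        end
    end
    return found
end

# Hypothetical helper: first digit times ten plus last digit
function code_with_words(line::AbstractString)::Int
    found = digits_with_words(line)
    isempty(found) && throw(ArgumentError("no digits found in \"$line\""))
    return 10 * found[1] + found[end]
end

lines = ["two1nine", "eightwothree", "onetwoneight", "abcone2threexyz",
         "xtwone3four", "4nineeightseven2", "zoneight234", "7pqrstsixteen"]
codes = code_with_words.(lines)
total = sum(codes)

println(codes)
println(total)
```

Because no position is consumed by an earlier match, `zoneight234` is read as `1, 8, 2, 3, 4`, so its code is `14` rather than a value that skips the `one`.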

In [20]:
result = build(MyPuzzleRecordModel, dataset.production) |> m-> decode_part_2(m)

(54925, Dict(719 => 68, 699 => 43, 831 => 83, 319 => 22, 687 => 33, 185 => 79, 823 => 57, 420 => 58, 525 => 33, 365 => 22…))