# PS5: Classification of Consumer Credit Score
In this problem, we will construct, train, and evaluate a feedforward neural network to classify consumer credit scores.

* _What is a credit score?_ A credit score is a numerical expression based on analyzing a person's credit files to represent that person's creditworthiness. It is primarily based on credit report information, typically sourced from credit bureaus. Lenders use the score to evaluate the potential risk posed by lending money to consumers and to mitigate losses due to bad debt. The score is a three-digit number ranging from 300 to 850, with higher scores indicating lower risk.
* _Hypothetical scenario:_  You are a data scientist working in a global manufacturing company, e.g., [GM](https://www.gm.com), [Honda](https://automobiles.honda.com/?experience=shop&cid=search_google_hgr_low_general_namy-brand_sustain-general_nalng_brd_exact&gclsrc=aw.ds&gad_source=1&gad_campaignid=176299752&gbraid=0AAAAADrc52ELssuW_I6a3tbwGwhvLOOAk&gclid=Cj0KCQjwtpLABhC7ARIsALBOCVryBRVf6m-hXR3JBoFgGcKEg1W6vvyxd2j0qhXqHbrHY1qF3J2SapUaAix4EALw_wcB) or [Caterpillar](https://www.caterpillar.com), etc. Over the past few years, your company has collected basic bank details and gathered credit-related information on customers when they finance the purchase of products through your credit division, e.g., [GM Financial](https://www.gmfinancial.com/en-us/home.html), [Honda Financial Services](https://login.honda.com/hondafinance/s/login/?ec=302&startURL=%2Fhondafinance%2Fs%2F), or [CAT Financial](https://www.caterpillar.com/en/brands/cat-financial.html). You have been tasked with building an intelligent system to classify customers into credit score brackets to reduce manual efforts, speed up loan processing, and impact the top-line sales for your company.

### Tasks
Before you start, execute the `Run All Cells` command to check if you have any code or setup issues. Code issues, post them [to Ed Discussion](https://edstem.org/) - and let's get those fixed!

* __Task 1: Setup, Data, Constants__: In this task, we set up the computational environment, load the necessary packages, and prepare the training and test data for the consumer credit classification problem. We will also define any constants we use throughout the problem set.
* __Task 2: Construct and Train a Feedforward Credit Score Classifier__: In this task, we'll construct and train a feedforward neural network model using the training records encoded in the `training_dataset::Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}` array. This model will classify consumers' credit scores based on their features. We will use [the `Flux.jl` package](https://github.com/FluxML/Flux.jl) to build and train the model.
* __Task 3: Does the model generalize?__ In this task, we'll check the feedforward network's ability to generalize, i.e., calculate how well the network classifies records it has not seen. We'll compute the fraction of the training and test data that is correctly classified, and answer some design questions.

Tests throughout the notebook (and at the bottom section) help you determine if things are running correctly. Let's go! (Remember to answer the discussion questions.)
___

## Task 1: Setup, Data, and Prerequisites
In this task, we'll set up the computational environment, load the necessary packages, and prepare the `world(...)` function for our personal shopper problem. We will also define any constants we use throughout the problem set.

We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. 
* The `Include.jl` file also loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem. It checks for a `Manifest.toml` file; if it finds one, packages are loaded. Other packages are downloaded and then loaded.

In [1]:
include("Include.jl"); # This will load necessary packages and functions

### Data
Suppose we have a dataset $\mathcal{D} = \left\{(\mathbf{x}_{1},y_{1}),(\mathbf{x}_{2},y_{2}),\dots,(\mathbf{x}_{n},y_{n})\right\}$ containing $n$ records, where each record is a tuple $(\mathbf{x}, y)$, where $\mathbf{x} \in \mathbb{R}^m$ is a vector of features and $y \in \mathcal{Y}$ is a label. The label $y$ is a categorical variable that can take on one of three values: $\mathcal{Y} \equiv \left\{\texttt{Poor},\texttt{Standard},\texttt{Good}\right\}$. We'll start by loading the raw data. The raw dataset is a CSV file containing information about consumer credit score, and is [available on Kaggle](https://www.kaggle.com/datasets/sudhanshu2198/processed-data-credit-score?resource=download). 
* _What's in the dataset?_ The dataset contains apporimately 99k records, where each record contains `21` feature variables related to consumer credit, such as income, age, and other relevant attributes that may influence credit risk. The label variable for each record indicates whether the consumer is a good or bad credit risk, i.e., has a `(Poor | Standard | Good)` credit score. 

Let's load the raw dataset, and save it in the raw data in the `raw_data::DataFrame` variable.

In [2]:
raw_data = CSV.read(joinpath(_PATH_TO_DATA, "credit_score_dataset.csv"), DataFrame)

Row,Delay_from_due_date,Num_of_Delayed_Payment,Num_Credit_Inquiries,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Amount_invested_monthly,Monthly_Balance,Credit_Score,Credit_Mix,Payment_Behaviour,Age,Annual_Income,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Monthly_Inhand_Salary,Changed_Credit_Limit,Outstanding_Debt,Total_EMI_per_month
Unnamed: 0_level_1,Float64,Float64,Float64,Float64,Float64,String3,Float64,Float64,String15,String15,String,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,3.0,7.0,4.0,26.8226,265.0,No,80.4153,312.494,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
2,3.0,7.0,4.0,31.945,265.0,No,118.28,284.629,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
3,3.0,7.0,4.0,28.6094,267.0,No,81.6995,331.21,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
4,5.0,4.0,4.0,31.3779,268.0,No,199.458,223.451,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
5,6.0,4.0,4.0,24.7973,269.0,No,41.4202,341.489,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
6,8.0,4.0,4.0,27.2623,270.0,No,62.4302,340.479,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
7,3.0,8.0,4.0,22.5376,271.0,No,178.344,244.565,Good,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
8,3.0,6.0,4.0,23.9338,271.0,No,24.7852,358.124,Standard,Good,High_spent_Medium_value_payments,23.0,19114.1,3.0,4.0,3.0,4.0,1824.84,11.27,809.98,49.5749
9,3.0,4.0,2.0,24.464,319.0,No,104.292,470.691,Standard,Good,High_spent_Large_value_payments,28.0,34847.8,2.0,4.0,6.0,1.0,3037.99,5.42,605.03,18.8162
10,7.0,1.0,2.0,38.5508,320.0,No,40.3912,484.591,Good,Good,High_spent_Large_value_payments,28.0,34847.8,2.0,4.0,6.0,1.0,3037.99,5.42,605.03,18.8162


Next, let's do some data wrangling. In particular, we will convert the categorical variables to numerical variables, and then z-score the numerical variables (excluding the label variable). 
* The `Payment_of_Min_Amount` variable is a categorical variable with levels: `No`, abd `Yes`. Let's convert it to a numerical variable with levels: `No` $\rightarrow$ `-1` and `Yes` $\rightarrow$ `1`.
* The `Credit_Score` variable is a categorical variable with levels: `Poor`, `Standard`, and `Good`. Let's convert it to a numerical variable with levels: `Poor` $\rightarrow$ `1`, `Standard` $\rightarrow$ `2`, and `Good` $\rightarrow$ `3`.
* The `Credit_Mix` variable is a categorical variable with levels: `Bad`, `Standard`, and `Good`. Let's convert it to a numerical variable with levels: `Bad` $\rightarrow$ `1`, `Standard` $\rightarrow$ `2`, and `Good` $\rightarrow$ `3`.
The `Payment_Behaviour` variable is a categorical variable with multiple levels. Let's convert these to a numerical variable with levels: `Low_spent_Small_value_payments` $\rightarrow$ `1`, `Low_spent_Medium_value_payments` $\rightarrow$ `2`, `Low_spent_Large_value_payments` $\rightarrow$ `3`, `High_spent_Small_value_payments` $\rightarrow$ `4`, `High_spent_Medium_value_payments` $\rightarrow$ `5`, and `High_spent_Large_value_payments` $\rightarrow$ `6`.

In [3]:
dataset = let
    
    dataset = copy(raw_data); # make a copy of the raw data, so we can keep the original intact
    transform!(dataset, :Payment_of_Min_Amount => ByRow(x -> (x=="No" ? -1 : 1)) => :Payment_of_Min_Amount); # maps Payment_of_Min_Amount to -1,1
    transform!(dataset, :Credit_Score => ByRow(s -> convertcreditscore(s))  => :Credit_Score); # maps Credit_Score to 1,2,3
    transform!(dataset, :Credit_Mix => ByRow(s -> convertcreditmix(s))  => :Credit_Mix); # maps Credit_Mix to 1,2,3
    transform!(dataset, :Payment_Behaviour => ByRow(s -> convertcreditbehavior(s))  => :Payment_Behaviour); # maps Payment_Behaviour to 1,2,3,4,5,6
    categorical_fields = ["Credit_Mix", "Payment_Behaviour", "Credit_Score", "Payment_of_Min_Amount"];

    # names to convert -
    all_names = names(dataset)
    names_to_convert = Set{String}()
    for name in all_names
        if (in(name, categorical_fields) == false)
            push!(names_to_convert, name)
        end
    end

    # z-score standardization
    for name in names_to_convert
        name_symbol = Symbol(name);
        μ = mean(dataset[!, name_symbol])
        σ = std(dataset[!, name_symbol])
        transform!(dataset, name_symbol => ByRow(x -> (x - μ) / σ) => name_symbol); # standardize Delay_from_due_date
    end

    # convert categorical fields to categorical    
    dataset;
end

Row,Delay_from_due_date,Num_of_Delayed_Payment,Num_Credit_Inquiries,Credit_Utilization_Ratio,Credit_History_Age,Payment_of_Min_Amount,Amount_invested_monthly,Monthly_Balance,Credit_Score,Credit_Mix,Payment_Behaviour,Age,Annual_Income,Num_Bank_Accounts,Num_Credit_Card,Interest_Rate,Num_of_Loan,Monthly_Inhand_Salary,Changed_Credit_Limit,Outstanding_Debt,Total_EMI_per_month
Unnamed: 0_level_1,Float64,Float64,Float64,Float64,Float64,Int64,Float64,Float64,Int64,Int64,Int64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64,Float64
1,-1.22042,-1.01059,-0.459468,-1.06743,0.440109,-1,-0.581417,-0.424237,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
2,-1.22042,-1.01059,-0.459468,-0.0663653,0.440109,-1,-0.387021,-0.554212,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
3,-1.22042,-1.01059,-0.459468,-0.718247,0.46017,-1,-0.574824,-0.336938,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
4,-1.08554,-1.48906,-0.459468,-0.177194,0.470201,-1,0.0297401,-0.839574,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
5,-1.0181,-1.48906,-0.459468,-1.46323,0.480231,-1,-0.781615,-0.288991,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
6,-0.88321,-1.48906,-0.459468,-0.981512,0.490262,-1,-0.673751,-0.293702,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
7,-1.22042,-0.851097,-0.459468,-1.90486,0.500292,-1,-0.0786576,-0.741088,3,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
8,-1.22042,-1.17008,-0.459468,-1.632,0.500292,-1,-0.867017,-0.211398,2,1,5,-0.954179,-0.819564,-0.914032,-0.741333,-1.31966,0.190514,-0.744377,0.134091,-0.53368,-0.445004
9,-1.22042,-1.48906,-0.977305,-1.52837,0.981756,-1,-0.458836,0.313664,2,1,6,-0.489597,-0.4087,-1.29988,-0.741333,-0.976448,-1.0359,-0.363666,-0.76441,-0.711087,-0.689468
10,-0.950653,-1.96753,-0.977305,1.22463,0.991786,-1,-0.786897,0.378503,3,1,6,-0.489597,-0.4087,-1.29988,-0.741333,-0.976448,-1.0359,-0.363666,-0.76441,-0.711087,-0.689468


Next, let's package the whole dataset into a `Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}` data structure. We'll split the data into a training set and a test set in a few operations from now. The training set will be used to train the model, and the test set will be used to evaluate the model's performance.
* _Hmmm_? This is a strange data structure format. However, if we look more closely, this is a direct code representation of the $\mathcal{D}$ dataset. Moreover, the training loop for our classifier can take this data format directly, making training super easy and convenient! Also, notice: We convert everything to `Float32` here.

In [4]:
converted_dataset, features = let
    converted_dataset = Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}();
    number_digit_array = [1,2,3]; # this is the digits for the labels

    # which cols do we want to use as features - let's use all *but* the label -
    all_cols = names(dataset);
    features = Array{String,1}();
    for col in all_cols
        if col != :Credit_Score
            push!(features, col);
        end
    end
    features = features |> sort;
    number_of_features = length(features);
    
    # build record tuples -
    for record ∈ eachrow(dataset)

        # convert the label to a one-hot vector
        label = record[:Credit_Score]; # this is the label we want to predict 
        Y = onehot(label, number_digit_array); # convert the label to a one-hot vector
        X = Vector{Float32}();

        for i ∈ eachindex(features)
            feature = features[i]; # get the feature name
            value = record[feature] |> Float32; # get the value of the feature
            push!(X, value); # add the value to the feature vector
        end

        data_tuple = (X, Y); # create a tuple of the feature vector and the label
        push!(converted_dataset, data_tuple); # add the tuple to the dataset
    end

    converted_dataset, features;
end;

__Constants:__ Let's set up some constants that we will use throughout the problem set. See the comments next to each constant for a description of what it is, permissible values, its default value, etc.

In [5]:
θ = 0.60; # what fraction of the data to use for training. Default is 80%
number_of_training_samples = Int64(θ * length(converted_dataset)); # θ of the data will be used for training
number_of_test_samples = length(converted_dataset) - number_of_training_samples; # the rest will be used for testing
numner_of_features = length(features); # the number of features, i.e, the length of x
number_of_classes = 3; # the number of classes for the problem {Poor | Standard | Good}
number_of_epochs = 25; # how many epochs do we want to train for?
number_of_hidden_nodes = 2^10; # TODO: Change the width of the hidden layer

Finally, let's set up the `training_dataset::Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}` and `test_dataset::Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}` variables. The training dataset will be used to train the model, and the test dataset will be used to evaluate the model's performance.
* The training dataset will be a random sample of $\theta$% of the records in the scaled dataset. We will compute the indices of the training dataset using [the `rand(...)` method](https://docs.julialang.org/en/v1/stdlib/Random/#Base.rand}) in combination with [the `Set` datastructure](https://docs.julialang.org/en/v1/base/collections/#Base.Set) to ensure that the training data has unique indices.
* The test dataset will be the remaining $\left(1-\theta\right)$% of the records in the scaled dataset. We will compute the indices of the test dataset using [the `setdiff(...)` method](https://docs.julialang.org/en/v1/base/collections/#Base.setdiff) to ensure that the test data has unique indices.

In [6]:
training_dataset, test_dataset = let
    
    # initialize -
    training_dataset = Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}();
    test_dataset = Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}();
    number_of_samples = length(converted_dataset);
    all_index_set = range(1, stop=number_of_samples, step=1) |> Set{Int64};

    # generate a set of random indices for training and testing -
    random_training_index_set = Set{Int64}();
    while length(random_training_index_set) ≤ number_of_training_samples
        random_index = rand(1:number_of_samples);
        push!(random_training_index_set, random_index);
    end
    random_test_index_set = setdiff(all_index_set, random_training_index_set); # the rest of the indices will be used for testing
    
    # populate the training set -
    random_training_index_vector = random_training_index_set |> collect;
    for i ∈ eachindex(random_training_index_vector)
        index = random_training_index_vector[i];
        push!(training_dataset, converted_dataset[index]);
    end
    
    # populate the test set -
    random_test_index_vector = random_test_index_set |> collect;
    for i ∈ eachindex(random_test_index_vector)
        index = random_test_index_vector[i];
        push!(test_dataset, converted_dataset[index]);
    end

    # return -
    training_dataset, test_dataset;
end;

## Task 2: Construct and Train a Feedforward Credit Score Classifier
In this task, we'll construct and train a feedforward neural network model using the training records encoded in the `training_dataset::Vector{Tuple{Vector{Float32}, OneHotVector{UInt32}}}` array. This model will classify consumers' credit scores based on their features. We will use [the `Flux.jl` package](https://github.com/FluxML/Flux.jl) to build and train the model.

We build an empty model with default (random) parameter values but a fixed structure. The number and dimension of the layers and the activation functions for each layer are specified when we build the model (but we'll update the parameters during training).
* _Library_: We use [the `Flux.jl` machine learning library](https://github.com/FluxML/Flux.jl) to construct the neural network model. The model will have three layers: the input layer is a `number_of_features` $\times$ `number_of_hidden_nodes` layer with [tanh activation functions](https://en.wikipedia.org/wiki/Hyperbolic_functions#Tanh), the hidden layer is a `number_of_hidden_nodes` $\times$ `number_of_classes` layer and the output layer is the [softmax function](https://en.wikipedia.org/wiki/Softmax_function).
* _Syntax_: The [`Flux.jl` package](https://github.com/FluxML/Flux.jl) uses some next level syntax. The model is built using [the `Chain` function](https://fluxml.ai/Flux.jl/stable/reference/models/layers/#Flux.Chain), which takes a list of layers as input. Each layer is defined using the [`Dense` type](https://fluxml.ai/Flux.jl/stable/reference/models/layers/#Flux.Dense) (in this case), which takes the number of input and output neurons as arguments. The activation function is an additional argument to [the `Dense` type](https://fluxml.ai/Flux.jl/stable/reference/models/layers/#Flux.Dense). The final layer uses [the `softmax(...)` method exported by the `NNlib.jl` package](https://fluxml.ai/NNlib.jl/dev/reference/#Softmax) to produce a probability distribution over the classes.

In [7]:
# TODO: Uncomment the code below to build the model!
Flux.@layer MyFluxNeuralNetworkModel  trainable=(input, middle, hidden); # create a "namespaced" of sorts
MyModel() = MyFluxNeuralNetworkModel( # a strange type of constructor
    Chain(
        input = Dense(numner_of_features, number_of_hidden_nodes, tanh_fast),  # layer 1
        hidden = Dense(number_of_hidden_nodes, number_of_classes, tanh_fast), # layer 2
        output = NNlib.softmax) # layer 3 (output layer)
);
model = MyModel().chain;

__Loss function__: Next, specify the _loss function_ we will minimize to estimate the model parameters. We'll choose a loss function that is appropriate for a _multiclass classification problem_, namely a [logit cross-entropy loss function](https://fluxml.ai/Flux.jl/stable/reference/models/losses/#Flux.Losses.logitcrossentropy):
$$
\mathcal{L}(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{C} y_{ij}\log(p_{ij}(\theta))
$$
where the outer summation is over all $N$ training examples, and the inner summation is over the $C$ possible classes. The $y_{ij}$ is the one-hot encoded label for the $i$-th training example, and $p_{ij}$ is the predicted probability of the $i$-th training example being in class $j$. 

In [8]:
# TODO: Uncomment below to setup the loss function -
loss(ŷ, y) = Flux.Losses.logitcrossentropy(ŷ, y; agg = mean); # loss for training multiclass classifiers, what is the agg?

__Optimizer__. We'll use [Gradient descent with momentum](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum) where the `λ` parameter denotes the `learning rate` and `β` denotes the momentum parameter. We save information about the optimizer in the `opt_state` variable, which will eventually get passed to the training method.

In [9]:
λ = 0.61; # TODO: maybe change the learning rate (default: 0.61)?
β = 0.10; # TODO: maybe change the momentum parameter (default: 0.10)?
opt_state = Flux.setup(Momentum(λ,β), model);

We are now ready to train the model. If the `should_we_train = true,` then we use the [Gradient descent with momentum](https://en.wikipedia.org/wiki/Stochastic_gradient_descent#Momentum) to minimize a [logit cross-entropy loss function](https://fluxml.ai/Flux.jl/stable/reference/models/losses/#Flux.Losses.logitcrossentropy).
* _Restart_: Because the error landscape is non-convex, we must start from many different locations. We do `number_of_epochs` passes through the data, i.e., a forward pass for prediction and a backpropagation step for parameter updates. Although the training is a little opaque, intuition suggests that the library chooses different initial parameter guesses for each pass through the data and then drives these to convergence.
* _Training takes a long time_. For each complete pass through the data, i.e., for each `epoch,` we save a `tmp` file holding the network state... just in case of `BOOOOOOOOM.`  We also have some pre-trained models to load if the `should_we_train = false.`

In [10]:
trainedmodel = let

    should_we_train = true; # TODO: set this flag to {true | false}
    
    # training loop -
    localmodel = model; # make a local copy of the model
    if (should_we_train == true)
        for i = 1:number_of_epochs    
            
            # train the model - check out the do block notion: https://docs.julialang.org/en/v1/base/base/#do
            Flux.train!(localmodel, training_dataset, opt_state) do m, x, y
                loss(m(x), y)
            end
        
            # let the user know how we are doing -
            if (rem(i,2) == 0)
                @show "Epoch $i of $number_of_epochs completed" # print the epoch number
            end
        
            # save the state of the model, in case something happens. We can reload from this state
            jldsave(joinpath(_PATH_TO_DATA, "tmp-model-training-checkpoint.jld2"), model_state = Flux.state(localmodel))    
        end
    else
        # if we don't train: load up a previous model
        model_state = JLD2.load(joinpath(_PATH_TO_DATA, "Model-training-N$(number_of_hidden_nodes).jld2"), "model_state");
        Flux.loadmodel!(localmodel, model_state);
    end
    localmodel; # return the *trained* model (either freshly trained or loaded from a saved file)
end

"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 2 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 4 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 6 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 8 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 10 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 12 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 14 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 16 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 18 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 20 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 22 of 25 completed"
"Epoch $(i) of $(number_of_epochs) completed" = "Epoch 24 of 25 completed"


Chain(
  input = Dense(21 => 1024, tanh_fast),  [90m# 22_528 parameters[39m
  hidden = Dense(1024 => 3, tanh_fast),  [90m# 3_075 parameters[39m
  output = NNlib.softmax,
) [90m                  # Total: 4 arrays, [39m25_603 parameters, 100.215 KiB.

## Task 3: Does the model generalize?
In this task, we'll check the feedforward network's ability to generalize, i.e., calculate how well the network classifies records it has not seen. We'll compute the fraction of the training and test data that is correctly classified, and answer some design questions.

Let's start by computing the fraction of the `training_data` records that are correctly classified. This will help us understand how many of the `n` training samples we get correct and how many we get wrong. We expect to be _mostly correct_ on the training data.

In the code block below, we pass feature vectors $\mathbf{x}$ into the `model` instance, compute the predicted label `ŷ,` and compare the predicted and actual labels for the `training_dataset.`
* _Logic_: If the prediction and the actual label agree, we update the `S` variable (a running count of the number of correct classifications). Finally, we compute the fraction of _correct_ classifications by dividing the number of correct predictions by the total number of images in the `training` dataset.

In [11]:
let 
    S_training = 0;
    number_digit_array = [1,2,3]; # this is the digits for the labels
    for i ∈ eachindex(training_dataset)
    
        x = training_dataset[i][1];
        y = training_dataset[i][2];
        ŷ = trainedmodel(x) |> z-> argmax(z) |> z-> number_digit_array[z] |> z-> onehot(z,number_digit_array)
        y == ŷ ? S_training +=1 : nothing
    end
    correct_prediction_training = (S_training/number_of_training_samples)*100 |> x-> round(x, digits=2);
    println("Correct classification on the training data: $(correct_prediction_training)%");
end

Correct classification on the training data: 82.16%


In [12]:
do_I_see_a_traning_error_number = true; # TODO: set this flag to {true | false}

Next, look at the classification performance on the `test_dataset`. In the code block below, we pass feature vectors $\mathbf{x}$ into the `trainedmodel` instance, compute the predicted label `ŷ,` and compare the predicted and actual labels for the `training_dataset.` The logic is the same as above.

In [13]:
let
    S_testing = 0;
    number_digit_array = [1,2,3]; # this is the digits for the labels
    for i ∈ eachindex(test_dataset)
    
        x = test_dataset[i][1];
        y = test_dataset[i][2];
        ŷ = trainedmodel(x) |> z-> argmax(z) |> z-> number_digit_array[z] |> z-> onehot(z, number_digit_array)
        y == ŷ ? S_testing+=1 : nothing
    end
    correct_prediction_test = (S_testing/number_of_test_samples)*100 |> x-> round(x, digits=2);
    println("Correct classification on the test data: $(correct_prediction_test)%");
end

Correct classification on the test data: 81.81%


In [14]:
do_I_see_a_test_error_number = true; # TODO: set this flag to {true | false}

### Discussion Questions

__DQ1__: How sensitive is the model to the training data? Does it _overfit_, i.e., does it perform well on the training data but poorly on the test data?

In [15]:
## Put your answer to DQ (either as a commented code cell, or as a markdown cell)

In [16]:
did_I_answer_DQ1 = true; # TODO: Update this value {true | false} based on whether you answered DQ1 or not

__DQ2__: We made many design choices regarding the model architecture (layers, activation functions, etc.), the loss function, and the optimizer. The best performance that I've found so far (with the default design reported in the solution) is approximately 82% % correct classification (on both training and test). 
* _Can we improve_? How do our design choices affect the model's performance? What would you change if you had to do it again? What would you keep the same? Implement one of your ideas, run the training loop, and report the results. Do we beat 82\%?

In [17]:
## Put your answer to DQ (either as a commented code cell, or as a markdown cell)

In [18]:
did_I_answer_DQ2 = true; # TODO: Update this value {true | false} based on whether you answered DQ2 or not

## Tests
In the code block below, we check some values in your notebook and give you feedback on which items are correct or different. `Unhide` the code block below (if you are curious) about how we implemented the tests and what we are testing.

In [19]:
let
    @testset verbose = true "CHEME 5820 Problem Set 5 Test Suite" begin

        @testset "Task 1: Setup, Prerequisites and Data" begin
            @test _DID_INCLUDE_FILE_GET_CALLED == true
            @test isempty(training_dataset) == false
            @test isempty(test_dataset) == false
        end

        @testset "Task 2: Model" begin
            @test isnothing(model) == false
            @test isnothing(λ) == false
            @test isnothing(β) == false
            @test isnothing(trainedmodel) == false
        end

        @testset "Task 3: Generalization" begin
            @test do_I_see_a_traning_error_number  == true # Test training error
            @test do_I_see_a_test_error_number  == true # Test test error
            @test did_I_answer_DQ1 == true # Test DQ1 answer
            @test did_I_answer_DQ2 == true # Test DQ1 answer
        end
    end
end;

[0m[1mTest Summary:                           | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
CHEME 5820 Problem Set 5 Test Suite     | [32m  11  [39m[36m   11  [39m[0m0.3s
  Task 1: Setup, Prerequisites and Data | [32m   3  [39m[36m    3  [39m[0m0.3s
  Task 2: Model                         | [32m   4  [39m[36m    4  [39m[0m0.0s
  Task 3: Generalization                | [32m   4  [39m[36m    4  [39m[0m0.0s
