# Activity: Introduction to Functions, Scope and Error Handling
In this activity, you will practice defining functions, understanding scope, and handling errors in Julia. 
Although the activity is written for Julia, you can adapt it to the programming language of your choice.

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.
* __Include__: The [include command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). In addition to standard Julia libraries, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl).

In [11]:
include("Include.jl");

### Types and Functions
Let's implement a function to build an [instance of a `Normal(...)` distribution](https://docs.julialang.org/en/v1/stdlib/Distributions/#Distributions.Normal), which is a type exported from [the `Distributions.jl` package](https://github.com/JuliaStats/Distributions.jl). The function takes the distribution type as its first argument and two keyword arguments, the mean `μ` and the standard deviation `σ`, and returns an instance of the distribution (or `nothing` if the distribution cannot be constructed).

In [22]:
function build(distribution::Type{T}; 
    μ::Float64 = 0.0, σ::Float64 = 1.0)::Union{ContinuousUnivariateDistribution, Nothing} where T <: ContinuousUnivariateDistribution
    
    # initialize -
    d = nothing; # default value of d is nothing

    # Approach 1: Ungraceful check: if σ is not positive, throw an assertion error
    # @assert σ > 0.0 "Standard deviation must be positive"
    
    # Approach 2: Aggressive programming: if σ is not positive, force it to be positive
    # σ = abs(σ) # force σ to be positive - aggressive programming!!
    # d = T(μ, σ); # create a distribution of type T with parameters μ and σ

    # Approach 3: Defensive programming: if σ is not positive, warn the user, and return nothing
    # if (σ <= 0.0)
    #     @warn "Standard deviation must be positive. Returning nothing."
    #     return d;
    # end

    # Approach 4: Exception handling: try to build the distribution; if the constructor throws, log the error and return nothing
    try
        d = T(μ, σ); # create a distribution of type T with parameters μ and σ
    catch err
        @error "Catch block: Failed to create distribution of type $(T): $(err)"
        return d; # return nothing if an error occurs
    end

    # this will return a distribution of type T
    return d;
end;

Next, implement a function to sample from the normal distribution. This function will take an instance of the distribution and a number of samples to draw from it, returning an array of sampled values.

In [31]:
function sample(distribution::ContinuousUnivariateDistribution, 
    number_of_samples::Int64 = 100)::Union{Nothing, Array{Float64,1}}
    
    # check: must have a positive number of samples
    # Let's use Approach 3: Defensive programming to check that the number of samples is positive
    if (number_of_samples <= 0)
        @warn "Number of samples must be positive. Returning nothing."
        return nothing; # return nothing if the number of samples is not positive
    end
    
    
    # calls the rand function from the Distributions package
    return rand(distribution, number_of_samples) 
end;

## Task 1: Create a Normal Distribution Model
In this task, we'll use the `build(...)` function to create an instance of a Normal distribution with specified parameters. We'll first do the happy path, where the parameters are valid, and then handle an error case where the standard deviation is negative; your function should gracefully handle this error.

### Happy Path
On the happy path, all is correct and right with the world! Users behave rationally and call functions with the argument values that we expect. Create an instance of a Normal distribution with parameters $\mu = 0.25$ and $\sigma = 2.5$ using the `build(...)` method that you implemented above. Store the instance in a variable called `d`:

In [24]:
d = let
    
    # initialize -
    μ = 0.25; # mean specified in the problem
    σ = 2.5; # standard deviation specified in the problem

    # call the build function to create a distribution
    distribution = build(Normal; μ=μ, σ=σ);

    distribution; # return to the caller
end;

__Check__: Do we get what we expect? The `d` variable should be an instance of the `Normal` distribution with the specified parameters. Let's develop a test to check this.
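
One way such a check could look is sketched below. It assumes the `Distributions.jl` package is installed (the notebook loads it through `Include.jl`) and uses the `Test` standard library. To keep the snippet self-contained, `d` is constructed directly with `Normal(0.25, 2.5)` as a stand-in for the value returned by `build(Normal; μ=0.25, σ=2.5)`.

```julia
using Distributions  # assumed to be installed; the notebook loads it via Include.jl
using Test           # standard library

# stand-in for the `d` returned by build(Normal; μ=0.25, σ=2.5)
d = Normal(0.25, 2.5)

@test d isa Normal               # correct distribution type
@test mean(d) ≈ 0.25             # mean parameter μ
@test std(d) ≈ 2.5               # standard deviation parameter σ
@test params(d) == (0.25, 2.5)   # (μ, σ) as a tuple
```

Each `@test` reports a pass or a failure with the offending expression, so a wrong parameter shows up immediately instead of surfacing later in the sampling step.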

### Error cases
In the error case, we will call the `build(...)` function with a negative standard deviation. The function should raise an error, which we will catch and handle gracefully.
* _What is graceful error handling?_ Graceful error handling means that the program provides a meaningful message to the user about what went wrong, and some direction on how to fix it. Depending on the context, it may also mean that the program terminates in a controlled manner, rather than crashing unexpectedly, or that it continues to run in a safe state.

We'll consider four different approaches to error handling in this task, starting with an ungraceful error handling approach and then moving to more graceful methods.

#### Approach 1: Assert statements
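Let's start with an __ungraceful__ error handling case, where we use [an `@assert` statement](https://docs.julialang.org/en/v1/base/base/#Base.@assert) to check that the standard deviation is positive (the assertion in `build(...)` is `σ > 0.0`). Remove the comment on the `@assert` line in Approach 1 and reload the `build(...)` function. If the assertion fails, it raises an `AssertionError` and stops the program.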

In [15]:
let
    # initialize -
    μ = 0.25; # mean specified in the problem
    σ = -2.5; # standard deviation specified in the problem

    # what happens in this case?
    distribution = build(Normal; μ=μ, σ=σ);

    # check: do we get anything back?
    if distribution === nothing
        @info "No distribution created due to invalid parameters."
    else
        @info "Distribution created successfully: $(distribution)"
    end
end

AssertionError: Standard deviation must be positive

__Is this a good approach?__ No, this is not a good approach for error handling in production code. Assertions are typically used for debugging and should not be relied upon for user input validation. Notice that the program stops execution if the assertion fails, which is not a graceful way to handle errors (we are not returning a distribution instance; we are just stopping the program).
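
One alternative to `@assert` for validating input is to throw an explicit exception. The sketch below uses a hypothetical helper, `checked_sigma`, which is not part of the notebook. The Julia documentation for `@assert` notes that assertions may be disabled at some optimization levels, so an explicit `throw` is the safer choice when the check must always run. The caller can then decide how to respond:

```julia
# hypothetical helper (not part of the notebook): validate σ with an explicit exception
function checked_sigma(σ::Float64)::Float64
    if σ <= 0.0
        # DomainError carries the offending value and a message
        throw(DomainError(σ, "Standard deviation must be positive"))
    end
    return σ
end

checked_sigma(2.5)  # valid input: returns the value unchanged

# invalid input: the caller decides what to do with the exception
try
    checked_sigma(-2.5)
catch err
    @info "Caught a $(typeof(err)) for the value $(err.val)"
end
```

Unlike the bare `@assert`, the exception type tells the caller what kind of problem occurred, which is what makes the try-catch approach later in this task possible.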

#### Approach 2: Automatic correction
Next, we will take an aggressive stance, in which we _automatically correct_ the error by taking the absolute value of the standard deviation. This is not a recommended practice, as it can lead to unexpected results, but it is an example of how to handle errors in a way that allows the program to continue running.

Comment out the `@assert` statement, remove the comments on Approach 2, and reload the `build(...)` function. We now expect the program to return a distribution instance with a positive standard deviation, even if the user provided a negative value.

In [18]:
let
    # initialize -
    μ = 0.25; # mean specified in the problem
    σ = -2.5; # standard deviation specified in the problem

    # what happens in this case?
    distribution = build(Normal; μ=μ, σ=σ);

    # check: do we get anything back?
    if distribution === nothing
        @info "No distribution created due to invalid parameters."
    else
        @info "Distribution created successfully: $(distribution)"
    end
end

┌ Info: Distribution created successfully: Normal{Float64}(μ=0.25, σ=2.5)
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X31sZmlsZQ==.jl:13


__Is this a good approach?__ No, this is not a good approach for error handling in production code. Automatically correcting user input can lead to unexpected results, and it hides potential upstream errors. It is better to inform the user of the error and let them correct it.

#### Approach 3: Defensive programming
Next, we will implement a more graceful error handling approach by checking the standard deviation manually and, if it is not positive, logging a warning and returning `nothing` instead of a distribution instance. This allows the program to continue running while informing the user of the error.

Comment out Approach 2, remove the comments on Approach 3, and reload the `build(...)` function. We expect the program to log a warning message and return `nothing` if the standard deviation is negative.

In [20]:
let
    # initialize -
    μ = 0.25; # mean specified in the problem
    σ = -2.5; # standard deviation specified in the problem

    # what happens in this case?
    distribution = build(Normal; μ=μ, σ=σ);

    # check: do we get anything back?
    if distribution === nothing
        @info "No distribution created due to invalid parameters."
    else
        @info "Distribution created successfully: $(distribution)"
    end
end

┌ Warning: Standard deviation must be positive. Returning nothing.
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X11sZmlsZQ==.jl:16
┌ Info: No distribution created due to invalid parameters.
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X34sZmlsZQ==.jl:11


__Is this a good approach?__ Yes, defensive programming is a good approach for error handling in production code. It allows the program to continue running, while informing the user of the error and providing a way to correct it. However, it is still not ideal, because it requires the developer (i.e., you) to check for every possible error case manually, which can be tedious and error-prone.

#### Approach 4: Try-Catch exception handling
Finally, we will implement a more graceful error handling approach using [exception handling](https://docs.julialang.org/en/v1/manual/control-flow/#Exception-Handling-1). This allows us to catch errors and handle them gracefully, without stopping the program execution. We will use [a `try-catch` block](https://docs.julialang.org/en/v1/base/base/#try) to try some logic, and catch any errors that may occur. If an error occurs, we will return a distribution instance set to `nothing` and an error message.

Comment out Approach 3, remove the comments on Approach 4, and reload the `build(...)` function. We expect the program to log an error message and return `nothing` if the standard deviation is negative. However, this time we do not have to check for every possible error case ourselves: the `Normal` constructor validates its arguments and throws an exception, which the `catch` block handles for us.

In [23]:
let
    # initialize -
    μ = 0.25; # mean specified in the problem
    σ = -2.5; # standard deviation specified in the problem

    # what happens in this case?
    distribution = build(Normal; μ=μ, σ=σ);

    # check: do we get anything back?
    if distribution === nothing
        @info "No distribution created due to invalid parameters."
    else
        @info "Distribution created successfully: $(distribution)"
    end
end

┌ Error: Catch block: Failed to create distribution of type Normal: DomainError(-2.5, "Normal: the condition σ >= zero(σ) is not satisfied.")
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X11sZmlsZQ==.jl:24
┌ Info: No distribution created due to invalid parameters.
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X40sZmlsZQ==.jl:11


__Is this a good approach?__ Yes, try-catch exception handling is the recommended approach for error handling in production code. It allows us to catch errors and handle them gracefully, without stopping program execution. It also lets us handle multiple error cases in a single block of code, making the code easier to maintain and extend in the future.
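
A caveat on the catch-all block used in `build(...)`: it also swallows errors that we did not anticipate, such as a typo inside the `try` block. A minimal sketch of a more selective pattern is shown below. It uses a hypothetical helper, `try_sqrt`, with `Base.sqrt` standing in for the distribution constructor, and handles only the error type we expect while rethrowing everything else:

```julia
# hypothetical helper (not part of the notebook), for illustration only
function try_sqrt(x::Float64)::Union{Nothing, Float64}
    try
        return sqrt(x)  # Base.sqrt throws a DomainError for negative real input
    catch err
        if err isa DomainError
            # the failure we anticipated: log it and return nothing
            @error "Invalid input: $(err)"
            return nothing
        end
        rethrow()  # anything unexpected propagates to the caller
    end
end

try_sqrt(4.0)   # valid input: returns the square root
try_sqrt(-1.0)  # invalid input: logs an error and returns nothing
```

The same idea would apply to `build(...)`: checking `err isa DomainError` before returning `nothing` keeps the graceful behavior for bad parameters without hiding unrelated bugs.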

## Task 2: Sample from the Normal Distribution
In this task, we will use the `sample(...)` function implemented above to draw samples from the normal distribution. The function takes an instance of the distribution and the number of samples to draw, and returns an array of sampled values.

### Error handling
Implement a defensive programming approach to error handling in the sampling function. If `number_of_samples::Int64` is less than or equal to zero, the function should return `nothing` and log a warning message using [the `@warn` macro](https://docs.julialang.org/en/v1/stdlib/Logging/#Logging.@logmsg), letting the user know that the number of samples must be greater than zero.

In [None]:
samples = let

    # initialize -
    model = d;
    number_of_samples = 100; # TODO: what happens, when we make this ≤ 0?

    # call the sample function to create samples
    samples = sample(model, number_of_samples);

    if samples === nothing
        @info "No samples created due to invalid parameters."
    else
        @info "Samples created successfully"
    end

    samples; # return to the caller
end;

┌ Info: Samples created successfully
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X45sZmlsZQ==.jl:13


### Visualize the samples
Once all of our functions have been implemented and we supply the correct parameters, we can visualize the samples using [the `histogram(...)` method exported from the `UnicodePlots.jl` package](https://github.com/JuliaPlots/UnicodePlots.jl.git). There are many other plotting libraries available, but let's keep it simple for now (retro is cool!).

In [51]:
let

    # initialize -
    number_of_samples = length(samples); # get the number of samples
    nbins = 20; # default: number of bins for the histogram
    #nbins = round(Int, log2(number_of_samples) + 1) # TODO: Uncomment to try Sturges' rule
    #nbins = round(Int, 2.0*(number_of_samples)^(1/3)) # TODO: Uncomment to try the Rice rule
    
    # call the histogram function to plot the samples, pass in the number of bins
    histogram(samples, nbins=nbins, closed=:left);
end

                  ┌                                        ┐
   [-10.0,  -9.0) ┤▏ 2
   [ -9.0,  -8.0) ┤▏ 1
   [ -8.0,  -7.0) ┤▍ 15
   [ -7.0,  -6.0) ┤▊ 38
   [ -6.0,  -5.0) ┤██▎ 102
   [ -5.0,  -4.0) ┤█████▌ 263
   [ -4.0,  -3.0) ┤██████████▋ 514
   [ -3.0,  -2.0) ┤ …

__Hmmmm__: With only 100 samples, this does not look like a normal distribution! Why? The [`histogram(...)` method](https://github.com/JuliaPlots/UnicodePlots.jl.git) is plotting the samples as a histogram, but we don't have enough samples! Let's generate more samples, say 1000 or 10000, and see if the histogram looks more like a normal distribution.
* _Increase the number of samples?_ When we draw more samples, e.g., 10000 samples with `nbins = 20`, we see a histogram that looks like a normal distribution. This is because the more samples we take, the closer we get to the true distribution of the data. It looks like we are on the right track!
* _Increase the number of bins?_ When we increase the number of bins, we see a more detailed histogram, but it also becomes noisier. There are several rules of thumb for choosing the number of bins, such as [Sturges' rule](https://en.wikipedia.org/wiki/Sturges%27s_rule) or the [Rice rule](https://en.wikipedia.org/wiki/Histogram#Rice_rule). We've implemented both of these rules above. Which one do you think is better? Sturges' rule is more conservative (fewer bins), while the Rice rule is more aggressive (more bins). It depends on the data and the context, but in general, the Rice rule is more suitable for larger datasets, while Sturges' rule is more suitable for smaller datasets.
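
To make the comparison between the two rules concrete, the sketch below evaluates both for a few sample sizes. The function names `sturges` and `rice` are hypothetical (they are not defined in the notebook), and the formulas mirror the commented-out lines in the histogram cell above, including the use of `round`:

```julia
# hypothetical helpers mirroring the commented-out formulas in the histogram cell
sturges(n::Int) = round(Int, log2(n) + 1)     # Sturges' rule: grows with log2(n)
rice(n::Int) = round(Int, 2.0 * n^(1/3))      # Rice rule: grows with the cube root of n

for n in (100, 1_000, 10_000)
    println("n = $(n): Sturges suggests $(sturges(n)) bins, Rice suggests $(rice(n)) bins")
end
```

The gap between the two rules widens as the number of samples grows, because the cube root grows faster than the logarithm.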

### Are the statistics correct?
Finally, we can check if the statistics of the samples are correct. We can use [the `mean(...)`](https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.mean) and [`std(...)` methods from the `Statistics.jl` package](https://docs.julialang.org/en/v1/stdlib/Statistics/#Statistics.std) to calculate the sample mean and standard deviation, compute error bounds on the mean, and compare them to the parameters of the normal distribution we created earlier.

* _Uncertainty_: We expect the uncertainty of the estimated mean to be given by $\mu_{x} \pm \frac{\sigma_{x}}{\sqrt{n}}$, where $\mu_{x}$ is the sample mean, $\sigma_{x}$ is the sample standard deviation, and $n$ is the number of samples.

Does the true mean $\mu$ of the distribution we created earlier fall within the uncertainty bounds around our sample mean $\mu_{x}$? If it does, the samples are consistent with the parameters of that distribution. If it does not, the samples may not be consistent with those parameters, although a band of one standard error is expected to miss the true mean in a sizable fraction of runs.

In [54]:
let

    # initialize -
    number_of_samples = length(samples); # get the number of samples
    
    σₓ = std(samples); # compute the standard deviation of the samples
    μₓ = mean(samples); # compute the mean of the samples
    LB = μₓ - σₓ/sqrt(number_of_samples); # compute the lower bound
    UB = μₓ + σₓ/sqrt(number_of_samples); # compute the upper bound

    @info "Sample mean: $(μₓ), Sample standard deviation: $(σₓ), Lower bound: $(LB), Upper bound: $(UB)"
end

┌ Info: Sample mean: 0.25299068454662393, Sample standard deviation: 2.4627332579783765, Lower bound: 0.22836335196684016, Upper bound: 0.2776180171264077
└ @ Main /Users/jeffreyvarner/Desktop/julia_work/CHEME-140-eCornell-Repository/CHEME-140-eCornell-Repository/courses/CHEME-141/module-2/jl_notebook_cell_df34fa98e69747e1a8f8a730347b8e2f_X55sZmlsZQ==.jl:11
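
A note on interpreting these bounds: under the usual normal approximation, a band of one standard error around the sample mean covers the true mean only about 68% of the time, so an occasional miss does not by itself indicate a bug. A minimal sketch of a wider, approximately 95% interval follows. It uses only the `Random` and `Statistics` standard libraries, with `randn` as a stand-in for `sample(d, n)`; the seed and the multiplier `z = 1.96` are choices made for this illustration and are not part of the notebook.

```julia
using Random, Statistics

rng = MersenneTwister(1234)      # fixed seed, chosen arbitrarily for reproducibility
μ, σ, n = 0.25, 2.5, 10_000      # true parameters and number of samples
x = μ .+ σ .* randn(rng, n)      # stand-in for sample(d, n)

μₓ = mean(x)                     # sample mean
se = std(x) / sqrt(n)            # standard error of the mean
z = 1.96                         # approximate 95% quantile of the standard normal
LB, UB = μₓ - z * se, μₓ + z * se

@info "95% interval: [$(LB), $(UB)]; contains the true mean? $(LB <= μ <= UB)"
```

Rerunning with different seeds shows how often the interval contains the true mean, which is a more informative check than a single pass or fail.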
