# PS4: Let's build a Personal Shopper
In this problem set, we'll build a simple personal shopper agent that helps users (the world) find items they might want to buy based on their preferences. The agent allows users to encode their sentiments and then suggests combinations of items that maximize user satisfaction (measured using a utility function), subject to a budget constraint.
* _How will we do this?_ We will use an $\epsilon$-greedy bandit algorithm to find the item combination that maximizes user satisfaction without exceeding their budget. The teaching team modified the `L7b` codebase to address choices of combinations and provided a budget-aware `world(...)` function (definition below).
* _Multiple choices_: The bandit problems in `Week 7` estimated the best _single choice_ among $K$ competing alternatives (arms). In this case, shoppers can choose multiple items from a list of $K$ items simultaneously. Thus, we have $N = 2^{K}$ possible combinations (arms). We address this by recommending action vectors $\mathbf{a}\in \{0,1 \}^{K}$, where $a_i = 1$ indicates that item $i$ is chosen and $a_i = 0$ indicates that it is not. We compute the reward of each action vector during our reasoning process.
* _Budget constraint_: The bandit algorithm will also need to respect a budget constraint, which means that the total cost of the chosen items cannot exceed a specified budget. This is done by modifying the `world(...)` function with [a quadratic penalty term](https://en.wikipedia.org/wiki/Penalty_function) that is _subtracted_ from the utility if the total cost of the chosen items exceeds the budget. The penalty term reflects the difference between total cost and budget, discouraging choices that exceed the budget. The bandit's sensitivity to this constraint is indicated by the hyperparameter $\lambda\geq 0$. If $\lambda = 0$, the bandit ignores the budget and maximizes utility. If $\lambda>0$, the bandit seeks to maximize utility while adhering to the budget. A larger $\lambda$ indicates greater sensitivity to the budget constraint.

### Tasks
Before you start, execute the `Run All Cells` command to check if you have any code or setup issues. If you hit code issues, post them [to Ed Discussion](https://edstem.org/) and let's get those fixed!

* __Task 1: Setup, Data, Constants__: In this task, we set up the computational environment, load the necessary packages, and prepare the `world(...)` function for our personal shopper problem. We will also define any constants we use throughout the problem set.
* __Task 2: Build the Context Models__: In this task, we will build several models of the contextual information used to inform the agent's recommendations. These models, which are [instances of the `MyBanditConsumerContextModel` type](src/Types.jl), hold various parameters that will be used in the `world(...)` function that we developed in Task 1.
* __Task 3: Evaluation of Scenarios__: In this task, we'll run different context models to evaluate how well our agent performs under various scenarios. We will use the same bandit algorithm in all cases but vary the context model to see how it influences the agent's decisions and performance. We display the results, and ask you a few discussion questions.

Tests throughout the notebook (and at the bottom section) help you determine if things are running correctly. Let's go! (Remember to answer the discussion questions.)

## Task 1: Setup, Data, and Prerequisites
In this task, we'll set up the computational environment, load the necessary packages, and prepare the `world(...)` function for our personal shopper problem. We will also define any constants we use throughout the problem set.

We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. 
* The `Include.jl` file also loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem. It checks for a `Manifest.toml` file; if it finds one, packages are loaded. Otherwise, packages are downloaded and then loaded.

In [3]:
include("Include.jl"); # This will load necessary packages and functions

First, build the `world(...)` function. 
* The `world(...)` function takes the action vector `a::Array{Int64,1}`, whose elements are binary variables indicating whether to select an item (`1`) or not (`0`). The length of the action vector `a` is $K$, the number of items available for selection (each of the $N = 2^{K}$ combinations corresponds to one such vector). The function also takes the array `n::Array{Float64,1}` that contains the amount of each good to purchase (specified by the shopper beforehand). Finally, the `world(...)` function takes a `context` model, which encapsulates the personal shopper's environment, including the budget constraint and the penalty for exceeding it. More on the `context` in `Task 2`.

We've assumed a _linear utility function_ for the personal shopper problem, where the utility is a linear combination of the items chosen minus a penalty for exceeding the budget. The utility function $U:\left\{0,1\right\}^{K}\rightarrow\mathbb{R}$ is defined as follows:
$$
\begin{align*}
U_{\lambda}(\mathbf{a}, \mathbf{n},\mathbf{\gamma}) = \sum_{i=1}^{K} a_{i} \cdot \gamma_{i}\cdot n_i - \lambda \cdot \left[\max\left(0, \sum_{i=1}^{K} c_i \cdot n_{i} \cdot a_i - B\right)\right]^{2}
\end{align*}
$$
where $\gamma_{i}$ is the user preference for good $i$, the term $n_i$ denotes the number of goods of type $i$ purchased, and $a_{i}\in\left\{0,1\right\}$ denotes whether item $i$ is selected (1) or not (0). The quadratic penalty term is subtracted from the utility if the total cost exceeds the budget $B$, where $c_i$ is the unit cost of item $i$.
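To see how the penalty term behaves, here is a minimal noise-free sketch of the utility above. The helper name `deterministic_utility` and the three-good values are illustrative assumptions, not part of the course codebase.

```julia
# Hypothetical helper (not in the course codebase): noise-free utility
# with a quadratic penalty for exceeding the budget B.
function deterministic_utility(a::Vector{Int}, n::Vector{Float64}, γ::Vector{Float64},
        c::Vector{Float64}, B::Float64, λ::Float64)::Float64
    benefit = sum(a .* γ .* n)     # Σ aᵢ⋅γᵢ⋅nᵢ
    cost = sum(a .* c .* n)        # Σ cᵢ⋅nᵢ⋅aᵢ
    return benefit - λ * max(0.0, cost - B)^2
end

# Illustrative values (assumed): three goods, one unit each, budget of 25 USD
a = [1, 0, 1]            # select goods 1 and 3
n = ones(3)              # one unit of each good
γ = [1.0, 1.0, 1.0]      # uniform preferences
c = [10.0, 20.0, 30.0]   # unit costs

U_free = deterministic_utility(a, n, γ, c, 25.0, 0.0)  # λ = 0: budget ignored, U = 2.0
U_pen  = deterministic_utility(a, n, γ, c, 25.0, 0.5)  # λ = 0.5: overspend of 15 gives U = 2.0 - 0.5⋅15² = -110.5
```

Because the penalty grows with the square of the overspend, even a modest $\lambda$ quickly outweighs the benefit of an over-budget basket.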

__Hmmm__. Sometimes, we are uncertain about the benefit gained when we purchase good $i$, so let's add some randomness to the problem. In the presence of uncertainty, the utility function becomes:
$$
\begin{align*}
U_{\lambda}(\mathbf{a}, \mathbf{n},\mathbf{\gamma}) = \sum_{i=1}^{K} a_{i} \cdot (\gamma_{i} + \sigma_{i} \cdot Z_i) \cdot n_i - \lambda \cdot \left[\max\left(0, \sum_{i=1}^{K} c_i \cdot n_{i} \cdot a_i - B\right)\right]^{2}
\end{align*}
$$
where $Z_i \sim \mathcal{N}(0,1)$ is a random variable drawn from a standard normal distribution for each item $i$, and $\sigma_{i}\geq{0}$ denotes the strength of the uncertainty associated with good $i$ (a hyperparameter set by the shopper). This adds a stochastic element to the utility function, making it more realistic in scenarios where the benefits of purchasing items are uncertain.

In [5]:
function world(a::Vector{Int64}, n::Array{Float64,1}, context::MyBanditConsumerContextModel)::Float64

    # initialize -
    γ = context.γ; # consumer preferences (unknown to bandits)
    σ = context.σ; # noise in utility calculation (unknown to bandits)
    B = context.B; # max budget (unknown to bandits)
    C = context.C; # unit costs of goods (unknown to bandits)
    λ = context.λ; # sensitivity to the budget
    Z = context.Z; # noise model
    number_of_goods = context.m; # number of goods (categories), i.e., the length K of the action vector

    # compute the reward for this choice -
    Ū = 0.0; # initial utility
    BC = 0.0; # initial budget constraint
    for i ∈ 1:number_of_goods
        
        # what action is being taken in this category?
        aᵢ = a[i]; # this is which good to purchase in category i -
        if (aᵢ == 0)
            # if aᵢ is 0, it means no good was chosen in this category, 
            # hence we should skip this category in the utility calculation
            continue; # continue to the next iteration (skip everything after this line for this category)
        end

        nᵢ = n[i]; # this is the quantity purchased of good aᵢ in category i
        Cᵢ = C[i]; # cost of chosen good in category i
        γᵢ = γ[i]; # preference of good in category i
        σᵢ = σ[i]; # standard dev for good i
   
        # update the utility and the budget constraint -
        Ū += (γᵢ + σᵢ*rand(Z))*nᵢ; # utility for this good with a noisy preference: (γᵢ + σᵢ⋅Zᵢ)⋅nᵢ, matching the linear utility model above
        BC += nᵢ*Cᵢ; # compute the budget constraint -
    end

    # compute the utility with the budget constraint
    U = Ū - λ*max(0.0, (BC - B))^2; # use a penalty method to capture budget constraint

    # return the reward -
    return U;
end;

Next, let's build a model of our agent's decision-making process.  We've modified the $\epsilon$-greedy algorithm from `L7b` to work with our contextual multiple-choice bandit problem. The algorithm will now consider the context provided by [the `MyBanditConsumerContextModel` model](src/Types.jl) and can select multiple items from the list of possible choices.

#### Epsilon-Greedy with Multiple Choice Algorithm
In the _epsilon-greedy_ algorithm, the agent chooses the best _single_ arm with probability $1-\epsilon$ and a random arm with probability $\epsilon$. This approach balances exploration and exploitation by allowing the agent to explore different arms while exploiting the best-known arm based on past rewards. The parameter $\epsilon$ controls the exploration rate: a higher value means more exploration, while a lower value means more exploitation.

The agent has $N$ arms (choices), $\mathcal{A} = \left\{1,2,\dots, N\right\}$, where we transform the index of each arm into a binary vector [using the `digits(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.digits), with `base = 2` and `pad = K`. Thus, each arm corresponds to a unique combination of items, with the total number of arms being $N = 2^{K}$, where $K$ is the number of items available. The binary vector $\mathbf{a} = (a_1, a_2, \ldots, a_K)$ encodes which items are chosen (1) or not (0).
* For example, if $K=3$ and the agent chooses items 1 and 3, the binary vector would be $\mathbf{a} = (1, 0, 1)$, corresponding to an arm with `index = 5`.
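The encoding can be checked directly in Julia. The snippet below is a small illustration with $K = 3$; it uses only the `digits(...)` method from `Base`, and the variable names are chosen for this example.

```julia
K = 3

# digits(...) returns the least-significant digit first, so position i maps to item i
a = digits(5, base = 2, pad = K)               # [1, 0, 1]: items 1 and 3 selected

# recover the arm index from the binary vector
index = sum(a[i] * 2^(i - 1) for i in 1:K)     # 5

# list the binary vectors for arms 1 through 2^K - 1
for arm in 1:(2^K - 1)
    println(arm, " => ", digits(arm, base = 2, pad = K))
end
```

Note that `pad` sets a _minimum_ length: `digits(2^K, base = 2, pad = K)` returns `K + 1` digits, whose first `K` entries are all zero (the empty basket).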

Based on the epsilon-greedy strategy, the agent will decide which arm to pull (a combination of items) during each round. The agent maintains a list of average rewards for each arm (combination of items) and updates these values based on the rewards received from the adversary (nature) over $T$ rounds. The algorithm works as follows:

__Initialization__: Specify the `world(...)` method, a `context` model, and the set of possible actions $\mathcal{A} = \left\{1,2,\dots, N\right\}$ where $N = 2^{K}$, and $K$ denotes the number of items available for selection. 

For $t = 1,2,\dots,T$:
1. Roll a random number $p\in\left[0,1\right]$ and compute a threshold $\epsilon_{t}={t^{-1/3}}\cdot\left(K\cdot\log(t)\right)^{1/3}$.
2. _Exploration_: If $p\leq\epsilon_{t}$, choose a random $a_{t}\in\mathcal{A}$, and compute the binary representation of $a_{t}$ using [the `digits(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.digits) with `base = 2` and `pad = K`. 
Execute the action $a_{t}$, i.e., evaluate the `world(...)` method, and receive a reward $r_{t}$ from the _adversary_ (nature). 
3. _Exploitation_: Else if $p>\epsilon_{t}$, choose the action $a_{t} = a^{\star}_{t}$, the action with the highest average reward so far. Execute the action $a^{\star}_{t}$ and receive a reward $r_{t}$ from the _adversary_ (nature).
4. Update the average reward estimate for the chosen action $a_{t}\in\mathcal{A}$. We use a moving average for the rewards of each action $a$: $\mu_{a,t} = \mu_{a,t-1} + \alpha\left(r_{t} - \mu_{a,t-1}\right)$, where $\alpha>0$ is a learning rate hyperparameter. This modification allows the algorithm to adaptively (and _efficiently_) learn the average reward for each action over time, giving more weight to recent rewards while still considering past performance.
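The steps above can be sketched in a few lines of Julia. This is a stand-alone illustration under stated assumptions, not the course's `solve(...)` method: the names `toy_world` and `epsilon_greedy_sketch`, the three-item preferences, and the values of `T` and `α` are all hypothetical.

```julia
using Random

# Hypothetical toy reward (not the course's world(...)): preferences [1.0, -2.0, 0.5] plus small noise
toy_world(a::Vector{Int}) = sum(a .* [1.0, -2.0, 0.5]) + 0.1 * randn()

function epsilon_greedy_sketch(world::Function; K::Int = 3, T::Int = 5000, α::Float64 = 0.1)
    N = 2^K                    # number of arms (item combinations)
    μ = zeros(Float64, N)      # running reward estimate for each arm
    for t in 1:T
        ϵₜ = t^(-1/3) * (K * log(t))^(1/3)           # step 1: exploration threshold
        arm = rand() ≤ ϵₜ ? rand(1:N) : argmax(μ)    # steps 2-3: explore or exploit
        a = digits(arm, base = 2, pad = K)[1:K]      # binary action vector (arm 2^K maps to the empty basket)
        r = world(a)                                 # reward from the world
        μ[arm] += α * (r - μ[arm])                   # step 4: moving-average update
    end
    return μ
end

Random.seed!(1234)
μ = epsilon_greedy_sketch(toy_world)
best = digits(argmax(μ), base = 2, pad = 3)[1:3]     # should favor items 1 and 3 for these preferences
```

The update in step 4 needs only the previous estimate and the new reward, so no reward history has to be stored to maintain $\mu_{a,t}$.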

We've implemented this logic in [the `solve(...)` method](src/Bandit.jl) if you want to check it out!

__Constants__: Set constants we'll use in the subsequent tasks. See the comment beside the value for a description of what it is, its permissible values, etc.
* __TODO__: Suppose we have `K = 12` different goods to choose from. Set the number of simulation rounds `T::Int64` to $T\geq{2}^{K}$. For the budget, set `B = 100` USD (you can play with this later if you wish to see what happens with different budgets).

In [8]:
K = 12; # TODO: Let's consider 12 different goods/categories initially
T = 2^14; # TODO: number of rounds for each decision task (should be geq 2^{K})
B = 100.0; # TODO: Budget for shopper, assume 100 USD. We can change this later if we want

Finally, let's build [a `MyEpsilonGreedyAlgorithmModel` instance](src/Types.jl), which encapsulates the $\epsilon$-greedy logic that has been modified to work with multiple selections and our choices of constants.
* __TODO__: To construct this type, we [use a custom `build(...)` method](src/Factory.jl) that takes the number of items `K::Int64` and the quantity to purchase `n::Array{Float64,1}` array and returns an algorithm model in the `algorithm::MyEpsilonGreedyAlgorithmModel` variable. In this simulation, let `n` be an array of ones.

In [10]:
algorithm = let

    # initialize -
    algorithm = nothing; # initialize the algorithm variable to nothing, this variable will be used to store the algorithm model
    n = ones(Float64, K); # for now, let's assume that we purchase a single unit of each good in each category (we can change this later)
    
    # TODO: Build an algorithm model by uncommenting the code block below
    algorithm = build(MyEpsilonGreedyAlgorithmModel, (
        K = K, # number of items (the algorithm considers 2^K arms)
        n = n, # quantity to purchase of each item
    ));

    # return the algorithm -
    algorithm;
end;

## Task 2: Build the Context Models
In this task, we will build several models of the contextual information used to inform the agent's recommendations. These models, which are [instances of the `MyBanditConsumerContextModel` type](src/Types.jl), hold various parameters that will be used in the `world(...)` function that we developed above. 
* _Hmmm. Why use a different model for contextual data_? We use a separate model for the contextual information because it allows us to encapsulate all the relevant parameters and settings in one place. This makes it easier to manage and modify the parameters as needed without changing the core logic of the `world(...)` function. Additionally, it allows us to easily pass around context models to other parts of our codebase that may need it.
* _What does this represent_? The contextual information in the `MyBanditConsumerContextModel` defines the parameters that will be used to score the utility of the goods chosen by the agent. This includes user sentiment parameters, budget constraints, and other relevant information to help the agent decide which goods to recommend.

Let's build the following contextual models:
* __Case 1: Unlimited budget, uniform positive sentiment__: This model assumes that the consumer has an unlimited budget ($\lambda = 0$) and uniform positive sentiment across all goods. This means that the consumer is equally likely to choose any good in each category, and there are no constraints on the amount of each good that can be selected. 
* __Case 2: Limited budget, positive sentiment__: This model assumes that the consumer has a limited budget ($\lambda>0$) and positive sentiment (but not necessarily uniform) towards goods. This means that the consumer is more likely to choose goods they have a positive sentiment towards, and there are constraints on the amount of each good that can be selected based on the budget.
* __Case 3: Limited budget, mixed sentiment__: This model assumes that the consumer has a limited budget ($\lambda>0$) and mixed sentiment towards goods. This means that some goods may have positive sentiment (i.e., preferred), while others may have negative sentiment (i.e., not preferred). The agent must balance the positive and negative sentiments when making recommendations, and there are constraints on the amount of each good that can be selected based on the budget.

Let's start with __case 1__. We save this case in the `simple_no_budget_context::MyBanditConsumerContextModel` variable.

In [13]:
simple_no_budget_context = let

    # initialize -
    context = nothing; # initialize the context variable to nothing; this variable will be used to store the context model
    K = algorithm.K; # number of arms in the algorithm, this should match the number of goods in the context model
    γ = Array{Float64,1}(undef, K); # consumer preferences (unknown to bandits)
    σ = Array{Float64,1}(undef, K); # noise in utility calculation (unknown to bandits)
    C = Array{Float64,1}(undef, K); # unit costs of goods (unknown to bandits)
    Z = Normal(0,1); # use a standard normal distribution for the noise model; this can be changed to any distribution as required
    λ = 0.0; # sensitivity to the budget constraint λ ≥ 0. If zero, then no penalty for budget constraint violation.

    # set the parameters -
    # preferences: If all γ[i] are equal to 1.0, then the bandit will be indifferent to the goods in each category.
    for i ∈ 1:K
        # Assigning values for γ, σ, and C for each good in the context model
        # For simplicity, let's assume we have K goods with equal preference
        # This can be customized as per the requirement of the simulation
        γ[i] = 1.0; # uniform preference for all goods
        σ[i] = 0.1; # uniform uncertainty for all goods, this can be adjusted based on the specific needs of the simulation
        C[i] = 10.0 + 10.0 * (i - 1); # linearly increasing costs for goods, this can be customized as per the requirement
    end

    # TODO: Uncomment the code below to build the context model -
    # build a context model with the required parameters -
    context = build(MyBanditConsumerContextModel, (
        γ = γ, # consumer preferences (unknown to bandits)
        σ = σ, # noise in utility calculation (unknown to bandits)
        B = B, # max budget (unknown to bandits)
        C = C, # unit costs of goods (unknown to bandits)
        λ = λ, # sensitivity to the budget
        Z = Z, # noise model
        m = K, # number of categories (this should match the number of arms in the algorithm)
    )); # build the context

    # return 
    context;
end;

__Case 2__. In this scenario, the consumer has a limited budget and positive sentiment. We set a budget constraint and define the user sentiment parameters to reflect positive sentiment towards all goods. We save this case in the `simple_with_budget_context::MyBanditConsumerContextModel` variable.

In [15]:
simple_with_budget_context = let

    # initialize -
    context = nothing; # initialize the context variable to nothing; this variable will be used to store the context model
    K = algorithm.K; # number of arms in the algorithm, this should match the number of goods in the context model
    γ = Array{Float64,1}(undef, K); # consumer preferences (unknown to bandits)
    σ = Array{Float64,1}(undef, K); # noise in utility calculation (unknown to bandits)
    C = Array{Float64,1}(undef, K); # unit costs of goods (unknown to bandits)
    Z = Normal(0,1); # use a standard normal distribution for the noise model; this can be changed to any distribution as required
    λ = 10000.0; # sensitivity to the budget constraint λ ≥ 0. If zero, then no penalty for budget constraint violation.

    # set the parameters -
    # preferences: If all γ[i] are equal to 1.0, then the bandit will be indifferent to the goods in each category.
    for i ∈ 1:K
        # Assigning values for γ, σ, and C for each good in the context model
        # For simplicity, let's assume we have K goods with equal preference
        # This can be customized as per the requirement of the simulation
        γ[i] = 1.0; # uniform preference for all goods
        σ[i] = 0.1; # uniform uncertainty for all goods, this can be adjusted based on the specific needs of the simulation
        C[i] = 10.0 + 10.0 * (i - 1); # linearly increasing costs for goods, this can be customized as per the requirement
    end

    # TODO: Uncomment the code below to build the context model -
    # build a context model with the required parameters -
    context = build(MyBanditConsumerContextModel, (
        γ = γ, # consumer preferences (unknown to bandits)
        σ = σ, # noise in utility calculation (unknown to bandits)
        B = B, # max budget (unknown to bandits)
        C = C, # unit costs of goods (unknown to bandits)
        λ = λ, # sensitivity to the budget
        Z = Z, # noise model
        m = K, # number of categories (this should match the number of arms in the algorithm)
    )); # build the context

    # return 
    context;
end;

__Case 3__. The consumer has a limited budget and mixed sentiments toward selecting possible goods. We define positive and negative user sentiment parameters to reflect the mixed sentiment towards goods and set $\lambda > 0$.  We save this case in the `mixed_with_budget_context::MyBanditConsumerContextModel` variable.
* __TODO__: To demonstrate the role of sentiment, give the goods with an _even_ index a positive preference and the goods with an _odd_ index a negative preference. Hint: check out [the `iseven(...)` method](https://docs.julialang.org/en/v1/base/numbers/#Base.iseven).

In [17]:
mixed_with_budget_context = let

    # initialize -
    context = nothing; # initialize the context variable to nothing, this variable will be used to store the context model
    K = algorithm.K; # number of arms in the algorithm, this should match the number of goods in the context model
    γ = Array{Float64,1}(undef, K); # consumer preferences (unknown to bandits)
    σ = Array{Float64,1}(undef, K); # noise in utility calculation (unknown to bandits)
    C = Array{Float64,1}(undef, K); # unit costs of goods (unknown to bandits)
    Z = Normal(0,1); # use a standard normal distribution for the noise model, this can be changed to any distribution as required
    λ = 10000.0; # sensitivity to the budget constraint λ ≥ 0. If zero, then no penalty for budget constraint violation.

    # TODO: Use the iseven method: even goods are positive, odd are negative
    # set the parameters -
    # preferences: γ[i] > 0 marks a preferred good, γ[i] < 0 marks a good the shopper does not want.
    for i ∈ 1:K
        # Assigning values for γ, σ, and C for each good in the context model
        # Here the sign of the preference alternates with the index of the good
        # This can be customized as per the requirement of the simulation
        
        if (iseven(i) == true)
            γ[i] = 1.0; # positive preference for even indexed goods
        else
            γ[i] = -10.0; # negative preference for odd indexed goods
        end
        
        σ[i] = 0.1; # uniform uncertainty for all goods, this can be adjusted based on the specific needs of the simulation
        C[i] = 10.0 + 10.0 * (i - 1); # linearly increasing costs for goods, this can be customized as per the requirement
    end

    # TODO: Uncomment the code below to build the context model -
    # build a context model with the required parameters -
    context = build(MyBanditConsumerContextModel, (
        γ = γ, # consumer preferences (unknown to bandits)
        σ = σ, # noise in utility calculation (unknown to bandits)
        B = B, # max budget (unknown to bandits)
        C = C, # unit costs of goods (unknown to bandits)
        λ = λ, # sensitivity to the budget
        Z = Z, # noise model
        m = K, # number of categories (this should match the number of arms in the algorithm)
    )); # build the context

    # return 
    context;
end;

## Task 3: Evaluation of Scenarios
In this task, we'll run different context models to evaluate how well our agent performs under various scenarios. We will use the same bandit algorithm in all cases but vary the context model to see how it influences the agent's decisions and performance. 

For each context, [we call the `solve(...)` method](src/Bandit.jl) to simulate the agent's decision-making process over `T` rounds, given the `world(...)` function and the context model. The [`solve(...)` method](src/Bandit.jl) returns the results of the simulation in the `results_case_*::Array{Float64,2}` array. The results array should be a $T\times{N}$ array.
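To make the layout of the results array concrete, here is a toy sketch of how a best arm can be read off a rounds-by-arms array. It assumes, based on the post-processing cells later in this notebook, that an entry of `0.0` means the arm was not pulled in that round; the small array below is made up for illustration.

```julia
using Statistics

# Toy 4×3 results array (assumed layout: rows are rounds, columns are arms)
results = [1.0 0.0 0.0;
           0.0 2.0 0.0;
           3.0 0.0 0.0;
           0.0 4.0 0.0]

# average observed reward per arm, ignoring rounds where the arm was not pulled
μ = map(1:size(results, 2)) do arm
    pulled = filter(!iszero, results[:, arm])   # rewards from rounds where this arm was pulled
    isempty(pulled) ? -Inf : mean(pulled)       # never-pulled arms cannot win the argmax
end

best_arm = argmax(μ)   # arm 2, whose mean reward is 3.0
```

One caveat of this convention: a genuine reward of exactly `0.0` would also be filtered out, which is unlikely when the rewards carry continuous noise.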

In [19]:
results_case_1 = solve(algorithm, T = T, world = world, context=simple_no_budget_context); # compute allocation for case 1
results_case_2 = solve(algorithm, T = T, world = world, context=simple_with_budget_context); # compute allocation for case 2
results_case_3 = solve(algorithm, T = T, world = world, context=mixed_with_budget_context); # compute allocation for case 3

### Case 1: Unlimited Budget, Uniform Positive Sentiment
In this case, we evaluate the agent's performance using the `simple_no_budget_context` model. This model assumes an unlimited budget and uniform positive sentiment across all goods.

In [21]:
table(results_case_1, algorithm, simple_no_budget_context) |> df -> pretty_table(df, tf = tf_simple)

   good   purchase      cost   benifit
  Int64     String   Float64   Float64
      1        Yes      10.0       1.0
      2        Yes      20.0       1.0
      3        Yes      30.0       1.0
      4        Yes      40.0       1.0
      5        Yes      50.0       1.0
      6        Yes      60.0       1.0
      7        Yes      70.0       1.0
      8         No       0.0       0.0
      9        Yes      90.0       1.0
     10        Yes     100.0       1.0
     11        Yes     110.0       1.0
     12        Yes     120.0       1.0


How much did our agent spend, how much benefit was gained, and what is the return on investment in this case (benefit/spend)? 
* `Unhide` the code block below to see the results of the agent's performance under this scenario. We expect to see the user purchase (almost) everything, as there is no preference between goods, and the agent has an unlimited budget.

In [23]:
let

    # initialize -
    results = results_case_1; # use results from case 1 for this example
    context = simple_no_budget_context;
    K = algorithm.K; # number of categories
    n = algorithm.n; # recommended number of items to purchase in each category
    γ = context.γ; # user preference for each good (unknown to bandits)
    C = context.C; # unit costs of goods (unknown to bandits)

    # compute the best collection of goods -
    K = algorithm.K; # number of arms in the algorithm
    N = 2^K; # number of possible goods combinations (2^K) - this is the total number of combinations of goods we can have 

    μ = zeros(Float64, N); # average reward for each possible goods combination
    for a ∈ 1:N
        μ[a] = filter(x -> x != 0.0, results[:,a]) |> x-> mean(x)

        # fix NaN -
        if (isnan(μ[a]) == true)
            μ[a] = -Inf; # replace NaN with a big negative
        end
    end
    î = argmax(μ); # compute the arm with best average reward
    aₜ = digits(î, base=2, pad=K); # which goods do we select?

    U = Array{Float64,1}(undef, K); # initialize the array to store the goods selected
    for i ∈ 1:K
       U[i] = aₜ[i]*n[i]*γ[i]; # store the goods selected in the array
    end

    spend = Array{Float64,1}(undef, K); # initialize the array to store the spend on each good
    for i ∈ 1:K
        # calculate the spend on each good based on the recommended quantity and unit cost
        spend[i] = aₜ[i]*n[i] * C[i]; # total spend on each good
    end

    S̄ = sum(spend); # total spend for case 1
    Ū = sum(U); # total utility for case 1
    println("Case 1: Agent spent: $(S̄) USD and rcvd $(Ū) utils. ROI: $(Ū/S̄) util/USD"); # print total spend for case 1
    
end;

Case 1: Agent spent: 700.0 USD and rcvd 11.0 utils. ROI: 0.015714285714285715 util/USD


#### Discussion
__DQ1__: When I ran case 1 with `T = 2^14` the agent selected _all (or nearly all) the items_ (you may have a different result?). What does this tell you about the agent's decision-making process under an unlimited budget and uniform positive sentiment? A somewhat subtle effect is visible in this case; think about the ROI relative to the other cases and what this result says about utility.

In [25]:
## Put your answer to DQ1 (either as a commented code cell, or as a markdown cell)

In [26]:
did_I_answer_DQ1 = true; # TODO: Update this value {true | false} based on whether you answered DQ1 or not

### Case 2: Limited Budget, Uniform Positive Sentiment
In this case, we will evaluate the agent's performance using the `simple_with_budget_context` model. This model assumes a limited budget and uniform positive sentiment across all goods.

In [28]:
table(results_case_2, algorithm, simple_with_budget_context) |> df -> pretty_table(df, tf = tf_simple) 

   good   purchase      cost   benifit
  Int64     String   Float64   Float64
      1        Yes      10.0       1.0
      2        Yes      20.0       1.0
      3        Yes      30.0       1.0
      4        Yes      40.0       1.0
      5         No       0.0       0.0
      6         No       0.0       0.0
      7         No       0.0       0.0
      8         No       0.0       0.0
      9         No       0.0       0.0
     10         No       0.0       0.0
     11         No       0.0       0.0
     12         No       0.0       0.0


How much did our agent spend, and how much benefit was gained? 
* `Unhide` the code block below to see the results of the agent's performance under this scenario. We expect to see the agent purchase a limited selection: while there is no preference between goods, the agent no longer has an unlimited budget.

In [30]:
let

    # initialize -
    results = results_case_2; # use results from case 2 for this example
    context = simple_with_budget_context; # context model for case 2
    K = algorithm.K; # number of categories
    n = algorithm.n; # recommended number of items to purchase in each category
    γ = context.γ; # user preference for each good (unknown to bandits)
    C = context.C; # unit costs of goods (unknown to bandits)

    # compute the best collection of goods -
    K = algorithm.K; # number of arms in the algorithm
    N = 2^K; # number of possible goods combinations (2^K) - this is the total number of combinations of goods we can have 

    μ = zeros(Float64, N); # average reward for each possible goods combination
    for a ∈ 1:N
        μ[a] = filter(x -> x != 0.0, results[:,a]) |> x-> mean(x)

        # fix NaN -
        if (isnan(μ[a]) == true)
            μ[a] = -Inf; # replace NaN with a big negative
        end
    end
    î = argmax(μ); # compute the arm with best average reward
    aₜ = digits(î, base=2, pad=K); # which goods do we select?

    U = Array{Float64,1}(undef, K); # initialize the array to store the goods selected
    for i ∈ 1:K
       U[i] = aₜ[i]*n[i]*γ[i]; # store the goods selected in the array
    end

    spend = Array{Float64,1}(undef, K); # initialize the array to store the spend on each good
    for i ∈ 1:K
        # calculate the spend on each good based on the recommended quantity and unit cost
        spend[i] = aₜ[i]*n[i] * C[i]; # total spend on each good
    end

    S̄ = sum(spend); # total spend for case 2
    Ū = sum(U); # total utility for case 2
    println("Case 2: Agent spent: $(S̄) USD and rcvd $(Ū) utils. ROI: $(Ū/S̄) util/USD"); # print total spend for case 2
end;

Case 2: Agent spent: 100.0 USD and rcvd 4.0 utils. ROI: 0.04 util/USD


#### Discussion
__DQ2__: In the presence of a budget constraint, the agent's behavior changes. When I ran this, I saw the agent select a few items but not all. Sometimes, the agent didn't spend the entire budget; in other cases, all $B$ USD were spent. In cases where _all the budget_ is spent, I see only the first four items selected. What does this tell you about the trade-off between utility and budget constraints? Why are certain items chosen while others are not?

In [32]:
## Put your answer to DQ2 (either as a commented code cell, or as a markdown cell)

In [33]:
did_I_answer_DQ2 = true; # TODO: Update this value {true | false} based on whether you answered DQ2 or not

### Case 3: Limited Budget, Mixed Sentiment
In this case, we will evaluate the agent's performance using the `mixed_with_budget_context` model. This model assumes a limited budget and a mix of positive and negative sentiment across the goods.
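
To make the interplay of mixed sentiment and a budget concrete, here is a minimal sketch of a budget-penalized reward. It assumes a linear utility and a quadratic penalty on the overspend, following the description at the top of this problem set. The helper `penalized_utility` and all of the data are hypothetical, and the actual `world(...)` function may differ in its details:

```julia
# Sketch only: `penalized_utility` is a hypothetical helper, not the course's world(...) function
function penalized_utility(a::Vector{Int}, n::Vector{Float64}, γ::Vector{Float64},
    C::Vector{Float64}, B::Float64, λ::Float64)::Float64

    utility = sum(a .* n .* γ);       # linear utility of the selected goods (assumed form)
    cost = sum(a .* n .* C);          # total spend on the selected goods
    overspend = max(0.0, cost - B);   # zero when the selection is within budget
    return utility - λ*overspend^2;   # quadratic penalty subtracted from the utility
end

# hypothetical data: three goods with mixed sentiment and a 100 USD budget
γ = [1.0, -1.0, 1.0];
C = [40.0, 30.0, 60.0];
n = ones(3);
B = 100.0; λ = 0.01;

r_within = penalized_utility([1, 0, 1], n, γ, C, B, λ); # goods 1 and 3: exactly on budget
r_over = penalized_utility([1, 1, 1], n, γ, C, B, λ);   # all goods: 30 USD over budget
println("within budget: $(r_within), over budget: $(r_over)")
```

In this toy setup the over-budget selection is hurt twice: the negative-sentiment good lowers the utility, and the overspend triggers the penalty.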

In [35]:
table(results_case_3, algorithm, mixed_with_budget_context) |> df -> pretty_table(df, tf = tf_simple) 

   good   purchase      cost   benifit
  Int64     String   Float64   Float64
      1         No       0.0      -0.0
      2         No       0.0       0.0
      3         No       0.0      -0.0
      4        Yes      40.0       1.0
      5         No       0.0      -0.0
      6        Yes      60.0       1.0
      7         No       0.0      -0.0
      8         No       0.0       0.0
      9         No       0.0      -0.0
     10         No       0.0       0.0
     11         No       0.0      -0.0
     12         No       0.0       0.0


How much did our agent spend, and how much benefit was gained? 
* `Unhide` the code block below to see the results of the agent's performance under this scenario. We expect to see the agent purchase a limited selection (since the agent no longer has an unlimited budget) and only positive sentiment items. 

In [37]:
let

    # initialize -
    results = results_case_3; # use results from case 3 for this example
    context = mixed_with_budget_context; # limited budget, mixed sentiment
    K = algorithm.K; # number of categories
    n = algorithm.n; # recommended number of items to purchase in each category
    γ = context.γ; # user preference for each good (unknown to bandits)
    C = context.C; # unit costs of goods (unknown to bandits)

    # compute the best collection of goods -
    K = algorithm.K; # number of arms in the algorithm
    N = 2^K; # number of possible goods combinations (2^K) - this is the total number of combinations of goods we can have 

    μ = zeros(Float64, N); # average reward for each possible goods combination
    for a ∈ 1:N
        μ[a] = filter(x -> x != 0.0, results[:,a]) |> x-> mean(x)

        # fix NaN -
        if (isnan(μ[a]) == true)
            μ[a] = -Inf; # replace NaN with a big negative
        end
    end
    î = argmax(μ); # compute the arm with best average reward
    aₜ = digits(î, base=2, pad=K); # which goods do we select?

    U = Array{Float64,1}(undef, K); # initialize the array to store the utility gained from each good
    for i ∈ 1:K
        U[i] = aₜ[i]*n[i]*γ[i]; # utility from good i (zero if the good was not selected)
    end

    spend = Array{Float64,1}(undef, K); # initialize the array to store the spend on each good
    for i ∈ 1:K
        # calculate the spend on each good based on the recommended quantity and unit cost
        spend[i] = aₜ[i]*n[i] * C[i]; # total spend on each good
    end

    S̄ = sum(spend); # total spend for case 3
    Ū = sum(U); # total utility for case 3
    println("Case 3: Agent spent: $(S̄) USD and rcvd $(Ū) utils. ROI: $(Ū/S̄) util/USD"); # print spend, utility, and ROI for case 3
end;

Case 3: Agent spent: 100.0 USD and rcvd 2.0 utils. ROI: 0.02 util/USD
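
The evaluation cells average only the nonzero rewards recorded for each combination and then pick the combination with the largest mean. A minimal sketch of that step on hypothetical data (the small `results` matrix below is made up for illustration; rows are rounds and columns are combinations):

```julia
using Statistics

# Sketch only: `results` is hypothetical data (rows = rounds, columns = combinations)
results = [1.0 0.0 -2.0;
           3.0 0.0  0.0];             # combination 2 was never tried

N = size(results, 2);
μ = fill(-Inf, N);                    # untried combinations keep -Inf and cannot win argmax
for a ∈ 1:N
    tried = filter(x -> x != 0.0, results[:, a]); # rounds where this combination was played
    if (isempty(tried) == false)
        μ[a] = mean(tried);           # average reward over the rounds played
    end
end
best = argmax(μ);
println("μ = $(μ), best combination index = $(best)")
```

Like the cell above, this sketch treats an exactly zero entry as "not played", so a combination whose true reward happens to be zero would be ignored. Initializing `μ` to `-Inf` has the same effect as replacing the `NaN` that `mean` returns for an empty array.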


#### Discussion
__DQ3__: In this case, the agent's decision-making process is influenced by both positive and negative sentiments. When I ran this, I saw that the agent selected a different set of items compared to the previous cases (and only positive-sentiment goods). What does this tell you about the impact of mixed sentiment on the agent's recommendations?

In [39]:
## Put your answer to DQ3 (either as a commented code cell, or as a markdown cell)

In [40]:
did_I_answer_DQ3 = true; # TODO: Update this value {true | false} based on whether you answered DQ3 or not

__DQ4__: What happens if we make the problem impossible, e.g., by setting the sentiment to negative for every item whose cost is less than the budget?

In [42]:
## Put your answer to DQ4 (either as a commented code cell, or as a markdown cell)

In [43]:
did_I_answer_DQ4 = true; # TODO: Update this value {true | false} based on whether you answered DQ4 or not

## Tests
In the code block below, we check some values in your notebook and give you feedback on which items are correct and which are not. `Unhide` the code block below if you are curious about how we implemented the tests and what we are testing.

In [45]:
let 
    @testset verbose = true "CHEME 5820 Problem Set 4 Test Suite" begin

        @testset "Task 1: Setup, Prerequisites and Data" begin
            @test _DID_INCLUDE_FILE_GET_CALLED == true
            @test isdefined(Main, :world) == true
            @test isnothing(algorithm) == false
            @test T ≥ 2^K
        end

        @testset "Task 2: Context models" begin
            @test isnothing(simple_no_budget_context) == false # Test for simple no budget context
            @test isnothing(simple_with_budget_context) == false # Test for simple with budget context
            @test isnothing(mixed_with_budget_context) == false # Test for mixed context with budget
            @test simple_no_budget_context.λ == 0; # Check that the no-budget context has λ = 0
            @test simple_with_budget_context.λ > 0; # Check that the with-budget context has λ > 0
            @test mixed_with_budget_context.λ > 0; # Check that the mixed context has λ > 0
        end

        @testset "Task 3: Scenarios" begin
            @test isempty(results_case_1) == false # Test for results from case 1
            @test isempty(results_case_2) == false # Test for results from case 2
            @test isempty(results_case_3) == false # Test for results from case 3
            @test did_I_answer_DQ1 == true # Test DQ1 answer
            @test did_I_answer_DQ2 == true # Test DQ2 answer
            @test did_I_answer_DQ3 == true # Test DQ3 answer
            @test did_I_answer_DQ4 == true # Test DQ4 answer
        end
    end
end;

[0m[1mTest Summary:                           | [22m[32m[1mPass  [22m[39m[36m[1mTotal  [22m[39m[0m[1mTime[22m
CHEME 5820 Problem Set 4 Test Suite     | [32m  17  [39m[36m   17  [39m[0m0.2s
  Task 1: Setup, Prerequisites and Data | [32m   4  [39m[36m    4  [39m[0m0.2s
  Task 2: Context models                | [32m   6  [39m[36m    6  [39m[0m0.0s
  Task 3: Scenarios                     | [32m   7  [39m[36m    7  [39m[0m0.0s
