# L12b: Solving the Bouquet Design Problem
In this lab, we explore the combinatorial optimization problem known as the Bouquet Design Problem. The objective is to create the most aesthetically pleasing bouquet using a limited number of flowers while satisfying specific design constraints.

> **Learning Objectives**
>
> By the end of this activity, you will be able to:
> * **Apply bandit algorithms to combinatorial optimization**: Use binary‑vector arms to represent and explore discrete choices, demonstrating how bandit methods extend beyond simple action selection to complex combinatorial problems.
> * **Implement context‑aware decision making**: Build contextual bandit models that incorporate problem‑specific constraints (budget, preferences, bounds) to guide exploration and exploitation in structured decision spaces.
> * **Evaluate utility‑based optimization**: Design and optimize bouquets using a Cobb‑Douglas utility function that balances multiple attributes, and analyze how preference parameters and budget constraints influence optimal compositions.

Let's get started!
___

## Problem
The Bouquet Design Problem involves selecting a combination of flowers to create a bouquet that maximizes aesthetic appeal while staying within budget and design constraints. Each flower has attributes such as color, size, and fragrance, which contribute to the overall appeal of the bouquet.

We score each bouquet based on a Cobb-Douglas utility function that considers the attributes of the flowers included in the bouquet:
$$
\begin{align*}
U(n_1, \dots, n_K) = \kappa(\gamma) \prod_{i=1}^{K} n_i^{\gamma_i}
\end{align*}
$$
where $n_{i}$ is the number of flowers of type $i$ in the bouquet (the quantity we compute), $\gamma_i$ is the preference parameter for flower type $i$ (the output of a preference model), and $\kappa(\gamma)$ sets the sign of the utility. The $\kappa(\gamma)$ term is defined as:

$$
\begin{align*}
\kappa(\gamma) & = \begin{cases}
-1 & \text{if any } \gamma_i < 0 \\
1 & \text{if all } \gamma_i \geq 0
\end{cases}
\end{align*}
$$
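To make the utility and the sign term concrete, here is a minimal sketch. The `cobb_douglas_utility` helper is hypothetical (it is not part of the lab's package), the numbers are made up, and the helper expects only the flower types that are actually in the bouquet, since a zero count with a positive exponent would zero out the product.

```julia
# Hypothetical helper: U = κ(γ) * Π nᵢ^γᵢ over the flower types in the bouquet
function cobb_douglas_utility(n::Vector{Float64}, γ::Vector{Float64})
    κ = any(γ .< 0) ? -1.0 : 1.0   # κ(γ) = -1 if any γᵢ < 0, otherwise +1
    return κ * prod(n .^ γ)
end

# Two made-up bouquets with the same counts but different preferences
U₁ = cobb_douglas_utility([4.0, 9.0], [0.5, 0.5])    # all preferences nonnegative
U₂ = cobb_douglas_utility([4.0, 9.0], [0.5, -0.5])   # one negative preference flips the sign
```

With these inputs, `U₁` is $4^{0.5}\cdot 9^{0.5} = 6$ and `U₂` is $-(4^{0.5}\cdot 9^{-0.5}) = -2/3$, which shows how a single non-preferred flower makes the whole bouquet score negative.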

The $\gamma_{i}$ preference coefficients can reflect the particular characteristics of each flower, such as color vibrancy, fragrance intensity, and seasonal availability. The $\gamma_{i}$ coefficients can be learned from historical data or set based on expert knowledge. We can incorporate various features, such as flower type, color, and size, into a feature vector $\mathbf{x}_{i}\in\mathbb{R}^{m}$:
$$
\begin{align*}
\gamma_{i} & = \sigma\left(\mathbf{x}^{\top}_{i}\theta_{i}\right)\quad\forall{i}\in\{1,\dots,K\}
\end{align*}
$$
where $\sigma:\mathbb{R}\rightarrow[-1,1]$ is an activation function,
and $\mathbf{\theta}_{i}\in\mathbb{R}^{p}$ ($p=m+1$) denotes the feature weights plus a bias term (so $\mathbf{x}_{i}$ is augmented with a constant 1 before the inner product is taken), learned from data or set based on subjective beliefs.
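As an illustration of the preference model, the sketch below assumes $\sigma = \tanh$ (one possible activation with range inside $[-1,1]$; the lab does not specify which one is used) and appends a constant 1 to the feature vector so that the last weight acts as the bias. The `preference` helper, the features, and the weights are all hypothetical.

```julia
using LinearAlgebra

# Hypothetical helper: γ = σ(x̂ᵀθ) with x̂ = [x; 1], so the last entry of θ is the bias
function preference(x::Vector{Float64}, θ::Vector{Float64}; σ::Function = tanh)
    x̂ = vcat(x, 1.0)        # augment the m features with a constant 1
    return σ(dot(x̂, θ))     # scalar preference coefficient
end

x = [0.8, 0.2, 0.5]           # m = 3 made-up features for one flower type
θ = [1.0, -0.5, 0.25, 0.1]    # p = m + 1 made-up weights (last entry is the bias)
γ = preference(x, θ)
```

A different activation can be swapped in through the `σ` keyword, as long as its output stays in $[-1,1]$.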

We will use a bandit algorithm to explore different combinations of flowers and learn which combinations yield the highest rewards based on our context model. The combinations of flowers are represented as binary vectors: $\mathbf{a}\in\{0,1\}^{K}$, where $K$ is the total number of flower types available. Each element $a_{i}$ in the vector indicates whether flower type $i$ is included (1) or excluded (0) from the bouquet.

This problem has an analytical solution: [Consumer Choice Problem](CHEME-5800-L12d-CD-ChoiceProblem-Solution-Fall-2025.ipynb). 
___

## Setup, Data, and Prerequisites
First, we set up the computational environment by including the `Include.jl` file and loading any needed resources.

> The [`include(...)` command](https://docs.julialang.org/en/v1/base/base/#include) evaluates the contents of the input source file, `Include.jl`, in the notebook's global scope. The `Include.jl` file sets paths, loads required external packages, etc. For additional information on functions and types used in this material, see the [Julia programming language documentation](https://docs.julialang.org/en/v1/). 

Let's set up our code environment:

In [4]:
include(joinpath(@__DIR__, "Include-solution.jl")); # load the environment file (sets paths, loads packages)

In addition to standard Julia libraries, we'll also use [the `VLDataScienceMachineLearningPackage.jl` package](https://github.com/varnerlab/VLDataScienceMachineLearningPackage.jl). Check out [the documentation](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/) for more information on the functions, types, and data used in this material.

### Implementation
First, let's implement the `customer(...)` function. This function takes the time index `t::Int`, the action vector `a::Vector{Int64}` from our agent (which encodes what to include in the bouquet as a binary vector), and the problem context `context::MyConsumerChoiceBanditContextModel`, then returns the utility reward `U::Float64` and other data associated with choosing this action—essentially answering: how satisfied are you with this bouquet composition?

The function retrieves base prices from the context model but then observes stochastic market prices by applying random fluctuations `(1 + 0.15*randn())` to simulate real-world price volatility. It computes the optimal quantity `n::Vector{Float64}` of each flower type to purchase given the observed market prices, budget constraint, and preference parameters from the context, then evaluates the resulting utility using a Cobb-Douglas function.
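The allocation rule implemented in the function can be written compactly. The notation $\mathcal{S}^{+}$ (selected types with $\gamma_{s}\geq{0}$), $\mathcal{S}^{-}$ (selected types with $\gamma_{s}<0$), $p_{o,s}$ (observed price), and $n_{\min}$ (minimum purchase) is introduced here for illustration and mirrors the variables in the code:
$$
\begin{align*}
n_{s} & = \frac{\gamma_{s}}{\bar{\gamma}}\cdot\frac{\bar{B}}{p_{o,s}}\quad\forall{s}\in\mathcal{S}^{+}, \qquad n_{s} = n_{\min}\quad\forall{s}\in\mathcal{S}^{-} \\
\bar{\gamma} & = \sum_{s\in\mathcal{S}^{+}}\gamma_{s} \\
\bar{B} & = B - n_{\min}\sum_{s\in\mathcal{S}^{-}}p_{o,s}
\end{align*}
$$
A direct consequence is that each preferred flower type receives the share $\gamma_{s}/\bar{\gamma}$ of the adjusted budget, since $n_{s}\,p_{o,s} = (\gamma_{s}/\bar{\gamma})\,\bar{B}$, and that the preferred types together spend exactly $\bar{B}$.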

In [7]:
function customer(t::Int, a::Vector{Int64}, 
    context::MyConsumerChoiceBanditContextModel)

    # get data from the context model -
    myitems = context.items; # list of items descriptors (keys for the data dictionary)
    data_dictionary = context.data; # data for the items
    B = context.B; # budget for this customer
    bounds = context.bounds; # bounds for the items
    total_number_of_items = length(myitems); # total number of items
    γ = context.γ; # preference parameters for the items

    # what is the min item purchase?
    min_item_purchase = bounds[1,1];

    # Compute the observed market price vector -
    pₒ = zeros(total_number_of_items); # observed market price vector
    for i ∈ eachindex(myitems)
        item_index = myitems[i];
        pₒ[i] = data_dictionary[item_index]*(1+0.15*randn()); # observe stochastic market price
    end

    # Compute the optimal item count -
    n = zeros(total_number_of_items); # initialize space for optimal solution
    S = findall(aᵢ -> aᵢ == 1, a); # Which items does our bandit want us to buy?
    number_of_selected_arms = length(S);

    # In the set of items to explore, do we have any non-preferred items?
    # If γᵢ < 0, the item is non-preferred but we still buy the minimum required amount
    # If γᵢ ≥ 0, the item is preferred and we allocate budget proportionally
    negative_gamma_flag = any(γ[S] .< 0);
    if !negative_gamma_flag
        
        # Case 1: all selected items are preferred (γᵢ ≥ 0)
        # Allocate budget proportionally based on preference weights
        γ̄ = sum(γ[S]);
        B̄ = B;
        for s ∈ S
            n[s] = (γ[s]/γ̄)*(B̄/pₒ[s]);
        end
    else

        # Case 2: some selected items are non-preferred (γᵢ < 0)
        # Strategy: Buy minimum of non-preferred items, allocate remaining budget to preferred items
        
        # Step 1: Set non-preferred items to minimum purchase amount
        # Step 2: Compute adjusted budget (subtract cost of non-preferred items)
        # Step 3: Allocate remaining budget proportionally to preferred items
        B̄ = B;
        γ̄ = 0.0;
        for s ∈ S
            if (γ[s] < 0.0)
                B̄ -= min_item_purchase*pₒ[s];
                n[s] = min_item_purchase;
            else
                γ̄ += γ[s];
            end
        end

        # allocate remaining budget to preferred items (if any exist with γᵢ ≥ 0)
        if (γ̄ > 0.0)
            for s ∈ S
                if (γ[s] ≥ 0.0)
                    n[s] = (γ[s]/γ̄)*(B̄/pₒ[s]);
                end
            end
        end
    end

    # Compute the scaling factor κ -
    κ = negative_gamma_flag ? -1.0 : 1.0;
    
    # Compute the optimal utility -
    U = κ;
    for s ∈ S
        U *= (n[s]^γ[s])
    end
    
    # Return the utility and the item count that we allocated
    return U, n, pₒ, γ
end;

### Constants
Finally, let's set some constants we'll use in the subsequent tasks. See the comment beside the value for a description of what it is, its permissible values, etc.

In [9]:
K = 7; # number of flower types and other design elements (bits per arm); the bandit has 2^K binary-vector arms
T = 500; # number of rounds for each decision task

### Computational Complexity and Algorithmic Approach
We encode the bouquet composition as a binary vector arm, where each bit indicates the inclusion/exclusion of a flower type. 

> __Binary Vector Encoding__
>
> Each bouquet combination is represented as a binary vector $\mathbf{a}\in\{0,1\}^K$. For example, with $K=7$ flower types:
> * $\mathbf{a} = [1,0,1,1,0,0,1]$ includes flowers 1, 3, 4, and 7
> * $\mathbf{a} = [1,1,1,1,1,1,1]$ includes all flowers
> * $\mathbf{a} = [0,0,0,0,0,0,0]$ includes no flowers
>
>We use [the `digits(j, base=2, pad=K)` function](https://docs.julialang.org/en/v1/base/numbers/#Base.digits) to convert an integer index $j$ into its binary representation. This allows us to systematically enumerate and evaluate different combinations as the bandit explores the action space.

The binary vector representation creates $2^K$ possible bouquet combinations (arms). With $K=7$ flower types, we have $2^7 = 128$ possible bouquets, which is computationally tractable for exploration. However, the combinatorial explosion becomes problematic at larger scales: $K=20$ would yield over 1 million combinations, making exhaustive evaluation infeasible.
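The encoding and the arm counts quoted above can be checked directly in Julia. This is a small sketch; the index 77 is chosen for illustration because its binary digits reproduce the first example vector.

```julia
K = 7                                   # number of flower types, as in the lab
N = 2^K                                 # number of binary-vector arms

# digits(...) returns the least-significant bit first, so bit i maps to flower type i
a = digits(77, base=2, pad=K)           # 77 = 1 + 4 + 8 + 64
selected = findall(==(1), a)            # flower types included in this bouquet

N_large = 2^20                          # arm count for K = 20
```

Here `a` is `[1, 0, 1, 1, 0, 0, 1]` and `selected` is `[1, 3, 4, 7]`, while `N_large` evaluates to 1,048,576, which is why enumeration stops being practical as $K$ grows.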

We use the epsilon-greedy algorithm for this problem: in each round, it explores a randomly chosen arm with probability $\epsilon$ and exploits the arm with the highest estimated reward with probability $1-\epsilon$, so exploration and exploitation are interleaved. This approach is more sample-efficient than exhaustive search because it can find good solutions without evaluating every possibility.
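The explore/exploit rule can be sketched in a few lines. This is a generic illustration over integer-indexed arms with a made-up three-arm reward function; it is not the package's `solve(...)` implementation, and the `epsilon_greedy` and `toy_world` names are hypothetical.

```julia
using Random

# Hypothetical sketch of ϵ-greedy over N arms with running-mean reward estimates
function epsilon_greedy(world::Function, N::Int, T::Int;
    ϵ::Float64 = 0.1, rng = Random.default_rng())

    μ = zeros(N)        # estimated mean reward per arm
    c = zeros(Int, N)   # number of pulls per arm
    for t ∈ 1:T
        j = rand(rng) < ϵ ? rand(rng, 1:N) : argmax(μ)   # explore, otherwise exploit
        r = world(j)                                     # observe a noisy reward
        c[j] += 1
        μ[j] += (r - μ[j]) / c[j]                        # incremental mean update
    end
    return μ, c
end

rng = MersenneTwister(1234)
true_means = [0.1, 0.5, 0.9]                             # made-up toy arms
toy_world(j) = true_means[j] + 0.1 * randn(rng)          # reward = mean + Gaussian noise
μ, c = epsilon_greedy(toy_world, 3, 2000; ϵ = 0.1, rng = rng)
```

In the bouquet problem, the arm index would be decoded into a binary vector before the reward function is called; the running-mean update is what lets the agent average out price noise over repeated pulls of the same arm.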

However, we could consider alternative algorithms that may offer improved performance.

> __Alternative bandit algorithms:__
>
> * __Upper Confidence Bound (UCB)__: Selects arms based on both estimated reward and uncertainty, favoring less-explored arms
> * __Thompson Sampling__: Uses Bayesian posterior sampling to balance exploration and exploitation
>
> The epsilon-greedy approach (our choice) is simpler to implement and interpret, making it suitable for understanding the core concepts of combinatorial bandit problems. More sophisticated algorithms may offer better sample efficiency but at the cost of additional complexity.
___

## Task 1: Let's build the problem context model
In this task, we'll build the problem context model. This model encodes the customer's preferences (γᵢ values), base prices for each flower type, budget constraints, and quantity bounds that structure the bouquet design problem.

We encode the problem context model as an instance of [the `MyConsumerChoiceBanditContextModel` type](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/bandit/#VLDataScienceMachineLearningPackage.MyConsumerChoiceBanditContextModel) which we construct [using a custom `build(...)` function](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/bandit/#VLDataScienceMachineLearningPackage.build). 

The base prices stored in the context model represent expected or reference prices, while the actual market prices observed during arm evaluation include stochastic variations (Gaussian noise with a standard deviation of 15% of the base price) applied in the `customer(...)` function to simulate real-world market volatility.
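Because the noise term `0.15*randn()` is Gaussian, 15% is the standard deviation of the relative price change rather than a hard bound. A minimal sketch of what that implies, assuming a made-up base price and a fixed seed:

```julia
using Random, Statistics

rng = MersenneTwister(42)
p_base = 5.0                                              # made-up base price
p_obs = [p_base * (1 + 0.15 * randn(rng)) for _ ∈ 1:100_000]

rel_std = std(p_obs) / p_base                             # relative spread of observed prices
frac_outside = mean(abs.(p_obs ./ p_base .- 1) .> 0.15)   # share of draws beyond ±15%
```

About a third of the draws deviate from the base price by more than 15%, since that is the probability mass of a normal distribution outside one standard deviation.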

Let's save this model in the `my_context_model::MyConsumerChoiceBanditContextModel` variable.

In [12]:
my_context_model = let

    # initialize -
    number_of_components = K; # how many flower types / design elements do we have?
    N = 2^K; # number of possible bouquets (arms)
    ϵ = 0.000001; # small number to avoid division by zero
    
    # setup problem components -
    γ = rand(Uniform(-1.0, 1.0), number_of_components); # Preferences: γᵢ::Float64 ∈ [-1.0,1.0] for i=1,...,K
    p = rand(Uniform(1.0, 10.0), number_of_components); # Prices: pᵢ::Float64 ∈ [1.0,10.0] for i=1,...,K
    μₒ = zeros(N); # initial mean utilities μₒ::Vector{Float64} for each arm (bouquet)   
    B = 100.0; # Budget B::Float64 for the bouquet design
    nₒ = ones(K); # initial counts nₒ::Vector{Float64} for each item

    # compute the bounds -
    bounds = zeros(K,2); # bounds::Array{Float64,2} - how much of an element can we have in a bouquet?
    for i in 1:K
        bounds[i,1] = ϵ; # lower bound
        bounds[i,2] = floor(B / p[i]); # upper bound (budget limited)
    end

    # let's build our items descriptor list 
    items = Array{String,1}(undef, K); # items::Vector{String} - flower type identifiers
    for i in 1:K
        items[i] = "Flower-$(i)";
    end

    # next, we'll build our context data model (for now, just base prices keyed by item descriptor)
    context_data = Dict{String,Any}(); # context_data::Dict{String,Any} - stores item properties
    for i in 1:K
        context_data[items[i]] = p[i]; # base price for each item
    end

    # let's build our context model -
    my_context_model = build(MyConsumerChoiceBanditContextModel, (
        B = B, # budget we can spend on bouquet
        items = items, # item descriptors
        bounds = bounds, # bounds on each item
        γ = γ, # customer preferences
        μₒ = μₒ, # initial mean utilities for each arm
        nₒ = nₒ, # initial counts for each item
        data = context_data # context data dictionary
    ));

    my_context_model; # return the context model
end;

Let's build a table that shows the randomly initialized preferences and base prices for each flower type in our bouquet design problem. This table displays the context model parameters that will guide the bandit's exploration.

> __Randomness:__ Note that these are the base (reference) prices stored in the context model. When the `customer(...)` function evaluates an arm, it observes stochastic market prices that vary around these base prices (Gaussian noise with a 15% standard deviation) to simulate real-world market fluctuations.

We'll use [the `PrettyTables.jl` package](https://github.com/ronisbr/PrettyTables.jl) to create a nicely formatted table.

In [14]:
let

    # initialize -
    contextmodel = my_context_model;
    K = length(my_context_model.items); # number of possible design elements
    N = 2^K; # number of possible bouquets (arms)
    df = DataFrame();

    # get some data from our context model -
    items = contextmodel.items;
    bounds = contextmodel.bounds;
    prices = [contextmodel.data[items[i]] for i ∈ 1:K];
    preferences = contextmodel.γ;

    for i ∈ 1:K
       
        # package each row of the table
        row_df = (
            flower = items[i],
            price = prices[i],
            preference = preferences[i],
            min_in_bouquet = bounds[i,1],
            max_in_bouquet = bounds[i,2]
        );
        push!(df, row_df);
    end

    # display the table -
    pretty_table(
         df;
         backend = :text,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

 ---------- --------- ------------ ---------------- ----------------
    flower     price   preference   min_in_bouquet   max_in_bouquet
    String   Float64      Float64          Float64          Float64
 ---------- --------- ------------ ---------------- ----------------
  Flower-1   6.17578     0.971319           1.0e-6             16.0
  Flower-2   4.69655   -0.0014216           1.0e-6             21.0
  Flower-3   8.30632     0.396342           1.0e-6             12.0
  Flower-4   5.21173    -0.878798           1.0e-6             19.0
  Flower-5   2.65019     0.870359           1.0e-6             37.0
  Flower-6   6.45072     0.190153           1.0e-6             15.0
  Flower-7   4.90346   -0.0375434           1.0e-6             20.0
 ---------- --------- ------------ ---------------- ----------------


___

## Task 2: Let's design the optimal bouquet using a bandit agent
In this task, we'll use our binary vector arms bandit agent to design the optimal bouquet. The agent will explore different combinations of flowers and learn which combinations yield the highest rewards. On each evaluation, the `customer(...)` function observes stochastic market prices (the base price perturbed by Gaussian noise with a 15% standard deviation) and returns the utility based on these observed prices, simulating a realistic market where prices fluctuate.

We create an [instance of the `MyBinaryVectorArmsEpsilonGreedyAlgorithmModel` type](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/bandit/#VLDataScienceMachineLearningPackage.MyBinaryVectorArmsEpsilonGreedyAlgorithmModel) using [a `build(...)` method](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/factory/); the model holds `K::Int`, the length of each binary arm vector, so the agent chooses among $2^K$ arms. To solve the problem, we pass the problem model, the context model, and the `customer(...)` function (as the `world` keyword argument) to [the `solve(...)` method](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/bandit/#VLDataScienceMachineLearningPackage.solve-Tuple{MyBinaryVectorArmsEpsilonGreedyAlgorithmModel,%20MyConsumerChoiceBanditContextModel}), which returns:

* `R::Array{Float64,1}`: the reward history over all rounds
* `μ::Array{Float64,1}`: the average reward for each possible bouquet combination
* `S::Array{Float64,2}`: the history of optimal quantities for each flower type
* `P::Array{Float64,2}`: the history of observed market prices for each item (showing price variations across rounds)
* `G::Array{Float64,2}`: the history of preference parameters for each item

Let's call [the `solve(...)` method](https://varnerlab.github.io/VLDataScienceMachineLearningPackage.jl/dev/bandit/#VLDataScienceMachineLearningPackage.solve-Tuple{MyBinaryVectorArmsEpsilonGreedyAlgorithmModel,%20MyConsumerChoiceBanditContextModel}) with `T` iterations and see what happens.

In [17]:
(R, μ, S, P, G) = let

    # initialize -
    problemmodel = build(MyBinaryVectorArmsEpsilonGreedyAlgorithmModel, (
        K = K, # number of flower types (bits in each binary-vector arm)
    )); # build the problem model
       
    # solve the problem using our bandit agent -
    R, μ, S, P, G = solve(problemmodel, my_context_model; 
        world = customer, T = T);

    (R, μ, S, P, G) # return data 
end;

We can sort the possible bouquet compositions by the estimated average utility `μ::Array{Float64,1}` to see which possible compositions the bandit algorithm thinks are best:

In [19]:
sorted_reward_index_array = sortperm(μ, rev=true) # sort the index of the rewards

128-element Vector{Int64}:
  21
  53
   1
  17
  49
   4
  52
  20
  37
  16
  36
  48
  33
   ⋮
 127
   9
  26
 111
  91
  79
  59
 128
  61
  63
 107
  95
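The top-ranked indices can be decoded back into bouquet compositions. This is a minimal sketch, assuming the arm index `j` maps to a binary vector through the same `digits(j, base=2, pad=K)` call the notebook uses when it evaluates the best bouquet; the `decode` helper is hypothetical.

```julia
K = 7   # number of flower types, as in the lab

# Hypothetical helper: which flower types does arm index j include?
decode(j::Int, K::Int) = findall(==(1), digits(j, base=2, pad=K))

top = [21, 53, 1]   # first three entries of the sorted index array shown above
for j ∈ top
    flowers = ["Flower-$(i)" for i ∈ decode(j, K)]
    println("arm $(j): ", join(flowers, ", "))
end
```

Under that assumption, the three top-ranked arms contain only Flower-1, Flower-3, Flower-5, and Flower-6, which are the flower types with positive preference parameters in the Task 1 table.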

### What's in the best bouquet?
Let's build a table that shows the optimal bouquet composition identified by the bandit algorithm after 500 periods of exploration. 

We call the `customer(...)` function one final time to evaluate the best arm, which observes a fresh realization of stochastic market prices. The utility value `U::Float64` indicates the customer satisfaction level for this bouquet composition, while the weights `wᵢ::Float64` show the proportion of budget allocated to each flower type. The table displays:
* `flower::String`: the flower type identifier
* `pᵢ::Float64`: the observed market price per unit of flower type `i` for this evaluation (base price with stochastic variation)
* `γᵢ::Float64`: the preference parameter for flower type `i` (from the context model)
* `nᵢ::Float64`: the optimal quantity of flower type `i` to include
* `wᵢ::Float64`: the budget weight (proportion) allocated to flower type `i`
* `UC::Float64`: the amount spent on flower type `i` (nᵢ × pᵢ)
* `CS::Float64`: the cumulative spending up to flower type `i`

Notice that the bandit algorithm may have selected a bouquet subset (where some `nᵢ::Float64` values are at or near zero) rather than including all flowers, demonstrating how the algorithm balances bouquet complexity with customer utility while adapting to price uncertainty.

In [32]:
let

    # initialize -
    df = DataFrame();
    CS = 0.0; # cumulative spending (Float64, to match the nᵢ × pᵢ terms added below)
    t = 1;
    context = my_context_model;
    myitems = context.items;
    total_number_of_flowers = length(myitems); # total number of flowers that we *could* purchase
    bounds = context.bounds; # bounds for how much we purchase
    B = context.B; # budget for this period
    N = 2^K; # number of possible bouquets (arms)

    # compute the best bouquet -
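    j = sorted_reward_index_array[1]; # index of the top-ranked bouquet
    j = 53 # manual override: pins the bouquet reported in the output below; remove this line to use the top-ranked index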
    a = digits(j, base=2, pad=K); # generate a binary representation of the number, with K digits
    if (j == N)
        a = digits(j - 1, base=2, pad=K); # generate a binary representation of the number, with K digits  
    end

    # call world -
    U, n, pₒ, γ = customer(t, a, context);

    @show U,j

    for i ∈ 1:K

        CS += n[i]*pₒ[i];
        wᵢ = (n[i]*pₒ[i]/B);

        row_df = (
            flower = myitems[i],
            pᵢ = pₒ[i],
            γᵢ = γ[i],
            nᵢ = n[i],
            wᵢ = wᵢ,
            UC = n[i]*pₒ[i],
            CS = CS,
        );
        push!(df, row_df);
    end

     pretty_table(df, backend = :text, fit_table_in_display_vertically = false, fit_table_in_display_horizontally = false,
         table_format = TextTableFormat(borders = text_table_borders__compact)
    );
end

(U, j) = (69.32355595087435, 53)
 ---------- --------- ------------ --------- ---------- --------- ---------
    flower        pᵢ           γᵢ        nᵢ         wᵢ        UC        CS
    String   Float64      Float64   Float64    Float64   Float64   Float64
 ---------- --------- ------------ --------- ---------- --------- ---------
  Flower-1   6.44027     0.971319   6.21124   0.400021   40.0021   40.0021
  Flower-2   4.41888   -0.0014216       0.0        0.0       0.0   40.0021
  Flower-3   8.38258     0.396342   1.94721   0.163226   16.3226   56.3247
  Flower-4   5.40599    -0.878798       0.0        0.0       0.0   56.3247
  Flower-5   2.87286     0.870359   12.4768   0.358442   35.8442   92.1689
  Flower-6   7.66778     0.190153    1.0213   0.078311    7.8311     100.0
  Flower-7   4.46894   -0.0375434       0.0        0.0       0.0     100.0
 ---------- --------- ------------ --------- ---------- --------- ---------
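The budget weights in the table can be cross-checked against the proportional allocation rule used in `customer(...)`, under which each selected flower with a nonnegative preference receives the share γᵢ/Σγ of the budget. A small sketch using the values copied from the rows for Flower-1, Flower-3, Flower-5, and Flower-6:

```julia
γ_sel = [0.971319, 0.396342, 0.870359, 0.190153]   # γᵢ for the four selected flowers (from the table)
w_tab = [0.400021, 0.163226, 0.358442, 0.078311]   # wᵢ reported for the same rows

w_rule = γ_sel ./ sum(γ_sel)                        # shares implied by γᵢ/Σγ
max_err = maximum(abs.(w_rule .- w_tab))            # largest gap between rule and table
```

The two sets of weights agree up to the rounding of the printed values, and they do not depend on the observed prices: price noise changes how many flowers each share buys, not the share itself.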

___

## Summary
In this activity, we applied a binary vector arms bandit algorithm to solve the Bouquet Design Problem with stochastic market pricing, demonstrating how combinatorial optimization can be tackled through adaptive exploration of discrete choice combinations under uncertainty.

> __Key Takeaways:__
>
> 1. **Binary vector arms for combinatorial problems**: We represented bouquet compositions as binary vectors where each element indicates inclusion or exclusion of a flower type, showing how bandit algorithms can efficiently explore exponentially large action spaces (2^K possible combinations) without exhaustive enumeration.
> 2. **Context-aware utility optimization under uncertainty**: We separated the problem structure (base prices, preferences, budget constraints) in the context model from the stochastic market observations encountered during arm evaluation. The `customer(...)` function observes market prices perturbed by Gaussian noise (a standard deviation of 15% of the base price), demonstrating how domain knowledge and constraints guide the exploration-exploitation tradeoff toward feasible and desirable solutions even when market conditions are volatile.
> 3. **Adaptive allocation with preference learning**: The epsilon-greedy algorithm learned which flower subsets to include by balancing exploration of uncertain combinations with exploitation of high-utility bouquets, while the `customer(...)` function allocated the budget across the selected flower types, handling both positive and negative preferences while respecting the minimum purchase bound and the budget constraint. By averaging rewards across multiple stochastic price observations, the algorithm identified bouquet compositions that perform well despite market volatility.

This approach provides a practical framework for sequential decision-making in combinatorial optimization problems where preferences and constraints are known, but prices and utilities must be learned through repeated interaction with stochastic market environments.