#### Challenge

A marketing company wants to optimize its budget allocation for an advertising campaign across three different channels: Channel 1, Channel 2, and Channel 3.

The table below shows each channel's current performance:

| Channel  | Impressions | Viewability | Viewable Impressions | Cost  | Conversions | Conversion Rate | CPI         | CPV  | CPA  |
| -------- | ----------- | ----------- | ------------------- | ----- | ----------- | --------------- | ----------- | ---- | ---- |
| Channel 1 | 1,000,000   | 39%         | 390,000             | 35,000 | 3,510       | 0.9%            | 28.57 | 11.14 | 9.97 |
| Channel 2 | 800,000     | 52%         | 416,000             | 27,000 | 7,904       | 1.9%            | 29.62 | 15.41 | 3.42 |
| Channel 3 | 600,000     | 48%         | 288,000             | 31,000 | 8,640       | 3.0%            | 19.35 | 9.29  | 3.59 |
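
The derived columns can be reproduced from the raw ones, which also makes their definitions explicit. The following is a minimal sketch (the names `channels` and `derive_metrics` are illustrative and not part of the original notebook). It suggests that the CPI and CPV columns, as tabulated, match impressions and viewable impressions per unit of cost, while the CPA column matches cost per conversion.

```python
# Raw columns from the table above: (impressions, viewability, cost, conversions)
channels = {
    "Channel 1": (1_000_000, 0.39, 35_000, 3_510),
    "Channel 2": (800_000, 0.52, 27_000, 7_904),
    "Channel 3": (600_000, 0.48, 31_000, 8_640),
}

def derive_metrics(impressions, viewability, cost, conversions):
    """Recompute the derived columns of the table from the raw ones."""
    viewable = impressions * viewability
    return {
        "viewable_impressions": round(viewable),
        "conversion_rate": conversions / viewable,  # conversions per viewable impression
        "cpi": impressions / cost,                  # reproduces the CPI column
        "cpv": viewable / cost,                     # reproduces the CPV column
        "cpa": cost / conversions,                  # reproduces the CPA column
    }

derived = {name: derive_metrics(*raw) for name, raw in channels.items()}
for name, metrics in derived.items():
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

The recomputed values agree with the table to within rounding in the second decimal place.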


Given this information, if the company has a new budget of $100k, how many impressions should it buy from each channel to minimize CPV while satisfying the following constraints?

- 0.35 <= avg viewability <= 0.60
- 0.02 <= avg conversion rate <= 0.03
- 3 <= avg CPV <= 13
- 3 <= avg CPA <= 6
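
The constraints above are stated on averages, so checking them requires a definition of how channel metrics are blended. The sketch below is one possible reading, not the notebook's own implementation: it assumes average viewability is weighted by impressions and the other three metrics by viewable impressions. The function names and the sample allocations are hypothetical.

```python
# Per-channel rates from the table
VIEWABILITY = (0.39, 0.52, 0.48)
CONVERSION_RATE = (0.009, 0.019, 0.03)
CPV = (11.14, 15.41, 9.29)
CPA = (9.97, 3.42, 3.59)

# Bounds exactly as listed in the challenge
BOUNDS = {
    "viewability": (0.35, 0.60),
    "conversion_rate": (0.02, 0.03),
    "cpv": (3, 13),
    "cpa": (3, 6),
}

def weighted_average(values, weights):
    """Weighted mean; assumes the weights do not sum to zero."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def blended_metrics(impressions):
    """Blend channel metrics for the impressions bought per channel.

    Assumption: viewability is weighted by impressions, the other
    three metrics by viewable impressions.
    """
    viewable = [n * v for n, v in zip(impressions, VIEWABILITY)]
    return {
        "viewability": weighted_average(VIEWABILITY, impressions),
        "conversion_rate": weighted_average(CONVERSION_RATE, viewable),
        "cpv": weighted_average(CPV, viewable),
        "cpa": weighted_average(CPA, viewable),
    }

def check_constraints(impressions):
    """Return, per constraint, whether the blended metric is within bounds."""
    metrics = blended_metrics(impressions)
    return {name: low <= metrics[name] <= high for name, (low, high) in BOUNDS.items()}

print(check_constraints((1000, 0, 0)))    # everything bought from Channel 1
print(check_constraints((0, 500, 500)))   # split between Channels 2 and 3
```

Under these assumptions, buying only from Channel 1 violates the conversion-rate and CPA bounds, while an even split between Channels 2 and 3 satisfies all four.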

#### Answer

Several methods can solve this problem. Let's first go over them and their pros and cons.

- Monte Carlo Simulation

    - Pros:
        - Monte Carlo simulations can handle complex problems with multiple variables and constraints.
        - This method enables the exploration of a wide range of possible scenarios by generating random samples.
        - The random sampling technique provides a way to capture the inherent uncertainty of the variables, which is common in marketing and advertising campaigns.
        - It allows for a better understanding of the risk and potential variability in the solution.

    - Cons:
        - Monte Carlo simulations can be computationally intensive, especially for large numbers of iterations.
        - The accuracy of the results depends on the quality of the input data and the number of iterations.

##

- Linear Programming

    - Pros:
        - Linear programming is an efficient optimization method for problems with linear constraints and objectives.
        - It can provide an exact optimal solution given the right conditions.

    - Cons:
        - The problem must be linear in nature, which may not accurately represent the advertising campaign's complexities.
        - It does not incorporate uncertainty, which is essential in marketing and advertising scenarios.

##

- Genetic Algorithms

    - Pros:
        - Genetic algorithms are suitable for optimization problems with nonlinear constraints and objectives.
        - They can explore a large solution space efficiently.

    - Cons:
        - Convergence to the optimal solution can be slow.
        - Genetic algorithms may require tuning of various parameters for optimal performance.

##

- Simulated Annealing

    - Pros:
        - Simulated annealing can solve optimization problems with nonlinear constraints and objectives.
        - It can escape local optima, which improves its chances of reaching the global optimum in complex problems.

    - Cons:
        - Convergence can be slow, and the cooling schedule might need to be fine-tuned.
        - It may require a high number of iterations to find the optimal solution.

Given the problem's complexity, Monte Carlo simulation is an appropriate choice: it handles multiple variables and constraints and can accommodate uncertainty in the input data. Linear programming, genetic algorithms, and simulated annealing each have their own pros and cons, but Monte Carlo simulation explores the solution space broadly with very little machinery, and every sampled scenario remains available for later inspection.
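
For comparison, the linearity requirement mentioned for linear programming can be made concrete. Ratio constraints such as the one on average viewability look nonlinear, but if the average is assumed to be impression-weighted and at least one impression is bought, multiplying through by the total gives linear inequalities. The symbols $x_i$ and $v_i$ are introduced here for illustration only.

```latex
% x_i >= 0: impressions bought from channel i, with x_1 + x_2 + x_3 > 0
% v_i: viewability of channel i
\begin{align}
0.35 \le \frac{\sum_{i=1}^{3} x_i v_i}{\sum_{i=1}^{3} x_i} \le 0.60
\quad\Longleftrightarrow\quad
\sum_{i=1}^{3} x_i \,(v_i - 0.35) \ge 0
\ \text{ and }\
\sum_{i=1}^{3} x_i \,(v_i - 0.60) \le 0
\end{align}
```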

In [158]:
#monte carlo algorithm

import numpy as np
import random
import math

#new budget that we need to allocate
total_budget = 100000

#channel's data as listed above
viewability = np.array([0.39, 0.52, 0.48])
conversion_rate = np.array([0.009, 0.019, 0.03])
cpi = np.array([28.57, 29.62, 19.35])
cpv = np.array([11.14, 15.41, 9.29])
cpa = np.array([9.97, 3.42, 3.59])


num_simulations = 100000
results = []

for i in range(num_simulations):
    # draw a random number between 0 and 1: the share of the budget allocated to channel 1
    c1_allocation_pct = round(random.random(), 6)

    # draw a random number between 0 and 1: the share of the budget allocated to channel 2
    c2_allocation_pct = round(random.random(), 6)

    # redraw channel 2's share until c1 + c2 <= 1
    while c1_allocation_pct + c2_allocation_pct > 1:
        c2_allocation_pct = round(random.random(), 6)

    # allocate the remaining share to channel 3
    c3_allocation_pct = 1 - (c1_allocation_pct + c2_allocation_pct)

    #now lets convert the allocation percentages to impressions, based on channel's cost per impression
    c1_random_imps = math.ceil((total_budget * c1_allocation_pct) / cpi[0])
    c2_random_imps = math.ceil((total_budget * c2_allocation_pct) / cpi[1])
    c3_random_imps = math.ceil((total_budget * c3_allocation_pct) / cpi[2])

    random_imps = np.array([c1_random_imps, c2_random_imps, c3_random_imps])

    # calculate all the metrics for the current simulation
    total_impressions = math.ceil(np.sum(random_imps))
    total_viewable_impressions = math.ceil(np.sum(random_imps * viewability))
    # impression-weighted average viewability
    avg_viewability = round(np.sum(random_imps * viewability) / total_impressions, 2)
    # NOTE: the numerator uses all impressions while the denominator uses viewable impressions,
    # so this blended rate can exceed every per-channel conversion rate
    avg_conversion_rate = round(np.sum(random_imps * conversion_rate) / total_viewable_impressions, 4)
    # averages weighted by viewable impressions
    avg_cpv = round(np.sum(random_imps * viewability * cpv) / total_viewable_impressions, 2)
    avg_cpa = round(np.sum(random_imps * viewability * cpa) / total_viewable_impressions, 2)
    total_impression_cost = round(np.sum(random_imps * cpi), 0)
    # NOTE: multiplies all impressions (not only viewable ones) by the per-channel CPV
    total_viewable_impression_cost = round(np.sum(random_imps * cpv), 2)

    #generating output data
    data = {
        'simulation': i,
        'channel1_allocation_pct': c1_allocation_pct,
        'channel2_allocation_pct': c2_allocation_pct,
        'channel3_allocation_pct': c3_allocation_pct,
        'total_impressions' : total_impressions,
        'total_viewable_impressions': total_viewable_impressions,
        'avg_viewability': avg_viewability,
        'avg_conversion_rate': avg_conversion_rate,
        'avg_cpv': avg_cpv,
        'avg_cpa': avg_cpa,
        # NOTE: the bounds for viewability, conversion rate, and CPV used here
        # differ from the ones listed in the Challenge section
        'viewability_check': bool(0.25 <= avg_viewability <= 0.60),
        'conversion_rate_check': bool(0.04 <= avg_conversion_rate <= 0.08),
        'avg_cpv_check': bool(3 <= avg_cpv <= 10),
        'avg_cpa_check': bool(3 <= avg_cpa <= 6),
        'total_impression_cost': total_impression_cost,
        'total_viewable_impression_cost': total_viewable_impression_cost
    }

    # store the scenario so the results can be queried as a DataFrame afterwards
    results.append(data)
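
The loop above redraws only Channel 2's share whenever the first two shares exceed 1, so Channel 1's share stays uniform on its own and the splits are not spread evenly over all feasible allocations. One alternative, sketched here with standard-library calls only, is to cut the interval [0, 1] at two random points and use the three piece lengths as shares, which samples the splits uniformly. The function name `sample_allocation` and the seed are illustrative choices, not part of the original notebook.

```python
import random

def sample_allocation(rng):
    """Draw (c1, c2, c3) uniformly from all splits with c1 + c2 + c3 = 1.

    Two uniform cut points divide the interval [0, 1] into three pieces;
    the piece lengths are the allocation shares.
    """
    low, high = sorted((rng.random(), rng.random()))
    return low, high - low, 1.0 - high

rng = random.Random(42)  # fixed, arbitrary seed so the run is reproducible
samples = [sample_allocation(rng) for _ in range(5)]
for shares in samples:
    print([round(share, 4) for share in shares], "sum =", round(sum(shares), 6))
```

Because every draw is valid by construction, this variant also needs no retry loop.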

Having completed 100,000 iterations of our Monte Carlo simulation, we have a dataset of randomly allocated budget percentages and the metrics each allocation produces. We will now analyze this data to identify the best allocation for our advertising campaign.

For that, we simply need to know how to query the data. Advanced visualizations and techniques are not required. They can be powerful tools for exploring and understanding data, but they can be time-consuming to build, may not convey insights effectively to a diverse audience, and often require specialized skills that not every team member has.

They can also be resource-intensive and divert attention from the core objective of the analysis. In some cases, a more straightforward approach, such as querying the dataset, provides equally valuable insights more efficiently and accessibly. The key is to balance advanced methods against keeping the process practical and comprehensible enough to support informed decisions.

In [162]:
import pandas as pd

df = pd.DataFrame(results)

# First, add a column with the share of constraints that each simulation satisfies; we want scenarios where all 4 constraints are True
df['constraints_check_pct'] = df[['viewability_check', 'conversion_rate_check', 'avg_cpv_check', 'avg_cpa_check']].sum(axis=1) / 4

# Next, keep only the scenarios where all 4 constraints are True and the total impression cost is close to the budget.
# Each condition needs its own parentheses because & binds more tightly than ==
filtered_df = df[
    (df['constraints_check_pct'] == 1)
    & (df['total_impression_cost'].between(total_budget - 10, total_budget + 10))
]

# Finally, show the top 10 scenarios sorted by total viewable impression cost, then total impression cost, both ascending
filtered_df.sort_values(by=['total_viewable_impression_cost', 'total_impression_cost'], ascending=[True, True])[:10]

| simulation | channel1_allocation_pct | channel2_allocation_pct | channel3_allocation_pct | total_impressions | total_viewable_impressions | avg_viewability | avg_conversion_rate | avg_cpv | avg_cpa | viewability_check | conversion_rate_check | avg_cpv_check | avg_cpa_check | total_impression_cost | total_viewable_impression_cost | constraints_check_pct |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 85639 | 0.45311 | 0.000871 | 0.546019 | 4411 | 1975 | 0.45 | 0.0501 | 9.87 | 5.59 | True | True | True | True | 100007.0 | 43930.65 | 1.0 |
| 1263 | 0.427656 | 0.011199 | 0.561145 | 4435 | 1996 | 0.45 | 0.0507 | 9.89 | 5.45 | True | True | True | True | 100010.0 | 44203.16 | 1.0 |
| 5800 | 0.421111 | 0.006167 | 0.572722 | 4455 | 2007 | 0.45 | 0.0511 | 9.85 | 5.42 | True | True | True | True | 100010.0 | 44242.37 | 1.0 |
| 89102 | 0.403938 | 0.028722 | 0.56734 | 4443 | 2010 | 0.45 | 0.051 | 9.95 | 5.33 | True | True | True | True | 100005.0 | 44485.01 | 1.0 |
| 65483 | 0.39968 | 0.027769 | 0.572551 | 4452 | 2015 | 0.45 | 0.0512 | 9.94 | 5.31 | True | True | True | True | 100010.0 | 44522.51 | 1.0 |
| 34551 | 0.389109 | 0.042651 | 0.56824 | 4443 | 2016 | 0.45 | 0.0511 | 10.0 | 5.26 | True | True | True | True | 100009.0 | 44676.45 | 1.0 |
| 64411 | 0.377981 | 0.022752 | 0.599267 | 4497 | 2043 | 0.45 | 0.052 | 9.88 | 5.2 | True | True | True | True | 100006.0 | 44695.92 | 1.0 |
| 86722 | 0.367683 | 0.021253 | 0.611064 | 4517 | 2056 | 0.45 | 0.0524 | 9.85 | 5.14 | True | True | True | True | 100010.0 | 44784.52 | 1.0 |
| 79034 | 0.371381 | 0.046766 | 0.581853 | 4465 | 2033 | 0.46 | 0.0516 | 10.0 | 5.17 | True | True | True | True | 100006.0 | 44851.81 | 1.0 |
| 12573 | 0.339314 | 0.013622 | 0.647064 | 4578 | 2093 | 0.46 | 0.0535 | 9.77 | 5.0 | True | True | True | True | 100010.0 | 45008.94 | 1.0 |


So, the ideal allocation for the $100k budget is the first row, which has the lowest total viewable impression cost:

| Channel | Allocation Percentage |
| -------- | -------- | 
|  Channel 1 |  45.311% | 
|  Channel 2 | 0.0871% | 
| Channel 3 | 54.6019% |
| **Total** | **100%** | 
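
Since the challenge asks for impressions rather than percentages, the shares above can be converted with the same formula the simulation uses (budget share divided by the channel's CPI, rounded up). A minimal sketch, with variable names chosen here for illustration:

```python
import math

total_budget = 100_000
cpi = (28.57, 29.62, 19.35)                 # per-channel CPI from the table
allocation = (0.45311, 0.000871, 0.546019)  # shares from the first result row

# Same conversion as in the simulation: budget share divided by CPI, rounded up
impressions = [
    math.ceil(total_budget * share / cost)
    for share, cost in zip(allocation, cpi)
]

for channel, count in enumerate(impressions, start=1):
    print(f"Channel {channel}: {count} impressions")
print("Total:", sum(impressions))
```

The three counts add up to 4,411, which matches the `total_impressions` value reported for that row.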


The beauty of querying the DataFrame for insights from Monte Carlo simulations is that it lets us uncover data we might have overlooked had we focused solely on the rows that strictly satisfy all constraints.

If no scenario satisfies all of our constraints, we can simply remove the filter and look at every output the simulation produced.

By examining these partially matching rows, we can gain a deeper understanding of the dataset and potentially identify alternative solutions or strategies worth exploring further.

In essence, querying the dataset enables us to cast a wider net and capture a more comprehensive view of the data landscape. This broader perspective not only enriches our analysis but also contributes to more informed and robust decision-making.