# Cost per Click (CPC) Bid Minimization for Keywords

Google keyword planner provides forecasts like the expected impressions, clicks, and conversions for different keywords depending on the daily budget ***specified by the user***. Also, it provides the minimum and maximum ***top of page bids*** required for a given keyword. We used this data to determine the CPC for each keyword that ***minimizes the average CPC*** hence maximizing clicks given the budget. 

![Google Keyword Planner](kw_planner.png "Google Keyword Planner")

We fit a regression model for each keyword that accurately defines the relationship between the daily budget and clicks:

$$clicks_k = b_0 + b_1 * ln(budget) + b_2 * ln(budget)^2$$

Then we defined the objective function that we are trying to minimize:

$${CPC_{avg}} = \frac{1}{K} * \sum_{k=0}^{K}{\frac{budget_k}{clicks_k}}$$

subject to:

$$budget = \sum_{k=0}^{K} budget_k = const$$
$$bid_{min}^k \leq \frac{budget_k}{clicks_k} \leq bid_{max}^k$$

where 

K: Number of keywords

bid: top of page bid

The objective function is subject to the following constraints:
1. Sum of budgets is equal to the budget specified by the user. 
2. CPC for each keyword is between the minimum and maximum top of page bids provided by the keyword planner.

We picked 4 keywords in the Weleda campaign, fitted a model for each one, and set a daily budget. Then, we minimized the objective function. The minimized function sets the ***minimum CPC*** for each keyword that ***maximizes the total clicks*** given the daily budget.

The interpretation is that the models ***sets the CPC to the minimum top of page bid*** when the keyword ***saturates at a low number of clicks***.

## Code

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import sklearn as sk
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge

In [2]:
kw_lst=["weleda hair tonic.csv","weleda sunscreen.csv","weleda_solskydd.csv","skinfood_weleda.csv"]
data_lst=[]

In [3]:
for i in range(len(kw_lst)):
    data = pd.read_csv(kw_lst[i])
    data["daily_budget"] = data["daily_budget"].astype(float)
    data["clicks"] = data["clicks"].astype(float)
    data_lst.append(data)

In [5]:
coef_lst=[]
interc_lst=[]
for i,x in enumerate(data_lst):
    poly=PolynomialFeatures(2)
    #xpoly=poly.fit_transform(np.power(daily_data[["daily_budget"]], 1/2.3))
    xpoly=poly.fit_transform(np.log(x[["daily_budget"]]))
    model = Ridge(alpha=1, solver='lsqr')
    model.fit(xpoly, x["clicks"].values)
    coef_lst.append(model.coef_[1:])
    interc_lst.append(model.intercept_)
    print(f"Score: {kw_lst[i][:-4]}", model.score(xpoly, x["clicks"].values))

coef_lst=np.array(coef_lst)
interc_lst=np.array(interc_lst)

Score: weleda hair tonic 0.9953490028219891
Score: weleda sunscreen 0.9636586295859639
Score: weleda_solskydd 0.994395205957459
Score: skinfood_weleda 0.9954561225600115


In [8]:
import numpy as np
import sympy as sp

from scipy.optimize import fsolve
from sympy import symbols, solve

constr_cpc= np.array([[2.94,15.34],[1.74,4.76],[2.36,5.73],[1.73,4.55]])
constr_cost=[]
min_range=0
max_range=0
#solve(x*30.4/(coef_lst[0][1]*sp.log(0) + coef_lst[0][2]*sp.log(x)**2 +interc_lst[0]) - 15.34 , x)
for i in range(len(data_lst)):
    x = symbols('x')
    func_np1 = sp.lambdify(x, x*30.4/(coef_lst[i][0]*sp.log(x) + coef_lst[i][0]*sp.log(x)**2 +interc_lst[i]) - constr_cpc[i][0], modules=['numpy'])
    solution1 = np.ceil(fsolve(func_np1, 1))
    min_range = min_range+solution1
    func_np2 = sp.lambdify(x, x*30.4/(coef_lst[i][1]*sp.log(x) + coef_lst[i][1]*sp.log(x)**2 +interc_lst[i]) - constr_cpc[i][1], modules=['numpy'])    
    solution2 = np.floor(fsolve(func_np2, 1))
    max_range = max_range+solution2
    constr_cost.append([np.round(solution1,2),np.round(solution2,2)])


In [18]:
def objective(x):
  return np.sum(x*30.4/(coef_lst[:,0]*np.log(x) + coef_lst[:,1]*np.log(x)**2 +interc_lst)) / x.shape[0]

DAILY_BUDGET = max_range  # daily sum
constr = [{'type':'eq','fun': lambda x: np.sum(x) - DAILY_BUDGET}]

def constraint_eqn(x, coef, intercpt, lim):
  return (x*30.4/(coef[0]*np.log(x) + coef[1]*np.log(x)**2 +intercpt) - lim)

for i in range(len(kw_lst)):
  
  def dummy(idx=i):
    """Dummy function to force the constraint equation to use the current value of i not the last.
    """
    constr.append({'type':'ineq', 'fun': lambda x: constraint_eqn(x[idx], coef_lst[idx], interc_lst[idx], constr_cpc[idx][0])})
    constr.append({'type':'ineq', 'fun': lambda x: -1 * constraint_eqn(x[idx], coef_lst[idx], interc_lst[idx], constr_cpc[idx][1])})
  
  dummy(i)
  
from scipy.optimize import minimize
bnds = ((0,None),(0,None),(0,None),(0,None))
init = constr_cpc[:,1]
min=minimize(objective,init,constraints=constr,options={"disp":True},bounds=bnds)

print(min)
print(f"f(x0) = {objective(init)}")
print(f"f(xmin) = {objective(min['x'])}")

Optimization terminated successfully    (Exit mode 0)
            Current function value: 3.0719002946551965
            Iterations: 16
            Function evaluations: 80
            Gradient evaluations: 16
     fun: 3.0719002946551965
     jac: array([0.2209934 , 0.50153971, 0.0879035 , 0.03637752])
 message: 'Optimization terminated successfully'
    nfev: 80
     nit: 16
    njev: 16
  status: 0
 success: True
       x: array([ 1.36429715,  0.4968101 ,  4.74533468, 20.39355806])
f(x0) = 6.172689589820775
f(xmin) = 3.0719002946551965
